V-REP – Virtual Robot Experimentation Platform

The robot simulator V-REP, with integrated development environment, is based on a distributed control architecture: each object/model can be individually controlled via an embedded script, a plugin, a ROS or BlueZero node, a remote API client, or a custom solution. This makes V-REP very versatile and ideal for multi-robot applications. Controllers can be written in C/C++, Python, Java, Lua, Matlab or Octave. V-REP is used for fast algorithm development, factory automation simulations, fast prototyping and verification, robotics related education, remote monitoring, safety double-checking, etc.
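
For a flavour of the remote API route, here is a minimal Python sketch (my illustration, using the legacy remote API bindings that ship with V-REP; the port, the joint name 'motor_joint' and the scene itself are placeholders, and the scene must have the remote API server enabled):

```python
import vrep  # vrep.py + the remoteApi library from the V-REP installation

vrep.simxFinish(-1)  # close any stale connections
client_id = vrep.simxStart('127.0.0.1', 19997, True, True, 5000, 5)
if client_id == -1:
    raise RuntimeError('could not connect to the V-REP remote API server')

# Look up a joint by name (placeholder) and spin it at 1 rad/s.
_, joint = vrep.simxGetObjectHandle(client_id, 'motor_joint',
                                    vrep.simx_opmode_blocking)
vrep.simxStartSimulation(client_id, vrep.simx_opmode_blocking)
vrep.simxSetJointTargetVelocity(client_id, joint, 1.0,
                                vrep.simx_opmode_oneshot)
vrep.simxFinish(client_id)
```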


Bayesian Strategy for Modeling Retail Price with PyStan

Statistical modeling, partial pooling, multilevel modeling, hierarchical modeling. Pricing is a common problem faced by any e-commerce business, and one that can be addressed effectively by Bayesian statistical methods. The Mercari Price Suggestion data set from Kaggle seemed a good candidate for practicing the Bayesian models I wanted to learn. If you remember, the purpose of the data set is to build a model that automatically suggests the right price for any given product for Mercari website sellers. Here I attempt to see whether we can solve this problem with Bayesian statistical methods, using PyStan.
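
As a hedged sketch of what partial pooling looks like in PyStan (the PyStan 2.x interface; the variable names, priors and synthetic stand-in data are my illustration, not the post's exact model):

```python
import numpy as np
import pystan

# Synthetic stand-in for the Mercari data: N listings across J categories.
rng = np.random.default_rng(0)
J, N = 20, 2000
cat = rng.integers(1, J + 1, size=N)             # Stan arrays are 1-indexed
log_price = rng.normal(3.0 + 0.05 * cat, 0.5)

model_code = """
data {
  int<lower=1> N;                // listings
  int<lower=1> J;                // product categories
  int<lower=1, upper=J> cat[N];  // category of each listing
  vector[N] log_price;
}
parameters {
  real mu;                       // global mean log price
  real<lower=0> tau;             // between-category spread
  vector[J] alpha;               // per-category intercepts
  real<lower=0> sigma;           // within-category noise
}
model {
  alpha ~ normal(mu, tau);       // partial pooling: categories shrink to mu
  log_price ~ normal(alpha[cat], sigma);
}
"""

fit = pystan.StanModel(model_code=model_code).sampling(
    data=dict(N=N, J=J, cat=cat, log_price=log_price), iter=2000, chains=4)
print(fit)
```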


Bayesian Machine Learning

In the previous post we learnt about the importance of latent variables in Bayesian modelling. Starting with this post, we will see Bayesian methods in action. We will walk through different aspects of machine learning, see how Bayesian methods help us design solutions, and look at the additional capabilities and insights we gain by using them. The techniques that follow are generally known as Bayesian inference. In this post we will see how Bayesian methods can be used to cluster the given data.
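
For instance, here is a minimal sketch of Bayesian clustering with a variational Gaussian mixture from scikit-learn (my illustration on synthetic data, not necessarily the model the post builds). Because the mixture weights get a prior, the model can switch off components it does not need, so we only fix an upper bound on the number of clusters:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Three synthetic 2-D clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2))
               for c in ([0, 0], [4, 4], [0, 5])])

bgm = BayesianGaussianMixture(n_components=10,            # upper bound only
                              weight_concentration_prior=0.01,
                              random_state=0).fit(X)
labels = bgm.predict(X)
print(np.round(bgm.weights_, 2))  # near-zero weights mark pruned components
```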


Machine Learning on Mobile: An On-device Inference App for Skin Cancer Detection

Mobile health (mHealth) is considered one of the most transformative drivers of health informatics and the delivery of ubiquitous medical applications. Machine learning has proven to be a powerful tool in classifying medical images for detecting various diseases. However, supervised machine learning requires a large amount of data to train the model, whose storage and processing pose considerable system-requirements challenges for mobile applications. Therefore, many studies focus on deploying cloud-based machine learning, which takes advantage of the Internet connection to outsource data-intensive computing. However, this approach comes with certain drawbacks, such as latency and privacy concerns, which must be considered in the context of sensitive data. To tackle these challenges of mHealth applications, we present an on-device inference app and use a dataset of skin cancer images to demonstrate a proof of concept. We pre-trained a Convolutional Neural Network model using 10,015 skin cancer images. The model is then deployed on a mobile device, where the inference process takes place, i.e., when presented with a new test image, all computations are executed locally and the test data never leaves the device. This approach reduces latency, saves bandwidth and improves privacy.
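
One plausible toolchain for this kind of on-device deployment (our assumption for illustration; the app may use a different stack) is TensorFlow Lite: convert the trained Keras CNN once, then run the interpreter locally, exactly as the mobile app would:

```python
import numpy as np
import tensorflow as tf

# Convert a trained Keras model (placeholder path) to TensorFlow Lite.
model = tf.keras.models.load_model('skin_cancer_cnn.h5')
tflite_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()
open('skin_cancer_cnn.tflite', 'wb').write(tflite_model)

# On-device inference: all computation stays local to the interpreter.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
image = np.zeros(inp['shape'], dtype=np.float32)  # stand-in for a real image
interpreter.set_tensor(inp['index'], image)
interpreter.invoke()
print(interpreter.get_tensor(out['index']))       # predicted class scores
```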


Graph Analytics – Introduction and Concepts of Centrality

The advent of social networks, big data and e-commerce has re-emphasized the importance of analyzing a unique type of data structure – one which depicts relationships among its entities, also known as a graph. It is imperative to briefly introduce the concept of a 'graph' before I venture into graph analytics itself. Let's start by looking at the sample graph of friends presented below. I will be using the same graph in some of the following sections to further explain the concepts of graph analytics.
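
To make the centrality measures concrete, here is a small friends graph in networkx (the names are illustrative, not the article's exact figure):

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([('Ana', 'Ben'), ('Ana', 'Carl'), ('Ben', 'Carl'),
                  ('Carl', 'Dina'), ('Dina', 'Eva'), ('Dina', 'Filip')])

print(nx.degree_centrality(G))       # share of direct friendships
print(nx.betweenness_centrality(G))  # how often a node bridges two others
print(nx.closeness_centrality(G))    # inverse average distance to the rest
```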


How to get fbprophet working on AWS Lambda

Solving the package size issues of a serverless fbprophet deployment. I assume you're reading this post because you're looking for ways to use the awesome fbprophet (Facebook's open source forecasting) library on AWS Lambda, and you're already familiar with the various issues around getting it done. I will be using a Python 3.6 example, but the approach is applicable to other runtimes as well as to other large ML libraries.
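
Once the packaging is sorted, the function itself is ordinary fbprophet code. A hedged sketch of a handler (the event shape, with 'history' rows of {'ds': ..., 'y': ...}, is my assumption):

```python
import pandas as pd
from fbprophet import Prophet

def handler(event, context):
    periods = int(event.get('periods', 30))
    df = pd.DataFrame(event['history'])          # columns: ds, y
    m = Prophet()
    m.fit(df)
    future = m.make_future_dataframe(periods=periods)
    forecast = m.predict(future)
    out = forecast[['ds', 'yhat']].tail(periods)
    out = out.assign(ds=out['ds'].astype(str))   # keep the payload JSON-safe
    return out.to_dict(orient='records')
```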


Searching for Food Deserts in Los Angeles County

For a recent data science project, I collaborated with several other Lambda School students to search for food deserts in L.A. County. A food desert is generally defined as an area without access, within one mile, to a grocery store or market providing fresh, healthy food options such as fruits, vegetables and meats. We wanted to test the theory that lower-income neighborhoods are more likely to sit in a food desert.
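
The core check reduces to a distance test; a hedged sketch (the store and tract coordinates are made up for illustration):

```python
import numpy as np

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * np.arcsin(np.sqrt(a))    # Earth radius ~ 3958.8 mi

stores = [(34.0522, -118.2437), (33.9416, -118.4085)]  # placeholder stores
tract = (34.10, -118.30)                                # placeholder centroid

nearest = min(haversine_miles(*tract, *s) for s in stores)
print('food desert' if nearest > 1.0 else 'has access', round(nearest, 2))
```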


Open-source Version Control System for Machine Learning Projects

DVC tracks ML models and data sets, and is built to make models shareable and reproducible. It is designed to handle large files, data sets, machine learning models, and metrics, as well as code.
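
Day to day, DVC is driven from the command line (dvc init, dvc add, dvc push), but it also ships a small Python API. A hedged sketch of reading a tracked dataset at a pinned revision (the repo URL, path and tag are placeholders):

```python
import dvc.api

# Where the versioned artifact actually lives (e.g. S3), resolved from Git.
url = dvc.api.get_url('data/train.csv',
                      repo='https://github.com/example/project', rev='v1.0')

# Stream the exact dataset version used for a past experiment.
with dvc.api.open('data/train.csv',
                  repo='https://github.com/example/project', rev='v1.0') as f:
    header = f.readline()
```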


AI and Analytics in Production – Best practices for production-ready analytics

If you are in the process of deploying large-scale data systems into production or if you are using large-scale data in production now, this book is for you. In it we address the difference in big data hype versus serious large-scale projects that bring real value in a wide variety of enterprises. Whether this is your first large-scale data project or you are a seasoned user, you will find helpful content that should reinforce your chances for success. Here, we speak to business team leaders; CIOs, CDOs, and CTOs; business analysts; machine learning and artificial intelligence (AI) experts; and technical developers to explain in practical terms how to take big data analytics and machine learning/AI into production successfully. We address why this is challenging and offer ways to tackle those challenges. We provide suggestions for best practice, but the book is intended as neither a technical reference nor a comprehensive guide to how to use big data technologies. You can understand it regardless of whether you have a deep technical background. That said, we think that you'll also benefit if you're technically adept, not so much from a review of tools as from fundamental ideas about how to make your work easier and more effective. The book is based on our experience and observations of real-world use cases so that you can gain from what has made others successful.


Monte Carlo Methods – Estimate Blackjack Policy

We have been discussing several reinforcement learning problems, in each of which we try to find the optimal policy by repeatedly playing the game and updating our estimates. In essence, we estimate the (state, value) or (state, action, value) pairs, and based on those estimates we generate a policy by taking the action that gives the highest value. But this is not the only way to do it. In this post, I will introduce the Monte Carlo method, which is another way to estimate the value of a state or the value of a policy. The Monte Carlo method covers a broad family of methods, but all follow the same principle – sampling. The idea is straightforward and intuitive: if you are not sure of the value of a state, just sample – keep visiting that state and average the rewards obtained from simulated actions while interacting with the environment.
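
A concrete sketch of first-visit Monte Carlo value estimation for blackjack (my illustration, using OpenAI Gym's Blackjack-v0 with its classic pre-0.26 reset/step API, under a fixed policy that sticks on 20 or 21 and hits otherwise):

```python
from collections import defaultdict
import gym

env = gym.make('Blackjack-v0')
returns, counts = defaultdict(float), defaultdict(int)

for _ in range(200_000):
    state, done, visited = env.reset(), False, []
    while not done:
        visited.append(state)
        action = 0 if state[0] >= 20 else 1        # 0 = stick, 1 = hit
        state, reward, done, _ = env.step(action)
    # Blackjack only pays out at the end, so every visited state's return
    # equals the final reward; first-visit = credit each state once.
    for s in set(visited):
        returns[s] += reward
        counts[s] += 1

V = {s: returns[s] / counts[s] for s in counts}
print(V.get((20, 10, False)))  # player 20, dealer shows 10, no usable ace
```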


XTX, Covariance, Correlation and Cosine matrices

A walk-through for calculating several popular association matrices. Ever been in a scenario where you needed to come up with pairwise covariance, correlation, or cosine matrices for data on the fly, without the help of a built-in function? Probably not.
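
But just in case: with the right preprocessing of X's columns, everything falls out of XᵀX – center them for covariance, standardize them for correlation, and L2-normalize them for cosine similarity. A short numpy sketch:

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(50, 4))  # 50 rows, 4 features
n = X.shape[0]

Xc = X - X.mean(axis=0)                   # centered columns
cov = Xc.T @ Xc / (n - 1)                 # == np.cov(X, rowvar=False)

Xs = Xc / X.std(axis=0, ddof=1)           # standardized columns
corr = Xs.T @ Xs / (n - 1)                # == np.corrcoef(X, rowvar=False)

Xn = X / np.linalg.norm(X, axis=0)        # unit-length columns
cosine = Xn.T @ Xn                        # pairwise cosine similarities

assert np.allclose(cov, np.cov(X, rowvar=False))
assert np.allclose(corr, np.corrcoef(X, rowvar=False))
```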


Advance AI: Machine Learning design patterns

In the article, we’ll explore some architectural design patterns that support the machine learning model life cycle.


Interpreting Deep Learning Models for Computer Vision

Artificial Intelligence (AI) is no longer a field restricted to research papers and academia. Businesses and organizations across diverse domains are building large-scale applications powered by AI. The questions to ask here are, 'Do we trust decisions made by AI models?' and 'How does a machine learning or deep learning model make its decisions?'. Interpreting machine learning or deep learning models has always been overlooked in the data science lifecycle, since data scientists and machine learning engineers tend to be more involved with actually pushing things to production or getting a model up and running.


CHART.XKCD

Chart.xkcd is a chart library that plots 'sketchy', 'cartoony' or 'hand-drawn' styled charts.


Robust learning from untrusted sources

Data Shapley provides us with one way of finding and correcting or eliminating low-value (and potentially harmful) data points from a training set. In today's paper choice, Konstantinov & Lampert provide a way of assessing the value of datasets as a whole. The idea is that you might be learning, e.g., a classifier by combining data from multiple sources. By assigning a value (weighting) to each of those data sources, we can intelligently combine them to get the best possible performance out of the resulting trained model. So if you need more data in order to improve the performance of your model, 'Robust learning from untrusted sources' provides an approach that lets you tap into additional, potentially noisier, sources. It's similar in spirit to Snorkel, which we looked at last year, and is designed to let you incorporate data from multiple 'weakly supervised' (i.e. noisy) data sources. Snorkel replaces labels with probability-weighted labels, and then trains the final classifier using those.
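
To make the weighting idea tangible (this is my miniature sketch of the general principle, not the paper's actual algorithm or its guarantees): score each source by how well a model trained on it alone transfers to a small trusted validation set, then train on the union with per-source sample weights.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_weighted(sources, X_val, y_val):
    """sources: list of (X, y) arrays from separate, possibly noisy origins."""
    # Trust each source in proportion to its transfer accuracy.
    scores = np.array([LogisticRegression(max_iter=1000).fit(X, y)
                       .score(X_val, y_val) for X, y in sources])
    weights = scores / scores.sum()
    # Train one model on the weighted union of all sources.
    X = np.vstack([X for X, _ in sources])
    y = np.concatenate([y for _, y in sources])
    sample_w = np.concatenate([np.full(len(ys), w)
                               for (_, ys), w in zip(sources, weights)])
    return LogisticRegression(max_iter=1000).fit(X, y, sample_weight=sample_w)
```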


Simple Neural Network Sentiment Analyser Using Keras

Want to know how a product owner can find out heaps of insight about how their products are received by customers, without having to read millions of articles and reviews? A sentiment analyser is the answer: these things can be hooked up to Twitter, review sites, databases, or all of the above, utilising neural networks in Keras. It's a great, lazy way to understand how a product is viewed by a large group of customers in a very short space of time.
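
A minimal sketch of such an analyser on Keras' built-in IMDB reviews (a real deployment would swap in Twitter or review-site text and its own tokenizer):

```python
from tensorflow import keras

vocab, maxlen = 10_000, 200
(x_tr, y_tr), (x_te, y_te) = keras.datasets.imdb.load_data(num_words=vocab)
x_tr = keras.preprocessing.sequence.pad_sequences(x_tr, maxlen=maxlen)
x_te = keras.preprocessing.sequence.pad_sequences(x_te, maxlen=maxlen)

model = keras.Sequential([
    keras.layers.Embedding(vocab, 16),
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid'),   # P(review is positive)
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(x_tr, y_tr, epochs=3, batch_size=512, validation_data=(x_te, y_te))
```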


Best Investment Portfolio Via Monte-Carlo Simulation In Python

Finding An Optimum Investment Portfolio Using Monte-Carlo Simulation In Python From Start To End. This article focuses on generating an optimum investment portfolio via Monte-Carlo simulation. I have implemented an end-to-end application in Python and this article documents the solution so that a wider audience can benefit from it. The article will explain the required financial, mathematical and programming knowledge of investment management in an easy-to-understand manner. Hence, this article is useful for data scientists, programmers, mathematicians and those who are interested in finance.
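
The heart of the approach fits in a short loop: sample random portfolio weights, score each candidate by its annualized Sharpe ratio, and keep the best. A sketch with stand-in return data (the real application would use historical asset returns; the zero risk-free rate is an assumption):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
returns = pd.DataFrame(rng.normal(0.0005, 0.01, size=(1000, 4)),
                       columns=['AAA', 'BBB', 'CCC', 'DDD'])  # stand-in data

mean, cov = returns.mean() * 252, returns.cov() * 252         # annualized
best_sharpe, best_w = -np.inf, None
for _ in range(20_000):
    w = rng.random(4)
    w /= w.sum()                                 # long-only, weights sum to 1
    sharpe = (w @ mean) / np.sqrt(w @ cov @ w)   # risk-free rate taken as 0
    if sharpe > best_sharpe:
        best_sharpe, best_w = sharpe, w

print(np.round(best_w, 3), round(best_sharpe, 3))
```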


Anomalies in Global Suicide Data

Every Mental Health Awareness Day (October 10), there is a peak in search interest for 'mental health' on Google Trends. This past October, however, saw the highest search interest ever. Mental health in the United States is growing as a part of the global conversation – partially due to its destigmatization, but mainly due to its relevance to technology (most notably social media), domestic terrorism, and drug addiction. While there is a ton of pre-existing analysis on this topic conducted by both Kagglers and nonprofits, I hope to use time series anomaly detection techniques to offer a different perspective on the suicide statistics datasets.
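
As one example of the kind of technique I mean (my sketch, not necessarily the method used in the analysis): flag points whose rolling z-score, computed against a trailing window, exceeds a threshold.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
series = pd.Series(10 + rng.normal(0, 0.3, 40))  # stand-in rate time series
series.iloc[25] += 3                             # injected anomaly

roll = series.shift(1).rolling(window=8)         # stats exclude current point
z = (series - roll.mean()) / roll.std()
print(series[z.abs() > 3])                       # flagged anomalies
```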


Simulate Images for ML in PyBullet – The Quick & Easy Way

When applying deep Reinforcement Learning (RL) to robotics, we are faced with a conundrum: how do we train a robot to do a task when deep learning requires hundreds of thousands, even millions, of examples? To achieve 96% grasp success on never-before-seen objects, researchers at Google and Berkeley trained a robotic agent through 580,000 real-world grasp attempts. This feat took seven robots and several weeks to accomplish. Without Google resources, it may seem hopeless for the average ML practitioner. We cannot expect to easily run hundreds of thousands of iterations of training using a physical robot, which is subject to wear-and-tear and requires human supervision, neither of which comes cheap. It would be much more feasible if we could pretrain such RL algorithms to drastically reduce the number of real world attempts needed.
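
Simulation is where PyBullet comes in. A minimal sketch of rendering labeled training images headlessly (the object and camera placement are illustrative):

```python
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                              # headless, no GUI
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.8)
p.loadURDF('plane.urdf')
p.loadURDF('r2d2.urdf', basePosition=[0, 0, 0.5])  # stand-in object

view = p.computeViewMatrix(cameraEyePosition=[1.5, 1.5, 1.0],
                           cameraTargetPosition=[0, 0, 0.3],
                           cameraUpVector=[0, 0, 1])
proj = p.computeProjectionMatrixFOV(fov=60, aspect=1.0,
                                    nearVal=0.01, farVal=10)

for _ in range(240):                             # let the scene settle
    p.stepSimulation()
w, h, rgb, depth, seg = p.getCameraImage(128, 128, viewMatrix=view,
                                         projectionMatrix=proj)
p.disconnect()
```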