It looks like Christmas is a little early this year 😉 Here’s a little something from me to all of you out there: a map to navigate ML services on AWS. With all the new stuff launched at re:Invent, I’m quite sure it will come in handy!
This resource is part of a series on specific topics related to data science: regression, clustering, neural networks, deep learning, decision trees, ensembles, correlation, Python, R, TensorFlow, SVM, data reduction, feature selection, experimental design, cross-validation, model fitting, and many more.
The problem is simple: we have a slot machine with n arms. We have a limited number of trials in which to pull an arm, and we don't know which arms will give us the most money. Assuming that the probability distribution does not change over time (meaning that this is a stationary problem), which arm should we pull? Should we pull the arm that gave us the most reward in the past, or should we explore in the hope of finding a better arm? There are multiple solutions to this problem, and people usually measure regret in order to rank them. (Regret, simply put, is the penalty we pay for not pulling the optimal arm.) So to minimize regret we just have to pull the arm that has the highest probability of giving us a reward. But I wanted to look at an additional measurement: specifically, I will also take into account how well each algorithm estimates the probability distribution for each arm (the probability that it will give us a reward). I will also look at their performance on a smaller scale, in which we only have 12 arms, and on a larger scale, in which we have 1000 arms. Finally, the aim of this post is to provide a simple implementation of each solution for non-mathematicians (like me). Hence the theoretical guarantees and proofs are not discussed, but I have provided links for people who wish to study this problem in more depth. Below is the list of methods that we are going to compare…
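To make the explore-versus-exploit trade-off concrete, here is a minimal sketch of one common strategy, epsilon-greedy. It is not necessarily one of the methods the post compares; the function name, the epsilon value, and the 12 reward probabilities are all assumptions made up for illustration. It tracks both measurements mentioned above: cumulative regret and the per-arm estimate of the reward probability.

```python
import random


def epsilon_greedy(true_probs, n_trials, epsilon=0.1, seed=0):
    """Play a stationary Bernoulli bandit with an epsilon-greedy policy.

    true_probs is hidden from the policy; it is only used to simulate
    rewards and to compute the regret afterwards.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    pulls = [0] * n_arms        # how many times each arm was pulled
    estimates = [0.0] * n_arms  # running estimate of each arm's reward probability
    best_prob = max(true_probs)
    regret = 0.0

    for _ in range(n_trials):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = 1 if rng.random() < true_probs[arm] else 0
        pulls[arm] += 1
        # Incremental mean update of the estimate for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        # Expected regret: gap between the best arm and the arm we pulled.
        regret += best_prob - true_probs[arm]

    return estimates, pulls, regret


# Hypothetical 12-arm machine; these probabilities are invented for the example.
true_probs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30,
              0.35, 0.40, 0.45, 0.50, 0.55, 0.80]
estimates, pulls, regret = epsilon_greedy(true_probs, n_trials=5000)
print("pulls per arm:", pulls)
print("cumulative regret:", round(regret, 2))
```

A larger epsilon generally gives better probability estimates for every arm at the cost of more regret, which is exactly the tension between the two measurements.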
Researchers borrowed equations from calculus to redesign the core machinery of deep learning so it can model continuous processes like changes in health.
Since Pearson developed principal component analysis (PCA) in 1901, feature learning (also called representation learning) has been studied for more than 100 years. During this period, many ‘shallow’ feature learning methods have been proposed based on various learning criteria and techniques, leading up to the deep learning research that has become popular in recent years. In this advanced review, we describe the historical profile of the shallow feature learning research and introduce the important developments of the deep learning models. Particularly, we survey the deep architectures with benefits from the optimization of their width and depth, as these models have achieved new records in many applications, such as image classification and object detection. Finally, several interesting directions of deep learning are presented and briefly discussed.
The k-nearest neighbors algorithm is characterized as a simple yet effective data mining technique. Its main drawback appears when massive amounts of data, likely to contain noise and imperfections, are involved, turning the algorithm into an imprecise and especially inefficient technique. These disadvantages have been the subject of research for many years, and among other approaches, data preprocessing techniques such as instance reduction or missing values imputation have targeted these weaknesses. As a result, these issues have turned into strengths, and the k-nearest neighbors rule has become a core algorithm for identifying and correcting imperfect data, removing noisy and redundant samples, or imputing missing values, transforming Big Data into Smart Data, that is, data of sufficient quality to expect a good outcome from any data mining algorithm. The role of this smart data gleaning algorithm in a supervised learning context is investigated. This includes a brief overview of Smart Data, current and future trends for the k-nearest neighbor algorithm in the Big Data context, and the existing data preprocessing techniques based on this algorithm. We present the emerging big data-ready versions of these algorithms and develop some new methods to cope with Big Data. We carry out a thorough experimental analysis on a series of big datasets that provides guidelines as to how to use the k-nearest neighbor algorithm to obtain Smart/Quality Data for a high-quality data mining process. Moreover, multiple Spark Packages have been developed including all the Smart Data algorithms analyzed.
Now that knowledge of machine learning is making its way into offices all around the world, company leaders have a strong desire to automate processes that have been done manually for years. Google’s Jasmeet Bhatia, a talented machine learning specialist, explained to us the ways in which Google is innovating unique processes meant to facilitate effective automation at the Data Science Salon in New York City in September 2018.
Clusterlab is a CRAN package (https://…/index.html) for the routine testing of clustering algorithms. It can simulate positive controls (data sets with more than one cluster) and negative controls (data sets with one cluster). Why test clustering algorithms? Because they often fail to identify the true K in practice, published algorithms are not always well tested, and we need to know about the ones that behave strangely. In many of my own experiments on clustering algorithms, I’ve found that the algorithms many people are using are not necessarily the ones that provide the most sensible results. I can give a good example below.
Last week the R package ruimtehol was released on CRAN (https://…/ruimtehol), allowing R users to easily build and apply neural embedding models on text data. It wraps the ‘StarSpace’ library (https://…/StarSpace), allowing users to calculate word, sentence, article, document, webpage, link and entity ‘embeddings’. By using the ‘embeddings’, you can perform text-based multi-label classification, find similarities between texts and categories, do collaborative-filtering based recommendation as well as content-based recommendation, find out relations between entities, calculate graph ‘embeddings’, and perform semi-supervised learning and multi-task learning on plain text. The techniques are explained in detail in the paper ‘StarSpace: Embed All The Things!’ by Wu et al. (2017), available at https://…/1709.03856. You can get started with some common text analytical use cases by using the presentation we have built below. Enjoy!
A comparison of two transfer learning methods in Natural Language Processing: ‘ULMFiT’ and the ‘OpenAI Transformer’ for a multi-class classification task involving Twitter data.
Latent Dirichlet Allocation (LDA) is a ‘generative probabilistic model’ of a collection of composites made up of parts. In terms of topic modeling, the composites are documents and the parts are words and/or phrases (n-grams). But you could apply LDA to DNA and nucleotides, pizzas and toppings, molecules and atoms, employees and skills, or keyboards and crumbs.
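The word ‘generative’ is easier to see in code. What follows is a minimal sketch of the generative story LDA assumes, using the pizzas-and-toppings analogy from the blurb; it generates composites from known topics and does not fit a model. The two ‘topics’, their topping weights, the alpha value, and all function names are invented for illustration.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one sample from a Dirichlet distribution via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_composite(topic_part_dists, parts, length, alpha, rng):
    """Generate one composite (document or pizza) the way LDA assumes it arose.

    1. Draw the composite's topic mixture theta from a Dirichlet prior.
    2. For each slot, draw a topic from theta, then a part from that topic.
    """
    n_topics = len(topic_part_dists)
    theta = sample_dirichlet([alpha] * n_topics, rng)
    chosen = []
    for _ in range(length):
        topic = rng.choices(range(n_topics), weights=theta)[0]
        part = rng.choices(parts, weights=topic_part_dists[topic])[0]
        chosen.append(part)
    return theta, chosen


# Hypothetical vocabulary of parts (toppings) and two made-up topics.
toppings = ["mushroom", "pepper", "olive", "ham", "salami", "bacon"]
topic_part_dists = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a "veggie" topic
    [0.0, 0.0, 0.1, 0.3, 0.3, 0.3],  # a "meat" topic
]

rng = random.Random(42)
theta, pizza = generate_composite(topic_part_dists, toppings, length=5,
                                  alpha=1.0, rng=rng)
print("topic mixture:", [round(t, 2) for t in theta])
print("toppings:", pizza)
```

Fitting LDA runs this story in reverse: given only the composites, it infers the topic mixtures and the per-topic part distributions that most plausibly produced them.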
In 2016, a Reddit user made a confession. FiletOfFish1066 had automated all of his work tasks and spent around six years ‘doing nothing’. While the original post seems to have disappeared from Reddit, there are numerous reports about the admission. The original poster suggested that he (all the stories refer to FiletOfFish1066 as male) spent about 50 hours doing ‘real work’. The rest: ‘nothing’. When his employer found out, FiletOfFish1066 was fired. I think this is the worst mistake an employer can make. He should have been given a pay rise. But that’s a topic for another article. Let’s talk about hiring algorithms to work for you, just like FiletOfFish1066 had a bunch of algorithms working for him.
mlrose provides functionality for implementing some of the most popular randomization and search algorithms, and applying them to a range of different optimization problem domains. In this tutorial, we will discuss what is meant by an optimization problem and step through an example of how mlrose can be used to solve them. This is the first in a series of three tutorials. Parts 2 and 3 will be published over the next two weeks.
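As a rough illustration of what a randomized search over an optimization problem looks like, here is a minimal pure-Python sketch of randomized hill climbing with restarts. It deliberately does not use the mlrose API, and the 8-queens problem, the function names, and the parameter values are assumptions chosen for this example rather than details taken from the tutorial.

```python
import random


def attacking_pairs(state):
    """Cost function: number of queen pairs that attack each other.

    state[i] is the row of the queen in column i, so column clashes
    are impossible and only rows and diagonals need checking.
    """
    n = len(state)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_row = state[i] == state[j]
            same_diagonal = abs(state[i] - state[j]) == j - i
            if same_row or same_diagonal:
                count += 1
    return count


def random_hill_climb(n=8, max_attempts=200, restarts=10, seed=0):
    """Minimize attacking_pairs by accepting only improving random moves."""
    rng = random.Random(seed)
    best_state, best_cost = None, None
    for _ in range(restarts):
        state = [rng.randrange(n) for _ in range(n)]
        cost = attacking_pairs(state)
        attempts = 0
        while attempts < max_attempts and cost > 0:
            # Propose a neighbor: move one randomly chosen queen to a random row.
            neighbor = state[:]
            neighbor[rng.randrange(n)] = rng.randrange(n)
            neighbor_cost = attacking_pairs(neighbor)
            if neighbor_cost < cost:
                state, cost = neighbor, neighbor_cost
                attempts = 0  # reset the patience counter after an improvement
            else:
                attempts += 1
        if best_cost is None or cost < best_cost:
            best_state, best_cost = state, cost
    return best_state, best_cost


state, cost = random_hill_climb()
print("best state found:", state)
print("attacking pairs:", cost)
```

The three ingredients here (a state representation, a cost or fitness function, and a search strategy) are what any optimization problem needs to define before a solver can be applied to it.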