There is currently a lot of talk about automated machine learning, and also a high level of skepticism. I am here with data scientists Paolo Tamagnini, Simon Schmid and Christian Dietz to ask them a few questions on this topic from their point of view. I also found the concept of guided automation quite interesting, since it is directly involved in the practice of automated machine learning.
Imagine running a clustering algorithm that takes a sweet 4-5 hours to complete. Either you wait for it to finish, or you keep checking with increasing frequency as your patience dwindles to see whether the process has completed. Imagine all such scenarios, and then imagine a notification mechanism that keeps you aware of the progress or notifies you once the job has completed. We are here to discuss that. Let’s start this short article with the easiest of all notification mechanisms, fondly called the ‘beepr’.
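To make the idea concrete, here is a minimal sketch of a "notify me when it is done" wrapper in Python. It is not the beepr package (which is an R package); the names `run_and_notify` and `terminal_bell` are hypothetical, and the only notification shown is the terminal bell character, which some terminals silence.

```python
import sys
import time


def terminal_bell(result, elapsed_seconds):
    """Hypothetical notifier: ring the terminal bell and print a summary."""
    sys.stdout.write("\a")  # ASCII bell; whether it is audible depends on the terminal
    sys.stdout.flush()
    print(f"Job finished in {elapsed_seconds:.1f}s")


def run_and_notify(task, notify=terminal_bell):
    """Run a zero-argument task, then call notify(result, elapsed_seconds)."""
    start = time.monotonic()
    result = task()
    notify(result, time.monotonic() - start)
    return result


def slow_job():
    """Stand-in for a long-running clustering job."""
    time.sleep(0.1)
    return "clusters ready"


outcome = run_and_notify(slow_job)
```

Because the notifier is just a function argument, the bell could be swapped for an e-mail or chat message without touching the job itself.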
One of the fundamental principles taught in classical statistics, and by extension in machine learning, is ‘thou shalt not have a model that overfits’! ‘Overfitting’ means that your neural network has learned a function that performs really well on the training data but, when it is shown new data (also called test data), does not provide correct inferences/predictions. Typically, to avoid an overfitted model (in other words, to obtain a generalized model), you should make use of a concept called the bias-variance tradeoff.
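A small sketch can illustrate what overfitting looks like in numbers. It does not use a neural network: as a stand-in, it compares a "memorizer" (a 1-nearest-neighbour lookup, which has very low bias and high variance) with a plain least-squares line on synthetic noisy data. The data-generating rule, the sample sizes and all function names are assumptions made for this illustration.

```python
import random

random.seed(0)


def make_data(n):
    """Synthetic data: y = 2x + 1 plus Gaussian noise (an assumed toy problem)."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + random.gauss(0, 2.0) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for a single input variable."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    """1-nearest-neighbour lookup: returns the stored y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x, train_y = make_data(30)
test_x, test_y = make_data(1000)

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

memorizer_train = mse(memorizer, train_x, train_y)
memorizer_test = mse(memorizer, test_x, test_y)
line_train = mse(line, train_x, train_y)
line_test = mse(line, test_x, test_y)

print(f"memorizer: train MSE {memorizer_train:.2f}, test MSE {memorizer_test:.2f}")
print(f"line:      train MSE {line_train:.2f}, test MSE {line_test:.2f}")
```

The memorizer reproduces every training point, so its training error is zero, yet it has also memorized the noise and its error on fresh data is clearly higher. The line cannot fit the noise, so its training and test errors typically stay close to each other.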
A layman’s guide to understanding the differences between Data Scientist, Research Scientist, Applied Scientist, and Business Intelligence Engineer. The rise of A.I., Machine Learning and Data Science in recent times has been unprecedented. Every industry has geared up for the race to embrace A.I. as an indispensable tool to pursue new avenues in strengthening or diversifying their existing offerings. This new adventure has, therefore, led to massive hiring of people with relevant A.I. skills. However, given the huge demand and supply gap combined with the hazy requirements from different organizations, the expectations from a data scientist and similar roles are largely misunderstood. Recently, with the hype around Data Science, the market has been floating several fancy roles that further fuelled confusion among the new entrants into the field. For aspiring enthusiasts, understanding where to start from and which job family to pursue is indeed overwhelming. This blog post aims to delineate the differences among the various job families within data science and help you understand how each role is different and what skills are required to pursue each of the respective job titles.
Wikidata is a nonprofit knowledge base that anyone can edit and use. Because of this, AI can be shaped to a certain degree by anyone. Backed by the Wikimedia Foundation, a vibrant ecosystem helps Wikidata to make a mark on modern content processes. Its coverage (56 million items in April 2019), intuitive tools for end users and powerful interfaces for programmers make it a versatile tool for a large variety of usage scenarios – such as knowledge discovery, content enrichment, terminology work and translation. In autumn 2018 Wikidata enhanced its capabilities to capture information related to words, phrases and sentences in many of the world’s languages.
On the one hand, simple experiments like dice rolling, coin flipping, and spinning spinners are very much overdone in elementary curricula around the world. On the other hand, they can be a way to experience and understand some truly massive ideas about chance. Here is a lesson idea for coin flipping. I also wrote about this in my book, Teaching Mathematics Through Problem-Solving in K-12 Classrooms. A coin flip is a true 50/50 proposition. It is, by definition, a fair game. We experience ‘50/50ness’ as true randomness. If I flip a coin in front of you, and it lands heads, there is a decent chance that, if asked, you would predict that the next one would be tails.
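The intuition that tails is "due" after heads can be checked with a quick simulation. The sketch below is only an illustration (the function name and the number of flips are arbitrary choices): it flips a simulated fair coin many times and measures how often a head is followed by another head.

```python
import random


def heads_after_heads(num_flips, seed=0):
    """Share of flips landing heads among those that directly follow a head."""
    rng = random.Random(seed)
    flips = [rng.random() < 0.5 for _ in range(num_flips)]  # True means heads
    followers = [nxt for prev, nxt in zip(flips, flips[1:]) if prev]
    return sum(followers) / len(followers)


share = heads_after_heads(100_000)
print(f"Share of heads right after a head: {share:.3f}")
```

With this many flips the share lands very close to one half: the coin has no memory, so the previous result tells you nothing about the next one.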
As you might know, supervised machine learning is one of the most commonly used and successful types of machine learning. In this article, we will describe supervised learning in more detail and explain several popular supervised learning algorithms. Remember that supervised learning is used whenever we want to predict a certain outcome from a given input, and we have examples of input/output pairs. We build a machine learning model from these input/output pairs, which comprise our training set. Our goal is to make accurate predictions for new, never-before-seen data. Supervised learning often requires human effort to build the training set, but afterwards automates and often speeds up an otherwise laborious or infeasible task.
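As a rough illustration of learning from input/output pairs, here is a tiny k-nearest-neighbours classifier written with the standard library only. The training set, the labels and the function name `knn_predict` are made up for this sketch, and k-nearest neighbours is just one of many supervised learning algorithms.

```python
import math
from collections import Counter


def knn_predict(training_pairs, new_input, k=3):
    """Predict a label by majority vote among the k closest training inputs."""
    by_distance = sorted(training_pairs,
                         key=lambda pair: math.dist(pair[0], new_input))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Training set: (input, output) pairs, here 2-D points with a class label.
training_set = [
    ((1.0, 1.0), "small"), ((1.2, 0.8), "small"), ((0.9, 1.3), "small"),
    ((5.0, 5.0), "large"), ((4.8, 5.3), "large"), ((5.2, 4.7), "large"),
]

# A new, never-before-seen input.
prediction = knn_predict(training_set, (1.1, 0.9))
print(prediction)
```

The human effort goes into labelling the six training points; once they exist, predictions for new points are automatic.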
Gartner’s 2019 Hype Cycle for Emerging Technologies is out, so it is a good moment to take a deep look at the report and reflect on our AI strategy as a company. You can find a brief summary of the complete report here. First of all, and before going into details about the content of the report and its implications for companies’ AI strategy, I would like to address a frequently repeated comment that I have seen on social networks these last few days. A lot of people were surprised to see certain technologies disappearing completely from the report despite appearing in previous years. As Gartner explains in its research, its Hype Cycle covers a very broad spectrum of topics, so if a specific technology is not featured, it does not necessarily imply that it is not important; quite the opposite. One reason for some technologies to disappear from the Hype Cycle might be that they are no longer ‘emerging’ but key for business and IT.
Xarray is a Python package for working with labeled multi-dimensional (a.k.a. N-dimensional, ND) arrays; it includes functions for advanced analytics and visualization. Xarray is heavily inspired by pandas and uses pandas internally. While pandas is a great tool for working with tabular data, it can get a little awkward when data is of higher dimension. Pandas’ main data structures are Series (for 1-dimensional data) and DataFrame (for 2-dimensional data). It used to have Panel (for 3-dimensional data), but it was removed in version 0.25.0. The reader is assumed to be familiar with pandas; if you do not know what pandas is, you should check it out before xarray.
When using reinforcement learning (RL) algorithms it is common, given a large state space, to introduce some form of approximation architecture for the value function (VF). The exact form of this architecture can have a significant effect on an agent’s performance, however, and determining a suitable approximation architecture can often be a highly complex task. Consequently there is currently interest among researchers in the potential for allowing RL algorithms to adaptively generate (i.e. to learn) approximation architectures. One relatively unexplored method of adapting approximation architectures involves using feedback regarding the frequency with which an agent has visited certain states to guide which areas of the state space to approximate with greater detail. In this article we will: (a) informally discuss the potential advantages offered by such methods; (b) introduce a new algorithm based on such methods which adapts a state aggregation approximation architecture on-line and is designed for use in conjunction with SARSA; (c) provide theoretical results, in a policy evaluation setting, regarding this particular algorithm’s complexity, convergence properties and potential to reduce VF error; and finally (d) test experimentally the extent to which this algorithm can improve performance given a number of different test problems. Taken together our results suggest that our algorithm (and potentially such methods more generally) can provide a versatile and computationally lightweight means of significantly boosting RL performance given suitable conditions which are commonly encountered in practice.
New methods for time-to-event prediction are proposed by extending the Cox proportional hazards model with neural networks. Building on methodology from nested case-control studies, we propose a loss function that scales well to large data sets and enables fitting of both proportional and non-proportional extensions of the Cox model. Through simulation studies, the proposed loss function is verified to be a good approximation for the Cox partial log-likelihood. The proposed methodology is compared to existing methodologies on real-world data sets and is found to be highly competitive, typically yielding the best performance in terms of Brier score and binomial log-likelihood. A python package for the proposed methods is available at https://…/pycox.
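For orientation, the quantities involved can be written out in standard notation. The abstract gives no formulas, so everything here is an assumption: $D_i$ is the event indicator, $\mathcal{R}_i$ is the risk set at the event time of individual $i$, $g_\theta$ is a neural network replacing the linear predictor, and $\tilde{\mathcal{R}}_i$ is a sampled subset of the risk set that contains $i$, in the spirit of nested case-control sampling.

```latex
% Classical Cox partial log-likelihood with linear predictor \beta^\top x
\ell(\beta) = \sum_{i : D_i = 1} \left[ \beta^\top x_i
    - \log \sum_{j \in \mathcal{R}_i} \exp\!\left( \beta^\top x_j \right) \right]

% Sketch of a sampled-risk-set loss with a network g_\theta in place of \beta^\top x
\mathcal{L}(\theta) = \frac{1}{n_{\text{events}}} \sum_{i : D_i = 1}
    \log \sum_{j \in \tilde{\mathcal{R}}_i}
    \exp\!\left[ g_\theta(x_j) - g_\theta(x_i) \right]
```

If $\tilde{\mathcal{R}}_i = \mathcal{R}_i$ and $g_\theta(x) = \beta^\top x$, the second expression reduces to the negative partial log-likelihood divided by the number of events. Restricting the inner sum to a small sample is what would let such a loss scale to large data sets.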
We study the use of randomized value functions to guide deep exploration in reinforcement learning. This offers an elegant means for synthesizing statistically and computationally efficient exploration with common practical approaches to value function learning. We present several reinforcement learning algorithms that leverage randomized value functions and demonstrate their efficacy through computational studies. We also prove a regret bound that establishes statistical efficiency with a tabular representation.
We develop an automated variational method for inference in models with Gaussian process (GP) priors and general likelihoods. The method supports multiple outputs and multiple latent functions and does not require detailed knowledge of the conditional likelihood, only needing its evaluation as a black-box function. Using a mixture of Gaussians as the variational distribution, we show that the evidence lower bound and its gradients can be estimated efficiently using samples from univariate Gaussian distributions. Furthermore, the method is scalable to large datasets which is achieved by using an augmented prior via the inducing-variable approach underpinning most sparse GP approximations, along with parallel computation and stochastic optimization. We evaluate our approach quantitatively and qualitatively with experiments on small datasets, medium-scale datasets and large datasets, showing its competitiveness under different likelihood models and sparsity levels. On the large-scale experiments involving prediction of airline delays and classification of handwritten digits, we show that our method is on par with the state-of-the-art hard-coded approaches for scalable GP regression and classification.
Visualize the inner workings of machine learning models with greater detail and flexibility. The SHAP (SHapley Additive exPlanations) framework has proved to be an important advancement in the field of machine learning model interpretation. Developed by Scott Lundberg and Su-In Lee, SHAP combines several existing methods to create an intuitive, theoretically-sound approach to explain predictions for any model. SHAP builds model explanations by asking the same question for every prediction and feature: ‘How does prediction i change when feature j is removed from the model?’ So-called SHAP values are the answers. They quantify the magnitude and direction (positive or negative) of a feature’s effect on a prediction. As this article will show, SHAP values can produce model explanations with the clarity of a linear model.
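The question "how does the prediction change when a feature is removed?" can be made concrete with a brute-force Shapley computation on a toy model. This is a sketch of the underlying idea only, not the SHAP library's API or its faster estimators. The model, the instance and the all-zero baseline used to represent a "removed" feature are assumptions for illustration.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values: average each feature's marginal effect over all orderings.

    A feature counts as "removed" when it is set to its baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)      # start with every feature removed
        previous = predict(current)
        for j in order:
            current[j] = x[j]         # add feature j back
            updated = predict(current)
            phi[j] += updated - previous
            previous = updated
    return [total / len(orderings) for total in phi]


def toy_model(features):
    """Hypothetical linear model used only for this illustration."""
    return 2.0 * features[0] + 3.0 * features[1] - 1.0 * features[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

For a linear model each value is simply the weight times the feature's distance from the baseline, and the values add up to the difference between the prediction for `x` and the prediction for the baseline. Enumerating all orderings grows factorially with the number of features, so this approach is only practical for very small models.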