Gartner's Ethics-Related Predicts for 2019

• Organizations will require a professional code of conduct incorporating ethical use of data and AI.
• Legislation will require that 100% of conversational assistant applications, which use speech or text, identify themselves as being nonhuman entities.
• Consumers in mature markets will rely on artificial intelligence (AI) to decide what they eat, what they wear or where they live.
• Organizations will use explainable AI models to build trust with business stakeholders, up from almost no usage today.
• A Fortune 1000 antitrust case will hinge on whether tacit cooperation among autonomous AI agents in competitive markets constitutes collusion.
• Large organizations will hire AI behavior forensic, privacy and customer trust specialists to reduce brand and reputation risk.


What to do when your training and testing data come from different distributions

To build a well-performing machine learning (ML) model, it is essential to train the model on and test it against data that come from the same target distribution. However, sometimes only a limited amount of data from the target distribution can be collected. It may not be sufficient to build the needed train/dev/test sets. Yet similar data from other data distributions might be readily available. What to do in such a case? Let us discuss some ideas!


Ensemble Learning: 5 Main Approaches

We outline the most popular Ensemble methods including bagging, boosting, stacking, and more.


Approaches to Text Summarization: An Overview

This article will present the main approaches to text summarization currently employed, as well as discuss some of their characteristics.


The Backpropagation Algorithm Demystified

One of the most important aspects of machine learning is its ability to measure the error in its output and to interpret data more precisely as more data is fed through its neural network. The process behind this, commonly referred to as backpropagation, isn’t as complex as you might think.
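As a rough illustration of the idea, here is a minimal sketch of backpropagation on a toy network with one tanh hidden unit and one linear output. The network shape, the starting weights, the learning rate, and all function names are assumptions made for this example; they do not come from the article.

```python
import math

def forward(w1, w2, x):
    """Forward pass: one tanh hidden unit followed by a linear output."""
    h = math.tanh(w1 * x)   # hidden activation
    y = w2 * h              # network output
    return h, y

def loss_and_grads(w1, w2, x, target):
    """Squared-error loss and its gradients, obtained with the chain rule."""
    h, y = forward(w1, w2, x)
    loss = 0.5 * (y - target) ** 2
    dy = y - target                 # dL/dy
    dw2 = dy * h                    # dL/dw2
    dh = dy * w2                    # dL/dh, error propagated back to the hidden unit
    dw1 = dh * (1 - h ** 2) * x     # dL/dw1, using tanh'(z) = 1 - tanh(z)^2
    return loss, dw1, dw2

def train(x, target, w1=0.5, w2=-0.3, lr=0.1, steps=200):
    """Plain gradient descent on a single (x, target) pair (illustrative values)."""
    for _ in range(steps):
        _, dw1, dw2 = loss_and_grads(w1, w2, x, target)
        w1 -= lr * dw1
        w2 -= lr * dw2
    return w1, w2
```

The backward pass simply reuses the quantities computed in the forward pass, which is why the gradients can be checked against finite differences of the loss.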


Childhood’s End

All revolutions come to an end, whether they succeed or fail. The digital revolution began when stored-program computers broke the distinction between numbers that mean things and numbers that do things. Numbers that do things now rule the world. But who rules over the machines? Once it was simple: programmers wrote the instructions that were supplied to the machines. Since the machines were controlled by these instructions, those who wrote the instructions controlled the machines.


Towards a Human Artificial Intelligence for Human Development

This paper discusses the possibility of applying the key principles and tools of current artificial intelligence (AI) to design future human systems in ways that could make them more efficient, fair, responsive, and inclusive.


Why Data Is Never Raw

In everyday usage, the term ‘data’ is associated with a jumble of notions about information, science, and knowledge. Countless reports marvel at the astonishing volumes of data being produced and manipulated, the efficiencies and new opportunities this has made possible, and the myriad ways in which society is changing as a result. We speak of ‘raw’ data and laud it for its independence from human judgment. On this basis, ‘data-driven’ (or ‘evidence-based’) decision-making is widely endorsed. Yet data’s purported freedom from human subjectivity also seems to allow us to invest it with agency: ‘Let the data speak for itself,’ for ‘The data doesn’t lie.’


Adding Firebase Authentication to Shiny

Firebase is a mobile and web application development platform owned by Google. Firebase provides front-end solutions for authentication, database storage, object storage, messaging, and more. Firebase drastically reduces the time needed to develop certain types of highly scalable and secure applications. As heavy R users, we count Shiny as our favorite web development tool. Shiny allows us to build apps to quickly communicate our R analyses. Shiny’s elegant reactive model makes it simple to construct complex interactive data flows. Naturally we are interested in how Firebase can supercharge our Shiny applications. There are many ways the two tools can be used together, but, to us, the most obvious was to add Firebase authentication to a Shiny app.


Simulating Multi-state Models with R

Multi-state models are used to model a trajectory through multiple states. Survival models are a special case in which there are two states, alive and dead. Multi-state models are therefore useful in clinical settings because they can be used to predict or simulate disease progression in detail. Putter et al. provide a helpful tutorial. In this post, we will consider ‘clock-reset’ (i.e., semi-Markov) models rather than ‘clock-forward’ (i.e., Markov) models. In a ‘clock-reset’ model, time refers to time since entering the most recent state, whereas in a ‘clock-forward’ model time refers to time since entering the initial state. When using a ‘clock-reset’ approach, state occupancy probabilities can only be computed in a general fashion via simulation. The analysis will be restricted to parametric models, which are useful for extrapolating beyond the time horizon in the existing data. Example probability distributions include the exponential, Weibull, Gompertz, gamma, log-logistic, lognormal, and generalized gamma. The flexsurv package will be used to estimate the parametric models and the mstate and hesim (admittedly developed by me) packages will be used to simulate the estimated models. We will compare the computational efficiency of different simulation methods.
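To make the ‘clock-reset’ idea concrete, here is a minimal Python sketch of simulating state occupancy probabilities. It does not use flexsurv, mstate, or hesim; the three-state illness-death structure, the Weibull parameters, and all function names are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical illness-death model: 0 = healthy, 1 = sick, 2 = dead (absorbing).
# Each allowed transition has Weibull (shape, scale) parameters. All values are
# made up for illustration and are not estimated from any data.
TRANSITIONS = {
    0: {1: (1.2, 5.0), 2: (1.0, 12.0)},
    1: {2: (1.5, 4.0)},
    2: {},
}

def simulate_clock_reset(rng, start=0, horizon=20.0):
    """Simulate one trajectory as a list of (entry_time, state) pairs."""
    time, state = 0.0, start
    path = [(time, state)]
    while TRANSITIONS[state]:
        # Clock-reset: each latent transition time is drawn as time since
        # entering the current state, not time since the initial state.
        draws = {
            target: rng.weibullvariate(scale, shape)
            for target, (shape, scale) in TRANSITIONS[state].items()
        }
        target = min(draws, key=draws.get)  # the earliest competing transition wins
        time += draws[target]
        if time > horizon:
            break
        state = target
        path.append((time, state))
    return path

def state_at(path, t):
    """Return the state occupied at time t along a simulated path."""
    current = path[0][1]
    for entry_time, state in path:
        if entry_time <= t:
            current = state
    return current

def occupancy(n_patients, t, seed=0):
    """Estimate state occupancy probabilities at time t by simulation."""
    rng = random.Random(seed)
    counts = {s: 0 for s in TRANSITIONS}
    for _ in range(n_patients):
        counts[state_at(simulate_clock_reset(rng), t)] += 1
    return {s: c / n_patients for s, c in counts.items()}
```

Calling `occupancy(10000, 5.0)` would return the simulated share of patients in each state five time units after the start; a larger `n_patients` reduces Monte Carlo noise at the cost of run time.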


6 Steps to Create Value from Machine Learning for Your Business

Step I: Build data fabric for your organization
Step II: Hire the right talent
Step III: Create a lab environment
Step IV: Operationalize successful pilots
Step V: Scale up for enterprise-wide adoption
Step VI: Drive cultural change


Comparing Machine Learning Models: Statistical vs. Practical Significance

A lot of work has been done on building and tuning ML models, but a natural question that eventually comes up after all that hard work is: how do we actually compare the models we’ve built? If we’re facing a choice between models A and B, which one is the winner and why? Could the models be combined together so that optimal performance is achieved? A very shallow approach would be to compare the overall accuracy on the test set, say, model A’s accuracy is 94% vs. model B’s accuracy is 95%, and blindly conclude that B won the race. In fact, there is so much more than the overall accuracy to investigate and more facts to consider. In this blog post, I’d love to share my recent findings on model comparison. I like using simple language when explaining statistics, so this post is a good read for those who are not so strong in statistics, but would love to learn a little more.
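One common way to check whether an accuracy gap between two classifiers is more than noise is an exact McNemar test on their paired predictions over the same test set. The sketch below is not necessarily the method used in the post; the function name and interface are assumptions made for illustration.

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions.

    Returns (only_a_correct, only_b_correct, p_value). Only the examples on
    which exactly one model is correct carry information about which is better.
    """
    only_a = sum(a == y and b != y for y, a, b in zip(y_true, pred_a, pred_b))
    only_b = sum(b == y and a != y for y, a, b in zip(y_true, pred_a, pred_b))
    n = only_a + only_b
    if n == 0:
        return only_a, only_b, 1.0  # the models never disagree on correctness
    k = min(only_a, only_b)
    # Under the null hypothesis each discordant example favors A or B with
    # probability 1/2, so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return only_a, only_b, min(1.0, 2 * tail)
```

A small p-value suggests that the difference is statistically significant; whether a one-point accuracy gain matters in practice is a separate question that depends on the application.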


Multitask learning in TensorFlow with the Head API

A fundamental characteristic of human learning is that we learn many things simultaneously. The equivalent idea in machine learning is called multi-task learning (MTL), and it has become increasingly useful in practice, particularly for reinforcement learning and natural language processing. In fact, even in standard single-task situations, additional auxiliary tasks can be devised and included in the optimization process to help learning. This post provides an introduction to the field by showing how to solve a simple multi-task problem in an image classification benchmark. The focus is on an experimental component of TensorFlow, the Head API, that helps in designing custom estimators for MTL, by decoupling the shared component of the neural network from the task-specific ones. Along the route we will also have the opportunity to discuss additional features from the TensorFlow core, including tf.data, tf.image, and the custom estimators.


Goal-Based Forecasting in Prophet & R

Forecasting tools are everywhere. But it’s surprisingly hard to figure out how to adjust modeled forecasts to meet a goal or target, which business teams often ask for! With some historical data, you can use forecasting tools to predict into the future a specific metric (e.g., number of active users, revenue, products sold, babies born). Good forecasting tools will factor in seasonality (fluctuations based on days of the week or months of the year), overall growth, holidays, and other variables to model more plausible, reliable predictions. However, businesses set ‘goals’ and ‘targets.’ How do we construct forecasts that meet those goals and targets? Let’s pretend we’re data scientists at Tuckersoft, and we’ve just built a forecast that predicts 15,000 copies of Nohzdyve will be sold in July 1985. Because we’re great data scientists, we’ve also provided projected sales for every day in July (e.g., ‘On July 3rd, we project to sell 472 copies’), and those daily sales take into account seasonality, growth, and holidays.
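One simple way to reconcile a forecast with a goal is to rescale every daily value by the same factor, so that the total hits the target while the daily pattern keeps its shape. The sketch below assumes this proportional approach, which may differ from the method the post develops; it does not use Prophet, and the function name and the 18,000-copy goal mentioned afterwards are hypothetical.

```python
def scale_to_goal(daily_forecast, goal):
    """Rescale daily forecast values proportionally so they sum to the goal.

    Relative day-to-day differences (seasonality, growth, holiday effects)
    are preserved because every day is multiplied by the same factor.
    """
    total = sum(daily_forecast)
    if total <= 0:
        raise ValueError("forecast total must be positive")
    factor = goal / total
    return [value * factor for value in daily_forecast]
```

With a forecast of 15,000 copies for the month and a hypothetical goal of 18,000, every daily value would be multiplied by 1.2, so a day projected at 472 copies would need to reach roughly 566.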


Advanced Jupyter Notebooks: A Tutorial

Lying at the heart of modern data science and analysis, Jupyter Notebooks are an incredibly powerful tool at both ends of the project lifecycle. Whether you’re rapidly prototyping ideas, demonstrating your work, or producing fully fledged reports, notebooks can provide an efficient edge over IDEs or traditional desktop applications. Following on from ‘Jupyter Notebook for Beginners: A Tutorial’, this guide will take you on a journey from the truly vanilla to the downright dangerous. That’s right! Jupyter’s wacky world of out-of-order execution has the power to faze, and when it comes to running notebooks inside notebooks, things can get complicated fast. This guide aims to straighten out some sources of confusion and spread ideas to pique your interest and spark your imagination. There are already plenty of great listicles of neat tips and tricks, so here we will take a more thorough look at Jupyter’s offerings.