Checking residual distributions for non-normal GLMs

If you are fitting a linear regression with Gaussian (normally distributed) errors, then one of the standard checks is to make sure the residuals are approximately normally distributed. It is a good idea to do these checks for non-normal GLMs too, to make sure the residuals approximately follow the distribution the model assumes. Here I explain how to create quantile-quantile plots for non-normal data, using the example of fitting a GLM with Student-t distributed errors. Such models can be appropriate when the residuals are overdispersed.
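The idea behind a quantile-quantile plot is simple: sort the residuals and plot them against the corresponding theoretical quantiles of the assumed distribution; if the assumption holds, the points fall near the line y = x. Below is a minimal Python sketch of the underlying computation (the original post may use R); the degrees of freedom and the simulated "residuals" are illustrative assumptions, not values from the article:

```python
import numpy as np
from scipy import stats

# Hypothetical residuals drawn from a Student-t distribution
rng = np.random.default_rng(42)
df_t = 4  # assumed degrees of freedom for the t errors
resid = rng.standard_t(df_t, size=500)

# Plotting positions and the matching theoretical t quantiles
probs = (np.arange(1, len(resid) + 1) - 0.5) / len(resid)
theoretical_q = stats.t.ppf(probs, df_t)
sample_q = np.sort(resid)

# If the residuals match the assumed distribution, the Q-Q points lie
# close to the line y = x, so their correlation should be near 1.
r = np.corrcoef(theoretical_q, sample_q)[0, 1]
```

Plotting `sample_q` against `theoretical_q` gives the Q-Q plot; a high correlation between the two is a quick numerical summary of how straight it looks.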

Catching up: The Near-future of Commercial AI

Melissa Runfeldt is an Insight alumna and currently a machine learning engineer at Salesforce, where she integrates new machine- and deep-learning frameworks into the Einstein Platform. She received her PhD in Computational Neuroscience from The University of Chicago and was a postdoctoral fellow at UCSF, where she researched information coding in the mammalian neocortex.

Singular Value Decomposition (SVD) Tutorial: Applications, Examples, Exercises

Every so often, maybe once or twice a decade, a new mathematical technique or algorithm comes along that changes the way we do things. The method may start out in a small niche or field, but it eventually expands to many other, completely unrelated disciplines, and you cannot stop thinking of new uses for it. We're talking about techniques like the fast Fourier transform, Monte Carlo integration, simulated annealing, Runge-Kutta integration, and pseudo-random number generation.
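SVD belongs on that list. It factors any matrix A into U Σ Vᵀ, and truncating the smaller singular values gives the best low-rank approximation of A. A minimal NumPy sketch (the matrix values here are illustrative, not from the tutorial):

```python
import numpy as np

# A small matrix to decompose (illustrative values only)
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# Thin SVD: A = U @ diag(s) @ Vt, singular values s sorted descending
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Multiplying the factors back together recovers the original matrix
A_rec = U @ np.diag(s) @ Vt

# Rank-1 approximation: keep only the largest singular value.
# This is the closest rank-1 matrix to A in the least-squares sense.
A_rank1 = s[0] * np.outer(U[:, 0], Vt[0, :])
```

The same truncation idea underlies applications like image compression, latent semantic analysis, and noise reduction.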

IBM Unveils a New High-Powered Analytics System for Fast Access to Data Science

IBM (NYSE: IBM) announced the Integrated Analytics System, a new unified data system designed to give users fast, easy access to advanced data science capabilities and the ability to work with their data across private, public or hybrid cloud environments.

Data Science 101: Sentiment Analysis in R Tutorial

Welcome back to Data Science 101! Do you have text data? Do you want to figure out whether the opinions expressed in it are positive or negative? Then you’ve come to the right place! Today, we’re going to get you up to speed on sentiment analysis. By the end of this tutorial you will:
• Understand what sentiment analysis is and how it works
• Read text from a dataset & tokenize it
• Use a sentiment lexicon to analyze the sentiment of texts
• Visualize the sentiment of text
If you’re the hands-on type, you might want to head directly to the notebook for this tutorial. You can fork it and have your very own version of the code to run, modify and experiment with as we go along.
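The tutorial itself is in R, but the core steps — tokenize the text, look each token up in a sentiment lexicon, and sum the scores — can be sketched in a few lines of Python. The tiny lexicon below is made up for illustration; real analyses use published lexicons such as Bing or AFINN:

```python
import re

# A tiny, made-up sentiment lexicon mapping words to polarity scores
lexicon = {"good": 1, "great": 1, "love": 1,
           "bad": -1, "terrible": -1, "hate": -1}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment_score(text):
    """Sum lexicon scores over the tokens; unknown words score 0."""
    return sum(lexicon.get(tok, 0) for tok in tokenize(text))

print(sentiment_score("I love this great tutorial"))  # positive
print(sentiment_score("What a terrible, bad day"))    # negative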

ARIMA models and Intervention Analysis

In my previous tutorial Structural Changes in Global Warming I introduced the strucchange package and some basic examples to date structural breaks in time series. In the present tutorial, I am going to show how dating structural changes (if any) and then Intervention Analysis can help in finding better ARIMA models. Dating structural changes consists in determining if there are any structural breaks in the time series data generating process, and, if so, their dates. Intervention analysis estimates the effect of an external or exogenous intervention on a time series. As an example of intervention, a permanent level shift, as we will see in this tutorial. In our scenario, the external or exogenous intervention is not known in advance, (or supposed to be known), it is inferred from the structural break we will identify.

Building A Logistic Regression in Python, Step by Step

Logistic Regression is a Machine Learning classification algorithm that is used to predict the probability of a categorical dependent variable. In logistic regression, the dependent variable is a binary variable that contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.). In other words, the logistic regression model predicts P(Y=1) as a function of X.

Applied Data Science Series : Solving a Predictive Maintenance Business Problem

Over the past few months, many people have been asking me to write on what it entails to do a data science project end to end i.e from the business problem defining phase to modelling and its final deployment. When I pondered on that request, I thought it made sense. The data science literature is replete with articles on specific algorithms or definitive methods with code on how to deal with a problem. However an end to end view of what it takes to do a data science project for a specific business use case is little hard to find. From this week onward, we would be starting a new series called the Applied Data Science Series. In this series I would be giving an end to end perspective on tackling business use cases or societal problems within the framework of Data Science. In this first article of the applied data science series we will deal with a predictive maintenance business use case. The use case involved is to predict the end life of large industrial batteries, which falls under the genre of use cases called preventive maintenance use cases.

Tenets of Site Reliability Engineering (SRE)

While the nuances of workflows, priorities, and day-to-day operations vary from SRE team to SRE team, all share a set of basic responsibilities for the service(s) they support, and adhere to the same core tenets. In general, an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s). We have codified rules of engagement and principles for how SRE teams interact with their environment—not only the production environment, but also the product development teams, the testing teams, the users, and so on. Those rules and work practices help us to maintain our focus on engineering work, as opposed to operations work.

Automating content delivery in a DevOps world

Craig Adams explores the traditional DevOps pipeline, addresses how to think about CDN automation, and explains how Akamai is baking automation into its CDN.

Deep Learning for Object Detection: A Comprehensive Review

With the rise of autonomous vehicles, smart video surveillance, facial detection and various people counting applications, fast and accurate object detection systems are rising in demand. These systems involve not only recognizing and classifying every object in an image, but localizing each one by drawing the appropriate bounding box around it. This makes object detection a significantly harder task than its traditional computer vision predecessor, image classification.