**XmR Chart – Step-by-Step Guide by Hand and with R**

Is your process in control? The XmR chart is a great statistical process control (SPC) tool that can help you answer this question, reduce waste, and increase productivity. We’ll cover the concepts behind XmR charting and explain the XmR control constant with some super simple R code. Lastly, we’ll cover how to make the XmR plot by hand, with base R, and with the ggQC package.
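The article itself builds the charts in R; as a language-neutral sketch of the same arithmetic, here is the control-limit math in plain Python. The data series is illustrative, and the constants are the standard XmR values: 2.66 is 3/d2 with d2 = 1.128 for moving ranges of size 2, and 3.267 is the D4 constant for the range chart.

```python
# Minimal sketch of the XmR control-limit arithmetic (illustrative data).
data = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3]

x_bar = sum(data) / len(data)                      # centre line of the X chart
mr = [abs(b - a) for a, b in zip(data, data[1:])]  # moving ranges |x_i - x_{i-1}|
mr_bar = sum(mr) / len(mr)                         # centre line of the mR chart

ucl = x_bar + 2.66 * mr_bar    # upper control limit for individual values
lcl = x_bar - 2.66 * mr_bar    # lower control limit for individual values
mr_ucl = 3.267 * mr_bar        # upper control limit for the mR chart (D4)
```

Points outside `lcl`/`ucl` (or moving ranges above `mr_ucl`) signal a process that is not in control.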

This came up in a discussion a few years ago, where people were arguing about the meaning of probability: is it long-run frequency, is it subjective belief, is it betting odds, etc.? I wrote:

• Probability is a mathematical concept. I think Martha Smith’s analogy to points, lines, and arithmetic is a good one. Probabilities are probabilities to the extent that they follow the Kolmogorov axioms. (Let me set aside quantum probability for the moment.) The different definitions of probabilities (betting, long-run frequency, etc.) can be usefully thought of as models rather than definitions. They are different examples of paradigmatic real-world scenarios in which the Kolmogorov axioms, and thus probability, apply.

• Probability is a mathematical concept. To define it based on any imperfect real-world counterpart (such as betting or long-run frequency) makes about as much sense as defining a line in Euclidean space as the edge of a perfectly straight piece of metal, or as the space occupied by a very thin thread that is pulled taut. Ultimately, a line is a line, and probabilities are mathematical objects that follow Kolmogorov’s laws. Real-world models are important for the application of probability, and it makes a lot of sense to me that such an important concept has many different real-world analogies, none of which are perfect.
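For reference, the Kolmogorov axioms invoked above can be stated compactly: for a probability measure P on a sample space Ω,

```latex
P(A) \ge 0 \ \text{for every event } A, \qquad
P(\Omega) = 1, \qquad
P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i)
\ \text{for pairwise disjoint } A_1, A_2, \dots
```

Anything satisfying these three conditions is a probability in the mathematical sense, whatever real-world story we attach to it.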

**Unleash the power of Jupyter Notebooks**

Jupyter Notebooks are an incredibly powerful tool at both ends of the project life-cycle. Whether you’re rapidly prototyping ideas, demonstrating your work, or producing fully fledged reports, notebooks can provide an efficient edge over IDEs or traditional desktop applications. They are a very flexible tool for creating readable analyses, because you can keep code, images, comments, formulas and plots together. And they are completely free.

**Understanding how to explain predictions with ‘explanation vectors’**

In a recent post I introduced three existing approaches to explaining individual predictions of any machine learning model. After the posts focused on LIME and Shapley values, now it’s the turn of explanation vectors, a method presented by David Baehrens, Timon Schroeter and Stefan Harmeling in 2010. As we have seen in those posts, explaining a decision of a black-box model means understanding which input features made the model give its prediction for the observation being explained. Intuitively, a feature has a lot of influence on the model’s decision if small variations in its value cause large variations in the model’s output, while a feature has little influence on the prediction if big changes in that variable barely affect the model’s output.
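The intuition above is exactly a local gradient: an explanation vector is the gradient of the model’s output with respect to the input features at the point being explained. A minimal sketch, approximating that gradient by finite differences; the toy model and all numbers here are illustrative assumptions, not from the original paper:

```python
from math import exp

def model(x):
    # toy "black box": a smooth score between 0 and 1
    return 1.0 / (1.0 + exp(-(0.8 * x[0] - 0.5 * x[1])))

def explanation_vector(f, x, eps=1e-5):
    """Approximate the local gradient of f at x by central differences."""
    grad = []
    for i in range(len(x)):
        x_hi = list(x); x_hi[i] += eps
        x_lo = list(x); x_lo[i] -= eps
        grad.append((f(x_hi) - f(x_lo)) / (2 * eps))
    return grad

ev = explanation_vector(model, [1.0, 2.0])
# A larger |ev[i]| means feature i has more local influence on the prediction;
# the sign tells you whether increasing the feature pushes the output up or down.
```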

**Top Sources For Machine Learning Datasets**

It can be quite hard to find a specific dataset for a variety of machine learning problems, or even just to experiment on. The list below not only contains great datasets for experimentation but also includes a description, usage examples and, in some cases, the algorithm code to solve the machine learning problem associated with each dataset.

**Top 10 Books on NLP and Text Analysis**

1. Natural Language Processing with Python

2. Foundations of Statistical Natural Language Processing

3. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition

4. The Oxford Handbook of Computational Linguistics

5. Text Mining with R

6. Neural Network Methods in Natural Language Processing (Synthesis Lectures on Human Language Technologies)

7. Taming Text

8. Deep Learning in Natural Language Processing

9. Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning

10. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems 1st Edition

**The Next Level of Data Visualization in Python**

The sunk-cost fallacy is one of many harmful cognitive biases to which humans fall prey. It refers to our tendency to continue to devote time and resources to a lost cause because we have already spent (sunk) so much time in the pursuit. The sunk-cost fallacy applies to staying in bad jobs longer than we should, slaving away at a project even when it’s clear it won’t work, and yes, continuing to use a tedious, outdated plotting library (matplotlib) when more efficient, interactive, and better-looking alternatives exist.

**The Most Intuitive and Easiest Guide for Artificial Neural Network**

Neural networks! Deep learning! Artificial intelligence! Anyone living in the world of 2019 has heard these words more than once. And you have probably seen awesome applications such as image classification, computer vision, and speech recognition. So are you also interested in building those cool AI projects but still have no idea what an artificial neural network is? There are already hundreds of articles explaining the concept under titles like ‘a beginner’s guide to back propagation in ANN’ or ‘a gentle introduction to the artificial neural network.’ They are really great already, but I found they can still be hard for someone who is not comfortable with mathematical expressions.
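Stripped of frameworks and notation, the core of a neural network is just weighted sums passed through a nonlinearity, layer by layer. A minimal sketch of a forward pass; the weights and inputs are arbitrary illustrative numbers, not from the article:

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def layer(inputs, weights, biases):
    # one dense layer: each output neuron is sigmoid(w . x + b)
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

x = [0.5, -1.0]                                            # two input features
hidden = layer(x, [[0.1, 0.4], [-0.3, 0.2]], [0.0, 0.1])   # 2-unit hidden layer
output = layer(hidden, [[0.7, -0.6]], [0.05])              # 1-unit output layer
```

Training (back propagation) then adjusts those weights to reduce the error of `output`; the forward pass above is the part every introduction builds on.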

**The Difference Between Random Factors and Random Effects**

Mixed models are hard. They’re abstract, they’re a little weird, and there is no common vocabulary or notation for them. But they’re also extremely important to understand because many data sets require their use. Repeated measures ANOVA has too many limitations; it just doesn’t cut it any more. One of the most difficult parts of fitting mixed models is figuring out which random effects to include in a model. And that’s hard to do if you don’t really understand what a random effect is or how it differs from a fixed effect. I have found one issue particularly pervasive in making this even more confusing than it has to be: people in the know use the terms ‘random effects’ and ‘random factors’ interchangeably. But they’re different. This difference is probably not something you’ve thought about, yet it’s impossible to really understand random effects if you can’t separate out these two concepts.

**‘The Book of Why’ by Pearl and Mackenzie**

Judea Pearl and Dana Mackenzie sent me a copy of their new book, ‘The book of why: The new science of cause and effect.’ There are some things I don’t like about their book, and I’ll get to that, but I want to start with a central point of theirs with which I agree strongly.

**Taming False Discoveries with Empirical Bayes**

Today’s data scientists have an enormous amount of data at their disposal. But they also face a new problem: with so many features to choose from, how do we prevent making false discoveries? p-values lend themselves to false discoveries: assuming that there is no effect, running 100 independent p-value tests will yield 5 positive outcomes on average. Being misled 5 times is manageable, but if we run millions of hypothesis tests, the situation quickly becomes unbearable. We need a method that allows us to control the number of false positives we find. It should scale with the number of hypotheses we run and allow us to be confident about our findings as a whole.
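The post itself takes an empirical Bayes route; as a hedged point of comparison, the classic frequentist answer to the same scaling problem is the Benjamini-Hochberg procedure, which controls the expected share of false discoveries however many tests you run. A minimal sketch with illustrative p-values:

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return indices of hypotheses rejected at false-discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:   # BH step-up criterion
            k_max = rank                  # largest rank passing the test
    return sorted(order[:k_max])          # reject all hypotheses up to k_max

p = [0.001, 0.009, 0.02, 0.04, 0.2, 0.5, 0.7]
rejected = benjamini_hochberg(p, q=0.05)  # indices of discoveries
```

Unlike a fixed p < 0.05 cutoff, the threshold here tightens as the number of hypotheses grows, which is exactly the scaling property the paragraph above asks for.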

**Survival Analysis: Intuition & Implementation in Python**

There is a statistical technique that can answer business questions such as:

• How long will a particular customer remain with your business? In other words, after how much time will this customer churn?

• How long will this machine last, after successfully running for a year?

• What is the relative retention rate of different marketing channels?

• What is the likelihood that a patient will survive after being diagnosed?

If you find any of the above questions (or even questions remotely related to them) interesting, then read on.

The purpose of this article is to build intuition, so that we can apply this technique in different business settings.
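One standard tool for exactly these questions is the Kaplan-Meier estimator, which handles the key difficulty of survival data: censored subjects (customers who haven’t churned yet, machines still running). A minimal sketch with made-up durations, not data from the article:

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each event time.

    durations: observed time for each subject.
    observed:  1 if the event (churn/failure/death) occurred, 0 if censored.
    """
    pairs = sorted(zip(durations, observed))
    at_risk = len(pairs)
    surv, curve = 1.0, []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        while i < len(pairs) and pairs[i][0] == t:  # group ties at time t
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk          # product-limit step
            curve.append((t, surv))
        at_risk -= removed                          # censored subjects leave too
    return curve

# five subjects: events at t=2, 3, 5; censored at t=4 and 6
km = kaplan_meier([2, 3, 4, 5, 6], [1, 1, 0, 1, 0])
```

Censored subjects never drive the survival probability down; they simply shrink the at-risk pool, which is why the estimator answers “how long until churn?” without waiting for every customer to leave.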

**Scaling Jupyter notebooks with Kubernetes and Tensorflow**

Gathering facts and data to better understand the world we live in has become the new norm. From self-driving cars to smart personal assistants, data and data science are everywhere. Even the phones we carry in our pockets now feature dedicated units for machine learning. In fact, there has never been more need for a performant and efficient system to ingest large volumes of numbers and extract meaning from them. The challenge for data scientists and engineers is how to design pipelines and processes that can operate at scale and in real time. Traditionally, setting up the services and tools required for effective development and deployment of deep learning models has been time-consuming and error-prone.
