A python library for decision tree visualization and model interpretation.

Decision trees are the fundamental building block of gradient boosting machines and Random Forests(tm), probably the two most popular machine learning models for structured data. Visualizing decision trees is a tremendous aid when learning how these models work and when interpreting models. Unfortunately, current visualization packages are rudimentary and not immediately helpful to the novice. For example, we couldn’t find a library that visualizes how decision nodes split up the feature space. It is also uncommon for libraries to support visualizing a specific feature vector as it weaves down through a tree’s decision nodes; we could only find one image showing this. So, we’ve created a general package for scikit-learn decision tree visualization and model interpretation, which we’ll be using heavily in an upcoming machine learning book (written with Jeremy Howard).

Counting No. of Parameters in Deep Learning Models by Hand

Counting the number of trainable parameters of deep learning models is considered too trivial, because your code can already do this for you. But I’d like to keep my notes here for us to refer to once in a while. Here are the models that we’ll run through:
1. Feed-Forward Neural Network (FFNN)
2. Recurrent Neural Network (RNN)
3. Convolutional Neural Network (CNN)

Model-based clustering and classification of functional data

Complex data analysis is a central topic of modern statistics and learning systems which is becoming of broader interest with the increasing prevalence of high-dimensional data. The challenge is to develop statistical models and autonomous algorithms that are able to discern knowledge from raw data, which can be achieved through clustering techniques, or to make predictions of future data via classification techniques. Latent data models, including mixture model-based approaches, are among the most popular and successful approaches in both supervised and unsupervised learning. Although being traditional tools in multivariate analysis, they are growing in popularity when considered in the framework of functional data analysis (FDA). FDA is the data analysis paradigm in which each datum is a function, rather than a real vector. In many areas of application, including signal and image processing, functional imaging, bioinformatics, etc., the analyzed data are indeed often available in the form of discretized values of functions, curves, or surfaces. This functional aspect of the data adds additional difficulties when compared to classical multivariate data analysis. We review and present approaches for model-based clustering and classification of functional data. We present well-grounded statistical models along with efficient algorithmic tools to address problems regarding the clustering and the classification of these functional data, including their heterogeneity, missing information, and dynamical hidden structures. The presented models and algorithms are illustrated via real-world functional data analysis problems from several areas of application.

Decision Boundary Visualization(A-Z)

Classification problems have been very common and essential in the field of Data Science. For example: Diabetic Retinopathy, Mood or Sentiment Analysis, Digit Recognition, Cancer-Type prediction (Malignant or Benign) etc. These problems are often solved by Machine Learning or Deep Learning. Also in Computer Vision, projects like Diabetic Retinopathy or Glaucoma Detection, Texture Analysis is often used now-a-days instead of Classical Machine Learning with conventional Image Processing or Deep Learning. Although Deep Learning has been the state-of-the-art in Diabetic Retinopathy as per the research paper: ‘A Deep Learning Method for the detection of Diabetic Retinopathy’ .

What lies ahead for Python, Java, Go, C#, Kotlin, and Rust

Change is the only constant in the technology world, and programming languages are no exception. Competition among languages has led to improvements across the board. Established players like Java have added major features, while upstart languages like Go and Rust look to improve packaging and exception handling to add ‘fit and finish’ to their ecosystems. As we enter 2019, we asked some of our O’Reilly authors and training course instructors for their thoughts on what’s in store for established players and fast-growing languages.

Using Machine Learning to Build Better Machine Learning

Automated machine learning(AutoML) is becoming one of the most popular topics in modern data science applications. Often, people see AutoML as a mechanism to use out-of-the-box machine learning models without the need of sophisticated data science knowledge. While theoretically, this argument makes sense the reality if a bit different. In the current stage of artificial intelligence(AI), most real world applications require some level of machine learning knowledge. The scenarios that you can solve with a vanilla API like the Watson Developer Cloud or Microsoft Cognitive Services are very basic and represent only a small percentage of the broader spectrum of machine learning scenarios. If that’s the case, then we should wonder what’s the real value of AutoML. In our experience at Invector Labs, the best way to think about AutoML is as a helper or enabler of machine learning applications that complements the skill of a data scientists. More specifically, we believe the most viable case for AutoML in the real world is to help data scientists to select and architect the right machine learning model for a given problem.

Automated Machine Learning in Python

As we already know, machine learning is a way of automating complex problem-solving. But can machine learning itself be automated? That’s what we’ll explore in this article. By its end, we’ll have answered that question and shown practical ways it can be accomplished.

How to build an API for a machine learning model in 5 minutes using Flask

As a data scientist consultant, I want to make impact with my machine learning models. However, this is easier said than done. When starting a new project, it starts with playing around with the data in a Jupyter notebook. Once you’ve got a full understanding of what data you’re dealing with and have aligned with the client on what steps to take, one of the outcomes can be to create a predictive model. You get excited and go back to your notebook to make the best model possible. The model and the results are presented and everyone is happy. The client wants to run the model in their infrastructure to test if they can really create the expected impact. Also, when people can use the model, you get the input necessary to improve it step by step. But how can we quickly do this, given that the client has some complicated infrastructure that you might not be familiar with?

National Strategy for Artificial Intelligence – Discussion Paper

Artificial Intelligence (AI) is poised to disrupt our world. With intelligent machines enabling high-level cognitive processes like thinking, perceiving, learning, problem solving and decision making, coupled with advances in data collection and aggregation, analytics and computer processing power, AI presents opportunities to complement and supplement human intelligence and enrich the way people live and work.
This strategy document is premised on the proposition that India, given its strengths and characteristics, has the potential to position itself among leaders on the global AI map – with a unique brand of #AIforAll. The approach in this paper focuses on how India can leverage the transformative technologies to ensure social and inclusive growth in line with the development philosophy of the government. In addition, India should strive to replicate these solutions in other similarly placed developing countries.

Understanding AdaBoost

Anyone starting to learn Boosting technique should start first with AdaBoost or Adaptive Boosting. Here I will try to explain it conceptually. When nothing works, Boosting does. Nowadays many people use either XGBoost or LightGBM or CatBoost to win competitions at Kaggle or Hackathons. AdaBoost is the first stepping stone in the world of Boosting. AdaBoost is one of the first boosting algorithms to be adapted in solving practices. Adaboost helps you combine multiple ‘weak classifiers’ into a single ‘strong classifier’. Here are some (fun) facts about Adaboost!
• The weak learners in AdaBoost are decision trees with a single split, called decision stumps.
• AdaBoost works by putting more weight on difficult to classify instances and less on those already handled well.
• AdaBoost algorithms can be used for both classification and regression problem.

Recommender Systems and Hyper-parameter tuning

Almost all media services have a particular section where the system recommends things to you, being things a movie in Netflix, a product to buy in Amazon, a playlist in Spotify, a page to like in Facebook and so many others. We all have seen the ‘You might like this’ section at least once in almost all services we daily use; the algorithm behind that is a Recommender System. Some might consider a RS as a system that tries to get to know you better than yourself and give you exactly what you need. Sounds kind of (maybe a lot) creepy if you put it this way but its goal is to make your experience the best possible by learning your likes and dislikes.

The Poisson Distribution and Poisson Process Explained

A tragedy of statistics in most schools is how dull it’s made. Teachers spend hours wading through derivations, equations, and theorems, and, when you finally get to the best part?-?applying concepts to actual numbers?-?it’s with irrelevant, unimaginative examples like rolling dice. This is a shame as stats can be enjoyable if you skip the derivations (which you’ll likely never need) and focus on using the ideas to solve interesting problems. In this article, we’ll cover Poisson Processes and the Poisson distribution, two important probability concepts. After highlighting only the relevant theory, we’ll work through a real-world example, showing equations and graphs to put the ideas in a proper context.

Evaluating A Real-Life Recommender System, Error-Based and Ranking-Based

A recommender system aims to find and suggest items of likely interest based on the users’ preferences. Recommender system is one of the most valuable applications in machine learning today. Amazon attributes its 35% of revenue to its recommender system. Evaluation is an integral part of researching and developing any recommender system. Depends on your business and available data, there are many ways to evaluate a recommender system. We will try a few today.

Five Latest Machine Learning eBooks

1. Artificial Intelligence and Machine Learning Fundamentals by Zsolt Nagy
2. Go Machine Learning Projects by Xuanyi Chew
3. Machine Learning for Mobile by Revathi Gopalakrishnan and Avinash Venkateswarlu
4. Hands-On Machine Learning for Algorithmic Trading by Stefan Jansen
5. R Machine Learning Projects by Dr. Sunil Kumar Chinnamgari

Open sourcing wav2letter++, the fastest state-of-the-art speech system, and flashlight, an ML library going native

A new fully convolutional approach to automatic speech recognition and wav2letter++, the fastest state-of-the-art end-to-end speech recognition system available. The approach leverages convolutional neural networks (CNNs) for acoustic modeling and language modeling, and is reproducible, thanks to the toolkits we are releasing jointly.