Understanding Logistic Regression in Python

Learn about Logistic Regression, its basic properties, and build a machine learning model on a real-world application in Python.


Subgraphs is a visual IDE for developing computational graphs, designed particularly for deep neural networks. Subgraphs is built with TensorFlow.js, Node.js, and React, and is hosted on Google Cloud. An instance of Subgraphs is available at https://…/.


An industrial-grade RPC framework used throughout Baidu, with 1,000,000+ instances (not counting clients) and thousands of kinds of services, known as ‘baidu-rpc’ inside Baidu. Only the C++ implementation is open-sourced right now.

Preventing Deep Neural Network from Overfitting

Thanks to a huge number of parameters (thousands and sometimes even millions), neural networks have a lot of freedom and can fit a variety of complex datasets. This ability has allowed them to take over many areas in which it was difficult to make any progress in the ‘traditional’ machine learning era, such as image recognition, object detection and natural language processing. Sometimes, however, their greatest advantage becomes a potential weakness. Lack of control over the learning process of a model may lead to overfitting: a situation in which the neural network is fitted so closely to the training set that it struggles to generalize and make predictions for new data. Understanding the origins of this problem, and the ways of preventing it, is essential for the successful design of a neural network.

Machine Learning vs. Deep Learning

Artificial Intelligence encompasses a very broad scope. You could even consider something like Dijkstra’s shortest path algorithm as Artificial Intelligence. However, two categories of AI are frequently mixed up: Machine Learning and Deep Learning. Both of these refer to statistical modeling of data to extract useful information or make predictions. In this article, we will list the reasons why these two statistical modeling techniques are not the same and help you further frame your understanding of these data modeling paradigms.

Implementing Git in Data Science

I hope Part 1 sold you on the idea that version control is a critical tool for managing data science experiments. But the devil is in the details, so let’s talk about how to implement version control in a data science project. There are several paradigms for using git, but I have essentially adapted ‘feature branching’ for the purposes of data science experiments. Briefly, feature branching means there is a ‘master’ branch that you use as a baseline, and new features are added to the code base by branching off of ‘master’, making all the changes required to implement the feature, and then merging the new branch back into ‘master’ once it is successful.

A Data Scientist’s Guide to Data Structures & Algorithms, Part 2

In my last post, I described Big O notation, why it matters, and common search and sort algorithms and their time complexity (essentially, how fast a given algorithm will run as data size changes). Now, with the basics down, we can begin to discuss data structures, space complexity, and more complex graphing algorithms.

Art of Choosing Metrics in Supervised Models Part 1

Evaluating results may be the most important part of any research project, since it shows how accurate you were and how close to (or far from) your objectives you are. Choosing appropriate performance metrics is therefore a challenging part of research for everyone. Compared with other fields of science, selecting the right evaluation metrics in machine learning is especially tricky. In this article I’m going to describe evaluation metrics that are commonly used in supervised learning. Supervised learning is used in a wide range of machine learning problems; it is the first choice whenever you have labels for your data. Depending on the type of data, supervised learning is divided into two categories: regression problems and classification problems. Your data have a direct impact on which evaluation methods you can choose from. If the data is discrete, the predictive method should be chosen from classification methods; if the data is continuous, regression methods may fit it. The available metrics differ accordingly: in regression, people usually use metrics such as Mean Absolute Error and Root Mean Squared Error, while Precision, Recall, F1 Measure and Accuracy are some of the frequently used metrics in classification. In the rest of this article I’ll describe and compare the two most common continuous evaluation metrics and then investigate their applications.
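
As a rough illustration of the two regression metrics named above, here is a minimal sketch in plain Python. The function names and the sample values are made up for this example and are not taken from the article; the formulas are the standard definitions of Mean Absolute Error and Root Mean Squared Error.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mean_sq)


# Hypothetical targets and predictions, chosen only for illustration.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
print(f"MAE:  {mae:.4f}")
print(f"RMSE: {rmse:.4f}")
```

Because RMSE squares each error before averaging, a single large miss raises it more than it raises MAE, so RMSE is never smaller than MAE on the same data.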

Structuring ML projects so they can grow

I’ve worked on many data science and machine learning projects, big and small, from projects that took only a few notebooks to ones that grew into tens of thousands of lines of code.

AI and the Future of Privacy

The basic definition of privacy is having the power to seclude oneself, or information about oneself, in order to limit the influence others can have on our behavior. Privacy has traditionally been recognized as a prerequisite for the exercise of human rights such as freedom of expression, freedom of association, and freedom of choice.

Reinforcement Learning Part 3: Practical Reinforcement Learning

Calculus in Data Science and its uses

Calculus, more properly called analysis, is the branch of mathematics that studies the rate of change of quantities (which can be interpreted as slopes of curves) and the length, area, and volume of objects. It is divided into differential and integral calculus.
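
To make the two branches concrete, here is a minimal numerical sketch in plain Python. The helper names, the step sizes and the example function f(x) = x² are assumptions chosen for illustration; the article itself does not prescribe any of them. A central difference approximates the slope of a curve (differential calculus), and the trapezoidal rule approximates the area under it (integral calculus).

```python
def derivative(f, x, h=1e-5):
    """Approximate the slope of f at x with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


def integral(f, a, b, n=1000):
    """Approximate the area under f on [a, b] with the trapezoidal rule."""
    step = (b - a) / n
    total = 0.5 * (f(a) + f(b))  # endpoints carry half weight
    for i in range(1, n):
        total += f(a + i * step)  # interior points carry full weight
    return total * step


def square(x):
    """Example function f(x) = x^2, used only for illustration."""
    return x * x


# The exact slope of x^2 at x = 3 is 6; the exact area on [0, 3] is 9.
slope_at_3 = derivative(square, 3.0)
area_0_to_3 = integral(square, 0.0, 3.0)
print(f"approximate slope at x=3: {slope_at_3:.6f}")
print(f"approximate area on [0, 3]: {area_0_to_3:.6f}")
```

Both results are approximations: shrinking the step size generally reduces the trapezoidal error, while too small an `h` in the difference quotient eventually amplifies floating-point rounding.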

Understanding quality of Primary Education in Karnataka through Cluster Analysis

The new millennium has seen a big jump in primary enrolment levels, up from 79% in 2001 to 93% in 2014. But as I showed in my earlier blog ‘Deciphering India through data visualization: Indian Education System’, the already low learning levels across many Indian states have been declining even further in recent years. This has happened in spite of the fact that the factors which are supposed to influence learning outcomes are showing a favourable trend. Private schooling is up, the dropout rate is low and holding steady, and the number of students per class is dropping, as is the number of students per teacher. With the passage of the RTE Act in 2009, more money is being pumped into the education system in the form of facilities, learning materials etc. So, what explains this sorry state of learning? Are socio-economic factors such as adult literacy and caste playing a role? In this study, I use a clustering algorithm to dig into the potential reasons through a case study of Karnataka. The algorithm creates clusters of districts with a thrust on learning outcomes at the primary level, and the characteristics defining each cluster are analysed.