Generating Text Using a Markov Model

A Markov chain is a random process in which we assume that the previous state(s) carry enough information to predict the next state. Unlike successive coin flips, these events are dependent. It’s easier to understand through an example.
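As a rough illustration of the idea, here is a minimal word-level sketch in Python. The toy corpus, the function names `build_chain` and `generate`, and the `order` and `seed` parameters are assumptions made for this example; they are not taken from the linked article.

```python
import random
from collections import defaultdict


def build_chain(words, order=1):
    """Map each state (a tuple of `order` words) to the words seen right after it."""
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain


def generate(chain, start, length, seed=None):
    """Walk the chain from `start`, sampling each next word from the observed followers."""
    rng = random.Random(seed)
    state = tuple(start)
    output = list(state)
    for _ in range(length):
        followers = chain.get(state)
        if not followers:
            # Dead end: this state was never followed by anything in the corpus.
            break
        next_word = rng.choice(followers)
        output.append(next_word)
        # Slide the window: drop the oldest word, append the new one.
        state = state[1:] + (next_word,)
    return output


# Hypothetical toy corpus, used only for illustration.
corpus = "the cat sat on the mat and the cat ate the fish".split()
chain = build_chain(corpus, order=1)
text = generate(chain, start=("the",), length=8, seed=0)
print(" ".join(text))
```

Because followers are stored with repetition, a word that follows a state more often is proportionally more likely to be sampled. Raising `order` makes the output more faithful to the corpus but less varied.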

How Many Experts Does It Take

Suppose the big boss has just given you this assignment: I want to get our analysts and data scientists organized around a standard set of software – make a recommendation for which platform we should choose.

Meta Collection of Data Science and Big Data Analytics Best Practices, Lessons Learned, and Success Stories

The following list of collections of collections of Data Science and Big Data Analytics Best Practices, Lessons Learned, and Success Stories is an updated version of a previous list posted here.

Data science blogs

A curated list of data science blogs

3 Companies & Case Studies of AI In Investing

Artificial Intelligence in investing has long been practiced by secretive hedge funds like Renaissance Technologies. However, some of the same capabilities are now being offered by companies to other players. Here are three examples of companies and how they are using data sets to predict stock and other financial market outcomes.

Big Data Analytics Pain Points

Big data analytics is still in its infancy, and we haven’t yet embraced data-driven decision making. Here, we discuss the current pain points and how you can deal with them in better ways.

Statistics – Understanding the Levels of Measurement

One of the most important and basic steps in learning Statistics is understanding the levels of measurement for variables. Let’s take a step back and first look at what a variable is. A variable is any quantity that can be measured and whose value varies across the population. For example, if we consider a population of students, the student’s nationality, marks, grades, etc. are all variables defined for the entity student, and their values will differ from student to student. Looking at the larger picture, if we want to compute the average salary of US citizens, we can either record the salary of every single person and compute the average, or choose a random sample from the population, compute the average salary for that sample, and then use statistical tests to draw conclusions about the wider population.

Contracting and simplifying a network graph

I discovered a few beautiful functions in the igraph package that allow you to contract and simplify a graph.

Practical Kullback-Leibler (KL) Divergence: Discrete Case

KL divergence (Kullback-Leibler) or KL distance is a non-symmetric measure of the difference between two probability distributions. It is related to mutual information and can be used to measure the association between two random variables. In this short tutorial, I show how to compute KL divergence and mutual information for two categorical variables, interpreted as discrete random variables.
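For the discrete case, both quantities can be sketched in a few lines of Python. This is a minimal illustration rather than the tutorial's own code: the function names, the use of natural logarithms (so results are in nats), and the toy inputs are assumptions. It relies on the standard identity that mutual information equals the KL divergence between the joint distribution and the product of the marginals.

```python
import math
from collections import Counter


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions given as aligned lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # By convention, terms with p(x) = 0 contribute nothing.
        if qi == 0:
            return math.inf  # p puts mass where q has none.
        total += pi * math.log(pi / qi)
    return total


def mutual_information(xs, ys):
    """Mutual information (nats) of two categorical samples of equal length."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        # Compare the empirical joint with the product of empirical marginals.
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


# Hypothetical example distributions and samples.
p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q), kl_divergence(q, p))  # differ: KL is not symmetric

xs = ["a", "b", "a", "b"]
ys = ["u", "v", "u", "v"]
print(mutual_information(xs, ys))
```

Swapping the arguments of `kl_divergence` generally changes the result, which is why calling it a "distance" is a loose usage.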