Artificial Intelligence Is Almost Ready for Business
Artificial Intelligence (AI) is an idea that has oscillated through many hype cycles over many years, as scientists and sci-fi visionaries have declared the imminent arrival of thinking machines. But it seems we’re now at an actual tipping point. AI, expert systems, and business intelligence have been with us for decades, but this time the reality almost matches the rhetoric, driven by the exponential growth in technology capabilities (e.g., Moore’s Law), smarter analytics engines, and the surge in data.

More than you ever wanted to know about GeoJSON
Let’s look at GeoJSON in a little more depth, from the ground up. Understanding these concepts will help you understand geospatial data in general, too: the basic concepts behind GeoJSON have been a part of geo since the very beginning. This should be read along with the GeoJSON spec itself, which is authoritative and, for a formal specification of a format, pretty readable.
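To make the format concrete, here is a minimal sketch in Python of building and serializing a GeoJSON FeatureCollection with the standard library. The place name and coordinates are made-up illustration values; the structure (and the longitude-before-latitude coordinate order) follows the GeoJSON spec.

```python
import json

# Per the GeoJSON spec, coordinates are [longitude, latitude].
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-122.65, 45.52]},
    "properties": {"name": "example point"},  # arbitrary key/value metadata
}

# Features are typically grouped into a FeatureCollection.
collection = {"type": "FeatureCollection", "features": [feature]}

geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```

Because GeoJSON is plain JSON, any JSON library can read and write it; the spec only constrains which keys and geometry types are valid.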

Factor Evaluation in Quantitative Portfolio Management
Managing a portfolio of stocks against a benchmark is a very different problem from running an absolute return strategy. In the former, one has to hold more stocks than in the latter, where no stocks at all need be held if there is no good enough opportunity. The reason is the tracking error, defined as the standard deviation of the portfolio return minus the benchmark return. The fewer stocks held versus the benchmark, the higher the tracking error (i.e., the higher the risk).
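The tracking-error definition above can be sketched directly in Python: compute the active return (portfolio minus benchmark) per period, then take its standard deviation. The return series below are hypothetical monthly figures for illustration only.

```python
import statistics

def tracking_error(portfolio_returns, benchmark_returns):
    """Standard deviation of the active return (portfolio minus benchmark)."""
    active = [p - b for p, b in zip(portfolio_returns, benchmark_returns)]
    return statistics.pstdev(active)  # population standard deviation

# Hypothetical monthly returns:
port = [0.020, 0.010, -0.005, 0.015]
bench = [0.018, 0.012, -0.003, 0.016]

te = tracking_error(port, bench)
print(f"monthly tracking error: {te:.4%}")
```

A portfolio that replicates the benchmark exactly has zero tracking error; the more the holdings deviate, the larger it gets, which is why benchmark-relative managers are forced to hold more names than absolute-return managers.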

R: Splitting a Node in a Tree
… by Gini index or by entropy …
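The two split criteria named above are both functions of the class proportions at a node. A minimal sketch (in Python rather than R, for illustration) of each:

```python
import math

def gini(proportions):
    """Gini impurity: 1 - sum(p^2). Zero for a pure node."""
    return 1.0 - sum(p * p for p in proportions)

def entropy(proportions):
    """Shannon entropy in bits: -sum(p * log2(p)). Zero for a pure node."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

# A 50/50 two-class node is maximally impure under both measures:
print(gini([0.5, 0.5]))     # 1 - 0.25 - 0.25 = 0.5
print(entropy([0.5, 0.5]))  # 1.0 bit
```

A tree-growing algorithm evaluates candidate splits by the weighted impurity of the child nodes and picks the split that reduces impurity the most; Gini and entropy usually yield similar trees.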

Presto versus Hive: What You Need to Know
There is much discussion in the industry about analytic engines and, specifically, which engines best meet various analytic needs. This post looks at two popular engines, Hive and Presto, and assesses the best uses for each.

Book Review: Statistics Done Wrong
We data scientists, not being statisticians in the strictest sense, often worry that we may commit some kind of statistical faux pas. Fear no more! With the release of a probing new book, ‘Statistics Done Wrong’ by Alex Reinhart, we now have a useful road map for avoiding statistical fallacies. A Ph.D. student and statistics instructor at Carnegie Mellon University, Reinhart shows how scientific progress depends on good research, and good research needs good statistics. But statistical analysis is tricky to get right, even for the best data scientists. You’ll be surprised how many practicing data scientists are doing it wrong. Although written for a broad audience of scientific researchers, I found the book compelling as someone working daily in data science. Many of the concepts I use regularly, such as linear regression, overfitting, confounding variables, cross-validation, feature selection, p-values, and confidence intervals, are covered in the book. The best part is the wealth of examples of statistical blunders in modern science: Reinhart provides ample cases of embarrassing errors and omissions in recent research. You’ll learn about the misconceptions and scientific politics that allow these mistakes to happen, and be set on a path of reform in the way you do statistics.

Do We Need More Training Data or More Complex Models?
Do we need more training data? Which models will suffer from performance saturation as data grows large? Do we need larger models or more complicated models, and what is the difference?