Introduction to bootstrap with applications to mixed-effect models

The bootstrap is one of the best-known resampling techniques and is very useful for obtaining confidence intervals in situations where classical approaches (t- or z-tests) would fail.
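
As a rough illustration of the resampling idea, here is a minimal sketch of a percentile bootstrap confidence interval in plain Python. It is not taken from the linked article and does not cover the mixed-effect case. The function name `bootstrap_ci`, its parameters, and the sample values are all assumptions made up for this example.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data).

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Recompute the statistic on resamples drawn with replacement
    # from the observed data, then sort the resulting estimates.
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Read off the empirical alpha/2 and 1 - alpha/2 quantiles.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up sample; the median has no simple t- or z-based interval.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 7.1, 3.9, 4.4]
low, high = bootstrap_ci(sample, stat=statistics.median)
print(f"95% percentile bootstrap CI for the median: [{low}, {high}]")
```

The percentile method is only one of several bootstrap interval constructions. It is used here because it is the shortest to write down, not because the article necessarily uses it.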


Statistical Models That Support Design Thinking: Driver Analysis vs. Partial Correlation Networks

We have been talking about design thinking in marketing since Tim Brown’s Harvard Business Review article in 2008. It might be easy for the data scientist to dismiss the approach as merely a type of brainstorming for new products or services. Yet, design issues do arise in data visualization where we are concerned with communicating our findings. However, my interest is model selection: Should the analyst select one statistical model over another because the user might find it more helpful in planning interventions or designing new products and services?


The Machine Learning Times of Year 2015 – A Powerful Growth Story

2015 has been the year of machine learning. The revolution in letting machines make sense of huge volumes of data is gaining momentum by the day (we just created a few data points in writing and reading this article!). Not just Google: companies like Amazon, Accenture, Toyota, Hitachi, Tesla, Johnson & Johnson and many more embraced machine learning at massive scale and improved their products and services. Nor is it just big companies; startups have been an equal part of this revolution. Startups have come out with innovative applications of machine learning, and some of them were acquired before they could test the market! To highlight these developments, we’ve created an e-paper, ‘Machine Learning Times’ (covering the year through November 2015). We’ll also update it at the end of December. It covers the significant developments in machine learning during the year. Catch the complete story below.


Using machine learning to predict gender

This all started with a simple question: could we train an algorithm to determine if a Twitter account belonged to a man or a woman? With that in mind, we ran a simple data categorization job, fired up our brand new CrowdFlower AI feature, and tried to answer just that. What we found was, well, pretty damn interesting. But no spoilers. We’ll get to all that in a second. Let’s take a step back and start at the beginning.


R Online Classes With Leading Experts at Statistics.com

Statistics.com is an online learning website with 100+ courses in statistics, analytics, data mining, text mining, forecasting, social network analysis, spatial analysis, etc.


Making an R based ML model accessible through a simple API

Building an accurate machine learning (ML) model is a feat on its own. But once you’re there, you still need to find a way to make the model accessible to users. If you want to create a GUI for it, the obvious approach is to use Shiny. However, things become a bit more difficult when you don’t want a direct GUI for an ML model but instead want to integrate the logic you’ve created into an existing (or new) application.


Fitting linear mixed models for QTL mapping

Linear mixed models (LMMs) have become widely used for dealing with population structure in human GWAS, and they’re becoming increasingly important for QTL mapping in model organisms, particularly for the analysis of advanced intercross lines (AIL), which often exhibit variation in the relationships among individuals.


Top 5 Graph Visualisation Tools

1. Gephi
2. Tom Sawyer Perspectives
3. Keylines
4. Linkurious
5. GraphX


a programming bug with weird consequences

One of my students mistakenly coded an independent Metropolis-Hastings algorithm whose proposal variance was too small compared with the target variance.
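
To make the setup concrete, here is a minimal sketch of an independent Metropolis-Hastings sampler in plain Python. The excerpt does not say which target or proposal the student used, so the zero-mean normal target, the zero-mean normal proposal, and the names `independent_mh` and `log_normal_pdf` are all assumptions made for illustration.

```python
import math
import random


def log_normal_pdf(x, sd):
    """Log density of a zero-mean normal with standard deviation sd."""
    return -0.5 * (x / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)


def independent_mh(n_iter, proposal_sd, target_sd=1.0, seed=0):
    """Independent Metropolis-Hastings with assumed normal target and proposal.

    Returns the chain and the fraction of accepted proposals.
    """
    rng = random.Random(seed)

    def log_weight(z):
        # Importance weight target(z) / proposal(z), on the log scale.
        return log_normal_pdf(z, target_sd) - log_normal_pdf(z, proposal_sd)

    x = 0.0
    chain, accepted = [], 0
    for _ in range(n_iter):
        # The proposal does not depend on the current state x.
        y = rng.gauss(0.0, proposal_sd)
        log_ratio = log_weight(y) - log_weight(x)
        # Accept with probability min(1, w(y) / w(x)).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = y
            accepted += 1
        chain.append(x)
    return chain, accepted / n_iter


# Proposal variance matched to the target versus much too small.
for sd in (1.0, 0.3):
    chain, rate = independent_mh(20000, proposal_sd=sd)
    print(f"proposal sd={sd}: acceptance rate={rate:.3f}, "
          f"sample variance={statistics_var(chain) if False else sum(v * v for v in chain) / len(chain):.3f}")
```

When the proposal is narrower than the target, the weight target/proposal grows without bound in the tails. Once the chain reaches a tail value it tends to reject most later proposals, which is the general mechanism by which a too-small proposal variance can cause trouble in this kind of sampler.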


Detecting In-App Purchase Fraud with Machine Learning

Hacking applications such as Freedom, iAP Cracker, iAPFree, etc. allow users to make in-app purchases for free. With these kinds of hacks the player receives the coins, gems, levels or lives they purchased without paying any money. If the game developer did not implement any validation process on the in-app purchases, such as SOOMLA’s fraud protection, the purchases are recorded as real purchases in the developer’s system. As a result, the reported revenue may differ greatly from the real revenue (especially in popular games with lots of fraud).