Top Certification Courses in SAS, R, Python, Machine Learning, Big Data, Spark (2015-16)

There are plenty of courses and certifications available to kick-start your career in analytics, offered in online, offline, or hybrid mode. The difficulty students face is deciding which of these courses is best, and with several newly introduced courses it has become even harder to make a confident decision. The fear of investing in an unworthy course remains the biggest hurdle. Last year, I received thousands of emails after I published Top Certifications on SAS, R, Python, Machine Learning, and I later came to know that the analysis helped many people choose the right course for themselves. The year 2016 is no different. I am back with my thorough analysis and ranking of the best certification courses in India, and I assure you these rankings are unbiased. Last month, we released our rankings of the Top Business Analytics Programs in India 2015-16; if you are planning for a degree in analytics, you may like to consider those institutes. In this article, I’ll focus on ranking short-duration and certification courses.


Bayesian regression with STAN Part 2: Beyond normality

In a previous post we saw how to perform Bayesian regression in R using STAN for normally distributed data. In this post we will look at how to fit non-normal models in STAN, using three example distributions commonly found in empirical data: negative binomial (overdispersed Poisson counts), gamma (right-skewed continuous data), and beta-binomial (overdispersed binomial data). The STAN code for the different models is at the end of this post, together with some explanations.
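As a flavour of what fitting one of these distributions involves, here is a minimal, self-contained sketch (mine, not the post's code) of a negative-binomial regression fitted with rstan on simulated overdispersed counts:

library(rstan)

nb_code <- "
data {
  int<lower=0> N;
  vector[N] x;
  int<lower=0> y[N];
}
parameters {
  real alpha;
  real beta;
  real<lower=0> phi;          // overdispersion parameter
}
model {
  alpha ~ normal(0, 5);
  beta  ~ normal(0, 5);
  phi   ~ cauchy(0, 2.5);
  y ~ neg_binomial_2_log(alpha + beta * x, phi);   // log link on the mean
}
"

set.seed(42)
N <- 100
x <- runif(N, -1, 1)
y <- rnbinom(N, mu = exp(1 + 2 * x), size = 2)   # overdispersed 'Poisson-like' counts

fit <- stan(model_code = nb_code, data = list(N = N, x = x, y = y),
            iter = 2000, chains = 2)
print(fit, pars = c("alpha", "beta", "phi"))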


Applied Spatial Data Science with R

I recently started working on my Ph.D. dissertation, which uses many different types of spatial data. In the process, I discovered a lot of concepts about using R for spatial data analysis that I was not aware of. The purpose of this report is to document some of those concepts and my favorite packages for spatial data analysis. The report is organized as follows: first, why R is a good tool of choice for spatial analysis; second, a typical data analysis life-cycle, from getting spatial data through data preparation, exploration, visualization, and geostatistical analysis.
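As a small taste of that life-cycle (an illustrative sketch, not code from the report), the meuse data set bundled with the sp package can be promoted from a plain data frame to a spatial object, mapped, and then summarized with an empirical variogram via gstat:

library(sp)      # spatial classes and the example 'meuse' data
library(gstat)   # geostatistics: variograms and kriging

data(meuse)                       # zinc concentrations sampled along the river Meuse
coordinates(meuse) <- ~ x + y     # promote the data frame to a SpatialPointsDataFrame

spplot(meuse, "zinc")             # quick exploratory map of the measured values

v     <- variogram(log(zinc) ~ 1, meuse)           # empirical variogram
v_fit <- fit.variogram(v, vgm(1, "Sph", 900, 1))   # spherical model: psill, range, nugget
plot(v, v_fit)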


The SDK for Jetpac’s iOS Deep Belief image recognition framework

This is a framework implementing the convolutional neural network architecture described by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. The processing code has been highly optimized to run within the memory and processing constraints of modern mobile devices, and can analyze an image in under 300ms on an iPhone 5S. It’s also easy to use together with OpenCV. We’re releasing this framework because we’re excited by the power of this approach for general image recognition, especially when it can run locally on low-power devices. It gives your phone the ability to see, and I can’t wait to see what applications it helps you build.


Deep Learning with Spark and TensorFlow

Neural networks have seen spectacular progress during the last few years and they are now the state of the art in image recognition and automated translation. TensorFlow is a new framework released by Google for numerical computations and neural networks. In this blog post, we are going to demonstrate how to use TensorFlow and Spark together to train and apply deep learning models.


7 Common Data Science Mistakes and How to Avoid Them

1. Confusion between Correlation and Causation (a quick simulation follows the list)
2. Not Choosing the Right Visualization Tools
3. Not Choosing the Right Model-Validation Frequency
4. Analysis without a Question/Plan
5. Paying Attention Only to Data
6. Ignoring the Probabilities
7. Building a Model on the Wrong Population
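On the first mistake, a short simulation (mine, purely illustrative, with hypothetical variable names) shows how a lurking common cause can make two causally unrelated quantities look tightly linked, and how the apparent effect evaporates once the confounder enters the model:

set.seed(1)
n <- 10000
confounder <- rnorm(n)                    # a hypothetical common cause of both variables
ice_cream  <- 2 * confounder + rnorm(n)   # e.g. ice cream sales
sunburn    <- 3 * confounder + rnorm(n)   # e.g. sunburn cases

cor(ice_cream, sunburn)                        # strongly correlated, yet no causal link
summary(lm(sunburn ~ ice_cream))               # naive regression finds a large 'effect'
summary(lm(sunburn ~ ice_cream + confounder))  # the effect vanishes after adjustment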


How to create confounders with regression: a lesson from causal inference

Regression is a tool that can be used to address causal questions in an observational study, though no one said it would be easy. While this article won’t close the vexing gap between correlation and causation, it will offer specific advice when you’re after a causal truth – keep an eye out for variables called ‘colliders,’ and keep them out of your regression!
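A small simulation (mine, not the article's) makes the warning concrete: x and y are generated independently, both feed into a collider c, and simply adding c to the regression manufactures a spurious association between them:

set.seed(123)
n <- 10000
x <- rnorm(n)
y <- rnorm(n)            # independent of x by construction
c <- x + y + rnorm(n)    # the collider: a common *effect* of x and y

coef(summary(lm(y ~ x)))       # coefficient on x is ~0, as it should be
coef(summary(lm(y ~ x + c)))   # conditioning on the collider makes x look 'significant'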


high dimension Metropolis-Hastings algorithms

When discussing high dimension models with Ingmar Schüster Schuster [blame my fascination for accented characters!] the other day, we came across the following paradox with Metropolis-Hastings algorithms. If attempting to simulate from a multivariate standard normal distribution in a large dimension, when starting from the mode of the target, i.e., its mean γ, leaving the mode γ is extremely unlikely, given the huge drop between the value of the density at the mode γ and at likely realisations (corresponding to the blue sequence). Even when relying on the very scale that makes the proposal identical to the target! Resorting to a tiny scale like Σ/p manages to escape the unhealthy neighbourhood of the highly unlikely mode (as shown with the brown sequence).
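A quick sketch (mine, not the post's) of the phenomenon: a random-walk Metropolis-Hastings sampler targeting a p-dimensional standard normal, started at the mode. With the proposal scale equal to the target's, essentially nothing is accepted; shrinking the scale by a factor p restores movement:

rw_mh <- function(p, scale, n_iter = 1000) {
  x   <- rep(0, p)                 # start at the mode of N(0, I_p)
  acc <- 0
  for (i in seq_len(n_iter)) {
    prop      <- x + rnorm(p, sd = sqrt(scale))
    log_ratio <- -0.5 * (sum(prop^2) - sum(x^2))   # log density ratio under N(0, I_p)
    if (log(runif(1)) < log_ratio) { x <- prop; acc <- acc + 1 }
  }
  acc / n_iter                     # acceptance rate
}

p <- 100
rw_mh(p, scale = 1)       # proposal scale 'identical to the target': virtually nothing accepted
rw_mh(p, scale = 1 / p)   # tiny scale of order 1/p: a healthy acceptance rate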


Mapping US Religion Adherence by County in R

Today’s guest post is by Julia Silge. After reading Julia’s analysis of religions in America (“This is the Place, Apparently”) I invited her to teach my readers how to map information about US religious adherence by county in R. Julia can be found blogging here or on Twitter. I took Ari’s free email course on getting started with the choroplethr package last year, and I have so enjoyed making choropleth maps and using them to explore demographic data. Earlier this month, I posted a project on my blog exploring the religious demographics of my adopted home state of Utah that made heavy use of the choroplethr package, and today I’m happy to share some of the details of the data set I used here on Ari’s blog and do some new analysis.
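For readers who have not used the package, the core call is county_choropleth(), which expects a data frame with a numeric county-FIPS column named region and a column named value; the sketch below (not Julia's actual code) uses the package's bundled county population data as a stand-in for an adherence data set:

library(choroplethr)
library(choroplethrMaps)   # county map geometry used by county_choropleth()

data(df_pop_county)        # example data: columns 'region' (FIPS) and 'value'
county_choropleth(df_pop_county,
                  title  = "Placeholder map: swap in adherence rates as 'value'",
                  legend = "value")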


Do basic R operations much faster in bash

However, sometimes you want to do really basic stuff with huge files, or with a lot of files. At work I have to do that a lot, because I mostly deal with language data that often needs some pre-processing. Most of these operations are done much, much faster at the level of the operating system (preferably in Bash on Linux or Unix, i.e. Mac OS). And since R tries to load everything into working memory, these commands can also help you work with files that are too big for your RAM. This blog post is a kind of cheat sheet to remind me of the bash functions that have proved very useful to me. (Most of them are quite basic for an advanced user of Linux or Unix, I guess.)
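To give a flavour of the kind of one-liners such a cheat sheet collects (illustrative examples, assuming a hypothetical large file big.csv), a few of them can even be called from inside R with system(), so the file is never loaded into RAM:

system("wc -l big.csv")                             # count lines without reading the file into R
system("head -n 5 big.csv")                         # peek at the first rows
system("cut -d',' -f2 big.csv | sort | uniq -c")    # frequency table of the second column
system("grep -c 'ERROR' big.csv")                   # count lines matching a pattern

# the in-memory R equivalent of the line count, by contrast, reads the whole file:
length(readLines("big.csv"))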


I will survive!

I have to say I’m not really a massive expert on survival analysis, in the sense that it has never been my main area of interest or research. But I think the particular case of cost-effectiveness modelling is actually very interesting. The main point is that, unlike in a standard trial, where the observed data are used to determine some median estimate of the survival curve (typically across the different treatment arms), in health economic evaluations the target quantity is actually the mean survival time(s), because these are usually used to extrapolate the (limited!) trial data into a full decision-analytic model, often covering a relatively large time horizon.
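To make the median-versus-mean distinction concrete, here is a minimal sketch (mine, not from the post) using the lung data from the survival package: the Kaplan-Meier summary reports a median survival time, while a parametric Weibull fit with flexsurv yields the mean survival time that a decision-analytic model would actually use:

library(survival)   # Surv(), survfit() and the example 'lung' data
library(flexsurv)   # parametric survival models

km <- survfit(Surv(time, status) ~ 1, data = lung)
print(km)                                        # reports the *median* survival time

fit   <- flexsurvreg(Surv(time, status) ~ 1, data = lung, dist = "weibull")
shape <- fit$res["shape", "est"]
scale <- fit$res["scale", "est"]

scale * gamma(1 + 1 / shape)                     # mean of a Weibull(shape, scale), in days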