1. Multi-lingual is the rule.
2. Text analysis gains recognition as a key business-solution capability.
3. Machine learning, stats, and language engineering coexist.
4. Image analysis enters the mainstream.
5. A break-out for speech analytics, with video to come.
6. Expanded emotion analytics.
7. ISO emoji analytics.
8. Deeper insights from networks plus content.
9. In 2016, you’ll be reading (and interacting with) lots more machine-written content.
10. Machine translation matures.
One of the important assumptions of linear regression is that there should be no heteroscedasticity of residuals. In simpler terms, this means that the variance of the residuals should not increase with the fitted values of the response variable. In this post, I am going to explain why it is important to check for heteroscedasticity, how to detect it in your model and, if it is present, how to rectify the problem, with example R code. This process is sometimes referred to as residual analysis.
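To make the detection step concrete, here is a minimal sketch of one common check, the Breusch-Pagan test, written in Python with NumPy rather than the R code the post itself uses. The excerpt does not say which test the post relies on, so the choice of test is an assumption, as are the function names `ols_fit` and `breusch_pagan_1d`. The sketch handles a single predictor only, which lets the p-value come from a chi-square distribution with one degree of freedom.

```python
import math
import numpy as np

def ols_fit(X, y):
    """Least-squares fit; X must already contain an intercept column.
    Returns the coefficients and the residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def breusch_pagan_1d(x, y):
    """Breusch-Pagan LM test for y ~ x with one predictor (hypothetical helper).
    Returns (LM statistic, p-value); a small p-value suggests heteroscedasticity."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    _, resid = ols_fit(X, y)

    # Auxiliary regression: squared residuals on the same predictor.
    u2 = resid ** 2
    _, aux_resid = ols_fit(X, u2)
    r2 = 1.0 - aux_resid.var() / u2.var()

    # LM = n * R^2 is asymptotically chi-square with 1 degree of freedom here.
    lm = max(n * r2, 0.0)
    p_value = math.erfc(math.sqrt(lm / 2.0))  # chi-square(1) upper tail
    return lm, p_value

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(1, 10, 500)
    # Simulated data: the noise standard deviation grows with x.
    y = 2 + 3 * x + rng.normal(0, 1, 500) * x
    print(breusch_pagan_1d(x, y))
```

With more than one predictor the degrees of freedom change, and a statistics library would be the safer route.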
We’re happy to announce that Stan 2.9.0 is fully available for CmdStan, RStan, and PyStan – it should also work for Stan.jl (Julia), MatlabStan, and StataStan.
Analysis has been with us since the dawn of human consciousness. When early man tried to understand why an animal drops when you hit it with a rock, he made the first analysis. Analysis means nothing more than breaking down an action or event into its individual parts in order to understand it better. Today’s world is to a great extent run by computers. With them, analysis has morphed into a complex and sophisticated discipline that looks at data and data streams to discover inherent patterns. This is called analytics. It uses science and computer technology to find, track and gather data to analyze and interpret. Analytical probing can become the basis for predictive analytics. If you own a website, a blog, an email or a social media account, you see ads. They are ubiquitous and often annoying. Ever wondered why you are suddenly bombarded with ads for automobiles as soon as you tell a friend on Facebook that you are looking for a new car? You have just met custom analytics applications. Google Analytics, to be specific. We also often refer to it by the generic term ‘data mining.’ However, Google is only one of the most visible incarnations of a specific type of analytics. There are many more categories of custom analytics. Together they have spawned an entire interdisciplinary industry that delivers this type of comprehensive analytics. Forio is a company that specializes in developing custom analytics applications for its customers.
In recent years, companies have invested millions of dollars in the field of data science. They have shown immense faith in its potential to create a better world, a better life and a better future. The powerful trio of mathematics, computer science and domain expertise has redefined the process of decision making. Intuition, or gut feeling, is no longer the key to making complicated decisions. What was considered a path-breaking invention a few years back has now become obsolete. Data science has empowered us with possibilities beyond imagination. Over the years, lots of things have decayed and evolved. Still, the best of technology is yet to come. I’m excited to see it in front of my eyes!
By visually accessible, I mean that you see not the title but the top picture contained in each tweeted article, as illustrated below. It allows you to easily find out whether you have already read the article in question, or whether it is worth reading. Also, I’m not sure exactly what the total is, but it is certainly well above 500, probably in the thousands if not above 10,000 articles, as you can scroll down indefinitely when visiting these Twitter pages.
Over the last 6 years, thousands of students and faculty have downloaded Revolution R Enterprise (RRE) from Revolution Analytics for free, making it possible for them to do statistical modeling on large data sets with the same R language used by savvy statisticians and data scientists in business and industry. In addition to this individual scholar program (ISP), Revolution Analytics launched two initiatives in 2014 to provide academic institutions and non-profit public service companies with site licenses for the nominal annual licensing fee of $999. Both the Academic Institution Program (AIP) and the Public Service Program (PSP) enabled qualifying institutions to install RRE on servers and Hadoop clusters without restrictions. Now, seven months after Microsoft’s acquisition of Revolution Analytics, all three of these programs are being folded into Microsoft programs that will make it even easier for individual students and institutions to get started with the newest release of RRE, now known as Microsoft R Server.
In the nine months since Microsoft acquired Revolution Analytics, there has been a steady stream of updates to Revolution R Open and Revolution R Enterprise (not to mention integration of R with SQL Server, PowerBI, Azure and Cortana Analytics). Now, we have yet more updates to announce, along with fresh new names. Revolution R Open is now Microsoft R Open, with an update coming later this month, and Revolution R Enterprise is now Microsoft R Server, available for purchase now or for download free of charge for developers and students.
You might have missed one significant bit of news tucked into yesterday’s Microsoft R announcement: R is coming to Visual Studio:
A bit of explanation: Random forests are a collection of independent trees. Each tree is made up of nodes arranged in a tree structure. Every node receives data from the top and splits it between its 2 children based on some very simple decision (such as whether the x-coordinate > 3). To get the decision, a few random splitting rules are generated at each node during training, and the ‘best’ one is kept.
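The node-level step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the excerpt does not say how the 'best' rule is judged, so the sketch scores each candidate by weighted Gini impurity, and the names `gini`, `split_score` and `best_random_split` are hypothetical.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_score(points, labels, feature, threshold):
    """Weighted impurity of the two children produced by the rule
    'points[feature] <= threshold goes left, otherwise right'. Lower is better."""
    left = [lab for p, lab in zip(points, labels) if p[feature] <= threshold]
    right = [lab for p, lab in zip(points, labels) if p[feature] > threshold]
    n = len(labels)
    return (len(left) * gini(left) + len(right) * gini(right)) / n

def best_random_split(points, labels, n_candidates=10, rng=None):
    """Generate a few random (feature, threshold) rules and keep the best one.
    Returns (score, feature, threshold)."""
    rng = rng or random.Random()
    n_features = len(points[0])
    best = None
    for _ in range(n_candidates):
        feature = rng.randrange(n_features)
        values = [p[feature] for p in points]
        threshold = rng.uniform(min(values), max(values))
        score = split_score(points, labels, feature, threshold)
        if best is None or score < best[0]:
            best = (score, feature, threshold)
    return best

if __name__ == "__main__":
    pts = [(1, 0), (2, 5), (4, 1), (5, 7)]
    labs = [0, 0, 1, 1]
    print(best_random_split(pts, labs, n_candidates=20, rng=random.Random(1)))
```

A full tree would apply `best_random_split` recursively to the left and right subsets until a stopping rule is met, and a forest would train many such trees independently.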
ggplot2 is the most elegant and aesthetically pleasing graphics framework available in R. It has a nicely planned structure to it. This tutorial focusses on exposing this underlying structure, which you can use to make any ggplot. But the way you make plots in ggplot2 is very different from base graphics, which makes the learning curve steep. So leave what you know about base graphics behind and follow along. You are just 5 steps away from cracking the ggplot puzzle.
I’ve recently purchased a Nexus 6P phone. It’s a pretty sweet device – it’s the first phone that actually fits in my larger than average hands. One thing I really don’t like about it is the compass; when I’m using Google Maps to navigate, the compass will regularly point in the wrong direction. When Android discovers this, it annoyingly asks me to wave my phone around like in the image above. When the compass is miscalibrated, the direction it points isn’t random, however – it’s predictably wrong relative to the direction I’m actually pointing it. I.e., if I’m facing north, the compass will point east. If I’m facing south, the compass will point west. In general, at least on a given day, if I’m pointing in a direction θ, the compass on my phone will reliably point in a fixed direction θ + b. Rather than waving the phone around, wouldn’t it be great if the phone could figure out its own direction based on observing how I walk?
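As a rough illustration of the θ + b model, suppose the phone had a handful of headings it trusted (say, derived from successive GPS positions while walking) paired with the compass readings taken at the same moments. Under that assumption, which is mine and not necessarily the approach the post goes on to develop, the fixed offset b could be estimated as the circular mean of the differences. The function name `estimate_offset` is hypothetical.

```python
import math

def estimate_offset(true_headings, compass_readings):
    """Estimate the constant offset b in 'reading = heading + b'.
    Angles are in radians; the result lies in (-pi, pi].
    A circular mean is used so that wrap-around at +/-pi is handled correctly."""
    s = sum(math.sin(c - t) for t, c in zip(true_headings, compass_readings))
    k = sum(math.cos(c - t) for t, c in zip(true_headings, compass_readings))
    return math.atan2(s, k)

if __name__ == "__main__":
    # Facing north, east, south and west, with the compass a quarter turn off.
    headings = [0.0, math.pi / 2, math.pi, -math.pi / 2]
    readings = [h + math.pi / 2 for h in headings]
    print(estimate_offset(headings, readings))
```

Subtracting the estimated b from later compass readings would then give a corrected heading, at least for as long as the offset stays fixed.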
In this post I’ll share my experience and explain my approach for the Kaggle Right Whale challenge. I managed to finish in 2nd place.
While there is no specific methodology to solve Data Science for IoT (IoT Analytics) problems, perhaps it is time to draft one.
The very purpose of authoring this book was to rethink the way we have been teaching statistics and analytics to students and practitioners. It is no secret that most students required to take the mandatory stats course dislike it. I believe this has more to do with the way we have been teaching the subject than with the aptitude of our students. Furthermore, I believe there is a greater opportunity to equip students with the skills needed in a world awash with data, where competing on analytics defines the real competitive advantage.
There are several reasons why everyone isn’t using Bayesian methods for regression modeling. One reason is that Bayesian modeling requires more thought: you need pesky things like priors, and you can’t assume that the answers are valid just because a procedure runs without throwing an error. A second reason is that MCMC sampling, the bedrock of practical Bayesian modeling, can be slow compared to closed-form or MLE procedures. A third reason is that existing Bayesian solutions have either been highly specialized (and thus inflexible), or have required knowing how to use a generalized tool like BUGS, JAGS, or Stan. This third reason has recently been shattered in the R world by not one but two packages: brms and rstanarm. Interestingly, both of these packages are elegant front ends to Stan, via rstan and shinystan.