Mathematical Statistics Lesson of the Day – Ancillary Statistics
The set-up for today’s post mirrors my earlier Statistics Lessons of the Day on sufficient statistics and complete statistics.
7 most commonly asked questions on Correlation
1. Correlation vs. Dependency
2. Is Correlation Transitive?
3. Is Pearson coefficient sensitive to outliers?
4. Does causation imply correlation?
5. Difference between Correlation and Simple Linear Regression
6. Pearson vs. Spearman
7. Correlation vs. covariance
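Questions 3 and 6 above can be illustrated with a small sketch. The data below is made up for illustration, and the helper names (pearson, ranks, spearman) are hypothetical rather than taken from the linked post. It uses only the standard library and computes Spearman's coefficient as the Pearson coefficient of the ranks.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Illustrative data (an assumption, not from the linked article).
x = list(range(1, 11))
y_linear = [2 * v + 1 for v in x]      # exactly linear in x
y_outlier = y_linear[:-1] + [-50]      # last point replaced by an outlier
y_cubic = [v ** 3 for v in x]          # monotonic but not linear

print("linear :", pearson(x, y_linear), spearman(x, y_linear))
print("outlier:", pearson(x, y_outlier), spearman(x, y_outlier))
print("cubic  :", pearson(x, y_cubic), spearman(x, y_cubic))
```

In this toy data a single outlier turns the Pearson coefficient negative while the Spearman coefficient stays positive, and the cubic relationship has a Spearman coefficient of 1 but a Pearson coefficient below 1.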
Infographic: Must Read Books in Analytics / Data Science
There are 2 attributes all the members in our team at Analytics Vidhya share:
• We all are voracious readers
• We all love to share our knowledge with people in a simplified manner, so that everyone gets access to this knowledge.
These two attributes naturally lead us to share some of the best reads we come across. You can think of this infographic as an ideal list of books for the bookshelf of every data scientist / analyst. These books cover a wide range of topics and perspectives (not only technical knowledge), which should help you become a well-rounded data scientist.
For your convenience, we have organized this bookshelf into the following categories: Analytics, Data Science, Data Visualization and Web Analytics. The books classified under ‘Analytics’ are those that help you understand the perspective of data-based decision making.
8 Best Python IDE for Fast and Bug-free Programming
1. Eclipse with PyDev
2. Komodo Edit
3. Vim
4. Sublime Text
5. PyCharm
6. Emacs
7. Wing
8. PyScripter
Profiling Top Kagglers: Owen Zhang, Currently #1 in the World
Next up in our series on top Kagglers is the #1: Owen Zhang (Zhonghua Zhang). Owen comes from an engineering background and currently works as the Chief Product Officer at DataRobot.
Why does Deep Learning work?
Why does Deep Learning work? This is the big question on everyone’s mind these days. C’mon, we all know the answer already: ‘the long-term behavior of certain neural network models are governed by the statistical mechanism of infinite-range Ising spin-glass Hamiltonians’. In other words, Multilayer Neural Networks are just Spin Glasses? Right? This is kinda true, depending on what you mean by a spin glass.
Top 10 Machine Learning Videos on YouTube
1. MarI/O – Machine Learning for Video Games (1,514,045 views)
2. Machine Learning (Stanford) (761,843 views)
3. The Next Generation of Neural Networks (401,740 views)
4. The Future of Robotics and Artificial Intelligence (Andrew Ng, Stanford University, STAN 2011) (233,875 views)
5. Caltech Machine Learning (233,703 views)
6. Brains, Sex, and Machine Learning (104,808 views)
7. Epic NHL goal celebration hack with a hue light show and real time machine learning (103,166 views)
8. I am a legend: Hacking Hearthstone with machine learning – Defcon 22 (92,820 views)
9. Bay Area Vision Meeting: Unsupervised Feature Learning and Deep Learning (89,518 views)
10. Deep Learning: Intelligence from Big Data (89,506 views)
Brief Overview of Apache Spark
With the advent of new technologies, the number of data sources has increased. Web server logs, machine log files, user activity on social media, records of users’ clicks on websites and many other data sources have caused an exponential growth of data. Individually this content may not be very large, but when taken across billions of users, it produces terabytes or petabytes of data. For example, Facebook is collecting 500 terabytes (TB) of data every day with more than 950 million users. Such massive amounts of data, which are not only structured but also unstructured and semi-structured, fall under the umbrella term Big Data.
Short is the new Long with longurl for R (plus working with weblogs & URLs in R)
Why do we need longurl? I’ll point readers to a paper, Two Years of Short URLs Internet Measurement: Security Threats and Countermeasures [PDF], by Maggi, Frossi, Zanero, Stringhini, Stone-Gross, Kruegel & Vigna, where the authors look at the potential (and actual) evil behind short URLs. Many things can hide there, from malware to phishing sites, and knowing both the short and full URL can help defenders stop attacks before they are fully successful.
Looking for Preference in All the Wrong Places: Neuroscience Suggests Choice Model Misspecification
At its core, choice modeling is a utility estimating machine. Everything has a value reflected in the price that we are willing to pay in order to obtain it. Here is a collection of smartwatches from a search of Google Shopping. You are free to click on any one, look for more, or opt out altogether and buy nothing.
Setting RStudio server using Amazon Web Services
In this post we present a step-by-step screenshot tutorial that will get you to know the Amazon EC2 service. We will set up an EC2 instance (Amazon virtual server), install an RStudio server on it and use our beloved RStudio via browser (all for free!). The slides below will also include an introduction to basic Linux commands, instructions for connecting to a remote server via ssh and more. No previous knowledge is required.
Why does designing a simple A/B test seem so complicated?
An A/B test is a very simple controlled experiment where one group is subject to a new treatment (often group ‘B’) and the other group (often group ‘A’) is considered a control group. The classic example is attempting to compare defect rates of two production processes (the current process, and perhaps a new machine).
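As a rough illustration of the defect-rate comparison described above, here is a minimal sketch of a pooled two-proportion z-test using only the standard library. The function name and the defect counts are hypothetical, and the linked post may well analyze the problem differently; this is just one common way to compare two rates.

```python
import math


def two_proportion_z_test(defects_a, n_a, defects_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    rate_a = defects_a / n_a
    rate_b = defects_b / n_b
    # Pooled defect rate under the null hypothesis of equal rates.
    pooled = (defects_a + defects_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: group A is the current process (control),
# group B is the new machine (treatment).
z, p_value = two_proportion_z_test(defects_a=50, n_a=1000,
                                   defects_b=30, n_b=1000)
print(f"z = {z:.3f}, two-sided p = {p_value:.4f}")
```

A negative z here means group B had the lower observed defect rate. The normal approximation behind this test assumes reasonably large samples, so small experiments would call for an exact test instead.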