Software Goes Invisible
Software is getting smarter, thanks to predictive analytics, machine learning, and artificial intelligence (AI). Whereas the current generation of software is about enabling smarter decision-making for humans, we’re starting to see “invisible software” capable of performing tasks without human intervention.
Comprehensive list of data science resources
We compiled the best resources recently posted on DSC. It would be great to organize them by category, but for now they are sorted by date. That ordering is useful too: you have probably seen the older entries already and can focus on the newer ones. Starred entries include interesting charts.
Big Data and Analytics Trends for 2015
1. Extreme Real Time Big Data Analytics
2. Growth of Data Lakes
3. Accelerated Adoption of Big Data Analytics at the Point of Decision
4. Data Governance for Big Data
5. Big Data Analytics Processing in the Cloud
6. Developing Deep Learning Capabilities
7. Big Data Analytics Shaping Consumer Behaviour
8. Wearable Technology Fueling Big Data
9. SQL on Hadoop
10. Big Data Beyond Data Scientists
Quick Guide to Making an Interactive D3 Map
…
Contextual Awareness: What’s All the Fuss About?
One hallmark of recent technology is that it is designed to respond to the people who use it. Familiar examples include Google Now cards (or Waze) that follow the user’s travels, notifications of upcoming appointments, app preferences, and accurate contact suggestions after the user types only a few letters. In the industry, this responsiveness is called “contextual awareness”: a device uses user-specific data, such as location and sensor readings, to infer the user’s circumstances, so that it seems to “know” where the user is and what he or she needs. Desktop computers use this technology to some degree, but mobile devices are by far its biggest providers.
Why is Python a language of choice for data scientists?
Python is an interpreted, dynamically typed language with a precise and efficient syntax. Python has a good REPL, and new modules can be explored from it with dir() and docstrings. That’s one reason to prefer Python over C, C++, or Java.
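As a small sketch of that exploration workflow, the snippet below lists a module’s public names with dir() and reads a function’s docstring. The standard-library math module is just a convenient example; public_names and first_doc_line are hypothetical helper names for illustration. At an interactive prompt you would usually just type dir(math) or help(math.sqrt).

```python
import math


def public_names(module):
    """Return the names a module exposes, skipping _private and __dunder__ ones."""
    return [name for name in dir(module) if not name.startswith("_")]


def first_doc_line(obj):
    """Return the first non-empty line of an object's docstring, or ''."""
    doc = (obj.__doc__ or "").strip()
    return doc.splitlines()[0] if doc else ""


if __name__ == "__main__":
    names = public_names(math)
    print(len(names), names[:5])      # dir() returns names in sorted order
    print(first_doc_line(math.sqrt))  # the docstring is available at runtime
```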
Inequalities and Quantile Regression
In the course on inequality measurement, we’ve seen how to compute various (standard) inequality indices from a sample of incomes (possibly binned into categories). On Thursday, we discussed the fact that incomes can depend on other variables (e.g. experience), so comparing income inequality between countries can be biased if those countries have very different age structures.
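One standard inequality index is the Gini coefficient, $G = \frac{\sum_i \sum_j w_i w_j |x_i - x_j|}{2 W^2 \bar{x}}$, where $W = \sum_i w_i$ and $\bar{x}$ is the weighted mean. The sketch below computes it directly from that definition. The optional weights are an assumption meant to cover binned data: pass bin representatives (for example midpoints, which only approximates the true within-bin incomes) together with bin counts. The O(n²) double loop favors clarity over speed.

```python
def gini(incomes, weights=None):
    """Gini coefficient via the mean absolute difference.

    incomes: income values (or bin representatives for binned data).
    weights: optional frequencies; defaults to 1 for every observation.
    """
    if weights is None:
        weights = [1.0] * len(incomes)
    if len(weights) != len(incomes) or not incomes:
        raise ValueError("need a non-empty sample with one weight per income")
    total_w = sum(weights)
    mean = sum(w * x for x, w in zip(incomes, weights)) / total_w
    if mean <= 0:
        raise ValueError("mean income must be positive")
    # Weighted sum of |x_i - x_j| over all ordered pairs (i, j)
    abs_diff = sum(
        wi * wj * abs(xi - xj)
        for xi, wi in zip(incomes, weights)
        for xj, wj in zip(incomes, weights)
    )
    return abs_diff / (2 * total_w ** 2 * mean)


if __name__ == "__main__":
    print(gini([5, 5, 5]))              # perfect equality
    print(gini([0, 0, 0, 1]))           # one person holds everything
    print(gini([0, 1], weights=[3, 1]))  # same sample, given as binned counts
```

Because the weighted form treats a bin with count 3 exactly like three identical observations, the last two calls agree.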
IBM Watson Cloud Gains Eyes, Ears, And A Voice
IBM Watson Developer Cloud adds speech-to-text, text-to-speech, visual recognition, and decision services. Will businesses build their own Jeopardy apps?
Forecasting Beer Consumption with Sklearn
In this post we will see how to implement a straightforward forecasting model based on the linear regression object of sklearn. The model we are going to build rests on the idea that past observations are good predictors of future values. In symbols: given $x_{n-k}, \dots, x_{n-2}, x_{n-1}$, we want to estimate $x_{n+h}$, where $h$ is the forecast horizon, using only the given values.
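A minimal sketch of that setup, assuming sklearn’s LinearRegression and NumPy: slide a window of $k$ past values over the series, pair each window with the value $h$ steps beyond its end, and fit ordinary least squares. The helper names (make_lagged, fit_forecaster, forecast) and the toy linear series are hypothetical; this is not necessarily the post’s exact code. Following the indexing above, $h = 0$ means predicting the very next value $x_n$.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def make_lagged(series, k, h):
    """Build training pairs from a 1-D series.

    Each row of X is the window x_{n-k}, ..., x_{n-1};
    the matching target is x_{n+h}.
    """
    values = np.asarray(series, dtype=float)
    X, y = [], []
    # n is the index just after the window; stop while x_{n+h} still exists
    for n in range(k, len(values) - h):
        X.append(values[n - k:n])
        y.append(values[n + h])
    return np.array(X), np.array(y)


def fit_forecaster(series, k, h):
    """Fit a least-squares model mapping k past values to x_{n+h}."""
    X, y = make_lagged(series, k, h)
    model = LinearRegression()
    model.fit(X, y)
    return model


def forecast(model, series, k):
    """Forecast from the most recent k observations."""
    window = np.asarray(series[-k:], dtype=float).reshape(1, -1)
    return float(model.predict(window)[0])


if __name__ == "__main__":
    # Hypothetical toy series standing in for real consumption data
    series = [2.0 * t + 1.0 for t in range(20)]
    model = fit_forecaster(series, k=3, h=0)
    print(forecast(model, series, k=3))
```

Keeping the windowing separate from the model makes it easy to swap LinearRegression for another sklearn regressor with the same fit/predict interface.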
More Data, Less Accuracy
Statistical methods should do better with more data; that’s essentially what the technical term “consistency” means. But with improper numerical techniques, the numerical error can grow with more data, overshadowing the shrinking statistical error.
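A familiar illustration of numerical error growing with sample size, not necessarily the example the post uses, is plain floating-point summation. Each addition rounds, and those errors accumulate as n grows. Kahan compensated summation carries the lost low-order bits forward. The sketch writes the naive loop explicitly because recent Python versions’ built-in sum() already compensates float rounding. math.fsum serves as the correctly rounded reference.

```python
import math


def naive_sum(values):
    """Left-to-right accumulation: rounding error can grow with len(values)."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Kahan compensated summation: re-injects the bits lost at each step."""
    total = 0.0
    compensation = 0.0
    for x in values:
        y = x - compensation
        t = total + y
        # (t - total) is what was actually added; subtracting y leaves the lost part
        compensation = (t - total) - y
        total = t
    return total


if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5, 10**6):
        data = [0.1] * n
        reference = math.fsum(data)  # correctly rounded sum of the same floats
        print(n, abs(naive_sum(data) - reference), abs(kahan_sum(data) - reference))
```

The same concern applies to statistics built from sums, such as the mean and the one-pass “sum of squares minus n times the squared mean” variance formula.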
Big Data: What is HBASE?
When we talk about the tools used to work with Big Data, an overwhelming majority of people will mention Hadoop, the Apache Software Foundation’s implementation of MapReduce and a distributed file system (HDFS in this instance). Hadoop was created by Doug Cutting, building on papers published by Google engineers, while he was at Yahoo (he is now at Cloudera). But big data tools rarely, if ever, work alone: big data work relies on a collection of tools and databases that help data scientists be more effective in their analysis, or simply speed things up. One of these technologies is HBase, a non-relational (NoSQL) database written in Java and modeled on Google’s Bigtable. It is often referred to as a columnar database: whereas a relational database stores its data in rows, HBase organizes its data by column (more precisely, by column family).
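To make the row-versus-column distinction concrete, here is a toy sketch in plain Python. It is not HBase’s storage format or API. The same records are held once as row dicts and once pivoted into one list per column, so an aggregate over a single field only touches that field’s values in the column layout. The function names and sample records are hypothetical, and the pivot assumes every row has the same fields, whereas real HBase rows can be sparse.

```python
def to_columnar(rows):
    """Pivot a list of row dicts into one list of values per column.

    Assumes every row has the same keys (a simplification).
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def mean_from_rows(rows, name):
    # Row layout: every whole record is visited to read one field
    return sum(row[name] for row in rows) / len(rows)


def mean_from_columns(columns, name):
    # Column layout: only the requested column's values are read
    values = columns[name]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical example records
    rows = [
        {"user": "u1", "country": "FR", "age": 34},
        {"user": "u2", "country": "US", "age": 28},
        {"user": "u3", "country": "DE", "age": 45},
    ]
    columns = to_columnar(rows)
    print(columns["age"])
    print(mean_from_rows(rows, "age"), mean_from_columns(columns, "age"))
```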
R Graph Catalog
This catalog is a complement to “Creating More Effective Graphs” by Naomi Robbins. All graphs were produced using the R language and the add-on package ggplot2, written by Hadley Wickham. The gallery is maintained by Joanna Zhao and Jennifer Bryan.