Neural Nets in Azure ML – Introduction to Net#
Neural networks are one of the most popular machine learning algorithms today. One of the challenges when using neural networks is how to define a network topology given the variety of possible layer types, connections among them, and activation functions. Net# solves this problem by providing a succinct way to define almost any neural network architecture in a descriptive, easy-to-read format. This post provides a short tutorial for building a neural network using the Net# language to classify images of handwritten numeric digits in Microsoft Azure Machine Learning.

Survey Analysis in R
The purpose of this online course, “Survey Analysis in R,” is to teach survey researchers who are familiar with R how to use it in survey research. The course uses the Survey package for R, which was created by the instructor. You will learn how to describe the design of a survey to R; both simple and complex designs are covered. You will then learn how to get R to produce descriptive statistics and graphs with the survey data, and to perform regression analysis on the data.

Big Data: Matching Personalities In The Call Center
You hear the recording whenever you phone customer support: “Calls may be recorded for training and quality purposes.” Of course, companies use call center recordings for training and compliance purposes, but they could do a lot more with this data. Or so says Jason Wesbecher, chief marketing officer of Mattersight, a Chicago-based developer of personality-based applications for call centers. The company’s novel Behavioral Analytics software captures and examines a variety of contextual data, including voice calls from customers, to route callers to the best available representative — at least from a personality-match standpoint. “When you think of big data, there’s a ton of data in that spoken conversation. No one is taking that data and trying to operationalize it in a new way,” Wesbecher told InformationWeek in a phone interview. “Call-recording vendors are out there, but they’re not focused on this kind of behavioral science angle.”

Impact of Big Data on Analytics
(slideshow)

Automatic Statistician and the Profoundly Desired Automation for Data Science
The Automatic Statistician project by Univ. of Cambridge and MIT is pushing ahead the frontiers of automation for the selection and evaluation of machine learning models. In general, what does automation mean to Data Science?

How to Prepare for (and Nail) a Data Science Interview
Nir Kaldero, GalvanizeU’s leading faculty member, shares insights & perspectives on making it through a data science job interview. Familiarizing yourself with the following questions, topics and concepts will help get you on track to impress your future employer.

Introducing DataFrames in Spark for Large Scale Data Science
Today, we are excited to announce a new DataFrame API designed to make big data processing even easier for a wider audience. When we first open sourced Spark, we aimed to provide a simple API for distributed data processing in general-purpose programming languages (Java, Python, Scala). Spark enabled distributed data processing through functional transformations on distributed collections of data (RDDs). This was an incredibly powerful API: tasks that used to take thousands of lines of code to express could be reduced to dozens. As Spark continues to grow, we want to enable wider audiences beyond “Big Data” engineers to leverage the power of distributed processing. The new DataFrames API was created with this goal in mind. This API is inspired by data frames in R and Python (Pandas), but designed from the ground-up to support modern big data and data science applications.

Clustering US Senators with k-means
Clustering is a powerful way to split up datasets into groups based on similarity. A very popular clustering algorithm is k-means. In k-means clustering, we divide data up into a fixed number of clusters while trying to ensure that the items in each cluster are as similar as possible. In this post, we’ll cluster US Senators using an interactive Python environment. We’ll use the voting history from the 114th Congress to split Senators into clusters. In the post’s editor, you can follow along, and modify the code however you want to do your own analysis.
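
To make the k-means idea concrete, here is a minimal sketch in plain Python. It is not the code from the linked post: the senator names and vote vectors are hypothetical stand-ins (1 = yea, 0 = nay), and the starting centroids are chosen by a simple deterministic farthest-point rule rather than the random initialization that k-means implementations typically use.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def init_centroids(points, k):
    """Assumption: deterministic farthest-point start instead of random picks."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iterations=100):
    """Alternate assignment and centroid update until labels stop changing."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids


# Hypothetical voting records: one row per senator, one column per vote.
votes = {
    "Senator A": [1, 1, 0, 0],
    "Senator B": [1, 1, 0, 1],
    "Senator C": [1, 0, 0, 0],
    "Senator D": [0, 0, 1, 1],
    "Senator E": [0, 0, 1, 0],
    "Senator F": [0, 1, 1, 1],
}

names = list(votes)
labels, centroids = kmeans([votes[name] for name in names], k=2)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

On this toy data the first three senators land in one cluster and the last three in the other. With real roll-call data the number of columns would be much larger, and a library implementation with several random restarts would be the more usual choice.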