Data Scientists in Software Teams: State of the Art and Challenges

The demand for analyzing large-scale telemetry, machine, and quality data is rapidly increasing in the software industry. Data scientists are becoming popular within software teams; for example, Facebook, LinkedIn, and Microsoft are creating a new career path for data scientists. In this paper, we present a large-scale survey of 793 professional data scientists at Microsoft to understand their educational backgrounds, the problem topics they work on, their tool usage, and their activities. We cluster these data scientists based on the time they spend on various activities and identify 9 distinct clusters along with their corresponding characteristics. We also discuss the challenges that they face and the best practices they share with other data scientists. Our study finds several trends about data scientists in the software engineering context at Microsoft, and should inform managers on how to leverage data science capability effectively within their teams.

A news recommendation engine driven by collaborative reader behavior

This blog post introduces a news recommendation engine that combines collaborative filtering with content-based filtering to diversify news recommendations. This so-called hybrid-filtering recommendation system takes into account not only the content of the articles and the user’s reading history, but also the reading history of people who share similar interests. By learning from the history of people with similar interests, the engine recommends news covering a broader range of topics, even when the historical information about a particular user is limited. Give it a try!
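The hybrid idea described above can be sketched roughly as follows. This is a minimal illustration, not the engine from the post: the function names, the Jaccard similarity, the topic-tag representation of articles, and the `alpha` blending weight are all assumptions made for the example.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_scores(user, histories, topics, alpha=0.5):
    """Score unread articles for `user` (hypothetical helper).

    histories: dict mapping reader -> set of article ids already read
    topics:    dict mapping article id -> set of topic tags
    alpha:     assumed blending weight for the collaborative part
    """
    read = histories[user]
    # Content profile: all topic tags the user has read so far.
    user_topics = set().union(*(topics[a] for a in read))
    scores = {}
    for article, tags in topics.items():
        if article in read:
            continue
        content = jaccard(user_topics, tags)
        # Collaborative part: similarity-weighted share of other readers
        # who have read this article.
        num = den = 0.0
        for other, hist in histories.items():
            if other == user:
                continue
            sim = jaccard(read, hist)
            den += sim
            if article in hist:
                num += sim
        collab = num / den if den else 0.0
        scores[article] = alpha * collab + (1 - alpha) * content
    return scores


histories = {"ann": {"a1"}, "bob": {"a1", "a2"}}
topics = {"a1": {"politics"}, "a2": {"sports"}, "a3": {"science"}}
scores = hybrid_scores("ann", histories, topics)
```

In this toy data, "ann" has only read a politics article, yet the sports article outranks the science one because a reader with an overlapping history read it. That is the kind of topic diversification the post attributes to the collaborative component.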

Passive Aggressive Algorithms

In a previous document I described how Bayesian models can update recursively, which makes them an ideal starting point for designing streaming machine learning models. In this document I will describe a different method, proposed by Crammer et al., which takes a passive aggressive approach to model updates. I will focus on intuition first before moving to the mathy bits. At the end I will demo some sklearn code with implementations of these models.
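To give a feel for the update rule, here is a minimal pure-Python sketch of one passive aggressive step for binary classification, in the clipped-step-size variant usually called PA-I. It is not the code from the document: the function name `pa_update` and the default aggressiveness parameter `C` are assumptions for illustration.

```python
def pa_update(w, x, y, C=1.0):
    """One PA-I step for a linear classifier; y must be -1 or +1."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)  # hinge loss
    if loss == 0.0:
        return list(w)  # passive: margin already satisfied, no change
    sq_norm = sum(xi * xi for xi in x)
    if sq_norm == 0.0:
        return list(w)  # an all-zero input carries no direction to move in
    # Aggressive: the smallest step that fixes the margin, capped at C.
    tau = min(C, loss / sq_norm)
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


w = pa_update([0.0, 0.0], [1.0, 0.0], 1)   # misclassified: weights move
w_same = pa_update(w, [1.0, 0.0], 1)       # margin met: weights unchanged
```

The model stays put when an example is already classified with enough margin and otherwise moves just far enough to correct it, which is where the name comes from.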

NumPy Array To TensorFlow Tensor And Back

Convert a NumPy array to a TensorFlow tensor, and convert a TensorFlow tensor back to a NumPy array.

Python 2 vs Python 3

“Should I learn Python 2 or Python 3?” For everyone who has just started to learn Python for Data Science, this is an important initial question to answer. There are many ongoing discussions on the topic and you might have found it hard to get a straightforward answer. I had this question myself for quite a while, since I want to teach the most relevant Python version here on the blog. So I decided to reach out to practicing senior Data Scientists and ask their opinion about it. After several hours of discussions and research I have a definite answer for you. In this article I will summarize my top takeaways.

12 Artificial Intelligence Terms You Need to Know

• Artificial Intelligence
• Machine Learning
• Deep Learning
• Cognitive Computing
• Neural Network
• Supervised and Unsupervised Learning
• Algorithm
• Chatbot
• Data Mining
• Natural Language Processing
• Predictive Analytics
• Turing Test

Learn Generalized Linear Models (GLM) using R

In this article, we aim to discuss various GLMs that are widely used in the industry. We focus on: a) log-linear regression, b) interpreting log-transformations, and c) binary logistic regression.

Emotional arithmetic: How machine learning helps you understand customers in real time

Chad W. Jennings walks through a serverless big data architecture on Google Cloud that helps unravel the mysteries of human emotion.

Comparing smooths in factor-smooth interactions

One of the really appealing features of the mgcv package for fitting GAMs is the functionality it exposes for fitting quite complex models, models that lie well beyond what many of us may have learned about what GAMs can do. One of those features that I use a lot is the ability to model the smooth effects of some covariate x in the different levels of a factor. Having estimated a separate smoother for each level of the factor, the obvious question is, which smooths are different? In this post I’ll take a look at one way to do this using by-variable smooths.