Emotions run deep in every conversation we humans have, and deciphering them is key to making machine interactions more human. Detecting emotions in text is hard enough for human beings, let alone machines, since so much of our emotion is conveyed through facial expressions and tone of voice. At Microsoft, we are building a human-like AI, Ruuh, and on that journey detecting user emotions is a critical piece. So I teamed up with Microsoft researchers Umang Gupta and Radhakrishnan Srikanth to take on this challenge. Ankush Chatterjee, an intern from IIT Kharagpur, also joined us, taking on his first machine learning research assignment!
After the Traveling Salesman part, this week we move on to tools for studying flows in networks. Slides are now online (from slide 42).
Machine learning has made huge advances in many applications, including natural language processing, computer vision, and recommendation systems, by capturing complex input/output relationships with highly flexible models. A remaining challenge, however, is problems with semantically meaningful inputs that obey known global relationships, like "the estimated time to drive a road goes up if traffic is heavier and all else is equal." Flexible models like DNNs and random forests may not learn these relationships, and may therefore fail to generalize to examples drawn from a sampling distribution different from the one the model was trained on.
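The traffic/ETA example can be made concrete. One simple way to guarantee a monotone input/output relationship is isotonic regression, fit with the pool-adjacent-violators (PAV) algorithm: a least-squares fit that is constrained to be non-decreasing in the input order. A minimal sketch (pure Python; the traffic/ETA numbers are made up for illustration and are not from the article):

```python
def isotonic_fit(y, w=None):
    """Pool-adjacent-violators: non-decreasing least-squares fit to y
    (assumed sorted by the input, e.g. by traffic level)."""
    w = w or [1.0] * len(y)
    # each block holds [weighted mean, total weight, point count]
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # merge backwards while the last two blocks violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    fitted = []
    for m, _, n in blocks:
        fitted.extend([m] * n)
    return fitted

# hypothetical noisy ETAs observed at increasing traffic levels
etas = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]
print(isotonic_fit(etas))  # -> [10.0, 11.5, 11.5, 14.5, 14.5, 18.0]
```

The fitted curve never dips, so heavier traffic never yields a shorter predicted ETA; gradient-boosting libraries expose similar monotonicity constraints for multi-feature models.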
Whatever method is chosen, business owners need to remember that a machine learning project and its data set are tools that can help them achieve business sustainability; whether they serve that purpose depends largely on whether the company uses them effectively. Before venturing into a machine learning project and creating a data set, a company must first identify its goals and verify that the data it has is relevant to them. Big data is good, but companies need to know how to use it.
R-Brain is a next-generation platform for data science built on top of JupyterLab with Docker. It supports not only R but also Python and SQL, and has integrated IntelliSense, debugging, packaging, and publishing capabilities.
Edge analytics is the collection, processing, and analysis of data at the edge of a network, either at or close to a sensor, a network switch, or some other connected device.
In this podcast episode, I speak with Gary Orenstein, chief marketing officer at MemSQL, a platform for real-time analytics that combines a database, a data warehouse, and streaming workloads into one system. We discuss trends that are driving advancements in data warehousing, how related technologies are changing as machine learning and AI evolve, and example use cases across industries.
The Azure Data Lake Store is an Apache Hadoop file system compatible with HDFS, hosted and managed in the Azure Cloud. You can store and access the data within it directly via the API, by connecting the file system to Azure HDInsight services, or via HDFS-compatible open-source applications. And for data science applications, you can also access the data directly from R, as this tutorial explains.
How does Linear Discriminant Analysis work and how do you use it in R? This post answers these questions and provides an introduction to Linear Discriminant Analysis.
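The core of the answer fits in a few lines: for two classes, Fisher's LDA finds the projection direction w = Sw⁻¹(μ₁ − μ₀), where Sw is the within-class scatter matrix, so that the projected classes are maximally separated. A self-contained sketch for 2-D features (pure Python with toy clusters of my own invention, not the post's R code):

```python
def lda_direction(X0, X1):
    """Fisher's linear discriminant for two classes of 2-D points:
    w = Sw^{-1} (mu1 - mu0), Sw = within-class scatter matrix."""
    def mean(X):
        n = len(X)
        return [sum(x[0] for x in X) / n, sum(x[1] for x in X) / n]
    def scatter(X, m):
        s = [[0.0, 0.0], [0.0, 0.0]]
        for x in X:
            d = [x[0] - m[0], x[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        return s
    m0, m1 = mean(X0), mean(X1)
    S0, S1 = scatter(X0, m0), scatter(X1, m1)
    Sw = [[S0[i][j] + S1[i][j] for j in range(2)] for i in range(2)]
    # invert the 2x2 within-class scatter matrix directly
    det = Sw[0][0] * Sw[1][1] - Sw[0][1] * Sw[1][0]
    inv = [[Sw[1][1] / det, -Sw[0][1] / det],
           [-Sw[1][0] / det, Sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

# two toy clusters; projecting each point onto w separates the classes
X0 = [[1.0, 2.0], [1.5, 1.8], [0.8, 2.2]]
X1 = [[4.0, 4.5], [4.2, 4.8], [3.8, 4.2]]
w = lda_direction(X0, X1)
```

In R, `MASS::lda()` handles the multi-class case, priors, and prediction for you; the direction above is what it computes under the hood for two classes.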
Today, we go back a bit to where we probably should have started in the first place — though it wouldn't have been as much fun. In our previous work on volatility, we zipped through the steps of data import, tidying, and transformation. Let's correct that oversight and do some spadework on transforming daily asset prices into monthly portfolio log returns.
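The arithmetic behind that transformation is compact: keep the last price of each month, compute log returns ln(Pₜ / Pₜ₋₁) per asset, and note that a portfolio's log return is the log of one plus its weighted *simple* return, since log returns don't aggregate linearly across assets. A sketch with made-up prices and weights (pure Python, rather than the post's R/tidyverse workflow):

```python
import math

def monthly_log_returns(month_end_prices):
    """Log return between consecutive month-end prices: ln(P_t / P_{t-1})."""
    return [math.log(p1 / p0)
            for p0, p1 in zip(month_end_prices, month_end_prices[1:])]

def portfolio_log_return(asset_simple_returns, weights):
    """Weight the simple returns first, then take the log:
    log returns cannot be weighted and summed across assets directly."""
    r_p = sum(w * r for w, r in zip(weights, asset_simple_returns))
    return math.log(1.0 + r_p)

# hypothetical month-end prices for one asset
prices = [100.0, 105.0, 102.9]
print(monthly_log_returns(prices))  # first entry is ln(1.05)

# two assets, 60/40 weights, simple monthly returns of +5% and -2%
print(portfolio_log_return([0.05, -0.02], [0.6, 0.4]))  # ln(1.022)
```

In the R workflow this is typically `to.monthly()`/`Return.calculate()` territory; the point of the sketch is just the order of operations.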
Bayesian Nonparametrics is a class of models with a potentially infinite number of parameters. The high flexibility and expressive power of this approach enable better data modelling than parametric methods. Bayesian Nonparametrics is used in problems where a dimension of interest grows with the data: for example, where the number of features is not fixed but allowed to vary as we observe more data. Another example is clustering, where the number of clusters is inferred automatically from the data. The Statsbot team asked data scientist Vadim Smolyakov to introduce us to Bayesian Nonparametric models. In this article, he describes the Dirichlet process along with associated models and links to their implementations.
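One standard way to see the "infinite number of parameters" concretely is the stick-breaking construction of the Dirichlet process: break a unit-length stick with Beta(1, α) proportions, and the resulting pieces are the mixture weights π₁, π₂, … over infinitely many components. A truncated sketch (pure Python, standing in for the implementations the article links to):

```python
import random

def stick_breaking_weights(alpha, n_sticks, seed=0):
    """Truncated stick-breaking for Dirichlet process mixture weights:
    beta_k ~ Beta(1, alpha), pi_k = beta_k * prod_{j<k} (1 - beta_j)."""
    rng = random.Random(seed)
    remaining = 1.0  # length of stick still unbroken
    weights = []
    for _ in range(n_sticks):
        b = rng.betavariate(1.0, alpha)
        weights.append(remaining * b)
        remaining *= 1.0 - b
    return weights

# small alpha concentrates mass on a few components (few clusters);
# large alpha spreads it across many components (many clusters)
w = stick_breaking_weights(alpha=1.0, n_sticks=20)
print(sum(w))  # close to 1; the truncated tail holds the remainder
```

This is why the number of clusters can grow with the data: components with tiny weights are effectively unused until enough observations demand them.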