Announcing Microsoft Research Open Data – Datasets by Microsoft Research now available in the cloud

The Microsoft Research Outreach team has worked extensively with the external research community to enable adoption of cloud-based research infrastructure over the past few years. Through this process, we experienced the ubiquity of Jim Gray's fourth paradigm of discovery based on data-intensive science – that is, almost all research projects have a data component to them. This data deluge also demonstrated a clear need for curated and meaningful datasets in the research community, not only in computer science but also in interdisciplinary and domain sciences. Today we are excited to launch Microsoft Research Open Data – a new data repository in the cloud dedicated to facilitating collaboration across the global research community. Microsoft Research Open Data, in a single, convenient, cloud-hosted location, offers datasets representing many years of data curation and research efforts by Microsoft that were used in published research studies.

Introducing 3.0.0

Data visualization is a critical component of analysis in these domains, and even though there are literally dozens of libraries in Python's Visualization Landscape, I couldn't find one that supported the full set of features that I need (or at least that I want). These include support for a wide range of plot types covering statistical, 3D, and geographic use cases; efficient GPU acceleration to handle realistically large data sets; offline export of high-quality static images; two-way interactivity in the Jupyter Notebook; and stand-alone dashboarding support.

The Rise Of AI-Fueled Speech Analytics: Key Takeaways From Forrester’s New Wave

The recently published report, ‘The Forrester New Wave: AI-Fueled Speech Analytics Solutions, Q2 2018’, identified the 11 most significant players in this market: Aspect, CallMiner, Clarabridge, Cogito, Genesys, Invoca, Mattersight, NICE, OpenText, Tethr, and Verint. Based on our comprehensive criteria, Forrester identified CallMiner and NICE as Leaders in the space today.

Scalable Deep Reinforcement Learning for Robotic Manipulation

How can robots acquire skills that generalize effectively to diverse, real-world objects and situations? While designing robotic systems that effectively perform repetitive tasks in controlled environments, like building products on an assembly line, is fairly routine, designing robots that can observe their surroundings and decide the best course of action while reacting to unexpected outcomes is exceptionally difficult. However, there are two tools that can help robots acquire such skills from experience: deep learning, which is excellent at handling unstructured real-world scenarios, and reinforcement learning, which enables longer-term reasoning while exhibiting more complex and robust sequential decision making. Combining these two techniques has the potential to enable robots to learn continuously from their experience, allowing them to master basic sensorimotor skills using data rather than manual engineering.

Comparing predictions: World Cup scores

Like many others, some colleagues at STATWORX and I took part in a little betting game for the World Cup 2018. Since the group stage is over, I was wondering how well, or rather how badly, my predictions performed. I am comparing my result with other predictions by using the point system of the betting game. All functions, code and data can be found at our GitHub page.

Beeswarms instead of histograms

Histograms are good, and density plots are also good. Violin and bean plots too. Recently I had someone ask for a plot where you could see each individual point along a continuum and give the points specific colours based on a second variable (similar to the figure), which deviates somewhat from the typical density type plots. Apparently, they're called beeplots or beeswarms. And there's a way to make them in R (of course, there's probably more than one… ggplot2).
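To make the idea of "each individual point along a continuum" concrete, here is a minimal sketch in Python of how a beeswarm layout could be computed. It is not the algorithm used by any R package; it is a simple greedy placement, and the function name `beeswarm_offsets` and its `diameter` and `step` parameters are made up for illustration. Each point keeps its value on the main axis and is nudged sideways just far enough that it does not overlap a point placed before it.

```python
import math


def beeswarm_offsets(values, diameter=1.0, step=0.25):
    """Return one sideways offset per value so that no two points overlap.

    Hypothetical helper for illustration: points are treated as circles of
    the given diameter, centred at (value, offset).
    """
    placed = []  # (value, offset) pairs of points already positioned
    offsets = [0.0] * len(values)

    # Handle points in order of value so that neighbours are placed together.
    for idx in sorted(range(len(values)), key=lambda i: values[i]):
        v = values[idx]
        k = 0
        while True:
            # Try offset 0 first, then +s, -s, +2s, -2s, ... moving outwards.
            if k == 0:
                candidates = [0.0]
            else:
                candidates = [k * step * diameter, -k * step * diameter]
            free = next(
                (
                    c
                    for c in candidates
                    if all(
                        math.hypot(v - pv, c - po) >= diameter
                        for pv, po in placed
                    )
                ),
                None,
            )
            if free is not None:
                break
            k += 1
        placed.append((v, free))
        offsets[idx] = free
    return offsets


# Three identical values get spread out sideways; the isolated one stays put.
values = [1.0, 1.0, 1.0, 5.0]
print(beeswarm_offsets(values))
```

The offsets can then be used as the second plotting coordinate, with the colour of each point taken from whatever second variable is of interest.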