Dafny is a programming language with a program verifier. As you type in your program, the verifier constantly looks over your shoulder and flags any errors.


Sourcegraph is a fast, open-source, fully-featured code search and navigation engine.

Lots of Free Open Source Datasets to Make Your AI Better

There are several approaches to reducing the cost of training data for AI, one of which is to get it for free. Here are some excellent sources.

Social Network Analysis in Python

Networks today are part of our everyday life. Let’s learn how to visualize and understand a social network in Python using NetworkX.

DevOps 2.0: Applying Machine Learning in the CI/CD Chain

Explore how ML can be implemented in your organization, so you can (for example) enable the automated assessment of test results for far more complex criteria, such as defining thresholds based on statistical significance rather than just presence/absence of specific criteria.

Recent Advances for a Better Understanding of Deep Learning – Part I

This call for a better understanding of deep learning was the core of Ali Rahimi’s Test-of-Time Award presentation at NIPS in December 2017. In comparing deep learning with alchemy, Rahimi aimed not to dismiss the entire field, but ‘to open a conversation’. This goal has definitely been achieved, and people are still debating whether our current practice of deep learning should be considered alchemy, engineering or science.

More Effective Transfer Learning for NLP

This spring I presented a talk entitled ‘Effective Transfer Learning for NLP’ at ODSC East. The talk was intended to demonstrate how surprisingly effective pre-trained word and document embeddings are at low training data volumes, and to lay out a set of practical recommendations for applying these techniques to your own tasks. Thanks to some excellent research by Alec Radford and the team at OpenAI, our recommendations are beginning to change. To explain why the tides are shifting, let’s first walk through the rubric we use at Indico to evaluate whether or not a novel machine learning method is viable for industry use.

AI, Machine Learning and Data Science Announcements from Microsoft Ignite

• Azure Machine Learning Service (Azure ML) adds several new capabilities.
• Automated Machine Learning, a new Azure ML service in preview, automates the selection of machine learning algorithms (and associated parameters and hyperparameters) to optimize predictive performance.
• Visual Studio Code Tools for AI has been updated to provide a convenient interface to Azure ML for users of the popular open-source editor.
• SQL Server 2019 is now in public preview, with many improvements.
• Azure Databricks is now supported in more regions, offers GPU support for deep learning, and Databricks Delta is now available (in preview) for transactional data capabilities.
• Microsoft Bot Framework SDK v4 is now available.
• Cortana Skills Kit for Enterprise, a development platform based on Azure Bot Service and Language Understanding, is now in private preview.
• Speech Service in Azure Cognitive Services is now generally available, and includes a new neural text-to-speech capability for humanlike synthesized speech.
• AI for Humanitarian Action was announced.

Getting Started with Markov Decision Processes: Reinforcement Learning

In this blog post I will explain the concepts required to understand how to solve problems with Reinforcement Learning. This series of blog posts contains a summary of the concepts explained in Introduction to Reinforcement Learning by David Silver.
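
To make the idea of a Markov Decision Process a little more concrete, here is a minimal sketch of value iteration on a tiny, hypothetical two-state MDP. The states, actions, rewards, and discount factor are all invented for illustration; none of them come from the post or from David Silver's course.

```python
# Hypothetical two-state MDP, invented purely for illustration.
# transitions[state][action] is a list of (probability, next_state, reward).
transitions = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "charge": [(1.0, "high", -1.0)],
    },
    "high": {
        "wait": [(1.0, "high", 1.0)],
        "work": [(0.5, "high", 2.0), (0.5, "low", 2.0)],
    },
}


def action_value(outcomes, values, gamma):
    """Expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Apply the Bellman optimality backup until the values stop changing."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            best = max(action_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(transitions, values, gamma=0.9):
    """Pick, in each state, the action with the highest expected value."""
    return {
        s: max(actions, key=lambda a: action_value(actions[a], values, gamma))
        for s, actions in transitions.items()
    }


values = value_iteration(transitions)
policy = greedy_policy(transitions, values)
print(values, policy)
```

In this toy setup the greedy policy ends up charging in the "low" state and working in the "high" state, because the discounted long-run reward outweighs the immediate cost of charging.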

Neural Network Embeddings Explained

Applications of neural networks have expanded significantly in recent years from image segmentation to natural language processing to time-series forecasting. One notably successful use of deep learning is embedding, a method used to represent discrete variables as continuous vectors. This technique has found practical applications with word embeddings for machine translation and entity embeddings for categorical variables. In this article, I’ll explain what neural network embeddings are, why we want to use them, and how they are learned. We’ll go through these concepts in the context of a real problem I’m working on: representing all the books on Wikipedia as vectors to create a book recommendation system.
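
As a rough sketch of what "representing discrete variables as continuous vectors" means in practice, the snippet below treats an embedding as a lookup table from a category to a vector and uses cosine similarity to find the nearest neighbor. The book titles and the vector values are hand-picked assumptions for illustration; in a real model the vectors would be learned during training, which this sketch does not do.

```python
import math

# Hypothetical vocabulary: each book title is one value of the discrete variable.
books = ["War and Peace", "Anna Karenina", "A Brief History of Time"]
index = {title: i for i, title in enumerate(books)}

# Embedding table with one row per book. These numbers are made up;
# a neural network would learn them from data.
embeddings = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
]


def embed(title):
    """Look up the continuous vector for a discrete item."""
    return embeddings[index[title]]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(title):
    """Return the other book whose embedding is closest to the query."""
    query = embed(title)
    others = [t for t in books if t != title]
    return max(others, key=lambda t: cosine(query, embed(t)))


nearest = most_similar("War and Peace")
print(nearest)
```

A recommendation system along these lines would simply return the nearest neighbors of a book in the embedding space.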

Digging into Data Science Tools: Docker

Docker is a tool for creating and managing ‘containers’, which are like little virtual machines where you can run your code. A Docker container is like a little Linux OS, preinstalled with everything you need to run your web app, machine learning model, script, or any other code you write. Docker containers are like a really lightweight version of virtual machines. They use far fewer computing resources than a virtual machine, and can spin up in seconds rather than minutes. (The reason for this performance improvement is that Docker containers share the kernel of the host machine, whereas virtual machines run a separate OS with a separate kernel for every virtual machine.)

What is K?

The K-means clustering algorithm is probably the best-known and most used clustering algorithm around. When you pick up a statistics or machine-learning book, odds are that this is the first algorithm in the clustering chapter, as in PRML’s chapter 9 [1]. The same is true for entry-level online courses, where K-means is usually taught first. This is a very visual algorithm that allows you to build very strong intuitions about what is going on, and this might be one of the reasons why it is a favorite, both for teaching and using. Central to the K-means algorithm is the selection of the K hyper-parameter, the number of clusters expected to be found in the data set. Contrary to other clustering algorithms that can infer the optimal number of clusters (like DBSCAN), the K-means algorithm expects to be told how many clusters to look for. And this feels a bit like a conundrum because we want this unsupervised machine learning algorithm to find something for us, namely some hidden structure of a data set, but we must provide it with a capital piece of information about said structure: how many clusters we expect to find. Fortunately, there is a workable solution for this apparent chicken-or-egg problem: run the clustering algorithm using a range of values for K and, somehow, evaluate the results to find the best K. The way we are taught to do this is by using a plot of the total cluster distortion versus K, where we should look for an elbow.
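
The procedure described above (run K-means for a range of K, record the total distortion, look for the elbow) can be sketched in plain Python. Everything here is an assumption made for illustration: the toy data set, the deterministic farthest-point initialization, and the "largest second difference" rule used to pick the elbow automatically instead of reading it off a plot.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centers(points, k):
    """Deterministic farthest-point initialization (an illustrative choice)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns centers, labels, and total distortion."""
    centers = init_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sqdist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    distortion = sum(sqdist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, distortion


def elbow(ks, distortions):
    """Pick the K where the drop in distortion slows down the most."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(ks) - 1):
        bend = (distortions[i - 1] - distortions[i]) - (distortions[i] - distortions[i + 1])
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k


# Toy data: three tight groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (0, 10), (0, 11), (1, 10), (1, 11),
]

ks = list(range(1, 6))
distortions = [kmeans(points, k)[2] for k in ks]
best_k = elbow(ks, distortions)
print(list(zip(ks, distortions)), best_k)
```

On this toy data the distortion drops sharply up to K = 3 and then flattens, so the heuristic picks 3. On real data the elbow is often much less clear-cut, which is the difficulty the paragraph above hints at.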

Building a Simple Chat Bot using Deep Learning

While there are many ways to build a chat bot, advances in natural language processing have made deep learning approaches very popular. There are several deep learning approaches to choose from, with new methods constantly being developed. In this post, I will cover one method on a very high level. This method tackles the problem using a Q & A framework. The incoming message is treated as a ‘question’ which the chat bot finds the ‘answer’ to by looking through previously sent responses. Using this approach, my partner and I were able to train a deep learning chat bot over his Facebook Messenger data, which somewhat emulated his personality and idiosyncrasies.
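
The Q & A framing can be illustrated without any deep learning at all. The sketch below swaps the neural model for a plain bag-of-words cosine similarity: it finds the past message most similar to the incoming one and returns the response that was sent at the time. This is not the author's method, and the chat history is a hypothetical stand-in for real Messenger data.

```python
import math
from collections import Counter

# Hypothetical chat history: (message received, response that was sent).
history = [
    ("what time are we meeting", "around seven works for me"),
    ("did you watch the game last night", "yes, what a finish"),
    ("want to grab lunch tomorrow", "sure, the usual place?"),
]


def bag_of_words(text):
    """Lowercase, whitespace-tokenized word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def reply(message):
    """Return the stored response whose original message best matches the input."""
    query = bag_of_words(message)
    _, best_response = max(history, key=lambda pair: cosine(query, bag_of_words(pair[0])))
    return best_response


answer = reply("are we meeting at a different time")
print(answer)
```

A deep learning version would replace the word-count similarity with a learned scoring model, but the retrieval structure (question in, best stored answer out) stays the same.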