Attention Is All You Need

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.


Speech2Face: Learning the Face Behind a Voice

How much can we infer about a person’s looks from the way they speak? In this paper, we study the task of reconstructing a facial image of a person from a short audio recording of that person speaking. We design and train a deep neural network to perform this task using millions of natural videos of people speaking from Internet/Youtube. During training, our model learns audiovisual, voice-face correlations that allow it to produce images that capture various physical attributes of the speakers such as age, gender and ethnicity. This is done in a self-supervised manner, by utilizing the natural co-occurrence of faces and speech in Internet videos, without the need to model attributes explicitly. Our reconstructions, obtained directly from audio, reveal the correlations between faces and voices. We evaluate and numerically quantify how–and in what manner–our Speech2Face reconstructions from audio resemble the true face images of the speakers.


Nines are Not Enough: Meaningful Metrics for Clouds

Cloud customers want reliable, understandable promises from cloud providers that their applications will run reliably and with adequate performance, but today, providers offer only limited guarantees, which creates uncertainty for customers. Providers also must define internal metrics to allow them to operate their systems without violating customer promises or expectations. We explore why these guarantees are hard to define. We show that this problem shares some similarities with the challenges of applying statistics to make decisions based on sampled data. We also suggest that defining guarantees in terms of defense against threats, rather than guarantees for application-visible outcomes, can reduce the complexity of these problems. Overall, we offer a partial framework for thinking about Service Level Objectives (SLOs), and discuss some unsolved challenges.


Smart city big data analytics: An advanced review

With the increasing role of ICT in enabling and supporting smart cities, the demand for big data analytics solutions is increasing. Various artificial intelligence, data mining, machine learning and statistical analysis-based solutions have been successfully applied in thematic domains like climate science, energy management, transport, air quality management and weather pattern analysis. In this paper, we present a systematic review of the literature on smart city big data analytics. We have searched a number of different repositories using specific keywords and followed a structured data mining methodology for selecting material for the review. We have also performed a technological and thematic analysis of the shortlisted literature, identified various data mining/machine learning techniques and presented the results. Based on this analysis we also present a classification model that studies four aspects of research in this domain. These include data models, computing models, security and privacy aspects and major market drivers in the smart cities domain. Moreover, we present a gap analysis and identify future directions for research. For the thematic analysis we identified the themes smart city governance, economy, environment, transport and energy. We present the major challenges in these themes, the major research work done in the field of data analytics to address these challenges and future research directions.


Benchmarking in classification and regression

The article presents an overview of the status quo in benchmarking in classification and nonlinear regression. It outlines guidelines for a comparative analysis in machine learning, benchmarking principles, accuracy estimation, and model validation. It provides references to established repositories and competitions and discusses the objectives and limitations of benchmarking. Benchmarking is key to progress in machine learning as it allows an unprejudiced comparison among alternative methods. This article presents guidelines and best practices for benchmarking in classification and regression. It reviews state-of-the-art approaches in machine learning, establishes benchmarking principles and discusses performance metrics for a sound statistical comparative analysis.


Efficient Bayesian Inference of Sigmoidal Gaussian Cox Processes

We present an approximate Bayesian inference approach for estimating the intensity of a inhomogeneous Poisson process, where the intensity function is modelled using a Gaussian process (GP) prior via a sigmoid link function. Augmenting the model using a latent marked Poisson process and P’olya-Gamma random variables we obtain a representation of the likelihood which is conjugate to the GP prior. We estimate the posterior using a variational free-form mean field optimisation together with the framework of sparse GPs. Furthermore, as alternative approximation we suggest a sparse Laplace’s method for the posterior, for which an efficient expectation-maximisation algorithm is derived to find the posterior’s mode. Both algorithms compare well against exact inference obtained by a Markov Chain Monte Carlo sampler and standard variational Gauss approach solving the same model, while being one order of magnitude faster. Furthermore, the performance and speed of our method is competitive with that of another recently proposed Poisson process model based on a quadratic link function, while not being limited to GPs with squared exponential kernels and rectangular domains.


One Simple Trick for Speeding up your Python Code with Numpy

Python is huge. Over the past several years the popularity of Python has grown rapidly. A big part of that has been the rise of Data Science, Machine Learning, and AI, all of which have high-level Python libraries to work with! When using Python for those types of work, it’s often necessary to work with very large datasets. Those large datasets get read directly into memor, and are stored and processed as Python arrays, lists, or dictionaries. Working with such huge arrays can be time consuming; really that’s just the nature of the problem. You have thousands, millions, or even billions of data points. Every microsecond added to the processing of a single one of those points can drastically slow you down as a result of the large scale of the data you’re working with.


Spark NLP: Getting Started With The World’s Most Widely Used NLP Library In The Enterprise

The Spark NLP library has become a popular AI framework that delivers speed and scalability to your projects. Check out what’s under the hood and learn about how to getting started leveraging Spark NLP from John Snow Labs.


AI Adoption in the Enterprise

While O’Reilly has identified several trends among enterprise companies for adopting artificial intelligence, we decided to drill down further to learn just how businesses worldwide are planning and prioritizing this work. In a recent survey, we asked respondents about revenue-bearing AI projects their organizations have in production. How might their AI adoption patterns change over the course of the next year? In this report, data science experts Ben Lorica and Paco Nathan examine the survey’s results to reveal how (and how many) respondents are ramping up AI projects. You’ll also learn how the scope of AI use among companies is quickly expanding into deep learning, human in the loop, knowledge graphs, and reinforcement learning.


K Means Clustering With Dask (Image Filters for Cat Pictures)

Applying filters to images is not a new concept to anyone. We take a picture, make a few changes to it, and now it looks cooler. But where does Artificial Intelligence come in? Let’s try out a fun use for Unsupervised Machine Learning with K Means Clustering in Python. I’ve written before about K Means Clustering, so I will assume you’re familiar with the algorithm this time. If you’re not, this is the in-depth K-Means Clustering introduction I wrote. And I also tried my hand at image compression (well, reconstruction) with autoencoders, to varying degrees of success. However this time, my goal is not to reconstruct the best possible image, but just to see the effects of recreating a picture with the least possible colors. Instead of making the picture look as similar to the original as possible, I just want us to look at it and say ‘neat!’. So how do we do this? I’m glad you asked.


Cohen’s D for Experimental Planning

In this note, we discuss the use of Cohen’s D for planning difference-of-mean experiments.


New Versions of R GUIs: BlueSky, JASP, jamovi

It has been only two months since I summarized my reviews of point-and-click front ends for R, and it’s already out of date! I have converted that post into a regularly-updated article and added a plot of total features, which I repeat below. It shows the total number of features in each package, including the latest versions of BlueSky Statistics, JASP, and jamovi. The reviews which initially appeared as blog posts are now regularly-updated pages.