What’s going on on PyPI

Scanning all newly published packages on PyPI, I know that the quality is often quite bad. I try to filter out the worst ones and list here those that might be worth a look, worth following, or that might inspire you in some way.

Bayesian online changepoint detection

A Jupyter widgets library for time and datetime pickers

display and evaluate math on jupyter notebook

A prototype quantum programming language. Qurry is a prototype of a quantum probabilistic programming language, developed with the ( ). The official project duration is one year, but the language may be usable before then (and in fact, it can already be used to access the full Quil spec with some useful abstractions on top, such as if statements, variable names, and so on).

Random Neural Network Simulator implemented in Python.

A feature extraction algorithm

Command line end to end processing

Simple library to export images from Jupyter notebook

Jupyter Notebook extension that supports code auto-completion based on deep learning. This extension for Jupyter Notebook enables code auto-completion based on deep learning. Other TabNine client plugins start a child process for the TabNine binary and communicate with it over a pipe. This can’t be done with Jupyter Notebook, since a child process can’t be created with jQuery and Jupyter Notebook doesn’t provide any way to add third-party JS libraries to plugins.

A hierarchical community detection algorithm by Girvan and Newman. A Girvan-Newman step is defined as a sequence of successive edge removals such that a new community emerges.

A Python package for data analysis.

A feature extraction algorithm

PyTorch implementation of the learning rate range test. A PyTorch implementation of the learning rate range test detailed in [Cyclical Learning Rates for Training Neural Networks](https://…/1506.01186 ) by Leslie N. Smith and the tweaked version used by [fastai](https://…/fastai ). The learning rate range test provides valuable information about the optimal learning rate. During a pre-training run, the learning rate is increased linearly or exponentially between two boundaries. The low initial learning rate allows the network to start converging, and as the learning rate is increased it will eventually become too large and the network will diverge.
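The mechanics can be sketched in a few lines on a toy problem (illustrative only, not the package’s API): sweep the learning rate exponentially between two boundaries during a short pre-training run and record the loss at each step.

```python
def lr_range_test(lr_min, lr_max, num_steps, loss_fn, grad_fn, w0):
    """Pre-training sweep: raise the learning rate exponentially from
    lr_min to lr_max, recording (learning rate, loss) at each step."""
    w, history = w0, []
    for step in range(num_steps):
        # exponential interpolation between the two boundaries
        lr = lr_min * (lr_max / lr_min) ** (step / (num_steps - 1))
        history.append((lr, loss_fn(w)))
        w = w - lr * grad_fn(w)            # plain SGD update
    return history

# toy 1-D problem: loss = (w - 3)^2, so the gradient is 2 * (w - 3)
loss_fn = lambda w: (w - 3.0) ** 2
grad_fn = lambda w: 2.0 * (w - 3.0)

hist = lr_range_test(1e-4, 10.0, 100, loss_fn, grad_fn, w0=0.0)
# loss first shrinks (the network "starts converging"), then blows up
# once the rate becomes too large -- that turning point is the signal
```

Plotting loss against learning rate from `hist` shows the characteristic dip-then-blow-up curve; the usual heuristic is to pick a rate somewhat below the point where the loss is still decreasing fastest.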

A Wasserstein Subsequence Kernel for Time Series

Darknet Neural Network Configuration Generator. If you have used darknet for one of your projects, you also understand the pain of editing the config file when you want to modify your network, optimization, and image augmentation parameters only to realize you forgot to edit another parameter after commencing training (bummer). You will also understand the pain of editing the configuration file to run inference. I implemented this to allow me to describe my neural network in a keras-like fashion and have a darknet config file generated.
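The core idea can be sketched as follows (a hypothetical mini-version, not the package’s actual API): layers are described as Python calls, Keras-style, and rendered into darknet’s INI-like `.cfg` sections.

```python
def convolutional(filters, size=3, stride=1, activation="leaky"):
    """One darknet [convolutional] section, declared Keras-layer style."""
    return ("convolutional",
            {"filters": filters, "size": size, "stride": stride,
             "activation": activation})

def maxpool(size=2, stride=2):
    return ("maxpool", {"size": size, "stride": stride})

def to_darknet_cfg(net_options, layers):
    """Render a [net] header plus the layer sections in darknet's
    INI-like .cfg format (one key=value per line)."""
    lines = []
    for name, options in [("net", net_options)] + layers:
        lines.append(f"[{name}]")
        lines.extend(f"{k}={v}" for k, v in options.items())
        lines.append("")                   # blank line between sections
    return "\n".join(lines)

cfg = to_darknet_cfg(
    {"batch": 64, "width": 416, "height": 416, "channels": 3},
    [convolutional(32), maxpool(), convolutional(64)],
)
```

Because the network, optimizer, and augmentation settings now live in one Python description, changing a parameter in one place regenerates a consistent config for both training and inference.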


Book Memo: “Discrepancy Theory”

The contributions in this book focus on a variety of topics related to discrepancy theory, comprising Fourier techniques to analyze discrepancy, low discrepancy point sets for quasi-Monte Carlo integration, probabilistic discrepancy bounds, dispersion of point sets, pair correlation of sequences, integer points in convex bodies, discrepancy with respect to geometric shapes other than rectangular boxes, and also open problems in discrepancy theory.

Distilled News

XLNet, Ernie 2.0, and Roberta: What you Need to Know About new 2019 Transformer Models

Large pretrained language models are definitely the main trend of the latest research advances in natural language processing (NLP). While lots of AI experts agree with Anna Rogers’s statement that getting state-of-the-art results with just more data and computing power is not research news, other NLP opinion leaders also see some positive moments in the current trend. For example, Sebastian Ruder, a research scientist at DeepMind, points out that these big language frameworks help us see the fundamental limitations of the current paradigm. With transformers occupying the NLP leaderboards, it’s often hard to follow what the amendments are that enabled a new big language model to set another state-of-the-art result. To help you stay up to date with the latest NLP breakthroughs, we’ve summarized research papers featuring the current leaders of the GLUE benchmark: XLNet from Carnegie Mellon University, ERNIE 2.0 from Baidu, and RoBERTa from Facebook AI.

MLIR: accelerating AI with open-source infrastructure

Machine learning now runs on everything from cloud infrastructure containing GPUs and TPUs, to mobile phones, to even the smallest hardware like microcontrollers that power smart devices. The combination of advancements in hardware and open-source software frameworks like TensorFlow is making all of the incredible AI applications we’re seeing today possible–whether it’s predicting extreme weather, helping people with speech impairments communicate better, or assisting farmers to detect plant diseases. But with all this progress happening so quickly, the industry is struggling to keep up with making different machine learning software frameworks work with a diverse and growing set of hardware. The machine learning ecosystem is dependent on many different technologies with varying levels of complexity that often don’t work well together. The burden of managing this complexity falls on researchers, enterprises and developers. By slowing the pace at which new machine learning-driven products can go from research to reality, this complexity ultimately affects our ability to solve challenging, real-world problems. Earlier this year we announced MLIR, open source machine learning compiler infrastructure that addresses the complexity caused by growing software and hardware fragmentation and makes it easier to build AI applications. It offers new infrastructure and a design philosophy that enables machine learning models to be consistently represented and executed on any type of hardware. And today we’re announcing that we’re contributing MLIR to the nonprofit LLVM Foundation. This will enable even faster adoption of MLIR by the industry as a whole.


A simple interface to extract texts from (almost) any url.

D3 Deconstructor

The D3 Deconstructor is a Google Chrome extension for extracting data from D3.js visualizations. D3 binds data to DOM elements when building a visualization. Our D3 Deconstructor extracts this data and the visual mark attributes (such as position, width, height, and color) for each element in a D3 visualization. In the example below, we apply the D3 Deconstructor on the visualization (left) by right clicking on it and selecting the extension from the context menu. The D3 Deconstructor then extracts the data table (right).

Create Chatbot using Rasa Part-1

Rasa is an open source machine learning framework for building AI assistants and chatbots. For the most part, you don’t need any programming experience to work in Rasa. There is, however, something called the ‘Rasa Action Server’ where you need to write code in Python; it is mainly used to trigger external actions such as calling a Google API or another REST API.

Dynamic UI Elements in Shiny

At STATWORX, we regularly deploy our project results with the help of Shiny. It’s not only an easy way of letting potential users interact with your R code, but it’s also fun to design a good-looking app. One of Shiny’s biggest strengths is its inherent reactivity; after all, being reactive to user input is a web application’s prime purpose. Unfortunately, many apps seem to only make use of Shiny’s responsiveness on the server side while keeping the UI completely static. This isn’t necessarily bad. Some apps wouldn’t profit from having dynamic UI elements, and adding them regardless could make the app feel gimmicky. But in many cases, adding reactivity to the UI results not only in less clutter on the screen but also in cleaner code. And we all like that, don’t we?

Monte Carlo Learning

In this article I will cover the Monte Carlo method of reinforcement learning. I briefly covered Dynamic Programming methods (Value Iteration and Policy Iteration) in an earlier article. In Dynamic Programming we need a model (the agent knows the MDP transitions and rewards) and the agent does planning (once a model is available, the agent needs to plan its actions in each state). There is no real learning by the agent in the Dynamic Programming method.
The Monte Carlo method, on the other hand, is a very simple concept: the agent learns about states and rewards as it interacts with the environment. In this method the agent generates experienced samples, and the value of a state or state-action pair is then calculated as the average return. Below are key characteristics of the Monte Carlo (MC) method:
• There is no model (the agent does not know the MDP state transitions)
• The agent learns from sampled experience
• The state value vπ(s) under policy π is learned by averaging the return over all sampled episodes (value = average return)
• Values are updated only after a complete episode (because of this, convergence is slow and updates happen only once an episode is complete)
• There is no bootstrapping
• It can only be used in episodic problems
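These characteristics translate almost directly into code. A minimal first-visit Monte Carlo prediction sketch (the environment and function names are illustrative):

```python
import random
from collections import defaultdict

def mc_value_estimation(run_episode, num_episodes, gamma=1.0):
    """First-visit Monte Carlo prediction: estimate the state value as the
    average return over complete sampled episodes (no model, no bootstrapping)."""
    returns = defaultdict(list)
    for _ in range(num_episodes):
        episode = run_episode()                # [(state, reward), ...] until terminal
        g, first_visit_return = 0.0, {}
        for state, reward in reversed(episode):
            g = gamma * g + reward
            first_visit_return[state] = g      # overwrites, so the FIRST visit wins
        for state, g in first_visit_return.items():
            returns[state].append(g)
    # value = average return, updated only from complete episodes
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}

def run_episode():
    # toy episodic environment: from "start" the episode ends immediately
    # with reward +1 half of the time and 0 otherwise, so v("start") = 0.5
    return [("start", 1.0 if random.random() < 0.5 else 0.0)]

random.seed(0)
v = mc_value_estimation(run_episode, num_episodes=5000)
```

Note that values only change once `run_episode` has returned a finished episode, which is exactly why MC converges slowly and is restricted to episodic problems.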

Temporal-Difference Learning

In this article I will cover Temporal-Difference learning methods. The Temporal-Difference (TD) method is a blend of the Monte Carlo (MC) method and the Dynamic Programming (DP) method. Below are key characteristics of the TD method:
• There is no model (the agent does not know the MDP state transitions)
• The agent learns from sampled experience (similar to MC)
• Like DP, TD methods update estimates based in part on other learned estimates, without waiting for a final outcome (they bootstrap like DP)
• TD can learn from incomplete episodes, so the method can be used in continuing problems as well
• TD updates a guess towards a guess and revises the guess based on real experience
To understand this better, consider a real-life analogy: Monte Carlo learning is like an annual examination, where the student completes the episode at the end of the year. TD learning, by contrast, can be thought of as a weekly or monthly examination: the student can adjust their performance based on the score (reward) received after every small interval, and the final score is the accumulation of all the weekly tests (total rewards).
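A tabular TD(0) sketch makes the contrast with MC concrete (names illustrative): the value estimate is revised after every single step towards the bootstrapped target r + γV(s'), instead of waiting for the episode’s final return.

```python
from collections import defaultdict

def td0_value_estimation(env_step, start_state, num_episodes,
                         alpha=0.1, gamma=1.0):
    """Tabular TD(0) prediction: after every step, nudge V(s) towards the
    bootstrapped target r + gamma * V(s') -- a guess updated towards a guess."""
    V = defaultdict(float)
    for _ in range(num_episodes):
        s = start_state
        while s is not None:                   # None marks the terminal state
            s_next, r = env_step(s)            # sample one transition
            target = r + (0.0 if s_next is None else gamma * V[s_next])
            V[s] += alpha * (target - V[s])    # update mid-episode, no waiting
            s = s_next
    return V

def env_step(s):
    # toy deterministic chain A -> B -> terminal, reward +1 on the last
    # step, so the true values are V(A) = V(B) = 1
    return ("B", 0.0) if s == "A" else (None, 1.0)

V = td0_value_estimation(env_step, "A", num_episodes=500)
```

Because the update inside the loop never needs the episode’s end, the same code works unchanged for continuing (non-episodic) tasks, which MC cannot handle.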

What to expect from a causal inference business project: an executive’s guide III

This is the third part of the post ‘What to expect from a causal inference business project: an executive’s guide’. You will find the second one here. Most of these words have fuzzy meanings, at least at a popular level. Let me first define what some of them will mean in this post.

What to expect from a causal inference business project: an executive’s guide II

Causal inference models how variables affect each other. Based on this information, it uses calculation tools to answer questions like: what would have happened if, instead of doing this, I had done that? Can I estimate the effect of one variable on another? Causal inference provides a broad-brush approach to get preliminary estimates of causal effects. If you want more definitive conclusions, you should go, whenever possible, for more precise and clear measurements with A/B tests. These do not suffer from confounding, and you don’t need any modeling beyond statistical calculations.

What to expect from a causal inference business project: an executive’s guide I

This is the fifth post in a series about causal inference and data science. The previous one was ‘Solving Simpson’s Paradox’. You will find the second part of this post here. Causal inference is a new language for modeling causality that helps us better understand causes and impacts so that we can make better decisions. Here we will explain how it can help a company or organization gain insights from their data. This post is written for those in a data-driven company, not necessarily technical staff, who want to understand the key points of a causal inference project.

A DevOps Process for Deploying R to Production

I’ve been at the EARL Conference in London this week, and as always it’s been inspiring to see so many examples of R being used in production at companies like Sainsbury’s, BMW, Austria Post, PartnerRe, Royal Free Hospital, the BBC, the Financial Times, and many others. My own talk, A DevOps Process for Deploying R to Production, presented a process for automating the building and deployment of R-based applications using Azure Pipelines and Azure Machine Learning Service.

Survival analysis with strata, clusters, frailties and competing risks in Finalfit

In healthcare, we deal with a lot of binary outcomes. Death yes/no, disease recurrence yes/no, for instance. These outcomes are often easily analysed using binary logistic regression via finalfit(). When the time taken for the outcome to occur is important, we need a different approach. For instance, in patients with cancer, the time taken until recurrence of the cancer is often just as important as the fact it has recurred. Finalfit wraps a number of functions to make these analyses easy to perform and output into PDFs and Word documents.

Hierarchical Neural Architecture Search

Many researchers and developers are interested in what Neural Architecture Search can offer their Deep Learning models, but are deterred by monstrous computational costs. Many techniques have been developed to promote more efficient search, notably Differentiable Architecture Search, parameter sharing, predictive termination, and hierarchical representations of architectures. This article will explain the idea of hierarchical representations because it is by far the easiest way to achieve the desired balance of efficiency and a sufficiently expressive search space. This representation of neural networks is so powerful that you can achieve competitive results with random search, eliminating the need to implement Bayesian, evolutionary, reinforcement learning, or differentiable search algorithms.

Automate Data Cleaning with Unsupervised Learning

In this post, I propose my solution to improve the quality of the textual data at my disposal. I develop a workflow which aims to clean data AUTOMATICALLY and in an UNSUPERVISED way. I say ‘automatically’ because it is useless to follow an unsupervised approach if we have to manually check the data all the time to understand what the model outputs. We need certainty and don’t want to waste our time.

Importance of Loss Function in Machine Learning

Assume you are given the task of filling a bag with 10 kg of sand. You fill it up until the measuring machine gives you a perfect reading of 10 kg, or you take out sand if the reading exceeds 10 kg. Just like that weighing machine, if your predictions are off, your loss function will output a higher number. If they’re pretty good, it’ll output a lower number. As you experiment with your algorithm to try and improve your model, your loss function will tell you whether you’re getting anywhere. ‘The function we want to minimize or maximize is called the objective function or criterion. When we are minimizing it, we may also call it the cost function, loss function, or error function’ – Source. At its core, a loss function is a measure of how good your prediction model is at predicting the expected outcome (or value). We convert the learning problem into an optimization problem: define a loss function and then optimize the algorithm to minimize it.
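That last sentence is the whole recipe. A minimal sketch for a 1-D linear model with a mean-squared-error loss (names illustrative): define the loss, then follow its gradient downhill.

```python
def mse_loss(w, b, data):
    """Mean squared error of the linear model y_hat = w*x + b:
    the 'weighing machine' that scores how far off the predictions are."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def fit(data, lr=0.05, steps=500):
    """Learning as optimization: gradient descent on the loss."""
    w = b = 0.0
    n = len(data)
    for _ in range(steps):
        # analytic gradients of the MSE with respect to w and b
        dw = sum(2 * (w * x + b - y) * x for x, y in data) / n
        db = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * dw, b - lr * db
    return w, b

data = [(x, 2.0 * x + 1.0) for x in range(5)]   # ground truth: y = 2x + 1
w, b = fit(data)
```

As training drives the loss towards zero, `(w, b)` recover the underlying relationship; swapping in a different loss function changes what “good” means without touching the rest of the loop.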

Document worth reading: “Progressive Data Science: Potential and Challenges”

Data science requires time-consuming iterative manual activities. In particular, activities such as data selection, preprocessing, transformation, and mining, highly depend on iterative trial-and-error processes that could be sped up significantly by providing quick feedback on the impact of changes. The idea of progressive data science is to compute the results of changes in a progressive manner, returning a first approximation of results quickly and allow iterative refinements until converging to a final result. Enabling the user to interact with the intermediate results allows an early detection of erroneous or suboptimal choices, the guided definition of modifications to the pipeline and their quick assessment. In this paper, we discuss the progressiveness challenges arising in different steps of the data science pipeline. We describe how changes in each step of the pipeline impact the subsequent steps and outline why progressive data science will help to make the process more effective. Computing progressive approximations of outcomes resulting from changes creates numerous research challenges, especially if the changes are made in the early steps of the pipeline. We discuss these challenges and outline first steps towards progressiveness, which, we argue, will ultimately help to significantly speed-up the overall data science process. Progressive Data Science: Potential and Challenges

If you did not already know

Generalized Sparse Additive Model google
We present a unified framework for estimation and analysis of generalized additive models in high dimensions. The framework defines a large class of penalized regression estimators, encompassing many existing methods. An efficient computational algorithm for this class is presented that easily scales to thousands of observations and features. We prove minimax optimal convergence bounds for this class under a weak compatibility condition. In addition, we characterize the rate of convergence when this compatibility condition is not met. Finally, we also show that the optimal penalty parameters for structure and sparsity penalties in our framework are linked, allowing cross-validation to be conducted over only a single tuning parameter. We complement our theoretical results with empirical studies comparing some existing methods within this framework. …

Spectral Inference Network google
We present Spectral Inference Networks, a framework for learning eigenfunctions of linear operators by stochastic optimization. Spectral Inference Networks generalize Slow Feature Analysis to generic symmetric operators, and are closely related to Variational Monte Carlo methods from computational physics. As such, they can be a powerful tool for unsupervised representation learning from video or pairs of data. We derive a training algorithm for Spectral Inference Networks that addresses the bias in the gradients due to finite batch size and allows for online learning of multiple eigenfunctions. We show results of training Spectral Inference Networks on problems in quantum mechanics and feature learning for videos on synthetic datasets as well as the Arcade Learning Environment. Our results demonstrate that Spectral Inference Networks accurately recover eigenfunctions of linear operators, can discover interpretable representations from video and find meaningful subgoals in reinforcement learning environments. …

Visual Knowledge Memory Network (VKMN) google
Visual question answering (VQA) requires joint comprehension of images and natural language questions, where many questions can’t be directly or clearly answered from visual content but require reasoning from structured human knowledge with confirmation from visual content. This paper proposes a visual knowledge memory network (VKMN) to address this issue, which seamlessly incorporates structured human knowledge and deep visual features into memory networks in an end-to-end learning framework. Compared to existing methods for leveraging external knowledge to support VQA, this paper stresses two missing mechanisms. First is the mechanism for integrating visual contents with knowledge facts. VKMN handles this issue by embedding knowledge triples (subject, relation, target) and deep visual features jointly into the visual knowledge features. Second is the mechanism for handling multiple knowledge facts expanding from question and answer pairs. VKMN stores joint embeddings using a key-value pair structure in the memory networks so that it is easy to handle multiple facts. Experiments show that the proposed method achieves promising results on both VQA v1.0 and v2.0 benchmarks, while outperforming state-of-the-art methods on the knowledge-reasoning related questions. …

Wikipedia WordNet Based QE Technique (WWQE) google
Query expansion (QE) is a well known technique to enhance the effectiveness of information retrieval (IR). QE reformulates the initial query by adding similar terms that help in retrieving more relevant results. Several approaches have been proposed with remarkable outcomes, but they are not equally favorable for all types of queries. One of the main reasons for this is the use of the same data source while expanding both the individual and the phrase query terms. As a result, the holistic relationship among the query terms is not well captured. To address this issue, we have selected separate data sources for individual and phrase terms. Specifically, we have used WordNet for expanding individual terms and Wikipedia for expanding phrase terms. We have also proposed novel schemes for weighting expanded terms: an inlink score (for terms extracted from Wikipedia) and a tf-idf based scheme (for terms extracted from WordNet). In the proposed Wikipedia WordNet based QE technique (WWQE), we weigh the expansion terms twice: first, they are scored by the weighting scheme individually, and then, the weighting scheme scores the selected expansion terms in relation to the entire query using a correlation score. The experimental results show that the proposed approach successfully combines Wikipedia and WordNet, as demonstrated through better performance on standard evaluation metrics on the FIRE dataset. The proposed WWQE approach is also suitable with other standard weighting models for improving the effectiveness of IR. …

Book Memo: “Building Intelligent Cloud Applications”

Develop Scalable Models Using Serverless Architectures with Azure
Serverless computing is radically changing the way we build and deploy applications. With cloud providers running servers and managing machine resources, companies now can focus solely on the application’s business logic and functionality. This hands-on book shows experienced programmers how to build and deploy scalable machine learning and deep learning models using serverless architectures with Microsoft Azure. You’ll learn step-by-step how to code machine learning into your projects using Python and pre-trained models that include tools such as image recognition, speech recognition, and classification. You’ll also examine issues around deployment and continuous delivery including scaling, security, and monitoring.

Let’s get it right

Paper: A Legal Definition of AI

When policy makers want to regulate AI, they must first define what AI is. However, legal definitions differ significantly from definitions of other disciplines. They are working definitions. Courts must be able to determine precisely whether or not a concrete system is considered AI by the law. In this paper we examine how policy makers should define the material scope of AI regulations. We argue that they should not use the term ‘artificial intelligence’ for regulatory purposes because there is no definition of AI which meets the requirements for legal definitions. Instead, they should define certain designs, use cases or capabilities following a risk-based approach. The goal of this paper is to help policy makers who work on AI regulations.

Paper: Valuating User Data in a Human-Centric Data Economy

The idea of paying people for their data is increasingly seen as a promising direction for resolving privacy debates, improving the quality of online data, and even offering an alternative to labor-based compensation in a future dominated by automation and self-operating machines. In this paper we demonstrate how a Human-Centric Data Economy would compensate the users of an online streaming service. We borrow the notion of the Shapley value from cooperative game theory to define what a fair compensation for each user should be for movie scores offered to the recommender system of the service. Since determining the Shapley value exactly is computationally inefficient in the general case, we derive faster alternatives using clustering, dimensionality reduction, and partial information. We apply our algorithms to a movie recommendation data set and demonstrate that different users may have a vastly different value for the service. We also analyze the reasons that some movie ratings may be more valuable than others and discuss the consequences for compensating users fairly.
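For intuition, the exact Shapley computation (which the paper approximates) fits in a few lines for a toy service whose value is, say, the number of distinct movies its users have rated; the players, ratings, and value function below are invented for illustration.

```python
from itertools import permutations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: each player's fair share is its average marginal
    contribution over all orders in which the coalition can be assembled."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = set()
        for p in order:
            before = value(coalition)
            coalition.add(p)
            phi[p] += value(coalition) - before     # marginal contribution
    n_orders = factorial(len(players))
    return {p: v / n_orders for p, v in phi.items()}

# toy data economy: the service's value is the number of distinct movies rated
ratings = {"ana": {"m1", "m2"}, "bob": {"m2"}, "carol": {"m3"}}
value = lambda c: len(set().union(*(ratings[u] for u in c))) if c else 0
phi = shapley_values(list(ratings), value)
```

A user covering a movie no one else rates earns its full marginal value, while redundant ratings are worth less — the “vastly different value” across users the paper reports. Exact enumeration is O(n!), which is why the paper derives faster alternatives via clustering, dimensionality reduction, and partial information.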

Article: Why Accessibility Is the Future of Tech

Designing solutions for people with disabilities offers a peephole into the future. ‘It’s just the right thing to do.’ Very few people think that those of us who are blind should be exiled from the web altogether, or that people with hearing loss shouldn’t have iPhones. That’s as it should be. But all too often, the importance of accessibility – the catch-all term for designing technology that people with disabilities can use – is framed in terms of charity alone. And that’s a shame because it makes accessibility seem grudging and boring, when the reality is that it’s the most exciting school of design on the planet.

Article: The Anthropologist of Artificial Intelligence

How do new scientific disciplines get started? For Iyad Rahwan, a computational social scientist with self-described ‘maverick’ tendencies, it happened on a sunny afternoon in Cambridge, Massachusetts, in October 2017. Rahwan and Manuel Cebrian, a colleague from the MIT Media Lab, were sitting in Harvard Yard discussing how to best describe their preferred brand of multidisciplinary research. The rapid rise of artificial intelligence technology had generated new questions about the relationship between people and machines, which they had set out to explore. Rahwan, for example, had been exploring the question of ethical behavior for a self-driving car – should it swerve to avoid an oncoming SUV, even if it means hitting a cyclist? – in his Moral Machine experiment.

Paper: Avoiding Resentment Via Monotonic Fairness

Classifiers that achieve demographic balance by explicitly using protected attributes such as race or gender are often politically or culturally controversial due to their lack of individual fairness, i.e. individuals with similar qualifications will receive different outcomes. Individually and group fair decision criteria can produce counter-intuitive results, e.g. that the optimal constrained boundary may reject intuitively better candidates due to demographic imbalance in similar candidates. Both approaches can be seen as introducing individual resentment, where some individuals would have received a better outcome if they either belonged to a different demographic class and had the same qualifications, or if they remained in the same class but had objectively worse qualifications (e.g. lower test scores). We show that both forms of resentment can be avoided by using monotonically constrained machine learning models to create individually fair, demographically balanced classifiers.

Article: Developing AI responsibly

Sarah Bird discusses the major challenges of responsible AI development and examines promising new tools and technologies to help enable it in practice.

Article: Open-endedness: The last grand challenge you’ve never heard of

Artificial intelligence (AI) is a grand challenge for computer science. Lifetimes of effort and billions of dollars have powered its pursuit. Yet, today its most ambitious vision remains unmet: though progress continues, no human-competitive general digital intelligence is within our reach. However, such an elusive goal is exactly what we expect from a ‘grand challenge’ – it’s something that will take astronomical effort over expansive time to achieve – and is likely worth the wait. There are other grand challenges, like curing cancer, achieving 100% renewable energy, or unifying physics. Some fields have entire sets of grand challenges, such as David Hilbert’s 23 unsolved problems in mathematics, which laid down the gauntlet for the entire 20th century. What’s unusual, though, is for there to be a problem whose solution could radically alter our civilization and our understanding of ourselves while being known only to the smallest sliver of researchers. Despite how strangely implausible that sounds, it is precisely the scenario today with the challenge of open-endedness. Almost no one has even heard of this problem, let alone cares about its solution, even though it is among the most fascinating and profound challenges that might actually someday be solved. With this article, we hope to help fix this surprising disconnect. We’ll explain just what this challenge is, its amazing implications if solved, and how to join the quest if we’ve inspired your interest.

Article: Regulation and Ethics in Data Science and Machine Learning

Statistical inference, reinforcement learning, deep neural networks, and other such terms have recently attracted much attention, and indeed, for a fundamental reason. Statistical inference extends the basis of our decisions and changes the deliberative process in making decisions. This change constitutes the essential differentiator between what I call the pre-data science era and the subsequent data science era. In the data science era, decisions are taken based on data and algorithms. Often, decisions are made solely by algorithms, and humans are an important actor only in the process of gathering, cleaning, and structuring the data and setting up the framework for algorithm selection (often, the algorithm itself is chosen by a metric). Given this fundamental change, it is important to take a closer look at both the extended basis of decisions and the changes in thought processes when deliberating on this extended basis in the data science era.

If you did not already know

CM3 google
We propose CM3, a new deep reinforcement learning method for cooperative multi-agent problems where agents must coordinate for joint success in achieving different individual goals. We restructure multi-agent learning into a two-stage curriculum, consisting of a single-agent stage for learning to accomplish individual tasks, followed by a multi-agent stage for learning to cooperate in the presence of other agents. These two stages are bridged by modular augmentation of neural network policy and value functions. We further adapt the actor-critic framework to this curriculum by formulating local and global views of the policy gradient and learning via a double critic, consisting of a decentralized value function and a centralized action-value function. We evaluated CM3 on a new high-dimensional multi-agent environment with sparse rewards: negotiating lane changes among multiple autonomous vehicles in the Simulation of Urban Mobility (SUMO) traffic simulator. Detailed ablation experiments show the positive contribution of each component in CM3, and the overall synthesis converges significantly faster to higher performance policies than existing cooperative multi-agent methods. …

RoPAD google
For enterprise, personal and societal applications, there is now an increasing demand for automated authentication of identity from images using computer vision. However, current authentication technologies are still vulnerable to presentation attacks. We present RoPAD, an end-to-end deep learning model for presentation attack detection that employs unsupervised adversarial invariance to ignore visual distractors in images for increased robustness and reduced overfitting. Experiments show that the proposed framework exhibits state-of-the-art performance on presentation attack detection on several benchmark datasets. …

iTM-VAE google
This work focuses on combining nonparametric topic models with Auto-Encoding Variational Bayes (AEVB). Specifically, we first propose iTM-VAE, where the topics are treated as trainable parameters and the document-specific topic proportions are obtained by a stick-breaking construction. The inference of iTM-VAE is modeled by neural networks such that it can be computed in a simple feed-forward manner. We also describe how to introduce a hyper-prior into iTM-VAE so as to model the uncertainty of the prior parameter. Actually, the hyper-prior technique is quite general and we show that it can be applied to other AEVB based models to alleviate the {\it collapse-to-prior} problem elegantly. Moreover, we also propose HiTM-VAE, where the document-specific topic distributions are generated in a hierarchical manner. HiTM-VAE is even more flexible and can generate topic distributions with better variability. Experimental results on 20News and Reuters RCV1-V2 datasets show that the proposed models outperform the state-of-the-art baselines significantly. The advantages of the hyper-prior technique and the hierarchical model construction are also confirmed by experiments. …
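The stick-breaking construction mentioned in the abstract can be illustrated concretely: topic proportions are generated by repeatedly breaking off a Beta-distributed fraction of the remaining stick. This is a minimal NumPy sketch of the generic construction, not code from iTM-VAE itself; the concentration parameter `alpha` and the truncation level are illustrative choices.

```python
import numpy as np

def stick_breaking(alpha, num_topics, rng):
    # Draw Beta(1, alpha) fractions, then break the unit stick sequentially:
    # topic i gets fraction beta_i of whatever stick length remains.
    betas = rng.beta(1.0, alpha, size=num_topics)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    return betas * remaining

rng = np.random.default_rng(0)
pi = stick_breaking(alpha=5.0, num_topics=50, rng=rng)
# pi is a (truncated) draw of topic proportions: nonnegative, summing to at most 1.
```

Larger `alpha` spreads mass over more topics; smaller `alpha` concentrates it on the first few, which is what lets the model behave nonparametrically.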

Aspect-Aware LSTM (AA-LSTM) google
Aspect-based sentiment analysis (ABSA) aims to predict fine-grained sentiments of comments with respect to given aspect terms or categories. In previous ABSA methods, the importance of aspect has been realized and verified. Most existing LSTM-based models take aspect into account via the attention mechanism, where the attention weights are calculated after the context is modeled in the form of contextual vectors. However, aspect-related information may be already discarded and aspect-irrelevant information may be retained in classic LSTM cells in the context modeling process, which can be improved to generate more effective context representations. This paper proposes a novel variant of LSTM, termed as aspect-aware LSTM (AA-LSTM), which incorporates aspect information into LSTM cells in the context modeling stage before the attention mechanism. Therefore, our AA-LSTM can dynamically produce aspect-aware contextual representations. We experiment with several representative LSTM-based models by replacing the classic LSTM cells with the AA-LSTM cells. Experimental results on SemEval-2014 Datasets demonstrate the effectiveness of AA-LSTM. …

Distilled News

Knowledge Graph Embedding: A Survey of Approaches and Applications

Knowledge graph (KG) embedding aims to embed the components of a KG, including entities and relations, into continuous vector spaces, so as to simplify manipulation while preserving the inherent structure of the KG. It can benefit a variety of downstream tasks such as KG completion and relation extraction, and hence has quickly gained massive attention. In this article, we provide a systematic review of existing techniques, including not only the state of the art but also the latest trends. In particular, we organize the review by the type of information used in the embedding task. Techniques that conduct embedding using only facts observed in the KG are introduced first. We describe the overall framework, specific model designs, typical training procedures, and the pros and cons of such techniques. After that, we discuss techniques that further incorporate additional information besides facts. We focus specifically on the use of entity types, relation paths, textual descriptions, and logical rules. Finally, we briefly introduce how KG embedding can be applied to and benefit a wide variety of downstream tasks such as KG completion, relation extraction, question answering, and so forth.
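One of the simplest fact-only techniques the survey covers is TransE, which models a true triple (h, r, t) by requiring h + r ≈ t in the embedding space. Below is a toy NumPy sketch with a hypothetical four-entity graph; the entity names, dimensions, learning rate, and the plain gradient updates are all illustrative choices, not the survey's reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16
# Hypothetical toy KG: a few entities and one relation, randomly initialized.
entities = {n: rng.normal(size=dim) for n in ("paris", "france", "tokyo", "japan")}
relation = rng.normal(size=dim)
facts = [("paris", "france"), ("tokyo", "japan")]  # (head, capital_of, tail)

def score(h, t):
    # TransE plausibility: a true fact should satisfy h + r ≈ t,
    # so a *lower* distance means a *more* plausible triple.
    return np.linalg.norm(entities[h] + relation - entities[t])

# A few plain gradient steps pulling h + r towards t for each observed fact.
for _ in range(200):
    for h, t in facts:
        g = entities[h] + relation - entities[t]
        entities[h] -= 0.05 * g
        entities[t] += 0.05 * g
        relation -= 0.05 * g
```

After training, true triples score lower (closer) than corrupted ones, which is exactly the signal used for KG completion.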

Awesome Knowledge Graph Embedding Approaches

This list contains repositories of libraries and approaches for knowledge graph embeddings, which are vector representations of entities and relations in a multi-relational, directed, labelled graph.

Open-endedness: The last grand challenge you’ve never heard of

Artificial intelligence (AI) is a grand challenge for computer science. Lifetimes of effort and billions of dollars have powered its pursuit. Yet, today its most ambitious vision remains unmet: though progress continues, no human-competitive general digital intelligence is within our reach. However, such an elusive goal is exactly what we expect from a ‘grand challenge’ – it’s something that will take astronomical effort over expansive time to achieve – and is likely worth the wait. There are other grand challenges, like curing cancer, achieving 100% renewable energy, or unifying physics. Some fields have entire sets of grand challenges, such as David Hilbert’s 23 unsolved problems in mathematics, which laid down the gauntlet for the entire 20th century. What’s unusual, though, is for there to be a problem whose solution could radically alter our civilization and our understanding of ourselves while being known only to the smallest sliver of researchers. Despite how strangely implausible that sounds, it is precisely the scenario today with the challenge of open-endedness. Almost no one has even heard of this problem, let alone cares about its solution, even though it is among the most fascinating and profound challenges that might actually someday be solved. With this article, we hope to help fix this surprising disconnect. We’ll explain just what this challenge is, its amazing implications if solved, and how to join the quest if we’ve inspired your interest.

Universal Adversarial Triggers for Attacking and Analyzing NLP

Adversarial examples highlight model vulnerabilities and are useful for evaluation and interpretation. We define universal adversarial triggers: input-agnostic sequences of tokens that trigger a model to produce a specific prediction when concatenated to any input from a dataset. We propose a gradient-guided search over tokens which finds short trigger sequences (e.g., one word for classification and four words for language modeling) that successfully trigger the target prediction. For example, triggers cause SNLI entailment accuracy to drop from 89.94% to 0.55%, 72% of ‘why’ questions in SQuAD to be answered ‘to kill american people’, and the GPT-2 language model to spew racist output even when conditioned on non-racial contexts. Furthermore, although the triggers are optimized using white-box access to a specific model, they transfer to other models for all tasks we consider. Finally, since triggers are input-agnostic, they provide an analysis of global model behavior. For instance, they confirm that SNLI models exploit dataset biases and help to diagnose heuristics learned by reading comprehension models.

Bias-Variance: A Comprehensive Graphical Representation

To build an accurate machine learning model we need a proper understanding of error. In a model’s predictions there are three sources of error: noise, bias, and variance. A proper grasp of error and the bias-variance trade-off helps us build accurate models and avoid the mistakes of overfitting and underfitting. In this tutorial, our case study will be predicting house prices. We have a dataset of house prices together with the square footage of each house.
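The trade-off the tutorial describes can be demonstrated numerically: refit a model on many noisy samples and see how the mean prediction drifts from the truth (bias) and how the predictions scatter (variance). This sketch uses a hypothetical sine-curve ground truth rather than the tutorial's house-price data; the noise level and polynomial degrees are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
true_f = np.sin      # stand-in ground-truth function
x_test = 1.0         # point at which we decompose the error

def fit_predict(degree):
    # Draw one noisy training sample, fit a polynomial, predict at x_test.
    x = np.linspace(0.0, 3.0, 20)
    y = true_f(x) + rng.normal(0.0, 0.3, size=x.shape)
    return np.polyval(np.polyfit(x, y, degree), x_test)

results = {}
for degree in (1, 8):   # a rigid underfitting model vs. a flexible overfitting one
    preds = np.array([fit_predict(degree) for _ in range(300)])
    bias_sq = (preds.mean() - true_f(x_test)) ** 2   # systematic error
    results[degree] = (bias_sq, preds.var())          # (bias^2, variance)
```

The straight line has high bias and low variance; the degree-8 polynomial flips that, chasing the noise in each sample. Total expected error is bias² + variance + irreducible noise, which is why neither extreme wins.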

AI projects need the right infrastructure – but what is the scope of ‘right’ ?

As artificial intelligence (AI) becomes more ubiquitous, organisations everywhere are starting to try to work out how best to implement AI. At a recent event at the University of Frankfurt, I talked to a young bank manager about this. For him, the question boiled down to getting the right infrastructure and environment. But what, he wondered, was ‘right’ in this context?

Alibaba Open-Sources Mars to Complement NumPy

NumPy is a beloved tool for the huge population of Python users (mathematicians, engineers, and others) working in scientific computing. The NumPy base N-dimensional array package usually contains:
• a powerful N-dimensional array object
• sophisticated (broadcasting) functions
• tools for integrating C/C++ and Fortran code
• useful linear algebra, Fourier transform, and random number capabilities
Alibaba Cloud recently announced that it has open sourced Mars – its tensor-based framework for large-scale data computation – on Github. Mars can be regarded as ‘a parallel and distributed NumPy.’ Mars can tile a large tensor into small chunks and describe the inner computation with a directed graph, enabling the running of parallel computation on a wide range of distributed environments, from a single machine to a cluster comprising thousands of machines.
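The tiling idea behind Mars can be illustrated without Mars itself: split a large array into chunks, reduce each chunk independently (in parallel, in the real system), then combine the partial results. This is a plain-NumPy sketch of the concept, not the Mars API; the chunk size and the row-wise split are illustrative.

```python
import numpy as np

def chunked_sum(a, chunk=4):
    # Tile the array into row blocks and reduce each block independently,
    # mimicking how Mars splits a large tensor into chunks whose partial
    # results are combined according to a computation graph.
    partials = [a[i:i + chunk].sum() for i in range(0, a.shape[0], chunk)]
    return np.sum(partials)

a = np.arange(100, dtype=float).reshape(10, 10)
# The chunked reduction agrees with the direct one: both give 4950.
```

In Mars proper, each chunk can live on a different worker, so the same graph scales from a laptop to a thousand-machine cluster.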

Everything you Should Know about p-value from Scratch for Data Science

Does the scenario below look familiar when you talk about p-values with aspiring data scientists?
I cannot tell you the number of times data scientists, even established ones, flounder when it comes to explaining how to interpret a p-value. In fact, take a moment to answer these questions:
• How do you interpret a p-value?
• How much importance should we place in the p-value?
• How will you explain the significance of a p-value to a non-data-science person (a stakeholder, for example)?
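One concrete way to answer the first question is to compute a p-value from scratch with a permutation test: under the null hypothesis the group labels are exchangeable, so the p-value is just the fraction of label shufflings that produce a difference at least as extreme as the observed one. The two "neighborhood price" samples below are simulated stand-ins, not data from the article.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical samples: house prices (in $1000s) for two neighborhoods.
group_a = rng.normal(300, 40, size=50)
group_b = rng.normal(330, 40, size=50)

observed = group_b.mean() - group_a.mean()
pooled = np.concatenate([group_a, group_b])

# Under H0 the labels are exchangeable: shuffle them many times and count
# how often a difference at least as extreme arises by chance alone.
n_perm = 5000
count = 0
for _ in range(n_perm):
    rng.shuffle(pooled)
    diff = pooled[50:].mean() - pooled[:50].mean()
    if abs(diff) >= abs(observed):
        count += 1
p_value = count / n_perm
```

Read this way, a small p-value means "a difference this large rarely happens by label shuffling alone", which is usually an easier framing for stakeholders than the formal definition.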

Beyond Interactive: Notebook Innovation at Netflix

Notebooks have rapidly grown in popularity among data scientists to become the de facto standard for quick prototyping and exploratory analysis. At Netflix, we’re pushing the boundaries even further, reimagining what a notebook can be, who can use it, and what they can do with it. And we’re making big investments to help make this vision a reality. In this post, we’ll share our motivations and why we find Jupyter notebooks so compelling. We’ll also introduce components of our notebook infrastructure and explore some of the novel ways we’re using notebooks at Netflix.

How AI will drive profitability in Micro-mobility

Driven by a passion for AI and micro-mobility solutions, we wanted to write an article about AI adoption in the micro-mobility industry, whose influence keeps rising in our cities. We wrote it drawing on our experience at ofo, Mobike and several AI departments (global brands and consulting firms). AI solutions need to address specific challenges. Through our experience, we have seen that this market is quite unique because of its nature (the explosion of data available, the number of competitors, etc.). The main risk is for micro-mobility players to end up using generic solutions that don’t really help them create a unique competitive advantage.

Bag of words code – The easiest explanation of an NLP technique using Python

Today I am going to explain the bag-of-words technique to you. If you’re here, you probably know why we use it, but if you do not, I’ll show you with an example.
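The technique itself fits in a few lines: build a vocabulary from all documents, then represent each document as a vector of word counts over that vocabulary. This is a minimal pure-Python sketch (no sklearn), with the two example sentences made up for illustration.

```python
from collections import Counter

def bag_of_words(docs):
    # Tokenize naively, build a sorted vocabulary over all documents,
    # then count each vocabulary term's occurrences per document.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts.get(tok, 0) for tok in vocab])
    return vocab, vectors

vocab, vectors = bag_of_words(["the cat sat", "the cat ate the fish"])
# vocab   -> ['ate', 'cat', 'fish', 'sat', 'the']
# vectors -> [[0, 1, 0, 1, 1], [1, 1, 1, 0, 2]]
```

Note that word order is discarded entirely (hence "bag"); only the counts survive, which is both the technique's simplicity and its main limitation.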

System & Language Agnostic Hyperparameter Optimization at Scale

In recent years, there has been an explosion in the development of new machine learning architectures that achieve tasks once unfathomable for AI. While this article could rave endlessly about these exciting developments, it will instead be about the necessary framework that helps AI extend past the boundaries of the human mind. Boring? Maybe. But imagine a microservice which can optimize any ML model because it is invariant to language, infrastructure, and result/model storage desires. At Capital One, I am proud to have helped build a cloud-based, system- and language-agnostic hyperparameter optimization framework that has helped us achieve state-of-the-art results. Before getting into the how, it is important to understand the purpose as well as the considerations that went into its development within the Capital One development and deployment environments on AWS.
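The key property of such a framework is that the optimizer never needs to know what the model is; it only proposes hyperparameters and receives a score back. Here is a minimal random-search sketch of that contract; the objective function, search space, and parameter names are made up for illustration and are not Capital One's implementation (which, per the article, runs as a cloud microservice).

```python
import random

def objective(params):
    # Stand-in for any model-training run; in the real framework this would
    # launch a job in whatever language/stack the model uses and report a metric.
    lr, depth = params["lr"], params["depth"]
    return (lr - 0.1) ** 2 + 0.01 * (depth - 5) ** 2  # minimum near lr=0.1, depth=5

def random_search(space, n_trials, seed=0):
    # The optimizer only touches the (params -> score) interface,
    # which is what makes it system- and language-agnostic.
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {"lr": rng.uniform(*space["lr"]),
                  "depth": rng.randint(*space["depth"])}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

space = {"lr": (1e-4, 1.0), "depth": (1, 10)}
best, score = random_search(space, n_trials=200)
```

Swapping random search for Bayesian optimization only changes how the next `params` are proposed; the agnostic interface stays the same.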

Building A Collaborative Filtering Recommender System with TensorFlow

Collaborative filtering is a technique widely used by recommender systems when you have a decent amount of user – item data. It makes recommendations based on the preferences of similar users. Collaborative filtering is therefore not a suitable model for the cold-start problem, since it cannot draw any inference for users or items about which it has not yet gathered sufficient information. But once you have relatively large user – item interaction data, collaborative filtering is the most widely used recommendation approach. In this article, we are going to learn how to build a collaborative filtering recommender system using TensorFlow.
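The TensorFlow details are in the article; the core idea can be sketched in plain NumPy: factorize the sparse rating matrix into user and item latent factors, fitting only the observed entries, then use the reconstructed matrix to fill in the blanks. The toy ratings, latent dimension, and learning rate below are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy user-item rating matrix; 0 marks an unobserved interaction.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0
k = 2                                           # latent-factor dimension
U = 0.1 * rng.standard_normal((R.shape[0], k))  # user factors
V = 0.1 * rng.standard_normal((R.shape[1], k))  # item factors

lr, reg = 0.01, 0.01
for _ in range(5000):
    err = mask * (R - U @ V.T)                  # error on observed entries only
    U += lr * (err @ V - reg * U)               # gradient step on squared loss
    V += lr * (err.T @ U - reg * V)

rmse = np.sqrt(((mask * (R - U @ V.T)) ** 2).sum() / mask.sum())
# U @ V.T now also fills in the zero cells with predicted ratings.
```

The zero cells of `U @ V.T` are the recommendations: scores for items each user has not yet rated, inferred from users with similar factor vectors. The cold-start limitation is visible here too: a brand-new user has no observed entries, so nothing constrains their factor row.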

Why You Need to Know the Difference Between AI and Automation

Pop quiz: Can you define the differences between artificial intelligence and automation? I won’t judge you if the answer is ‘no.’ There’s a blurry line between AI and automation, with the terms often used interchangeably, even in tech-forward professions. But there’s a very real difference between the two – and it’s one that’s becoming ever-more critical for organizations to understand. Both automation and AI play an increasing role in the modern workplace, thanks to rapid advancements and the massive amount of data at organizations’ disposal. But while more than one-third (37%) of organizations use AI in some form, this figure doesn’t account for the sophistication of each implementation. A truly robust use of AI and automation in everyday work – not just narrow applications – will require both education and an open-minded approach.