If you did not already know

Survival-CRPS
Personalized probabilistic forecasts of time to event (such as mortality) can be crucial in decision making, especially in the clinical setting. Inspired by ideas from the meteorology literature, we approach this problem through the paradigm of maximizing sharpness of prediction distributions, subject to calibration. In regression problems, it has been shown that optimizing the continuous ranked probability score (CRPS) instead of maximum likelihood leads to sharper prediction distributions while maintaining calibration. We introduce the Survival-CRPS, a generalization of the CRPS to the time to event setting, and present right-censored and interval-censored variants. To holistically evaluate the quality of predicted distributions over time to event, we present the Survival-AUPRC evaluation metric, an analog to area under the precision-recall curve. We apply these ideas by building a recurrent neural network for mortality prediction, using an Electronic Health Record dataset covering millions of patients. We demonstrate significant benefits in models trained by the Survival-CRPS objective instead of maximum likelihood. …
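As a rough illustration of the score being generalized: the ordinary CRPS of a predicted CDF F against an observed time y is the integral of (F(z) - 1{z >= y})^2 over z. The sketch below evaluates it numerically for an exponential prediction and checks it against the closed form. The `censored` flag, which simply drops the penalty after the observation time, is an assumption about what a right-censored variant might look like and is not taken from the abstract above; all function names are hypothetical.

```python
import math

def exp_cdf(z, rate):
    """CDF of an exponential distribution, used here as a toy prediction."""
    return 1.0 - math.exp(-rate * z) if z > 0 else 0.0

def crps_exponential(y, rate, censored=False, upper=100.0, steps=100_000):
    """Midpoint-rule CRPS on [0, upper] for an exponential prediction.

    If `censored` is True (assumed variant), only probability mass placed
    before the censoring time y is penalized.
    """
    dz = upper / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * dz
        f = exp_cdf(z, rate)
        if z < y:
            total += f * f * dz            # mass predicted too early
        elif not censored:
            total += (1.0 - f) ** 2 * dz   # mass predicted too late
    return total

def crps_exponential_closed_form(y, rate):
    """Exact uncensored CRPS for an exponential prediction."""
    before = (y - 2 * (1 - math.exp(-rate * y)) / rate
              + (1 - math.exp(-2 * rate * y)) / (2 * rate))
    after = math.exp(-2 * rate * y) / (2 * rate)
    return before + after
```

Under this assumed variant, a censored observation can never score worse than an uncensored one at the same time, because the "too late" term is not counted.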

Temporal Logic
In logic, temporal logic is any system of rules and symbolism for representing, and reasoning about, propositions qualified in terms of time (for example, ‘I am always hungry’, ‘I will eventually be hungry’, or ‘I will be hungry until I eat something’). It is sometimes also used to refer to tense logic, a modal logic-based system of temporal logic introduced by Arthur Prior in the late 1950s, with important contributions by Hans Kamp. It has been further developed by computer scientists, notably Amir Pnueli, and logicians. Temporal logic has found an important application in formal verification, where it is used to state requirements of hardware or software systems. For instance, one may wish to say that whenever a request is made, access to a resource is eventually granted, but it is never granted to two requestors simultaneously. Such a statement can conveniently be expressed in a temporal logic. …
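The request/grant requirement above can be sketched as a check over a recorded execution trace. This is only an illustration: temporal logics are usually interpreted over infinite behaviours, whereas this sketch assumes a finite trace, and the trace format (a list of dicts with `requests` and `grants` sets) is hypothetical.

```python
def always(trace, pred):
    """'Always': pred holds at every position of the (finite) trace."""
    return all(pred(i) for i in range(len(trace)))

def eventually_from(trace, start, pred):
    """'Eventually': pred holds at some position at or after start."""
    return any(pred(j) for j in range(start, len(trace)))

def satisfies_requirements(trace):
    """Every request is eventually granted, and never two grants at once."""
    def mutual_exclusion(i):
        return len(trace[i]["grants"]) <= 1

    def every_request_served(i):
        return all(
            eventually_from(trace, i, lambda j, who=who: who in trace[j]["grants"])
            for who in trace[i]["requests"]
        )

    return always(trace, mutual_exclusion) and always(trace, every_request_served)

good_trace = [
    {"requests": {"a", "b"}, "grants": set()},
    {"requests": set(), "grants": {"a"}},
    {"requests": set(), "grants": {"b"}},
]
simultaneous_trace = [
    {"requests": {"a", "b"}, "grants": set()},
    {"requests": set(), "grants": {"a", "b"}},
]
starved_trace = [
    {"requests": {"a", "b"}, "grants": set()},
    {"requests": set(), "grants": {"a"}},
]
```

Here `good_trace` satisfies both properties, `simultaneous_trace` violates mutual exclusion, and `starved_trace` never grants access to `b`.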

Mask-ShadowGAN
This paper presents a new method for shadow removal using unpaired data, enabling us to avoid tedious annotations and obtain more diverse training samples. However, directly employing adversarial learning and cycle-consistency constraints is insufficient to learn the underlying relationship between the shadow and shadow-free domains, since the mapping between shadow and shadow-free images is not simply one-to-one. To address the problem, we formulate Mask-ShadowGAN, a new deep framework that automatically learns to produce a shadow mask from the input shadow image and then takes the mask to guide the shadow generation via re-formulated cycle-consistency constraints. Particularly, the framework simultaneously learns to produce shadow masks and learns to remove shadows, to maximize the overall performance. Also, we prepared an unpaired dataset for shadow removal and demonstrated the effectiveness of Mask-ShadowGAN in various experiments, even though it was trained on unpaired data. …


Document worth reading: “A review of swarmalators and their potential in bio-inspired computing”

From fireflies to heart cells, many systems in Nature show the remarkable ability to spontaneously fall into synchrony. By imitating Nature’s success at self-synchronizing, scientists have designed cost-effective methods to achieve synchrony in the lab, with applications ranging from wireless sensor networks to radio transmission. A similar story has occurred in the study of swarms, where inspiration from the behavior of flocks of birds and schools of fish has led to ‘low-footprint’ algorithms for multi-robot systems. Here, we continue this ‘bio-inspired’ tradition, by speculating on the technological benefit of fusing swarming with synchronization. The subject of recent theoretical work, minimal models of so-called ‘swarmalator’ systems exhibit rich spatiotemporal patterns, hinting at utility in ‘bottom-up’ robotic swarms. We review the theoretical work on swarmalators, identify possible realizations in Nature, and discuss their potential applications in technology.

R Packages worth a look

Discretization and Grouping for Logistic Regression (glmdisc)
A Stochastic-Expectation-Maximization (SEM) algorithm (Celeux et al. (1995) <https:// …

LIME-Based Explanations with Interpretable Inputs Based on Ceteris Paribus Profiles (
Local explanations of machine learning models describe how features contributed to a single prediction. This package implements an explanation method …

Multiple Fill and Color Scales in ‘ggplot2’ (ggnewscale)
Use multiple fill and color scales in ‘ggplot2’.

Plots for Circular Data (cplots)
Provides functions to produce some circular plots for circular data, in a height- or area-proportional manner. They include barplots, smooth density pl …

Stochastic Precipitation Downscaling with the RainFARM Method (rainfarmr)
An implementation of the RainFARM (Rainfall Filtered Autoregressive Model) stochastic precipitation downscaling method (Rebora et al. (2006) <doi:10 …

Estimation of the Proportion of Treatment Effect Explained by Surrogate Outcome Information (SurrogateOutcome)
Provides functions to estimate the proportion of treatment effect on a censored primary outcome that is explained by the treatment effect on a censored …

Magister Dixit

“Distinguishing between feature selection and dimensionality reduction might seem counter-intuitive at first, since feature selection will eventually lead (reduce dimensionality) to a smaller feature space. In practice, the key difference between the terms “feature selection” and “dimensionality reduction” is that in feature selection, we keep the “original feature axis”, whereas dimensionality reduction usually involves a transformation technique.” Sebastian Raschka ( August 24, 2014 )
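The distinction in the quote can be made concrete with a small sketch. Feature selection keeps some of the original columns untouched, while dimensionality reduction builds new columns by transforming the old ones. The projection directions below are arbitrary, hand-picked values for illustration (a method such as PCA would learn them from the data), and the function names are hypothetical.

```python
def select_features(rows, keep):
    """Feature selection: keep a subset of the original columns unchanged."""
    return [[row[j] for j in keep] for row in rows]

def project(rows, directions):
    """Dimensionality reduction: each new feature is a linear mix of columns."""
    return [
        [sum(w * x for w, x in zip(direction, row)) for direction in directions]
        for row in rows
    ]

rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# Two of the three original axes survive as they were.
selected = select_features(rows, keep=[0, 2])

# Two new axes, each a blend of the original ones (hand-picked weights).
projected = project(rows, directions=[[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]])
```

Both results have two columns, but only `selected` still contains values that appear in the original data.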

Book Memo: “Intelligent Decision Support Systems – A Journey To Smarter Healthcare”

The goal of this book is to provide, in a friendly and refreshing manner, both theoretical concepts and practical techniques for the important and exciting field of Artificial Intelligence that can be directly applied to real-world healthcare problems. Healthcare – the final frontier. Lately, it seems like Pandora opened the box and evil was released into the world. Fortunately, there was one thing left in the box: hope. In recent decades, hope has been increasingly represented by Intelligent Decision Support Systems. Their continuing mission: to explore strange new diseases, to seek out new treatments and drugs, and to intelligently manage healthcare resources and patients. Hence, this book is designed for all those who wish to learn how to explore, analyze and find new solutions for the most challenging domain of all time: healthcare.

What’s new on arXiv

Probabilistic Kernel Support Vector Machines

We propose a probabilistic enhancement of standard kernel Support Vector Machines for binary classification, in order to address the case when, along with given data sets, a description of uncertainty (e.g., error bounds) may be available on each datum. In the present paper, we specifically consider Gaussian distributions to model uncertainty. Thereby, our data consist of pairs $(x_i,\Sigma_i)$, $i\in\{1,\ldots,N\}$, along with an indicator $y_i\in\{-1,1\}$ to declare membership in one of two categories for each pair. These pairs may be viewed to represent the mean and covariance, respectively, of random vectors $\xi_i$ taking values in a suitable linear space (typically $\mathbb{R}^n$). Thus, our setting may also be viewed as a modification of Support Vector Machines to classify distributions, albeit, at present, only Gaussian ones. We outline the formalism that allows computing suitable classifiers via a natural modification of the standard “kernel trick.” The main contribution of this work is to point out a suitable kernel function for applying Support Vector techniques to the setting of uncertain data for which a detailed uncertainty description is also available (herein, “Gaussian points”).

Learning to Engage with Interactive Systems: A field Study

Physical agents that can autonomously generate engaging, life-like behaviour will lead to more responsive and interesting robots and other autonomous systems. Although many advances have been made for one-to-one interactions in well controlled settings, future physical agents should be capable of interacting with humans in natural settings, including group interaction. In order to generate engaging behaviours, the autonomous system must first be able to estimate its human partners’ engagement level. In this paper, we propose an approach for estimating engagement from behaviour and use the measure within a reinforcement learning framework to learn engaging interactive behaviours. The proposed approach is implemented in an interactive sculptural system in a museum setting. We compare the learning system to a baseline using pre-scripted interactive behaviours. Analysis based on sensory data and survey data shows that adaptable behaviours within a perceivable and understandable range can achieve higher engagement and likeability.

Infinite Probabilistic Databases

Probabilistic databases (PDBs) are used to model uncertainty in data in a quantitative way. In the standard formal framework, PDBs are finite probability spaces over relational database instances. It has been argued convincingly that this is not compatible with an open world semantics (Ceylan et al., KR 2016) and with application scenarios that are modeled by continuous probability distributions (Dalvi et al., CACM 2009). We recently introduced a model of PDBs as infinite probability spaces that addresses these issues (Grohe and Lindner, PODS 2019). While that work was mainly concerned with countably infinite probability spaces, our focus here is on uncountable spaces. Such an extension is necessary to model typical continuous probability distributions that appear in many applications. However, an extension beyond countable probability spaces raises nontrivial foundational issues concerned with the measurability of events and queries. It turns out that so-called finite point processes are the appropriate model from probability theory for dealing with probabilistic databases. This model allows us to construct suitable (uncountable) probability spaces of database instances in a systematic way. Our main technical results are measurability statements for relational algebra queries as well as aggregate queries and datalog queries.

No Adjective Ordering Mystery, and No Raven Paradox, Just an Ontological Mishap

In the concluding remarks of Ontological Promiscuity, Hobbs (1985) made what we believe to be a very insightful observation: given that semantics is an attempt at specifying the relation between language and the world, if ‘one can assume a theory of the world that is isomorphic to the way we talk about it … then semantics becomes nearly trivial’. But how exactly can we rectify our logical formalisms so that semantics, an endeavor that has occupied the most penetrating minds for over two centuries, can become (nearly) trivial, and what exactly does it mean to assume a theory of the world in our semantics? In this paper we hope to provide answers for both questions. First, we believe that a commonsense theory of the world can (and should) be embedded in our semantic formalisms resulting in a logical semantics grounded in commonsense metaphysics. Moreover, we believe the first step to accomplishing this vision is rectifying what we think was a crucial oversight in logical semantics, namely the failure to distinguish between two fundamentally different types of concepts: (i) ontological concepts, that correspond to what Cocchiarella (2001) calls first-intension concepts and are types in a strongly-typed ontology; and (ii) logical concepts (or second-intension concepts), that are predicates corresponding to properties of (and relations between) objects of various ontological types. In such a framework, which we will refer to henceforth by ontologik, it will be shown how type unification and other type operations can be used to account for the ‘missing text phenomenon’ (MTP) (see Saba, 2019a) that is at the heart of most challenges in the semantics of natural language, by uncovering the significant amount of missing text that is never explicitly stated in everyday discourse, but is often implicitly assumed as shared background knowledge.

Improving interactive reinforcement learning: What makes a good teacher?

Interactive reinforcement learning has become an important apprenticeship approach to speed up convergence in classic reinforcement learning problems. In this regard, a variant of interactive reinforcement learning is policy shaping, which uses a parent-like trainer to propose the next action to be performed and by doing so reduces the search space by advice. On some occasions, the trainer may be another artificial agent which in turn was trained using reinforcement learning methods to afterward become an advisor for other learner-agents. In this work, we analyze internal representations and characteristics of artificial agents to determine which agent may outperform others to become a better trainer-agent. Using a polymath agent as an advisor, as compared to a specialist agent, leads to a larger reward and faster convergence of the reward signal and also to a more stable behavior in terms of the state visit frequency of the learner-agents. Moreover, we analyze system interaction parameters in order to determine how influential they are in the apprenticeship process, where the consistency of feedback is much more relevant when dealing with different learner obedience parameters.

GraphTSNE: A Visualization Technique for Graph-Structured Data

We present GraphTSNE, a novel visualization technique for graph-structured data based on t-SNE. The growing interest in graph-structured data increases the importance of gaining human insight into such datasets by means of visualization. However, among the most popular visualization techniques, classical t-SNE is not suitable on such datasets because it has no mechanism to make use of information from graph connectivity. On the other hand, standard graph visualization techniques, such as Laplacian Eigenmaps, have no mechanism to make use of information from node features. Our proposed method GraphTSNE is able to produce visualizations which account for both graph connectivity and node features. It is based on scalable and unsupervised training of a graph convolutional network on a modified t-SNE loss. By assembling a suite of evaluation metrics, we demonstrate that our method produces desirable visualizations on three benchmark datasets.

Deep Comprehensive Correlation Mining for Image Clustering

Recently developed deep unsupervised methods allow us to jointly learn representation and cluster unlabelled data. These deep clustering methods mainly focus on the correlation among samples, e.g., selecting high precision pairs to gradually tune the feature representation, which neglects other useful correlations. In this paper, we propose a novel clustering framework, named deep comprehensive correlation mining (DCCM), for exploring and taking full advantage of various kinds of correlations behind the unlabeled data from three aspects: 1) Instead of only using pair-wise information, pseudo-label supervision is proposed to investigate category information and learn discriminative features. 2) The features’ robustness to image transformation of input space is fully explored, which benefits the network learning and significantly improves the performance. 3) The triplet mutual information among features is presented for clustering problem to lift the recently discovered instance-level deep mutual information to a triplet-level formation, which further helps to learn more discriminative features. Extensive experiments on several challenging datasets show that our method achieves good performance, e.g., attaining 62.3% clustering accuracy on CIFAR-10, and 34.0% on CIFAR-100, both of which significantly surpass the state-of-the-art results by more than 10.0%.

Human-Guided Learning of Column Networks: Augmenting Deep Learning with Advice

Recently, deep models have been successfully applied in several applications, especially with low-level representations. However, sparse, noisy samples and structured domains (with multiple objects and interactions) are some of the open challenges in most deep models. Column Networks, a deep architecture, can succinctly capture such domain structure and interactions, but may still be prone to sub-optimal learning from sparse and noisy samples. Inspired by the success of human-advice guided learning in AI, especially in data-scarce domains, we propose Knowledge-augmented Column Networks that leverage human advice/knowledge for better learning with noisy/sparse samples. Our experiments demonstrate that our approach leads to either superior overall performance or faster convergence (i.e., both effective and efficient).

LeanResNet: A Low-cost yet Effective Convolutional Residual Networks

Convolutional Neural Networks (CNNs) filter the input data using a series of spatial convolution operators with compact stencils and point-wise non-linearities. Commonly, the convolution operators couple features from all channels, which leads to immense computational cost in the training of and prediction with CNNs. To improve the efficiency of CNNs, we introduce lean convolution operators that reduce the number of parameters and computational complexity. Our new operators can be used in a wide range of existing CNNs. Here, we exemplify their use in residual networks (ResNets), which have been very reliable for a few years now and have been analyzed intensively. In our experiments on three image classification problems, the proposed LeanResNet yields results that are comparable to other recently proposed reduced architectures using a similar number of parameters.

The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent

The goal of this paper is to study why stochastic gradient descent (SGD) is efficient for neural networks, and how neural net design affects SGD. In particular, we investigate how overparameterization — an increase in the number of parameters beyond the number of training data — affects the dynamics of SGD. We introduce a simple concept called gradient confusion. When confusion is high, stochastic gradients produced by different data samples may be negatively correlated, slowing down convergence. But when gradient confusion is low, we show that SGD has better convergence properties than predicted by classical theory. Using theoretical and experimental results, we study how overparameterization affects gradient confusion, and thus the convergence of SGD, on linear models and neural networks. We show that increasing the number of parameters of linear models or increasing the width of neural networks leads to lower gradient confusion, and thus faster and easier model training. We also show how overparameterization by increasing the depth of neural networks results in higher gradient confusion, making deeper models harder to train. Finally, we observe empirically that techniques like batch normalization and skip connections reduce gradient confusion, which helps reduce the training burden of deep networks.
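To make the idea of negatively correlated stochastic gradients concrete, the sketch below computes per-sample gradients of a squared loss for a linear model and reports the most negative pairwise inner product. Using the minimum pairwise inner product as the measure is an assumption made here for illustration (the summary above does not give a formula), and the toy data and function names are hypothetical.

```python
def per_sample_gradient(w, x, y):
    """Gradient of 0.5 * (w . x - y)^2 with respect to w, for one sample."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [residual * xi for xi in x]

def min_pairwise_inner_product(grads):
    """Most negative inner product between gradients of different samples."""
    lowest = float("inf")
    for i in range(len(grads)):
        for j in range(i + 1, len(grads)):
            dot = sum(a * b for a, b in zip(grads[i], grads[j]))
            lowest = min(lowest, dot)
    return lowest

w = [0.0, 0.0]
# Toy data: the first two samples share an input but have opposite targets,
# so their gradients point in opposite directions.
data = [([1.0, 0.0], 1.0), ([1.0, 0.0], -1.0), ([0.0, 1.0], 2.0)]
grads = [per_sample_gradient(w, x, y) for x, y in data]
confusion = min_pairwise_inner_product(grads)
```

A strongly negative value means a step that helps one sample hurts another, which is the situation the summary describes as slowing convergence.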

A Hitchhiker’s Guide to Statistical Comparisons of Reinforcement Learning Algorithms

Consistently checking the statistical significance of experimental results is the first mandatory step towards reproducible science. This paper presents a hitchhiker’s guide to rigorous comparisons of reinforcement learning algorithms. After introducing the concepts of statistical testing, we review the relevant statistical tests and compare them empirically in terms of false positive rate and statistical power as a function of the sample size (number of seeds) and effect size. We further investigate the robustness of these tests to violations of the most common hypotheses (normal distributions, same distributions, equal variances). Besides simulations, we compare empirical distributions obtained by running Soft-Actor Critic and Twin-Delayed Deep Deterministic Policy Gradient on Half-Cheetah. We conclude by providing guidelines and code to perform rigorous comparisons of RL algorithm performances.
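As a small illustration of this kind of comparison, the sketch below computes Welch's t statistic, a commonly used test that does not assume equal variances, for the final returns of two algorithms across seeds. It is not the code from the paper; the function name is hypothetical and the returns are made-up toy numbers.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, dof

# Made-up final returns for two algorithms, one value per random seed.
returns_a = [1.0, 2.0, 3.0, 4.0, 5.0]
returns_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(returns_a, returns_b)
```

The statistic would then be compared against a Student t distribution with `dof` degrees of freedom to obtain a p-value; with only five seeds per algorithm, the power of such a test is limited.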

SR-GAN: Semantic Rectifying Generative Adversarial Network for Zero-shot Learning

Existing zero-shot learning (ZSL) methods may suffer from vague class attributes that overlap heavily across classes. Unlike these methods, which ignore the discrimination among classes, in this paper we propose to classify unseen images by rectifying the semantic space guided by the visual space. First, we pre-train a Semantic Rectifying Network (SRN) to rectify the semantic space with a semantic loss and a rectifying loss. Then, a Semantic Rectifying Generative Adversarial Network (SR-GAN) is built to generate plausible visual features of unseen classes from both semantic features and rectified semantic features. To guarantee the effectiveness of rectified semantic features and synthetic visual features, a pre-reconstruction network and a post-reconstruction network are proposed, which keep the consistency between visual features and semantic features. Experimental results demonstrate that our approach significantly outperforms the state of the art on four benchmark datasets.

The Varchenko Determinant

Varchenko introduced a distance function on chambers of a hyperplane arrangement. It gives rise to a determinant indexed by chambers, whose entry in position (C,D) is the distance between C and D, and he proved that this determinant has a nice factorization: this is the Varchenko determinant. Recently, Aguiar and Mahajan defined a generalization of that distance function and proved that, for a central hyperplane arrangement, the determinant arising from their distance function also has a nice factorization. We prove that, for any hyperplane arrangement, the determinant arising from the distance function of Aguiar and Mahajan has a nice factorization. We also prove that the same is true for the determinant indexed by chambers of an apartment.

Copula-like Variational Inference

This paper considers a new family of variational distributions motivated by Sklar’s theorem. This family is based on new copula-like densities on the hypercube with non-uniform marginals which can be sampled efficiently, i.e. with a complexity linear in the dimension of state space. The proposed variational densities can then be seen as arising from these copula-like densities used as base distributions on the hypercube with Gaussian quantile functions and sparse rotation matrices as normalizing flows. The latter correspond to a rotation of the marginals with complexity $\mathcal{O}(d \log d)$. We provide some empirical evidence that such a variational family can also approximate non-Gaussian posteriors and can be beneficial compared to Gaussian approximations. Our method performs largely comparably to state-of-the-art variational approximations on standard regression and classification benchmarks for Bayesian Neural Networks.

A Discussion on Solving Partial Differential Equations using Neural Networks

Can neural networks learn to solve partial differential equations (PDEs)? We investigate this question for two (systems of) PDEs, namely, the Poisson equation and the steady Navier–Stokes equations. The contributions of this paper are five-fold. (1) Numerical experiments show that small neural networks (< 500 learnable parameters) are able to accurately learn complex solutions for systems of partial differential equations. (2) It investigates the influence of random weight initialization on the quality of the neural network approximate solution and demonstrates how one can take advantage of this non-determinism using ensemble learning. (3) It investigates the suitability of the loss function used in this work. (4) It studies the benefits and drawbacks of solving (systems of) PDEs with neural networks compared to classical numerical methods. (5) It proposes an exhaustive list of possible directions of future work.

Tutorial: Safe and Reliable Machine Learning

This document serves as a brief overview of the ‘Safe and Reliable Machine Learning’ tutorial given at the 2019 ACM Conference on Fairness, Accountability, and Transparency (FAT* 2019). The original abstract links to the talk slides, a video of the talk, and a complete list of references for the tutorial.

Distilled News

Why Machine Learning Models Crash And Burn In Production

One magical aspect of software is that it just keeps working. If you code a calculator app, it will still correctly add and multiply numbers a month, a year, or 10 years later. The fact that the marginal cost of software approaches zero has been a bedrock of the software industry’s business model since the 1980s. This is no longer the case when you are deploying machine learning (ML) models. Making this faulty assumption is the most common mistake of companies taking their first artificial intelligence (AI) products to market. The moment you put a model in production, it starts degrading.

Towards Zero-Overhead Reproducibility: Docker Support for ML Training

So why doesn’t everyone use Docker for their ML experiments? Some people don’t want the overhead of an added piece of software while running experiments. More on how we can help deal with this at https://…/towards-reproducibility

Why software projects take longer than you think – a statistical model

Anyone who has built software for a while knows that estimating how long something is going to take is hard. It’s hard to come up with an unbiased estimate of how long something will take, when fundamentally the work in itself is about solving something. One pet theory I’ve had for a really long time is that some of this is really just a statistical artifact.

MorphNet: Towards Faster and Smaller Neural Networks

MorphNet optimizes a neural network through a cycle of shrinking and expanding phases. In the shrinking phase, MorphNet identifies inefficient neurons and prunes them from the network by applying a sparsifying regularizer such that the total loss function of the network includes a cost for each neuron. However, rather than applying a uniform cost per neuron, MorphNet calculates a neuron cost with respect to the targeted resource. As training progresses, the optimizer is aware of the resource cost when calculating gradients, and thus learns which neurons are resource-efficient and which can be removed.
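The shrinking phase described above can be sketched as a loss with a resource-weighted sparsity penalty. This is only an illustration under stated assumptions, not MorphNet's actual API: each neuron is assumed to have a learnable scale factor (`gammas`), the per-neuron resource costs (for example FLOPs) are assumed to be given, and all names and numbers are hypothetical.

```python
def resource_weighted_penalty(gammas, costs, strength):
    """L1-style penalty where each neuron's term is scaled by its resource cost."""
    return strength * sum(cost * abs(gamma) for gamma, cost in zip(gammas, costs))

def total_loss(task_loss, gammas, costs, strength=0.01):
    """Task loss plus the resource-aware sparsifying regularizer."""
    return task_loss + resource_weighted_penalty(gammas, costs, strength)

def surviving_neurons(gammas, threshold=1e-3):
    """Indices of neurons whose scale factor has not been driven to (near) zero."""
    return [i for i, gamma in enumerate(gammas) if abs(gamma) > threshold]

# Hypothetical per-neuron scale factors and resource costs.
gammas = [0.8, 0.0005, -0.3]
costs = [100.0, 400.0, 50.0]
loss = total_loss(1.0, gammas, costs)
kept = surviving_neurons(gammas)
```

Because the penalty on each neuron grows with its cost, the optimizer has a stronger incentive to zero out expensive neurons than cheap ones, which matches the non-uniform cost the summary describes.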

Building a Flask API to Automatically Extract Named Entities Using SpaCy

How to use the Named Entity Recognition module in spaCy to identify people, organizations, or locations in text, then deploy a Python API with Flask.

Unsupervised Learning: Dimensionality Reduction

As stated in previous articles, unsupervised learning refers to a class of machine learning algorithms and techniques that are trained and fed with unlabeled data. In other words, we do not know the correct solutions or the values of the target variable beforehand. The main goal of these types of algorithms is to study the intrinsic and hidden structure of the data in order to get meaningful insights, segment the datasets into similar groups, or simplify them. Throughout this article, we are going to explore some of the algorithms and techniques most commonly used to reduce the dimensionality of datasets.

Introduction to LSTM Units While Playing Jazz

Long short-term memory (LSTM) units make it possible to learn very long sequences. The LSTM unit is a more general and robust version of the gated recurrent unit (GRU), which will not be addressed in this post. In this post, we will learn how an LSTM unit works, and we will apply it to generate some jazz music.

8 Useful R Packages for Data Science You Aren’t Using (But Should!)

I’m a big fan of R – it’s no secret. I have relied on it since my days of learning statistics back in university. In fact, R is still my go-to language for machine learning projects. Three things primarily attracted me to R:
• The easy-to-understand and use syntax
• The incredible RStudio tool
• R packages!
R offers a plethora of packages for performing machine learning tasks, including ‘dplyr’ for data manipulation, ‘ggplot2’ for data visualization, ‘caret’ for building ML models, etc.

Calculating the Semantic Brand Score with Python

The Semantic Brand Score (SBS) is a novel metric designed to assess the importance of one or more brands, in different contexts and whenever it is possible to analyze textual data, even big data. The advantage with respect to some traditional measures is that the SBS does not rely on surveys administered to small samples of consumers. The measure can be calculated on any source of text documents, such as newspaper articles, emails, tweets, posts on online forums, blogs and social media. The idea is to capture insights and honest signals through the analysis of big textual data. Spontaneous expressions of consumers, or other brand stakeholders, can be collected from the places where they normally appear – for example a travel forum, if studying the importance of museum brands. This has the advantage of reducing the biases induced by the use of questionnaires, where interviewees know that they are being observed. The SBS can also be adapted to different languages and to study the importance of specific words, or sets of words, not necessarily ‘brands’.

Deep embedding’s for categorical variables (Cat2Vec)

In this blog I am going to take you through the steps involved in creating an embedding for categorical variables using a deep learning network on top of Keras. The concept was originally introduced by Jeremy Howard in his fastai course. Please see the link for more details.

Face Recognition using Artificial Intelligence

A face can be considered the unique identity of an individual. People across the world have unique faces and facial features. The face plays a major role in interacting with other people in society. Considering these facts, facial recognition is implemented in the real world. What is a Facial Recognition System? In simple words, a Facial Recognition System can be defined as a technology which can identify or verify a person from a digital image or video source by comparing and analyzing patterns based on the person’s facial contours.

Chatbots aren’t as difficult to make as You Think

Every website must implement one. Every data scientist must know about them. Anytime we talk about AI, chatbots must be discussed. But they intimidate someone very new to the field. We struggle with a lot of questions before we even begin to start working on them. Are they hard to create? What technologies should I know before working on them? In the end, we end up discouraged after reading through so many posts on the internet and effectively accomplish nothing.

Breaking the curse of small data sets in Machine Learning: Part 2

This is Part 2 of the series Breaking the curse of small datasets in Machine Learning. In Part 1, I have discussed how the size of the data set impacts traditional Machine Learning algorithms and a few ways to mitigate those issues. In Part 2, I will discuss how deep learning model performance depends on data size and how to work with smaller data sets to get similar performances.

Facing the ARIMA Model against Neural Networks

The purpose of this small project is to go through the ARIMA model to evaluate its performance in a univariate dataset. Also, its performance will be compared with other techniques that are currently available to create predictions in time series using neural networks. This post consists of different methods for forecasting time series. However, none of these methods is perfect as there is no perfect way to predict the future, so these results should be taken with care and always with the advice of an expert.

Book Memo: “Artificial Neural Networks with Java”

Tools for Building Neural Network Applications
Use Java to develop neural network applications in this practical book. After learning the rules involved in neural network processing, you will manually process the first neural network example. This covers the internals of front and back propagation, and facilitates the understanding of the main principles of neural network processing. Artificial Neural Networks with Java also teaches you how to prepare the data to be used in neural network development and suggests various techniques of data preparation for many unconventional tasks. The next big topic discussed in the book is using Java for neural network processing. You will use the Encog Java framework and discover how to do rapid development with Encog, allowing you to create large-scale neural network applications. The book also discusses the inability of neural networks to approximate complex non-continuous functions, and it introduces the micro-batch method that solves this issue. The step-by-step approach includes plenty of examples, diagrams, and screen shots to help you grasp the concepts quickly and easily.

If you did not already know

Neural Exploration-Exploitation Tree (NEXT)
Sampling-based algorithms such as RRT and its variants are powerful tools for path planning problems in high-dimensional continuous state and action spaces. While these algorithms perform systematic exploration of the state space, they do not fully exploit past planning experiences from similar environments. In this paper, we design a meta path planning algorithm, called Neural Exploration-Exploitation Trees (NEXT), which can exploit past experience to drastically reduce the sample requirement for solving new path planning problems. More specifically, NEXT contains a novel neural architecture which can learn from experiences the dependency between task structures and promising path search directions. Then this learned prior is integrated with a UCB-type algorithm to achieve an online balance between exploration and exploitation when solving a new problem. Empirically, we show that NEXT can complete the planning tasks with very small searching trees and significantly outperforms the previous state of the art on several benchmark problems. …

Inference Enterprise Model
An Inference enterprise is an entity within an organization that uses data, tools, people, and processes to make inferences about variables that are critical to the success of the organization. …

ATOMIC
We present ATOMIC, an atlas of everyday commonsense reasoning, organized through 300k textual descriptions. Compared to existing resources that center around taxonomic knowledge, ATOMIC focuses on inferential knowledge organized as typed if-then relations with variables (e.g., ‘if X pays Y a compliment, then Y will likely return the compliment’). We propose nine if-then relation types to distinguish causes vs. effects, agents vs. themes, voluntary vs. involuntary events, and actions vs. mental states. By generatively training on the rich inferential knowledge described in ATOMIC, we show that neural models can acquire simple commonsense capabilities and reason about previously unseen events. Experimental results demonstrate that multitask models that incorporate the hierarchical structure of if-then relation types lead to more accurate inference compared to models trained in isolation, as measured by both automatic and human evaluation. …