ISA Mapper: A Compute and Hardware Agnostic Deep Learning Compiler

Domain specific accelerators present new challenges and opportunities for code generation onto novel instruction sets, communication fabrics, and memory architectures. In this paper we introduce an intermediate representation (IR) which enables both deep learning computational kernels and hardware capabilities to be described in the same IR. We then formulate and apply instruction mapping to determine the possible ways a computation can be performed on a hardware system. Next, our scheduler chooses a specific mapping and determines the data movement and computation order. In order to manage the large search space of mappings and schedules, we developed a flexible framework that allows heuristics, cost models, and potentially machine learning to facilitate this search problem. With this system, we demonstrate the automated extraction of matrix multiplication kernels out of recent deep learning kernels such as depthwise-separable convolution. In addition, we demonstrate two to five times better performance on DeepBench sized GEMMs and GRU RNN execution when compared to state-of-the-art (SOTA) implementations on new hardware and up to 85% of the performance for SOTA implementations on existing hardware.

Pyro: Deep Universal Probabilistic Programming

Pyro is a probabilistic programming language built on Python as a platform for developing advanced probabilistic models in AI research. To scale to large datasets and high-dimensional models, Pyro uses stochastic variational inference algorithms and probability distributions built on top of PyTorch, a modern GPU-accelerated deep learning framework. To accommodate complex or model-specific algorithmic behavior, Pyro leverages Poutine, a library of composable building blocks for modifying the behavior of probabilistic programs.

A Bayesian Nonparametric Method for Estimating Causal Treatment Effects on Zero-Inflated Outcomes

We present a Bayesian nonparametric method for estimating causal effects on continuous, zero-inflated outcomes. This work is motivated by a need for estimates of causal treatment effects on medical costs; that is, estimates contrasting average total costs that would have accrued under one treatment versus another. Cost data tend to be zero-inflated, skewed, and multi-modal. This presents a significant statistical challenge, even if the usual causal identification assumptions hold. Our approach flexibly models expected cost conditional on treatment and covariates using an infinite mixture of zero-inflated regressions. This conditional mean model is incorporated into the Bayesian standardization formula to obtain nonparametric estimates of causal effects. Moreover, the estimation procedure predicts latent cluster membership for each patient – automatically identifying patients with different cost-covariate profiles. We present a generative model, an MCMC method for sampling from the posterior and posterior predictive, and a Monte Carlo standardization procedure for computing causal effects. Our simulation studies show the resulting causal effect estimates and credible interval estimates to have low bias and close to nominal coverage, respectively. These results hold even under highly irregular data distributions. Relative to a standard infinite mixture of regressions, our method yields interval estimates with better coverage probability. We apply the method to compare inpatient costs among endometrial cancer patients receiving either chemotherapy or radiation therapy in the SEER Medicare database.

Univariate Mean Change Point Detection: Penalization, CUSUM and Optimality

The problem of univariate mean change point detection and localization based on a sequence of n independent observations with piecewise constant means has been intensively studied for more than half a century, and serves as a blueprint for change point problems in more complex settings. We provide a complete characterization of this classical problem in a general framework in which the upper bound on the noise variance \sigma^2, the minimal spacing \Delta between two consecutive change points and the minimal magnitude of the changes \kappa are allowed to vary with n. We first show that consistent localization of the change points is impossible when the signal-to-noise ratio \frac{\kappa \sqrt{\Delta}}{\sigma} is uniformly bounded from above. In contrast, when \frac{\kappa \sqrt{\Delta}}{\sigma} diverges in n at an arbitrarily slow rate, we demonstrate that two computationally efficient change point estimators, one based on the solution to an \ell_0-penalized least squares problem and the other on the popular WBS algorithm, are both consistent and achieve a localization rate of the order \frac{\sigma^2}{\kappa^2} \log(n). We further show that this rate is minimax optimal, up to a \log(n) term.
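
As a rough illustration of the CUSUM statistic named in the title, the following Python sketch scans every candidate split of a sequence and returns the one with the largest statistic. It is a minimal single-change-point example on noise-free data, not the \ell_0-penalized or WBS estimators analyzed in the paper; the function names `cusum` and `best_split` are hypothetical.

```python
import math


def cusum(y, s, e, t):
    """CUSUM statistic of the segment y[s:e] at split point t (s < t < e).

    It equals sqrt(n_left * n_right / (n_left + n_right)) times the absolute
    difference between the left and right segment means.
    """
    n_left, n_right = t - s, e - t
    mean_left = sum(y[s:t]) / n_left
    mean_right = sum(y[t:e]) / n_right
    scale = math.sqrt(n_left * n_right / (n_left + n_right))
    return scale * abs(mean_left - mean_right)


def best_split(y):
    """Return the split t in 1..len(y)-1 that maximizes the CUSUM statistic."""
    n = len(y)
    return max(range(1, n), key=lambda t: cusum(y, 0, n, t))


# Piecewise constant mean with one jump of size kappa = 2 after 10 observations.
y = [0.0] * 10 + [2.0] * 10
change_point = best_split(y)
```

On noisy data the maximizer would only be close to the true change point, and a threshold on the statistic would be needed to decide whether a change is present at all.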

How to train your MAML

The field of few-shot learning has recently seen substantial advancements. Most of these advancements came from casting few-shot learning as a meta-learning problem. Model Agnostic Meta Learning (MAML) is currently one of the best approaches for few-shot learning via meta-learning. MAML is simple, elegant and very powerful; however, it has a variety of issues, such as being very sensitive to neural network architectures, often leading to instability during training, requiring arduous hyperparameter searches to stabilize training and achieve high generalization, and being very computationally expensive at both training and inference times. In this paper, we propose various modifications to MAML that not only stabilize the system, but also substantially improve its generalization performance, convergence speed and computational overhead. We call the resulting method MAML++.

Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks

Recurrent neural network (RNN) models are widely used for processing sequential data governed by a latent tree structure. Previous work shows that RNN models (especially Long Short-Term Memory (LSTM) based models) can learn to exploit the underlying tree structure. However, their performance consistently lags behind that of tree-based models. This work proposes a new inductive bias, Ordered Neurons, which enforces an order of updating frequencies between hidden state neurons. We show that ordered neurons can explicitly integrate the latent tree structure into recurrent models. To this end, we propose a new RNN unit, ON-LSTM, which achieves good performance on four different tasks: language modeling, unsupervised parsing, targeted syntactic evaluation, and logical inference.

An Efficient Bandit Algorithm for Realtime Multivariate Optimization

Optimization is commonly employed to determine the content of web pages, such as to maximize conversions on landing pages or click-through rates on search engine result pages. Often the layout of these pages can be decoupled into several separate decisions. For example, the composition of a landing page may involve deciding which image to show, which wording to use, what color background to display, etc. Such optimization is a combinatorial problem over an exponentially large decision space. Randomized experiments do not scale well to this setting, and therefore, in practice, one is typically limited to optimizing a single aspect of a web page at a time. This represents a missed opportunity in both the speed of experimentation and the exploitation of possible interactions between layout decisions. Here we focus on multivariate optimization of interactive web pages. We formulate an approach where the possible interactions between different components of the page are modeled explicitly. We apply bandit methodology to explore the layout space efficiently and use hill-climbing to select optimal content in realtime. Our algorithm also extends to contextualization and personalization of layout selection. Simulation results show the suitability of our approach to large decision spaces with strong interactions between content. We further apply our algorithm to optimize a message that promotes adoption of an Amazon service. After only a single week of online optimization, we saw a 21% conversion increase compared to the median layout. Our technique is currently being deployed to optimize content across several locations at

A Fully Attention-Based Information Retriever

Recurrent neural networks are now the state-of-the-art in natural language processing because they can build rich contextual representations and process texts of arbitrary length. However, recent developments on attention mechanisms have equipped feedforward networks with similar capabilities, hence enabling faster computations due to the increase in the number of operations that can be parallelized. We explore this new type of architecture in the domain of question-answering and propose a novel approach that we call Fully Attention Based Information Retriever (FABIR). We show that FABIR achieves competitive results in the Stanford Question Answering Dataset (SQuAD) while having fewer parameters and being faster at both learning and inference than rival methods.

Model Selection Techniques — An Overview

In the era of big data, analysts usually explore various statistical models or machine learning methods for observed data in order to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model or method from a set of candidates. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction, and thus central to scientific studies in fields such as ecology, economics, engineering, finance, political science, biology, and epidemiology. There has been a long history of model selection techniques that arise from research in statistics, information theory, and signal processing. A considerable number of methods have been proposed, following different philosophies and exhibiting varying performance. The purpose of this article is to provide a comprehensive overview of them, in terms of their motivation, large sample performance, and applicability. We provide integrated and practically relevant discussions on theoretical properties of state-of-the-art model selection approaches. We also share our thoughts on some controversial views on the practice of model selection.

Towards Universal Dialogue State Tracking

Dialogue state tracking is the core part of a spoken dialogue system. It estimates the beliefs over possible user goals at every dialogue turn. However, most current approaches are difficult to scale to large dialogue domains. They have one or more of the following limitations: (a) some models don’t work in situations where slot values in the ontology change dynamically; (b) the number of model parameters is proportional to the number of slots; (c) some models extract features based on hand-crafted lexicons. To tackle these challenges, we propose StateNet, a universal dialogue state tracker. It is independent of the number of values, shares parameters across all slots, and uses pre-trained word vectors instead of explicit semantic dictionaries. Our experiments on two datasets show that our approach not only overcomes these limitations, but also significantly outperforms state-of-the-art approaches.

Applying Deep Learning To Airbnb Search

The application to search ranking is one of the biggest machine learning success stories at Airbnb. Much of the initial gains were driven by a gradient boosted decision tree model. The gains, however, plateaued over time. This paper discusses the work done in applying neural networks in an attempt to break out of that plateau. We present our perspective not with the intention of pushing the frontier of new modeling techniques. Instead, ours is a story of the elements we found useful in applying neural networks to a real life product. Deep learning was steep learning for us. To other teams embarking on similar journeys, we hope an account of our struggles and triumphs will provide some useful pointers. Bon voyage!

Calendar-based graphics for visualizing people’s daily schedules

Calendars are broadly used in society to display temporal information and events. This paper describes a new R package with functionality to organize and display temporal data, collected at sub-daily resolution, in a calendar layout. The function `frame_calendar` uses linear algebra on the date variable to restructure data into a format lending itself to calendar layouts. The user can apply the grammar of graphics to create plots inside each calendar cell, and thus the displays synchronize neatly with ggplot2 graphics. The motivating application is studying pedestrian behavior in Melbourne, Australia, based on counts which are captured at hourly intervals by sensors scattered around the city. Faceting by the usual features, such as day and month, was insufficient to examine the behavior. Making displays in a monthly calendar format helps to understand pedestrian patterns relative to events such as work days, weekends, holidays, and special events. The layout algorithm has several format options and variations. It is implemented in the R package sugrrants.

What can AI do for me: Evaluating Machine Learning Interpretations in Cooperative Play

Machine learning is an important tool for decision making, but its ethical and responsible application requires rigorous vetting of its interpretability and utility: an understudied problem, particularly for natural language processing models. We design a task-specific evaluation for a question answering task and evaluate how well a model interpretation improves human performance in a human-machine cooperative setting. We evaluate interpretation methods in a grounded, realistic setting: playing a trivia game as a team. We also provide design guidance for natural language processing human-in-the-loop settings.

Online learning with feedback graphs and switching costs

We study online learning when partial feedback information is provided following every action of the learning process, and the learner incurs switching costs for changing actions. In this setting, the feedback information system can be represented by a graph, and previous work provided the expected regret of the learner in the case of a clique (Expert setup), or disconnected single loops (Multi-Armed Bandits). We provide a lower bound on the expected regret in the partial information (PI) setting, namely for general feedback graphs, excluding the clique. We show that all algorithms that are optimal without switching costs are necessarily sub-optimal in the presence of switching costs, which motivates the need to design new algorithms in this setup. We propose two novel algorithms: Threshold Based EXP3 and EXP3.SC. For the two special cases of the symmetric PI setting and Multi-Armed Bandits, we show that the expected regret of both algorithms is order optimal in the duration of the learning process, with a pre-constant dependent on the feedback system. Additionally, we show that Threshold Based EXP3 is order optimal in the switching cost, whereas EXP3.SC is not. Finally, empirical evaluations show that Threshold Based EXP3 outperforms the previous algorithm EXP3 SET in the presence of switching costs, and Batch EXP3 in the special setting of Multi-Armed Bandits with switching costs, where both algorithms are order optimal.
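
For context, here is a minimal Python sketch of the textbook EXP3 bandit algorithm, which is the baseline family the abstract builds on. It is not Threshold Based EXP3 or EXP3.SC, whose details are not given here. The function name `exp3`, the exploration rate `gamma` and the switch counter are assumptions added for illustration; the counter only shows how often plain EXP3 changes arms, which is what a switching cost would penalize.

```python
import math
import random


def exp3(reward_fn, n_arms, horizon, gamma=0.1, seed=0):
    """Run textbook EXP3 with bandit feedback; rewards must lie in [0, 1].

    Returns (pulls per arm, number of arm switches).
    """
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    pulls = [0] * n_arms
    switches = 0
    previous_arm = None
    for _ in range(horizon):
        total = sum(weights)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm)
        # Importance-weighted estimate: only the pulled arm is updated.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Rescale so the largest weight is 1, which avoids overflow.
        top = max(weights)
        weights = [w / top for w in weights]
        pulls[arm] += 1
        if previous_arm is not None and arm != previous_arm:
            switches += 1
        previous_arm = arm
    return pulls, switches


# Arm 2 always pays 1, the other arms pay 0.
pulls, switches = exp3(lambda arm: 1.0 if arm == 2 else 0.0, n_arms=3, horizon=2000)
```

Because plain EXP3 resamples an arm every round, its switch count grows with the horizon, which is consistent with the abstract's point that algorithms tuned for the no-cost setting are sub-optimal once switching is charged.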

numpywren: serverless linear algebra

Linear algebra operations are widely used in scientific computing and machine learning applications. However, it is challenging for scientists and data analysts to run linear algebra at scales beyond a single machine. Traditional approaches either require access to supercomputing clusters, or impose configuration and cluster management challenges. In this paper we show how the disaggregation of storage and compute resources in so-called ‘serverless’ environments, combined with compute-intensive workload characteristics, can be exploited to achieve elastic scalability and ease of management. We present numpywren, a system for linear algebra built on a serverless architecture. We also introduce LAmbdaPACK, a domain-specific language designed to implement highly parallel linear algebra algorithms in a serverless setting. We show that, for certain linear algebra algorithms such as matrix multiply, singular value decomposition, and Cholesky decomposition, numpywren’s performance (completion time) is within 33% of ScaLAPACK, and its compute efficiency (total CPU-hours) is up to 240% better due to elasticity, while providing an easier to use interface and better fault tolerance. At the same time, we show that the inability of serverless runtimes to exploit locality across the cores in a machine fundamentally limits their network efficiency, which limits performance on other algorithms such as QR factorization. This highlights how cloud providers could better support these types of computations through small changes in their infrastructure.

Bivariate modelling of precipitation and temperature using a non-homogeneous hidden Markov model

Aiming to generate realistic synthetic time series of the bivariate process of daily mean temperature and precipitation, we introduce a non-homogeneous hidden Markov model. The non-homogeneity lies in periodic transition probabilities between the hidden states, and time-dependent emission distributions. This enables the model to account for the non-stationary behaviour of weather variables. By carefully choosing the emission distributions, it is also possible to model the dependence structure between the two variables. The model is applied to several weather stations in Europe with various climates, and we show that it is able to simulate realistic bivariate time series.

Ain’t Nobody Got Time For Coding: Structure-Aware Program Synthesis From Natural Language

Program synthesis from natural language (NL) is practical for humans and, once technically feasible, would significantly facilitate software development and revolutionize end-user programming. We present SAPS, an end-to-end neural network capable of mapping relatively complex, multi-sentence NL specifications to snippets of executable code. The proposed architecture relies exclusively on neural components, and is built upon a tree2tree autoencoder trained on abstract syntax trees, combined with a pretrained word embedding and a bi-directional multi-layer LSTM for NL processing. The decoder features a doubly-recurrent LSTM with a novel signal propagation scheme and soft attention mechanism. When applied to a large dataset of problems proposed in a previous study, SAPS performs on par with or better than the method proposed there, producing correct programs in over 90% of cases. In contrast to other methods, it does not involve any non-neural components to post-process the resulting programs, and uses a fixed-dimensional latent representation as the only link between the NL analyzer and source code generator.

CEREALS – Cost-Effective REgion-based Active Learning for Semantic Segmentation

State of the art methods for semantic image segmentation are trained in a supervised fashion using a large corpus of fully labeled training images. However, gathering such a corpus is expensive, due to human annotation effort, in contrast to gathering unlabeled data. We propose an active learning-based strategy, called CEREALS, in which a human only has to hand-label a few, automatically selected, regions within an unlabeled image corpus. This minimizes human annotation effort while maximizing the performance of a semantic image segmentation method. The automatic selection procedure is achieved by: a) using a suitable information measure combined with an estimate about human annotation effort, which is inferred from a learned cost model, and b) exploiting the spatial coherency of an image. The performance of CEREALS is demonstrated on Cityscapes, where we are able to reduce the annotation effort to 17%, while keeping 95% of the mean Intersection over Union (mIoU) of a model that was trained with the fully annotated training set of Cityscapes.

Goodness-of-Fit Tests for Large Datasets

Nowadays, data analysis in the world of Big Data is typically connected to data mining and descriptive or exploratory statistics, e.g. cluster analysis, classification or regression analysis. Aside from these techniques, there is a huge area of methods from inferential statistics that are rarely considered in connection with Big Data. Nevertheless, inferential methods are also of use for Big Data analysis, especially for quantifying uncertainty. This article provides some insights into methodological and technical issues concerning inferential methods in the Big Data area, in order to bring together Big Data and inferential statistics along with the difficulties this entails. We present an approach that allows testing goodness-of-fit without model assumptions, relying on the empirical distribution. In particular, the method is able to utilize information from large datasets, and it rests on a clear theoretical background. We concentrate on the widely used Kolmogorov-Smirnov test, which is applied for testing goodness-of-fit in statistics. Our approach can be parallelized easily, which makes it applicable to distributed datasets, particularly on a compute cluster. With this contribution, we address an audience that is interested in the technical and methodological background of implementing inferential statistical methods with Big Data tools such as Spark.
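
To make the Kolmogorov-Smirnov statistic concrete, the sketch below computes the classical one-sample statistic, the largest gap between the empirical distribution function of a sample and a reference CDF. It is a single-machine, standard-library illustration only; the abstract does not spell out the paper's own procedure or its Spark implementation, and the name `ks_statistic` is hypothetical.

```python
def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic sup_x |F_n(x) - F(x)|.

    `cdf` is the reference distribution function F. The empirical CDF F_n
    jumps at each sorted observation, so both sides of every jump are checked.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # F_n just after x is (i + 1) / n, and just before x it is i / n.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, x))


d = ks_statistic([0.1, 0.4, 0.7], uniform_cdf)
```

The sort is the expensive step for large samples, which is presumably where a distributed implementation would matter most.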

DCSVM: Fast Multi-class Classification using Support Vector Machines

We present DCSVM, an efficient algorithm for multi-class classification using Support Vector Machines. DCSVM is a divide and conquer algorithm which relies on data sparsity in high dimensional space and performs a smart partitioning of the whole training data set into disjoint subsets that are easily separable. A single prediction performed between two partitions eliminates at once one or more classes in one partition, leaving only a reduced number of candidate classes for subsequent steps. The algorithm continues recursively, reducing the number of classes at each step, until a final binary decision is made between the last two classes left in the competition. In the best case scenario, our algorithm makes a final decision between k classes in O(\log k) decision steps, and in the worst case scenario DCSVM makes a final decision in k-1 steps, which is no worse than existing techniques.

DropFilter: Dropout for Convolutions

Using a large number of parameters, deep neural networks have achieved remarkable performance on computer vision and natural language processing tasks. However, the networks usually suffer from overfitting because they use too many parameters. Dropout is a widely used method to deal with overfitting. Although dropout can significantly regularize densely connected layers in neural networks, it leads to suboptimal results when used for convolutional layers. To tackle this problem, we propose DropFilter, a new dropout method for convolutional layers. DropFilter randomly suppresses the outputs of some filters, because it is observed that co-adaptations are more likely to occur between filters than within filters in convolutional layers. Using DropFilter, we remarkably improve the performance of convolutional networks on CIFAR and ImageNet.
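
The idea of suppressing whole filter outputs, rather than individual activations, can be sketched in a few lines of Python. This is only an illustration of filter-level masking on nested lists: the function name `drop_filter` is hypothetical, and the rescaling of surviving filters by 1 / (1 - drop_prob) is borrowed from standard inverted dropout as an assumption, since the abstract does not say how DropFilter scales its outputs.

```python
import random


def drop_filter(feature_maps, drop_prob, rng):
    """Zero out entire filter outputs with probability drop_prob (< 1).

    feature_maps is a list of 2D lists, one per filter. One Bernoulli draw is
    made per filter, so all activations of a filter are kept or dropped together.
    """
    keep_prob = 1.0 - drop_prob
    output = []
    for fmap in feature_maps:
        keep = rng.random() < keep_prob
        # Assumed inverted-dropout scaling so the expected activation is unchanged.
        scale = 1.0 / keep_prob if keep else 0.0
        output.append([[value * scale for value in row] for row in fmap])
    return output


rng = random.Random(0)
maps = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(8)]  # 8 filters, 2x2 outputs
dropped = drop_filter(maps, drop_prob=0.5, rng=rng)
```

Standard dropout would instead draw one mask value per activation, so neighbouring activations of the same filter could be kept and dropped independently.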

Deep Neural Network inference with reduced word length

Deep neural networks (DNNs) are powerful models for many pattern recognition tasks, yet their high computational complexity and memory requirements limit them to applications on high-performance computing platforms. In this paper, we propose a new method to evaluate DNNs trained with 32-bit floating point (float32) accuracy using only low precision integer arithmetic in combination with binary shift and clipping operations. Because hardware implementation of these operations is much simpler than high precision floating point calculation, our method can be used for efficient DNN inference on dedicated hardware. In experiments on MNIST, we demonstrate that DNNs trained with float32 can be evaluated using a combination of 2-bit integer arithmetic and a few float32 calculations in each layer, or only 3-bit integer arithmetic in combination with binary shift and clipping, without significant performance degradation.
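
The three primitive operations mentioned above (integer arithmetic, binary shift and clipping) can be illustrated with a generic fixed-point sketch. This is not the paper's evaluation scheme, which the abstract does not detail; the helper names `quantize` and `int_dot` and the choice of fractional bits are assumptions made for the example.

```python
def clip(value, lo, hi):
    """Clamp value to the closed interval [lo, hi]."""
    return max(lo, min(hi, value))


def signed_range(n_bits):
    """Smallest and largest value of a signed n_bits integer."""
    return -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1


def quantize(values, frac_bits, n_bits):
    """Round floats to signed n_bits integers with frac_bits fractional bits."""
    lo, hi = signed_range(n_bits)
    return [clip(round(v * (1 << frac_bits)), lo, hi) for v in values]


def int_dot(weights_q, inputs_q, shift, n_bits):
    """Integer dot product, rescaled by a right shift and clipped to n_bits."""
    lo, hi = signed_range(n_bits)
    accumulator = sum(w * x for w, x in zip(weights_q, inputs_q))
    # A right shift by `shift` divides by 2**shift (rounding toward -infinity).
    return clip(accumulator >> shift, lo, hi)


weights_q = quantize([0.5, -0.25, 3.0], frac_bits=2, n_bits=4)
result = int_dot([2, -1], [3, 2], shift=1, n_bits=4)
```

In the example, 3.0 maps to 12 before clipping and saturates at 7, the largest signed 4-bit value, which shows why clipping is needed once the word length is reduced.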

Automatic Full Compilation of Julia Programs and ML Models to Cloud TPUs

Google’s Cloud TPUs are a promising new hardware architecture for machine learning workloads. They have powered many of Google’s milestone machine learning achievements in recent years. Google has now made TPUs available for general use on their cloud platform and as of very recently has opened them up further to allow use by non-TensorFlow frontends. We describe a method and implementation for offloading suitable sections of Julia programs to TPUs via this new API and the Google XLA compiler. Our method is able to completely fuse the forward pass of a VGG19 model expressed as a Julia program into a single TPU executable to be offloaded to the device. Our method composes well with existing compiler-based automatic differentiation techniques on Julia code, and we are thus able to also automatically obtain the VGG19 backwards pass and similarly offload it to the TPU. Targeting TPUs using our compiler, we are able to evaluate the VGG19 forward pass on a batch of 100 images in 0.23s which compares favorably to the 52.4s required for the original model on the CPU. Our implementation is less than 1000 lines of Julia, with no TPU specific changes made to the core Julia compiler or any other Julia packages.

Feasibility of Supervised Machine Learning for Cloud Security

Cloud computing is gaining significant attention; however, security is the biggest hurdle to its wide acceptance. Users of cloud services are under constant fear of data loss, security threats and availability issues. Recently, learning-based methods for security applications have been gaining popularity in the literature with advances in machine learning techniques. However, the major challenge in these methods is obtaining real-time and unbiased datasets. Many datasets are internal and cannot be shared due to privacy issues, or may lack certain statistical characteristics. As a result, researchers prefer to generate datasets for training and testing purposes in simulated or closed experimental environments, which may lack comprehensiveness. Machine learning models trained with such a single dataset generally result in a semantic gap between results and their application. There is a dearth of research work which demonstrates the effectiveness of these models across multiple datasets obtained in different environments. We argue that it is necessary to test the robustness of machine learning models, especially in diversified operating conditions, which are prevalent in cloud scenarios. In this work, we use the UNSW dataset to train supervised machine learning models. We then test these models with the ISOT dataset. We present our results and argue that more research in the field of machine learning is still required for its applicability to cloud security.

Heterogeneous large datasets integration using Bayesian factor regression

Two key challenges in modern statistical applications are the large amount of information recorded per individual, and that such data are often not collected all at once but in batches. These batch effects can be complex, causing distortions in both mean and variance. We propose a novel sparse latent factor regression model to integrate such heterogeneous data. The model provides a tool for data exploration via dimensionality reduction while correcting for a range of batch effects. We study the use of several sparse priors (local and non-local) to learn the dimension of the latent factors. Our model is fitted in a deterministic fashion by means of an EM algorithm for which we derive closed-form updates, contributing a novel scalable algorithm for non-local priors of interest beyond the immediate scope of this paper. We present several examples, with a focus on bioinformatics applications. Our results show an increase in the accuracy of the dimensionality reduction, with non-local priors substantially improving the reconstruction of factor cardinality, as well as the need to account for batch effects to obtain reliable results. Our model provides a novel approach to latent factor regression that balances sparsity with sensitivity and is highly computationally efficient.

Dynamic Likelihood-free Inference via Ratio Estimation (DIRE)

Parametric statistical models that are implicitly defined in terms of a stochastic data generating process are used in a wide range of scientific disciplines because they enable accurate modeling. However, learning the parameters from observed data is generally very difficult because their likelihood function is typically intractable. Likelihood-free Bayesian inference methods have been proposed which include the frameworks of approximate Bayesian computation (ABC), synthetic likelihood, and its recent generalization that performs likelihood-free inference by ratio estimation (LFIRE). A major difficulty in all these methods is choosing summary statistics that reduce the dimensionality of the data to facilitate inference. While several methods for choosing summary statistics have been proposed for ABC, the literature for synthetic likelihood and LFIRE is very thin to date. We here address this gap in the literature, focusing on the important special case of time-series models. We show that convolutional neural networks trained to predict the input parameters from the data provide suitable summary statistics for LFIRE. On a wide range of time-series models, a single neural network architecture produced equally or more accurate posteriors than alternative methods.

Clustering Time Series with Nonlinear Dynamics: A Bayesian Non-Parametric and Particle-Based Approach

We propose a statistical framework for clustering multiple time series that exhibit nonlinear dynamics into an a-priori-unknown number of sub-groups that each comprise time series with similar dynamics. Our motivation comes from neuroscience where an important problem is to identify, within a large assembly of neurons, sub-groups that respond similarly to a stimulus or contingency. In the neural setting, conditioned on cluster membership and the parameters governing the dynamics, time series within a cluster are assumed independent and generated according to a nonlinear binomial state-space model. We derive a Metropolis-within-Gibbs algorithm for full Bayesian inference that alternates between sampling of cluster membership and sampling of parameters of interest. The Metropolis step is a PMMH iteration that requires an unbiased, low variance estimate of the likelihood function of a nonlinear state-space model. We leverage recent results on controlled sequential Monte Carlo to estimate likelihood functions more efficiently compared to the bootstrap particle filter. We apply the framework to time series acquired from the prefrontal cortex of mice in an experiment designed to characterize the neural underpinnings of fear.

Brand > Logo: Visual Analysis of Fashion Brands

While lots of people may think branding begins and ends with a logo, fashion brands communicate their uniqueness through a wide range of visual cues such as color, patterns and shapes. In this work, we analyze visual representations learned by deep networks that are trained to recognize fashion brands. In particular, the activation strength and extent of neurons are studied to provide interesting insights about visual brand expressions. The proposed method identifies where a brand stands in the spectrum of branding strategy, i.e., from trademark-emblazoned goods with bold logos to implicit no logo marketing. By quantifying attention maps, we are able to interpret the visual characteristics of a brand present in a single image and model the general design direction of a brand as a whole. We further investigate versatility of neurons and discover ‘specialists’ that are highly brand-specific and ‘generalists’ that detect diverse visual features. A human experiment based on three main visual scenarios of fashion brands is conducted to verify the alignment of our quantitative measures with the human perception of brands. This paper demonstrates how deep networks go beyond logos in order to recognize clothing brands in an image.

Preprocessor Selection for Machine Learning Pipelines

Much of the work in metalearning has focused on classifier selection, combined more recently with hyperparameter optimization, with little concern for data preprocessing. Yet, it is generally well accepted that machine learning applications require not only model building, but also data preprocessing. In other words, practical solutions consist of pipelines of machine learning operators rather than single algorithms. Interestingly, our experiments suggest that, on average, data preprocessing hinders accuracy, while the best performing pipelines do actually make use of preprocessors. Here, we conduct an extensive empirical study over a wide range of learning algorithms and preprocessors, and use metalearning to determine when one should make use of preprocessors in ML pipeline design.

Meta-Learning Multi-task Communication

In this paper, we describe a general framework, Parameters Read-Write Networks (PRaWNs), to systematically analyze current neural models for multi-task learning. We find that existing models expect to disentangle features into different spaces, while features learned in practice are still entangled in shared space, leaving potential hazards for other training or unseen tasks. We propose to alleviate this problem by incorporating an inductive bias into the process of multi-task learning: each task can keep informed of not only the knowledge stored in other tasks but also how other tasks maintain their knowledge. In practice, we achieve the above inductive bias by allowing different tasks to communicate by passing both hidden variables and gradients explicitly. Experimentally, we evaluate the proposed methods on three groups of tasks and two types of settings (in-task and out-of-task). Quantitative and qualitative results show their effectiveness.

Computation Scheduling for Distributed Machine Learning with Straggling Workers

We study the scheduling of computation tasks across n workers in a large-scale distributed learning problem. Computation speeds of the workers are assumed to be heterogeneous and unknown to the master, and redundant computations are assigned to workers in order to tolerate straggling workers. We consider sequential computation and instantaneous communication from each worker to the master, and each computation round, which can model a single iteration of the stochastic gradient descent algorithm, is completed once the master receives k distinct computations from the workers. Our goal is to characterize the average completion time as a function of the computation load, which denotes the portion of the dataset available at each worker. We propose two computation scheduling schemes that specify the computation tasks assigned to each worker, as well as their computation schedule, i.e., the order of execution, and derive the corresponding average completion time in closed form. We also establish a lower bound on the minimum average completion time. Numerical results show a significant reduction in the average completion time over existing coded computing schemes, which are designed to mitigate straggling servers but often ignore computations of non-persistent stragglers, as well as over uncoded computing schemes. Furthermore, it is shown numerically that when the speeds of different workers are relatively skewed, the gap between the upper and lower bounds is relatively small. The reduction in the average completion time is obtained at the expense of increased communication from the workers to the master. We study the resulting trade-off by comparing the average number of distinct computations sent from the workers to the master for each scheme, defined as the communication load.
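
The setting described above can be explored with a small Monte Carlo sketch: each worker computes its assigned tasks sequentially, results reach the master instantaneously, and a round ends once k distinct computations have arrived. The cyclic task assignment, the exponential task times, and the uniform range for worker rates are assumptions chosen for illustration and are not taken from the abstract. The function names are hypothetical.

```python
import random


def cyclic_schedule(n, load):
    """Worker i computes tasks i, i+1, ..., i+load-1 (mod n), in that order."""
    return [[(i + j) % n for j in range(load)] for i in range(n)]


def completion_time(schedule, k, task_times):
    """Time at which the master first holds k distinct computations.

    task_times[i][j] is the time worker i spends on its j-th scheduled task.
    Workers compute sequentially and communication is instantaneous.
    """
    arrivals = []
    for order, times in zip(schedule, task_times):
        clock = 0.0
        for task, t in zip(order, times):
            clock += t
            arrivals.append((clock, task))
    arrivals.sort()
    seen = set()
    for clock, task in arrivals:
        seen.add(task)
        if len(seen) == k:
            return clock
    return float("inf")  # fewer than k distinct tasks were ever scheduled


def average_completion_time(n, load, k, trials=2000, seed=0):
    """Monte Carlo average of the completion time under the assumed model."""
    rng = random.Random(seed)
    schedule = cyclic_schedule(n, load)
    total = 0.0
    for _ in range(trials):
        # Assumed heterogeneity: each worker draws its own rate per trial.
        rates = [rng.uniform(0.5, 2.0) for _ in range(n)]
        times = [[rng.expovariate(r) for _ in range(load)] for r in rates]
        total += completion_time(schedule, k, times)
    return total / trials
```

Calling `average_completion_time(n, load, k)` for several values of `load` gives an empirical view of how redundancy trades extra computation per worker against waiting on slow workers.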

Bayesian Model Search for Nonstationary Periodic Time Series

We propose a novel Bayesian methodology for analyzing nonstationary time series that exhibit oscillatory behaviour. We approximate the time series using a piecewise oscillatory model with unknown periodicities, where our goal is to estimate the change-points while simultaneously identifying the potentially changing periodicities in the data. Our proposed methodology is based on a trans-dimensional Markov chain Monte Carlo (MCMC) algorithm that simultaneously updates the change-points and the periodicities relevant to any segment between them. We show that the proposed methodology successfully identifies time changing oscillatory behaviour in two applications which are relevant to e-Health and sleep research, namely the occurrence of ultradian oscillations in human skin temperature during the time of night rest, and the detection of instances of sleep apnea in plethysmographic respiratory traces.

Improving Stock Movement Prediction with Adversarial Training
A Scalable, Flexible Augmentation of the Student Education Process
On the ability of discontinuous Galerkin methods to simulate under-resolved turbulent flows
Resonant Inductive Coupling as a Potential Means for Wireless Power Transfer to Printed Spiral Coil
On Fractional Annealing Process
Non-data-aided SNR Estimation for QPSK Modulation in AWGN Channel
Estimating the Number of Sources: An Efficient Maximization Approach
Triad-NVM: Persistent-Security for Integrity-Protected and Encrypted Non-Volatile Memories (NVMs)
Data models for service failure prediction in supply-chain networks
Deep multi-survey classification of variable stars
On simultaneous conjugation of permutations
OS Scheduling Algorithms for Improving the Performance of Multithreaded Workloads
Mechanism Design for Social Good
Digital holographic particle volume reconstruction using a deep neural network
Controllability and maximum matchings of complex networks
Atomic Characterizations of Weak Martingale Musielak–Orlicz Hardy Spaces and Their Applications
A Method for Robust Online Classification using Dictionary Learning: Development and Assessment for Monitoring Manual Material Handling Activities Using Wearable Sensors
Health Monitoring of Critical Power System Equipments using Identifying Codes
Highly accurate acoustic scattering: Isogeometric Analysis coupled with local high order Farfield Expansion ABC
A jamming transition from under- to over-parametrization affects loss landscape and generalization
Spectral operators of matrices: semismoothness and characterizations of the generalized Jacobian
Single Image Haze Removal using a Generative Adversarial Network
Scaling Up Cartesian Genetic Programming through Preferential Selection of Larger Solutions
Diagnostic Accuracy of Content Based Dermatoscopic Image Retrieval with Deep Classification Features
Hierarchical multi-class segmentation of glioma images using networks with multi-level activation function
A Central Limit Theorem for the stochastic heat equation
Two view constraints on the epipoles from few correspondences
Comparing Two Approaches in Heteroscedastic Regression Models
A Comparative Study of Fruit Detection and Counting Methods for Yield Mapping in Apple Orchards
A Family of Statistical Divergences Based on Quasiarithmetic Means
Automatically Detecting Self-Reported Birth Defect Outcomes on Twitter for Large-scale Epidemiological Research
Monitoring & Mitigation of Delayed Voltage Recovery using μPMU Measurements with Reduced Distribution System Model
Monitoring Long Term Voltage Instability due to Distribution & Transmission Interaction using Unbalanced μPMU & PMU Measurements
Universal origin of boson peak vibrational anomalies in ordered crystals and in amorphous materials
Adversarial Risk Bounds for Binary Classification via Function Transformation
Multivariate stable distributions and their applications for modelling cryptocurrency-returns
Selection of BJI configuration: Approach based on minimal transversals
Non-equilibrium Fluctuations of Interacting Particle Systems
A Weakly Supervised Approach for Estimating Spatial Density Functions from High-Resolution Satellite Imagery
Recovery, detection and confidence sets of communities in a sparse stochastic block model
Enhanced Representative Days and System States Modeling for Energy Storage Investment Analysis
On local time at time varying curve
Introducing Curvature to the Label Space
A switch convergence for a small perturbation of a linear recurrence equation
Average group effect of strongly correlated predictor variables is estimable
Learning Probabilistic Trajectory Models of Aircraft in Terminal Airspace from Position Data
Perturbation Bounds for Procrustes, Classical Scaling, and Trilateration, with Applications to Manifold Learning
Secondary voltage control for microgrids using nonlinear multiple models adaptive control with unmodeled dynamics
Bioresorbable Scaffold Visualization in IVOCT Images Using CNNs and Weakly Supervised Localization
Two-path 3D CNNs for calibration of system parameters for OCT-based motion compensation
Martingale theory for housekeeping heat
The Lives of Bots
MiME: Multilevel Medical Embedding of Electronic Health Records for Predictive Healthcare
Biomedical Document Clustering and Visualization based on the Concepts of Diseases
Explainable artificial intelligence (XAI), the goodness criteria and the grasp-ability test
Zero temperature limit for the Brownian directed polymer among Poissonian disasters
Malleability of complex networks
Neural Transition-based Syntactic Linearization
How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval
Sparse DNNs with Improved Adversarial Robustness
Deep Neural Ranking for Crowdsourced Geopolitical Event Forecasting
A Neural Compositional Paradigm for Image Captioning
Point-cloud-based place recognition using CNN feature extraction
One Bit Matters: Understanding Adversarial Examples as the Abuse of Redundancy
Hierarchical Approaches for Reinforcement Learning in Parameterized Action Space
Face Recognition from Sequential Sparse 3D data via Deep Registration
Large scale visual place recognition with sub-linear storage growth
Two-way Function Computation
Consistency of the total least squares estimator in the linear errors-in-variables regression
Cliquet option pricing in a jump-diffusion Lévy model
Capacity Degradation with Modeling Hardware Impairment in Large Intelligent Surface
Action-Agnostic Human Pose Forecasting
A Local Limit Theorem for Robbins-Monro Procedure
On a bound of the absolute constant in the Berry–Esseen inequality for i.i.d. Bernoulli random variables
Unsupervised Features Extraction for Binary Similarity Using Graph Embedding Neural Networks
Direct experimental determination of critical disorder in one-dimensional weakly disordered photonic crystals
Challenges of Convex Quadratic Bi-objective Benchmark Problems
The Key Player Problem in Complex Oscillator Networks and Electric Power Grids: Resistance Centralities Identify Local Vulnerabilities
The Interpretation of Linear Prediction by Interpolation Framework and Two General Constructive Methods
Semi-supervised acoustic model training for speech with code-switching
Consistency-aware Shading Orders Selective Fusion for Intrinsic Image Decomposition
On the difference-to-sum power ratio of speech and wind noise based on the Corcos model
Finding Appropriate Traffic Regulations via Graph Convolutional Networks
Color naming guided intrinsic image decomposition
On the tree cover number and the positive semidefinite maximum nullity of a graph
Design Challenges of Multi-UAV Systems in Cyber-Physical Applications: A Comprehensive Survey, and Future Directions
A Generalization of Smillie’s Theorem on Strongly Cooperative Tridiagonal Systems
OCAPIS: R package for Ordinal Classification And Preprocessing In Scala
Domain Adaptive Segmentation in Volume Electron Microscopy Imaging
Convolutional Neural Network Pruning to Accelerate Membrane Segmentation in Electron Microscopy
More on rainbow disconnection in graphs
Bayesian Deconvolution of Scanning Electron Microscopy Images Using Point-spread Function Estimation and Non-local Regularization
A generalization of Noel-Reed-Wu Theorem to signed graphs
On the Secrecy Unicast Throughput Performance of NOMA Assisted Multicast-Unicast Streaming With Partial Channel Information
On PAC-Bayesian Bounds for Random Forests
Objective Bayesian Comparison of Order-Constrained Models in Contingency Tables
Analysis of Atomistic Representations Using Weighted Skip-Connections
Estimation of Spatial-Temporal Gait Parameters based on the Fusion of Inertial and Film-Pressure Signals
Adaptation Bounds for Confidence Bands under Self-Similarity
Monochromatic combinatorial lines of length three
High Performance Computing with FPGAs and OpenCL
Neural Network Models for Natural Language Inference Fail to Capture the Semantics of Inference
Visual Semantic Re-ranker for Text Spotting
Optimal Analysis of Discrete-time Affine Systems
Algebraic Localization from Power-Law Interactions in Disordered Quantum Wires
Heuristics-based Query Reordering for Federated Queries in SPARQL 1.1 and SPARQL-LD
A Social Network Analysis of Articles on Social Network Analysis
Fixing Match-Fixing
A method to search for long duration gravitational wave transients from isolated neutron stars using the generalized FrequencyHough
SING: Symbol-to-Instrument Neural Generator
Characteristic Functionals of Dirichlet Measures
A proof of the Shepp-Olkin entropy monotonicity conjecture
On the bilinear control of the Gross-Pitaevskii equation
Operational Methods in the Study of Sobolev-Jacobi Polynomials
Heading in the right direction? Using head moves to traverse phylogenetic network space
Expression Recognition Using the Periocular Region: A Feasibility Study
Improving Automated Latent Fingerprint Identification using Extended Minutia Types
A Community Microgrid Architecture with an Internal Local Market
Action and intention recognition of pedestrians in urban traffic
PreCo: A Large-scale Dataset in Preschool Vocabulary for Coreference Resolution
Multivariate Locally Stationary Wavelet Process Analysis with the mvLSW R Package
Fruit and Vegetable Identification Using Machine Learning for Retail Applications
Hybrid Beamforming With Sub-arrayed MIMO Radar: Enabling Joint Sensing and Communication at mmWave Band
Learning Optimal Scheduling Policy for Remote State Estimation under Uncertain Channel Condition
Self-Erasing Network for Integral Object Attention
Capacitated Assortment Optimization with Pricing under the Paired Combinatorial Logit Model
LincoSim: a web based HPC-cloud platform for automatic virtual towing tank analysis
On orthogonal symmetric chain decompositions
Persistence exponents via perturbation theory: AR(1)-processes
Non-convex approach to binary compressed sensing
Scaling of the Sasamoto-Spohn model in equilibrium
Perfect Codes for Generalized Deletions from Minuscule Elements of Weyl Groups
A predictive processing model of perception and action for self-other distinction
Empirical Regularized Optimal Transport: Statistical Theory and Applications
Term structure modeling for multiple curves with stochastic discontinuities
Positional strategies in games of best choice
Linear Receivers in Non-stationary Massive MIMO Channels with Visibility Regions
Asymptotic Theory of Bayes Factor for Nonparametric Model and Variable Selection in the Gaussian Process Framework
Impedance/Admittance Modeling of Three-Phase AC Systems: A General Framework
Efficient Bayesian Experimental Design for Implicit Models
Connectivity of inhomogeneous random K-out graphs
Learning Classical Planning Strategies with Policy Gradient
Expansion of coset graphs of PSL_2(F_p)
Machine Learning Accelerated Likelihood-Free Event Reconstruction in Dark Matter Direct Detection
Random Bernstein-Markov factors
Dynamics of Order Parameters of Non-stoquastic Hamiltonians in the Adaptive Quantum Monte Carlo Method
Interpretable LSTMs For Whole-Brain Neuroimaging Analyses
Object-oriented lexical encoding of multiword expressions: Short and sweet
Stepwise Acquisition of Dialogue Act Through Human-Robot Interaction
GhostVLAD for set-based face recognition
Agent-Based Modeling and Simulation of Connected and Automated Vehicles Using Game Engine: A Cooperative On-Ramp Merging Study
Sharply $k$-arc-transitive-digraphs: finite and infinite examples
Social Status and Communication Behavior in an Evolving Social Network
Using Deep Learning for price prediction by exploiting stationary limit order book features
Efficient Eligibility Traces for Deep Reinforcement Learning
Algorithmic Traversals of Infinite Graphs
Learning First-to-Spike Policies for Neuromorphic Control Using Policy Gradients
A Systematic Framework and Characterization of Influence-Based Network Centrality
Automated Reasoning in Normative Detachment Structures with Ideal Conditions
Deep Graph Convolutional Encoders for Structured Data to Text Generation