Nonparametric Neural Networks

Automatically determining the optimal size of a neural network for a given task without prior information currently requires an expensive global search and training many networks from scratch. In this paper, we address the problem of automatically finding a good network size during a single training cycle. We introduce *nonparametric neural networks*, a non-probabilistic framework for conducting optimization over all possible network sizes and prove its soundness when network growth is limited via an L_p penalty. We train networks under this framework by continuously adding new units while eliminating redundant units via an L_2 penalty. We employ a novel optimization algorithm, which we term *adaptive radial-angular gradient descent* or *AdaRad*, and obtain promising results.

AI2-THOR: An Interactive 3D Environment for Visual AI

We introduce The House Of inteRactions (THOR), a framework for visual AI research, available online. AI2-THOR consists of near photo-realistic 3D indoor scenes in which AI agents can navigate and interact with objects to perform tasks. AI2-THOR enables research in many different domains including but not limited to deep reinforcement learning, imitation learning, learning by interaction, planning, visual question answering, unsupervised representation learning, object detection and segmentation, and learning models of cognition. The goal of AI2-THOR is to facilitate building visually intelligent models and push the research forward in this domain.

Graph-Sparse Logistic Regression

We introduce Graph-Sparse Logistic Regression, a new algorithm for classification for the case in which the support should be sparse but connected on a graph. We validate this algorithm against synthetic data and benchmark it against L1-regularized Logistic Regression. We then explore our technique in the bioinformatics context of proteomics data on the interactome graph. We make all our experimental code public and provide GSLR as an open source package.

Inverse Reinforce Learning with Nonparametric Behavior Clustering

Inverse Reinforcement Learning (IRL) is the task of learning a single reward function given a Markov Decision Process (MDP) without a defined reward function and a set of demonstrations generated by humans/experts. However, in practice, it may be unreasonable to assume that human behaviors can be explained by one reward function, since they may be inherently inconsistent. Also, demonstrations may be collected from various users and aggregated to infer and predict users’ behaviors. In this paper, we introduce the Non-parametric Behavior Clustering IRL algorithm to simultaneously cluster demonstrations and learn multiple reward functions from demonstrations that may be generated from more than one behavior. Our method is iterative: it alternates between clustering demonstrations into different behavior clusters and inverse learning the reward functions until convergence. It is built upon the Expectation-Maximization formulation and non-parametric clustering in the IRL setting. Further, to improve computational efficiency, we remove the need to completely solve multiple IRL problems for multiple clusters during the iteration steps and introduce a resampling technique to avoid generating too many unlikely clusters. We demonstrate the convergence and efficiency of the proposed method by learning multiple driver behaviors from demonstrations generated in a grid-world environment and from continuous trajectories collected from autonomous robot cars using the Gazebo robot simulator.

A Workload-Specific Memory Capacity Configuration Approach for In-Memory Data Analytic Platforms

We propose WSMC, a workload-specific memory capacity configuration approach for Spark workloads, which guides users in configuring memory capacity by accurately predicting a workload’s memory requirement under various input data sizes and parameter settings. First, WSMC classifies in-memory computing workloads into four categories according to their Data Expansion Ratio. Second, WSMC establishes a memory requirement prediction model that considers the input data size, the shuffle data size, the parallelism of the workload, and the data block size. Finally, for each workload category, WSMC calculates the shuffle data size in the prediction model in a workload-specific way. For an ad-hoc workload, WSMC can profile its Data Expansion Ratio with small-sized input data and decide which category the workload falls into. Users can then determine an accurate configuration in accordance with the corresponding memory requirement prediction. Through comprehensive evaluations with SparkBench workloads, we found that, compared with the default configuration, a configuration guided by WSMC saves over 40% of memory capacity with only a slight degradation in workload performance (only 5%). Compared with the proper configuration found manually, a configuration guided by WSMC leads to only a 7% increase in memory waste with a slight improvement in workload performance (about 1%).

Soft Seeded SSL Graphs for Unsupervised Semantic Similarity-based Retrieval

Semantic similarity based retrieval is playing an increasingly important role in many IR systems such as modern web search, question-answering, similar document retrieval etc. Improvements in retrieval of semantically similar content are very significant to applications like Quora, Stack Overflow, Siri etc. We propose a novel unsupervised model for semantic similarity based content retrieval, where we construct semantic flow graphs for each query, and introduce the concept of ‘soft seeding’ in graph based semi-supervised learning (SSL) to convert this into an unsupervised model. We demonstrate the effectiveness of our model on an equivalent question retrieval problem on the Stack Exchange QA dataset, where our unsupervised approach significantly outperforms the state-of-the-art unsupervised models, and produces comparable results to the best supervised models. Our research provides a method to tackle semantic similarity based retrieval without any training data, and allows seamless extension to different domain QA communities, as well as to other semantic equivalence tasks.

BT-Nets: Simplifying Deep Neural Networks via Block Term Decomposition

Recently, deep neural networks (DNNs) have been regarded as the state-of-the-art classification methods in a wide range of applications, especially in image classification. Despite this success, the huge number of parameters blocks their deployment in settings with limited computing resources. Researchers exploit the redundancy in the weights of DNNs and attempt to find how few parameters can be used while preserving accuracy. Although several promising results have been shown along this research line, most existing methods either fail to significantly compress a well-trained deep network or require a heavy fine-tuning process for the compressed network to regain the original performance. In this paper, we propose Block Term networks (BT-nets), in which the commonly used fully-connected layers (FC-layers) are replaced with block term layers (BT-layers). In BT-layers, the inputs and the outputs are reshaped into two low-dimensional high-order tensors, then block-term decomposition is applied as tensor operators to connect them. We conduct extensive experiments on benchmark datasets to demonstrate that BT-layers can achieve a very large compression ratio on the number of parameters while preserving the representation power of the original FC-layers as much as possible. Specifically, we obtain higher performance while requiring fewer parameters compared with the tensor train method.

Sockeye: A Toolkit for Neural Machine Translation

We describe Sockeye (version 1.12), an open-source sequence-to-sequence toolkit for Neural Machine Translation (NMT). Sockeye is a production-ready framework for training and applying models as well as an experimental platform for researchers. Written in Python and built on MXNet, the toolkit offers scalable training and inference for the three most prominent encoder-decoder architectures: attentional recurrent neural networks, self-attentional transformers, and fully convolutional networks. Sockeye also supports a wide range of optimizers, normalization and regularization techniques, and inference improvements from current NMT literature. Users can easily run standard training recipes, explore different model settings, and incorporate new ideas. In this paper, we highlight Sockeye’s features and benchmark it against other NMT toolkits on two language arcs from the 2017 Conference on Machine Translation (WMT): English-German and Latvian-English. We report competitive BLEU scores across all three architectures, including an overall best score for Sockeye’s transformer implementation. To facilitate further comparison, we release all system outputs and training scripts used in our experiments. The Sockeye toolkit is free software released under the Apache 2.0 license.

Lightweight Neural Networks

Most of the weights in a Lightweight Neural Network have a value of zero, while the remaining ones are either +1 or -1. These universal approximators require approximately 1.1 bits/weight of storage, possess a quick forward pass, and achieve classification accuracies similar to conventional continuous-weight networks. Their training regimen focuses on error reduction initially, but later emphasizes discretization of weights. They ignore insignificant inputs, remove unnecessary weights, and drop unneeded hidden neurons. We have successfully tested them on the MNIST, credit card fraud, and credit card defaults data sets using networks having 2 to 16 hidden layers and up to 4.4 million weights.

A Theoretical Framework for Bayesian Nonparametric Regression: Orthonormal Random Series and Rates of Contraction

We develop a unifying framework for Bayesian nonparametric regression to study the rates of contraction with respect to the integrated L_2-distance without assuming the regression function space to be uniformly bounded. The framework is built upon orthonormal random series in a flexible manner. A general theorem for deriving rates of contraction for Bayesian nonparametric regression is provided under the proposed framework. As specific applications, we obtain the near-parametric rate of contraction for the squared-exponential Gaussian process when the true function is analytic, the adaptive rates of contraction for the sieve prior, and the adaptive-and-exact rates of contraction for the un-modified block prior when the true function is \alpha-smooth. Extensions to wavelet series priors and fixed-design regression problems are also discussed.

Multiple Changepoint Estimation in High-Dimensional Gaussian Graphical Models

We consider the consistency properties of a regularised estimator for the simultaneous identification of both changepoints and graphical dependency structure in multivariate time-series. Traditionally, estimation of Gaussian Graphical Models (GGM) is performed in an i.i.d. setting. More recently, such models have been extended to allow for changes in the distribution, but only where changepoints are known a priori. In this work, we study the Group-Fused Graphical Lasso (GFGL), which penalises partial correlations with an L1 penalty while simultaneously inducing block-wise smoothness over time to detect multiple changepoints. We present a proof of consistency for the estimator, both in terms of changepoints and the structure of the graphical models in each segment.

Realistic Traffic Generation for Web Robots

Critical to evaluating the capacity, scalability, and availability of web systems are realistic web traffic generators. Although web traffic generation is a classic research problem, no generator accounts for the characteristics of web robots or crawlers, which are now the dominant source of traffic to a web server. Administrators are thus unable to test, stress, and evaluate how their systems perform in the face of ever increasing levels of web robot traffic. To resolve this problem, this paper introduces a novel approach to generate synthetic web robot traffic with high fidelity. It generates traffic that accounts for both the temporal and behavioral qualities of robot traffic by statistical and Bayesian models that are fitted to the properties of robot traffic seen in web logs from North America and Europe. We evaluate our traffic generator by comparing the characteristics of generated traffic to those of the original data. We look at session arrival rates, inter-arrival times and session lengths, comparing and contrasting them between generated and real traffic. Finally, we show that our generated traffic affects cache performance similarly to actual traffic, using the common LRU and LFU eviction policies.

Unifying Sparsest Cut, Cluster Deletion, and Modularity Clustering Objectives with Correlation Clustering

We present and analyze a new framework for graph clustering based on a specially weighted version of correlation clustering, that unifies several existing objectives and satisfies a number of attractive theoretical properties. Our framework, which we call LambdaCC, relies on a single resolution parameter \lambda, which implicitly controls both the edge density and sparsest cut of all output clusters. We prove that our new clustering objective interpolates between the cluster deletion problem and the minimum sparsest cut problem as we vary \lambda, and is also closely related to the well-studied maximum modularity objective. We provide several algorithms for optimizing our new objective, including a 5-approximation for the case where \lambda \geq 1/2, and also the first constant factor approximation algorithm for the NP-hard cluster deletion problem. We demonstrate the effectiveness of our framework and algorithms in finding communities in several real-world networks.

A Berkeley View of Systems Challenges for AI

With the increasing commoditization of computer vision, speech recognition and machine translation systems and the widespread deployment of learning-based back-end technologies such as digital advertising and intelligent infrastructures, AI (Artificial Intelligence) has moved from research labs to production. These changes have been made possible by unprecedented levels of data and computation, by methodological advances in machine learning, by innovations in systems software and architectures, and by the broad accessibility of these technologies. The next generation of AI systems promises to accelerate these developments and increasingly impact our lives via frequent interactions and by making (often mission-critical) decisions on our behalf, often in highly personalized contexts. Realizing this promise, however, raises daunting challenges. In particular, we need AI systems that make timely and safe decisions in unpredictable environments, that are robust against sophisticated adversaries, and that can process ever increasing amounts of data across organizations and individuals without compromising confidentiality. These challenges will be exacerbated by the end of Moore’s Law, which will constrain the amount of data these technologies can store and process. In this paper, we propose several open research directions in systems, architectures, and security that can address these challenges and help unlock AI’s potential to improve lives and society.

Bubble-Flip—A New Generation Algorithm for Prefix Normal Words

We present a new recursive generation algorithm for prefix normal words. These are binary strings with the property that no substring has more 1s than the prefix of the same length. The new algorithm uses two operations on binary strings, which exploit certain properties of prefix normal words in a smart way. We introduce infinite prefix normal words and show that one of the operations used by the algorithm, if applied repeatedly to extend the string, produces an ultimately periodic infinite word, which is prefix normal and whose period’s length and density we can predict from the original word.
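
The defining property stated above can be checked directly. The following is a minimal brute-force sketch, not the Bubble-Flip algorithm itself; the function names `is_prefix_normal` and `prefix_normal_words` are hypothetical and chosen only for illustration.

```python
from itertools import product


def is_prefix_normal(word: str) -> bool:
    """Return True if no substring of `word` has more 1s than the prefix of the same length."""
    n = len(word)
    # prefix_ones[k] holds the number of 1s in word[:k].
    prefix_ones = [0]
    for ch in word:
        prefix_ones.append(prefix_ones[-1] + (ch == "1"))
    for length in range(1, n + 1):
        # Largest number of 1s found in any substring of this length.
        max_ones = max(
            prefix_ones[i + length] - prefix_ones[i] for i in range(n - length + 1)
        )
        if max_ones > prefix_ones[length]:
            return False
    return True


def prefix_normal_words(n: int) -> list:
    """Enumerate all prefix normal words of length n by exhaustive search (exponential time)."""
    candidates = ("".join(bits) for bits in product("01", repeat=n))
    return [w for w in candidates if is_prefix_normal(w)]
```

For example, `1101` is prefix normal, while `1011` is not, because its substring `11` has two 1s and its length-2 prefix `10` has only one.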

Ray: A Distributed Framework for Emerging AI Applications

The next generation of AI applications will continuously interact with the environment and learn from these interactions. These applications impose new and demanding systems requirements, both in terms of performance and flexibility. In this paper, we consider these requirements and present Ray—a distributed system to address them. Ray implements a dynamic task graph computation model that supports both the task-parallel and the actor programming models. To meet the performance requirements of AI applications, we propose an architecture that logically centralizes the system’s control state using a sharded storage system and a novel bottom-up distributed scheduler. In our experiments, we demonstrate sub-millisecond remote task latencies and linear throughput scaling beyond 1.8 million tasks per second. We empirically validate that Ray speeds up challenging benchmarks and serves as both a natural and performant fit for an emerging class of reinforcement learning applications and algorithms.

NSML: A Machine Learning Platform That Enables You to Focus on Your Models

Machine learning libraries such as TensorFlow and PyTorch simplify model implementation. However, researchers are still required to perform a non-trivial amount of manual tasks such as GPU allocation, training status tracking, and comparison of models with different hyperparameter settings. We propose a system to handle these tasks and help researchers focus on models. We present the requirements of the system based on a collection of discussions from an online study group comprising 25k members. These include automatic GPU allocation, learning status visualization, handling model parameter snapshots as well as hyperparameter modification during learning, and comparison of performance metrics between models via a leaderboard. We describe the system architecture that fulfills these requirements and present a proof-of-concept implementation, NAVER Smart Machine Learning (NSML). We test the system and confirm substantial efficiency improvements for model development.

A Stochastic Programming Approach for Risk Management in Mobile Cloud Computing

The development of mobile cloud computing has brought many benefits to mobile users as well as cloud service providers. However, mobile cloud computing is facing some challenges, especially security-related problems due to the growing number of cyberattacks which can cause serious losses. In this paper, we propose a dynamic framework together with advanced risk management strategies to minimize losses caused by cyberattacks to a cloud service provider. In particular, this framework allows the cloud service provider to select appropriate security solutions, e.g., security software/hardware implementation and insurance policies, to deal with different types of attacks. Furthermore, the stochastic programming approach is adopted to minimize the expected total loss for the cloud service provider under its financial capability and uncertainty of attacks and their potential losses. Through numerical evaluation, we show that our approach is an effective tool in not only dealing with cyberattacks under uncertainty, but also minimizing the total loss for the cloud service provider given its available budget.

Train Once, Test Anywhere: Zero-Shot Learning for Text Classification

Zero-shot learners are models capable of predicting unseen classes. In this work, we propose a zero-shot learning approach for text categorization. Our method involves training a model on a large corpus of sentences to learn the relationship between a sentence and the embedding of the sentence’s tags. Learning this relationship lets the model generalize to unseen sentences, tags, and even new datasets, provided they can be put into the same embedding space. The model learns to predict whether a given sentence is related to a tag or not, unlike other classifiers that learn to classify the sentence as one of the possible classes. We propose three different neural networks for the task and report their accuracy on the test set of the dataset used for training them, as well as on two other standard datasets for which no retraining was done. We show that our models generalize well across new unseen classes in both cases. Although the models do not achieve the accuracy of state-of-the-art supervised models, they are evidently a step forward towards general intelligence in natural language processing.

Wasserstein Distributional Robustness and Regularization in Statistical Learning

A central question in statistical learning is to design algorithms that not only perform well on training data, but also generalize to new and unseen data. In this paper, we tackle this question by formulating a distributionally robust stochastic optimization (DRSO) problem, which seeks a solution that minimizes the worst-case expected loss over a family of distributions that are close to the empirical distribution in Wasserstein distances. We establish a connection between such Wasserstein DRSO and regularization. More precisely, we identify a broad class of loss functions, for which the Wasserstein DRSO is asymptotically equivalent to a regularization problem with a gradient-norm penalty. Such relation provides new interpretations for problems involving regularization, including a great number of statistical learning problems and discrete choice models (e.g. multinomial logit). The connection suggests a principled way to regularize high-dimensional, non-convex problems. This is demonstrated through two applications: the training of Wasserstein generative adversarial networks (WGANs) in deep learning, and learning heterogeneous consumer preferences with mixed logit choice model.
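
In symbols, the worst-case problem described above can be written as follows. The notation is an assumption made for illustration and is not fixed by the abstract: \theta is the decision variable, \ell the loss, \hat{P}_n the empirical distribution of n samples, W_p the Wasserstein distance of order p, and \rho the radius of the ambiguity set.

```latex
\min_{\theta \in \Theta} \;
\sup_{Q \,:\, W_p(Q, \hat{P}_n) \le \rho} \;
\mathbb{E}_{\xi \sim Q}\big[ \ell(\theta; \xi) \big]
```

Setting \rho = 0 leaves only Q = \hat{P}_n in the inner supremum, which recovers ordinary empirical risk minimization.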

TensorFlow-Serving: Flexible, High-Performance ML Serving

We describe TensorFlow-Serving, a system to serve machine learning models inside Google which is also available in the cloud and via open-source. It is extremely flexible in terms of the types of ML platforms it supports, and ways to integrate with systems that convey new models and updated versions from training to serving. At the same time, the core code paths around model lookup and inference have been carefully optimized to avoid performance pitfalls observed in naive implementations. Google uses it in many production deployments, including a multi-tenant model hosting service called TFS.

Causal Inference: A Missing Data Perspective

Inferring causal effects of treatments is a central goal in many disciplines. The potential outcomes framework is a main statistical approach to causal inference, in which a causal effect is defined as a comparison of the potential outcomes of the same units under different treatment conditions. Because for each unit at most one of the potential outcomes is observed and the rest are missing, causal inference is inherently a missing data problem. Indeed, there is a close analogy in the terminology and the inferential framework between causal inference and missing data. Despite the intrinsic connection between the two subjects, statistical analyses of causal inference and missing data also have marked differences in aims, settings and methods. This article provides a systematic review of causal inference from the missing data perspective. Focusing on ignorable treatment assignment mechanisms, we discuss a wide range of causal inference methods that have analogues in missing data analysis, such as imputation, inverse probability weighting and doubly-robust methods. Under each of the three modes of inference–Frequentist, Bayesian, and Fisherian randomization–we present the general structure of inference for both finite-sample and super-population estimands, and illustrate via specific examples. We identify open questions to motivate more research to bridge the two fields.

Deep Neural Networks as 0-1 Mixed Integer Linear Programs: A Feasibility Study

Deep Neural Networks (DNNs) are very popular these days, and are the subject of a very intense investigation. A DNN is made by layers of internal units (or neurons), each of which computes an affine combination of the output of the units in the previous layer, applies a nonlinear operator, and outputs the corresponding value (also known as activation). A commonly-used nonlinear operator is the so-called rectified linear unit (ReLU), whose output is just the maximum between its input value and zero. In this (and other similar cases like max pooling, where the max operation involves more than one input value), one can model the DNN as a 0-1 Mixed Integer Linear Program (0-1 MILP) where the continuous variables correspond to the output values of each unit, and a binary variable is associated with each ReLU to model its yes/no nature. In this paper we discuss the peculiarity of this kind of 0-1 MILP models, and describe an effective bound-tightening technique intended to ease its solution. We also present possible applications of the 0-1 MILP model arising in feature visualization and in the construction of adversarial examples. Preliminary computational results are reported, aimed at investigating (on small DNNs) the computational performance of a state-of-the-art MILP solver when applied to a known test case, namely, hand-written digit recognition.
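
To make the role of the binary variable concrete, here is a minimal sketch of one common big-M style encoding of a single ReLU. It is an assumption for illustration and may differ from the exact constraints used in the paper. The pre-activation `a` is split as `x - s = a` with `x, s >= 0`, and a binary `z` forces one of the two to zero; `big_m` is a hypothetical bound on `|a|`.

```python
def relu_via_binary(a: float, big_m: float) -> list:
    """Enumerate z in {0, 1} and return every feasible (x, s, z) for the constraints

        x - s = a,   0 <= x <= big_m * (1 - z),   0 <= s <= big_m * z.

    In this sketch z = 0 means the unit is active (s = 0, so x = a),
    and z = 1 means it is inactive (x = 0, so s = -a).
    """
    solutions = []
    for z in (0, 1):
        if z == 0:
            x, s = a, 0.0   # s is forced to zero by s <= big_m * 0
        else:
            x, s = 0.0, -a  # x is forced to zero by x <= big_m * 0
        if 0 <= x <= big_m * (1 - z) and 0 <= s <= big_m * z:
            solutions.append((x, s, z))
    return solutions
```

Every feasible assignment has `x == max(a, 0)`, so the linear constraints plus one binary variable reproduce the ReLU output whenever `|a| <= big_m`. Tighter values of `big_m` give a tighter linear relaxation, which is why bound tightening matters for such models.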

A Survey on Multi-View Clustering

With the fast development of information technology, especially the popularization of the internet, multi-view learning has become more and more popular in the machine learning and data mining fields. Multi-view semi-supervised learning methods, such as co-training and co-regularization, have gained considerable attention. Although multi-view clustering (MVC) has recently developed rapidly, there is no survey or review that summarizes and analyzes the current progress. Therefore, this paper sums up the common strategies for combining multiple views and, based on them, proposes a novel taxonomy of MVC approaches. We also discuss the relationships between MVC and multi-view representation, ensemble clustering, multi-task clustering, multi-view supervised learning, and multi-view semi-supervised learning. Several representative real-world applications are elaborated. To promote the further development of MVC, we point out several open problems that are worth exploring in the future.

A Shapelet Transform for Multivariate Time Series Classification

Shapelets are phase independent subsequences designed for time series classification. We propose three adaptations to the Shapelet Transform (ST) to capture multivariate features in multivariate time series classification. We create a unified set of data to benchmark our work on, and compare with three other algorithms. We demonstrate that multivariate shapelets are not significantly worse than other state-of-the-art algorithms.

Does modelling need a Reformation? Ideas for a new grammar of modelling

The quality of mathematical modelling is looked at from the perspective of science’s own quality control arrangement and recent crises. It is argued that the crisis in the quality of modelling is at least as serious as that which has come to light in fields such as medicine, economics, psychology, and nutrition. In the context of the nascent sociology of quantification, the linkages between big data, algorithms, mathematical and statistical modelling (use and misuse of p-values) are evident. Looking at existing proposals for best practices the suggestion is put forward that the field needs a thorough Reformation, leading to a new grammar for modelling. Quantitative methodologies such as uncertainty and sensitivity analysis can form the bedrock on which the new grammar is built, while incorporating important normative and ethical elements. To this effect we introduce sensitivity auditing, quantitative storytelling, and ethics of quantification.

Different approaches to community detection

A precise definition of what constitutes a community in networks has remained elusive. Consequently, network scientists have compared community detection algorithms on benchmark networks with a particular form of community structure and classified them based on the mathematical techniques they employ. However, this comparison can be misleading because apparent similarities in their mathematical machinery can disguise different reasons for why we would want to employ community detection in the first place. Here we provide a focused review of these different motivations that underpin community detection. This problem-driven classification is useful in applied network science, where it is important to select an appropriate algorithm for the given purpose. Moreover, highlighting the different approaches to community detection also delineates the many lines of research and points out open directions and avenues for future research.

Dynamic Weight Alignment for Convolutional Neural Networks

In this paper, we propose a method of improving Convolutional Neural Networks (CNN) by determining the optimal alignment of weights and inputs using dynamic programming. Conventional CNNs convolve learnable shared weights, or filters, across the input data. The filters use a linear matching of weights to inputs using an inner product between the filter and a window of the input. However, a better alignment of weights to inputs may exist. Thus, we propose the use of Dynamic Time Warping (DTW) to dynamically align the weights to optimized input elements. This dynamic alignment is useful for time series recognition due to the complexities of temporal relations and temporal distortions. We demonstrate the effectiveness of the proposed architecture on the Unipen online handwritten digit and character datasets, the UCI Spoken Arabic Digit dataset, and the UCI Activities of Daily Life dataset.
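
For readers unfamiliar with DTW, the following is a minimal sketch of the classical dynamic-programming recurrence between two numeric sequences. It illustrates only the alignment idea, not the paper's weight-alignment layer; the function name `dtw_distance` and the absolute-difference local cost are assumptions.

```python
def dtw_distance(a, b) -> float:
    """Classical DTW cost between sequences a and b with |a_i - b_j| as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because one element may be matched to several elements of the other sequence, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero even though the sequences differ in length, which a fixed element-by-element (linear) matching cannot express.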

The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning

Stochastic Gradient Descent (SGD) with small mini-batch is a key component in modern large-scale machine learning. However, its efficiency has not been easy to analyze as most theoretical results require adaptive rates and show convergence rates far slower than that for gradient descent, making computational comparisons difficult. In this paper we aim to clarify the issue of fast SGD convergence. The key observation is that most modern architectures are over-parametrized and are trained to interpolate the data by driving the empirical loss (classification and regression) close to zero. While it is still unclear why these interpolated solutions perform well on test data, these regimes allow for very fast convergence of SGD, comparable in the number of iterations to gradient descent. Specifically, consider the setting with quadratic objective function, or near a minimum, where the quadratic term is dominant. We show that: (1) Mini-batch size 1 with constant step size is optimal in terms of computations to achieve a given error. (2) There is a critical mini-batch size such that: (a. linear scaling) SGD iteration with mini-batch size m smaller than the critical size is nearly equivalent to m iterations of mini-batch size 1. (b. saturation) SGD iteration with mini-batch larger than the critical size is nearly equivalent to a gradient descent step. The critical mini-batch size can be viewed as the limit for effective mini-batch parallelization. It is also nearly independent of the data size, implying O(n) acceleration over GD per unit of computation. We give experimental evidence on real data, with the results closely following our theoretical analyses. Finally, we show how the interpolation perspective and our results fit with recent developments in training deep neural networks and discuss connections to adaptive rates for SGD and variance reduction.
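
The interpolation regime can be illustrated on a toy problem. The sketch below is an assumption-laden illustration, not the paper's experiments: it builds a synthetic noise-free least-squares problem with more parameters than samples, so an exact fit exists, and runs SGD with mini-batch size 1 and a constant step size. The function name, the problem sizes, and the step-size choice (one over the largest squared sample norm) are all hypothetical.

```python
import random


def sgd_interpolation_demo(n=20, d=50, steps=5000, seed=0):
    """Run batch-size-1 SGD with constant step on an over-parametrized least-squares problem.

    Returns (initial_loss, final_loss) of the empirical squared loss.
    """
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    w_true = [rng.gauss(0.0, 1.0) for _ in range(d)]
    # Noise-free labels: some weight vector fits every sample exactly (interpolation).
    y = [sum(xi * wi for xi, wi in zip(x, w_true)) for x in X]

    def predict(w, x):
        return sum(xi * wi for xi, wi in zip(x, w))

    def loss(w):
        return sum((predict(w, x) - t) ** 2 for x, t in zip(X, y)) / (2 * n)

    # Constant step size; this particular choice is an assumption of the sketch.
    step = 1.0 / max(sum(v * v for v in x) for x in X)

    w = [0.0] * d
    initial = loss(w)
    for _ in range(steps):
        i = rng.randrange(n)            # mini-batch of size 1
        err = predict(w, X[i]) - y[i]   # residual on the sampled point
        w = [wi - step * err * xi for wi, xi in zip(w, X[i])]
    return initial, loss(w)
```

Since every sample can be fitted exactly, the stochastic gradient vanishes at the solution, so the constant step size does not need to decay for the loss to keep shrinking.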

On the Relationship Between the OpenAI Evolution Strategy and Stochastic Gradient Descent

Because stochastic gradient descent (SGD) has shown promise in optimizing neural networks with millions of parameters and few if any alternatives are known to exist, it has moved to the heart of leading approaches to reinforcement learning (RL). For that reason, the recent result from OpenAI showing that a particular kind of evolution strategy (ES) can rival the performance of SGD-based deep RL methods with large neural networks provoked surprise. This result is difficult to interpret in part because of the lingering ambiguity on how ES actually relates to SGD. The aim of this paper is to significantly reduce this ambiguity through a series of MNIST-based experiments designed to uncover their relationship. As a simple supervised problem without domain noise (unlike in most RL), MNIST makes it possible (1) to measure the correlation between gradients computed by ES and SGD and (2) then to develop an SGD-based proxy that accurately predicts the performance of different ES population sizes. These innovations give a new level of insight into the real capabilities of ES, and also lead to some unconventional means for applying ES to supervised problems that shed further light on its differences from SGD. Incorporating these lessons, the paper concludes by demonstrating that ES can achieve 99% accuracy on MNIST, a number higher than any previously published result for any evolutionary method. While not by any means suggesting that ES should substitute for SGD in supervised learning, the suite of experiments herein enables more informed decisions on the application of ES within RL and other paradigms.
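
The first measurement mentioned above, the correlation between an ES gradient estimate and the analytic gradient, can be sketched on a toy objective. This is only an illustration under assumptions: a simple quadratic stands in for the MNIST loss, the estimator uses Gaussian perturbations with mirrored (antithetic) pairs, and the names `es_gradient` and `cosine` as well as the values of `sigma` and the population sizes are hypothetical.

```python
import math
import random


def loss(theta):
    """Toy quadratic objective standing in for a training loss (assumption)."""
    return sum((k + 1) * t * t for k, t in enumerate(theta))


def true_gradient(theta):
    """Analytic gradient of the toy objective."""
    return [2 * (k + 1) * t for k, t in enumerate(theta)]


def es_gradient(theta, pairs, sigma, rng):
    """ES-style gradient estimate from `pairs` mirrored Gaussian perturbations."""
    d = len(theta)
    estimate = [0.0] * d
    for _ in range(pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
        plus = loss([t + sigma * e for t, e in zip(theta, eps)])
        minus = loss([t - sigma * e for t, e in zip(theta, eps)])
        scale = (plus - minus) / (2.0 * sigma * pairs)
        for k in range(d):
            estimate[k] += scale * eps[k]
    return estimate


def cosine(a, b):
    """Cosine similarity between two vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den


rng = random.Random(0)
theta = [1.0] * 10
grad = true_gradient(theta)
cos_small = cosine(es_gradient(theta, pairs=5, sigma=0.1, rng=rng), grad)
cos_large = cosine(es_gradient(theta, pairs=2000, sigma=0.1, rng=rng), grad)
print(f"cosine with 5 pairs: {cos_small:.3f}, with 2000 pairs: {cos_large:.3f}")
```

In this sketch the estimate is an average of random directions weighted by the loss difference they produce, so its agreement with the analytic gradient depends on the population size, which is the quantity the abstract's SGD-based proxy is meant to predict.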

DANCin SEQ2SEQ: Fooling Text Classifiers with Adversarial Text Example Generation
On exponential domination of the consecutive circulant graph
Achievability Performance Bounds for Integer-Forcing Source Coding
Stochastic Particle Gradient Descent for Infinite Ensembles
RAN4IQA: Restorative Adversarial Nets for No-Reference Image Quality Assessment
Statistical Inference for SPDEs: an overview
Online Submodular Welfare Maximization: Greedy Beats 1/2 in Random Order
Experimental design trade-offs for gene regulatory network inference: an in silico study of the yeast Saccharomyces cerevisiae cell cycle
Honey from the Hives: A Theoretical and Computational Exploration of Combinatorial Hives
Assessment Voting in Large Electorates
The universal $\mathfrak{sl}_2$ weight system and the Kreweras triangle
Sophisticated Attacks on Decoy Ballots: The Devil’s Menu and the Market for Lemons
Visual Based Navigation of Mobile Robots
Learning when to skim and when to read
Counting Solutions of a Polynomial System Locally and Exactly
Rate-optimal estimation of p-dimensional linear functionals in a sparse Gaussian model
What Can This Robot Do? Learning from Appearance and Experiments
Ordered field property for zero-sum stochastic games
Ergodicity of some classes of cellular automata subject to noise
Permutation Modules associated to the Face Complex of the Hyperoctahedron and Group Actions
Data Clustering using a Hybrid of Fuzzy C-Means and Quantum-behaved Particle Swarm Optimization
New Algorithms for Unordered Tree Inclusion
Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning
A nonparametric copula approach to conditional Value-at-Risk
Soft modes and strain redistribution in continuous models of amorphous plasticity: the Eshelby paradigm, and beyond?
Rectilinear Crossings in Complete Balanced d-Partite d-Uniform Hypergraphs
Anscombe’s Model for Sequential Clinical Trials Revisited
Safe Policy Search with Gaussian Process Models
CoDraw: Visual Dialog for Collaborative Drawing
Study on a Poisson’s Equation Solver Based On Deep Learning Technique
Magic squares with all subsquares of possible orders based on extended Langford sequences
The edge-Hosoya polynomial of benzenoid chains
A survey on the lace expansion for the nearest-neighbor models on the BCC lattice
Inplane anisotropy of longitudinal thermal conductivities and weak localization of magnons in a disordered spiral magnet
Gradients explode – Deep Networks are shallow – ResNet explained
Invariant Synthesis for Incomplete Verification Engines
Transfer Learning for OCRopus Model Training on Early Printed Books
Score estimation in the monotone single index model
Influence of the SIPG penalisation on the numerical properties of linear systems for elastic wave propagation
A note on doubly nonlinear SPDEs with singular drift in divergence form
A Novel Approach for Effective Learning in Low Resourced Scenarios
Fast Hough Transform and approximation properties of dyadic patterns
Avoiding Echo-Responses in a Retrieval-Based Conversation System
Sparse principal component analysis via random projections
The two periodic Aztec diamond and matrix valued orthogonal polynomials
Signed counts of real simple rational functions
Automated Image Analysis Framework for the High-Throughput Determination of Grapevine Berry Sizes Using Conditional Random Fields
Pre-training Attention Mechanisms
Catalyst Acceleration for First-order Convex Optimization: from Theory to Practice
Proof of a conjecture on induced subgraphs of Ramsey graphs
Approximate controllability of the Jaynes-Cummings dynamics
Constructive Matrix Theory for Higher Order Interaction
On the Sample Complexity of Multichannel Frequency Estimation via Convex Optimization
Risk Sensitive Portfolio Optimization with Regime-Switching
Heterogeneous update mechanisms in evolutionary games: mixing innovative and imitative dynamics
Oracle inequalities for the stochastic differential equations
Connecting phase transition theory with unsupervised learning
Automated Selection of Post-Strata using a Model-Assisted Regression Tree Estimator
Random forward models and log-likelihoods in Bayesian inverse problems
Perfect Prediction in Normal Form: Superrational Thinking Extended to Non-Symmetric Games
Coverage Analysis and Load Balancing in HetNets with mmWave Multi-RAT Small Cells
Low SNR Asymptotic Rates of Vector Channels with One-Bit Outputs
On the global convergence of a randomly perturbed dissipative nonlinear oscillator
Information Processing by Networks of Quantum Decision Makers
Alternation, Sparsity and Sensitivity : Bounds and Exponential Gaps
Approximating stationary distributions of fast mixing Glauber dynamics, with applications to exponential random graphs
On the minimal ranks of matrix pencils and the existence of a best approximate block-term tensor decomposition
Stein’s Method for Stationary Distributions of Markov Chains and Application to Ising Models
Modeling Individual Cyclic Variation in Human Behavior
Understanding Career Progression in Baseball Through Machine Learning
Unsupervised Domain Adaptation for 3D Keypoint Prediction from a Single Depth Scan
On W[1]-Hardness as Evidence for Intractability
Fast algorithms for fitting L$_1$-penalized multivariate linear models to structured high-throughput data
Semantic Visual Localization
Sheep Soliton
Well-posedness of stochastic porous media equations with nonlinear, conservative noise
Multi-Attribute Robust Component Analysis for Facial UV Maps
Sentiment Predictability for Stocks
Deep Burst Denoising
Impossibility of deducing preferences and rationality from human policy
mmWave Massive MIMO with Simple RF and Appropriate DSP
Optimal top dag compression
Efficient Principally Stratified Treatment Effect Estimation in Crossover Studies with Absorbent Binary Endpoints
Mapping the world population one building at a time
Modeling recurrent event times subject to right-censoring with D-vine copulas
Hierarchical Text Generation and Planning for Strategic Dialogue
Efficient Global Monitoring Statistics for High-Dimensional Data
Resistance distance in bent linear 2-trees
WACSF – Weighted Atom-Centered Symmetry Functions as Descriptors in Machine Learning Potentials
Improved Target Acquisition Rates with Feedback Codes
A novel nonconvex approach to recover the low-tubal-rank tensor data: when t-SVD meets PSSV
Compact Linearization for Binary Quadratic Problems subject to Linear Equations
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
An MPI-Based Python Framework for Distributed Training with Keras
Hierarchical Bayesian Bradley-Terry for Applications in Major League Baseball
Approximability of the Six-vertex Model
Morphology dictates a robot’s ability to ground crowd-proposed language
On reproduction of On the regularization of Wasserstein GANs
Resistance distance in straight linear 2-trees
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
On Stochastic Shell Models of Turbulence
Low Rank Matrix Recovery for Joint Array Self-Calibration and Sparse Model DoA Estimation
Uncertainty in Cyber Security Investments
Mitigating Asymmetric Nonlinear Weight Update Effects in Hardware Neural Network based on Analog Resistive Synapse
Impression Network for Video Object Detection
NegBio: a high-performance tool for negation and uncertainty detection in radiology reports
Automatic Music Highlight Extraction using Convolutional Recurrent Attention Networks
Parallel Markov Chain Monte Carlo for Bayesian Hierarchical Models with Big Data, in Two Stages
Self-organization and the Maximum Empower Principle in the Framework of max-plus Algebra
An efficient Hessian based algorithm for solving large-scale sparse group Lasso problems
Controlled Singular Volterra Integral Equations and Pontryagin Maximum Principle
Cyberattack Detection in Mobile Cloud Computing: A Deep Learning Approach
Statistical inference for Vasicek-type model driven by Hermite processes
SRPGAN: Perceptual Generative Adversarial Network for Single Image Super Resolution
A Machine Learning Framework for Resource Allocation Assisted by Cloud Computing
NDT: Neual Decision Tree Towards Fully Functioned Neural Graph
Stress-dependent electrical transport and its universal scaling in granular materials
An Artificial Neural Network Architecture Based on Context Transformations in Cortical Minicolumns
Overview of the Wikidata Vandalism Detection Task at WSDM Cup 2017
Degrees of Freedom of Interference Networks with Transmitter-Side Caches
Ultrametricity increases the predictability of cultural dynamics
Learning a Virtual Codec Based on Deep Convolutional Neural Network to Compress Image
Priority Rules on ATN (PRT) Intersections
Joint Data-Aided Carrier Frequency Offset, Phase Offset, Amplitude and SNR Estimation for Millimeter-Wave MIMO Systems
Using Machine Learning to Enhance Vehicles Traffic in ATN (PRT) Systems
Graph partitioning using matrix differential equations
Taming Wild High Dimensional Text Data with a Fuzzy Lash
Characterizing Political Fake News in Twitter by its Meta-Data
Generalized Gelation Theory describes Human Online Aggregation in support of Extremism
How well does your sampler really work?
Population polarization dynamics and next-generation social media algorithms
StackInsights: Cognitive Learning for Hybrid Cloud Readiness
Guaranteed error control bounds for the stabilised space-time IgA approximations to parabolic problems
Predicting the dissolution kinetics of silicate glasses using machine learning
An ILP Solver for Multi-label MRFS with Connectivity Constraints
Bendable Cuboid Robot Path Planning with Collision Avoidance using Generalized $L_p$ Norms
Log-correlated Random Energy Models with extensive free energy fluctuations: pathologies caused by rare events as signatures of phase transitions
A Spectral Approach for the Design of Experiments: Design, Analysis and Algorithms
Positive Opetopes with Contractions form a Test Category
Universal Intermediate Gradient Method for Convex Problems with Inexact Oracle
The proximal point method revisited
Syndrome decoding of Reed-Muller codes and tensor decomposition over finite fields
Microbial community structure predicted by the stable marriage problem
Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization
MEDRoP: Memory-Efficient Dynamic Robust PCA
Computing Optimal Control of Cascading Failure in DC Networks
A proof of Tomescu’s graph coloring conjecture
Self-adaptation of Genetic Operators Through Genetic Programming Techniques
A MapReduce-based rotation forest classifier for epileptic seizure prediction
Benford’s Law and First Letter of Word
Using Deep learning methods for generation of a personalized list of shuffled songs
Reflexive polytopes arising from edge polytopes
Spatial As Deep: Spatial CNN for Traffic Scene Understanding
Deep Learning for Distant Speech Recognition
‘Zero-Shot’ Super-Resolution using Deep Internal Learning
Towards the 1G of Mobile Power Network: RF, Signal and System Designs to Make Smart Objects Autonomous
Opposition diagrams for automorphisms of large spherical buildings
Deep Learning in RF Sub-sampled B-mode Ultrasound Imaging
Local Dimension is Unbounded for Planar Posets
Query-Based Abstractive Summarization Using Neural Networks
Root geometry of polynomial sequences III: Type $(1,1)$ with positive coefficients
Railway Track Specific Traffic Signal Selection Using Deep Learning
Machine Learning and Integral Equations
Learning a Single Convolutional Super-Resolution Network for Multiple Degradations
Tverberg’s theorem is 50 years old: a survey
Hypothesis Testing for High-Dimensional Multinomials: A Selective Review
Probabilistic Spacetimes
Distributed SMC-PHD Fusion for Partial, Arithmetic Average Consensus
Super-sparse Learning in Similarity Spaces
Dynamic Boltzmann Machines for Second Order Moments and Generalized Gaussian Distributions
A new and five older Concurrent Memory Reclamation Schemes in Comparison (Stamp-it)
Cyclotomic shuffles
Cuts in matchings of 3-edge-connected cubic graphs
clcNet: Improving the Efficiency of Convolutional Neural Network using Channel Local Convolutions
Combination of analysis techniques for efficient track reconstruction in high multiplicity events
Generating and designing DNA with deep generative models
On a correction of a property of $GC$ sets
Optimal control of nonlinear elliptic problems with sparsity
A Survey of Concentration Inequalities for U-Statistics
Towards a science of human stories: using sentiment analysis and emotional arcs to understand the building blocks of complex social systems
Logical laws for short existential monadic second order sentences about graphs
Asymmetric access to information impacts the power-law exponent in networks
Cameron-Liebler sets of generators in finite classical polar spaces
Towards a Deep Reinforcement Learning Approach for Tower Line Wars
Structured Optimal Transport
Continious-time Importance Sampling: Monte Carlo Methods which Avoid Time-discretisation Error
Attenuation Correction for Brain PET imaging using Deep Neural Network based on Dixon and ZTE MR images
Probabilistic Semantic Retrieval for Surveillance Videos with Activity Graphs
Path-Based Spectral Clustering: Guarantees, Robustness to Outliers, and Fast Algorithms
On the Placement Delivery Array Design for Coded Caching Scheme in D2D Networks
Predicting Individual Physiologically Acceptable States for Discharge from a Pediatric Intensive Care Unit
Amenable cones: error bounds without constraint qualifications
Error-Tolerant Big Data Processing
Visual Explanations from Hadamard Product in Multimodal Deep Networks
Panoramic Robust PCA for Foreground-Background Separation on Noisy, Free-Motion Camera Video
Testing Sparsity-Inducing Penalties
Index Modulation for 5G: Striving to Do More with Less
Optimal Pricing of User-Initiated Data-Plan Sharing in a Roaming Market
Quantum Algorithms for Boolean Equation Solving and Quantum Algebraic Attack on Cryptosystems
Misspecified Nonconvex Statistical Optimization for Phase Retrieval
Incentive Mechanism Design for Wireless Energy Harvesting-Based Internet of Things
Crack detection in beam structures with a novel Laplace based Wavelet Finite Element method
Deep Neural Generative Model of Functional MRI Images for Psychiatric Disorder Diagnosis
Automated flow for compressing convolution neural networks for efficient edge-computation with FPGA
Low Resourced Machine Translation via Morpho-syntactic Modeling: The Case of Dialectal Arabic
A martingale view of Blackwell’s renewal theorem and its extensions to a general counting process
Model Reduction in Chemical Reaction Networks: A Data-Driven Sparse-Learning Approach
A Bridge Between Hyperparameter Optimization and Larning-to-learn
Solving rough differential equations with the theory of the regularity structures
A Chinese Dataset with Negative Full Forms for General Abbreviation Prediction
Gaussian representation of a class of Riesz probability distributions
Visual Explanation by Interpretation: Improving Visual Feedback Capabilities of Deep Neural Networks
Mixed Moore Cayley graphs
FPT-algorithms for some problems related to integer programming
LSTM Pose Machines
Spatial-Temporal Memory Networks for Video Object Detection
The stability and rapid exponential stabilization of heat equation in non-cylindrical domain
Space-Filling Curve Indices as Acceleration Structure for Exemplar-Based Inpainting
On One Problem in Multichannel Signal Detection
Selective-Candidate Framework with Similarity Selection Rule for Evolutionary Optimization
Language and Noise Transfer in Speech Enhancement Generative Adversarial Network
Squeezed Convolutional Variational AutoEncoder for Unsupervised Anomaly Detection in Edge Device Industrial Internet of Things
Understanding Quantum Algorithms via Query Complexity
Periodicity of Grover walks on generalized Bethe trees
On convergence of infinite matrix products with alternating factors from two sets of matrices
The Saga of KPR: Theoretical and Experimental developments
A generalized Ihara zeta function formula for a simple graph with bounded degree
‘Indifference’ methods for managing agent rewards
Heat conservation for generalized Dirac Laplacians on manifolds with boundary
Convex drawings of the complete graph: topology meets geometry
Short Packets over Block-Memoryless Fading Channels: Pilot-Assisted or Noncoherent Transmission?
On the Effectiveness of Least Squares Generative Adversarial Networks
Graph Transform Learning for Image Compression
Bounds for the Graham-Pollak Theorem for Hypergraphs
Automatic Classification of Functional Gait Disorders
The connected component of the partial duplication graph
Critical neural networks with short and long term plasticity
Detecting Hate Speech in Social Media
Control energy scaling in temporal networks
Automatic segmentation method of pelvic floor levator hiatus in ultrasound using a self-normalising neural network
Meeting Energy-Efficient and QoS Requirements of 5G Using D2D Communications
Super-Resolution with Deep Adaptive Image Resampling
A Power and Prediction Analysis for Knockoffs with Lasso Statistics
Multi-modal Face Pose Estimation with Multi-task Manifold Deep Learning
The Power of Vertex Sparsifiers in Dynamic Graph Algorithms
Order of the variance in the discrete Hammersley process with boundaries
(Wireless) Scheduling, Graph Classes, and $c$-Colorable Subgraphs
On monopoly and dynamic monopoly of Cartesian product of graphs with constant thresholds
Invincible Strategies of Iterated Prisoner’s Dilemma
Guiding human gaze with convolutional neural networks
HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA
Rigidity for the Hopf algebra of quasi-symmetric functions
A note on semi-groups of stochastic gradient descent and online principal component analysis
POD for optimal control of the Cahn-Hilliard system using spatially adapted snapshots
Deep generative models of genetic variation capture mutation effects
Non-criticality criteria for Abelian sandpile models with sources and sinks
Dependence structures – estimation and visualization using distance multivariance
Nonparametric Inference for Auto-Encoding Variational Bayes
Size-Independent Sample Complexity of Neural Networks
The geometry of random minimal factorizations of a long cycle
Enumerating the states of the twist knot
Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents
Safe Mutations for Deep and Recurrent Neural Networks through Output Gradients
Multi-point Vibration Measurement for Mode Identification of Bridge Structures using Video-based Motion Magnification
Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning
ES Is More Than Just a Traditional Finite-Difference Approximator
Combinatorics of chemical reaction systems
Parallel Complexity of Forward and Backward Propagation
Path loss, beamforming gain and time dynamics measurements at 28 GHz for 90% indoor coverage
Sum-Rate Analysis for High Altitude Platform (HAP) Drones with Tethered Balloon Relay
End-to-end Recovery of Human Shape and Pose
Third-order Smoothness Helps: Even Faster Stochastic Optimization Algorithms for Finding Local Minima