ReNN: Rule-embedded Neural Networks

Artificial neural networks are powerful at inference, but they are still criticized for their lack of interpretability and their need for large datasets. This paper proposes the Rule-embedded Neural Network (ReNN) to overcome these shortcomings. ReNN first makes local-based inferences to detect local patterns, and then uses rules based on domain knowledge about the local patterns to generate a rule-modulated map. After that, ReNN makes global-based inferences that synthesize the local patterns and the rule-modulated map. To solve the optimization problem caused by the rules, we use a two-stage optimization strategy to train the ReNN model. By introducing rules into ReNN, we can strengthen traditional neural networks with long-term dependencies that are difficult to learn from a limited empirical dataset, thus improving inference accuracy. The complexity of the neural network can be reduced since long-term dependencies are not modeled with neural connections, and thus the amount of data needed to optimize it can be reduced. In addition, inferences from ReNN can be analyzed in terms of both local patterns and rules, and are thus more interpretable. In this paper, ReNN is validated on a time-series detection problem.

Change point analysis in non-stationary processes – a mass excess approach

This paper considers the problem of testing if a sequence of means $(\mu_t)_{t=1,\ldots,n}$ of a non-stationary time series $(X_t)_{t=1,\ldots,n}$ is stable in the sense that the difference of the means $\mu_1$ and $\mu_t$ between the initial time $t=1$ and any other time is smaller than a given level, that is, $|\mu_1 - \mu_t| \leq c$ for all $t=1,\ldots,n$. A test for hypotheses of this type is developed using a bias-corrected monotone rearranged local linear estimator, and asymptotic normality of the corresponding test statistic is established. As the asymptotic variance depends on the location and order of the critical roots of the equation $|\mu_1 - \mu_t| = c$, a new bootstrap procedure is proposed to obtain critical values and its consistency is established. As a consequence, we are able to quantitatively describe relevant deviations of a non-stationary sequence from its initial value. The results are illustrated by means of a simulation study and by analyzing data examples.

Anomaly detection in wide area network mesh using two machine learning anomaly detection algorithms

Anomaly detection is the practice of identifying items or events that do not conform to an expected behavior or do not correlate with other items in a dataset. It has previously been applied to areas such as intrusion detection, system health monitoring, and fraud detection in credit card transactions. In this paper, we describe a new method for detecting anomalous behavior over network performance data, gathered by perfSONAR, using two machine learning algorithms: Boosted Decision Trees (BDT) and Simple Feedforward Neural Network. The effectiveness of each algorithm was evaluated and compared. Both have shown sufficient performance and sensitivity.

Parameter Hub: High Performance Parameter Servers for Efficient Distributed Deep Neural Network Training

Most work in the deep learning systems community has focused on faster inference, but arriving at a trained model requires lengthy experiments. Accelerating training lets developers iterate faster and come up with better models. DNN training is often seen as a compute-bound problem, best done in a single large compute node with many GPUs. As DNNs get bigger, training requires going distributed. Distributed deep neural network (DDNN) training constitutes an important workload on the cloud. Larger DNN models and faster compute engines shift the training performance bottleneck from computation to communication. Our experiments show that existing DNN training frameworks do not scale in a typical cloud environment due to insufficient bandwidth and inefficient parameter server software stacks. We propose PHub, a high performance parameter server (PS) software design that provides an optimized network stack and a streamlined gradient processing pipeline to benefit common PS setups, and PBox, a balanced, scalable central PS hardware that fully utilizes PHub capabilities. We show that in a typical cloud environment, PHub can achieve up to 3.8x speedup over state-of-the-art designs when training ImageNet. We discuss future directions of integrating PHub with programmable switches for in-network aggregation during training, leveraging the datacenter network topology to reduce bandwidth usage and localize data movement.

Sampling techniques for big data analysis in finite population inference

In analyzing big data for finite population inference, it is critical to adjust for the selection bias in the big data. In this paper, we propose two methods of reducing the selection bias associated with the big data sample. The first method uses a version of inverse sampling by incorporating auxiliary information from external sources, and the second one borrows the idea of data integration by combining the big data sample with an independent probability sample. Two simulation studies show that the proposed methods are unbiased and have better coverage rates than their alternatives. In addition, the proposed methods are easy to implement in practice.

Discrete Autoencoders for Sequence Models

Recurrent models for sequences have recently been successful at many tasks, especially language modeling and machine translation. Nevertheless, it remains challenging to extract good representations from these models. For instance, even though language has a clear hierarchical structure going from characters through words to sentences, it is not apparent in current language models. We propose to improve the representation in sequence models by augmenting current approaches with an autoencoder that is forced to compress the sequence through an intermediate discrete latent space. In order to propagate gradients through this discrete representation, we introduce an improved semantic hashing technique. We show that this technique performs well on a newly proposed quantitative efficiency measure. We also analyze latent codes produced by the model, showing how they correspond to words and phrases. Finally, we present an application of the autoencoder-augmented model to generating diverse translations.

Transformation Autoregressive Networks

The fundamental task of general density estimation has been of keen interest to machine learning. Recent advances in density estimation have either: a) proposed a flexible model to estimate the conditional factors of the chain rule, p(x_{i}\, |\, x_{i-1}, \ldots); or b) used flexible, non-linear transformations of variables of a simple base distribution. Instead, this work jointly leverages transformations of variables and autoregressive conditional models, and proposes novel methods for both. We provide a deeper understanding of our methods, showing a considerable improvement through a comprehensive study over both real world and synthetic data. Moreover, we illustrate the use of our models in outlier detection and image modeling tasks.
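As a reading aid, the two ingredients named in the abstract can be written out with textbook identities. The notation below ($d$ for the dimension, $f$ for an invertible transformation, $z = f(x)$) is an assumption introduced here for illustration and is not taken from the paper, and the combined form is only one plausible way to read "jointly leverages".

```latex
% (a) Autoregressive factorization via the chain rule
p(x) = \prod_{i=1}^{d} p(x_i \mid x_{i-1}, \ldots, x_1)

% (b) Change of variables for an invertible map z = f(x)
p_X(x) = p_Z\big(f(x)\big)\,\left|\det \frac{\partial f(x)}{\partial x}\right|

% Combining both: an autoregressive model over the transformed variables
p_X(x) = \left|\det \frac{\partial f(x)}{\partial x}\right|
         \prod_{i=1}^{d} p(z_i \mid z_{i-1}, \ldots, z_1), \qquad z = f(x)
```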

Algorithms for the Greater Good! On Mental Modeling and Acceptable Symbiosis in Human-AI Collaboration

Effective collaboration between humans and AI-based systems requires effective modeling of the human in the loop, in terms of both the mental state and the physical capabilities of the latter. However, these models can also open up pathways for manipulating and exploiting the human in the hopes of achieving some greater good, especially when the intent or values of the AI and the human are not aligned or when they have an asymmetrical relationship with respect to knowledge or computation power. In fact, such behavior does not necessarily require any malicious intent but can rather be born out of cooperative scenarios. It also goes beyond simple misinterpretation of intents, as in the case of value alignment problems, and thus can be effectively engineered if desired. Such techniques already exist and pose several unresolved ethical and moral questions with regard to the design of autonomy. In this paper, we illustrate some of these issues in a teaming scenario and investigate how they are perceived by participants in a thought experiment.

A State-of-the-Art of Semantic Change Computation

This paper reviews the state of the art of semantic change computation, an emerging research field in computational linguistics, and proposes a framework that summarizes the literature by identifying and expounding five essential components in the field: diachronic corpus, diachronic word sense characterization, change modelling, evaluation data and data visualization. Despite the potential of the field, the review shows that current studies are mainly focused on testing hypotheses proposed in theoretical linguistics and that several core issues remain to be solved: the need for diachronic corpora of languages other than English, the need for comprehensive evaluation data, the comparison and construction of approaches to diachronic word sense characterization and change modelling, and further exploration of data visualization techniques for hypothesis justification.

An Attention-Based Word-Level Interaction Model: Relation Detection for Knowledge Base Question Answering

Relation detection plays a crucial role in Knowledge Base Question Answering (KBQA) because of the high variance of relation expression in the question. Traditional deep learning methods follow an encoding-comparing paradigm, where the question and the candidate relation are represented as vectors to compare their semantic similarity. The max- or average-pooling operation, which compresses the sequence of words into a fixed-dimensional vector, becomes the information bottleneck. In this paper, we propose to learn attention-based word-level interactions between questions and relations to alleviate the bottleneck issue. As in the traditional models, the question and relation are first represented as sequences of vectors. Then, instead of merging the sequence into a single vector with a pooling operation, soft alignments between words from the question and the relation are learned. The aligned words are subsequently compared with a convolutional neural network (CNN), and the comparison results are finally merged. By performing the comparison on low-level representations, the attention-based word-level interaction model (ABWIM) relieves the information loss caused by merging the sequence into a fixed-dimensional vector before the comparison. The experimental results of relation detection on both the SimpleQuestions and WebQuestions datasets show that ABWIM achieves state-of-the-art accuracy, demonstrating its effectiveness.
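The soft-alignment step can be illustrated with a minimal sketch. The function name `soft_align`, the use of plain dot-product scores and the toy vectors are assumptions made here for illustration; the abstract does not specify the scoring function, and the CNN comparison stage is not shown.

```python
import math

def soft_align(question, relation):
    """For each question word vector, return a softmax-weighted
    average of the relation word vectors (a soft alignment).

    Scores are plain dot products, which is an assumption of this sketch.
    """
    dim = len(relation[0])
    aligned = []
    for q in question:
        # Similarity of this question word to every relation word.
        scores = [sum(a * b for a, b in zip(q, r)) for r in relation]
        # Numerically stable softmax over the relation words.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of relation vectors: the aligned representation.
        aligned.append(
            [sum(w * r[d] for w, r in zip(weights, relation)) for d in range(dim)]
        )
    return aligned

# Toy example: two question words, two relation words, 2-d embeddings.
question = [[1.0, 0.0], [0.0, 1.0]]
relation = [[5.0, 0.0], [0.0, 5.0]]
aligned = soft_align(question, relation)
```

Each question word keeps its own aligned vector, so no pooling over the whole sequence happens before the comparison.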

COBRA: A Fast and Simple Method for Active Clustering with Pairwise Constraints

Clustering is inherently ill-posed: there often exist multiple valid clusterings of a single dataset, and without any additional information a clustering system has no way of knowing which clustering it should produce. This motivates the use of constraints in clustering, as they allow users to communicate their interests to the clustering system. Active constraint-based clustering algorithms select the most useful constraints to query, aiming to produce a good clustering using as few constraints as possible. We propose COBRA, an active method that first over-clusters the data by running K-means with a K that is intended to be too large, and subsequently merges the resulting small clusters into larger ones based on pairwise constraints. In its merging step, COBRA is able to keep the number of pairwise queries low by maximally exploiting constraint transitivity and entailment. We experimentally show that COBRA outperforms the state of the art in terms of clustering quality and runtime, without requiring the number of clusters in advance.
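The over-cluster-then-merge idea can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function names, the deterministic K-means initialization, the choice of representatives and the query order are all assumptions made here, and `must_link` stands in for a hypothetical user oracle.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm with a simple deterministic init (assumption)."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

def cobra_sketch(points, k, must_link):
    """Over-cluster with K-means, then merge small clusters via pairwise queries."""
    labels, centroids = kmeans(points, k)
    groups = {}
    for i, l in enumerate(labels):
        groups.setdefault(l, []).append(i)
    # One representative per small cluster: the member closest to its centroid.
    reps = {l: min(idx, key=lambda i: math.dist(points[i], centroids[l]))
            for l, idx in groups.items()}
    clusters = [{l} for l in groups]   # each cluster is a set of small-cluster ids
    cannot = set()                     # answered cannot-link pairs
    queries = 0

    def blocked(a, b):
        # Entailment: a cannot-link between any members blocks the whole pair.
        return any(frozenset((x, y)) in cannot for x in a for y in b)

    while True:
        best = None
        for ia in range(len(clusters)):
            for ib in range(ia + 1, len(clusters)):
                a, b = clusters[ia], clusters[ib]
                if blocked(a, b):
                    continue
                d, x, y = min((math.dist(points[reps[x]], points[reps[y]]), x, y)
                              for x in a for y in b)
                if best is None or d < best[0]:
                    best = (d, x, y, ia, ib)
        if best is None:
            break
        _, x, y, ia, ib = best
        queries += 1
        if must_link(reps[x], reps[y]):
            clusters[ia] |= clusters[ib]   # transitivity: merge whole clusters
            del clusters[ib]
        else:
            cannot.add(frozenset((x, y)))

    result = [None] * len(points)
    for cid, cluster in enumerate(clusters):
        for l in cluster:
            for i in groups[l]:
                result[i] = cid
    return result, queries
```

A call such as `cobra_sketch(points, 4, lambda i, j: truth[i] == truth[j])` simulates the user with ground-truth labels `truth`. The number of final clusters is not passed in; it emerges from the answers to the queries.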

Sliding Line Point Regression for Shape Robust Scene Text Detection

Traditional text detection methods mostly focus on quadrangle text. In this study we propose a novel method named sliding line point regression (SLPR) to detect arbitrary-shape text in natural scenes. SLPR regresses multiple points on the edge of a text line and then uses these points to sketch the outline of the text. The proposed SLPR can be adapted to many object detection architectures such as Faster R-CNN and R-FCN. Specifically, we first generate the smallest rectangular box including the text with a region proposal network (RPN), then isometrically regress the points on the edge of the text by using vertically and horizontally sliding lines. To make full use of information and reduce redundancy, we calculate the x-coordinate or y-coordinate of each target point from the rectangular box position, and regress only the remaining y-coordinate or x-coordinate. Accordingly, we can not only reduce the number of system parameters, but also constrain the points, which yields more regular polygons. Our approach achieved competitive results on the traditional ICDAR2015 Incidental Scene Text benchmark and the curve text detection dataset CTW1500.

The Benefits of Population Diversity in Evolutionary Algorithms: A Survey of Rigorous Runtime Analyses

Population diversity is crucial in evolutionary algorithms to enable global exploration and to avoid poor performance due to premature convergence. This book chapter reviews runtime analyses that have shown benefits of population diversity, either through explicit diversity mechanisms or through naturally emerging diversity. These works show that the benefits of diversity are manifold: diversity is important for global exploration and the ability to find several global optima. Diversity enhances crossover and enables crossover to be more effective than mutation. Diversity can be crucial in dynamic optimization, when the problem landscape changes over time. And, finally, it facilitates search for the whole Pareto front in evolutionary multiobjective optimization. The presented analyses rigorously quantify the performance of evolutionary algorithms in the light of population diversity, laying the foundation for a rigorous understanding of how search dynamics are affected by the presence or absence of population diversity and the introduction of diversity mechanisms.

Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence

We study the incremental learning problem for the classification task, a key component in developing life-long learning systems. The main challenges while learning in an incremental manner are to preserve and update the knowledge of the model. In this work, we propose a generalization of Path Integral (Zenke et al., 2017) and EWC (Kirkpatrick et al., 2016) with a theoretically grounded KL-divergence based perspective. We show that, to preserve and update the knowledge, regularizing the model’s likelihood distribution is more intuitive and provides better insights into the problem. To do so, we use KL-divergence as a measure of distance, which is equivalent to computing distance in a Riemannian manifold induced by the Fisher information matrix. Furthermore, to enhance the learning flexibility, the regularization is weighted by a parameter importance score that is calculated along the entire training trajectory. Contrary to forgetting, as the algorithm progresses, the regularized loss makes the network intransigent, resulting in its inability to discriminate new tasks from the old ones. We show that this problem of intransigence can be addressed by storing a small subset of representative samples from previous datasets. In addition, in order to evaluate the performance of an incremental learning algorithm, we introduce two novel metrics to evaluate forgetting and intransigence. Experimental evaluation on incremental versions of the MNIST and CIFAR-100 classification datasets shows that our approach outperforms existing state-of-the-art baselines in all the evaluation metrics.
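The link between KL-divergence and the Fisher information matrix mentioned above is usually stated through the standard second-order expansion below. The symbols ($\theta$ for the parameters, $\Delta\theta$ for a small update, $F_\theta$ for the Fisher matrix) are notation assumed here for illustration, not necessarily the paper's.

```latex
D_{\mathrm{KL}}\big(p_{\theta}(y \mid x) \,\|\, p_{\theta+\Delta\theta}(y \mid x)\big)
  \approx \tfrac{1}{2}\, \Delta\theta^{\top} F_{\theta}\, \Delta\theta,
\qquad
F_{\theta} = \mathbb{E}\big[\nabla_{\theta} \log p_{\theta}(y \mid x)\,
             \nabla_{\theta} \log p_{\theta}(y \mid x)^{\top}\big]
```

Under this approximation, penalizing the KL-divergence between the old and updated models amounts to measuring the parameter change in the metric given by $F_\theta$.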

Links: A High-Dimensional Online Clustering Method

We present a novel algorithm, called Links, designed to perform online clustering on unit vectors in a high-dimensional Euclidean space. The algorithm is appropriate when it is necessary to cluster data efficiently as it streams in, and is to be contrasted with traditional batch clustering algorithms that have access to all data at once. For example, Links has been successfully applied to embedding vectors generated from face images or voice recordings for the purpose of recognizing people, thereby providing real-time identification during video or audio capture.

Analysis of the Continued Logarithm Algorithm

The Continued Logarithm Algorithm (CL for short), introduced by Gosper in 1978, computes the gcd of two integers; it seems very efficient, as it only performs shifts and subtractions. Shallit studied its worst-case complexity in 2016 and showed it to be linear. Here we perform the average-case analysis of the algorithm: we study its main parameters (number of iterations, total number of shifts) and obtain precise asymptotics for their mean values. Our ‘dynamical’ analysis involves the dynamical system underlying the algorithm, which produces continued fraction expansions whose quotients are powers of 2. Even though this CL system has already been studied by Chan (around 2005), the presence of powers of 2 in the quotients ingrains into the central parameters a dyadic flavour that cannot be grasped solely by studying the CL system. We thus introduce a dyadic component and deal with a two-component system. With this new mixed system at hand, we then provide a complete average-case analysis of the CL algorithm, with explicit constants.
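A shift-and-subtract gcd in this spirit can be sketched as below, assuming the commonly stated CL step: write p = 2^a q + r with 0 <= r < 2^a q and replace (p, q) by (2^a q, r). The function name `cl_gcd` and the way powers of two are handled (common factors of 2 stripped first, the odd part taken at the end) are assumptions of this sketch, not details from the paper.

```python
def cl_gcd(p, q):
    """gcd via shifts and subtractions, using a continued-logarithm-style step."""
    p, q = abs(p), abs(q)
    if p == 0 or q == 0:
        return p | q                      # gcd(x, 0) = x
    # Strip the power of two shared by both inputs (assumption of this sketch).
    low = (p | q) & -(p | q)
    shift = low.bit_length() - 1
    p >>= shift
    q >>= shift
    if p < q:
        p, q = q, p
    while q:
        # Largest a with 2^a * q <= p, found from bit lengths.
        a = p.bit_length() - q.bit_length()
        if (q << a) > p:
            a -= 1
        shifted = q << a
        # One CL step: (p, q) -> (2^a q, p - 2^a q).
        p, q = shifted, p - shifted
    # Each step may introduce extra factors of 2; the odd part is preserved.
    odd = p >> ((p & -p).bit_length() - 1)
    return odd << shift
```

The loop terminates because p + q strictly decreases at every step, and no division or multiplication is needed beyond bit shifts.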

Thermal conductivity in 1d: disorder-induced transition from anomalous to normal scaling
Temporally-Biased Sampling for Online Model Management
tempoGAN: A Temporally Coherent, Volumetric GAN for Super-resolution Fluid Flow
Object-based reasoning in VQA
A Generalized Circuit for the Hamiltonian Dynamics Through the Truncated Series
Model selection in sparse high-dimensional vine copula models with application to portfolio risk
A Corpus for Modeling Word Importance in Spoken Dialogue Transcripts
Distributed Model Construction in Radio Interferometric Calibration
Deep Learning based Retinal OCT Segmentation
Denoising Arterial Spin Labeling Cerebral Blood Flow Images Using Deep Learning
Quantum Coarse-Graining, Symmetries and Reducibility of Dynamics
Diffeomorphic registration of discrete geometric distributions
Multicritical point on the de Almeida-Thouless line in spin glasses in $d>6$ dimensions
Bounded Policy Synthesis for POMDPs with Safe-Reachability Objectives
Evaluating approaches for supervised semantic labeling
FEAST Eigensolver for Nonlinear Eigenvalue Problems
Reparametrization of COM-Poisson Regression Models with Applications in the Analysis of Experimental Data
Communication-Efficient Search for an Approximate Closest Lattice Point
Earthmover Resilience and Testing in Ordered Structures
Matrix Completion for Low-Observability Voltage Estimation
Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications
Predicting Rapid Fire Growth (Flashover) Using Conditional Generative Adversarial Networks
Algorithms for the Construction of Incoherent Frames Under Various Design Constraints
The Intriguing Properties of Model Explanations
A distributed-memory approximation algorithm for maximum weight perfect bipartite matching
Personalized Survival Prediction with Contextual Explanation Networks
Subgraph counts for dense random graphs with specified degrees
Spatiotemporal intermittency and localized dynamic fluctuations upon approaching the glass transition
Model-assisted inference for treatment effects using regularized calibrated estimation with high-dimensional data
Learning to Emulate an Expert Projective Cone Scheduler
Object Detection in Videos by Short and Long Range Object Linking
On the global stability of departure time user equilibrium: A Lyapunov approach
Robustness of classification ability of spiking neural networks
Weighted Community Detection and Data Clustering Using Message Passing
Mixture Proportion Estimation for Positive–Unlabeled Learning via Classifier Dimension Reduction
Antenna Selection for Large-Scale MIMO Systems with Low-Resolution ADCs
Open3D: A Modern Library for 3D Data Processing
Over-representation of Extreme Events in Decision-Making: A Rational Metacognitive Account
Sparsity in Max-Plus Algebra and Systems
Cross-type Biomedical Named Entity Recognition with Deep Multi-Task Learning
Structured Memory based Deep Model to Detect as well as Characterize Novel Inputs
Accelerating recurrent neural network language model based online speech recognition system
New characterizations of freeness for hyperplane arrangements
Fast Power system security analysis with Guided Dropout
An infinite family of subcubic graphs with unbounded packing chromatic number
Boundary effect in competition processes
Comparison of robustness of statistical procedures for network structure analysis
Estimation of conditional extreme risk measures from heavy-tailed elliptical random vectors
Contribution of the Extreme Term in the Sum of Samples with Regularly Varying Tail
SIR Coverage Analysis in Cellular Networks with Temporal Traffic: A Stochastic Geometry Approach
Variational and viscosity operators for the evolutive Hamilton-Jacobi equation
Bayesian inverse problems with unknown operators
Pilot study for the COST Action ‘Reassembling the Republic of Letters’: language-driven network analysis of letters from the Hartlib’s Papers
Ito’s Formula for Gaussian Processes with Stochastic Discontinuities
Approximate ground states of the random-field Potts model from graph cuts
Properties of additive functionals of Brownian motion with resetting
A Dynamic Process Interpretation of the Sparse ERGM Reference Model
Input / Output Stability of a Damped String Equation coupled with Ordinary Differential System
E2E-MLT – an Unconstrained End-to-End Method for Multi-Language Scene Text
Analytical modeling and analysis of interleaving on correlated wireless channels
Diagnose like a Radiologist: Attention Guided Convolutional Neural Network for Thorax Disease Classification
The Necklace Process: A Generating Function Approach
PEYMA: A Tagged Corpus for Persian Named Entities
Fast Binary Compressive Sensing via \ell_0 Gradient Descent
Large Deviations in Renewal Theory and Renewal Models of Statistical Mechanics
Playing with universality classes of Barkhausen avalanches
Nonparametric Bayesian volatility estimation
Modeling Influence with Semantics in Social Networks: a Survey
Secure and Robust Identification via Classical-Quantum Channels
Social Event Scheduling
Preparation of Improved Turkish DataSet for Sentiment Analysis in Social Media
Extensions of Erdős-Gallai Theorem and Luo’s Theorem with Applications
Operator Product Expansion in Liouville Field Theory and Seiberg type transitions in log-correlated Random Energy Models
Standard modules, Jones-Wenzl projectors, and the valenced Temperley-Lieb algebra
Benjamini-Schramm convergence of random planar maps
Cardiac Arrhythmia Detection from ECG Combining Convolutional and Long Short-Term Memory Networks
An Iterative Spanning Forest Framework for Superpixel Segmentation
Analysis and optimal control of an intracellular delayed HIV model with CTL immune response
Features, Projections, and Representation Change for Generalized Planning
Uplink and Downlink Transceiver Design for OFDM with Index Modulation in Multi-user Networks
Rigorous Restricted Isometry Property of Low-Dimensional Subspaces
Deep Adversarial Attention Alignment for Unsupervised Domain Adaptation: the Benefit of Target Expectation Maximization
Spectrum of SYK model
A Machine Learning Approach to Quantitative Prosopography
Modelling structure and predicting dynamics of discussion threads in online boards
Performance of Media-based Modulation in Multi-user Networks
Creative Exploration Using Topic Based Bisociative Networks
An SPDE Model for Systemic Risk with Endogenous Contagion
Asymptotic Analysis for Low-Resolution Massive MIMO Systems with MMSE Receiver
Indistinguishable binomial decision tree of 3-SAT: Proof of class P is a proper subset of class NP
TransRev: Modeling Reviews as Translations from Users to Items
Graph limits of random unlabelled $k$-trees
SegDenseNet: Iris Segmentation for Pre and Post Cataract Surgery
Error estimates for spectral convergence of the graph Laplacian on random geometric graphs towards the Laplace–Beltrami operator
Surprise in Elections
Video-based Sign Language Recognition without Temporal Segmentation
Greedy Morse matchings and discrete smoothness
An Incremental Path-Following Splitting Method for Linearly Constrained Nonconvex Nonsmooth Programs
Image Captioning at Will: A Versatile Scheme for Effectively Injecting Sentiments into Image Descriptions
Universality for zeros of random polynomials
Information Measures for Microphone Arrays
Spherical CNNs
Long scale Ollivier-Ricci curvature of graphs
Random Access Communication for Wireless Control Systems with Energy Harvesting Sensors