An Interpretable and Sparse Neural Network Model for Nonlinear Granger Causality Discovery

While most classical approaches to Granger causality detection repose upon linear time series assumptions, many interactions in neuroscience and economics applications are nonlinear. We develop an approach to nonlinear Granger causality detection using multilayer perceptrons where the input to the network is the past time lags of all series and the output is the future value of a single series. A sufficient condition for Granger non-causality in this setting is that all of the outgoing weights of the input data, the past lags of a series, to the first hidden layer are zero. For estimation, we utilize a group lasso penalty to shrink groups of input weights to zero. We also propose a hierarchical penalty for simultaneous Granger causality and lag estimation. We validate our approach on simulated data from both a sparse linear autoregressive model and the sparse and nonlinear Lorenz-96 model.

An Efficient ADMM Algorithm for Structural Break Detection in Multivariate Time Series

We present an efficient alternating direction method of multipliers (ADMM) algorithm for segmenting a multivariate non-stationary time series with structural breaks into stationary regions. We draw from recent work where the series is assumed to follow a vector autoregressive model within segments and a convex estimation procedure may be formulated using group fused lasso penalties. Our ADMM approach first splits the convex problem into a global quadratic program and a simple group lasso proximal update. We show that the global problem may be parallelized over rows of the time dependent transition matrices and furthermore that each subproblem may be rewritten in a form identical to the log-likelihood of a Gaussian state space model. Consequently, we develop a Kalman smoothing algorithm to solve the global update in time linear in the length of the series.

Relief-Based Feature Selection: Introduction and Review

Feature selection plays a critical role in data mining, driven by increasing feature dimensionality in target problems and growing interest in advanced but computationally expensive methodologies able to model complex associations. Specifically, there is a need for feature selection methods that are computationally efficient, yet sensitive to complex patterns of association, e.g. interactions, so that informative features are not mistakenly eliminated prior to downstream modeling. This paper focuses on Relief-based algorithms (RBAs), a unique family of filter-style feature selection algorithms that strike an effective balance between these objectives while flexibly adapting to various data characteristics, e.g. classification vs. regression. First, this work broadly examines types of feature selection and defines RBAs within that context. Next, we introduce the original Relief algorithm and associated concepts, emphasizing the intuition behind how it works, how feature weights generated by the algorithm can be interpreted, and why it is sensitive to feature interactions without evaluating combinations of features. Lastly, we include an expansive review of RBA methodological research beyond Relief and its popular descendant, ReliefF. In particular, we characterize branches of RBA research, and provide comparative summaries of RBA algorithms including contributions, strategies, functionality, time complexity, adaptation to key data characteristics, and software availability.

Computing return times or return periods with rare event algorithms

The average time between two occurrences of the same event, referred to as its return time (or return period), is a useful statistical concept for practical applications. For instance insurances or public agency may be interested by the return time of a 10m flood of the Seine river in Paris. However, due to their scarcity, reliably estimating return times for rare events is very difficult using either observational data or direct numerical simulations. For rare events, an estimator for return times can be built from the extrema of the observable on trajectory blocks. Here, we show that this estimator can be improved to remain accurate for return times of the order of the block size. More importantly, we show that this approach can be generalised to estimate return times from numerical algorithms specifically designed to sample rare events. So far those algorithms often compute probabilities, rather than return times. The approach we propose provides a computationally extremely efficient way to estimate numerically the return times of rare events for a dynamical system, gaining several orders of magnitude of computational costs. We illustrate the method on two kinds of observables, instantaneous and time-averaged, using two different rare event algorithms, for a simple stochastic process, the Ornstein-Uhlenbeck process. As an example of realistic applications to complex systems, we finally discuss extreme values of the drag on an object in a turbulent flow.

Causal nearest neighbor rules for optimal treatment regimes

The estimation of optimal treatment regimes is of considerable interest to precision medicine. In this work, we propose a causal k-nearest neighbor method to estimate the optimal treatment regime. The method roots in the framework of causal inference, and estimates the causal treatment effects within the nearest neighborhood. Although the method is simple, it possesses nice theoretical properties. We show that the causal k-nearest neighbor regime is universally consistent. That is, the causal k-nearest neighbor regime will eventually learn the optimal treatment regime as the sample size increases. We also establish its convergence rate. However, the causal k-nearest neighbor regime may suffer from the curse of dimensionality, i.e. performance deteriorates as dimensionality increases. To alleviate this problem, we develop an adaptive causal k-nearest neighbor method to perform metric selection and variable selection simultaneously. The performance of the proposed methods is illustrated in simulation studies and in an analysis of a chronic depression clinical trial.

Recurrent Relational Networks for Complex Relational Reasoning

Humans possess an ability to abstractly reason about objects and their interactions, an ability not shared with state-of-the-art deep learning models. Relational networks, introduced by Santoro et al. (2017), add the capacity for relational reasoning to deep neural networks, but are limited in the complexity of the reasoning tasks they can address. We introduce recurrent relational networks which increase the suite of solvable tasks to those that require an order of magnitude more steps of relational reasoning. We use recurrent relational networks to solve Sudoku puzzles and achieve state-of-the-art results by solving 96.6% of the hardest Sudoku puzzles, where relational networks fail to solve any. We also apply our model to the BaBi textual QA dataset solving 19/20 tasks which is competitive with state-of-the-art sparse differentiable neural computers. The recurrent relational network is a general purpose module that can augment any neural network model with the capacity to do many-step relational reasoning.

Locally Smoothed Neural Networks

Convolutional Neural Networks (CNN) and the locally connected layer are limited in capturing the importance and relations of different local receptive fields, which are often crucial for tasks such as face verification, visual question answering, and word sequence prediction. To tackle the issue, we propose a novel locally smoothed neural network (LSNN) in this paper. The main idea is to represent the weight matrix of the locally connected layer as the product of the kernel and the smoother, where the kernel is shared over different local receptive fields, and the smoother is for determining the importance and relations of different local receptive fields. Specifically, a multi-variate Gaussian function is utilized to generate the smoother, for modeling the location relations among different local receptive fields. Furthermore, the content information can also be leveraged by setting the mean and precision of the Gaussian function according to the content. Experiments on some variant of MNIST clearly show our advantages over CNN and locally connected layer.

The Stochastic Firefighter Problem

The dynamics of infectious diseases spread is crucial in determining their risk and offering ways to contain them. We study sequential vaccination of individuals in networks. In the original (deterministic) version of the Firefighter problem, a fire breaks out at some node of a given graph. At each time step, b nodes can be protected by a firefighter and then the fire spreads to all unprotected neighbors of the nodes on fire. The process ends when the fire can no longer spread. We extend the Firefighter problem to a probabilistic setting, where the infection is stochastic. We devise a simple policy that only vaccinates neighbors of infected nodes and is optimal on regular trees and on general graphs for a sufficiently large budget. We derive methods for calculating upper and lower bounds of the expected number of infected individuals, as well as provide estimates on the budget needed for containment in expectation. We calculate these explicitly on trees, d-dimensional grids, and Erd\H{o}s R\'{e}nyi graphs. Finally, we construct a state-dependent budget allocation strategy and demonstrate its superiority over constant budget allocation on real networks following a first order acquaintance vaccination policy.

GraphGAN: Graph Representation Learning with Generative Adversarial Nets

The goal of graph representation learning is to embed each vertex in a graph into a low-dimensional vector space. Existing graph representation learning methods can be classified into two categories: generative models that learn the underlying connectivity distribution in the graph, and discriminative models that predict the probability of edge existence between a pair of vertices. In this paper, we propose GraphGAN, an innovative graph representation learning framework unifying above two classes of methods, in which the generative model and discriminative model play a game-theoretical minimax game. Specifically, for a given vertex, the generative model tries to fit its underlying true connectivity distribution over all other vertices and produces ‘fake’ samples to fool the discriminative model, while the discriminative model tries to detect whether the sampled vertex is from ground truth or generated by the generative model. With the competition between these two models, both of them can alternately and iteratively boost their performance. Moreover, when considering the implementation of generative model, we propose a novel graph softmax to overcome the limitations of traditional softmax function, which can be proven satisfying desirable properties of normalization, graph structure awareness, and computational efficiency. Through extensive experiments on real-world datasets, we demonstrate that GraphGAN achieves substantial gains in a variety of applications, including link prediction, node classification, and recommendation, over state-of-the-art baselines.

A Generic Framework for Engineering Graph Canonization Algorithms

The state-of-the-art tools for practical graph canonization are all based on the individualization-refinement paradigm, and their difference is primarily in the choice of heuristics they include and in the actual tool implementation. It is thus not possible to make a direct comparison of how individual algorithmic ideas affect the performance on different graph classes. We present an algorithmic software framework that facilitates implementation of heuristics as independent extensions to a common core algorithm. It therefore becomes easy to perform a detailed comparison of the performance and behaviour of different algorithmic ideas. Implementations are provided of a range of algorithms for tree traversal, target cell selection, and node invariant, including choices from the literature and new variations. The framework readily supports extraction and visualization of detailed data from separate algorithm executions for subsequent analysis and development of new heuristics. Using collections of different graph classes we investigate the effect of varying the selections of heuristics, often revealing exactly which individual algorithmic choice is responsible for particularly good or bad performance. On several benchmark collections, including a newly proposed class of difficult instances, we additionally find that our implementation performs better than the current state-of-the-art tools.

Building Machines that Learn and Think for Themselves: Commentary on Lake et al., Behavioral and Brain Sciences, 2017

We agree with Lake and colleagues on their list of key ingredients for building humanlike intelligence, including the idea that model-based reasoning is essential. However, we favor an approach that centers on one additional ingredient: autonomy. In particular, we aim toward agents that can both build and exploit their own internal models, with minimal human hand-engineering. We believe an approach centered on autonomous learning has the greatest chance of success as we scale toward real-world complexity, tackling domains for which ready-made formal models are not available. Here we survey several important examples of the progress that has been made toward building autonomous agents with humanlike abilities, and highlight some outstanding challenges.

Identifying user habits through data mining on call data records

In this paper we propose a framework for identifying patterns and regularities in the pseudo-anonymized Call Data Records (CDR) pertaining a generic subscriber of a mobile operator. We face the challenging task of automatically deriving meaningful information from the available data, by using an unsupervised procedure of cluster analysis and without including in the model any \textit{a-priori} knowledge on the applicative context. Clusters mining results are employed for understanding users’ habits and to draw their characterizing profiles. We propose two implementations of the data mining procedure; the first is based on a novel system for clusters and knowledge discovery called LD-ABCD, capable of retrieving clusters and, at the same time, to automatically discover for each returned cluster the most appropriate dissimilarity measure (local metric). The second approach instead is based on PROCLUS, the well-know subclustering algorithm. The dataset under analysis contains records characterized only by few features and, consequently, we show how to generate additional fields which describe implicit information hidden in data. Finally, we propose an effective graphical representation of the results of the data-mining procedure, which can be easily understood and employed by analysts for practical applications.

From Monte Carlo to Las Vegas: Improving Restricted Boltzmann Machine Training Through Stopping Sets

We propose a Las Vegas transformation of Markov Chain Monte Carlo (MCMC) estimators of Restricted Boltzmann Machines (RBMs). We denote our approach Markov Chain Las Vegas (MCLV). MCLV gives statistical guarantees in exchange for random running times. MCLV uses a stopping set built from the training data and has maximum number of Markov chain steps K (referred as MCLV-K). We present a MCLV-K gradient estimator (LVS-K) for RBMs and explore the correspondence and differences between LVS-K and Contrastive Divergence (CD-K), with LVS-K significantly outperforming CD-K training RBMs over the MNIST dataset, indicating MCLV to be a promising direction in learning generative models.

Generating Analytic Insights on Human Behaviour using Image Processing
Superconductiving fluctuations at arbitrary disorder strength
Deep Sparse Coding for Invariant Multimodal Halle Berry Neurons
Dynamic High Resolution Deformable Articulated Tracking
Personalization of Saliency Estimation
Manifold Assumption and Defenses Against Adversarial Perturbations
Relating Input Concepts to Convolutional Neural Network Decisions
Unsupervised Adaptation with Domain Separation Networks for Robust Speech Recognition
A high order time discretization of the solution of the non-linear filtering problem
OSQP: An Operator Splitting Solver for Quadratic Programs
The Riemannian Geometry of Deep Generative Models
Deep Long Short-Term Memory Adaptive Beamforming Networks For Multichannel Robust Speech Recognition
Disagreement-based combinatorial pure exploration: Efficient algorithms and an analysis with localization
First-order methods for constrained convex programming based on linearized augmented Lagrangian function
Avoidance bases for formulas with reversal
Variance-based sensitivity analysis for time-dependent processes
Alternating minimization, scaling algorithms, and the null-cone problem from invariant theory
Identifying Most Walkable Direction for Navigation in an Outdoor Environment
Revisiting the Set Cover Conjecture
A generative adversarial framework for positive-unlabeled classification
Multiple-Instance, Cascaded Classification for Keyword Spotting in Narrow-Band Audio
Efficient image deployment in cloud environments
Clonal analysis of newborn hippocampal dentate granule cell proliferation and development in temporal lobe epilepsy
Deterministic Policy Optimization by Combining Pathwise and Score Function Estimators for Discrete Action Spaces
Constrained empirical Bayes priors on regression coefficients
Schur Number Five
Modeling and emulation of nonstationary Gaussian fields
Robust Stackelberg Equilibria in Extensive-Form Games and Extension to Limited Lookahead
Parameter Estimation in Gaussian Mixture Models with Malicious Noise, without Balanced Mixing Coefficients
Application of Natural Language Processing to Determine User Satisfaction in Public Services
A note on recent criticisms to Birnbaum’s theorem
SNeCT: Scalable network constrained Tucker decomposition for integrative multi-platform data analysis
A Theorem on Matroid Homomorphism
Integrating both Visual and Audio Cues for Enhanced Video Caption
Asymmetric Action Abstractions for Multi-Unit Control in Adversarial Real-Time Games
CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation
Visual Question Answering as a Meta Learning Task
The Devil is in the Middle: Exploiting Mid-level Representations for Cross-Domain Instance Matching
Learning Discrete Distributions from Untrusted Batches
A Quantum-Inspired Ensemble Method and Quantum-Inspired Forest Regressors
An Amendment to ‘Control Contraction Metrics: Convex and Intrinsic Criteria for Nonlinear Feedback Design’
PULasso: High-dimensional variable selection with presence-only data
Contracting Nonlinear Observers: Convex Optimization and Learning from Data
Signal-Aligned Network Coding in K-User MIMO Interference Channels with Limited Receiver Cooperation
Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions
On the Feasibility of Full-duplex Large-scale MIMO Cellular Systems
Familywise Error Rate Controlling Procedures for Discrete Data
Accurate Real Time Localization Tracking in A Clinical Environment using Bluetooth Low Energy and Deep Learning
Structural Characteristics of Two-Sender Index Coding
Multiagent Simple Temporal Problem: The Arc-Consistency Approach
A Face Fairness Framework for 3D Meshes
D-solutions of BSDEs with Poisson jumps
Hypergraph $p$-Laplacian: A Differential Geometry View
Run-and-Inspect Method for Nonconvex Optimization and Global Optimality Bounds for R-Local Minimizers
Object Discovery By Generative Adversarial & Ranking Networks
Link Selection in Hybrid RF/VLC Systems under Statistical Queueing Constraints
Video Semantic Object Segmentation by Self-Adaptation of DCNN
Estimation of the multifractional function and the stability index of linear multifractional stable processes
AlignedReID: Surpassing Human-Level Performance in Person Re-Identification
Minimum co-degree condition for perfect matchings in k-partite k-graphs
An Analysis of Scale Invariance in Object Detection – SNIP
Generalized scale functions of standard processes with no positive jumps
On the Automatic Generation of Medical Imaging Reports
Strictly local one-dimensional topological quantum error correction with symmetry-constrained cellular automata
A multiobjective deep learning approach for predictive classification in Neuroblastoma
Ultra-Reliable Short-Packet Communications: Half-Duplex or Full-Duplex Relaying?
Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification
Phase diagram for the Harper model of the honeycomb lattice
Post-hoc labeling of arbitrary EEG recordings for data-efficient evaluation of neural decoding methods
An asymptotic bound for the strong chromatic number
Asymptotically optimal Boolean functions
Compressed Indexing with Signature Grammars
Micro and Macro Pedestrian Dynamics in Counterflow: the Impact of Social Groups
Proportionally Representative Participatory Budgeting: Axioms and Algorithms
An influence-based fast preceding questionnaire model for elderly assessments
Integral Human Pose Regression
Does Higher Order LSTM Have Better Accuracy in Chunking and Named Entity Recognition?
Multi-Level ResNets with Stacked SRUs for Action Recognition
Two-Dimensional Super-Resolution via Convex Relaxation
Sparsity-based Cholesky Factorization and its Application to Hyperspectral Anomaly Detection
3D Point Cloud Classification and Segmentation using 3D Modified Fisher Vector Representation for Convolutional Neural Networks
Adversarial Phenomenon in the Eyes of Bayesian Deep Learning
Decomposition Strategies for Constructive Preference Elicitation
Effect of self-deflection on a totally asymmetric simple exclusion process with functions of site-assignments
Random spectrahedra
A sufficient condition of a graph with boxicity at most its chromatic number
Efficient constrained sensor placement for observability of linear systems via greedy algorithms
Sparse Variable Selection on High Dimensional Heterogeneous Data with Tree Structured Responses
High-dimensional Motion Planning using Latent Variable Models via Approximate Inference
Unleashing the Potential of CNNs for Interpretable Few-Shot Learning
Neuron-level Selective Context Aggregation for Scene Segmentation
Analysis of atmospheric effects on satellite based quantum communication: A comparative study
Variance reduction for antithetic integral control of stochastic reaction networks
Scaling limits of Cayley graphs with polynomially growing balls
Engineering Resilient Collective Adaptive Systems by Self-Stabilisation
A flag variety for the Delta Conjecture
Evaluate the Malignancy of Pulmonary Nodules Using the 3D Deep Leaky Noisy-or Network
Robust Bayes-Like Estimation: Rho-Bayes estimation
Adaptive Cardinality Estimation
Existence of densities for the dynamic $Φ^4_3$ model
A correlational analysis of multiagent sensorimotor interactions: clustering autonomous and controllable entities
Allocation Problems in Ride-Sharing Platforms: Online Matching with Offline Reusable Resources
Efficient decoding of random errors for quantum expander codes
Riemannian tangent space mapping and elastic net regularization for cost-effective EEG markers of brain atrophy in Alzheimer’s disease
ForestHash: Semantic Hashing With Shallow Random Forests and Tiny Convolutional Networks
Budget Allocation in Binary Opinion Dynamics
Variational Bayesian Inference For A Scale Mixture Of Normal Distributions Handling Missing Data
Mixture-of-tastes Models for Representing Users with Diverse Interests
Conditional Image-Text Embedding Networks
BlockDrop: Dynamic Inference Paths in Residual Networks
An Erdős-Kac law for local solubility in families of varieties
Hybrid Analog and Digital Beamforming for mmWave OFDM Large-Scale Antenna Arrays
An Orthogonally Equivariant Estimator of the Covariance Matrix in High Dimensions and Small Sample Size
Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes
SolarisNet: A Deep Regression Network for Solar Radiation Prediction
Equivalence of Equilibrium Propagation and Recurrent Backpropagation
On the optimal control of some nonsmooth distributed parameter systems arising in mechanics
Leverage Score Sampling for Faster Accelerated Regression and ERM
Fluctuation exponents for stationary exactly solvable lattice polymer models via a Mellin transform framework
Shellability is NP-complete
On Computing Min-Degree Elimination Orderings
VITON: An Image-based Virtual Try-on Network
Combating Computational Heterogeneity in Large-Scale Distributed Computing via Work Exchange
Fast and Stable Pascal Matrix Algorithms