A fluctuation theorem for time-series of signal-response models with the backward transfer entropy

The irreversibility of trajectories in stochastic dynamical systems is linked to the structure of their causal representation in terms of Bayesian networks. We consider stochastic maps resulting from a time discretization with interval \tau of signal-response models, and we find an integral fluctuation theorem that sets the backward transfer entropy as a lower bound to the conditional entropy production. We apply this to a linear signal-response model providing analytical solutions, and to a nonlinear model of receptor-ligand systems. We show that the observational time \tau has to be fine-tuned for an efficient detection of the irreversibility in time-series.

Principal Component Analysis with Tensor Train Subspace

Tensor train is a hierarchical tensor network structure that helps alleviate the curse of dimensionality by parameterizing large-scale multidimensional data via a network of low-rank tensors. Associated with such a construction is the notion of a Tensor Train subspace, and in this paper we propose a TT-PCA algorithm for estimating this structured subspace from the given data. By maintaining low-rank tensor structure, TT-PCA is more robust to noise than PCA or Tucker-PCA. This is borne out numerically by testing the proposed approach on the Extended YaleFace Dataset B.
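
As a rough illustration of the tensor train format mentioned above, the sketch below factors a dense array into a chain of three-way cores using sequential truncated SVDs (the standard TT-SVD procedure). It is not the TT-PCA algorithm of the paper; the function names `tt_svd` and `tt_reconstruct` and the `max_rank` parameter are assumptions made for this example.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Factor `tensor` into tensor-train cores of shape (r_prev, dim, r_next).

    Standard TT-SVD via sequential truncated SVDs; `max_rank` caps every
    internal rank (a hypothetical parameter chosen for this sketch).
    """
    dims = tensor.shape
    cores = []
    rank_prev = 1
    mat = np.asarray(tensor, dtype=float)
    for k in range(len(dims) - 1):
        # Unfold so that rows index (previous rank, current mode).
        mat = mat.reshape(rank_prev * dims[k], -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        rank = min(max_rank, len(s))
        cores.append(u[:, :rank].reshape(rank_prev, dims[k], rank))
        # Carry the remaining factor (singular values times V^T) forward.
        mat = s[:rank, None] * vt[:rank]
        rank_prev = rank
    cores.append(mat.reshape(rank_prev, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the chain of cores back into a dense array."""
    result = cores[0]
    for core in cores[1:]:
        result = np.tensordot(result, core, axes=([-1], [0]))
    return result.reshape([core.shape[1] for core in cores])

rng = np.random.default_rng(0)
data = rng.standard_normal((3, 4, 5))
exact_cores = tt_svd(data, max_rank=20)      # ranks large enough to be exact
truncated_cores = tt_svd(data, max_rank=2)   # low-rank approximation
```

With `max_rank` large enough the reconstruction matches the input up to floating-point error; a smaller `max_rank` trades accuracy for far fewer parameters, which is the compression the abstract refers to.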

Fractal AI: A fragile theory of intelligence

Fractal AI is a theory for general artificial intelligence. It allows one to derive new mathematical tools that constitute the foundations for a new kind of stochastic calculus, by modelling information using cellular automaton-like structures instead of smooth functions. In the included repository we present a new Agent, derived from the first principles of the theory, which is capable of solving Atari games several orders of magnitude more efficiently than other similar techniques, like Monte Carlo Tree Search. The code provided shows how it is now possible to beat some of the current state-of-the-art benchmarks on Atari games, without previous learning and using fewer than 1000 samples to calculate each action, whereas standard MCTS uses 3 million samples. Among other things, Fractal AI makes it possible to generate a huge database of top-performing examples with very little computation, transforming Reinforcement Learning into a supervised problem. The algorithm presented is capable of solving the exploration vs exploitation dilemma in both the discrete and continuous cases, while maintaining control over any aspect of the behavior of the Agent. More generally, the new techniques presented here have direct applications to other areas such as non-equilibrium thermodynamics, chemistry, quantum physics, economics, information theory, and non-linear control theory.

Neural Lattice Language Models

In this work, we propose a new language modeling paradigm that has the ability to perform both prediction and moderation of information flow at multiple granularities: neural lattice language models. These models construct a lattice of possible paths through a sentence and marginalize across this lattice to calculate sequence probabilities or optimize parameters. This approach allows us to seamlessly incorporate linguistic intuitions – including polysemy and existence of multi-word lexical items – into our language model. Experiments on multiple language modeling tasks show that English neural lattice language models that utilize polysemous embeddings are able to improve perplexity by 9.95% relative to a word-level baseline, and that a Chinese model that handles multi-character tokens is able to improve perplexity by 20.94% relative to a character-level baseline.
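
To make the idea of marginalizing across a lattice concrete, here is a minimal sketch of the forward dynamic program that sums the probabilities of all segmentations of a sentence into single-word and multi-word units. The paper's models use neural, context-dependent probabilities; this example instead assumes a fixed, hypothetical lookup table (`TOY_LOG_PROBS`), and `lattice_log_prob` and `max_span` are names invented for illustration.

```python
import math

def lattice_log_prob(tokens, unit_log_prob, max_span=2):
    """Log of the total probability of `tokens`, summed over all segmentations.

    `unit_log_prob(unit)` returns the log-probability of a unit (a tuple of
    one or more tokens). Here it is context-free, which is a simplifying
    assumption of this sketch.
    """
    n = len(tokens)
    # alpha[i] = log of the summed probability of every path covering tokens[:i]
    alpha = [-math.inf] * (n + 1)
    alpha[0] = 0.0
    for end in range(1, n + 1):
        terms = []
        for start in range(max(0, end - max_span), end):
            lp = unit_log_prob(tuple(tokens[start:end]))
            if alpha[start] > -math.inf and lp > -math.inf:
                terms.append(alpha[start] + lp)
        if terms:
            # Log-sum-exp over all lattice edges arriving at position `end`.
            peak = max(terms)
            alpha[end] = peak + math.log(sum(math.exp(t - peak) for t in terms))
    return alpha[n]

# Hypothetical toy table: three single words and one multi-word unit.
TOY_LOG_PROBS = {
    ("new",): math.log(0.5),
    ("york",): math.log(0.5),
    ("city",): math.log(0.5),
    ("new", "york"): math.log(0.25),
}

def toy_unit_log_prob(unit):
    return TOY_LOG_PROBS.get(unit, -math.inf)

sentence_log_prob = lattice_log_prob(["new", "york", "city"], toy_unit_log_prob)
```

In this toy case two paths have nonzero probability (`new | york | city` and `new york | city`), and the function returns the log of their summed probability rather than the score of a single best path.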

Algorithmic Social Intervention

Social and behavioral interventions are a critical tool for governments and communities to tackle deep-rooted societal challenges such as homelessness, disease, and poverty. However, real-world interventions are almost always plagued by limited resources and limited data, which creates a computational challenge: how can we use algorithmic techniques to enhance the targeting and delivery of social and behavioral interventions? The goal of my thesis is to provide a unified study of such questions, collectively considered under the name ‘algorithmic social intervention’. This proposal introduces algorithmic social intervention as a distinct area with characteristic technical challenges, presents my published research in the context of these challenges, and outlines open problems for future work. A common technical theme is decision making under uncertainty: how can we find actions which will impact a social system in desirable ways under limitations of knowledge and resources? The primary application area for my work thus far is public health, e.g. HIV or tuberculosis prevention. For instance, I have developed a series of algorithms which optimize social network interventions for HIV prevention. Two of these algorithms have been pilot-tested in collaboration with LA-area service providers for homeless youth, with preliminary results showing substantial improvement over status-quo approaches. My work also spans other topics in infectious disease prevention and underlying algorithmic questions in robust and risk-aware submodular optimization.

Ranking with Adaptive Neighbors

Retrieving the most similar objects in a large-scale database for a given query is a fundamental building block in many application domains, ranging from web search to visual, cross-media, and document retrieval. State-of-the-art approaches have mainly focused on capturing the underlying geometry of the data manifolds. Graph-based approaches, in particular, define various diffusion processes on weighted data graphs. Despite their success, these approaches rely on fixed-weight graphs, making ranking sensitive to the input affinity matrix. In this study, we propose a new ranking algorithm that simultaneously learns the data affinity matrix and the ranking scores. The proposed optimization formulation assigns adaptive neighbors to each point in the data based on the local connectivity, and the smoothness constraint assigns similar ranking scores to similar data points. We develop a novel and efficient algorithm to solve the optimization problem. Evaluations using synthetic and real datasets suggest that the proposed algorithm can outperform the existing methods.

Adversarial Data Programming: Using GANs to Relax the Bottleneck of Curated Labeled Data

The paucity of large, curated, hand-labeled training data for every domain of interest forms a major bottleneck in the deployment of machine learning models in computer vision and other fields. Recent work (Data Programming) has shown how distant supervision signals in the form of labeling functions can be used to obtain labels for given data in near-constant time. In this work, we present Adversarial Data Programming (ADP), an adversarial methodology to generate data as well as a curated aggregated label, given a set of weak labeling functions. We validated our method on the MNIST, Fashion MNIST, CIFAR 10 and SVHN datasets, and it outperformed many state-of-the-art models. We conducted extensive experiments to study its usefulness, and showed how the proposed ADP framework can be used for transfer learning as well as multi-task learning, where data from two domains are generated simultaneously using the framework along with the label information. Our future work will involve understanding the theoretical implications of this new framework from a game-theoretic perspective, as well as exploring the performance of the method on more complex datasets.

Latent Tree Variational Autoencoder for Joint Representation Learning and Multidimensional Clustering

Recently, deep learning based clustering methods have been shown to be superior to traditional ones by jointly conducting representation learning and clustering. These methods rely on the assumptions that the number of clusters is known, and that there is one single partition over the data and all attributes define that partition. However, in real-world applications, prior knowledge of the number of clusters is usually unavailable and there are multiple ways to partition the data based on subsets of attributes. To resolve these issues, we propose the latent tree variational autoencoder (LTVAE), which simultaneously performs representation learning and multidimensional clustering. LTVAE learns latent embeddings from data, discovers multi-facet clustering structures based on subsets of latent features, and automatically determines the number of clusters in each facet. Experiments show that the proposed method achieves state-of-the-art clustering performance and reveals reasonable multifacet structures of the data.

Algebraic Machine Learning

Machine learning algorithms use error function minimization to fit a large set of parameters in a preexisting model. However, error minimization eventually leads to a memorization of the training dataset, losing the ability to generalize to other datasets. To achieve generalization, something else is needed, for example a regularization method or stopping the training when the error on a validation dataset is minimal. Here we propose a different approach to learning and generalization that is parameter-free, fully discrete and that does not use function minimization. We use the training data to find an algebraic representation with minimal size and maximal freedom, explicitly expressed as a product of irreducible components. This algebraic representation is shown to directly generalize, giving high accuracy in test data, more so the smaller the representation. We prove that the number of generalizing representations can be very large and the algebra only needs to find one. We also derive and test a relationship between compression and error rate. We give results for a simple problem solved step by step, hand-written character recognition, and the Queens Completion problem as an example of unsupervised learning. As an alternative to statistical learning, "algebraic learning" may offer advantages in combining bottom-up and top-down information, formal concept derivation from data and large-scale parallelization.

Newton-type Alternating Minimization Algorithm for Convex Optimization

We propose NAMA (Newton-type Alternating Minimization Algorithm) for solving structured nonsmooth convex optimization problems where the sum of two functions is to be minimized, one being strongly convex and the other composed with a linear mapping. The proposed algorithm is a line-search method over a continuous, real-valued, exact penalty function for the corresponding dual problem, which is computed by evaluating the augmented Lagrangian at the primal points obtained by alternating minimizations. As a consequence, NAMA relies on exactly the same computations as the classical alternating minimization algorithm (AMA), also known as the dual proximal gradient method. Under standard assumptions the proposed algorithm possesses strong convergence properties, while under mild additional assumptions the asymptotic convergence is superlinear, provided that the search directions are chosen according to quasi-Newton formulas. Due to its simplicity, the proposed method is well suited for embedded applications and large-scale problems. Experiments show that using limited-memory directions in NAMA greatly improves the convergence speed over AMA and its accelerated variant.

LCANet: End-to-End Lipreading with Cascaded Attention-CTC
Inference on a Distribution from Noisy Draws
On the Algebra in Boole’s Laws of Thought
Closure Operators and Spam Resistance for PageRank
Conditional Activation for Diverse Neurons in Heterogeneous Networks
A Probabilistic Disease Progression Model for Predicting Future Clinical Outcome
Decentralised Learning in Systems with Many, Many Strategic Agents
Spin-glass–like aging in colloidal and granular glasses
Limiting probabilities for vertices of a given rank in rooted trees
Variational zero-inflated Gaussian processes with sparse kernels
Controlled Islanding via Weak Submodularity
Learning to Explore with Meta-Policy Gradient
Analysis of Nonautonomous Adversarial Systems
Monochromatic loose paths in multicolored $k$-uniform cliques
Investigating the Effect of Music and Lyrics on Spoken-Word Recognition
Discussion on Bayesian Cluster Analysis: Point Estimation and Credible Balls by Sara Wade and Zoubin Ghahramani
Hot-Stuff the Linear, Optimal-Resilience, One-Message BFT Devil
A Multi-Modal Approach to Infer Image Affect
Development of Safety Performance Functions: Incorporating Unobserved Heterogeneity and Functional Form Analysis
Revisiting Salient Object Detection: Simultaneous Detection, Ranking, and Subitizing of Multiple Salient Objects
Block Diagonally Dominant Positive Definite Sub-optimal Filters and Smoothers
The $\mathbb{Z}_2$-genus of Kuratowski minors
Smoothing Spline Growth Curves With Covariates
Symbol-level precoding is symbol-perturbed ZF when energy Efficiency is sought
Noisy Adaptive Group Testing: Bounds and Algorithms
Model-Agnostic Private Learning via Stability
Robustness to incorrect priors in partially observed stochastic control
Bucket Renormalization for Approximate Inference
PT-Spike: A Precise-Time-Dependent Single Spike Neuromorphic Architecture with Efficient Supervised Learning
Uplift Modeling from Separate Labels
MT-Spike: A Multilayer Time-based Spiking Neuromorphic Architecture with Temporal Error Backpropagation
Topology guaranteed segmentation of the human retina from OCT using convolutional neural networks
Linear Quadratic Optimal Control and Stabilization for Discrete-time Markov Jump Linear Systems
Defensive Collaborative Multi-task Training – Defending against Adversarial Attack towards Deep Neural Networks
Damped Newton’s Method on Riemannian Manifolds
Feature Selection and Model Comparison on Microsoft Learning-to-Rank Data Sets
Signal Processing and Piecewise Convex Estimation
Feature extraction without learning in an analog Spatial Pooler memristive-CMOS circuit design of Hierarchical Temporal Memory
Neuron inspired data encoding memristive multi-level memory cell
Nearly defect-free dynamical models of disordered solids: The case of amorphous silicon
Network Coding for Real-time Wireless Communication for Automation
Bernstein type inequalities for self-normalized martingales with applications
The 2017 AIBIRDS Competition
Multiplicative Updates for Elastic Net Regularized Convolutional NMF Under $β$-Divergence
How to evaluate sentiment classifiers for Twitter time-ordered data?
A curious class of Hankel determinants
Fast generalised linear models by database sampling and one-step polishing
1D Mott variable-range hopping with external field
A generalization of the steepest-edge rule and its number of simplex iterations for a nondegenerate LP
xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems
Multi-objective Analysis of MAP-Elites Performance
Localization due to topological stochastic disorder in active networks
Can Autism be Catered with Artificial Intelligence-Assisted Intervention Technology? A Literature Review
Approximative Theorem of Incomplete Riemann-Stieltjes Sum of Stochastic Integral
Determinantal elliptic Selberg integrals
Interlocking permutations
Higher order concentration in presence of Poincaré-type inequalities
Spatio-temporal Deep De-aliasing for Prospective Assessment of Real-time Ventricular Volumes
EdgeStereo: A Context Integrated Residual Pyramid Network for Stereo Matching
The Value of Reactive Power for Voltage Control in Lossy Networks
Enhancing Favorable Propagation in Cell-Free Massive MIMO Through Spatial User Grouping
Combining Multi-level Contexts of Superpixel using Convolutional Neural Networks to perform Natural Scene Labeling
The complete enumeration of 4-polytopes and 3-spheres with nine vertices
Building Sparse Deep Feedforward Networks using Tree Receptive Fields
LivDet 2017 Fingerprint Liveness Detection Competition 2017
Deep Image Demosaicking using a Cascade of Convolutional Residual Denoising Networks
MCScript: A Novel Dataset for Assessing Machine Comprehension Using Script Knowledge
A complex network framework to model cognition: unveiling correlation structures from connectivity
Stochastic Dynamic Utilities and Inter-Temporal Preferences
On the connectivity threshold for colorings of random graphs and hypergraphs
Identifiability of Undirected Dynamical Networks: a Graph-Theoretic Approach
The skeleton of the UIPT, seen from infinity
A mean-field game model for homogeneous flocking
Addressing the Challenges in Federating Edge Resources
Lovász extension and graph cut
Face-MagNet: Magnifying Feature Maps to Detect Small Faces
Products and Projective Limits of Continuous Valuations on $T_0$ Spaces
Learning to Play General Video-Games via an Object Embedding Network
All Graphs are S^n-Synchronizing; What About St(p, n)?
Rotation-Sensitive Regression for Oriented Scene Text Detection
On the Ambiguity of Registration Uncertainty
Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning
Constant delay algorithms for regular document spanners
Secure SWIPT for Directional Modulation Aided AF Relaying Networks
A Unified View of False Discovery Rate Control: Reconciliation of Bayesian and Frequentist Approaches
Domain Adaptation on Graphs by Learning Aligned Graph Bases
On the Security of Some Compact Keys for McEliece Scheme
A quantitative analysis of the 2017 Honduran election and the argument used to defend its outcome
Joint Modelling of Location, Scale and Skewness Parameters of the Skew Laplace Normal Distribution
A generalization of Croot-Lev-Pach’s Lemma and a new upper bound for the size of difference sets in polynomial rings
Euler-Lagrangian approach to 3D stochastic Euler equations
Efficient Realization of Givens Rotation through Algorithm-Architecture Co-design for Acceleration of QR Factorization
Predicting Oral Disintegrating Tablet Formulations by Neural Network Techniques
Measurement-based adaptation protocol with quantum reinforcement learning
Illumination-aware Faster R-CNN for Robust Multispectral Pedestrian Detection
Optimal Bounds for Johnson-Lindenstrauss Transformations
FEVER: a large-scale dataset for Fact Extraction and VERification
Approximating Generalized Network Design under (Dis)economies of Scale with Applications to Energy Efficiency
Complex activity patterns generated by short-term synaptic plasticity
Rigid reflections and Kac–Moody algebras
Greedy can also beat pure dynamic programming
Familywise error control in multi-armed response-adaptive trials
Towards Monocular Digital Elevation Model (DEM) Estimation by Convolutional Neural Networks – Application on Synthetic Aperture Radar Images
LSH Microbatches for Stochastic Gradients: Value in Rearrangement
Computational Techniques for the Analysis of Small Signals in High-Statistics Neutrino Oscillation Experiments
On the Universal Approximation Property and Equivalence of Stochastic Computing-based Neural Networks and Binary Neural Networks
Constructing Imperfect Recall Abstractions to Solve Large Extensive-Form Games
Shift-invert diagonalization of large many-body localizing spin chains
$H$-colouring $P_t$-free graphs in subexponential time
Redundancy Techniques for Straggler Mitigation in Distributed Optimization and Learning
Image Colorization with Generative Adversarial Networks
Approximate Query Matching for Image Retrieval
Imitation Learning with Concurrent Actions in 3D Games
Additive quantile regression for clustered data with an application to children’s physical activity
Optimal energy decay for the wave-heat system on a rectangular domain
Averaging Weights Leads to Wider Optima and Better Generalization
Maximum likelihood drift estimation for a threshold diffusion
On trend and its derivatives estimation in repeated time series with subordinated long-range dependent errors
Generalised Structural CNNs (SCNNs) for time series data with arbitrary graph-topologies
Totally Ordered Measured Trees and Splitting Trees with Infinite Variation II: Prolific Skeleton Decomposition