A fluctuation theorem for time-series of signal-response models with the backward transfer entropy
The irreversibility of trajectories in stochastic dynamical systems is linked to the structure of their causal representation in terms of Bayesian networks. We consider stochastic maps resulting from the time discretization, with interval $\tau$, of signal-response models, and we find an integral fluctuation theorem that sets the backward transfer entropy as a lower bound to the conditional entropy production. We apply this result to a linear signal-response model, for which we provide analytical solutions, and to a nonlinear model of receptor-ligand systems. We show that the observational time $\tau$ has to be fine-tuned for an efficient detection of irreversibility in time-series.
Principal Component Analysis with Tensor Train Subspace
Tensor train is a hierarchical tensor network structure that helps alleviate the curse of dimensionality by parameterizing large-scale multidimensional data via a network of low-rank tensors. Associated with such a construction is a notion of Tensor Train subspace, and in this paper we propose a TT-PCA algorithm for estimating this structured subspace from the given data. By maintaining a low-rank tensor structure, TT-PCA is more robust to noise than PCA or Tucker-PCA. This is borne out numerically by testing the proposed approach on the Extended YaleFace Dataset B.
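To make the tensor-train format concrete: a $d$-way tensor is stored as $d$ cores $G_k$ of shape $r_{k-1} \times n_k \times r_k$ (with $r_0 = r_d = 1$), and each entry is a product of one matrix slice per core, so storage grows like $\sum_k r_{k-1} n_k r_k$ rather than $\prod_k n_k$. The following is a minimal pure-Python sketch of that representation only, not of the TT-PCA estimator, whose details the abstract does not give; the helper names `tt_element`, `tt_num_params` and `rank_one_tt` are hypothetical.

```python
from typing import List

# A TT core has shape (r_prev, n, r_next) and is stored as nested lists core[a][i][b].
Core = List[List[List[float]]]


def tt_element(cores: List[Core], index: List[int]) -> float:
    """Evaluate T[i_1, ..., i_d] = G_1[:, i_1, :] @ G_2[:, i_2, :] @ ... @ G_d[:, i_d, :]."""
    first = cores[0]
    # The first core has r_0 = 1, so its slice is a 1 x r_1 row vector.
    row = list(first[0][index[0]])
    for core, i in zip(cores[1:], index[1:]):
        r_prev, r_next = len(core), len(core[0][0])
        # Multiply the running row vector by the r_prev x r_next slice core[:, i, :].
        row = [sum(row[a] * core[a][i][b] for a in range(r_prev))
               for b in range(r_next)]
    return row[0]  # the last core has r_d = 1, leaving a 1 x 1 result


def tt_num_params(cores: List[Core]) -> int:
    """Stored entries: sum over k of r_{k-1} * n_k * r_k."""
    return sum(len(c) * len(c[0]) * len(c[0][0]) for c in cores)


def rank_one_tt(vectors: List[List[float]]) -> List[Core]:
    """TT with all ranks 1, representing the outer product of the given vectors."""
    return [[[[v_i] for v_i in v]] for v in vectors]


if __name__ == "__main__":
    cores = rank_one_tt([[1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0]])
    print(tt_element(cores, [1, 2, 0]))  # 2.0 * 5.0 * 6.0 = 60.0
    print(tt_num_params(cores))          # 7 stored values versus 2 * 3 * 2 = 12 entries
```

Higher TT ranks simply widen the slices: the running row vector then carries $r_k$ numbers between cores, which is what lets a low-rank TT capture structure that a single outer product cannot.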
Fractal AI: A fragile theory of intelligence
Fractal AI is a theory for general artificial intelligence. It allows one to derive new mathematical tools that form the foundations of a new kind of stochastic calculus, by modelling information with cellular-automaton-like structures instead of smooth functions. In the accompanying repository we present a new Agent, derived from the first principles of the theory, which is capable of solving Atari games several orders of magnitude more efficiently than similar techniques such as Monte Carlo Tree Search (MCTS). The code provided shows how it is now possible to beat some of the current state-of-the-art benchmarks on Atari games without previous learning, using fewer than 1000 samples to calculate each action where standard MCTS uses 3 million samples. Among other things, Fractal AI makes it possible to generate a huge database of top-performing examples with very little computation, transforming Reinforcement Learning into a supervised problem. The algorithm presented is capable of solving the exploration vs. exploitation dilemma in both the discrete and continuous cases, while maintaining control over any aspect of the Agent's behavior. More generally, the techniques presented here have direct applications to other areas such as non-equilibrium thermodynamics, chemistry, quantum physics, economics, information theory, and non-linear control theory.
Neural Lattice Language Models
In this work, we propose neural lattice language models, a new language modeling paradigm that can perform both prediction and moderation of information flow at multiple granularities. These models construct a lattice of possible paths through a sentence and marginalize across this lattice to calculate sequence probabilities or optimize parameters. This approach allows us to seamlessly incorporate linguistic intuitions, including polysemy and the existence of multi-word lexical items, into our language model. Experiments on multiple language modeling tasks show that English neural lattice language models that utilize polysemous embeddings improve perplexity by 9.95% relative to a word-level baseline, and that a Chinese model that handles multi-character tokens improves perplexity by 20.94% relative to a character-level baseline.
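Marginalizing over a lattice of segmentations can be done with a forward dynamic program over positions, in the spirit of the forward algorithm: $\alpha_j = \log \sum_i \exp(\alpha_i + \log p(w_{i:j}))$, summing over every span $w_{i:j}$ that ends at position $j$. The sketch below assumes, as a simplification, that each span's log-probability comes from a hypothetical scorer `span_log_prob` that ignores the path history; the neural models described above condition on context, so this only illustrates the lattice sum itself. The `max_span` parameter is likewise an illustrative assumption.

```python
import math
from typing import Callable, List, Sequence


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def lattice_log_prob(tokens: Sequence[str],
                     span_log_prob: Callable[[Sequence[str]], float],
                     max_span: int = 3) -> float:
    """Log of the total score of all paths through the lattice of `tokens`.

    A path splits the tokens into contiguous spans of length <= max_span, and
    its log-score is the sum of span_log_prob over its spans (a hypothetical,
    context-free scorer).
    """
    n = len(tokens)
    # alpha[j]: log-sum of the scores of all paths that cover tokens[:j].
    alpha = [0.0] + [-math.inf] * n
    for j in range(1, n + 1):
        # Every lattice edge ending at position j starts at some i in [j - max_span, j).
        incoming = [alpha[i] + span_log_prob(tokens[i:j])
                    for i in range(max(0, j - max_span), j)]
        alpha[j] = logsumexp(incoming)
    return alpha[n]
```

With `max_span >= n` there are $2^{n-1}$ segmentations, yet the dynamic program makes only $O(n \cdot \text{max\_span})$ scorer calls, which is why lattice marginalization stays tractable.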
Algorithmic Social Intervention
Social and behavioral interventions are a critical tool for governments and communities to tackle deep-rooted societal challenges such as homelessness, disease, and poverty. However, real-world interventions are almost always plagued by limited resources and limited data, which creates a computational challenge: how can we use algorithmic techniques to enhance the targeting and delivery of social and behavioral interventions? The goal of my thesis is to provide a unified study of such questions, collectively considered under the name ‘algorithmic social intervention’. This proposal introduces algorithmic social intervention as a distinct area with characteristic technical challenges, presents my published research in the context of these challenges, and outlines open problems for future work. A common technical theme is decision making under uncertainty: how can we find actions which will impact a social system in desirable ways under limitations of knowledge and resources? The primary application area for my work thus far is public health, e.g. HIV or tuberculosis prevention. For instance, I have developed a series of algorithms which optimize social network interventions for HIV prevention. Two of these algorithms have been pilot-tested in collaboration with LA-area service providers for homeless youth, with preliminary results showing substantial improvement over status-quo approaches. My work also spans other topics in infectious disease prevention and underlying algorithmic questions in robust and risk-aware submodular optimization.
Ranking with Adaptive Neighbors
Retrieving the most similar objects in a large-scale database for a given query is a fundamental building block in many application domains, ranging from web search to visual, cross-media, and document retrieval. State-of-the-art approaches have mainly focused on capturing the underlying geometry of the data manifolds. Graph-based approaches, in particular, define various diffusion processes on weighted data graphs. Despite their success, these approaches rely on fixed-weight graphs, making ranking sensitive to the input affinity matrix. In this study, we propose a new ranking algorithm that simultaneously learns the data affinity matrix and the ranking scores. The proposed optimization formulation assigns adaptive neighbors to each data point based on local connectivity, and a smoothness constraint assigns similar ranking scores to similar data points. We develop a novel and efficient algorithm to solve the optimization problem. Evaluations on synthetic and real datasets suggest that the proposed algorithm can outperform existing methods.
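For contrast with the proposed method, here is a minimal sketch of the kind of fixed-weight graph diffusion the abstract refers to: the classical manifold-ranking iteration $f \leftarrow \alpha S f + (1-\alpha) y$ with $S = D^{-1/2} W D^{-1/2}$, where $y$ marks the query. It is not the adaptive-neighbor algorithm, which also learns the affinity matrix $W$; here $W$ is given, so the scores depend entirely on it. The function name and the defaults `alpha=0.9` and `iters=1000` are assumptions.

```python
import math
from typing import List


def manifold_ranking(W: List[List[float]], query: int,
                     alpha: float = 0.9, iters: int = 1000) -> List[float]:
    """Rank the nodes of a fixed weighted graph by diffusing from `query`.

    Iterates f <- alpha * S f + (1 - alpha) * y with S = D^{-1/2} W D^{-1/2};
    for 0 < alpha < 1 this converges to (1 - alpha) * (I - alpha * S)^{-1} y.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    # Symmetrically normalized affinity matrix (assumes every degree is > 0).
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    # Indicator vector of the query node.
    y = [1.0 if i == query else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f
```

On a four-node path graph `W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]` queried at node 0, the scores decrease along the path. Because `W` enters only as a fixed input, any perturbation of it changes the scores directly, which is the sensitivity that motivates learning the affinities jointly with the ranking.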
Adversarial Data Programming: Using GANs to Relax the Bottleneck of Curated Labeled Data
Paucity of large curated hand-labeled training data for every domain of interest forms a major bottleneck in the deployment of machine learning models in computer vision and other fields. Recent work (Data Programming) has shown how distant supervision signals in the form of labeling functions can be used to obtain labels for given data in near-constant time. In this work, we present Adversarial Data Programming (ADP), an adversarial methodology that generates data together with a curated aggregated label, given a set of weak labeling functions. We validated our method on the MNIST, Fashion MNIST, CIFAR-10 and SVHN datasets, where it outperformed many state-of-the-art models. We conducted extensive experiments to study its usefulness, and showed how the proposed ADP framework can be used for transfer learning as well as multi-task learning, where data from two domains are generated simultaneously by the framework along with the label information. Our future work will involve understanding the theoretical implications of this new framework from a game-theoretic perspective, as well as exploring the performance of the method on more complex datasets.
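Labeling functions of the kind mentioned above are small heuristics that either emit a label or abstain. The simplest way to combine them is a majority vote; data programming itself goes further by modeling the accuracies of the labeling functions, and ADP adds adversarial generation on top, so the sketch below is only a baseline illustration of weak-label aggregation. The three text-based labeling functions are hypothetical examples chosen for brevity, not the image-based setup used in the paper.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# A labeling function returns a class label, or None to abstain.
LabelingFunction = Callable[[str], Optional[int]]


def majority_vote(x: str, lfs: Sequence[LabelingFunction]) -> Optional[int]:
    """Combine weak labels for one example; all-abstain or a tie yields None."""
    votes = Counter(label for label in (lf(x) for lf in lfs) if label is not None)
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # the top two labels are tied, so abstain
    return ranked[0][0]


# Hypothetical labeling functions for a toy sentiment task (1 = positive, 0 = negative).
def lf_good(text: str) -> Optional[int]:
    return 1 if "good" in text else None


def lf_bad(text: str) -> Optional[int]:
    return 0 if "bad" in text else None


def lf_exclaim(text: str) -> Optional[int]:
    return 1 if "!" in text else None


LFS = [lf_good, lf_bad, lf_exclaim]

if __name__ == "__main__":
    for text in ["good movie!", "bad movie", "good but bad", "plain"]:
        print(text, "->", majority_vote(text, LFS))
```

A learned aggregator would weight `lf_exclaim` differently from `lf_good` if it turned out to be less accurate; majority vote treats every labeling function equally, which is the main limitation data programming addresses.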
Latent Tree Variational Autoencoder for Joint Representation Learning and Multidimensional Clustering
Recently, deep learning based clustering methods have been shown to be superior to traditional ones by jointly conducting representation learning and clustering. These methods rely on the assumptions that the number of clusters is known, and that there is a single partition over the data and all attributes define that partition. However, in real-world applications, prior knowledge of the number of clusters is usually unavailable, and there are multiple ways to partition the data based on subsets of attributes. To resolve these issues, we propose the latent tree variational autoencoder (LTVAE), which simultaneously performs representation learning and multidimensional clustering. LTVAE learns latent embeddings from data, discovers multi-facet clustering structures based on subsets of latent features, and automatically determines the number of clusters in each facet. Experiments show that the proposed method achieves state-of-the-art clustering performance and reveals reasonable multi-facet structures of the data.
Algebraic Machine Learning
Machine learning algorithms use error function minimization to fit a large set of parameters in a preexisting model. However, error minimization eventually leads to memorization of the training dataset, losing the ability to generalize to other datasets. To achieve generalization, something else is needed, for example a regularization method or stopping the training when the error on a validation dataset is minimal. Here we propose a different approach to learning and generalization that is parameter-free, fully discrete and does not use function minimization. We use the training data to find an algebraic representation with minimal size and maximal freedom, explicitly expressed as a product of irreducible components. This algebraic representation is shown to generalize directly, giving high accuracy on test data, and more so the smaller the representation. We prove that the number of generalizing representations can be very large and that the algebra only needs to find one. We also derive and test a relationship between compression and error rate. We give results for a simple problem solved step by step, for hand-written character recognition, and for the Queens Completion problem as an example of unsupervised learning. As an alternative to statistical learning, ‘algebraic learning’ may offer advantages in combining bottom-up and top-down information, in deriving formal concepts from data, and in large-scale parallelization.
Newton-type Alternating Minimization Algorithm for Convex Optimization
We propose NAMA (Newton-type Alternating Minimization Algorithm) for solving structured nonsmooth convex optimization problems where the sum of two functions is to be minimized, one being strongly convex and the other composed with a linear mapping. The proposed algorithm is a line-search method over a continuous, real-valued, exact penalty function for the corresponding dual problem, which is computed by evaluating the augmented Lagrangian at the primal points obtained by alternating minimizations. As a consequence, NAMA relies on exactly the same computations as the classical alternating minimization algorithm (AMA), also known as the dual proximal gradient method. Under standard assumptions the proposed algorithm possesses strong convergence properties, while under mild additional assumptions the asymptotic convergence is superlinear, provided that the search directions are chosen according to quasi-Newton formulas. Due to its simplicity, the proposed method is well suited for embedded applications and large-scale problems. Experiments show that using limited-memory directions in NAMA greatly improves the convergence speed over AMA and its accelerated variant.
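As a reference point for the AMA iteration that NAMA builds on, here is a minimal sketch of plain AMA (the dual proximal gradient method) for one assumed instance of the problem class: $f(x) = \tfrac{1}{2}\|x - b\|^2$, which is strongly convex, and $g = \lambda\|\cdot\|_1$ composed with a matrix $A$. Each iteration minimizes the Lagrangian in $x$, minimizes the augmented Lagrangian in the splitting variable $z = Ax$ (a soft-thresholding step), and takes a dual ascent step. It does not implement NAMA's line search or quasi-Newton directions; the problem instance, the function names and the step size `rho` are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def matvec(A: Matrix, x: List[float]) -> List[float]:
    """Compute A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def matvec_t(A: Matrix, y: List[float]) -> List[float]:
    """Compute A^T y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft_threshold(v: List[float], t: float) -> List[float]:
    """Proximal operator of t * ||.||_1, applied entrywise."""
    return [(1.0 if vi > 0 else -1.0) * max(abs(vi) - t, 0.0) for vi in v]


def ama(b: List[float], A: Matrix, lam: float, rho: float,
        iters: int = 2000) -> List[float]:
    """Plain AMA for  min_x 0.5 * ||x - b||^2 + lam * ||A x||_1.

    Uses the splitting z = A x with dual variable y. Since f is 1-strongly
    convex, a step size 0 < rho < 2 / ||A||^2 is the usual convergence condition.
    """
    y = [0.0] * len(A)
    x = list(b)
    for _ in range(iters):
        # x-step: argmin_x f(x) + <y, A x>  gives  x = b - A^T y.
        x = [bi - ci for bi, ci in zip(b, matvec_t(A, y))]
        ax = matvec(A, x)
        # z-step: argmin_z lam*||z||_1 - <y, z> + (rho/2)*||A x - z||^2,
        # i.e. soft-thresholding of A x + y / rho at level lam / rho.
        z = soft_threshold([axi + yi / rho for axi, yi in zip(ax, y)], lam / rho)
        # Dual ascent on the constraint residual A x - z.
        y = [yi + rho * (axi - zi) for yi, axi, zi in zip(y, ax, z)]
    return x
```

With `A` the identity, the iterate converges to the soft-thresholded vector $\operatorname{sign}(b)\max(|b| - \lambda, 0)$; with `A` a first-difference matrix the problem becomes 1D total-variation denoising, and for a very large `lam` the solution flattens to the mean of `b`. Every step is a matrix-vector product or an entrywise prox, which is the simplicity the abstract points to for embedded use.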
• LCANet: End-to-End Lipreading with Cascaded Attention-CTC
• Inference on a Distribution from Noisy Draws
• On the Algebra in Boole’s Laws of Thought
• Closure Operators and Spam Resistance for PageRank
• Conditional Activation for Diverse Neurons in Heterogeneous Networks
• A Probabilistic Disease Progression Model for Predicting Future Clinical Outcome
• Decentralised Learning in Systems with Many, Many Strategic Agents
• Spin-glass–like aging in colloidal and granular glasses
• Limiting probabilities for vertices of a given rank in rooted trees
• Variational zero-inflated Gaussian processes with sparse kernels
• Controlled Islanding via Weak Submodularity
• Learning to Explore with Meta-Policy Gradient
• Analysis of Nonautonomous Adversarial Systems
• Monochromatic loose paths in multicolored $k$-uniform cliques
• Investigating the Effect of Music and Lyrics on Spoken-Word Recognition
• Discussion on Bayesian Cluster Analysis: Point Estimation and Credible Balls by Sara Wade and Zoubin Ghahramani
• Hot-Stuff the Linear, Optimal-Resilience, One-Message BFT Devil
• A Multi-Modal Approach to Infer Image Affect
• Development of Safety Performance Functions: Incorporating Unobserved Heterogeneity and Functional Form Analysis
• Revisiting Salient Object Detection: Simultaneous Detection, Ranking, and Subitizing of Multiple Salient Objects
• Block Diagonally Dominant Positive Definite Sub-optimal Filters and Smoothers
• The $\mathbb{Z}_2$-genus of Kuratowski minors
• Smoothing Spline Growth Curves With Covariates
• Symbol-level precoding is symbol-perturbed ZF when energy efficiency is sought
• Noisy Adaptive Group Testing: Bounds and Algorithms
• Model-Agnostic Private Learning via Stability
• Robustness to incorrect priors in partially observed stochastic control
• Bucket Renormalization for Approximate Inference
• PT-Spike: A Precise-Time-Dependent Single Spike Neuromorphic Architecture with Efficient Supervised Learning
• Uplift Modeling from Separate Labels
• MT-Spike: A Multilayer Time-based Spiking Neuromorphic Architecture with Temporal Error Backpropagation
• Topology guaranteed segmentation of the human retina from OCT using convolutional neural networks
• Linear Quadratic Optimal Control and Stabilization for Discrete-time Markov Jump Linear Systems
• Defensive Collaborative Multi-task Training – Defending against Adversarial Attack towards Deep Neural Networks
• Damped Newton’s Method on Riemannian Manifolds
• Feature Selection and Model Comparison on Microsoft Learning-to-Rank Data Sets
• Signal Processing and Piecewise Convex Estimation
• Feature extraction without learning in an analog Spatial Pooler memristive-CMOS circuit design of Hierarchical Temporal Memory
• Neuron inspired data encoding memristive multi-level memory cell
• Nearly defect-free dynamical models of disordered solids: The case of amorphous silicon
• Network Coding for Real-time Wireless Communication for Automation
• Bernstein type inequalities for self-normalized martingales with applications
• The 2017 AIBIRDS Competition
• Multiplicative Updates for Elastic Net Regularized Convolutional NMF Under $\beta$-Divergence
• How to evaluate sentiment classifiers for Twitter time-ordered data?
• A curious class of Hankel determinants
• Fast generalised linear models by database sampling and one-step polishing
• 1D Mott variable-range hopping with external field
• A generalization of the steepest-edge rule and its number of simplex iterations for a nondegenerate LP
• xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems
• Multi-objective Analysis of MAP-Elites Performance
• Localization due to topological stochastic disorder in active networks
• Can Autism be Catered with Artificial Intelligence-Assisted Intervention Technology? A Literature Review
• Approximative Theorem of Incomplete Riemann-Stieltjes Sum of Stochastic Integral
• Determinantal elliptic Selberg integrals
• Interlocking permutations
• Higher order concentration in presence of Poincaré-type inequalities
• Spatio-temporal Deep De-aliasing for Prospective Assessment of Real-time Ventricular Volumes
• EdgeStereo: A Context Integrated Residual Pyramid Network for Stereo Matching
• The Value of Reactive Power for Voltage Control in Lossy Networks
• Enhancing Favorable Propagation in Cell-Free Massive MIMO Through Spatial User Grouping
• Combining Multi-level Contexts of Superpixel using Convolutional Neural Networks to perform Natural Scene Labeling
• The complete enumeration of 4-polytopes and 3-spheres with nine vertices
• Building Sparse Deep Feedforward Networks using Tree Receptive Fields
• LivDet 2017 Fingerprint Liveness Detection Competition 2017
• Deep Image Demosaicking using a Cascade of Convolutional Residual Denoising Networks
• MCScript: A Novel Dataset for Assessing Machine Comprehension Using Script Knowledge
• A complex network framework to model cognition: unveiling correlation structures from connectivity
• Stochastic Dynamic Utilities and Inter-Temporal Preferences
• On the connectivity threshold for colorings of random graphs and hypergraphs
• Identifiability of Undirected Dynamical Networks: a Graph-Theoretic Approach
• The skeleton of the UIPT, seen from infinity
• A mean-field game model for homogeneous flocking
• Addressing the Challenges in Federating Edge Resources
• Lovász extension and graph cut
• Face-MagNet: Magnifying Feature Maps to Detect Small Faces
• Products and Projective Limits of Continuous Valuations on $T_0$ Spaces
• Learning to Play General Video-Games via an Object Embedding Network
• All Graphs are $S^n$-Synchronizing; What About $\mathrm{St}(p, n)$?
• Rotation-Sensitive Regression for Oriented Scene Text Detection
• On the Ambiguity of Registration Uncertainty
• Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning
• Constant delay algorithms for regular document spanners
• Secure SWIPT for Directional Modulation Aided AF Relaying Networks
• A Unified View of False Discovery Rate Control: Reconciliation of Bayesian and Frequentist Approaches
• Domain Adaptation on Graphs by Learning Aligned Graph Bases
• On the Security of Some Compact Keys for McEliece Scheme
• A quantitative analysis of the 2017 Honduran election and the argument used to defend its outcome
• Joint Modelling of Location, Scale and Skewness Parameters of the Skew Laplace Normal Distribution
• A generalization of Croot-Lev-Pach’s Lemma and a new upper bound for the size of difference sets in polynomial rings
• Euler-Lagrangian approach to 3D stochastic Euler equations
• Efficient Realization of Givens Rotation through Algorithm-Architecture Co-design for Acceleration of QR Factorization
• Predicting Oral Disintegrating Tablet Formulations by Neural Network Techniques
• Measurement-based adaptation protocol with quantum reinforcement learning
• Illumination-aware Faster R-CNN for Robust Multispectral Pedestrian Detection
• Optimal Bounds for Johnson-Lindenstrauss Transformations
• FEVER: a large-scale dataset for Fact Extraction and VERification
• Approximating Generalized Network Design under (Dis)economies of Scale with Applications to Energy Efficiency
• Complex activity patterns generated by short-term synaptic plasticity
• Rigid reflections and Kac–Moody algebras
• Greedy can also beat pure dynamic programming
• Familywise error control in multi-armed response-adaptive trials
• Towards Monocular Digital Elevation Model (DEM) Estimation by Convolutional Neural Networks – Application on Synthetic Aperture Radar Images
• LSH Microbatches for Stochastic Gradients: Value in Rearrangement
• Computational Techniques for the Analysis of Small Signals in High-Statistics Neutrino Oscillation Experiments
• On the Universal Approximation Property and Equivalence of Stochastic Computing-based Neural Networks and Binary Neural Networks
• Constructing Imperfect Recall Abstractions to Solve Large Extensive-Form Games
• Shift-invert diagonalization of large many-body localizing spin chains
• $H$-colouring $P_t$-free graphs in subexponential time
• Redundancy Techniques for Straggler Mitigation in Distributed Optimization and Learning
• Image Colorization with Generative Adversarial Networks
• Approximate Query Matching for Image Retrieval
• Imitation Learning with Concurrent Actions in 3D Games
• Additive quantile regression for clustered data with an application to children’s physical activity
• Optimal energy decay for the wave-heat system on a rectangular domain
• Averaging Weights Leads to Wider Optima and Better Generalization
• Maximum likelihood drift estimation for a threshold diffusion
• On trend and its derivatives estimation in repeated time series with subordinated long-range dependent errors
• Generalised Structural CNNs (SCNNs) for time series data with arbitrary graph-topologies
• Totally Ordered Measured Trees and Splitting Trees with Infinite Variation II: Prolific Skeleton Decomposition