Discontinuous Hamiltonian Monte Carlo for Probabilistic Programs

Hamiltonian Monte Carlo (HMC) is the dominant statistical inference algorithm used in most popular first-order differentiable probabilistic programming languages. HMC requires that the joint density be differentiable with respect to all latent variables. This complicates expressing some models in such languages and prohibits others. A recently proposed integrator for HMC yielded a new Discontinuous HMC (DHMC) algorithm that can be used for inference in models whose joint densities have discontinuities. In this paper we show how to use DHMC for inference in probabilistic programs. To do this, we introduce a sufficient set of language restrictions, a corresponding mathematical formalism that ensures that any joint density denoted in such a language has a suitably low measure of discontinuous points, and a recipe for applying DHMC in the more general probabilistic-programming context. Our experimental findings demonstrate the correctness of this approach.
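
The following sketch shows why DHMC needs no gradient for a discontinuous coordinate. It implements a coordinate-wise update of the kind used in the DHMC integrator, as we understand it. The momentum of a discontinuous coordinate is Laplace-distributed, so its kinetic energy is |p_i|, and unit mass is assumed here. The coordinate moves by a fixed step, and the move is accepted only if the momentum can pay for the change in potential energy. Otherwise the momentum is reflected. The function names are hypothetical, and this is not the paper's probabilistic-programming recipe.

```python
import math
from typing import Callable, List

Potential = Callable[[List[float]], float]


def discontinuous_coordinate_step(
    x: List[float], p: List[float], i: int, eps: float, potential: Potential
) -> None:
    """Update coordinate i in place, assuming unit mass and Laplace momentum."""
    direction = math.copysign(1.0, p[i])
    proposal = list(x)
    proposal[i] += eps * direction
    delta_u = potential(proposal) - potential(x)
    if abs(p[i]) > delta_u:
        # Enough kinetic energy |p_i| to cover the change in potential:
        # move, and pay for it out of the momentum (its sign is preserved).
        x[i] = proposal[i]
        p[i] -= direction * delta_u
    else:
        # Not enough energy to cross the jump: stay put and reflect.
        p[i] = -p[i]


def total_energy(x: List[float], p: List[float], potential: Potential) -> float:
    """Hamiltonian with Laplace kinetic energy: U(x) + sum_i |p_i|."""
    return potential(x) + sum(abs(v) for v in p)
```

Consider a step potential U(x) = 0 for x_0 < 1 and 2 otherwise, starting at x = [0.8] with eps = 0.5. A momentum of 1.0 is reflected to -1.0, whereas a momentum of 3.0 crosses the jump and drops to 1.0. Either branch leaves U(x) + Σ|p_i| unchanged, which is why the update stays exact at a jump in the density.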

Adversarial Time-to-Event Modeling

Modern health data science applications leverage abundant molecular and electronic health data, providing opportunities for machine learning to build statistical models to support clinical practice. Time-to-event analysis, also called survival analysis, stands as one of the most representative examples of such statistical models. We present a novel deep-network-based approach that leverages adversarial learning to address a key challenge in modern time-to-event modeling: nonparametric estimation of event-time distributions. We also introduce a principled cost function to exploit information from censored events (events that occur subsequent to the observation window). Unlike most time-to-event models, we focus on the estimation of time-to-event distributions, rather than time ordering. We validate our model on both benchmark and real datasets, demonstrating that the proposed formulation yields significant performance gains relative to a parametric alternative, which we also propose.
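
The following sketch makes the role of censored events concrete. Under right censoring, an observed event contributes its log density log f(t). A censored record only tells us that the event had not happened by time t, so it contributes the log survival probability log S(t). The sketch uses an exponential model, chosen purely for illustration. It is neither the paper's adversarial cost function nor its parametric alternative, and the function names are hypothetical.

```python
import math
from typing import Sequence


def censored_exponential_loglik(
    rate: float, times: Sequence[float], observed: Sequence[bool]
) -> float:
    """Log-likelihood of right-censored data under an exponential event-time model.

    An observed event at time t contributes log f(t) = log(rate) - rate * t;
    a record censored at time t contributes log S(t) = -rate * t.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    total = 0.0
    for t, event in zip(times, observed):
        total -= rate * t  # log S(t), shared by both cases
        if event:
            total += math.log(rate)  # extra log-hazard term for an observed event
    return total


def exponential_mle(times: Sequence[float], observed: Sequence[bool]) -> float:
    """Closed-form maximiser: number of observed events / total follow-up time."""
    return sum(observed) / sum(times)
```

Dropping the censored records instead would shrink the total follow-up time in the denominator and overestimate the event rate.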

Human-Guided Data Exploration

The outcome of the exploratory data analysis (EDA) phase is vital for successful data analysis. EDA is more effective when the user interacts with the system used to carry out the exploration. In the recently proposed paradigm of iterative data mining, the user controls the exploration by inputting knowledge in the form of patterns observed during the process. The system then shows the user views of the data that are maximally informative given the user’s current knowledge. Although this scheme is good at showing surprising views of the data to the user, it has a clear shortcoming: the user cannot steer the process. In many real cases we want to focus on investigating specific questions concerning the data. This paper presents the Human Guided Data Exploration framework, generalising previous research. This framework allows the user to incorporate existing knowledge into the exploration process, focus on exploring a subset of the data, and compare different complex hypotheses concerning relations in the data. The framework utilises a computationally efficient constrained randomisation scheme. To showcase the framework, we developed a free open-source tool, which we used to carry out the empirical evaluation on real-world datasets. Our evaluation shows that the abilities to focus on particular subsets and to compare hypotheses are important additions to the interactive iterative data mining process.
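
One way to picture a constrained randomisation is to pick a "tile", meaning a subset of rows and a subset of columns, and permute the tile's rows jointly across its columns. Relations among the tile's columns inside the tile survive the randomisation, while their links to the rest of the data are broken. The sketch below assumes this tile-based scheme and uses hypothetical names. The framework's actual constraints and sampler may differ.

```python
import random
from typing import List, Sequence


def permute_within_tile(
    data: Sequence[Sequence[float]],
    rows: Sequence[int],
    cols: Sequence[int],
    rng: random.Random,
) -> List[List[float]]:
    """Return a copy of data with the given rows shuffled jointly over cols.

    Values in the selected columns move together, so their relations inside the
    tile are preserved; cells outside the tile are left untouched.
    """
    out = [list(row) for row in data]
    sources = list(rows)
    rng.shuffle(sources)  # a random permutation of the tile's rows
    for dst, src in zip(rows, sources):
        for c in cols:
            out[dst][c] = data[src][c]
    return out
```

For example, `permute_within_tile(data, rows=[0, 1, 2], cols=[0, 1], rng=random.Random(0))` keeps each (column 0, column 1) pair of those rows intact but reassigns the pairs to rows at random. Column 2 and row 3 do not move.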

Merging joint distributions via causal model classes with low VC dimension

If X,Y,Z denote sets of random variables, two different data sources may contain samples from P_{X,Y} and P_{Y,Z}, respectively. We argue that causal inference can help infer properties of the ‘unobserved joint distributions’ P_{X,Y,Z} or P_{X,Z}. The properties may be conditional independences or quantitative statements about dependences. More generally, we define a learning scenario where the input is a subset of variables and the label is some statistical property of that subset. Sets of jointly observed variables define the training points, while unobserved sets are possible test points. To solve this learning task, we infer, as an intermediate step, a causal model from the observations that then entails properties of unobserved sets. Accordingly, we can define the VC dimension of a class of causal models and derive generalization bounds for the predictions. Here, causal inference becomes more modest and more accessible to empirical tests than usual: rather than trying to find a causal hypothesis that is ‘true’ (which is a problematic term when it is unclear how to define interventions), a causal hypothesis is useful whenever it correctly predicts statistical properties of unobserved joint distributions. Within such a ‘pragmatic’ application of causal inference, some popular heuristic approaches become justified in retrospect. It is, for instance, allowed to infer DAGs from partial correlations instead of conditional independences if the DAGs are only used to predict partial correlations. I hypothesize that our pragmatic view on causality may even cover the usual meaning in terms of interventions, and I sketch why predicting the impact of interventions can sometimes also be phrased as a task of the above type.
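
The partial-correlation remark can be made concrete with three variables. Suppose one source gives samples of (X, Y), another gives samples of (Y, Z), and we fit the chain X → Y → Z (or any Markov-equivalent DAG) with linear-Gaussian mechanisms. The linear-Gaussian model is an assumption made here for illustration. This model entails a vanishing partial correlation of X and Z given Y, ρ_XZ·Y = (ρ_XZ − ρ_XY ρ_YZ) / √((1 − ρ_XY²)(1 − ρ_YZ²)) = 0. That yields the testable prediction ρ_XZ = ρ_XY · ρ_YZ for a pair that was never observed jointly. The simulation below uses hypothetical names and unit coefficients.

```python
import math
import random
from typing import List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample Pearson correlation."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def predict_rho_xz(rho_xy: float, rho_yz: float) -> float:
    """Correlation of X and Z entailed by zero partial correlation given Y."""
    return rho_xy * rho_yz


def sample_chain(n: int, rng: random.Random) -> Tuple[List[float], List[float], List[float]]:
    """Draw n samples from the assumed chain X -> Y -> Z with unit coefficients."""
    xs, ys, zs = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, 1.0)
        z = y + rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


rng = random.Random(0)
x1, y1, _ = sample_chain(20000, rng)  # source 1: only (X, Y) is observed
_, y2, z2 = sample_chain(20000, rng)  # source 2: only (Y, Z) is observed
x3, _, z3 = sample_chain(20000, rng)  # held-out joint sample, used only to check
predicted = predict_rho_xz(pearson(x1, y1), pearson(y2, z2))
held_out = pearson(x3, z3)
```

For these coefficients the population value is ρ_XZ = 1/√3 ≈ 0.577. If the fitted causal model were wrong for the data, the prediction would fail on the held-out joint sample. This is the kind of empirical check the abstract describes.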

A Deep Active Survival Analysis Approach for Precision Treatment Recommendations: Application of Prostate Cancer

Survival analysis has been developed and applied in a number of areas, including manufacturing, finance, economics and healthcare. In the healthcare domain, clinical data are usually high-dimensional, sparse and complex, and sometimes only a few time-to-event (labeled) instances exist. Therefore, building an accurate survival model from electronic health records is challenging. With this motivation, we address this issue and provide a new survival analysis framework using deep learning and active learning with a novel sampling strategy. First, our approach learns a better, lower-dimensional representation of clinical features using labeled (time-to-event) and unlabeled (censored) instances, and then actively trains the survival model by labeling the censored data using an oracle. As a clinical assistive tool, we introduce a simple, effective treatment recommendation approach based on our survival model. In the experimental study, we apply our approach to SEER-Medicare data related to prostate cancer among African-American and white patients. The results indicate that our approach significantly outperforms baseline models.

A Mathematical Framework for Superintelligent Machines

We describe a class calculus that is expressive enough to describe and improve its own learning process. It can design and debug programs that satisfy given input/output constraints, based on its ontology of previously learned programs. It can improve its own model of the world by checking the actual results of the actions of its robotic activators. For instance, it could check the black box of a car crash to determine if it was probably caused by electric failure, a stuck electronic gate, dark ice, or some other condition that it must add to its ontology in order to meet its sub-goal of preventing such crashes in the future. Class algebra basically defines the eval/eval⁻¹ Galois connection between the residuated Boolean algebras of 1. equivalence classes and super/sub classes of class algebra type expressions, and 2. a residual Boolean algebra of biclique relationships. It distinguishes which formulas are equivalent, entailed, or unrelated, based on a simplification algorithm that may be thought of as producing a unique pair of Karnaugh maps that describe the rough sets of maximal bicliques of relations. Such maps divide the n-dimensional space of up to 2^n - 1 conjunctions of up to n propositions into clopen (i.e. a closed set of regions and their boundaries) causal sets. This class algebra is generalized to type-2 fuzzy class algebra by using relative frequencies as probabilities. It is also generalized to a class calculus involving assignments that change the states of programs. INDEX TERMS 4-valued Boolean Logic, Artificial Intelligence, causal sets, class algebra, consciousness, intelligent design, IS-A hierarchy, mathematical logic, meta-theory, pointless topological space, residuated lattices, rough sets, type-2 fuzzy sets

Personalization of Health Interventions using Cluster-Based Reinforcement Learning

Research has shown that personalizing health interventions can improve their effectiveness. Reinforcement learning algorithms can perform such tailoring using data collected about users. However, learning is fragile for health interventions, as only limited time is available to learn from the user before disengagement takes place or before the opportunity to intervene passes. In this paper, we present a cluster-based reinforcement learning approach which learns across groups of users. Such an approach can speed up the learning process while still giving a level of personalization. The clustering algorithm uses a distance metric over traces of states and rewards. We apply both online and batch learning to learn policies over the clusters and introduce a publicly available simulator, which we developed to evaluate the approach. The results show that batch learning clearly outperforms online learning. Furthermore, clustering can be beneficial provided that a proper clustering is found.
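
The following is a minimal sketch of clustering users by their traces. It assumes scalar states and a hypothetical distance: the mean absolute difference of states and rewards over the aligned prefix of two traces. It also assumes a plain k-medoids loop. The paper's metric and clustering algorithm may differ. In the cluster-based setting, a policy would then be learned per cluster from the pooled traces of its members.

```python
from typing import List, Sequence, Tuple

# A trace is a sequence of (state, reward) pairs; scalar states are an assumption.
Trace = Sequence[Tuple[float, float]]


def trace_distance(a: Trace, b: Trace) -> float:
    """Hypothetical metric: mean |state diff| + |reward diff| over the aligned prefix."""
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("traces must be non-empty")
    total = sum(abs(sa - sb) + abs(ra - rb) for (sa, ra), (sb, rb) in zip(a, b))
    return total / n


def k_medoids(traces: Sequence[Trace], k: int, iters: int = 20) -> List[int]:
    """Cluster traces with k-medoids; the first k traces seed the medoids."""
    if not 1 <= k <= len(traces):
        raise ValueError("k must be between 1 and the number of traces")
    medoids = list(range(k))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each trace joins its nearest medoid.
        labels = [
            min(range(k), key=lambda j: trace_distance(t, traces[medoids[j]]))
            for t in traces
        ]
        # Update step: the new medoid minimises total distance within its cluster.
        new_medoids = []
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:
                new_medoids.append(medoids[j])  # keep an empty cluster's medoid
                continue
            new_medoids.append(
                min(members, key=lambda m: sum(trace_distance(traces[m], traces[o]) for o in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels
```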

Visual Analytics for Explainable Deep Learning

Recently, deep learning has been advancing the state of the art in artificial intelligence to a new level, and humans rely on artificial intelligence techniques more than ever. However, even with such unprecedented advancements, the lack of explanation regarding the decisions made by deep learning models and absence of control over their internal processes act as major drawbacks in critical decision-making processes, such as precision medicine and law enforcement. In response, efforts are being made to make deep learning interpretable and controllable by humans. In this paper, we review visual analytics, information visualization, and machine learning perspectives relevant to this aim, and discuss potential challenges and future research directions.

An ADMM-Based Universal Framework for Adversarial Attacks on Deep Neural Networks

Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks. That is, adversarial examples, obtained by adding delicately crafted distortions to original legitimate inputs, can mislead a DNN into classifying them as any target label. In a successful adversarial attack, the targeted misclassification should be achieved with minimal added distortion. In the literature, the added distortions are usually measured by the L0, L1, L2, and L infinity norms, giving L0, L1, L2, and L infinity attacks, respectively. However, a versatile framework covering all types of adversarial attacks has been lacking. This work, for the first time, unifies the methods of generating adversarial examples by leveraging ADMM (Alternating Direction Method of Multipliers), an operator splitting optimization approach, such that L0, L1, L2, and L infinity attacks can be effectively implemented by this general framework with few modifications. Compared with the state-of-the-art attacks in each category, our ADMM-based attacks are so far the strongest, achieving both a 100% attack success rate and minimal distortion.
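
ADMM splits an objective into two parts linked by an equality constraint and alternates between them. The toy below is our own illustration of that splitting, not the paper's attack. It denoises a vector a under an L1 penalty by solving min 0.5·||x − a||² + lam·||z||_1 subject to x = z in scaled form. The L1 term is handled purely through its proximal operator (soft thresholding), which is the role a distortion norm can play in an ADMM-based attack. The problem, the function names, and the parameters lam and rho are assumptions.

```python
from typing import List, Sequence


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def admm_l1_denoise(
    a: Sequence[float], lam: float, rho: float = 1.0, iters: int = 500
) -> List[float]:
    """Solve min_x 0.5*||x - a||^2 + lam*||z||_1 s.t. x = z with scaled-form ADMM."""
    n = len(a)
    x = [0.0] * n
    z = [0.0] * n
    u = [0.0] * n  # scaled dual variable
    for _ in range(iters):
        # x-update: the smooth term plus the augmented quadratic has a closed form.
        x = [(ai + rho * (zi - ui)) / (1.0 + rho) for ai, zi, ui in zip(a, z, u)]
        # z-update: proximal step on the L1 term.
        z = [soft_threshold(xi + ui, lam / rho) for xi, ui in zip(x, u)]
        # Dual update: accumulate the constraint violation x - z.
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]
    return z
```

This problem has the known closed-form solution soft_threshold(a_i, lam) for each coordinate, so the iterates are easy to check. For example, `admm_l1_denoise([3.0, -0.5, 1.2], 1.0)` converges to approximately [2.0, 0.0, 0.2].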

Cortex Neural Network: learning with Neural Network groups

Neural networks have been successfully applied to many real-world problems, such as image recognition and machine translation. However, with current neural network architectures it is hard to perform complex cognitive tasks, for example processing image and audio inputs together. The cortex, an important structure in the brain, enables animals to perform complex cognitive tasks. We view the architecture of the cortex as a missing part in the design of current artificial neural networks. In this paper, we propose the Cortex Neural Network (CrtxNN). The Cortex Neural Network is an upper-level architecture of neural networks, motivated by the cerebral cortex, that handles different tasks in the same learning system. It is able to identify different tasks and solve them with different methods. In our implementation, the Cortex Neural Network is able to process different cognitive tasks and perform reflection to achieve higher accuracy. We provide a series of experiments to examine the capability of the cortex architecture on traditional neural networks. Our experiments show that the Cortex Neural Network can simultaneously reach 98.32% accuracy on MNIST and 62% on CIFAR10, which could reduce the loss by 40%.

Modular Generative Adversarial Networks

Existing methods for multi-domain image-to-image translation (or generation) attempt to directly map an input image (or a random vector) to an image in one of the output domains. However, most existing methods have limited scalability and robustness, since they require building independent models for each pair of domains in question. This leads to two significant shortcomings: (1) the need to train an exponential number of pairwise models, and (2) the inability to leverage data from other domains when training a particular pairwise mapping. Inspired by recent work on module networks, this paper proposes ModularGAN for multi-domain image generation and image-to-image translation. ModularGAN consists of several reusable and composable modules that carry out different functions (e.g., encoding, decoding, transformations). These modules can be trained simultaneously, leveraging data from all domains, and then combined to construct specific GAN networks at test time, according to the specific image translation task. This gives ModularGAN superior flexibility in generating (or translating to) an image in any desired domain. Experimental results demonstrate that our model not only presents compelling perceptual results but also outperforms state-of-the-art methods on multi-domain facial attribute transfer.

Learning Latent Events from Network Message Logs: A Decomposition Based Approach

In this communication, we describe a novel technique for event mining using a decomposition based approach that combines non-parametric change-point detection with LDA. We prove theoretical guarantees about the sample complexity and consistency of the approach. In a companion paper, we will perform a thorough evaluation of our approach with detailed experiments.
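
The abstract does not detail either component. As a generic stand-in for the change-point half, the sketch below finds a single change point by maximising a standardised Mann-Whitney rank statistic between the two segments. Because the statistic is rank-based, it makes no distributional assumption about the values. The function name is hypothetical, and this is not the paper's detector. How the resulting segments would be combined with LDA is not shown.

```python
import math
from typing import Sequence


def rank_change_point(xs: Sequence[float]) -> int:
    """Return the index where the second segment starts.

    The split maximises |U - E[U]| / sd(U), where U is the Mann-Whitney
    statistic comparing the left and right segments (ties count 0.5).
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    best_k, best_z = 1, -1.0
    for k in range(1, n):
        left, right = xs[:k], xs[k:]
        m = len(right)
        u = sum(1.0 if a < b else 0.5 if a == b else 0.0 for a in left for b in right)
        mean = k * m / 2.0
        sd = math.sqrt(k * m * (n + 1) / 12.0)  # null standard deviation, no tie correction
        z = abs(u - mean) / sd
        if z > best_z:
            best_k, best_z = k, z
    return best_k
```

For a series of message counts such as [1, 1, 2, 1, 1, 9, 8, 9, 10, 9], the detector returns 5, the index where the level shifts.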

QA4IE: A Question Answering based Framework for Information Extraction

Information Extraction (IE) refers to automatically extracting structured relation tuples from unstructured texts. Common IE solutions, including Relation Extraction (RE) and open IE systems, struggle to handle cross-sentence tuples and are severely restricted by limited relation types as well as informal relation specifications (e.g., free-text based relation tuples). To overcome these weaknesses, we propose a novel IE framework named QA4IE, which leverages flexible question answering (QA) approaches to produce high-quality relation triples across sentences. Based on the framework, we develop a large IE benchmark with high-quality human evaluation. This benchmark contains 293K documents, 2M gold relation triples, and 636 relation types. We compare our system with several IE baselines on our benchmark, and the results show that our system achieves substantial improvements.

A Hierarchical Latent Structure for Variational Conversation Modeling

Variational autoencoders (VAE) combined with hierarchical RNNs have emerged as a powerful framework for conversation modeling. However, they suffer from the notorious degeneration problem, where the decoders learn to ignore latent variables and reduce to vanilla RNNs. We empirically show that this degeneracy occurs mainly for two reasons. First, the expressive power of hierarchical RNN decoders is often high enough to model the data using only their decoding distributions, without relying on the latent variables. Second, the conditional VAE structure, whose generation process is conditioned on a context, makes the range of training targets very sparse; that is, the RNN decoders can easily overfit to the training data while ignoring the latent variables. To solve the degeneration problem, we propose a novel model named Variational Hierarchical Conversation RNNs (VHCR), involving two key ideas: (1) using a hierarchical structure of latent variables, and (2) exploiting an utterance drop regularization. With evaluations on two datasets, the Cornell Movie Dialog and Ubuntu Dialog Corpus, we show that VHCR successfully utilizes latent variables and outperforms state-of-the-art models for conversation generation. Moreover, it can perform several new utterance control tasks, thanks to its hierarchical latent structure.
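
As we understand VHCR, utterance drop replaces an utterance's encoding with a generic unknown vector with some probability p during training. This weakens the context path and pushes the model to rely more on its latent variables. The sketch below shows only this replacement step on plain lists. In a trained model the unknown vector would be a learned parameter. Here it is passed in as a fixed value, and the function name is hypothetical.

```python
import random
from typing import List, Sequence


def utterance_drop(
    utterance_vectors: Sequence[Sequence[float]],
    unk_vector: Sequence[float],
    p: float,
    rng: random.Random,
) -> List[List[float]]:
    """Replace each utterance encoding by a generic unknown vector with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # rng.random() is in [0, 1), so p = 0 never drops and p = 1 always drops.
    return [list(unk_vector) if rng.random() < p else list(v) for v in utterance_vectors]
```

At inference time the regularizer would be switched off, which corresponds to p = 0 here.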

Graphical Generative Adversarial Networks

We propose Graphical Generative Adversarial Networks (Graphical-GAN) to model structured data. Graphical-GAN combines the power of Bayesian networks in compactly representing the dependency structures among random variables with that of generative adversarial networks in learning expressive dependency functions. We introduce a structured recognition model to infer the posterior distribution of latent variables given observations. We propose two alternative divergence minimization approaches to learn the generative model and recognition model jointly. The first one treats all variables as a whole, while the second one utilizes the structural information by checking the individual local factors defined by the generative model and works better in practice. Finally, we present two important instances of Graphical-GAN, namely Gaussian Mixture GAN (GMGAN) and State Space GAN (SSGAN), which can successfully learn the discrete and temporal structures on visual datasets, respectively.

Evaluating virtual hosted desktops for graphics-intensive astronomy
The Monge-Kantorovich Optimal Transport Distance for Image Comparison
Recommendation System of Grants-in-Aid for Researchers by using JSPS Keyword
Positive definite functions on semilattices
Markerless tracking of user-defined features with deep learning
Cauchy noise loss for stochastic optimization of random matrix models via free deterministic equivalents
Linear Programming Bounds for Randomly Sampling Colorings
The Sound of Pixels
$\mathcal{G}$-Distillation: Reducing Overconfident Errors on Novel Samples
Computational identification of the lowest space-wise dependent coefficient of a parabolic equation
Frank-Wolfe Splitting via Augmented Lagrangian Method
On the Posted Pricing in Crowdsourcing: Power of Bonus
Cluster Failure Revisited: Impact of First Level Design and Data Quality on Cluster False Positive Rates
Studying the Effects of Deep Brain Stimulation and Medication on the Dynamics of STN-LFP Signals for Human Behavior Analysis
Almost homomorphisms between the Boolean cube and groups of prime order
Contextual Search via Intrinsic Volumes
Fully Dynamic Set Cover — Improved and Simple
Deep Learning Classification of Polygenic Obesity using Genome Wide Association Study SNPs
Scalable Factorized Hierarchical Variational Autoencoder Training
On random polynomials generated by a symmetric three-term recurrence relation
Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition
Exploiting Partial Structural Symmetry For Patient-Specific Image Augmentation in Trauma Interventions
NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications
MSE-optimal 1-bit Precoding for Multiuser MIMO via Branch and Bound
Large scale distributed neural network training through online distillation
Building Function Approximators on top of Haar Scattering Networks
A GPU-based WFST Decoder with Exact Lattice Generation
Prompt Scheduling for Selfish Agents
Poly-Spline Finite Element Method
Minimal time mean field games
Fine-grained Activity Recognition in Baseball Videos
A Carlitz type result for linearized polynomials
Structural break analysis for spectrum and trace of covariance operators
Restructuring expression dags for efficient parallelization
Efficient Graph-based Word Sense Induction by Distributional Inclusion Vector Embeddings
Characterising information-theoretic storage and transfer in continuous time processes
Towards Deep Cellular Phenotyping in Placental Histology
Better bounds for poset dimension and boxicity
Parameter estimation with data-driven nonparametric likelihood functions
On the Supermodularity of Active Graph-based Semi-supervised Learning with Stieltjes Matrix Regularization
Efficient Predictor Ranking and False Discovery Proportion Control in High-Dimensional Regression
Identifiability for graphexes and the weak kernel metric
Recurrent Neural Networks for Person Re-identification Revisited
Echo-Liquid State Deep Learning for $360^\circ$ Content Transmission and Caching in Wireless VR Networks with Cellular-Connected UAVs
Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing
A derivation of the Black-Scholes option pricing model using a central limit theorem argument
Amorphous complexions enable a new region of high temperature stability in nanocrystalline Ni-W
A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers
On top fan-in vs formal degree for depth-$3$ arithmetic circuits
Seeking Open-Ended Evolution in Swarm Chemistry II: Analyzing Long-Term Dynamics via Automated Object Harvesting
Adversarial Training Versus Weight Decay
Thermodynamics and evolutionary biology through optimal control
A Markov Chain Sampler for Plane Curves
Crafting a Toolchain for Image Restoration by Deep Reinforcement Learning
Individual and Group Stability in Neutral Restrictions of Hedonic Games
Optimal Solution of Vehicle Routing Problems with Fractional Objective Function
Question Answering over Freebase via Attentive RNN with Similarity Matrix based CNN
Abelian networks IV. Dynamics of nonhalting networks
Increasing Parallelism in the ROOT I/O Subsystem
Implementing Push-Pull Efficiently in GraphBLAS
Representation Tradeoffs for Hyperbolic Embeddings
TIDBD: Adapting Temporal-difference Step-sizes Through Stochastic Meta-descent
Revealing the Micro-Structure of the Giant Component in Random Graph Ensembles
Segmentation of Multiple Sclerosis lesion in brain MR images using Fuzzy C-Means
On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses
Toward Formalizing Teleportation of Pedagogical Artificial Agents
Mean Field Network based Graph Refinement with application to Airway Tree Extraction
Information theory for fields
The Erdos-Moser sum-free set problem
Reference-Conditioned Super-Resolution by Neural Texture Transfer
Testing equality of spectral density operators for functional linear processes
Learning an Optimizer for Image Deconvolution
Sharp Beckner-type inequalities for Cauchy and spherical distributions
Maximum likelihood estimation for Gaussian processes under inequality constraints
Learning Pose Specific Representations by Predicting Different Views
Distributed Mixed-Integer Linear Programming via Cut Generation and Constraint Exchange
Roto-Translation Covariant Convolutional Networks for Medical Image Analysis
Effects of Higher Order and Long-Range Synchronizations for Classification and Computing in Oscillator-Based Spiking Neural Networks
Dimer model, bead model and standard Young tableaux: finite cases and limit shapes
A Fast Hierarchically Preconditioned Eigensolver Based On Multiresolution Matrix Decomposition
Better than Rician: Modelling millimetre wave channels as Two-Wave with Diffuse Power
The Sum-Rate-Distortion Region of Correlated Gauss-Markov Sources
Ubiquitous Cell-Free Massive MIMO Communications
Parameterized Algorithms for the Matrix Completion Problem
Feedback Coding Schemes for the Broadcast Channel with Mutual Secrecy Requirement
Who framed Roger Reindeer? De-censorship of Facebook posts by snippet classification
A Non-blocking Buddy System for Scalable Memory Allocation on Multi-core Machines
The AGINAO Self-Programming Engine
Evaluating Actuators in a Purely Information-Theory Based Reward Model
The Brain on Low Power Architectures – Efficient Simulation of Cortical Slow Waves and Asynchronous States
Bridges with random length: Gamma case
RSGAN: Face Swapping and Editing using Face and Hair Representation in Latent Spaces
End of Potential Line
Pair correlation functions for identifying spatial correlation in discrete domains
Universal features of mountain ridge patterns on Earth
Intersection of unit balls in classical matrix ensembles
Exact asymptotic volume and volume ratio of Schatten unit balls
Sparse Signal Processing for Grant-Free Massive IoT Connectivity
Exploring Disentangled Feature Representation Beyond Face Identification
The orbit algebra of a permutation group with polynomial profile is Cohen-Macaulay
PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition
Strong consistency of Krichevsky-Trofimov estimator for the number of communities in the Stochastic Block Model
Uniqueness for the 3-State Antiferromagnetic Potts Model on the Tree
Hyperparameters and Tuning Strategies for Random Forest
Bimonoidal Structure of Probability Monads
Mining Social Media for Newsgathering
Sensing Hidden Vehicles by Exploiting Multi-Path V2V Transmission
A real-time and unsupervised face Re-Identification system for Human-Robot Interaction
Two Stream 3D Semantic Scene Completion
On the Well-posedness of a Class of Non-Autonomous SPDEs: An Operator-Theoretical Perspective
Perfect colourings of regular graphs
Evaluation of the visual odometry methods for semi-dense real-time
Restoring Uniqueness to Mean-Field Games by Randomizing the Equilibria
An unbiased Ito type stochastic representation for transport PDEs
A re-entrant phase transition in the survival of secondary infections on networks
Large Field and High Resolution: Detecting Needle in Haystack
Towards Training Probabilistic Topic Models on Neuromorphic Multi-chip Systems
SWAT: A System for Detecting Salient Wikipedia Entities in Texts
New inequalities for families without k pairwise disjoint members
Counter Machines and Distributed Automata: A Story about Exchanging Space and Time
Classification of Point Cloud Scenes with Multiscale Voxel Deep Network
Geometrical analysis of polynomial lens distortion models
Symmetries on plabic graphs and associated polytopes
Controlling seizure propagation in large-scale brain networks
Automatic Recognition of Space-Time Constellations by Learning on the Grassmann Manifold
Approximating multiobjective combinatorial optimization problems with the OWA criterion
A Deep Information Sharing Network for Multi-contrast Compressed Sensing MRI Reconstruction
Understanding disentangling in $\beta$-VAE
Nonparametric Estimation of Surface Integrals on Density Level Sets
Optimal Document Exchange and New Codes for Small Number of Insertions and Deletions
Imagine This! Scripts to Compositions to Videos
Binary Space Partitioning as Intrinsic Reward
Efficient approximation for global functions of matrix product operators
Subsampled Optimization: Statistical Guarantees, Mean Squared Error Approximation, and Sampling Method
Fast and scalable non-parametric Bayesian inference for Poisson point processes
Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation
Small time asymptotics of spectral heat contents for subordinate killed Brownian motions
Probabilistic Prediction of Vehicle Semantic Intention and Motion
Enumeration of alternating sign triangles using a constant term approach
Semantic embeddings for program behavior patterns
Testing Identity of Multidimensional Histograms
Model-Free Conditional Feature Screening with Exposure Variables
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Approximating Operator Norms via Generalized Krivine Rounding
Short proof of two cases of Chvátal’s conjecture
Counterexamples for Cohen-Macaulayness of Lattice Ideals