TAP-DLND 1.0 : A Corpus for Document Level Novelty Detection

Detecting novelty of an entire document is an Artificial Intelligence (AI) frontier problem that has widespread NLP applications, such as extractive document summarization, tracking development of news events, predicting impact of scholarly articles, etc. Important though the problem is, we are unaware of any benchmark document level data that correctly addresses the evaluation of automatic novelty detection techniques in a classification framework. To bridge this gap, we present here a resource for benchmarking the techniques for document level novelty detection. We create the resource via event-specific crawling of news documents across several domains in a periodic manner. We release the annotated corpus with necessary statistics and show its use with a developed system for the problem in concern.

How to Tackle an Extremely Hard Learning Problem: Learning Causal Structures from Non-Experimental Data without the Faithfulness Assumption or the Like

Most methods for learning causal structures from non-experimental data rely on some assumptions of simplicity, the most famous of which is known as the Faithfulness condition. Without assuming such conditions to begin with, we develop a learning theory for inferring the structure of a causal Bayesian network, and we use the theory to provide a novel justification of a certain assumption of simplicity that is closely related to Faithfulness. Here is the idea. With only the Markov and IID assumptions, causal learning is notoriously too hard to achieve statistical consistency but we show that it can still achieve a quite desirable ‘combined’ mode of stochastic convergence to the truth: having almost sure convergence to the true causal hypothesis with respect to almost all causal Bayesian networks, together with a certain kind of locally uniform convergence. Furthermore, every learning algorithm achieving at least that joint mode of convergence has this property: having stochastic convergence to the truth with respect to a causal Bayesian network N only if N satisfies a certain variant of Faithfulness, known as Pearl’s Minimality condition—as if the learning algorithm were designed by assuming that condition. This explains, for the first time, why it is not merely optional but mandatory to assume the Minimality condition—or to proceed as if we assumed it.

Learning of Optimal Forecast Aggregation in Partial Evidence Environments

We consider the forecast aggregation problem in repeated settings, where the forecasts are done on a binary event. At each period multiple experts provide forecasts about an event. The goal of the aggregator is to aggregate those forecasts into a subjective accurate forecast. We assume that experts are Bayesian; namely they share a common prior, each expert is exposed to some evidence, and each expert applies Bayes rule to deduce his forecast. The aggregator is ignorant with respect to the information structure (i.e., distribution over evidence) according to which experts make their prediction. The aggregator observes the experts’ forecasts only. At the end of each period the actual state is realized. We focus on the question whether the aggregator can learn to aggregate optimally the forecasts of the experts, where the optimal aggregation is the Bayesian aggregation that takes into account all the information (evidence) in the system. We consider the class of partial evidence information structures, where each expert is exposed to a different subset of conditionally independent signals. Our main results are positive; We show that optimal aggregation can be learned in polynomial time in a quite wide range of instances of the partial evidence environments. We provide a tight characterization of the instances where learning is possible and impossible.

Divide, Denoise, and Defend against Adversarial Attacks

Deep neural networks, although shown to be a successful class of machine learning algorithms, are known to be extremely unstable to adversarial perturbations. Improving the robustness of neural networks against these attacks is important, especially for security-critical applications. To defend against such attacks, we propose dividing the input image into multiple patches, denoising each patch independently, and reconstructing the image, without losing significant image content. This proposed defense mechanism is non-differentiable which makes it non-trivial for an adversary to apply gradient-based attacks. Moreover, we do not fine-tune the network with adversarial examples, making it more robust against unknown attacks. We present a thorough analysis of the tradeoff between accuracy and robustness against adversarial attacks. We evaluate our method under black-box, grey-box, and white-box settings. The proposed method outperforms the state-of-the-art by a significant margin on the ImageNet dataset under grey-box attacks while maintaining good accuracy on clean images. We also establish a strong baseline for a novel white-box attack.

DeepThin: A Self-Compressing Library for Deep Neural Networks

As the industry deploys increasingly large and complex neural networks to mobile devices, more pressure is put on the memory and compute resources of those devices. Deep compression, or compression of deep neural network weight matrices, is a technique to stretch resources for such scenarios. Existing compression methods cannot effectively compress models smaller than 1-2% of their original size. We develop a new compression technique, DeepThin, building on existing research in the area of low rank factorization. We identify and break artificial constraints imposed by low rank approximations by combining rank factorization with a reshaping process that adds nonlinearity to the approximation function. We deploy DeepThin as a plug-gable library integrated with TensorFlow that enables users to seamlessly compress models at different granularities. We evaluate DeepThin on two state-of-the-art acoustic models, TFKaldi and DeepSpeech, comparing it to previous compression work (Pruning, HashNet, and Rank Factorization), empirical limit study approaches, and hand-tuned models. For TFKaldi, our DeepThin networks show better word error rates (WER) than competing methods at practically all tested compression rates, achieving an average of 60% relative improvement over rank factorization, 57% over pruning, 23% over hand-tuned same-size networks, and 6% over the computationally expensive HashedNets. For DeepSpeech, DeepThin-compressed networks achieve better test loss than all other compression methods, reaching a 28% better result than rank factorization, 27% better than pruning, 20% better than hand-tuned same-size networks, and 12% better than HashedNets. DeepThin also provide inference performance benefits ranging from 2X to 14X speedups, depending on the compression ratio and platform cache sizes.

Efficient Embedding of MPI Collectives in MXNET DAGs for scaling Deep Learning

Availability of high performance computing infrastructures such as clusters of GPUs and CPUs have fueled the growth of distributed learning systems. Deep Learning frameworks express neural nets as DAGs and execute these DAGs on computation resources such as GPUs. In this paper, we propose efficient designs of embedding MPI collective operations into data parallel DAGs. Incorrect designs can easily lead to deadlocks or program crashes. In particular, we demonstrate three designs: Funneled, Concurrent communication and Dependency chaining of using MPI collectives with DAGs. These designs automatically enable overlap of computation with communication by allowing for concurrent execution with the other tasks. We directly implement these designs into the KVStore API of the MXNET. This allows us to directly leverage the rest of the infrastructure. Using ImageNet and CIFAR data sets, we show the potential of our designs. In particular, our designs scale to 256 GPUs with as low as 50 seconds of epoch times for ImageNet 1K datasets.

Local Differential Privacy for Evolving Data

There are now several large scale deployments of differential privacy used to track statistical information about users. However, these systems periodically recollect the data and recompute the statistics using algorithms designed for a single use and as a result do not provide meaningful privacy guarantees over long time scales. Moreover, existing techniques to mitigate this effect do not apply in the ‘local’ model of differential privacy that these systems use. In this paper, we introduce a new local differential privacy technique to maintain persistently up-to-date statistics over time, with privacy guarantees scaling only with the number of changes in the underlying distribution rather than the number of collection periods. The key ideas include batching time into epochs — varying the epoch size allows us to trade off accuracy against frequency of updates — and a protocol for users to ‘vote’ to update out-of-date statistics while losing very little privacy. We prove our main results for the setting where users hold a single bit, redrawn at every time period, from a common (but changing) distribution; however, our framework is quite general and we give an application to frequency and heavy-hitter estimation.

The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation

This report surveys the landscape of potential security threats from malicious uses of AI, and proposes ways to better forecast, prevent, and mitigate these threats. After analyzing the ways in which AI may influence the threat landscape in the digital, physical, and political domains, we make four high-level recommendations for AI researchers and other stakeholders. We also suggest several promising areas for further research that could expand the portfolio of defenses, or make attacks less effective or harder to execute. Finally, we discuss, but do not conclusively resolve, the long-term equilibrium of attackers and defenders.

Integrated Tools for Engineering Ontologies

The article presents an overview of current specialized ontology engineering tools, as well as texts’ annotation tools based on ontologies. The main functions and features of these tools, their advantages and disadvantages are discussed. A systematic comparative analysis of means for engineering ontologies is presented.

Zero-Shot Question Generation from Knowledge Graphs for Unseen Predicates and Entity Types

We present a neural model for question generation from knowledge base triples in a ‘Zero-Shot’ setup, that is generating questions for triples containing predicates, subject types or object types that were not seen at training time. Our model leverages triples occurrences in the natural language corpus in an encoder-decoder architecture, paired with an original part-of-speech copy action mechanism to generate questions. Benchmark and human evaluation show that our model sets a new state-of-the-art for zero-shot QG.

Distribution Matching in Variational Inference

The difficulties in matching the latent posterior to the prior, balancing powerful posteriors with computational efficiency, and the reduced flexibility of data likelihoods are the biggest challenges in the advancement of Variational Autoencoders. We show that these issues arise due to struggles in marginal divergence minimization, and explore an alternative to using conditional distributions that is inspired by Generative Adversarial Networks. The class probability estimation that GANs offer for marginal divergence minimization uncovers a family of VAE-GAN hybrids, which offer the promise of addressing these major challenges in variational inference. We systematically explore the solutions available for distribution matching, but show that these hybrid methods do not fulfill this promise, and the trade-off between generation and inference that they give rise to remains an ongoing research topic.

Empirical Bayes Matrix Factorization

Matrix factorization methods – including Factor analysis (FA), and Principal Components Analysis (PCA) – are widely used for inferring and summarizing structure in multivariate data. Many matrix factorization methods exist, corresponding to different assumptions on the elements of the underlying matrix factors. For example, many recent methods use a penalty or prior distribution to achieve sparse representations (‘Sparse FA/PCA’). Here we introduce a general Empirical Bayes approach to matrix factorization (EBMF), whose key feature is that it uses the observed data to estimate prior distributions on matrix elements. We derive a correspondingly-general variational fitting algorithm, which reduces fitting EBMF to solving a simpler problem – the so-called ‘normal means’ problem. We implement this general algorithm, but focus particular attention on the use of sparsity-inducing priors that are uni-modal at 0. This yields a sparse EBMF approach – essentially a version of sparse FA/PCA – that automatically adapts the amount of sparsity to the data. We demonstrate the benefits of our approach through both numerical comparisons with competing methods and through analysis of data from the GTEx (Genotype Tissue Expression) project on genetic associations across 44 human tissues. In numerical comparisons EBMF often provides more accurate inferences than other methods. In the GTEx data, EBMF identifies interpretable structure that concords with known relationships among human tissues. Software implementing our approach is available at https://…/flashr.

How to analyze data in a factorial design? An extensive simulation study

Factorial designs are frequently used in different fields of science, e.g. psychological, medical or biometric studies. Standard approaches, as the ANOVA F-test, make different assumptions on the distribution of the error terms, the variances or the sample sizes in the different groups. Because of time constraints or a lack of statistical background, many users do not check these assumptions; enhancing the risk of potentially inflated type-I error rates or a substantial loss of power. It is the aim of the present paper, to give an overview of different methods without such restrictive assumptions and to identify situations in which one method is superior compared to others. In particular, after summarizing their underlying assumptions, the different approaches are compared within extensive simulations. To also address the current discussion about redefining the statistical significance level, we also included simulations for the 0.5\% level.

Memetic Graph Clustering

It is common knowledge that there is no single best strategy for graph clustering, which justifies a plethora of existing approaches. In this paper, we present a general memetic algorithm, VieClus, to tackle the graph clustering problem. This algorithm can be adapted to optimize different objective functions. A key component of our contribution are natural recombine operators that employ ensemble clusterings as well as multi-level techniques. Lastly, we combine these techniques with a scalable communication protocol, producing a system that is able to compute high-quality solutions in a short amount of time. We instantiate our scheme with local search for modularity and show that our algorithm successfully improves or reproduces all entries of the 10th DIMACS implementation~challenge under consideration using a small amount of time.

Do Deep Learning Models Have Too Many Parameters? An Information Theory Viewpoint

Deep learning models often have more parameters than observations, and still perform well. This is sometimes described as a paradox. In this work, we show experimentally that despite their huge number of parameters, deep neural networks can compress the data losslessly even when taking the cost of encoding the parameters into account. Such a compression viewpoint originally motivated the use of variational methods in neural networks. However, we show that these variational methods provide surprisingly poor compression bounds, despite being explicitly built to minimize such bounds. This might explain the relatively poor practical performance of variational methods in deep learning. Better encoding methods, imported from the Minimum Description Length (MDL) toolbox, yield much better compression values on deep networks, corroborating the hypothesis that good compression on the training set correlates with good test performance.

Structured Uncertainty Prediction Networks

This paper is the first work to propose a network to predict a structured uncertainty distribution for a reconstructed image. Our novel model learns to predict a full Gaussian covariance matrix for each reconstruction, which permits efficient sampling and likelihood evaluation. We demonstrate that our model can accurately reconstruct ground truth correlated residual distributions for synthetic datasets and generate plausible high frequency samples for real face images. We also illustrate the use of these predicted covariances for structure preserving image denoising.

Relative Worst-Order Analysis: A Survey

Relative worst-order analysis is a technique for assessing the relative quality of online algorithms. We survey the most important results obtained with this technique and compare it with other quality measures.

i-RevNet: Deep Invertible Networks

It is widely believed that the success of deep convolutional networks is based on progressively discarding uninformative variability about the input with respect to the problem at hand. This is supported empirically by the difficulty of recovering images from their hidden representations, in most commonly used network architectures. In this paper we show via a one-to-one mapping that this loss of information is not a necessary condition to learn representations that generalize well on complicated problems, such as ImageNet. Via a cascade of homeomorphic layers, we build the i-RevNet, a network that can be fully inverted up to the final projection onto the classes, i.e. no information is discarded. Building an invertible architecture is difficult, for one, because the local inversion is ill-conditioned, we overcome this by providing an explicit inverse. An analysis of i-RevNets learned representations suggests an alternative explanation for the success of deep networks by a progressive contraction and linear separation with depth. To shed light on the nature of the model learned by the i-RevNet we reconstruct linear interpolations between natural image representations.

Towards Deep Representation Learning with Genetic Programming

Genetic Programming (GP) is an evolutionary algorithm commonly used for machine learning tasks. In this paper we present a method that allows GP to transform the representation of a large-scale machine learning dataset into a more compact representation, by means of processing features from the original representation at individual level. We develop as a proof of concept of this method an autoencoder. We tested a preliminary version of our approach in a variety of well-known machine learning image datasets. We speculate that this method, used in an iterative manner, can produce results competitive with state-of-art deep neural networks.

The Algebraic Approach to Duality: An Introduction

This survey article gives an elementary introduction to the algebraic approach to Markov process duality, as opposed to the pathwise approach. In the algebraic approach, a Markov generator is written as the sum of products of simpler operators, which each have a dual with respect to some duality function. We discuss at length the recent suggestion by Giardin\`a, Redig, and others, that it may be a good idea to choose these simpler operators in such a way that they form an irreducible representation of some known Lie algebra. In particular, we collect the necessary background on representations of Lie algebras that is crucial for this approach. We also discuss older work by Lloyd and Sudbury on duality functions of product form and the relation between intertwining and duality.

The Gaussian Process Autoregressive Regression Model (GPAR)

Multi-output regression models must exploit dependencies between outputs to maximise predictive performance. The application of Gaussian processes (GPs) to this setting typically yields models that are computationally demanding and have limited representational power. We present the Gaussian Process Autoregressive Regression (GPAR) model, a scalable multi-output GP model that is able to capture nonlinear, possibly input-varying, dependencies between outputs in a simple and tractable way: the product rule is used to decompose the joint distribution over the outputs into a set of conditionals, each of which is modelled by a standard GP. GPAR’s efficacy is demonstrated on a variety of synthetic and real-world problems, outperforming existing GP models and achieving state-of-the-art performance on the tasks with existing benchmarks.

Actively Avoiding Nonsense in Generative Models

A generative model may generate utter nonsense when it is fit to maximize the likelihood of observed data. This happens due to ‘model error,’ i.e., when the true data generating distribution does not fit within the class of generative models being learned. To address this, we propose a model of active distribution learning using a binary invalidity oracle that identifies some examples as clearly invalid, together with random positive examples sampled from the true distribution. The goal is to maximize the likelihood of the positive examples subject to the constraint of (almost) never generating examples labeled invalid by the oracle. Guarantees are agnostic compared to a class of probability distributions. We show that, while proper learning often requires exponentially many queries to the invalidity oracle, improper distribution learning can be done using polynomially many queries.

Semiclassical echo dynamics in the Sachdev-Ye-Kitaev model
On the Complexity of Opinions and Online Discussions
Shield: Fast, Practical Defense and Vaccination for Deep Learning using JPEG Compression
Tools for higher-order network analysis
Online Action Detection in Untrimmed, Streaming Videos – Modeling and Evaluation
Entropy-Isomap: Manifold Learning for High-dimensional Dynamic Processes
Multi-resolution Tensor Learning for Large-Scale Spatial Data
Comments on: ‘Lyapunov matrices for a class of time delay systems’ by V. L. Kharitonov
Deep Learning for Joint Source-Channel Coding of Text
Seasonal Web Search Query Selection for Influenza-Like Illness (ILI) Estimation
Bregman Parallel Direction Method of Multipliers for Distributed Optimization via Mirror Markov Mixing
Voice Impersonation using Generative Adversarial Networks
Global Pose Estimation with an Attention-based Recurrent Network
Maximum value of the standardized log of odds ratio and celestial mechanics
The NWRA Classification Infrastructure: Description and Extension to the Discriminant Analysis Flare Forecasting System (DAFFS)
Automated soft tissue lesion detection and segmentation in digital mammography using a u-net deep learning network
Expert System for Diagnosis of Chest Diseases Using Neural Networks
Almost logarithmic-time space optimal leader election in population protocols
Population Protocols Made Easy
Coverings and the heat equation on graphs: stochastic incompleteness, the Feller property and uniform transience
Automated Playtesting with Procedural Personas through MCTS with Evolved Heuristics
Fourier Policy Gradients
Real World Evaluation of Approaches to Research Paper Recommendation
Learning Word Vectors for 157 Languages
Learning Hidden Markov Models from Pairwise Co-occurrences with Applications to Topic Modeling
Hierarchical Expertise-Level Modeling for User Specific Robot-Behavior Explanations
Machine Learning Methods for Solving Assignment Problems in Multi-Target Tracking
EV-FlowNet: Self-Supervised Optical Flow Estimation for Event-based Cameras
Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement
Generalization Error Bounds with Probabilistic Guarantee for SGD in Nonconvex Optimization
Communication-Optimal Convolutional Neural Nets
Single or Multiple Frames Content Delivery for Robust and Secure Communication in Energy and Spectrum Efficient 5G Networks?
Simplicial Closure and Higher-order Link Prediction
Layer-wise synapse optimization for implementing neural networks on general neuromorphic architectures
Teaching Categories to Human Learners with Visual Explanations
Inexact Non-Convex Newton-Type Methods
Scale Optimization for Full-Image-CNN Vehicle Detection
On Lyapunov exponents and adversarial perturbation
Memcomputing: Leveraging memory and physics to compute efficiently
AMC and HARQ: How to Increase the Throughput
Non-Local Graph-Based Prediction For Reversible Data Hiding In Images
Online Learning with an Unknown Fairness Metric
Estimator of Prediction Error Based on Approximate Message Passing for Penalized Linear Regression
Using Automatic Generation of Relaxation Constraints to Improve the Preimage Attack on 39-step MD4
Distilling Knowledge Using Parallel Data for Far-field Speech Recognition
Comparison Based Learning from Weak Oracles
Distributed Symmetry Breaking in Sampling (Optimal Distributed Randomly Coloring with Fewer Colors)
Stochastic dominance and weak concentration for sums of independent symmetric random vectors
Recurrent Residual Convolutional Neural Network based on U-Net (R2U-Net) for Medical Image Segmentation
MIMO Transmit Beampattern Matching Under Waveform Constraints
On the automorphism groups of distance-regular graphs and rank-4 primitive coherent configurations
Agile Amulet: Real-Time Salient Object Detection with Contextual Attention
Laurent phenomenon algebras arising from surfaces II: Laminated surfaces
Neural Network Ensembles to Real-time Identification of Plug-level Appliance Measurements
Co-occurrence matrix analysis-based semi-supervised training for object detection
Generalized Mixability Constant Regret, Generalized Mixability, and Mirror Descent
Computing the Cumulative Distribution Function and Quantiles of the One-sided Kolmogorov-Smirnov Statistic
Recovery of simultaneous low rank and two-way sparse coefficient matrices, a nonconvex approach
Geometry of Discrete Copulas
A survey on trajectory clustering analysis
The Weyl-Kac weight formula
The critical exponent: a novel graph invariant
Unsupervised Band Selection of Hyperspectral Images via Multi-dictionary Sparse Representation
Fitting New Speakers Based on a Short Untranscribed Sample
Binary linear complementary dual codes
Sublinear Algorithms for MAXCUT and Correlation Clustering
CASPaxos: Replicated State Machines without logs
High-Order Graph Convolutional Recurrent Neural Network: A Deep Learning Framework for Network-Scale Traffic Learning and Forecasting
Segmentation hiérarchique faiblement supervisée
Capacity-achieving decoding by guessing noise
Generalized nil-Coxeter algebras
Fusing Video and Inertial Sensor Data for Walking Person Identification
Discovering Hidden Topical Hubs and Authorities in Online Social Networks
Learning to Abstain via Curve Optimization
High-Dimensional Bayesian Optimization via Additive Models with Overlapping Groups
On a fully fuzzy framework for minimax mixed integer linear programming
A multicriteria selection system based on player performance. Case study: The Spanish ACB Basketball League
Scale Free Bounds on the Amplification of Disturbances in Mass Chains
Selection from heaps, row-sorted matrices and $X+Y$ using soft heaps
Do deep nets really need weight decay and dropout?
On the Mabinogion urn model
Toric Fano varieties associated to graph cubeahedra
Novel View Synthesis for Large-scale Scene using Adversarial Loss
Distributed Power Control in Downlink Cellular Massive MIMO Systems
Composite Optimization by Nonconvex Majorization-Minimization
Robust Maximization of Non-Submodular Objectives
On Bernstein processes generated by hierarchies of linear parabolic systems in Rd
Correlation Flow: Robust Optical Flow Using Kernel Cross-Correlators
Support of Laurent series algebraic over the field of formal power series
The isomorphism problem for finite extensions of free groups is in PSPACE
Attentive Tensor Product Learning for Language Generation and Grammar Parsing
The Parameterized Complexity of Packing Arc-Disjoint Cycles in Tournaments
An Efficient Semismooth Newton Based Algorithm for Convex Clustering
Camera-based vehicle velocity estimation from monocular video
Uncertainty Estimates for Optical Flow with Multi-Hypotheses Networks
Do Less, Get More: Streaming Submodular Maximization with Subsampling
Stroke Controllable Fast Style Transfer with Adaptive Receptive Fields
Temporal Vertex Cover with a Sliding Time Window
Bias Compensation in Iterative Soft-Feedback Algorithms with Application to (Discrete) Compressed Sensing
Asymptotic Distribution of Parameters in Random Maps
Combining Textual Content and Structure to Improve Dialog Similarity
Reproducing kernel orthogonal polynomials on the multinomial distribution
On the ‘size’ of free subshifts
Out-distribution training confers robustness to deep neural networks
Deep BCD-Net Using Identical Encoding-Decoding CNN Structures for Iterative Image Recovery
Characterization of generalized Petersen graphs that are Kronecker covers
Mallows permutations as stable matchings
ILP-based Local Search for Graph Partitioning
Correlated pseudo-marginal schemes for time-discretised stochastic kinetic models
Transport-Based Pattern Theory: A Signal Transformation Approach
Cubic graphs, their Ehrhart quasi-polynomials, and a scissors congruence phenomenon
High-Quality Prediction Intervals for Deep Learning: A Distribution-Free, Ensembled Approach
The parameterized complexity of finding a 2-sphere in a simplicial complex
Adaptive Sampling for Coarse Ranking
Wireless Expanders
Stochastic compressible Euler equations and inviscid limits
On Davenport constant of finite abelian groups
AutoPrognosis: Automated Clinical Prognostic Modeling via Bayesian Optimization with Structured Kernel Learning
Distributed Symmetry-Breaking Algorithms for Congested Cliques
Real-Time Dense Stereo Matching With ELAS on FPGA Accelerated Embedded Devices
Stability of items: an experimental test
Stable processes conditioned to avoid an interval
Implicit Argument Prediction with Event Knowledge
A Minimaxmax Problem for Improving the Torsional Stability of Rectangular Plates
Continual Reinforcement Learning with Complex Synapses
Cobalt: BFT Governance in Open Networks
Analysis of the XRP Ledger Consensus Protocol
Value iteration for approximate dynamic programming under convexity
Meta-Reinforcement Learning of Structured Exploration Strategies
Neuro-adaptive distributed control with prescribed performance for the synchronization of unknown nonlinear networked systems