Indirect inference through prediction

By recasting indirect inference estimation as a prediction rather than a minimization and by using regularized regressions, we can bypass the three major problems of estimation: selecting the summary statistics, defining the distance function and minimizing it numerically. By substituting regression with classification we can extend this approach to model selection as well. We present three examples: a statistical fit, the parametrization of a simple real business cycle model and heuristics selection in a fishery agent-based model. The outcome is a method that automatically chooses summary statistics, weights them and uses them to parametrize models without running any direct minimization.
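
The idea of estimating by prediction can be sketched on a toy problem. The snippet below is only an illustration under stated assumptions, not the authors' code: the model (`simulate`), the candidate statistics (`summary_stats`) and the closed-form ridge fit (`fit_ridge`) are all hypothetical stand-ins, with ridge regression chosen as one simple example of a regularized regression.

```python
import numpy as np

def simulate(theta, rng, n=200):
    # Hypothetical toy model: i.i.d. normal draws with mean theta, unit variance.
    return rng.normal(theta, 1.0, size=n)

def summary_stats(x):
    # Deliberately over-complete candidate statistics; the regression weighs them.
    return np.array([x.mean(), np.median(x), x.std(), x.min(), x.max()])

def fit_ridge(S, y, alpha=1.0):
    # Closed-form ridge regression on standardized statistics (assumed regularizer).
    S_mean, S_std = S.mean(axis=0), S.std(axis=0)
    Z = (S - S_mean) / S_std
    y_mean = y.mean()
    w = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ (y - y_mean))
    return lambda s: float(((s - S_mean) / S_std) @ w + y_mean)

rng = np.random.default_rng(0)

# 1. Run the model at randomly drawn parameters and record the summary statistics.
thetas = rng.uniform(-5, 5, size=500)
S = np.vstack([summary_stats(simulate(t, rng)) for t in thetas])

# 2. Learn the map from statistics to parameter; no distance is minimized.
predict = fit_ridge(S, thetas)

# 3. "Estimate" by predicting the parameter from the observed statistics.
observed = simulate(2.0, rng)
theta_hat = predict(summary_stats(observed))
print(theta_hat)
```

Replacing the regression with a classifier trained on which model produced each simulated run would give the corresponding sketch for model selection.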

AND: Autoregressive Novelty Detectors

We propose an unsupervised model for novelty detection. The subject is treated as a density estimation problem, in which a deep neural network is employed to learn a parametric function that maximizes probabilities of training samples. This is achieved by equipping an autoencoder with a novel module, responsible for the maximization of compressed codes’ likelihood by means of autoregression. We illustrate design choices and proper layers to perform autoregressive density estimation when dealing with both image and video inputs. Despite a very general formulation, our model shows promising results in diverse one-class novelty detection and video anomaly detection benchmarks.

Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations

In this paper we establish rigorous benchmarks for image classifier robustness. Our first benchmark, ImageNet-C, standardizes and expands the corruption robustness topic, while showing which classifiers are preferable in safety-critical applications. Unlike recent robustness research, this benchmark evaluates performance on commonplace corruptions, not worst-case adversarial corruptions. We find that there are negligible changes in relative corruption robustness from AlexNet to ResNet classifiers, and we discover ways to enhance corruption robustness. Then we propose a new dataset called Icons-50, which opens research on a new kind of robustness, surface variation robustness. With this dataset we evaluate the frailty of classifiers on new styles of known objects and unexpected instances of known classes. We also demonstrate two methods that improve surface variation robustness. Together our benchmarks may aid future work toward networks that learn fundamental class structure and also robustly generalize.

Quantum Random Self-Modifiable Computation

Among the fundamental questions in computer science, at least two have a deep impact on mathematics. What can computation compute? How many steps does a computation require to solve an instance of the 3-SAT problem? Our work addresses the first question, by introducing a new model called the x-machine. The x-machine executes Turing machine instructions and two special types of instructions. Quantum random instructions are physically realizable with a quantum random number generator. Meta instructions can add new states and add new instructions to the x-machine. A countable set of x-machines is constructed, each with a finite number of states and instructions; each x-machine can compute a Turing incomputable language, whenever the quantum randomness measurements behave like unbiased Bernoulli trials. In 1936, Alan Turing posed the halting problem for Turing machines and proved that this problem is unsolvable for Turing machines. Consider an enumeration E_a(i) = (M_i, T_i) of all Turing machines M_i and initial tapes T_i. Does there exist an x-machine X that has at least one evolutionary path X -> X_1 -> X_2 -> . . . -> X_m, so at the mth stage x-machine X_m can correctly determine for 0 <= i <= m whether M_i’s execution on tape T_i eventually halts? We demonstrate an x-machine Q(x) that has one such evolutionary path. The existence of this evolutionary path suggests that David Hilbert was not misguided to propose in 1900 that mathematicians search for finite processes to help construct mathematical proofs. Our refinement is that we cannot use a fixed computer program that behaves according to a fixed set of mechanical rules. We must pursue methods that exploit randomness and self-modification so that the complexity of the program can increase as it computes.

Visual Pattern-Driven Exploration of Big Data

Pattern extraction algorithms are enabling insights into the ever-growing amount of today’s datasets by translating recurring data properties into compact representations. Yet, a practical problem arises: as data volumes and complexity increase, so does the number of patterns, leaving the analyst with a vast result space. Current algorithmic and especially visualization approaches often fail to answer central overview questions essential for a comprehensive understanding of pattern distributions and support, their quality, and relevance to the analysis task. To address these challenges, we contribute a visual analytics pipeline targeted at the pattern-driven exploration of result spaces in a semi-automatic fashion. Specifically, we combine image feature analysis and unsupervised learning to partition the pattern space into interpretable, coherent chunks, which should be given priority in a subsequent in-depth analysis. In our analysis scenarios, no ground truth is given. Thus, we employ and evaluate novel quality metrics derived from the distance distributions of our image feature vectors and the derived cluster model to guide the feature selection process. We visualize our results interactively, allowing the user to drill down from overview to detail into the pattern space, and demonstrate our techniques in a case study on biomedical genomic data.

Diagonal Discriminant Analysis with Feature Selection for High Dimensional Data

We introduce a new method of performing high dimensional discriminant analysis, which we call multiDA. We achieve this by constructing a hybrid model that seamlessly integrates a multiclass diagonal discriminant analysis model and feature selection components. Our feature selection component naturally simplifies to weights which are simple functions of likelihood ratio statistics, allowing natural comparisons with traditional hypothesis testing methods. We provide heuristic arguments suggesting desirable asymptotic properties of our algorithm with regard to feature selection. We compare our method with several other approaches, showing marked improvements in prediction accuracy, interpretability of chosen features, and algorithm run time. We demonstrate these strengths of our model by showing strong classification performance on publicly available high dimensional datasets, as well as through multiple simulation studies. We make an R package available implementing our approach.
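
For readers unfamiliar with the building block, here is a minimal sketch of plain diagonal discriminant analysis, the classical model that ignores correlations between features. It is not multiDA and has no feature selection component. The function names and the simulated data are assumptions made for illustration, and the per-feature variance is assumed to be pooled across classes.

```python
import numpy as np

def fit_diagonal_da(X, y):
    # Per-class feature means, pooled per-feature variances, and class priors.
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    resid = X - means[np.searchsorted(classes, y)]
    var = (resid ** 2).sum(axis=0) / (len(X) - len(classes))
    priors = np.array([(y == c).mean() for c in classes])
    return classes, means, var, priors

def predict_diagonal_da(model, X):
    classes, means, var, priors = model
    # Log-posterior up to a constant: features are treated as independent.
    d = X[:, None, :] - means[None, :, :]
    scores = -0.5 * (d ** 2 / var).sum(axis=2) + np.log(priors)
    return classes[scores.argmax(axis=1)]

# Simulated data (an assumption): 50 features, only the first 5 are informative.
rng = np.random.default_rng(0)

def make_data(n_per_class):
    X0 = rng.normal(size=(n_per_class, 50))
    X1 = rng.normal(size=(n_per_class, 50))
    X1[:, :5] += 2.0
    X = np.vstack([X0, X1])
    y = np.repeat([0, 1], n_per_class)
    return X, y

X_train, y_train = make_data(100)
X_test, y_test = make_data(100)
model = fit_diagonal_da(X_train, y_train)
accuracy = (predict_diagonal_da(model, X_test) == y_test).mean()
print(accuracy)
```

Because every feature contributes to the score, uninformative features add noise in high dimensions, which is the motivation for pairing such a model with feature selection.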

SGAD: Soft-Guided Adaptively-Dropped Neural Network

Deep neural networks (DNNs) have been shown to contain many redundancies. Hence, many efforts have been made to compress DNNs. However, existing model compression methods treat all input samples equally, ignoring the fact that some input samples are harder to classify correctly than others. To address this problem, DNNs with an adaptive dropping mechanism are explored in this work. To inform the DNNs how difficult the input samples are to classify, a guideline that contains information about the input samples is introduced to improve performance. Based on the developed guideline and adaptive dropping mechanism, an innovative soft-guided adaptively-dropped (SGAD) neural network is proposed in this paper. Compared with a 32-layer residual neural network, the presented SGAD can reduce the FLOPs by 77% with less than a 1% drop in accuracy on CIFAR-10.

Deep Saliency Hashing

In recent years, hashing methods have proved efficient for large-scale Web media search. However, existing general hashing methods have limited discriminative power for describing fine-grained objects that share a similar overall appearance but differ in subtle ways. To solve this problem, we introduce, for the first time, an attention mechanism to the learning of hashing codes. Specifically, we propose a novel deep hashing model, named deep saliency hashing (DSaH), which automatically mines salient regions and learns semantic-preserving hashing codes simultaneously. DSaH is a two-step end-to-end model consisting of an attention network and a hashing network. Our loss function contains three basic components: the semantic loss, the saliency loss, and the quantization loss. The saliency loss guides the attention network to mine discriminative regions from pairs of images. We conduct extensive experiments on both fine-grained and general retrieval datasets for performance evaluation. Experimental results on Oxford Flowers-17 and Stanford Dogs-120 demonstrate that our DSaH performs the best on the fine-grained retrieval task and beats the existing best retrieval performance (DPSH) by approximately 12%. DSaH also outperforms several state-of-the-art hashing methods on general datasets, including CIFAR-10 and NUS-WIDE.

Polarity and Intensity: the Two Aspects of Sentiment Analysis

Current multimodal sentiment analysis frames sentiment score prediction as a general Machine Learning task. However, what the sentiment score actually represents has often been overlooked. As a measurement of opinions and affective states, a sentiment score generally consists of two aspects: polarity and intensity. We decompose sentiment scores into these two aspects and study how they are conveyed through individual modalities and combined multimodal models in a naturalistic monologue setting. In particular, we build unimodal and multimodal multi-task learning models with sentiment score prediction as the main task and polarity and/or intensity classification as the auxiliary tasks. Our experiments show that sentiment analysis benefits from multi-task learning, and individual modalities differ when conveying the polarity and intensity aspects of sentiment.
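
The decomposition itself is easy to picture. The sketch below assumes a signed real-valued sentiment score (for example on a scale such as -3 to +3) and splits it into a polarity label and a non-negative intensity. The function name, the label strings and the use of the raw magnitude as intensity are illustrative assumptions; the abstract frames both aspects as classification tasks, so the paper's exact class definitions may differ.

```python
def decompose_sentiment(score):
    # Polarity: the direction of the opinion (sign of the score).
    if score > 0:
        polarity = "positive"
    elif score < 0:
        polarity = "negative"
    else:
        polarity = "neutral"
    # Intensity: the strength of the opinion (magnitude of the score).
    intensity = abs(score)
    return polarity, intensity

for s in (-2.4, 0.0, 1.2):
    print(s, decompose_sentiment(s))
```

In a multi-task setup, the two outputs would serve as auxiliary targets alongside the original score.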

Diversity in Machine Learning

Machine learning methods have achieved good performance and have been widely applied in various real-world applications. They can learn models adaptively and thus better fit the special requirements of different tasks. Many factors affect the performance of the machine learning process, among which diversity is an important one. Generally, a good machine learning system is composed of plentiful training data, a good model training process, and accurate inference. Diversity can help each of these stages and so contribute to a good overall system: diversity of the training data ensures that the data contain enough discriminative information, diversity of the learned model (diversity in the parameters of each model or diversity among models) makes each parameter/model capture unique or complementary information, and diversity in inference can provide multiple choices, each of which corresponds to a plausible result. However, there is no systematic analysis of diversification in machine learning systems. In this paper, we systematically summarize the methods for data diversification, model diversification, and inference diversification in the machine learning process. In addition, we survey typical applications where diversity technology has improved machine learning performance, including remote sensing imaging tasks, machine translation, camera relocalization, image segmentation, object detection, topic modeling, and others. Finally, we discuss some challenges of diversity technology in machine learning and point out some directions for future work. Our analysis provides a deeper understanding of diversity technology in machine learning tasks, and hence can help design and learn more effective models for specific tasks.

Conditional Neural Processes

Deep neural networks excel at function approximation, yet they are typically trained from scratch for each new function. On the other hand, Bayesian methods, such as Gaussian Processes (GPs), exploit prior knowledge to quickly infer the shape of a new function at test time. Yet GPs are computationally expensive, and it can be hard to design appropriate priors. In this paper we propose a family of neural models, Conditional Neural Processes (CNPs), that combine the benefits of both. CNPs are inspired by the flexibility of stochastic processes such as GPs, but are structured as neural networks and trained via gradient descent. CNPs make accurate predictions after observing only a handful of training data points, yet scale to complex functions and large datasets. We demonstrate the performance and versatility of the approach on a range of canonical machine learning tasks, including regression, classification and image completion.
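
A rough sketch of the forward pass of a CNP-style model may help: each context pair is encoded, the encodings are averaged into a single representation, and a decoder maps that representation plus a target input to a predictive mean and standard deviation. Everything below is an untrained illustration with random weights. The layer sizes, the `tanh` activations, the softplus floor on the standard deviation and the helper names are all assumptions, not the authors' configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_params(sizes):
    # Random, untrained weights; training by gradient descent is omitted here.
    return [(rng.normal(0, 0.5, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

encoder = mlp_params([2, 16, 8])       # (x_i, y_i) -> r_i
decoder = mlp_params([1 + 8, 16, 2])   # (x_target, r) -> (mean, raw scale)

def cnp_predict(x_context, y_context, x_target):
    r_i = mlp(encoder, np.stack([x_context, y_context], axis=1))
    r = r_i.mean(axis=0)  # averaging makes the result independent of context order
    inputs = np.concatenate(
        [x_target[:, None], np.tile(r, (len(x_target), 1))], axis=1)
    out = mlp(decoder, inputs)
    mean = out[:, 0]
    std = 0.1 + 0.9 * np.log1p(np.exp(out[:, 1]))  # softplus keeps std positive
    return mean, std

x_c = np.array([-1.0, 0.0, 0.5, 2.0])
y_c = np.sin(x_c)
x_t = np.linspace(-2, 2, 5)
mean, std = cnp_predict(x_c, y_c, x_t)

# Shuffling the context points should leave the prediction unchanged.
perm = rng.permutation(len(x_c))
mean_perm, std_perm = cnp_predict(x_c[perm], y_c[perm], x_t)
print(np.allclose(mean, mean_perm), np.allclose(std, std_perm))
```

Conditioning on new observations requires only another forward pass, which is what lets such a model adapt at test time without retraining.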

Neural Processes

A neural network (NN) is a parameterised function that can be tuned via gradient descent to approximate a labelled collection of data with high precision. A Gaussian process (GP), on the other hand, is a probabilistic model that defines a distribution over possible functions, and is updated in light of data via the rules of probabilistic inference. GPs are probabilistic, data-efficient and flexible; however, they are also computationally intensive and thus limited in their applicability. We introduce a class of neural latent variable models which we call Neural Processes (NPs), combining the best of both worlds. Like GPs, NPs define distributions over functions, are capable of rapid adaptation to new observations, and can estimate the uncertainty in their predictions. Like NNs, NPs are computationally efficient during training and evaluation but also learn to adapt their priors to data. We demonstrate the performance of NPs on a range of learning tasks, including regression and optimisation, and compare and contrast with related models in the literature.

MIXGAN: Learning Concepts from Different Domains for Mixture Generation

In this work, we present an attempt at mixture generation: absorbing different image concepts (e.g., content and style) from different domains and thus generating a new domain with the learned concepts. In particular, we propose a mixture generative adversarial network (MIXGAN). MIXGAN learns concepts of content and style from two domains respectively, and thus can join them for mixture generation in a new domain, i.e., generating images with content from one domain and style from another. MIXGAN overcomes the limitation of current GAN-based models, which either generate new images in the same domain as observed in the training stage, or require off-the-shelf content templates for transfer or translation. Extensive experimental results demonstrate the effectiveness of MIXGAN as compared to related state-of-the-art GAN-based models.

Encoding Spatial Relations from Natural Language

Natural language processing has made significant inroads into learning the semantics of words through distributional approaches; however, representations learnt via these methods fail to capture certain kinds of information implicit in the real world. In particular, spatial relations are encoded in a way that is inconsistent with human spatial reasoning and lacks invariance to viewpoint changes. We present a system capable of capturing the semantics of spatial relations such as behind, left of, etc. from natural language. Our key contributions are a novel multi-modal objective based on generating images of scenes from their textual descriptions, and a new dataset on which to train it. We demonstrate that internal representations are robust to meaning-preserving transformations of descriptions (paraphrase invariance), while viewpoint invariance is an emergent property of the system.

EmbNum: Semantic labeling for numerical values with deep metric learning
Are Condorcet and minimax voting systems the best?
Answering Hindsight Queries with Lifted Dynamic Junction Trees
Exact results for directed random networks that grow by node duplication
Higher-dimension Tensor Completion via Low-rank Tensor Ring Decomposition
cvBMS and cvBMA: filling in the gaps
BIN-CT: Urban Waste Collection based in Predicting the Container Fill Level
Asymptotic Analysis of Spatial Coupling Coding for Compute-and-Forward Relaying
Conditional Tail-Related Risk Estimation Using Composite Asymmetric Least Squares and Empirical Likelihood
Multi-Level Feature Abstraction from Convolutional Neural Networks for Multimodal Biometric Identification
Distributed resource allocation through utility design – Part I: optimizing the performance certificates via the price of anarchy
Breast Cancer Diagnosis via Classification Algorithms
COTA: Improving the Speed and Accuracy of Customer Support through Ranking and Deep Networks
Best-Effort FPGA Programming: A Few Steps Can Go a Long Way
SWIFT: Maintaining weak-scalability with a dynamic range of $10^4$ in time-step size to harness extreme adaptivity
Distributed resource allocation through utility design – Part II: applications to submodular, supermodular and set covering problems
Finite Sample $L_2$ Bounds for Sequential Monte Carlo and Adaptive Path Selection
A Dataset for Lane Instance Segmentation in Urban Environments
Anomaly Detection for Skin Disease Images Using Variational Autoencoder
OCTen: Online Compression-based Tensor Decomposition
Old and new challenges in Hadamard spaces
Efficient ConvNets for Analog Arrays
A New Approach to the Hofstadter $Q$-Recurrence
Delta-matroids as subsystems of sequences of Higgs lifts
The excluded 3-minors for vf-safe delta-matroids
A simplex algorithm for rational CP-factorization
Efficient Rational Proofs with Strong Utility-Gap Guarantees
Visualizing spreading phenomena on complex networks
ModaNet: A Large-Scale Street Fashion Dataset with Polygon Annotations
Patient representation learning and interpretable evaluation using clinical notes
Simpler but More Accurate Semantic Dependency Parsing
A Counter Example to the Shuffle Compatiblity Conjecture
Endmember Extraction on the Grassmannian
A Fixed-Point Iteration for Steady-State Analysis of Water Distribution Networks
Connecting Weighted Automata and Recurrent Neural Networks through Spectral Learning
TripleID-Q: RDF Query Processing Framework using GPU
Distance-Two Colorings of Barnette Graphs
An asymptotic distribution theory for Eulerian recurrences with applications
Selective Deep Convolutional Neural Network for Low Cost Distorted Image Classification
Unbiased Decoder Learning for Fast Image Style Transfer
Region Growing Curriculum Generation for Reinforcement Learning
Crosstalk based Fine-Grained Reconfiguration Techniques for Polymorphic Circuits
Treating Content Delivery in Multi-Antenna Coded Caching as General Message Sets Transmission: A DoF Region Perspective
Small-scale Pedestrian Detection Based on Somatic Topology Localization and Temporal Feature Aggregation
Qos-Based Web Service Discovery And Selection Using Machine Learning
Multi-task Mid-level Feature Alignment Network for Unsupervised Cross-Dataset Person Re-Identification
Modeling Sparse Deviations for Compressed Sensing using Generative Models
Understanding Visual Ads by Aligning Symbols and Objects using Co-Attention
Ramsey theory for hypergroups
A 5.16Gbps decoder ASIC for Polar Code in 16nm FinFET
Video Semantic Salient Instance Segmentation: Benchmark Dataset and Baseline
Discriminative Feature Learning with Foreground Attention for Person Re-Identification
Video Frame Interpolation by Plug-and-Play Deep Locally Linear Embedding
Spatial Modulation for Molecular Communication
Post hoc false positive control for spatially structured hypotheses
Supervised Reinforcement Learning with Recurrent Neural Network for Dynamic Treatment Recommendation
Near-Optimal Distance Emulator for Planar Graphs
Homeostatic plasticity and external input shape neural network dynamics
Combining extensions of the Hales-Jewett Theorem with Ramsey Theory in other structures
Empirical fixed point bifurcation analysis
Factored Bandits
Uncorrelated Feature Encoding for Faster Image Style Transfer
Centrality-Friendship Paradoxes: When Our Friends Are More Important Than Us
Radar Communication for Combating Mutual Interference of FMCW Radars
Robust Identification of Target Genes and Outliers in Triple-negative Breast Cancer Data
Deep Autoencoder for Combined Human Pose Estimation and body Model Upscaling
Anomalous diffusion of random walk on random planar maps
Limit theorems for quadratic forms and related quantities of discretely sampled continuous-time moving averages
Generating Synthetic but Plausible Healthcare Record Datasets
Curiosity Driven Exploration of Learned Disentangled Goal Spaces
An Integration of Bottom-up and Top-Down Salient Cues on RGB-D Data: Saliency from Objectness vs. Non-Objectness
Distributed Estimation Via a Roaming Token
Tensor Decomposition for EEG Signal Retrieval
TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes
Wideband Time-Domain Digital Backpropagation via Subband Processing and Deep Learning
Spectral gaps, missing faces and minimal degrees
Sequence-to-Sequence Data Augmentation for Dialogue Language Understanding
Secure Routing in OFDM based Multi-Hop Underwater Acoustic Sensor Networks
Random band matrices in the delocalized phase, I: Quantum unique ergodicity and universality
Random band matrices in the delocalized phase, II: Generalized resolvent estimates
Analyzing Big Datasets of Genomic Sequences: Fast and Scalable Collection of k-mer Statistics
The SEN1-2 Dataset for Deep Learning in SAR-Optical Data Fusion
On the set of critical exponents of discrete groups acting on regular trees
VideoKifu, or the automatic transcription of a Go game
Quasi-Monte Carlo Variational Inference
Sensors, SLAM and Long-term Autonomy: A Review
Ensemble learning with Conformal Predictors: Targeting credible predictions of conversion from Mild Cognitive Impairment to Alzheimer’s Disease
Modeling outcomes of soccer matches
Partially Observable Reinforcement Learning for Intelligent Transportation Systems
Neonatal Pain Expression Recognition Using Transfer Learning
Randomization Inference for Peer Effects
Privacy Amplification by Subsampling: Tight Analyses via Couplings and Divergences
On the Identifying Content of Instrument Monotonicity
Data-Driven Learning-Based Optimization for Distribution System State Estimation
Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization
Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
Towards Automation of Sense-type Identification of Verbs in OntoSenseNet(Telugu)
BCSAT: A Benchmark Corpus for Sentiment Analysis in Telugu Using Word-level Annotations
Tight bounds for popping algorithms
Generating Mandarin and Cantonese F0 Contours with Decision Trees and BLSTMs
Percolation on triangulations: a bijective path to Liouville quantum gravity
Deep Learning Based Damage Detection on Post-Hurricane Satellite Imagery
Semicontinuity of structure for small sumsets in compact abelian groups
SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator
Localization Recall Precision (LRP): A New Performance Metric for Object Detection