• Jet Charge and Machine Learning
• Eigendecomposition-free Training of Deep Networks with Zero Eigenvalue-based Losses
• Olive Oil is Made of Olives, Baby Oil is Made for Babies: Interpreting Noun Compounds using Paraphrases in a Neural Model
• Asynchronous Distributed Optimization with Heterogeneous Regularizations and Normalizations
• Spectrahedral Lifts of Convex Sets
• Renewal Population Dynamics and their Eternal Family Trees
• Probabilistic Video Generation using Holistic Attribute Control
• Incremental Learning-to-Learn with Statistical Guarantees
• A Unified Framework for Multi-View Multi-Class Object Pose Estimation
• Distributed Mechanism Design for Multicast Transmission
• Stochastic PDE Limit of the Six Vertex Model
• Fisher Pruning of Deep Nets for Facial Trait Classification
• Robust Blind Deconvolution via Mirror Descent
• Extended depth-of-field in holographic image reconstruction using deep learning based auto-focusing and phase-recovery
• On locally repeated values of arithmetic functions over $\mathbb F_q[T]$
• Efficient Search of QC-LDPC Codes with Girths 6 and 8 and Free of Elementary Trapping Sets with Small Size
• Circular repetition thresholds on some small alphabets
• Connectivity-Preserving Coordination Control of Multi-Agent Systems with Time-Varying Delays
• Learning the Localization Function: Machine Learning Approach to Fingerprinting Localization
• Network and Panel Quantile Effects Via Distribution Regression
• On the Parameterized Computation of Minimum Volume Outer Ellipsoid of Minkowski Sum of Ellipsoids
• Entropy-based closure for probabilistic learning on manifolds
• Comparing Fixed and Adaptive Computation Time for Recurrent Neural Networks
• Optimal price management in retail energy markets: an impulse control problem with asymptotic estimates
• Financial Contagion in a Generalized Stochastic Block Model
• Boosted Density Estimation Remastered
• Enforcing constraints for interpolation and extrapolation in Generative Adversarial Networks
• Generalized Optimization of High Capacity Compressive Imaging Systems
• Sensing Matrix Design via Capacity Maximization for Block Compressive Sensing Applications
• Can Decentralized Status Update Achieve Universally Near-Optimal Age-of-Information in Wireless Multiaccess Channels?
• Deep Pose Consensus Networks
• SUCAG: Stochastic Unbiased Curvature-aided Gradient Method for Distributed Optimization
• Randomness and Permutations in Coordinate Descent Methods
• Residual Networks: Lyapunov Stability and Convex Decomposition
• Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs
• Single-Shot Bidirectional Pyramid Networks for High-Quality Object Detection
• Characterising knotting properties of polymers in nanochannels
• A Topological Approach to Secure Message Dissemination in Vehicular Networks
• PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model
• SCISPACE: A Scientific Collaboration Workspace for File Systems in Geo-Distributed HPC Data Centers
• Unsupervised Adversarial Learning of 3D Human Pose from 2D Joint Locations
• Statistical approach to flow stress and generalized Hall-Petch law for polycrystalline materials under plastic deformations
• A non-homogeneous hidden Markov model for partially observed longitudinal responses
• The generating function of planar Eulerian orientations
• Persistence Weighted Gaussian Kernel for Probability Distributions on the Space of Persistence Diagrams
• A quantum algorithm for simulating non-sparse Hamiltonians
• Speaker Clustering With Neural Networks And Audio Processing
• Synchronization of Coupled Oscillators: The Taylor Expansion of the Inverse Kuramoto Map
• The Harborth Constant for Dihedral Groups
• Machine learning classification for field distributions of photonic modes
• The convex hull of a planar random walk: perimeter, diameter, and shape
• A Study of Delay Drifts on Massive MIMO Wideband Channel Models
• A Generalized Framework for Chance-constrained Optimal Power Flow
• On the stability of persistent entropy and new summary functions for TDA
• Learning Eligibility in Clinical Cancer Trials using Deep Neural Networks
• Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data
• Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World
• Quenches near Ising quantum criticality as a challenge for artificial neural networks
• Prioritized Multi-View Stereo Depth Map Generation Using Confidence Prediction
• Analysis of dependent scattering mechanism in hard-sphere Yukawa random media
• Dichromatic Gray Pixel for Camera-agnostic Color Constancy
• A framework for Culture-aware Robots based on Fuzzy Logic
• Venue Suggestion Using Social-Centric Scores
• Structured Output Learning with Abstention: Application to Accurate Opinion Prediction
• Sum-Product estimates and expanding polynomials over matrix rings
• On LDPC Code Based Massive Random-Access Scheme for the Gaussian Multiple Access Channel
• A trust-based recommendation method using network diffusion processes
• Model Consistency for Learning with Mirror-Stratifiable Regularizers
• Observability and State Estimation for a Class of Nonlinear Systems
• Calibrating Model-Based Inferences and Decisions
• Found a good match: should I keep searching? – Accuracy and Performance in Iris Matching Using 1-to-First Search
• Densely Connected Pyramid Dehazing Network
• Sequence pairs with asymptotically optimal aperiodic correlation
• PlaneMatch: Patch Coplanarity Prediction for Robust RGB-D Reconstruction
• Structure connectivity and substructure connectivity of twisted hypercubes
• Quality expectations of machine translation
• A Smoke Removal Method for Laparoscopic Images
• Group Sparsity Residual with Non-Local Samples for Image Denoising
• Buried object detection from B-scan ground penetrating radar data using Faster-RCNN
• Signaling Game-based Misbehavior Inspection in V2I-enabled Highway Operations
• Incremental Color Quantization for Color-Vision-Deficient Observers Using Mobile Gaming Data
• Edge Kempe equivalence of regular graph covers
• Parallel tree algorithms for AMR and non-standard data access
• Guided Image Inpainting: Replacing an Image Region by Pulling Content from Another Image
• Liminal reciprocity and factorization statistics
• Active colloidal chains with cilia- and flagella-like motion
• A Quantile-Based Approach to Modelling Recovery Time in Structural Health Monitoring
• Deep Reinforcement Learning with Model Learning and Monte Carlo Tree Search in Minecraft
• Clustering-driven Deep Embedding with Pairwise Constraints
• Frieze varieties: A characterization of the finite-tame-wild trichotomy for acyclic quivers
• Towards Universal Representation for Unseen Action Recognition
• Hypergraph cuts above the average
• A Feature-Based Model for Nested Named-Entity Recognition at VLSP-2018 NER Evaluation Campaign
• Branched Generative Adversarial Networks for Multi-Scale Image Manifold Learning
• Locally Private Bayesian Inference for Count Models
• A positive formula for the Ehrhart-like polynomials from root system chip-firing
• Attention Solves Your TSP
• Word sense induction using word embeddings and community detection in complex networks
• Quantifying Age and Model Uncertainties in Paleoclimate Data and Dynamical Climate Models with a Joint Inferential Analysis
• Circumspheres of sets of n+1 random points in the d-dimensional Euclidean unit ball (0<n<d+1)
• KonIQ-10k: Towards an ecologically valid and large-scale IQA database
• Optimality of refraction strategies for a constrained dividend problem
• Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings
• Generalized Scene Reconstruction
Seglearn is an open-source Python package for machine learning on time series or sequences using a sliding window segmentation approach. The implementation provides a flexible pipeline for tackling classification, regression, and forecasting problems with multivariate sequence and contextual data. This package is compatible with scikit-learn and is listed under scikit-learn Related Projects. The package depends on numpy, scipy, and scikit-learn. Seglearn is distributed under the BSD 3-Clause License. Documentation includes a detailed API description, user guide, and examples. Unit tests provide a high degree of code coverage.
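The sliding window segmentation idea can be sketched in a few lines of plain Python. This is only an illustration of the general technique and does not use or reproduce the Seglearn API; the function names `sliding_windows` and `window_features`, the window width, the step, and the choice of summary statistics are all assumptions made for the example.

```python
from typing import List, Sequence


def sliding_windows(series: Sequence[float], width: int, step: int) -> List[List[float]]:
    """Split a 1-D series into fixed-width windows taken every `step` samples."""
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    # Only full windows are kept; a trailing partial window is dropped.
    return [list(series[start:start + width])
            for start in range(0, len(series) - width + 1, step)]


def window_features(window: Sequence[float]) -> List[float]:
    """Summarize one window with simple statistics: mean, min, max."""
    return [sum(window) / len(window), min(window), max(window)]


# Hypothetical toy series: 7 samples, windows of 4 samples with 50% overlap.
series = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
segments = sliding_windows(series, width=4, step=2)
features = [window_features(w) for w in segments]
```

Each row of `features` could then be passed to any standard estimator, which is the sense in which a segmentation step turns a sequence problem into an ordinary tabular one.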
We propose a novel way to measure and understand convolutional neural networks by quantifying the amount of input signal they let in. To do this, an autoencoder (AE) was fine-tuned on gradients from a pre-trained classifier with fixed parameters. We compared the reconstructed samples from AEs that were fine-tuned on a set of image classifiers (AlexNet, VGG16, ResNet-50, and Inception v3) and found substantial differences. The AE learns which aspects of the input space to preserve and which ones to ignore, based on the information encoded in the backpropagated gradients. Measuring the changes in accuracy when the signal of one classifier is used by a second one, a relation of total order emerges. This order depends directly on each classifier’s input signal but it does not correlate with classification accuracy or network size. Further evidence of this phenomenon is provided by measuring the normalized mutual information between original images and auto-encoded reconstructions from different fine-tuned AEs. These findings break new ground in the area of neural network understanding, opening a new way to reason about, debug, and interpret their results. We present four concrete examples in the literature where observations can now be explained in terms of the input signal that a model uses.
An action should remain identifiable when modifying its speed: consider the contrast between an expert chef and a novice chef each chopping an onion. Here, we expect the novice chef to have a relatively measured and slow approach to chopping when compared to the expert. In general, the speed at which actions are performed, whether slower or faster than average, should not dictate how they are recognized. We explore the erratic behavior caused by this phenomenon on state-of-the-art deep network-based methods for action recognition in terms of maximum performance and stability in recognition accuracy across a range of input video speeds. By observing the trends in these metrics and summarizing them based on expected temporal behaviour w.r.t. variations in input video speeds, we find two distinct types of network architectures. In this paper, we propose a preprocessing method named T-RECS, as a way to extend deep-network-based methods for action recognition to explicitly account for speed variability in the data. We do so by adaptively resampling the inputs to a given model. T-RECS is agnostic to the specific deep-network model; we apply it to four state-of-the-art action recognition architectures: C3D, I3D, TSN, and ConvNet+LSTM. On HMDB51 and UCF101, T-RECS-based I3D models show a peak improvement of at least 2.9% in performance over the baseline, while T-RECS-based C3D models achieve a maximum improvement in stability of 59% over the baseline on the HMDB51 dataset.
Traditionally it had been a problem that researchers did not have access to enough spatial data to answer pressing research questions or build compelling visualizations. Today, however, the problem is often that we have too much data. Spatially redundant or approximately redundant points may refer to a single feature (plus noise) rather than many distinct spatial features. We use a machine learning approach with density-based clustering to compress such spatial data into a set of representative features.
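A minimal sketch of this kind of compression, under stated assumptions: the abstract only says "density-based clustering", so DBSCAN is assumed here as the algorithm, the centroid is assumed as each cluster's representative, and noise points are kept unchanged. The names `dbscan` and `compress` and the parameters `eps` and `min_pts` are illustrative. Distances are plain Euclidean, so real latitude/longitude data would need a projection or a great-circle distance first.

```python
from math import dist
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not visited

    def neighbors(i: int) -> List[int]:
        # Brute-force O(n) search; includes the point itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return [label if label is not None else -1 for label in labels]


def compress(points: List[Point], eps: float, min_pts: int) -> List[Point]:
    """Replace every cluster by its centroid; keep noise points as they are."""
    labels = dbscan(points, eps, min_pts)
    reps: List[Point] = []
    for c in sorted(set(labels) - {-1}):
        members = [p for p, label in zip(points, labels) if label == c]
        reps.append((sum(x for x, _ in members) / len(members),
                     sum(y for _, y in members) / len(members)))
    reps.extend(p for p, label in zip(points, labels) if label == -1)
    return reps


# Hypothetical data: two tight groups of near-duplicates and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
labels = dbscan(points, eps=0.5, min_pts=2)
representatives = compress(points, eps=0.5, min_pts=2)
```

Here six input points reduce to three representatives: one per dense group plus the isolated point.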
Mixing by cutting-and-shuffling can be understood and predicted using dynamical systems based tools and techniques. In existing studies, mixing is generated by maps that repeat the same cut-and-shuffle process at every iteration, in a ‘fixed’ manner. However, mixing can be greatly improved by varying the cut-and-shuffle parameters at each step, using a ‘variable’ approach. To demonstrate this approach, we show how to optimize mixing by cutting-and-shuffling on the one-dimensional line interval, known as an interval exchange transformation (IET). Mixing can be significantly improved by optimizing variable protocols, especially for initial conditions more complex than just a simple two-color line interval. While we show that optimal variable IETs can be found analytically for arbitrary numbers of iterations, for more complex cutting-and-shuffling systems, computationally expensive numerical optimization methods would be required. Furthermore, the number of control parameters grows linearly with the number of iterations in variable systems. Therefore, optimizing over large numbers of iterations is generally computationally prohibitive. We demonstrate an ad hoc approach to cutting-and-shuffling that is computationally inexpensive and guarantees the mixing metric is within a constant factor of the optimum. This ad hoc approach yields significantly better mixing than fixed IETs which are known to produce weak-mixing, because cut pieces never reconnect. The heuristic principles of this method can be applied to more general cutting-and-shuffling systems.
Despite major methodological developments, Bayesian inference for Gaussian graphical models remains challenging in high dimension due to the tremendous size of the model space. This article proposes a method to infer the marginal and conditional independence structures between variables by multiple testing of hypotheses. Specifically, we introduce closed-form Bayes factors under the Gaussian conjugate model to evaluate the null hypotheses of marginal and conditional independence between variables. Their computation for all pairs of variables is shown to be extremely efficient, thereby allowing us to address large problems with thousands of nodes. Moreover, we derive exact tail probabilities from the null distributions of the Bayes factors. These allow the use of any multiplicity correction procedure to control error rates for incorrect edge inclusion. We demonstrate the proposed approach to graphical model selection on various simulated examples as well as on a large gene expression data set from The Cancer Genome Atlas.
In the context of a sequential search problem, I explore large-generations learning dynamics for agents who suffer from the ‘gambler’s fallacy’ – the statistical bias of anticipating too much regression to the mean for realizations of independent random events. Searchers are uncertain about search pool qualities of different periods but infer these fundamentals from search outcomes of the previous generation. Searchers’ stopping decisions impose a censoring effect on the data of their successors, as the values they would have found in later periods had they kept searching remain unobserved. While innocuous for rational agents, this censoring effect interacts with the gambler’s fallacy and creates a feedback loop between distorted stopping rules and pessimistic beliefs about search pool qualities of later periods. In general settings, the stopping rules used by different generations monotonically converge to a steady-state rule that stops searching earlier than optimal. In settings where true pool qualities increase over time – so there is option value in rejecting above-average early draws – learning is monotonically harmful and welfare strictly decreases across generations.
Many of the leading approaches in language modeling introduce novel, complex and specialized architectures. We take existing state-of-the-art word level language models based on LSTMs and QRNNs and extend them to both larger vocabularies as well as character-level granularity. When properly tuned, LSTMs and QRNNs achieve state-of-the-art results on character-level (Penn Treebank, enwik8) and word-level (WikiText-103) datasets, respectively. Results are obtained in only 12 hours (WikiText-103) to 2 days (enwik8) using a single modern GPU.
Learning-based methods have been successful in solving complex control tasks without significant prior knowledge about the system. However, these methods typically do not provide any safety guarantees, which prevents their use in safety-critical, real-world applications. In this paper, we present a learning-based model predictive control scheme that provides provable high-probability safety guarantees. To this end, we exploit regularity assumptions on the dynamics in terms of a Gaussian process prior to construct provably accurate confidence intervals on predicted trajectories. Unlike previous approaches, we do not assume that model uncertainties are independent. Based on these predictions, we guarantee that trajectories satisfy safety constraints. Moreover, we use a terminal set constraint to recursively guarantee the existence of safe control actions at every iteration. In our experiments, we show that the resulting algorithm can be used to safely and efficiently explore and learn about dynamic systems.
In this paper we consider a distributed convex optimization problem over time-varying networks. We propose a dual method that converges R-linearly to the optimal point given that the agents’ objective functions are strongly convex and have Lipschitz continuous gradients. The proposed method requires half as many variable exchanges per iterate as methods based on DIGing, and yields improved practical performance as empirically demonstrated.
Deep neural networks are often trained in the over-parametrized regime (i.e. with far more parameters than training examples), and understanding why the training converges to solutions that generalize remains an open problem. Several studies have highlighted the fact that the training procedure, i.e. mini-batch Stochastic Gradient Descent (SGD), leads to solutions that have specific properties in the loss landscape. However, even with plain Gradient Descent (GD) the solutions found in the over-parametrized regime generalize well, and this phenomenon is poorly understood. We propose an analysis of this behavior for feedforward networks with a ReLU activation function under the assumption of small initialization and learning rate and uncover a quantization effect: the weight vectors tend to concentrate at a small number of directions determined by the input data. As a consequence, we show that for given input data there are only finitely many ‘simple’ functions that can be obtained, independent of the network size. This puts these functions in analogy to linear interpolations (for given input data there are finitely many triangulations, each of which determines a function by linear interpolation). We ask whether this analogy extends to the generalization properties: while the usual distribution-independent generalization property does not hold, it could be that, e.g. for smooth functions with bounded second derivative, an approximation property holds which could ‘explain’ generalization of networks (of unbounded size) to unseen inputs.
Supervised learning frequently boils down to determining hidden and bright parameters in a parameterized hypothesis space based on finite input-output samples. The hidden parameters determine the attributions of hidden predictors or the nonlinear mechanism of an estimator, while the bright parameters characterize how hidden predictors are linearly combined or the linear mechanism. In the traditional learning paradigm, hidden and bright parameters are not distinguished and are trained simultaneously in one learning process. Such one-stage learning (OSL) benefits theoretical analysis but suffers from a high computational burden. To overcome this difficulty, a two-stage learning (TSL) scheme, featured by learning through deterministic assignment of hidden parameters (LtDaHP), was proposed, which deterministically generates the hidden parameters by using minimal Riesz energy points on a sphere and equally spaced points in an interval. We theoretically show that with such deterministic assignment of hidden parameters, LtDaHP with a neural network realization achieves almost the same generalization performance as OSL. We also present a series of simulations and application examples to support the superior performance of LtDaHP.
We introduce the use of rectified linear units (ReLU) as the classification function in a deep neural network (DNN). Conventionally, ReLU is used as an activation function in DNNs, with the Softmax function as their classification function. However, there have been several studies on using a classification function other than Softmax, and this study is an addition to those. We accomplish this by taking the activation of the penultimate layer $h_{n-1}$ in a neural network, then multiplying it by weight parameters $\theta$ to get the raw scores $o_i$. Afterwards, we threshold the raw scores $o_i$ by $0$, i.e. $f(o) = \max(0, o_i)$, where $f(o)$ is the ReLU function. We provide class predictions $\hat{y}$ through the argmax function, i.e. $\hat{y} = \arg\max f(o)$.
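The prediction rule described above can be sketched as follows. This is a toy illustration rather than the authors' implementation: the names `relu` and `predict`, the list-of-lists weight layout (one row per penultimate unit, one column per class), and the example numbers are all assumptions.

```python
from typing import List


def relu(values: List[float]) -> List[float]:
    """Threshold every value at zero: max(0, v)."""
    return [max(0.0, v) for v in values]


def predict(h: List[float], theta: List[List[float]]) -> int:
    """Return the predicted class for one penultimate-layer activation `h`.

    theta[k][i] is the (assumed) weight from penultimate unit k to class i.
    """
    num_classes = len(theta[0])
    # Raw scores: o_i = sum_k h_k * theta[k][i]
    scores = [sum(h_k * row[i] for h_k, row in zip(h, theta))
              for i in range(num_classes)]
    activated = relu(scores)
    # argmax over the ReLU outputs; ties resolve to the lowest class index.
    return max(range(num_classes), key=activated.__getitem__)


# Hypothetical activation with two units and weights for three classes.
h = [1.0, 2.0]
theta = [[0.5, -1.0, 0.2],
         [0.1, 0.3, -0.4]]
predicted_class = predict(h, theta)
```

One consequence worth noting: when every raw score is negative, all ReLU outputs are zero and the argmax falls back to the tie-breaking rule, which Softmax would not do.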
Parametric approaches to learning, such as deep learning (DL), are highly popular in nonlinear regression, in spite of their extremely difficult training with their increasing complexity (e.g. number of layers in DL). In this paper, we present an alternative semi-parametric framework which forgoes the ordinarily required feedback, by introducing the novel idea of geometric regularization. We show that certain deep learning techniques such as residual network (ResNet) architecture are closely related to our approach. Hence, our technique can be used to analyze these types of deep learning. Moreover, we present preliminary results which confirm that our approach can be easily trained to obtain complex structures.
Conversational agents have become ubiquitous, ranging from goal-oriented systems for helping with reservations to chit-chat models found in modern virtual assistants. In this survey paper, we explore this fascinating field. We look at some of the pioneering work that defined the field and gradually move to the current state-of-the-art models. We look at statistical, neural, generative adversarial network based and reinforcement learning based approaches and how they evolved. Along the way we discuss various challenges that the field faces: lack of context in utterances, the absence of a good quantitative metric to compare models, and lack of trust in agents because they do not have a consistent persona, among others. We structure this paper in a way that answers these pertinent questions and discusses competing approaches to solve them.
Deep learning has revolutionized data science, and recently its popularity has grown exponentially, as has the number of papers employing deep networks. Vision tasks such as human pose estimation did not escape this methodological change. The large number of deep architectures has led to a plethora of methods that are evaluated under different experimental protocols. Moreover, small changes in the architecture of the network, or in the data pre-processing procedure, together with the stochastic nature of the optimization methods, lead to notably different results, making it extremely difficult to identify methods that significantly outperform others. Therefore, when proposing regression algorithms, practitioners proceed by trial and error. This situation motivated the current study, in which we perform a systematic evaluation and a statistical analysis of the performance of vanilla deep regression (short for convolutional neural networks with a linear regression top layer). To our knowledge, this is the first comprehensive analysis of deep regression techniques. We perform experiments on three vision problems and report confidence intervals for the median performance as well as the statistical significance of the results, if any. Surprisingly, the variability due to different data pre-processing procedures generally eclipses the variability due to modifications in the network architecture.
This paper introduces a simple and explicit measure of word importance in a global context, including very small contexts (10+ sentences). After generating a word-vector space containing both 2-gram clauses and single tokens, it became clear that more contextually significant words disproportionately define clause meanings. Using this simple relationship in a weighted bag-of-words sentence embedding model results in sentence vectors that outperform the state-of-the-art for subjectivity/objectivity analysis, as well as paraphrase detection, and fall within those produced by state-of-the-art models for six other transfer learning tests. The metric was then extended to a sentence/document summarizer, an improved (and context-aware) cosine distance and a simple document stop word identifier. The sigmoid-global context weighted bag of words is presented as a new baseline for sentence embeddings.
Batch Normalization (BN) is a milestone technique in the development of deep learning, enabling various networks to train. However, normalizing along the batch dimension introduces problems — BN’s error increases rapidly when the batch size becomes smaller, caused by inaccurate batch statistics estimation. This limits BN’s usage for training larger models and transferring features to computer vision tasks including detection, segmentation, and video, which require small batches constrained by memory consumption. In this paper, we present Group Normalization (GN) as a simple alternative to BN. GN divides the channels into groups and computes within each group the mean and variance for normalization. GN’s computation is independent of batch sizes, and its accuracy is stable in a wide range of batch sizes. On ResNet-50 trained in ImageNet, GN has 10.6% lower error than its BN counterpart when using a batch size of 2; when using typical batch sizes, GN is comparably good with BN and outperforms other normalization variants. Moreover, GN can be naturally transferred from pre-training to fine-tuning. GN can outperform or compete with its BN-based counterparts for object detection and segmentation in COCO, and for video classification in Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks. GN can be easily implemented by a few lines of code in modern libraries.
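The core computation of GN can be sketched in plain Python for a single sample. This is a simplified illustration, not the paper's code: the function name `group_norm`, the `(channels, positions)` list layout with spatial dimensions flattened, and the value of `eps` are assumptions, and the learnable per-channel scale and shift used in practice are omitted.

```python
from statistics import fmean, pvariance
from typing import List


def group_norm(x: List[List[float]], num_groups: int, eps: float = 1e-5) -> List[List[float]]:
    """Normalize one sample of shape (channels, positions), group by group."""
    channels = len(x)
    if channels % num_groups != 0:
        raise ValueError("channels must be divisible by num_groups")
    size = channels // num_groups
    out: List[List[float]] = []
    for g in range(num_groups):
        group = x[g * size:(g + 1) * size]
        # Statistics are pooled over all channels and positions of the group.
        values = [v for channel in group for v in channel]
        mean = fmean(values)
        var = pvariance(values, mu=mean)
        scale = (var + eps) ** -0.5
        out.extend([[(v - mean) * scale for v in channel] for channel in group])
    return out


# Hypothetical sample: 4 channels with 2 positions each, split into 2 groups.
sample = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
normalized = group_norm(sample, num_groups=2)
```

Because the function only ever sees one sample, its output cannot depend on what else is in the batch, which is the property the abstract contrasts with BN.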