Seglearn: A Python Package for Learning Sequences and Time Series

Seglearn is an open-source Python package for applying machine learning to time series or sequences using a sliding window segmentation approach. The implementation provides a flexible pipeline for tackling classification, regression, and forecasting problems with multivariate sequence and contextual data. This package is compatible with scikit-learn and is listed under scikit-learn Related Projects. The package depends on numpy, scipy, and scikit-learn. Seglearn is distributed under the BSD 3-Clause License. Documentation includes a detailed API description, user guide, and examples. Unit tests provide a high degree of code coverage.
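
The sliding window segmentation idea can be illustrated with a minimal sketch in plain Python. This is not Seglearn's API: the function names `sliding_windows` and `mean_feature`, and the `width` and `step` parameters, are hypothetical and chosen only for illustration.

```python
def sliding_windows(series, width, step):
    """Split a sequence into fixed-width, possibly overlapping segments."""
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    # The last window must fit entirely inside the series.
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]


def mean_feature(segment):
    """A toy per-segment feature (hypothetical; any summary statistic would do)."""
    return sum(segment) / len(segment)


series = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
segments = sliding_windows(series, width=4, step=2)
features = [mean_feature(s) for s in segments]
```

Each segment, or a feature vector computed from it, would then be passed to an ordinary estimator as one training example.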

What do Deep Networks Like to See?

We propose a novel way to measure and understand convolutional neural networks by quantifying the amount of input signal they let in. To do this, an autoencoder (AE) was fine-tuned on gradients from a pre-trained classifier with fixed parameters. We compared the reconstructed samples from AEs that were fine-tuned on a set of image classifiers (AlexNet, VGG16, ResNet-50, and Inception v3) and found substantial differences. The AE learns which aspects of the input space to preserve and which ones to ignore, based on the information encoded in the backpropagated gradients. Measuring the changes in accuracy when the signal of one classifier is used by a second one, a relation of total order emerges. This order depends directly on each classifier’s input signal but it does not correlate with classification accuracy or network size. Further evidence of this phenomenon is provided by measuring the normalized mutual information between original images and auto-encoded reconstructions from different fine-tuned AEs. These findings break new ground in the area of neural network understanding, opening a new way to reason, debug, and interpret their results. We present four concrete examples in the literature where observations can now be explained in terms of the input signal that a model uses.

Pando: a Volunteer Computing Platform for the Web

Volunteer computing is currently used successfully to make hundreds of thousands of machines available free-of-charge to projects of general interest. However, the effort and cost involved in participating in and launching such projects may explain why only a few high-profile projects use it and why only 0.1% of Internet users participate in them. In this paper we present Pando, a new web-based volunteer computing system designed to be easy to deploy and which does not require dedicated servers. The tool uses new demand-driven stream abstractions and a WebRTC overlay based on a fat tree for connecting volunteers. Together the stream abstractions and the fat-tree overlay enable a thousand browser tabs running on multiple machines to be used for computation, enough to tap into all machines bought as part of previous hardware investments made by a small or medium-sized company or a university department. Moreover, the approach is based on a simple programming model that should be easy to use both directly by JavaScript programmers and as a compilation target by compiler writers. We provide a command-line version of the tool and all scripts and procedures necessary to replicate the experiments we made on the Grid5000 testbed.

T-RECS: Training for Rate-Invariant Embeddings by Controlling Speed for Action Recognition

An action should remain identifiable when modifying its speed: consider the contrast between an expert chef and a novice chef each chopping an onion. Here, we expect the novice chef to have a relatively measured and slow approach to chopping when compared to the expert. In general, the speed at which actions are performed, whether slower or faster than average, should not dictate how they are recognized. We explore the erratic behavior caused by this phenomenon on state-of-the-art deep network-based methods for action recognition in terms of maximum performance and stability in recognition accuracy across a range of input video speeds. By observing the trends in these metrics and summarizing them based on expected temporal behaviour w.r.t. variations in input video speeds, we find two distinct types of network architectures. In this paper, we propose a preprocessing method named T-RECS, as a way to extend deep-network-based methods for action recognition to explicitly account for speed variability in the data. We do so by adaptively resampling the inputs to a given model. T-RECS is agnostic to the specific deep-network model; we apply it to four state-of-the-art action recognition architectures, C3D, I3D, TSN, and ConvNet+LSTM. On HMDB51 and UCF101, T-RECS-based I3D models show a peak improvement of at least 2.9% in performance over the baseline while T-RECS-based C3D models achieve a maximum improvement in stability by 59% over the baseline, on the HMDB51 dataset.
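
To make the notion of input video speed concrete, the following sketch resamples frame indices at a fixed speed factor. It is only an illustration under stated assumptions: T-RECS chooses the resampling adaptively, which is not shown here, and the function name `resample_indices` and its parameters are hypothetical.

```python
def resample_indices(num_frames, speed, clip_len):
    """Return `clip_len` frame indices read at `speed` times the original rate.

    speed > 1 skips frames (faster playback); speed < 1 repeats frames (slower).
    Indices past the end of the video are clamped to the last frame.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    last = num_frames - 1
    return [min(last, int(i * speed)) for i in range(clip_len)]


normal = resample_indices(num_frames=6, speed=1.0, clip_len=4)
fast = resample_indices(num_frames=6, speed=2.0, clip_len=4)
slow = resample_indices(num_frames=6, speed=0.5, clip_len=4)
```

A recognition model evaluated on clips built from `normal`, `fast`, and `slow` would ideally return the same label for all three.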

Clustering to Reduce Spatial Data Set Size

Traditionally, the problem was that researchers did not have access to enough spatial data to answer pressing research questions or build compelling visualizations. Today, however, the problem is often that we have too much data. Spatially redundant or approximately redundant points may refer to a single feature (plus noise) rather than many distinct spatial features. We use a machine learning approach with density-based clustering to compress such spatial data into a set of representative features.
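
As a rough sketch of the compression step, the code below clusters points with a minimal DBSCAN and replaces each cluster by its centroid. The abstract does not name a specific algorithm or say how a representative is chosen, so DBSCAN, the centroid rule, and the `eps` and `min_pts` values are all assumptions made for illustration.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point (-1 marks noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nbrs)
    return labels


def compress(points, eps, min_pts):
    """Replace each cluster by its centroid; keep noise points unchanged."""
    groups = {}
    for p, lab in zip(points, dbscan(points, eps, min_pts)):
        groups.setdefault(lab, []).append(p)
    reps = []
    for lab, members in groups.items():
        if lab == -1:
            reps.extend(members)
        else:
            n = len(members)
            reps.append(tuple(sum(c) / n for c in zip(*members)))
    return reps


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (9.0, 0.0)]
reps = compress(points, eps=0.2, min_pts=2)
```

Here seven input points reduce to three representatives: two centroids and one isolated point that is kept as it is.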

Optimized mixing by cutting-and-shuffling

Mixing by cutting-and-shuffling can be understood and predicted using dynamical systems based tools and techniques. In existing studies, mixing is generated by maps that repeat the same cut-and-shuffle process at every iteration, in a ‘fixed’ manner. However, mixing can be greatly improved by varying the cut-and-shuffle parameters at each step, using a ‘variable’ approach. To demonstrate this approach, we show how to optimize mixing by cutting-and-shuffling on the one-dimensional line interval, known as an interval exchange transformation (IET). Mixing can be significantly improved by optimizing variable protocols, especially for initial conditions more complex than just a simple two-color line interval. While we show that optimal variable IETs can be found analytically for arbitrary numbers of iterations, for more complex cutting-and-shuffling systems, computationally expensive numerical optimization methods would be required. Furthermore, the number of control parameters grows linearly with the number of iterations in variable systems. Therefore, optimizing over large numbers of iterations is generally computationally prohibitive. We demonstrate an ad hoc approach to cutting-and-shuffling that is computationally inexpensive and guarantees the mixing metric is within a constant factor of the optimum. This ad hoc approach yields significantly better mixing than fixed IETs which are known to produce weak-mixing, because cut pieces never reconnect. The heuristic principles of this method can be applied to more general cutting-and-shuffling systems.
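
The difference between a fixed and a variable protocol can be seen on a discretized two-color line. The sketch below is illustrative only: the interface count is a crude stand-in for the mixing metric (the abstract does not define the one used), and the cut positions and piece orders are hand-picked assumptions, not optimized values.

```python
def cut_and_shuffle(cells, cuts, order):
    """Cut `cells` at the indices in `cuts` and reassemble the pieces in `order`."""
    bounds = [0, *cuts, len(cells)]
    pieces = [cells[a:b] for a, b in zip(bounds, bounds[1:])]
    return [c for k in order for c in pieces[k]]


def interfaces(cells):
    """Count color changes between neighbouring cells (a rough mixing proxy)."""
    return sum(a != b for a, b in zip(cells, cells[1:]))


line = list("BBBBWWWW")

# Fixed protocol: the same cut-and-shuffle applied at every iteration.
fixed = line
for _ in range(2):
    fixed = cut_and_shuffle(fixed, cuts=[2, 6], order=[2, 1, 0])

# Variable protocol: the cut positions and order change at the second step.
variable = cut_and_shuffle(line, cuts=[2, 6], order=[2, 1, 0])
variable = cut_and_shuffle(variable, cuts=[3, 5], order=[1, 0, 2])
```

In this toy case the fixed protocol undoes itself after two iterations, while the variable protocol keeps increasing the number of interfaces.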

Fast Bayesian inference in large Gaussian graphical models

Despite major methodological developments, Bayesian inference for Gaussian graphical models remains challenging in high dimension due to the tremendous size of the model space. This article proposes a method to infer the marginal and conditional independence structures between variables by multiple testing of hypotheses. Specifically, we introduce closed-form Bayes factors under the Gaussian conjugate model to evaluate the null hypotheses of marginal and conditional independence between variables. Their computation for all pairs of variables is shown to be extremely efficient, thereby allowing us to address large problems with thousands of nodes. Moreover, we derive exact tail probabilities from the null distributions of the Bayes factors. These allow the use of any multiplicity correction procedure to control error rates for incorrect edge inclusion. We demonstrate the proposed approach to graphical model selection on various simulated examples as well as on a large gene expression data set from The Cancer Genome Atlas.

Mislearning from Censored Data: Gambler’s Fallacy in a Search Problem

In the context of a sequential search problem, I explore large-generations learning dynamics for agents who suffer from the ‘gambler’s fallacy’ – the statistical bias of anticipating too much regression to the mean for realizations of independent random events. Searchers are uncertain about search pool qualities of different periods but infer these fundamentals from search outcomes of the previous generation. Searchers’ stopping decisions impose a censoring effect on the data of their successors, as the values they would have found in later periods had they kept searching remain unobserved. While innocuous for rational agents, this censoring effect interacts with the gambler’s fallacy and creates a feedback loop between distorted stopping rules and pessimistic beliefs about search pool qualities of later periods. In general settings, the stopping rules used by different generations monotonically converge to a steady-state rule that stops searching earlier than optimal. In settings where true pool qualities increase over time – so there is option value in rejecting above-average early draws – learning is monotonically harmful and welfare strictly decreases across generations.

An Analysis of Neural Language Modeling at Multiple Scales

Many of the leading approaches in language modeling introduce novel, complex and specialized architectures. We take existing state-of-the-art word-level language models based on LSTMs and QRNNs and extend them to both larger vocabularies and character-level granularity. When properly tuned, LSTMs and QRNNs achieve state-of-the-art results on character-level (Penn Treebank, enwik8) and word-level (WikiText-103) datasets, respectively. Results are obtained in only 12 hours (WikiText-103) to 2 days (enwik8) using a single modern GPU.

Learning-based Model Predictive Control for Safe Exploration and Reinforcement Learning

Learning-based methods have been successful in solving complex control tasks without significant prior knowledge about the system. However, these methods typically do not provide any safety guarantees, which prevents their use in safety-critical, real-world applications. In this paper, we present a learning-based model predictive control scheme that provides provable high-probability safety guarantees. To this end, we exploit regularity assumptions on the dynamics in terms of a Gaussian process prior to construct provably accurate confidence intervals on predicted trajectories. Unlike previous approaches, we do not assume that model uncertainties are independent. Based on these predictions, we guarantee that trajectories satisfy safety constraints. Moreover, we use a terminal set constraint to recursively guarantee the existence of safe control actions at every iteration. In our experiments, we show that the resulting algorithm can be used to safely and efficiently explore and learn about dynamic systems.

PANDA: A Dual Linearly Converging Method for Distributed Optimization over Time-Varying Undirected Graphs

In this paper we consider a distributed convex optimization problem over time-varying networks. We propose a dual method that converges R-linearly to the optimal point given that the agents’ objective functions are strongly convex and have Lipschitz continuous gradients. The proposed method requires half as many variable exchanges per iterate as methods based on DIGing, and yields improved practical performance as empirically demonstrated.

Gradient Descent Quantizes ReLU Network Features

Deep neural networks are often trained in the over-parametrized regime (i.e. with far more parameters than training examples), and understanding why the training converges to solutions that generalize remains an open problem. Several studies have highlighted the fact that the training procedure, i.e. mini-batch Stochastic Gradient Descent (SGD), leads to solutions that have specific properties in the loss landscape. However, even with plain Gradient Descent (GD), the solutions found in the over-parametrized regime are pretty good, and this phenomenon is poorly understood. We propose an analysis of this behavior for feedforward networks with a ReLU activation function under the assumption of small initialization and learning rate and uncover a quantization effect: the weight vectors tend to concentrate at a small number of directions determined by the input data. As a consequence, we show that for given input data there are only finitely many, ‘simple’ functions that can be obtained, independent of the network size. This puts these functions in analogy to linear interpolations (for given input data there are finitely many triangulations, which each determine a function by linear interpolation). We ask whether this analogy extends to the generalization properties – while the usual distribution-independent generalization property does not hold, it could be that for e.g. smooth functions with bounded second derivative an approximation property holds which could ‘explain’ generalization of networks (of unbounded size) to unseen inputs.

Learning through deterministic assignment of hidden parameters

Supervised learning frequently boils down to determining hidden and bright parameters in a parameterized hypothesis space based on finite input-output samples. The hidden parameters determine the attributions of hidden predictors or the nonlinear mechanism of an estimator, while the bright parameters characterize how hidden predictors are linearly combined or the linear mechanism. In the traditional learning paradigm, hidden and bright parameters are not distinguished and are trained simultaneously in one learning process. Such one-stage learning (OSL) benefits theoretical analysis but suffers from a high computational burden. To overcome this difficulty, a two-stage learning (TSL) scheme, featuring learning through deterministic assignment of hidden parameters (LtDaHP), was proposed, which deterministically generates the hidden parameters by using minimal Riesz energy points on a sphere and equally spaced points in an interval. We theoretically show that with such deterministic assignment of hidden parameters, LtDaHP with a neural network realization achieves almost the same generalization performance as OSL. We also present a series of simulations and application examples to support the outperformance of LtDaHP.

Deep Learning using Rectified Linear Units (ReLU)

We introduce the use of rectified linear units (ReLU) as the classification function in a deep neural network (DNN). Conventionally, ReLU is used as an activation function in DNNs, with the Softmax function as their classification function. However, there have been several studies on using a classification function other than Softmax, and this study is an addition to those. We accomplish this by taking the activation of the penultimate layer h_{n - 1} in a neural network, then multiplying it by weight parameters \theta to get the raw scores o_{i}. Afterwards, we threshold the raw scores o_{i} by 0, i.e. f(o) = \max(0, o_{i}), where f(o) is the ReLU function. We provide class predictions \hat{y} through the argmax function, i.e. argmax f(x).
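
The prediction rule described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `relu_classify`, the layout of `theta` as a d x k nested list, and the example numbers are assumptions.

```python
def relu_classify(h, theta):
    """Predict a class from penultimate activations using ReLU instead of Softmax.

    h:     activations of the penultimate layer (length d)
    theta: weight parameters as a d x k nested list (k classes)
    """
    k = len(theta[0])
    # Raw scores o_i = sum_j h_j * theta_{j,i}
    scores = [sum(h[j] * theta[j][i] for j in range(len(h))) for i in range(k)]
    # Threshold at zero: f(o) = max(0, o_i)
    activated = [max(0.0, o) for o in scores]
    # Class prediction: argmax over the thresholded scores (first index wins ties)
    y_hat = max(range(k), key=lambda i: activated[i])
    return scores, activated, y_hat


h = [1.0, 2.0]
theta = [[0.5, -1.0, 0.0],
         [0.25, 0.5, -1.0]]
scores, activated, y_hat = relu_classify(h, theta)
```

Note that when every raw score is non-positive, all thresholded values are zero and this sketch falls back to the first class.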

Demystifying Deep Learning: A Geometric Approach to Iterative Projections

Parametric approaches to learning, such as deep learning (DL), are highly popular in nonlinear regression, even though their training becomes extremely difficult as their complexity (e.g. the number of layers in DL) increases. In this paper, we present an alternative semi-parametric framework which forgoes the ordinarily required feedback, by introducing the novel idea of geometric regularization. We show that certain deep learning techniques such as the residual network (ResNet) architecture are closely related to our approach. Hence, our technique can be used to analyze these types of deep learning. Moreover, we present preliminary results which confirm that our approach can be easily trained to obtain complex structures.

The Rapidly Changing Landscape of Conversational Agents

Conversational agents have become ubiquitous, ranging from goal-oriented systems for helping with reservations to chit-chat models found in modern virtual assistants. In this survey paper, we explore this fascinating field. We look at some of the pioneering work that defined the field and gradually move to the current state-of-the-art models. We look at statistical, neural, generative adversarial network based and reinforcement learning based approaches and how they evolved. Along the way we discuss various challenges that the field faces: lack of context in utterances, the absence of a good quantitative metric to compare models, lack of trust in agents because they do not have a consistent persona, etc. We structure this paper in a way that addresses these challenges and discusses competing approaches to solve them.

A Comprehensive Analysis of Deep Regression

Deep learning revolutionized data science, and recently its popularity has grown exponentially, as has the number of papers employing deep networks. Vision tasks such as human pose estimation did not escape this methodological change. The large number of deep architectures has led to a plethora of methods that are evaluated under different experimental protocols. Moreover, small changes in the architecture of the network, or in the data pre-processing procedure, together with the stochastic nature of the optimization methods, lead to notably different results, making it extremely difficult to sift out methods that significantly outperform others. Therefore, when proposing regression algorithms, practitioners proceed by trial-and-error. This situation motivated the current study, in which we perform a systematic evaluation and a statistical analysis of the performance of vanilla deep regression (short for convolutional neural networks with a linear regression top layer). To our knowledge, this is the first comprehensive analysis of deep regression techniques. We perform experiments on three vision problems and report confidence intervals for the median performance as well as the statistical significance of the results, if any. Surprisingly, the variability due to different data pre-processing procedures generally eclipses the variability due to modifications in the network architecture.

Context is Everything: Finding Meaning Statistically in Semantic Spaces

This paper introduces a simple and explicit measure of word importance in a global context, including very small contexts (10+ sentences). After generating a word-vector space containing both 2-gram clauses and single tokens, it became clear that more contextually significant words disproportionately define clause meanings. Using this simple relationship in a weighted bag-of-words sentence embedding model results in sentence vectors that outperform the state-of-the-art for subjectivity/objectivity analysis, as well as paraphrase detection, and fall within those produced by state-of-the-art models for six other transfer learning tests. The metric was then extended to a sentence/document summarizer, an improved (and context-aware) cosine distance and a simple document stop word identifier. The sigmoid-global context weighted bag of words is presented as a new baseline for sentence embeddings.

Group Normalization

Batch Normalization (BN) is a milestone technique in the development of deep learning, enabling various networks to train. However, normalizing along the batch dimension introduces problems: BN’s error increases rapidly when the batch size becomes smaller, caused by inaccurate batch statistics estimation. This limits BN’s usage for training larger models and transferring features to computer vision tasks including detection, segmentation, and video, which require small batches constrained by memory consumption. In this paper, we present Group Normalization (GN) as a simple alternative to BN. GN divides the channels into groups and computes within each group the mean and variance for normalization. GN’s computation is independent of batch sizes, and its accuracy is stable in a wide range of batch sizes. On ResNet-50 trained on ImageNet, GN has 10.6% lower error than its BN counterpart when using a batch size of 2; when using typical batch sizes, GN is comparably good to BN and outperforms other normalization variants. Moreover, GN can be naturally transferred from pre-training to fine-tuning. GN can outperform or compete with its BN-based counterparts for object detection and segmentation in COCO, and for video classification in Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks. GN can be easily implemented by a few lines of code in modern libraries.
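
The per-group statistics can be sketched in plain Python for a single sample. This is a simplified illustration rather than the authors' implementation: it omits the learnable per-channel scale and shift, and the function name `group_norm`, the flat-list channel layout, and the `eps` value are assumptions.

```python
import math


def group_norm(x, num_groups, eps=1e-5):
    """Normalize one sample; x is a list of C channels, each a flat list of values.

    The mean and variance are computed over all values in a group of channels,
    so the result does not depend on any other sample in the batch.
    """
    channels = len(x)
    if channels % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = channels // num_groups
    out = []
    for g in range(num_groups):
        group = x[g * per_group:(g + 1) * per_group]
        values = [v for ch in group for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * scale for v in ch] for ch in group])
    return out


# Four channels with two spatial values each, split into two groups.
x = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [20.0, 20.0]]
y = group_norm(x, num_groups=2)
```

Because the statistics are taken within each sample, the output for a given input is the same at batch size 2 as at batch size 32.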

Jet Charge and Machine Learning
Eigendecomposition-free Training of Deep Networks with Zero Eigenvalue-based Losses
Olive Oil is Made of Olives, Baby Oil is Made for Babies: Interpreting Noun Compounds using Paraphrases in a Neural Model
Asynchronous Distributed Optimization with Heterogeneous Regularizations and Normalizations
Spectrahedral Lifts of Convex Sets
Renewal Population Dynamics and their Eternal Family Trees
Probabilistic Video Generation using Holistic Attribute Control
Incremental Learning-to-Learn with Statistical Guarantees
A Unified Framework for Multi-View Multi-Class Object Pose Estimation
Distributed Mechanism Design for Multicast Transmission
Stochastic PDE Limit of the Six Vertex Model
Fisher Pruning of Deep Nets for Facial Trait Classification
Robust Blind Deconvolution via Mirror Descent
Extended depth-of-field in holographic image reconstruction using deep learning based auto-focusing and phase-recovery
On locally repeated values of arithmetic functions over $\mathbb F_q[T]$
Efficient Search of QC-LDPC Codes with Girths 6 and 8 and Free of Elementary Trapping Sets with Small Size
Circular repetition thresholds on some small alphabets
Connectivity-Preserving Coordination Control of Multi-Agent Systems with Time-Varying Delays
Learning the Localization Function: Machine Learning Approach to Fingerprinting Localization
Network and Panel Quantile Effects Via Distribution Regression
On the Parameterized Computation of Minimum Volume Outer Ellipsoid of Minkowski Sum of Ellipsoids
Entropy-based closure for probabilistic learning on manifolds
Comparing Fixed and Adaptive Computation Time for Recurrent Neural Networks
Optimal price management in retail energy markets: an impulse control problem with asymptotic estimates
Financial Contagion in a Generalized Stochastic Block Model
Boosted Density Estimation Remastered
Enforcing constraints for interpolation and extrapolation in Generative Adversarial Networks
Generalized Optimization of High Capacity Compressive Imaging Systems
Sensing Matrix Design via Capacity Maximization for Block Compressive Sensing Applications
Can Decentralized Status Update Achieve Universally Near-Optimal Age-of-Information in Wireless Multiaccess Channels?
Deep Pose Consensus Networks
SUCAG: Stochastic Unbiased Curvature-aided Gradient Method for Distributed Optimization
Randomness and Permutations in Coordinate Descent Methods
Residual Networks: Lyapunov Stability and Convex Decomposition
Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs
Single-Shot Bidirectional Pyramid Networks for High-Quality Object Detection
Characterising knotting properties of polymers in nanochannels
A Topological Approach to Secure Message Dissemination in Vehicular Networks
PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model
SCISPACE: A Scientific Collaboration Workspace for File Systems in Geo-Distributed HPC Data Centers
Unsupervised Adversarial Learning of 3D Human Pose from 2D Joint Locations
Statistical approach to flow stress and generalized Hall-Petch law for polycrystalline materials under plastic deformations
A non-homogeneous hidden Markov model for partially observed longitudinal responses
The generating function of planar Eulerian orientations
Persistence Weighted Gaussian Kernel for Probability Distributions on the Space of Persistence Diagrams
A quantum algorithm for simulating non-sparse Hamiltonians
Speaker Clustering With Neural Networks And Audio Processing
Synchronization of Coupled Oscillators: The Taylor Expansion of the Inverse Kuramoto Map
The Harborth Constant for Dihedral Groups
Machine learning classification for field distributions of photonic modes
The convex hull of a planar random walk: perimeter, diameter, and shape
A Study of Delay Drifts on Massive MIMO Wideband Channel Models
A Generalized Framework for Chance-constrained Optimal Power Flow
On the stability of persistent entropy and new summary functions for TDA
Learning Eligibility in Clinical Cancer Trials using Deep Neural Networks
Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data
Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World
Quenches near Ising quantum criticality as a challenge for artificial neural networks
Prioritized Multi-View Stereo Depth Map Generation Using Confidence Prediction
Analysis of dependent scattering mechanism in hard-sphere Yukawa random media
Dichromatic Gray Pixel for Camera-agnostic Color Constancy
A framework for Culture-aware Robots based on Fuzzy Logic
Venue Suggestion Using Social-Centric Scores
Structured Output Learning with Abstention: Application to Accurate Opinion Prediction
Sum-Product estimates and expanding polynomials over matrix rings
On LDPC Code Based Massive Random-Access Scheme for the Gaussian Multiple Access Channel
A trust-based recommendation method using network diffusion processes
Model Consistency for Learning with Mirror-Stratifiable Regularizers
Observability and State Estimation for a Class of Nonlinear Systems
Calibrating Model-Based Inferences and Decisions
Found a good match: should I keep searching? – Accuracy and Performance in Iris Matching Using 1-to-First Search
Densely Connected Pyramid Dehazing Network
Sequence pairs with asymptotically optimal aperiodic correlation
PlaneMatch: Patch Coplanarity Prediction for Robust RGB-D Reconstruction
Structure connectivity and substructure connectivity of twisted hypercubes
Quality expectations of machine translation
A Smoke Removal Method for Laparoscopic Images
Group Sparsity Residual with Non-Local Samples for Image Denoising
Buried object detection from B-scan ground penetrating radar data using Faster-RCNN
Signaling Game-based Misbehavior Inspection in V2I-enabled Highway Operations
Incremental Color Quantization for Color-Vision-Deficient Observers Using Mobile Gaming Data
Edge Kempe equivalence of regular graph covers
Parallel tree algorithms for AMR and non-standard data access
Guided Image Inpainting: Replacing an Image Region by Pulling Content from Another Image
Liminal reciprocity and factorization statistics
Active colloidal chains with cilia- and flagella-like motion
A Quantile-Based Approach to Modelling Recovery Time in Structural Health Monitoring
Deep Reinforcement Learning with Model Learning and Monte Carlo Tree Search in Minecraft
Clustering-driven Deep Embedding with Pairwise Constraints
Frieze varieties: A characterization of the finite-tame-wild trichotomy for acyclic quivers
Towards Universal Representation for Unseen Action Recognition
Hypergraph cuts above the average
A Feature-Based Model for Nested Named-Entity Recognition at VLSP-2018 NER Evaluation Campaign
Branched Generative Adversarial Networks for Multi-Scale Image Manifold Learning
Locally Private Bayesian Inference for Count Models
A positive formula for the Ehrhart-like polynomials from root system chip-firing
Attention Solves Your TSP
Word sense induction using word embeddings and community detection in complex networks
Quantifying Age and Model Uncertainties in Paleoclimate Data and Dynamical Climate Models with a Joint Inferential Analysis
Circumspheres of sets of n+1 random points in the d-dimensional Euclidean unit ball (0<n<d+1)
KonIQ-10k: Towards an ecologically valid and large-scale IQA database
Optimality of refraction strategies for a constrained dividend problem
Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings
Generalized Scene Reconstruction