Incremental Training of Deep Convolutional Neural Networks

We propose an incremental training method that partitions the original network into sub-networks, which are then gradually incorporated into the running network during training. To let the network grow smoothly, we introduce a look-ahead initialization that outperforms random initialization. We demonstrate that our incremental approach reaches the baseline accuracy of the reference network. It also lets us identify smaller partitions of the original state-of-the-art network that deliver the same final accuracy while using only a fraction of the total number of parameters. This could speed up training several-fold. We report training results on CIFAR-10 for ResNet and VGGNet.
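The abstract does not say how the network is partitioned. Purely as an illustration, the sketch below assumes a fully connected stand-in network whose hidden layers are split width-wise into near-equal groups, with one more group switched on at each stage. It counts the parameters active at each stage, which is the "fraction of the total number of parameters" the abstract refers to. The look-ahead initialization is not modeled, and the helper names, layer widths and schedule are all hypothetical.

```python
def partition_sizes(n_units, n_parts):
    """Split n_units hidden units into n_parts near-equal groups (hypothetical scheme)."""
    base, extra = divmod(n_units, n_parts)
    return [base + (1 if i < extra else 0) for i in range(n_parts)]


def active_param_count(layer_widths, n_parts, stage):
    """Weights + biases of a fully connected net when only the first `stage`
    groups of every hidden layer are active.

    layer_widths: [input, hidden_1, ..., hidden_k, output]; only hidden layers
    are partitioned, while the input and output layers always stay whole.
    """
    widths = [layer_widths[0]]
    for width in layer_widths[1:-1]:
        widths.append(sum(partition_sizes(width, n_parts)[:stage]))
    widths.append(layer_widths[-1])
    return sum(a * b + b for a, b in zip(widths, widths[1:]))


def growth_schedule(n_parts, epochs_per_stage, total_epochs):
    """Hypothetical schedule: add one sub-network every `epochs_per_stage` epochs."""
    return [min(n_parts, 1 + epoch // epochs_per_stage) for epoch in range(total_epochs)]


widths = [32, 64, 64, 10]
full = active_param_count(widths, n_parts=4, stage=4)
for stage in range(1, 5):
    count = active_param_count(widths, n_parts=4, stage=stage)
    print(f"stage {stage}: {count} params ({count / full:.0%} of full)")
```

If training stopped at an intermediate stage with the same final accuracy, the parameter savings would be the ratio printed for that stage.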

Network Traffic Anomaly Detection Using Recurrent Neural Networks

We show that a recurrent neural network is able to learn a model to represent sequences of communications between computers on a network and can be used to identify outlier network traffic. Defending computer networks is a challenging problem and is typically addressed by manually identifying known malicious actor behavior and then specifying rules to recognize such behavior in network communications. However, these rule-based approaches often generalize poorly and identify only those patterns that are already known to researchers. An alternative approach that does not rely on known malicious behavior patterns can potentially also detect previously unseen patterns. We tokenize and compress netflow into sequences of ‘words’ that form ‘sentences’ representative of a conversation between computers. These sentences are then used to generate a model that learns the semantic and syntactic grammar of the newly generated language. We use Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) cells to capture the complex relationships and nuances of this language. The language model is then used to predict the communications between two IPs, and the prediction error is used as a measure of how typical or atypical the observed communications are. By learning a model that is specific to each network, yet generalized to typical computer-to-computer traffic within and outside the network, a language model is able to identify sequences of network activity that are outliers with respect to the model. We demonstrate positive unsupervised attack identification performance (AUC 0.84) on the ISCX IDS dataset, which contains seven days of network activity with normal traffic and four distinct attack patterns.
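The core idea, that prediction error under a language model of flow "words" serves as an anomaly score, can be sketched without an LSTM. The sketch below deliberately swaps in a much simpler add-alpha smoothed bigram model and scores a sentence by its mean negative log-likelihood per transition. The token format (for example `tcp_443`) and the class name are hypothetical; the paper's actual tokenization of netflow is not given here.

```python
import math
from collections import Counter, defaultdict


class BigramScorer:
    """Add-alpha smoothed bigram language model over flow 'words'.

    A stand-in for the LSTM language model: the anomaly score is the mean
    negative log-likelihood (prediction error) per token transition.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.context_counts = Counter()
        self.pair_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, sentences):
        for sentence in sentences:
            tokens = ["<s>"] + list(sentence) + ["</s>"]
            self.vocab.update(tokens)
            for prev, nxt in zip(tokens, tokens[1:]):
                self.pair_counts[prev][nxt] += 1
                self.context_counts[prev] += 1
        return self

    def score(self, sentence):
        """Higher score = less predictable = more atypical conversation."""
        tokens = ["<s>"] + list(sentence) + ["</s>"]
        vocab_size = len(self.vocab) + 1  # +1 reserves mass for unseen words
        nll = 0.0
        for prev, nxt in zip(tokens, tokens[1:]):
            count = self.pair_counts.get(prev, Counter()).get(nxt, 0)
            prob = (count + self.alpha) / (self.context_counts[prev] + self.alpha * vocab_size)
            nll -= math.log(prob)
        return nll / (len(tokens) - 1)


normal = [["udp_53", "tcp_443", "tcp_443"]] * 20 + [["udp_53", "tcp_80"]] * 5
model = BigramScorer().fit(normal)
print(model.score(["udp_53", "tcp_443", "tcp_443"]))  # low: a typical conversation
print(model.score(["tcp_22", "tcp_22", "tcp_22", "tcp_22"]))  # higher: unseen pattern
```

An LSTM replaces the one-token context with a learned summary of the whole history, but the scoring step stays the same.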

Understanding Autoencoders with Information Theoretic Concepts

Despite their great success in practical applications, there is still a lack of theoretical and systematic methods to analyze deep neural networks. In this paper, we illustrate an advanced information theoretic methodology to understand the dynamics of learning and the design of autoencoders, a special type of deep learning architecture that resembles a communication channel. By generalizing the information plane to any cost function, and inspecting the roles and dynamics of different layers using layer-wise information quantities, we emphasize the role that mutual information plays in quantifying learning from data. For mean square error training, we further propose and experimentally validate two hypotheses regarding the layer-wise flow of information and the intrinsic dimensionality of the bottleneck layer, using respectively the data processing inequality and the identification of a bifurcation point in the information plane that is controlled by the given data. Our observations have a direct impact on the optimal design of autoencoders, the design of alternative feedforward training methods, and even the problem of generalization.
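The data processing inequality states that for a Markov chain X → T1 → T2, I(X;T2) ≤ I(X;T1): a later layer cannot carry more information about the input than an earlier one. The sketch below checks this exactly on a toy discrete chain, assuming two binary symmetric channels as stand-ins for successive noisy layers. It computes mutual information from exact joint tables, not with the estimators one would need for real network activations.

```python
import math


def mutual_information(joint):
    """I(A;B) in bits for a joint distribution given as {(a, b): probability}."""
    p_a, p_b = {}, {}
    for (a, b), p in joint.items():
        p_a[a] = p_a.get(a, 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    return sum(p * math.log2(p / (p_a[a] * p_b[b]))
               for (a, b), p in joint.items() if p > 0)


def markov_chain_joints(p_x, channel_1, channel_2):
    """Joint tables p(x, t1) and p(x, t2) for the chain X -> T1 -> T2.

    Channels are nested dicts {input: {output: probability}}.
    """
    joint_1, joint_2 = {}, {}
    for x, px in p_x.items():
        for t1, q in channel_1[x].items():
            joint_1[(x, t1)] = joint_1.get((x, t1), 0.0) + px * q
            for t2, r in channel_2[t1].items():
                joint_2[(x, t2)] = joint_2.get((x, t2), 0.0) + px * q * r
    return joint_1, joint_2


def bsc(flip):
    """Binary symmetric channel with crossover probability `flip`."""
    return {0: {0: 1 - flip, 1: flip}, 1: {0: flip, 1: 1 - flip}}


joint_1, joint_2 = markov_chain_joints({0: 0.5, 1: 0.5}, bsc(0.1), bsc(0.1))
i_1, i_2 = mutual_information(joint_1), mutual_information(joint_2)
print(f"I(X;T1) = {i_1:.3f} bits, I(X;T2) = {i_2:.3f} bits")
assert i_2 <= i_1  # data processing inequality
```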

A comparative analysis of state-of-the-art SQL-on-Hadoop systems for interactive analytics

Hadoop is emerging as the primary data hub in enterprises, and SQL represents the de facto language for data analysis. This combination has led to the development of a variety of SQL-on-Hadoop systems in use today. While the various SQL-on-Hadoop systems target the same class of analytical workloads, their different architectures, design decisions and implementations impact query performance. In this work, we perform a comparative analysis of four state-of-the-art SQL-on-Hadoop systems (Impala, Drill, Spark SQL and Phoenix) using the Web Data Analytics micro-benchmark and the TPC-H benchmark on the Amazon EC2 cloud platform. The TPC-H experiment results show that, although Impala outperforms the other systems (by 4.41x to 6.65x) with the text format, trade-offs exist with the Parquet format, with each system performing best on a subset of queries. A comprehensive analysis of execution profiles expands upon the performance results to provide insights into performance variations, performance bottlenecks and query execution characteristics.

Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning

As machine learning becomes widely used for automated decisions, attackers have strong incentives to manipulate the results and models generated by machine learning algorithms. In this paper, we perform the first systematic study of poisoning attacks and their countermeasures for linear regression models. In poisoning attacks, attackers deliberately influence the training data to manipulate the results of a predictive model. We propose a theoretically-grounded optimization framework specifically designed for linear regression and demonstrate its effectiveness on a range of datasets and models. We also introduce a fast statistical attack that requires limited knowledge of the training process. Finally, we design a new principled defense method that is highly resilient against all poisoning attacks. We provide formal guarantees about its convergence and an upper bound on the effect of poisoning attacks when the defense is deployed. We extensively evaluate our attacks and defenses on three realistic datasets from the health care, loan assessment, and real estate domains.
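The abstract does not spell out the defense. One common family of robust defenses alternates between fitting the model and keeping only the points with the smallest residuals (trimmed least squares). The sketch below shows that idea on a 1-D line fit. It assumes the defender knows how many points are clean (`n_keep`), and the poisoning points are made up for illustration. Treat it as one illustrative form of a residual-trimming defense, not the paper's method.

```python
def fit_line(points):
    """Ordinary least squares fit y ≈ slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def trimmed_fit(points, n_keep, max_iters=50):
    """Alternate between fitting and keeping the n_keep lowest-residual points."""
    subset = list(points)
    for _ in range(max_iters):
        slope, intercept = fit_line(subset)
        ranked = sorted(points, key=lambda p: abs(p[1] - (slope * p[0] + intercept)))
        new_subset = ranked[:n_keep]
        if set(new_subset) == set(subset):
            break  # the retained subset is stable
        subset = new_subset
    return fit_line(subset)


clean = [(x, 3 * x + 1) for x in range(20)]
poison = [(20, -40), (21, -45), (22, -50), (23, -55)]  # hypothetical poisoning points
print("OLS on all points:", fit_line(clean + poison))
print("Trimmed fit:", trimmed_fit(clean + poison, n_keep=len(clean)))
```

Here four poisoning points flip the sign of the ordinary least-squares slope, while the trimmed loop drops them within a few iterations and recovers the clean line.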

Stochastic EM for Shuffled Linear Regression

We consider the problem of inference in a linear regression model in which the relative ordering of the input features and output labels is not known. Such datasets naturally arise from experiments in which the samples are shuffled or permuted during the protocol. In this work, we propose a framework that treats the unknown permutation as a latent variable. We maximize the likelihood of observations using a stochastic expectation-maximization (EM) approach. We compare this to the dominant approach in the literature, which corresponds to hard EM in our framework. We show on synthetic data that the stochastic EM algorithm we develop has several advantages, including lower parameter error, less sensitivity to the choice of initialization, and significantly better performance on datasets that are only partially shuffled. We conclude by performing two experiments on real datasets that have been partially shuffled, in which we show that the stochastic EM algorithm can recover the weights with modest error.
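To make the latent-permutation view concrete, the sketch below runs a stochastic EM loop on a 1-D, no-intercept model y_i ≈ w·x_π(i). The E-step draws a permutation with Metropolis-Hastings using random transposition proposals, whose acceptance ratio is just the likelihood ratio because swaps are symmetric. The M-step refits w by least squares on the sampled pairing. The noise scale `sigma`, the number of proposals, and the synthetic data are assumptions for illustration. The paper's sampler and settings may differ.

```python
import math
import random


def m_step(x, y, perm):
    """Least-squares slope (no intercept) for y[i] ≈ w * x[perm[i]]."""
    num = sum(yi * x[p] for yi, p in zip(y, perm))
    den = sum(x[p] ** 2 for p in perm)
    return num / den


def sample_permutation(x, y, perm, w, sigma, n_proposals, rng):
    """Metropolis-Hastings over permutations with random-transposition proposals.

    Targets p(perm | w) ∝ exp(-sum_i (y[i] - w * x[perm[i]])**2 / (2 * sigma**2)).
    """
    perm = list(perm)
    n = len(perm)
    for _ in range(n_proposals):
        i, j = rng.sample(range(n), 2)
        old = (y[i] - w * x[perm[i]]) ** 2 + (y[j] - w * x[perm[j]]) ** 2
        new = (y[i] - w * x[perm[j]]) ** 2 + (y[j] - w * x[perm[i]]) ** 2
        log_ratio = (old - new) / (2 * sigma ** 2)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            perm[i], perm[j] = perm[j], perm[i]
    return perm


def stochastic_em(x, y, sigma=0.1, n_iters=20, n_proposals=2000, seed=0):
    rng = random.Random(seed)
    perm = list(range(len(x)))  # start from the observed (partially shuffled) pairing
    w = m_step(x, y, perm)
    for _ in range(n_iters):
        perm = sample_permutation(x, y, perm, w, sigma, n_proposals, rng)  # stochastic E-step
        w = m_step(x, y, perm)  # M-step
    return w


def make_partially_shuffled(n=40, n_shuffled=8, w_true=2.0, noise=0.05, seed=1):
    """Synthetic data in which the labels of n_shuffled samples are permuted."""
    rng = random.Random(seed)
    x = [rng.uniform(-1, 1) for _ in range(n)]
    y = [w_true * xi + rng.gauss(0, noise) for xi in x]
    idx = rng.sample(range(n), n_shuffled)
    vals = [y[i] for i in idx]
    rng.shuffle(vals)
    for i, v in zip(idx, vals):
        y[i] = v
    return x, y


x, y = make_partially_shuffled()
print("naive slope:", m_step(x, y, list(range(len(x)))))
print("stochastic EM slope:", stochastic_em(x, y))
```

Replacing the sampler with a deterministic best-permutation step would give the hard-EM variant the abstract compares against.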

Graph-Based Deep Modeling and Real Time Forecasting of Sparse Spatio-Temporal Data

We present a generic framework for spatio-temporal (ST) data modeling, analysis, and forecasting, with a special focus on data that is sparse in both space and time. Our multi-scaled framework is a seamless coupling of two major components: a self-exciting point process that models the macroscale statistical behaviors of the ST data and a graph structured recurrent neural network (GSRNN) to discover the microscale patterns of the ST data on the inferred graph. This novel deep neural network (DNN) incorporates the real time interactions of the graph nodes to enable more accurate real time forecasting. The effectiveness of our method is demonstrated on both crime and traffic forecasting.
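The macroscale component is a self-exciting point process. A minimal temporal version is the Hawkes process with an exponential kernel, where each past event temporarily raises the rate of future events. The sketch below assumes that kernel form (baseline `mu`, branching ratio `alpha`, decay `beta`) and ignores the spatial dimension. It computes the conditional intensity and the log-likelihood one would maximize to fit the parameters.

```python
import math


def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity with an exponential kernel:
    lambda(t) = mu + sum_{t_i < t} alpha * beta * exp(-beta * (t - t_i)).
    Each event triggers alpha expected offspring; alpha < 1 keeps the process stable.
    """
    return mu + sum(alpha * beta * math.exp(-beta * (t - ti)) for ti in events if ti < t)


def hawkes_log_likelihood(events, horizon, mu, alpha, beta):
    """Log-likelihood of event times observed on [0, horizon]."""
    log_term = sum(math.log(hawkes_intensity(ti, events, mu, alpha, beta)) for ti in events)
    # Compensator: the integral of lambda(t) over [0, horizon], in closed form.
    compensator = mu * horizon + sum(alpha * (1 - math.exp(-beta * (horizon - ti)))
                                     for ti in events)
    return log_term - compensator


events = [0.5, 0.7, 0.8, 3.0]  # hypothetical event times (e.g. crimes in one region)
for t in (0.4, 0.9, 2.5, 3.1):
    print(t, round(hawkes_intensity(t, events, mu=0.5, alpha=0.8, beta=2.0), 3))
print(hawkes_log_likelihood(events, horizon=4.0, mu=0.5, alpha=0.8, beta=2.0))
```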

Neural Autoregressive Flows

Normalizing flows and autoregressive models have been successfully combined to produce state-of-the-art results in density estimation, via Masked Autoregressive Flows (MAF), and to accelerate state-of-the-art WaveNet-based speech synthesis to 20x faster than real-time, via Inverse Autoregressive Flows (IAF). We unify and generalize these approaches, replacing the (conditionally) affine univariate transformations of MAF/IAF with a more general class of invertible univariate transformations expressed as monotonic neural networks. We demonstrate that the proposed neural autoregressive flows (NAF) are universal approximators for continuous probability distributions, and their greater expressivity allows them to better capture multimodal target distributions. Experimentally, NAF yields state-of-the-art performance on a suite of density estimation tasks and outperforms IAF in variational autoencoders trained on binarized MNIST.
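The key requirement is that each univariate transformation be invertible, which a monotonic neural network guarantees. The sketch below assumes one simple parameterization: a single hidden layer of sigmoids with positive slopes (via softplus) and convex mixing weights (via softmax), followed by an inverse sigmoid. In NAF the parameters would be produced by an autoregressive conditioner network. Here they are fixed, hypothetical numbers, and the code only checks that the resulting map is strictly increasing.

```python
import math


def softplus(z):
    return math.log1p(math.exp(z))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(p):
    return math.log(p) - math.log1p(-p)


def sigmoid_mixture_transform(x, pre_slopes, offsets, pre_weights):
    """Monotonic univariate map y = logit(sum_j w_j * sigmoid(a_j * x + b_j)).

    Slopes a_j = softplus(.) > 0 and softmax weights w_j > 0 (summing to 1)
    make y strictly increasing in x, hence invertible.
    """
    slopes = [softplus(v) for v in pre_slopes]
    m = max(pre_weights)
    exp_w = [math.exp(v - m) for v in pre_weights]
    total = sum(exp_w)
    weights = [v / total for v in exp_w]
    mix = sum(w * sigmoid(a * x + b) for w, a, b in zip(weights, slopes, offsets))
    return logit(mix)


params = dict(pre_slopes=[0.5, 1.0, -0.3], offsets=[-1.0, 0.0, 2.0], pre_weights=[0.2, -0.4, 1.0])
grid = [i / 10 for i in range(-30, 31)]
ys = [sigmoid_mixture_transform(x, **params) for x in grid]
print(all(a < b for a, b in zip(ys, ys[1:])))  # strictly increasing on the grid
```

Because the map is a mixture of several sigmoids rather than a single affine function, it can bend differently in different regions. This is the extra expressivity the abstract credits for capturing multimodal targets.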

Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks

Data poisoning is a type of adversarial attack on machine learning models wherein the attacker adds examples to the training set to manipulate the behavior of the model at test time. This paper explores a broad class of poisoning attacks on neural nets. The proposed attacks use ‘clean-labels’; they don’t require the attacker to have any control over the labeling of training data. They are also targeted; they control the behavior of the classifier on a specific test instance without noticeably degrading classifier performance on other instances. For example, an attacker could add a seemingly innocuous image (that is properly labeled) to a training set for a face recognition engine, and control the identity of a chosen person at test time. Because the attacker does not need to control the labeling function, poisons could be introduced into the training set simply by putting them online and waiting for them to be scraped by a data collection bot. We present an optimization-based method for crafting poisons, and show that a single poison image can control classifier behavior when transfer learning is used. For full end-to-end training, we present a ‘watermarking’ strategy that makes poisoning reliable using multiple (~50) poisoned training instances. We demonstrate our method by generating poisoned frog images from the CIFAR dataset and using them to manipulate image classifiers.
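The abstract only says the poisons are crafted by optimization. A natural objective for a clean-label, targeted poison is feature collision: find x whose features match the target's while x stays close to a correctly labeled base image in input space, i.e. minimize ||f(x) − f(t)||² + β||x − b||². The sketch below assumes that objective and solves it by forward-backward splitting: a gradient step on the feature term, then the closed-form proximal step for the β term. The linear "feature extractor" `W` is a toy stand-in for a network's penultimate layer, and all numbers are hypothetical.

```python
def matvec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def craft_poison(base, target, weights, beta=0.01, lr=0.1, n_steps=300):
    """Forward-backward splitting for
        min_x ||f(x) - f(target)||^2 + beta * ||x - base||^2,
    with a toy linear feature extractor f(x) = weights @ x.
    """
    x = list(base)
    f_target = matvec(weights, target)
    for _ in range(n_steps):
        diff = [a - b for a, b in zip(matvec(weights, x), f_target)]
        # Forward (gradient) step on the feature-collision term: grad = 2 W^T (Wx - Wt).
        grad = [2 * sum(weights[i][j] * diff[i] for i in range(len(weights)))
                for j in range(len(x))]
        x_hat = [xj - lr * gj for xj, gj in zip(x, grad)]
        # Backward (proximal) step for beta * ||x - base||^2, solved in closed form.
        x = [(xh + 2 * lr * beta * bj) / (1 + 2 * lr * beta) for xh, bj in zip(x_hat, base)]
    return x


W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # hypothetical extractor: keeps 2 of 3 input coordinates
base, target = [0.0, 0.0, 5.0], [1.0, 1.0, -5.0]
poison = craft_poison(base, target, W)
print(poison)  # features nearly match the target; the ignored coordinate keeps the base value
```

In this toy setting the minimizer has features 1/(1 + β) ≈ 0.99, almost colliding with the target's features. Meanwhile the coordinate the extractor ignores stays at the base value, which is the sense in which the poison still "looks like" the base.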

Deep Appearance Maps

We propose a deep representation of appearance, i.e., the relation of color, surface orientation, viewer position, material and illumination. Previous approaches have used deep learning to extract classic appearance representations relating to reflectance model parameters (e.g., Phong) or illumination (e.g., HDR environment maps). We propose to represent appearance itself directly as a network, which we call a deep appearance map (DAM). This is a 4D generalization of 2D reflectance maps, which hold the view direction fixed. First, we show how a DAM can be learned from images or video frames and later be used to synthesize appearance, given new surface orientations and viewer positions. Second, we demonstrate how another network can be used to map from an image or video frames to a DAM network that reproduces this appearance, without a lengthy optimization such as stochastic gradient descent (learning-to-learn). Finally, we generalize this to an appearance estimation-and-segmentation task, where we map from an image showing multiple materials to multiple networks reproducing their appearance, as well as per-pixel segmentation.

Database Consistency Models

A data store allows application processes to put data into and get data from a shared memory. In general, a data store cannot be modelled as a strictly sequential process. Applications observe non-sequential behaviours, called anomalies. The set of possible behaviours, and conversely of possible anomalies, constitutes the consistency model of the data store.
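One way to make "strictly sequential" concrete is sequential consistency: a behaviour is acceptable if some single interleaving of all processes' operations, respecting each process's own order, explains every value read. The sketch below assumes that reference model and a made-up operation format, and brute-forces the check on tiny histories. The classic store-buffering outcome, in which both reads see the initial value, has no such explanation and is therefore an anomaly.

```python
def is_sequentially_consistent(history, initial=0):
    """Brute-force search for a single interleaving that explains `history`.

    history: one list per process, in program order, of operations
    ("w", var, value) or ("r", var, value_returned).
    Returns True if some interleaving that respects each process's order
    makes every read return the latest preceding write (or `initial`).
    """
    def search(positions, memory):
        if all(pos == len(ops) for pos, ops in zip(positions, history)):
            return True
        for p, ops in enumerate(history):
            i = positions[p]
            if i == len(ops):
                continue
            kind, var, value = ops[i]
            if kind == "r" and memory.get(var, initial) != value:
                continue  # this read cannot be scheduled next
            next_memory = dict(memory)
            if kind == "w":
                next_memory[var] = value
            next_positions = positions[:p] + (i + 1,) + positions[p + 1:]
            if search(next_positions, next_memory):
                return True
        return False

    return search(tuple(0 for _ in history), {})


# Store-buffering pattern: each process writes one register, then reads the other.
store_buffering = [
    [("w", "x", 1), ("r", "y", 0)],
    [("w", "y", 1), ("r", "x", 0)],
]
explainable = [
    [("w", "x", 1), ("r", "y", 0)],
    [("w", "y", 1), ("r", "x", 1)],
]
print(is_sequentially_consistent(store_buffering))  # False: an anomaly
print(is_sequentially_consistent(explainable))      # True
```

The search is exponential in the history length. It only illustrates the definition; it is not a practical checker for real stores.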

Average performance analysis of the stochastic gradient method for online PCA

This paper studies the complexity of the stochastic gradient algorithm for PCA when the data are observed in a streaming setting. We also propose an online approach for selecting the learning rate. Simulation experiments confirm the practical relevance of the plain stochastic gradient approach and that drastic improvements can be achieved by learning the learning rate.
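A classic stochastic-gradient scheme for the leading principal component of streaming data is Oja's rule, which updates a unit-norm estimate w after each sample x with projection y = w·x: w ← w + η·y·(x − y·w). The sketch below assumes Oja's rule with a fixed learning rate η, whereas the paper proposes selecting the learning rate online. The synthetic 2-D stream, whose true principal axis is (1, 1)/√2, is hypothetical.

```python
import math
import random


def oja_top_component(stream, dim, lr, seed=0):
    """Oja's rule: a stochastic-gradient estimate of the leading principal direction."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    for x in stream:
        y = sum(wi * xi for wi, xi in zip(w, x))  # projection onto current estimate
        # Hebbian term y * x plus the decay -y^2 * w that keeps ||w|| close to 1.
        w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w


def gaussian_stream(n, seed=1):
    """Zero-mean 2-D data: std 3 along (1, 1)/sqrt(2), std 0.5 along (1, -1)/sqrt(2)."""
    rng = random.Random(seed)
    s = 1 / math.sqrt(2)
    for _ in range(n):
        a, b = rng.gauss(0, 3.0), rng.gauss(0, 0.5)
        yield [s * (a + b), s * (a - b)]


w = oja_top_component(gaussian_stream(5000), dim=2, lr=0.005)
cosine = abs(w[0] + w[1]) / (math.sqrt(2) * math.hypot(w[0], w[1]))
print(f"|cos(w, true axis)| = {cosine:.4f}")
```

With a fixed η there is a trade-off: a larger step converges faster but leaves more steady-state jitter around the true axis. Choosing η online addresses exactly this trade-off.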

A Language for Function Signature Representations
In-depth Question classification using Convolutional Neural Networks
Different classes of binary necklaces and a combinatorial method for their enumerations
CIKM AnalytiCup 2017 Lazada Product Title Quality Challenge: An Ensemble of Deep and Shallow Learning to predict the Quality of Product Titles
Explicit Formulas for Cauchy and Bernoulli Polynomials with a $q$ Parameter in Terms of $r$-Whitney Numbers
Fixed points of competitive threshold-linear networks
Face Alignment in Full Pose Range: A 3D Total Solution
Improving Massive MIMO Belief Propagation Detector with Deep Neural Network
Dold’s Theorem from Viewpoint of Strong Compatibility Graphs
Switching Control of Linear Time-Varying Networked Control Systems with Sparse Observer-Controller Networks
Resilient Non-Submodular Maximization over Matroid Constraints
Newton complementary duals of $f$-ideals
Sparse Matrix-Matrix Multiplication on Multilevel Memory Architectures: Algorithms and Experiments
Generative Adversarial Learning for Spectrum Sensing
Approximation of mild solutions of a semilinear fractional elliptic equation with random noise
Predicting Electric Vehicle Charging Station Usage: Using Machine Learning to Estimate Individual Station Statistics from Physical Configurations of Charging Station Networks
Simple and Effective Semi-Supervised Question Answering
Hierarchical Novelty Detection for Visual Object Recognition
Confidence from Invariance to Image Transformations
Robust Scale-Free Synthesis for Frequency Control in Power Systems
Speaker-Invariant Training via Adversarial Learning
A Fast Divide-and-Conquer Sparse Cox Regression
Deep Spatiotemporal Models for Robust Proprioceptive Terrain Classification
Minimizing Content Staleness in Dynamo-Style Replicated Storage Systems
Hybrid Optimal Theory and Predictive Control for Power Management in Hybrid Electric Vehicle
Process Control with Highly Left Censored Data
BBCPOP: A Sparse Doubly Nonnegative Relaxation of Polynomial Optimization Problems with Binary, Box and Complementarity Constraints
Calibration of Sobol indices estimates in case of noisy output
A Bi-population Particle Swarm Optimizer for Learning Automata based Slow Intelligent System
VerdictDB: Universalizing Approximate Query Processing
Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
Further results on Andrews–Yee’s two identities for mock theta functions $ω(z;q)$ and $v(z;q)$
High-Dimensional Joint Estimation of Multiple Directed Gaussian Graphical Models
3D Interpreter Networks for Viewer-Centered Wireframe Modeling
Learning-Based Task Offloading for Vehicular Cloud Computing Systems
Multi-Scale Spatially-Asymmetric Recalibration for Image Classification
A spline-assisted semiparametric approach to non-parametric measurement error models
Estimation of Markov Chain via Rank-constrained Likelihood
Left-Right Comparative Recurrent Model for Stereo Matching
Some new bounds on LCD codes over finite fields
Stochastic Primal-Dual Coordinate Method for Nonlinear Convex Cone Programs
Automatic Normalization of Word Variations in Code-Mixed Social Media Text
Emotions are Universal: Learning Sentiment Based Representations of Resource-Poor Languages using Siamese Networks
Sentiment Analysis of Code-Mixed Languages leveraging Resource Rich Languages
Full Characterization of Optimal Uncoded Placement for the Structured Clique Cover Delivery of Nonuniform Demands
Simple estimators for network sampling
StarCraft Micromanagement with Reinforcement Learning and Curriculum Transfer Learning
Convolutional Neural Networks Regularized by Correlated Noise
End-to-End Dense Video Captioning with Masked Transformer
Graph2Seq: Graph to Sequence Learning with Attention-based Neural Networks
Incorporating Word Embeddings into Open Directory Project based Large-scale Classification
AttnConvnet at SemEval-2018 Task 1: Attention-based Convolutional Neural Networks for Multi-label Emotion Classification
Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text
Learning on Hypergraphs with Sparsity
Learning to Search via Self-Imitation
Modeling the Multipath Cross-Polarization Ratio for Above-6 GHz Radio Links
Robust estimation of continuous-time ARMA models via indirect inference
Boxicity, poset dimension, and excluded minors
Koopman Operator Based Finite-Set Model Predictive Control for Electrical Drives
Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling
Prediction and Localization of Student Engagement in the Wild
Generating Diverse and Accurate Visual Captions by Comparative Adversarial Learning
Adaptive distributed methods under communication constraints
Exploring Multi-Branch and High-Level Semantic Networks for Improving Pedestrian Detection
CodeSLAM – Learning a Compact, Optimisable Representation for Dense Visual SLAM
Weakly Supervised Instance Segmentation using Class Peak Response
PhaseNet for Video Frame Interpolation
Isolated factorizations and their applications in simplicial affine semigroups
Learning to Guide Decoding for Image Captioning
Grouped Heterogeneous Mixture Modeling for Clustered Data
Hyperspherical Variational Auto-Encoders
When will you do what? – Anticipating Temporal Occurrences of Activities
Novel mobility edges in the off-diagonal disordered tight-binding models
Qubits through Queues: The Capacity of Channels with Waiting Time Dependent Errors
Cognitive Radio from Hell: Flipping Attack on Direct-Sequence Spread Spectrum
Towards whole-body CT Bone Segmentation
Approximate lumpability for Markovian agent-based models using local symmetries
Speech waveform synthesis from MFCC sequences with generative adversarial networks
DeSIGN: Design Inspiration from Generative Networks
Correlated discrete data generation using adversarial training
Symbol-Level Precoding Design Based on Distance Preserving Constructive Interference Regions
Dynamic Video Segmentation Network
A Constant Step Stochastic Douglas-Rachford Algorithm with Application to Non Separable Regularizations
Unsupervised Learning of Sequence Representations by Autoencoders
The Transactional Conflict Problem
A unified approach to construct snarks with circular flow number 5
An Enhanced Symmetry for the $p$-adic Wavelets
Tree lengths for general $Λ$-coalescents and the asymptotic site frequency spectrum around the Bolthausen-Sznitman coalescent
On the regularity of abnormal minimizers for rank $2$ sub-Riemannian structures
360° Stance Detection
Tail Asymptotics for a Retrial Queue with Bernoulli Schedule
On tight bounds for the Lasso
Large-Scale Cox Process Inference using Variational Fourier Features
Distributionally Linearizable Data Structures
Computing Inferences for Large-Scale Continuous-Time Markov Chains by Combining Lumping with Imprecision
Attracting Tangles to Solve Parity Games
On set systems without a simplex-cluster and the Junta method
Provably Robust Learning-Based Approach for High-Accuracy Tracking Control of Lagrangian Systems
Propagation of chaos and the many-demes limit for weakly interacting diffusions in the sparse regime
Cache-Aided Interactive Multiview Video Streaming in Small Cell Wireless Networks
Multi-lingual neural title generation for e-Commerce browse pages
Social versus Moral preferences in the Ultimatum Game: A theoretical model and an experiment
Holiest Minimum-Cost Paths and Flows in Surface Graphs
Multichannel Design of Non-uniform Constellations for Broadcast/Multicast Services
Two-stage approach for the inference of the source of high-dimension and complex chemical data in forensic science
Training VAEs Under Structured Residuals
Prediction intervals for random-effects meta-analysis: a confidence distribution approach
The exact chromatic number of the convex segment disjointness graph
Sparse graphs without linear anticomplete pairs
Operator Scaling via Geodesically Convex Optimization, Invariant Theory and Polynomial Identity Testing
Transferring Common-Sense Knowledge for Object Detection
Disconnectedness and unboundedness of the solution sets of monotone vector variational inequalities