New HSIC-based tests for independence between two stationary multivariate time series

This paper proposes novel one-sided omnibus tests for independence between two stationary multivariate time series. The new tests apply the Hilbert-Schmidt independence criterion (HSIC) to the innovations of both time series. Under regularity conditions, we establish the limiting null distributions of the HSIC-based tests and show that the tests are consistent. Moreover, critical values are obtained by a residual bootstrap method, whose validity is justified. Whereas existing cross-correlation-based tests detect only linear dependence, our tests examine general (both linear and non-linear) dependence, giving investigators more complete information on the causal relationship between two multivariate time series. The merits of our tests are illustrated by simulation results and a real-data example.
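
As a rough illustration of the statistic underlying these tests, the sketch below computes the standard biased empirical HSIC, (1/n^2) tr(K H L H) with centering matrix H = I - (1/n) 1 1^T, between two paired samples using Gaussian kernels. It is a minimal sketch under stated assumptions: the fixed bandwidths `sigma_x` and `sigma_y` and the helper names are hypothetical, and it does not reproduce the paper's construction on estimated innovations, its treatment of lags, or the residual bootstrap.

```python
import math

def gaussian_gram(xs, sigma=1.0):
    """Gram matrix K[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) for vector-valued samples."""
    n = len(xs)
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(xs[i], xs[j])) / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]

def double_center(k):
    """Return H K H: subtract row and column means, add back the grand mean."""
    n = len(k)
    row = [sum(r) / n for r in k]
    col = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[k[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]

def hsic(xs, ys, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC: (1/n^2) tr(K H L H) = (1/n^2) sum_ij (HKH)_ij L_ij."""
    if len(xs) != len(ys):
        raise ValueError("samples must be paired")
    n = len(xs)
    kc = double_center(gaussian_gram(xs, sigma_x))  # hypothetical fixed bandwidth
    l = gaussian_gram(ys, sigma_y)
    return sum(kc[i][j] * l[i][j] for i in range(n) for j in range(n)) / n ** 2
```

Because HSIC is built from kernels rather than cross-correlations, a non-monotone relation such as y = (x - 1)^2 still yields a positive value, while a constant second sample yields exactly zero up to rounding.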

Extended Vertical Lists for Temporal Pattern Mining from Multivariate Time Series

Temporal Pattern Mining (TPM) is the problem of mining predictive complex temporal patterns from multivariate time series in a supervised setting. We develop a new method called Fast Temporal Pattern Mining with Extended Vertical Lists. This method uses an extension of the Apriori property which requires a more complex pattern to appear within records only at places where all of its subpatterns are detected as well. The approach is based on a novel data structure, the Extended Vertical List, that tracks the positions of the first state of the pattern inside records. Extensive computational results indicate that the new method is significantly faster than the previous version of the algorithm for TPM. However, the speed-up comes at the expense of higher memory usage.
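
The idea of storing start positions and only re-checking them when a pattern grows can be shown with a toy analogue. The sketch below assumes records are plain sequences of discrete states and patterns are contiguous state sequences, which is far simpler than the temporal patterns handled by the paper; the function names are hypothetical and this is not the authors' Extended Vertical List implementation.

```python
def initial_lists(records, state):
    """Vertical list for a one-state pattern: record id -> positions where it occurs."""
    lists = {}
    for rid, rec in enumerate(records):
        positions = [p for p, s in enumerate(rec) if s == state]
        if positions:
            lists[rid] = positions
    return lists

def extend(records, pattern_lists, pattern_len, state):
    """Extend a contiguous pattern by one state.

    Only start positions already stored for the shorter pattern are checked,
    mirroring the Apriori-style rule that a longer pattern can only occur
    where its subpattern does.
    """
    new_lists = {}
    for rid, starts in pattern_lists.items():
        rec = records[rid]
        kept = [p for p in starts
                if p + pattern_len < len(rec) and rec[p + pattern_len] == state]
        if kept:
            new_lists[rid] = kept
    return new_lists

def support(lists):
    """Number of records that contain the pattern at least once."""
    return len(lists)
```

For records `[["a","b","c"], ["a","c","b"], ["b","a","b"]]`, the list for `("a",)` is `{0: [0], 1: [0], 2: [1]}`, and extending it by `"b"` keeps only `{0: [0], 2: [1]}` without rescanning the records. Keeping these position lists for every candidate is also where the extra memory cost comes from.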

Profile-guided memory optimization for deep neural networks

In recent years, deep neural networks (DNNs) have become wider and deeper to achieve better performance in many AI applications. However, such DNNs require huge amounts of memory to store weights and intermediate results (e.g., activations and feature maps) during propagation. This requirement makes it difficult to run DNNs on devices with limited, hard-to-extend memory, degrades runtime performance, and restricts the design of network models. We address this challenge by developing a novel profile-guided memory optimization that efficiently and quickly allocates memory blocks during propagation in DNNs. The optimization uses a simple and fast heuristic algorithm based on the two-dimensional rectangle packing problem. Experiments with well-known neural network models confirm that our method not only reduces memory consumption by up to 49.5\% but also accelerates training and inference by up to a factor of four, thanks to the rapidity of memory allocation and the ability to use larger mini-batch sizes.
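
To make the rectangle-packing view concrete: each memory block observed in a profiling run can be seen as a rectangle whose width is its lifetime and whose height is its size, and two blocks may share addresses only if their lifetimes do not overlap. The sketch below uses a common greedy-by-size heuristic (largest block first, lowest offset that fits); it is an assumption for illustration, not necessarily the heuristic used in the paper, and `pack_blocks` is a hypothetical name.

```python
def pack_blocks(blocks):
    """Greedy offset assignment for profiled memory blocks.

    Each block is (size, start, end) with a half-open lifetime [start, end).
    Blocks are placed largest first at the lowest offset that does not
    collide with any already-placed block alive at the same time.
    Returns (offsets in input order, peak memory).
    """
    order = sorted(range(len(blocks)), key=lambda i: -blocks[i][0])
    offsets = [None] * len(blocks)
    placed = []  # (offset, size, start, end)
    for i in order:
        size, start, end = blocks[i]
        # Address ranges used by blocks whose lifetimes overlap this one, by offset.
        busy = sorted((o, s) for o, s, st, en in placed if start < en and st < end)
        offset = 0
        for o, s in busy:
            if o - offset >= size:
                break  # the gap below this busy range is large enough
            offset = max(offset, o + s)
        offsets[i] = offset
        placed.append((offset, size, start, end))
    peak = max((o + b[0] for o, b in zip(offsets, blocks)), default=0)
    return offsets, peak
```

For blocks `[(4, 0, 2), (4, 2, 4), (2, 1, 3)]` the first two never coexist and share offset 0, so the peak is 6 instead of the 10 a naive allocator summing all sizes would reserve.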

Weak Labeling for Crowd Learning

Crowdsourcing has become very popular in the machine learning community as a way to obtain labels from which a ground truth can be estimated for a given dataset. In most approaches that use crowdsourced labels, annotators are asked to provide a single class label for each presented instance. Such a request can be inefficient: since the labelers may not be experts, forcing a single choice may fail to take real advantage of their knowledge. In this paper, we propose the use of weak labeling for crowd learning, where annotators may provide more than one label per instance so as not to miss the real label. The main hypothesis is that, by allowing weak labeling, knowledge can be extracted from the labelers more efficiently than in the standard crowd learning scenario. Empirical evidence supporting this hypothesis is presented.
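
One simple way to see what a weak label carries is to aggregate candidate sets by splitting each annotator's single vote evenly among the labels they proposed. The rule below is a hypothetical illustration of working with weak labels; the abstract does not specify how the paper learns from them, and `aggregate_weak_labels` is an assumed name.

```python
from collections import defaultdict

def aggregate_weak_labels(candidate_sets):
    """Aggregate one instance's weak labels from several annotators.

    Each annotator casts one vote, split evenly among its candidate labels,
    so a confident single label counts more than a hedged pair. Returns the
    label scores and the highest-scoring label (ties broken alphabetically).
    """
    scores = defaultdict(float)
    for candidates in candidate_sets:
        if not candidates:
            continue  # an empty set carries no information
        share = 1.0 / len(candidates)
        for label in candidates:
            scores[label] += share
    best = min(scores, key=lambda lab: (-scores[lab], lab)) if scores else None
    return dict(scores), best
```

With annotations `{"cat", "dog"}`, `{"cat"}` and `{"dog", "fox"}`, the scores are 1.5 for cat, 1.0 for dog and 0.5 for fox; the second annotator's hedged alternatives are not lost, which is the kind of extra information single-label requests discard.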

Securing Distributed Machine Learning in High Dimensions

We consider securing a distributed machine learning system wherein the data is kept confidential by its providers, who are recruited as workers to help the learner train a d-dimensional model. In each communication round, up to q out of the m workers suffer Byzantine faults; faulty workers are assumed to have complete knowledge of the system and can collude to behave arbitrarily adversarially against the learner. We assume that each worker keeps a local sample of size n. (Thus, the total number of data points is N=nm.) Of particular interest is the high-dimensional regime d \gg n. We propose a secured variant of the classical gradient descent method which can tolerate up to a constant fraction of Byzantine workers. We show that the estimation error of the iterates converges to O(\sqrt{q/N} + \sqrt{d/N}) in O(\log N) rounds. The core of our method is a robust gradient aggregator based on the iterative filtering algorithm proposed by Steinhardt et al. \cite{Steinhardt18} for robust mean estimation. We establish a uniform concentration of the sample covariance matrix of gradients, and show that the aggregated gradient, as a function of the model parameter, converges uniformly to the true gradient function. As a by-product, we develop a new concentration inequality for sample covariance matrices of sub-exponential distributions, which might be of independent interest.

TypeSQL: Knowledge-based Type-Aware Neural Text-to-SQL Generation

Interacting with relational databases through natural language helps users of any background easily query and analyze a vast amount of data. This requires a system that understands users’ questions and converts them to SQL queries automatically. In this paper we present a novel approach, TypeSQL, which views this problem as a slot filling task. Additionally, TypeSQL utilizes type information to better understand rare entities and numbers in natural language questions. We test this idea on the WikiSQL dataset and outperform the prior state-of-the-art by 5.5% in much less time. We also show that accessing the content of databases can significantly improve the performance when users’ queries are not well-formed. TypeSQL gets 82.6% accuracy, a 17.5% absolute improvement compared to the previous content-sensitive model.
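
In a slot-filling formulation, the model does not generate SQL token by token; it predicts the values of a fixed query sketch. WikiSQL queries select one column, optionally aggregated, with conjunctive WHERE conditions, so a sketch of the form `SELECT $AGG($COL) FROM t WHERE $COL $OP $VAL (AND ...)` suffices. The code below only shows the deterministic step from already-predicted slots to a SQL string; the slot values are hard-coded stand-ins for model predictions, column names are assumed to be plain identifiers, and the helper names are hypothetical.

```python
def quote_value(value):
    """Render a condition value as a SQL literal: numbers bare, strings single-quoted."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"

def fill_sql_sketch(table, select_col, agg="", conditions=()):
    """Fill a WikiSQL-style sketch from predicted slots.

    agg: aggregation operator such as "COUNT" or "" for none.
    conditions: (column, operator, value) triples joined with AND.
    """
    select = f"{agg}({select_col})" if agg else select_col
    sql = f"SELECT {select} FROM {table}"
    if conditions:
        clauses = [f"{col} {op} {quote_value(val)}" for col, op, val in conditions]
        sql += " WHERE " + " AND ".join(clauses)
    return sql
```

Type information helps exactly at the slots where this template is weakest: deciding that "O'Neil" is a condition value for the team column, or that 30 belongs to a numeric column, before the string is assembled.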

Improved Classification Based on Deep Belief Networks

For better classification, generative models are used to initialize the model and to model features before training a classifier. Typically, this requires solving separate unsupervised and supervised learning problems. Generative restricted Boltzmann machines and deep belief networks (DBNs) are widely used for unsupervised learning. We developed several supervised models based on DBNs in order to improve this two-phase strategy. Modifying the loss function to account for expectation with respect to the underlying generative model, introducing weight bounds, and multi-level programming are applied in model development. The proposed models capture both unsupervised and supervised objectives effectively. The computational study verifies that our models perform better than the two-phase training approach.

Generative Model for Heterogeneous Inference

Generative models (GMs) such as the Generative Adversarial Network (GAN) and the Variational Auto-Encoder (VAE) have thrived in recent years and achieved high-quality results in generating new samples. In computer vision especially, GMs have been used for image inpainting, denoising, and completion, which can be treated as inference from observed pixels to corrupted pixels. However, images are hierarchically structured, which makes them quite different from many real-world inference scenarios with non-hierarchical features. These inference scenarios contain heterogeneous stochastic variables and irregular mutual dependencies. Traditionally, they are modeled by Bayesian networks (BNs). However, learning and inference in BN models are NP-hard, so the number of stochastic variables in a BN is highly constrained. In this paper, we adapt typical GMs to enable heterogeneous learning and inference in polynomial time. We also propose an extended autoregressive (EAR) model and an EAR with adversarial loss (EARA) model, and give theoretical results on their effectiveness. Experiments on several BN datasets show that our proposed EAR model achieves the best performance in most cases compared to other GMs. Beyond black-box analysis, we also conduct a series of experiments on Markov border inference of GMs for white-box analysis and give theoretical results.

Accelerator-Aware Pruning for Convolutional Neural Networks

Convolutional neural networks have shown tremendous performance in computer vision tasks, but their excessive numbers of weights and operations prevent them from being adopted in embedded environments. One solution is pruning, where some unimportant weights are forced to zero. Many pruning schemes have been proposed, but they have focused mainly on the number of pruned weights and have hardly considered ASIC or FPGA accelerator architectures. When the pruned networks are run on such accelerators, this lack of architectural consideration causes inefficiencies, including internal buffer misalignment and load imbalance. This paper proposes a new pruning scheme that reflects accelerator architectures. In the proposed scheme, pruning is performed so that the same number of weights remains in each weight group corresponding to activations fetched simultaneously. In this way, the pruning scheme resolves the inefficiency problems. Even with this constraint, the proposed scheme reached a pruning ratio similar to that of previous unconstrained pruning schemes, not only in AlexNet and VGG16 but also in state-of-the-art very deep networks such as ResNet. Furthermore, the proposed scheme achieved a comparable pruning ratio in slimmed networks that had already been pruned channel-wise. In addition to improving the efficiency of previous sparse accelerators, we show that the proposed pruning scheme can be used to reduce the logic complexity of sparse accelerators.
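
The core constraint, an equal number of surviving weights per group, can be sketched as magnitude pruning applied independently inside each group. In the sketch below, groups are simply consecutive slices of a flat weight list, which is an assumption standing in for the accelerator-specific grouping by simultaneously fetched activations; `group_balanced_prune` is a hypothetical name.

```python
def group_balanced_prune(weights, group_size, keep):
    """Zero all but the `keep` largest-magnitude weights in each group.

    Groups are consecutive slices of length `group_size` (a stand-in for the
    weights that meet activations fetched in the same cycle). Every group ends
    up with exactly `keep` unpruned entries; ties keep the earlier index.
    """
    if len(weights) % group_size != 0:
        raise ValueError("weight count must be a multiple of group_size")
    if not 0 <= keep <= group_size:
        raise ValueError("keep must be between 0 and group_size")
    pruned = [0.0] * len(weights)
    for g in range(0, len(weights), group_size):
        idx = range(g, g + group_size)
        survivors = sorted(idx, key=lambda i: (-abs(weights[i]), i))[:keep]
        for i in survivors:
            pruned[i] = weights[i]
    return pruned
```

Unlike global magnitude pruning, which might leave four weights in one group and none in another, every group here keeps exactly `keep` weights, so parallel processing elements receive equal work and fixed-size buffers stay aligned.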

Social Network Fusion and Mining: A Survey

From a global perspective, the landscape of online social networks is highly fragmented. A large number of online social networks have appeared, providing users with various types of services. The information available in these networks generally spans diverse categories and can be formally represented as heterogeneous social networks (HSNs). Meanwhile, in this age of online social media, users usually participate in multiple online social networks simultaneously to enjoy more services, and these users act as bridges connecting the different networks. Multiple HSNs therefore not only represent information within a single network but also fuse information across networks. Formally, online social networks sharing common users are called aligned social networks, and the shared users, who act like anchors aligning the networks, are called anchor users. The heterogeneous information generated by users' social activities in multiple aligned social networks gives social network practitioners and researchers the opportunity to study individual users' social behaviors across multiple social platforms simultaneously. This paper presents a comprehensive survey of the latest research on multiple aligned HSNs based on the broad learning setting, covering five major research tasks: network alignment, link prediction, community detection, information diffusion, and network embedding.

Better and Faster: Knowledge Transfer from Multiple Self-supervised Learning Tasks via Graph Distillation for Video Classification

Video representation learning is a vital problem for video classification. Recently, a promising unsupervised paradigm termed self-supervised learning has emerged, which exploits the inherent supervisory signals implied in massive data for feature learning by solving auxiliary tasks. However, existing methods in this area suffer from two limitations when extended to video classification. First, they focus on a single task only, ignoring the complementarity among different task-specific features and thus yielding suboptimal video representations. Second, high computational and memory costs hinder their application in real-world scenarios. In this paper, we propose a graph-based distillation framework to address these problems: (1) We propose a logits graph and a representation graph to transfer knowledge from multiple self-supervised tasks, where the former distills classifier-level knowledge by solving a multi-distribution joint matching problem, and the latter distills internal feature knowledge from pairwise ensembled representations while tackling the heterogeneity among different features; (2) The proposal adopts a teacher-student framework that dramatically reduces the redundancy of knowledge learned from the teachers, leading to a lighter student model that solves the classification task more efficiently. Experimental results on 3 video datasets validate that our proposal not only helps learn better video representations but also compresses the model for faster inference.
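
For readers unfamiliar with distillation from several teachers, the sketch below shows the standard temperature-scaled knowledge-distillation loss averaged over teachers: a weighted sum of KL(teacher || student) on softened class distributions, scaled by T^2. This is a plainly swapped-in baseline, not the paper's logits graph, which solves a multi-distribution joint matching problem; the per-teacher `weights` (loosely analogous to edge weights) and the function names are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def multi_teacher_kd_loss(student_logits, teacher_logits_list, temperature=2.0, weights=None):
    """Weighted average of KL(teacher_t || student) at a shared temperature.

    The T^2 factor keeps gradient magnitudes comparable across temperatures,
    as in standard knowledge distillation.
    """
    k = len(teacher_logits_list)
    weights = weights or [1.0 / k] * k  # hypothetical uniform teacher weights
    q = softmax(student_logits, temperature)
    loss = sum(w * kl_divergence(softmax(t, temperature), q)
               for w, t in zip(weights, teacher_logits_list))
    return loss * temperature ** 2
```

The loss is zero when the student reproduces a single teacher exactly and positive otherwise; with several teachers trained on different self-supervised tasks, the student is pushed toward a compromise between their softened predictions.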

Quantized Compressive K-Means

The recent framework of compressive statistical learning aims at designing tractable learning algorithms that use only a heavily compressed representation, or sketch, of massive datasets. Compressive K-Means (CKM) is such a method: it estimates the centroids of data clusters from pooled, non-linear, random signatures of the learning examples. While this approach significantly reduces computational time on very large datasets, its digital implementation wastes acquisition resources because the learning examples are compressed only after the sensing stage. The present work generalizes the sketching procedure initially defined in Compressive K-Means to a large class of periodic nonlinearities, including hardware-friendly implementations that compressively acquire entire datasets. This idea is exemplified in Quantized Compressive K-Means, a variant of CKM that leverages 1-bit universal quantization (i.e., retaining the least significant bit of a standard uniform quantizer) as the periodic sketch nonlinearity. As illustrated by numerical experiments, switching to this resource-efficient signature (standard in most acquisition schemes) has almost no impact on clustering performance.
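
The pooled signature can be written as z_j = (1/n) Σ_i f(ω_j · x_i + ξ_j), where ω_j are random frequencies, ξ_j random dithers, and f a 2π-periodic nonlinearity. The sketch below compares a cosine (a real-valued, dithered stand-in for the complex exponential of CKM) with a ±1 square wave, which matches the 1-bit universal quantizer up to a phase shift that the uniform dither absorbs. Drawing frequencies from a fixed-scale Gaussian is an assumption, the helper names are hypothetical, and the centroid-recovery step is not shown.

```python
import math
import random

def draw_frequencies(m, d, scale=1.0, seed=0):
    """Random frequencies omega_j ~ N(0, scale^2 I) and dithers xi_j ~ U[0, 2pi)."""
    rng = random.Random(seed)
    omegas = [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(m)]
    dithers = [rng.uniform(0.0, 2 * math.pi) for _ in range(m)]
    return omegas, dithers

def cosine(t):
    return math.cos(t)

def one_bit(t):
    """2pi-periodic square wave: +1 where cos(t) >= 0, else -1."""
    return 1.0 if math.cos(t) >= 0 else -1.0

def sketch(data, omegas, dithers, nonlinearity):
    """Pooled signature z_j = (1/n) sum_i f(omega_j . x_i + xi_j)."""
    n = len(data)
    return [sum(nonlinearity(sum(w * x for w, x in zip(omega, point)) + xi)
                for point in data) / n
            for omega, xi in zip(omegas, dithers)]
```

Because the sketch is an average, sketches of data chunks can be merged by a count-weighted mean, which is what makes streaming or in-sensor acquisition possible; with `one_bit`, each example contributes only a single bit per frequency before pooling.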

Handling Missing Values using Decision Trees with Branch-Exclusive Splits

In this article, we propose a new decision tree construction algorithm. The proposed approach allows the algorithm to interact with predictors that are only defined in subspaces of the feature space. One way to use this new tool is to create, or reuse, a predictor that keeps track of missing values. This predictor can later be used to define the subspace where predictors with missing values are available for the data partitioning process. In this way, the new classification tree can handle missing values in both modelling and prediction. The algorithm is tested on simulated and real data. The result is a classification procedure that efficiently handles missing values and produces results that are more accurate and more interpretable than most common procedures.
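
The mechanism can be illustrated by a split search that only considers predictors defined for every row in the current node. In the sketch below, rows are dictionaries, `None` marks a missing value, and `x_missing` is a hypothetical indicator predictor; splits use weighted Gini impurity. This is an assumption-laden illustration of the idea, not the authors' algorithm.

```python
def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def available_predictors(rows, predictors):
    """Predictors defined (not None) for every row in the current node."""
    return [p for p in predictors if all(r[p] is not None for r in rows)]

def best_split(rows, labels, predictors):
    """Best threshold split by weighted Gini, using only predictors defined in this node.

    Returns (score, predictor, threshold) or None if no split is possible.
    """
    best = None
    for p in available_predictors(rows, predictors):
        values = sorted({r[p] for r in rows})
        for t in values[:-1]:  # both sides stay non-empty
            left = [y for r, y in zip(rows, labels) if r[p] <= t]
            right = [y for r, y in zip(rows, labels) if r[p] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, p, t)
    return best
```

At the root, `x` is undefined for some rows, so only the indicator can be used; once the indicator has isolated the rows where `x` is present, `x` becomes an admissible split variable in that branch alone. No imputation is needed, and prediction follows the same path for new rows with missing values.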

Capsule networks for low-data transfer learning

We propose a capsule network-based architecture for generalizing learning to new data with few examples. Using both generative and non-generative capsule networks with intermediate routing, we are able to generalize to new information over 25 times faster than a similar convolutional neural network. We train the networks on the multiMNIST dataset with one digit left out. After the networks reach their maximum accuracy, we inject 1-100 examples of the missing digit into the training set and measure the number of batches needed to return to a comparable level of accuracy. We then discuss the improvement in low-data transfer learning that capsule networks bring, and propose future directions for capsule research.
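
For context on what "routing" means between capsule layers, the sketch below implements the standard routing-by-agreement procedure of Sabour et al. (2017): coupling coefficients are a softmax over routing logits, output capsules are squashed weighted sums of predictions, and logits grow with the agreement (dot product) between each prediction and the output. The "intermediate routing" used in the abstract may differ from this; the function names and the 3-iteration default are assumptions.

```python
import math

def squash(s):
    """Capsule nonlinearity: keeps direction, maps length to ||s||^2 / (1 + ||s||^2)."""
    sq = sum(x * x for x in s)
    if sq == 0.0:
        return [0.0] * len(s)
    factor = sq / (1.0 + sq) / math.sqrt(sq)
    return [factor * x for x in s]

def dynamic_routing(u_hat, iterations=3):
    """Routing-by-agreement between two capsule layers.

    u_hat[i][j] is input capsule i's prediction vector for output capsule j.
    Returns output vectors v[j] and coupling coefficients c[i][j].
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits
    for it in range(iterations):
        c = []
        for i in range(n_in):  # softmax of b[i] over output capsules
            m = max(b[i])
            e = [math.exp(x - m) for x in b[i]]
            total = sum(e)
            c.append([x / total for x in e])
        v = []
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][k] for i in range(n_in)) for k in range(dim)]
            v.append(squash(s))
        if it < iterations - 1:  # agreement update is unused after the last pass
            for i in range(n_in):
                for j in range(n_out):
                    b[i][j] += sum(a * y for a, y in zip(u_hat[i][j], v[j]))
    return v, c
```

When all inputs predict the same vector for one output capsule and disagree about another, routing shifts their coupling toward the output they agree on, and every output vector keeps a length below 1 so it can be read as a presence probability.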

Adaptive Mesh Refinement in Analog Mesh Computers
Rational proofs for quantum computing
Tile Low-Rank Approximation of Large-Scale Maximum Likelihood Estimation on Manycore Architectures
Notes on stable learning with piecewise-linear basis functions
Diffusion Profile for Random Band Matrices: a Short Proof
Wiener integrals with respect to the two-parameter tempered Hermite random fields
Interpenetrating Cooperative Localization in Dynamic Connected Vehicle Networks
JUNIPR: a Framework for Unsupervised Machine Learning in Particle Physics
A method of induction the distances with Hilbert structure
Nyldon words
Challenges Towards Deploying Data Intensive Scientific Applications on Extreme Heterogeneity Supercomputers
A Faster Network Motif Detection Tool
On the Structure of Unique Shortest Paths in Graphs
Cheap Non-standard Analysis and Computability
The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression
On the Performance of a Canonical Labeling for Matching Correlated Erdős-Rényi Graphs
High-Performance Massive Subgraph Counting using Pipelined Adaptive-Group Communication
Asynchronous and Distributed Tracking of Time-Varying Fixed Points
RULLS: Randomized Union of Locally Linear Subspaces for Feature Engineering
On the Evaluation of Semantic Phenomena in Neural Machine Translation Using Natural Language Inference
On the Structure and Scarcity of Alternating Knots
Interleaved group products
Multi Layer Sparse Coding: the Holistic Way
Adaptive MPC with Chance Constraints for FIR Systems
Fundamental Limits of Coded Linear Transform
Geometric Partitioning and Ordering Strategies for Task Mapping on Parallel Computers
Progressive Neural Networks for Image Classification
Multivariate Subjective Fiducial Inference
Off the Beaten Track: Using Deep Learning to Interpolate Between Music Genres
HG-means: A scalable hybrid genetic algorithm for minimum sum-of-squares clustering
Intermediate Disorder regime for half-space directed polymers
On the Dual Geometry of Laplacian Eigenfunctions
Multiagent Soft Q-Learning
A Nonlinear Spectral Method for Core–Periphery Detection in Networks
Sparse Wide-Area Control of Power Systems using Data-driven Reinforcement Learning
End-to-End Multimodal Speech Recognition
Transform the Non-linear Programming Problem to the Initial-value Problem to Solve
Adaptive MPC for Iterative Tasks
Linear programming on non-compact polytopes and the Kuratowski convergence with application in economics
Location-Aware Pilot Allocation in Multi-Cell Multi-User Massive MIMO Networks
Hierarchical Density Order Embeddings
Quickest Detection of Intermittent Signals With Application to Vision Based Aircraft Detection
Memristor Crossbars with 4.5 Terabits-per-Inch-Square Density and Two Nanometer Dimension
The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation
Capturing Capacity and Profit Gains with Base Station Sharing in mmWave Cellular Networks
An ASP Methodology for Understanding Narratives about Stereotypical Activities
Action Categorization for Computationally Improved Task Learning and Planning
Competitive Learning Enriches Learning Representation and Accelerates the Fine-tuning of CNNs
Universal Knot Diagrams
Prospects for Theranostics in Neurosurgical Technology: Empowering Confocal Laser Endomicroscopy Diagnostics via Deep Learning
Estimation of convex supports from noisy measurements
A Neural Embeddings Approach for Detecting Mobile Counterfeit Apps
A study of the limiting behavior of delayed random sums under non-identical distributions setup
GEP-MSCRA for computing the group zero-norm regularized least squares estimator
A Code Equivalence between Secure Network and Index Coding
Optimal-margin evolutionary classifier
Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees
An effective crossing minimisation heuristic based on star insertion
On the Arithmetic Complexities of Hamming Codes and Hadamard Codes
High-dimensional Penalty Selection via Minimum Description Length Principle
On Estimating Edit Distance: Alignment, Dimension Reduction, and Embeddings
Large-dimensional behavior of regularized Maronna’s M-estimators of covariance matrices
Boosting LiDAR-based Semantic Labeling by Cross-Modal Training Data Generation
Mean-Field Stochastic Control with Elephant Memory in Finite and Infinite Time Horizon
Multi-tiling and equidecomposability of polytopes by lattice translates
Arithmetic of Catalan’s constant and its relatives
Invariant measures of the Milstein method for stochastic differential equations with commutative noise
On the CLT for rotations and BV functions
Integrating Local Context and Global Cohesiveness for Open Information Extraction
Empirical Best Linear Unbiased Predictors in Multivariate Nested-Error Regression Models
On Measuring the Variability of Small Area Estimators in a Multivariate Fay-Herriot Model
System Description of CITlab’s Recognition & Retrieval Engine for ICDAR2017 Competition on Information Extraction in Historical Handwritten Records
Analysis of Service-oriented Modeling Approaches for Viewpoint-specific Model-driven Development of Microservice Architecture
Pay Attention to Virality: understanding popularity of social media videos with the attention mechanism
Quantum Dynamic Programming Algorithm for DAGs. Applications for AND-OR DAG Evaluation and DAG’s Diameter Search
A mirroring formula for the interior polynomial of a bipartite graph
Auction Mechanisms in Cloud/Fog Computing Resource Allocation for Public Blockchain Networks
Near-Lossless Deep Feature Compression for Collaborative Intelligence
Short note on an open problem
Quasi Sure Central Limit Theorem
Word combinatorics for stochastic differential equations: splitting integrators
Recommending Outfits from Personal Closet
Directional Sinogram Inpainting for Limited Angle Tomography
Link and code: Fast indexing with graphs and compact regression codes
PANDA: Facilitating Usable AI Development
Symbolic Automata with Memory: a Computational Model for Complex Event Processing
Force Estimation from OCT Volumes using 3D CNNs
Post-selected Classical Query Complexity
Distributed Ledger Technology: Blockchain Compared to Directed Acyclic Graph
Turán number of theta graphs
Parametric System Identification Using Quantized Data
Time-constrained multi-agent task scheduling based on prescribed performance control
Joint Deformable Registration of Large EM Image Volumes: A Matrix Solver Approach
Deep Keyframe Detection in Human Action Videos
Structure detection of Wiener-Hammerstein systems with process noise
Efficient Multidimensional Regularization for Volterra Series Estimation
Dynamic Signal Measurements Based on Quantized Data
Decentralized learning with budgeted network load using Gaussian copulas and classifier ensembles
Corrected Empirical Bayes Confidence Region in a Multivariate Fay-Herriot Model
New properties of the Edelman-Greene bijection
Equilibrium Computation in Atomic Splittable Routing Games with Convex Cost Functions
TOA Positioning for a TDMA Localization System
About sunflowers
QuickMergesort: Practically Efficient Constant-Factor Optimal Sorting
Adaptive pooling operators for weakly labeled sound event detection
A General Analytical Solution to Impulse Response of 3-D Microfluidic Channels in Molecular Communication
Visual Data Synthesis via GAN for Zero-Shot Video Classification
On stochastic optimization methods for Monte Carlo least-squares problems
On deep speaker embeddings for text-independent speaker recognition
A remark on H1 martingales
Domain Adaptation through Synthesis for Unsupervised Person Re-identification
Subgroup identification in dose-finding trials via model-based recursive partitioning
Quantum reverse hypercontractivity: its tensorization and application to strong converses
Visual Estimation of Building Condition with Patch-level ConvNets
Identification and Inference of Network Formation Games with Misclassified Links
IamNN: Iterative and Adaptive Mobile Neural Network for Efficient Image Classification
Detection-Tracking for Efficient Person Analysis: The DetTA Pipeline
Detection of Glottal Closure Instants using Deep Dilated Convolutional Neural Networks
Tight MMSE Bounds for the AGN Channel Under KL Divergence Constraints on the Input Distribution
Centralized Caching and Delivery of Correlated Contents over a Gaussian Broadcast Channel
AbuSniff: Automatic Detection and Defenses Against Abusive Facebook Friends
Two-Stream Binocular Network: Accurate Near Field Finger Detection Based On Binocular Images
Machine Learning pipeline for discovering neuroimaging-based biomarkers in neurology and psychiatry
Big Data Analytic based on Scalable PANFIS for RFID Localization
fMRI: preprocessing, classification and pattern recognition
Rigorous Validation of Stochastic Transition Paths
Efficient and adaptive parameterized algorithms on modular decompositions
Beam Training and Data Transmission Optimization in Millimeter-Wave Vehicular Networks
Condensation in critical Cauchy Bienaymé-Galton-Watson trees
Lessons from the Bible on Modern Topics: Low-Resource Multilingual Topic Model Evaluation
Edit Distance between Unrooted Trees in Cubic Time
Moment analysis of linear time-varying dynamical systems with renewal transitions
Dialogue Modeling Via Hash Functions: Applications to Psychotherapy
The Capacity of Private Information Retrieval with Eavesdroppers
Percolation on hyperbolic graphs
Who witnesses The Witness? Finding witnesses in The Witness is hard and sometimes impossible
The loss landscape of overparameterized neural networks