Bidirectional deep echo state networks

In this work we propose a deep architecture for the classification of multivariate time series. By means of a recurrent and untrained reservoir we generate a vectorial representation that embeds the temporal relationships in the data. To overcome the limitations of the reservoir's vanishing memory, we introduce a bidirectional reservoir, whose last state also captures the past dependencies in the input. We apply dimensionality reduction to the final reservoir states to obtain compressed fixed-size representations of the time series. These are subsequently fed into a deep feedforward network, which is trained to perform the final classification. We test our architecture on benchmark datasets and on a real-world use case of blood sample classification. Results show that our method performs better than a standard echo state network, and it can be trained much faster than a fully-trained recurrent network.
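
The bidirectional reservoir idea can be illustrated with a minimal sketch. It assumes a plain tanh reservoir with random, untrained weights, and it assumes that the same reservoir is reused for the time-reversed input; the abstract does not state either detail. The names `make_reservoir`, `final_state` and `bidirectional_embedding` are hypothetical, and the dimensionality reduction and feedforward classifier are left out.

```python
import math
import random


def make_reservoir(n_in, n_res, seed=0, scale=0.9):
    """Draw random, untrained input and recurrent weights (illustrative only)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_res)]
    w = [[rng.uniform(-1, 1) for _ in range(n_res)] for _ in range(n_res)]
    # Rescale so the largest absolute row sum equals `scale` (< 1). This bounds
    # the spectral radius by `scale`, a simple sufficient condition for fading memory.
    norm = max(sum(abs(v) for v in row) for row in w)
    w = [[scale * v / norm for v in row] for row in w]
    return w_in, w


def final_state(series, w_in, w):
    """Run the reservoir over the series and return its last state."""
    state = [0.0] * len(w)
    for u in series:
        state = [
            math.tanh(
                sum(a * b for a, b in zip(w_in[i], u))
                + sum(a * b for a, b in zip(w[i], state))
            )
            for i in range(len(w))
        ]
    return state


def bidirectional_embedding(series, w_in, w):
    """Concatenate the last states of the forward and time-reversed passes."""
    forward = final_state(series, w_in, w)
    backward = final_state(series[::-1], w_in, w)
    return forward + backward


w_in, w = make_reservoir(n_in=2, n_res=5)
series = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
embedding = bidirectional_embedding(series, w_in, w)
print(len(embedding))  # twice the reservoir size
```

The backward pass ends on the first time step, so its last state is dominated by the earliest inputs, which the forward pass has mostly forgotten by the end of the series.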

An Efficient Bayesian Robust Principal Component Regression

Principal component regression is a linear regression model with principal components as regressors. This type of modelling is particularly useful for prediction in settings with high-dimensional covariates. Surprisingly, the existing literature on Bayesian approaches is relatively sparse. In this paper, we aim at filling some gaps through the following practical contribution: we introduce a Bayesian approach with detailed guidelines for a straightforward implementation. The approach features two characteristics that we believe are important. First, it effectively involves the relevant principal components in the prediction process. This is achieved in two steps. The first one is model selection; the second one is to average the predictions obtained from the selected models according to model averaging mechanisms, which accounts for model uncertainty. The model posterior probabilities are required for model selection and model averaging. For this purpose, we include a procedure leading to an efficient reversible jump algorithm. The second characteristic of our approach is whole robustness, meaning that the impact of outliers on inference gradually vanishes as they approach plus or minus infinity. The conclusions obtained are consequently consistent with the majority of observations (the bulk of the data).

Towards Deep Learning Models for Psychological State Prediction using Smartphone Data: Challenges and Opportunities

There is an increasing interest in exploiting mobile sensing technologies and machine learning techniques for mental health monitoring and intervention. Researchers have effectively used contextual information, such as mobility, communication and mobile phone usage patterns for quantifying individuals’ mood and wellbeing. In this paper, we investigate the effectiveness of neural network models for predicting users’ level of stress by using the location information collected by smartphones. We characterize the mobility patterns of individuals using the GPS metrics presented in the literature and employ these metrics as input to the network. We evaluate our approach on the open-source StudentLife dataset. Moreover, we discuss the challenges and trade-offs involved in building machine learning models for digital mental health and highlight potential future work in this direction.

Action-Attending Graphic Neural Network

The motion analysis of human skeletons is crucial for human action recognition, which is one of the most active topics in computer vision. In this paper, we propose a fully end-to-end action-attending graphic neural network (A^2GNN) for skeleton-based action recognition, in which each irregular skeleton is structured as an undirected attribute graph. To extract high-level semantic representations from skeletons, we perform local spectral graph filtering on the constructed attribute graphs, analogous to the standard image convolution operation. Considering that not all joints are informative for action analysis, we design an action-attending layer to detect salient action units (AUs) by adaptively weighting skeletal joints. Herein the filtering responses are parameterized into a weighting function that is independent of the order of input nodes. To further encode continuous motion variations, the deep features learnt from skeletal graphs are gathered along consecutive temporal slices and then fed into a recurrent gated network. Finally, the spectral graph filtering, action-attending and recurrent temporal encoding are integrated and jointly trained for the sake of robust action recognition as well as the intelligibility of human actions. To evaluate our A^2GNN, we conduct extensive experiments on four benchmark skeleton-based action datasets, including the large-scale challenging NTU RGB+D dataset. The experimental results demonstrate that our network achieves state-of-the-art performance.

Grounding Visual Explanations (Extended Abstract)

Existing models which generate textual explanations enforce task relevance through a discriminative term loss function, but such mechanisms only weakly constrain mentioned object parts to actually be present in the image. In this paper, a new model is proposed for generating explanations by utilizing localized grounding of constituent phrases in generated explanations to ensure image relevance. Specifically, we introduce a phrase-critic model to refine (re-score/re-rank) generated candidate explanations and employ a relative-attribute inspired ranking loss using ‘flipped’ phrases as negative examples for training. At test time, our phrase-critic model takes an image and a candidate explanation as input and outputs a score indicating how well the candidate explanation is grounded in the image.

High-Resolution Deep Convolutional Generative Adversarial Networks

Generative Adversarial Networks (GANs) convergence in a high-resolution setting with a computational constraint of GPU memory capacity (from 12 GB to 24 GB) has been beset with difficulty due to the known lack of convergence rate stability. In order to boost network convergence of DCGAN (Deep Convolutional Generative Adversarial Networks) and achieve good-looking high-resolution results, we propose a new layered network structure, HR-DCGAN, that incorporates current state-of-the-art techniques for this effect. A novel dataset, CZ Faces (CZF), containing human faces from different ethnic groups in a wide variety of illumination conditions and image resolutions, is introduced. We conduct extensive experiments on CelebA and CZF.

Double Deep Machine Learning

Very important breakthroughs in data-centric machine learning algorithms led to impressive performance in transactional point applications such as detecting anger in speech, alerts from a Face Recognition system, or EKG interpretation. Non-transactional applications, e.g. medical diagnosis beyond the EKG results, require AI algorithms that integrate deeper and broader knowledge in their problem-solving capabilities, e.g. integrating knowledge about anatomy and physiology of the heart with EKG results and additional patient findings. The same holds for military aerial interpretation, where knowledge about enemy doctrines on force composition and spread helps immensely in situation assessment beyond image recognition of individual objects. The Double Deep Learning approach advocates integrating data-centric machine self-learning techniques with machine-teaching techniques to leverage the power of both and overcome their corresponding limitations. To take AI to the next level, it is essential that we rebalance the roles of data and knowledge. Data is important, but knowledge, both deep and commonsense, is equally important. An initiative is proposed to build Wikipedia for Smart Machines, meaning target readers are not human, but rather smart machines. Named ReKopedia, the goal is to develop methodologies, tools, and automatic algorithms to convert humanity's knowledge that we all learn in schools, universities and during our professional life into Reusable Knowledge structures that smart machines can use in their inference algorithms. Ideally, ReKopedia would be an open source shared knowledge repository similar to the well-known shared open source software code repositories. Examples in the article are based on, or inspired by, real-life non-transactional AI systems I deployed over a decades-long AI career that benefit hundreds of millions of people around the globe.

Multi-Label Zero-Shot Learning with Structured Knowledge Graphs

In this paper, we propose a novel deep learning architecture for multi-label zero-shot learning (ML-ZSL), which is able to predict multiple unseen class labels for each input instance. Inspired by the way humans utilize semantic knowledge between objects of interest, we propose a framework that incorporates knowledge graphs for describing the relationships between multiple labels. Our model learns an information propagation mechanism from the semantic label space, which can be applied to model the interdependencies between seen and unseen class labels. With such investigation of structured knowledge graphs for visual reasoning, we show that our model can be applied for solving multi-label classification and ML-ZSL tasks. Compared to state-of-the-art approaches, comparable or improved performance can be achieved by our method.

Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method

We propose a simple yet effective technique to simplify the training and the resulting model of neural networks. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-k elements (in terms of magnitude) are kept. As a result, only k rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction in the computational cost. Based on the sparsified gradients, we further simplify the model by eliminating the rows or columns that are seldom updated, which will reduce the computational cost both in training and decoding, and potentially accelerate decoding in real-world applications. Surprisingly, experimental results demonstrate that most of the time we only need to update fewer than 5% of the weights at each back propagation pass. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given. The model simplification results show that we could adaptively simplify the model, which could often be reduced by around 9x, without any loss of accuracy or even with improved accuracy.
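
The top-k sparsification step described above can be sketched as follows. This is an illustrative example for a single linear layer y = Wx, not the authors' implementation; the function names `top_k_sparsify` and `weight_gradient` are hypothetical.

```python
def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries of grad and zero out the rest."""
    if k <= 0:
        return [0.0] * len(grad)
    if k >= len(grad):
        return list(grad)
    # Indices of the k entries with the largest absolute value.
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


def weight_gradient(grad_out, x):
    """Gradient of the loss w.r.t. W for y = W x: the outer product grad_out * x^T."""
    return [[g * xi for xi in x] for g in grad_out]


grad_out = [0.1, -2.0, 0.05, 1.5, -0.3]   # gradient w.r.t. the layer output
x = [1.0, 2.0, 3.0]                       # layer input
sparse = top_k_sparsify(grad_out, k=2)
grad_w = weight_gradient(sparse, x)
touched_rows = [i for i, row in enumerate(grad_w) if any(v != 0.0 for v in row)]
print(sparse)        # [0.0, -2.0, 0.0, 1.5, 0.0]
print(touched_rows)  # [1, 3]
```

Because the weight gradient is an outer product, zeroing all but k entries of the output gradient leaves only k non-zero rows, which is why the update cost scales with k rather than with the full layer width.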

A generic and fast C++ optimization framework

The development of the mlpack C++ machine learning library has required the design and implementation of a flexible, robust optimization system that is able to solve the types of arbitrary optimization problems that may arise throughout machine learning. In this paper, we present the generic optimization framework that we have designed for mlpack. A key priority in the design was ease of implementation of both new optimizers and new objective functions to be optimized; therefore, implementation of a new optimizer requires only one method and implementation of a new objective function requires at most four functions. This leads to simple and intuitive code, which, for fast prototyping and experimentation, is of paramount importance. When compared to optimization frameworks of other libraries, we find that mlpack's framework supports more types of objective functions, is able to make optimizations that other frameworks do not, and seamlessly supports user-defined objective functions and optimizers.

Deep Local Binary Patterns

Local Binary Pattern (LBP) is a traditional descriptor for texture analysis that gained attention in the last decade. Being robust to several properties such as invariance to illumination translation and scaling, LBPs achieved state-of-the-art results in several applications. However, LBPs are not able to capture high-level features from the image, merely encoding features with low abstraction levels. In this work, we propose Deep LBP, which borrows ideas from the deep learning community to improve LBP expressiveness. By using parametrized data-driven LBP, we enable successive applications of the LBP operators with increasing abstraction levels. We validate the relevance of the proposed idea in several datasets from a wide range of applications. Deep LBP improved the performance of traditional and multiscale LBP in all cases.
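
For reference, the classical 3x3 LBP operator that Deep LBP builds on can be sketched as below. This shows only the traditional descriptor, not the parametrized data-driven variant proposed in the paper. The function name `lbp_code` is hypothetical, and the clockwise neighbour ordering is one common convention among several.

```python
def lbp_code(patch):
    """Return the 8-bit LBP code of the centre pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbours visited clockwise from the top-left corner (a convention).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        # Set the bit when the neighbour is at least as bright as the centre.
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(patch))  # 241

# Adding a constant to every pixel leaves the code unchanged.
brighter = [[v + 10 for v in row] for row in patch]
print(lbp_code(brighter))  # 241
```

The code depends only on the sign of each neighbour-minus-centre difference, which is the source of the invariance to uniform illumination shifts mentioned in the abstract.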

How Wrong Am I? – Studying Adversarial Examples and their Impact on Uncertainty in Gaussian Process Machine Learning Models

Machine learning models are vulnerable to adversarial examples: minor, in many cases imperceptible, perturbations to classification inputs. Among other suspected causes, adversarial examples exploit ML models that offer no well-defined indication as to how well a particular prediction is supported by training data, yet are forced to confidently extrapolate predictions in areas of high entropy. In contrast, Bayesian ML models, such as Gaussian Processes (GP), inherently model the uncertainty accompanying a prediction in the well-studied framework of Bayesian Inference. This paper is the first to explore adversarial examples and their impact on uncertainty estimates for Gaussian Processes. To this end, we first present three novel attacks on Gaussian Processes: GPJM and GPFGS exploit forward derivatives in GP latent functions, and Latent Space Approximation Networks mimic the latent space representation in unsupervised GP models to facilitate attacks. Further, we show that these new attacks compute adversarial examples that transfer to non-GP classification models, and vice versa. Finally, we show that GP uncertainty estimates not only differ between adversarial examples and benign data, but also between adversarial examples computed by different algorithms.

ATRank: An Attention-Based User Behavior Modeling Framework for Recommendation

A user can be represented by what he or she has done over time. A common way to deal with the user modeling problem is to manually extract all kinds of aggregated features over the heterogeneous behaviors, which may fail to fully represent the data itself due to limited human intuition. Recent works usually use RNN-based methods to give an overall embedding of a behavior sequence, which can then be exploited by downstream applications. However, this can only preserve very limited information, or aggregated memories of a person. When a downstream application needs to use the modeled user features, it may lose the integrity of the specific highly correlated behavior of the user, and introduce noise derived from unrelated behaviors. This paper proposes an attention-based user behavior modeling framework called ATRank, which we mainly use for recommendation tasks. Our model handles heterogeneous user behaviors by projecting all types of behaviors into multiple latent semantic spaces, where behaviors can influence one another via self-attention. Downstream applications can then use the user behavior vectors via vanilla attention. Experiments show that ATRank can achieve better performance and faster training. We further explore using ATRank as one unified model to predict different types of user behaviors at the same time, showing performance comparable to that of highly optimized individual models.
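
The self-attention step, in which every behavior vector is re-expressed as a weighted mix of all behaviors, can be sketched with plain scaled dot-product attention. This is a simplification under stated assumptions: it omits the learned projections into multiple latent semantic spaces, and the function names `softmax` and `self_attention` are illustrative rather than taken from ATRank.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values."""
    d = len(vectors[0])
    out = []
    for q in vectors:
        # Similarity of this behavior to every behavior in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        # Weighted average of all behavior vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(d)])
    return out


behaviors = [[1.0, 0.0], [0.0, 1.0]]  # two toy behavior embeddings
attended = self_attention(behaviors)
print(attended)
```

Each output row is a convex combination of the inputs, weighted toward the behaviors most similar to the query, so unrelated behaviors contribute less to the resulting representation.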

Nonparametric independence testing via mutual information

We propose a test of independence of two multivariate random vectors, given a sample from the underlying population. Our approach, which we call MINT, is based on the estimation of mutual information, whose decomposition into joint and marginal entropies facilitates the use of recently-developed efficient entropy estimators derived from nearest neighbour distances. The proposed critical values, which may be obtained from simulation (in the case where one marginal is known) or resampling, guarantee that the test has nominal size, and we provide local power analyses, uniformly over classes of densities whose mutual information satisfies a lower bound. Our ideas may be extended to provide new goodness-of-fit tests of normal linear models based on assessing the independence of our vector of covariates and an appropriately-defined notion of an error vector. The theory is supported by numerical studies on both simulated and real data.
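
The resampling route to critical values can be illustrated with a permutation test built on a mutual information estimate. The sketch below deliberately swaps in a simple plug-in estimator for discrete data in place of the nearest-neighbour entropy estimators that MINT uses, so it shows the structure of the test rather than the method itself. The function names are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) for two discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items()
    )


def permutation_p_value(xs, ys, n_perm=200, seed=0):
    """Permutation p-value: shuffling ys breaks any dependence on xs."""
    rng = random.Random(seed)
    observed = mutual_information(xs, ys)
    shuffled = list(ys)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(xs, shuffled) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic itself, keeping the test valid.
    return (exceed + 1) / (n_perm + 1)


xs = [0, 1] * 20
ys = list(xs)  # perfectly dependent toy sample
print(mutual_information(xs, ys))
print(permutation_p_value(xs, ys))
```

Under independence the permuted statistics have the same distribution as the observed one, which is what lets a resampling-based critical value control the size of the test.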

Accuracy of inference on the physics of binary evolution from gravitational-wave observations
Language-Based Image Editing with Recurrent Attentive Models
Network Geometry and Complexity
Chromatic Number and Dichromatic Polynomial of Digraphs
On cordial labeling of hypertrees
Bayesian Best-Arm Identification for Selecting Influenza Mitigation Strategies
One Model for the Learning of Language
Grammatical facial expression recognition using customized deep neural network architecture
Neighborhood selection with application to social networks
Spatio-Temporal Motifs for Optimized Vehicle-to-Vehicle (V2V) Communications
Local Density of the Bose Glass Phase
GA-PSO-Optimized Neural-Based Control Scheme for Adaptive Congestion Control to Improve Performance in Multimedia Applications
On the Verification and Computation of Strong Nash Equilibrium
Poverty Mapping Using Convolutional Neural Networks Trained on High and Medium Resolution Satellite Images, With an Application in Mexico
Fast ordered sampling of DNA sequence variants
Estimating stationary characteristic functions of stochastic systems via semidefinite programming
Attend and Interact: Higher-Order Object Interactions for Video Understanding
Entanglement contour perspective for strong area law violation in a disordered long-range hopping model
Mosquito detection with low-cost smartphones: data acquisition for malaria research
Conditional Markov Chain Search for the Simple Plant Location Problem improves upper bounds on twelve Körkel-Ghosh instances
Student Success Prediction in MOOCs
Question Asking as Program Generation
Numerical time integration of lumped parameter systems governed by implicit constitutive relations
Grounded Objects and Interactions for Video Captioning
Adaptive active queue management controller for TCP communication networks using PSO-RBF models
Exploring the Use of Shatter for AllSAT Through Ramsey-Type Problems
3D Reconstruction of Incomplete Archaeological Objects Using a Generative Adversary Network
Free energy of bipartite spherical Sherrington–Kirkpatrick model
Mobile Video Object Detection with Temporally-Aware Feature Maps
Parallel Attention: A Unified Framework for Visual Object Discovery through Dialogs and Queries
Thoracic Disease Identification and Localization with Limited Supervision
Shape Inpainting using 3D Generative Adversarial Network and Recurrent Convolutional Networks
Improvements to context based self-supervised learning
Dimensionality Reduction on Grassmannian via Riemannian Optimization: A Generalized Perspective
Non local branching Brownians with annihilation and free boundary problems
A Unified Method for Exact Inference in Random-effects Meta-analysis via Monte Carlo Conditioning
VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
An $O^*(1.84^k)$ Parameterized Algorithm for the Multiterminal Cut Problem
Average treatment effects in the presence of unknown interference
Improving Palliative Care with Deep Learning
Multi-objective risk-averse two-stage stochastic programming problems
Ubenwa: Cry-based Diagnosis of Birth Asphyxia
Training a network to attend like human drivers saves it from common but misleading loss functions
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models
Vision Based Railway Track Monitoring using Deep Learning
A Resizable Mini-batch Gradient Descent based on a Randomized Weighted Majority
Towards Self-organized Large-Scale Shape Formation: A Cognitive Agent-Based Computing Approach
Multi-Objective Maximization of Monotone Submodular Functions with Cardinality Constraint
Using KL-divergence to focus Deep Visual Explanation
Generic algorithms for scheduling applications on heterogeneous multi-core platforms
Towards dense volumetric pancreas segmentation in CT using 3D fully convolutional networks
Evolution of Social Power for Opinion Dynamics Networks
Best rank-$k$ approximations for tensors: generalizing Eckart-Young
xUnit: Learning a Spatial Activation Function for Efficient Image Restoration
Stochastic Non-convex Ordinal Embedding with Stabilized Barzilai-Borwein Step Size
Renormalization of local times of super-Brownian motion
Chinese Typeface Transformation with Hierarchical Adversarial Network
A scale-dependent finite difference method for time fractional derivative relaxation type equations
A Fusion-based Gender Recognition Method Using Facial Images
Reconstruction of a random phase dynamics network from observations
Separating Style and Content for Generalized Style Transfer
An almost-linear time algorithm for uniform random spanning tree generation
Fast Recurrent Fully Convolutional Networks for Direct Perception in Autonomous Driving
Association schemes on the Schubert cells of a Grassmannian
GPI-based Secrecy Rate Maximization Beamforming Scheme for Wireless Transmission with AN-aided Directional Modulation
A unified deep artificial neural network approach to partial differential equations in complex geometries
AI Challenger : A Large-scale Dataset for Going Deeper in Image Understanding
Contributed Discussion to Computationally Efficient Multivariate Spatio-Temporal Models for High-Dimensional Count-Valued Data by Bradley et al
Local neighbourhoods for first passage percolation on the configuration model
Large Neural Network Based Detection of Apnea, Bradycardia and Desaturation Events
Optimal Index Codes via a Duality between Index Coding and Network Coding
Improved Bayesian Compression
A note on convergence of solutions of total variation regularized linear inverse problems
Win Prediction in Esports: Mixed-Rank Match Prediction in Multi-player Online Battle Arena Games
Pseudo-positive regularization for deep person re-identification
Modelling dark current and hot pixels in imaging sensors
Detecting hip fractures with radiologist-level performance using deep neural networks
Image Matters: Jointly Train Advertising CTR Model with Image Representation of Ad and User Behavior
Towards a Combinatorial proof of Gessel’s conjecture on two-sided Gamma positivity: A reduction to simple permutations
Asymptotic normality in Crump-Mode-Jagers processes: the lattice case
Classification of postoperative surgical site infections from blood measurements with missing data using recurrent neural networks
Optimal rates of linear convergence of the averaged alternating modified reflections method for two subspaces
Random walk on a randomly oriented honeycomb lattice
Multiwinner Elections with Diversity Constraints
Graph Clustering using Effective Resistance
Nonlinear oscillatory mixing in the generalized Landau scenario
Discovery of Complex Anomalous Patterns of Sexual Violence in El Salvador
Learning a Robust Representation via a Deep Network on Symmetric Positive Definite Manifolds
Liouville quantum gravity on the annulus
Cut-off phenomenon for random walks on free orthogonal quantum groups
Calibration of Distributionally Robust Empirical Optimization Models
Random affine simplexes
Learning to Play Othello with Deep Neural Networks
Cautious NMPC with Gaussian Process Dynamics for Miniature Race Cars
Dependent landmark drift: robust point set registration based on the Gaussian mixture model with a statistical shape model
On optimal coding of non-linear dynamical systems
Evolving soft locomotion in aquatic and terrestrial environments: effects of material properties and environmental transitions
Unsupervised Reverse Domain Adaption for Synthetic Medical Images via Adversarial Training
Loom: Query-aware Partitioning of Online Graphs
Bounds for the Nakamura number
Partial Truthfulness in Minimal Peer Prediction Mechanisms with Limited Knowledge
Superpixels Based Segmentation and SVM Based Classification Method to Distinguish Five Diseases from Normal Regions in Wireless Capsule Endoscopy
Depth Assisted Full Resolution Network for Single Image-based View Synthesis
On the chromatic number of almost s-stable Kneser graphs
Driven to Distraction: Self-Supervised Distractor Learning for Robust Monocular Visual Odometry in Urban Environments
Classifying optimal binary subspace codes of length 8, constant dimension 4 and minimum distance 6
Dynamic Matching: Reducing Integral Algorithms to Approximately-Maximal Fractional Algorithms
Segmenting Brain Tumors with Symmetry
Neural Motifs: Scene Graph Parsing with Global Context
The Complexity of Multiwinner Voting Rules with Variable Number of Winners
Nearly Optimal Stochastic Approximation for Online Principal Subspace Estimation
Hardening Quantum Machine Learning Against Adversaries
A Parallelizable Acceleration Framework for Packing Linear Programs
On the Existence of Densities for Functional Data and their Link to Statistical Privacy
Multiresolution and Hierarchical Analysis of Astronomical Spectroscopic Cubes using 3D Discrete Wavelet Transform
Predict Responsibly: Increasing Fairness by Learning To Defer
ADVISE: Symbolism and External Knowledge for Decoding Advertisements
Neon2: Finding Local Minima via First-Order Oracles
Self-similar growth-fragmentations as scaling limits of Markov branching processes