Inverse Conditional Probability Weighting with Clustered Data in Causal Inference

Estimating the average treatment causal effect in clustered data often involves dealing with unmeasured cluster-specific confounding variables. Such variables may be correlated with the measured unit covariates and outcome. When the correlations are ignored, the causal effect estimation can be biased. By utilizing sufficient statistics, we propose an inverse conditional probability weighting (ICPW) method, which is robust to both (i) the correlation between the unmeasured cluster-specific confounding variable and the covariates and (ii) the correlation between the unmeasured cluster-specific confounding variable and the outcome. Assumptions and conditions for the ICPW method are presented. We establish the asymptotic properties of the proposed estimators. Simulation studies and a case study are presented for illustration.
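
For orientation, the classical inverse probability weighting (IPW) idea can be sketched as below. This is the standard normalized (Hajek-style) IPW estimator with known propensity scores, not the ICPW method of the abstract, which additionally conditions on sufficient statistics to handle the unmeasured cluster-level confounder. The function name `ipw_ate` and the toy data are illustrative assumptions.

```python
def ipw_ate(outcomes, treatments, propensities):
    """Normalized (Hajek) IPW estimate of the average treatment effect.

    outcomes:     observed outcome Y_i for each unit
    treatments:   1 if unit i was treated, 0 otherwise
    propensities: P(T_i = 1 | covariates), assumed known and strictly in (0, 1)
    """
    treated_num = treated_den = control_num = control_den = 0.0
    for y, t, e in zip(outcomes, treatments, propensities):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        if t == 1:
            w = 1.0 / e            # weight for a treated unit
            treated_num += w * y
            treated_den += w
        else:
            w = 1.0 / (1.0 - e)    # weight for a control unit
            control_num += w * y
            control_den += w
    if treated_den == 0.0 or control_den == 0.0:
        raise ValueError("need at least one treated and one control unit")
    # Difference of the two weighted outcome means.
    return treated_num / treated_den - control_num / control_den


# Toy data (made up): every treated unit has outcome 3, every control unit 1.
y = [3.0, 3.0, 1.0, 1.0]
t = [1, 1, 0, 0]
e = [0.8, 0.5, 0.5, 0.2]
ate = ipw_ate(y, t, e)
```

Because outcomes are constant within each arm in this toy data, the weighted means are 3 and 1 whatever the weights, so the estimate is 2 up to floating-point rounding.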

Combining Graph-based Dependency Features with Convolutional Neural Network for Answer Triggering

Answer triggering is the task of selecting the best-suited answer for a given question from a set of candidate answers, if one exists. In this paper, we present a hybrid deep learning model for answer triggering, which combines several dependency-graph-based alignment features, namely graph edit distance, graph-based similarity and dependency graph coverage, with dense vector embeddings from a Convolutional Neural Network (CNN). Our experiments on the WikiQA dataset show that such a combination can trigger a candidate answer more accurately than the previous state-of-the-art models. A comparative study on the WikiQA dataset shows a 5.86% absolute F-score improvement at the question level.

Hybrid Subspace Learning for High-Dimensional Data

The high-dimensional data setting, in which p >> n, is a challenging statistical paradigm that appears in many real-world problems. In this setting, learning a compact, low-dimensional representation of the data can substantially help distinguish signal from noise. One way to achieve this goal is to perform subspace learning to estimate a small set of latent features that capture the majority of the variance in the original data. Most existing subspace learning models, such as PCA, assume that the data can be fully represented by its embedding in one or more latent subspaces. However, in this work, we argue that this assumption is not suitable for many high-dimensional datasets; often only some variables can easily be projected to a low-dimensional space. We propose a hybrid dimensionality reduction technique in which some features are mapped to a low-dimensional subspace while others remain in the original space. Our model leads to more accurate estimation of the latent space and lower reconstruction error. We present a simple optimization procedure for the resulting biconvex problem and show synthetic data results that demonstrate the advantages of our approach over existing methods. Finally, we demonstrate the effectiveness of this method for extracting meaningful features from both gene expression and video background subtraction datasets.

Is Robustness the Cost of Accuracy? — A Comprehensive Study on the Robustness of 18 Deep Image Classification Models

Prediction accuracy has long been the sole standard for comparing the performance of different image classification models, including in the ImageNet competition. However, recent studies have highlighted the lack of robustness of well-trained deep neural networks to adversarial examples. Visually imperceptible perturbations to natural images can easily be crafted to mislead image classifiers into misclassification. To demystify the trade-offs between robustness and accuracy, in this paper we thoroughly benchmark 18 ImageNet models using multiple robustness metrics, including the distortion, success rate and transferability of adversarial examples between 306 pairs of models. Our extensive experimental results reveal several new insights: (1) linear scaling law – the empirical \ell_2 and \ell_\infty distortion metrics scale linearly with the logarithm of classification error; (2) model architecture is a more critical factor for robustness than model size, and the disclosed accuracy-robustness Pareto frontier can be used as an evaluation criterion for ImageNet model designers; (3) for a similar network architecture, increasing network depth slightly improves robustness in \ell_\infty distortion; (4) there exist models (in the VGG family) that exhibit high adversarial transferability, while most adversarial examples crafted from one model can only be transferred within the same family. Experiment code is publicly available at \url{https://…/Adversarial_Survey}.
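
The two distortion metrics named in the abstract can be made concrete with a minimal sketch. It assumes images are given as flattened lists of pixel values. The function names and the toy pixel values are illustrative and are not taken from the authors' code.

```python
import math


def l2_distortion(x, x_adv):
    """Euclidean (l2) norm of the perturbation x_adv - x."""
    if len(x) != len(x_adv):
        raise ValueError("inputs must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_adv)))


def linf_distortion(x, x_adv):
    """Maximum absolute per-pixel change (l-infinity norm of the perturbation)."""
    if len(x) != len(x_adv):
        raise ValueError("inputs must have the same length")
    return max(abs(a - b) for a, b in zip(x, x_adv))


# Toy "image" and adversarial copy (made-up values): two pixels change,
# by 0.03 and 0.04.
x = [0.0, 0.5, 1.0, 0.25]
x_adv = [0.0, 0.53, 0.96, 0.25]
l2 = l2_distortion(x, x_adv)      # approximately 0.05
linf = linf_distortion(x, x_adv)  # approximately 0.04

# Ordered (source, target) pairs among 18 models, excluding self-pairs.
n_ordered_pairs = 18 * 17
```

The last line is only an arithmetic observation: 18 × 17 = 306 matches the pair count quoted in the abstract, which suggests that transferability is measured over ordered pairs of distinct models.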

Mathematical Foundations of Probability Theory

Following the book \textit{Measure Theory and Integration By and For the Learner} of our series in Probability Theory and Statistics, we devote a special volume to the specifically probabilistic aspects of that theory. The book might have been titled: From Measure Theory and Integration to Probability Theory. The fundamental aspects of Probability Theory, as described by the keywords and phrases below, are presented, not from experiments as in the book \textit{A Course on Elementary Probability Theory}, but from a purely mathematical viewpoint based on Measure Theory. Such an approach places Probability Theory in its natural frame of Functional Analysis and constitutes a firm preparation for the study of Random Analysis and Stochastic Processes. At the same time, it offers a solid basis for Mathematical Statistics Theory. The book will be updated and improved on a yearly basis.

Logical Semantics and Commonsense Knowledge: Where Did we Go Wrong, and How to Go Forward, Again

We argue that logical semantics might have faltered due to its failure to distinguish between two fundamentally different types of concepts: ontological concepts, which should be types in a strongly-typed ontology, and logical concepts, which are predicates corresponding to properties of and relations between objects of various ontological types. We will then show that accounting for these differences amounts to the integration of lexical and compositional semantics in one coherent framework, and to an embedding in our logical semantics of a strongly-typed ontology that reflects our commonsense view of the world and the way we talk about it in ordinary language. We will show that in such a framework a number of challenges in natural language semantics can be adequately and systematically treated.

NIMFA: A Python Library for Nonnegative Matrix Factorization

NIMFA is an open-source Python library that provides a unified interface to nonnegative matrix factorization algorithms. It includes implementations of state-of-the-art factorization methods, initialization approaches, and quality scoring. It supports both dense and sparse matrix representations. NIMFA’s component-based implementation and hierarchical design should help users employ already implemented techniques or design and code new strategies for matrix factorization tasks.
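
To make concrete what a nonnegative matrix factorization computes, here is a dependency-free sketch of the classical Lee and Seung multiplicative-update algorithm for the Frobenius objective. It deliberately does not use or imitate NIMFA's API. All function names, parameters and the toy matrix are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw)) ** 0.5


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a nonnegative n x m matrix V into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialization.
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Toy rank-1 nonnegative matrix (made up for illustration).
V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]
W, H = nmf(V, rank=1)
error = frobenius_error(V, W, H)
```

The updates only multiply by nonnegative ratios, so W and H stay nonnegative throughout. A library such as NIMFA adds what this sketch leaves out, namely sparse inputs, alternative objectives, initialization strategies and quality measures.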

Regularized matrix data clustering and its application to image analysis

In this paper, we propose a regularized mixture probabilistic model to cluster matrix data and apply it to brain signals. The approach is able to capture the sparsity (low rank, small/zero values) of the original signals by introducing regularization terms into the likelihood function. Through a modified EM algorithm, our method achieves the optimal solution with low computational cost. Theoretical results are also provided to establish the consistency of the proposed estimators. Simulations show the advantages of the proposed method over other existing methods. We also apply the approach to two real datasets from different experiments. Promising results imply that the proposed method successfully characterizes signals with different patterns while yielding insightful scientific interpretation.

Nuisance Parameters Free Changepoint Detection in Non-stationary Series

Detecting abrupt changes in the mean of a time series, so-called changepoints, is important for many applications. However, many procedures rely on the estimation of nuisance parameters (like long-run variance). Under the alternative (a change in mean), estimators might be biased and data-adaptive rules for the choice of tuning parameters might not work as expected. If the data is not stationary, but heteroscedastic, this becomes more challenging. The aim of this paper is to present and investigate two changepoint tests, which involve neither nuisance nor tuning parameters. This is achieved by combining self-normalization and wild bootstrap. We study the asymptotic behavior and show the consistency of the bootstrap under the hypothesis as well as under the alternative, assuming mild conditions on the weak dependence of the time series and allowing the variance to change over time. As a by-product of the proposed tests, a changepoint estimator is introduced and its consistency is proved. The results are illustrated through a simulation study, which demonstrates the computational efficiency of the developed methods. The new tests will also be applied to real data examples from finance and hydrology.
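
As background for the change-in-mean setting, the classical CUSUM statistic and its argmax changepoint estimator can be sketched as follows. This is the textbook construction, shown without any variance normalization. It does not reproduce the self-normalized statistics or the wild bootstrap of the paper, and the function name and toy series are assumptions.

```python
def cusum_changepoint(series):
    """Classical CUSUM scan for a single change in mean.

    Returns (k_hat, max_stat), where k_hat is the estimated number of
    observations before the change and max_stat is the maximum of
    |S_k - (k/n) S_n| / sqrt(n) over k = 1, ..., n-1.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(series)
    partial = 0.0
    best_k, best_stat = 1, -1.0
    for k in range(1, n):
        partial += series[k - 1]  # S_k: sum of the first k observations
        stat = abs(partial - k / n * total) / n ** 0.5
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat


# Noise-free toy series (made up): the mean jumps from 0 to 1 after 50 points.
series = [0.0] * 50 + [1.0] * 50
k_hat, stat = cusum_changepoint(series)
```

In practice the statistic would be divided by an estimate of the long-run standard deviation before being compared with a critical value. That estimate is exactly the kind of nuisance parameter the abstract aims to avoid.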

Residual Memory Networks: Feed-forward approach to learn long temporal dependencies

Training deep recurrent neural network (RNN) architectures is complicated due to the increased network complexity. This disrupts the learning of higher-order abstractions using deep RNNs. For feed-forward networks, training deep structures is simpler and faster, but learning long-term temporal information is not possible. In this paper we propose a residual memory neural network (RMN) architecture to model short-time dependencies using deep feed-forward layers with residual and time-delayed connections. The residual connections pave the way to constructing deeper networks by enabling unhindered flow of gradients, and the time-delay units capture temporal information with shared weights. The number of layers in an RMN signifies both the hierarchical processing depth and the temporal depth. The computational complexity of training an RMN is significantly lower than that of deep recurrent networks. RMN is further extended to a bi-directional RMN (BRMN) to capture both past and future information. Experimental analysis is done on the AMI corpus to substantiate the capability of RMN in learning long-term and hierarchical information. The recognition performance of an RMN trained on 300 hours of the Switchboard corpus is compared with various state-of-the-art LVCSR systems. The results indicate that RMN and BRMN gain 6% and 3.8% relative improvement over LSTM and BLSTM networks, respectively.

Differential Private Stream Processing of Energy Consumption

A number of applications benefit from continuously releasing streams of personal data statistics. The process, however, poses significant privacy risks. Motivated by an application in energy systems, this paper presents OptStream, a novel algorithm for releasing differentially private data streams. OptStream is a 4-step procedure consisting of sampling, perturbation, reconstruction, and post-processing modules. The sampling module selects a small set of points to access privately in each period of interest, the perturbation module adds noise to the sampled data points to guarantee privacy, the reconstruction module re-assembles the non-sampled data points from the perturbed sampled points, and the post-processing module uses convex optimization over the private output of the previous modules, as well as the private answers of additional queries on the data stream, to ensure consistency of the data’s salient features. OptStream is used to release a real data stream from the largest transmission operator in Europe. Experimental results show that OptStream not only improves the accuracy of the state-of-the-art by at least one order of magnitude on this application domain, but is also able to ensure accurate load forecasting based on the private data.
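
The four-stage shape described above (sample, perturb, reconstruct, post-process) can be illustrated with a deliberately simplified sketch. It is not OptStream. As stand-in assumptions it uses a fixed, data-independent sampling stride, the Laplace mechanism with the privacy budget split across the sampled points, linear interpolation for reconstruction, and clamping to non-negative values in place of the convex-optimization post-processing. All names and parameters are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release_stream(stream, stride, epsilon, sensitivity=1.0, seed=None):
    """Toy sample / perturb / reconstruct / post-process pipeline."""
    if stride < 1 or epsilon <= 0:
        raise ValueError("stride must be >= 1 and epsilon must be > 0")
    rng = random.Random(seed)
    n = len(stream)
    if n == 0:
        return []
    # 1. Sampling: fixed stride (data-independent), always keeping the last point.
    idx = list(range(0, n, stride))
    if idx[-1] != n - 1:
        idx.append(n - 1)
    # 2. Perturbation: Laplace noise, with the budget split over the sampled
    #    points (assumes one individual changes each point by <= sensitivity).
    scale = sensitivity * len(idx) / epsilon
    noisy = {i: stream[i] + laplace_noise(scale, rng) for i in idx}
    # 3. Reconstruction: linear interpolation between noisy sampled points.
    out = [noisy[idx[0]]] * n
    for lo, hi in zip(idx, idx[1:]):
        for j in range(lo, hi + 1):
            w = (j - lo) / (hi - lo)
            out[j] = (1 - w) * noisy[lo] + w * noisy[hi]
    # 4. Post-processing: consumption cannot be negative, so clamp at zero.
    return [max(0.0, v) for v in out]


# Made-up hourly load values for one day.
load = [10.0 + i for i in range(24)]
private_load = release_stream(load, stride=4, epsilon=1.0, seed=0)
```

Steps 3 and 4 touch only the noisy values, so they are post-processing and do not consume additional privacy budget. The abstract describes the sampled points as accessed privately, which this sketch sidesteps by sampling at a fixed stride.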

A Survey on Deep Transfer Learning

As a new classification platform, deep learning has recently received increasing attention from researchers and has been successfully applied to many domains. In some domains, like bioinformatics and robotics, it is very difficult to construct a large-scale well-annotated dataset due to the expense of data acquisition and costly annotation, which limits its development. Transfer learning relaxes the hypothesis that the training data must be independent and identically distributed (i.i.d.) with the test data, which motivates us to use transfer learning to solve the problem of insufficient training data. This survey focuses on reviewing current research on transfer learning using deep neural networks and its applications. We define deep transfer learning, then categorize and review recent research works based on the techniques used in deep transfer learning.

Automated Extraction of Personal Knowledge from Smartphone Push Notifications

Personalized services need a rich and powerful personal knowledge base, i.e. a knowledge base containing information about the user. This paper proposes an approach to extracting personal knowledge from smartphone push notifications, which are used by mobile systems and apps to inform users of a rich range of information. Our solution is based on the insight that most notifications are formatted using templates, while knowledge entities can usually be found within the parameters to the templates. As defining all the notification templates and their semantic rules is impractical due to the huge number of notification templates used by potentially millions of apps, we propose an automated approach for personal knowledge extraction from push notifications. We first discover notification templates through pattern mining, then use machine learning to understand the template semantics. Based on the templates and their semantics, we are able to translate notification text into knowledge facts automatically. Users’ privacy is preserved, as we only need to upload the templates, which do not contain any personal information, to the server for model training. According to our experiments with about 120 million push notifications from 100,000 smartphone users, our system is able to extract personal knowledge accurately and efficiently.

Towards Closing the Gap in Weakly Supervised Semantic Segmentation with DCNNs: Combining Local and Global Models
A Review of Learning with Deep Generative Models from perspective of graphical modeling
Notes On Group Distance Magicness of Product Graphs
Towards Efficient Maximum Likelihood Estimation of LPV-SS Models
Self-Attention Recurrent Network for Saliency Detection
Note: Effect of localization on mean-field density of state near jamming
Degree Growth Rates and Index Estimation in a Directed Preferential Attachment Model
Prediction in Riemannian metrics derived from divergence functions
Multi-Objective Cognitive Model: a supervised approach for multi-subject fMRI analysis
Dynamical multiple regression in function spaces, under kernel regressors, with ARH(1) errors
Graph Based Imaging for Synthetic Aperture Radar
Strongly consistent autoregressive predictors in abstract Banach spaces
Structured Adversarial Attack: Towards General Implementation and Better Interpretability
Diffusion approximations and control variates for MCMC
An inversion metric for reduced words
Homogenization of Symmetric Lévy Processes on $\mathbb{R}^d$
Model-Aided Wireless Artificial Intelligence: Embedding Expert Knowledge in Deep Neural Networks Towards Wireless Systems Optimization
Dilated Convolutions in Neural Networks for Left Atrial Segmentation in 3D Gadolinium Enhanced-MRI
3D Conceptual Design Using Deep Learning
A Multi-task Framework for Skin Lesion Detection and Segmentation
Kid on The Phone! Toward Automatic Detection of Children on Mobile Devices
Missing Value Imputation Based on Deep Generative Models
Too many secants: a hierarchical approach to secant-based dimensionality reduction on large data sets
Error Detection in a Large-Scale Lexical Taxonomy
Sampling-based randomized designs for causal inference under the potential outcomes framework
Projectively unique polytopes and toric slack ideals
Computationally efficient model selection for joint spikes and waveforms decoding
Skin Lesion Diagnosis using Ensembles, Unscaled Multi-Crop Evaluation and Loss Weighting
Effective Resource Sharing in Mobile-Cell Environments
Revisiting the simulation of quantum Turing machines by quantum circuits
The Bases of Association Rules of High Confidence
New Viewpoint and Algorithms for Water-Filling Solutions in Wireless Communications
A formula for the cohomology and $K$-class of a regular Hessenberg variety
Energy-Age Tradeoff in Status Update Communication Systems with Retransmission
A Study of Deep Feature Fusion based Methods for Classifying Multi-lead ECG
Signal Jamming Attacks Against Communication-Based Train Control: Attack Impact and Countermeasure
Liquid Pouring Monitoring via Rich Sensory Inputs
Incorporating Scalability in Unsupervised Spatio-Temporal Feature Learning
Machine Learning Phase Transition: An Iterative Methodology
Concentration bounds for empirical conditional value-at-risk: The unbounded case
Using Linguistic Cues for Analyzing Social Movements
Beyond the Central Limit Theorem: Universal and Non-universal Simulations of Random Variables by General Mappings
Deep Transfer Learning for EEG-based Brain Computer Interface
Gray-box Adversarial Training
A Flip-Syndrome-List Polar Decoder Architecture for Ultra-Low-Latency Communications
Scalability Analysis of a LoRa Network under Imperfect Orthogonality
On Optimizing Deep Convolutional Neural Networks by Evolutionary Computing
DP-Degree Colorable Hypergraphs
Spline Regression with Automatic Knot Selection
Phase Transition in Matched Formulas and a Heuristic for Biclique Satisfiability
About the Stein equation for the generalized inverse Gaussian and Kummer distributions
Solution Paths of Variational Regularization Methods for Inverse Problems
Defense Against Adversarial Attacks with Saak Transform
Inner approximation algorithm for solving linear multiobjective optimization problems
Thresholds of mixed fractional Brownian motion
Blockchain Queueing Theory
Compactness of semigroups of explosive symmetric Markov processes
The k-cube is k-representable
Linearly Precoded Rate Splitting: Optimality and Non-Optimality for MIMO Broadcast Channels
Field theory for amorphous solids
Regret Bounds for Reinforcement Learning via Markov Chain Concentration
Visual Question Generation for Class Acquisition of Unknown Objects
Girsanov formula for $G$-Brownian motion: the degenerate case
Efficient domination in regular graphs
Detailed Dense Inference with Convolutional Neural Networks via Discrete Wavelet Transform
Fourth moment theorems on the Poisson space: analytic statements via product formulae
Improving Temporal Interpolation of Head and Body Pose using Gaussian Process Regression in a Matrix Completion Setting
Occlusions, Motion and Depth Boundaries with a Generic Network for Disparity, Optical Flow or Scene Flow Estimation
Beyond $1/2$-Approximation for Submodular Maximization on Massive Data Streams
An Efficient Approach to Learning Chinese Judgment Document Similarity Based on Knowledge Summarization
Generalized Port-Hamiltonian DAE Systems
Correlated time-changed Lévy Processes
Metal Artifact Reduction in Cone-Beam X-Ray CT via Ray Profile Correction
Statistical Windows in Testing for the Initial Distribution of a Reversible Markov Chain
The Contact Process on Periodic Trees
Coloured stochastic vertex models and their spectral theory
Reasoning with Justifiable Exceptions in Contextual Hierarchies (Appendix)
An Efficient Deep Reinforcement Learning Model for Urban Traffic Control
GLSE Precoders for Massive MIMO Systems: Analysis and Applications
One-Shot Coherence Distillation: The Full Story
Two Practical Random-Subcarrier-Selection Methods for Secure Precise Wireless Transmission
DeepTAM: Deep Tracking and Mapping
The Fluid Mechanics of Liquid Democracy
A bijection between ternary trees and a subclass of Motzkin paths
On the Duality and File Size Hierarchy of Fractional Repetition Codes
Audio Tagging With Connectionist Temporal Classification Model Using Sequential Labelled Data
Error Correction Maximization for Deep Image Hashing
V-FCNN: Volumetric Fully Convolution Neural Network For Automatic Atrial Segmentation
Assessing and countering reaction attacks against post-quantum public-key cryptosystems based on QC-LDPC codes
Deep Shape Analysis on Abdominal Organs for Diabetes Prediction
A Review on Image- and Network-based Brain Data Analysis Techniques for Alzheimer’s Disease Diagnosis Reveals a Gap in Developing Predictive Methods for Prognosis
A non-linear parabolic PDE with a distributional coefficient and its applications to stochastic analysis
Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN
Super Resolution Phase Retrieval for Sparse Signals
Semi-discrete unbalanced optimal transport and quantization
A Survey on Surrogate Approaches to Non-negative Matrix Factorization
Adversarial Vision Challenge
Time-Dependent Shortest Path Queries Among Growing Discs
Stability and Throughput Analysis of Multiple Access Networks with Finite Blocklength Constraints
Idempotent Analysis, Tropical Convexity and Reduced Divisors
Hashing with Binary Matrix Pursuit
Simultaneous Edge Alignment and Learning
Mass-spring-damper Network for Distributed Averaging and Optimization
Bionic Reflex Control Strategy for Robotic Finger with Kinematic Constraints
Robust Secrecy Energy Efficient Beamforming in MISOME-SWIPT Systems With Proportional Fairness
Distributionally Robust Co-Optimization of Power Dispatch and Do-Not-Exceed Limits