Causal Inference Under Network Interference: A Framework for Experiments on Social Networks

No man is an island, as individuals interact and influence one another daily in our society. When social influence takes place in experiments on a population of interconnected individuals, the treatment on a unit may affect the outcomes of other units, a phenomenon known as interference. This thesis develops a causal framework and inference methodology for experiments where interference takes place on a network of influence (i.e. network interference). In this framework, the network potential outcomes serve as the key quantity and flexible building blocks for causal estimands that represent a variety of primary, peer, and total treatment effects. These causal estimands are estimated via principled Bayesian imputation of missing outcomes. The theory on the unconfoundedness assumptions leading to simplified imputation highlights the importance of including relevant network covariates in the potential outcome model. Additionally, experimental designs that result in balanced covariates and sizes across treatment exposure groups further improve the causal estimate, especially by mitigating potential outcome model mis-specification. The true potential outcome model is not typically known in real-world experiments, so the best practice is to account for interference and confounding network covariates through both balanced designs and model-based imputation. A full factorial simulated experiment is formulated to demonstrate this principle by comparing performance across different randomization schemes during the design phase and estimators during the analysis phase, under varying network topology and true potential outcome models. Overall, this thesis asserts that interference is not just a nuisance for analysis but rather an opportunity for quantifying and leveraging peer effects in real-world experiments.

Deep Learning for Accelerated Reliability Analysis of Infrastructure Networks

Natural disasters can have catastrophic impacts on the functionality of infrastructure systems and cause severe physical and socio-economic losses. Given budget constraints, it is crucial to optimize decisions regarding mitigation, preparedness, response, and recovery practices for these systems. This requires accurate and efficient means to evaluate the infrastructure system reliability. While numerous research efforts have addressed and quantified the impact of natural disasters on infrastructure systems, typically using the Monte Carlo approach, they still suffer from high computational cost and, thus, are of limited applicability to large systems. This paper presents a deep learning framework for accelerating infrastructure system reliability analysis. In particular, two distinct deep neural network surrogates are constructed and studied: (1) A classifier surrogate which speeds up the connectivity determination of networks, and (2) An end-to-end surrogate that replaces a number of components such as roadway status realization, connectivity determination, and connectivity averaging. The proposed approach is applied to a simulation-based study of the two-terminal connectivity of a California transportation network subject to extreme probabilistic earthquake events. Numerical results highlight the effectiveness of the proposed approach in accelerating the transportation system two-terminal reliability analysis with extremely high prediction accuracy.

A parameterized activation function for learning fuzzy logic operations in deep neural networks

We present a deep learning architecture for learning fuzzy logic expressions. Our model uses an innovative, parameterized, differentiable activation function that can learn a number of logical operations by gradient descent. This activation function allows a neural network to determine the relationships between its input variables and provides insight into the logical significance of learned network parameters. We provide a theoretical basis for this parameterization and demonstrate its effectiveness and utility by successfully applying our model to five classification problems from the UCI Machine Learning Repository.

Generating Sentence Planning Variations for Story Telling

There has been a recent explosion in applications for dialogue interaction ranging from direction-giving and tourist information to interactive story systems. Yet the natural language generation (NLG) component for many of these systems remains largely handcrafted. This limitation greatly restricts the range of applications; it also means that it is impossible to take advantage of recent work in expressive and statistical language generation that can dynamically and automatically produce a large number of variations of given content. We propose that a solution to this problem lies in new methods for developing language generation resources. We describe the ES-Translator, a computational language generator that has previously been applied only to fables, and quantitatively evaluate the domain independence of the EST by applying it to personal narratives from weblogs. We then take advantage of recent work on language generation to create a parameterized sentence planner for story generation that provides aggregation operations, variations in discourse and in point of view. Finally, we present a user evaluation of different personal narrative retellings.

EC3: Combining Clustering and Classification for Ensemble Learning

Classification and clustering algorithms have been proved to be successful individually in different contexts. Both of them have their own advantages and limitations. For instance, although classification algorithms are more powerful than clustering methods in predicting class labels of objects, they do not perform well when there is a lack of sufficient manually labeled reliable data. On the other hand, although clustering algorithms do not produce label information for objects, they provide supplementary constraints (e.g., if two objects are clustered together, it is more likely that the same label is assigned to both of them) that one can leverage for label prediction of a set of unknown objects. Therefore, systematic utilization of both these types of algorithms together can lead to better prediction performance. In this paper, We propose a novel algorithm, called EC3 that merges classification and clustering together in order to support both binary and multi-class classification. EC3 is based on a principled combination of multiple classification and multiple clustering methods using an optimization function. We theoretically show the convexity and optimality of the problem and solve it by block coordinate descent method. We additionally propose iEC3, a variant of EC3 that handles imbalanced training data. We perform an extensive experimental analysis by comparing EC3 and iEC3 with 14 baseline methods (7 well-known standalone classifiers, 5 ensemble classifiers, and 2 existing methods that merge classification and clustering) on 13 standard benchmark datasets. We show that our methods outperform other baselines for every single dataset, achieving at most 10% higher AUC. Moreover our methods are faster (1.21 times faster than the best baseline), more resilient to noise and class imbalance than the best baseline method.

Safe Reinforcement Learning via Shielding

Reinforcement learning algorithms discover policies that maximize reward, but do not necessarily guarantee safety during learning or execution phases. We introduce a new approach to learn optimal policies while enforcing properties expressed in temporal logic. To this end, given the temporal logic specification that is to be obeyed by the learning system, we propose to synthesize a reactive system called a shield. The shield is introduced in the traditional learning process in two alternative ways, depending on the location at which the shield is implemented. In the first one, the shield acts each time the learning agent is about to make a decision and provides a list of safe actions. In the second way, the shield is introduced after the learning agent. The shield monitors the actions from the learner and corrects them only if the chosen action causes a violation of the specification. We discuss which requirements a shield must meet to preserve the convergence guarantees of the learner. Finally, we demonstrate the versatility of our approach on several challenging reinforcement learning scenarios.

Performance Analysis of Open Source Machine Learning Frameworks for Various Parameters in Single-Threaded and Multi-Threaded Modes

The basic features of some of the most versatile and popular open source frameworks for machine learning (TensorFlow, Deep Learning4j, and H2O) are considered and compared. Their comparative analysis was performed and conclusions were made as to the advantages and disadvantages of these platforms. The performance tests for the de facto standard MNIST data set were carried out on H2O framework for deep learning algorithms designed for CPU and GPU platforms for single-threaded and multithreaded modes of operation Also, we present the results of testing neural networks architectures on H2O platform for various activation functions, stopping metrics, and other parameters of machine learning algorithm. It was demonstrated for the use case of MNIST database of handwritten digits in single-threaded mode that blind selection of these parameters can hugely increase (by 2-3 orders) the runtime without the significant increase of precision. This result can have crucial influence for optimization of available and new machine learning methods, especially for image recognition problems.

Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization

A number of online services nowadays rely upon machine learning to extract valuable information from data collected in the wild. This exposes learning algorithms to the threat of data poisoning, i.e., a coordinate attack in which a fraction of the training data is controlled by the attacker and manipulated to subvert the learning process. To date, these attacks have been devised only against a limited class of binary learning algorithms, due to the inherent complexity of the gradient-based procedure used to optimize the poisoning points (a.k.a. adversarial training examples). In this work, we first extend the definition of poisoning attacks to multiclass problems. We then propose a novel poisoning algorithm based on the idea of back-gradient optimization, i.e., to compute the gradient of interest through automatic differentiation, while also reversing the learning procedure to drastically reduce the attack complexity. Compared to current poisoning strategies, our approach is able to target a wider class of learning algorithms, trained with gradient- based procedures, including neural networks and deep learning architectures. We empirically evaluate its e ectiveness on several application examples, including spam ltering, malware detection, and handwritten digit recognition. We finally show that, similarly to adversarial test examples, adversarial training examples can also be transferred across different learning algorithms.

Multi-Layer Convolutional Sparse Modeling: Pursuit and Dictionary Learning

The recently proposed Multi-Layer Convolutional Sparse Coding (ML-CSC) model, consisting of a cascade of convolutional sparse layers, provides a new interpretation of Convolutional Neural Networks (CNNs). Under this framework, the computation of the forward pass in a CNN is equivalent to a pursuit algorithm aiming to estimate the nested sparse representation vectors — or feature maps — from a given input signal. Despite having served as a pivotal connection between CNNs and sparse modeling, a deeper understanding of the ML-CSC is still lacking: there are no pursuit algorithms that can serve this model exactly, nor are there conditions to guarantee a non-empty model. While one can easily obtain signals that approximately satisfy the ML-CSC constraints, it remains unclear how to simply sample from the model and, more importantly, how one can train the convolutional filters from real data. In this work, we propose a sound pursuit algorithm for the ML-CSC model by adopting a projection approach. We provide new and improved bounds on the stability of the solution of such pursuit and we analyze different practical alternatives to implement this in practice. We show that the training of the filters is essential to allow for non-trivial signals in the model, and we derive an online algorithm to learn the dictionaries from real data, effectively resulting in cascaded sparse convolutional layers. Last, but not least, we demonstrate the applicability of the ML-CSC model for several applications in an unsupervised setting, providing competitive results. Our work represents a bridge between matrix factorization, sparse dictionary learning and sparse auto-encoders, and we analyze these connections in detail.

Design Patterns for Fusion-Based Object Retrieval

We address the task of ranking objects (such as people, blogs, or verticals) that, unlike documents, do not have direct term-based representations. To be able to match them against keyword queries, evidence needs to be amassed from documents that are associated with the given object. We present two design patterns, i.e., general reusable retrieval strategies, which are able to encompass most existing approaches from the past. One strategy combines evidence on the term level (early fusion), while the other does it on the document level (late fusion). We demonstrate the generality of these patterns by applying them to three different object retrieval tasks: expert finding, blog distillation, and vertical ranking.

Unifying DAGs and UGs

We introduce a new class of graphical models that generalizes Lauritzen-Wermuth-Frydenberg chain graphs by relaxing the semi-directed acyclity constraint so that only directed cycles are forbidden. Moreover, up to two edges are allowed between any pair of nodes. Specifically, we present local, pairwise and global Markov properties for the new graphical models and prove their equivalence.

Multi-view Low-rank Sparse Subspace Clustering

Most existing approaches address multi-view subspace clustering problem by constructing the affinity matrix on each view separately and afterwards propose how to extend spectral clustering algorithm to handle multi-view data. This paper presents an approach to multi-view subspace clustering that learns a joint subspace representation by constructing affinity matrix shared among all views. Relying on the importance of both low-rank and sparsity constraints in the construction of the affinity matrix, we introduce the objective that balances between the agreement across different views, while at the same time encourages sparsity and low-rankness of the solution. Related low-rank and sparsity constrained optimization problem is for each view solved using the alternating direction method of multipliers. Furthermore, we extend our approach to cluster data drawn from nonlinear subspaces by solving the corresponding problem in a reproducing kernel Hilbert space. The proposed algorithm outperforms state-of-the-art multi-view subspace clustering algorithms on one synthetic and four real-world datasets.

Anomaly Detection: Review and preliminary Entropy method tests

Anomalies are strange data points; they usually represent an unusual occurrence. Anomaly detection is presented from the perspective of Wireless sensor networks. Different approaches have been taken in the past, as we will see, not only to identify outliers, but also to establish the statistical properties of the different methods. The usual goal is to show that the approach is asymptotically efficient and that the metric used is unbiased or maybe biased. This project is based on a work done by [1]. The approach is based on the principle that the entropy of the data is increased when an anomalous data point is measured. The entropy of the data set is thus to be estimated. In this report however, preliminary efforts at confirming the results of [1] is presented. To estimate the entropy of the dataset, since no parametric form is assumed, the probability density function of the data set is first estimated using data split method. This estimated pdf value is then plugged-in to the entropy estimation formula to estimate the entropy of the dataset. The data (test signal) used in this report is Gaussian distributed with zero mean and variance 4. Results of pdf estimation using the k-nearest neighbour method using the entire dataset, and a data-split method are presented and compared based on how well they approximate the probability density function of a Gaussian with similar mean and variance. The number of nearest neighbours chosen for the purpose of this report is 8. This is arbitrary, but is reasonable since the number of anomalies introduced is expected to be less than this upon data-split. The data-split method is preferred and rightly so.

Gradual Learning of Deep Recurrent Neural Networks

Deep Recurrent Neural Networks (RNNs) achieve state-of-the-art results in many sequence-to-sequence tasks. However, deep RNNs are difficult to train and suffer from overfitting. We introduce a training method that trains the network gradually, and treats each layer individually, to achieve improved results in language modelling tasks. Training deep LSTM with Gradual Learning (GL) obtains perplexity of 61.7 on the Penn Treebank (PTB) corpus. As far as we know (as for the 20.05.2017), GL improves the best state-of-the-art performance by a single LSTM/RHN model on the word-level PTB dataset.

GLASS: A General Likelihood Approximate Solution Scheme
Relativized Separation of Reversible and Irreversible Space-Time Complexity Classes
Popular progression differences in vector spaces
Joint Syntacto-Discourse Parsing and the Syntacto-Discourse Treebank
Popular progression differences in vector spaces II
On denoising autoencoders trained to minimise binary cross-entropy
Simple Extension of Halász’s Result
Martingale approximations for random fields
Peaks on Graphs
Comfort-Aware Building Climate Control Using Distributed-Parameter Models
Noncollinear biphoton states: enormously high resource of azimuthal entanglement as it’s seen in Cartesian variables
On the largest planar graphs with everywhere positive combinatorial curvature
A Note on Exponential Inequalities in Hilbert Spaces for Spatial Processes with Applications to the Functional Kernel Regression Model
Subspace Selection to Suppress Confounding Source Domain Information in AAM Transfer Learning
The free group on n generators modulo n+u random relations as n goes to infinity
Power of Deep Learning for Channel Estimation and Signal Detection in OFDM Systems
Fixed Budget Ranking and Selection under Input Uncertainty
Randomized Quantile Residuals: an Omnibus Model Diagnostic Tool with Unified Reference Distribution
A Measure of Dependence Between Discrete and Continuous Variables
A guided intermediate resampling particle filter for inference on high dimensional systems
The cotype zeta function of $\mathbb{Z}^d$
An inexact subsampled proximal Newton-type method for large-scale machine learning
DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars
Some heterochromatic theorems for matroids
How directional mobility affects biodiversity in rock-paper-scissors models
Really? Well. Apparently Bootstrapping Improves the Performance of Sarcasm and Nastiness Classifiers for Online Dialogue
Generating Different Story Tellings from Semantic Representations of Narrative
Identifying Subjective and Figurative Language in Online Dialogue
On the Speed of an Excited Asymmetric Random Walk
The tensor rank of tensor product of two three-qubit W states is eight
Narrative Variations in a Virtual Storyteller
On the Reconstruction Risk of Convolutional Sparse Dictionary Learning
On Quasi-Energy-Spectra, Pair Correlations of Sequences and Additive Combinatorics
Regenerative processes for Poisson zero polytopes
Scaling Properties of Dynamical Localization in Monochromatically Perturbed Quantum Maps: standard map and Anderson map
A sure independence screening procedure for ultra-high dimensional partially linear additive models
Estimates of Dirichlet heat kernels for subordinate Brownian motions
Comparing Human and Machine Errors in Conversational Speech Transcription
Non-asymptotic variance bounds and deviation inequalities by optimal transport
The Green-Tao theorem for primes of the form $x^2+y^2+1$
Fast, Accurate, and Scalable Method for Sparse Coupled Matrix-Tensor Factorization
Vertex-disjoint properly edge-colored cycles in edge-colored complete graphs
Recursion for the smallest eigenvalue density of $β$-Wishart-Laguerre ensemble
Deterministic Digital Clustering of Wireless Ad Hoc Networks
Gaussian comparison and anti-concentration inequalities for norms of Gaussian random elements
On the time of first level crossing and inverse Gaussian distribution
Generalized inverse Gaussian distributions and the time of first level crossing
Discovering Sequential Patterns in Event-Based Spatio-Temporal Data by Means of Microclustering – Extended Report
American options in an imperfect market with default
Testing k-monotonicity of a discrete distribution. Application to the estimation of the number of classes in a population
On approximations for the distribution of first level crossing time
Performance Guaranteed Network Acceleration via High-Order Residual Quantization
Further Results on Size and Power of Heteroskedasticity and Autocorrelation Robust Tests, with an Application to Trend Testing
Extremal solutions to some art gallery and terminal-pairability problems
Natasha 2: Faster Non-Convex Optimization Than SGD
A Lyapunov function construction for the Douglas-Rachford operator in a non-convex setting
Setting an attention region for convolutional neural networks using region selective features, for recognition of materials within glass vessels
Neural Machine Translation Training in a Multi-Domain Scenario
Better together? Statistical learning in models made of modules
Embedding Half-Edge Graphs in Punctured Surfaces
EntiTables: Smart Assistance for Entity-Focused Tables
Beyond Outerplanarity
Machine Learning Approach for Detection of nonTor Traffic
Curriculum Learning for Multi-Task Classification of Visual Attributes
Posterior Concentration for Bayesian Regression Trees and their Ensembles
Spectral Limitations of Quadrature Rules and Generalized Spherical Designs
On the Multi-Interval Ulam-Rényi Game: for 3 lies 4 intervals suffice
On the Complexity of Instationary Gas Flows
Autoencoder with recurrent neural networks for video forgery detection
Nonlinear Fokker-Planck equations driven by Gaussian linear multiplicative noise
Aligned Drawing of Planar Graphs
Cheeger Inequalities for Submodular Transformations
Slow diffusion by Markov random flights
A doubly stochastic advection-diffusion-decay model for testing data assimilation methodologies
Affine Volterra processes
Longtime convergence of the Temperature-Accelerated Molecular Dynamics Method
Enabling Sphere Decoding for SCMA
Computation Rate Maximization for Wireless Powered Mobile-Edge Computing with Binary Computation Offloading
On Existentially Complete Triangle-free Graphs
Coulomb GANs: Provably Optimal Nash Equilibria via Potential Fields
4D Multi-atlas Label Fusion using Longitudinal Images
Improved Support Recovery Guarantees for the Group Lasso With Applications to Structural Health Monitoring
Learning in the Repeated Secretary Problem
Deployment and Trajectory Optimization for UAVs: A Quantization Theory Approach
Evolution of a Fluctuating Population in Randomly Switching Environment
Semantic Texture for Robust Dense Tracking
On $m$-Closed Graphs
Circumference of 3-connected cubic graphs
Reasoning about Fine-grained Attribute Phrases using Reference Games
Viewing Vanilla Quantum Annealing Through Spin Glasses
A lower bound on the size of an absorbing set in an arc-coloured tournament
Minimax theorems for American options in incomplete markets without time-consistency
Navigating the Data Lake with Datamaran: Automatically Extracting Structure from Log Datasets
Online Circle and Sphere Packing
Bounds on the Menu-Size of Approximately Optimal Auctions via Optimal-Transport Duality
Simultaneous Variable and Covariance Selection with the Multivariate Spike-and-Slab Lasso