Estimating the Causal Impact of Recommendation Systems from Observational Data

Recommendation systems are an increasingly prominent part of the web, accounting for up to a third of all traffic on several of the world’s most popular sites. Nevertheless, little is known about how much activity such systems actually cause over and above activity that would have occurred via other means (e.g., search) if recommendations were absent. Although the ideal way to estimate the causal impact of recommendations is via randomized experiments, such experiments are costly and may inconvenience users. In this paper, therefore, we present a method for estimating causal effects from purely observational data. Specifically, we show that causal identification through an instrumental variable is possible when a product experiences an instantaneous shock in direct traffic and the products recommended next to it do not. We then apply our method to browsing logs containing anonymized activity for 2.1 million users on Amazon.com over a 9 month period and analyze over 4,000 unique products that experience such shocks. We find that although recommendation click-throughs do account for a large fraction of traffic among these products, at least 75% of this activity would likely occur in the absence of recommendations. We conclude with a discussion about the assumptions under which the method is appropriate and caveats around extrapolating results to other products, sites, or settings.

EdgeCentric: Anomaly Detection in Edge-Attributed Networks

Given a network with attributed edges, how can we identify anomalous behavior? Networks with edge attributes are commonplace in the real world. For example, edges in e-commerce networks often indicate how users rated products and services in terms of number of stars, and edges in online social and phonecall networks contain temporal information about when friendships were formed and when users communicated with each other — in such cases, edge attributes capture information about how the adjacent nodes interact with other entities in the network. In this paper, we aim to utilize exactly this information to discern suspicious from typical node behavior. Our work has a number of notable contributions, including (a) formulation: while most other graph-based anomaly detection works use structural graph connectivity or node information, we focus on the new problem of leveraging edge information, (b) methodology: we introduce EdgeCentric, an intuitive and scalable compression-based approach for detecting edge-attributed graph anomalies, and (c) practicality: we show that EdgeCentric successfully spots numerous such anomalies in several large, edge-attributed real-world graphs, including the Flipkart e-commerce graph with over 3 million product reviews between 1.1 million users and 545 thousand products, where it achieved 0.87 precision over the top 100 results.

Preprocessing under uncertainty

In this work we study preprocessing for tractable problems when part of the input is unknown or uncertain. This comes up naturally if, e.g., the load of some machines or the congestion of some roads is not known far enough in advance, or if we have to regularly solve a problem over instances that are largely similar, e.g., daily airport scheduling with few charter flights. Unlike robust optimization, which also studies settings like this, our goal lies not in computing solutions that are (approximately) good for every instantiation. Rather, we seek to preprocess the known parts of the input, to speed up finding an optimal solution once the missing data is known. We present efficient algorithms that given an instance with partially uncertain input generate an instance of size polynomial in the amount of uncertain data that is equivalent for every instantiation of the unknown part. Concretely, we obtain such algorithms for Minimum Spanning Tree, Minimum Weight Matroid Basis, and Maximum Cardinality Bipartite Maxing, where respectively the weight of edges, weight of elements, and the availability of vertices is unknown for part of the input. Furthermore, we show that there are tractable problems, such as Small Connected Vertex Cover, for which one cannot hope to obtain similar results.

Modularity Component Analysis versus Principal Component Analysis

In this paper the exact linear relation between the leading eigenvectors of the modularity matrix and the singular vectors of an uncentered data matrix is developed. Based on this analysis the concept of a modularity component is defined, and its properties are developed. It is shown that modularity component analysis can be used to cluster data similar to how traditional principal component analysis is used except that modularity component analysis does not require data centering.

Clustering with Beta Divergences

Clustering algorithms start with a fixed divergence metric, which captures the possibly asymmetric distance between two samples. In a mixture model, the sample distribution plays the role of a divergence metric. It is often the case that the distributional assumption is not validated, which calls for an adaptive approach. We consider a richer model where the underlying distribution belongs to a parametrized exponential family, called Tweedie Models. We first show the connection between the Tweedie models and beta divergences, and derive the corresponding hard-assignment clustering algorithm. We exploit this connection to identify moment conditions and use Generalized Method of Moments(GMoM) to learn the data distribution. Based on this adaptive approach, we propose four new hard clustering algorithms and compare them to the classical k-means and DP-means on synthetic data as well as seven UCI datasets and one large gene expression dataset. We further compare the GMoM routine to an approximate maximum likelihood routine and validate the computational benefits of the GMoM approach.

Clustering is Easy When ….What?

It is well known that most of the common clustering objectives are NP-hard to optimize. In practice, however, clustering is being routinely carried out. One approach for providing theoretical understanding of this seeming discrepancy is to come up with notions of clusterability that distinguish realistically interesting input data from worst-case data sets. The hope is that there will be clustering algorithms that are provably efficient on such ‘clusterable’ instances. This paper addresses the thesis that the computational hardness of clustering tasks goes away for inputs that one really cares about. In other words, that ‘Clustering is difficult only when it does not matter’ (the \emph{CDNM thesis} for short). I wish to present a a critical bird’s eye overview of the results published on this issue so far and to call attention to the gap between available and desirable results on this issue. A longer, more detailed version of this note is available as arXiv:1507.05307. I discuss which requirements should be met in order to provide formal support to the the CDNM thesis and then examine existing results in view of these requirements and list some significant unsolved research challenges in that direction.

Latent Space Model for Multi-Modal Social Data

With the emergence of social networking services, researchers enjoy the increasing availability of large-scale heterogenous datasets capturing online user interactions and behaviors. Traditional analysis of techno-social systems data has focused mainly on describing either the dynamics of social interactions, or the attributes and behaviors of the users. However, overwhelming empirical evidence suggests that the two dimensions affect one another, and therefore they should be jointly modeled and analyzed in a multi-modal framework. The benefits of such an approach include the ability to build better predictive models, leveraging social network information as well as user behavioral signals. To this purpose, here we propose the Constrained Latent Space Model (CLSM), a generalized framework that combines Mixed Membership Stochastic Blockmodels (MMSB) and Latent Dirichlet Allocation (LDA) incorporating a constraint that forces the latent space to concurrently describe the multiple data modalities. We derive an efficient inference algorithm based on Variational Expectation Maximization that has a computational cost linear in the size of the network, thus making it feasible to analyze massive social datasets. We validate the proposed framework on two problems: prediction of social interactions from user attributes and behaviors, and behavior prediction exploiting network information. We perform experiments with a variety of multi-modal social systems, spanning location-based social networks (Gowalla), social media services (Instagram, Orkut), e-commerce and review sites (Amazon, Ciao), and finally citation networks (Cora). The results indicate significant improvement in prediction accuracy over state of the art methods, and demonstrate the flexibility of the proposed approach for addressing a variety of different learning problems commonly occurring with multi-modal social data.

Social Media Analysis for Product Safety using Text Mining and Sentiment Analysis

The growing incidents of counterfeiting and associated economic and health consequences necessitate the development of active surveillance systems capable of producing timely and reliable information for all stake holders in the anti-counterfeiting fight. User generated content from social media platforms can provide early clues about product allergies, adverse events and product counterfeiting. This paper reports a work in progresswith contributions including: the development of a framework for gathering and analyzing the views and experiences of users of drug and cosmetic products using machine learning, text mining and sentiment analysis, the application of the proposed framework on Facebook comments and data from Twitter for brand analysis, and the description of how to develop a product safety lexicon and training data for modeling a machine learning classifier for drug and cosmetic product sentiment prediction. The initial brand and product comparison results signify the usefulness of text mining and sentiment analysis on social media data while the use of machine learning classifier for predicting the sentiment orientation provides a useful tool for users, product manufacturers, regulatory and enforcement agencies to monitor brand or product sentiment trends in order to act in the event of sudden or significant rise in negative sentiment.

Designs for Generalized Linear Models

This paper reviews the design of experiments for generalised linear models, including optimal design, Bayesian design and designs for models with random effects.

Large Enforced Sparse Non-Negative Matrix Factorization

Non-negative matrix factorization (NMF) is a common method for generating topic models from text data. NMF is widely accepted for producing good results despite its relative simplicity of implementation and ease of computation. One challenge with applying NMF to large datasets is that intermediate matrix products often become dense, stressing the memory and compute elements of a system. In this article, we investigate a simple but powerful modification of a common NMF algorithm that enforces the generation of sparse intermediate and output matrices. This method enables the application of NMF to large datasets through improved memory and compute performance. Further, we demonstrate empirically that this method of enforcing sparsity in the NMF either preserves or improves both the accuracy of the resulting topic model and the convergence rate of the underlying algorithm.

Causal Falling Rule Lists

A causal falling rule list (CFRL) is a sequence of if-then rules that specifies heterogeneous treatment effects, where (i) the order of rules determines the treatment effect subgroup a subject belongs to, and (ii) the treatment effect decreases monotonically down the list. A given CFRL parameterizes a hierarchical bayesian regression model in which the treatment effects are incorporated as parameters, and assumed constant within model-specific subgroups. We formulate the search for the CFRL best supported by the data as a Bayesian model selection problem, where we perform a search over the space of CFRL models, and approximate the evidence for a given CFRL model using standard variational techniques. We apply CFRL to a census wage dataset to identify subgroups of differing wage inequalities between men and women.

A Local Perspective on Community Structure in Multilayer Networks

We demonstrate how to use simple dynamical processes such as random walks to study and visualize community structure in multilayer networks.

A General Method for Robust Bayesian Modeling

Robust Bayesian models are appealing alternatives to standard models, providing protection from data that contains outliers or other departures from the model assumptions. Historically, robust models were mostly developed on a case-by-case basis; examples include robust linear regression, robust mixture models, and bursty topic models. In this paper we develop a general approach to robust Bayesian modeling. We show how to turn an existing Bayesian model into a robust model, and then develop a generic strategy for computing with it. We use our method to study robust variants of several models, including linear regression, Poisson regression, logistic regression, and probabilistic topic models. We discuss the connections between our methods and existing approaches, especially empirical Bayes and James-Stein estimation.

BLASX: A High Performance Level-3 BLAS Library for Heterogeneous Multi-GPU Computing

Basic Linear Algebra Subprograms (BLAS) are a set of low level linear algebra kernels widely adopted by applications involved with the deep learning and scientific computing. The massive and economic computing power brought forth by the emerging GPU architectures drives interest in implementation of compute-intensive level 3 BLAS on multi-GPU systems. In this paper, we investigate existing multi-GPU level 3 BLAS and present that 1) issues, such as the improper load balancing, inefficient communication, insufficient GPU stream level concurrency and data caching, impede current implementations from fully harnessing heterogeneous computing resources; 2) and the inter-GPU Peer-to-Peer(P2P) communication remains unexplored. We then present BLASX: a highly optimized multi-GPU level-3 BLAS. We adopt the concepts of algorithms-by-tiles treating a matrix tile as the basic data unit and operations on tiles as the basic task. Tasks are guided with a dynamic asynchronous runtime, which is cache and locality aware. The communication cost under BLASX becomes trivial as it perfectly overlaps communication and computation across multiple streams during asynchronous task progression. It also takes the current tile cache scheme one step further by proposing an innovative 2-level hierarchical tile cache, taking advantage of inter-GPU P2P communication. As a result, linear speedup is observable with BLASX under multi-GPU configurations; and the extensive benchmarks demonstrate that BLASX consistently outperforms the related leading industrial and academic projects such as cuBLAS-XT, SuperMatrix, MAGMA and PaRSEC.

Improving the Speed of Response of Learning Algorithms Using Multiple Models

This is the first of a series of papers that the authors propose to write on the subject of improving the speed of response of learning systems using multiple models. During the past two decades, the second author has worked on numerous methods for improving the stability, robustness, and performance of adaptive systems using multiple models and the other authors have collaborated with him on some of them. Independently, they have also worked on several learning methods, and have considerable experience with their advantages and limitations. In particular, they are well aware that it is common knowledge that machine learning is in general very slow. Numerous attempts have been made by researchers to improve the speed of convergence of algorithms in different contexts. In view of the success of multiple model based methods in improving the speed of convergence in adaptive systems, the authors believe that the same approach will also prove fruitful in the domain of learning. In this paper, a first attempt is made to use multiple models for improving the speed of response of the simplest learning schemes that have been studied. i.e. Learning Automata.

From the weak Bruhat order to crystal posets

The site frequency spectrum for general coalescents

Counting Restricted Dyck Paths Through Random Walks

Mean-field limit of generalized Hawkes processes

PERCH: Perception via Search for Multi-Object Recognition and Localization

The mathematics of causal sets

Stochastically Transitive Models for Pairwise Comparisons: Statistical and Computational Issues

Anderson mobility gap probed by dynamic coherent backscattering

Application of Machine Learning Techniques in Human Activity Recognition

Optimization for Gaussian Processes via Chaining

On the Computability of AIXI

The greedy independent set in a random graph with given degrees

An integrative approach based on probabilistic modelling and statistical inference for morpho-statistical characterization of astronomical data

Modern Gyrokinetic Particle-In-Cell Simulation of Fusion Plasmas on Top Supercomputers

Void Probabilities and Cauchy-Schwarz Divergence for Generalized Labeled Multi-Bernoulli Models

Nonparametric Bayesian posterior contraction rates for discretely observed scalar diffusions

Metadisorder for designer light in random-walk systems

Sparse Hanson-Wright inequalities for subgaussian quadratic forms

The complexity of signed graph and 2-edge-coloured graph homomorphisms

On the lattice of flats of a boolean representable simplicial complex

Accelerometer based Activity Classification with Variational Inference on Sticky HDP-SLDS

Confidence Sets for the Source of a Diffusion in Regular Trees

Gathering a Closed Chain of Robots on a Grid

Connecting the Random Connection Model

Efficient estimators for likelihood ratio sensitivity indices of complex stochastic dynamics

Patterns in Inversion Sequences I

Fast Parallel Operations on Search Trees

Piecewise-Linear Approximation for Feature Subset Selection in a Sequential Logit Model

Bayesian Inference of Online Social Network Statistics via Lightweight Random Walk Crawls

Fractional boxicity of graphs

Unifying inference on brain network variations in neurological diseases: The Alzheimer’s case

Entropy and thinning of discrete random variables

System Descriptions of the First International Competition on Computational Models of Argumentation (ICCMA’15)

Introduction to $k$-independent graph of a graph

Relativised homomorphism preservation at the finite level

Transient dynamics of pulse-coupled oscillators with nonlinear charging curves

On the Structure of Quintic Polynomials

Exploring the Space of Adversarial Images

Uniform Lipschitz Property of Nonnegative Derivative Constrained B-Splines and Applications to Shape Constrained Estimation

Poisson statistics of eigenvalues in the hierarchical Dyson model

Privacy sets for constrained space-filling

One-dimensional Coulomb multi-particle systems

Algebraic geometry of Poisson regression

Hilbert cubes in arithmetic sets

Scalable inference for a full multivariate stochastic volatility model

Design of Experiments for Screening

The semiparametric Bernstein-von Mises theorem for models with symmetric error

Spanning k-ended trees of 3-regular connected graphs

A TV-Gaussian prior for infinite-dimensional Bayesian inverse problems and its numerical implementations

On the characterization of flowering curves using Gaussian mixture models

Signal Processing Structures for Solving Conservative Constraint Satisfaction Problems

Optimization of an electromagnetics code with multicore wavefront diamond blocking and multi-dimensional intra-tile parallelization

On subordinate random walks

Clustering Noisy Signals with Structured Sparsity Using Time-Frequency Representation

An introduction of the enlargement of filtration

Neural Reranking Improves Subjective Quality of Machine Translation: NAIST at WAT2015

Learning multi-faceted representations of individuals from heterogeneous evidence using neural networks

Percolation and coarse conformal uniformization

Bijections for maps with boundaries: Krikun’s formula for triangulations, and a quadrangulation analogue

Source detection algorithms for dynamic contaminants based on the analysis of a hydrodynamic limit

Monochromatic tree covers for multicoloured graphs

On the behavior of diffusion processes with traps

Sustainability-Aware Cloud Computing Using Virtual Carbon Tax

Algebraic Conditions for Generating Accurate Adjacency Arrays

A Historical Analysis of the Field of OR/MS using Topic Models

On the Principal Permanent Rank Characteristic Sequences of Graphs and Digraphs

Strong hypercontractivity and logarithmic Sobolev inequalities on stratified complex Lie groups

Robust Non-linear Wiener-Granger Causality For Large High-dimensional Data

The Curie-Weiss Model of SOC in Higher Dimension

Overlapping group logistic regression with applications to genetic pathway selection

Integrality Gaps and Approximation Algorithms for Dispersers and Bipartite Expanders

Set families with a forbidden pattern

Bibliometric-Enhanced Information Retrieval: 3rd International BIR Workshop

Networks, Dynamic Factors, and the Volatility Analysis of High-Dimensional Financial Series

Proof of a conjecture on `plateaux’ phenomenon of graph Laplacian eigenvalues

On the topology of a boolean representable simplicial complex

A Makespan Lower Bound for the Scheduling of the Tiled Cholesky Factorization based on ALAP scheduling

Long time asymptotics of non-symmetric random walks on crystal lattices

Optimal Rebalancing Frequencies for Multidimensional Portfolios

Faster algorithms to enumerate hypergraph transversals

Simultaneous confidence bands for contrasts between several nonlinear regression curves

Antimagic Labelings of Weighted and Oriented Graphs

How Important is Weight Symmetry in Backpropagation?

A small delay and correlation time limit of stochastic differential delay equations with state-dependent colored noise

A Distance Measure for the Analysis of Polar Opinion Dynamics in Social Networks

Design of Self-Organising Networks

A cost function for similarity-based hierarchical clustering

A note on the Harer-Zagier formula and the Lehman-Walsh formula

Dynamic information routing in complex networks

A Pfaffian formula for monomer-dimer partition functions

Combinatorics of the two-species ASEP and Koornwinder moments

Partial least squares for dependent data