• Estimating the Causal Impact of Recommendation Systems from Observational Data
Recommendation systems are an increasingly prominent part of the web, accounting for up to a third of all traffic on several of the world’s most popular sites. Nevertheless, little is known about how much activity such systems actually cause over and above activity that would have occurred via other means (e.g., search) if recommendations were absent. Although the ideal way to estimate the causal impact of recommendations is via randomized experiments, such experiments are costly and may inconvenience users. In this paper, therefore, we present a method for estimating causal effects from purely observational data. Specifically, we show that causal identification through an instrumental variable is possible when a product experiences an instantaneous shock in direct traffic and the products recommended next to it do not. We then apply our method to browsing logs containing anonymized activity for 2.1 million users on Amazon.com over a 9 month period and analyze over 4,000 unique products that experience such shocks. We find that although recommendation click-throughs do account for a large fraction of traffic among these products, at least 75% of this activity would likely occur in the absence of recommendations. We conclude with a discussion about the assumptions under which the method is appropriate and caveats around extrapolating results to other products, sites, or settings.
• EdgeCentric: Anomaly Detection in Edge-Attributed Networks
Given a network with attributed edges, how can we identify anomalous behavior? Networks with edge attributes are commonplace in the real world. For example, edges in e-commerce networks often indicate how users rated products and services in terms of number of stars, and edges in online social and phonecall networks contain temporal information about when friendships were formed and when users communicated with each other — in such cases, edge attributes capture information about how the adjacent nodes interact with other entities in the network. In this paper, we aim to utilize exactly this information to discern suspicious from typical node behavior. Our work has a number of notable contributions, including (a) formulation: while most other graph-based anomaly detection works use structural graph connectivity or node information, we focus on the new problem of leveraging edge information, (b) methodology: we introduce EdgeCentric, an intuitive and scalable compression-based approach for detecting edge-attributed graph anomalies, and (c) practicality: we show that EdgeCentric successfully spots numerous such anomalies in several large, edge-attributed real-world graphs, including the Flipkart e-commerce graph with over 3 million product reviews between 1.1 million users and 545 thousand products, where it achieved 0.87 precision over the top 100 results.
• Preprocessing under uncertainty
In this work we study preprocessing for tractable problems when part of the input is unknown or uncertain. This comes up naturally if, e.g., the load of some machines or the congestion of some roads is not known far enough in advance, or if we have to regularly solve a problem over instances that are largely similar, e.g., daily airport scheduling with few charter flights. Unlike robust optimization, which also studies settings like this, our goal lies not in computing solutions that are (approximately) good for every instantiation. Rather, we seek to preprocess the known parts of the input, to speed up finding an optimal solution once the missing data is known. We present efficient algorithms that given an instance with partially uncertain input generate an instance of size polynomial in the amount of uncertain data that is equivalent for every instantiation of the unknown part. Concretely, we obtain such algorithms for Minimum Spanning Tree, Minimum Weight Matroid Basis, and Maximum Cardinality Bipartite Maxing, where respectively the weight of edges, weight of elements, and the availability of vertices is unknown for part of the input. Furthermore, we show that there are tractable problems, such as Small Connected Vertex Cover, for which one cannot hope to obtain similar results.
• Modularity Component Analysis versus Principal Component Analysis
In this paper the exact linear relation between the leading eigenvectors of the modularity matrix and the singular vectors of an uncentered data matrix is developed. Based on this analysis the concept of a modularity component is defined, and its properties are developed. It is shown that modularity component analysis can be used to cluster data similar to how traditional principal component analysis is used except that modularity component analysis does not require data centering.
• Clustering with Beta Divergences
Clustering algorithms start with a fixed divergence metric, which captures the possibly asymmetric distance between two samples. In a mixture model, the sample distribution plays the role of a divergence metric. It is often the case that the distributional assumption is not validated, which calls for an adaptive approach. We consider a richer model where the underlying distribution belongs to a parametrized exponential family, called Tweedie Models. We first show the connection between the Tweedie models and beta divergences, and derive the corresponding hard-assignment clustering algorithm. We exploit this connection to identify moment conditions and use Generalized Method of Moments(GMoM) to learn the data distribution. Based on this adaptive approach, we propose four new hard clustering algorithms and compare them to the classical k-means and DP-means on synthetic data as well as seven UCI datasets and one large gene expression dataset. We further compare the GMoM routine to an approximate maximum likelihood routine and validate the computational benefits of the GMoM approach.
• Clustering is Easy When ….What?
It is well known that most of the common clustering objectives are NP-hard to optimize. In practice, however, clustering is being routinely carried out. One approach for providing theoretical understanding of this seeming discrepancy is to come up with notions of clusterability that distinguish realistically interesting input data from worst-case data sets. The hope is that there will be clustering algorithms that are provably efficient on such ‘clusterable’ instances. This paper addresses the thesis that the computational hardness of clustering tasks goes away for inputs that one really cares about. In other words, that ‘Clustering is difficult only when it does not matter’ (the \emph{CDNM thesis} for short). I wish to present a a critical bird’s eye overview of the results published on this issue so far and to call attention to the gap between available and desirable results on this issue. A longer, more detailed version of this note is available as arXiv:1507.05307. I discuss which requirements should be met in order to provide formal support to the the CDNM thesis and then examine existing results in view of these requirements and list some significant unsolved research challenges in that direction.
• Latent Space Model for Multi-Modal Social Data
With the emergence of social networking services, researchers enjoy the increasing availability of large-scale heterogenous datasets capturing online user interactions and behaviors. Traditional analysis of techno-social systems data has focused mainly on describing either the dynamics of social interactions, or the attributes and behaviors of the users. However, overwhelming empirical evidence suggests that the two dimensions affect one another, and therefore they should be jointly modeled and analyzed in a multi-modal framework. The benefits of such an approach include the ability to build better predictive models, leveraging social network information as well as user behavioral signals. To this purpose, here we propose the Constrained Latent Space Model (CLSM), a generalized framework that combines Mixed Membership Stochastic Blockmodels (MMSB) and Latent Dirichlet Allocation (LDA) incorporating a constraint that forces the latent space to concurrently describe the multiple data modalities. We derive an efficient inference algorithm based on Variational Expectation Maximization that has a computational cost linear in the size of the network, thus making it feasible to analyze massive social datasets. We validate the proposed framework on two problems: prediction of social interactions from user attributes and behaviors, and behavior prediction exploiting network information. We perform experiments with a variety of multi-modal social systems, spanning location-based social networks (Gowalla), social media services (Instagram, Orkut), e-commerce and review sites (Amazon, Ciao), and finally citation networks (Cora). The results indicate significant improvement in prediction accuracy over state of the art methods, and demonstrate the flexibility of the proposed approach for addressing a variety of different learning problems commonly occurring with multi-modal social data.
• Social Media Analysis for Product Safety using Text Mining and Sentiment Analysis
The growing incidents of counterfeiting and associated economic and health consequences necessitate the development of active surveillance systems capable of producing timely and reliable information for all stake holders in the anti-counterfeiting fight. User generated content from social media platforms can provide early clues about product allergies, adverse events and product counterfeiting. This paper reports a work in progresswith contributions including: the development of a framework for gathering and analyzing the views and experiences of users of drug and cosmetic products using machine learning, text mining and sentiment analysis, the application of the proposed framework on Facebook comments and data from Twitter for brand analysis, and the description of how to develop a product safety lexicon and training data for modeling a machine learning classifier for drug and cosmetic product sentiment prediction. The initial brand and product comparison results signify the usefulness of text mining and sentiment analysis on social media data while the use of machine learning classifier for predicting the sentiment orientation provides a useful tool for users, product manufacturers, regulatory and enforcement agencies to monitor brand or product sentiment trends in order to act in the event of sudden or significant rise in negative sentiment.
• Designs for Generalized Linear Models
This paper reviews the design of experiments for generalised linear models, including optimal design, Bayesian design and designs for models with random effects.
• Large Enforced Sparse Non-Negative Matrix Factorization
Non-negative matrix factorization (NMF) is a common method for generating topic models from text data. NMF is widely accepted for producing good results despite its relative simplicity of implementation and ease of computation. One challenge with applying NMF to large datasets is that intermediate matrix products often become dense, stressing the memory and compute elements of a system. In this article, we investigate a simple but powerful modification of a common NMF algorithm that enforces the generation of sparse intermediate and output matrices. This method enables the application of NMF to large datasets through improved memory and compute performance. Further, we demonstrate empirically that this method of enforcing sparsity in the NMF either preserves or improves both the accuracy of the resulting topic model and the convergence rate of the underlying algorithm.
• Causal Falling Rule Lists
A causal falling rule list (CFRL) is a sequence of if-then rules that specifies heterogeneous treatment effects, where (i) the order of rules determines the treatment effect subgroup a subject belongs to, and (ii) the treatment effect decreases monotonically down the list. A given CFRL parameterizes a hierarchical bayesian regression model in which the treatment effects are incorporated as parameters, and assumed constant within model-specific subgroups. We formulate the search for the CFRL best supported by the data as a Bayesian model selection problem, where we perform a search over the space of CFRL models, and approximate the evidence for a given CFRL model using standard variational techniques. We apply CFRL to a census wage dataset to identify subgroups of differing wage inequalities between men and women.
• A Local Perspective on Community Structure in Multilayer Networks
We demonstrate how to use simple dynamical processes such as random walks to study and visualize community structure in multilayer networks.
• A General Method for Robust Bayesian Modeling
Robust Bayesian models are appealing alternatives to standard models, providing protection from data that contains outliers or other departures from the model assumptions. Historically, robust models were mostly developed on a case-by-case basis; examples include robust linear regression, robust mixture models, and bursty topic models. In this paper we develop a general approach to robust Bayesian modeling. We show how to turn an existing Bayesian model into a robust model, and then develop a generic strategy for computing with it. We use our method to study robust variants of several models, including linear regression, Poisson regression, logistic regression, and probabilistic topic models. We discuss the connections between our methods and existing approaches, especially empirical Bayes and James-Stein estimation.
• BLASX: A High Performance Level-3 BLAS Library for Heterogeneous Multi-GPU Computing
Basic Linear Algebra Subprograms (BLAS) are a set of low level linear algebra kernels widely adopted by applications involved with the deep learning and scientific computing. The massive and economic computing power brought forth by the emerging GPU architectures drives interest in implementation of compute-intensive level 3 BLAS on multi-GPU systems. In this paper, we investigate existing multi-GPU level 3 BLAS and present that 1) issues, such as the improper load balancing, inefficient communication, insufficient GPU stream level concurrency and data caching, impede current implementations from fully harnessing heterogeneous computing resources; 2) and the inter-GPU Peer-to-Peer(P2P) communication remains unexplored. We then present BLASX: a highly optimized multi-GPU level-3 BLAS. We adopt the concepts of algorithms-by-tiles treating a matrix tile as the basic data unit and operations on tiles as the basic task. Tasks are guided with a dynamic asynchronous runtime, which is cache and locality aware. The communication cost under BLASX becomes trivial as it perfectly overlaps communication and computation across multiple streams during asynchronous task progression. It also takes the current tile cache scheme one step further by proposing an innovative 2-level hierarchical tile cache, taking advantage of inter-GPU P2P communication. As a result, linear speedup is observable with BLASX under multi-GPU configurations; and the extensive benchmarks demonstrate that BLASX consistently outperforms the related leading industrial and academic projects such as cuBLAS-XT, SuperMatrix, MAGMA and PaRSEC.
• Improving the Speed of Response of Learning Algorithms Using Multiple Models
This is the first of a series of papers that the authors propose to write on the subject of improving the speed of response of learning systems using multiple models. During the past two decades, the second author has worked on numerous methods for improving the stability, robustness, and performance of adaptive systems using multiple models and the other authors have collaborated with him on some of them. Independently, they have also worked on several learning methods, and have considerable experience with their advantages and limitations. In particular, they are well aware that it is common knowledge that machine learning is in general very slow. Numerous attempts have been made by researchers to improve the speed of convergence of algorithms in different contexts. In view of the success of multiple model based methods in improving the speed of convergence in adaptive systems, the authors believe that the same approach will also prove fruitful in the domain of learning. In this paper, a first attempt is made to use multiple models for improving the speed of response of the simplest learning schemes that have been studied. i.e. Learning Automata.
• From the weak Bruhat order to crystal posets
• The site frequency spectrum for general coalescents
• Counting Restricted Dyck Paths Through Random Walks
• Mean-field limit of generalized Hawkes processes
• PERCH: Perception via Search for Multi-Object Recognition and Localization
• The mathematics of causal sets
• Stochastically Transitive Models for Pairwise Comparisons: Statistical and Computational Issues
• Anderson mobility gap probed by dynamic coherent backscattering
• Application of Machine Learning Techniques in Human Activity Recognition
• Optimization for Gaussian Processes via Chaining
• On the Computability of AIXI
• The greedy independent set in a random graph with given degrees
• An integrative approach based on probabilistic modelling and statistical inference for morpho-statistical characterization of astronomical data
• Modern Gyrokinetic Particle-In-Cell Simulation of Fusion Plasmas on Top Supercomputers
• Void Probabilities and Cauchy-Schwarz Divergence for Generalized Labeled Multi-Bernoulli Models
• Nonparametric Bayesian posterior contraction rates for discretely observed scalar diffusions
• Metadisorder for designer light in random-walk systems
• Sparse Hanson-Wright inequalities for subgaussian quadratic forms
• The complexity of signed graph and 2-edge-coloured graph homomorphisms
• On the lattice of flats of a boolean representable simplicial complex
• Accelerometer based Activity Classification with Variational Inference on Sticky HDP-SLDS
• Confidence Sets for the Source of a Diffusion in Regular Trees
• Gathering a Closed Chain of Robots on a Grid
• Connecting the Random Connection Model
• Efficient estimators for likelihood ratio sensitivity indices of complex stochastic dynamics
• Patterns in Inversion Sequences I
• Fast Parallel Operations on Search Trees
• Piecewise-Linear Approximation for Feature Subset Selection in a Sequential Logit Model
• Bayesian Inference of Online Social Network Statistics via Lightweight Random Walk Crawls
• Fractional boxicity of graphs
• Unifying inference on brain network variations in neurological diseases: The Alzheimer’s case
• Entropy and thinning of discrete random variables
• System Descriptions of the First International Competition on Computational Models of Argumentation (ICCMA’15)
• Introduction to $k$-independent graph of a graph
• Relativised homomorphism preservation at the finite level
• Transient dynamics of pulse-coupled oscillators with nonlinear charging curves
• On the Structure of Quintic Polynomials
• Exploring the Space of Adversarial Images
• Uniform Lipschitz Property of Nonnegative Derivative Constrained B-Splines and Applications to Shape Constrained Estimation
• Poisson statistics of eigenvalues in the hierarchical Dyson model
• Privacy sets for constrained space-filling
• One-dimensional Coulomb multi-particle systems
• Algebraic geometry of Poisson regression
• Hilbert cubes in arithmetic sets
• Scalable inference for a full multivariate stochastic volatility model
• Design of Experiments for Screening
• The semiparametric Bernstein-von Mises theorem for models with symmetric error
• Spanning k-ended trees of 3-regular connected graphs
• A TV-Gaussian prior for infinite-dimensional Bayesian inverse problems and its numerical implementations
• On the characterization of flowering curves using Gaussian mixture models
• Signal Processing Structures for Solving Conservative Constraint Satisfaction Problems
• Optimization of an electromagnetics code with multicore wavefront diamond blocking and multi-dimensional intra-tile parallelization
• On subordinate random walks
• Clustering Noisy Signals with Structured Sparsity Using Time-Frequency Representation
• An introduction of the enlargement of filtration
• Neural Reranking Improves Subjective Quality of Machine Translation: NAIST at WAT2015
• Learning multi-faceted representations of individuals from heterogeneous evidence using neural networks
• Percolation and coarse conformal uniformization
• Bijections for maps with boundaries: Krikun’s formula for triangulations, and a quadrangulation analogue
• Source detection algorithms for dynamic contaminants based on the analysis of a hydrodynamic limit
• Monochromatic tree covers for multicoloured graphs
• On the behavior of diffusion processes with traps
• Sustainability-Aware Cloud Computing Using Virtual Carbon Tax
• Algebraic Conditions for Generating Accurate Adjacency Arrays
• A Historical Analysis of the Field of OR/MS using Topic Models
• On the Principal Permanent Rank Characteristic Sequences of Graphs and Digraphs
• Strong hypercontractivity and logarithmic Sobolev inequalities on stratified complex Lie groups
• Robust Non-linear Wiener-Granger Causality For Large High-dimensional Data
• The Curie-Weiss Model of SOC in Higher Dimension
• Overlapping group logistic regression with applications to genetic pathway selection
• Integrality Gaps and Approximation Algorithms for Dispersers and Bipartite Expanders
• Set families with a forbidden pattern
• Bibliometric-Enhanced Information Retrieval: 3rd International BIR Workshop
• Networks, Dynamic Factors, and the Volatility Analysis of High-Dimensional Financial Series
• Proof of a conjecture on `plateaux’ phenomenon of graph Laplacian eigenvalues
• On the topology of a boolean representable simplicial complex
• A Makespan Lower Bound for the Scheduling of the Tiled Cholesky Factorization based on ALAP scheduling
• Long time asymptotics of non-symmetric random walks on crystal lattices
• Optimal Rebalancing Frequencies for Multidimensional Portfolios
• Faster algorithms to enumerate hypergraph transversals
• Simultaneous confidence bands for contrasts between several nonlinear regression curves
• Antimagic Labelings of Weighted and Oriented Graphs
• How Important is Weight Symmetry in Backpropagation?
• A small delay and correlation time limit of stochastic differential delay equations with state-dependent colored noise
• A Distance Measure for the Analysis of Polar Opinion Dynamics in Social Networks
• Design of Self-Organising Networks
• A cost function for similarity-based hierarchical clustering
• A note on the Harer-Zagier formula and the Lehman-Walsh formula
• Dynamic information routing in complex networks
• A Pfaffian formula for monomer-dimer partition functions
• Combinatorics of the two-species ASEP and Koornwinder moments
• Partial least squares for dependent data
Like this:
Like Loading...