On the classification problem for Poisson Point Processes

We study the binary classification problem for Poisson point processes, which are allowed to take values in a general metric space. The problem is tackled in two different ways: estimating nonparametricaly the intensity functions of the processes (and then plugged into a deterministic formula which expresses the regression function in terms of the intensities), and performing the classical k nearest neighbor rule by introducing a suitable distance between patterns of points. In the first approach we prove the consistency of the estimated intensity so that the rule turns out to be also consistent. For the k-NN classifier, we prove that the regression function fulfils the so called ‘Besicovitch condition’, usually required for the consistency of the classical classification rules. The theoretical findings are illustrated on simulated data, where in one case the k-NN rule outperforms the first approach.


Beyond Classification: Latent User Interests Profiling from Visual Contents Analysis

User preference profiling is an important task in modern online social networks (OSN). With the proliferation of image-centric social platforms, such as Pinterest, visual contents have become one of the most informative data streams for understanding user preferences. Traditional approaches usually treat visual content analysis as a general classification problem where one or more labels are assigned to each image. Although such an approach simplifies the process of image analysis, it misses the rich context and visual cues that play an important role in people’s perception of images. In this paper, we explore the possibilities of learning a user’s latent visual preferences directly from image contents. We propose a distance metric learning method based on Deep Convolutional Neural Networks (CNN) to directly extract similarity information from visual contents and use the derived distance metric to mine individual users’ fine-grained visual preferences. Through our preliminary experiments using data from 5,790 Pinterest users, we show that even for the images within the same category, each user possesses distinct and individually-identifiable visual preferences that are consistent over their lifetime. Our results underscore the untapped potential of finer-grained visual preference profiling in understanding users’ preferences.


GraphConnect: A Regularization Framework for Neural Networks

Deep neural networks have proved very successful in domains where large training sets are available, but when the number of training samples is small, their performance suffers from overfitting. Prior methods of reducing overfitting such as weight decay, Dropout and DropConnect are data-independent. This paper proposes a new method, GraphConnect, that is data-dependent, and is motivated by the observation that data of interest lie close to a manifold. The new method encourages the relationships between the learned decisions to resemble a graph representing the manifold structure. Essentially GraphConnect is designed to learn attributes that are present in data samples in contrast to weight decay, Dropout and DropConnect which are simply designed to make it more difficult to fit to random error or noise. Empirical Rademacher complexity is used to connect the generalization error of the neural network to spectral properties of the graph learned from the input data. This framework is used to show that GraphConnect is superior to weight decay. Experimental results on several benchmark datasets validate the theoretical analysis, and show that when the number of training samples is small, GraphConnect is able to significantly improve performance over weight decay.


Multivariate Time Series Classification Using Dynamic Time Warping Template Selection for Human Activity Recognition

Accurate and computationally efficient means for classifying human activities have been the subject of extensive research efforts. Most current research focuses on extracting complex features to achieve high classification accuracy. We propose a template selection approach based on Dynamic Time Warping, such that complex feature extraction and domain knowledge is avoided. We demonstrate the predictive capability of the algorithm on both simulated and real smartphone data.


Multilinear Subspace Clustering

In this paper we present a new model and an algorithm for unsupervised clustering of 2-D data such as images. We assume that the data comes from a union of multilinear subspaces (UOMS) model, which is a specific structured case of the much studied union of subspaces (UOS) model. For segmentation under this model, we develop Multilinear Subspace Clustering (MSC) algorithm and evaluate its performance on the YaleB and Olivietti image data sets. We show that MSC is highly competitive with existing algorithms employing the UOS model in terms of clustering performance while enjoying improvement in computational complexity.


Backbone Language Modeling for Constrained Natural Language Generation

Recent language models, especially those based on recurrent neural networks (RNNs), make it possible to generate natural language from a learned probability. Language generation has wide applications including machine translation, summarization, question answering, conversation systems, etc. Existing methods typically learn a joint probability of words conditioned on additional information, which is (either statically or dynamically) fed to RNN’s hidden layer. In many applications, we are likely to impose hard constraints on the generated texts, i.e., a particular word must appear in the sentence. Unfortunately, existing methods could not solve this problem. In this paper, we propose a backbone language model (backbone LM) for constrained language generation. Provided a specific word, our model generates previous words and future words simultaneously. In this way, the given word could appear at any position in the sentence. Experimental results show that the generated texts are coherent and fluent.


ATD: Anomalous Topic Discovery in High Dimensional Discrete Data

We propose an algorithm for detecting patterns exhibited by anomalous clusters in high dimensional discrete data. Unlike most anomaly detection (AD) methods, which detect individual anomalies, our proposed method detects groups (clusters) of anomalies; i.e. sets of points which collectively exhibit abnormal patterns. In many applications this can lead to better understanding of the nature of the atypical behavior and to identifying the sources of the anomalies. Moreover, we consider the case where the atypical patterns exhibit on only a small (salient) subset of the very high dimensional feature space. Individual AD techniques and techniques that detect anomalies using all the features typically fail to detect such anomalies, but our method can detect such instances collectively, discover the shared anomalous patterns exhibited by them, and identify the subsets of salient features. In this paper, we focus on detecting anomalous topics in a batch of text documents, developing our algorithm based on topic models. Results of our experiments show that our method can accurately detect anomalous topics and salient features (words) under each such topic in a synthetic data set and two real-world text corpora and achieves better performance compared to both standard group AD and individual AD techniques. All required code to reproduce our experiments is available from https://…/ATD.


Behavioral Modeling for Churn Prediction: Early Indicators and Accurate Predictors of Custom Defection and Loyalty

Churn prediction, or the task of identifying customers who are likely to discontinue use of a service, is an important and lucrative concern of firms in many different industries. As these firms collect an increasing amount of large-scale, heterogeneous data on the characteristics and behaviors of customers, new methods become possible for predicting churn. In this paper, we present a unified analytic framework for detecting the early warning signs of churn, and assigning a ‘Churn Score’ to each customer that indicates the likelihood that the particular individual will churn within a predefined amount of time. This framework employs a brute force approach to feature engineering, then winnows the set of relevant attributes via feature selection, before feeding the final feature-set into a suite of supervised learning algorithms. Using several terabytes of data from a large mobile phone network, our method identifies several intuitive – and a few surprising – early warning signs of churn, and our best model predicts whether a subscriber will churn with 89.4% accuracy.


Latent variable model selection for Gaussian conditional random fields

We consider the problem of learning a conditional Gaussian graphical model in the presence of latent variables. Building on recent advances in this field, we suggest a method that decomposes the parameters of a conditional Markov random field into the sum of a sparse and a low-rank matrix. We derive convergence bounds for this estimator and show that it is well-behaved in the high-dimensional regime as well as ‘sparsistent’ (i.e. capable of recovering the graph structure). We then describe a proximal gradient algorithm which is able to fit the model to thousands of variables. Through extensive simulations, we illustrate the conditions required for identifiability and show that there is a wide range of situations in which this model performs significantly better than its counterparts. We also show how this problem is relevant to some of the challenges faced by instrumental variable methods.


Collaborative Filtering for Predicting User Preferences for Organizing Objects

As service robots become more and more capable of performing useful tasks for us, there is a growing need to teach robots how we expect them to carry out these tasks. However, different users typically have their own preferences, for example with respect to arranging objects on different shelves. As many of these preferences depend on a variety of factors including personal taste, cultural background, or common sense, it is challenging for an expert to pre-program a robot in order to accommodate all potential users. At the same time, it is impractical for robots to constantly query users about how they should perform individual tasks. In this work, we present an approach to learn patterns in user preferences for the task of tidying up objects in containers, e.g., shelves or boxes. Our method builds upon the paradigm of collaborative filtering for making personalized recommendations and relies on data from different users that we gather using crowdsourcing. To deal with novel objects for which we have no data, we propose a method that compliments standard collaborative filtering by leveraging information mined from the Web. When solving a tidy-up task, we first predict pairwise object preferences of the user. Then, we subdivide the objects in containers by modeling a spectral clustering problem. Our solution is easy to update, does not require complex modeling, and improves with the amount of user data. We evaluate our approach using crowdsourcing data from over 1,200 users and demonstrate its effectiveness for two tidy-up scenarios. Additionally, we show that a real robot can reliably predict user preferences using our approach.


Kernel principal component analysis network for image classification

In order to classify the nonlinear feature with linear classifier and improve the classification accuracy, a deep learning network named kernel principal component analysis network (KPCANet) is proposed. First, mapping the data into higher space with kernel principal component analysis to make the data linearly separable. Then building a two-layer KPCANet to obtain the principal components of image. Finally, classifying the principal components with linearly classifier. Experimental results show that the proposed KPCANet is effective in face recognition, object recognition and hand-writing digits recognition, it also outperforms principal component analysis network (PCANet) generally as well. Besides, KPCANet is invariant to illumination and stable to occlusion and slight deformation.


A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction

Deep convolutional neural networks have led to breakthrough results in practical feature extraction applications. The mathematical analysis of such networks was initiated by Mallat, 2012. Specifically, Mallat considered so-called scattering networks based on semi-discrete shift-invariant wavelet frames and modulus non-linearities in each network layer, and proved translation invariance (asymptotically in the wavelet scale parameter) and deformation stability of the corresponding feature extractor. The purpose of this paper is to develop Mallat’s theory further by allowing for general convolution kernels, or in more technical parlance, general semi-discrete shift-invariant frames (including Weyl-Heisenberg, curvelet, shearlet, ridgelet, and wavelet frames) and general Lipschitz-continuous non-linearities (e.g., rectified linear units, shifted logistic sigmoids, hyperbolic tangents, and modulus functions), as well as pooling through sub-sampling, all of which can be different in different network layers. The resulting generalized network enables extraction of significantly wider classes of features than those resolved by Mallat’s wavelet-modulus scattering network. We prove deformation stability for a larger class of deformations than those considered by Mallat, and we establish a new translation invariance result which is of vertical nature in the sense of the network depth determining the amount of invariance. Moreover, our results establish that deformation stability and vertical translation invariance are guaranteed by the network structure per se rather than the specific convolution kernels and non-linearities. This offers an explanation for the tremendous success of deep convolutional neural networks in a wide variety of practical feature extraction applications. The mathematical techniques we employ are based on continuous frame theory.


Up in the Air: When Homes Meet the Web of Things

The emerging Web of Things (WoT) will comprise billions of Web-enabled objects (or ‘things’) where such objects can sense, communicate, compute and potentially actuate. WoT is essentially the embodiment of the evolution from systems linking digital documents to systems relating digital information to real-world physical items. It is widely understood that significant technical challenges exist in developing applications in the WoT environment. In this paper, we report our practical experience in the design and development of a smart home system in a WoT environment. Our system provides a layered framework for managing and sharing the information produced by physical things as well as the residents. We particularly focus on a research prototype named WITS, that helps the elderly live independently and safely in their own homes, with minimal support from the decreasing number of individuals in the working-age population. WITS enables an unobtrusive monitoring of elderly people in a real-world, inhabituated home environment, by leveraging WoT technologies in building context-aware, personalized services.


Test-Driven Development of ontologies (extended version)

Emerging ontology authoring methods to add knowledge to an ontology focus on ameliorating the validation bottleneck. The verification of the newly added axiom is still one of trying and seeing what the reasoner says, because a systematic testbed for ontology authoring is missing. We sought to address this by introducing the approach of test-driven development for ontology authoring. We specify 36 generic tests, as TBox queries and TBox axioms tested through individuals, and structure their inner workings in an `open box’-way, which cover the OWL 2 DL language features. This is implemented as a Protege plugin so that one can perform a TDD test as a black box test. We evaluated the two test approaches on their performance. The TBox queries were faster, and that effect is more pronounced the larger the ontology is. We provide a general sequence of a TDD process for ontology engineering as a foundation for a TDD methodology.


Algorithms for Provisioning Queries and Analytics

Provisioning is a technique for avoiding repeated expensive computations in what-if analysis. Given a query, an analyst formulates k hypotheticals, each retaining some of the tuples of a database instance, possibly overlapping, and she wishes to answer the query under scenarios, where a scenario is defined by a subset of the hypotheticals that are ‘turned on’. We say that a query admits compact provisioning if given any database instance and any k hypotheticals, one can create a poly-size (in k) sketch that can then be used to answer the query under any of the 2^{k} possible scenarios without accessing the original instance. In this paper, we focus on provisioning complex queries that combine relational algebra (the logical component), grouping, and statistics/analytics (the numerical component). We first show that queries that compute quantiles or linear regression (as well as simpler queries that compute count and sum/average of positive values) can be compactly provisioned to provide (multiplicative) approximate answers to an arbitrary precision. In contrast, exact provisioning for each of these statistics requires the sketch size to be exponential in k. We then establish that for any complex query whose logical component is a positive relational algebra query, as long as the numerical component can be compactly provisioned, the complex query itself can be compactly provisioned. On the other hand, introducing negation or recursion in the logical component again requires the sketch size to be exponential in k. While our positive results use algorithms that do not access the original instance after a scenario is known, we prove our lower bounds even for the case when, knowing the scenario, limited access to the instance is allowed.


Graphs of Large Girth and Surfaces of Large Systole

Correlation between graphs with an application to brain networks analysis

Limits of discrete distributions and Gibbs measures on random graphs

Information-Theoretic Bounded Rationality

Linear lambda terms as invariants of rooted trivalent maps

On Lagrangian Relaxation and Reoptimization Problems

Energy transport in the Anderson insulator

Towards optimal Takacs–Fiksel estimation

Spectral Analysis of Quasi-Cyclic Product Codes

A Tandem Fluid Network with Lévy Input in Heavy Traffic

On the Decision Tree Complexity of $k$-SUM

Constructions and Bounds for Mixed-Dimension Subspace Codes

Deep Learning for Surface Material Classification Using Haptic And Visual Information

Enumerating matroids of fixed rank

Fixed-Parameter Algorithms for Rectilinear Steiner tree and Rectilinear Traveling Salesman Problem in the plane

The 2015 Sheffield System for Transcription of Multi-Genre Broadcast Media

A variant of Strassen’s theorem: Existence of martingales within a prescribed distance

On the Way to Future’s High Energy Particle Physics Transport Code

Constrained Sampling and Counting: Universal Hashing Meets SAT Solving

A numerical scheme for space-time fractional advection-dispersion equation

Banded operational matrices for Bernstein polynomials and application to the fractional advection-dispersion equation

Growth Estimates in Positive Characteristic via Collisions

Ornstein-Uhlenbeck diffusion of hermitian and non-hermitian matrices – unexpected links

Metastability of finite state Markov chains: a recursive procedure to identify slow variables for model reduction

Towards Establishing Monotonic Searchability in Self-Stabilizing Data Structures (full version)

Inter-class orthogonal main effect plans for asymmetrical experiments

Asymptotic pricing in large financial markets

Approximations for solutions of Lévy-type stochastic differential equations

Efficient numerical methods for the random-field Ising model: Finite-size scaling, reweighting extrapolation, and computation of response functions

Holonomic gradient method for the probability content of a simplex region with a multivariate normal distribution

Protocol-dependent shear modulus of amorphous solids

Path computation in multi-layer multi-domain networks: A language theoretic approach

Paths vs. stars in the local profile of trees

Bayesian trend filtering: adaptive temporal smoothing with shrinkage priors

Inexact Krylov Subspace Algorithms for Large Matrix Exponential Eigenproblem from Dimensionality Reduction

Remote Health Coaching System and Human Motion Data Analysis for Physical Therapy with Microsoft Kinect

From Discrepancy to Majority

Information Flows? A Critique of Transfer Entropies

Investigating The Impact Of Network Effects On Content Generation: Evidence From A Large Online Student Network

Classification of weighted networks through mesoscale homological features

On the cromatic number of infinitesimal plane layer

Treewidth of grid subsets

Graphs of Edge-Intersecting and Non-Splitting One Bend Paths in a Grid

Nonexistence of embeddings with uniformly bounded distortions of Laakso graphs into diamond graphs

Distortion of embeddings of binary trees into diamond graphs

Serifos: Workload Consolidation and Load Balancing for SSD Based Cloud Storage Systems

Towards Integrated Glance To Restructuring in Combinatorial Optimization

Content-based Multi-path Routing in Structured Cyclic Overlays

Power propagation time and lower bounds for power domination number

On the Widom-Rowlinson Occupancy Fraction in Regular Graphs

Coloring graphs with two odd cycle lengths

Building and Searching a Balanced, Distributed k-d Tree with MapReduce

Revisiting Differentially Private Regression: Lessons From Learning Theory and their Consequences

Optimizing Spread of Influence in Weighted Social Networks via Partial Incentives

A path Turan problem for infinite graphs

Generalized couplings and convergence of transition probabilities

A new formula for the generating function of the numbers of simple graphs

A Polynomial-Time Approximation Scheme for The Airplane Refueling Problem

Lower Bounds for the Domination Numbers of Connected Graphs without Short Cycles

On cubic stochastic operators and processes

A Stern-type congruence for the Schroder numbers

Random walk on unipotent matrix groups

Predicting the Sentiment Polarity and Rating of Yelp Reviews

On the finite Sample Properties of Regularized M-estimators

Chinese Postman Problem on Edge-Colored Multigraphs

A Marked Cox Model for IBNR Claims: Model and Theory

Finding a low-dimensional piece of a set of integers

Online Matroid Intersection: Beating Half for Random Arrival

Monodromy and $K$-theory of Schubert Curves via Generalized Jeu de Taquin

The Limitations of Optimization from Samples

Weak Dirichlet processes with jumps

Special weak Dirichlet processes and BSDEs driven by a random measure

Using machine learning for medium frequency derivative portfolio trading

A Necessary Condition for Byzantine $k$-Set Agreement

A new robust adaptive algorithm for underwater acoustic channel equalization

meta4diag: Bayesian Bivariate Meta-analysis of Diagnostic Test Studies for Routine Practice

Bayesian bivariate meta-analysis of diagnostic test studies with interpretable priors

Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines

A new proof of Seymour’s 6-flow theorem

An integral inequality for the invariant measure of some finite dimensional stochastic differential equation

General Parity Result and Cycle-plus-Triangles Graphs

A Note on Bipartite Subgraphs and Triangle-independent Sets

Ulrich Schur bundles on flag varieties

Two perspectives of the unit area quantum sphere and their equivalence

The average velocity of self-propelled particles in a two-dimensional potential with colored noise

Discriminative Subnetworks with Regularized Spectral Learning for Global-state Network Data

Regularized Estimation of Piecewise Constant Gaussian Graphical Models: The Group-Fused Graphical Lasso

Design Principles for Scaling Multi-core OLTP Under High Contention

A central limit theorem for the spatial Lambda Fleming-Viot process with selection

Discerning Non-Stationary Market Microstructure Noise and Time-Varying Liquidity in High Frequency Data

Likelihood-based tests on linear hypotheses of large dimensional mean vectors with unequal covariance matrices

Long paths in first passage percolation on the complete graph I. Local PWIT dynamics

Long paths in first passage percolation on the complete graph II. Global branching dynamics

The Role of Race, Ethnicity, and Gender in the Congressional Cosponsorship Network

Phase-locked Patterns of the Kuramoto Model on 3-Regular Graphs