Model selection and structure specification in ultra-high dimensional generalised semi-varying coefficient models

In this paper, we study the model selection and structure specification for the generalised semi-varying coefficient models (GSVCMs), where the number of potential covariates is allowed to be larger than the sample size. We first propose a penalised likelihood method with the LASSO penalty function to obtain the preliminary estimates of the functional coefficients. Then, using the quadratic approximation for the local log-likelihood function and the adaptive group LASSO penalty (or the local linear approximation of the group SCAD penalty) with the help of the preliminary estimation of the functional coefficients, we introduce a novel penalised weighted least squares procedure to select the significant covariates and identify the constant coefficients among the coefficients of the selected covariates, which could thus specify the semiparametric modelling structure. The developed model selection and structure specification approach not only inherits many nice statistical properties from the local maximum likelihood estimation and nonconcave penalised likelihood method, but also computationally attractive thanks to the computational algorithm that is proposed to implement our method. Under some mild conditions, we establish the asymptotic properties for the proposed model selection and estimation procedure such as the sparsity and oracle property. We also conduct simulation studies to examine the finite sample performance of the proposed method, and finally apply the method to analyse a real data set, which leads to some interesting findings.


RATM: Recurrent Attentive Tracking Model

This work presents an attention mechanism-based neural network approach for tracking objects in video. A recurrent neural network is trained to predict the position of an object at time t+1 given a series of selective glimpses into video frames at time steps 1 to t. The proposed recurrent attentive tracking model can be trained using simple gradient-based training. Various settings are explored in experiments on artificial data to justify design choices.


WarpLDA: a Simple and Efficient O(1) Algorithm for Latent Dirichlet Allocation

Developing efficient and scalable algorithms for Latent Dirichlet Allocation (LDA) is of wide interest for many applications. Previous work has developed an O(1) Metropolis-Hastings sampling method for each token. However, the performance is far from being optimal due to random accesses to the parameter matrices and frequent cache misses. In this paper, we propose WarpLDA, a novel O(1) sampling algorithm for LDA. WarpLDA is a Metropolis-Hastings based algorithm which is designed to optimize the cache hit rate. Advantages of WarpLDA include 1) Efficiency and scalability: WarpLDA has good locality and carefully designed partition method, and can be scaled to hundreds of machines; 2) Simplicity: WarpLDA does not have any complicated modules such as alias tables, hybrid data structures, or parameter servers, making it easy to understand and implement; 3) Robustness: WarpLDA is consistently faster than other algorithms, under various settings from small-scale to massive-scale dataset and model. WarpLDA is 5-15x faster than state-of-the-art LDA samplers, implying less cost of time and money. With WarpLDA users can learn up to one million topics from hundreds of millions of documents in a few hours, at the speed of 2G tokens per second, or learn topics from small-scale datasets in seconds.


Scheduling under Linear Constraints

We introduce a parallel machine scheduling problem in which the processing times of jobs are not given in advance but are determined by a system of linear constraints. The objective is to minimize the makespan, i.e., the maximum job completion time among all feasible choices. This novel problem is motivated by various real-world application scenarios. We discuss the computational complexity and algorithms for various settings of this problem. In particular, we show that if there is only one machine with an arbitrary number of linear constraints, or there is an arbitrary number of machines with no more than two linear constraints, or both the number of machines and the number of linear constraints are fixed constants, then the problem is polynomial-time solvable via solving a series of linear programming problems. If both the number of machines and the number of constraints are inputs of the problem instance, then the problem is NP-Hard. We further propose several approximation algorithms for the latter case.


On Differentially Private Online Collaborative Recommendation Systems

In collaborative recommendation systems, privacy may be compromised, as users’ opinions are used to generate recommendations for others. In this paper, we consider an online collaborative recommendation system, and we measure users’ privacy in terms of the standard differential privacy. We give the first quantitative analysis of the trade-offs between recommendation quality and users’ privacy in such a system by showing a lower bound on the best achievable privacy for any non-trivial algorithm, and proposing a near-optimal algorithm. From our results, we find that there is actually little trade-off between recommendation quality and privacy for any non-trivial algorithm. Our results also identify the key parameters that determine the best achievable privacy.


The Singular Value Decomposition, Applications and Beyond

The singular value decomposition (SVD) is not only a classical theory in matrix computation and analysis, but also is a powerful tool in machine learning and modern data analysis. In this tutorial we first study the basic notion of SVD and then show the central role of SVD in matrices. Using majorization theory, we consider variational principles of singular values and eigenvalues. Built on SVD and a theory of symmetric gauge functions, we discuss unitarily invariant norms, which are then used to formulate general results for matrix low rank approximation. We study the subdifferentials of unitarily invariant norms. These results would be potentially useful in many machine learning problems such as matrix completion and matrix data classification. Finally, we discuss matrix low rank approximation and its recent developments such as randomized SVD, approximate matrix multiplication, CUR decomposition, and Nystrom approximation. Randomized algorithms are important approaches to large scale SVD as well as fast matrix computations.


Robust Gaussian Graphical Modeling with the Trimmed Graphical Lasso

Gaussian Graphical Models (GGMs) are popular tools for studying network structures. However, many modern applications such as gene network discovery and social interactions analysis often involve high-dimensional noisy data with outliers or heavier tails than the Gaussian distribution. In this paper, we propose the Trimmed Graphical Lasso for robust estimation of sparse GGMs. Our method guards against outliers by an implicit trimming mechanism akin to the popular Least Trimmed Squares method used for linear regression. We provide a rigorous statistical analysis of our estimator in the high-dimensional setting. In contrast, existing approaches for robust sparse GGMs estimation lack statistical guarantees. Our theoretical results are complemented by experiments on simulated and real gene expression data which further demonstrate the value of our approach.


Klout Score: Measuring Influence Across Multiple Social Networks

In this work, we present the Klout Score, an influence scoring system that assigns scores to 750 million users across 9 different social networks on a daily basis. We propose a hierarchical framework for generating an influence score for each user, by incorporating information for the user from multiple networks and communities. Over 3600 features that capture signals of influential interactions are aggregated across multiple dimensions for each user. The features are scalably generated by processing over 45 billion interactions from social networks every day, as well as by incorporating factors that indicate real world influence. Supervised models trained from labeled data determine the weights for features, and the final Klout Score is obtained by hierarchically combining communities and networks. We validate the correctness of the score by showing that users with higher scores are able to spread information more effectively in a network. Finally, we use several comparisons to other ranking systems to show that highly influential and recognizable users across different domains have high Klout scores.


A Certain Tendency Of The Database Community

We posit that striving for distributed systems that provide ‘single system image’ semantics is fundamentally flawed and at odds with how systems operate in the physical world. We realize the database as an optimization of this system: a required, essential optimization in practice that facilitates central data placement and ease of access to participants in a system. We motivate a new model of computation that is designed to address the problems of computation over ‘eventually consistent’ information in a large-scale distributed system.


First Order Probabilities For Galton-Watson Trees

Isoperimetric profiles and random walks on some permutation wreath products

Spiking Deep Networks with LIF Neurons

Rigidity hierarchy in random point fields: random polynomials and determinantal processes

Spherical 2-designs and lattices from Abelian groups

Fast Out-of-Sample Predictions for Bayesian Hierarchical Models of Latent Health States

Convexity in Tree Spaces

It-Situ Data Analysis of Protein Folding Trajectories

Node Expansions and Cuts in Gromov-hyperbolic Graphs

Counting covering cycles

Composite operators in Cubic Field Theories and Link Overlap Fluctuations in Spin-Glass Models

Representation for the Gauss-Laplace Transmutation

Duality in the Category of Andersen-Jantzen-Soergel

Bounds on the Exponential Domination Number

Subsequence Automata with Default Transitions

Bulk and Boundary Invariants for Complex Topological Insulators: From K-Theory to Physics

Fourier uniformity on subspaces

Monochromatic sums and products

Taylor schemes for rough differential equations and fractional diffusions

Scaling limits of random normal matrix processes at singular boundary points

Estimating the smoothness of a Gaussian random field from irregularly spaced data via higher-order quadratic variations

Bridging centrality and extremity: Refining empirical data depth using extreme value statistics

Covariance-Controlled Adaptive Langevin Thermostat for Large-Scale Bayesian Sampling

Absolute moments and Fourier-based probability metrics

Twisted Brauer monoids

Permutations sortable by two stacks in series

Optimal experimental designs for fMRI via circulant biased weighing designs

Height functions for amenable groups

The ‘Most informative boolean function’ conjecture holds for high noise

Nonconvex Penalization in Sparse Estimation: An Approach Based on the Bernstein Function

Existence, uniqueness and approximation for $L^p$ solutions of reflected BSDEs under weaker assumptions

Self-Organization and Heating by Inward Diffusion in Magnetospheric Plasmas

Feature-Based Diversity Optimization for Problem Instance Classification

Attention with Intention for a Neural Network Conversation Model

Enumerative geometry of elliptic curves on toric surfaces

A Smooth and Locally Sparse Estimator for Functional Linear Regression via Functional SCAD Penalty

Network structure at multiple scales via a new network statistic: the onion decomposition

There is Individualized Treatment. Why Not Individualized Inference?

The Differential Geometry of Homogeneity Spaces Across Effect Scales

Automatic Synthesis of Geometry Problems for an Intelligent Tutoring System

Learning $\ell^{0}$-Graph for Data Clustering

Lower bound of the asymptotic complexity of self-similar fractal graphs

An Evaluation of Sparse Inverse Covariance Models for Group Functional Connectivity in Molecular Imaging

Cyclic sieving and rational Catalan theory

Topologies of nodal sets of random band limited functions

Classification of edge-transitive propeller graphs

A note on the Ramsey number of even wheels versus stars

Emoticons vs. Emojis on Twitter: A Causal Inference Approach