Spectral Learning for Supervised Topic Models

Supervised topic models simultaneously model the latent topic structure of large collections of documents and a response variable associated with each document. Existing inference methods are based on variational approximation or Monte Carlo sampling, which often suffers from the local minimum defect. Spectral methods have been applied to learn unsupervised topic models, such as latent Dirichlet allocation (LDA), with provable guarantees. This paper investigates the possibility of applying spectral methods to recover the parameters of supervised LDA (sLDA). We first present a two-stage spectral method, which recovers the parameters of LDA followed by a power update method to recover the regression model parameters. Then, we further present a single-phase spectral algorithm to jointly recover the topic distribution matrix as well as the regression weights. Our spectral algorithms are provably correct and computationally efficient. We prove a sample complexity bound for each algorithm and subsequently derive a sufficient condition for the identifiability of sLDA. Thorough experiments on synthetic and real-world datasets verify the theory and demonstrate the practical effectiveness of the spectral algorithms. In fact, our results on a large-scale review rating dataset demonstrate that our single-phase spectral algorithm alone gets comparable or even better performance than state-of-the-art methods, while previous work on spectral methods has rarely reported such promising performance.


Scaling up Dynamic Topic Models

Dynamic topic models (DTMs) are very effective in discovering topics and capturing their evolution trends in time series data. To do posterior inference of DTMs, existing methods are all batch algorithms that scan the full dataset before each update of the model and make inexact variational approximations with mean-field assumptions. Due to a lack of a more scalable inference algorithm, despite the usefulness, DTMs have not captured large topic dynamics. This paper fills this research void, and presents a fast and parallelizable inference algorithm using Gibbs Sampling with Stochastic Gradient Langevin Dynamics that does not make any unwarranted assumptions. We also present a Metropolis-Hastings based O(1) sampler for topic assignments for each word token. In a distributed environment, our algorithm requires very little communication between workers during sampling (almost embarrassingly parallel) and scales up to large-scale applications. We are able to learn the largest Dynamic Topic Model to our knowledge, and learned the dynamics of 1,000 topics from 2.6 million documents in less than half an hour, and our empirical results show that our algorithm is not only orders of magnitude faster than the baselines but also achieves lower perplexity.


Contextual LSTM (CLSTM) models for Large scale NLP tasks

Documents exhibit sequential structure at multiple levels of abstraction (e.g., sentences, paragraphs, sections). These abstractions constitute a natural hierarchy for representing the context in which to infer the meaning of words and larger fragments of text. In this paper, we present CLSTM (Contextual LSTM), an extension of the recurrent neural network LSTM (Long-Short Term Memory) model, where we incorporate contextual features (e.g., topics) into the model. We evaluate CLSTM on three specific NLP tasks: word prediction, next sentence selection, and sentence topic prediction. Results from experiments run on two corpora, English documents in Wikipedia and a subset of articles from a recent snapshot of English Google News, indicate that using both words and topics as features improves performance of the CLSTM models over baseline LSTM models for these tasks. For example on the next sentence selection task, we get relative accuracy improvements of 21% for the Wikipedia dataset and 18% for the Google News dataset. This clearly demonstrates the significant benefit of using context appropriately in natural language (NL) tasks. This has implications for a wide variety of NL applications like question answering, sentence completion, paraphrase generation, and next utterance prediction in dialog systems.


Ramsey numbers for bipartite graphs with small bandwidth

Almost all primes have a multiple of small Hamming weight

Revise Saturated Activation Functions

A Poisson process model for Monte Carlo

A Nonparametric Framework for Quantifying Generative Inference on Neuromorphic Systems

Symmetry in the Green’s function for birth-death chains

Effective resistances in finite and infinite grids via Foster’s formulas

Achieving both High Energy Efficiency and High Performance in On-Chip Communication using Hierarchical Rings with Deflection Routing

Sequence-to-Sequence RNNs for Text Summarization

Generalized Gaussian Mechanism for Differential Privacy

Sampling latent states for high-dimensional non-linear state space models with the embedded HMM method

Structured Sparse Regression via Greedy Hard-Thresholding

An accelerated exponential time integrator for semi-linear stochastic strongly damped wave equation with additive noise

Strong Backdoors for Default Logic

First-order Methods for Geodesically Convex Optimization

Uniresolution representations of white-matter data from CoCoMac

Binarizations in Random Number Generation

On Training Bi-directional Neural Network Language Model with Noise Contrastive Estimation

Aging in Metropolis dynamics of the REM: a proof

Self-organisation in cellular automata with coalescent particles: Qualitative and quantitative approaches

Density analysis of non-Markovian BSDEs and applications to biology and finance

Quantifying noisy attractors: from heteroclinic to excitable networks

Ordonnancement d’entités pour la rencontre du web des documents et du web des données

Gaussian polytopes: a cumulant-based approach

Sublinear Random Access Generators for Preferential Attachment Graphs

Competitive Path Computation and Function Placement in SDNs

A Constant Approximation Algorithm for Scheduling Packets on Line Networks

Node-By-Node Greedy Deep Learning for Interpretable Features

The Register Function and Reductions of Binary Trees and Lattice Paths

Nonparametric Functional Calibration of Computer Models

Mean and variance of balanced Pólya urns

Every positive integer is a sum of three palindromes

Syncronization and functional central limit theorems for interacting reinforced random walks

Efficient computation of Sobol’ indices for stochastic models

GAP Safe Screening Rules for Sparse-Group-Lasso

Sparse Signal Detection with Compressive Measurements via Partial Support Set Estimation

A Mutual Contamination Analysis of Mixed Membership and Partial Label Models

Optimal synchronization of directed complex networks

Classical solution to a multidimensional stochastic Burgers equation

Interlacing families and the Hermitian spectral norm of digraphs

Semi-parametric Order-based Generalized Multivariate Regression

Sorting With Forbidden Intermediates

Learning to SMILE(S)

Random walk in the low disorder ballistic regime