• Relative Density and Exact Recovery in Heterogeneous Stochastic Block Models
• The non-equilibrium allele frequency spectrum in a Poisson random field framework
• Declarative, Secure, Convergent Edge Computation
• On the Total Number of Bends for Planar Octilinear Drawings
• On Brownian motion, simple paths, and loops
• Expressing the Indefinite Integral of the Standard Normal Probability Density Function
• Energy-Efficient Classification for Anomaly Detection: The Wireless Channel as a Helper
• On the Planar Split Thickness of Graphs
• Stability with respect to initial conditions in V-norm for nonlinear filters with ergodic observations
• Tight Bounds for Distributed Minimum-Weight Spanning Tree Verification
• Coupling stochastic EM and Approximate Bayesian Computation for parameter inference in state-space models
• Exending pseudo-arcs in odd characteristic
• Constructing minimal blocking sets using field reduction
• A linear set view on KM-arcs
• A Bayesian approach to the g-formula
• Causal and anti-causal learning in pattern recognition for neuroimaging
• From One Point to A Manifold: Orbit Models for Knowledge Graph Embedding
• Mixed eigenvalues of p-Laplacian on trees
• On the Hurwitz action in finite Coxeter groups
• Predator-prey model for the self-organisation of stochastic oscillators in dual populations
• Learning optimal nonlinearities for iterative thresholding algorithms
• A leader-election procedure using records
• Cubic Graphs with Total Domatic Number at Least Two
• Convex programming approach to robust estimation of a multivariate Gaussian model
• Entropy Chaos and Bose-Einstein Condensation
• Visible lattice points in random walks
• Comment on ‘Phase transition for quenched coupled replicas in a plaquette spin model of glasses”
• Probabilistic Analysis of the Dual Next-Fit Algorithm for Bin Covering
• Edgeworth expansion for the pre-averaging estimator
• Heath-Jarrow-Morton-Musiela equation with Lévy perturbation
• Local entanglement structure across a many-body localization transition
• Joint Image-Text News Topic Detection and Tracking with And-Or Graph Representation
• n-type Markov Branching Processes with Immigration
• Weak Local Rules for Planar Octagonal Tilings
• An Improved Global Risk Bound in Concave Regression
• Hyper-Heuristic Algorithm for Finding Efficient Features in Diagnose of Lung Cancer Disease
• Agreement-based Joint Training for Bidirectional Attention-based Neural Machine Translation
• Admissible colourings of 3-manifold triangulations for Turaev-Viro type invariants
• Graphical Exchange Mechanisms
• Noise-Compensated, Bias-Corrected Diffusion Weighted Endorectal Magnetic Resonance Imaging via a Stochastically Fully-Connected Joint Conditional Random Field Model
• Reducing Parallel Communication in Algebraic Multigrid through Sparsification
• Positivity of affine charge
• On an Edge Precoloring Conjecture
• Operators on compositions and generalized skew Pieri rules
• Inverse subspace iteration for spectral stochastic finite element methods
• KKM type theorems with boundary conditions
• On the stability of a class of non-monotonic systems of parallel queues
• Inference on the mode of weak directional signals: a Le Cam perspective on hypothesis testing near singularities
• Partial regularity of viscosity solutions for a class of Kolmogorov equations arising from mathematical finance
• $C_0$-sequentially equicontinuous semigroups on locally convex spaces
• A Quantum Theory of the Glass Transition Suggests Universality Amongst Glass Formers
• Relaxed Linearized Algorithms for Faster X-Ray CT Image Reconstruction
• Random SU(2)-symmetric spin chains
Nearly all technology platforms (e.g., web applications, mobile applications, etc.) use randomized controlled trials, or A/B tests, as a means to optimize their product offering. Such tests are generally analyzed using classical frequentist statistical measures: p-values and confidence intervals. These measures serve as a transparent, interpretable interface between the data and the user, allowing valid inference. However, these reported values cease to be valid if users make decisions based on continuous monitoring of their tests. Users try to take advantage of data as fast as it becomes available, but current testing practice prevents them from doing so while maintaining valid statistical inference. Through connections to sequential hypothesis testing, we present analogues of classical frequentist statistical measures that are always valid, regardless of when users choose to look at the test. We discuss how to optimally choose such a sequential test. We also discuss applications to bandits, and extensions to multiple hypothesis testing in the sequential setting. Finally, we discuss implementation and deployment of our approach in a large scale commercial A/B testing platform.
Training neural network language models over large vocabularies is still computationally very costly compared to count-based models such as Kneser-Ney. At the same time, neural language models are gaining popularity for many applications such as speech recognition and machine translation whose success depends on scalability. We present a systematic comparison of strategies to represent and train large vocabularies, including softmax, hierarchical softmax, target sampling, noise contrastive estimation and self normalization. We further extend self normalization to be a proper estimator of likelihood and introduce an efficient variant of softmax. We evaluate each method on three popular benchmarks, examining performance on rare words, the speed/accuracy trade-off and complementarity to Kneser-Ney.
This paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of local policy consistency. We show that this local consistency leads to an increase in the action gap at each state; increasing this gap, we argue, mitigates the undesirable effects of approximation and estimation errors on the induced greedy policies. This operator can also be applied to discretized continuous space and time problems, and we provide empirical results evidencing superior performance in this context. Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator. As corollaries we provide a proof of optimality for Baird’s advantage learning algorithm and derive other gap-increasing operators with interesting properties. We conclude with an empirical study on 60 Atari 2600 games illustrating the strong potential of these new operators.
In distributed machine learning, data is dispatched to multiple machines for processing. Motivated by the fact that similar data points are often belonging to the same or similar classes, and more generally, classification rules of high accuracy tend to be ‘locally simple but globally complex’, we propose data dependent dispatching that takes advantage of such structures. Our main technical contribution is to provide algorithms with provable guarantees for data-dependent dispatching, that partition the data in a way that satisfies important conditions for accurate distributed learning, including fault tolerance and balancedness. We show the effectiveness of our method over the widely used random partitioning scheme in several real world image and advertising datasets.
Domain adaptation is the supervised learning setting in which the training and test data originate from different domains: the so-called source and target domains. In this paper, we propose and study a domain adaption approach, called feature-level domain adaptation (flda), that models the dependence between two domains by means of a feature-level transfer distribution. The domain adapted classifier is trained by minimizing the expected loss under this transfer distribution. Our empirical evaluation of flda focuses on problems with binary and count features in which the domain adaptation can be naturally modeled via a dropout distribution, which allows the final classifier to adapt to the importance of specific features in the target data. Our experimental evaluation suggests that under certain conditions, flda converges to the classifier trained on the target distribution. Experiments with our domain adaptation approach on several real-world problems show that flda performs on par with state-of-the-art techniques in domain adaptation.
In this note we introduce linear regression with basis functions in order to apply Bayesian model selection. The goal is to incorporate Occam’s razor as provided by Bayes analysis in order to automatically pick the model optimally able to explain the data without overfitting.
Selecting between competing statistical models is a challenging problem especially when the competing models are non-nested. In this paper we offer a simple solution by devising an algorithm which combines MCMC and importance sampling to obtain computationally efficient estimates of the marginal likelihood which can then be used to compare the models. The algorithm is successfully applied to longitudinal epidemic and time series data sets and shown to outperform existing methods for computing the marginal likelihood.
We present new, more efficient algorithms for estimating random walk scores such as Personalized PageRank from a given source node to one or several target nodes. These scores are useful for personalized search and recommendations on networks including social networks, user-item networks, and the web. Past work has proposed using Monte Carlo or using linear algebra to estimate scores from a single source to every target, making them inefficient for a single pair. Our contribution is a new bidirectional algorithm which combines linear algebra and Monte Carlo to achieve significant speed improvements. On a diverse set of six graphs, our algorithm is 70x faster than past state-of-the-art algorithms. We also present theoretical analysis: while past algorithms require time to estimate a random walk score of typical size on an -node graph to a given constant accuracy, our algorithm requires only expected time for an average target, where is the number of edges, and is provably accurate. In addition to our core bidirectional estimator for personalized PageRank, we present an alternative algorithm for undirected graphs, a generalization to arbitrary walk lengths and Markov Chains, an algorithm for personalized search ranking, and an algorithm for sampling random paths from a given source to a given set of targets. We expect our bidirectional methods can be extended in other ways and will be useful subroutines in other graph analysis problems.