Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing

Existing methods for retrieving k-nearest neighbours suffer from the curse of dimensionality. We argue this is caused in part by inherent deficiencies of space partitioning, which is the underlying strategy used by almost all existing methods. We devise a new strategy that avoids partitioning the vector space and present a novel randomized algorithm that runs in time linear in dimensionality and sub-linear in the size of the dataset and takes space constant in dimensionality and linear in the size of the dataset. The proposed algorithm allows fine-grained control over accuracy and speed on a per-query basis, automatically adapts to variations in dataset density, supports dynamic updates to the dataset and is easy-to-implement. We show appealing theoretical properties and demonstrate empirically that the proposed algorithm outperforms locality-sensitivity hashing (LSH) in terms of approximation quality and speed.


Taxonomy grounded aggregation of classifiers with different label sets

We describe the problem of aggregating the label predictions of diverse classifiers using a class taxonomy. Such a taxonomy may not have been available or referenced when the individual classifiers were designed and trained, yet mapping the output labels into the taxonomy is desirable to integrate the effort spent in training the constituent classifiers. A hierarchical taxonomy representing some domain knowledge may be different from, but partially mappable to, the label sets of the individual classifiers. We present a heuristic approach and a principled graphical model to aggregate the label predictions by grounding them into the available taxonomy. Our model aggregates the labels using the taxonomy structure as constraints to find the most likely hierarchically consistent class. We experimentally validate our proposed method on image and text classification tasks.


The Multiple Filter Test for Change Point Detection in Point Processes with Weakly Dependent Life Times

Empirical point processes such as neuronal spike trains can show changes in the rate at multiple time scales. In order to test the null hypothesis of constant rate and to estimate the change points, a multiple filter test (MFT) has been proposed by Messer et al. (2014) that uses filtered derivative processes and that can be applied under the assumption of independent life times. Here we extend the MFT to point processes with ergodic life times associated with short range dependencies. For practical application, we derive local parameter estimators under m-dependence. Our simulations illustrate the necessity of incorporating correlation estimators within the MFT as well as good performance when local parameter estimators are used. We also discuss principles of estimating m in practice, suggesting that it can be preferable to neglect small serial correlations for the sake of variability reduction when using finite analysis windows in practice. Finally, we estimate m and apply the extended MFT to an empirical data set of spike trains that show serial correlations of low order.


LSTM Neural Reordering Feature for Statistical Machine Translation

Artificial neural networks are powerful models, which have been widely applied into many aspects of machine translation, such as language modeling and translation modeling. Though notable improvements have been made in these areas, the reordering problem still remains a challenge in statistical machine translations. In this paper, we present a novel neural reordering model that directly models word pairs and alignment. By utilizing LSTM recurrent neural networks, much longer context could be learned for reordering prediction. Experimental results on NIST OpenMT12 Arabic-English and Chinese-English 1000-best rescoring task show that our LSTM neural reordering feature is robust and achieves significant improvements over various baseline systems.


Inferring Interpersonal Relations in Narrative Summaries

Characterizing relationships between people is fundamental for the understanding of narratives. In this work, we address the problem of inferring the polarity of relationships between people in narrative summaries. We formulate the problem as a joint structured prediction for each narrative, and present a model that combines evidence from linguistic and semantic features, as well as features based on the structure of the social community in the text. We also provide a clustering-based approach that can exploit regularities in narrative types. e.g., learn an affinity for love-triangles in romantic stories. On a dataset of movie summaries from Wikipedia, our structured models provide more than a 30% error-reduction over a competitive baseline that considers pairs of characters in isolation.


Infinitely Many Carmichael Numbers for a Modified Miller-Rabin Prime Test

Decomposing almost complete graphs by random trees

Kernel estimation of the tail index of a right-truncated Pareto-type distribution

A new interpretation of Catalan numbers

A New Approach for Scalable Analysis of Microbial Communities

Distribution of Two-Sample Tests Based on Geometric Graphs

Optimal quantizers for probability distributions on nonhomogeneous Cantor sets

An In-place Framework for Exact and Approximate Shortest Unique Substring Queries

Robust mixture regression based on the skew t distribution

Free energy in the Potts spin glass

Resonance in orbits of plane partitions and increasing tableaux

Point distributions in compact metric spaces

T-partition systems and travel groupoids on a graph

Following the evolution of glassy states under external perturbations: the full replica symmetry breaking solution

Gaussian and Robust Kronecker Product Covariance Estimation: Existence and Uniqueness

Fractals for Kernelization Lower Bounds, With an Application to Length-Bounded Cut Problems

The Asymptotic Distribution of Symbols on Diagonals of Random Weighted Staircase Tableaux

Highly Scalable Tensor Factorization for Prediction of Drug-Protein Interaction Type

A Multi-Agent Framework for Testing Distributed Systems

A Hybrid Intelligent Model for Software Cost Estimation

Transport Maps for $β$-Matrix Models in the Multi-Cut Regime

Visibility graph motifs

Emergence of integer quantum Hall effect from chaos

Spiral Structures in the Rotor-Router Walk

Symmetric colorings of polypolyhedra

Multi-class oscillating systems of interacting neurons

Evaluating Morphological Computation in Muscle and DC-motor Driven Models of Human Hopping

Extended Conditional Independence and Applications in Causal Inference

Towards Dropout Training for Convolutional Neural Networks

On Color Preserving Automorphisms of Cayley Graphs of Odd Square-free Order

MOCICE-BCubed F$_1$: A New Evaluation Measure for Biclustering Algorithms

Minimax theory for a class of non-linear statistical inverse problems

Equivalence Classes of Staged Trees

Divide and conquer in ABC: Expectation-Progagation algorithms for likelihood-free inference

Efficient filtering of adult content using textual information

A new view of the nonlinear dielectric response of glass formers

On the number of Steiner triple systems S(2^m-1,3,2) of rank 2^m – m + 2 over GF(2)

Percolation properties in a traffic model

The distance-dependent two-point function of quadrangulations: a new derivation by direct recursion

Augmenting Phrase Table by Employing Lexicons for Pivot-based SMT

Consistency in Non-Transactional Distributed Storage Systems

Learning Using 1-Local Membership Queries

Online Budgeted Repeated Matching

Optimal Estimation and Completion of Matrices with Biclustering Structures

A Universal Closed Form For Square Matrix Powers

Vanilla Lasso for sparse classification under single index models

Pretzel Knots and q-Series

Age-aggregation bias in mortality trends

GrantMed: a new, international system for tracking grants and funding trends in the life sciences

Exact bounds on the inverse Mills ratio and its derivatives

Mean field limit for bias voter model on regular trees

Unlabeled Sensing with Random Linear Measurements

Floquet Weyl Phases in a Three Dimensional Network Model

Multilingual Language Processing From Bytes

Dynamic Parallel and Distributed Graph Cuts

New Conjectures for Union-Closed Families

A new shellability proof of an identity of Dixon

Decoding Hidden Markov Models Faster Than Viterbi Via Online Matrix-Vector (max, +)-Multiplication

Cloud Computing Avoids Downfall of Application Service Providers

Evolution arrests invasions of cooperative populations

Critical behaviour of branching diffusions in bounded domains