A Topological Approach to Meta-heuristics: Analytical Results on the BFS vs. DFS Algorithm Selection Problem

Search is a central problem in artificial intelligence, and BFS and DFS the two most fundamental ways to search. In this report we derive results for average BFS and DFS runtime: For tree search, we employ a probabilistic model of goal distribution; for graph search, the analysis depends on an additional statistic of path redundancy and average branching factor. As an application, we use the results on two concrete grammar problems. The runtime estimates can be used to select the faster out of BFS and DFS for a given problem, and may form the basis for further analysis of more advanced search methods. Finally, we verify our results experimentally; the analytical approximations come surprisingly close to empirical reality.

Finite Dictionary Variants of the Diffusion KLMS Algorithm

The diffusion based distributed learning approaches have been found to be a viable solution for learning over linearly separable datasets over a network. However, approaches till date are suitable for linearly separable datasets and need to be extended to scenarios in which we need to learn a non-linearity. In such scenarios, the recently proposed diffusion kernel least mean squares (KLMS) has been found to be performing better than diffusion least mean squares (LMS). The drawback of diffusion KLMS is that it requires infinite storage for observations (also called dictionary). This paper formulates the diffusion KLMS in a fixed budget setting such that the storage requirement is curtailed while maintaining appreciable performance in terms of convergence. Simulations have been carried out to validate the two newly proposed algorithms named as quantised diffusion KLMS (QDKLMS) and fixed budget diffusion KLMS (FBDKLMS) against KLMS, which indicate that both the proposed algorithms deliver better performance as compared to the KLMS while reducing the dictionary size storage requirement.

H-NND: A New Member of the In-Tree (IT) Clustering Family

Previously in 2014, we proposed the Nearest Descent (ND) method, capable of generating an efficient Graph, called the in-tree (IT) structure, This IT structure has some beautiful and effective advantages, which makes it well suited for data clustering. Subsequently, in order to avoid the seemingly redundant edges in the IT structure resulted from ND, we proposed another method, called the Nearest Neighbor Descent (NND), by adding a Neighborhood Graph constraint on ND. Although the undesired edges between clusters no longer appear, NND proves still not perfect. Because NND brings with it a new yet worse problem, the over-partitioning problem. Now, in this paper, we proposed a method, called the Hierarchical Nearest Neighbor Descent (H-NND), which overcomes the over-partitioning problem that NND faces via using the hierarchical strategy. Specifically, H-NND uses ND to effectively merge the over-segmented sub-graphs or clusters that NND produces. Like ND, H-NND also generates an IT structure. This seemingly comes back to the situation that ND faces. However, the redundant edges in the IT structures generated by H-NND turn out to be generally more salient than that by ND, and thus it becomes much easier and more reliable to identify the redundant edges even simply via taking the edge length as the only measure. We have proven the power of H-NND on several clustering datasets of varying shapes, dimensions and attributes.

Practical and Optimal LSH for Angular Distance

We show the existence of a Locality-Sensitive Hashing (LSH) family for the angular distance that yields an approximate Near Neighbor Search algorithm with the asymptotically optimal running time exponent. Unlike earlier algorithms with this property (e.g., Spherical LSH [Andoni, Indyk, Nguyen, Razenshteyn 2014], [Andoni, Razenshteyn 2015]), our algorithm is also practical, improving upon the well-studied hyperplane LSH [Charikar, 2002] in practice. We also introduce a multiprobe version of this algorithm, and conduct experimental evaluation on real and synthetic data sets. We complement the above positive results with a fine-grained lower bound for the quality of any LSH family for angular distance. Our lower bound implies that the above LSH family exhibits a trade-off between evaluation time and quality that is close to optimal for a natural class of LSH functions.

Statistical Inference, Learning and Models in Big Data

Big data provides big opportunities for statistical inference, but perhaps even bigger challenges, often related to differences in volume, variety, velocity, and veracity of information when compared to smaller carefully collected datasets. From January to June, 2015, the Canadian Institute of Statistical Sciences organized a thematic program on Statistical Inference, Learning and Models in Big Data. This paper arose from presentations and discussions that took place during the thematic program.

A simple two-sample Bayesian t-test for hypothesis testing

Absorbing random-walk centrality: Theory and algorithms

An explicit representation of the transition densities of the skew Brownian motion with drift and two semipermeable barriers

An information criterion for model selection with missing data via complete-data divergence

Annealed central limit theorems for the Ising model on random graphs

Approximating the Smallest Spanning Subgraph for 2-Edge-Connectivity in Directed Graphs

Arrangements Of Minors In The Positive Grassmannian And a Triangulation of The Hypersimplex

Asymptotically Optimal Multi-Armed Bandit Policies under a Cost Constraint

Asynchronous Distributed ADMM for Large-Scale Optimization- Part I: Algorithm and Convergence Analysis

Asynchronous Distributed ADMM for Large-Scale Optimization- Part II: Linear Convergence Analysis and Numerical Performance

Beta-gamma tail asymptotics

Bivariate Extension of (Dynamic) Cumulative Past Entropy

Commutation and normal ordering for operators on symmetric functions

Conserved Ising Model on the Human Connectome

Containment Problems for Projections of Polyhedra and Spectrahedra

Convex Fused Lasso Denoising with Non-Convex Regularization and its use for Pulse Detection

Dimensionally Exponential Lower Bounds on the $L^p$ Norms of the Spherical Maximal Operator for Cartesian Powers of Finite Trees and Related Graphs

Disorder in Large-N Theories

Estimating the Division Kernel of a Size-Structured Population

Exact minimum codegree threshold for $K^- _4$-factors

Fast Second-Order Stochastic Backpropagation for Variational Inference

Fractional Zero Forcing via Three-color Forcing Games

Identification and Doubly Robust Estimation of Data Missing Not at Random With an Ancillary Variable

Kostka polynomials from nilpotent cones and Springer fiber cohomology

Modeling and visualizing networked multi-core embedded software energy consumption

Nested Recurrence Relations With Conolly-Like Solutions

On Misspecifications in Regularity and Properties of Estimators

On Parameter Estimation for Cusp-type Signals

On Parameter Estimation of Hidden Telegraph Process

On the energy efficiency of client-centric data consistency management under random read/write access to Big Data with Apache HBase

On the low dimensional dynamics of structured random networks

Performance Analysis of Incremental LMS over Flat Fading Channels

Robust regression estimation and inference in the presence of cellwise and casewise contamination

Sélection de variables par le GLM-Lasso pour la prédiction du risque palustre

Skorokhod’s M1 topology for distribution-valued processes

Spatial Boundaries and the Local Context of Residential Segregation

The Prism tableau model for Schubert polynomials

The Steiner 3-diameter of a graph

Transfer learning approach for financial applications

Triangulating stable laminations

Utility Maximisation for Exponential Levy Models with option and information processes