Search is a central problem in artificial intelligence, and BFS and DFS the two most fundamental ways to search. In this report we derive results for average BFS and DFS runtime: For tree search, we employ a probabilistic model of goal distribution; for graph search, the analysis depends on an additional statistic of path redundancy and average branching factor. As an application, we use the results on two concrete grammar problems. The runtime estimates can be used to select the faster out of BFS and DFS for a given problem, and may form the basis for further analysis of more advanced search methods. Finally, we verify our results experimentally; the analytical approximations come surprisingly close to empirical reality.
The diffusion based distributed learning approaches have been found to be a viable solution for learning over linearly separable datasets over a network. However, approaches till date are suitable for linearly separable datasets and need to be extended to scenarios in which we need to learn a non-linearity. In such scenarios, the recently proposed diffusion kernel least mean squares (KLMS) has been found to be performing better than diffusion least mean squares (LMS). The drawback of diffusion KLMS is that it requires infinite storage for observations (also called dictionary). This paper formulates the diffusion KLMS in a fixed budget setting such that the storage requirement is curtailed while maintaining appreciable performance in terms of convergence. Simulations have been carried out to validate the two newly proposed algorithms named as quantised diffusion KLMS (QDKLMS) and fixed budget diffusion KLMS (FBDKLMS) against KLMS, which indicate that both the proposed algorithms deliver better performance as compared to the KLMS while reducing the dictionary size storage requirement.
Previously in 2014, we proposed the Nearest Descent (ND) method, capable of generating an efficient Graph, called the in-tree (IT) structure, This IT structure has some beautiful and effective advantages, which makes it well suited for data clustering. Subsequently, in order to avoid the seemingly redundant edges in the IT structure resulted from ND, we proposed another method, called the Nearest Neighbor Descent (NND), by adding a Neighborhood Graph constraint on ND. Although the undesired edges between clusters no longer appear, NND proves still not perfect. Because NND brings with it a new yet worse problem, the over-partitioning problem. Now, in this paper, we proposed a method, called the Hierarchical Nearest Neighbor Descent (H-NND), which overcomes the over-partitioning problem that NND faces via using the hierarchical strategy. Specifically, H-NND uses ND to effectively merge the over-segmented sub-graphs or clusters that NND produces. Like ND, H-NND also generates an IT structure. This seemingly comes back to the situation that ND faces. However, the redundant edges in the IT structures generated by H-NND turn out to be generally more salient than that by ND, and thus it becomes much easier and more reliable to identify the redundant edges even simply via taking the edge length as the only measure. We have proven the power of H-NND on several clustering datasets of varying shapes, dimensions and attributes.
We show the existence of a Locality-Sensitive Hashing (LSH) family for the angular distance that yields an approximate Near Neighbor Search algorithm with the asymptotically optimal running time exponent. Unlike earlier algorithms with this property (e.g., Spherical LSH [Andoni, Indyk, Nguyen, Razenshteyn 2014], [Andoni, Razenshteyn 2015]), our algorithm is also practical, improving upon the well-studied hyperplane LSH [Charikar, 2002] in practice. We also introduce a multiprobe version of this algorithm, and conduct experimental evaluation on real and synthetic data sets. We complement the above positive results with a fine-grained lower bound for the quality of any LSH family for angular distance. Our lower bound implies that the above LSH family exhibits a trade-off between evaluation time and quality that is close to optimal for a natural class of LSH functions.
Big data provides big opportunities for statistical inference, but perhaps even bigger challenges, often related to differences in volume, variety, velocity, and veracity of information when compared to smaller carefully collected datasets. From January to June, 2015, the Canadian Institute of Statistical Sciences organized a thematic program on Statistical Inference, Learning and Models in Big Data. This paper arose from presentations and discussions that took place during the thematic program.