A Mathematical Theory for Clustering in Metric Spaces

Clustering is one of the most fundamental problems in data analysis and it has been studied extensively in the literature. Though many clustering algorithms have been proposed, clustering theories that justify the use of these clustering algorithms are still unsatisfactory. In particular, one of the fundamental challenges is to address the following question: What is a cluster in a set of data points? In this paper, we make an attempt to address such a question by considering a set of data points associated with a distance measure (metric). We first propose a new cohesion measure in terms of the distance measure. Using the cohesion measure, we define a cluster as a set of points that are cohesive to themselves. For such a definition, we show there are various equivalent statements that have intuitive explanations. We then consider the second question: How do we find clusters and good partitions of clusters under such a definition? For such a question, we propose a hierarchical agglomerative algorithm and a partitional algorithm. Unlike standard hierarchical agglomerative algorithms, our hierarchical agglomerative algorithm has a specific stopping criterion and it stops with a partition of clusters. Our partitional algorithm, called the K-sets algorithm in the paper, appears to be a new iterative algorithm. Unlike the Lloyd iteration that needs two-step minimization, our K-sets algorithm only takes one-step minimization. One of the most interesting findings of our paper is the duality result between a distance measure and a cohesion measure. Such a duality result leads to a dual K-sets algorithm for clustering a set of data points with a cohesion measure. The dual K-sets algorithm converges in the same way as a sequential version of the classical kernel K-means algorithm. The key difference is that a cohesion measure does not need to be positive semi-definite.

Bayesian Nonparametric Graph Clustering

We present clustering methods for multivariate data exploiting the underlying geometry of the graphical structure between variables. As opposed to standard approaches that assume known graph structures, we first estimate the edge structure of the unknown graph using Bayesian neighborhood selection approaches, wherein we account for the uncertainty of graphical structure learning through model-averaged estimates of the suitable parameters. Subsequently, we develop a nonparametric graph clustering model on the lower dimensional projections of the graph based on Laplacian embeddings using Dirichlet process mixture models. In contrast to standard algorithmic approaches, this fully probabilistic approach allows incorporation of uncertainty in estimation and inference for both graph structure learning and clustering. More importantly, we formalize the arguments for Laplacian embeddings as suitable projections for graph clustering by providing theoretical support for the consistency of the eigenspace of the estimated graph Laplacians. We develop fast computational algorithms that allow our methods to scale to large number of nodes. Through extensive simulations we compare our clustering performance with standard clustering methods. We apply our methods to a novel pan-cancer proteomic data set, and evaluate protein networks and clusters across multiple different cancer types.

Deep Multimodal Embedding: Manipulating Novel Objects with Point-clouds, Language and Trajectories

A robot operating in a real-world environment needs to perform reasoning with a variety of sensing modalities. However, manually designing features that allow a learning algorithm to relate these different modalities can be extremely challenging. In this work, we consider the task of manipulating novel objects and appliances. To this end, we learn to embed point-cloud, natural language, and manipulation trajectory data into a shared embedding space using a deep neural network. In order to learn semantically meaningful spaces throughout our network, we introduce a method for pre-training its lower layers for multimodal feature embedding and a method for fine-tuning this embedding space using a loss-based margin. We test our model on the Robobarista dataset [22], where we achieve significant improvements in both accuracy and inference time over the previous state of the art.

Description of the Odin Event Extraction Framework and Rule Language

This document describes the Odin framework, which is a domain-independent platform for developing rule-based event extraction models. Odin aims to be powerful (the rule language allows the modeling of complex syntactic structures) and robust (to recover from syntactic parsing errors, syntactic patterns can be freely mixed with surface, token-based patterns), while remaining simple (some domain grammars can be up and running in minutes), and fast (Odin processes over 100 sentences/second in a real-world domain with over 200 rules). Here we include a thorough definition of the Odin rule language, together with a description of the Odin API in the Scala language, which allows one to apply these rules to arbitrary texts.

mplot: An R Package for Graphical Model Stability and Variable Selection Procedures

The mplot package provides an easy to use implementation of model stability and variable inclusion plots (M\’uller and Welsh 2010; Murray, Heritier, and M\’uller 2013) as well as the adaptive fence (Jiang, Rao, Gu, and Nguyen 2008; Jiang, Nguyen, and Rao 2009) for linear and generalised linear models. We provide a number of innovations on the standard procedures and address many practical implementation issues including the addition of redundant variables, interactive visualisations and approximating logistic models with linear models. An option is provided that combines our bootstrap approach with glmnet for higher dimensional models. The plots and graphical user interface leverage state of the art web technologies to facilitate interaction with the results. The speed of implementation comes from the leaps package and cross-platform multicore support.

Training Deep Networks with Structured Layers by Matrix Backpropagation

Deep neural network architectures have recently produced excellent results in a variety of areas in artificial intelligence and visual recognition, well surpassing traditional shallow architectures trained using hand-designed features. The power of deep networks stems both from their ability to perform local computations followed by pointwise non-linearities over increasingly larger receptive fields, and from the simplicity and scalability of the gradient-descent training procedure based on backpropagation. An open problem is the inclusion of layers that perform global, structured matrix computations like segmentation (e.g. normalized cuts) or higher-order pooling (e.g. log-tangent space metrics defined over the manifold of symmetric positive definite matrices) while preserving the validity and efficiency of an end-to-end deep training framework. In this paper we propose a sound mathematical apparatus to formally integrate global structured computation into deep computation architectures. At the heart of our methodology is the development of the theory and practice of backpropagation that generalizes to the calculus of adjoint matrix variations. We perform segmentation experiments using the BSDS and MSCOCO benchmarks and demonstrate that deep networks relying on second-order pooling and normalized cuts layers, trained end-to-end using matrix backpropagation, outperform counterparts that do not take advantage of such global layers.

$2^{\aleph_0}$ pairwise non-isomorphic maximal-closed subgroups of Sym$(\mathbb{N})$ via the classification of the reducts of the Henson digraphs

A Basis for Slicing Birkhoff Polytopes

A brief note on the Karhunen-Loève expansion

A Cyberinfrastructure-based Approach to Real Time Water Temperature Prediction

A Lasserre-based $(1+\varepsilon)$-approximation for $Pm \mid p_j=1,\textrm{prec} \mid C_{\max}$

An extending result on spectral radius of bipartite graphs

Bayesian model selection on linear mixed-effects models for comparisons between multiple treatments and a control

Birth-and-death Pólya urns and stationary random partitions

Caminos de Dyck contenidos en un diagrama de Ferrers

Computational Intelligence Challenges and Applications on Large-Scale Astronomical Time Series Databases

Condensation and symmetry-breaking in the zero-range process with weak site disorder

Connectivity Preserving Iterative Compaction and Finding 2 Disjoint Rooted Paths in Linear Time

Constructing Abstraction Hierarchies Using a Skill-Symbol Loop

Counting isomorphism classes of $β$-normal linear lambda terms

Critical Phenomena of Dynamical Delocalization in Quantum Anderson Map

Efficient Computation of the Quasi Likelihood function for Discretely Observed Diffusion Processes

Feature Evaluation of Deep Convolutional Neural Networks for Object Recognition and Detection

Fractional coverings, greedy coverings, and rectifier networks

Graph games and the pizza problem

Information Limits for Recovering a Hidden Community

Linear-time Learning on Distributions with Approximate Kernel Embeddings

Majority Digraphs

Measure preserving actions of affine semigroups and {x+y,xy} patterns

Minimax Regret 1-Median Problem in Dynamic Path Networks

Moment estimates implied by modified log-Sobolev inequalities

New Formulas for Dyck Paths in a Rectangle

Nonlinear stochastic time-fractional slow and fast diffusion equations on $\mathbb{R}^d$

Nonparametric estimation of the distribution of the autoregressive coefficient from panel random-coefficient AR(1) data

Off-equilibrium finite-size method for critical behavior analyses

On the invariance principle for empirical processes of associated sequences

Online Stochastic Linear Optimization under One-bit Feedback

Poisson approximation of subgraph counts in stochastic block models and a graphon model

Poisson suspensions and Sushis

Polar Grassmannians and their Codes

Predicting the outcomes of every process for which an asymptotically accurate stationary predictor exists is impossible

Quantum diffusion of a relativistic particle in a time-dependent random potential

Ramanujan-type Congruences for $\ell$-Regular Partitions Modulo $3, 5, 11$ and $13$

Rate Analysis for Detection of Sparse Mixtures

Rigged configurations for all symmetrizable types

Selecting Relevant Web Trained Concepts for Automated Event Retrieval

Sentiment of Emojis

Sentiment Uncertainty and Spam in Twitter Streams and Its Implications for General Purpose Realtime Sentiment Analysis

Spectra of Graphs and Closed Distance Magic Labelings

The Design and Implementation of the Wave Transactional Filesystem

The Ghirlanda Guerra Identities and the Replica Trick for the Branching Random Walk

The width of quadrangulations of the projective plane

Uncovering the Small Community Structure in Large Networks: A Local Spectral Approach

Validity of time reversal for testing Granger causality

Warp: Lightweight Multi-Key Transactions for Key-Value Stores