K-Core Decomposition google
The $k$-core decomposition is a fundamental primitive in many machine learning and data mining applications. We present the first distributed and the first streaming algorithms to compute and maintain an approximate $k$-core decomposition with provable guarantees. Our algorithms achieve rigorous bounds on space complexity while bounding the number of passes or number of rounds of computation. We do so by presenting a new powerful sketching technique for $k$-core decomposition, and then by showing it can be computed efficiently in both streaming and MapReduce models. Finally, we confirm the effectiveness of our sketching technique empirically on a number of publicly available graphs. …

Compressive K-means (CKM) google
The Lloyd-Max algorithm is a classical approach to perform K-means clustering. Unfortunately, its cost becomes prohibitive as the training dataset grows large. We propose a compressive version of K-means (CKM), that estimates cluster centers from a sketch, i.e. from a drastically compressed representation of the training dataset. We demonstrate empirically that CKM performs similarly to Lloyd-Max, for a sketch size proportional to the number of cen-troids times the ambient dimension, and independent of the size of the original dataset. Given the sketch, the computational complexity of CKM is also independent of the size of the dataset. Unlike Lloyd-Max which requires several replicates, we further demonstrate that CKM is almost insensitive to initialization. For a large dataset of 10^7 data points, we show that CKM can run two orders of magnitude faster than five replicates of Lloyd-Max, with similar clustering performance on artificial data. Finally, CKM achieves lower classification errors on handwritten digits classification.
“Lloyd-Max”


DOM-Q-NET google
Building agents to interact with the web would allow for significant improvements in knowledge understanding and representation learning. However, web navigation tasks are difficult for current deep reinforcement learning (RL) models due to the large discrete action space and the varying number of actions between the states. In this work, we introduce DOM-Q-NET, a novel architecture for RL-based web navigation to address both of these problems. It parametrizes Q functions with separate networks for different action categories: clicking a DOM element and typing a string input. Our model utilizes a graph neural network to represent the tree-structured HTML of a standard web page. We demonstrate the capabilities of our model on the MiniWoB environment where we can match or outperform existing work without the use of expert demonstrations. Furthermore, we show 2x improvements in sample efficiency when training in the multi-task setting, allowing our model to transfer learned behaviours across tasks. …

Dynamic Mode Decomposition (DMD) google
Dynamic mode decomposition (DMD) is a dimensionality reduction algorithm developed by Peter Schmid in 2008. Given a time series of data, DMD computes a set of modes each of which is associated with a fixed oscillation frequency and decay/growth rate. For linear systems in particular, these modes and frequencies are analogous to the normal modes of the system, but more generally, they are approximations of the modes and eigenvalues of the composition operator (also called the Koopman operator). Due to the intrinsic temporal behaviors associated with each mode, DMD differs from dimensionality reduction methods such as principal component analysis, which computes orthogonal modes that lack predetermined temporal behaviors. Because its modes are not orthogonal, DMD-based representations can be less parsimonious than those generated by PCA. However, they can also be more physically meaningful because each mode is associated with a damped (or driven) sinusoidal behavior in time.
Data-driven multiscale decompositions for forecasting and model discovery