TribeFlow: Mining & Predicting User Trajectories

Which song will Smith listen to next? Which restaurant will Alice go to tomorrow? Which product will John click next? These applications have in common the prediction of user trajectories that are in a constant state of flux over a hidden network (e.g. website links, geographic location). What users are doing now may be unrelated to what they will be doing in an hour from now. Mindful of these challenges we propose TribeFlow, a method designed to cope with the complex challenges of learning personalized predictive models of non-stationary, transient, and time-heterogeneous user trajectories. TribeFlow is a general method that can perform next product recommendation, next song recommendation, next location prediction, and general arbitrary-length user trajectory prediction without domain-specific knowledge. TribeFlow is more accurate and up to 413x faster than top competitors.


Understanding symmetries in deep networks

Recent works have highlighted scale invariance or symmetry present in the weight space of a typical deep network and the adverse effect it has on the Euclidean gradient based stochastic gradient descent optimization. In this work, we show that a commonly used deep network, which uses convolution, batch normalization, reLU, max-pooling, and sub-sampling pipeline, possess more complex forms of symmetry arising from scaling-based reparameterization of the network weights. We propose to tackle the issue of the weight space symmetry by constraining the filters to lie on the unit-norm manifold. Consequently, training the network boils down to using stochastic gradient descent updates on the unit-norm manifold. Our empirical evidence based on the MNIST dataset shows that the proposed updates improve the test performance beyond what is achieved with batch normalization and without sacrificing the computational efficiency of the weight updates.


Data Stream Classification using Random Feature Functions and Novel Method Combinations

Big Data streams are being generated in a faster, bigger, and more commonplace. In this scenario, Hoeffding Trees are an established method for classification. Several extensions exist, including high-performing ensemble setups such as online and leveraging bagging. Also, k-nearest neighbors is a popular choice, with most extensions dealing with the inherent performance limitations over a potentially-infinite stream. At the same time, gradient descent methods are becoming increasingly popular, owing in part to the successes of deep learning. Although deep neural networks can learn incrementally, they have so far proved too sensitive to hyper-parameter options and initial conditions to be considered an effective `off-the-shelf’ data-streams solution. In this work, we look at combinations of Hoeffding-trees, nearest neighbour, and gradient descent methods with a streaming preprocessing approach in the form of a random feature functions filter for additional predictive power. We further extend the investigation to implementing methods on GPUs, which we test on some large real-world datasets, and show the benefits of using GPUs for data-stream learning due to their high scalability. Our empirical evaluation yields positive results for the novel approaches that we experiment with, highlighting important issues, and shed light on promising future directions in approaches to data-stream classification.


The Unbiasedness Approach to Linear Regression Models

The linear regression models are widely used statistical techniques in numerous practical applications. The standard regression model requires several assumptions about the regres- sors and the error term. The regression parameters are estimated using the least-squares method. In this paper, we consider the regression model with arbitrary regressors and with- out the error term. An explicit expression for the regression parameters vector is obtained. The unbiasedness approach is used to estimate the regression parameters and its various properties are investigated. It is shown that the resulting unbiased estimator equals the least-squares estimator for the ?xed design model. The analysis of residuals and the regres- sion sum of squares can be carried out in a natural way. The unbiased estimator of the dispersion matrix of the unbiased estimator is also obtained. Applications to AR(p) model and numerical examples are also discussed.


Visualising interactive inferences with IDPD3

A large part of the use of knowledge base systems is the interpretation of the output by the end-users and the interaction with these users. Even during the development process visualisations can be a great help to the developer. We created IDPD3 as a library to visualise models of logic theories. IDPD3 is a new version of ID^{P}_{Draw} and adds support for visualised interactive simulations.


Bound Your Models! How to Make OWL an ASP Modeling Language

To exploit the Web Ontology Language OWL as an answer set programming (ASP) language, we introduce the notion of bounded model semantics, as an intuitive and computationally advantageous alternative to its classical semantics. We show that a translation into ASP allows for solving a wide range of bounded-model reasoning tasks, including satisfiability and axiom entailment but also novel ones such as model extraction and enumeration. Ultimately, our work facilitates harnessing advanced semantic web modeling environments for the logic programming community through an ‘off-label use’ of OWL.


Information Theory and Statistics: an overview

We give an overview of the role of information theory in statistics, and particularly in biostatistics. We recall the basic quantities in information theory; entropy, cross-entropy, conditional entropy, mutual information and Kullback-Leibler risk. Then we examine the role of information theory in estimation theory, where the log-klikelihood can be identified as being an estimator of a cross-entropy. Then the basic quantities are extended to estimators, leading to criteria for estimator selection, such as Akaike criterion and its extensions. Finally we investigate the use of these concepts in Bayesian theory; the cross-entropy of the predictive distribution can be used for model selection; a cross-validation estimator of this cross-entropy is found to be equivalent to the pseudo-Bayes factor.


PCA-Based Out-of-Sample Extension for Dimensionality Reduction

Dimensionality reduction methods are very common in the field of high dimensional data analysis. Typically, algorithms for dimensionality reduction are computationally expensive. Therefore, their applications for the analysis of massive amounts of data are impractical. For example, repeated computations due to accumulated data are computationally prohibitive. In this paper, an out-of-sample extension scheme, which is used as a complementary method for dimensionality reduction, is presented. We describe an algorithm which performs an out-of-sample extension to newly-arrived data points. Unlike other extension algorithms such as Nystr\’om algorithm, the proposed algorithm uses the intrinsic geometry of the data and properties for dimensionality reduction map. We prove that the error of the proposed algorithm is bounded. Additionally to the out-of-sample extension, the algorithm provides a degree of the abnormality of any newly-arrived data point.


Implicit Feedback Recommendation using Method of Moment

Building recommendation algorithms is one of the most challenging tasks in Machine Learning. Although there has been significant progress in building recommendation systems when explicit feedback is available from the users in the form of rating or text, most of the applications do not receive such feedback. Here we consider the recommendation task where the available data is the record of the items selected by different users over time for subscription or purchase. This is known as implicit feedback recommendation. Such data are usually available as large amount of user logs stored over massively distributed storage systems such as Hadoop. Therefore it is essential to have a highly scalable algorithm to build a recommender system for such applications. Here we propose a probabilistic algorithm that takes only two to three passes through the entire dataset to extract the model parameters during the training phase. We demonstrate the competitive performance of our algorithm in several empirical measures as well as the computation time in comparison with the existing algorithms on various publicly available datasets.


Identifying Actionable Messages on Social Media

Text actionability detection is the problem of classifying user authored natural language text, according to whether it can be acted upon by a responding agent. In this paper, we propose a supervised learning framework for domain-aware, large-scale actionability classification of social media messages. We derive lexicons, perform an in-depth analysis for over 25 text based features, and explore strategies to handle domains that have limited training data. We apply these methods to over 46 million messages spanning 75 companies and 35 languages, from both Facebook and Twitter. The models achieve an aggregate population-weighted F measure of 0.78 and accuracy of 0.74, with values of over 0.9 in some cases.


A Bayesian Approach to the Partitioning of Workflows

When partitioning workflows in realistic scenarios, the knowledge of the processing units is often vague or unknown. A naive approach to addressing this issue is to perform many controlled experiments for different workloads, each consisting of multiple number of trials in order to estimate the mean and variance of the specific workload. Since this controlled experimental approach can be quite costly in terms of time and resources, we propose a variant of the Gibbs Sampling algorithm that uses a sequence of Bayesian inference updates to estimate the processing characteristics of the processing units. Using the inferred characteristics of the processing units, we are able to determine the best way to split a workflow for processing it in parallel with the lowest expected completion time and least variance.


ZenLDA: An Efficient and Scalable Topic Model Training System on Distributed Data-Parallel Platform

This paper presents our recent efforts, zenLDA, an efficient and scalable Collapsed Gibbs Sampling system for Latent Dirichlet Allocation training, which is thought to be challenging that both data parallelism and model parallelism are required because of the Big sampling data with up to billions of documents and Big model size with up to trillions of parameters. zenLDA combines both algorithm level improvements and system level optimizations. It first presents a novel CGS algorithm that balances the time complexity, model accuracy and parallelization flexibility. The input corpus in zenLDA is represented as a directed graph and model parameters are annotated as the corresponding vertex attributes. The distributed training is parallelized by partitioning the graph that in each iteration it first applies CGS step for all partitions in parallel, followed by synchronizing the computed model each other. In this way, both data parallelism and model parallelism are achieved by converting them to graph parallelism. We revisited the tradeoff between system efficiency and model accuracy and presented approximations such as unsynchronized model, sparse model initialization and ‘converged’ token exclusion. zenLDA is built on GraphX in Spark that provides distributed data abstraction (RDD) and expressive APIs to simplify the programming efforts and simultaneously hides the system complexities. This enables us to implement other CGS algorithm with a few lines of code change. To better fit in distributed data-parallel framework and achieve comparable performance with contemporary systems, we also presented several system level optimizations to push the performance limit. zenLDA was evaluated it against web-scale corpus, and the result indicates that zenLDA can achieve about much better performance than other CGS algorithm we implemented, and simultaneously achieve better model accuracy.


Bootstrapping Empirical Processes of Cluster Functionals with Application to Extremograms

In the extreme value analysis of time series, not only the tail behavior is of interest, but also the serial dependence plays a crucial role. Drees and Rootz\’en (2010) established limit theorems for a general class of empirical processes of so-called cluster functionals which can be used to analyse various aspects of the extreme value behavior of mixing time series. However, usually the limit distribution is too complex to enable a direct construction of confidence regions. Therefore, we suggest a multiplier block bootstrap analog to the empirical processes of cluster functionals. It is shown that under virtually the same conditions as used by Drees and Rootz\’en (2010), conditionally on the data, the bootstrap processes converge to the same limit distribution. These general results are applied to construct confidence regions for the empirical extremogram introduced by Davis and Mikosch (2009). In a simulation study, the confidence intervals constructed by our multiplier block bootstrap approach compare favorably to the stationary bootstrap proposed by Davis et al.\ (2012).


Spatial Semantic Scan: Detecting Subtle, Spatially Localized Events in Text Streams

Many methods have been proposed for detecting emerging events in text streams using topic modeling. However, these methods have shortcomings that make them unsuitable for rapid detection of locally emerging events on massive text streams. We describe Spatially Compact Semantic Scan (SCSS) that has been developed specifically to overcome the shortcomings of current methods in detecting new spatially compact events in text streams. SCSS employs alternating optimization between using semantic scan to estimate contrastive foreground topics in documents, and discovering spatial neighborhoods with high occurrence of documents containing the foreground topics. We evaluate our method on Emergency Department chief complaints dataset (ED dataset) to verify the effectiveness of our method in detecting real-world disease outbreaks from free-text ED chief complaint data.


A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network with Word Embedding

Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTM-RNN) has been shown to be very effective for modeling and predicting sequential data, e.g. speech utterances or handwritten documents. In this study, we propose to use BLSTM-RNN for a unified tagging solution that can be applied to various tagging tasks including part-of-speech tagging, chunking and named entity recognition. Instead of exploiting specific features carefully optimized for each task, our solution only uses one set of task-independent features and internal representations learnt from unlabeled text for all tasks.Requiring no task specific knowledge or sophisticated feature engineering, our approach gets nearly state-of-the-art performance in all these three tagging tasks.


Preconditioned Data Sparsification for Big Data with Applications to PCA and K-means

We analyze a compression scheme for large data sets that randomly keeps a small percentage of the components of each data sample. The benefit is that the output is a sparse matrix and therefore subsequent processing, such as PCA or K-means, is significantly faster, especially in a distributed-data setting. Furthermore, the sampling is single-pass and applicable to streaming data. The sampling mechanism is a variant of previous methods proposed in the literature combined with a randomized preconditioning to smooth the data. We provide guarantees for PCA in terms of the covariance matrix, and guarantees for K-means in terms of the error in the center estimators at a given step. We present numerical evidence to show both that our bounds are nearly tight and that our algorithms provide a real benefit when applied to standard test data sets, as well as providing certain benefits over related sampling approaches.


Extremal Depth for Functional Data and Applications

We propose a new notion called `extremal depth’ (ED) for functional data, discuss its properties, and compare its performance with existing concepts. The proposed notion is based on a measure of extreme `outlyingness’. ED has several desirable properties that are not shared by other notions and is especially well suited for obtaining central regions of functional data and function spaces. In particular: a) the central region achieves the nominal (desired) simultaneous coverage probability; b) there is a correspondence between ED-based (simultaneous) central regions and appropriate point-wise central regions; and c) the method is resistant to certain classes of functional outliers. The paper examines the performance of ED and compares it with other depth notions. Its usefulness is demonstrated through applications to constructing central regions, functional boxplots, outlier detection, and simultaneous confidence bands in regression problems.


Wavelet Functional Data Analysis for FANOVA Models under Dependent Errors

We extend the wavelet tests for fixed effects FANOVA models with iid errors, proposed in Abramovich et al, 2004 to FANOVA models with dependent errors and provide an iterative Cochrane-Orcutt type procedure to estimate the parameters and the functional. The function is estimated through a nonlinear wavelet estimator. Nonparametric tests based on the optimal performance of nonlinear wavelet estimators are also proposed. The method is illustrated on real data sets and in simulated studies. The simulation also addresses the test performance under realistic sample sizes.


Tree Recurrent Neural Networks with Application to Language Modeling

In this paper we develop a recurrent neural network (TreeRNN), which is designed to predict a tree rather than a linear sequence as is the case in conventional recurrent neural networks. Our model defines the probability of a sentence by estimating the generation probability of its dependency tree. We construct the tree incrementally by generating the left and right dependents of a node whose probability is computed using recurrent neural networks with shared hidden layers. Application of our model to two language modeling tasks shows that it outperforms or performs on par with related models.


Gaussian Process Random Fields

Gaussian processes have been successful in both supervised and unsupervised machine learning tasks, but their computational complexity has constrained practical applications. We introduce a new approximation for large-scale Gaussian processes, the Gaussian Process Random Field (GPRF), in which local GPs are coupled via pairwise potentials. The GPRF likelihood is a simple, tractable, and parallelizeable approximation to the full GP marginal likelihood, enabling latent variable modeling and hyperparameter selection on large datasets. We demonstrate its effectiveness on synthetic spatial data as well as a real-world application to seismic event location.


Searching input values hitting suspicious Intervals in programs with floating-point operations

An Erdős–Hajnal analogue for permutation classes

The elapsed time between two transient state observations for an absorbing Markov chain

On a Brownian motion with a hard membrane

Detecting Interrogative Utterances with Recurrent Neural Networks

Efficient Algorithms under Asymmetric Read and Write Costs

On the Number of Non-zero Elements of Joint Degree Vectors

The Brownian Plane with Minimal Neck Baby Universe

Color-line and Proper Color-line Graphs

Consistent Parameter Estiamtion for LASSO and Approximate Message Passing

Detecting a Path of Correlations in a Network

Oriented Threshold Graphs

Joint imputation procedures for categorical variables

Reconstruction of a multidimensional scenery with a branching random walk

Shape-dependence of transmission, reflection and absorption eigenvalue densities in disordered waveguides with dissipation

Effective Invariant Theory of Permutation Groups using Representation Theory

Comparison of surrogate-based uncertainty quantification methods for computationally expensive simulators

Do Prices Coordinate Markets?

A web-based IDE for IDP

Lowering the learning curve for declarative programming: a Python API for the IDP system

SWISH: SWI-Prolog for Sharing

Complex Quantum Networks: From Universal Breakdown to Optimal Transport

Reinventing the Triangles: Rule of Thumb for Assessing Detectability

A Lower Bound for the Distributed Lovász Local Lemma

Burrows-Wheeler transform for terabases

Beating the Harmonic lower bound for online bin packing

The (3,1)-ordering for 4-connected planar triangulations

Properties of the Sample Mean in Graph Spaces and the Majorize-Minimize-Mean Algorithm

Local semicircle law under moment conditions. Part II: Localization and delocalization

A homology valued invariant for trivalent fatgraph spines

Finetuning Randomized Heuristic Search For 2D Path Planning: Finding The Best Input Parameters For R* Algorithm Through Series Of Experiments

Approximating Subadditive Hadamard Functions on Implicit Matrices

The Variational Fair Auto Encoder

Local digital algorithms applied to Boolean models

SAT as a game

On the Stabilizer of Weight Enumerators of Linear Codes

A Pareto Optimal D* Search Algorithm for Multiobjective Path Planning

Unit Interval Orders and the Dot Action on the Cohomology of Regular Semisimple Hessenberg Varieties

Non-Gaussian normal diffusion induced by delocalization

Optimal Gaussian approximations to the posterior for log-linear models with Diaconis-Ylvisaker priors

PAC Learning-Based Verification and Model Synthesis

Learning Unfair Trading: a Market Manipulation Analysis From the Reinforcement Learning Perspective

ProtNN: Fast and Accurate Nearest Neighbor Protein Function Prediction based on Graph Embedding in Structural and Topological Space

Modeling of Stationary Periodic Time Series by ARMA Representations

Irreducible triangulations of the punctured torus

Regularized quantile regression under heterogeneous sparsity with application to quantitative genetic traits

Galaxy-X: A Novel Approach for Multi-class Classification in an Open Universe

Hypothesis Testing of Matrix Graph Model with Application to Brain Connectivity Analysis

The Distributed Selection Problem and the AKS Sorting Network

GL_n(F_q)-analogues of factorization problems in the symmetric group

The 4/3 Additive Spanner Exponent is Tight

Mixing time and eigenvalues of the abelian sandpile Markov chain

Beating CountSketch for Heavy Hitters in Insertion Streams

Discrete Riemann surfaces based on quadrilateral cellular decompositions

Parameterized Algorithms for Constraint Satisfaction Problems Above Average with Global Cardinality Constraints

Partial Functional Linear Quantile Regression for Neuroimaging Data Analysis

Ball*-tree: Efficient spatial indexing for constrained nearest-neighbor search in metric spaces

A correction on Shiloach’s algorithm for minimum linear arrangement of trees

The Separated Box Product of Two Digraphs

On the Number of Many-to-Many Alignments of N Sequences

Entanglement properties of correlated random spin chains and its similarities with conformal invariant systems

Particle jumps in structural glasses

Maximal matchings in polyspiro and benzenoid chains

On the Beck-Fiala Conjecture for Random Set Systems

Proactive Demand Response for Data Centers: A Win-Win Solution

From random walks to distances on unweighted graphs

The stochastic logarithmic Schrödinger equation

Hedgehogs are not colour blind

SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

Fourth Moment Theorems for complex Gaussian approximation

An Impossibility Result for Reconstruction in a Degree-Corrected Planted-Partition Model

Spiking Analog VLSI Neuron Assemblies as Constraint Satisfaction Problem Solvers

Upper bounds for the achromatic and coloring numbers of a graph

Measuring the Complexity of Continuous Distributions

Inverse Problems in a Bayesian Setting

Posterior Predictive P-values with Fisher Randomization Tests in Noncompliance Settings: Test Statistics vs Discrepancy Variables

Estimation under cross-classified sampling with application to a childhood survey

Uniqueness, Spatial Mixing, and Approximation for Ferromagnetic 2-Spin Systems

Parallel Exhaustive Search without Coordination

The basis of $2\times 2$ monotone grid classes

Real Options and Threshold Strategies

Structure theory of flip graphs with applications to Weak Symmetry Breaking

Number of right ideals and a $q$-analogue of indecomposable permutations

Abelian logic gates

Box Resolvability

Generating countable groups by discrete subsets

Domination parameters with number 2

Submodular Functions: from Discrete to Continous Domains

Z Specification for the W3C Editor’s Draft Core SHACL Semantics

Low Correlation Noise Stability of Symmetric Sets

Two-Stage Penalized Least Squares Method for Constructing Large Systems of Structural Equations

BinaryConnect: Training Deep Neural Networks with binary weights during propagations

Automatic Prosody Prediction for Chinese Speech Synthesis using BLSTM-RNN and Embedding Features

Beyond Degree Choosability

Construction of multi-default models with full viability

The Dynamic Splitting Method with an application to portfolio credit risk

Interventions in Networks

Primary Facets Of Order Polytopes

Parameterized Integer Quadratic Programming: Variables and Coefficients

A Schmid-Leiman based transformation resulting in perfect inter-correlations of three types of factor score predictors

Kernel-Penalized Regression for Analysis of Microbiome Data

Statistical Patterns of Darwinian Evolution

Empirical eigenvalue based testing for structural breaks in linear panel data models

Sparse Neural Codes and Convexity

A New Reduced-Rank Linear Discriminant Analysis Method and Its Applications

Calibrated Percentile Double Bootstrap For Robust Linear Regression Inference

The Smith group of the hypercube

Stochastic Top-k ListNet

Problem collection from the IML programme: Graphs, Hypergraphs, and Computing

A New Approach to Euler Calculus for Continuous Integrands

Discrete time approximation of a COGARCH(p,q) model and its estimation

New congruences for 2-color partitions

Shortest Reconfiguration of Sliding Tokens on a Caterpillar

Measure-Transformed Quasi Maximum Likelihood Estimation

Unmixed r-partite graphs

Linearity of regression for weak records, revisited

LM-CMA: an Alternative to L-BFGS for Large Scale Black-box Optimization

Large-scale probabilistic predictors with and without guarantees of validity

Exploiting Redundant Computation in Communication-Avoiding Algorithms for Algorithm-Based Fault Tolerance

Wong-Zakai Approximation through Rough Paths in the G-expectation Framework

On backward stochastic differential equations driven by a family of Itô’s processes

Pattern avoidance for set partitions à la Klazar

Mixed stochastic differential equations: Existence and uniqueness result

A Bijection on Classes Enumerated by the Schröder Numbers

Free Energy Rate Density and Self-organization in Complex Systems

Union-Free Families of Subsets

On $\mathcal{D}$-equivalence classes of some graphs

Support Vector Regression, Smooth Splines, and Time Series Prediction

Bayesian Nonparametric Reconstruction and Prediction of Nonlinear Dynamic Systems with Geometric Stick Breaking Noise

Convergence of Proximal-Gradient Stochastic Variational Inference under Non-Decreasing Step-Size Sequence

On the optimal control of opinion dynamics on evolving networks

Kummer and gamma laws through independences on trees – another parallel to the Matsumoto-Yor property

Adjacency matrices of random digraphs: singularity and anti-concentration

Recursive computation for evaluating the exact $p$-values of temporal and spatial scan statistics

On weak uniqueness and distributional properties of a solution to an SDE with $α$-stable noise

Sketch-based Image Retrieval from Millions of Images under Rotation, Translation and Scale Variations

Why Neurons Have Thousands of Synapses, A Theory of Sequence Memory in Neocortex

Pattern Avoidance in Task-Precedence Posets

Limit distributions of sample covariance matrices are compound free Poisson

The Pareto Regret Frontier for Bandits

Learning Adversary Behavior in Security Games: A PAC Model Perspective

Learning Causal Graphs with Small Interventions

Quantifying the Cognitive Extent of Science

Empirical Bayes prediction for the multivariate newsvendor loss function

Pathwise no-arbitrage in a class of Delta hedging strategies

Narrow Gauge and Analytical Branching Strategies for Mixed Integer Programming