• TribeFlow: Mining & Predicting User Trajectories
Which song will Smith listen to next? Which restaurant will Alice go to tomorrow? Which product will John click next? These applications have in common the prediction of user trajectories that are in a constant state of flux over a hidden network (e.g. website links, geographic location). What users are doing now may be unrelated to what they will be doing in an hour from now. Mindful of these challenges we propose TribeFlow, a method designed to cope with the complex challenges of learning personalized predictive models of non-stationary, transient, and time-heterogeneous user trajectories. TribeFlow is a general method that can perform next product recommendation, next song recommendation, next location prediction, and general arbitrary-length user trajectory prediction without domain-specific knowledge. TribeFlow is more accurate and up to 413x faster than top competitors.
• Understanding symmetries in deep networks
Recent works have highlighted scale invariance or symmetry present in the weight space of a typical deep network and the adverse effect it has on the Euclidean gradient based stochastic gradient descent optimization. In this work, we show that a commonly used deep network, which uses convolution, batch normalization, reLU, max-pooling, and sub-sampling pipeline, possess more complex forms of symmetry arising from scaling-based reparameterization of the network weights. We propose to tackle the issue of the weight space symmetry by constraining the filters to lie on the unit-norm manifold. Consequently, training the network boils down to using stochastic gradient descent updates on the unit-norm manifold. Our empirical evidence based on the MNIST dataset shows that the proposed updates improve the test performance beyond what is achieved with batch normalization and without sacrificing the computational efficiency of the weight updates.
• Data Stream Classification using Random Feature Functions and Novel Method Combinations
Big Data streams are being generated in a faster, bigger, and more commonplace. In this scenario, Hoeffding Trees are an established method for classification. Several extensions exist, including high-performing ensemble setups such as online and leveraging bagging. Also,

-nearest neighbors is a popular choice, with most extensions dealing with the inherent performance limitations over a potentially-infinite stream. At the same time, gradient descent methods are becoming increasingly popular, owing in part to the successes of deep learning. Although deep neural networks can learn incrementally, they have so far proved too sensitive to hyper-parameter options and initial conditions to be considered an effective `off-the-shelf’ data-streams solution. In this work, we look at combinations of Hoeffding-trees, nearest neighbour, and gradient descent methods with a streaming preprocessing approach in the form of a random feature functions filter for additional predictive power. We further extend the investigation to implementing methods on GPUs, which we test on some large real-world datasets, and show the benefits of using GPUs for data-stream learning due to their high scalability. Our empirical evaluation yields positive results for the novel approaches that we experiment with, highlighting important issues, and shed light on promising future directions in approaches to data-stream classification.
• The Unbiasedness Approach to Linear Regression Models
The linear regression models are widely used statistical techniques in numerous practical applications. The standard regression model requires several assumptions about the regres- sors and the error term. The regression parameters are estimated using the least-squares method. In this paper, we consider the regression model with arbitrary regressors and with- out the error term. An explicit expression for the regression parameters vector is obtained. The unbiasedness approach is used to estimate the regression parameters and its various properties are investigated. It is shown that the resulting unbiased estimator equals the least-squares estimator for the ?xed design model. The analysis of residuals and the regres- sion sum of squares can be carried out in a natural way. The unbiased estimator of the dispersion matrix of the unbiased estimator is also obtained. Applications to AR(p) model and numerical examples are also discussed.
• Visualising interactive inferences with IDPD3
A large part of the use of knowledge base systems is the interpretation of the output by the end-users and the interaction with these users. Even during the development process visualisations can be a great help to the developer. We created IDPD3 as a library to visualise models of logic theories. IDPD3 is a new version of

and adds support for visualised interactive simulations.
• Bound Your Models! How to Make OWL an ASP Modeling Language
To exploit the Web Ontology Language OWL as an answer set programming (ASP) language, we introduce the notion of bounded model semantics, as an intuitive and computationally advantageous alternative to its classical semantics. We show that a translation into ASP allows for solving a wide range of bounded-model reasoning tasks, including satisfiability and axiom entailment but also novel ones such as model extraction and enumeration. Ultimately, our work facilitates harnessing advanced semantic web modeling environments for the logic programming community through an ‘off-label use’ of OWL.
• Information Theory and Statistics: an overview
We give an overview of the role of information theory in statistics, and particularly in biostatistics. We recall the basic quantities in information theory; entropy, cross-entropy, conditional entropy, mutual information and Kullback-Leibler risk. Then we examine the role of information theory in estimation theory, where the log-klikelihood can be identified as being an estimator of a cross-entropy. Then the basic quantities are extended to estimators, leading to criteria for estimator selection, such as Akaike criterion and its extensions. Finally we investigate the use of these concepts in Bayesian theory; the cross-entropy of the predictive distribution can be used for model selection; a cross-validation estimator of this cross-entropy is found to be equivalent to the pseudo-Bayes factor.
• PCA-Based Out-of-Sample Extension for Dimensionality Reduction
Dimensionality reduction methods are very common in the field of high dimensional data analysis. Typically, algorithms for dimensionality reduction are computationally expensive. Therefore, their applications for the analysis of massive amounts of data are impractical. For example, repeated computations due to accumulated data are computationally prohibitive. In this paper, an out-of-sample extension scheme, which is used as a complementary method for dimensionality reduction, is presented. We describe an algorithm which performs an out-of-sample extension to newly-arrived data points. Unlike other extension algorithms such as Nystr\’om algorithm, the proposed algorithm uses the intrinsic geometry of the data and properties for dimensionality reduction map. We prove that the error of the proposed algorithm is bounded. Additionally to the out-of-sample extension, the algorithm provides a degree of the abnormality of any newly-arrived data point.
• Implicit Feedback Recommendation using Method of Moment
Building recommendation algorithms is one of the most challenging tasks in Machine Learning. Although there has been significant progress in building recommendation systems when explicit feedback is available from the users in the form of rating or text, most of the applications do not receive such feedback. Here we consider the recommendation task where the available data is the record of the items selected by different users over time for subscription or purchase. This is known as implicit feedback recommendation. Such data are usually available as large amount of user logs stored over massively distributed storage systems such as Hadoop. Therefore it is essential to have a highly scalable algorithm to build a recommender system for such applications. Here we propose a probabilistic algorithm that takes only two to three passes through the entire dataset to extract the model parameters during the training phase. We demonstrate the competitive performance of our algorithm in several empirical measures as well as the computation time in comparison with the existing algorithms on various publicly available datasets.
• Identifying Actionable Messages on Social Media
Text actionability detection is the problem of classifying user authored natural language text, according to whether it can be acted upon by a responding agent. In this paper, we propose a supervised learning framework for domain-aware, large-scale actionability classification of social media messages. We derive lexicons, perform an in-depth analysis for over 25 text based features, and explore strategies to handle domains that have limited training data. We apply these methods to over 46 million messages spanning 75 companies and 35 languages, from both Facebook and Twitter. The models achieve an aggregate population-weighted F measure of 0.78 and accuracy of 0.74, with values of over 0.9 in some cases.
• A Bayesian Approach to the Partitioning of Workflows
When partitioning workflows in realistic scenarios, the knowledge of the processing units is often vague or unknown. A naive approach to addressing this issue is to perform many controlled experiments for different workloads, each consisting of multiple number of trials in order to estimate the mean and variance of the specific workload. Since this controlled experimental approach can be quite costly in terms of time and resources, we propose a variant of the Gibbs Sampling algorithm that uses a sequence of Bayesian inference updates to estimate the processing characteristics of the processing units. Using the inferred characteristics of the processing units, we are able to determine the best way to split a workflow for processing it in parallel with the lowest expected completion time and least variance.
• ZenLDA: An Efficient and Scalable Topic Model Training System on Distributed Data-Parallel Platform
This paper presents our recent efforts, zenLDA, an efficient and scalable Collapsed Gibbs Sampling system for Latent Dirichlet Allocation training, which is thought to be challenging that both data parallelism and model parallelism are required because of the Big sampling data with up to billions of documents and Big model size with up to trillions of parameters. zenLDA combines both algorithm level improvements and system level optimizations. It first presents a novel CGS algorithm that balances the time complexity, model accuracy and parallelization flexibility. The input corpus in zenLDA is represented as a directed graph and model parameters are annotated as the corresponding vertex attributes. The distributed training is parallelized by partitioning the graph that in each iteration it first applies CGS step for all partitions in parallel, followed by synchronizing the computed model each other. In this way, both data parallelism and model parallelism are achieved by converting them to graph parallelism. We revisited the tradeoff between system efficiency and model accuracy and presented approximations such as unsynchronized model, sparse model initialization and ‘converged’ token exclusion. zenLDA is built on GraphX in Spark that provides distributed data abstraction (RDD) and expressive APIs to simplify the programming efforts and simultaneously hides the system complexities. This enables us to implement other CGS algorithm with a few lines of code change. To better fit in distributed data-parallel framework and achieve comparable performance with contemporary systems, we also presented several system level optimizations to push the performance limit. zenLDA was evaluated it against web-scale corpus, and the result indicates that zenLDA can achieve about much better performance than other CGS algorithm we implemented, and simultaneously achieve better model accuracy.
• Bootstrapping Empirical Processes of Cluster Functionals with Application to Extremograms
In the extreme value analysis of time series, not only the tail behavior is of interest, but also the serial dependence plays a crucial role. Drees and Rootz\’en (2010) established limit theorems for a general class of empirical processes of so-called cluster functionals which can be used to analyse various aspects of the extreme value behavior of mixing time series. However, usually the limit distribution is too complex to enable a direct construction of confidence regions. Therefore, we suggest a multiplier block bootstrap analog to the empirical processes of cluster functionals. It is shown that under virtually the same conditions as used by Drees and Rootz\’en (2010), conditionally on the data, the bootstrap processes converge to the same limit distribution. These general results are applied to construct confidence regions for the empirical extremogram introduced by Davis and Mikosch (2009). In a simulation study, the confidence intervals constructed by our multiplier block bootstrap approach compare favorably to the stationary bootstrap proposed by Davis et al.\ (2012).
• Spatial Semantic Scan: Detecting Subtle, Spatially Localized Events in Text Streams
Many methods have been proposed for detecting emerging events in text streams using topic modeling. However, these methods have shortcomings that make them unsuitable for rapid detection of locally emerging events on massive text streams. We describe Spatially Compact Semantic Scan (SCSS) that has been developed specifically to overcome the shortcomings of current methods in detecting new spatially compact events in text streams. SCSS employs alternating optimization between using semantic scan to estimate contrastive foreground topics in documents, and discovering spatial neighborhoods with high occurrence of documents containing the foreground topics. We evaluate our method on Emergency Department chief complaints dataset (ED dataset) to verify the effectiveness of our method in detecting real-world disease outbreaks from free-text ED chief complaint data.
• A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network with Word Embedding
Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTM-RNN) has been shown to be very effective for modeling and predicting sequential data, e.g. speech utterances or handwritten documents. In this study, we propose to use BLSTM-RNN for a unified tagging solution that can be applied to various tagging tasks including part-of-speech tagging, chunking and named entity recognition. Instead of exploiting specific features carefully optimized for each task, our solution only uses one set of task-independent features and internal representations learnt from unlabeled text for all tasks.Requiring no task specific knowledge or sophisticated feature engineering, our approach gets nearly state-of-the-art performance in all these three tagging tasks.
• Preconditioned Data Sparsification for Big Data with Applications to PCA and K-means
We analyze a compression scheme for large data sets that randomly keeps a small percentage of the components of each data sample. The benefit is that the output is a sparse matrix and therefore subsequent processing, such as PCA or K-means, is significantly faster, especially in a distributed-data setting. Furthermore, the sampling is single-pass and applicable to streaming data. The sampling mechanism is a variant of previous methods proposed in the literature combined with a randomized preconditioning to smooth the data. We provide guarantees for PCA in terms of the covariance matrix, and guarantees for K-means in terms of the error in the center estimators at a given step. We present numerical evidence to show both that our bounds are nearly tight and that our algorithms provide a real benefit when applied to standard test data sets, as well as providing certain benefits over related sampling approaches.
• Extremal Depth for Functional Data and Applications
We propose a new notion called `extremal depth’ (ED) for functional data, discuss its properties, and compare its performance with existing concepts. The proposed notion is based on a measure of extreme `outlyingness’. ED has several desirable properties that are not shared by other notions and is especially well suited for obtaining central regions of functional data and function spaces. In particular: a) the central region achieves the nominal (desired) simultaneous coverage probability; b) there is a correspondence between ED-based (simultaneous) central regions and appropriate point-wise central regions; and c) the method is resistant to certain classes of functional outliers. The paper examines the performance of ED and compares it with other depth notions. Its usefulness is demonstrated through applications to constructing central regions, functional boxplots, outlier detection, and simultaneous confidence bands in regression problems.
• Wavelet Functional Data Analysis for FANOVA Models under Dependent Errors
We extend the wavelet tests for fixed effects FANOVA models with iid errors, proposed in Abramovich et al, 2004 to FANOVA models with dependent errors and provide an iterative Cochrane-Orcutt type procedure to estimate the parameters and the functional. The function is estimated through a nonlinear wavelet estimator. Nonparametric tests based on the optimal performance of nonlinear wavelet estimators are also proposed. The method is illustrated on real data sets and in simulated studies. The simulation also addresses the test performance under realistic sample sizes.
• Tree Recurrent Neural Networks with Application to Language Modeling
In this paper we develop a recurrent neural network (TreeRNN), which is designed to predict a tree rather than a linear sequence as is the case in conventional recurrent neural networks. Our model defines the probability of a sentence by estimating the generation probability of its dependency tree. We construct the tree incrementally by generating the left and right dependents of a node whose probability is computed using recurrent neural networks with shared hidden layers. Application of our model to two language modeling tasks shows that it outperforms or performs on par with related models.
• Gaussian Process Random Fields
Gaussian processes have been successful in both supervised and unsupervised machine learning tasks, but their computational complexity has constrained practical applications. We introduce a new approximation for large-scale Gaussian processes, the Gaussian Process Random Field (GPRF), in which local GPs are coupled via pairwise potentials. The GPRF likelihood is a simple, tractable, and parallelizeable approximation to the full GP marginal likelihood, enabling latent variable modeling and hyperparameter selection on large datasets. We demonstrate its effectiveness on synthetic spatial data as well as a real-world application to seismic event location.
• Searching input values hitting suspicious Intervals in programs with floating-point operations
• An Erdős–Hajnal analogue for permutation classes
• The elapsed time between two transient state observations for an absorbing Markov chain
• On a Brownian motion with a hard membrane
• Detecting Interrogative Utterances with Recurrent Neural Networks
• Efficient Algorithms under Asymmetric Read and Write Costs
• On the Number of Non-zero Elements of Joint Degree Vectors
• The Brownian Plane with Minimal Neck Baby Universe
• Color-line and Proper Color-line Graphs
• Consistent Parameter Estiamtion for LASSO and Approximate Message Passing
• Detecting a Path of Correlations in a Network
• Oriented Threshold Graphs
• Joint imputation procedures for categorical variables
• Reconstruction of a multidimensional scenery with a branching random walk
• Shape-dependence of transmission, reflection and absorption eigenvalue densities in disordered waveguides with dissipation
• Effective Invariant Theory of Permutation Groups using Representation Theory
• Comparison of surrogate-based uncertainty quantification methods for computationally expensive simulators
• Do Prices Coordinate Markets?
• A web-based IDE for IDP
• Lowering the learning curve for declarative programming: a Python API for the IDP system
• SWISH: SWI-Prolog for Sharing
• Complex Quantum Networks: From Universal Breakdown to Optimal Transport
• Reinventing the Triangles: Rule of Thumb for Assessing Detectability
• A Lower Bound for the Distributed Lovász Local Lemma
• Burrows-Wheeler transform for terabases
• Beating the Harmonic lower bound for online bin packing
• The (3,1)-ordering for 4-connected planar triangulations
• Properties of the Sample Mean in Graph Spaces and the Majorize-Minimize-Mean Algorithm
• Local semicircle law under moment conditions. Part II: Localization and delocalization
• A homology valued invariant for trivalent fatgraph spines
• Finetuning Randomized Heuristic Search For 2D Path Planning: Finding The Best Input Parameters For R* Algorithm Through Series Of Experiments
• Approximating Subadditive Hadamard Functions on Implicit Matrices
• The Variational Fair Auto Encoder
• Local digital algorithms applied to Boolean models
• SAT as a game
• On the Stabilizer of Weight Enumerators of Linear Codes
• A Pareto Optimal D* Search Algorithm for Multiobjective Path Planning
• Unit Interval Orders and the Dot Action on the Cohomology of Regular Semisimple Hessenberg Varieties
• Non-Gaussian normal diffusion induced by delocalization
• Optimal Gaussian approximations to the posterior for log-linear models with Diaconis-Ylvisaker priors
• PAC Learning-Based Verification and Model Synthesis
• Learning Unfair Trading: a Market Manipulation Analysis From the Reinforcement Learning Perspective
• ProtNN: Fast and Accurate Nearest Neighbor Protein Function Prediction based on Graph Embedding in Structural and Topological Space
• Modeling of Stationary Periodic Time Series by ARMA Representations
• Irreducible triangulations of the punctured torus
• Regularized quantile regression under heterogeneous sparsity with application to quantitative genetic traits
• Galaxy-X: A Novel Approach for Multi-class Classification in an Open Universe
• Hypothesis Testing of Matrix Graph Model with Application to Brain Connectivity Analysis
• The Distributed Selection Problem and the AKS Sorting Network
• GL_n(F_q)-analogues of factorization problems in the symmetric group
• The 4/3 Additive Spanner Exponent is Tight
• Mixing time and eigenvalues of the abelian sandpile Markov chain
• Beating CountSketch for Heavy Hitters in Insertion Streams
• Discrete Riemann surfaces based on quadrilateral cellular decompositions
• Parameterized Algorithms for Constraint Satisfaction Problems Above Average with Global Cardinality Constraints
• Partial Functional Linear Quantile Regression for Neuroimaging Data Analysis
• Ball*-tree: Efficient spatial indexing for constrained nearest-neighbor search in metric spaces
• A correction on Shiloach’s algorithm for minimum linear arrangement of trees
• The Separated Box Product of Two Digraphs
• On the Number of Many-to-Many Alignments of N Sequences
• Entanglement properties of correlated random spin chains and its similarities with conformal invariant systems
• Particle jumps in structural glasses
• Maximal matchings in polyspiro and benzenoid chains
• On the Beck-Fiala Conjecture for Random Set Systems
• Proactive Demand Response for Data Centers: A Win-Win Solution
• From random walks to distances on unweighted graphs
• The stochastic logarithmic Schrödinger equation
• Hedgehogs are not colour blind
• SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
• Fourth Moment Theorems for complex Gaussian approximation
• An Impossibility Result for Reconstruction in a Degree-Corrected Planted-Partition Model
• Spiking Analog VLSI Neuron Assemblies as Constraint Satisfaction Problem Solvers
• Upper bounds for the achromatic and coloring numbers of a graph
• Measuring the Complexity of Continuous Distributions
• Inverse Problems in a Bayesian Setting
• Posterior Predictive P-values with Fisher Randomization Tests in Noncompliance Settings: Test Statistics vs Discrepancy Variables
• Estimation under cross-classified sampling with application to a childhood survey
• Uniqueness, Spatial Mixing, and Approximation for Ferromagnetic 2-Spin Systems
• Parallel Exhaustive Search without Coordination
• The basis of $2\times 2$ monotone grid classes
• Real Options and Threshold Strategies
• Structure theory of flip graphs with applications to Weak Symmetry Breaking
• Number of right ideals and a $q$-analogue of indecomposable permutations
• Abelian logic gates
• Box Resolvability
• Generating countable groups by discrete subsets
• Domination parameters with number 2
• Submodular Functions: from Discrete to Continous Domains
• Z Specification for the W3C Editor’s Draft Core SHACL Semantics
• Low Correlation Noise Stability of Symmetric Sets
• Two-Stage Penalized Least Squares Method for Constructing Large Systems of Structural Equations
• BinaryConnect: Training Deep Neural Networks with binary weights during propagations
• Automatic Prosody Prediction for Chinese Speech Synthesis using BLSTM-RNN and Embedding Features
• Beyond Degree Choosability
• Construction of multi-default models with full viability
• The Dynamic Splitting Method with an application to portfolio credit risk
• Interventions in Networks
• Primary Facets Of Order Polytopes
• Parameterized Integer Quadratic Programming: Variables and Coefficients
• A Schmid-Leiman based transformation resulting in perfect inter-correlations of three types of factor score predictors
• Kernel-Penalized Regression for Analysis of Microbiome Data
• Statistical Patterns of Darwinian Evolution
• Empirical eigenvalue based testing for structural breaks in linear panel data models
• Sparse Neural Codes and Convexity
• A New Reduced-Rank Linear Discriminant Analysis Method and Its Applications
• Calibrated Percentile Double Bootstrap For Robust Linear Regression Inference
• The Smith group of the hypercube
• Stochastic Top-k ListNet
• Problem collection from the IML programme: Graphs, Hypergraphs, and Computing
• A New Approach to Euler Calculus for Continuous Integrands
• Discrete time approximation of a COGARCH(p,q) model and its estimation
• New congruences for 2-color partitions
• Shortest Reconfiguration of Sliding Tokens on a Caterpillar
• Measure-Transformed Quasi Maximum Likelihood Estimation
• Unmixed r-partite graphs
• Linearity of regression for weak records, revisited
• LM-CMA: an Alternative to L-BFGS for Large Scale Black-box Optimization
• Large-scale probabilistic predictors with and without guarantees of validity
• Exploiting Redundant Computation in Communication-Avoiding Algorithms for Algorithm-Based Fault Tolerance
• Wong-Zakai Approximation through Rough Paths in the G-expectation Framework
• On backward stochastic differential equations driven by a family of Itô’s processes
• Pattern avoidance for set partitions à la Klazar
• Mixed stochastic differential equations: Existence and uniqueness result
• A Bijection on Classes Enumerated by the Schröder Numbers
• Free Energy Rate Density and Self-organization in Complex Systems
• Union-Free Families of Subsets
• On $\mathcal{D}$-equivalence classes of some graphs
• Support Vector Regression, Smooth Splines, and Time Series Prediction
• Bayesian Nonparametric Reconstruction and Prediction of Nonlinear Dynamic Systems with Geometric Stick Breaking Noise
• Convergence of Proximal-Gradient Stochastic Variational Inference under Non-Decreasing Step-Size Sequence
• On the optimal control of opinion dynamics on evolving networks
• Kummer and gamma laws through independences on trees – another parallel to the Matsumoto-Yor property
• Adjacency matrices of random digraphs: singularity and anti-concentration
• Recursive computation for evaluating the exact $p$-values of temporal and spatial scan statistics
• On weak uniqueness and distributional properties of a solution to an SDE with $α$-stable noise
• Sketch-based Image Retrieval from Millions of Images under Rotation, Translation and Scale Variations
• Why Neurons Have Thousands of Synapses, A Theory of Sequence Memory in Neocortex
• Pattern Avoidance in Task-Precedence Posets
• Limit distributions of sample covariance matrices are compound free Poisson
• The Pareto Regret Frontier for Bandits
• Learning Adversary Behavior in Security Games: A PAC Model Perspective
• Learning Causal Graphs with Small Interventions
• Quantifying the Cognitive Extent of Science
• Empirical Bayes prediction for the multivariate newsvendor loss function
• Pathwise no-arbitrage in a class of Delta hedging strategies
• Narrow Gauge and Analytical Branching Strategies for Mixed Integer Programming
Like this:
Like Loading...