Context-Based Prediction of App Usage

In this paper we propose a new algorithm for dynamically predicting a set of apps that the user is likely to use. A set of app icons (usually four) are presented on a special dock, called Prediction Bar, and dynamically change according to the user’s habits at a given time, location, and device state (headphone connected, bluetooth active, and so on). The goal of the algorithm is, given the context information, to actively help the user navigate to the desired app as well as to provide a personalized feeling. We propose an efficient online algorithm that is executed on the device, which is based on the Passive-Aggressive online algorithmic framework, adapted to maximize the AUC at each round. We concluded the paper with a large scale empirical study on the performance of the algorithm on a real users’ data.


Mining Top-K Co-Occurrence Items

Frequent itemset mining has emerged as a fundamental problem in data mining and plays an important role in many data mining tasks, such as association analysis, classification, etc. In the framework of frequent itemset mining, the results are itemsets that are frequent in the whole database. However, in some applications, such recommendation systems and social networks, people are more interested in finding out the items that occur with some user-specified itemsets (query itemsets) most frequently in a database. In this paper, we address the problem by proposing a new mining task named top-k co-occurrence item mining, where k is the desired number of items to be found. Four baseline algorithms are presented first. Then, we introduce a special data structure named Pi-Tree (Prefix itemset Tree) to maintain the information of itemsets. Based on Pi-Tree, we propose two algorithms, namely PT (Pi-Tree-based algorithm) and PT-TA (Pi-Tree-based algorithm with TA pruning), for mining top-k co-occurrence items by incorporating several novel strategies for pruning the search space to achieve high efficiency. The performance of PT and PT-TA was evaluated against the four proposed baseline algorithms on both synthetic and real databases. Extensive experiments show that PT not only outperforms other algorithms substantially in terms execution time but also has excellent scalability.


RDF2Rules: Learning Rules from RDF Knowledge Bases by Mining Frequent Predicate Cycles

Recently, several large-scale RDF knowledge bases have been built and applied in many knowledge-based applications. To further increase the number of facts in RDF knowledge bases, logic rules can be used to predict new facts based on the existing ones. Therefore, how to automatically learn reliable rules from large-scale knowledge bases becomes increasingly important. In this paper, we propose a novel rule learning approach named RDF2Rules for RDF knowledge bases. RDF2Rules first mines frequent predicate cycles (FPCs), a kind of interesting frequent patterns in knowledge bases, and then generates rules from the mined FPCs. Because each FPC can produce multiple rules, and effective pruning strategy is used in the process of mining FPCs, RDF2Rules works very efficiently. Another advantage of RDF2Rules is that it uses the entity type information when generates and evaluates rules, which makes the learned rules more accurate. Experiments show that our approach outperforms the compared approach in terms of both efficiency and accuracy.


Fast Parallel SVM using Data Augmentation

As one of the most popular classifiers, linear SVMs still have challenges in dealing with very large-scale problems, even though linear or sub-linear algorithms have been developed recently on single machines. Parallel computing methods have been developed for learning large-scale SVMs. However, existing methods rely on solving local sub-optimization problems. In this paper, we develop a novel parallel algorithm for learning large-scale linear SVM. Our approach is based on a data augmentation equivalent formulation, which casts the problem of learning SVM as a Bayesian inference problem, for which we can develop very efficient parallel sampling methods. We provide empirical results for this parallel sampling SVM, and provide extensions for SVR, non-linear kernels, and provide a parallel implementation of the Crammer and Singer model. This approach is very promising in its own right, and further is a very useful technique to parallelize a broader family of general maximum-margin models.


Composite Bayesian inference

This paper revisits the concept of composite likelihood from the perspective of probabilistic inference, and proposes a generalization called ‘super composite likelihood’ for more flexible use of data information. It is shown that super composite likelihood yields a class of discriminative models suitable for unsupervised or weakly supervised learning.


k-Means Clustering Is Matrix Factorization

We show that the objective function of conventional k-means clustering can be expressed as the Frobenius norm of the difference of a data matrix and a low rank approximation of that data matrix. In short, we show that k-means clustering is a matrix factorization problem. These notes are meant as a reference and intended to provide a guided tour towards a result that is often mentioned but seldom made explicit in the literature.


Exploiting Hierarchy for Ranking-based Recommendation

The purpose of this master’s thesis is to study and develop a new algorithmic framework for collaborative filtering (CF) to generate recommendations. The method we propose is based on the exploitation of the hierarchical structure of the item space and intuitively ‘stands’ on the property of Near Complete Decomposability (NCD) which is inherent in the structure of the majority of hierarchical systems. Building on the intuition behind the NCDawareRank algorithm and its related concept of NCD proximity, we model our system in a way that illuminates its endemic characteristics and we propose a new algorithmic framework for recommendations, called HIR. We focus on combining the direct with the NCD ‘neighborhoods’ of items to achieve better characterization of the inter-item relations, in order to improve the quality of recommendations and alleviate sparsity related problems.


Deterministic limit of mean field games associated with nonlinear Markov processes

An unsupervised spatiotemporal graphical modeling approach to anomaly detection in distributed CPS

Ramsey goodness of paths

Pull-based load distribution among heterogeneous parallel servers: the case of multiple routers

On the diameter of lattice polytopes

The Evolving Voter Model on Thick Graphs

Fatou’s theorem for subordinate Brownian motions with Gaussian components on $C^{1,1}$ open sets

Bellman equation and viscosity solutions for mean-field stochastic control problem

On graphs decomposable into induced matchings of linear sizes

Narrowing the gap in the clique-width dichotomy for $(H_1,H_2)$-free graphs

Tail waiting times and the extremes of stochastic processes

Choosability with union separation

A class of growth models rescaling to KPZ

Visualizations Relevant to The User By Multi-View Latent Variable Factorization

RFP: A Remote Fetching Paradigm for RDMA-Accelerated Systems

Fast and compact self-stabilizing verification, computation, and fault detection of an MST

The Lovász Hinge: A Convex Surrogate for Submodular Losses

Needlet approximation for isotropic random fields on the sphere

Hardware Architecture for Large Parallel Array of Random Feature Extractors applied to Image Recognition

On the symplectic integration of the discrete nonlinear Schrödinger equation with disorder

Achievable Performance of Blind Policies in Heavy Traffic

Volume of Hypercubes Clipped by Hyperplanes and Combinatorial Identities

Hangable Graph

Directed rooted forests in higher dimension

Real-Time Audio-to-Score Alignment of Music Performances Containing Errors and Arbitrary Repeats and Skips

Constructive and analytic enumeration of circulant graphs with $p^3$ vertices; $p=3,5$

Walk on Spheres Algorithm for Helmholtz and Yukawa Equations via Duffin Correspondence

Measuring pattern retention in anonymized data — where one measure is not enough

Linear Size Constant-Composition Codes

Multivariate Output Analysis for Markov chain Monte Carlo

Abelian duality and propagation of resonance

Branching processes in a Lévy random environment

Service Choreography, SBVR, and Time

A Constraint-based Approach for Generating Transformation Patterns

Reinforcement Learning in Large Discrete Action Spaces

Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks

High-Order Stochastic Gradient Thermostats for Bayesian Learning of Deep Models

The number of Hamiltonian decompositions of regular graphs

The Max $K$-Armed Bandit: PAC Lower Bounds and Efficient Algorithms

Satisficing in multi-armed bandit problems

Representation and Coding of Signal Geometry

Coset Construction for Subspace Codes

Single-index copulae

Uniformly Valid Post-Regularization Confidence Regions for Many Functional Parameters in Z-Estimation Framework

Dynamic social networks based on movement

Emergent flat bands and many-body localization in a doped Mott insulator

The difference and ratio of the fractional matching number and the matching number of graphs

Regularity of distributions of Wigner integrals

Randomized Social Choice Functions Under Metric Preferences

An Inertial Latent-Variable Sequence Model

Decomposition spaces, incidence algebras and Möbius inversion III: the decomposition space of Möbius intervals

Decomposition spaces, incidence algebras and Möbius inversion II: completeness, length filtration, and finiteness

Projective Networks: Topologies for Large Parallel Computer Systems

Decomposition spaces, incidence algebras and Möbius inversion I: basic theory

Bayesian smoothing and estimation of functional data with approximation by basis functions

Universal Prediction Distribution for Surrogate Models

$J$-Hermitian determinantal point processes: balanced rigidity and balanced Palm equivalence

Optimizing the Number of Gates in Quantum Search

Max-linear models on directed acyclic graphs

Orthogonal apartments in Hilbert Grassmannians. Finite-dimensional case

On the structure of dominating graphs

Low Energy Properties of the Kondo chain in the RKKY regime

A Quadratic Assignment Formulation of the Graph Edit Distance

Selecting the top-quality item through crowd scoring

Random Point Sets on the Sphere — Hole Radii, Covering, and Separation

Maximum scattered linear sets and complete caps in Galois spaces

Reduction rules for the maximum parsimony distance on phylogenetic trees

Evaluation-as-a-Service: Overview and Outlook

Emulation of multivariate simulators using thin-plate splines with application to atmospheric dispersion

Interacting behavior and emerging complexity

The lateral transhipment problem with a-priori routes, and a lot sizing application

Adaptive Ensemble Learning with Confidence Bounds

Smooth estimation of a monotone hazard and a monotone density under random censoring

$T$-optimal discriminating designs for Fourier regression models

QuickProbs 2: towards rapid construction of high-quality alignments of large protein families

The ERA of FOLE: Foundation

Adaptive Algorithms for Online Convex Optimization with Long-term Constraints

The local metric dimension of subgraph-amalgamation of graphs

Multivariate approximation in total variation