Context-Based Prediction of App Usage
In this paper we propose a new algorithm for dynamically predicting a set of apps that the user is likely to use. A set of app icons (usually four) are presented on a special dock, called Prediction Bar, and dynamically change according to the user’s habits at a given time, location, and device state (headphone connected, bluetooth active, and so on). The goal of the algorithm is, given the context information, to actively help the user navigate to the desired app as well as to provide a personalized feeling. We propose an efficient online algorithm that is executed on the device, which is based on the Passive-Aggressive online algorithmic framework, adapted to maximize the AUC at each round. We concluded the paper with a large scale empirical study on the performance of the algorithm on a real users’ data.
Mining Top-K Co-Occurrence Items
Frequent itemset mining has emerged as a fundamental problem in data mining and plays an important role in many data mining tasks, such as association analysis, classification, etc. In the framework of frequent itemset mining, the results are itemsets that are frequent in the whole database. However, in some applications, such recommendation systems and social networks, people are more interested in finding out the items that occur with some user-specified itemsets (query itemsets) most frequently in a database. In this paper, we address the problem by proposing a new mining task named top-k co-occurrence item mining, where k is the desired number of items to be found. Four baseline algorithms are presented first. Then, we introduce a special data structure named Pi-Tree (Prefix itemset Tree) to maintain the information of itemsets. Based on Pi-Tree, we propose two algorithms, namely PT (Pi-Tree-based algorithm) and PT-TA (Pi-Tree-based algorithm with TA pruning), for mining top-k co-occurrence items by incorporating several novel strategies for pruning the search space to achieve high efficiency. The performance of PT and PT-TA was evaluated against the four proposed baseline algorithms on both synthetic and real databases. Extensive experiments show that PT not only outperforms other algorithms substantially in terms execution time but also has excellent scalability.
RDF2Rules: Learning Rules from RDF Knowledge Bases by Mining Frequent Predicate Cycles
Recently, several large-scale RDF knowledge bases have been built and applied in many knowledge-based applications. To further increase the number of facts in RDF knowledge bases, logic rules can be used to predict new facts based on the existing ones. Therefore, how to automatically learn reliable rules from large-scale knowledge bases becomes increasingly important. In this paper, we propose a novel rule learning approach named RDF2Rules for RDF knowledge bases. RDF2Rules first mines frequent predicate cycles (FPCs), a kind of interesting frequent patterns in knowledge bases, and then generates rules from the mined FPCs. Because each FPC can produce multiple rules, and effective pruning strategy is used in the process of mining FPCs, RDF2Rules works very efficiently. Another advantage of RDF2Rules is that it uses the entity type information when generates and evaluates rules, which makes the learned rules more accurate. Experiments show that our approach outperforms the compared approach in terms of both efficiency and accuracy.
Fast Parallel SVM using Data Augmentation
As one of the most popular classifiers, linear SVMs still have challenges in dealing with very large-scale problems, even though linear or sub-linear algorithms have been developed recently on single machines. Parallel computing methods have been developed for learning large-scale SVMs. However, existing methods rely on solving local sub-optimization problems. In this paper, we develop a novel parallel algorithm for learning large-scale linear SVM. Our approach is based on a data augmentation equivalent formulation, which casts the problem of learning SVM as a Bayesian inference problem, for which we can develop very efficient parallel sampling methods. We provide empirical results for this parallel sampling SVM, and provide extensions for SVR, non-linear kernels, and provide a parallel implementation of the Crammer and Singer model. This approach is very promising in its own right, and further is a very useful technique to parallelize a broader family of general maximum-margin models.
Composite Bayesian inference
This paper revisits the concept of composite likelihood from the perspective of probabilistic inference, and proposes a generalization called ‘super composite likelihood’ for more flexible use of data information. It is shown that super composite likelihood yields a class of discriminative models suitable for unsupervised or weakly supervised learning.
k-Means Clustering Is Matrix Factorization
We show that the objective function of conventional k-means clustering can be expressed as the Frobenius norm of the difference of a data matrix and a low rank approximation of that data matrix. In short, we show that k-means clustering is a matrix factorization problem. These notes are meant as a reference and intended to provide a guided tour towards a result that is often mentioned but seldom made explicit in the literature.
Exploiting Hierarchy for Ranking-based Recommendation
The purpose of this master’s thesis is to study and develop a new algorithmic framework for collaborative filtering (CF) to generate recommendations. The method we propose is based on the exploitation of the hierarchical structure of the item space and intuitively ‘stands’ on the property of Near Complete Decomposability (NCD) which is inherent in the structure of the majority of hierarchical systems. Building on the intuition behind the NCDawareRank algorithm and its related concept of NCD proximity, we model our system in a way that illuminates its endemic characteristics and we propose a new algorithmic framework for recommendations, called HIR. We focus on combining the direct with the NCD ‘neighborhoods’ of items to achieve better characterization of the inter-item relations, in order to improve the quality of recommendations and alleviate sparsity related problems.
• Deterministic limit of mean field games associated with nonlinear Markov processes
• An unsupervised spatiotemporal graphical modeling approach to anomaly detection in distributed CPS
• Ramsey goodness of paths
• Pull-based load distribution among heterogeneous parallel servers: the case of multiple routers
• On the diameter of lattice polytopes
• The Evolving Voter Model on Thick Graphs
• Fatou’s theorem for subordinate Brownian motions with Gaussian components on $C^{1,1}$ open sets
• Bellman equation and viscosity solutions for mean-field stochastic control problem
• On graphs decomposable into induced matchings of linear sizes
• Narrowing the gap in the clique-width dichotomy for $(H_1,H_2)$-free graphs
• Tail waiting times and the extremes of stochastic processes
• Choosability with union separation
• A class of growth models rescaling to KPZ
• Visualizations Relevant to The User By Multi-View Latent Variable Factorization
• RFP: A Remote Fetching Paradigm for RDMA-Accelerated Systems
• Fast and compact self-stabilizing verification, computation, and fault detection of an MST
• The Lovász Hinge: A Convex Surrogate for Submodular Losses
• Needlet approximation for isotropic random fields on the sphere
• Hardware Architecture for Large Parallel Array of Random Feature Extractors applied to Image Recognition
• On the symplectic integration of the discrete nonlinear Schrödinger equation with disorder
• Achievable Performance of Blind Policies in Heavy Traffic
• Volume of Hypercubes Clipped by Hyperplanes and Combinatorial Identities
• Hangable Graph
• Directed rooted forests in higher dimension
• Real-Time Audio-to-Score Alignment of Music Performances Containing Errors and Arbitrary Repeats and Skips
• Constructive and analytic enumeration of circulant graphs with $p^3$ vertices; $p=3,5$
• Walk on Spheres Algorithm for Helmholtz and Yukawa Equations via Duffin Correspondence
• Measuring pattern retention in anonymized data — where one measure is not enough
• Linear Size Constant-Composition Codes
• Multivariate Output Analysis for Markov chain Monte Carlo
• Abelian duality and propagation of resonance
• Branching processes in a Lévy random environment
• Service Choreography, SBVR, and Time
• A Constraint-based Approach for Generating Transformation Patterns
• Reinforcement Learning in Large Discrete Action Spaces
• Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks
• High-Order Stochastic Gradient Thermostats for Bayesian Learning of Deep Models
• The number of Hamiltonian decompositions of regular graphs
• The Max $K$-Armed Bandit: PAC Lower Bounds and Efficient Algorithms
• Satisficing in multi-armed bandit problems
• Representation and Coding of Signal Geometry
• Coset Construction for Subspace Codes
• Single-index copulae
• Uniformly Valid Post-Regularization Confidence Regions for Many Functional Parameters in Z-Estimation Framework
• Dynamic social networks based on movement
• Emergent flat bands and many-body localization in a doped Mott insulator
• The difference and ratio of the fractional matching number and the matching number of graphs
• Regularity of distributions of Wigner integrals
• Randomized Social Choice Functions Under Metric Preferences
• An Inertial Latent-Variable Sequence Model
• Decomposition spaces, incidence algebras and Möbius inversion III: the decomposition space of Möbius intervals
• Decomposition spaces, incidence algebras and Möbius inversion II: completeness, length filtration, and finiteness
• Projective Networks: Topologies for Large Parallel Computer Systems
• Decomposition spaces, incidence algebras and Möbius inversion I: basic theory
• Bayesian smoothing and estimation of functional data with approximation by basis functions
• Universal Prediction Distribution for Surrogate Models
• $J$-Hermitian determinantal point processes: balanced rigidity and balanced Palm equivalence
• Optimizing the Number of Gates in Quantum Search
• Max-linear models on directed acyclic graphs
• Orthogonal apartments in Hilbert Grassmannians. Finite-dimensional case
• On the structure of dominating graphs
• Low Energy Properties of the Kondo chain in the RKKY regime
• A Quadratic Assignment Formulation of the Graph Edit Distance
• Selecting the top-quality item through crowd scoring
• Random Point Sets on the Sphere — Hole Radii, Covering, and Separation
• Maximum scattered linear sets and complete caps in Galois spaces
• Reduction rules for the maximum parsimony distance on phylogenetic trees
• Evaluation-as-a-Service: Overview and Outlook
• Emulation of multivariate simulators using thin-plate splines with application to atmospheric dispersion
• Interacting behavior and emerging complexity
• The lateral transhipment problem with a-priori routes, and a lot sizing application
• Adaptive Ensemble Learning with Confidence Bounds
• Smooth estimation of a monotone hazard and a monotone density under random censoring
• $T$-optimal discriminating designs for Fourier regression models
• QuickProbs 2: towards rapid construction of high-quality alignments of large protein families
• The ERA of FOLE: Foundation
• Adaptive Algorithms for Online Convex Optimization with Long-term Constraints
• The local metric dimension of subgraph-amalgamation of graphs
• Multivariate approximation in total variation
Like this:
Like Loading...