We consider the multiple breakpoint detection problem, which is concerned with detecting the locations of several distinct changes in a one-dimensional noisy data series. We propose the breakpointError, a function that can be used to evaluate estimated breakpoint locations, given the known locations of true breakpoints. We discuss an application of the breakpointError for finding optimal penalties for breakpoint detection in simulated data. Finally, we show how to relax the breakpointError to obtain an annotation error function which can be used more readily in practice on real data. A fast C implementation of an algorithm that computes the breakpointError is available in an R package on R-Forge.
With the rapid growth of multimedia services and the enormous supply of video content in online social networks, users have difficulty finding content that matches their interests. Therefore, various personalized recommendation systems have been proposed. However, they ignore that the accelerated proliferation of social media data has led to the big data era, which has greatly impeded the process of video recommendation. In addition, none of them has considered both the privacy of users’ contexts (e.g., social status, ages, and hobbies) and video service vendors’ repositories, which are extremely sensitive and of significant commercial value. To handle these problems, we propose a cloud-assisted differentially private video recommendation system based on distributed online learning. In our framework, service vendors are modeled as distributed cooperative learners, recommending videos according to each user’s context, while simultaneously adapting the video-selection strategy based on user-click feedback to maximize total user clicks (reward). Considering the sparsity and heterogeneity of big social media data, we also propose a novel \emph{geometric differentially private} model, which can greatly reduce the performance (recommendation accuracy) loss. Our simulations show that the proposed algorithms outperform existing methods and strike a delicate balance between computing accuracy and privacy-preserving level.
The integration of Linked Open Data (LOD) content in Web pages is a challenging and sometimes tedious task for Web developers. At the same time, most software packages for blogs, content management systems (CMS), and shop applications support the consumption of feed formats, namely RSS and Atom. In this technical report, we demonstrate an online tool that fetches e-commerce data from a SPARQL endpoint and syndicates the obtained results as RSS or Atom feeds. Our approach combines (1) the popularity and broad tooling support of existing feed formats, (2) the precision of queries against structured data built upon common Web vocabularies like schema.org, GoodRelations, FOAF, VCard, and WGS 84, and (3) the ease of integrating content from a large number of Web sites and other data sources in RDF in general.
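The syndication step can be illustrated with a minimal sketch, assuming the query results arrive in the standard SPARQL JSON results layout. This is not the tool described above: the function name `sparql_json_to_rss`, the variable names `name`, `url`, and `price`, and the sample data are all hypothetical, and no endpoint is contacted.

```python
import xml.etree.ElementTree as ET


def sparql_json_to_rss(results, title, link, description):
    """Turn SPARQL JSON results into an RSS 2.0 document (returned as a string).

    Assumes each binding row has hypothetical variables ?name and ?url,
    and optionally ?price.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link, and description on the channel.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description

    for row in results["results"]["bindings"]:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = row["name"]["value"]
        ET.SubElement(item, "link").text = row["url"]["value"]
        if "price" in row:  # unbound variables are simply absent from a row
            ET.SubElement(item, "description").text = "Price: " + row["price"]["value"]

    return ET.tostring(rss, encoding="unicode")


# Hypothetical result set, shaped like a SPARQL SELECT response in JSON.
sample = {
    "head": {"vars": ["name", "url", "price"]},
    "results": {
        "bindings": [
            {
                "name": {"type": "literal", "value": "Widget A"},
                "url": {"type": "uri", "value": "http://example.org/offers/a"},
                "price": {"type": "literal", "value": "9.99 EUR"},
            },
            {
                "name": {"type": "literal", "value": "Widget B"},
                "url": {"type": "uri", "value": "http://example.org/offers/b"},
            },
        ]
    },
}

feed_xml = sparql_json_to_rss(
    sample, "Example offers", "http://example.org/", "Offers returned by a SPARQL query"
)
```

Any feed-aware CMS or blog package could then consume `feed_xml` without knowing that the items originated from an RDF source.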
While deep networks have proven highly effective in a wide range of applications, little effort has been spent on studying their potential in clustering. In this paper, we argue that the successful domain expertise of sparse coding in clustering is still valuable, and can be combined with the key ingredients of deep learning. A novel feed-forward architecture, named TAG-LISTA, is constructed from graph-regularized sparse coding. It is then trained with task-specific loss functions from end to end. The inner connections of the proposed network to sparse coding lead to more effective training. Moreover, by introducing auxiliary clustering tasks to the hierarchy of intermediate features, we present DTAG-LISTA and obtain a further performance boost. We conduct extensive experiments on several benchmark datasets, under a wide variety of settings. The results verify that the proposed model significantly outperforms generic architectures of the same parameter capacity, and also gains remarkable margins over several state-of-the-art methods.
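For readers unfamiliar with how a feed-forward network can be derived from sparse coding, the generic (non-graph-regularized) LISTA construction is sketched below. It is background only and not the TAG-LISTA architecture itself; the symbols $D$ (dictionary), $x$ (input), $a$ (sparse code), $c$, and $\theta$ are notation assumed here. ISTA iterates a gradient step followed by soft-thresholding, and LISTA unrolls a fixed number of these iterations into layers whose matrices $W$, $S$ and threshold $\theta$ are learned.

```latex
% ISTA for  \min_a \tfrac{1}{2}\|x - D a\|_2^2 + \lambda \|a\|_1 ,
% with c at least the largest eigenvalue of D^\top D:
a^{(k+1)} = h_{\lambda/c}\!\left( a^{(k)} + \tfrac{1}{c} D^{\top}\bigl(x - D a^{(k)}\bigr) \right),
\qquad
[h_{\theta}(u)]_i = \operatorname{sign}(u_i)\,\max\bigl(|u_i| - \theta,\, 0\bigr).

% Unrolled (LISTA-style) layer, with W, S, \theta treated as trainable parameters:
a^{(k+1)} = h_{\theta}\!\left( W x + S a^{(k)} \right),
\qquad
W_{\text{init}} = \tfrac{1}{c} D^{\top}, \quad
S_{\text{init}} = I - \tfrac{1}{c} D^{\top} D, \quad
\theta_{\text{init}} = \tfrac{\lambda}{c}.
```

Initializing the layers from the sparse coding solver is what ties such a network to its originating model.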
We present a framework for supervised subspace tracking, when there are two time series, one being the high-dimensional predictors and the other being the response variables, and the subspace tracking needs to take both sequences into consideration. It extends the classic online subspace tracking work, which can be viewed as tracking the predictor sequence only. Our online sufficient dimensionality reduction (OSDR) is a meta-algorithm that can be applied to various cases including linear regression, logistic regression, multiple linear regression, multinomial logistic regression, support vector machine, the random dot product model and the multi-scale union-of-subspace model. OSDR reduces data dimensionality on the fly with low computational complexity, and it can also handle missing data and dynamic data. OSDR uses an alternating minimization scheme and updates the subspace via gradient descent on the Grassmannian manifold. The subspace update can be performed efficiently by utilizing the fact that the Grassmannian gradient with respect to the subspace in many settings is rank-one (or low-rank in certain cases). The optimization problem for OSDR is non-convex and hard to analyze in general; we provide convergence analysis of OSDR in a simple linear regression setting. The good performance of OSDR compared with conventional unsupervised subspace tracking is demonstrated via numerical examples on simulated and real data.
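To make the rank-one Grassmannian update concrete, here is a minimal sketch of the classic unsupervised step (in the style of GROUSE) on a fully observed sample. It is not OSDR: the response sequence, the alternating minimization, and missing data are all left out, and the function names and the column-list representation of the basis are assumptions made for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def project(U, x):
    """Return (w, p, r): weights U^T x, projection U w, and residual x - U w.

    U is a list of d orthonormal columns, each a list of length n.
    """
    w = [dot(col, x) for col in U]
    p = [sum(w[k] * U[k][i] for k in range(len(U))) for i in range(len(x))]
    r = [xi - pi for xi, pi in zip(x, p)]
    return w, p, r


def grassmannian_step(U, x, eta):
    """One geodesic step on the Grassmannian toward the sample x.

    The update U + g w^T / ||w|| is rank-one, so it costs O(n d).
    """
    w, p, r = project(U, x)
    norm_w, norm_p, norm_r = norm(w), norm(p), norm(r)
    if norm_w < 1e-12 or norm_r < 1e-12:
        # x is orthogonal to the subspace or already inside it: no update.
        return [col[:] for col in U]
    theta = eta * norm_r * norm_p  # rotation angle along the geodesic
    g = [
        (math.cos(theta) - 1.0) * pi / norm_p + math.sin(theta) * ri / norm_r
        for pi, ri in zip(p, r)
    ]
    return [
        [U[k][i] + g[i] * w[k] / norm_w for i in range(len(x))]
        for k in range(len(U))
    ]
```

Because the step moves along a geodesic, the columns of the returned basis remain orthonormal without any re-orthogonalization, and for a small enough step size `eta` the residual of `x` with respect to the new subspace shrinks.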
We propose a novel value function approximation technique for Markov decision processes. We consider the problem of compactly representing the state-action value function using a low-rank and sparse matrix model. The problem is to decompose a matrix that encodes the true value function into low-rank and sparse components, and we achieve this using Robust Principal Component Analysis (PCA). Under minimal assumptions, this Robust PCA problem can be solved exactly via the Principal Component Pursuit convex optimization problem. We test the procedure on several examples and demonstrate that our method yields approximations essentially identical to the true function.
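For reference, Principal Component Pursuit in its standard form is written out below. The symbols are assumptions for illustration: $M \in \mathbb{R}^{n_1 \times n_2}$ stands for the matrix encoding the value function (how states and actions are laid out in it is not specified here), $L$ is the low-rank component, and $S$ is the sparse component.

```latex
\begin{aligned}
\min_{L,\,S}\quad & \|L\|_{*} + \lambda\,\|S\|_{1} \\
\text{subject to}\quad & L + S = M,
\end{aligned}
\qquad
\lambda = \frac{1}{\sqrt{\max(n_1, n_2)}}
```

Here $\|L\|_{*}$ is the nuclear norm (the sum of singular values), $\|S\|_{1}$ is the entrywise $\ell_1$ norm, and the value of $\lambda$ shown is the usual default from the Robust PCA literature rather than a setting confirmed for these experiments.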