New Design Criteria for Robust PCA and a Compliant Bayesian-Inspired Algorithm
Commonly used in computer vision and other applications, robust PCA represents an algorithmic attempt to reduce the sensitivity of classical PCA to outliers. The basic idea is to learn a decomposition of some data matrix of interest into low rank and sparse components, the latter representing unwanted outliers. Although the resulting optimization problem is typically NP-hard, convex relaxations provide a computationally-expedient alternative with theoretical support. However, in practical regimes performance guarantees break down and a variety of non-convex alternatives, including Bayesian-inspired models, have been proposed to boost estimation quality. Unfortunately though, without additional a priori knowledge none of these methods can significantly expand the critical operational range such that exact principal subspace recovery is possible. Into this mix we propose a novel pseudo-Bayesian algorithm that explicitly compensates for design weaknesses in many existing non-convex approaches leading to state-of-the-art performance with a sound analytical foundation.
The Teaching Dimension of Linear Learners
Teaching dimension is a learning theoretic quantity that specifies the minimum training set size to teach a target model to a learner. Previous studies on teaching dimension focused on version-space learners which maintain all hypotheses consistent with the training data, and cannot be applied to modern machine learners which select a specific hypothesis via optimization. This paper presents the first known teaching dimension for ridge regression, support vector machines, and logistic regression. We also exhibit optimal training sets that match these teaching dimensions. Our approach generalizes to other linear learners.
Clustering by Deep Nearest Neighbor Descent (D-NND): A Density-based Parameter-Insensitive Clustering Method
Most density-based clustering methods largely rely on how well the underlying density is estimated. However, density estimation itself is also a challenging problem, especially the determination of the kernel bandwidth. A large bandwidth could lead to the over-smoothed density estimation in which the number of density peaks could be less than the true clusters, while a small bandwidth could lead to the under-smoothed density estimation in which spurious density peaks, or called the ‘ripple noise’, would be generated in the estimated density. In this paper, we propose a density-based hierarchical clustering method, called the Deep Nearest Neighbor Descent (D-NND), which could learn the underlying density structure layer by layer and capture the cluster structure at the same time. The over-smoothed density estimation could be largely avoided and the negative effect of the under-estimated cases could be also largely reduced. Overall, D-NND presents not only the strong capability of discovering the underlying cluster structure but also the remarkable reliability due to its insensitivity to parameters.
Discriminative Nonparametric Latent Feature Relational Models with Data Augmentation
We present a discriminative nonparametric latent feature relational model (LFRM) for link prediction to automatically infer the dimensionality of latent features. Under the generic RegBayes (regularized Bayesian inference) framework, we handily incorporate the prediction loss with probabilistic inference of a Bayesian model; set distinct regularization parameters for different types of links to handle the imbalance issue in real networks; and unify the analysis of both the smooth logistic log-loss and the piecewise linear hinge loss. For the nonconjugate posterior inference, we present a simple Gibbs sampler via data augmentation, without making restricting assumptions as done in variational methods. We further develop an approximate sampler using stochastic gradient Langevin dynamics to handle large networks with hundreds of thousands of entities and millions of links, orders of magnitude larger than what existing LFRM models can process. Extensive studies on various real networks show promising performance.
How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies
Using deep neural nets as function approximator for reinforcement learning tasks have recently been shown to be very powerful for solving problems approaching real-world complexity. Using these results as a benchmark, we discuss the role that the discount factor may play in the quality of the learning process of a deep Q-network (DQN). When the discount factor progressively increases up to its final value, we empirically show that it is possible to significantly reduce the number of learning steps. When used in conjunction with a varying learning rate, we empirically show that it outperforms original DQN on several experiments. We relate this phenomenon with the instabilities of neural networks when they are used in an approximate Dynamic Programming setting. We also describe the possibility to fall within a local optimum during the learning process, thus connecting our discussion with the exploration/exploitation dilemma.
Jointly Modeling Topics and Intents with Global Order Structure
Modeling document structure is of great importance for discourse analysis and related applications. The goal of this research is to capture the document intent structure by modeling documents as a mixture of topic words and rhetorical words. While the topics are relatively unchanged through one document, the rhetorical functions of sentences usually change following certain orders in discourse. We propose GMM-LDA, a topic modeling based Bayesian unsupervised model, to analyze the document intent structure cooperated with order information. Our model is flexible that has the ability to combine the annotations and do supervised learning. Additionally, entropic regularization can be introduced to model the significant divergence between topics and intents. We perform experiments in both unsupervised and supervised settings, results show the superiority of our model over several state-of-the-art baselines.
A Novel Approach to Distributed Multi-Class SVM
With data sizes constantly expanding, and with classical machine learning algorithms that analyze such data requiring larger and larger amounts of computation time and storage space, the need to distribute computation and memory requirements among several computers has become apparent. Although substantial work has been done in developing distributed binary SVM algorithms and multi-class SVM algorithms individually, the field of multi-class distributed SVMs remains largely unexplored. This research proposes a novel algorithm that implements the Support Vector Machine over a multi-class dataset and is efficient in a distributed environment (here, Hadoop). The idea is to divide the dataset into half recursively and thus compute the optimal Support Vector Machine for this half during the training phase, much like a divide and conquer approach. While testing, this structure has been effectively exploited to significantly reduce the prediction time. Our algorithm has shown better computation time during the prediction phase than the traditional sequential SVM methods (One vs. One, One vs. Rest) and out-performs them as the size of the dataset grows. This approach also classifies the data with higher accuracy than the traditional multi-class algorithms.
Thinking Required
There exists a theory of a single general-purpose learning algorithm which could explain the principles its operation. It assumes the initial rough architecture, a small library of simple innate circuits which are prewired at birth. and proposes that all significant mental algorithms are learned. Given current understanding and observations, this paper reviews and lists the ingredients of such an algorithm from architectural and functional perspectives.
Variable selection with Hamming loss
A Benchmark Comparison of State-of-the-Practice Sentiment Analysis Methods
In the last few years thousands of scientific papers have explored sentiment analysis, several startups that measures opinions on real data have emerged, and a number of innovative products related to this theme have been developed. There are multiple methods for measuring sentiments, including lexical-based approaches and supervised machine learning methods. Despite the vast interest on the theme and wide popularity of some methods, it is unclear which method is better for identifying the polarity (i.e., positive or negative) of a message. Thus, there is a strong need to conduct a thorough apple-to-apple comparison of sentiment analysis methods, as they are used in practice, across multiple datasets originated from different data sources. Such a comparison is key for understanding the potential limitations, advantages, and disadvantages of popular methods. This study aims at filling this gap by presenting a benchmark comparison of twenty one popular sentiment analysis methods (which we call the state-of-the-practice methods). Our evaluation is based on a benchmark of twenty labeled datasets, covering messages posted on social networks, movie and product reviews, as well as opinions and comments in news articles. Our results highlight the extent to which the prediction performance of these methods varies widely across datasets. Aiming at boosting the development of this research area, we open the methods’ codes and datasets used in this paper and we deploy a benchmark system, which provides an open API for accessing and comparing sentence-level sentiment analysis methods.
Large Scale Distributed Semi-Supervised Learning Using Streaming Approximation
Generating News Headlines with Recurrent Neural Networks
We describe an application of an encoder-decoder recurrent neural network with LSTM units and attention to generating headlines from the text of news articles. We find that the model is quite effective at concisely paraphrasing news articles. Furthermore, we study how the neural network decides which input words to pay attention to, and specifically we identify the function of the different neurons in a simplified attention mechanism. Interestingly, our simplified attention mechanism performs better that the more complex attention mechanism on a held out set of articles.
Variance Reduction for Distributed Stochastic Gradient Descent
Variance reduction (VR) methods boost the performance of stochastic gradient descent (SGD) by enabling the use of larger stepsizes and preserving linear convergence rates. However, current variance reduced SGD methods require either high memory usage or require a full pass over the (large) data set at the end of each epoch to calculate the exact gradient of the objective function. This makes current VR methods impractical in distributed or parallel settings. In this paper, we propose a variance reduction method, called VR-lite, that does not require full gradient computations or extra storage. We explore distributed synchronous and asynchronous variants with both high and low communication latency. We find that our distributed algorithms scale linearly with the number of local workers and remain stable even with low communication frequency. We empirically compare both the sequential and distributed algorithms to state-of-the-art stochastic optimization methods, and find that our proposed algorithms consistently converge faster than other stochastic methods.
Deep Attention Recurrent Q-Network
A deep learning approach to reinforcement learning led to a general learner able to train on visual input to play a variety of arcade games at the human and superhuman levels. Its creators at the Google DeepMind’s team called the approach: Deep Q-Network (DQN). We present an extension of DQN by ‘soft’ and ‘hard’ attention mechanisms. Tests of the proposed Deep Attention Recurrent Q-Network (DARQN) algorithm on multiple Atari 2600 games show level of performance superior to that of DQN. Moreover, built-in attention mechanisms allow a direct online monitoring of the training process by highlighting the regions of the game screen the agent is focusing on when making decisions.
A Framework for Computing on Large Dynamic Graphs
This proposal presents a graph computing framework intending to support both online and offline computing on large dynamic graphs efficiently. The framework proposes a new data model to support rich evolving vertex and edge data types. It employs a replica-coherence protocol to improve data locality thus can adapt to data access patterns of different algorithms. A new computing model called protocol dataflow is proposed to implement and integrate various programming models for both online and offline computing on large dynamic graphs. A central topic of the proposal is also the analysis of large real dynamic graphs using our proposed framework. Our goal is to calculate the temporal patterns and properties which emerge when the large graphs keep evolving. Thus we can evaluate the capability of the proposed framework. Key words: Large dynamic graph, programming model, distributed computing.
Hierarchical Sparse Modeling: A Choice of Two Regularizers
Demanding sparsity in estimated models has become a routine practice in statistics. In many situations, we wish to demand that the sparsity patterns attained honor certain problem-specific constraints. Hierarchical sparse modeling (HSM) refers to situations in which these constraints specify that one set of parameters be set to zero whenever another is set to zero. In recent years, numerous papers have developed convex regularizers for this form of sparsity structure arising in areas including interaction modeling, time series, and covariance estimation. In this paper, we observe that these methods fall into two frameworks, which have not been systematically compared in the context of HSM. The purpose of this paper is to provide a side-by-side comparison of these two frameworks for HSM in terms of their statistical properties and computational efficiency. We call attention to a problem with the more commonly used framework and provide new insights into the other, which can greatly improve its computational performance. Finally, we compare the two methods in the context of covariance estimation, where we introduce a new sparsely-banded estimator, which we show achieves the statistical advantages of an existing method but is simpler to compute.
• A bijection for nonorientable general maps
• FabSim: facilitating computational research through automation on large-scale and distributed e-infrastructures
• Statics and dynamics of infinite-dimensional liquids and glasses: a parallel, compact derivation
• Wavelet Based Load Models from AMI Data
• Faster Information Gathering in Ad-Hoc Radio Tree Networks
• The Random Division of the Unit Interval and the Approximate -1 Exponent in the Monkey-at-the-Typewriter Model of Zipf’s Law
• Arcs in $\Z^2_{2p}$
• Higher Order Estimating Equations for High-dimensional Models
• Right-jumps and pattern avoiding permutations
• Simple Baseline for Visual Question Answering
• On star-forest ascending subgraph decomposition
• Finding $k$ Simple Shortest Paths and Cycles
• Limit theorems for Markovian Hawkes processes with a large initial intensity
• Multi-body quenched disordered $XY$ and $p$-clock models on random graphs
• A Time-varying Parameter Based Seasonally-adjusted Bayesian State-space Model for Forecasting
• A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation
• Extremal k-apex Trees for Randic Index
• Digital Genesis: Computers, Evolution and Artificial Life
• Piecewise deterministic Markov processes in biological models
• The multi-orientable random tensor model, a review
• From rules to runs: A dynamic epistemic take on imperfect information games
• Minimum Cut of Directed Planar Graphs in O(nloglogn) Time
• An Explicit Rate Bound for the Over-Relaxed ADMM
• Algorithmic independence of initial condition and dynamical law in thermodynamics and causal inference
• Computing the vertex Folkman numbers $F_v(a_1, …, a_s; m – 1)$ when $\max\{a_1, …, a_s\} = 6$
• Level-Based Analysis of Genetic Algorithms for Combinatorial Optimization
• Construction of Wannier Functions in Disordered Systems
• Risk Minimization in Structured Prediction using Orbit Loss
• Discrete Hilbert Transform a la Gundy-Varopoulos
• A bloated FM-index reducing the number of cache misses during the search
• On Konig-Egervary Collections of Maximum Critical Independent Sets
• Uniform contractivity in Wasserstein metric for the original 1D Kac’s model
• Filter Based Methods For Statistical Linear Inverse Problems
• Coloring points with respect to squares
• Learning population and subject-specific brain connectivity networks via Mixed Neighborhood Selection
• Subsets of $\mathbb{F}_q[x]$ free of 3-term geometric progressions
• Fast Optimization Algorithm on Riemannian Manifolds and Its Application in Low-Rank Representation
• Controllable Synchronization of Hierarchically Networked Oscillators
• Knowledge Sharing in Coalitions
• Rademacher Complexity of the Restricted Boltzmann Machine
• Entropy and Grand Lebesgue Spaces approach for Prokhorov-Skorokhod continuity of random processes, with tail estimates
• Bounds on bilinear inverse forms via Gaussian quadrature with applications
• Statistical Inference for Ergodic Point Processes and Limit Order Book
• Sparsified Cholesky and Multigrid Solvers for Connection Laplacians
• Probabilistic Structural Controllability in Causal Bayesian Networks
• Einstein relation and steady states for the random conductance model
• THCHS-30 : A Free Chinese Speech Corpus
• Driverseat: Crowdstrapping Learning Tasks for Autonomous Driving
• Statistical Signatures of Structural Organization: The case of long memory in renewal processes
• Explaining reviews and ratings with PACO: Poisson Additive Co-Clustering
• The functional AR(1) process with a unit root
• Designing Applications in a Hybrid Cloud
• Classification of Manifolds by Single-Layer Neural Networks
• Block Stanley decompositions I. Elementary and gnomon decompositions
• On Routing Disjoint Paths in Bounded Treewidth Graphs
• Fourientation activities and the Tutte polynomial
• Conformal Ward and BPZ Identities for Liouville quantum field theory
• The cycle descent statistic on permutations
• Distributed protocols for spanning tree construction and leader election
• On closedness under convolution roots related to an infinitely divisible distribution in the distribution class L(γ)
• Combinatorics of a fractal tiling family
• k-Trails: Recognition, Complexity, and Approximations
• Bayesian inference and model comparison for metallic fatigue data
• Want Answers? A Reddit Inspired Study on How to Pose Questions
• Fast Algorithms for Game-Theoretic Centrality Measures
• Restricted Low-Rank Approximation via ADMM
• On the Data Augmentation Algorithm for Bayesian Multivariate Linear Regression with Non-Gaussian Errors
• Absence of Localization in Disordered Two Dimensional Electron Gas at Weak Magnetic Field and Strong Spin-Orbit Coupling
• The Propus Construction for Symmetric Hadamard Matrices
• Similarity Learning via Adaptive Regression and Its Application to Image Retrieval
• Solving the Subset Sum Problem with Heap-Ordered Subset Trees
• Relative t-designs in binary Hamming association scheme H(n,2)
• Elliptic rook and file numbers
• Twisted patterns in large subsets of $\mathbb{Z}^N$
• A Restricted Visual Turing Test for Deep Scene and Event Understanding
• Approximate Range Counting Revisited
• Locating recombination hot spots in genomic sequences through the singular value decomposition
• Restricted Product Sets under Unique Representability
• Stochastic Collapsed Variational Inference for Sequential Data
• Stochastic Collapsed Variational Inference for Hidden Markov Models
• Unsupervised comparable corpora preparation and exploration for bi-lingual translation equivalents
• PJAIT Systems for the IWSLT 2015 Evaluation Campaign Enhanced by Comparable Corpora
• The mechanics of shuffle products and their siblings
• Robust Estimation of the Generalized Loggamma Model. The R Package robustloggamma
• Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
• Exact Algorithms via Monotone Local Search
• Quasi likelihood analysis of point processes for ultra high frequency data
• Guarding from Spurious Discoveries in High Dimension
• A Bayesian test to identify variance effects
• A Novel Paradigm for Calculating Ramsey Number via Artificial Bee Colony Algorithm
• The transition probability of the $q$-TAZRP ($q$-Bosons) with inhomogeneous jump rates
• Data Center Server Provision: Distributed Asynchronous Control for Coupled Renewal Systems
• Polynomial bounds for decoupling, with applications
• Pruned double Hurwitz numbers
• Creation of a Deep Convolutional Auto-Encoder in Caffe
• Extracting Biomolecular Interactions Using Semantic Parsing of Biomedical Text
• Stochastic epidemic models featuring contact tracing with delays
• Negative local feedbacks in Boolean networks
• State of the Art Control of Atari Games Using Shallow Reinforcement Learning
Like this:
Like Loading...