Horseshoe Regularization
Feature subset selection arises in many high-dimensional applications in machine learning and statistics, such as compressed sensing and genomics. The $\ell_0$ penalty is ideal for this task, the caveat being it requires the NP-hard combinatorial evaluation of all models. A recent area of considerable interest is to develop efficient algorithms to fit models with a non-convex $\ell_\gamma$ penalty for $\gamma\in (0,1)$, which results in sparser models than the convex $\ell_1$ or lasso penalty, but is harder to fit. We propose an alternative, termed the horseshoe regularization penalty for feature subset selection, and demonstrate its theoretical and computational advantages. The distinguishing feature from existing non-convex optimization approaches is a full probabilistic representation of the penalty as the negative of the logarithm of a suitable prior, which in turn enables an efficient expectation-maximization algorithm for optimization and MCMC for uncertainty quantification. In synthetic and real data, the resulting algorithm provides better statistical performance, and the computation requires a fraction of time of state of the art non-convex solvers. …
Multi-Agent Inverse Reinforcement Learning (MIRL)
Learning the reward function of an agent by observing its behavior is termed inverse reinforcement learning and has applications in learning from demonstration or apprenticeship learning. We introduce the problem of multiagent inverse reinforcement learning, where reward functions of multiple agents are learned by observing their uncoordinated behavior. A centralized controller then learns to coordinate their behavior by optimizing a weighted sum of reward functions of all the agents. We evaluate our approach on a traffic-routing domain, in which a controller coordinates actions of multiple traffic signals to regulate traffic density. We show that the learner is not only able to match but even significantly outperform the expert.
Multi-agent Inverse Reinforcement Learning for General-sum Stochastic Games …
Bloom Filter
A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not, thus a Bloom filter has a 100% recall rate. In other words, a query returns either ‘possibly in set’ or ‘definitely not in set’. Elements can be added to the set, but not removed (though this can be addressed with a ‘counting’ filter). The more elements that are added to the set, the larger the probability of false positives. Bloom proposed the technique for applications where the amount of source data would require an impracticably large hash area in memory if ‘conventional’ error-free hashing techniques were applied. He gave the example of a hyphenation algorithm for a dictionary of 500,000 words, out of which 90% follow simple hyphenation rules, but the remaining 10% require expensive disk accesses to retrieve specific hyphenation patterns. With sufficient core memory, an error-free hash could be used to eliminate all unnecessary disk accesses; on the other hand, with limited core memory, Bloom’s technique uses a smaller hash area but still eliminates most unnecessary accesses. For example, a hash area only 15% of the size needed by an ideal error-free hash still eliminates 85% of the disk accesses (Bloom (1970)).
Role of Bloom Filter in Big Data Research: A Survey …
Semi-Orthogonal Non-Negative Matrix Factorization (semi-orthogonal NMF)
Non-negative Matrix Factorization (NMF) is a popular clustering and dimension reduction method by decomposing a non-negative matrix into the product of two lower dimension matrices composed of basis vectors. In this paper, we propose a semi-orthogonal NMF method that enforces one of the matrices to be orthogonal with mixed signs, thereby guarantees the rank of the factorization. Our method preserves strict orthogonality by implementing the Cayley transformation to force the solution path to be exactly on the Stiefel manifold, as opposed to the approximated orthogonality solutions in existing literature. We apply a line search update scheme along with an SVD-based initialization which produces a rapid convergence of the algorithm compared to other existing approaches. In addition, we present formulations of our method to incorporate both continuous and binary design matrices. Through various simulation studies, we show that our model has an advantage over other NMF variations regarding the accuracy of the factorization, rate of convergence, and the degree of orthogonality while being computationally competitive. We also apply our method to a text-mining data on classifying triage notes, and show the effectiveness of our model in reducing classification error compared to the conventional bag-of-words model and other alternative matrix factorization approaches. …
If you did not already know
28 Thursday Jul 2022
Posted What is ...
in