Factor Adjusted Robust Multiple Testing
Large-scale multiple testing with correlated and heavy-tailed data arises in a wide range of research areas, from genomics and medical imaging to finance. Conventional methods for estimating the false discovery proportion (FDP) often ignore the effect of heavy-tailedness and the dependence structure among test statistics, and thus may lead to inefficient or even inconsistent estimation. Also, the assumption of joint normality is often imposed, which is too stringent for many applications. To address these challenges, in this paper we propose a factor-adjusted robust procedure for large-scale simultaneous inference with control of the false discovery proportion. We demonstrate that robust factor adjustments are extremely important both in improving the power of the tests and in controlling the FDP. We identify general conditions under which the proposed method produces a consistent estimate of the FDP. As a byproduct of independent interest, we establish an exponential-type deviation inequality for a robust U-type covariance estimator under the spectral norm. Extensive numerical experiments demonstrate the advantage of the proposed method over several state-of-the-art methods, especially when the data are generated from heavy-tailed distributions. Our proposed procedures are implemented in the R package farmtest. …
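To make the quantity being estimated concrete, here is a minimal Python sketch of the false discovery proportion itself: the number of rejected true nulls divided by the total number of rejections (taken as 0 when nothing is rejected). The function name and the toy labels are hypothetical; this is not the factor-adjusted robust procedure from the paper, which has to estimate the FDP without knowing which nulls are true.

```python
def false_discovery_proportion(rejected, is_true_null):
    """Return V / max(R, 1), where R counts rejections and V counts
    rejections of hypotheses whose null is actually true."""
    if len(rejected) != len(is_true_null):
        raise ValueError("inputs must have the same length")
    total_rejections = sum(1 for r in rejected if r)
    false_rejections = sum(
        1 for r, null in zip(rejected, is_true_null) if r and null
    )
    # max(..., 1) avoids dividing by zero when no hypothesis is rejected.
    return false_rejections / max(total_rejections, 1)


# Toy example (made-up labels): five hypotheses, three rejections,
# one of which is a true null.
rejected = [True, True, False, True, False]
is_true_null = [False, True, True, False, True]
fdp = false_discovery_proportion(rejected, is_true_null)
print(fdp)
```

In practice the true-null labels are unknown, which is why the FDP has to be estimated from the test statistics.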
Universal Sentence Encoder
We present models for encoding sentences into embedding vectors that specifically target transfer learning to other NLP tasks. The models are efficient and result in accurate performance on diverse transfer tasks. Two variants of the encoding models allow for trade-offs between accuracy and compute resources. For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance. Comparisons are made with baselines that use word-level transfer learning via pretrained word embeddings, as well as baselines that do not use any transfer learning. We find that transfer learning using sentence embeddings tends to outperform word-level transfer. With transfer learning via sentence embeddings, we observe surprisingly good performance with minimal amounts of supervised training data for a transfer task. We obtain encouraging results on Word Embedding Association Tests (WEAT) targeted at detecting model bias. Our pre-trained sentence encoding models are made freely available for download and on TF Hub. …
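One common way to work with fixed-length sentence embeddings is to compare them by cosine similarity. The sketch below assumes that use case and does not load the released models; the three short vectors are made-up stand-ins for real embeddings, and `cosine_similarity` is a hypothetical helper name.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Made-up 3-dimensional "embeddings"; real sentence embeddings are much longer.
emb_a = [0.2, 0.1, 0.7]
emb_b = [0.25, 0.05, 0.65]   # points in nearly the same direction as emb_a
emb_c = [-0.6, 0.8, 0.0]     # points in a very different direction

sim_ab = cosine_similarity(emb_a, emb_b)
sim_ac = cosine_similarity(emb_a, emb_c)
print(sim_ab > sim_ac)
```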
Factored Bandits
We introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions. Factored bandits incorporate rank-1 bandits as a special case, but significantly relax the assumptions on the form of the reward function. We provide an anytime algorithm for stochastic factored bandits, together with upper and lower regret bounds for the problem that match up to constants. Furthermore, we show that with a slight modification the proposed algorithm can be applied to utility-based dueling bandits. We obtain an improvement in the additive terms of the regret bound compared to state-of-the-art algorithms (the additive terms dominate up to time horizons that are exponential in the number of arms). …
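The action structure described above can be sketched in a few lines of Python. The example builds the joint action set as a Cartesian product of two atomic action sets and evaluates the rank-1 special case, where the expected reward of a joint action is the product of per-factor values. All names and numbers are hypothetical, and this illustrates only the action decomposition, not the paper's learning algorithm.

```python
import itertools

# Hypothetical atomic action sets, one per factor.
atomic_action_sets = [
    ["a1", "a2", "a3"],
    ["b1", "b2"],
]

# Each joint action picks one atomic action from every factor.
joint_actions = list(itertools.product(*atomic_action_sets))

# Rank-1 special case: one made-up value per atomic action, and the
# expected reward of a joint action is the product of its factors' values.
factor_values = [
    {"a1": 0.2, "a2": 0.9, "a3": 0.5},
    {"b1": 0.4, "b2": 0.8},
]


def rank1_expected_reward(action):
    reward = 1.0
    for value_table, atom in zip(factor_values, action):
        reward *= value_table[atom]
    return reward


best_action = max(joint_actions, key=rank1_expected_reward)
print(len(joint_actions), best_action)
```

With 3 and 2 atomic actions there are 6 joint actions, while only 3 + 2 per-factor values describe the rank-1 reward; the factored model keeps this product structure of the action set but, as the abstract notes, relaxes the assumptions on the reward function.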
HUPNU
Modern Internet of Things (IoT) applications generate massive amounts of data, much of it in the form of objects/items of readings, events, and log entries. Specifically, most of the objects in these IoT data contain rich embedded information (e.g., frequency and uncertainty) and different levels of importance (e.g., unit utility of items, interestingness, cost, risk, or weight). Many existing approaches in data mining and analytics have limitations: they consider only a binary attribute for each item within a transaction, and they treat all objects/items as having equal weight or importance. To overcome these drawbacks, a novel utility-driven data analytics algorithm named HUPNU is presented, which extracts High-Utility patterns by considering both Positive and Negative unit utilities from Uncertain data. The qualified high-utility patterns can be effectively discovered for risk prediction, manufacturing management, and decision-making, among others. HUPNU uses the developed vertical Probability-Utility list with the Positive-and-Negative utilities structure, as well as several effective pruning strategies. Experiments showed that the developed HUPNU approach mines the qualified patterns efficiently and effectively. …
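To illustrate what positive and negative unit utilities in uncertain data can look like, the following Python sketch computes a pattern's total utility and its expected support over a tiny made-up database. It rests on simplifying assumptions that are not taken from the abstract: each item occurrence carries a quantity and an existence probability, items are treated as independent, and the data layout and function name are hypothetical. It is a brute-force illustration, not the HUPNU list structure or its pruning strategies.

```python
# Hypothetical unit utilities; item "B" has a negative unit utility.
unit_utility = {"A": 5, "B": -2, "C": 3}

# Hypothetical uncertain transactions: item -> (quantity, existence probability).
transactions = [
    {"A": (2, 0.9), "B": (1, 0.8)},
    {"A": (1, 0.7), "C": (4, 0.5)},
    {"A": (3, 1.0), "B": (2, 0.6), "C": (1, 0.9)},
]


def pattern_utility_and_expected_support(pattern):
    """Sum quantity * unit utility over transactions containing every item
    of the pattern, and sum the product of item probabilities (assuming
    independence) as the expected support."""
    total_utility = 0
    expected_support = 0.0
    for transaction in transactions:
        if not all(item in transaction for item in pattern):
            continue
        probability = 1.0
        for item in pattern:
            quantity, item_probability = transaction[item]
            total_utility += quantity * unit_utility[item]
            probability *= item_probability
        expected_support += probability
    return total_utility, expected_support


utility_ab, support_ab = pattern_utility_and_expected_support(("A", "B"))
print(utility_ab, support_ab)
```

Under these assumptions, a pattern would count as interesting only if both numbers clear user-chosen thresholds; the negative unit utility of "B" lowers the utility of every pattern that contains it.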
If you did not already know
07 Sunday Mar 2021
Posted in What is ...