Bayesian Weighted Mendelian Randomization (BWMR)
The results from Genome-Wide Association Studies (GWAS) on thousands of phenotypes provide an unprecedented opportunity to infer the causal effect of one phenotype (exposure) on another (outcome). Mendelian randomization (MR), an instrumental variable (IV) method, has been introduced for causal inference using GWAS data. Due to the polygenic architecture of complex traits/diseases and the ubiquity of pleiotropy, however, MR faces unique challenges compared to conventional IV methods. In this paper, we propose Bayesian weighted Mendelian randomization (BWMR) for causal inference to address these challenges. In our BWMR model, the uncertainty of weak effects owing to polygenicity is taken into account, and the violation of the IV assumption due to pleiotropy is addressed through outlier detection by Bayesian weighting. To make causal inference based on BWMR computationally stable and efficient, we developed a variational expectation-maximization (VEM) algorithm. Moreover, we derived an exact closed-form formula to correct the posterior covariance, which is well known to be underestimated in variational inference. Through comprehensive simulation studies, we evaluated the performance of BWMR, demonstrating its advantage over competing methods. We then applied BWMR to 10,762 pairs of exposure and outcome traits from 54 GWAS, uncovering novel causal relationships between exposure and outcome traits. The BWMR package is available at https://…/BWMR.
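BWMR itself is fitted with the VEM algorithm described above, and the package at the (truncated) URL is the reference implementation. As a point of comparison only, here is a minimal sketch of the standard inverse-variance-weighted (IVW) estimator that summary-statistic MR methods build on; it treats the per-SNP exposure effects as fixed and gives every SNP full weight, which are exactly the two weaknesses (weak-effect uncertainty, pleiotropic outliers) that BWMR's Bayesian weighting addresses. This is not the authors' method.

```python
import numpy as np

def ivw_mr(beta_exp, se_exp, beta_out, se_out):
    """Inverse-variance-weighted MR from GWAS summary statistics.

    beta_exp, se_exp: per-SNP effect sizes (and SEs) on the exposure
    beta_out, se_out: per-SNP effect sizes (and SEs) on the outcome
    """
    beta_exp, beta_out = np.asarray(beta_exp), np.asarray(beta_out)
    se_out = np.asarray(se_out)

    # Per-SNP Wald ratios and first-order SEs; ignoring se_exp here is the
    # "exposure effects known exactly" approximation that BWMR avoids.
    ratio = beta_out / beta_exp
    ratio_se = se_out / np.abs(beta_exp)

    # Precision weighting: every SNP keeps full weight, so a single
    # pleiotropic outlier can bias the estimate (BWMR downweights such SNPs).
    w = 1.0 / ratio_se**2
    beta_hat = np.sum(w * ratio) / np.sum(w)
    return beta_hat, np.sqrt(1.0 / np.sum(w))
```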

Model-Guided Iterative Sample Selection (MISS)
Nowadays, sampling-based Approximate Query Processing (AQP) is widely regarded as a promising way to achieve interactivity in big data analytics. To build such an AQP system, finding the minimal sample size for a query under given error constraints, called Sample Size Optimization (SSO), is an essential yet unsolved problem. Ideally, a solution to the SSO problem should achieve statistical accuracy, computational efficiency and broad applicability all at the same time. Existing approaches either make idealistic assumptions about the statistical properties of the query or disregard them completely, which may result in overemphasizing one of the three goals while neglecting the others. To overcome these limitations, we first carefully examine the statistical properties shared by common analytical queries. Based on these properties, we propose a linear model, called the error model, that describes the relationship between sample sizes and the approximation errors of a query. We then propose a Model-guided Iterative Sample Selection (MISS) framework to solve the SSO problem in general. Building on the MISS framework, we propose a concrete algorithm, called $L^2$Miss, to find optimal sample sizes under the $L^2$ norm error metric, and we extend the $L^2$Miss algorithm to handle other error metrics. Finally, we show theoretically and empirically that the $L^2$Miss algorithm and its extensions achieve satisfactory accuracy and efficiency for a considerably wide range of analytical queries. …
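The abstract does not spell out the exact form of the error model, so the sketch below is a hypothetical illustration of the model-guided iterative loop under the common assumption that aggregate-query error shrinks like $a/\sqrt{n}$ (i.e., the model is linear in $1/\sqrt{n}$). The helper `approx_error` and all names are invented for illustration; this is not the $L^2$Miss algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def approx_error(data, n, n_trials=30):
    """Empirical error of a sampled AVG against the full answer.
    (Stand-in estimator; a real AQP system would not scan `data`.)"""
    truth = data.mean()
    errs = [abs(rng.choice(data, size=n, replace=False).mean() - truth)
            for _ in range(n_trials)]
    return float(np.mean(errs))

def miss_sample_size(data, target_err, n0=100, max_iter=10):
    """Model-guided iterative search: fit err(n) ~ a/sqrt(n) to observed
    (n, error) pairs, solve for the n predicted to meet target_err,
    verify on a real sample, and refit until the constraint holds."""
    ns, errs = [n0], [approx_error(data, n0)]
    for _ in range(max_iter):
        # Least-squares fit of err = a / sqrt(n), linear in x = 1/sqrt(n).
        x = 1.0 / np.sqrt(np.array(ns, dtype=float))
        a = float(np.dot(x, errs) / np.dot(x, x))
        n_next = min(len(data), int(np.ceil((a / target_err) ** 2)))
        err_next = approx_error(data, n_next)
        ns.append(n_next); errs.append(err_next)
        if err_next <= target_err:
            return n_next
    return ns[-1]

data = rng.lognormal(0.0, 1.0, 1_000_000)
print(miss_sample_size(data, target_err=0.01))
```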

Forward Projection
Dimensionality reduction is a common method for analyzing and visualizing high-dimensional data. However, reasoning dynamically about the results of a dimensionality reduction is difficult. Dimensionality-reduction algorithms use complex optimizations to reduce the number of dimensions of a dataset, but these new dimensions often lack a clear relation to the initial data dimensions, making them difficult to interpret. Here we propose a visual interaction framework to improve dimensionality-reduction-based exploratory data analysis. We introduce two interaction techniques, forward projection and backward projection, for dynamically reasoning about dimensionally reduced data. We also contribute two visualization techniques, prolines and feasibility maps, to facilitate the effective use of the proposed interactions. We apply our framework to PCA- and autoencoder-based dimensionality reductions. Through data-exploration examples, we demonstrate how our visual interactions can improve the use of dimensionality reduction in exploratory data analysis. …
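As a concrete illustration of the forward-projection interaction, here is a minimal sketch using scikit-learn's PCA (not the authors' implementation): edit one input dimension of a data point and re-project it through the same fitted model to see where it moves in the reduced view.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=2).fit(X)

point = X[0].copy()
before = pca.transform(point.reshape(1, -1))[0]

# Forward projection: change an input dimension, re-project with the *same*
# fitted PCA, and observe the displacement in the 2-D embedding.
point[2] += 1.0          # e.g. increase petal length by 1 cm
after = pca.transform(point.reshape(1, -1))[0]

print("before:", before, "after:", after, "shift:", after - before)
```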

Jeffreys-Lindley Paradox (JLP)
Lindley’s paradox is a counterintuitive situation in statistics in which the Bayesian and frequentist approaches to a hypothesis testing problem give different results for certain choices of the prior distribution. The problem of the disagreement between the two approaches was discussed in Harold Jeffreys’ 1939 textbook; it became known as Lindley’s paradox after Dennis Lindley called the disagreement a paradox in a 1957 paper. …
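A standard numerical illustration (the birth-count example commonly used to present the paradox): out of n = 98,451 births, x = 49,581 are boys. We test H0: θ = 1/2, giving prior probability 1/2 to H0 and a uniform prior to θ under H1. The sketch below shows the two answers diverging: the frequentist test rejects H0 at the 5% level while the posterior probability of H0 is about 0.95.

```python
from scipy import stats

n, x = 98_451, 49_581          # births, of which x are boys
theta0 = 0.5

# Frequentist side: two-sided test of H0: theta = 1/2 (normal approximation).
mean, sd = n * theta0, (n * theta0 * (1 - theta0)) ** 0.5
z = (x - mean) / sd
p_value = 2 * stats.norm.sf(abs(z))    # ~0.0235 -> reject H0 at the 5% level

# Bayesian side: P(H0) = P(H1) = 1/2, uniform prior on theta under H1.
# Marginal likelihood under H1: integral of Binom(x | n, theta) over
# theta in [0, 1], which equals 1 / (n + 1).
m0 = stats.binom.pmf(x, n, theta0)
m1 = 1.0 / (n + 1)
posterior_h0 = m0 / (m0 + m1)          # ~0.95 -> H0 strongly favored

print(f"p-value = {p_value:.4f}, P(H0 | data) = {posterior_h0:.3f}")
```

The divergence sharpens as n grows: with the prior held fixed, ever smaller deviations from θ = 1/2 become statistically significant while the Bayes factor increasingly favors the point null, which is the heart of the paradox.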