Doctor AI: Predicting Clinical Events via Recurrent Neural Networks

Large amounts of Electronic Health Record (EHR) data have been collected on millions of patients over multiple years. This rich longitudinal EHR data documents the collective experience of physicians, including diagnoses, medication prescriptions, and procedures. We argue that it is now possible to leverage EHR data to model how physicians behave, and we call our model Doctor AI. Toward this goal of modeling the clinical behavior of physicians, we develop a successful application of Recurrent Neural Networks (RNNs) that jointly forecasts future disease diagnoses and medication prescriptions along with their timing. Unlike a traditional classification model where a single target is of interest, our model can assess the entire history of a patient and make continuous, multilabel predictions based on the patient's historical data. We evaluate the proposed method on a large real-world EHR dataset covering 250K patients over 8 years. Doctor AI achieves up to 79% recall@30, significantly higher than several baselines.
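
The abstract reports recall@30 without defining it. A minimal sketch of one common definition follows, under the assumption that recall@k is the fraction of a visit's true codes that appear among the k highest-scoring predictions. The function name `recall_at_k` and its arguments are hypothetical and are not taken from the paper.

```python
def recall_at_k(scores, true_labels, k=30):
    """Fraction of the true labels found among the k highest-scoring labels.

    scores      -- one predicted score per candidate code (index = code id)
    true_labels -- iterable of code ids that actually occurred
    Assumed definition; the abstract itself does not spell it out.
    """
    true_labels = set(true_labels)
    if not true_labels:
        raise ValueError("need at least one true label")
    # Rank code ids from highest to lowest score and keep the first k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_k = set(ranked[:k])
    return len(true_labels & top_k) / len(true_labels)
```

For example, with scores `[0.1, 0.9, 0.3, 0.8, 0.05]` and true codes `{1, 2}`, the top two codes are 1 and 3, so `recall_at_k(..., k=2)` gives 0.5.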


Metric Learning with Adaptive Density Discrimination

Distance metric learning (DML) approaches learn a transformation to a representation space where distance is in correspondence with a predefined notion of similarity. While such models offer a number of compelling benefits, they have struggled to compete with modern classification algorithms in performance and even in feature extraction. In this work, we propose a novel approach explicitly designed to address a number of subtle yet important issues which have stymied earlier DML algorithms. It maintains an explicit model of the distributions of the different classes in representation space. It then employs this knowledge to adaptively assess similarity and achieve local discrimination by penalizing class distribution overlap. We demonstrate the effectiveness of this idea on several tasks. Our approach achieves state-of-the-art classification results on a number of fine-grained visual recognition datasets, surpassing the standard softmax classifier and outperforming triplet loss by a relative margin of 30-40%. In terms of computational performance, it alleviates training inefficiencies in the traditional triplet loss, reaching the same error in 5-30 times fewer iterations. Beyond classification, we further validate the saliency of the learnt representations via their attribute concentration and hierarchy recovery properties, achieving 10-25% relative gains over the softmax classifier and 25-50% over triplet loss in these tasks.


Probabilistic K-Means using Method of Moments

K-means is one of the most widely used clustering algorithms in data mining. It attempts to minimize the sum of squared Euclidean distances between the points in each cluster and that cluster's mean. The simplicity and scalability of K-means make it very appealing. However, K-means suffers from the local minima problem and comes with no guarantee of converging to the optimal cost. K-means++ tries to address the problem by seeding the means using a distance-based sampling scheme. However, seeding the means in K-means++ requires O(K) passes through the entire dataset, which can be very costly for large datasets. Here we propose a method of seeding the initial means based on higher-order moments of the data, which takes O(1) passes through the entire dataset to extract the initial set of means. Our method yields performance competitive with all existing K-means algorithms while avoiding the expensive mean selection steps of K-means++ and other heuristics. We demonstrate the performance of our algorithm in comparison with existing algorithms on various benchmark datasets.
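
To make the O(K) cost concrete, here is a minimal pure-Python sketch of standard K-means++ seeding (the baseline, not the moment-based method proposed in the abstract). The names `kmeans_pp_seeds` and `sq_dist` and the `passes` counter are hypothetical and added only for illustration. The sketch assumes the data contain at least k distinct points.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_seeds(points, k, seed=0):
    """Pick k initial means by distance-weighted sampling (K-means++ style).

    Returns the chosen means and the number of full passes over the data.
    """
    rng = random.Random(seed)
    centers = [rng.choice(points)]      # first mean: uniform at random
    d2 = [float("inf")] * len(points)   # squared distance to nearest mean
    passes = 0
    while len(centers) < k:
        # One full pass: refresh every point's distance to its nearest
        # mean, taking the most recently added mean into account.
        d2 = [min(d, sq_dist(p, centers[-1])) for d, p in zip(d2, points)]
        passes += 1
        # Sample the next mean with probability proportional to d2.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers, passes
```

Each new mean after the first needs one sweep over all points, so seeding k means takes k - 1 passes. This is the linear-in-K cost that the abstract contrasts with its O(1)-pass alternative.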


A note on probability metrics in a categorical setting

Probability metrics constitute an important tool in probability theory and statistics \cite{DKS91}, \cite{R91}, \cite{Z83} as they are specific metrics on spaces of random variables which, by satisfying an extra condition, concord well with the randomness structure. But probability metrics suffer from the same instability under constructions as metrics. In \cite{L15}, as well as in former and related work which can be found in the references of \cite{L15}, a comprehensive setting was developed to deal with this. It is the purpose of this note to point out that these ideas can also be applied to probability metrics thus embedding them in a natural categorical framework, showing that certain constructions performed in the setting of probability theory are in fact categorical in nature. This allows us to deduce various separate results in the literature from a unified approach.


A Random Forest Guided Tour

The random forest algorithm, proposed by L. Breiman in 2001, has been extremely successful as a general-purpose classification and regression method. The approach, which combines several randomized decision trees and aggregates their predictions by averaging, has shown excellent performance in settings where the number of variables is much larger than the number of observations. Moreover, it is versatile enough to be applied to large-scale problems, is easily adapted to various ad-hoc learning tasks, and returns measures of variable importance. The present article reviews the most recent theoretical and methodological developments for random forests. Emphasis is placed on the mathematical forces driving the algorithm, with special attention given to the selection of parameters, the resampling mechanism, and variable importance measures. This review is intended to provide non-experts easy access to the main ideas.
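
The "randomize, then average" idea can be illustrated with a deliberately simplified sketch. It is not Breiman's algorithm: each "tree" here is a depth-1 regression stump fitted on a bootstrap resample using one randomly chosen feature. All names (`fit_stump`, `fit_forest`, `predict`) are hypothetical.

```python
import random

def fit_stump(X, y, rng):
    """Depth-1 regression tree on one randomly chosen feature."""
    j = rng.randrange(len(X[0]))
    best = None
    for t in sorted({row[j] for row in X}):
        left = [yi for row, yi in zip(X, y) if row[j] <= t]
        right = [yi for row, yi in zip(X, y) if row[j] > t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        # Squared error of predicting each side by its own mean.
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:
        # The feature is constant in this resample: predict the mean.
        m = sum(y) / len(y)
        return lambda row: m
    _, t, ml, mr = best
    return lambda row: ml if row[j] <= t else mr

def fit_forest(X, y, n_trees=50, seed=0):
    """Fit n_trees stumps, each on its own bootstrap resample."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], rng))
    return trees

def predict(trees, row):
    """Aggregate the individual tree predictions by averaging."""
    return sum(tree(row) for tree in trees) / len(trees)
```

A real forest grows deep trees and redraws the candidate features at every split. The stump version keeps only the two ingredients the abstract names, the resampling mechanism and aggregation by averaging.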


Prioritized Experience Replay

Staleness-aware Async-SGD for Distributed Deep Learning

Least squares estimation for the subcritical Heston model based on continuous time observations

ACDC: A Structured Efficient Linear Layer

Weighted multiple ergodic averages and correlation sequences

Unitary-Group Invariant Kernels and Features from Transformed Unlabeled Data

From generalized Tamari intervals to non-separable planar maps (extended abstract)

On the Global Linear Convergence of Frank-Wolfe Optimization Variants

Combining Neural Networks and Log-linear Models to Improve Relation Extraction

Bayesian quantile regression analysis for continuous data with a discrete component at zero

Automatic Region-wise Spatially Varying Coefficient Regression Model: an Application to National Cardiovascular Disease Mortality and Air Pollution Association Study

Mean-Field interaction of Brownian occupation measures. II: A rigorous construction of the Pekar process

Behavior Query Discovery in System-Generated Temporal Graphs

Comparison of viscosity solutions of fully nonlinear degenerate parabolic Path-dependent PDEs

Censoring Representations with an Adversary

Infinite excursions of router walks on regular trees

Randomization can be as helpful as a glimpse of the future in online computation

Generation of scenarios from calibrated ensemble forecasts with a dynamic ensemble copula coupling approach

On Mäkelä’s Conjectures: deciding if a morphic word avoids long abelian-powers

Fast Saddle-Point Algorithm for Generalized Dantzig Selector and FDR Control with the Ordered $\ell_1$-Norm

Matrix-Ball Construction of affine Robinson-Schensted correspondence

Anomalous Contagion and Renormalization in Dynamical Networks with Nodal Mobility

Trees with small b-chromatic index

The Hopf Algebra of graph invariants

On an adaptive preconditioned Crank-Nicolson algorithm for infinite dimensional Bayesian inferences

Using Machine Learning to Predict the Outcome of English County twenty over Cricket Matches

Alternative Markov Properties for Acyclic Directed Mixed Graphs

The relationship between internet user type and user performance when carrying out simple vs. complex search tasks

A Framework for Evaluating the Retrieval Effectiveness of Search Engines

The retrieval effectiveness of search engines on navigational queries

The Influence of Commercial Intent of Search Results on Their Perceived Relevance

Ranking library materials

What Users See – Structures in Search Engine Results Pages

The Retrieval Effectiveness of Web Search Engines: Considering Results Descriptions

Problems with the use of Web search engines to find results in foreign languages

Metric learning for graph-based label propagation

The historical Moran model

Nonparametric estimation for irregularly sampled Lévy processes

Cache-Conscious Run-time Decomposition of Data Parallel Computations

Uniqueness of the extreme cases in theorems of Drisko and Erdős-Ginzburg-Ziv

Toward Transparent Heterogeneous Systems

Continued Classification of 3D Lattice Walks in the Positive Octant

Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction

Solution Repair/Recovery in Uncertain Optimization Environment

Penalized complexity priors for degrees of freedom in Bayesian P-splines

Infinite-dimensional calculus under weak spatial regularity of the processes

Sparse learning of maximum likelihood model for optimization of complex loss function

The classical-quantum divergence of complexity in the Ising spin chain

On the minimum of a conditioned Brownian bridge

Dummy variables and their interactions in regression analysis: examples from research on body mass index

Generation and motion of interfaces in one-dimensional stochastic Allen-Cahn equation

Online learning in repeated auctions

Using Abduction in Markov Logic Networks for Root Cause Analysis

Complex-Valued Gaussian Processes for Regression: A Widely Non-Linear Approach

Efficient Output Kernel Learning for Multiple Tasks

Hyperspectral Unmixing in Presence of Endmember Variability, Nonlinearity or Mismodelling Effects

One to rule them all: a general method for fast computation on semirings isomorphic to $(\times, \max)$ on $\mathbb{R}_+$

A Distribution Adaptive Framework for Prediction Interval Estimation Using Nominal Variables

Preimages under the Stack-Sorting Algorithm

Wishart Mechanism for Differentially Private Principal Components Analysis

Expressiveness of Rectifier Networks

Discovering Underlying Plans Based on Distributed Representations of Actions

Bayesian hypothesis testing for one bit compressed sensing with sensing matrix perturbation

Learning Discriminative Representations for Semantic Cross Media Retrieval

Why are deep nets reversible: A simple theory, with implications for training

Tree-Guided MCMC Inference for Normalized Random Measure Mixture Models

The Invisible Hand of Dynamic Market Pricing

Adversarial Autoencoders

A New Smooth Approximation to the Zero One Loss with a Probabilistic Interpretation

Net2Net: Accelerating Learning via Knowledge Transfer

Discrete one-dimensional oriented percolation of intervals

Competitive Multi-scale Convolution

Local entropy as a measure for sampling solutions in Constraint Satisfaction Problems

Two laws of large numbers for sublinear expectations

Marginalized Two Part Models for Generalized Gamma Family of Distributions

MOEA/D-GM: Using probabilistic graphical models in MOEA/D for solving combinatorial optimization problems

Predicting distributions with Linearizing Belief Networks

Learning Structured Inference Neural Networks with Label Relations

A Bayesian Semiparametric Framework for Understanding and Predicting Customer Base Dynamics

A Block Regression Model for Short-Term Mobile Traffic Forecasting

Co-modularity and Co-community Detection in Large Networks

Identifying the Absorption Bump with Deep Learning

Rescue of endemic states in interconnected networks with adaptive coupling

blavaan: Bayesian structural equation models via parameter expansion

Semiparametric Estimation of CES Demand System with Observed and Unobserved Product Characteristics