Correntropy google
Correntropy is a nonlinear, kernel-based similarity measure between two random variables, defined as the expected value of a kernel (typically Gaussian) applied to their difference.
Learning with the Maximum Correntropy Criterion Induced Losses for Regression
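A minimal sketch of the empirical correntropy estimate with a Gaussian kernel (the function name and sample values are illustrative, not taken from the paper):

```python
import math

def correntropy(x, y, sigma=1.0):
    """Empirical correntropy with a Gaussian kernel:
    V(X, Y) ~ (1/N) * sum_i exp(-(x_i - y_i)^2 / (2 * sigma^2))."""
    assert len(x) == len(y) and len(x) > 0
    return sum(math.exp(-(a - b) ** 2 / (2 * sigma ** 2))
               for a, b in zip(x, y)) / len(x)

# Identical samples give the maximum value 1.0, while a single large
# outlier lowers the estimate only slightly -- this bounded influence is
# what makes the maximum correntropy criterion robust compared with
# mean squared error.
clean = correntropy([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])      # -> 1.0
noisy = correntropy([1.0, 2.0, 3.0], [1.0, 2.0, 100.0])    # close to 2/3
```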


Patient Event Graph (PatientEG) google
Medical activities, such as diagnoses, medicine treatments, and laboratory tests, as well as the temporal relations between these activities, are basic concepts in clinical research. However, existing relational data models for electronic medical records (EMRs) lack explicit and accurate semantic definitions of these concepts. This leads to inconvenient query construction and inefficient query execution, since multi-table join queries are frequently required. In this paper, we propose a patient event graph (PatientEG) model to capture the characteristics of EMRs. We define five types of medical entities, five types of medical events, and five types of temporal relations. Based on the proposed model, we also construct a PatientEG dataset with 191,294 events, 3,429 distinct entities, and 545,993 temporal relations using EMRs from Shanghai Shuguang Hospital. To help normalize entity values that contain synonyms, hyponyms, and abbreviations, we link them to the Chinese biomedical knowledge graph. With the PatientEG dataset, we can conveniently perform complex queries for clinical research, such as auxiliary diagnosis and therapeutic-effectiveness analysis. In addition, we provide a SPARQL endpoint to access the PatientEG dataset, and the dataset is also publicly available online. We also list several illustrative SPARQL queries on our website. …
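To make the event-graph idea concrete, here is a minimal sketch of typed events linked by temporal relations. The type names, relation label, and query method are illustrative stand-ins, not the schema or SPARQL interface defined in the paper:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Event:
    event_id: str
    event_type: str   # e.g. "Diagnosis", "MedicineTreatment", "LabTest"
    entity: str       # the medical entity involved, e.g. a drug name

@dataclass
class PatientEventGraph:
    events: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)  # (src_id, relation, dst_id)

    def add_event(self, event):
        self.events[event.event_id] = event

    def add_relation(self, src_id, relation, dst_id):
        self.relations.append((src_id, relation, dst_id))

    def query(self, event_type=None, relation=None):
        """Return (src, dst) event pairs joined by `relation`, optionally
        filtered by the source event's type -- the kind of traversal that
        would require multi-table joins in a relational EMR model."""
        out = []
        for src_id, rel, dst_id in self.relations:
            if relation is not None and rel != relation:
                continue
            src = self.events[src_id]
            if event_type is not None and src.event_type != event_type:
                continue
            out.append((src, self.events[dst_id]))
        return out

g = PatientEventGraph()
g.add_event(Event("e1", "Diagnosis", "type 2 diabetes"))
g.add_event(Event("e2", "MedicineTreatment", "metformin"))
g.add_relation("e1", "before", "e2")
pairs = g.query(event_type="Diagnosis", relation="before")
```

In an RDF/SPARQL setting, each tuple in `relations` would correspond to a triple and `query` to a basic graph pattern.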

LogitBoost Autoregressive Networks google
Multivariate binary distributions can be decomposed into products of univariate conditional distributions. Recently, popular approaches have modeled these conditionals through neural networks with sophisticated weight-sharing structures. It is shown that state-of-the-art performance on several standard benchmark datasets can actually be achieved by training a separate probability estimator for each dimension. In that case, model training can be trivially parallelized over data dimensions. On the other hand, complexity control has to be performed for each learned conditional distribution. Three possible methods are considered and experimentally compared. The estimator employed for each conditional is LogitBoost. Similarities and differences between the proposed approach and autoregressive models based on neural networks are discussed in detail. …
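The autoregressive decomposition p(x) = ∏_d p(x_d | x_{<d}) with one separate estimator per dimension can be sketched as follows. As a stand-in for LogitBoost, each conditional here is a Laplace-smoothed frequency table over the full history x_{<d} (feasible only for tiny D); the point of the approach is that any per-dimension probability estimator can be plugged in, and the per-dimension fits are independent, hence trivially parallelizable:

```python
from itertools import product

def fit_conditionals(data, num_dims):
    """Fit one conditional estimator p(x_d | x_{<d}) per dimension.
    Each estimator is a smoothed count table keyed by the history tuple."""
    tables = []
    for d in range(num_dims):
        counts = {}
        for x in data:
            hist = tuple(x[:d])
            c = counts.setdefault(hist, [1.0, 1.0])  # Laplace smoothing
            c[x[d]] += 1.0
        tables.append(counts)
    return tables

def prob(x, tables):
    """Evaluate p(x) as the product of the per-dimension conditionals."""
    p = 1.0
    for d, table in enumerate(tables):
        c = table.get(tuple(x[:d]), [1.0, 1.0])  # unseen history -> uniform
        p *= c[x[d]] / (c[0] + c[1])
    return p

data = [(0, 0, 1), (0, 1, 1), (1, 1, 0), (0, 0, 1)]
tables = fit_conditionals(data, 3)

# Because each conditional normalizes over x_d for any fixed history,
# the product defines a proper distribution over all 2^3 configurations.
total = sum(prob(x, tables) for x in product([0, 1], repeat=3))  # -> 1.0
```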

Discretification google
‘Discretification’ is the mechanism of making continuous data discrete. If you really grasp the concept, you may be thinking ‘Wait a minute, the type of data we are collecting is discrete in and of itself! Data can EITHER be discrete OR continuous, it can’t be both!’ You would be correct. But what if we manually selected values along that continuous measurement and declared them to be in a specific category? For instance, if we declare 72.0 degrees and greater to be ‘Hot’, 35.0-71.9 degrees to be ‘Moderate’, and anything lower than 35.0 degrees to be ‘Cold’, we have ‘discretified’ temperature! Our readings that were once continuous now fit into distinct categories. So, where do we draw the boundaries for these categories? What makes 34.9 degrees ‘Cold’ and 35.0 degrees ‘Moderate’? It is at this juncture that the TRUE decision is being made. The beauty of approaching the challenge in this manner is that it is data-centric, not concept-centric. Let’s walk through our marketing example first without using discretification, then with it. …
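The temperature bins above can be written directly as a small function (the function name is illustrative); the boundary values 72.0 and 35.0 are exactly the decision the passage is pointing at:

```python
def discretify_temperature(degrees):
    """Map a continuous temperature reading onto one of three categories."""
    if degrees >= 72.0:
        return "Hot"
    elif degrees >= 35.0:
        return "Moderate"
    else:
        return "Cold"

readings = [80.2, 72.0, 35.0, 34.9]
labels = [discretify_temperature(t) for t in readings]
# labels == ["Hot", "Hot", "Moderate", "Cold"]
```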