Cautious Deep Learning
Most classifiers operate by selecting the maximum of an estimate of the conditional distribution $p(y|x)$ where $x$ stands for the features of the instance to be classified and $y$ denotes its label. This often results in a hubristic bias: overconfidence in the assignment of a definite label. Usually, the observations are concentrated on a small volume but the classifier provides definite predictions for the entire space. We propose constructing conformal prediction sets [vovk2005algorithmic] which contain a set of labels rather than a single label. These conformal prediction sets contain the true label with probability $1-\alpha$. Our construction is based on $p(x|y)$ rather than $p(y|x)$ which results in a classifier that is very cautious: it outputs the null set – meaning `I don’t know’ — when the object does not resemble the training examples. An important property of our approach is that classes can be added or removed without having to retrain the classifier. We demonstrate the performance on the ImageNet ILSVRC dataset using high dimensional features obtained from state of the art convolutional neural networks. …

Data Standardization
When approaching data for modeling, some standard procedures should be used to prepare the data for modeling:
1.First the data should be filtered, and any outliers removed from the data (watch for a future post on how to scrub your raw data removing only legitimate outliers).
2.The data should be normalized or standardized to bring all of the variables into proportion with one another. For example, if one variable is 100 times larger than another (on average), then your model may be better behaved if you normalize/standardize the two variables to be approximately equivalent. Technically though, whether normalized/standardized, the coefficients associated with each variable will scale appropriately to adjust for the disparity in the variable sizes. …

Sample Entropy
Sample entropy (SampEn) is a modification of approximate entropy (ApEn), used for assessing the complexity of physiological time-series signals, diagnosing diseased states. SampEn has two advantages over ApEn: data length independence and a relatively trouble-free implementation. Also, there is a small computational difference: In ApEn, the comparison between the template vector (see below) and the rest of the vectors also includes comparison with itself. This guarantees that probabilities {\displaystyle C_{i}’^{m}(r)} C_{{i}}’^{{m}}(r) are never zero. Consequently, it is always possible to take a logarithm of probabilities. Because template comparisons with itself lower ApEn values, the signals are interpreted to be more regular than they actually are. These self-matches are not included in SampEn. There is a multiscale version of SampEn as well, suggested by Costa and others. …