Anomaly detectors are often used to produce a ranked list of statistical anomalies, which human analysts then examine to extract the actual anomalies of interest. Unfortunately, in real-world applications, this process can be exceedingly difficult for the analyst, since a large fraction of high-ranking anomalies are false positives and not interesting from the application perspective. In this paper, we aim to make the analyst’s job easier by allowing for analyst feedback during the investigation process. Ideally, the feedback influences the ranking of the anomaly detector in a way that reduces the number of false positives that must be examined before discovering the anomalies of interest. In particular, we introduce a novel technique for incorporating simple binary feedback into tree-based anomaly detectors. We focus on the Isolation Forest algorithm as a representative tree-based anomaly detector, and show that incorporating feedback significantly improves its performance over the baseline algorithm without feedback. Our technique is simple and scales well as the size of the data increases, which makes it suitable for interactive discovery of anomalies in large datasets.
In this paper, a stochastic model of a call center with a two-level architecture is analyzed. A first-level pool of operators answers calls, classifies them, and handles the non-urgent ones. A call classified as urgent has to be transferred to specialized operators at the second level. When the operators of the second level are all busy, the first-level operator handling the urgent call is blocked until an operator at the second level becomes available. Under a scaling assumption, the evolution of the number of urgent calls blocked at level~$1$ is investigated. It is shown that if the ratio of the number of operators at level~$2$ to that at level~$1$ is greater than some threshold, then, essentially, the system operates without congestion: with probability close to $1$, no urgent call is blocked after some finite time. Otherwise, we prove that a positive fraction of the operators of the first level are blocked due to the congestion of the second level. Stochastic calculus with Poisson processes, coupling arguments, and formulations in terms of Skorokhod problems are the main mathematical tools used to establish these convergence results.
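The blocking mechanism described above can be illustrated with a small event-driven simulation. This is only a sketch under assumptions that the abstract does not state: Poisson arrivals at rate `lam`, exponential service times with rates `mu1` and `mu2`, a fixed probability `p_urgent` that a call is urgent, and calls being lost when no first-level operator is free. The function name `simulate_blocked_fraction` and all parameter names are hypothetical.

```python
import math
import random


def simulate_blocked_fraction(n1, n2, lam, mu1, mu2, p_urgent, t_end, seed=0):
    """Time-averaged fraction of level-1 operators that are blocked.

    Assumed dynamics (not taken from the paper): Markovian arrivals and
    services, and arrivals are lost when every level-1 operator is occupied.
    """
    rng = random.Random(seed)
    t = 0.0
    serving = 0   # level-1 operators actively handling a call
    blocked = 0   # level-1 operators holding an urgent call, waiting for level 2
    busy2 = 0     # busy level-2 operators
    blocked_time = 0.0  # integral of `blocked` over [0, t_end]

    while t < t_end:
        rate_arrival = lam if serving + blocked < n1 else 0.0
        rate_l1 = mu1 * serving
        rate_l2 = mu2 * busy2
        total = rate_arrival + rate_l1 + rate_l2

        # Time until the next event; infinite if the state is frozen.
        dt = rng.expovariate(total) if total > 0 else math.inf
        if t + dt >= t_end:
            blocked_time += blocked * (t_end - t)
            break
        blocked_time += blocked * dt
        t += dt

        u = rng.random() * total
        if u < rate_arrival:
            serving += 1                      # a free level-1 operator answers
        elif u < rate_arrival + rate_l1:
            serving -= 1                      # level-1 handling finishes
            if rng.random() < p_urgent:
                if busy2 < n2:
                    busy2 += 1                # transfer succeeds immediately
                else:
                    blocked += 1              # operator stays blocked
        else:
            if blocked > 0:
                blocked -= 1                  # a waiting urgent call is transferred
            else:
                busy2 -= 1                    # level-2 operator becomes idle

    return blocked_time / (t_end * n1)


if __name__ == "__main__":
    # Compare a small and a large second level for the same first level.
    for n2 in (5, 50):
        frac = simulate_blocked_fraction(
            n1=100, n2=n2, lam=100.0, mu1=1.0, mu2=1.0,
            p_urgent=0.2, t_end=200.0,
        )
        print(f"n2={n2}: blocked fraction = {frac:.3f}")
```

Varying `n2` relative to `n1` in such a simulation gives a rough, informal picture of the two regimes the abstract describes; it is not a substitute for the scaling analysis.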
A new class of differential equations for probability measures on Euclidean spaces, called Measure Differential Equations (MDEs), is introduced. MDEs correspond to Probability Vector Fields, which map measures on a Euclidean space to measures on its tangent bundle. Solutions are understood in the weak sense, and existence, uniqueness, and continuous dependence results are proved under suitable conditions. The latter are expressed in terms of the Wasserstein metric on the base and fiber of the tangent bundle. MDEs represent a natural measure-theoretic generalization of Ordinary Differential Equations via a monoid morphism mapping sums of vector fields to fiber convolution of the corresponding Probability Vector Fields. Various examples, including finite-speed diffusion and concentration, are shown, together with relationships to Partial Differential Equations. Finally, MDEs are also natural mean-field limits of multi-particle systems, with convergence results extending the classical Dobrushin approach.
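For orientation, the objects named above can be written out as follows. This is a sketch in assumed notation, since the abstract fixes no symbols: $\mathcal{P}(\cdot)$ denotes probability measures, $\pi_1$ is the base projection of the tangent bundle $T\mathbb{R}^n$, $\#$ is the push-forward, and $V$ is a Probability Vector Field.

```latex
% Probability Vector Field: a measure on the base is sent to a measure on
% the tangent bundle whose base marginal is the original measure.
\[
  V : \mathcal{P}(\mathbb{R}^n) \to \mathcal{P}(T\mathbb{R}^n),
  \qquad \pi_1 \# V[\mu] = \mu .
\]
% Measure Differential Equation and its weak formulation.
\[
  \dot{\mu} = V[\mu],
  \qquad
  \frac{d}{dt} \int_{\mathbb{R}^n} f \, d\mu(t)
  = \int_{T\mathbb{R}^n} \nabla f(x) \cdot v \; dV[\mu(t)](x, v)
  \quad \text{for all } f \in C_c^\infty(\mathbb{R}^n).
\]
% An ordinary vector field v(x) corresponds to a Dirac mass on each fiber,
% in which case the weak formulation is that of the continuity equation.
\[
  V[\mu] = \mu \otimes \delta_{v(x)}
  \quad \Longrightarrow \quad
  \partial_t \mu + \nabla \cdot (v \, \mu) = 0 .
\]
```

The last line shows the sense in which an Ordinary Differential Equation is a special case: when each fiber measure is a Dirac mass, the mass at $x$ moves with the single velocity $v(x)$, whereas a general fiber measure lets it spread over several velocities.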