Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Theano features:
· tight integration with NumPy – Use numpy.ndarray in Theano-compiled functions.
· transparent use of a GPU – Perform data-intensive calculations up to 140x faster than with CPU.(float32 only)
· efficient symbolic differentiation – Theano does your derivatives for function with one or many inputs.
· speed and stability optimizations – Get the right answer for log(1+x) even when x is really tiny.
· dynamic C code generation – Evaluate expressions faster.
· extensive unit-testing and self-verification – Detect and diagnose many types of mistake.
Theano has been powering large-scale computationally intensive scientific investigations since 2007. But it is also approachable enough to be used in the classroom (IFT6266 at the University of Montreal).
Attribute Oriented Induction (AOI)
Attribute Oriented Induction (AOI) is a data mining algorithm used for extracting knowledge of relational data, taking into account expert knowledge. It is a clustering algorithm that works by transforming the values of the attributes and converting an instance into others that are more generic or ambiguous. In this way, it seeks similarities between elements to generate data groupings. AOI was initially conceived as an algorithm for knowledge discovery in databases, but over the years it has been applied to other areas such as spatial patterns, intrusion detection or strategy making. …
We present a sample path dependent measure of causal influence between time series. The proposed causal measure is a random sequence, a realization of which enables identification of specific patterns that give rise to high levels of causal influence. We show that these patterns cannot be identified by existing measures such as directed information (DI). We demonstrate how sequential prediction theory may be leveraged to estimate the proposed causal measure and introduce a notion of regret for assessing the performance of such estimators. We prove a finite sample bound on this regret that is determined by the worst case regret of the sequential predictors used in the estimator. Justification for the proposed measure is provided through a series of examples, simulations, and application to stock market data. Within the context of estimating DI, we show that, because joint Markovicity of a pair of processes does not imply the marginal Markovicity of individual processes, commonly used plug-in estimators of DI will be biased for a large subset of jointly Markov processes. We introduce a notion of DI with ‘stale history’, which can be combined with a plug-in estimator to upper and lower bound the DI when marginal Markovicity does not hold. …
Contextual Outlier INterpretation (COIN)
Outlier detection plays an essential role in many data-driven applications to identify isolated instances that are different from the majority. While many statistical learning and data mining techniques have been used for developing more effective outlier detection algorithms, the interpretation of detected outliers does not receive much attention. Interpretation is becoming increasingly important to help people trust and evaluate the developed models through providing intrinsic reasons why the certain outliers are chosen. It is difficult, if not impossible, to simply apply feature selection for explaining outliers due to the distinct characteristics of various detection models, complicated structures of data in certain applications, and imbalanced distribution of outliers and normal instances. In addition, the role of contrastive contexts where outliers locate, as well as the relation between outliers and contexts, are usually overlooked in interpretation. To tackle the issues above, in this paper, we propose a novel Contextual Outlier INterpretation (COIN) method to explain the abnormality of existing outliers spotted by detectors. The interpretability for an outlier is achieved from three aspects: outlierness score, attributes that contribute to the abnormality, and contextual description of its neighborhoods. Experimental results on various types of datasets demonstrate the flexibility and effectiveness of the proposed framework compared with existing interpretation approaches. …