Tableau Public  Tableau Public is a free data storytelling application. Create and share interactive charts and graphs, stunning maps, live dashboards and fun applications in minutes, then publish anywhere on the web. Anyone can do it, it’s that easy – and it’s free. 
Tag Management System (TMS) 
A Tag Management System (TMS) replaces hardcoded tags that are used for marketing, analytics, and testing on a website, with dynamic tags that are easier to implement and update. Every tag management system uses a container tag – a small snippet of code that allows you to dynamically insert tags into your website. You can think of container tags as buckets that hold other types of tags. You control which tags are added to the buckets using a simple web interface. In 2012, Google released a TMS called Google Tag Manager, which has quickly become one of the most widely used Tag Management Systems in the market. The benefits of tag management (and specifically Google Tag Manager) are enormous to any business, large or small. You can add and update Google AdWords tags, Google Analytics tags, DoubleClick Floodlight tags and many nonGoogle thirdparty tags directly from Google Tag Manager, instead of editing site code. This reduces errors, frees you from having to involve a webmaster, and allows you to quickly deploy tags on your site. To effectively use tag management, it’s important to understand basic concepts like the data layer, triggers, and variables. 
Tagger  We present a framework for efficient perceptual inference that explicitly reasons about the segmentation of its inputs and features. Rather than being trained for any specific segmentation, our framework learns the grouping process in an unsupervised manner or alongside any supervised task. By enriching the representations of a neural network, we enable it to group the representations of different objects in an iterative manner. By allowing the system to amortize the iterative inference of the groupings, we achieve very fast convergence. In contrast to many other recently proposed methods for addressing multiobject scenes, our system does not assume the inputs to be images and can therefore directly handle other modalities. For multidigit classification of very cluttered images that require texture segmentation, our method offers improved classification performance over convolutional networks despite being fully connected. Furthermore, we observe that our system greatly improves on the semisupervised result of a baseline Ladder network on our dataset, indicating that segmentation can also improve sample efficiency. 
Tagging Systems  Tagging systems have become increasingly popular. These systems enable users to add keywords (i.e., ‘tags’) to Internet resources (e.g., web pages, images, videos) without relying on a controlled vocabulary. Tagging systems have the potential to improve search, spam detection, reputation systems, and personal organization while introducing new modalities of social communication and opportunities for data mining. This potential is largely due to the social structure that underlies many of the current systems. Despite the rapid expansion of applications that support tagging of resources, tagging systems are still not well studied or understood. In this paper, we provide a short description of the academic related work to date. We offer a model of tagging systems, specifically in the context of webbased systems, to help us illustrate the possible benefits of these tools. Since many such systems already exist, we provide a taxonomy of tagging systems to help inform their analysis and design, and thus enable researchers to frame and compare evidence for the sustainability of such systems. We also provide a simple taxonomy of incentives and contribution models to inform potential evaluative frameworks. While this work does not present comprehensive empirical results, we present a preliminary study of the photosharing and tagging system Flickr to demonstrate our model and explore some of the issues in one sample system. This analysis helps us outline and motivate possible future directions of research in tagging systems. 
TagGuided HyperRecNN/TreeLSTM (TGHRecNN/TreeLSTM) 
Recursive Neural Network (RecNN), a type of models which compose words or phrases recursively over syntactic tree structures, has been proven to have superior ability to obtain sentence representation for a variety of NLP tasks. However, RecNN is born with a thorny problem that a shared compositional function for each node of trees can’t capture the complex semantic compositionality so that the expressive power of model is limited. In this paper, in order to address this problem, we propose TagGuided HyperRecNN/TreeLSTM (TGHRecNN/TreeLSTM), which introduces hypernetwork into RecNNs to take as inputs PartofSpeech (POS) tags of word/phrase and generate the semantic composition parameters dynamically. Experimental results on five datasets for two typical NLP tasks show proposed models both obtain significant improvement compared with RecNN and TreeLSTM consistently. Our TGHTreeLSTM outperforms all existing RecNNbased models and achieves or is competitive with stateoftheart on four sentence classification benchmarks. The effectiveness of our models is also demonstrated by qualitative analysis. 
Takeuchi’s Information Criteria (TIC) 
Takeuchi’s Information Criteria (TIC) is a linearization of maximum likelihood estimator bias which shrinks the model parameters towards the maximum entropy distribution, even when the model is misspecified. In statistical machine learning, $L_2$ regularization (a.k.a. ridge regression) also introduces a parameterized bias term with the goal of minimizing outofsample entropy, but generally requires a numerical solver to find the regularization parameter. 
Takeya Semantic Structure Analysis (TSSA) 
SSRA 
TangentNormal Adversarial Regularization  The everincreasing size of modern datasets combined with the difficulty of obtaining label information has made semisupervised learning of significant practical importance in modern machine learning applications. Compared with supervised learning, the key difficulty in semisupervised learning is how to make full use of the unlabeled data. In order to utilize manifold information provided by unlabeled data, we propose a novel regularization called the tangentnormal adversarial regularization, which is composed by two parts. The two terms complement with each other and jointly enforce the smoothness along two different directions that are crucial for semisupervised learning. One is applied along the tangent space of the data manifold, aiming to enforce local invariance of the classifier on the manifold, while the other is performed on the normal space orthogonal to the tangent space, intending to impose robustness on the classifier against the noise causing the observed data deviating from the underlying data manifold. Both of the two regularizers are achieved by the strategy of virtual adversarial training. Our method has achieved stateoftheart performance on semisupervised learning tasks on both artificial dataset and FashionMNIST dataset. 
TANKER  Named Entity Recognition and Disambiguation (NERD) systems have recently been widely researched to deal with the significant growth of the Web. NERD systems are crucial for several Natural Language Processing (NLP) tasks such as summarization, understanding, and machine translation. However, there is no standard interface specification, i.e. these systems may vary significantly either for exporting their outputs or for processing the inputs. Thus, when a given company desires to implement more than one NERD system, the process is quite exhaustive and prone to failure. In addition, industrial solutions demand critical requirements, e.g., largescale processing, completeness, versatility, and licenses. Commonly, these requirements impose a limitation, making good NERD models to be ignored by companies. This paper presents TANKER, a distributed architecture which aims to overcome scalability, reliability and failure tolerance limitations related to industrial needs by combining NERD systems. To this end, TANKER relies on a microservices oriented architecture, which enables agile development and delivery of complex enterprise applications. In addition, TANKER provides a standardized API which makes possible to combine several NERD systems at once. 
Target Diagram  tdr 
Target Driven Instance Detector (TDID) 
While stateoftheart general object detectors are getting better and better, there are not many systems specifically designed to take advantage of the instance detection problem. For many applications, such as household robotics, a system may need to recognize a few very specific instances at a time. Speed can be critical in these applications, as can the need to recognize previously unseen instances. We introduce a Target Driven Instance Detector(TDID), which modifies existing general object detectors for the instance recognition setting. TDID not only improves performance on instances seen during training, with a fast runtime, but is also able to generalize to detect novel instances. 
Target Set Selection in a Social Network (TSS Problem) 
Given a social network with diffusion probabilities as edge weights and an integer k, which k nodes should be chosen for initial injection of information to maximize influence in the network? This problem is known as Target Set Selection in a social network (TSS Problem) and more popularly, Social Influence Maximization Problem (SIM Problem). 
Targeted Kernel Network (TKN) 
We propose Attentive Regularization (AR), a method to constrain the activation maps of kernels in Convolutional Neural Networks (CNNs) to specific regions of interest (ROIs). Each kernel learns a location of specialization along with its weights through standard backpropagation. A differentiable attention mechanism requiring no additional supervision is used to optimize the ROIs. Traditional CNNs of different types and structures can be modified with this idea into equivalent Targeted Kernel Networks (TKNs), while keeping the network size nearly identical. By restricting kernel ROIs, we reduce the number of sliding convolutional operations performed throughout the network in its forward pass, speeding up both training and inference. We evaluate our proposed architecture on both synthetic and natural tasks across multiple domains. TKNs obtain significant improvements over baselines, requiring less computation (around an order of magnitude) while achieving superior performance. 
Targeted Learning  The statistics profession is at a unique point in history. The need for valid statistical tools is greater than ever; data sets are massive, often measuring hundreds of thousands of measurements for a single subject. The field is ready to move towards clear objective benchmarks under which tools can be evaluated. Targeted learning allows 1) the full generalization and utilization of crossvalidation as an estimator selection tool so that the subjective choices made by humans are now made by the machine, and 2) targeting the fitting of the probability distribution of the data toward the target parameter representing the scientific question of interest. Targeted learning methods build machinelearningbased estimators of parameters defined as features of the probability distribution of the data, while also providing influencecurve or bootstrapbased confidence internals. The theory offers a general template for creating targeted maximum likelihood estimators for a data structure, nonparametric or semiparametric statistical model, and parameter mapping. These estimators of causal inference parameters are double robust and have a variety of other desirable statistical properties. Targeted maximum likelihood estimation built on the lossbased ‘super learning’ system such that lowerdimensional parameters could be targeted (e.g., a marginal causal effect); the remaining bias for the (lowdimensional) target feature of the probability distribution was removed. Targeted learning for effect estimation and causal inference allows for the complete integration of machine learning advances in prediction while providing statistical inference for the target parameter(s) of interest. http://…/9781441997814 http://…/papers SuperLearner,tmle 
Targeted Maximum Likelihood Estimation (TMLE) 
Maximum likelihood estimation fits a model to data, minimizing a global measure, such as mean squared error (MSE). When we are interested in one particular parameter of the data distribution and consider the remaining parameters to be nuisance parameters, we would prefer an estimate that has smaller bias and variance for the targeted parameter, at the expense of increased bias and/or variance in the estimation of nuisance parameters. Targeted maximum likelihood estimation targets the MLE estimate of the parameter of interest in a way that reduces bias. This bias reduction is sometimes accompanied by an increase in the variance of the estimate, but the procedure often reduces variance as well in finite samples. Asymptotically, TMLE is maximally efficient when the model and nuisance parameters are correctly specified. The framework of targeted maximum likelihood estimation (TMLE), introduced in van der Laan & Rubin (2006), is a principled approach for constructing asymptotically linear and efficient substitution estimators in rich infinitedimensional models. The mechanics of TMLE hinge upon firstorder approximations of the parameter of interest as a mapping on the space of probability distributions. For such approximations to hold, a secondorder remainder term must tend to zero sufficiently fast. In practice, this means an initial estimator of the underlying datagenerating distribution with a sufficiently large rate of convergence must be available — in many cases, this requirement is prohibitively difficult to satisfy. http://…/paper335 
Targeted Minimum Loss Based Estimation (TMLE) 
Targeted minimum loss based estimation (TMLE) provides a template for the construction of semiparametric locally efficient double robust substitution estimators of the target parameter of the data generating distribution in a semiparametric censored data or causal inference model based on a sample of independent and identically distributed copies from this data generating distribution. A New Approach to Hierarchical Data Analysis: Targeted Maximum Likelihood Estimation of ClusterBased Effects Under Interference 
Tarjan’s Strongly Connected Components Algorithm  Tarjan’s Algorithm (named for its discoverer, Robert Tarjan) is a graph theory algorithm for finding the strongly connected components of a graph. Although it precedes it chronologically, it can be seen as an improved version of Kosaraju’s algorithm, and is comparable in efficiency to the pathbased strong component algorithm. 
TauCharts  Javascript charts with a focus on data, design and flexibility. Free open source D3.jsbased library. TauCharts is the datafocused charting library. Our goal – help people to build interactive complex visualizations easily. Achieve Charting Zen With TauCharts taucharts 
tauFalse Positive Learning (tauFPL) 
Learning a classifier with control on the falsepositive rate plays a critical role in many machine learning applications. Existing approaches either introduce prior knowledge dependent label cost or tune parameters based on traditional classifiers, which lack consistency in methodology because they do not strictly adhere to the falsepositive rate constraint. In this paper, we propose a novel scoringthresholding approach, tauFalse Positive Learning (tauFPL) to address this problem. We show the scoring problem which takes the falsepositive rate tolerance into accounts can be efficiently solved in linear time, also an outofbootstrap thresholding method can transform the learned ranking function into a low falsepositive classifier. Both theoretical analysis and experimental results show superior performance of the proposed tauFPL over existing approaches. 
Taxicab Correspondence Analysis  Taxicab Correspondence Analysis, Choulakian (2006) <doi:10.1007/s1133600412314>. Classical correspondence analysis (CA) is a statistical method to analyse 2dimensional tables of positive numbers and is typically applied to contingency tables (Benzecri, J.P. (1973). L’Analyse des Donnees. Volume II. L’Analyse des Correspondances. Paris, France: Dunod). Classical CA is based on the Euclidean distance. Taxicab CA is like classical CA but is based on the Taxicab or Manhattan distance. For some tables, Taxicab CA gives more informative results than classical CA. TaxicabCA 
TBATS Models (TBATS) 
The identifier BATS is an acronym for key features of the model: BoxCox transform, ARMA errors, Trend, and Seasonal components. The initial T in TBATS is connoting ‘trigonometric’. 
tDistributed Stochastic Neighbor Embedding (tSNE,TSNE) 
tdistributed stochastic neighbor embedding (tSNE) is a machine learning algorithm for dimensionality reduction developed by Laurens van der Maaten and Geoffrey Hinton. It is a nonlinear dimensionality reduction technique that is particularly well suited for embedding highdimensional data into a space of two or three dimensions, which can then be visualized in a scatter plot. Specifically, it models each highdimensional object by a two or threedimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points. The tSNE algorithms comprises two main stages. First, tSNE constructs a probability distribution over pairs of highdimensional objects in such a way that similar objects have a high probability of being picked, whilst dissimilar points have an infinitesimal probability of being picked. Second, tSNE defines a similar probability distribution over the points in the lowdimensional map, and it minimizes the KullbackLeibler divergence between the two distributions with respect to the locations of the points in the map. http://…butedstochasticneighborembeddingtsne Visualizing Data using tSNE tsne 
TEA Functions (TEA) 
· Transformations are functions that take existing input data and apply a function to it such that it changes form. A simple example could be combining first name, middle name, and last name fields in source data and creating a full name field that is the combination of the three sub fields. · Enrichments are functions that take existing input data, combined with additional data sources, and create new information that could not be gleaned from either source independently. For example, one could take two different lists of individuals and use pattern matching to create relationships that are not apparent from either list itself. · Augmentations are functions that add data of use in combination with the input data. The result is a more complete set of information that combines data from multiple sources. For example, a set of business entities gleaned from a conference attendee list, combined with Dun and Bradstreet profiles for those entities, creates a more complete set of information for each business entity. 
Teacher Guided Search for Architectures by Generation and Evaluation (TGSAGE) 
Strong improvements in network performance in vision tasks have resulted from the search of alternative network architectures, and prior work has shown that this search process can be automated and guided by evaluating candidate network performance following limited training (Performance Guided Architecture Search or PGAS). However, because of the large architecture search spaces and the high computational cost associated with evaluating each candidate model, further gains in computational efficiency are needed. Here we present a method termed Teacher Guided Search for Architectures by Generation and Evaluation (TGSAGE) that produces up to an order of magnitude in search efficiency over PGAS methods. Specifically, TGSAGE guides each step of the architecture search by evaluating the similarity of internal representations of the candidate networks with those of the (fixed) teacher network. We show that this procedure leads to significant reduction in required persample training and that, this advantage holds for two different search spaces of architectures, and two different search algorithms. We further show that in the space of convolutional cells for visual categorization, TGSAGE finds a cell structure with similar performance as was previously found using other methods but at a total computational cost that is two orders of magnitude lower than Neural Architecture Search (NAS) and more than four times lower than progressive neural architecture search (PNAS). These results suggest that TGSAGE can be used to accelerate network architecture search in cases where one has access to some or all of the internal representations of a teacher network of interest, such as the brain. 
TeacherStudent Curriculum Learning (TSCL) 
We propose TeacherStudent Curriculum Learning (TSCL), a framework for automatic curriculum learning, where the Student tries to learn a complex task and the Teacher automatically chooses subtasks from a given set for the Student to train on. We describe a family of Teacher algorithms that rely on the intuition that the Student should practice more those tasks on which it makes the fastest progress, i.e. where the slope of the learning curve is highest. In addition, the Teacher algorithms address the problem of forgetting by also choosing tasks where the Student’s performance is getting worse. We demonstrate that TSCL matches or surpasses the results of carefully handcrafted curricula in two tasks: addition of decimal numbers with LSTM and navigation in Minecraft. Using our automatically generated curriculum enabled to solve a Minecraft maze that could not be solved at all when training directly on solving the maze, and the learning was an order of magnitude faster than uniform sampling of subtasks. 
TeachingLearningBased Optimization (TLBO) 
A new efficient optimization method, called ‘TeachingLearningBased Optimization (TLBO)’, is proposed in this paper for the optimization of mechanical design problems. This method works on the effect of influence of a teacher on learners. Like other natureinspired algorithms, TLBO is also a populationbased method and uses a population of solutions to proceed to the global solution. The population is considered as a group of learners or a class of learners. The process of TLBO is divided into two parts: the first part consists of the ‘Teacher Phase’ and the second part consists of the ‘Learner Phase’. ‘Teacher Phase’ means learning from the teacher and ‘Learner Phase’ means learning by the interaction between learners. The basic philosophy of the TLBO method is explained in detail. To check the effectiveness of the method it is tested on five different constrained benchmark test functions with different characteristics, four different benchmark mechanical design problems and six mechanical design optimization problems which have real world applications. The effectiveness of the TLBO method is compared with the other populationbased optimization algorithms based on the best solution, average solution, convergence rate and computational effort. Results show that TLBO is more effective and efficient than the other optimization methods for the mechanical design optimization problems considered. This novel optimization method can be easily extended to other engineering design optimization problems. Teaching Learning Based Optimization Algorithm 
TeKnowbase  In this paper, we describe the construction of TeKnowbase, a knowledgebase of technical concepts in computer science. Our main information sources are technical websites such as Webopedia and Techtarget as well as Wikipedia and online textbooks. We divide the knowledgebase construction problem into two parts — the acquisition of entities and the extraction of relationships among these entities. Our knowledgebase consists of approximately 100,000 triples. We conducted an evaluation on a sample of triples and report an accuracy of a little over 90\%. We additionally conducted classification experiments on StackOverflow data with features from TeKnowbase and achieved improved classification accuracy. 
Tell Me Something New (TMSN) 
We present a novel approach for parallel computation in the context of machine learning that we call ‘Tell Me Something New’ (TMSN). This approach involves a set of independent workers that use broadcast to update each other when they observe ‘something new’. TMSN does not require synchronization or a head node and is highly resilient against failing machines or laggards. We demonstrate the utility of TMSN by applying it to learning boosted trees. We show that our implementation is 10 times faster than XGBoost and LightGBM on the splicesite prediction problem. 
Template Model Builder  glmmTMB 
Temporal Aggregation  We call temporal aggregation the situation in which a variable that evolves through time can not be observed at all dates. This phenomenon arises frequently in economics, where it is very expensive to collect data on certain variables, and there is no reason to believe that economic time series are collected at the frequency required to fully capture the movements of the economy. For example, we only have quarterly observations on GNP, but it is reasonable to believe that the behavior of GNP within a quarter carries relevant information about the structure of the economy. 
Temporal Automatic Relation Discovery in Sequences (TARDIS) 
Recent empirical results on longterm dependency tasks have shown that neural networks augmented with an external memory can learn the longterm dependency tasks more easily and achieve better generalization than vanilla recurrent neural networks (RNN). We suggest that memory augmented neural networks can reduce the effects of vanishing gradients by creating shortcut (or wormhole) connections. Based on this observation, we propose a novel memory augmented neural network model called TARDIS (Temporal Automatic Relation Discovery in Sequences). The controller of TARDIS can store a selective set of embeddings of its own previous hidden states into an external memory and revisit them as and when needed. For TARDIS, memory acts as a storage for wormhole connections to the past to propagate the gradients more effectively and it helps to learn the temporal dependencies. The memory structure of TARDIS has similarities to both Neural Turing Machines (NTM) and Dynamic Neural Turing Machines (DNTM), but both read and write operations of TARDIS are simpler and more efficient. We use discrete addressing for read/write operations which helps to substantially to reduce the vanishing gradient problem with very long sequences. Read and write operations in TARDIS are tied with a heuristic once the memory becomes full, and this makes the learning problem simpler when compared to NTM or DNTM type of architectures. We provide a detailed analysis on the gradient propagation in general for MANNs. We evaluate our models on different longterm dependency tasks and report competitive results in all of them. 
Temporal Convolutional Network (TCN) 
The dominant paradigm for videobased action segmentation is composed of two steps: first, for each frame, compute lowlevel features using Dense Trajectories or a Convolutional Neural Network that encode spatiotemporal information locally, and second, input these features into a classifier that captures highlevel temporal relationships, such as a Recurrent Neural Network (RNN). While often effective, this decoupling requires specifying two separate models, each with their own complexities, and prevents capturing more nuanced longrange spatiotemporal relationships. We propose a unified approach, as demonstrated by our Temporal Convolutional Network (TCN), that hierarchically captures relationships at low, intermediate, and highlevel timescales. Our model achieves superior or competitive performance using video or sensor data on three public action segmentation datasets and can be trained in a fraction of the time it takes to train an RNN. Temporal Convolutional Nets (TCNs) Take Over from RNNs for NLP Predictions 
Temporal Database  A temporal database is a database with builtin support for handling data involving time, being related to Slowly changing dimension concept, for example a temporal data model and a temporal version of Structured Query Language (SQL). More specifically the temporal aspects usually include valid time and transaction time. These attributes can be combined to form bitemporal data. Valid time is the time period during which a fact is true with respect to the real world. Transaction time is the time period during which a fact stored in the database is considered to be true. Bitemporal data combines both Valid and Transaction Time. It is possible to have timelines other than Valid Time and Transaction Time, such as Decision Time, in the database. In that case the database is called a multitemporal database as opposed to a bitemporal database. However, this approach introduces additional complexities such as dealing with the validity of (foreign) keys. Temporal databases are in contrast to current databases, which store only facts which are believed to be true at the current time. 
Temporal Difference Learning (TD) 
Temporal difference (TD) learning is a prediction method. It has been mostly used for solving the reinforcement learning problem. ‘TD learning is a combination of Monte Carlo ideas and dynamic programming (DP) ideas.’ TD resembles a Monte Carlo method because it learns by sampling the environment according to some policy. TD is related to dynamic programming techniques because it approximates its current estimate based on previously learned estimates (a process known as bootstrapping). The TD learning algorithm is related to the temporal difference model of animal learning. Adaptive Lambda LeastSquares Temporal Difference Learning 
Temporal Difference Variational AutoEncoder  One motivation for learning generative models of environments is to use them as simulators for modelbased reinforcement learning. Yet, it is intuitively clear that when time horizons are long, rolling out single step transitions is inefficient and often prohibitive. In this paper, we propose a generative model that learns state representations containing explicit beliefs about states several time steps in the future and that can be rolled out directly in these states without executing single step transitions. The model is trained on pairs of temporally separated time points, using an analogue of temporal difference learning used in reinforcement learning, taking the belief about possible futures at one time point as a bootstrap for training the belief at an earlier time. While we focus purely on the study of the model rather than its use in reinforcement learning, the model architecture we design respects agents’ constraints as it builds the representation online. 
Temporal Event Graph (TEG) 
Temporal networks are increasingly being used to model the interactions of complex systems. Most studies require the temporal aggregation of edges (or events) into discrete time steps to perform analysis. In this article we describe a static, lossless, and unique representation of a temporal network, the temporal event graph (TEG). The TEG describes the temporal network in terms of both the interevent time and twoevent temporal motif distributions. By considering these distributions in unison we provide a new method to characterise the behaviour of individuals and collectives in temporal networks as well as providing a natural decomposition of the network. We illustrate the utility of the TEG by providing examples on both synthetic and real temporal networks. 
Temporal Exponential Random Graph Model (TERGM) 
btergm 
Temporal Hierarchical Clustering  We study hierarchical clusterings of metric spaces that change over time. This is a natural geometric primitive for the analysis of dynamic data sets. Specifically, we introduce and study the problem of finding a temporally coherent sequence of hierarchical clusterings from a sequence of unlabeled point sets. We encode the clustering objective by embedding each point set into an ultrametric space, which naturally induces a hierarchical clustering of the set of points. We enforce temporal coherence among the embeddings by finding correspondences between successive pairs of ultrametric spaces which exhibit small distortion in the GromovHausdorff sense. We present both upper and lower bounds on the approximability of the resulting optimization problems. 
Temporal Multinomial Mixture (TMM) 
Evolutionary clustering aims at capturing the temporal evolution of clusters. This issue is particularly important in the context of social media data that are naturally temporally driven. In this paper, we propose a new probabilistic modelbased evolutionary clustering technique. The Temporal Multinomial Mixture (TMM) is an extension of classical mixture model that optimizes feature cooccurrences in the tradeoff with temporal smoothness. Our model is evaluated for two recent case studies on opinion aggregation over time. We compare four different probabilistic clustering models and we show the superiority of our proposal in the task of instanceoriented clustering. 
Temporal Network Autocorrelation Models (TNAM) 
tnam,xergm 
Temporal Network Centrality (TNC) 
TNC 
Temporal Overdrive Recurrent Neural Network  In this work we present a novel recurrent neural network architecture designed to model systems characterized by multiple characteristic timescales in their dynamics. The proposed network is composed by several recurrent groups of neurons that are trained to separately adapt to each timescale, in order to improve the system identification process. We test our framework on time series prediction tasks and we show some promising, preliminary results achieved on synthetic data. To evaluate the capabilities of our network, we compare the performance with several stateoftheart recurrent architectures. 
Temporal Pattern Mining  Temporal Pattern Mining (TPM) is the problem of mining predictive complex temporal patterns from multivariate time series in a supervised setting. 
Temporal Regularized Matrix Factorization (TRMF) 
Matrix factorization approaches have been applied to a variety of applications, from recommendation systems to multilabel learning. Standard low rank matrix factorization methods fail in cases when the data can be modeled as a time series, since they do not take into account the dependencies among factors, while EM algorithms designed for time series data are inapplicable to large multiple time series data. To overcome this, matrix factorization approaches are augmented with dynamic linear model based regularization frameworks. A major drawback in such approaches is that the exact dependencies between the latent factors are assumed to be known. In this paper, we introduce a Temporal Regularized Matrix Factorization (TRMF) method, an efficient alternating minimization scheme that not only learns the latent time series factors, but also the dependencies among the latent factors. TRMF is highly general, and subsumes several existing matrix factorization approaches for time series data. We make interesting connections to graph based matrix factorization methods in the context of learning the dependencies. Experiments on both real and synthetic data show that TRMF is highly scalable, and outperforms several existing approaches used for common large scale time series tasks. 
TemporalDifference Learning (TD Learning) 
Temporal Difference (TD) learning refers to a class of modelfree reinforcement learning methods which learn by bootstrapping from the current estimate of the value function. These methods sample from the environment, like Monte Carlo methods, and perform updates based on current estimates, like dynamic programming methods. While Monte Carlo methods only adjust their estimates once the final outcome is known, TD methods adjust predictions to match later, more accurate, predictions about the future before the final outcome is known. This is a form of bootstrapping, as illustrated with the following example: ‘Suppose you wish to predict the weather for Saturday, and you have some model that predicts Saturday’s weather, given the weather of each day in the week. In the standard case, you would wait until Saturday and then adjust all your models. However, when it is, for example, Friday, you should have a pretty good idea of what the weather would be on Saturday – and thus be able to change, say, Saturday’s model before Saturday arrives’. Temporal difference methods are related to the temporal difference model of animal learning. Temporal Difference Learning in Python Temporal Difference Learning with Neural Networks – Study of the Leakage Propagation Problem 
TensiStrength  Computer systems need to be able to react to stress in order to perform optimally on some tasks. This article describes TensiStrength, a system to detect the strength of stress and relaxation expressed in social media text messages. TensiStrength uses a lexical approach and a set of rules to detect direct and indirect expressions of stress or relaxation, particularly in the context of transportation. It is slightly more effective than a comparable sentiment analysis program, although their similar performances occur despite differences on almost half of the tweets gathered. The effectiveness of TensiStrength depends on the nature of the tweets classified, with tweets that are rich in stressrelated terms being particularly problematic. Although generic machine learning methods can give better performance than TensiStrength overall, they exploit topicrelated terms in a way that may be undesirable in practical applications and that may not work as well in more focused contexts. In conclusion, TensiStrength and generic machine learning approaches work well enough to be practical choices for intelligent applications that need to take advantage of stress information, and the decision about which to use depends on the nature of the texts analysed and the purpose of the task. 
Tensor Comprehensions  Deep learning models with convolutional and recurrent networks are now ubiquitous and analyze massive amounts of audio, image, video, text and graph data, with applications in automatic translation, speechtotext, scene understanding, ranking user preferences, ad placement, etc. Competing frameworks for building these networks such as TensorFlow, Chainer, CNTK, Torch/PyTorch, Caffe1/2, MXNet and Theano, explore different tradeoffs between usability and expressiveness, research or production orientation and supported hardware. They operate on a DAG of computational operators, wrapping highperformance libraries such as CUDNN (for NVIDIA GPUs) or NNPACK (for various CPUs), and automate memory allocation, synchronization, distribution. Custom operators are needed where the computation does not fit existing highperformance library calls, usually at a high engineering cost. This is frequently required when new operators are invented by researchers: such operators suffer a severe performance penalty, which limits the pace of innovation. Furthermore, even if there is an existing runtime call these frameworks can use, it often doesn’t offer optimal performance for a user’s particular network architecture and dataset, missing optimizations between operators as well as optimizations that can be done knowing the size and shape of data. Our contributions include (1) a language close to the mathematics of deep learning called Tensor Comprehensions offering both imperative and declarative styles, (2) a polyhedral JustInTime compiler to convert a mathematical description of a deep learning DAG into a CUDA kernel with delegated memory management and synchronization, also providing optimizations such as operator fusion and specialization for specific sizes, (3) a compilation cache populated by an autotuner. [Abstract cutoff] 
Tensor Core  The NVIDIA Volta GPU microarchitecture introduces a specialized unit, called ‘Tensor Core’ that performs one matrixmultiplyandaccumulate on 4×4 matrices per clock cycle. The NVIDIA Tesla V100 accelerator, featuring the Volta microarchitecture, provides 640 Tensor Cores with a theoretical peak performance of 125 Tflops/s in mixed precision. In this paper, we investigate current approaches to program NVIDIA Tensor Cores, their performances and the precision loss due to computation in mixed precision. Currently, NVIDIA provides three different ways of programming matrixmultiplyandaccumulate on Tensor Cores: the CUDA Warp Matrix Multiply Accumulate (WMMA) API, CUTLASS, a templated library based on WMMA, and cuBLAS GEMM. After experimenting with different approaches, we found that NVIDIA Tensor Cores can deliver up to 83 Tflops/s in mixed precision on a Tesla V100 GPU, seven and three times the performance in single and half precision respectively. A WMMA implementation of batched GEMM reaches a performance of 4 Tflops/s. While precision loss due to matrix multiplication with half precision input might be critical in many HPC applications, it can be considerably reduced at the cost of increased computation. Our results indicate that HPC applications using matrix multiplications can strongly benefit from using of NVIDIA Tensor Cores. 
Tensor Graph Convolutional Neural Network (TGCNN) 
In this paper, we propose a novel tensor graph convolutional neural network (TGCNN) to conduct convolution on factorizable graphs, for which here two types of problems are focused, one is sequential dynamic graphs and the other is crossattribute graphs. Especially, we propose a graph preserving layer to memorize salient nodes of those factorized subgraphs, i.e. cross graph convolution and graph pooling. For cross graph convolution, a parameterized Kronecker sum operation is proposed to generate a conjunctive adjacency matrix characterizing the relationship between every pair of nodes across two subgraphs. Taking this operation, then general graph convolution may be efficiently performed followed by the composition of small matrices, which thus reduces high memory and computational burden. Encapsuling sequence graphs into a recursive learning, the dynamics of graphs can be efficiently encoded as well as the spatial layout of graphs. To validate the proposed TGCNN, experiments are conducted on skeleton action datasets as well as matrix completion dataset. The experiment results demonstrate that our method can achieve more competitive performance with the stateoftheart methods. 
Tensor Graphical Lasso (TeraLasso) 
The Bigraphical Lasso estimator was proposed to parsimoniously model the precision matrices of matrixnormal data based on the Cartesian product of graphs. By enforcing extreme sparsity (the number of parameters) and explicit structures on the precision matrix, this model has excellent potential for improving scalability of the computation and interpretability of complex data analysis. As a result, this model significantly reduces the size of the sample in order to learn the precision matrices, and hence the conditional probability models along different coordinates such as space, time and replicates. In this work, we extend the Bigraphical Lasso (BiGLasso) estimator to the TEnsor gRAphical Lasso (TeraLasso) estimator and propose an analogous method for modeling the precision matrix of tensorvalued data. We establish consistency for both the BiGLasso and TeraLasso estimators and obtain the rates of convergence in the operator and Frobenius norm for estimating the precision matrix. We design a scalable gradient descent method for solving the objective function and analyze the computational convergence rate, showing that the composite gradient descent algorithm is guaranteed to converge at a geometric rate to the global minimizer. Finally, we provide simulation evidence and analysis of a meteorological dataset, showing that we can recover graphical structures and estimate the precision matrices, as predicted by theory. 
Tensor Methods  Tensors are generalizations of matrices that let you look beyond pairwise relationships to higherdimensional models (a matrix is a secondorder tensor). For instance, one can examine patterns between any three (or more) dimensions in data sets. In a text mining application, this leads to models that incorporate the cooccurrence of three or more words, and in social networks, you can use tensors to encode arbitrary degrees of influence (e.g., ‘friend of friend of friend’ of a user). Tensors, as generalizations of vectors and matrices, have become increasingly popular in different areas of machine learning and data mining, where they are employed to approach a diverse number of difficult learning and analysis tasks. Prominent examples include learning on multirelational data and largescale knowledge bases, recommendation systems, computer vision, mining boolean data, neuroimaging or the analysis of timevarying networks. The success of tensors methods is strongly related to their ability to efficiently model, analyse and predict data with multiple modalities. To address specific challenges and problems, a variety of methods has been developed in different fields of application. http://…cetensorlibrariesfordatascience.html http://…/39352 
Tensor Monte Carlo  Multisample objectives improve over singlesample estimates by giving tighter variational bounds and more accurate estimates of posterior uncertainty. However, these multisample techniques scale poorly, in the sense that the number of samples required to maintain the same quality of posterior approximation scales exponentially in the number of latent dimensions. One approach to addressing these issues is sequential Monte Carlo (SMC). However for many problems SMC is prohibitively slow because the resampling steps imposes an inherently sequential structure on the computation, which is difficult to effectively parallelise on GPU hardware. We developed tensor MonteCarlo to address these issues. In particular, whereas the usual multisample objective draws $K$ samples from a joint distribution over all latent variables, we draw $K$ samples for each of the $n$ individual latent variables, and form our bound by averaging over all $K^n$ combinations of samples from each individual latent. While this sum over exponentially many terms might seem to be intractable, in many cases it can be efficiently computed by exploiting conditional independence structure. In particular, we generalise and simplify classical algorithms such as message passing by noting that these sums can be computed can be written in an extremely simple, general form: a series of tensor innerproducts which can be depicted graphically as reductions of a factor graph. As such, we can straightforwardly combine summation over discrete variables with importance sampling over importance sampling over continuous variables. 
Tensor Network (TN) 
The harnessing of modern computational abilities for manybody wavefunction representations is naturally placed as a prominent avenue in contemporary condensed matter physics. Specifically, highly expressive computational schemes that are able to efficiently represent the entanglement properties of manyparticle systems are of interest. In the seemingly unrelated field of machine learning, deep network architectures have exhibited an unprecedented ability to tractably encompass the dependencies characterizing hard learning tasks such as image classification. However, key questions regarding deep learning architecture design still have no adequate theoretical answers. In this paper, we establish a Tensor Network (TN) based common language between the two disciplines, which allows us to offer bidirectional contributions. By showing that manybody wavefunctions are structurally equivalent to mappings of ConvACs and RACs, we construct their TN equivalents, and suggest quantum entanglement measures as natural quantifiers of dependencies in such networks. Accordingly, we propose a novel entanglement based deep learning design scheme. In the other direction, we identify that an inherent reuse of information in stateoftheart deep learning architectures is a key trait that distinguishes them from TNs. We suggest a new TN manifestation of information reuse, which enables TN constructs of powerful architectures such as deep recurrent networks and overlapping convolutional networks. This allows us to theoretically demonstrate that the entanglement scaling supported by these architectures can surpass that of commonly used TNs in 1D, and can support volume law entanglement in 2D polynomially more efficiently than RBMs. We thus provide theoretical motivation to shift trending neuralnetwork based wavefunction representations closer to stateoftheart deep learning architectures. 
Tensor Network Language Model  We propose a new statistical model suitable for machine learning tasks of systems with long distance correlations such as human languages. The model is based on directed acyclic graph decorated by multilinear tensor maps in the vertices and vector spaces in the edges, called tensor network. Such tensor networks have been previously employed for effective numerical computation of the renormalization group flow on the space of effective quantum field theories and lattice models of statistical mechanics. We provide explicit algebrogeometric analysis of the parameter moduli space for tree graphs, discuss model properties and applications such as statistical translation. 
Tensor Product Generation Network (TPGN) 
We present a new tensor product generation network (TPGN) that generates natural language descriptions for images. The model has a novel architecture that instantiates a general framework for encoding and processing symbolic structure through neural network computation. This framework is built on Tensor Product Representations (TPRs). We evaluated the proposed TPGN on the MS COCO image captioning task. The experimental results show that the TPGN outperforms the LSTM based stateoftheart baseline with a significant margin. Further, we show that our caption generation model can be interpreted as generating sequences of grammatical categories and retrieving words by their categories from a plan encoded as a distributed representation. 
Tensor Rank Decomposition  In multilinear algebra, the tensor rank decomposition or canonical polyadic decomposition (CPD) may be regarded as a generalization of the matrix singular value decomposition (SVD) to tensors, which has found application in statistics, signal processing, psychometrics, linguistics and chemometrics. It was introduced by Hitchcock in 1927 and later rediscovered several times, notably in psychometrics. For this reason, the tensor rank decomposition is sometimes historically referred to as PARAFAC or CANDECOMP. 
Tensor Regression Networks (TRN) 
To date, most convolutional neural network architectures output predictions by flattening 3rdorder activation tensors, and applying fullyconnected output layers. This approach has two drawbacks: (i) we lose rich, multimodal structure during the flattening process and (ii) fullyconnected layers require many parameters. We present the first attempt to circumvent these issues by expressing the output of a neural network directly as the the result of a multilinear mapping from an activation tensor to the output. By imposing lowrank constraints on the regression tensor, we can efficiently solve problems for which existing solutions are badly parametrized. Our proposed tensor regression layer replaces flattening operations and fullyconnected layers by leveraging multimodal structure in the data and expressing the regression weights via a low rank tensor decomposition. Additionally, we combine tensor regression with tensor contraction to further increase efficiency. Augmenting the VGG and ResNet architectures, we demonstrate large reductions in the number of parameters with negligible impact on performance on the ImageNet dataset. 
Tensor Ring Network (TRNet) 
Deep neural networks have demonstrated stateoftheart performance in a variety of realworld applications. In order to obtain performance gains, these networks have grown larger and deeper, containing millions or even billions of parameters and over a thousand layers. The tradeoff is that these large architectures require an enormous amount of memory, storage, and computation, thus limiting their usability. Inspired by the recent tensor ring factorization, we introduce Tensor Ring Networks (TRNets), which significantly compress both the fully connected layers and the convolutional layers of deep neural networks. Our results show that our TRNets approach {is able to compress LeNet5 by $11\times$ without losing accuracy}, and can compress the stateoftheart Wide ResNet by $243\times$ with only 2.3\% degradation in {Cifar10 image classification}. Overall, this compression scheme shows promise in scientific computing and deep learning, especially for emerging resourceconstrained devices such as smartphones, wearables, and IoT devices. 
Tensor Robust Principal Component Analysis (TRPCA) 
In this paper, we consider the Tensor Robust Principal Component Analysis (TRPCA) problem, which aims to exactly recover the lowrank and sparse components from their sum. Our model is based on the recently proposed tensortensor product (or tproduct) [13]. Induced by the tproduct, we first rigorously deduce the tensor spectral norm, tensor nuclear norm, and tensor average rank, and show that the tensor nuclear norm is the convex envelope of the tensor average rank within the unit ball of the tensor spectral norm. These definitions, their relationships and properties are consistent with matrix cases. Equipped with the new tensor nuclear norm, we then solve the TRPCA problem by solving a convex program and provide the theoretical guarantee for the exact recovery. Our TRPCA model and recovery guarantee include matrix RPCA as a special case. Numerical experiments verify our results, and the applications to image recovery and background modeling problems demonstrate the effectiveness of our method. 
Tensor Switching Networks (TS) 
We present a novel neural network algorithm, the Tensor Switching (TS) network, which generalizes the Rectified Linear Unit (ReLU) nonlinearity to tensorvalued hidden units. The TS network copies its entire input vector to different locations in an expanded representation, with the location determined by its hidden unit activity. In this way, even a simple linear readout from the TS representation can implement a highly expressive deepnetworklike function. The TS network hence avoids the vanishing gradient problem by construction, at the cost of larger representation size. We develop several methods to train the TS network, including equivalent kernels for infinitely wide and deep TS networks, a onepass linear learning algorithm, and two backpropagationinspired representation learning algorithms. Our experimental results demonstrate that the TS network is indeed more expressive and consistently learns faster than standard ReLU networks. 
Tensor Train PCA (TTPCA) 
Tensor train is a hierarchical tensor network structure that helps alleviate the curse of dimensionality by parameterizing largescale multidimensional data via a set of network of lowrank tensors. Associated with such a construction is a notion of Tensor Train subspace and in this paper we propose a TTPCA algorithm for estimating this structured subspace from the given data. By maintaining low rank tensor structure, TTPCA is more robust to noise comparing with PCA or TuckerPCA. This is borne out numerically by testing the proposed approach on the Extended YaleFace Dataset B. 
Tensor Train Rank Minimization  Tensor train (TT) decomposition provides a spaceefficient representation for higherorder tensors. Despite its advantage, we face two crucial limitations when we apply the TT decomposition to machine learning problems: the lack of statistical theory and of scalable algorithms. In this paper, we address the limitations. First, we introduce a convex relaxation of the TT decomposition problem and derive its error bound for the tensor completion task. Next, we develop an alternating optimization method with a randomization technique, in which the time complexity is as efficient as the space complexity is. In experiments, we numerically confirm the derived bounds and empirically demonstrate the performance of our method with a real higherorder tensor. 
Tensor2Tensor  Deep Learning (DL) has enabled the rapid advancement of many useful technologies, such as machine translation, speech recognition and object detection. In the research community, one can find code opensourced by the authors to help in replicating their results and further advancing deep learning. However, most of these DL systems use unique setups that require significant engineering effort and may only work for a specific problem or architecture, making it hard to run new experiments and compare the results. Today, we are happy to release Tensor2Tensor (T2T), an opensource system for training deep learning models in TensorFlow. T2T facilitates the creation of stateofthe art models for a wide variety of ML applications, such as translation, parsing, image captioning and more, enabling the exploration of various ideas much faster than previously possible. This release also includes a library of datasets and models, including the best models from a few recent papers (Attention Is All You Need, Depthwise Separable Convolutions for Neural Machine Translation and One Model to Learn Them All) to help kickstart your own DL research. 
TensorFlow  TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to largescale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an opensource package under the Apache 2.0 license in November, 2015 and are available at http://www.tensorflow.org. 
TensorFlow Agents  We introduce TensorFlow Agents, an efficient infrastructure paradigm for building parallel reinforcement learning algorithms in TensorFlow. We simulate multiple environments in parallel, and group them to perform the neural network computation on a batch rather than individual observations. This allows the TensorFlow execution engine to parallelize computation, without the need for manual synchronization. Environments are stepped in separate Python processes to progress them in parallel without interference of the global interpreter lock. As part of this project, we introduce BatchPPO, an efficient implementation of the proximal policy optimization algorithm. By open sourcing TensorFlow Agents, we hope to provide a flexible starting point for future projects that accelerates future research in the field. 
Tensorflow Boosted Trees (TFBT) 
TF Boosted Trees (TFBT) is a new opensourced framework for the distributed training of gradient boosted trees. It is based on TensorFlow, and its distinguishing features include a novel architecture, automatic loss differentiation, layerbylayer boosting that results in smaller ensembles and faster prediction, principled multiclass handling, and a number of regularization techniques to prevent overfitting. 
TensorFlow Extended (TFX) 
Creating and maintaining a platform for reliably producing and deploying machine learning models requires careful or chestration of many componentsa learner for generating models based on training data, modules for analyzing and validating both data as well as models, and nally infrastructure for serving models in production. This becomes particularly challenging when data changes over time and fresh models need to be produced continuously. Unfortunately, such or chestration is often done ad hoc using glue code and custom scripts developed by individual teams for specific use cases, leading to duplicated effort and fragile systems with high technical debt. We present TensorFlow Extended (TFX), a TensorFlow based generalpurpose machine learning platform implemented at Google. By integrating the aforementioned components into one platform, we were able to standardize the components, simplify the platform con guration, and reduce the time to production from the order of months to weeks, while providing platform stability that minimizes disruptions. We present the case study of one deployment of TFX in the Google Play app store, where the machine learning models are refreshed continuously as new data arrive. Deploying TFX led to reduced custom code, faster experiment cycles, and a 2% increase in app installs resulting from improved data and model analysis. 
TensorFlow Hub  TensorFlow Hub is a library to foster the publication, discovery, and consumption of reusable parts of machine learning models. A module is a selfcontained piece of a TensorFlow graph, along with its weights and assets, that can be reused across different tasks in a process known as transfer learning. Modules contain variables that have been pretrained for a task using a large dataset. By reusing a module on a related task, you can: • train a model with a smaller dataset, • improve generalization, or • significantly speed up training. 
TensorFlow Probability  TensorFlow Probability is a library for probabilistic reasoning and statistical analysis in TensorFlow. As part of the TensorFlow ecosystem, TensorFlow Probability provides integration of probabilistic methods with deep networks, gradientbased inference via automatic differentiation, and scalability to large datasets and models via hardware acceleration (e.g., GPUs) and distributed computation. 
TensorFlow.js  A WebGL accelerated, browser based JavaScript library for training and deploying ML models. 
TensorForce  Reinforcement learning approaches have long appealed to the data management community due to their ability to learn to control dynamic behavior from raw system performance. Recent successes in combining deep neural networks with reinforcement learning have sparked significant new interest in this domain. However, practical solutions remain elusive due to large training data requirements, algorithmic instability, and lack of standard tools. In this work, we introduce LIFT, an endtoend software stack for applying deep reinforcement learning to data management tasks. While prior work has frequently explored applications in simulations, LIFT centers on utilizing human expertise to learn from demonstrations, thus lowering online training times. We further introduce TensorForce, a TensorFlow library for applied deep reinforcement learning exposing a unified declarative interface to common RL algorithms, thus providing a backend to LIFT. We demonstrate the utility of LIFT in two case studies in database compound indexing and resource management in stream processing. Results show LIFT controllers initialized from demonstrations can outperform human baselines and heuristics across latency metrics and space usage by up to 70%. 
Tensorial Mixture Models  We introduce a generative model, we call Tensorial Mixture Models (TMMs) based on mixtures of basic component distributions over local structures (e.g. patches in an image) where the dependencies between the localstructures are represented by a ‘priors tensor’ holding the prior probabilities of assigning a component distribution to each localstructure. In their general form, TMMs are intractable as the prior tensor is typically of exponential size. However, when the priors tensor is decomposed it gives rise to an arithmetic circuit which in turn transforms the TMM into a Convolutional Arithmetic Circuit (ConvAC). A ConvAC corresponds to a shallow (single hidden layer) network when the priors tensor is decomposed by a CP (sum of rank1) approach and corresponds to a deep network when the decomposition follows the Hierarchical Tucker (HT) model. The ConvAC representation of a TMM possesses several attractive properties. First, the inference is tractable and is implemented by a forward pass through a deep network. Second, the architectural design of the model follows the deep networks community design, i.e., the structure of TMMs is determined by just two easily understood factors: size of pooling windows and number of channels. Finally, we demonstrate the effectiveness of our model when tackling the problem of classification with missing data, leveraging TMMs unique ability of tractable marginalization which leads to optimal classifiers regardless of the missingness distribution. 
Tensorized LSTM  Long ShortTerm Memory (LSTM) is a popular approach to boosting the ability of Recurrent Neural Networks to store longer term temporal information. The capacity of an LSTM network can be increased by widening and adding layers. However, usually the former introduces additional parameters, while the latter increases the runtime. As an alternative we propose the Tensorized LSTM in which the hidden states are represented by tensors and updated via a crosslayer convolution. By increasing the tensor size, the network can be widened efficiently without additional parameters since the parameters are shared across different locations in the tensor; by delaying the output, the network can be deepened implicitly with little additional runtime since deep computations for each timestep are merged into temporal computations of the sequence. Experiments conducted on five challenging sequence learning tasks show the potential of the proposed model. 
TensorLayer  Deep learning has enabled major advances in the fields of computer vision, natural language processing, and multimedia among many others. Developing a deep learning system is arduous and complex, as it involves constructing neural network architectures, managing training/trained models, tuning optimization process, preprocessing and organizing data, etc. TensorLayer is a versatile Python library that aims at helping researchers and engineers efficiently develop deep learning systems. It offers rich abstractions for neural networks, model and data management, and parallel workflow mechanism. While boosting efficiency, TensorLayer maintains both performance and scalability. TensorLayer was released in September 2016 on GitHub, and has helped people from academia and industry develop realworld applications of deep learning. 
TensorLog  We present an implementation of a probabilistic firstorder logic called TensorLog, in which classes of logical queries are compiled into differentiable functions in a neuralnetwork infrastructure such as Tensorflow or Theano. This leads to a close integration of probabilistic logical reasoning with deeplearning infrastructure: in particular, it enables highperformance deep learning frameworks to be used for tuning the parameters of a probabilistic logic. Experimental results show that TensorLog scales to problems involving hundreds of thousands of knowledgebase triples and tens of thousands of examples. 
TensorLy  Tensor methods are gaining increasing traction in machine learning. However, there are scant to no resources available to perform tensor learning and decomposition in Python. To answer this need we developed TensorLy. TensorLy is a state of the art general purpose library for tensor learning. Written in Python, it aims at following the same standards adopted by the main projects of the Python scientific community and fully integrating with these. It allows for fast and straightforward tensor decomposition and learning and comes with exhaustive tests, thorough documentation and minimal dependencies. It can be easily extended and its BSD licence makes it suitable for both academic and commercial applications. TensorLy is available at https://…/tensorly. 
TensOrMachine  Boolean tensor decomposition approximates data of multiway binary relationships as product of interpretable lowrank binary factors, following the rules of Boolean algebra. Here, we present its first probabilistic treatment. We facilitate scalable samplingbased posterior inference by exploitation of the combinatorial structure of the factor conditionals. Maximum a posteriori decompositions feature higher accuracies than existing techniques throughout a wide range of simulated conditions. Moreover, the probabilistic approach facilitates the treatment of missing data and enables model selection with much greater accuracy. We investigate three realworld datasets. First, temporal interaction networks in a hospital ward and behavioural data of university students demonstrate the inference of instructive latent patterns. Next, we decompose a tensor with more than 10 billion data points, indicating relations of gene expression in cancer patients. Not only does this demonstrate scalability, it also provides an entirely novel perspective on relational properties of continuous data and, in the present example, on the molecular heterogeneity of cancer. Our implementation is available on GitHub: https://…/LogicalFactorisationMachines. 
TensorTrain RNN (TTRNN) 
We present TensorTrain RNN (TTRNN), a novel family of neural sequence architectures for multivariate forecasting in environments with nonlinear dynamics. Longterm forecasting in such systems is highly challenging, since there exist longterm temporal dependencies, higherorder correlations and sensitivity to error propagation. Our proposed tensor recurrent architecture addresses these issues by learning the nonlinear dynamics directly using higher order moments and highorder state transition functions. Furthermore, we decompose the higherorder structure using the tensortrain (TT) decomposition to reduce the number of parameters while preserving the model performance. We theoretically establish the approximation properties of TensorTrain RNNs for general sequence inputs, and such guarantees are not available for usual RNNs. We also demonstrate significant longterm prediction improvements over general RNN and LSTM architectures on a range of simulated environments with nonlinear dynamics, as well on realworld climate and traffic data. 
Term Document Matrix  A documentterm matrix or termdocument matrix is a mathematical matrix that describes the frequency of terms that occur in a collection of documents. In a documentterm matrix, rows correspond to documents in the collection and columns correspond to terms. There are various schemes for determining the value that each entry in the matrix should take. One such scheme is tfidf. They are useful in the field of natural language processing. 
Term Frequency – Inverse Document Frequency (TFIDF,TFIDF) 
tfidf, short for term frequencyinverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval and text mining. The tfidf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to control for the fact that some words are generally more common than others. Variations of the tfidf weighting scheme are often used by search engines as a central tool in scoring and ranking a document’s relevance given a user query. tfidf can be successfully used for stopwords filtering in various subject fields including text summarization and classification. One of the simplest ranking functions is computed by summing the tfidf for each query term; many more sophisticated ranking functions are variants of this simple model. 
Termbase Exchange  TermBase eXchange (TBX) ist eine XMLbasierte Auszeichnungssprache für den Austausch von Terminologiedaten, die meist in Terminologiedatenbanken verwaltet werden. Anwendungen, die dieses Format unterstützen, können Terminologiebestände untereinander austauschen und pflegen. Ursprünglich ein Standard der Localization Industry Standards Association (LISA), nahm sich die ISO des Standards an und überarbeitete und spezifizierte ihn in ISO 30042, welcher sich auf ISO 12620, ISO 12200 und ISO 16642 stützt. 
Terminology Extraction  Terminology mining, term extraction, term recognition, or glossary extraction, is a subtask of information extraction. The goal of terminology extraction is to automatically extract relevant terms from a given corpus. In the semantic web era, a growing number of communities and networked enterprises started to access and interoperate through the internet. Modeling these communities and their information needs is important for several web applications, like topicdriven web crawlers, web services, recommender systems, etc. The development of terminology extraction is essential to the language industry. One of the first steps to model the knowledge domain of a virtual community is to collect a vocabulary of domainrelevant terms, constituting the linguistic surface manifestation of domain concepts. Several methods to automatically extract technical terms from domainspecific document warehouses have been described in the literature. Typically, approaches to automatic term extraction make use of linguistic processors (part of speech tagging, phrase chunking) to extract terminological candidates, i.e. syntactically plausible terminological noun phrases, NPs (e.g. compounds ‘credit card’, adjectiveNPs ‘local tourist information office’, and prepositionalNPs ‘board of directors’ – in English, the first two constructs are the most frequent). Terminological entries are then filtered from the candidate list using statistical and machine learning methods. Once filtered, because of their low ambiguity and high specificity, these terms are particularly useful for conceptualizing a knowledge domain or for supporting the creation of a domain ontology. Furthermore, terminology extraction is a very useful starting point for semantic similarity, knowledge management, human translation and machine translation, etc. 
Ternary Neural Networks (TNN) 
The computation and storage requirements for Deep Neural Networks (DNNs) are usually high. This issue limit their deployability on ubiquitous computing devices such as smart phones or wearables. In this paper, we propose ternary neural networks (TNNs) in order to make deep learning more resourceefficient. We train these TNNs using a teacherstudent approach. Using only ternary weights and ternary neurons, with a step activation function of twothresholds, the student ternary network learns to mimic the behaviour of its teacher network. We propose a novel, layerwise greedy methodology for training TNNs. During training, a ternary neural network inherently prunes the smaller weights by setting them to zero. This makes them even more compact thus more resourcefriendly. We devise a purposebuilt hardware design for TNNs and implement it on FPGA. The benchmark results with our purposebuilt hardware running TNNs reveal that, with only 1.24 microjoules per image, we can achieve 97.76% accuracy with 5.37 microsecond latency and with a rate of 255K images per second on MNIST. 
Ternary Plot / Ternary Diagram  A ternary plot, ternary graph, triangle plot, simplex plot, or de Finetti diagram is a barycentric plot on three variables which sum to a constant. It graphically depicts the ratios of the three variables as positions in an equilateral triangle. Ternary 
Ternary Residual Networks  Sub8bit representation of DNNs incur some noticeable loss of accuracy despite rigorous (re)training at lowprecision. Such loss of accuracy essentially makes them equivalent to a much shallower counterpart, diminishing the power of being deep networks. To address this problem of accuracy drop we introduce the notion of \textit{residual networks} where we add more lowprecision edges to sensitive branches of the sub8bit network to compensate for the lost accuracy. Further, we present a perturbation theory to identify such sensitive edges. Aided by such an elegant tradeoff between accuracy and model size, the 82 architecture (8bit activations, ternary weights), enhanced by residual ternary edges, turns out to be sophisticated enough to achieve similar accuracy as 88 representation ($\sim 1\%$ drop from our FP32 baseline), despite $\sim 1.6\times$ reduction in model size, $\sim 26\times$ reduction in number of multiplications , and potentially $\sim 2\times$ inference speed up comparing to 88 representation, on the stateoftheart deep network ResNet101 pretrained on ImageNet dataset. Moreover, depending on the varying accuracy requirements in a dynamic environment, the deployed lowprecision model can be upgraded/downgraded onthefly by partially enabling/disabling residual connections. For example, disabling the least important residual connections in the above enhanced network, the accuracy drop is $\sim 2\%$ (from our FP32 baseline), despite $\sim 1.9\times$ reduction in model size, $\sim 32\times$ reduction in number of multiplications, and potentially $\sim 2.3\times$ inference speed up comparing to 88 representation. Finally, all the ternary connections are sparse in nature, and the residual ternary conversion can be done in a resourceconstraint setting without any lowprecision (re)training and without accessing the data. 
Ternary Weight Neural Networks (TWN) 
We introduce ternary weight networks (TWNs) – neural networks with weights constrained to +1, 0 and 1. The Euclidian distance between full (float or double) precision weights and the ternary weights along with a scaling factor is minimized. Besides, a thresholdbased ternary function is optimized to get an approximated solution which can be fast and easily computed. TWNs have stronger expressive abilities than the recently proposed binary precision counterparts and are thus more effective than the latter. Meanwhile, TWNs achieve up to 16× or 32× model compression rate and need fewer multiplications compared with the full precision counterparts. Benchmarks on MNIST, CIFAR10, and large scale ImageNet datasets show that the performance of TWNs is only slightly worse than the full precision counterparts but outperforms the analogous binary precision counterparts a lot. ➘ “Ternary Neural Networks” 
TernausNetV2  The most common approaches to instance segmentation are complex and use twostage networks with object proposals, conditional randomfields, template matching or recurrent neural networks. In this work we present TernausNetV2 – a simple fully convolutional network that allows extracting objects from a highresolution satellite imagery on an instance level. The network has popular encoderdecoder type of architecture with skip connections but has a few essential modifications that allows using for semantic as well as for instance segmentation tasks. This approach is universal and allows to extend any network that has been successfully applied for semantic segmentation to perform instance segmentation task. In addition, we generalize network encoder that was pretrained for RGB images to use additional input channels. It makes possible to use transfer learning from visual to a wider spectral range. For DeepGlobeCVPR 2018 building detection subchallenge, based on public leaderboard score, our approach shows superior performance in comparison to other methods. The source code corresponding pretrained weights are publicly available at https://…/TernausNetV2 
TerpreT  We study machine learning formulations of inductive program synthesis; that is, given inputoutput examples, synthesize source code that maps inputs to corresponding outputs. Our key contribution is TerpreT, a domainspecific language for expressing program synthesis problems. A TerpreT model is composed of a specification of a program representation and an interpreter that describes how programs map inputs to outputs. The inference task is to observe a set of inputoutput examples and infer the underlying program. From a TerpreT model we automatically perform inference using four different backends: gradient descent (thus each TerpreT model can be seen as defining a differentiable interpreter), linear program (LP) relaxations for graphical models, discrete satisfiability solving, and the Sketch program synthesis system. TerpreT has two main benefits. First, it enables rapid exploration of a range of domains, program representations, and interpreter models. Second, it separates the model specification from the inference algorithm, allowing proper comparisons between different approaches to inference. We illustrate the value of TerpreT by developing several interpreter models and performing an extensive empirical comparison between alternative inference algorithms on a variety of program models. To our knowledge, this is the first work to compare gradientbased search over program space to traditional searchbased alternatives. Our key empirical finding is that constraint solvers dominate the gradient descent and LPbased formulations. This is a workshop summary of a longer report at arXiv:1608.04428 
Test for Excess Significance (TES) 
In any series of typicallypowered experiments, we expect some to fail to be nonsignificant due to sampling error, even if a true effect exists. If we see a series of five experiments, and they are all significant, one thinks that either they are either very high powered, the authors got lucky, or there are some nonsignificant studies missing. For many sets of studies, the first seems implausible because the effect sizes are small; the last is important, because if it is true then the picture we get of the results is misleading. http://…tisticalalchemyandtestforexcess.html http://…/TESsimulation.html 
Test Set  A test set is a set of data used in various areas of information science to assess the strength and utility of a predictive relationship. Test sets are used in artificial intelligence, machine learning, genetic programming and statistics. In all these fields, a test set has much the same role. 
TestBased Bayes Factor (TBF) 
TBFmultinomial 
Text Data Processing (TDP) 

Text Mining  Text mining, also referred to as text data mining, roughly equivalent to text analytics, refers to the process of deriving highquality information from text. Highquality information is typically derived through the devising of patterns and trends through means such as statistical pattern learning. Text mining usually involves the process of structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluation and interpretation of the output. ‘High quality’ in text mining usually refers to some combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities). 
TextBoxes  This paper presents an endtoend trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, involving no postprocess except for a standard nonmaximum suppression. TextBoxes outperforms competing methods in terms of text localization accuracy and is much faster, taking only 0.09s per image in a fast implementation. Furthermore, combined with a text recognizer, TextBoxes significantly outperforms stateoftheart approaches on word spotting and endtoend text recognition tasks. 
TextEnt  In this paper, we describe TextEnt, a neural network model that learns distributed representations of entities and documents directly from a knowledge base (KB). Given a document in a KB consisting of words and entity annotations, we train our model to predict the entity that the document describes and map the document and its target entity close to each other in a continuous vector space. Our model is trained using a large number of documents extracted from Wikipedia. The performance of the proposed model is evaluated using two tasks, namely finegrained entity typing and multiclass text classification. The results demonstrate that our model achieves stateoftheart performance on both tasks. The code and the trained representations are made available online for further academic research. 
Textology  A Textology is a graph of word clusters connected by cooccurrence relations. 
TextSnake  Driven by deep neural networks and large scale datasets, scene text detection methods have progressed substantially over the past years, continuously refreshing the performance records on various standard benchmarks. However, limited by the representations (axisaligned rectangles, rotated rectangles or quadrangles) adopted to describe text, existing methods may fall short when dealing with much more freeform text instances, such as curved text, which are actually very common in realworld scenarios. To tackle this problem, we propose a more flexible representation for scene text, termed as TextSnake, which is able to effectively represent text instances in horizontal, oriented and curved forms. In TextSnake, a text instance is described as a sequence of ordered, overlapping disks centered at symmetric axes, each of which is associated with potentially variable radius and orientation. Such geometry attributes are estimated via a Fully Convolutional Network (FCN) model. In experiments, the text detector based on TextSnake achieves stateoftheart or comparable performance on TotalText and SCUTCTW1500, the two newly published benchmarks with special emphasis on curved text in natural images, as well as the widelyused datasets ICDAR 2015 and MSRATD500. Specifically, TextSnake outperforms the baseline on TotalText by more than 40% in Fmeasure. 
TexttoFace (T2F) 
This project combines two of the recent architectures StackGAN and ProGAN for synthesizing faces from textual descriptions. The project uses Face2Text dataset which contains 400 facial images and textual captions for each of them. The data can be obtained by contacting either the RIVAL group or the authors of the aforementioned paper. 
TexttoSpeechSystem (TTS) 
Speech Synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech computer or speech synthesizer, and can be implemented in software or hardware products. A texttospeech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for highquality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely ‘synthetic’ voice output. The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood clearly. An intelligible texttospeech program allows people with visual impairments or reading disabilities to listen to written works on a home computer. Many computer operating systems have included speech synthesizers since the early 1990s. Overview of a typical TTS system Automatic announcement Menu 0:00 A synthetic voice announcing an arriving train in Sweden. Problems playing this file? See media help. Problems playing this file? See media help. A texttospeech system (or ‘engine’) is composed of two parts: a frontend and a backend. The frontend has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of writtenout words. This process is often called text normalization, preprocessing, or tokenization. The frontend then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called texttophoneme or graphemetophoneme conversion. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the frontend. The backend – often referred to as the synthesizer – then converts the symbolic linguistic representation into sound. In certain systems, this part includes the computation of the target prosody (pitch contour, phoneme durations), which is then imposed on the output speech. 
Textual Grounding  The author argues that users see texts as tools when they recognize the texts’ specific value and function within highly localized use settings. The author argues that users ‘ground’ their texts to local use settings by altering the ways in which the texts structure and represent information (e.g., underlining, annotation, and sketching). The author discusses three practices by which texts are grounded as tools in document reviews: mode shifting, layering, and marking. These practices reflect different ways by which users add, subtract, and restructure information in a text so that it is usable under very specific conditions. This article explores document review as a practice in which grounding is the object of discussion (how others use the reviewed documents) and a practice by which review is facilitated. These observations will be important for exploration of technology to support ‘grounding’ practices. Unsupervised Textual Grounding: Linking Words to Image Concepts 
Textual Membership Queries  Human labeling of textual data can be very timeconsuming and expensive, yet it is critical for the success of an automatic text classification system. In order to minimize human labeling efforts, we propose a novel active learning (AL) solution, that does not rely on existing sources of unlabeled data. It uses a small amount of labeled data as the core set for the synthesis of useful membership queries (MQs) – unlabeled instances synthesized by an algorithm for human labeling. Our solution uses modification operators, functions from the instance space to the instance space that change the input to some extent. We apply the operators on the core set, thus creating a set of new membership queries. Using this framework, we look at the instance space as a search space and apply search algorithms in order to create desirable MQs. We implement this framework in the textual domain. The implementation includes using methods such as WordNet and Word2vec, for replacing text fragments from a given sentence with semantically related ones. We test our framework on several text classification tasks and show improved classifier performance as more MQs are labeled and incorporated into the training set. To the best of our knowledge, this is the first work on membership queries in the textual domain. 
Textures.js  SVG patterns for Data Visualization 
The House Of inteRactions (THOR) 
We introduce The House Of inteRactions (THOR), a framework for visual AI research, available at http://ai2thor.allenai.org. AI2THOR consists of near photorealistic 3D indoor scenes, where AI agents can navigate in the scenes and interact with objects to perform tasks. AI2THOR enables research in many different domains including but not limited to deep reinforcement learning, imitation learning, learning by interaction, planning, visual question answering, unsupervised representation learning, object detection and segmentation, and learning models of cognition. The goal of AI2THOR is to facilitate building visually intelligent models and push the research forward in this domain. 
Theano  Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multidimensional arrays efficiently. Theano features: · tight integration with NumPy – Use numpy.ndarray in Theanocompiled functions. · transparent use of a GPU – Perform dataintensive calculations up to 140x faster than with CPU.(float32 only) · efficient symbolic differentiation – Theano does your derivatives for function with one or many inputs. · speed and stability optimizations – Get the right answer for log(1+x) even when x is really tiny. · dynamic C code generation – Evaluate expressions faster. · extensive unittesting and selfverification – Detect and diagnose many types of mistake. Theano has been powering largescale computationally intensive scientific investigations since 2007. But it is also approachable enough to be used in the classroom (IFT6266 at the University of Montreal). http://…/theano_word_embeddings 
Thematic Map  Thematic maps are geographical maps in which statistical data are visualized. A thematic map is a type of map especially designed to show a particular theme connected with a specific geographic area. These maps ‘can portray physical, social, political, cultural, economic, sociological, agricultural, or any other aspects of a city, state, region, nation, or continent’. tmap 
Theory of Evidence  The theory of belief functions, also referred to as evidence theory or DempsterShafer theory (DST), is a general framework for reasoning with uncertainty, with understood connections to other frameworks such as probability, possibility and imprecise probability theories. First introduced by Arthur P. Dempster in the context of statistical inference, the theory was later developed by Glenn Shafer into a general framework for modeling epistemic uncertaintya mathematical theory of evidence. The theory allows one to combine evidence from different sources and arrive at a degree of belief (represented by a mathematical object called belief function) that takes into account all the available evidence. In a narrow sense, the term DempsterShafer theory refers to the original conception of the theory by Dempster and Shafer. However, it is more common to use the term in the wider sense of the same general approach, as adapted to specific kinds of situations. In particular, many authors have proposed different rules for combining evidence, often with a view to handling conflicts in evidence better. The early contributions have also been the starting points of many important developments, including the transferable belief model and the theory of hints. 
TherML  In this work we offer a framework for reasoning about a wide class of existing objectives in machine learning. We develop a formal correspondence between this work and thermodynamics and discuss its implications. 
Theta Method  Accurate and robust forecasting methods for univariate time series are very important when the objective is to produce estimates for a large number of time series. In this context, the Theta method called researchers attention due its performance in the largest uptodate forecasting competition, the M3Competition. Theta method proposes the decomposition of the deseasonalised data into two ‘theta lines’. The first theta line removes completely the curvatures of the data, thus being a good estimator of the longterm trend component. The second theta line doubles the curvatures of the series, as to better approximate the shortterm behaviour. http://…/Theta.pdf forecTheta 
Thick Data  Thick Data: ethnographic approaches that uncover the meaning behind Big Data visualization and analysis. Thick Data analysis primarily relies on human brain power to process a small “N” while big data analysis requires computational power (of course with humans writing the algorithms) to process a large “N”. Big Data reveals insights with a particular range of data points, while Thick Data reveals the social context of and connections between data points. Big Data delivers numbers; thick data delivers stories. Big data relies on machine learning; thick data relies on human learning. 
Thingscoop  Thingscoop is a commandline utility for analyzing videos semantically – that means searching, filtering, and describing videos based on objects, places, and other things that appear in them. When you first run thingscoop on a video file, it uses a convolutional neural network to create an ‘index’ of what’s contained in the every second of the input by repeatedly performing image classification on a framebyframe basis. Once an index for a video file has been created, you can search (i.e. get the start and end times of the regions in the video matching the query) and filter (i.e. create a supercut of the matching regions) the input using arbitrary queries. Thingscoop uses a very basic query language that lets you to compose queries that test for the presence or absence of labels with the logical operators ! (not),  (or) and && (and). For example, to search a video the presence of the sky and the absence of the ocean: thingscoop search ‘sky && !ocean’ <file>. Right now two models are supported by thingscoop: vgg_imagenet uses the architecture described in ‘Very Deep Convolutional Networks for LargeScale Image Recognition’ to recognize objects from the ImageNet database, and googlenet_places uses the architecture described in ‘Going Deeper with Convolutions’ to recognize settings and places from the MIT Places database. You can specify which model you’d like to use by running thingscoop models use <model>, where <model> is either vgg_imagenet or googlenet_places. More models will be added soon. Thingscoop is based on Caffe, an opensource deep learning framework. GitXiv 
Thompson Sampling (TS) 
We study the application of the Thompson Sampling (TS) methodology to the stochastic combinatorial multiarmed bandit (CMAB) framework. We analyze the standard TS algorithm for the general CMAB, and obtain the first distributiondependent regret bound of $O(m\log T / \Delta_{\min}) $ for TS under general CMAB, where $m$ is the number of arms, $T$ is the time horizon, and $\Delta_{\min}$ is the minimum gap between the expected reward of the optimal solution and any nonoptimal solution. We also show that one can not use an approximate oracle in TS algorithm for even MAB problems. Then we expand the analysis to matroid bandit, a special case of CMAB. Finally, we use some experiments to show the comparison of regrets of CUCB and CTS algorithms. 
ThoulessAndersonPalmer (TAP,TAP MF) 

ThoulessAndersonPalmer Gibbs Free Energy (TAP Gibbs Free Energy) 
The adaptive TAP Gibbs free energy for a general densely connected probabilistic model with quadratic interactions and arbritary single site constraints is derived. We show how a specific sequential minimization of the free energy leads to a generalization ofMinka’s expectation propagation. Lastly, we derive a sparse representation version of the sequential algorithm. The usefulness of the approach is demonstrated on classification and density estimation with Gaussian processes and on an independent component analysis problem. 
Threading Building Blocks (TBB) 
Threading Building Blocks (TBB) is a C++ template library developed by Intel for writing software programs that take advantage of multicore processors. The library consists of data structures and algorithms that allow a programmer to avoid some complications arising from the use of native threading packages such as POSIX threads, Windows threads, or the portable Boost Threads in which individual threads of execution are created, synchronized, and terminated manually. Instead the library abstracts access to the multiple processors by allowing the operations to be treated as “tasks”, which are allocated to individual cores dynamically by the library’s runtime engine, and by automating efficient use of the CPU cache. A TBB program creates, synchronizes and destroys graphs of dependent tasks according to algorithms, i.e. highlevel parallel programming paradigms (a.k.a. Algorithmic Skeletons). Tasks are then executed respecting graph dependencies. This approach groups TBB in a family of solutions for parallel programming aiming to decouple the programming from the particulars of the underlying machine. 
ThreeMode Principal Components Analysis  In multivariate analysis the data have usually two way and/or two modes. This book treats prinicipal component analysis of data which can be characterised by threeways and/or modes, like subjects by variables by conditions or occasions. The book extends the work on threemode factor analysis by Tucker and the work on individual differences scaling by Carroll and colleagues. The many examples give a true feeling of the working of the techniques. tuckerR.mmgg 
Thrill  This dissertation focuses on two fundamental sorting problems: string sorting and suffix sorting. The first part considers parallel string sorting on sharedmemory multicore machines, the second part external memory suffix sorting using the induced sorting principle, and the third part distributed external memory suffix sorting with a new distributed algorithmic big data framework named Thrill. 
THUMT  This paper introduces THUMT, an opensource toolkit for neural machine translation (NMT) developed by the Natural Language Processing Group at Tsinghua University. THUMT implements the standard attentionbased encoderdecoder framework on top of Theano and supports three training criteria: maximum likelihood estimation, minimum risk training, and semisupervised training. It features a visualization tool for displaying the relevance between hidden states in neural networks and contextual words, which helps to analyze the internal workings of NMT. Experiments on ChineseEnglish datasets show that THUMT using minimum risk training significantly outperforms GroundHog, a stateoftheart toolkit for NMT. 
Tibble  Tibbles are a modern take on data frames. They keep the features that have stood the test of time, and drop the features that used to be convenient but are now frustrating (i.e. converting character vectors to factors). tibble,tibbletime 
tick  tick is a statistical learning library for Python~3, with a particular emphasis on timedependent models, such as point processes, and tools for generalized linear models and survival analysis. The core of the library is an optimization module providing model computational classes, solvers and proximal operators for regularization. tick relies on a C++ implementation and stateoftheart optimization algorithms to provide very fast computations in a single node multicore setting. Source code and documentation can be downloaded from https://…/tick 
Tidy Data  Tidy datasets are easy to manipulate, model and visualise, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table. 
Tiered Sampling  We introduce Tiered Sampling, a novel technique for approximate counting sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass on the data and uses a memory of fixed size $M$, which can be magnitudes smaller than the number of edges. Our methods addresses the challenging task of counting sparse motifs – subgraph patterns that have low probability to appear in a sample of $M$ edges in the graph, which is the maximum amount of data available to the algorithms in each step. To obtain an unbiased and low variance estimate of the count we partition the available memory to tiers (layers) of reservoir samples. While the base layer is a standard reservoir sample of edges, other layers are reservoir samples of substructures of the desired motif. By storing more frequent substructures of the motif, we increase the probability of detecting an occurrence of the sparse motif we are counting, thus decreasing the variance and error of the estimate. We demonstrate the advantage of our method in the specific applications of counting sparse 4 and 5cliques in massive graphs. We present a complete analytical analysis and extensive experimental results using both synthetic and realworld data. Our results demonstrate the advantage of our method in obtaining highquality approximations for the number of 4 and 5cliques for large graphs using a very limited amount of memory, significantly outperforming the single edge sample approach for counting sparse motifs in large scale graphs. 
Tight SemiNonnegative Matrix Factorization  The nonnegative matrix factorization is a widely used, flexible matrix decomposition, finding applications in biology, image and signal processing and information retrieval, among other areas. Here we present a related matrix factorization. A multiobjective optimization problem finds conical combinations of templates that approximate a given data matrix. The templates are chosen so that as far as possible only the initial data set can be represented this way. However, the templates are not required to be nonnegative nor convex combinations of the original data. 
Tikhonov Regularization  Tikhonov regularization, named for Andrey Tikhonov, is the most commonly used method of regularization of illposed problems. In statistics, the method is known as ridge regression, and with multiple independent discoveries, it is also variously known as the TikhonovMiller method, the PhillipsTwomey method, the constrained linear inversion method, and the method of linear regularization. It is related to the LevenbergMarquardt algorithm for nonlinear leastsquares problems. 
Tile2Vec  Remote sensing lacks methods like the word vector representations and pretrained networks that significantly boost performance across a wide range of natural language and computer vision tasks. To fill this gap, we introduce Tile2Vec, an unsupervised representation learning algorithm that extends the distributional hypothesis from natural language — words appearing in similar contexts tend to have similar meanings — to geospatial data. We demonstrate empirically that Tile2Vec learns semantically meaningful representations on three datasets. Our learned representations significantly improve performance in downstream classification tasks and similarly to word vectors, visual analogies can be obtained by simple arithmetic in the latent space. 
Time Oriented Language (TOL) 
TOL is the Time Oriented Language. It is a programming language dedicated to the world of statistics and focused on time series analysis and stochastic processes. It is a declarative language based on two key features: simple syntactical rules and powerful set of extensible data types and functions. TOL is callable by a small text console, but there is also a graphical interface to easily handle all language’s tools and functions, providing powerful graphical capacities. TOL is distributed under the GNU GPL license. tolBasis 
Time Perception Machine  Numerous powerful point process models have been developed to understand temporal patterns in sequential data from fields such as healthcare, electronic commerce, social networks, and natural disaster forecasting. In this paper, we develop novel models for learning the temporal distribution of human activities in streaming data (e.g., videos and person trajectories). We propose an integrated framework of neural networks and temporal point processes for predicting when the next activity will happen. Because point processes are limited to taking event frames as input, we propose a simple yet effective mechanism to extract features at frames of interest while also preserving the rich information in the remaining frames. We evaluate our model on two challenging datasets. The results show that our model outperforms traditional statistical point process approaches significantly, demonstrating its effectiveness in capturing the underlying temporal dynamics as well as the correlation within sequential activities. Furthermore, we also extend our model to a joint estimation framework for predicting the timing, spatial location, and category of the activity simultaneously, to answer the when, where, and what of activity prediction. 
Time Series Analysis / Time Series  A time series is a sequence of data points, measured typically at successive points in time spaced at uniform time intervals. Examples of time series are the daily closing value of the Dow Jones Industrial Average and the annual flow volume of the Nile River at Aswan. Time series are very frequently plotted via line charts. Time series are used in statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering, astronomy, and communications engineering. 
Time Series Cointegrated System (TSCS) 
TSCS 
Time Series Database (TSDB) 
A time series database (TSDB) is a software system that is optimized for handling time series data, arrays of numbers indexed by time (a datetime or a datetime range). In some fields these time series are called profiles, curves, or traces. A time series of stock prices might be called a price curve. A time series of energy consumption might be called a load profile. A log of temperature values over time might be called a temperature trace. Despite the disparate names, many of the same mathematical operations, queries, or database transactions are useful for analysing all of them. The implementation of a database that can correctly, reliably, and efficiently implement these operations must be specialized for timeseries data. TSDBs are databases that are optimized for time series data. Software with complex logic or business rules and high transaction volume for time series data may not be practical with traditional relational database management systems. Flat file databases are not a viable option either, if the data and transaction volume reaches a maximum threshold determined by the capacity of individual servers (processing power and storage capacity). Queries for historical data, replete with time ranges and roll ups and arbitrary time zone conversions are difficult in a relational database. Compositions of those rules are even more difficult. This is a problem compounded by the free nature of relational systems themselves. Many relational systems are often not modelled correctly with respect to time series data. TSDBs on the other hand impose a model and this allows them to provide more features for doing so. Ideally, these repositories are often natively implemented using specialized database algorithms. However, it is possible to store time series as binary large objects (BLOBs) in a relational database or by using a VLDB approach coupled with a pure star schema. Efficiency is often improved if time is treated as a discrete quantity rather than as a continuous mathematical dimension. Database joins across multiple time series data sets is only practical when the time tag associated with each data entry spans the same set of discrete times for all data sets across which the join is performed. 
TimeConvolution Layer (tConv) 
Automatic heart sound abnormality detection can play a vital role in the early diagnosis of heart diseases, particularly in lowresource settings. The stateoftheart algorithms for this task utilize a set of Finite Impulse Response (FIR) bandpass filters as a frontend followed by a Convolutional Neural Network (CNN) model. In this work, we propound a novel CNN architecture that integrates the frontend bandpass filters within the network using timeconvolution (tConv) layers, which enables the FIR filterbank parameters to become learnable. Different initialization strategies for the learnable filters, including random parameters and a set of predefined FIR filterbank coefficients, are examined. Using the proposed tConv layers, we add constraints to the learnable FIR filters to ensure linear and zero phase responses. Experimental evaluations are performed on a balanced 4fold crossvalidation task prepared using the PhysioNet/CinC 2016 dataset. Results demonstrate that the proposed models yield superior performance compared to the stateoftheart system, while the linear phase FIR filterbank method provides an absolute improvement of 9.54% over the baseline in terms of an overall accuracy metric. 
Timed DiscreteEvent System (TDES) 
Timed DiscreteEvent Systems are Synchronous Product Structures 
TimeLapse Mining  We introduce an approach for synthesizing timelapse videos of popular landmarks from large community photo collections. The approach is completely automated and leverages the vast quantity of photos available online. First, we cluster 86 million photos into landmarks and popular viewpoints. Then, we sort the photos by date and warp each photo onto a common viewpoint. Finally, we stabilize the appearance of the sequence to compensate for lighting effects and minimize flicker. Our resulting timelapses show diverse changes in the world’s most popular sites, like glaciers shrinking, skyscrapers being constructed, and waterfalls changing course. 
TimetoEvent Data  Timetoevent data, also often referred to as survival data, arise when interest is focused on the time elapsing before an event is experienced. By events we mean occurrences that are of interest in scientific studies from various disciplines such as medicine, epidemiology, demography, biology, sociology, economics, engineering, et cetera. Examples of such events are: death, onset of infection, divorce, unemployment, and failure of a mechanical device. All of these may be subject to scientific interest where one tries to understand their cause or establish risk factors. flexsurvcure,goftte 
TimeWeighted Dynamic Time Warping (TWDTW) 
Dynamic time warping (DTW), which finds the minimum path by providing nonlinear alignments between two time series, has been widely used as a distance measure for time series classification and clustering. However, DTW does not account for the relative importance regarding the phase difference between a reference point and a testing point. This may lead to misclassification especially in applications where the shape similarity between two sequences is a major consideration for an accurate recognition. Therefore, we propose a novel distance measure, called a weighted DTW (WDTW), which is a penaltybased DTW. Our approach penalizes points with higher phase difference between a reference point and a testing point in order to prevent minimum distance distortion caused by outliers. The rationale underlying the proposed distance measure is demonstrated with some illustrative examples. A new weight function, called the modified logistic weight function (MLWF), is also proposed to systematically assign weights as a function of the phase difference between a reference point and a testing point. By applying different weights to adjacent points, the proposed algorithm can enhance the detection of similarity between two time series. We show that some popular distance measures such as DTW and Euclidean distance are special cases of our proposed WDTW measure. We extend the proposed idea to other variants of DTW such as derivative dynamic time warping (DDTW) and propose the weighted version of DDTW. We have compared the performances of our proposed procedures with other popular approaches using public data sets available through the UCR Time Series Data Mining Archive for both time series classification and clustering problems. The experimental results indicate that the proposed approaches can achieve improved accuracy for time series classification and clustering problems. ➚ “Dynamic Time Warping” dtwSat 
Tiny Deeply Supervised Object Detection (TinyDSOD) 
Object detection has made great progress in the past few years along with the development of deep learning. However, most current object detection methods are resource hungry, which hinders their wide deployment to many resource restricted usages such as usages on alwayson devices, batterypowered lowend devices, etc. This paper considers the resource and accuracy tradeoff for resourcerestricted usages during designing the whole object detection framework. Based on the deeply supervised object detection (DSOD) framework, we propose TinyDSOD dedicating to resourcerestricted usages. TinyDSOD introduces two innovative and ultraefficient architecture blocks: depthwise dense block (DDB) based backbone and depthwise featurepyramidnetwork (DFPN) based frontend. We conduct extensive experiments on three famous benchmarks (PASCAL VOC 2007, KITTI, and COCO), and compare TinyDSOD to the stateoftheart ultraefficient object detection solutions such as TinyYOLO, MobileNetSSD (v1 & v2), SqueezeDet, Pelee, etc. Results show that TinyDSOD outperforms these solutions in all the three metrics (parametersize, FLOPs, accuracy) in each comparison. For instance, TinyDSOD achieves 72.1% mAP with only 0.95M parameters and 1.06B FLOPs, which is by far the stateofthearts result with such a low resource requirement. 
Tiny SSD  Object detection is a major challenge in computer vision, involving both object classification and object localization within a scene. While deep neural networks have been shown in recent years to yield very powerful techniques for tackling the challenge of object detection, one of the biggest challenges with enabling such object detection networks for widespread deployment on embedded devices is high computational and memory requirements. Recently, there has been an increasing focus in exploring small deep neural network architectures for object detection that are more suitable for embedded devices, such as Tiny YOLO and SqueezeDet. Inspired by the efficiency of the Fire microarchitecture introduced in SqueezeNet and the object detection performance of the singleshot detection macroarchitecture introduced in SSD, this paper introduces Tiny SSD, a singleshot detection deep convolutional neural network for realtime embedded object detection that is composed of a highly optimized, nonuniform Fire subnetwork stack and a nonuniform subnetwork stack of highly optimized SSDbased auxiliary convolutional feature layers designed specifically to minimize model size while maintaining object detection performance. The resulting Tiny SSD possess a model size of 2.3MB (~26X smaller than Tiny YOLO) while still achieving an mAP of 61.3% on VOC 2007 (~4.2% higher than Tiny YOLO). These experimental results show that very small deep neural network architectures can be designed for realtime object detection that are wellsuited for embedded scenarios. 
Tiramisu  This paper introduces Tiramisu, an optimization framework designed to generate efficient code for highperformance systems such as multicores, GPUs, FPGAs, distributed machines, or any combination of these. Tiramisu relies on a flexible representation based on the polyhedral model and introduces a novel fourlevel IR that allows full separation between algorithms, schedules, datalayouts and communication. This separation simplifies targeting multiple hardware architectures from the same algorithm. We evaluate Tiramisu by writing a set of linear algebra and DNN kernels and by integrating it as a pass in the Halide compiler. We show that Tiramisu extends Halide with many new capabilities, and that Tiramisu can generate efficient code for multicores, GPUs, FPGAs and distributed heterogeneous systems. The performance of code generated by the Tiramisu backends matches or exceeds handoptimized reference implementations. For example, the multicore backend matches the highly optimized Intel MKL library on many kernels and shows speedups reaching 4x over the original Halide. 
TNet  Recent advances in metalearning demonstrate that deep representations combined with the gradient descent method have sufficient capacity to approximate any learning algorithm. A promising approach is the modelagnostic metalearning (MAML) which embeds gradient descent into the metalearner. It optimizes for the initial parameters of the learner to warmstart the gradient descent updates, such that new tasks can be solved using a small number of examples. In this paper we elaborate the gradientbased metalearning, developing two new schemes. First, we present a feedforward neural network, referred to as Tnet, where the linear transformation between two adjacent layers is decomposed as T W such that W is learned by taskspecific learners and the transformation T, which is shared across tasks, is metalearned to speed up the convergence of gradient updates for taskspecific learners. Second, we present MTnet where gradient updates in the Tnet are guided by a binary mask M that is metalearned, restricting the updates to be performed in a subspace. Empirical results demonstrate that our method is less sensitive to the choice of initial learning rates than existing metalearning methods, and achieves the stateoftheart or comparable performance on fewshot classification and regression tasks. 
Tofu  There is a trend towards using very large deep neural networks (DNN) to improve the accuracy of complex machine learning tasks. However, the size of DNN models that can be explored today is limited by the amount of GPU device memory. This paper presents Tofu, a system for partitioning very large DNN models across multiple GPU devices. Tofu is designed for a tensorbased dataflow system: for each operator in the dataflow graph, it partitions its input/output tensors and parallelizes its execution across workers. Tofu can automatically discover how each operator can be partitioned by analyzing its semantics expressed in a simple specification language. Tofu uses a search algorithm based on dynamic programming to determine the best partition strategy for each operator in the entire dataflow graph. Our experiments on an 8GPU machine show that Tofu enables the training of very large CNN and RNN models. It also achieves better performance than alternative approaches to train very large models on multiple GPUs. 
Topic Compositional Neural Language Model (TCNLM) 
We propose a Topic Compositional Neural Language Model (TCNLM), a novel method designed to simultaneously capture both the global semantic meaning and the local word ordering structure in a document. The TCNLM learns the global semantic coherence of a document via a neural topic model, and the probability of each learned latent topic is further used to build a MixtureofExperts (MoE) language model, where each expert (corresponding to one topic) is a recurrent neural network (RNN) that accounts for learning the local structure of a word sequence. In order to train the MoE model efficiently, a matrix factorization method is applied, by extending each weight matrix of the RNN to be an ensemble of topicdependent weight matrices. The degree to which each member of the ensemble is used is tied to the documentdependent probability of the corresponding topics. Experimental results on several corpora show that the proposed approach outperforms both a pure RNNbased model and other topicguided language models. Further, our model yields sensible topics, and also has the capacity to generate meaningful sentences conditioned on given topics. 
Topic Detection and Tracking (TDT) 
Topic Detection and Tracking (TDT) is a Body of Research and an Evaluation Paradigm that Addresses EventBased Organization of Broadcast News. The TDT Evaluation Tasks of Tracking, Cluster Detection, and First Story Detection are Each Information Filtering Technology in the Sense That They Require That ‘yes or no’ Decisions be Made on a Stream of News Stories Before Additional Stories Have Arrived. http://…/27957310732040517 
Topic Model  In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract “topics” that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: “dog” and “bone” will appear more often in documents about dogs, “cat” and “meow” will appear in documents about cats, and “the” and “is” will appear equally in both. A document typically concerns multiple topics in different proportions; thus, in a document that is 10% about cats and 90% about dogs, there would probably be about 9 times more dog words than cat words. A topic model captures this intuition in a mathematical framework, which allows examining a set of documents and discovering, based on the statistics of the words in each, what the topics might be and what each document’s balance of topics is. 
Topic Tagging  
TopicRNN  In this paper, we propose TopicRNN, a recurrent neural network (RNN)based language model designed to directly capture the global semantic meaning relating words in a document via latent topics. Because of their sequential nature, RNNs are good at capturing the local structure of a word sequence – both semantic and syntactic – but might face difficulty remembering longrange dependencies. Intuitively, these longrange dependencies are of semantic nature. In contrast, latent topic models are able to capture the global underlying semantic structure of a document but do not account for word ordering. The proposed TopicRNN model integrates the merits of RNNs and latent topic models: it captures local (syntactic) dependencies using an RNN and global (semantic) dependencies using latent topics. Unlike previous work on contextual RNN language modeling, our model is learned endtoend. Empirical results on word prediction show that TopicRNN outperforms existing contextual RNN baselines. In addition, TopicRNN can be used as an unsupervised feature extractor for documents. We do this for sentiment analysis and report a new stateoftheart error rate on the IMDB movie review dataset that amounts to a $13.3\%$ improvement over the previous best result. Finally TopicRNN also yields sensible topics, making it a useful alternative to document models such as latent Dirichlet allocation. 
Topological Anomaly Detection (TAD) 
The technique is essentially a density based outlier detection algorithm that, instead of calculating local densities, constructs a graph of the data using nearestneighbors. The algorithm is different from other kNN outlier detection algorithms in that instead of setting ‘k’ as a parameter, you instead set a maximal interobservation distance (called the graph “resolution” by Gartley and Basener). If the distance between two points is less than the graph resolution, add an edge between those two observations to the graph. Once the full graph is constructed, determine which connected components comprise the “background” of the data by setting some threshold percentage of observations ‘p’: any components with fewer than ‘p’ observations is considered an anomalous component, and all the observations (nodes) in this component are outliers. 
Topological Data Analysis (TDA) 
Topological data analysis (TDA) is a new area of study aimed at having applications in areas such as data mining and computer vision. The main problems are: 1. how one infers highdimensional structure from lowdimensional representations; and 2. how one assembles discrete points into global structure. The human brain can easily extract global structure from representations in a strictly lower dimension, i.e. we infer a 3D environment from a 2D image from each eye. The inference of global structure also occurs when converting discrete data into continuous images, e.g. dotmatrix printers and televisions communicate images via arrays of discrete points. The main method used by topological data analysis is: 1. Replace a set of data points with a family of simplicial complexes, indexed by a proximity parameter. 2. Analyse these topological complexes via algebraic topology – specifically, via the theory of persistent homology. 3. Encode the persistent homology of a data set in the form of a parameterized version of a Betti number which is called a persistence diagram or barcode. http://…/whytopologicaldataanalysisworks Topological Analysis of Data Topology Data Analysis (TDA) 
Topological Regularizer for Classifiers (TopoReg) 
Regularization plays a crucial role in supervised learning. A successfully regularized model strikes a balance between a perfect description of the training data and the ability to generalize to unseen data. Most existing methods enforce a global regularization in a structure agnostic manner. In this paper, we initiate a new direction and propose to enforce the structural simplicity of the classification boundary by regularizing over its topological complexity. In particular, our measurement of topological complexity incorporates the importance of topological features (e.g., connected components, handles, and so on) in a meaningful manner, and provides a direct control over spurious topological structures. We incorporate the new measurement as a topological loss in training classifiers. We also propose an efficient algorithm to compute the gradient. Our method provides a novel way to topologically simplify the global structure of the model, without having to sacrifice too much of the flexibility of the model. We demonstrate the effectiveness of our new topological regularizer on a range of synthetic and realworld datasets. 
Topological Sorting  In computer science, a topological sort (sometimes abbreviated topsort or toposort) or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks. A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG). Any DAG has at least one topological ordering, and algorithms are known for constructing a topological ordering of any DAG in linear time. 
Topology ToolKit (TTK) 
This system paper presents the Topology ToolKit (TTK), a software platform designed for topological data analysis in scientific visualization. TTK provides a unified, generic, efficient, and robust implementation of key algorithms for the topological analysis of scalar data, including: critical points, integral lines, persistence diagrams, persistence curves, merge trees, contour trees, MorseSmale complexes, fiber surfaces, continuous scatterplots, Jacobi sets, Reeb spaces, and more. TTK is easily accessible to end users due to a tight integration with ParaView. It is also easily accessible to developers through a variety of bindings (Python, VTK/C++) for fast prototyping or through direct, dependencefree, C++, to ease integration into preexisting complex systems. While developing TTK, we faced several algorithmic and software engineering challenges, which we document in this paper. In particular, we present an algorithm for the construction of a discrete gradient that complies to the critical points extracted in the piecewiselinear setting. This algorithm guarantees a combinatorial consistency across the topological abstractions supported by TTK, and importantly, a unified implementation of topological data simplification for multiscale exploration and analysis. We also present a cached triangulation data structure, that supports time efficient and generic traversals, which selfadjusts its memory usage on demand for input simplicial meshes and which implicitly emulates a triangulation for regular grids with no memory overhead. Finally, we describe an original software architecture, which guarantees memory efficient and direct accesses to TTK features, while still allowing for researchers powerful and easy bindings and extensions. TTK is open source (BSD license) and its code, online documentation and video tutorials are available on TTK’s website. 
TopologyBased Pathway Enrichment Analysis (TPEA) 
TPEA 
Torch  Torch is a scientific computing framework with wide support for machine learning algorithms. It is easy to use and efficient, thanks to an easy and fast scripting language, LuaJIT, and an underlying C/CUDA implementation. A summary of core features: · a powerful Ndimensional array · lots of routines for indexing, slicing, transposing, … · amazing interface to C, via LuaJIT · linear algebra routines · neural network, and energybased models · numeric optimization routines · Fast and efficient GPU support · Embeddable, with ports to iOS, Android and FPGA backends https://…/torch7 
Total Distance Multivariance  We introduce two new measures for the dependence of $n \ge 2$ random variables: `distance multivariance’ and `total distance multivariance’. Both measures are based on the weighted $L^2$distance of quantities related to the characteristic functions of the underlying random variables. They extend distance covariance (introduced by Szekely, Rizzo and Bakirov) and generalized distance covariance (introduced in part I) from pairs of random variables to $n$tuplets of random variables. We show that total distance multivariance can be used to detect the independence of $n$ random variables and has a simple finitesample representation in terms of distance matrices of the sample points, where distance is measured by a continuous negative definite function. Based on our theoretical results, we present a test for independence of multiple random vectors which is consistent against all alternatives. 
Total Operating Characteristic (TOC) 
The relative operating characteristic (ROC) is a popular statistical method to measure the association between observed and diagnosed presence of a characteristic. The diagnosis of presence or absence depends on whether the value of an index variable is above a threshold. ROC considers multiple possible thresholds. Each threshold generates a twobytwo contingency table, which contains four central entries: hits, misses, false alarms, and correct rejections. ROC reveals for each threshold only two ratios, hits/(hits + misses) and false alarms/(false alarms + correct rejections). This article introduces the total operating characteristic (TOC), which shows the total information in the contingency table for each threshold. TOC maintains desirable properties of ROC, while TOC reveals strictly more information than ROC in a manner that makes TOC more useful than ROC. TOC 
Total Unduplicated Reach and Frequency (TURF) 
TURF Analysis, an acronym for “Total Unduplicated Reach and Frequency”, is a type of statistical analysis used for providing estimates of media or market potential and devising optimal communication and placement strategies given limited resources. TURF analysis identifies the number of users reached by a communication, and how often they are reached. Although originally used by media schedulers to maximize reach and frequency of media spending across different items (print, broadcast, etc.), TURF is also now used to provide estimates of market potential. For example, if a company plans to market a new yogurt, they may consider launching ten possible flavors, but in reality, only three might be purchased in large quantities. The TURF algorithm identifies the optimal product line to maximize the total number of consumers who will purchase at least one SKU. Typically, when T.U.R.F. is undertaken for optimizing a product range, the analysis only looks at the reach of the product range (ignoring the Frequency component of TURF). turfR 
TotallyLooksLike (TTL) 
Perceptual judgment of image similarity by humans relies on a rich internal representations ranging from lowlevel features to highlevel concepts, scene properties and even cultural associations. Existing methods and datasets attempting to explain perceived similarity use stimuli which arguably do not cover the full breadth of factors that affect human similarity judgments, even those geared toward this goal. We introduce a new dataset dubbed \textbf{TotallyLooksLike} (TTL) after a popular entertainment website, which contains images paired by humans as being visually similar. The dataset contains 6016 imagepairs from the wild, shedding light upon a rich and diverse set of criteria employed by human beings. We conduct experiments to try to reproduce the pairings via features extracted from stateoftheart deep convolutional neural networks, as well as additional human experiments to verify the consistency of the collected data. Though we create conditions to artificially make the matching task increasingly easier, we show that machineextracted representations perform very poorly in terms of reproducing the matching selected by humans. We discuss and analyze these results, suggesting future directions for improvement of learned image representations. 
Touchard Model  We present a novel model, which is a twoparameter extension of the Poisson distribution. Its normalizing constant is related to the Touchard polynomials, hence the name of this model. It is a flexible distribution that can account for both under or overdispersion and concentration of zeros that are frequently found in nonPoisson count data. In contrast to some other generalizations, the Hessian matrix for maximum likelihood estimation of the Touchard parameters has a simple form. We exemplify with three data sets, showing that our suggested model is a competitive candidate for fitting nonPoisson counts. touchard 
Toybox  Deep convolutional neural networks (CNNs) have enjoyed tremendous success in computer vision in the past several years, particularly for visual object recognition.However, how CNNs work remains poorly understood, and the training of deep CNNs is still considered more art than science. To better characterize deep CNNs and the training process, we introduce a new video dataset called Toybox. Images in Toybox come from firstperson, wearable camera recordings of common household objects and toys being manually manipulated to undergo structured transformations like rotations and translations. We also present results from initial experiments using deep CNNs that begin to examine how different distributions of training data can affect visual object recognition performance, and how visual object concepts are represented within a trained network. 
Toyplot  Toyplot, the kidsized plotting toolkit for Python with grownupsized goals: · Develop beautiful interactive, animated plots that embrace the unique capabilities of electronic publishing and support repoducibility. · Create the best possible data graphics ‘outofthebox’, maximizing data ink and minimizing chartjunk. · Provide a clean, minimalist interface that scientists and engineers will love. The Toyplot Tutorial 
tproduct  Kilmer and Martin [Linear Algebra Appl., 435 (2011), pp. 641–658] 
Trace LassoL1 Graph Cut (TLL1GC) 
This work proposes an adaptive trace lasso regularized L1norm based graph cut method for dimensionality reduction of Hyperspectral images, called as `Trace LassoL1 Graph Cut’ (TLL1GC). The underlying idea of this method is to generate the optimal projection matrix by considering both the sparsity as well as the correlation of the data samples. The conventional L2norm used in the objective function is sensitive to noise and outliers. Therefore, in this work L1norm is utilized as a robust alternative to L2norm. Besides, for further improvement of the results, we use a penalty function of trace lasso with the L1GC method. It adaptively balances the L2norm and L1norm simultaneously by considering the data correlation along with the sparsity. We obtain the optimal projection matrix by maximizing the ratio of betweenclass dispersion to withinclass dispersion using L1norm with trace lasso as the penalty. Furthermore, an iterative procedure for this TLL1GC method is proposed to solve the optimization function. The effectiveness of this proposed method is evaluated on two benchmark HSI datasets. 
Training Set  A training set is a set of data used in various areas of information science to discover potentially predictive relationships. Training sets are used in artificial intelligence, machine learning, genetic programming, intelligent systems, and statistics. In all these fields, a training set has much the same role and is often used in conjunction with a test set. 
Training, Validation, Test  Divide the data set into three parts: · Training, Validation, Test (e.g. 50, 25, 25) · Fit model on the TRAINING set · Select model using VALIDATION set · Assess prediction error using TEST set 
Trainless Accuracy Predictor for Architecture Search (TAPAS) 
In recent years an increasing number of researchers and practitioners have been suggesting algorithms for largescale neural network architecture search: genetic algorithms, reinforcement learning, learning curve extrapolation, and accuracy predictors. None of them, however, demonstrated highperformance without training new experiments in the presence of unseen datasets. We propose a new deep neural network accuracy predictor, that estimates in fractions of a second classification performance for unseen input datasets, without training. In contrast to previously proposed approaches, our prediction is not only calibrated on the topological network information, but also on the characterization of the datasetdifficulty which allows us to retune the prediction without any training. Our predictor achieves a performance which exceeds 100 networks per second on a single GPU, thus creating the opportunity to perform largescale architecture search within a few minutes. We present results of two searches performed in 400 seconds on a single GPU. Our best discovered networks reach 93.67% accuracy for CIFAR10 and 81.01% for CIFAR100, verified by training. These networks are performance competitive with other automatically discovered stateoftheart networks however we only needed a small fraction of the time to solution and computational resources. 
TrajclusiVATbased TP  Trajectory prediction (TP) is of great importance for a wide range of locationbased applications in intelligent transport systems such as locationbased advertising, route planning, traffic management, and early warning systems. In the last few years, the widespread use of GPS navigation systems and wireless communication technology enabled vehicles has resulted in huge volumes of trajectory data. The task of utilizing this data employing spatiotemporal techniques for trajectory prediction in an efficient and accurate manner is an ongoing research problem. Existing TP approaches are limited to shortterm predictions. Moreover, they cannot handle a large volume of trajectory data for longterm prediction. To address these limitations, we propose a scalable clustering and Markov chain based hybrid framework, called TrajclusiVATbased TP, for both shortterm and longterm trajectory prediction, which can handle a large number of overlapping trajectories in a dense road network. In addition, TrajclusiVAT can also determine the number of clusters, which represent different movement behaviours in input trajectory data. In our experiments, we compare our proposed approach with a mixed Markov model (MMM)based scheme, and a trajectory clustering, NETSCANbased TP method for both short and longterm trajectory predictions. We performed our experiments on two real, vehicle trajectory datasets, including a largescale trajectory dataset consisting of 3.28 million trajectories obtained from 15,061 taxis in Singapore over a period of one month. Experimental results on two real trajectory datasets show that our proposed approach outperforms the existing approaches in terms of both short and longterm prediction performances, based on prediction accuracy and distance error (in km). 
Trajectory Analysis  traj 
Trajectory Mining  Predicting transportation modes from GPS (Global Positioning System) records is a hot topic in the trajectory mining domain. Each GPS record is called a trajectory point and a trajectory is a sequence of these points. Trajectory mining has applications including but not limited to transportation mode detection, tourism, traffic congestion, smart cities management, animal behaviour analysis, environmental preservation, and traffic dynamics are some of the trajectory mining applications. Transportation modes prediction as one of the tasks in human mobility and vehicle mobility applications plays an important role in resource allocation, traffic management systems, tourism planning and accident detection. 
TRAJEDI  The vast increase in our ability to obtain and store trajectory data necessitates trajectory analytics techniques to extract useful information from this data. Pairwise distance functions are a foundation building block for common operations on trajectory datasets including constrained SELECT queries, knearest neighbors, and similarity and diversity algorithms. The accuracy and performance of these operations depend heavily on the speed and accuracy of the underlying trajectory distance function, which is in turn affected by trajectory calibration. Current methods either require calibrated data, or perform calibration of the entire relevant dataset first, which is expensive and time consuming for large datasets. We present TRAJEDI, a calibrationaware pairwise distance calculation scheme that outperforms naive approaches while preserving accuracy. We also provide analyses of parameter tuning to tradeoff between speed and accuracy. Our scheme is usable with any diversity, similarity or knearest neighbor algorithm. 
Transducer  We allow database user to script a parallel relational database engine with a procedural language. Procedural language code is executed as a user defined relational query operator called transducer. Transducer is tightly integrated with relation engine, including query optimizer, query executor and can be executed in parallel like other query operators. With transducer, we can efficiently execute queries that are very difficult to express in SQL. As example, we show how to run time series and graph queries, etc, within a parallel relational database. 
Transduction  In logic, statistical inference, and supervised learning, transduction or transductive inference is reasoning from observed, specific (training) cases to specific (test) cases. In contrast, induction is reasoning from observed training cases to general rules, which are then applied to the test cases. The distinction is most interesting in cases where the predictions of the transductive model are not achievable by any inductive model. Note that this is caused by transductive inference on different test sets producing mutually inconsistent predictions. 
Transductive Adversarial Network (TAN) 
Transductive Adversarial Networks (TAN) is a novel domainadaptation machine learning framework that is designed for learning a conditional probability distribution on unlabelled input data in a target domain, while also only having access to: (1) easily obtained labelled data from a related source domain, which may have a different conditional probability distribution than the target domain, and (2) a marginalised prior distribution on the labels for the target domain. TAN leverages a fully adversarial training procedure and a unique generator/encoder architecture which approximates the transductive combination of the available source and targetdomain data. A benefit of TAN is that it allows the distance between the source and targetdomain labelvector marginal probability distributions to be greater than 0 (i.e. different tasks across the source and target domains) whereas other domainadaptation algorithms require this distance to equal 0 (i.e. a single task across the source and target domains). TAN can, however, still handle the latter case and is a more generalised approach to this case. Another benefit of TAN is that due to being a fully adversarial algorithm, it has the potential to accurately approximate highly complex distributions. Theoretical analysis demonstrates the viability of the TAN framework. 
Transductive Boltzmann Machine (TBM) 
We present transductive Boltzmann machines (TBMs), which firstly achieve transductive learning of the Gibbs distribution. While exact learning of the Gibbs distribution is impossible by the family of existing Boltzmann machines due to combinatorial explosion of the sample space, TBMs overcome the problem by adaptively constructing the minimum required sample space from data to avoid unnecessary generalization. We theoretically provide biasvariance decomposition of the KL divergence in TBMs to analyze its learnability, and empirically demonstrate that TBMs are superior to the fully visible Boltzmann machines and popularly used restricted Boltzmann machines in terms of efficiency and effectiveness. 
Transductive Conformal Prediction (TCP) 
The conformalClassification package implements Transductive Conformal Prediction (TCP) and Inductive Conformal Prediction (ICP) for classification problems. Conformal Prediction (CP) is a framework that complements the predictions of machine learning algorithms with reliable measures of confidence. TCP gives results with higher validity than ICP, however ICP is computationally faster than TCP. The package conformalClassification is built upon the random forest method, where votes of the random forest for each class are considered as the conformity scores for each data point. Although the main aim of the conformalClassification package is to generate CP errors (pvalues) for classification problems, the package also implements various diagnostic measures such as deviation from validity, error rate, efficiency, observed fuzziness and calibration plots. In future releases, we plan to extend the package to use other machine learning algorithms, (e.g. support vector machines) for model fitting. 
Transductive Propagation Network (TPN) 
Fewshot learning aims to build a learner that quickly generalizes to novel classes even when a limited number of labeled examples (socalled lowdata problem) are available. Metalearning is commonly deployed to mimic the test environment in a training phase for good generalization, where episodes (i.e., learning problems) are manually constructed from the training set. This framework gains a lot of attention to fewshot learning with impressive performance, though the lowdata problem is not fully addressed. In this paper, we propose Transductive Propagation Network (TPN), a transductive method that classifies the entire test set at once to alleviate the lowdata problem. Specifically, our proposed network explicitly learns an underlying manifold space that is appropriate to propagate labels from fewshot examples, where all parameters of feature embedding, manifold structure, and label propagation are estimated in an endtoend way on episodes. We evaluate the proposed method on the commonly used miniImageNet and tieredImageNet benchmarks and achieve the stateoftheart or promising results on these datasets. 
Transfer Automatic Machine Learning  Building effective neural networks requires many design choices. These include the network topology, optimization procedure, regularization, stability methods, and choice of pretrained parameters. This design is time consuming and requires expert input. Automatic Machine Learning aims automate this process using hyperparameter optimization. However, automatic model building frameworks optimize performance on each task independently, whereas human experts leverage prior knowledge when designing a new network. We propose Transfer Automatic Machine Learning, a method to accelerate network design using knowledge of prior tasks. For this, we build upon reinforcement learning architecture design methods to support parallel training on multiple tasks and transfer the search strategy to new tasks. Tested on NLP and Image classification tasks, Transfer Automatic Machine Learning reduces convergence time over singletask methods by almost an order of magnitude on 13 out of 14 tasks. It achieves better test set accuracy on 10 out of 13 tasks NLP tasks and improves performance on CIFAR10 image recognition from 95.3% to 97.1%. 
Transfer Function Model  Transfer function models describe the relationship between the inputs and outputs of a system using a ratio of polynomials. The model order is equal to the order of the denominator polynomial. The roots of the denominator polynomial are referred to as the model poles. The roots of the numerator polynomial are referred to as the model zeros. The parameters of a transfer function model are its poles, zeros and transport delays. 
Transfer Learning  Machine learning and data mining techniques have been used in numerous realworld applications. An assumption of traditional machine learning methodologies is the training data and testing data are taken from the same domain, such that the input feature space and data distribution characteristics are the same. However, in some realworld machine learning scenarios, this assumption does not hold. There are cases where training data is expensive or difficult to collect. Therefore, there is a need to create highperformance learners trained with more easily obtained data from different domains. This methodology is referred to as transfer learning. This survey paper formally defines transfer learning, presents information on current solutions, and reviews applications applied to transfer learning. Lastly, there is information listed on software downloads for various transfer learning solutions and a discussion of possible future research work. The transfer learning solutions surveyed are independent of data size and can be applied to big data environments. Recycling Deep Learning Models with Transfer Learning 
Transferable Joint AttributeIdentity Deep Learning (TJAID) 
Most existing person reidentification (reid) methods require supervised model learning from a separate large set of pairwise labelled training data for every single camera pair. This significantly limits their scalability and usability in realworld large scale deployments with the need for performing reid across many camera views. To address this scalability problem, we develop a novel deep learning method for transferring the labelled information of an existing dataset to a new unseen (unlabelled) target domain for person reid without any supervised learning in the target domain. Specifically, we introduce an Transferable Joint AttributeIdentity Deep Learning (TJAIDL) for simultaneously learning an attributesemantic and identitydiscriminative feature representation space transferrable to any new (unseen) target domain for reid tasks without the need for collecting new labelled training data from the target domain (i.e. unsupervised learning in the target domain). Extensive comparative evaluations validate the superiority of this new TJAIDL model for unsupervised person reid over a wide range of stateoftheart methods on four challenging benchmarks including VIPeR, PRID, Market1501, and DukeMTMCReID. 
Transfinite Mean  We define a generalization of the arithmetic mean to bounded wellordered sequences of real numbers. We show that every probability space admits a wellordered sequences of points such that the measure of each measurable subset is equal to the frequency with which the sequence is in this subset. We include an argument suggested by Woodin that the club filter on $\omega_1$ does not admit such a sequence of order type $\omega_1$. 
Transformation Autoregressive Network  The fundamental task of general density estimation has been of keen interest to machine learning. Recent advances in density estimation have either: a) proposed a flexible model to estimate the conditional factors of the chain rule, $p(x_{i}\, \, x_{i1}, \ldots)$; or b) used flexible, nonlinear transformations of variables of a simple base distribution. Instead, this work jointly leverages transformations of variables and autoregressive conditional models, and proposes novel methods for both. We provide a deeper understanding of our methods, showing a considerable improvement through a comprehensive study over both real world and synthetic data. Moreover, we illustrate the use of our models in outlier detection and image modeling tasks. 
Transformation Forests  Regression models for supervised learning problems with a continuous target are commonly understood as models for the conditional mean of the target given predictors. This notion is simple and therefore appealing for interpretation and visualisation. Information about the whole underlying conditional distribution is, however, not available from these models. A more general understanding of regression models as models for conditional distributions allows much broader inference from such models, for example the computation of prediction intervals. Several random foresttype algorithms aim at estimating conditional distributions, most prominently quantile regression forests (Meinshausen, 2006, JMLR). We propose a novel approach based on a parametric family of distributions characterised by their transformation function. A dedicated novel ‘transformation tree’ algorithm able to detect distributional changes is developed. Based on these transformation trees, we introduce ‘transformation forests’ as an adaptive local likelihood estimator of conditional distribution functions. The resulting models are fully parametric yet very general and allow broad inference procedures, such as the modelbased bootstrap, to be applied in a straightforward way. trtf 
Transformation Invariant GraphBased Network (TIGraNet) 
Learning transformation invariant representations of visual data is an important problem in computer vision. Deep convolutional networks have demonstrated remarkable results for image and video classification tasks. However, they have achieved only limited success in the classification of images that undergo geometric transformations. In this work we present a novel Transformation Invariant Graphbased Network (TIGraNet), which learns graphbased features that are inherently invariant to isometric transformations such as rotation and translation of input images. In particular, images are represented as signals on graphs, which permits to replace classical convolution and pooling layers in deep networks with graph spectral convolution and dynamic graph pooling layers that together contribute to invariance to isometric transformation. Our experiments show high performance on rotated and translated images from the test set compared to classical architectures that are very sensitive to transformations in the data. The inherent invariance properties of our framework provide key advantages, such as increased resiliency to data variability and sustained performance with limited training sets. Our code is available online. 
Transformed Generalized Autoregressive Moving Average (TGARMA) 
Transformed Generalized Autoregressive Moving Average (TGARMA) models were recently proposed to deal with nonadditivity, nonnormality and heteroscedasticity in real time series data. In this paper, a Bayesian approach is proposed for TGARMA models, thus extending the original model. We conducted a simulation study to investigate the performance of Bayesian estimation and Bayesian model selection criteria. In addition, a real dataset was analysed using the proposed approach. 
Transformer  The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoderdecoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 EnglishtoGerman translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 EnglishtoFrench translation task, our model establishes a new singlemodel stateoftheart BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data. 
TransitionEntropy  Recent years have seen rising needs for locationbased services in our everyday life. Aside from the many advantages provided by these services, they have caused serious concerns regarding the location privacy of users. An adversary such as an untrusted locationbased server can monitor the queried locations by a user to infer critical information such as the user’s home address, health conditions, shopping habits, etc. To address this issue, dummybased algorithms have been developed to increase the anonymity of users, and thus, protecting their privacy. Unfortunately, the existing algorithms only consider a limited amount of side information known by an adversary which may face more serious challenges in practice. In this paper, we incorporate a new type of side information based on consecutive location changes of users and propose a new metric called transitionentropy to investigate the location privacy preservation, followed by two algorithms to improve the transitionentropy for a given dummy generation algorithm. Then, we develop an attack model based on the Viterbi algorithm which can significantly threaten the location privacy of the users. Next, in order to protect the users from Viterbi attack, we propose an algorithm called robust dummy generation (RDG) which can resist against the Viterbi attack while maintaining a high performance in terms of the privacy metrics introduced in the paper. All the algorithms are applied and analyzed on a reallife dataset. 
Transitory Queueing Network (TQN) 
Queueing networks are notoriously difficult to analyze sans both Markovian and stationarity assumptions. Much of the theoretical contribution towards performance analysis of timeinhomogeneous single class queueing networks has focused on Markovian networks, with the recent exception of work in Liu and Whitt (2011) and Mandelbaum and Ramanan (2010). In this paper, we introduce transitory queueing networks as a model of inhomogeneous queueing networks, where a large, but finite, number of jobs arrive at queues in the network over a fixed time horizon. The queues offer FIFO service, and we assume that the service rate can be timevarying. The nonMarkovian dynamics of this model complicate the analysis of network performance metrics, necessitating approximations. In this paper we develop fluid and diffusion approximations to the numberinsystem performance metric by scaling up the number of external arrivals to each queue, following Honnappa et al. (2014). We also discuss the implications for bottleneck detection in tandem queueing networks. 
translate2R  Many companies realizied the advantages of the open source programming language R. translate2R allows a fast and inexpensive migration to R. The manual migration of complex SPSS® scripts has always been tedious and errorprone, but with translate2R the task of translating by hand becomes a thing of the past. The automatic and comprehensible process of translating SPSS® code to R code with translate2R offers users an enormous number of new analytical opportunities. Besides the usual migration process translate2R allows programmers an easy start into programming with R. Make use of translate2R for the translation of scripts to R. We will be pleased to help you in terms of migration projects, or starting off with R. sjPlot,translateSPSS2R 
Translational Recommender Networks  Representing relationships as translations in vector space lives at the heart of many neural embedding models such as word embeddings and knowledge graph embeddings. In this work, we study the connections of this translational principle with collaborative filtering algorithms. We propose Translational Recommender Networks (\textsc{TransRec}), a new attentive neural architecture that utilizes the translational principle to model the relationships between user and item pairs. Our model employs a neural attention mechanism over a \emph{Latent Relational Attentive Memory} (LRAM) module to learn the latent relations between useritem pairs that best explains the interaction. By exploiting adaptive useritem specific translations in vector space, our model also alleviates the geometric inflexibility problem of other metric learning algorithms while enabling greater modeling capability and finegrained fitting of users and items in vector space. The proposed architecture not only demonstrates the stateoftheart performance across multiple recommendation benchmarks but also boasts of improved interpretability. Qualitative studies over the LRAM module shows evidence that our proposed model is able to infer and encode explicit sentiment, temporal and attribute information despite being only trained on implicit feedback. As such, this ascertains the ability of \textsc{TransRec} to uncover hidden relational structure within implicit datasets. 
TransNets  Recently, deep learning methods have been shown to improve the performance of recommender systems over traditional methods, especially when review text is available. For example, a recent model, DeepCoNN, uses neural nets to learn one latent representation for the text of all reviews written by a target user, and a second latent representation for the text of all reviews for a target item, and then combines these latent representations to obtain stateoftheart performance on recommendation tasks. We show that (unsurprisingly) much of the predictive value of review text comes from reviews of the target user for the target item. We then introduce a way in which this information can be used in recommendation, even when the target user’s review for the target item is not available. Our model, called TransNets, extends the DeepCoNN model by introducing an additional latent layer representing the target usertarget item pair. We then regularize this layer, at training time, to be similar to another latent representation of the target user’s review of the target item. We show that TransNets and extensions of it improve substantially over the previous stateoftheart. 
Transportation Theory  In mathematics and economics, transportation theory is a name given to the study of optimal transportation and allocation of resources. The problem was formalized by the French mathematician Gaspard Monge in 1781. In the 1920s A.N. Tolstoi was one of the first to study the transportation problem mathematically. In 1930, in the collection Transportation Planning Volume I for the National Commissariat of Transportation of the Soviet Union, he published a paper ‘Methods of Finding the Minimal Kilometrage in Cargotransportation in space’. Major advances were made in the field during World War II by the Soviet/Russian mathematician and economist Leonid Kantorovich. Consequently, the problem as it is stated is sometimes known as the MongeKantorovich transportation problem. The linear programming formulation of the transportation problem is also known as the HitchcockKoopmans transportation problem. 
Trawl Process  The model is based on a mixed moving average process driven by Levy noise – called a trawl process – where the serial correlation and the crosssectional dependence are modelled independently of each other. Such processes can exhibit short or long memory. trawl 
TRECS  An action should remain identifiable when modifying its speed: consider the contrast between an expert chef and a novice chef each chopping an onion. Here, we expect the novice chef to have a relatively measured and slow approach to chopping when compared to the expert. In general, the speed at which actions are performed, whether slower or faster than average, should not dictate how they are recognized. We explore the erratic behavior caused by this phenomena on stateoftheart deep networkbased methods for action recognition in terms of maximum performance and stability in recognition accuracy across a range of input video speeds. By observing the trends in these metrics and summarizing them based on expected temporal behaviour w.r.t. variations in input video speeds, we find two distinct types of network architectures. In this paper, we propose a preprocessing method named TRECS, as a way to extend deepnetworkbased methods for action recognition to explicitly account for speed variability in the data. We do so by adaptively resampling the inputs to a given model. TRECS is agnostic to the specific deepnetwork model; we apply it to four stateoftheart action recognition architectures, C3D, I3D, TSN, and ConvNet+LSTM. On HMDB51 and UCF101, TRECSbased I3D models show a peak improvement of at least 2.9% in performance over the baseline while TRECSbased C3D models achieve a maximum improvement in stability by 59% over the baseline, on the HMDB51 dataset. 
Tree Based Context Causal Rule Discovery (TCC) 
With the increasing need of personalised decision making, such as personalised medicine and online recommendations, a growing attention has been paid to the discovery of the context and heterogeneity of causal relationships. Most existing methods, however, assume a known cause (e.g. a new drug) and focus on identifying from data the contexts of heterogeneous effects of the cause (e.g. patient groups with different responses to the new drug). There is no approach to efficiently detecting directly from observational data context specific causal relationships, i.e. discovering the causes and their contexts simultaneously. In this paper, by taking the advantages of highly efficient decision tree induction and the well established causal inference framework, we propose the Tree based Context Causal rule discovery (TCC) method, for efficient exploration of context specific causal relationships from data. Experiments with both synthetic and real world data sets show that TCC can effectively discover context specific causal rules from the data. 
Tree of Parzen Estimator  ➘ “Treestructured Parzen Estimator” Speeding up the Hyperparameter Optimization of Deep Convolutional Neural Networks 
Tree of Predictors (ToP) 
We present a new approach to ensemble learning. Our approach constructs a tree of subsets of the feature space and associates a predictor (predictive model) – determined by training one of a given family of base learners on an endogenously determined training set – to each node of the tree; we call the resulting object a tree of predictors. The (locally) optimal tree of predictors is derived recursively; each step involves jointly optimizing the split of the terminal nodes of the previous tree and the choice of learner and training set (hence predictor) for each set in the split. The feature vector of a new instance determines a unique path through the optimal tree of predictors; the final prediction aggregates the predictions of the predictors along this path. We derive loss bounds for the final predictor in terms of the Rademacher complexity of the base learners. We report the results of a number of experiments on a variety of datasets, showing that our approach provides statistically significant improvements over stateoftheart machine learning algorithms, including various ensemble learning methods. Our approach works because it allows us to endogenously create more complex learners – when needed – and endogenously match both the learner and the training set to the characteristics of the dataset while still avoiding overfitting. 
Tree Recurrent Neural Network (TreeRNN) 
In this paper we develop a recurrent neural network (TreeRNN), which is designed to predict a tree rather than a linear sequence as is the case in conventional recurrent neural networks. Our model defines the probability of a sentence by estimating the generation probability of its dependency tree. We construct the tree incrementally by generating the left and right dependents of a node whose probability is computed using recurrent neural networks with shared hidden layers. Application of our model to two language modeling tasks shows that it outperforms or performs on par with related models. GitXiv 
Tree Structured Vector Quantization (TSVQ) 
1. First we apply kmeans to get 2 centroids or prototypes within the entire data set. This provides us with a boundary between the two clusters, which would be a straight line based on the nearest neighbor rule. 2. Next, the data are assigned to the 2 centroids. 3. Then, for the data assigned to each centroid (call them a group), apply 2 centroid kmeans to each group separately. The initialization can be done by splitting the centroid into two. Note that data points channeled to different centroids are treated separately. 4. Repeat the above step. 
TreeBased Pipeline Optimization Tool (TPOT) 
As data science becomes more mainstream, there will be an evergrowing demand for data science tools that are more accessible, exible, and scalable. In response to this demand, automated machine learning (AutoML) researchers have begun building systems that automate the process of designing and optimizing machine learning pipelines. In this paper we present TPOT v0.3, an open source genetic programmingbased AutoML system that optimizes a series of feature preprocessors and machine learning models with the goal of maximizing classi cation accuracy on a supervised classi cation task. We benchmark TPOT on a series of 150 supervised classi cation tasks and nd that it signi cantly outperforms a basic machine learning analysis in 21 of them, while experiencing minimal degradation in accuracy on 4 of the benchmarksall without any domain knowledge nor human input. As such, GPbased AutoML systems show considerable promise in the AutoML domain. 
TreeCNN  In recent years, Convolutional Neural Networks (CNNs) have shown remarkable performance in many computer vision tasks such as object recognition and detection. However, complex training issues, such as ‘catastrophic forgetting’ and hyperparameter tuning, make incremental learning in CNNs a difficult challenge. In this paper, we propose a hierarchical deep neural network, with CNNs at multiple levels, and a corresponding training method for lifelong learning. The network grows in a treelike manner to accommodate the new classes of data without losing the ability to identify the previously trained classes. The proposed network was tested on CIFAR10 and CIFAR100 datasets, and compared against the method of fine tuning specific layers of a conventional CNN. We obtained comparable accuracies and achieved 40% and 20% reduction in training effort in CIFAR10 and CIFAR 100 respectively. The network was able to organize the incoming classes of data into featuredriven superclasses. Our model improves upon existing hierarchical CNN models by adding the capability of selfgrowth and also yields important observations on feature selective classification. 
TreeGAN  Generative Adversarial Networks (GANs) have shown great capacity on image generation, in which a discriminative model guides the training of a generative model to construct images that resemble real images. Recently, GANs have been extended from generating images to generating sequences (e.g., poems, music and codes). Existing GANs on sequence generation mainly focus on general sequences, which are grammarfree. In many realworld applications, however, we need to generate sequences in a formal language with the constraint of its corresponding grammar. For example, to test the performance of a database, one may want to generate a collection of SQL queries, which are not only similar to the queries of real users, but also follow the SQL syntax of the target database. Generating such sequences is highly challenging because both the generator and discriminator of GANs need to consider the structure of the sequences and the given grammar in the formal language. To address these issues, we study the problem of syntaxaware sequence generation with GANs, in which a collection of real sequences and a set of predefined grammatical rules are given to both discriminator and generator. We propose a novel GAN framework, namely TreeGAN, to incorporate a given ContextFree Grammar (CFG) into the sequence generation process. In TreeGAN, the generator employs a recurrent neural network (RNN) to construct a parse tree. Each generated parse tree can then be translated to a valid sequence of the given grammar. The discriminator uses a treestructured RNN to distinguish the generated trees from real trees. We show that TreeGAN can generate sequences for any CFG and its generation fully conforms with the given syntax. Experiments on synthetic and real data sets demonstrated that TreeGAN significantly improves the quality of the sequence generation in contextfree languages. 
Treelogy  We propose a novel tree classification system called Treelogy, that fuses deep representations with handcrafted features obtained from leaf images to perform leafbased plant classification. Key to this system are segmentation of the leaf from an untextured background, using convolutional neural networks (CNNs) for learning deep representations, extracting handcrafted features with a number of image processing techniques, training a linear SVM with feature vectors, merging SVM and CNN results, and identifying the species from a dataset of 57 trees. Our classification results show that fusion of deep representations with handcrafted features leads to the highest accuracy. The proposed algorithm is embedded in a smartphone application, which is publicly available. Furthermore, our novel dataset comprised of 5408 leaf images is also made public for use of other researchers. 
Treemapping  In information visualization and computing, treemapping is a method for displaying hierarchical data by using nested rectangles. 
TreeStructured Boosting  Additive models, such as produced by gradient boosting, and full interaction models, such as classification and regression trees (CART), are widely used algorithms that have been investigated largely in isolation. We show that these models exist along a spectrum, revealing neverbeforeknown connections between these two approaches. This paper introduces a novel technique called treestructured boosting for creating a single decision tree, and shows that this method can produce models equivalent to CART or gradient boosted stumps at the extremes by varying a single parameter. Although treestructured boosting is designed primarily to provide both the model interpretability and predictive performance needed for highstake applications like medicine, it also can produce decision trees represented by hybrid models between CART and boosted stumps that can outperform either of these approaches. 
TreeStructured Long ShortTerm Memory (TreeLSTM) 
For years, recursive neural networks (RvNNs) have shown to be suitable for representing text into fixedlength vectors and achieved good performance on several natural language processing tasks. However, the main drawback of RvNN is that it requires explicit tree structure (e.g. parse tree), which makes data preparation and model implementation hard. In this paper, we propose a novel treestructured long shortterm memory (TreeLSTM) architecture that efficiently learns how to compose taskspecific tree structures only from plain text data. To achieve this property, our model uses StraightThrough (ST) GumbelSoftmax estimator to decide the parent node among candidates and to calculate gradients of the discrete decision. We evaluate the proposed model on natural language interface and sentiment analysis and show that our model outperforms or at least comparable to previous TreeLSTMbased works. We also find that our model converges significantly faster and needs less memory than other models of complex structures. 
TreeStructured MultiLinear Principle Component Analysis (TMPCA) 
A novel text data dimension reduction technique, called the treestructured multilinear principle component analysis (TMPCA), is proposed in this work. Being different from traditional text dimension reduction methods that deal with the wordlevel representation, the TMPCA technique reduces the dimension of input sequences and sentences to simplify the following text classification tasks. It is shown mathematically and experimentally that the TMPCA tool demands much lower complexity (and, hence, less computing power) than the ordinary principle component analysis (PCA). Furthermore, it is demon strated by experimental results that the support vector machine (SVM) method applied to the TMPCAprocessed data achieves commensurable or better performance than the stateoftheart recurrent neural network (RNN) approach. 
Treestructured Parzen Estimator  The Treestructured Parzen Estimator (TPE) is a sequential modelbased optimization (SMBO) approach. SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements, and then subsequently choose new hyperparameters to test based on this model. 
Trellis Graphics  Extremely useful approach for graphical exploratory data analysis (EDA). Allows to examine for complicated, multiple variable relationships. Types of plots: · xyplot: scatterplot · bwplot: boxplots · stripplot: display univariate data against a numerical variable · dotplot: similar to stripplot · histogram · densityplot: kernel density estimates · barchart · piechart: (Not available in R) · splom: scatterplot matrices · contourplot: contour plot of a surface on a regular grid · levelplot: pseudocolour plot of a surface on a rectangular grid · wireframe: perspective plot of a surface evaluated on a regular grid · cloud: perspective plot of a cloud of points (3D scatterplot) https://…/chapter4.pdf 
Trend Analysis  Trend Analysis is the practice of collecting information and attempting to spot a pattern, or trend, in the information. In some fields of study, the term ‘trend analysis’ has more formally defined meanings. Although trend analysis is often used to predict future events, it could be used to estimate uncertain events in the past, such as how many ancient kings probably ruled between two dates, based on data such as the average years which other known kings reigned. 
TrialChain  The governance of data used for biomedical research and clinical trials is an important requirement for generating accurate results. To improve the visibility of data quality and analysis, we developed TrialChain, a blockchainbased platform that can be used to validate data integrity from large, biomedical research studies. We implemented a private blockchain using the MultiChain platform and integrated it with a data science platform deployed within a large research center. An administrative web application was built with Python to manage the platform, which was built with a microservice architecture using Docker. The TrialChain platform was integrated during data acquisition into our existing data science platform. Using NiFi, data were hashed and logged within the local blockchain infrastructure. To provide public validation, the local blockchain state was periodically synchronized to the public Ethereum network. The use of a combined private/public blockchain platform allows for both public validation of results while maintaining additional security and lower cost for blockchain transactions. Original data and modifications due to downstream analysis can be logged within TrialChain and data assets or results can be rapidly validated when needed using API calls to the platform. The TrialChain platform provides a data governance solution to audit the acquisition and analysis of biomedical research data. The platform provides cryptographic assurance of data authenticity and can also be used to document data analysis. 
Triangle Generative Adversarial Network (DeltaGAN) 
A Triangle Generative Adversarial Network ($\Delta$GAN) is developed for semisupervised crossdomain joint distribution matching, where the training data consists of samples from each domain, and supervision of domain correspondence is provided by only a few paired samples. $\Delta$GAN consists of four neural networks, two generators and two discriminators. The generators are designed to learn the twoway conditional distributions between the two domains, while the discriminators implicitly define a ternary discriminative function, which is trained to distinguish real data pairs and two kinds of fake data pairs. The generators and discriminators are trained together using adversarial learning. Under mild assumptions, in theory the joint distributions characterized by the two generators concentrate to the data distribution. In experiments, three different kinds of domain pairs are considered, imagelabel, imageimage and imageattribute pairs. Experiments on semisupervised image classification, imagetoimage translation and attributebased image generation demonstrate the superiority of the proposed approach. 
Triangle Lasso  Recently, network lasso has drawn many attentions due to its remarkable performance on simultaneous clustering and optimization. However, it usually suffers from the imperfect data (noise, missing values etc), and yields suboptimal solutions. The reason is that it finds the similar instances according to their features directly, which is usually impacted by the imperfect data, and thus returns suboptimal results. In this paper, we propose triangle lasso to avoid its disadvantage. Triangle lasso finds the similar instances according to their neighbours. If two instances have many common neighbours, they tend to become similar. Although some instances are profiled by the imperfect data, it is still able to find the similar counterparts. Furthermore, we develop an efficient algorithm based on Alternating Direction Method of Multipliers (ADMM) to obtain a moderately accurate solution. In addition, we present a dual method to obtain the accurate solution with the low additional time consumption. We demonstrate through extensive numerical experiments that triangle lasso is robust to the imperfect data. It usually yields a better performance than the stateoftheart method when performing data analysis tasks in practical scenarios. 
Triangular Norm (tNorm) 
In mathematics, a tnorm (also Tnorm or, unabbreviated, triangular norm) is a kind of binary operation used in the framework of probabilistic metric spaces and in multivalued logic, specifically in fuzzy logic. A tnorm generalizes intersection in a lattice and conjunction in logic. The name triangular norm refers to the fact that in the framework of probabilistic metric spaces tnorms are used to generalize triangle inequality of ordinary metric spaces. 
Trimmed Clustering  tclust,trimcluster 
Trimmed Ensemble Kalman Filter (TEnKF) 
We study the ensemble Kalman filter (EnKF) algorithm for sequential data assimilation in a general situation, that is, for nonlinear forecast and measurement models with nonadditive and nonGaussian noises. Such applications traditionally force us to choose between inaccurate Gaussian assumptions that permit efficient algorithms (e.g., EnKF), or more accurate direct sampling methods which scale poorly with dimension (e.g., particle filters, or PF). We introduce a trimmed ensemble Kalman filter (TEnKF) which can interpolate between the limiting distributions of the EnKF and PF to facilitate adaptive control over both accuracy and efficiency. This is achieved by introducing a trimming function that removes nonGaussian outliers that introduce errors in the correlation between the model and observed forecast, which otherwise prevent the EnKF from proposing accurate forecast updates. We show for specific trimming functions that the TEnKF exactly reproduces the limiting distributions of the EnKF and PF. We also develop an adaptive implementation which provides control of the effective sample size and allows the filter to overcome periods of increased model nonlinearity. This algorithm allow us to demonstrate substantial improvements over the traditional EnKF in convergence and robustness for the nonlinear Lorenz63 and Lorenz96 models. 
Triple Exponential Smoothing  What happens if the data show trend and seasonality? We now introduce a third equation to take care of seasonality (sometimes called periodicity). The resulting set of equations is called the ‘HoltWinters’ (HW) method after the names of the inventors. 
Triplestore  A triplestore is a purposebuilt database for the storage and retrieval of triples through semantic queries. A triple is a data entity composed of subjectpredicateobject, like “Bob is 35” or “Bob knows Fred”. Much like a relational database, one stores information in a triplestore and retrieves it via a query language. Unlike a relational database, a triplestore is optimized for the storage and retrieval of triples. In addition to queries, triples can usually be imported/exported using Resource Description Framework (RDF) and other formats. 
TritanDB  The efficient management of data is an important prerequisite for realising the potential of the Internet of Things (IoT). Two issues given the large volume of structured timeseries IoT data are, addressing the difficulties of data integration between heterogeneous Things and improving ingestion and query performance across databases on both resourceconstrained Things and in the cloud. In this paper, we examine the structure of public IoT data and discover that the majority exhibit unique flat, wide and numerical characteristics with a mix of evenly and unevenlyspaced timeseries. We investigate the advances in timeseries databases for telemetry data and combine these findings with microbenchmarks to determine the best compression techniques and storage data structures to inform the design of a novel solution optimised for IoT data. A query translation method with low overhead even on resourceconstrained Things allows us to utilise rich data models like the Resource Description Framework (RDF) for interoperability and data integration on top of the optimised storage. Our solution, TritanDB, shows an order of magnitude performance improvement across both Things and cloud hardware on many stateoftheart databases within IoT scenarios. Finally, we describe how TritanDB supports various analyses of IoT timeseries data like forecasting. 
Tropical Linear Programming  On Tropical Linear and Integer Programs 
TrQuery  In this paper, we present an embeddingbased framework (TrQuery) for recommending solutions of a SPARQL query, including approximate solutions when exact querying solutions are not available due to incompleteness or inconsistencies of realworld RDF data. Within this framework, embedding is applied to score solutions together with edit distance so that we could obtain more finegrained recommendations than those recommendations via edit distance. For instance, graphs of two querying solutions with a similar structure can be distinguished in our proposed framework while the edit distance depending on structural difference becomes unable. To this end, we propose a novel score model built on vector space generated in embedding system to compute the similarity between an approximate subgraph matching and a whole graph matching. Finally, we evaluate our approach on large RDF datasets DBpedia and YAGO, and experimental results show that TrQuery exhibits an excellent behavior in terms of both effectiveness and efficiency. 
True Asymptotic Natural Gradient Optimization (TANGO) 
We introduce a simple algorithm, True Asymptotic Natural Gradient Optimization (TANGO), that converges to a true natural gradient descent in the limit of small learning rates, without explicit Fisher matrix estimation. For quadratic models the algorithm is also an instance of averaged stochastic gradient, where the parameter is a moving average of a ‘fast’, constantrate gradient descent. TANGO appears as a particular delinearization of averaged SGD, and is sometimes quite different on nonquadratic models. This further connects averaged SGD and natural gradient, both of which are arguably optimal asymptotically. In large dimension, small learning rates will be required to approximate the natural gradient well. Still, this shows it is possible to get arbitrarily close to exact natural gradient descent with a lightweight algorithm. 
TrueSkill Ranking System  TrueSkill is a Bayesian ranking algorithm developed by Microsoft Research and used in the Xbox matchmaking system built to address some perceived flaws in the Elo rating system. It is an extension of the Glicko rating system to multiplayer games. The purpose of a ranking system is to both identify and track the skills of gamers in a game (mode) in order to be able to match them into competitive matches. The TrueSkill ranking system only uses the final standings of all teams in a game in order to update the skill estimates (ranks) of all gamers playing in this game. Ranking systems have been proposed for many sports but possibly the most prominent ranking system in use today is ELO. 
TrueSkill Sort (TSSort) 
In this paper we present TSSort, a probabilistic, noise resistant, quickly converging comparison sort algorithm based on Microsoft TrueSkill. The algorithm combines TrueSkill’s updating rules with a newly developed next item pair selection strategy, enabling it to beat standard sorting algorithms w.r.t. convergence speed and noise resistance, as shown in simulations. TSSort is useful if comparisons of items are expensive or noisy, or if intermediate results shall be approximately ordered. 
Truncated Variance Reduction (TruVaR) 
We present a new algorithm, truncated variance reduction (TruVaR), that treats Bayesian optimization (BO) and levelset estimation (LSE) with Gaussian processes in a unified fashion. The algorithm greedily shrinks a sum of truncated variances within a set of potential maximizers (BO) or unclassified points (LSE), which is updated based on confidence bounds. TruVaR is effective in several important settings that are typically nontrivial to incorporate into myopic algorithms, including pointwise costs and heteroscedastic noise. We provide a general theoretical guarantee for TruVaR covering these aspects, and use it to recover and strengthen existing results on BO and LSE. Moreover, we provide a new result for a setting where one can select from a number of noise levels having associated costs. We demonstrate the effectiveness of the algorithm on both synthetic and realworld data sets. 
Truncation  In statistics, truncation results in values that are limited above or below, resulting in a truncated sample.[1] Truncation is similar to but distinct from the concept of statistical censoring. A truncated sample can be thought of as being equivalent to an underlying sample with all values outside the bounds entirely omitted, with not even a count of those omitted being kept. With statistical censoring, a note would be recorded documenting which bound (upper or lower) had been exceeded and the value of that bound. With truncated sampling, no note is recorded. 
Trust Region based Derivative Free Optimization (DFOTR) 
In this work, we utilize a Trust Region based Derivative Free Optimization (DFOTR) method to directly maximize the Area Under Receiver Operating Characteristic Curve (AUC), which is a nonsmooth, noisy function. We show that AUC is a smooth function, in expectation, if the distributions of the positive and negative data points obey a jointly normal distribution. The practical performance of this algorithm is compared to three prominent Bayesian optimization methods and random search. The presented numerical results show that DFOTR surpasses Bayesian optimization and random search on various blackbox optimization problem, such as maximizing AUC and hyperparameter tuning. 
Trust Score  Knowing when a classifier’s prediction can be trusted is useful in many applications and critical for safely using AI. While the bulk of the effort in machine learning research has been towards improving classifier performance, understanding when a classifier’s predictions should and should not be trusted has received far less attention. The standard approach is to use the classifier’s discriminant or confidence score; however, we show there exists a considerably more effective alternative. We propose a new score, called the trust score, which measures the agreement between the classifier and a modified nearestneighbor classifier on the testing example. We show empirically that high (low) trust scores produce surprisingly high precision at identifying correctly (incorrectly) classified examples, consistently outperforming the classifier’s confidence score as well as many other baselines. Further, under some mild distributional assumptions, we show that if the trust score for an example is high (low), the classifier will likely agree (disagree) with the Bayesoptimal classifier. Our guarantees consist of nonasymptotic rates of statistical consistency under various nonparametric settings and build on recent developments in topological data analysis. 
Tsallis Entropy  In physics, the Tsallis entropy is a generalization of the standard BoltzmannGibbs entropy. It was introduced in 1988 by Constantino Tsallis[1] as a basis for generalizing the standard statistical mechanics. In the scientific literature, the physical relevance of the Tsallis entropy was occasionally debated. However, from the years 2000 on, an increasingly wide spectrum of natural, artificial and social complex systems have been identified which confirm the predictions and consequences that are derived from this nonadditive entropy, such as nonextensive statistical mechanics,[2] which generalizes the BoltzmannGibbs theory. 
Tsallis Entropy Information Metric (TEIM) 
The construction of efficient and effective decision trees remains a key topic in machine learning because of their simplicity and flexibility. A lot of heuristic algorithms have been proposed to construct nearoptimal decision trees. Most of them, however, are greedy algorithms which have the drawback of obtaining only local optimums. Besides, common split criteria, e.g. Shannon entropy, Gain Ratio and Gini index, are also not flexible due to lack of adjustable parameters on data sets. To address the above issues, we propose a series of novel methods using Tsallis entropy in this paper. Firstly, a Tsallis Entropy Criterion (TEC) algorithm is proposed to unify Shannon entropy, Gain Ratio and Gini index, which generalizes the split criteria of decision trees. Secondly, we propose a Tsallis Entropy Information Metric (TEIM) algorithm for efficient construction of decision trees. The TEIM algorithm takes advantages of the adaptability of Tsallis conditional entropy and the reducing greediness ability of twostage approach. Experimental results on UCI data sets indicate that the TEC algorithm achieves statistically significant improvement over the classical algorithms, and that the TEIM algorithm yields significantly better decision trees in both classification accuracy and tree complexity. 
TshinghuaalphaAlgorithm  Tshinghuaalpha algorithm which uses timestamps in the log files to construct a Petri net. It is related to the a algorithm, but uses a different approach. Details can be found in. It is interesting to note that this mining plugin was the first plugin developed by researchers outside of our research group. Researchers from Tshinghua University in China (Jianmin Wang and Wen Lijie) were able to develop and integrate this plugin without any help or changes to the framework. 
TSViz  This paper presents a novel framework for demystification of convolutional deep learning models for time series analysis. This is a step towards making informed/explainable decisions in the domain of time series, powered by deep learning. There have been numerous efforts to increase the interpretability of imagecentric deep neural network models, where the learned features are more intuitive to visualize. Visualization in timeseries is much more complicated as there is no direct interpretation of the filters and inputs as compared to image modality. In addition, little or no concentration has been devoted for the development of such tools in the domain of timeseries in the past. The visualization engine of the presented framework provides possibilities to explore and analyze a network from different dimensions at four different levels of abstraction. This enables the user to uncover different aspects of the model which includes important filters, filter clusters, and input saliency maps. These representations allow to understand the network features so that the acceptability of deep networks for timeseries data can be enhanced. This is extremely important in domains like finance, industry 4.0, selfdriving cars, healthcare, counterterrorism etc., where reasons for reaching a particular prediction are equally important as the prediction itself. The framework \footnote{Framework download link: https://hidden.for.blind.review} can also aid in discovery of the filters which are contributing nothing to the final prediction, hence, can be pruned without any significant loss in performance. 
Tuatara GS1  The Tuatara GS1 algorithm relies on the more advanced Tuatara GS2 algorithm which generates relationships between objects based on principles in congnition related to Computational Theory of the Mind (CTM) (Pinker, S. 1997) and autoassociation (Xijin Ge , Shuichi Iwata, 2002) and reinforced learning (Wenhuan, X., Nandi, A. K., Zhang, J., Evans, K. G. 2005) with exponential decays that follow the Golden Ratio F (Dunlap, Richard A. 1997). 
Tube Convolutional Neural Network (TCNN) 
Deep learning has been demonstrated to achieve excellent results for image classification and object detection. However, the impact of deep learning on video analysis (e.g. action detection and recognition) has been limited due to complexity of video data and lack of annotations. Previous convolutional neural networks (CNN) based video action detection approaches usually consist of two major steps: framelevel action proposal detection and association of proposals across frames. Also, these methods employ twostream CNN framework to handle spatial and temporal feature separately. In this paper, we propose an endtoend deep network called Tube Convolutional Neural Network (TCNN) for action detection in videos. The proposed architecture is a unified network that is able to recognize and localize action based on 3D convolution features. A video is first divided into equal length clips and for each clip a set of tube proposals are generated next based on 3D Convolutional Network (ConvNet) features. Finally, the tube proposals of different clips are linked together employing network flow and spatiotemporal action detection is performed using these linked video proposals. Extensive experiments on several video datasets demonstrate the superior performance of TCNN for classifying and localizing actions in both trimmed and untrimmed videos compared to stateofthearts. 
TUCRL  While designing the state space of an MDP, it is common to include states that are transient or not reachable by any policy (e.g., in mountain car, the product space of speed and position contains configurations that are not physically reachable). This leads to defining weaklycommunicating or multichain MDPs. In this paper, we introduce \tucrl, the first algorithm able to perform efficient explorationexploitation in any finite Markov Decision Process (MDP) without requiring any form of prior knowledge. In particular, for any MDP with $S^{\texttt{C}}$ communicating states, $A$ actions and $\Gamma^{\texttt{C}} \leq S^{\texttt{C}}$ possible communicating next states, we derive a $\widetilde{O}(D^{\texttt{C}} \sqrt{\Gamma^{\texttt{C}} S^{\texttt{C}} AT})$ regret bound, where $D^{\texttt{C}}$ is the diameter (i.e., the longest shortest path) of the communicating part of the MDP. This is in contrast with optimistic algorithms (e.g., UCRL, Optimistic PSRL) that suffer linear regret in weaklycommunicating MDPs, as well as posterior sampling or regularised algorithms (e.g., REGAL), which require prior knowledge on the bias span of the optimal policy to bias the exploration to achieve sublinear regret. We also prove that in weaklycommunicating MDPs, no algorithm can ever achieve a logarithmic growth of the regret without first suffering a linear regret for a number of steps that is exponential in the parameters of the MDP. Finally, we report numerical simulations supporting our theoretical findings and showing how TUCRL overcomes the limitations of the stateoftheart. 
Tukey MeanDifference Plot  The Tukey meandifference plot is a scatter graph produced not for (x,y) values themselves, but for modified coordinates (X,Y) : X = (x+y)/2, Y = yx. Such a plot is useful, for example, to analyze data with strong correlation between x and y – when the (x,y) dots on the plot are close to the diagonal x=y. In this case, the value of the transformed variable X is about the same as x and y; and the variable Y shows the difference between x and y. The Tukey meandifference plot is meaningful for two similar variables – that is, when both x and y are of the same physical dimension and expressed in the same units – e.g mass in pounds (or kilograms, …), length in foots (or meters, …). Otherwise, it makes no sense to sum up or subtract values of the variables x and y. 
Tunable GMM Kernel  While tree methods have been popular in practice, researchers and practitioners are also looking for simple algorithms which can reach similar accuracy of trees. In 2010, (Ping Li UAI’10) developed the method of ‘abcrobustlogitboost’ and compared it with other supervised learning methods on datasets used by the deep learning literature. In this study, we propose a series of ‘tunable GMM kernels’ which are simple and perform largely comparably to tree methods on the same datasets. Note that ‘abcrobustlogitboost’ substantially improved the original ‘GDBT’ in that (a) it developed a treesplit formula based on secondorder information of the derivatives of the loss function; (b) it developed a new set of derivatives for multiclass classification formulation. In the prior study in 2017, the ‘generalized minmax’ (GMM) kernel was shown to have good performance compared to the ‘radialbasis function’ (RBF) kernel. However, as demonstrated in this paper, the original GMM kernel is often not as competitive as tree methods on the datasets used in the deep learning literature. Since the original GMM kernel has no parameters, we propose tunable GMM kernels by adding tuning parameters in various ways. Three basic (i.e., with only one parameter) GMM kernels are the ‘$e$GMM kernel’, ‘$p$GMM kernel’, and ‘$\gamma$GMM kernel’, respectively. Extensive experiments show that they are able to produce good results for a large number of classification tasks. Furthermore, the basic kernels can be combined to boost the performance. 
Tune  Modern machine learning algorithms are increasingly computationally demanding, requiring specialized hardware and distributed computation to achieve high performance in a reasonable time frame. Many hyperparameter search algorithms have been proposed for improving the efficiency of model selection, however their adaptation to the distributed compute environment is often adhoc. We propose Tune, a unified framework for model selection and training that provides a narrowwaist interface between training scripts and search algorithms. We show that this interface meets the requirements for a broad range of hyperparameter search algorithms, allows straightforward scaling of search to large clusters, and simplifies algorithm implementation. We demonstrate the implementation of several stateoftheart hyperparameter search algorithms in Tune. Tune is available at http://…/tune.html. 
Tunnel Network  Traditionally, deep learning algorithms update the network weights whereas the network architecture is chosen manually, using a process of trial and error. In this work, we propose two novel approaches that automatically update the network structure while also learning its weights. The novelty of our approach lies in our parameterization where the depth, or additional complexity, is encapsulated continuously in the parameter space through control parameters that add additional complexity. We propose two methods: In tunnel networks, this selection is done at the level of a hidden unit, and in budding perceptrons, this is done at the level of a network layer; updating this control parameter introduces either another hidden unit or another hidden layer. We show the effectiveness of our methods on the synthetic twospirals data and on two real data sets of MNIST and MIRFLICKR, where we see that our proposed methods, with the same set of hyperparameters, can correctly adjust the network complexity to the task complexity. 
Turbo Filtering  In this manuscript a method for developing novel filtering algorithms through the parallel concatenation of two Bayesian filters is illustrated. Our description of this method, called turbo filtering, is based on a new graphical model; this allows us to efficiently describe both the processing accomplished inside each of the constituent filter and the interactions between them. This model is exploited to develop two new filtering algorithms for conditionally linear Gaussian systems. Numerical results for a specific dynamic system evidence that such filters can achieve a better complexityaccuracy tradeoff than marginalized particle filtering. 
TurekFletcher Model  Modelaveraging is commonly used as a means of allowing for model uncertainty in parameter estimation. In the frequentist framework, a modelaveraged estimate of a parameter is the weighted mean of the estimates from each of the candidate models, the weights typically being chosen using an information criterion. Current methods for calculating a modelaveraged confidence interval assume approximate normality of the modelaveraged estimate, i.e., they are Wald intervals. As in the singlemodel setting, we might improve the coverage performance of this interval by a onetoone transformation of the parameter, obtaining a Wald interval, and then backtransforming the endpoints. However, a transformation that works in the singlemodel setting may not when modelaveraging, due to the weighting and the need to estimate the weights. In the singlemodel setting, a natural alternative is to use a profile likelihood interval, which generally provides better coverage than a Wald interval. We propose a method for modelaveraging a set of singlemodel profile likelihood intervals, making use of the link between profile likelihood intervals and Bayesian credible intervals. We illustrate its use in an example involving negative binomial regression, and perform two simulation studies to compare its coverage properties with the existing Wald intervals. 
Turfjs  Turf.js is a JavaScript library for spatial analysis. It helps you analyze, aggregate, and transform data in order to visualize it in new ways and answer advanced questions about it. lawn 
TuringBox  AI researchers employ not only the scientific method, but also methodology from mathematics and engineering. However, the use of the scientific method – specifically hypothesis testing – in AI is typically conducted in service of engineering objectives. Growing interest in topics such as fairness and algorithmic bias show that engineeringfocused questions only comprise a subset of the important questions about AI systems. This results in the AI Knowledge Gap: the number of unique AI systems grows faster than the number of studies that characterize these systems’ behavior. To close this gap, we argue that the study of AI could benefit from the greater inclusion of researchers who are well positioned to formulate and test hypotheses about the behavior of AI systems. We examine the barriers preventing social and behavioral scientists from conducting such studies. Our diagnosis suggests that accelerating the scientific study of AI systems requires new incentives for academia and industry, mediated by new tools and institutions. To address these needs, we propose a twosided marketplace called TuringBox. On one side, AI contributors upload existing and novel algorithms to be studied scientifically by others. On the other side, AI examiners develop and post machine intelligence tasks designed to evaluate and characterize algorithmic behavior. We discuss this market’s potential to democratize the scientific study of AI behavior, and thus narrow the AI Knowledge Gap. 
TutorialBank  The field of Natural Language Processing (NLP) is growing rapidly, with new research published daily along with an abundance of tutorials, codebases and other online resources. In order to learn this dynamic field or stay uptodate on the latest research, students as well as educators and researchers must constantly sift through multiple sources to find valuable, relevant information. To address this situation, we introduce TutorialBank, a new, publicly available dataset which aims to facilitate NLP education and research. We have manually collected and categorized over 6,300 resources on NLP as well as the related fields of Artificial Intelligence (AI), Machine Learning (ML) and Information Retrieval (IR). Our dataset is notably the largest manuallypicked corpus of resources intended for NLP education which does not include only academic papers. Additionally, we have created both a search engine and a commandline tool for the resources and have annotated the corpus to include lists of research topics, relevant resources for each topic, prerequisite relations among topics, relevant subparts of individual resources, among other annotations. We are releasing the dataset and present several avenues for further research. 
TVClust  In this paper, we propose a modelbased clustering method (TVClust) that robustly incorporates noisy side information as softconstraints and aims to seek a consensus between side information and the observed data. Our method is based on a nonparametric Bayesian hierarchical model that combines the probabilistic model for the data instance and the one for the sideinformation. An efficient Gibbs sampling algorithm is proposed for posterior inference. Using the smallvariance asymptotics of our probabilistic model, we then derive a new deterministic clustering algorithm (RDPmeans). It can be viewed as an extension of Kmeans that allows for the inclusion of side information and has the additional property that the number of clusters does not need to be specified a priori. Empirical studies have been carried out to compare our work with many constrained clustering algorithms from the literature on both a variety of data sets and under a variety of conditions such as using noisy side information and erroneous k values. The results of our experiments show strong results for our probabilistic and deterministic approaches under these conditions when compared to other algorithms in the literature. 
Tweedie Distribution  In probability and statistics, the Tweedie distributions are a family of probability distributions which include the purely continuous normal and gamma distributions, the purely discrete scaled Poisson distribution, and the class of mixed compound Poissongamma distributions which have positive mass at zero, but are otherwise continuous. For any random variable Y that obeys a Tweedie distribution, the variance var(Y) relates to the mean E(Y) by the power law, where a and p are positive constants. The Tweedie distributions were named by Bent Joergensen after Maurice Tweedie, a statistician and medical physicist at the University of Liverpool, UK, who presented the first thorough study of these distributions in 1984. 
Tweedie Model  TDboost 
Tweet2Vec  We present Tweet2Vec, a novel method for generating generalpurpose vector representation of tweets. The model learns tweet embeddings using characterlevel CNNLSTM encoderdecoder. We trained our model on 3 million, randomly selected Englishlanguage tweets. The model was evaluated using two methods: tweet semantic similarity and tweet sentiment categorization, outperforming the previous stateoftheart in both tasks. The evaluations demonstrate the power of the tweet embeddings generated by our model for various tweet categorization tasks. The vector representations generated by our model are generic, and hence can be applied to a variety of tasks. Though the model presented in this paper is trained on Englishlanguage tweets, the method presented can be used to learn tweet embeddings for different languages. 
TwiInsight  Social media platforms contain a great wealth of information which provides opportunities for us to explore hidden patterns or unknown correlations, and understand people’s satisfaction with what they are discussing. As one showcase, in this paper, we present a system, TwiInsight which explores the insight of Twitter data. Different from other Twitter analysis systems, TwiInsight automatically extracts the popular topics under different categories (e.g., healthcare, food, technology, sports and transport) discussed in Twitter via topic modeling and also identifies the correlated topics across different categories. Additionally, it also discovers the people’s opinions on the tweets and topics via the sentiment analysis. The system also employs an intuitive and informative visualization to show the uncovered insight. Furthermore, we also develop and compare six most popular algorithms – three for sentiment analysis and three for topic modeling. 
Twin Sort Technique  The objective behind the Twin Sort technique is to sort the list of unordered data elements efficiently and to allow efficient and simple arrangement of data elements within the data structure with optimization of comparisons and iterations in the sorting method. This sorting technique effectively terminates the iterations when there is no need of comparison if the elements are all sorted in between the iterations. Unlike Quick sort, Merge sorting technique, this new sorting technique is based on the iterative method of sorting elements within the data structure. So it will be advantageous for optimization of iterations when there is no need for sorting elements. Finally, the Twin Sort technique is more efficient and simple method of arranging elements within a data structure and it is easy to implement when comparing to the other sorting technique. By the introduction of optimization of comparison and iterations, it will never allow the arranging task on the ordered elements. 
Twin Support Vector Machine (TSVM,TWSVM) 
Twin Support Vector Machine (TWSVM) is an emerging machine learning method suitable for both classification and regression problems. It utilizes the concept of Generalized Eigenvalues Proximal Support Vector Machine (GEPSVM) and finds two nonparallel planes for each class by solving a pair of Quadratic Programming Problems. It enhances the computational speed as compared to the traditional Support Vector Machine (SVM). TWSVM was initially constructed to solve binary classification problems; later researchers successfully extended it for multiclass problem domain. TWSVM always gives promising empirical results, due to which it has many attractive features which enhance its applicability. This paper presents the research development of TWSVM in recent years. This study is divided into two main broad categories – variant based and multiclass based TWSVM methods. The paper primarily discusses the basic concept of TWSVM and highlights its applications in recent years. A comparative analysis of various research contributions based on TWSVM is also presented. This is helpful for researchers to effectively utilize the TWSVM as an emergent research methodology and encourage them to work further in the performance enhancement of TWSVM. 
Two Alternatives Forced Choice Score (2AFC) 
➚ “Generalized Discrimination Score” 
Two oneSided Tests (TOST) 
Two onesided tests (TOST) procedure to test equivalence for ttests, correlations, and metaanalyses, including power analysis for ttests and correlations. Allows you to specify equivalence bounds in raw scale units or in terms of effect sizes. TOSTER 
Two Stage Least Squares (2SLS,MIIV2SLS) 
TwoStage least squares (2SLS) regression analysis is a statistical technique that is used in the analysis of structural equations. This technique is the extension of the OLS method. It is used when the dependent variable’s error terms are correlated with the independent variables. Additionally, it is useful when there are feedback loops in the model. In structural equations modeling, we use the maximum likelihood method to estimate the path coefficient. This technique is an alternative in SEM modeling to estimate the path coefficient. This technique can also be applied in quasiexperimental studies. MIIVsem 
TwoDimensional Linear Discriminant Analysis (2DLDA) 
➚ “Generalized LpNorm TwoDimensional Linear Discriminant Analysis” 
TwoStage Learning (TSL) 
➚ “Learning Through Deterministic Assignment of Hidden Parameters” 
TwoWing Optimization Strategy (TwoWingOS) 
Determining whether a given claim is supported by evidence is a fundamental NLP problem that is best modeled as Textual Entailment. However, given a large collection of text, finding evidence that could support or refute a given claim is a challenge in itself, amplified by the fact that different evidence might be needed to support or refute a claim. Nevertheless, most prior work decouples evidence identification from determining the truth value of the claim given the evidence. We propose to consider these two aspects jointly. We develop TwoWingOS (twowing optimization strategy), a system that, while identifying appropriate evidence for a claim, also determines whether or not the claim is supported by the evidence. Given the claim, TwoWingOS attempts to identify a subset of the evidence candidates; given the predicted evidence, it then attempts to determine the truth value of the corresponding claim. We treat this challenge as coupled optimization problems, training a joint model for it. TwoWingOS offers two advantages: (i) Unlike pipeline systems, it facilitates flexiblesize evidence set, and (ii) Joint training improves both the claim entailment and the evidence identification. Experiments on a benchmark dataset show stateoftheart performance. Code: https://…/FEVER 
TypeSQL  Interacting with relational databases through natural language helps users of any background easily query and analyze a vast amount of data. This requires a system that understands users’ questions and converts them to SQL queries automatically. In this paper we present a novel approach, TypeSQL, which views this problem as a slot filling task. Additionally, TypeSQL utilizes type information to better understand rare entities and numbers in natural language questions. We test this idea on the WikiSQL dataset and outperform the prior stateoftheart by 5.5% in much less time. We also show that accessing the content of databases can significantly improve the performance when users’ queries are not wellformed. TypeSQL gets 82.6% accuracy, a 17.5% absolute improvement compared to the previous contentsensitive model. 
Typicality and Eccentricity Data Analysis (TEDA) 
The typicality and eccentricity data analysis (TEDA) framework was put forward by Angelov (2013) <DOI:10.14313/JAMRIS_22014/16>. It has been further developed into multiple different techniques since, and provides a nonparametric way of determining how similar an observation, from a process that is not purely random, is to other observations generated by the process. teda 
Advertisements