Table2Answer Semantic parsing is the task of mapping natural language to logic form. In question answering, semantic parsing can be used to map the question to logic form and execute the logic form to get the answer. One key problem for semantic parsing is the hard label work. We study this problem in another way: we do not use the logic form any more. Instead we only use the schema and answer info. We think that the logic form step can be injected into the deep model. The reason why we think removing the logic form step is possible is that human can do the task without explicit logic form. We use BERT-based model and do the experiment in the WikiSQL dataset, which is a large natural language to SQL dataset. Our experimental evaluations that show that our model can achieves the baseline results in WikiSQL dataset. Tableau Public Tableau Public is a free data storytelling application. Create and share interactive charts and graphs, stunning maps, live dashboards and fun applications in minutes, then publish anywhere on the web. Anyone can do it, it’s that easy – and it’s free. Tabular GAN(TGAN) Generative adversarial networks (GANs) implicitly learn the probability distribution of a dataset and can draw samples from the distribution. This paper presents, Tabular GAN (TGAN), a generative adversarial network which can generate tabular data like medical or educational records. Using the power of deep neural networks, TGAN generates high-quality and fully synthetic tables while simultaneously generating discrete and continuous variables. When we evaluate our model on three datasets, we find that TGAN outperforms conventional statistical generative models in both capturing the correlation between columns and scaling up for large datasets. tAdv Machine learning, especially deep learning, is widely applied to a range of applications including computer vision, robotics and natural language processing. However, it has been shown that machine learning models are vulnerable to adversarial examples, carefully crafted samples that deceive learning models. In-depth studies on adversarial examples can help better understand potential vulnerabilities and therefore improve model robustness. Recent works have introduced various methods which generate adversarial examples. However, all require the perturbation to be of small magnitude ($\mathcal{L}_p$ norm) for them to be imperceptible to humans, which is hard to deploy in practice. In this paper we propose two novel methods, tAdv and cAdv, which leverage texture transfer and colorization to generate natural perturbation with a large $\mathcal{L}_p$ norm. We conduct extensive experiments to show that the proposed methods are general enough to attack both image classification and image captioning tasks on ImageNet and MSCOCO dataset. In addition, we conduct comprehensive user studies under various conditions to show that our generated adversarial examples are imperceptible to humans even when the perturbations are large. We also evaluate the transferability and robustness of the proposed attacks against several state-of-the-art defenses. Tag Management System(TMS) A Tag Management System (TMS) replaces hard-coded tags that are used for marketing, analytics, and testing on a website, with dynamic tags that are easier to implement and update. Every tag management system uses a container tag – a small snippet of code that allows you to dynamically insert tags into your website. You can think of container tags as buckets that hold other types of tags. You control which tags are added to the buckets using a simple web interface. In 2012, Google released a TMS called Google Tag Manager, which has quickly become one of the most widely used Tag Management Systems in the market. The benefits of tag management (and specifically Google Tag Manager) are enormous to any business, large or small. You can add and update Google AdWords tags, Google Analytics tags, DoubleClick Floodlight tags and many non-Google third-party tags directly from Google Tag Manager, instead of editing site code. This reduces errors, frees you from having to involve a webmaster, and allows you to quickly deploy tags on your site. To effectively use tag management, it’s important to understand basic concepts like the data layer, triggers, and variables. Tagger We present a framework for efficient perceptual inference that explicitly reasons about the segmentation of its inputs and features. Rather than being trained for any specific segmentation, our framework learns the grouping process in an unsupervised manner or alongside any supervised task. By enriching the representations of a neural network, we enable it to group the representations of different objects in an iterative manner. By allowing the system to amortize the iterative inference of the groupings, we achieve very fast convergence. In contrast to many other recently proposed methods for addressing multi-object scenes, our system does not assume the inputs to be images and can therefore directly handle other modalities. For multi-digit classification of very cluttered images that require texture segmentation, our method offers improved classification performance over convolutional networks despite being fully connected. Furthermore, we observe that our system greatly improves on the semi-supervised result of a baseline Ladder network on our dataset, indicating that segmentation can also improve sample efficiency. Tagging Systems Tagging systems have become increasingly popular. These systems enable users to add keywords (i.e., ‘tags’) to Internet resources (e.g., web pages, images, videos) without relying on a controlled vocabulary. Tagging systems have the potential to improve search, spam detection, reputation systems, and personal organization while introducing new modalities of social communication and opportunities for data mining. This potential is largely due to the social structure that underlies many of the current systems. Despite the rapid expansion of applications that support tagging of resources, tagging systems are still not well studied or understood. In this paper, we provide a short description of the academic related work to date. We offer a model of tagging systems, specifically in the context of web-based systems, to help us illustrate the possible benefits of these tools. Since many such systems already exist, we provide a taxonomy of tagging systems to help inform their analysis and design, and thus enable researchers to frame and compare evidence for the sustainability of such systems. We also provide a simple taxonomy of incentives and contribution models to inform potential evaluative frameworks. While this work does not present comprehensive empirical results, we present a preliminary study of the photosharing and tagging system Flickr to demonstrate our model and explore some of the issues in one sample system. This analysis helps us outline and motivate possible future directions of research in tagging systems. Tag-Guided HyperRecNN/TreeLSTM(TG-HRecNN/TreeLSTM) Recursive Neural Network (RecNN), a type of models which compose words or phrases recursively over syntactic tree structures, has been proven to have superior ability to obtain sentence representation for a variety of NLP tasks. However, RecNN is born with a thorny problem that a shared compositional function for each node of trees can’t capture the complex semantic compositionality so that the expressive power of model is limited. In this paper, in order to address this problem, we propose Tag-Guided HyperRecNN/TreeLSTM (TG-HRecNN/TreeLSTM), which introduces hypernetwork into RecNNs to take as inputs Part-of-Speech (POS) tags of word/phrase and generate the semantic composition parameters dynamically. Experimental results on five datasets for two typical NLP tasks show proposed models both obtain significant improvement compared with RecNN and TreeLSTM consistently. Our TG-HTreeLSTM outperforms all existing RecNN-based models and achieves or is competitive with state-of-the-art on four sentence classification benchmarks. The effectiveness of our models is also demonstrated by qualitative analysis. Tail Dependence In probability theory, the tail dependence of a pair of random variables is a measure of their comovements in the tails of the distributions. The concept is used in extreme value theory. Random variables that appear to exhibit no correlation can show tail dependence in extreme deviations. For instance, it is a stylized fact of stock returns that they commonly exhibit tail dependence. Takagi-Sugeno-Kang(TSK) Optimize TSK Fuzzy Systems for Big Data Regression Problems: Mini-Batch Gradient Descent with Regularization, DropRule and AdaBound (MBGD-RDA) Takeuchi’s Information Criteria(TIC) Takeuchi’s Information Criteria (TIC) is a linearization of maximum likelihood estimator bias which shrinks the model parameters towards the maximum entropy distribution, even when the model is mis-specified. In statistical machine learning, $L_2$ regularization (a.k.a. ridge regression) also introduces a parameterized bias term with the goal of minimizing out-of-sample entropy, but generally requires a numerical solver to find the regularization parameter. Takeya Semantic Structure Analysis(TSSA) SSRA TallyQA Most counting questions in visual question answering (VQA) datasets are simple and require no more than object detection. Here, we study algorithms for complex counting questions that involve relationships between objects, attribute identification, reasoning, and more. To do this, we created TallyQA, the world’s largest dataset for open-ended counting. We propose a new algorithm for counting that uses relation networks with region proposals. Our method lets relation networks be efficiently used with high-resolution imagery. It yields state-of-the-art results compared to baseline and recent systems on both TallyQA and the HowMany-QA benchmark. Tamed Cross Entropy(TCE) We present the Tamed Cross Entropy (TCE) loss function, a robust derivative of the standard Cross Entropy (CE) loss used in deep learning for classification tasks. However, unlike other robust losses, the TCE loss is designed to exhibit the same training properties than the CE loss in noiseless scenarios. Therefore, the TCE loss requires no modification on the training regime compared to the CE loss and, in consequence, can be applied in all applications where the CE loss is currently used. We evaluate the TCE loss using the ResNet architecture on four image datasets that we artificially contaminated with various levels of label noise. The TCE loss outperforms the CE loss in every tested scenario. TamperNN Neural networks are powering the deployment of embedded devices and Internet of Things. Applications range from personal assistants to critical ones such as self-driving cars. It has been shown recently that models obtained from neural nets can be trojaned ; an attacker can then trigger an arbitrary model behavior facing crafted inputs. This has a critical impact on the security and reliability of those deployed devices. We introduce novel algorithms to detect the tampering with deployed models, classifiers in particular. In the remote interaction setup we consider, the proposed strategy is to identify markers of the model input space that are likely to change class if the model is attacked, allowing a user to detect a possible tampering. This setup makes our proposal compatible with a wide rage of scenarios, such as embedded models, or models exposed through prediction APIs. We experiment those tampering detection algorithms on the canonical MNIST dataset, over three different types of neural nets, and facing five different attacks (trojaning, quantization, fine-tuning, compression and watermarking). We then validate over five large models (VGG16, VGG19, ResNet, MobileNet, DenseNet) with a state of the art dataset (VGGFace2), and report results demonstrating the possibility of an efficient detection of model tampering. Tangent Distance Preserving Mapping(TDPM) This paper considers the problem of nonlinear dimensionality reduction. Unlike existing methods, such as LLE, ISOMAP, which attempt to unfold the true manifold in the low dimensional space, our algorithm tries to preserve the nonlinear structure of the manifold, and shows how the manifold is folded in the high dimensional space. We call this method Tangent Distance Preserving Mapping (TDPM). TDPM uses tangent distance instead of geodesic distance, and then applies MDS to the tangent distance matrix to map the manifold into a low dimensional space in which we can get its nonlinear structure. Tangent-Normal Adversarial Regularization The ever-increasing size of modern datasets combined with the difficulty of obtaining label information has made semi-supervised learning of significant practical importance in modern machine learning applications. Compared with supervised learning, the key difficulty in semi-supervised learning is how to make full use of the unlabeled data. In order to utilize manifold information provided by unlabeled data, we propose a novel regularization called the tangent-normal adversarial regularization, which is composed by two parts. The two terms complement with each other and jointly enforce the smoothness along two different directions that are crucial for semi-supervised learning. One is applied along the tangent space of the data manifold, aiming to enforce local invariance of the classifier on the manifold, while the other is performed on the normal space orthogonal to the tangent space, intending to impose robustness on the classifier against the noise causing the observed data deviating from the underlying data manifold. Both of the two regularizers are achieved by the strategy of virtual adversarial training. Our method has achieved state-of-the-art performance on semi-supervised learning tasks on both artificial dataset and FashionMNIST dataset. TANKER Named Entity Recognition and Disambiguation (NERD) systems have recently been widely researched to deal with the significant growth of the Web. NERD systems are crucial for several Natural Language Processing (NLP) tasks such as summarization, understanding, and machine translation. However, there is no standard interface specification, i.e. these systems may vary significantly either for exporting their outputs or for processing the inputs. Thus, when a given company desires to implement more than one NERD system, the process is quite exhaustive and prone to failure. In addition, industrial solutions demand critical requirements, e.g., large-scale processing, completeness, versatility, and licenses. Commonly, these requirements impose a limitation, making good NERD models to be ignored by companies. This paper presents TANKER, a distributed architecture which aims to overcome scalability, reliability and failure tolerance limitations related to industrial needs by combining NERD systems. To this end, TANKER relies on a micro-services oriented architecture, which enables agile development and delivery of complex enterprise applications. In addition, TANKER provides a standardized API which makes possible to combine several NERD systems at once. Target Diagram tdr Target Driven Instance Detector(TDID) While state-of-the-art general object detectors are getting better and better, there are not many systems specifically designed to take advantage of the instance detection problem. For many applications, such as household robotics, a system may need to recognize a few very specific instances at a time. Speed can be critical in these applications, as can the need to recognize previously unseen instances. We introduce a Target Driven Instance Detector(TDID), which modifies existing general object detectors for the instance recognition setting. TDID not only improves performance on instances seen during training, with a fast runtime, but is also able to generalize to detect novel instances. Target Set Selection in a Social Network(TSS Problem) Given a social network with diffusion probabilities as edge weights and an integer k, which k nodes should be chosen for initial injection of information to maximize influence in the network? This problem is known as Target Set Selection in a social network (TSS Problem) and more popularly, Social Influence Maximization Problem (SIM Problem). Target-Based Temporal Difference Learning The use of target networks has been a popular and key component of recent deep Q-learning algorithms for reinforcement learning, yet little is known from the theory side. In this work, we introduce a new family of target-based temporal difference (TD) learning algorithms and provide theoretical analysis on their convergences. In contrast to the standard TD-learning, target-based TD algorithms maintain two separate learning parameters-the target variable and online variable. Particularly, we introduce three members in the family, called the averaging TD, double TD, and periodic TD, where the target variable is updated through an averaging, symmetric, or periodic fashion, mirroring those techniques used in deep Q-learning practice. We establish asymptotic convergence analyses for both averaging TD and double TD and a finite sample analysis for periodic TD. In addition, we also provide some simulation results showing potentially superior convergence of these target-based TD algorithms compared to the standard TD-learning. While this work focuses on linear function approximation and policy evaluation setting, we consider this as a meaningful step towards the theoretical understanding of deep Q-learning variants with target networks. Targeted Kernel Network(TKN) We propose Attentive Regularization (AR), a method to constrain the activation maps of kernels in Convolutional Neural Networks (CNNs) to specific regions of interest (ROIs). Each kernel learns a location of specialization along with its weights through standard backpropagation. A differentiable attention mechanism requiring no additional supervision is used to optimize the ROIs. Traditional CNNs of different types and structures can be modified with this idea into equivalent Targeted Kernel Networks (TKNs), while keeping the network size nearly identical. By restricting kernel ROIs, we reduce the number of sliding convolutional operations performed throughout the network in its forward pass, speeding up both training and inference. We evaluate our proposed architecture on both synthetic and natural tasks across multiple domains. TKNs obtain significant improvements over baselines, requiring less computation (around an order of magnitude) while achieving superior performance. Targeted Learning The statistics profession is at a unique point in history. The need for valid statistical tools is greater than ever; data sets are massive, often measuring hundreds of thousands of measurements for a single subject. The field is ready to move towards clear objective benchmarks under which tools can be evaluated. Targeted learning allows 1) the full generalization and utilization of cross-validation as an estimator selection tool so that the subjective choices made by humans are now made by the machine, and 2) targeting the fitting of the probability distribution of the data toward the target parameter representing the scientific question of interest. Targeted learning methods build machine-learning-based estimators of parameters defined as features of the probability distribution of the data, while also providing influence-curve or bootstrap-based confidence internals. The theory offers a general template for creating targeted maximum likelihood estimators for a data structure, nonparametric or semiparametric statistical model, and parameter mapping. These estimators of causal inference parameters are double robust and have a variety of other desirable statistical properties. Targeted maximum likelihood estimation built on the loss-based ‘super learning’ system such that lower-dimensional parameters could be targeted (e.g., a marginal causal effect); the remaining bias for the (low-dimensional) target feature of the probability distribution was removed. Targeted learning for effect estimation and causal inference allows for the complete integration of machine learning advances in prediction while providing statistical inference for the target parameter(s) of interest. http://…/9781441997814 http://…/papers SuperLearner,tmle Targeted Maximum Likelihood Estimation(TMLE) Maximum likelihood estimation fits a model to data, minimizing a global measure, such as mean squared error (MSE). When we are interested in one particular parameter of the data distribution and consider the remaining parameters to be nuisance parameters, we would prefer an estimate that has smaller bias and variance for the targeted parameter, at the expense of increased bias and/or variance in the estimation of nuisance parameters. Targeted maximum likelihood estimation targets the MLE estimate of the parameter of interest in a way that reduces bias. This bias reduction is sometimes accompanied by an increase in the variance of the estimate, but the procedure often reduces variance as well in finite samples. Asymptotically, TMLE is maximally efficient when the model and nuisance parameters are correctly specified. The framework of targeted maximum likelihood estimation (TMLE), introduced in van der Laan & Rubin (2006), is a principled approach for constructing asymptotically linear and efficient substitution estimators in rich infinite-dimensional models. The mechanics of TMLE hinge upon first-order approximations of the parameter of interest as a mapping on the space of probability distributions. For such approximations to hold, a second-order remainder term must tend to zero sufficiently fast. In practice, this means an initial estimator of the underlying data-generating distribution with a sufficiently large rate of convergence must be available — in many cases, this requirement is prohibitively difficult to satisfy. http://…/paper335 Targeted Minimum Loss Based Estimation(TMLE) Targeted minimum loss based estimation (TMLE) provides a template for the construction of semiparametric locally efficient double robust substitution estimators of the target parameter of the data generating distribution in a semiparametric censored data or causal inference model based on a sample of independent and identically distributed copies from this data generating distribution. A New Approach to Hierarchical Data Analysis: Targeted Maximum Likelihood Estimation of Cluster-Based Effects Under Interference Tarjan’s Strongly Connected Components Algorithm Tarjan’s Algorithm (named for its discoverer, Robert Tarjan) is a graph theory algorithm for finding the strongly connected components of a graph. Although it precedes it chronologically, it can be seen as an improved version of Kosaraju’s algorithm, and is comparable in efficiency to the path-based strong component algorithm. Task Embedded Coordinate Update(TECU) We in this paper propose a realizable framework TECU, which embeds task-specific strategies into update schemes of coordinate descent, for optimizing multivariate non-convex problems with coupled objective functions. On one hand, TECU is capable of improving algorithm efficiencies through embedding productive numerical algorithms, for optimizing univariate sub-problems with nice properties. From the other side, it also augments probabilities to receive desired results, by embedding advanced techniques in optimizations of realistic tasks. Integrating both numerical algorithms and advanced techniques together, TECU is proposed in a unified framework for solving a class of non-convex problems. Although the task embedded strategies bring inaccuracies in sub-problem optimizations, we provide a realizable criterion to control the errors, meanwhile, to ensure robust performances with rigid theoretical analyses. By respectively embedding ADMM and a residual-type CNN in our algorithm framework, the experimental results verify both efficiency and effectiveness of embedding task-oriented strategies in coordinate descent for solving practical problems. Task Transfer Net(TTNet) In this work, we present a novel meta-learning algorithm, i.e. TTNet (Task Transfer Net), that regresses model parameters for novel tasks for which no ground truth is available (zero-shot tasks). In order to adapt to novel zero-shot tasks, our meta-learner learns from the model parameters of known tasks (with ground truth) and the correlation of known tasks to zero-shot tasks. Such intuition finds its foothold in cognitive science, where a subject (human baby) can adapt to a novel-concept (depth understanding) by correlating it with old concepts (hand movement or self-motion), without receiving explicit supervision. We evaluated our model on the Taskonomy dataset, with four tasks as zero-shot: surface-normal, room layout, depth, and camera pose estimation. These tasks were chosen based on the data acquisition complexity and the complexity associated with the learning process using a deep network. Our proposed methodology out-performs state-of-the-art models (which use ground truth)on each of our zero-shot tasks, showing promise on zero-shot task transfer. We also conducted extensive experiments to study the various choices of our methodology, as well as showed how the proposed method can also be used in transfer learning. To the best of our knowledge, this is the firstsuch effort on zero-shot learning in the task space. Task2Vec We introduce a method to provide vectorial representations of visual classification tasks which can be used to reason about the nature of those tasks and their relations. Given a dataset with ground-truth labels and a loss function defined over those labels, we process images through a ‘probe network’ and compute an embedding based on estimates of the Fisher information matrix associated with the probe network parameters. This provides a fixed-dimensional embedding of the task that is independent of details such as the number of classes and does not require any understanding of the class label semantics. We demonstrate that this embedding is capable of predicting task similarities that match our intuition about semantic and taxonomic relations between different visual tasks (e.g., tasks based on classifying different types of plants are similar) We also demonstrate the practical value of this framework for the meta-task of selecting a pre-trained feature extractor for a new task. We present a simple meta-learning framework for learning a metric on embeddings that is capable of predicting which feature extractors will perform well. Selecting a feature extractor with task embedding obtains a performance close to the best available feature extractor, while costing substantially less than exhaustively training and evaluating on all available feature extractors. Task-Aware Feature Embedding Network(TAFE-Net) Learning good feature embeddings for images often requires substantial training data. As a consequence, in settings where training data is limited (e.g., few-shot and zero-shot learning), we are typically forced to use a generic feature embedding across various tasks. Ideally, we want to construct feature embeddings that are tuned for the given task. In this work, we propose Task-Aware Feature Embedding Networks (TAFE-Nets) to learn how to adapt the image representation to a new task in a meta learning fashion. Our network is composed of a meta learner and a prediction network. Based on a task input, the meta learner generates parameters for the feature layers in the prediction network so that the feature embedding can be accurately adjusted for that task. We show that TAFE-Net is highly effective in generalizing to new tasks or concepts and evaluate the TAFE-Net on a range of benchmarks in zero-shot and few-shot learning. Our model matches or exceeds the state-of-the-art on all tasks. In particular, our approach improves the prediction accuracy of unseen attribute-object pairs by 4 to 15 points on the challenging visual attribute-object composition task. Task-Embedded Control Network Much like humans, robots should have the ability to leverage knowledge from previously learned tasks in order to learn new tasks quickly in new and unfamiliar environments. Despite this, most robot learning approaches have focused on learning a single task, from scratch, with a limited notion of generalisation, and no way of leveraging the knowledge to learn other tasks more efficiently. One possible solution is meta-learning, but many of the related approaches are limited in their ability to scale to a large number of tasks and to learn further tasks without forgetting previously learned ones. With this in mind, we introduce Task-Embedded Control Networks, which employ ideas from metric learning in order to create a task embedding that can be used by a robot to learn new tasks from one or more demonstrations. In the area of visually-guided manipulation, we present simulation results in which we surpass the performance of a state-of-the-art method when using only visual information from each demonstration. Additionally, we demonstrate that our approach can also be used in conjunction with domain randomisation to train our few-shot learning ability in simulation and then deploy in the real world without any additional training. Once deployed, the robot can learn new tasks from a single real-world demonstration. Task-Free Continual Learning Methods proposed in the literature towards continual deep learning typically operate in a task-based sequential learning setup. A sequence of tasks is learned, one at a time, with all data of current task available but not of previous or future tasks. Task boundaries and identities are known at all times. This setup, however, is rarely encountered in practical applications. Therefore we investigate how to transform continual learning to an online setup. We develop a system that keeps on learning over time in a streaming fashion, with data distributions gradually changing and without the notion of separate tasks. To this end, we build on the work on Memory Aware Synapses, and show how this method can be made online by providing a protocol to decide i) when to update the importance weights, ii) which data to use to update them, and iii) how to accumulate the importance weights at each update step. Experimental results show the validity of the approach in the context of two applications: (self-)supervised learning of a face recognition model by watching soap series and learning a robot to avoid collisions. TauCharts Javascript charts with a focus on data, design and flexibility. Free open source D3.js-based library. TauCharts is the data-focused charting library. Our goal – help people to build interactive complex visualizations easily. Achieve Charting Zen With TauCharts taucharts tau-False Positive Learning(tau-FPL) Learning a classifier with control on the false-positive rate plays a critical role in many machine learning applications. Existing approaches either introduce prior knowledge dependent label cost or tune parameters based on traditional classifiers, which lack consistency in methodology because they do not strictly adhere to the false-positive rate constraint. In this paper, we propose a novel scoring-thresholding approach, tau-False Positive Learning (tau-FPL) to address this problem. We show the scoring problem which takes the false-positive rate tolerance into accounts can be efficiently solved in linear time, also an out-of-bootstrap thresholding method can transform the learned ranking function into a low false-positive classifier. Both theoretical analysis and experimental results show superior performance of the proposed tau-FPL over existing approaches. tax2vec The use of background knowledge remains largely unexploited in many text classification tasks. In this work, we explore word taxonomies as means for constructing new semantic features, which may improve the performance and robustness of the learned classifiers. We propose tax2vec, a parallel algorithm for constructing taxonomy based features, and demonstrate its use on six short-text classification problems, including gender, age and personality type prediction, drug effectiveness and side effect prediction, and news topic prediction. The experimental results indicate that the interpretable features constructed using tax2vec can notably improve the performance of classifiers; the constructed features, in combination with fast, linear classifiers tested against strong baselines, such as hierarchical attention neural networks, achieved comparable or better classification results on short documents. Further, tax2vec can also serve for extraction of corpus-specific keywords. Finally, we investigated the semantic space of potential features where we observe a similarity with the well known Zipf’s law. Taxicab Correspondence Analysis Taxicab Correspondence Analysis, Choulakian (2006) . Classical correspondence analysis (CA) is a statistical method to analyse 2-dimensional tables of positive numbers and is typically applied to contingency tables (Benzecri, J.-P. (1973). L’Analyse des Donnees. Volume II. L’Analyse des Correspondances. Paris, France: Dunod). Classical CA is based on the Euclidean distance. Taxicab CA is like classical CA but is based on the Taxicab or Manhattan distance. For some tables, Taxicab CA gives more informative results than classical CA. TaxicabCA TaxoGen Taxonomy construction is not only a fundamental task for semantic analysis of text corpora, but also an important step for applications such as information filtering, recommendation, and Web search. Existing pattern-based methods extract hypernym-hyponym term pairs and then organize these pairs into a taxonomy. However, by considering each term as an independent concept node, they overlook the topical proximity and the semantic correlations among terms. In this paper, we propose a method for constructing topic taxonomies, wherein every node represents a conceptual topic and is defined as a cluster of semantically coherent concept terms. Our method, TaxoGen, uses term embeddings and hierarchical clustering to construct a topic taxonomy in a recursive fashion. To ensure the quality of the recursive process, it consists of: (1) an adaptive spherical clustering module for allocating terms to proper levels when splitting a coarse topic into fine-grained ones; (2) a local embedding module for learning term embeddings that maintain strong discriminative power at different levels of the taxonomy. Our experiments on two real datasets demonstrate the effectiveness of TaxoGen compared with baseline methods. Taxonomy Induction for Fictional Domains(TiFi) Taxonomies are important building blocks of structured knowledge bases, and their construction from text sources and Wikipedia has received much attention. In this paper we focus on the construction of taxonomies for fictional domains, using noisy category systems from fan wikis or text extraction as input. Such fictional domains are archetypes of entity universes that are poorly covered by Wikipedia, such as also enterprise-specific knowledge bases or highly specialized verticals. Our fiction-targeted approach, called TiFi, consists of three phases: (i) category cleaning, by identifying candidate categories that truly represent classes in the domain of interest, (ii) edge cleaning, by selecting subcategory relationships that correspond to class subsumption, and (iii) top-level construction, by mapping classes onto a subset of high-level WordNet categories. A comprehensive evaluation shows that TiFi is able to construct taxonomies for a diverse range of fictional domains such as Lord of the Rings, The Simpsons or Greek Mythology with very high precision and that it outperforms state-of-the-art baselines for taxonomy induction by a substantial margin. Taxonomy of Speech Recognition Systems Automatic voice-controlled systems have changed the way humans interact with a computer. Voice or speech recognition systems allow a user to make a hands-free request to the computer, which in turn processes the request and serves the user with appropriate responses. After years of research and developments in machine learning and artificial intelligence, today voice-controlled technologies have become more efficient and are widely applied in many domains to enable and improve human-to-human and human-to-computer interactions. The state-of-the-art e-commerce applications with the help of web technologies offer interactive and user-friendly interfaces. However, there are some instances where people, especially with visual disabilities, are not able to fully experience the serviceability of such applications. A voice-controlled system embedded in a web application can enhance user experience and can provide voice as a means to control the functionality of e-commerce websites. In this paper, we propose a taxonomy of speech recognition systems (SRS) and present a voice-controlled commodity purchase e-commerce application using IBM Watson speech-to-text to demonstrate its usability. The prototype can be extended to other application scenarios such as government service kiosks and enable analytics of the converted text data for scenarios such as medical diagnosis at the clinics. Taylor-Based Optimized Recursive Extended Exponential Smoothed Neural Networks Forecasting Taylor-based Optimized Recursive Extended Exponential Smoothed Neural Networks Forecasting Method TBATS Models(TBATS) The identifier BATS is an acronym for key features of the model: Box-Cox transform, ARMA errors, Trend, and Seasonal components. The initial T in TBATS is connoting ‘trigonometric’. TCDCaps The critical challenge in tracking-by-detection framework is how to avoid drift problem during online learning, where the robust features for a variety of appearance changes are difficult to be learned and a reasonable intersection over union (IoU) threshold that defines the true/false positives is hard to set. This paper presents the TCDCaps method to address the problems above via a cascaded dense capsule architecture. To get robust features, we extend original capsules with dense-connected routing, which are referred as DCaps. Depending on the preservation of part-whole relationships in the Capsule Networks, our dense-connected capsules can capture a variety of appearance variations. In addition, to handle the issue of IoU threshold, a cascaded DCaps model (CDCaps) is proposed to improve the quality of candidates, it consists of sequential DCaps trained with increasing IoU thresholds so as to sequentially improve the quality of candidates. Extensive experiments on 3 popular benchmarks demonstrate the robustness of the proposed TCDCaps. t-Distributed Stochastic Neighbor Embedding(t-SNE,TSNE) t-distributed stochastic neighbor embedding (t-SNE) is a machine learning algorithm for dimensionality reduction developed by Laurens van der Maaten and Geoffrey Hinton. It is a nonlinear dimensionality reduction technique that is particularly well suited for embedding high-dimensional data into a space of two or three dimensions, which can then be visualized in a scatter plot. Specifically, it models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points. The t-SNE algorithms comprises two main stages. First, t-SNE constructs a probability distribution over pairs of high-dimensional objects in such a way that similar objects have a high probability of being picked, whilst dissimilar points have an infinitesimal probability of being picked. Second, t-SNE defines a similar probability distribution over the points in the low-dimensional map, and it minimizes the Kullback-Leibler divergence between the two distributions with respect to the locations of the points in the map. http://…buted-stochastic-neighbor-embedding-t-sne Visualizing Data using t-SNE tsne TE141K Text effects are combinations of visual elements such as outlines, colors and textures of text, which can dramatically improve its artistry. Although text effects are extensively utilized in the design industry, they are usually created by human experts due to their extreme complexity, which is laborious and not practical for normal users. In recent years, some efforts have been made for automatic text effects transfer, however, the lack of data limits the capability of transfer models. To address this problem, we introduce a new text effects dataset, TE141K, with 141,081 text effects/glyph pairs in total. Our dataset consists of 152 professionally designed text effects, rendered on glyphs including English letters, Chinese characters, Arabic numerals, etc. To the best of our knowledge, this is the largest dataset for text effects transfer as far. Based on this dataset, we propose a baseline approach named Text Effects Transfer GAN (TET-GAN), which supports the transfer of all 152 styles in one model and can efficiently extend to new styles. Finally, we conduct a comprehensive comparison where 14 style transfer models are benchmarked. Experimental results demonstrate the superiority of TET-GAN both qualitatively and quantitatively, and indicate that our dataset is effective and challenging. Tea Though statistical analyses are centered on research questions and hypotheses, current statistical analysis tools are not. Users must first translate their hypotheses into specific statistical tests and then perform API calls with functions and parameters. To do so accurately requires that users have statistical expertise. To lower this barrier to valid, replicable statistical analysis, we introduce Tea, a high-level declarative language and runtime system. In Tea, users express their study design, any parametric assumptions, and their hypotheses. Tea compiles these high-level specifications into a constraint satisfaction problem that determines the set of valid statistical tests, and then executes them to test the hypothesis. We evaluate Tea using a suite of statistical analyses drawn from popular tutorials. We show that Tea generally matches the choices of experts while automatically switching to non-parametric tests when parametric assumptions are not met. We simulate the effect of mistakes made by non-expert users and show that Tea automatically avoids both false negatives and false positives that could be produced by the application of incorrect statistical tests. TEA Functions(TEA) · Transformations are functions that take existing input data and apply a function to it such that it changes form. A simple example could be combining first name, middle name, and last name fields in source data and creating a full name field that is the combination of the three sub fields. · Enrichments are functions that take existing input data, combined with additional data sources, and create new information that could not be gleaned from either source independently. For example, one could take two different lists of individuals and use pattern matching to create relationships that are not apparent from either list itself. · Augmentations are functions that add data of use in combination with the input data. The result is a more complete set of information that combines data from multiple sources. For example, a set of business entities gleaned from a conference attendee list, combined with Dun and Bradstreet profiles for those entities, creates a more complete set of information for each business entity. Teacher Guided Search for Architectures by Generation and Evaluation(TG-SAGE) Strong improvements in network performance in vision tasks have resulted from the search of alternative network architectures, and prior work has shown that this search process can be automated and guided by evaluating candidate network performance following limited training (Performance Guided Architecture Search or PGAS). However, because of the large architecture search spaces and the high computational cost associated with evaluating each candidate model, further gains in computational efficiency are needed. Here we present a method termed Teacher Guided Search for Architectures by Generation and Evaluation (TG-SAGE) that produces up to an order of magnitude in search efficiency over PGAS methods. Specifically, TG-SAGE guides each step of the architecture search by evaluating the similarity of internal representations of the candidate networks with those of the (fixed) teacher network. We show that this procedure leads to significant reduction in required per-sample training and that, this advantage holds for two different search spaces of architectures, and two different search algorithms. We further show that in the space of convolutional cells for visual categorization, TG-SAGE finds a cell structure with similar performance as was previously found using other methods but at a total computational cost that is two orders of magnitude lower than Neural Architecture Search (NAS) and more than four times lower than progressive neural architecture search (PNAS). These results suggest that TG-SAGE can be used to accelerate network architecture search in cases where one has access to some or all of the internal representations of a teacher network of interest, such as the brain. Teacher-Student Curriculum Learning(TSCL) We propose Teacher-Student Curriculum Learning (TSCL), a framework for automatic curriculum learning, where the Student tries to learn a complex task and the Teacher automatically chooses subtasks from a given set for the Student to train on. We describe a family of Teacher algorithms that rely on the intuition that the Student should practice more those tasks on which it makes the fastest progress, i.e. where the slope of the learning curve is highest. In addition, the Teacher algorithms address the problem of forgetting by also choosing tasks where the Student’s performance is getting worse. We demonstrate that TSCL matches or surpasses the results of carefully hand-crafted curricula in two tasks: addition of decimal numbers with LSTM and navigation in Minecraft. Using our automatically generated curriculum enabled to solve a Minecraft maze that could not be solved at all when training directly on solving the maze, and the learning was an order of magnitude faster than uniform sampling of subtasks. Teacher-Student Feature Selection(TSFS) High-dimensional data in many machine learning applications leads to computational and analytical complexities. Feature selection provides an effective way for solving these problems by removing irrelevant and redundant features, thus reducing model complexity and improving accuracy and generalization capability of the model. In this paper, we present a novel teacher-student feature selection (TSFS) method in which a ‘teacher’ (a deep neural network or a complicated dimension reduction method) is first employed to learn the best representation of data in low dimension. Then a ‘student’ network (a simple neural network) is used to perform feature selection by minimizing the reconstruction error of low dimensional representation. Although the teacher-student scheme is not new, to the best of our knowledge, it is the first time that this scheme is employed for feature selection. The proposed TSFS can be used for both supervised and unsupervised feature selection. This method is evaluated on different datasets and is compared with state-of-the-art existing feature selection methods. The results show that TSFS performs better in terms of classification and clustering accuracies and reconstruction error. Moreover, experimental evaluations demonstrate a low degree of sensitivity to parameter selection in the proposed method. Teaching Explanations for Decisions(TED) Artificial intelligence systems are being increasingly deployed due to their potential to increase the efficiency, scale, consistency, fairness, and accuracy of decisions. However, as many of these systems are opaque in their operation, there is a growing demand for such systems to provide explanations for their decisions. Conventional approaches to this problem attempt to expose or discover the inner workings of a machine learning model with the hope that the resulting explanations will be meaningful to the consumer. In contrast, this paper suggests a new approach to this problem. It introduces a simple, practical framework, called Teaching Explanations for Decisions (TED), that provides meaningful explanations that match the mental model of the consumer. We illustrate the generality and effectiveness of this approach with two different examples, resulting in highly accurate explanations with no loss of prediction accuracy for these two examples. Teaching Risk Learning near-optimal behaviour from an expert’s demonstrations typically relies on the assumption that the learner knows the features that the true reward function depends on. In this paper, we study the problem of learning from demonstrations in the setting where this is not the case, i.e., where there is a mismatch between the worldviews of the learner and the expert. We introduce a natural quantity, the teaching risk, which measures the potential suboptimality of policies that look optimal to the learner in this setting. We show that bounds on the teaching risk guarantee that the learner is able to find a near-optimal policy using standard algorithms based on inverse reinforcement learning. Based on these findings, we suggest a teaching scheme in which the expert can decrease the teaching risk by updating the learner’s worldview, and thus ultimately enable her to find a near-optimal policy. Teaching-Learning-Based Optimization(TLBO) A new efficient optimization method, called ‘Teaching-Learning-Based Optimization (TLBO)’, is proposed in this paper for the optimization of mechanical design problems. This method works on the effect of influence of a teacher on learners. Like other nature-inspired algorithms, TLBO is also a population-based method and uses a population of solutions to proceed to the global solution. The population is considered as a group of learners or a class of learners. The process of TLBO is divided into two parts: the first part consists of the ‘Teacher Phase’ and the second part consists of the ‘Learner Phase’. ‘Teacher Phase’ means learning from the teacher and ‘Learner Phase’ means learning by the interaction between learners. The basic philosophy of the TLBO method is explained in detail. To check the effectiveness of the method it is tested on five different constrained benchmark test functions with different characteristics, four different benchmark mechanical design problems and six mechanical design optimization problems which have real world applications. The effectiveness of the TLBO method is compared with the other population-based optimization algorithms based on the best solution, average solution, convergence rate and computational effort. Results show that TLBO is more effective and efficient than the other optimization methods for the mechanical design optimization problems considered. This novel optimization method can be easily extended to other engineering design optimization problems. Teaching Learning Based Optimization Algorithm TEA-DNN Embedded deep learning platforms have witnessed two simultaneous improvements. First, the accuracy of convolutional neural networks (CNNs) has been significantly improved through the use of automated neural-architecture search (NAS) algorithms to determine CNN structure. Second, there has been increasing interest in developing application-specific platforms for CNNs that provide improved inference performance and energy consumption as compared to GPUs. Embedded deep learning platforms differ in the amount of compute resources and memory-access bandwidth, which would affect performance and energy consumption of CNNs. It is therefore critical to consider the available hardware resources in the network architecture search. To this end, we introduce TEA-DNN, a NAS algorithm targeting multi-objective optimization of execution time, energy consumption, and classification accuracy of CNN workloads on embedded architectures. TEA-DNN leverages energy and execution time measurements on embedded hardware when exploring the Pareto-optimal curves across accuracy, execution time, and energy consumption and does not require additional effort to model the underlying hardware. We apply TEA-DNN for image classification on actual embedded platforms (NVIDIA Jetson TX2 and Intel Movidius Neural Compute Stick). We highlight the Pareto-optimal operating points that emphasize the necessity to explicitly consider hardware characteristics in the search process. To the best of our knowledge, this is the most comprehensive study of Pareto-optimal models across a range of hardware platforms using actual measurements on hardware to obtain objective values. TechKG Knowledge graph is a kind of valuable knowledge base which would benefit lots of AI-related applications. Up to now, lots of large-scale knowledge graphs have been built. However, most of them are non-Chinese and designed for general purpose. In this work, we introduce TechKG, a large scale Chinese knowledge graph that is technology-oriented. It is built automatically from massive technical papers that are published in Chinese academic journals of different research domains. Some carefully designed heuristic rules are used to extract high quality entities and relations. Totally, it comprises of over 260 million triplets that are built upon more than 52 million entities which come from 38 research domains. Our preliminary ex-periments indicate that TechKG has high adaptability and can be used as a dataset for many diverse AI-related applications. We released TechKG at: http://www.techkg.cn. Technical Debt Technical debt is a metaphor used to convey the idea that doing things in a ‘quick and dirty’ way when designing and constructing a software leads to a situation where one incurs more and more deferred future expenses. Similarly to financial debt, technical debt requires payment of interest in the form of the additional development effort that could have been avoided if the quick and dirty design choices have not been made. Technical debt applies to all the aspects of software development, spanning from initial requirements analysis to deployment, and software evolution. Technical debt is becoming very popular from scientific and industrial perspectives. In particular, there is an increase in the number of related papers over the years. There is also an increase in the number of related tools and of their adoption in the industry, especially since technical debt is very pricey and therefore needs to be managed. However, techniques to estimate technical debt are inadequate, insufficient since they mostly focus on requirements, code, and test, disregarding key artifacts such as the software architecture and the technologies used by the software at hand. TeKnowbase In this paper, we describe the construction of TeKnowbase, a knowledge-base of technical concepts in computer science. Our main information sources are technical websites such as Webopedia and Techtarget as well as Wikipedia and online textbooks. We divide the knowledge-base construction problem into two parts — the acquisition of entities and the extraction of relationships among these entities. Our knowledge-base consists of approximately 100,000 triples. We conducted an evaluation on a sample of triples and report an accuracy of a little over 90\%. We additionally conducted classification experiments on StackOverflow data with features from TeKnowbase and achieved improved classification accuracy. Tell Me Something New(TMSN) We present a novel approach for parallel computation in the context of machine learning that we call ‘Tell Me Something New’ (TMSN). This approach involves a set of independent workers that use broadcast to update each other when they observe ‘something new’. TMSN does not require synchronization or a head node and is highly resilient against failing machines or laggards. We demonstrate the utility of TMSN by applying it to learning boosted trees. We show that our implementation is 10 times faster than XGBoost and LightGBM on the splice-site prediction problem. Temperature Scaling(TS) Confidence calibration – the problem of predicting probability estimates representative of the true correctness likelihood – is important for classification models in many applications. We discover that modern neural networks, unlike those from a decade ago, are poorly calibrated. Through extensive experiments, we observe that depth, width, weight decay, and Batch Normalization are important factors influencing calibration. We evaluate the performance of various post-processing calibration methods on state-ofthe- art architectures with image and document classification datasets. Our analysis and experiments not only offer insights into neural network learning, but also provide a simple and straightforward recipe for practical settings: on most datasets, temperature scaling – a singleparameter variant of Platt Scaling – is surprisingly effective at calibrating predictions. A New Loss Function for Temperature Scaling to have Better Calibrated Deep Networks Template Model Builder glmmTMB Temporal Affine Network(TAN) With superiorities on low cost, portability, and free of radiation, echocardiogram is a widely used imaging modality for left ventricle (LV) function quantification. However, automatic LV segmentation and motion tracking is still a challenging task. In addition to fuzzy border definition, low contrast, and abounding artifacts on typical ultrasound images, the shape and size of the LV change significantly in a cardiac cycle. In this work, we propose a temporal affine network (TAN) to perform image analysis in a warped image space, where the shape and size variations due to the cardiac motion as well as other artifacts are largely compensated. Furthermore, we perform three frequent echocardiogram interpretation tasks simultaneously: standard cardiac plane recognition, LV landmark detection, and LV segmentation. Instead of using three networks with one dedicating to each task, we use a multi-task network to perform three tasks simultaneously. Since three tasks share the same encoder, the compact network improves the segmentation accuracy with more supervision. The network is further finetuned with optical flow adjusted annotations to enhance motion coherence in the segmentation result. Experiments on 1,714 2D echocardiographic sequences demonstrate that the proposed method achieves state-of-the-art segmentation accuracy with real-time efficiency. Temporal Aggregation We call temporal aggregation the situation in which a variable that evolves through time can not be observed at all dates. This phenomenon arises frequently in economics, where it is very expensive to collect data on certain variables, and there is no reason to believe that economic time series are collected at the frequency required to fully capture the movements of the economy. For example, we only have quarterly observations on GNP, but it is reasonable to believe that the behavior of GNP within a quarter carries relevant information about the structure of the economy. Temporal Aggregation Network(TAN) We present Temporal Aggregation Network (TAN) which decomposes 3D convolutions into spatial and temporal aggregation blocks. By stacking spatial and temporal convolutions repeatedly, TAN forms a deep hierarchical representation for capturing spatio-temporal information in videos. Since we do not apply 3D convolutions in each layer but only apply temporal aggregation blocks once after each spatial downsampling layer in the network, we significantly reduce the model complexity. The use of dilated convolutions at different resolutions of the network helps in aggregating multi-scale spatio-temporal information efficiently. Experiments show that our model is well suited for dense multi-label action recognition, which is a challenging sub-topic of action recognition that requires predicting multiple action labels in each frame. We outperform state-of-the-art methods by 5% and 3% on the Charades and Multi-THUMOS dataset respectively. Temporal Automatic Relation Discovery in Sequences(TARDIS) Recent empirical results on long-term dependency tasks have shown that neural networks augmented with an external memory can learn the long-term dependency tasks more easily and achieve better generalization than vanilla recurrent neural networks (RNN). We suggest that memory augmented neural networks can reduce the effects of vanishing gradients by creating shortcut (or wormhole) connections. Based on this observation, we propose a novel memory augmented neural network model called TARDIS (Temporal Automatic Relation Discovery in Sequences). The controller of TARDIS can store a selective set of embeddings of its own previous hidden states into an external memory and revisit them as and when needed. For TARDIS, memory acts as a storage for wormhole connections to the past to propagate the gradients more effectively and it helps to learn the temporal dependencies. The memory structure of TARDIS has similarities to both Neural Turing Machines (NTM) and Dynamic Neural Turing Machines (D-NTM), but both read and write operations of TARDIS are simpler and more efficient. We use discrete addressing for read/write operations which helps to substantially to reduce the vanishing gradient problem with very long sequences. Read and write operations in TARDIS are tied with a heuristic once the memory becomes full, and this makes the learning problem simpler when compared to NTM or D-NTM type of architectures. We provide a detailed analysis on the gradient propagation in general for MANNs. We evaluate our models on different long-term dependency tasks and report competitive results in all of them. Temporal Bibliographic Network We present two ways (instantaneous and cumulative) to transform bibliographic networks, using the works’ publication year, into corresponding temporal networks based on temporal quantities. We also show how to use the addition of temporal quantities to define interesting temporal properties of nodes, links and their groups thus providing an insight into evolution of bibliographic networks. Using the multiplication of temporal networks we obtain different derived temporal networks providing us with new views on studied networks. The proposed approach is illustrated with examples from the collection of bibliographic networks on peer review. Temporal Causal Model Temporal causal modeling attempts to discover key causal relationships in time series data. In temporal causal modeling, you specify a set of target series anda set of candidate inputs to those targets. The procedure then builds an autoregressivetime series model for each target and includes only those inputs that have a causalrelationship with the target. This approach differs from traditional time series modelingwhere you must explicitly specify the predictors for a target series. Since temporalcausal modeling typically involves building models for multiple related time series, theresult is referred to as a model system. In the context of temporal causal modeling, the term causal refers to Granger causality. A time series X is said to ‘Grangercause’ another time series Y if regressing for Y in terms of past values of both X and Yresults in a better model for Y than regressing only on past values of Y. Temporal Convolutional Network(TCN) The dominant paradigm for video-based action segmentation is composed of two steps: first, for each frame, compute low-level features using Dense Trajectories or a Convolutional Neural Network that encode spatiotemporal information locally, and second, input these features into a classifier that captures high-level temporal relationships, such as a Recurrent Neural Network (RNN). While often effective, this decoupling requires specifying two separate models, each with their own complexities, and prevents capturing more nuanced long-range spatiotemporal relationships. We propose a unified approach, as demonstrated by our Temporal Convolutional Network (TCN), that hierarchically captures relationships at low-, intermediate-, and high-level time-scales. Our model achieves superior or competitive performance using video or sensor data on three public action segmentation datasets and can be trained in a fraction of the time it takes to train an RNN. Temporal Convolutional Nets (TCNs) Take Over from RNNs for NLP Predictions Temporal Database A temporal database is a database with built-in support for handling data involving time, being related to Slowly changing dimension concept, for example a temporal data model and a temporal version of Structured Query Language (SQL). More specifically the temporal aspects usually include valid time and transaction time. These attributes can be combined to form bitemporal data. Valid time is the time period during which a fact is true with respect to the real world. Transaction time is the time period during which a fact stored in the database is considered to be true. Bitemporal data combines both Valid and Transaction Time. It is possible to have timelines other than Valid Time and Transaction Time, such as Decision Time, in the database. In that case the database is called a multitemporal database as opposed to a bitemporal database. However, this approach introduces additional complexities such as dealing with the validity of (foreign) keys. Temporal databases are in contrast to current databases, which store only facts which are believed to be true at the current time. Temporal Deformable Convolutional Encoder-Decoder Network(TDConvED) It is well believed that video captioning is a fundamental but challenging task in both computer vision and artificial intelligence fields. The prevalent approach is to map an input video to a variable-length output sentence in a sequence to sequence manner via Recurrent Neural Network (RNN). Nevertheless, the training of RNN still suffers to some degree from vanishing/exploding gradient problem, making the optimization difficult. Moreover, the inherently recurrent dependency in RNN prevents parallelization within a sequence during training and therefore limits the computations. In this paper, we present a novel design — Temporal Deformable Convolutional Encoder-Decoder Networks (dubbed as TDConvED) that fully employ convolutions in both encoder and decoder networks for video captioning. Technically, we exploit convolutional block structures that compute intermediate states of a fixed number of inputs and stack several blocks to capture long-term relationships. The structure in encoder is further equipped with temporal deformable convolution to enable free-form deformation of temporal sampling. Our model also capitalizes on temporal attention mechanism for sentence generation. Extensive experiments are conducted on both MSVD and MSR-VTT video captioning datasets, and superior results are reported when comparing to conventional RNN-based encoder-decoder techniques. More remarkably, TDConvED increases CIDEr-D performance from 58.8% to 67.2% on MSVD. Temporal Dependency Network(TDN) While conventional methods for sequential learning focus on interaction between consecutive inputs, we suggest a new method which captures composite semantic flows with variable-length dependencies. In addition, the semantic structures within given sequential data can be interpreted by visualizing temporal dependencies learned from the method. The proposed method, called Temporal Dependency Network (TDN), represents a video as a temporal graph whose node represents a frame of the video and whose edge represents the temporal dependency between two frames of a variable distance. The temporal dependency structure of semantic is discovered by learning parameterized kernels of graph convolutional methods. We evaluate the proposed method on the large-scale video dataset, Youtube-8M. By visualizing the temporal dependency structures as experimental results, we show that the suggested method can find the temporal dependency structures of video semantic. Temporal Difference Learning(TD) Temporal difference (TD) learning is an important approach in reinforcement learning, as it combines ideas from dynamic programming and Monte Carlo methods in a way that allows for online and incremental model-free learning. A key idea of TD learning is that it is learning predictive knowledge about the environment in the form of value functions, from which it can derive its behavior to address long-term sequential decision making problems. The agent’s horizon of interest, that is, how immediate or long-term a TD learning agent predicts into the future, is adjusted through a discount rate parameter. In this paper, we introduce an alternative view on the discount rate, with insight from digital signal processing, to include complex-valued discounting. Our results show that setting the discount rate to appropriately chosen complex numbers allows for online and incremental estimation of the Discrete Fourier Transform (DFT) of a signal of interest with TD learning. We thereby extend the types of knowledge representable by value functions, which we show are particularly useful for identifying periodic effects in the reward sequence. Temporal Difference Variational Auto-Encoder One motivation for learning generative models of environments is to use them as simulators for model-based reinforcement learning. Yet, it is intuitively clear that when time horizons are long, rolling out single step transitions is inefficient and often prohibitive. In this paper, we propose a generative model that learns state representations containing explicit beliefs about states several time steps in the future and that can be rolled out directly in these states without executing single step transitions. The model is trained on pairs of temporally separated time points, using an analogue of temporal difference learning used in reinforcement learning, taking the belief about possible futures at one time point as a bootstrap for training the belief at an earlier time. While we focus purely on the study of the model rather than its use in reinforcement learning, the model architecture we design respects agents’ constraints as it builds the representation online. Temporal Event Graph(TEG) Temporal networks are increasingly being used to model the interactions of complex systems. Most studies require the temporal aggregation of edges (or events) into discrete time steps to perform analysis. In this article we describe a static, lossless, and unique representation of a temporal network, the temporal event graph (TEG). The TEG describes the temporal network in terms of both the inter-event time and two-event temporal motif distributions. By considering these distributions in unison we provide a new method to characterise the behaviour of individuals and collectives in temporal networks as well as providing a natural decomposition of the network. We illustrate the utility of the TEG by providing examples on both synthetic and real temporal networks. Temporal Exponential Random Graph Model(TERGM) Given the growing number of available tools for modeling dynamic networks, the choice of a suitable model becomes central. It is often difficult to compare the different models with respect to their applicability and interpretation. The goal of this survey is to provide an overview of popular dynamic network models. The survey is focused on introducing binary network models with their corresponding assumptions, advantages, and shortfalls. The models are divided according to generating processes, operating in discrete and continuous time, respectively. First, we introduce the Temporal Exponential Random Graph Model (TERGM) and its extension, the Separable TERGM (STERGM), both being time-discrete models. These models are then contrasted with continuous process models, focusing on the Relational Event Model (REM). We additionally show how the REM can handle time-clustered observations, i.e., continuous time data observed at discrete time points. Besides the discussion of theoretical properties and fitting procedures, we specifically focus on the application of the models using a network that represents international arms transfers. The data allow to demonstrate the applicability and interpretation of the network models. btergm Temporal Hierarchical Clustering We study hierarchical clusterings of metric spaces that change over time. This is a natural geometric primitive for the analysis of dynamic data sets. Specifically, we introduce and study the problem of finding a temporally coherent sequence of hierarchical clusterings from a sequence of unlabeled point sets. We encode the clustering objective by embedding each point set into an ultrametric space, which naturally induces a hierarchical clustering of the set of points. We enforce temporal coherence among the embeddings by finding correspondences between successive pairs of ultrametric spaces which exhibit small distortion in the Gromov-Hausdorff sense. We present both upper and lower bounds on the approximability of the resulting optimization problems. Temporal Knowledge Distillation(TKD) Deep neural networks based methods have been proved to achieve outstanding performance on object detection and classification tasks. Despite significant performance improvement, due to the deep structures, they still require prohibitive runtime to process images and maintain the highest possible performance for real-time applications. Observing the phenomenon that human vision system (HVS) relies heavily on the temporal dependencies among frames from the visual input to conduct recognition efficiently, we propose a novel framework dubbed as TKD: temporal knowledge distillation. This framework distills the temporal knowledge from a heavy neural networks based model over selected video frames (the perception of the moments) to a light-weight model. To enable the distillation, we put forward two novel procedures: 1) an Long-short Term Memory (LSTM) based key frame selection method; and 2) a novel teacher-bounded loss design. To validate, we conduct comprehensive empirical evaluations using different object detection methods over multiple datasets including Youtube-Objects and Hollywood scene dataset. Our results show consistent improvement in accuracy-speed trad-offs for object detection over the frames of the dynamic scene, compare to other modern object recognition methods. Temporal Logic In logic, temporal logic is any system of rules and symbolism for representing, and reasoning about, propositions qualified in terms of time (for example, ‘I am always hungry’, ‘I will eventually be hungry’, or ‘I will be hungry until I eat something’). It is sometimes also used to refer to tense logic, a modal logic-based system of temporal logic introduced by Arthur Prior in the late 1950s, with important contributions by Hans Kamp. It has been further developed by computer scientists, notably Amir Pnueli, and logicians. Temporal logic has found an important application in formal verification, where it is used to state requirements of hardware or software systems. For instance, one may wish to say that whenever a request is made, access to a resource is eventually granted, but it is never granted to two requestors simultaneously. Such a statement can conveniently be expressed in a temporal logic. Temporal Multinomial Mixture(TMM) Evolutionary clustering aims at capturing the temporal evolution of clusters. This issue is particularly important in the context of social media data that are naturally temporally driven. In this paper, we propose a new probabilistic model-based evolutionary clustering technique. The Temporal Multinomial Mixture (TMM) is an extension of classical mixture model that optimizes feature co-occurrences in the trade-off with temporal smoothness. Our model is evaluated for two recent case studies on opinion aggregation over time. We compare four different probabilistic clustering models and we show the superiority of our proposal in the task of instance-oriented clustering. Temporal Network Autocorrelation Models(TNAM) tnam,xergm Temporal Network Centrality(TNC) TNC Temporal Overdrive Recurrent Neural Network In this work we present a novel recurrent neural network architecture designed to model systems characterized by multiple characteristic timescales in their dynamics. The proposed network is composed by several recurrent groups of neurons that are trained to separately adapt to each timescale, in order to improve the system identification process. We test our framework on time series prediction tasks and we show some promising, preliminary results achieved on synthetic data. To evaluate the capabilities of our network, we compare the performance with several state-of-the-art recurrent architectures. Temporal Pattern Mining Temporal Pattern Mining (TPM) is the problem of mining predictive complex temporal patterns from multivariate time series in a supervised setting. Temporal Point Process A temporal point process is a random process whose realizations consist of the times {tau_j}_(j in J) of isolated events. Temporal Recurrent Network(TRN) Most work on temporal action detection is formulated in an offline manner, in which the start and end times of actions are determined after the entire video is fully observed. However, real-time applications including surveillance and driver assistance systems require identifying actions as soon as each video frame arrives, based only on current and historical observations. In this paper, we propose a novel framework, Temporal Recurrent Networks (TRNs), to model greater temporal context of a video frame by simultaneously performing online action detection and anticipation of the immediate future. At each moment in time, our approach makes use of both accumulated historical evidence and predicted future information to better recognize the action that is currently occurring, and integrates both of these into a unified end-to-end architecture. We evaluate our approach on two popular online action detection datasets, HDD and TVSeries, as well as another widely used dataset, THUMOS’14. The results show that TRN significantly outperforms the state-of-the-art. Temporal Regularized Matrix Factorization(TRMF) Matrix factorization approaches have been applied to a variety of applications, from recommendation systems to multi-label learning. Standard low rank matrix factorization methods fail in cases when the data can be modeled as a time series, since they do not take into account the dependencies among factors, while EM algorithms designed for time series data are inapplicable to large multiple time series data. To overcome this, matrix factorization approaches are augmented with dynamic linear model based regularization frameworks. A major drawback in such approaches is that the exact dependencies between the latent factors are assumed to be known. In this paper, we introduce a Temporal Regularized Matrix Factorization (TRMF) method, an efficient alternating minimization scheme that not only learns the latent time series factors, but also the dependencies among the latent factors. TRMF is highly general, and subsumes several existing matrix factorization approaches for time series data. We make interesting connections to graph based matrix factorization methods in the context of learning the dependencies. Experiments on both real and synthetic data show that TRMF is highly scalable, and outperforms several existing approaches used for common large scale time series tasks. Temporal Term Histogram(TTH) Temporal text, i.e., time-stamped text data are found abundantly in a variety of data sources like newspapers, blogs and social media posts. While today’s data management systems provide facilities for searching full-text data, they do not provide any simple primitives for performing analytical operations with text. This paper proposes the temporal term histograms (TTH) as an intermediate primitive that can be used for analytical tasks. We propose an algebra, with operators and equivalence rules for TTH and present a reference implementation on a relational database system. Temporal Walk Networks evolve continuously over time with the addition, deletion, and changing of links and nodes. Such temporal networks (or edge streams) consist of a sequence of timestamped edges and are seemingly ubiquitous. Despite the importance of accurately modeling the temporal information, most embedding methods ignore it entirely or approximate the temporal network using a sequence of static snapshot graphs. In this work, we introduce the notion of \emph{temporal walks} for learning dynamic embeddings from temporal networks. Temporal walks capture the temporally valid interactions (\eg, flow of information, spread of disease) in the dynamic network in a lossless fashion. Based on the notion of temporal walks, we describe a general class of embeddings called continuous-time dynamic network embeddings (CTDNEs) that completely avoid the issues and problems that arise when approximating the temporal network as a sequence of static snapshot graphs. Unlike previous work, CTDNEs learn dynamic node embeddings directly from the temporal network at the finest temporal granularity and thus use only temporally valid information. As such CTDNEs naturally support online learning of the node embeddings in a streaming real-time fashion. The experiments demonstrate the effectiveness of this class of embedding methods for prediction in temporal networks. Temporal-Difference Learning(TD Learning) Temporal Difference (TD) learning refers to a class of model-free reinforcement learning methods which learn by bootstrapping from the current estimate of the value function. These methods sample from the environment, like Monte Carlo methods, and perform updates based on current estimates, like dynamic programming methods. While Monte Carlo methods only adjust their estimates once the final outcome is known, TD methods adjust predictions to match later, more accurate, predictions about the future before the final outcome is known. This is a form of bootstrapping, as illustrated with the following example: ‘Suppose you wish to predict the weather for Saturday, and you have some model that predicts Saturday’s weather, given the weather of each day in the week. In the standard case, you would wait until Saturday and then adjust all your models. However, when it is, for example, Friday, you should have a pretty good idea of what the weather would be on Saturday – and thus be able to change, say, Saturday’s model before Saturday arrives’. Temporal difference methods are related to the temporal difference model of animal learning. Temporal Difference Learning in Python Temporal Difference Learning with Neural Networks – Study of the Leakage Propagation Problem Temporal-Spatial Mapping Deep learning models have enjoyed great success for image related computer vision tasks like image classification and object detection. For video related tasks like human action recognition, however, the advancements are not as significant yet. The main challenge is the lack of effective and efficient models in modeling the rich temporal spatial information in a video. We introduce a simple yet effective operation, termed Temporal-Spatial Mapping (TSM), for capturing the temporal evolution of the frames by jointly analyzing all the frames of a video. We propose a video level 2D feature representation by transforming the convolutional features of all frames to a 2D feature map, referred to as VideoMap. With each row being the vectorized feature representation of a frame, the temporal-spatial features are compactly represented, while the temporal dynamic evolution is also well embedded. Based on the VideoMap representation, we further propose a temporal attention model within a shallow convolutional neural network to efficiently exploit the temporal-spatial dynamics. The experiment results show that the proposed scheme achieves the state-of-the-art performance, with 4.2% accuracy gain over Temporal Segment Network (TSN), a competing baseline method, on the challenging human action benchmark dataset HMDB51. TensiStrength Computer systems need to be able to react to stress in order to perform optimally on some tasks. This article describes TensiStrength, a system to detect the strength of stress and relaxation expressed in social media text messages. TensiStrength uses a lexical approach and a set of rules to detect direct and indirect expressions of stress or relaxation, particularly in the context of transportation. It is slightly more effective than a comparable sentiment analysis program, although their similar performances occur despite differences on almost half of the tweets gathered. The effectiveness of TensiStrength depends on the nature of the tweets classified, with tweets that are rich in stress-related terms being particularly problematic. Although generic machine learning methods can give better performance than TensiStrength overall, they exploit topic-related terms in a way that may be undesirable in practical applications and that may not work as well in more focused contexts. In conclusion, TensiStrength and generic machine learning approaches work well enough to be practical choices for intelligent applications that need to take advantage of stress information, and the decision about which to use depends on the nature of the texts analysed and the purpose of the task. Tensor Comprehensions Deep learning models with convolutional and recurrent networks are now ubiquitous and analyze massive amounts of audio, image, video, text and graph data, with applications in automatic translation, speech-to-text, scene understanding, ranking user preferences, ad placement, etc. Competing frameworks for building these networks such as TensorFlow, Chainer, CNTK, Torch/PyTorch, Caffe1/2, MXNet and Theano, explore different tradeoffs between usability and expressiveness, research or production orientation and supported hardware. They operate on a DAG of computational operators, wrapping high-performance libraries such as CUDNN (for NVIDIA GPUs) or NNPACK (for various CPUs), and automate memory allocation, synchronization, distribution. Custom operators are needed where the computation does not fit existing high-performance library calls, usually at a high engineering cost. This is frequently required when new operators are invented by researchers: such operators suffer a severe performance penalty, which limits the pace of innovation. Furthermore, even if there is an existing runtime call these frameworks can use, it often doesn’t offer optimal performance for a user’s particular network architecture and dataset, missing optimizations between operators as well as optimizations that can be done knowing the size and shape of data. Our contributions include (1) a language close to the mathematics of deep learning called Tensor Comprehensions offering both imperative and declarative styles, (2) a polyhedral Just-In-Time compiler to convert a mathematical description of a deep learning DAG into a CUDA kernel with delegated memory management and synchronization, also providing optimizations such as operator fusion and specialization for specific sizes, (3) a compilation cache populated by an autotuner. [Abstract cutoff] Tensor Core The NVIDIA Volta GPU microarchitecture introduces a specialized unit, called ‘Tensor Core’ that performs one matrix-multiply-and-accumulate on 4×4 matrices per clock cycle. The NVIDIA Tesla V100 accelerator, featuring the Volta microarchitecture, provides 640 Tensor Cores with a theoretical peak performance of 125 Tflops/s in mixed precision. In this paper, we investigate current approaches to program NVIDIA Tensor Cores, their performances and the precision loss due to computation in mixed precision. Currently, NVIDIA provides three different ways of programming matrix-multiply-and-accumulate on Tensor Cores: the CUDA Warp Matrix Multiply Accumulate (WMMA) API, CUTLASS, a templated library based on WMMA, and cuBLAS GEMM. After experimenting with different approaches, we found that NVIDIA Tensor Cores can deliver up to 83 Tflops/s in mixed precision on a Tesla V100 GPU, seven and three times the performance in single and half precision respectively. A WMMA implementation of batched GEMM reaches a performance of 4 Tflops/s. While precision loss due to matrix multiplication with half precision input might be critical in many HPC applications, it can be considerably reduced at the cost of increased computation. Our results indicate that HPC applications using matrix multiplications can strongly benefit from using of NVIDIA Tensor Cores. Tensor Ensemble Learning In big data applications, classical ensemble learning is typically infeasible on the raw input data and dimensionality reduction techniques are necessary. To this end, novel framework that generalises classic flat-view ensemble learning to multidimensional tensor-valued data is introduced. This is achieved by virtue of tensor decompositions, whereby the proposed method, referred to as tensor ensemble learning (TEL), decomposes every input data sample into multiple factors which allows for a flexibility in the choice of multiple learning algorithms in order to improve test performance. The TEL framework is shown to naturally compress multidimensional data in order to take advantage of the inherent multi-way data structure and exploit the benefit of ensemble learning. The proposed framework is verified through the application of Higher Order Singular Value Decomposition (HOSVD) to the ETH-80 dataset and is shown to outperform the classical ensemble learning approach of bootstrap aggregating. Tensor Graph Convolutional Neural Network(TGCNN) In this paper, we propose a novel tensor graph convolutional neural network (TGCNN) to conduct convolution on factorizable graphs, for which here two types of problems are focused, one is sequential dynamic graphs and the other is cross-attribute graphs. Especially, we propose a graph preserving layer to memorize salient nodes of those factorized subgraphs, i.e. cross graph convolution and graph pooling. For cross graph convolution, a parameterized Kronecker sum operation is proposed to generate a conjunctive adjacency matrix characterizing the relationship between every pair of nodes across two subgraphs. Taking this operation, then general graph convolution may be efficiently performed followed by the composition of small matrices, which thus reduces high memory and computational burden. Encapsuling sequence graphs into a recursive learning, the dynamics of graphs can be efficiently encoded as well as the spatial layout of graphs. To validate the proposed TGCNN, experiments are conducted on skeleton action datasets as well as matrix completion dataset. The experiment results demonstrate that our method can achieve more competitive performance with the state-of-the-art methods. Tensor Graphical Lasso(TeraLasso) The Bigraphical Lasso estimator was proposed to parsimoniously model the precision matrices of matrix-normal data based on the Cartesian product of graphs. By enforcing extreme sparsity (the number of parameters) and explicit structures on the precision matrix, this model has excellent potential for improving scalability of the computation and interpretability of complex data analysis. As a result, this model significantly reduces the size of the sample in order to learn the precision matrices, and hence the conditional probability models along different coordinates such as space, time and replicates. In this work, we extend the Bigraphical Lasso (BiGLasso) estimator to the TEnsor gRAphical Lasso (TeraLasso) estimator and propose an analogous method for modeling the precision matrix of tensor-valued data. We establish consistency for both the BiGLasso and TeraLasso estimators and obtain the rates of convergence in the operator and Frobenius norm for estimating the precision matrix. We design a scalable gradient descent method for solving the objective function and analyze the computational convergence rate, showing that the composite gradient descent algorithm is guaranteed to converge at a geometric rate to the global minimizer. Finally, we provide simulation evidence and analysis of a meteorological dataset, showing that we can recover graphical structures and estimate the precision matrices, as predicted by theory. Tensor Methods Tensors are generalizations of matrices that let you look beyond pairwise relationships to higher-dimensional models (a matrix is a second-order tensor). For instance, one can examine patterns between any three (or more) dimensions in data sets. In a text mining application, this leads to models that incorporate the co-occurrence of three or more words, and in social networks, you can use tensors to encode arbitrary degrees of influence (e.g., ‘friend of friend of friend’ of a user). Tensors, as generalizations of vectors and matrices, have become increasingly popular in different areas of machine learning and data mining, where they are employed to approach a diverse number of difficult learning and analysis tasks. Prominent examples include learning on multi-relational data and large-scale knowledge bases, recommendation systems, computer vision, mining boolean data, neuroimaging or the analysis of time-varying networks. The success of tensors methods is strongly related to their ability to efficiently model, analyse and predict data with multiple modalities. To address specific challenges and problems, a variety of methods has been developed in different fields of application. http://…ce-tensor-libraries-for-data-science.html http://…/39352 Tensor Monte Carlo Multi-sample objectives improve over single-sample estimates by giving tighter variational bounds and more accurate estimates of posterior uncertainty. However, these multi-sample techniques scale poorly, in the sense that the number of samples required to maintain the same quality of posterior approximation scales exponentially in the number of latent dimensions. One approach to addressing these issues is sequential Monte Carlo (SMC). However for many problems SMC is prohibitively slow because the resampling steps imposes an inherently sequential structure on the computation, which is difficult to effectively parallelise on GPU hardware. We developed tensor Monte-Carlo to address these issues. In particular, whereas the usual multi-sample objective draws $K$ samples from a joint distribution over all latent variables, we draw $K$ samples for each of the $n$ individual latent variables, and form our bound by averaging over all $K^n$ combinations of samples from each individual latent. While this sum over exponentially many terms might seem to be intractable, in many cases it can be efficiently computed by exploiting conditional independence structure. In particular, we generalise and simplify classical algorithms such as message passing by noting that these sums can be computed can be written in an extremely simple, general form: a series of tensor inner-products which can be depicted graphically as reductions of a factor graph. As such, we can straightforwardly combine summation over discrete variables with importance sampling over importance sampling over continuous variables. Tensor Network(TN) The harnessing of modern computational abilities for many-body wave-function representations is naturally placed as a prominent avenue in contemporary condensed matter physics. Specifically, highly expressive computational schemes that are able to efficiently represent the entanglement properties of many-particle systems are of interest. In the seemingly unrelated field of machine learning, deep network architectures have exhibited an unprecedented ability to tractably encompass the dependencies characterizing hard learning tasks such as image classification. However, key questions regarding deep learning architecture design still have no adequate theoretical answers. In this paper, we establish a Tensor Network (TN) based common language between the two disciplines, which allows us to offer bidirectional contributions. By showing that many-body wave-functions are structurally equivalent to mappings of ConvACs and RACs, we construct their TN equivalents, and suggest quantum entanglement measures as natural quantifiers of dependencies in such networks. Accordingly, we propose a novel entanglement based deep learning design scheme. In the other direction, we identify that an inherent re-use of information in state-of-the-art deep learning architectures is a key trait that distinguishes them from TNs. We suggest a new TN manifestation of information re-use, which enables TN constructs of powerful architectures such as deep recurrent networks and overlapping convolutional networks. This allows us to theoretically demonstrate that the entanglement scaling supported by these architectures can surpass that of commonly used TNs in 1D, and can support volume law entanglement in 2D polynomially more efficiently than RBMs. We thus provide theoretical motivation to shift trending neural-network based wave-function representations closer to state-of-the-art deep learning architectures. Tensor Network Language Model We propose a new statistical model suitable for machine learning tasks of systems with long distance correlations such as human languages. The model is based on directed acyclic graph decorated by multi-linear tensor maps in the vertices and vector spaces in the edges, called tensor network. Such tensor networks have been previously employed for effective numerical computation of the renormalization group flow on the space of effective quantum field theories and lattice models of statistical mechanics. We provide explicit algebro-geometric analysis of the parameter moduli space for tree graphs, discuss model properties and applications such as statistical translation. Tensor Neural Network(t-NN) We propose a tensor neural network ($t$-NN) framework that offers an exciting new paradigm for designing neural networks with multidimensional (tensor) data. Our network architecture is based on the $t$-product (Kilmer and Martin, 2011), an algebraic formulation to multiply tensors via circulant convolution. In this $t$-product algebra, we interpret tensors as $t$-linear operators analogous to matrices as linear operators, and hence our framework inherits mimetic matrix properties. To exemplify the elegant, matrix-mimetic algebraic structure of our $t$-NNs, we expand on recent work (Haber and Ruthotto, 2017) which interprets deep neural networks as discretizations of non-linear differential equations and introduces stable neural networks which promote superior generalization. Motivated by this dynamic framework, we introduce a stable $t$-NN which facilitates more rapid learning because of its reduced, more powerful parameterization. Through our high-dimensional design, we create a more compact parameter space and extract multidimensional correlations otherwise latent in traditional algorithms. We further generalize our $t$-NN framework to a family of tensor-tensor products (Kernfeld, Kilmer, and Aeron, 2015) which still induce a matrix-mimetic algebraic structure. Through numerical experiments on the MNIST and CIFAR-10 datasets, we demonstrate the more powerful parameterizations and improved generalizability of stable $t$-NNs. Tensor Product Generation Network(TPGN) We present a new tensor product generation network (TPGN) that generates natural language descriptions for images. The model has a novel architecture that instantiates a general framework for encoding and processing symbolic structure through neural network computation. This framework is built on Tensor Product Representations (TPRs). We evaluated the proposed TPGN on the MS COCO image captioning task. The experimental results show that the TPGN outperforms the LSTM based state-of-the-art baseline with a significant margin. Further, we show that our caption generation model can be interpreted as generating sequences of grammatical categories and retrieving words by their categories from a plan encoded as a distributed representation. Tensor Rank Decomposition In multilinear algebra, the tensor rank decomposition or canonical polyadic decomposition (CPD) may be regarded as a generalization of the matrix singular value decomposition (SVD) to tensors, which has found application in statistics, signal processing, psychometrics, linguistics and chemometrics. It was introduced by Hitchcock in 1927 and later rediscovered several times, notably in psychometrics. For this reason, the tensor rank decomposition is sometimes historically referred to as PARAFAC or CANDECOMP. Tensor Regression Networks(TRN) To date, most convolutional neural network architectures output predictions by flattening 3rd-order activation tensors, and applying fully-connected output layers. This approach has two drawbacks: (i) we lose rich, multi-modal structure during the flattening process and (ii) fully-connected layers require many parameters. We present the first attempt to circumvent these issues by expressing the output of a neural network directly as the the result of a multi-linear mapping from an activation tensor to the output. By imposing low-rank constraints on the regression tensor, we can efficiently solve problems for which existing solutions are badly parametrized. Our proposed tensor regression layer replaces flattening operations and fully-connected layers by leveraging multi-modal structure in the data and expressing the regression weights via a low rank tensor decomposition. Additionally, we combine tensor regression with tensor contraction to further increase efficiency. Augmenting the VGG and ResNet architectures, we demonstrate large reductions in the number of parameters with negligible impact on performance on the ImageNet dataset. Tensor Ring Long-Short Term Memory(TR-LSTM) Recurrent Neural Networks (RNNs) and their variants, such as Long-Short Term Memory (LSTM) networks, and Gated Recurrent Unit (GRU) networks, have achieved promising performance in sequential data modeling. The hidden layers in RNNs can be regarded as the memory units, which are helpful in storing information in sequential contexts. However, when dealing with high dimensional input data, such as video and text, the input-to-hidden linear transformation in RNNs brings high memory usage and huge computational cost. This makes the training of RNNs unscalable and difficult. To address this challenge, we propose a novel compact LSTM model, named as TR-LSTM (tensor ring Long-Short Term Memory), by utilizing the low-rank tensor ring decomposition (TRD) to reformulate the input-to-hidden transformation. Compared with other tensor decomposition methods, TR-LSTM is more stable. In addition, TR-LSTM can complete an end-to-end training and also provide a fundamental building block for RNNs in handling large input data. Experiments on real-world action recognition datasets have demonstrated the promising performance of the proposed TR-LSTM compared with the tensor train LSTM and other state-of-the-art competitors. Tensor Ring Network(TR-Net) Deep neural networks have demonstrated state-of-the-art performance in a variety of real-world applications. In order to obtain performance gains, these networks have grown larger and deeper, containing millions or even billions of parameters and over a thousand layers. The trade-off is that these large architectures require an enormous amount of memory, storage, and computation, thus limiting their usability. Inspired by the recent tensor ring factorization, we introduce Tensor Ring Networks (TR-Nets), which significantly compress both the fully connected layers and the convolutional layers of deep neural networks. Our results show that our TR-Nets approach {is able to compress LeNet-5 by $11\times$ without losing accuracy}, and can compress the state-of-the-art Wide ResNet by $243\times$ with only 2.3\% degradation in {Cifar10 image classification}. Overall, this compression scheme shows promise in scientific computing and deep learning, especially for emerging resource-constrained devices such as smartphones, wearables, and IoT devices. Tensor Robust Principal Component Analysis(TRPCA) In this paper, we consider the Tensor Robust Principal Component Analysis (TRPCA) problem, which aims to exactly recover the low-rank and sparse components from their sum. Our model is based on the recently proposed tensor-tensor product (or t-product) [13]. Induced by the t-product, we first rigorously deduce the tensor spectral norm, tensor nuclear norm, and tensor average rank, and show that the tensor nuclear norm is the convex envelope of the tensor average rank within the unit ball of the tensor spectral norm. These definitions, their relationships and properties are consistent with matrix cases. Equipped with the new tensor nuclear norm, we then solve the TRPCA problem by solving a convex program and provide the theoretical guarantee for the exact recovery. Our TRPCA model and recovery guarantee include matrix RPCA as a special case. Numerical experiments verify our results, and the applications to image recovery and background modeling problems demonstrate the effectiveness of our method. ➘ “Tensor-Based Robust Principal Component Analysis” Tensor Space Language Model(TSLM) In the literature, tensors have been effectively used for capturing the context information in language models. However, the existing methods usually adopt relatively-low order tensors, which have limited expressive power in modeling language. Developing a higher-order tensor representation is challenging, in terms of deriving an effective solution and showing its generality. In this paper, we propose a language model named Tensor Space Language Model (TSLM), by utilizing tensor networks and tensor decomposition. In TSLM, we build a high-dimensional semantic space constructed by the tensor product of word vectors. Theoretically, we prove that such tensor representation is a generalization of the n-gram language model. We further show that this high-order tensor representation can be decomposed to a recursive calculation of conditional probability for language modeling. The experimental results on Penn Tree Bank (PTB) dataset and WikiText benchmark demonstrate the effectiveness of TSLM. Tensor Switching Networks(TS) We present a novel neural network algorithm, the Tensor Switching (TS) network, which generalizes the Rectified Linear Unit (ReLU) nonlinearity to tensor-valued hidden units. The TS network copies its entire input vector to different locations in an expanded representation, with the location determined by its hidden unit activity. In this way, even a simple linear readout from the TS representation can implement a highly expressive deep-network-like function. The TS network hence avoids the vanishing gradient problem by construction, at the cost of larger representation size. We develop several methods to train the TS network, including equivalent kernels for infinitely wide and deep TS networks, a one-pass linear learning algorithm, and two backpropagation-inspired representation learning algorithms. Our experimental results demonstrate that the TS network is indeed more expressive and consistently learns faster than standard ReLU networks. Tensor Train PCA(TT-PCA) Tensor train is a hierarchical tensor network structure that helps alleviate the curse of dimensionality by parameterizing large-scale multidimensional data via a set of network of low-rank tensors. Associated with such a construction is a notion of Tensor Train subspace and in this paper we propose a TT-PCA algorithm for estimating this structured subspace from the given data. By maintaining low rank tensor structure, TT-PCA is more robust to noise comparing with PCA or Tucker-PCA. This is borne out numerically by testing the proposed approach on the Extended YaleFace Dataset B. Tensor Train Rank Minimization Tensor train (TT) decomposition provides a space-efficient representation for higher-order tensors. Despite its advantage, we face two crucial limitations when we apply the TT decomposition to machine learning problems: the lack of statistical theory and of scalable algorithms. In this paper, we address the limitations. First, we introduce a convex relaxation of the TT decomposition problem and derive its error bound for the tensor completion task. Next, we develop an alternating optimization method with a randomization technique, in which the time complexity is as efficient as the space complexity is. In experiments, we numerically confirm the derived bounds and empirically demonstrate the performance of our method with a real higher-order tensor. Tensor2Tensor Deep Learning (DL) has enabled the rapid advancement of many useful technologies, such as machine translation, speech recognition and object detection. In the research community, one can find code open-sourced by the authors to help in replicating their results and further advancing deep learning. However, most of these DL systems use unique setups that require significant engineering effort and may only work for a specific problem or architecture, making it hard to run new experiments and compare the results. Today, we are happy to release Tensor2Tensor (T2T), an open-source system for training deep learning models in TensorFlow. T2T facilitates the creation of state-of-the art models for a wide variety of ML applications, such as translation, parsing, image captioning and more, enabling the exploration of various ideas much faster than previously possible. This release also includes a library of datasets and models, including the best models from a few recent papers (Attention Is All You Need, Depthwise Separable Convolutions for Neural Machine Translation and One Model to Learn Them All) to help kick-start your own DL research. Tensor-Based Robust Principal Component Analysis(Tensor-RPCA) This paper studies tensor-based Robust Principal Component Analysis (RPCA) using atomic-norm regularization. Given the superposition of a sparse and a low-rank tensor, we present conditions under which it is possible to exactly recover the sparse and low-rank components. Our results improve on existing performance guarantees for tensor-RPCA, including those for matrix RPCA. Our guarantees also show that atomic-norm regularization provides better recovery for tensor-structured data sets than other approaches based on matricization. In addition to these performance guarantees, we study a nonconvex formulation of the tensor atomic-norm and identify a class of local minima of this nonconvex program that are globally optimal. We demonstrate the strong performance of our approach in numerical experiments, where we show that our nonconvex model reliably recovers tensors with ranks larger than all of their side lengths, significantly outperforming other algorithms that require matricization. TensorFlow TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org. TensorFlow Agents We introduce TensorFlow Agents, an efficient infrastructure paradigm for building parallel reinforcement learning algorithms in TensorFlow. We simulate multiple environments in parallel, and group them to perform the neural network computation on a batch rather than individual observations. This allows the TensorFlow execution engine to parallelize computation, without the need for manual synchronization. Environments are stepped in separate Python processes to progress them in parallel without interference of the global interpreter lock. As part of this project, we introduce BatchPPO, an efficient implementation of the proximal policy optimization algorithm. By open sourcing TensorFlow Agents, we hope to provide a flexible starting point for future projects that accelerates future research in the field. Tensorflow Boosted Trees(TFBT) TF Boosted Trees (TFBT) is a new open-sourced frame-work for the distributed training of gradient boosted trees. It is based on TensorFlow, and its distinguishing features include a novel architecture, automatic loss differentiation, layer-by-layer boosting that results in smaller ensembles and faster prediction, principled multi-class handling, and a number of regularization techniques to prevent overfitting. TensorFlow Data Validation(TFDV) TensorFlow Data Validation is a library for exploring and validating machine learning data. tf.DataValidation is designed to be highly scalable and to work well with TensorFlow and TensorFlow Extended (TFX). Hands on Tensorflow Data Validation TensorFlow Eager TensorFlow Eager is a multi-stage, Python-embedded domain-specific language for hardware-accelerated machine learning, suitable for both interactive research and production. TensorFlow, which TensorFlow Eager extends, requires users to represent computations as dataflow graphs; this permits compiler optimizations and simplifies deployment but hinders rapid prototyping and run-time dynamism. TensorFlow Eager eliminates these usability costs without sacrificing the benefits furnished by graphs: It provides an imperative front-end to TensorFlow that executes operations immediately and a JIT tracer that translates Python functions composed of TensorFlow operations into executable dataflow graphs. TensorFlow Eager thus offers a multi-stage programming model that makes it easy to interpolate between imperative and staged execution in a single package. TensorFlow Estimators We present a framework for specifying, training, evaluating, and deploying machine learning models. Our focus is on simplifying cutting edge machine learning for practitioners in order to bring such technologies into production. Recognizing the fast evolution of the field of deep learning, we make no attempt to capture the design space of all possible model architectures in a domain- specific language (DSL) or similar configuration language. We allow users to write code to define their models, but provide abstractions that guide develop- ers to write models in ways conducive to productionization. We also provide a unifying Estimator interface, making it possible to write downstream infrastructure (e.g. distributed training, hyperparameter tuning) independent of the model implementation. We balance the competing demands for flexibility and simplicity by offering APIs at different levels of abstraction, making common model architectures available out of the box, while providing a library of utilities designed to speed up experimentation with model architectures. To make out of the box models flexible and usable across a wide range of problems, these canned Estimators are parameterized not only over traditional hyperparameters, but also using feature columns, a declarative specification describing how to interpret input data. We discuss our experience in using this framework in re- search and production environments, and show the impact on code health, maintainability, and development speed. TensorFlow Extended(TFX) Creating and maintaining a platform for reliably producing and deploying machine learning models requires careful or- chestration of many components|a learner for generating models based on training data, modules for analyzing and validating both data as well as models, and nally infrastructure for serving models in production. This becomes particularly challenging when data changes over time and fresh models need to be produced continuously. Unfortunately, such or- chestration is often done ad hoc using glue code and custom scripts developed by individual teams for specific use cases, leading to duplicated effort and fragile systems with high technical debt. We present TensorFlow Extended (TFX), a TensorFlow- based general-purpose machine learning platform implemented at Google. By integrating the aforementioned components into one platform, we were able to standardize the components, simplify the platform con guration, and reduce the time to production from the order of months to weeks, while providing platform stability that minimizes disruptions. We present the case study of one deployment of TFX in the Google Play app store, where the machine learning models are refreshed continuously as new data arrive. Deploying TFX led to reduced custom code, faster experiment cycles, and a 2% increase in app installs resulting from improved data and model analysis. TensorFlow Filesystem(TFFS) A funny way to access your tensorflow model’s tensors. Use this project to map your model into a filesystem. Then, access your tensors as if they were files, using your favorite UNIX commands. tffs is implemented using Filesystem in Userspace (FUSE). It requires tensorflow and fusepy to be installed. TensorFlow Hub TensorFlow Hub is a library to foster the publication, discovery, and consumption of reusable parts of machine learning models. A module is a self-contained piece of a TensorFlow graph, along with its weights and assets, that can be reused across different tasks in a process known as transfer learning. Modules contain variables that have been pre-trained for a task using a large dataset. By reusing a module on a related task, you can: • train a model with a smaller dataset, • improve generalization, or • significantly speed up training. Tensorflow Planner(TF-Plan) In many real-world planning problems with factored, mixed discrete and continuous state and action spaces such as Reservoir Control, Heating Ventilation, and Air Conditioning, and Navigation domains, it is difficult to obtain a model of the complex nonlinear dynamics that govern state evolution. However, the ubiquity of modern sensors allows us to collect large quantities of data from each of these complex systems and build accurate, nonlinear deep neural network models of their state transitions. But there remains one major problem for the task of control — how can we plan with deep network learned transition models without resorting to Monte Carlo Tree Search and other black-box transition model techniques that ignore model structure and do not easily extend to mixed discrete and continuous domains? In this paper, we introduce two types of nonlinear planning methods that can leverage deep neural network learned transition models: Hybrid Deep MILP Planner (HD-MILP-Plan) and Tensorflow Planner (TF-Plan). In HD-MILP-Plan, we make the critical observation that the Rectified Linear Unit transfer function for deep networks not only allows faster convergence of model learning, but also permits a direct compilation of the deep network transition model to a Mixed-Integer Linear Program encoding. Further, we identify deep network specific optimizations for HD-MILP-Plan that improve performance over a base encoding and show that we can plan optimally with respect to the learned deep networks. In TF-Plan, we take advantage of the efficiency of auto-differentiation tools and GPU-based computation where we encode a subclass of purely continuous planning problems as Recurrent Neural Networks and directly optimize the actions through backpropagation. We compare both planners and show that TF-Plan is able to approximate the optimal plans found by HD-MILP-Plan in less computation time… TensorFlow Probability TensorFlow Probability is a library for probabilistic reasoning and statistical analysis in TensorFlow. As part of the TensorFlow ecosystem, TensorFlow Probability provides integration of probabilistic methods with deep networks, gradient-based inference via automatic differentiation, and scalability to large datasets and models via hardware acceleration (e.g., GPUs) and distributed computation. TensorFlow Ranking(TF-Ranking) TensorFlow Ranking is a library for Learning-to-Rank (LTR) techniques on the TensorFlow platform. It contains the following components: • Commonly used loss functions including pointwise, pairwise, and listwise losses. • Commonly used ranking metrics like Mean Reciprocal Rank (MRR) and Normalized Discounted Cumulative Gain (NDCG). • Multi-item (also known as groupwise) scoring functions. • LambdaLoss implementation for direct ranking metric optimization. • Unbiased Learning-to-Rank from biased feedback data. We envision that this library will provide a convenient open platform for hosting and advancing state-of-the-art ranking models based on deep learning techniques, and thus facilitate both academic research as well as industrial applications. TensorFlow.js A WebGL accelerated, browser based JavaScript library for training and deploying ML models. TensorForce Reinforcement learning approaches have long appealed to the data management community due to their ability to learn to control dynamic behavior from raw system performance. Recent successes in combining deep neural networks with reinforcement learning have sparked significant new interest in this domain. However, practical solutions remain elusive due to large training data requirements, algorithmic instability, and lack of standard tools. In this work, we introduce LIFT, an end-to-end software stack for applying deep reinforcement learning to data management tasks. While prior work has frequently explored applications in simulations, LIFT centers on utilizing human expertise to learn from demonstrations, thus lowering online training times. We further introduce TensorForce, a TensorFlow library for applied deep reinforcement learning exposing a unified declarative interface to common RL algorithms, thus providing a backend to LIFT. We demonstrate the utility of LIFT in two case studies in database compound indexing and resource management in stream processing. Results show LIFT controllers initialized from demonstrations can outperform human baselines and heuristics across latency metrics and space usage by up to 70%. Tensorial Mixture Models We introduce a generative model, we call Tensorial Mixture Models (TMMs) based on mixtures of basic component distributions over local structures (e.g. patches in an image) where the dependencies between the local-structures are represented by a ‘priors tensor’ holding the prior probabilities of assigning a component distribution to each local-structure. In their general form, TMMs are intractable as the prior tensor is typically of exponential size. However, when the priors tensor is decomposed it gives rise to an arithmetic circuit which in turn transforms the TMM into a Convolutional Arithmetic Circuit (ConvAC). A ConvAC corresponds to a shallow (single hidden layer) network when the priors tensor is decomposed by a CP (sum of rank-1) approach and corresponds to a deep network when the decomposition follows the Hierarchical Tucker (HT) model. The ConvAC representation of a TMM possesses several attractive properties. First, the inference is tractable and is implemented by a forward pass through a deep network. Second, the architectural design of the model follows the deep networks community design, i.e., the structure of TMMs is determined by just two easily understood factors: size of pooling windows and number of channels. Finally, we demonstrate the effectiveness of our model when tackling the problem of classification with missing data, leveraging TMMs unique ability of tractable marginalization which leads to optimal classifiers regardless of the missingness distribution. Tensorized LSTM Long Short-Term Memory (LSTM) is a popular approach to boosting the ability of Recurrent Neural Networks to store longer term temporal information. The capacity of an LSTM network can be increased by widening and adding layers. However, usually the former introduces additional parameters, while the latter increases the runtime. As an alternative we propose the Tensorized LSTM in which the hidden states are represented by tensors and updated via a cross-layer convolution. By increasing the tensor size, the network can be widened efficiently without additional parameters since the parameters are shared across different locations in the tensor; by delaying the output, the network can be deepened implicitly with little additional runtime since deep computations for each timestep are merged into temporal computations of the sequence. Experiments conducted on five challenging sequence learning tasks show the potential of the proposed model. TensorLayer Deep learning has enabled major advances in the fields of computer vision, natural language processing, and multimedia among many others. Developing a deep learning system is arduous and complex, as it involves constructing neural network architectures, managing training/trained models, tuning optimization process, preprocessing and organizing data, etc. TensorLayer is a versatile Python library that aims at helping researchers and engineers efficiently develop deep learning systems. It offers rich abstractions for neural networks, model and data management, and parallel workflow mechanism. While boosting efficiency, TensorLayer maintains both performance and scalability. TensorLayer was released in September 2016 on GitHub, and has helped people from academia and industry develop real-world applications of deep learning. TensorLog We present an implementation of a probabilistic first-order logic called TensorLog, in which classes of logical queries are compiled into differentiable functions in a neural-network infrastructure such as Tensorflow or Theano. This leads to a close integration of probabilistic logical reasoning with deep-learning infrastructure: in particular, it enables high-performance deep learning frameworks to be used for tuning the parameters of a probabilistic logic. Experimental results show that TensorLog scales to problems involving hundreds of thousands of knowledge-base triples and tens of thousands of examples. TensorLy Tensor methods are gaining increasing traction in machine learning. However, there are scant to no resources available to perform tensor learning and decomposition in Python. To answer this need we developed TensorLy. TensorLy is a state of the art general purpose library for tensor learning. Written in Python, it aims at following the same standards adopted by the main projects of the Python scientific community and fully integrating with these. It allows for fast and straightforward tensor decomposition and learning and comes with exhaustive tests, thorough documentation and minimal dependencies. It can be easily extended and its BSD licence makes it suitable for both academic and commercial applications. TensorLy is available at https://…/tensorly. TensOrMachine Boolean tensor decomposition approximates data of multi-way binary relationships as product of interpretable low-rank binary factors, following the rules of Boolean algebra. Here, we present its first probabilistic treatment. We facilitate scalable sampling-based posterior inference by exploitation of the combinatorial structure of the factor conditionals. Maximum a posteriori decompositions feature higher accuracies than existing techniques throughout a wide range of simulated conditions. Moreover, the probabilistic approach facilitates the treatment of missing data and enables model selection with much greater accuracy. We investigate three real-world data-sets. First, temporal interaction networks in a hospital ward and behavioural data of university students demonstrate the inference of instructive latent patterns. Next, we decompose a tensor with more than 10 billion data points, indicating relations of gene expression in cancer patients. Not only does this demonstrate scalability, it also provides an entirely novel perspective on relational properties of continuous data and, in the present example, on the molecular heterogeneity of cancer. Our implementation is available on GitHub: https://…/LogicalFactorisationMachines. TensorNetwork TensorNetwork is an open source library for implementing tensor network algorithms. Tensor networks are sparse data structures originally designed for simulating quantum many-body physics, but are currently also applied in a number of other research areas, including machine learning. We demonstrate the use of the API with applications both physics and machine learning, with details appearing in companion papers. TensorSCONE Machine learning has become a critical component of modern data-driven online services. Typically, the training phase of machine learning techniques requires to process large-scale datasets which may contain private and sensitive information of customers. This imposes significant security risks since modern online services rely on cloud computing to store and process the sensitive data. In the untrusted computing infrastructure, security is becoming a paramount concern since the customers need to trust the thirdparty cloud provider. Unfortunately, this trust has been violated multiple times in the past. To overcome the potential security risks in the cloud, we answer the following research question: how to enable secure executions of machine learning computations in the untrusted infrastructure? To achieve this goal, we propose a hardware-assisted approach based on Trusted Execution Environments (TEEs), specifically Intel SGX, to enable secure execution of the machine learning computations over the private and sensitive datasets. More specifically, we propose a generic and secure machine learning framework based on Tensorflow, which enables secure execution of existing applications on the commodity untrusted infrastructure. In particular, we have built our system called TensorSCONE from ground-up by integrating TensorFlow with SCONE, a shielded execution framework based on Intel SGX. The main challenge of this work is to overcome the architectural limitations of Intel SGX in the context of building a secure TensorFlow system. Our evaluation shows that we achieve reasonable performance overheads while providing strong security properties with low TCB. Tensor-Train factorized LSTM(TT-LSTM) In recent years, Long Short-Term Memory (LSTM) has become a popular choice for speech separation and speech enhancement task. The capability of LSTM network can be enhanced by widening and adding more layers. However, this would introduce millions of parameters in the network and also increase the requirement of computational resources. These limitations hinders the efficient implementation of RNN models in low-end devices such as mobile phones and embedded systems with limited memory. To overcome these issues, we proposed to use an efficient alternative approach of reducing parameters by representing the weight matrix parameters of LSTM based on Tensor-Train (TT) format. We called this Tensor-Train factorized LSTM as TT-LSTM model. Based on this TT-LSTM units, we proposed a deep TensorNet model for single-channel speech enhancement task. Experimental results in various test conditions and in terms of standard speech quality and intelligibility metrics, demonstrated that the proposed deep TT-LSTM based speech enhancement framework can achieve competitive performances with the state-of-the-art uncompressed RNN model, even though the proposed model architecture is orders of magnitude less complex. Tensor-Train RNN(TT-RNN) We present Tensor-Train RNN (TT-RNN), a novel family of neural sequence architectures for multivariate forecasting in environments with nonlinear dynamics. Long-term forecasting in such systems is highly challenging, since there exist long-term temporal dependencies, higher-order correlations and sensitivity to error propagation. Our proposed tensor recurrent architecture addresses these issues by learning the nonlinear dynamics directly using higher order moments and high-order state transition functions. Furthermore, we decompose the higher-order structure using the tensor-train (TT) decomposition to reduce the number of parameters while preserving the model performance. We theoretically establish the approximation properties of Tensor-Train RNNs for general sequence inputs, and such guarantees are not available for usual RNNs. We also demonstrate significant long-term prediction improvements over general RNN and LSTM architectures on a range of simulated environments with nonlinear dynamics, as well on real-world climate and traffic data. TensorTuner TensorFlow is a popular deep learning framework used by data scientists to solve a wide-range of machine learning and deep learning problems such as image classification and speech recognition. It also operates at a large scale and in heterogeneous environments — it allows users to train neural network models or deploy them for inference using GPUs, CPUs and deep learning specific custom-designed hardware such as TPUs. Even though TensorFlow supports a variety of optimized backends, realizing the best performance using a backend may require additional efforts. For instance, getting the best performance from a CPU backend requires careful tuning of its threading model. Unfortunately, the best tuning approach used today is manual, tedious, time-consuming, and, more importantly, may not guarantee the best performance. In this paper, we develop an automatic approach, called TensorTuner, to search for optimal parameter settings of TensorFlow’s threading model for CPU backends. We evaluate TensorTuner on both Eigen and Intel’s MKL CPU backends using a set of neural networks from TensorFlow’s benchmarking suite. Our evaluation results demonstrate that the parameter settings found by TensorTuner produce 2% to 123% performance improvement for the Eigen CPU backend and 1.5% to 28% performance improvement for the MKL CPU backend over the performance obtained using their best-known parameter settings. This highlights the fact that the default parameter settings in Eigen CPU backend are not the ideal settings; and even for a carefully hand-tuned MKL backend, the settings may be sub-optimal. Our evaluations also revealed that TensorTuner is efficient at finding the optimal settings — it is able to converge to the optimal settings quickly by pruning more than 90% of the parameter search space. Tentacular AI We briefly introduce herein a new form of distributed, multi-agent artificial intelligence, which we refer to as ‘tentacular.’ Tentacular AI is distinguished by six attributes, which among other things entail a capacity for reasoning and planning based in highly expressive calculi (logics), and which enlists subsidiary agents across distances circumscribed only by the reach of one or more given networks. Term Document Matrix A document-term matrix or term-document matrix is a mathematical matrix that describes the frequency of terms that occur in a collection of documents. In a document-term matrix, rows correspond to documents in the collection and columns correspond to terms. There are various schemes for determining the value that each entry in the matrix should take. One such scheme is tf-idf. They are useful in the field of natural language processing. Term Frequency – Inverse Document Frequency(TF-IDF,TFIDF) tf-idf, short for term frequency-inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval and text mining. The tf-idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to control for the fact that some words are generally more common than others. Variations of the tf-idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document’s relevance given a user query. tf-idf can be successfully used for stop-words filtering in various subject fields including text summarization and classification. One of the simplest ranking functions is computed by summing the tf-idf for each query term; many more sophisticated ranking functions are variants of this simple model. Term Frequency Difference and Category Ratio Based Feature Selection(TFDCR) Communication through e-mails remains to be highly formalized, conventional and indispensable method for the exchange of information over the Internet. An ever-increasing ratio and adversary nature of spam e-mails have posed a great many challenges such as uneven class distribution, unequal error cost, frequent change of content and personalized context-sensitive discrimination. In this research, we propose a novel and distinctive approach to develop an incremental personalized e-mail spam filter. The proposed work is described using three significant contributions. First, we applied a novel term frequency difference and category ratio based feature selection function TFDCR to select the most discriminating features irrespective of the number of samples in each class. Second, an incremental learning model is used which enables the classifier to update the discriminant function dynamically. Third, a heuristic function called selectionRankWeight is introduced to upgrade the existing feature set that determines new features carrying strong discriminating ability from an incoming set of e-mails. Three public e-mail datasets possessing different characteristics are used to evaluate the filter performance. Experiments are conducted to compare the feature selection efficiency of TFDCR and to observe the filter performance under both the batch and the incremental learning mode. The results demonstrate the superiority of TFDCR as the most effective f eature selection function. The incremental learning model incorporating dynamic feature update function overcomes the problem of drifting concepts. The proposed filter validates its efficiency and feasibility by substantially improving the classification accuracy and reducing the false positive error of misclassifying legitimate e-mail as spam. Termbase Exchange TermBase eXchange (TBX) ist eine XML-basierte Auszeichnungssprache für den Austausch von Terminologiedaten, die meist in Terminologiedatenbanken verwaltet werden. Anwendungen, die dieses Format unterstützen, können Terminologiebestände untereinander austauschen und pflegen. Ursprünglich ein Standard der Localization Industry Standards Association (LISA), nahm sich die ISO des Standards an und überarbeitete und spezifizierte ihn in ISO 30042, welcher sich auf ISO 12620, ISO 12200 und ISO 16642 stützt. Terminology Extraction Terminology mining, term extraction, term recognition, or glossary extraction, is a subtask of information extraction. The goal of terminology extraction is to automatically extract relevant terms from a given corpus. In the semantic web era, a growing number of communities and networked enterprises started to access and interoperate through the internet. Modeling these communities and their information needs is important for several web applications, like topic-driven web crawlers, web services, recommender systems, etc. The development of terminology extraction is essential to the language industry. One of the first steps to model the knowledge domain of a virtual community is to collect a vocabulary of domain-relevant terms, constituting the linguistic surface manifestation of domain concepts. Several methods to automatically extract technical terms from domain-specific document warehouses have been described in the literature. Typically, approaches to automatic term extraction make use of linguistic processors (part of speech tagging, phrase chunking) to extract terminological candidates, i.e. syntactically plausible terminological noun phrases, NPs (e.g. compounds ‘credit card’, adjective-NPs ‘local tourist information office’, and prepositional-NPs ‘board of directors’ – in English, the first two constructs are the most frequent). Terminological entries are then filtered from the candidate list using statistical and machine learning methods. Once filtered, because of their low ambiguity and high specificity, these terms are particularly useful for conceptualizing a knowledge domain or for supporting the creation of a domain ontology. Furthermore, terminology extraction is a very useful starting point for semantic similarity, knowledge management, human translation and machine translation, etc. Termite Data-driven analysis is important in virtually every modern organization. Yet, most data is underutilized because it remains locked in silos inside of organizations; large organizations have thousands of databases, and billions of files that are not integrated together in a single, queryable repository. Despite 40+ years of continuous effort by the database community, data integration still remains an open challenge. In this paper, we advocate a different approach: rather than trying to infer a common schema, we aim to find another common representation for diverse, heterogeneous data. Specifically, we argue for an embedding (i.e., a vector space) in which all entities, rows, columns, and paragraphs are represented as points. In the embedding, the distance between points indicates their degree of relatedness. We present Termite, a prototype we have built to learn the best embedding from the data. Because the best representation is learned, this allows Termite to avoid much of the human effort associated with traditional data integration tasks. On top of Termite, we have implemented a Termite-Join operator, which allows people to identify related concepts, even when these are stored in databases with different schemas and in unstructured data such as text files, webpages, etc. Finally, we show preliminary evaluation results of our prototype via a user study, and describe a list of future directions we have identified. Ternary Neural Networks(TNN) The computation and storage requirements for Deep Neural Networks (DNNs) are usually high. This issue limit their deployability on ubiquitous computing devices such as smart phones or wearables. In this paper, we propose ternary neural networks (TNNs) in order to make deep learning more resource-efficient. We train these TNNs using a teacher-student approach. Using only ternary weights and ternary neurons, with a step activation function of two-thresholds, the student ternary network learns to mimic the behaviour of its teacher network. We propose a novel, layer-wise greedy methodology for training TNNs. During training, a ternary neural network inherently prunes the smaller weights by setting them to zero. This makes them even more compact thus more resource-friendly. We devise a purpose-built hardware design for TNNs and implement it on FPGA. The benchmark results with our purpose-built hardware running TNNs reveal that, with only 1.24 microjoules per image, we can achieve 97.76% accuracy with 5.37 microsecond latency and with a rate of 255K images per second on MNIST. Ternary Plot / Ternary Diagram A ternary plot, ternary graph, triangle plot, simplex plot, or de Finetti diagram is a barycentric plot on three variables which sum to a constant. It graphically depicts the ratios of the three variables as positions in an equilateral triangle. Ternary Ternary Residual Networks Sub-8-bit representation of DNNs incur some noticeable loss of accuracy despite rigorous (re)training at low-precision. Such loss of accuracy essentially makes them equivalent to a much shallower counterpart, diminishing the power of being deep networks. To address this problem of accuracy drop we introduce the notion of \textit{residual networks} where we add more low-precision edges to sensitive branches of the sub-8-bit network to compensate for the lost accuracy. Further, we present a perturbation theory to identify such sensitive edges. Aided by such an elegant trade-off between accuracy and model size, the 8-2 architecture (8-bit activations, ternary weights), enhanced by residual ternary edges, turns out to be sophisticated enough to achieve similar accuracy as 8-8 representation ($\sim 1\%$ drop from our FP-32 baseline), despite $\sim 1.6\times$ reduction in model size, $\sim 26\times$ reduction in number of multiplications , and potentially $\sim 2\times$ inference speed up comparing to 8-8 representation, on the state-of-the-art deep network ResNet-101 pre-trained on ImageNet dataset. Moreover, depending on the varying accuracy requirements in a dynamic environment, the deployed low-precision model can be upgraded/downgraded on-the-fly by partially enabling/disabling residual connections. For example, disabling the least important residual connections in the above enhanced network, the accuracy drop is $\sim 2\%$ (from our FP-32 baseline), despite $\sim 1.9\times$ reduction in model size, $\sim 32\times$ reduction in number of multiplications, and potentially $\sim 2.3\times$ inference speed up comparing to 8-8 representation. Finally, all the ternary connections are sparse in nature, and the residual ternary conversion can be done in a resource-constraint setting without any low-precision (re)training and without accessing the data. Ternary Weight Neural Networks(TWN) We introduce ternary weight networks (TWNs) – neural networks with weights constrained to +1, 0 and -1. The Euclidian distance between full (float or double) precision weights and the ternary weights along with a scaling factor is minimized. Besides, a threshold-based ternary function is optimized to get an approximated solution which can be fast and easily computed. TWNs have stronger expressive abilities than the recently proposed binary precision counterparts and are thus more effective than the latter. Meanwhile, TWNs achieve up to 16× or 32× model compression rate and need fewer multiplications compared with the full precision counterparts. Benchmarks on MNIST, CIFAR-10, and large scale ImageNet datasets show that the performance of TWNs is only slightly worse than the full precision counterparts but outperforms the analogous binary precision counterparts a lot. ➘ “Ternary Neural Networks” TernausNetV2 The most common approaches to instance segmentation are complex and use two-stage networks with object proposals, conditional random-fields, template matching or recurrent neural networks. In this work we present TernausNetV2 – a simple fully convolutional network that allows extracting objects from a high-resolution satellite imagery on an instance level. The network has popular encoder-decoder type of architecture with skip connections but has a few essential modifications that allows using for semantic as well as for instance segmentation tasks. This approach is universal and allows to extend any network that has been successfully applied for semantic segmentation to perform instance segmentation task. In addition, we generalize network encoder that was pre-trained for RGB images to use additional input channels. It makes possible to use transfer learning from visual to a wider spectral range. For DeepGlobe-CVPR 2018 building detection sub-challenge, based on public leaderboard score, our approach shows superior performance in comparison to other methods. The source code corresponding pre-trained weights are publicly available at https://…/TernausNetV2 TerpreT We study machine learning formulations of inductive program synthesis; that is, given input-output examples, synthesize source code that maps inputs to corresponding outputs. Our key contribution is TerpreT, a domain-specific language for expressing program synthesis problems. A TerpreT model is composed of a specification of a program representation and an interpreter that describes how programs map inputs to outputs. The inference task is to observe a set of input-output examples and infer the underlying program. From a TerpreT model we automatically perform inference using four different back-ends: gradient descent (thus each TerpreT model can be seen as defining a differentiable interpreter), linear program (LP) relaxations for graphical models, discrete satisfiability solving, and the Sketch program synthesis system. TerpreT has two main benefits. First, it enables rapid exploration of a range of domains, program representations, and interpreter models. Second, it separates the model specification from the inference algorithm, allowing proper comparisons between different approaches to inference. We illustrate the value of TerpreT by developing several interpreter models and performing an extensive empirical comparison between alternative inference algorithms on a variety of program models. To our knowledge, this is the first work to compare gradient-based search over program space to traditional search-based alternatives. Our key empirical finding is that constraint solvers dominate the gradient descent and LP-based formulations. This is a workshop summary of a longer report at arXiv:1608.04428 TEST Tracking developments in the highly dynamic data-technology landscape are vital to keeping up with novel technologies and tools, in the various areas of Artificial Intelligence (AI). However, It is difficult to keep track of all the relevant technology keywords. In this paper, we propose a novel system that addresses this problem. This tool is used to automatically detect the existence of new technologies and tools in text, and extract terms used to describe these new technologies. The extracted new terms can be logged as new AI technologies as they are found on-the-fly in the web. It can be subsequently classified into the relevant semantic labels and AI domains. Our proposed tool is based on a two-stage cascading model — the first stage classifies if the sentence contains a technology term or not; and the second stage identifies the technology keyword in the sentence. We obtain a competitive accuracy for both tasks of sentence classification and text identification. Test for Excess Significance(TES) In any series of typically-powered experiments, we expect some to fail to be non-significant due to sampling error, even if a true effect exists. If we see a series of five experiments, and they are all significant, one thinks that either they are either very high powered, the authors got lucky, or there are some nonsignificant studies missing. For many sets of studies, the first seems implausible because the effect sizes are small; the last is important, because if it is true then the picture we get of the results is misleading. http://…tistical-alchemy-and-test-for-excess.html http://…/TESsimulation.html Test Oracle In computing, software engineering and software testing a test oracle, or just oracle, is a mechanism for determining whether a test has passed or failed. The use of oracles involves comparing the output(s) of the system under test, for a given test-case input, to the output(s) that the oracle determines that product should have. The term ‘test oracle’ was first introduced in a paper by William E. Howden. Additional work on different kinds of oracles was explored by Elaine Weyuker. Oracles often operate separately from the system under test. However, method postconditions are part of the system under test, as automated oracles in design by contract models. Determining the correct output for a given input (and a set of program/system states) is known as the oracle problem or test oracle problem which is a much harder problem than it seems, and involves working with problems related to controllability and observability. Test Set A test set is a set of data used in various areas of information science to assess the strength and utility of a predictive relationship. Test sets are used in artificial intelligence, machine learning, genetic programming and statistics. In all these fields, a test set has much the same role. Test-Based Bayes Factor(TBF) TBFmultinomial Testing Concept Activation Vectors(TCAV) The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts. To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net’s internal state in terms of human-friendly concepts. The key idea is to view the high-dimensional internal state of a neural net as an aid, not an obstacle. We show how to use CAVs as part of a technique, Testing with CAVs (TCAV), that uses directional derivatives to quantify the degree to which a user-defined concept is important to a classification result-for example, how sensitive a prediction of zebra is to the presence of stripes. Using the domain of image classification as a testing ground, we describe how CAVs may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application. Testing Framework for Learning-Based Android Malware Detection Systems(TLAMD) Many IoT (Internet of Things) systems run Android systems or Android-like systems. With the continuous development of machine learning algorithms, the learning-based Android malware detection system for IoT devices has gradually increased. However, these learning-based detection models are often vulnerable to adversarial samples. An automated testing framework is needed to help these learning-based malware detection systems for IoT devices perform security analysis. The current methods of generating adversarial samples mostly require training parameters of models and most of the methods are aimed at image data. To solve this problem, we propose a \textbf{t}esting framework for \textbf{l}earning-based \textbf{A}ndroid \textbf{m}alware \textbf{d}etection systems (TLAMD) for IoT Devices. The key challenge is how to construct a suitable fitness function to generate an effective adversarial sample without affecting the features of the application. By introducing genetic algorithms and some technical improvements, our test framework can generate adversarial samples for the IoT Android Application with a success rate of nearly 100\% and can perform black-box testing on the system. Testing with Concept Activation Vectors(TCAV) The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts. To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net’s internal state in terms of human-friendly concepts. The key idea is to view the high-dimensional internal state of a neural net as an aid, not an obstacle. We show how to use CAVs as part of a technique, Testing with CAVs (TCAV), that uses directional derivatives to quantify the degree to which a user-defined concept is important to a classification result–for example, how sensitive a prediction of ‘zebra’ is to the presence of stripes. Using the domain of image classification as a testing ground, we describe how CAVs may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application. TCAV: Interpretability Beyond Feature Attribution Tetris Inference efficiency is the predominant consideration in designing deep learning accelerators. Previous work mainly focuses on skipping zero values to deal with remarkable ineffectual computation, while zero bits in non-zero values, as another major source of ineffectual computation, is often ignored. The reason lies on the difficulty of extracting essential bits during operating multiply-and-accumulate (MAC) in the processing element. Based on the fact that zero bits occupy as high as 68.9% fraction in the overall weights of modern deep convolutional neural network models, this paper firstly proposes a weight kneading technique that could eliminate ineffectual computation caused by either zero value weights or zero bits in non-zero weights, simultaneously. Besides, a split-and-accumulate (SAC) computing pattern in replacement of conventional MAC, as well as the corresponding hardware accelerator design called Tetris are proposed to support weight kneading at the hardware level. Experimental results prove that Tetris could speed up inference up to 1.50x, and improve power efficiency up to 5.33x compared with the state-of-the-art baselines. Tex2Shape We present a simple yet effective method to infer detailed full human body shape from only a single photograph. Our model can infer full-body shape including face, hair, and clothing including wrinkles at interactive frame-rates. Results feature details even on parts that are occluded in the input image. Our main idea is to turn shape regression into an aligned image-to-image translation problem. The input to our method is a partial texture map of the visible region obtained from off-the-shelf methods. From a partial texture, we estimate detailed normal and vector displacement maps, which can be applied to a low-resolution smooth body model to add detail and clothing. Despite being trained purely with synthetic data, our model generalizes well to real-world photographs. Numerous results demonstrate the versatility and robustness of our method. t-Exponential Memory Network Recent advances in deep learning have brought to the fore models that can make multiple computational steps in the service of completing a task; these are capable of describ- ing long-term dependencies in sequential data. Novel recurrent attention models over possibly large external memory modules constitute the core mechanisms that enable these capabilities. Our work addresses learning subtler and more complex underlying temporal dynamics in language modeling tasks that deal with sparse sequential data. To this end, we improve upon these recent advances, by adopting concepts from the field of Bayesian statistics, namely variational inference. Our proposed approach consists in treating the network parameters as latent variables with a prior distribution imposed over them. Our statistical assumptions go beyond the standard practice of postulating Gaussian priors. Indeed, to allow for handling outliers, which are prevalent in long observed sequences of multivariate data, multivariate t-exponential distributions are imposed. On this basis, we proceed to infer corresponding posteriors; these can be used for inference and prediction at test time, in a way that accounts for the uncertainty in the available sparse training data. Specifically, to allow for our approach to best exploit the merits of the t-exponential family, our method considers a new t-divergence measure, which generalizes the concept of the Kullback-Leibler divergence. We perform an extensive experimental evaluation of our approach, using challenging language modeling benchmarks, and illustrate its superiority over existing state-of-the-art techniques. Text Data Processing(TDP) Text Graph Convolutional Network(Text GCN) Text Classification is an important and classical problem in natural language processing. There have been a number of studies that applied convolutional neural networks (convolution on regular grid, e.g., sequence) to classification. However, only a limited number of studies have explored the more flexible graph convolutional neural networks (e.g., convolution on non-grid, e.g., arbitrary graph) for the task. In this work, we propose to use graph convolutional networks for text classification. We build a single text graph for a corpus based on word co-occurrence and document word relations, then learn a Text Graph Convolutional Network (Text GCN) for the corpus. Our Text GCN is initialized with one-hot representation for word and document, it then jointly learns the embeddings for both words and documents, as supervised by the known class labels for documents. Our experimental results on multiple benchmark datasets demonstrate that a vanilla Text GCN without any external word embeddings or knowledge outperforms state-of-the-art methods for text classification. On the other hand, Text GCN also learns predictive word and document embeddings. In addition, experimental results show that the improvement of Text GCN over state-of-the-art comparison methods become more prominent as we lower the percentage of training data, suggesting the robustness of Text GCN to less training data in text classification. Text Infilling Recent years have seen remarkable progress of text generation in different contexts, such as the most common setting of generating text from scratch, and the emerging paradigm of retrieval-and-rewriting. Text infilling, which fills missing text portions of a sentence or paragraph, is also of numerous use in real life, yet is under-explored. Previous work has focused on restricted settings by either assuming single word per missing portion or limiting to a single missing portion to the end of the text. This paper studies the general task of text infilling, where the input text can have an arbitrary number of portions to be filled, each of which may require an arbitrary unknown number of tokens. We study various approaches for the task, including a self-attention model with segment-aware position encoding and bidirectional context modeling. We create extensive supervised data by masking out text with varying strategies. Experiments show the self-attention model greatly outperforms others, creating a strong baseline for future research. Text Mining Text mining, also referred to as text data mining, roughly equivalent to text analytics, refers to the process of deriving high-quality information from text. High-quality information is typically derived through the devising of patterns and trends through means such as statistical pattern learning. Text mining usually involves the process of structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluation and interpretation of the output. ‘High quality’ in text mining usually refers to some combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities). Text Morphing In this paper, we introduce a novel natural language generation task, termed as text morphing, which targets at generating the intermediate sentences that are fluency and smooth with the two input sentences. We propose the Morphing Networks consisting of the editing vector generation networks and the sentence editing networks which are trained jointly. Specifically, the editing vectors are generated with a recurrent neural networks model from the lexical gap between the source sentence and the target sentence. Then the sentence editing networks iteratively generate new sentences with the current editing vector and the sentence generated in the previous step. We conduct experiments with 10 million text morphing sequences which are extracted from the Yelp review dataset. Experiment results show that the proposed method outperforms baselines on the text morphing task. We also discuss directions and opportunities for future research of text morphing. Text-Adaptive Generative Adversarial Network(TAGAN) This paper addresses the problem of manipulating images using natural language description. Our task aims to semantically modify visual attributes of an object in an image according to the text describing the new visual appearance. Although existing methods synthesize images having new attributes, they do not fully preserve text-irrelevant contents of the original image. In this paper, we propose the text-adaptive generative adversarial network (TAGAN) to generate semantically manipulated images while preserving text-irrelevant contents. The key to our method is the text-adaptive discriminator that creates word-level local discriminators according to input text to classify fine-grained attributes independently. With this discriminator, the generator learns to generate images where only regions that correspond to the given text are modified. Experimental results show that our method outperforms existing methods on CUB and Oxford-102 datasets, and our results were mostly preferred on a user study. Extensive analysis shows that our method is able to effectively disentangle visual attributes and produce pleasing outputs. TextBoxes This paper presents an end-to-end trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, involving no post-process except for a standard non-maximum suppression. TextBoxes outperforms competing methods in terms of text localization accuracy and is much faster, taking only 0.09s per image in a fast implementation. Furthermore, combined with a text recognizer, TextBoxes significantly outperforms state-of-the-art approaches on word spotting and end-to-end text recognition tasks. TextCohesion In this paper, we propose a pixel-wise detector named TextCohesion for scene text detection especially for those with arbitrary shapes. TextChohesion splits a text instance into 5 key components: a Text Skeleton, and four Directional pixel Regions. These components are easy to handle rather than directly control the entire text instance. We also introduce a confidence scoring mechanism to filter out the characters that are similar to texts. Our method can integrate text contexts intensively even grasp clues when it is very complex background. Experiments on challenging benchmarks demonstrate that our TextCohesion clearly outperform state-of-the-art methods and it achieves an F-measure of 84.6 and 86.3 on Total-Text and SCUT-CTW1500 respectively. TextComplexityDE This paper presents TextComplexityDE, a dataset consisting of 1000 sentences in German language taken from 23 Wikipedia articles in 3 different article-genres to be used for developing text-complexity predictor models and automatic text simplification in German language. The dataset includes subjective assessment of different text-complexity aspects provided by German learners in level A and B. In addition, it contains manual simplification of 250 of those sentences provided by native speakers and subjective assessment of the simplified sentences by participants from the target group. The subjective ratings were collected using both laboratory studies and crowdsourcing approach. TextContourNet We study the problem of extracting text instance contour information from images and use it to assist scene text detection. We propose a novel and effective framework for this and experimentally demonstrate that: (1) A CNN that can be effectively used to extract instance-level text contour from natural images. (2) The extracted contour information can be used for better scene text detection. We propose two ways for learning the contour task together with the scene text detection: (1) as an auxiliary task and (2) as multi-task cascade. Extensive experiments with different benchmark datasets demonstrate that both designs improve the performance of a state-of-the-art scene text detector and that a multi-task cascade design achieves the best performance. Text-Driven Graph Embedding With Pairs Sampling(TGE-PS) In graphs with rich text information, constructing expressive graph representations requires incorporating textual information with structural information. Graph embedding models are becoming more and more popular in representing graphs, yet they are faced with two issues: sampling efficiency and text utilization. Through analyzing existing models, we find their training objectives are composed of pairwise proximities, and there are large amounts of redundant node pairs in Random Walk-based methods. Besides, inferring graph structures directly from texts (also known as zero-shot scenario) is a problem that requires higher text utilization. To solve these problems, we propose a novel Text-driven Graph Embedding with Pairs Sampling (TGE-PS) framework. TGE-PS uses Pairs Sampling (PS) to generate training samples which reduces ~99% training samples and is competitive compared to Random Walk. TGE-PS uses Text-driven Graph Embedding (TGE) which adopts word- and character-level embeddings to generate node embeddings. We evaluate TGE-PS on several real-world datasets, and experimental results demonstrate that TGE-PS produces state-of-the-art results in traditional and zero-shot link prediction tasks. TextEnt In this paper, we describe TextEnt, a neural network model that learns distributed representations of entities and documents directly from a knowledge base (KB). Given a document in a KB consisting of words and entity annotations, we train our model to predict the entity that the document describes and map the document and its target entity close to each other in a continuous vector space. Our model is trained using a large number of documents extracted from Wikipedia. The performance of the proposed model is evaluated using two tasks, namely fine-grained entity typing and multiclass text classification. The results demonstrate that our model achieves state-of-the-art performance on both tasks. The code and the trained representations are made available online for further academic research. TextNet Reading text from images remains challenging due to multi-orientation, perspective distortion and especially the curved nature of irregular text. Most of existing approaches attempt to solve the problem in two or multiple stages, which is considered to be the bottleneck to optimize the overall performance. To address this issue, we propose an end-to-end trainable network architecture, named TextNet, which is able to simultaneously localize and recognize irregular text from images. Specifically, we develop a scale-aware attention mechanism to learn multi-scale image features as a backbone network, sharing fully convolutional features and computation for localization and recognition. In text detection branch, we directly generate text proposals in quadrangles, covering oriented, perspective and curved text regions. To preserve text features for recognition, we introduce a perspective RoI transform layer, which can align quadrangle proposals into small feature maps. Furthermore, in order to extract effective features for recognition, we propose to encode the aligned RoI features by RNN into context information, combining spatial attention mechanism to generate text sequences. This overall pipeline is capable of handling both regular and irregular cases. Finally, text localization and recognition tasks can be jointly trained in an end-to-end fashion with designed multi-task loss. Experiments on standard benchmarks show that the proposed TextNet can achieve state-of-the-art performance, and outperform existing approaches on irregular datasets by a large margin. Textology A Textology is a graph of word clusters connected by co-occurrence relations. TextSnake Driven by deep neural networks and large scale datasets, scene text detection methods have progressed substantially over the past years, continuously refreshing the performance records on various standard benchmarks. However, limited by the representations (axis-aligned rectangles, rotated rectangles or quadrangles) adopted to describe text, existing methods may fall short when dealing with much more free-form text instances, such as curved text, which are actually very common in real-world scenarios. To tackle this problem, we propose a more flexible representation for scene text, termed as TextSnake, which is able to effectively represent text instances in horizontal, oriented and curved forms. In TextSnake, a text instance is described as a sequence of ordered, overlapping disks centered at symmetric axes, each of which is associated with potentially variable radius and orientation. Such geometry attributes are estimated via a Fully Convolutional Network (FCN) model. In experiments, the text detector based on TextSnake achieves state-of-the-art or comparable performance on Total-Text and SCUT-CTW1500, the two newly published benchmarks with special emphasis on curved text in natural images, as well as the widely-used datasets ICDAR 2015 and MSRA-TD500. Specifically, TextSnake outperforms the baseline on Total-Text by more than 40% in F-measure. Text-to-Face(T2F) This project combines two of the recent architectures StackGAN and ProGAN for synthesizing faces from textual descriptions. The project uses Face2Text dataset which contains 400 facial images and textual captions for each of them. The data can be obtained by contacting either the RIVAL group or the authors of the aforementioned paper. Text-to-Speech-System(TTS) Speech Synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech computer or speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely ‘synthetic’ voice output. The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood clearly. An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written works on a home computer. Many computer operating systems have included speech synthesizers since the early 1990s. Overview of a typical TTS system Automatic announcement Menu 0:00 A synthetic voice announcing an arriving train in Sweden. Problems playing this file? See media help. Problems playing this file? See media help. A text-to-speech system (or ‘engine’) is composed of two parts: a front-end and a back-end. The front-end has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called text normalization, pre-processing, or tokenization. The front-end then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end. The back-end – often referred to as the synthesizer – then converts the symbolic linguistic representation into sound. In certain systems, this part includes the computation of the target prosody (pitch contour, phoneme durations), which is then imposed on the output speech. textTOvec We address two challenges of probabilistic topic modelling in order to better estimate the probability of a word in a given context, i.e., P(word|context): (1) No Language Structure in Context: Probabilistic topic models ignore word order by summarizing a given context as a ‘bag-of-word’ and consequently the semantics of words in the context is lost. The LSTM-LM learns a vector-space representation of each word by accounting for word order in local collocation patterns and models complex characteristics of language (e.g., syntax and semantics), while the TM simultaneously learns a latent representation from the entire document and discovers the underlying thematic structure. We unite two complementary paradigms of learning the meaning of word occurrences by combining a TM and a LM in a unified probabilistic framework, named as ctx-DocNADE. (2) Limited Context and/or Smaller training corpus of documents: In settings with a small number of word occurrences (i.e., lack of context) in short text or data sparsity in a corpus of few documents, the application of TMs is challenging. We address this challenge by incorporating external knowledge into neural autoregressive topic models via a language modelling approach: we use word embeddings as input of a LSTM-LM with the aim to improve the word-topic mapping on a smaller and/or short-text corpus. The proposed DocNADE extension is named as ctx-DocNADEe. We present novel neural autoregressive topic model variants coupled with neural LMs and embeddings priors that consistently outperform state-of-the-art generative TMs in terms of generalization (perplexity), interpretability (topic coherence) and applicability (retrieval and classification) over 6 long-text and 8 short-text datasets from diverse domains. Textual Grounding The author argues that users see texts as tools when they recognize the texts’ specific value and function within highly localized use settings. The author argues that users ‘ground’ their texts to local use settings by altering the ways in which the texts structure and represent information (e.g., underlining, annotation, and sketching). The author discusses three practices by which texts are grounded as tools in document reviews: mode shifting, layering, and marking. These practices reflect different ways by which users add, subtract, and restructure information in a text so that it is usable under very specific conditions. This article explores document review as a practice in which grounding is the object of discussion (how others use the reviewed documents) and a practice by which review is facilitated. These observations will be important for exploration of technology to support ‘grounding’ practices. Unsupervised Textual Grounding: Linking Words to Image Concepts Textual Membership Queries Human labeling of textual data can be very time-consuming and expensive, yet it is critical for the success of an automatic text classification system. In order to minimize human labeling efforts, we propose a novel active learning (AL) solution, that does not rely on existing sources of unlabeled data. It uses a small amount of labeled data as the core set for the synthesis of useful membership queries (MQs) – unlabeled instances synthesized by an algorithm for human labeling. Our solution uses modification operators, functions from the instance space to the instance space that change the input to some extent. We apply the operators on the core set, thus creating a set of new membership queries. Using this framework, we look at the instance space as a search space and apply search algorithms in order to create desirable MQs. We implement this framework in the textual domain. The implementation includes using methods such as WordNet and Word2vec, for replacing text fragments from a given sentence with semantically related ones. We test our framework on several text classification tasks and show improved classifier performance as more MQs are labeled and incorporated into the training set. To the best of our knowledge, this is the first work on membership queries in the textual domain. Texture Effects Transfer GAN(TET-GAN) Text effects transfer technology automatically makes the text dramatically more impressive. However, previous style transfer methods either study the model for general style, which cannot handle the highly-structured text effects along the glyph, or require manual design of subtle matching criteria for text effects. In this paper, we focus on the use of the powerful representation abilities of deep neural features for text effects transfer. For this purpose, we propose a novel Texture Effects Transfer GAN (TET-GAN), which consists of a stylization subnetwork and a destylization subnetwork. The key idea is to train our network to accomplish both the objective of style transfer and style removal, so that it can learn to disentangle and recombine the content and style features of text effects images. To support the training of our network, we propose a new text effects dataset with as much as 64 professionally designed styles on 837 characters. We show that the disentangled feature representations enable us to transfer or remove all these styles on arbitrary glyphs using one network. Furthermore, the flexible network design empowers TET-GAN to efficiently extend to a new text style via one-shot learning where only one example is required. We demonstrate the superiority of the proposed method in generating high-quality stylized text over the state-of-the-art methods. TextureNet We introduce, TextureNet, a neural network architecture designed to extract features from high-resolution signals associated with 3D surface meshes (e.g., color texture maps). The key idea is to utilize a 4-rotational symmetric (4-RoSy) field to define a domain for convolution on a surface. Though 4-RoSy fields have several properties favorable for convolution on surfaces (low distortion, few singularities, consistent parameterization, etc.), orientations are ambiguous up to 4-fold rotation at any sample point. So, we introduce a new convolutional operator invariant to the 4-RoSy ambiguity and use it in a network to extract features from high-resolution signals on geodesic neighborhoods of a surface. In comparison to alternatives, such as PointNet based methods which lack a notion of orientation, the coherent structure given by these neighborhoods results in significantly stronger features. As an example application, we demonstrate the benefits of our architecture for 3D semantic segmentation of textured 3D meshes. The results show that our method outperforms all existing methods on the basis of mean IoU by a significant margin in both geometry-only (6.4%) and RGB+Geometry (6.9-8.2%) settings. Textures.js SVG patterns for Data Visualization TextVQA Studies have shown that a dominant class of questions asked by visually impaired users on images of their surroundings involves reading text in the image. But today’s VQA models can not read! Our paper takes a first step towards addressing this problem. First, we introduce a new ‘TextVQA’ dataset to facilitate progress on this important problem. Existing datasets either have a small proportion of questions about text (e.g., the VQA dataset) or are too small (e.g., the VizWiz dataset). TextVQA contains 45,336 questions on 28,408 images that require reasoning about text to answer. Second, we introduce a novel model architecture that reads text in the image, reasons about it in the context of the image and the question, and predicts an answer which might be a deduction based on the text and the image or composed of the strings found in the image. Consequently, we call our approach Look, Read, Reason & Answer (LoRRA). We show that LoRRA outperforms existing state-of-the-art VQA models on our TextVQA dataset. We find that the gap between human performance and machine performance is significantly larger on TextVQA than on VQA 2.0, suggesting that TextVQA is well-suited to benchmark progress along directions complementary to VQA 2.0. TF-Replicator We describe TF-Replicator, a framework for distributed machine learning designed for DeepMind researchers and implemented as an abstraction over TensorFlow. TF-Replicator simplifies writing data-parallel and model-parallel research code. The same models can be effortlessly deployed to different cluster architectures (i.e. one or many machines containing CPUs, GPUs or TPU accelerators) using synchronous or asynchronous training regimes. To demonstrate the generality and scalability of TF-Replicator, we implement and benchmark three very different models: (1) A ResNet-50 for ImageNet classification, (2) a SN-GAN for class-conditional ImageNet image generation, and (3) a D4PG reinforcement learning agent for continuous control. Our results show strong scalability performance without demanding any distributed systems expertise of the user. The TF-Replicator programming model will be open-sourced as part of TensorFlow 2.0 (see https://…/25 ). tGM-VAE Resting-state functional connectivity states are often identified as clusters of dynamic connectivity patterns. However, existing clustering approaches do not distinguish major states from rarely occurring minor states and hence are sensitive to noise. To address this issue, we propose to model major states using a non-linear generative process guided by a Gaussian-mixture distribution in a low-dimensional latent space, while separately modeling the connectivity patterns of minor states by a non-informative uniform distribution. We embed this truncated Gaussian-Mixture model in a Variational Autoencoder framework to obtain a general joint clustering and outlier detection approach, tGM-VAE. When applied to synthetic data with known ground-truth, tGM-VAE is more accurate in clustering connectivity patterns than existing approaches. On the rs-fMRI of 593 healthy adolescents, tGM-VAE identifies meaningful major connectivity states. The dwell time of these states significantly correlates with age. The House Of inteRactions(THOR) We introduce The House Of inteRactions (THOR), a framework for visual AI research, available at http://ai2thor.allenai.org. AI2-THOR consists of near photo-realistic 3D indoor scenes, where AI agents can navigate in the scenes and interact with objects to perform tasks. AI2-THOR enables research in many different domains including but not limited to deep reinforcement learning, imitation learning, learning by interaction, planning, visual question answering, unsupervised representation learning, object detection and segmentation, and learning models of cognition. The goal of AI2-THOR is to facilitate building visually intelligent models and push the research forward in this domain. Theano Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Theano features: · tight integration with NumPy – Use numpy.ndarray in Theano-compiled functions. · transparent use of a GPU – Perform data-intensive calculations up to 140x faster than with CPU.(float32 only) · efficient symbolic differentiation – Theano does your derivatives for function with one or many inputs. · speed and stability optimizations – Get the right answer for log(1+x) even when x is really tiny. · dynamic C code generation – Evaluate expressions faster. · extensive unit-testing and self-verification – Detect and diagnose many types of mistake. Theano has been powering large-scale computationally intensive scientific investigations since 2007. But it is also approachable enough to be used in the classroom (IFT6266 at the University of Montreal). http://…/theano_word_embeddings TheFragebogen Quality of Experience (QoE) typically involves conducting experiments in which stimuli are presented to participants and their judgments as well as behavioral data are collected. Nowadays, many experiments require software for the presentation of stimuli and the data collection from participants. While different software solutions exist, these are not tailored to conduct experiments on QoE. Moreover, replicating experiments or repeating the same experiment in different settings (e. g., laboratory vs. crowdsourcing) can further increase the software complexity. TheFragebogen is an open-source, versatile, extendable software framework for the implementation of questionnaires – especially for research on QoE. Implemented questionnaires can be presented with a state-of-the-art web browser to support a broad range of devices while the use of a web server being optional. Out-of-the-box, TheFragebogen provides graphical exact scales as well as free-hand input, the ability to collect behavioral data, and playback multimedia content. Thematic Map Thematic maps are geographical maps in which statistical data are visualized. A thematic map is a type of map especially designed to show a particular theme connected with a specific geographic area. These maps ‘can portray physical, social, political, cultural, economic, sociological, agricultural, or any other aspects of a city, state, region, nation, or continent’. tmap Theory of Evidence The theory of belief functions, also referred to as evidence theory or Dempster-Shafer theory (DST), is a general framework for reasoning with uncertainty, with understood connections to other frameworks such as probability, possibility and imprecise probability theories. First introduced by Arthur P. Dempster in the context of statistical inference, the theory was later developed by Glenn Shafer into a general framework for modeling epistemic uncertainty-a mathematical theory of evidence. The theory allows one to combine evidence from different sources and arrive at a degree of belief (represented by a mathematical object called belief function) that takes into account all the available evidence. In a narrow sense, the term Dempster-Shafer theory refers to the original conception of the theory by Dempster and Shafer. However, it is more common to use the term in the wider sense of the same general approach, as adapted to specific kinds of situations. In particular, many authors have proposed different rules for combining evidence, often with a view to handling conflicts in evidence better. The early contributions have also been the starting points of many important developments, including the transferable belief model and the theory of hints. TherML In this work we offer a framework for reasoning about a wide class of existing objectives in machine learning. We develop a formal correspondence between this work and thermodynamics and discuss its implications. Thermodynamic Analytics ToolkIt(TATi) We describe a TensorFlow-based library for posterior sampling and exploration in machine learning applications. TATi, the Thermodynamic Analytics ToolkIt, implements algorithms for 2nd order (underdamped) Langevin dynamics and Hamiltonian Monte Carlo (HMC). It also allows for rapid prototyping of new sampling methods in pure Python and supports an ensemble framework for generating multiple trajectories in parallel, a capability that is demonstrated by the implementation of a recently proposed ensemble preconditioning sampling procedure. In addition to explaining the architecture of TATi and its connections with the TensorFlow framework, this article contains preliminary numerical experiments to explore the efficiency of posterior sampling strategies in ML applications, in comparison to standard training strategies. We provide a glimpse of the potential of the new toolkit by studying (and visualizing) the loss landscape of a neural network applied to the MNIST hand-written digits data set. Theta Method Accurate and robust forecasting methods for univariate time series are very important when the objective is to produce estimates for a large number of time series. In this context, the Theta method called researchers attention due its performance in the largest up-to-date forecasting competition, the M3-Competition. Theta method proposes the decomposition of the deseasonalised data into two ‘theta lines’. The first theta line removes completely the curvatures of the data, thus being a good estimator of the long-term trend component. The second theta line doubles the curvatures of the series, as to better approximate the short-term behaviour. http://…/Theta.pdf forecTheta Thick Data Thick Data: ethnographic approaches that uncover the meaning behind Big Data visualization and analysis. Thick Data analysis primarily relies on human brain power to process a small “N” while big data analysis requires computational power (of course with humans writing the algorithms) to process a large “N”. Big Data reveals insights with a particular range of data points, while Thick Data reveals the social context of and connections between data points. Big Data delivers numbers; thick data delivers stories. Big data relies on machine learning; thick data relies on human learning. Thinging This paper examines conceptual models and their application to computational thinking. Computational thinking is a fundamental skill for everybody, not just for computer scientists. It has been promoted as skills that are as fundamental for all as numeracy and literacy. According to authorities in the field, the best way to characterize computational thinking is the way in which computer scientists think and the manner in which they reason how computer scientists think for the rest of us. Core concepts in computational thinking include such notions as algorithmic thinking, abstraction, decomposition, and generalization. This raises several issues and challenges that still need to be addressed, including the fundamental characteristics of computational thinking and its relationship with modeling patterns (e.g., object-oriented) that lead to programming/coding. Thinking pattern refers to recurring templates used by designers in thinking. In this paper, we propose a representation of thinking activity by adopting a thinking pattern called thinging that utilizes a diagrammatic technique called thinging machine (TM). We claim that thinging is a valuable process as a fundamental skill for everybody in computational thinking. The viability of such a proclamation is illustrated through examples and a case study. Thinging Machine A control model is typically classified into three forms: conceptual, mathematical and simulation (computer). This paper analyzes a conceptual modeling application with respect to an inventory management system. Today, most organizations utilize computer systems for inventory control that provide protection when interruptions or breakdowns occur within work processes. Modeling the inventory processes is an active area of research that utilizes many diagrammatic techniques, including data flow diagrams, Universal Modeling Language (UML) diagrams and Integration DEFinition (IDEF). We claim that current conceptual modeling frameworks lack uniform notions and have inability to appeal to designers and analysts. We propose modeling an inventory system as an abstract machine, called a Thinging Machine (TM), with five operations: creation, processing, receiving, releasing and transferring. The paper provides side-by-side contrasts of some existing examples of conceptual modeling methodologies that apply to TM. Additionally, TM is applied in a case study of an actual inventory system that uses IBM Maximo. The resulting conceptual depictions point to the viability of FM as a valuable tool for developing a high-level representation of inventory processes. Thingscoop Thingscoop is a command-line utility for analyzing videos semantically – that means searching, filtering, and describing videos based on objects, places, and other things that appear in them. When you first run thingscoop on a video file, it uses a convolutional neural network to create an ‘index’ of what’s contained in the every second of the input by repeatedly performing image classification on a frame-by-frame basis. Once an index for a video file has been created, you can search (i.e. get the start and end times of the regions in the video matching the query) and filter (i.e. create a supercut of the matching regions) the input using arbitrary queries. Thingscoop uses a very basic query language that lets you to compose queries that test for the presence or absence of labels with the logical operators ! (not), || (or) and && (and). For example, to search a video the presence of the sky and the absence of the ocean: thingscoop search ‘sky && !ocean’ . Right now two models are supported by thingscoop: vgg_imagenet uses the architecture described in ‘Very Deep Convolutional Networks for Large-Scale Image Recognition’ to recognize objects from the ImageNet database, and googlenet_places uses the architecture described in ‘Going Deeper with Convolutions’ to recognize settings and places from the MIT Places database. You can specify which model you’d like to use by running thingscoop models use , where is either vgg_imagenet or googlenet_places. More models will be added soon. Thingscoop is based on Caffe, an open-source deep learning framework. GitXiv Think Again Network This short paper introduces an abstraction called Think Again Networks (ThinkNet) which can be applied to any state-dependent function (such as a recurrent neural network). Here we show a simple application in Language Modeling which achieves state of the art perplexity on the Penn Treebank. Thompson Sampling(TS) We study the application of the Thompson Sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm for the general CMAB, and obtain the first distribution-dependent regret bound of $O(m\log T / \Delta_{\min})$ for TS under general CMAB, where $m$ is the number of arms, $T$ is the time horizon, and $\Delta_{\min}$ is the minimum gap between the expected reward of the optimal solution and any non-optimal solution. We also show that one can not use an approximate oracle in TS algorithm for even MAB problems. Then we expand the analysis to matroid bandit, a special case of CMAB. Finally, we use some experiments to show the comparison of regrets of CUCB and CTS algorithms. Thouless-Anderson-Palmer(TAP,TAP MF) Thouless-Anderson-Palmer Gibbs Free Energy(TAP Gibbs Free Energy) The adaptive TAP Gibbs free energy for a general densely connected probabilistic model with quadratic interactions and arbritary single site constraints is derived. We show how a specific sequential minimization of the free energy leads to a generalization ofMinka’s expectation propagation. Lastly, we derive a sparse representation version of the sequential algorithm. The usefulness of the approach is demonstrated on classification and density estimation with Gaussian processes and on an independent component analysis problem. Threading Building Blocks(TBB) Threading Building Blocks (TBB) is a C++ template library developed by Intel for writing software programs that take advantage of multi-core processors. The library consists of data structures and algorithms that allow a programmer to avoid some complications arising from the use of native threading packages such as POSIX threads, Windows threads, or the portable Boost Threads in which individual threads of execution are created, synchronized, and terminated manually. Instead the library abstracts access to the multiple processors by allowing the operations to be treated as “tasks”, which are allocated to individual cores dynamically by the library’s run-time engine, and by automating efficient use of the CPU cache. A TBB program creates, synchronizes and destroys graphs of dependent tasks according to algorithms, i.e. high-level parallel programming paradigms (a.k.a. Algorithmic Skeletons). Tasks are then executed respecting graph dependencies. This approach groups TBB in a family of solutions for parallel programming aiming to decouple the programming from the particulars of the underlying machine. Three-Mode Principal Components Analysis In multivariate analysis the data have usually two way and/or two modes. This book treats prinicipal component analysis of data which can be characterised by three-ways and/or modes, like subjects by variables by conditions or occasions. The book extends the work on three-mode factor analysis by Tucker and the work on individual differences scaling by Carroll and colleagues. The many examples give a true feeling of the working of the techniques. tuckerR.mmgg Three-Player Generative Adversarial Network(Three-Player GAN) We propose a Three-Player Generative Adversarial Network to improve classification networks. In addition to the game played between the discriminator and generator, a competition is introduced between the generator and the classifier. The generator’s objective is to synthesize samples that are both realistic and hard to label for the classifier. Even though we make no assumptions on the type of augmentations to learn, we find that the model is able to synthesize realistically looking examples that are hard for the classification model. Furthermore, the classifier becomes more robust when trained on these difficult samples. The method is evaluated on a public dataset for traffic sign recognition. Three-Stage Subspace Clustering Framework(3S-SC) Subspace clustering (SC) refers to the problem of clustering high-dimensional data into a union of low-dimensional subspaces. Based on spectral clustering, state-of-the-art approaches solve SC problem within a two-stage framework. In the first stage, data representation techniques are applied to draw an affinity matrix from the original data. In the second stage, spectral clustering is directly applied to the affinity matrix so that data can be grouped into different subspaces. However, the affinity matrix obtained in the first stage usually fails to reveal the authentic relationship between data points, which leads to inaccurate clustering results. In this paper, we propose a universal Three-Stage Subspace Clustering framework (3S-SC). Graph-Based Transformation and Optimization (GBTO) is added between data representation and spectral clustering. The affinity matrix is obtained in the first stage, then it goes through the second stage, where the proposed GBTO is applied to generate a reconstructed affinity matrix with more authentic similarity between data points. Spectral clustering is applied after GBTO, which is the third stage. We verify our 3S-SC framework with GBTO through theoretical analysis. Experiments on both synthetic data and the real-world data sets of handwritten digits and human faces demonstrate the universality of the proposed 3S-SC framework in improving the connectivity and accuracy of SC methods based on $\ell_0$, $\ell_1$, $\ell_2$ or nuclear norm regularization. Three-Way Decisions-Based Conflict Analysis Model(TWDCAM) Three-way decision theory, which trisects the universe with less risks or costs, is considered as a powerful mathematical tool for handling uncertainty in incomplete and imprecise information tables, and provides an effective tool for conflict analysis decision making in real-time situations. In this paper, we propose the concepts of the agreement, disagreement and neutral subsets of a strategy with two evaluation functions, which establish the three-way decisions-based conflict analysis models(TWDCAMs) for trisecting the universe of agents, and employ a pair of two-way decisions models to interpret the mechanism of the three-way decision rules for an agent. Subsequently, we develop the concepts of the agreement, disagreement and neutral strategies of an agent group with two evaluation functions, which build the TWDCAMs for trisecting the universe of issues, and take a couple of two-way decisions models to explain the mechanism of the three-way decision rules for an issue. Finally, we reconstruct Fan, Qi and Wei’s conflict analysis models(FQWCAMs) and Sun, Ma and Zhao’s conflict analysis models(SMZCAMs) with two evaluation functions, and interpret FQWCAMs and SMZCAMs with a pair of two-day decisions models, which illustrates that FQWCAMs and SMZCAMs are special cases of TWDCAMs. Thresholded Adaptive Calibration Error(TACE) The reliability of a machine learning model’s confidence in its predictions is critical for highrisk applications. Calibration-the idea that a model’s predicted probabilities of outcomes reflect true probabilities of those outcomes-formalizes this notion. While analyzing the calibration of deep neural networks, we’ve identified core problems with the way calibration is currently measured. We design the Thresholded Adaptive Calibration Error (TACE) metric to resolve these pathologies and show that it outperforms other metrics, especially in settings where predictions beyond the maximum prediction that is chosen as the output class matter. There are many cases where what a practitioner cares about is the calibration of a specific prediction, and so we introduce a dynamic programming based Prediction Specific Calibration Error (PSCE) that smoothly considers the calibration of nearby predictions to give an estimate of the calibration error of a specific prediction. Thresholding Graph Bandits In this paper, we introduce a new online decision making paradigm that we call Thresholding Graph Bandits. The main goal is to efficiently identify a subset of arms in a multi-armed bandit problem whose means are above a specified threshold. While traditionally in such problems, the arms are assumed to be independent, in our paradigm we further suppose that we have access to the similarity between the arms in the form of a graph, allowing us gain information about the arm means in fewer samples. Such settings play a key role in a wide range of modern decision making problems where rapid decisions need to be made in spite of the large number of options available at each time. We present GrAPL, a novel algorithm for the thresholding graph bandit problem. We demonstrate theoretically that this algorithm is effective in taking advantage of the graph structure when available and the reward function homophily (that strongly connected arms have similar rewards) when favorable. We confirm these theoretical findings via experiments on both synthetic and real data. THresholding method based on ORder Statistic(THORS) In this paper, we propose an effective THresholding method based on ORder Statistic, called THORS, to convert an arbitrary scoring-type classifier, which can induce a continuous cumulative distribution function of the score, into a cost-sensitive one. The procedure, uses order statistic to find an optimal threshold for classification, requiring almost no knowledge of classifiers itself. Unlike common data-driven methods, we analytically show that THORS has theoretical guaranteed performance, theoretical bounds for the costs and lower time complexity. Coupled with empirical results on several real-world data sets, we argue that THORS is the preferred cost-sensitive technique. Thrill This dissertation focuses on two fundamental sorting problems: string sorting and suffix sorting. The first part considers parallel string sorting on shared-memory multi-core machines, the second part external memory suffix sorting using the induced sorting principle, and the third part distributed external memory suffix sorting with a new distributed algorithmic big data framework named Thrill. ThumbNet Although deep convolutional neural networks (CNNs) have achieved great success in the computer vision community, its real-world application is still impeded by its voracious demand of computational resources. Current works mostly seek to compress the network by reducing its parameters or parameter-incurred computation, neglecting the influence of the input image on the system complexity. Based on the fact that input images of a CNN contain much redundant spatial content, we propose in this paper an efficient and unified framework, dubbed as ThumbNet, to simultaneously accelerate and compress CNN models by enabling them to infer on one thumbnail image. We provide three effective strategies to train ThumbNet. In doing so, ThumbNet learns an inference network that performs equally well on small images as the original-input network on large images. With ThumbNet, not only do we obtain the thumbnail-input inference network that can drastically reduce computation and memory requirements, but also we obtain an image downscaler that can generate thumbnail images for generic classification tasks. Extensive experiments show the effectiveness of ThumbNet, and demonstrate that the thumbnail-input inference network learned by ThumbNet can adequately retain the accuracy of the original-input network even when the input images are downscaled 16 times. THUMT This paper introduces THUMT, an open-source toolkit for neural machine translation (NMT) developed by the Natural Language Processing Group at Tsinghua University. THUMT implements the standard attention-based encoder-decoder framework on top of Theano and supports three training criteria: maximum likelihood estimation, minimum risk training, and semi-supervised training. It features a visualization tool for displaying the relevance between hidden states in neural networks and contextual words, which helps to analyze the internal workings of NMT. Experiments on Chinese-English datasets show that THUMT using minimum risk training significantly outperforms GroundHog, a state-of-the-art toolkit for NMT. ThunderNet Real-time generic object detection on mobile platforms is a crucial but challenging computer vision task. However, previous CNN-based detectors suffer from enormous computational cost, which hinders them from real-time inference in computation-constrained scenarios. In this paper, we investigate the effectiveness of two-stage detectors in real-time generic detection and propose a lightweight two-stage detector named ThunderNet. In the backbone part, we analyze the drawbacks in previous lightweight backbones and present a lightweight backbone designed for object detection. In the detection part, we exploit an extremely efficient RPN and detection head design. To generate more discriminative feature representation, we design two efficient architecture blocks, Context Enhancement Module and Spatial Attention Module. At last, we investigate the balance between the input resolution, the backbone, and the detection head. Compared with lightweight one-stage detectors, ThunderNet achieves superior performance with only 40% of the computational cost on PASCAL VOC and COCO benchmarks. Without bells and whistles, our model runs at 24.1 fps on an ARM-based device. To the best of our knowledge, this is the first real-time detector reported on ARM platforms. Code will be released for paper reproduction. Tibble Tibbles are a modern take on data frames. They keep the features that have stood the test of time, and drop the features that used to be convenient but are now frustrating (i.e. converting character vectors to factors). tibble,tibbletime tick tick is a statistical learning library for Python~3, with a particular emphasis on time-dependent models, such as point processes, and tools for generalized linear models and survival analysis. The core of the library is an optimization module providing model computational classes, solvers and proximal operators for regularization. tick relies on a C++ implementation and state-of-the-art optimization algorithms to provide very fast computations in a single node multi-core setting. Source code and documentation can be downloaded from https://…/tick Tidy Data Tidy datasets are easy to manipulate, model and visualise, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table. Tiered Sampling We introduce Tiered Sampling, a novel technique for approximate counting sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass on the data and uses a memory of fixed size $M$, which can be magnitudes smaller than the number of edges. Our methods addresses the challenging task of counting sparse motifs – sub-graph patterns that have low probability to appear in a sample of $M$ edges in the graph, which is the maximum amount of data available to the algorithms in each step. To obtain an unbiased and low variance estimate of the count we partition the available memory to tiers (layers) of reservoir samples. While the base layer is a standard reservoir sample of edges, other layers are reservoir samples of sub-structures of the desired motif. By storing more frequent sub-structures of the motif, we increase the probability of detecting an occurrence of the sparse motif we are counting, thus decreasing the variance and error of the estimate. We demonstrate the advantage of our method in the specific applications of counting sparse 4 and 5-cliques in massive graphs. We present a complete analytical analysis and extensive experimental results using both synthetic and real-world data. Our results demonstrate the advantage of our method in obtaining high-quality approximations for the number of 4 and 5-cliques for large graphs using a very limited amount of memory, significantly outperforming the single edge sample approach for counting sparse motifs in large scale graphs. TigerGraph We present TigerGraph, a graph database system built from the ground up to support massively parallel computation of queries and analytics. TigerGraph’s high-level query language, GSQL, is designed for compatibility with SQL, while simultaneously allowing NoSQL programmers to continue thinking in Bulk-Synchronous Processing (BSP) terms and reap the benefits of high-level specification. GSQL is sufficiently high-level to allow declarative SQL-style programming, yet sufficiently expressive to concisely specify the sophisticated iterative algorithms required by modern graph analytics and traditionally coded in general-purpose programming languages like C++ and Java. We report very strong scale-up and scale-out performance over a benchmark we published on GitHub for full reproducibility. Tight Semi-Nonnegative Matrix Factorization The nonnegative matrix factorization is a widely used, flexible matrix decomposition, finding applications in biology, image and signal processing and information retrieval, among other areas. Here we present a related matrix factorization. A multi-objective optimization problem finds conical combinations of templates that approximate a given data matrix. The templates are chosen so that as far as possible only the initial data set can be represented this way. However, the templates are not required to be nonnegative nor convex combinations of the original data. Tikhonov Regularization Tikhonov regularization, named for Andrey Tikhonov, is the most commonly used method of regularization of ill-posed problems. In statistics, the method is known as ridge regression, and with multiple independent discoveries, it is also variously known as the Tikhonov-Miller method, the Phillips-Twomey method, the constrained linear inversion method, and the method of linear regularization. It is related to the Levenberg-Marquardt algorithm for non-linear least-squares problems. ➚ “Ridge Regression” Tile Constraint Efficient explorative data analysis systems must take into account both what a user knows and wants to know. This paper proposes a principled framework for interactive visual exploration of relations in data, through views most informative given the user’s current knowledge and objectives. The user can input pre-existing knowledge of relations in the data and also formulate specific exploration interests, then taken into account in the exploration. The idea is to steer the exploration process towards the interests of the user, instead of showing uninteresting or already known relations. The user’s knowledge is modelled by a distribution over data sets parametrised by subsets of rows and columns of data, called tile constraints. We provide a computationally efficient implementation of this concept based on constrained randomisation. Furthermore, we describe a novel dimensionality reduction method for finding the views most informative to the user, which at the limit of no background knowledge and with generic objectives reduces to PCA. We show that the method is suitable for interactive use and robust to noise, outperforms standard projection pursuit visualisation methods, and gives understandable and useful results in analysis of real-world data. We have released an open-source implementation of the framework. Tile2Vec Remote sensing lacks methods like the word vector representations and pre-trained networks that significantly boost performance across a wide range of natural language and computer vision tasks. To fill this gap, we introduce Tile2Vec, an unsupervised representation learning algorithm that extends the distributional hypothesis from natural language — words appearing in similar contexts tend to have similar meanings — to geospatial data. We demonstrate empirically that Tile2Vec learns semantically meaningful representations on three datasets. Our learned representations significantly improve performance in downstream classification tasks and similarly to word vectors, visual analogies can be obtained by simple arithmetic in the latent space. Time Aligned Common and Individual Factor Analysis(TACIFA) Many modern data sets require inference methods that can estimate the shared and individual-specific components of variability in collections of matrices that change over time. Promising methods have been developed to analyze these types of data in static cases, but very few approaches are available for dynamic settings. To address this gap, we consider novel models and inference methods for pairs of matrices in which the columns correspond to multivariate observations at different time points. In order to characterize common and individual features, we propose a Bayesian dynamic factor modeling framework called Time Aligned Common and Individual Factor Analysis (TACIFA) that includes uncertainty in time alignment through an unknown warping function. We provide theoretical support for the proposed model, showing identifiability and posterior concentration. The structure enables efficient computation through a Hamiltonian Monte Carlo (HMC) algorithm. We show excellent performance in simulations, and illustrate the method through application to a social synchrony experiment. Time Oriented Language(TOL) TOL is the Time Oriented Language. It is a programming language dedicated to the world of statistics and focused on time series analysis and stochastic processes. It is a declarative language based on two key features: simple syntactical rules and powerful set of extensible data types and functions. TOL is callable by a small text console, but there is also a graphical interface to easily handle all language’s tools and functions, providing powerful graphical capacities. TOL is distributed under the GNU GPL license. tolBasis Time Perception Machine Numerous powerful point process models have been developed to understand temporal patterns in sequential data from fields such as health-care, electronic commerce, social networks, and natural disaster forecasting. In this paper, we develop novel models for learning the temporal distribution of human activities in streaming data (e.g., videos and person trajectories). We propose an integrated framework of neural networks and temporal point processes for predicting when the next activity will happen. Because point processes are limited to taking event frames as input, we propose a simple yet effective mechanism to extract features at frames of interest while also preserving the rich information in the remaining frames. We evaluate our model on two challenging datasets. The results show that our model outperforms traditional statistical point process approaches significantly, demonstrating its effectiveness in capturing the underlying temporal dynamics as well as the correlation within sequential activities. Furthermore, we also extend our model to a joint estimation framework for predicting the timing, spatial location, and category of the activity simultaneously, to answer the when, where, and what of activity prediction. Time Reversibility from Ordinal Patterns(TiROP) Time irreversibility is a common signature of nonlinear processes, and a fundamental property of non-equilibrium systems driven by non-conservative forces. A time series is said to be reversible if its statistical properties are invariant regardless of the direction of time. Here we propose the Time Reversibility from Ordinal Patterns method (TiROP) to assess time-reversibility from an observed finite time series. TiROP captures the information of scalar observations in time forward, as well as its time-reversed counterpart by means of ordinal patterns. The method compares both underlying information contents by quantifying its (dis)-similarity via Jensen-Shannon divergence. The statistic is contrasted with a population of divergences coming from a set of surrogates to unveil the temporal nature and its involved time scales. We tested TiROP in different synthetic and real, linear and non linear time series, juxtaposed with results from the classical Ramsey’s time reversibility test. Our results depict a novel, fast-computation, and fully data-driven methodology to assess time-reversibility at different time scales with no further assumptions over data. This approach adds new insights about the current non-linear analysis techniques, and also could shed light on determining new physiological biomarkers of high reliability and computational efficiency. Time Series Analysis / Time Series A time series is a sequence of data points, measured typically at successive points in time spaced at uniform time intervals. Examples of time series are the daily closing value of the Dow Jones Industrial Average and the annual flow volume of the Nile River at Aswan. Time series are very frequently plotted via line charts. Time series are used in statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering, astronomy, and communications engineering. Time Series Cointegrated System(TSCS) TSCS Time Series Data Compression and Abstraction(TSDCA) In the era of big data, practical applications in various domains continually generate large-scale time-series data. Among them, some data show significant or potential periodicity characteristics, such as meteorological and financial data. It is critical to efficiently identify the potential periodic patterns from massive time-series data and provide accurate predictions. In this paper, a Periodicity-based Parallel Time Series Prediction (PPTSP) algorithm for large-scale time-series data is proposed and implemented in the Apache Spark cloud computing environment. To effectively handle the massive historical datasets, a Time Series Data Compression and Abstraction (TSDCA) algorithm is presented, which can reduce the data scale as well as accurately extracting the characteristics. Based on this, we propose a Multi-layer Time Series Periodic Pattern Recognition (MTSPPR) algorithm using the Fourier Spectrum Analysis (FSA) method. In addition, a Periodicity-based Time Series Prediction (PTSP) algorithm is proposed. Data in the subsequent period are predicted based on all previous period models, in which a time attenuation factor is introduced to control the impact of different periods on the prediction results. Moreover, to improve the performance of the proposed algorithms, we propose a parallel solution on the Apache Spark platform, using the Streaming real-time computing module. To efficiently process the large-scale time-series datasets in distributed computing environments, Distributed Streams (DStreams) and Resilient Distributed Datasets (RDDs) are used to store and calculate these datasets. Extensive experimental results show that our PPTSP algorithm has significant advantages compared with other algorithms in terms of prediction accuracy and performance. Time Series Database(TSDB) A time series database (TSDB) is a software system that is optimized for handling time series data, arrays of numbers indexed by time (a datetime or a datetime range). In some fields these time series are called profiles, curves, or traces. A time series of stock prices might be called a price curve. A time series of energy consumption might be called a load profile. A log of temperature values over time might be called a temperature trace. Despite the disparate names, many of the same mathematical operations, queries, or database transactions are useful for analysing all of them. The implementation of a database that can correctly, reliably, and efficiently implement these operations must be specialized for time-series data. TSDBs are databases that are optimized for time series data. Software with complex logic or business rules and high transaction volume for time series data may not be practical with traditional relational database management systems. Flat file databases are not a viable option either, if the data and transaction volume reaches a maximum threshold determined by the capacity of individual servers (processing power and storage capacity). Queries for historical data, replete with time ranges and roll ups and arbitrary time zone conversions are difficult in a relational database. Compositions of those rules are even more difficult. This is a problem compounded by the free nature of relational systems themselves. Many relational systems are often not modelled correctly with respect to time series data. TSDBs on the other hand impose a model and this allows them to provide more features for doing so. Ideally, these repositories are often natively implemented using specialized database algorithms. However, it is possible to store time series as binary large objects (BLOBs) in a relational database or by using a VLDB approach coupled with a pure star schema. Efficiency is often improved if time is treated as a discrete quantity rather than as a continuous mathematical dimension. Database joins across multiple time series data sets is only practical when the time tag associated with each data entry spans the same set of discrete times for all data sets across which the join is performed. Time Series Momentum We document significant ‘time series momentum’ in equity index, currency, commodity, and bond futures for each of the 58 liquid instruments we consider. We find persistence in returns for one to 12 months that partially reverses over longer horizons, consistent with sentiment theories of initial under-reaction and delayed over-reaction. A diversified portfolio of time series momentum strategies across all asset classes delivers substantial abnormal returns with little exposure to standard asset pricing factors and performs best during extreme markets. Examining the trading activities of speculators and hedgers, we find that speculators profit from time series momentum at the expense of hedgers. Time Series Momentum (aka Trend-Following): A Good Time for a Refresh Enhancing Time Series Momentum Strategies Using Deep Neural Networks Time Series Trajectory Time Varying System ➘ “Time-Variant System” Time-Conditional Generative Adversarial Network(T-CGAN) In this paper we propose a data augmentation method for time series with irregular sampling, Time-Conditional Generative Adversarial Network (T-CGAN). Our approach is based on Conditional Generative Adversarial Networks (CGAN), where the generative step is implemented by a deconvolutional NN and the discriminative step by a convolutional NN. Both the generator and the discriminator are conditioned on the sampling timestamps, to learn the hidden relationship between data and timestamps, and consequently to generate new time series. We evaluate our model with synthetic and real-world datasets. For the synthetic data, we compare the performance of a classifier trained with T-CGAN-generated data, against the performance of the same classifier trained on the original data. Results show that classifiers trained on T-CGAN-generated data perform the same as classifiers trained on real data, even with very short time series and small training sets. For the real world datasets, we compare our method with other techniques of data augmentation for time series, such as time slicing and time warping, over a classification problem with unbalanced datasets. Results show that our method always outperforms the other approaches, both in case of regularly sampled and irregularly sampled time series. We achieve particularly good performance in case with a small training set and short, noisy, irregularly-sampled time series. Time-Convolution Layer(tConv) Automatic heart sound abnormality detection can play a vital role in the early diagnosis of heart diseases, particularly in low-resource settings. The state-of-the-art algorithms for this task utilize a set of Finite Impulse Response (FIR) band-pass filters as a front-end followed by a Convolutional Neural Network (CNN) model. In this work, we propound a novel CNN architecture that integrates the front-end bandpass filters within the network using time-convolution (tConv) layers, which enables the FIR filter-bank parameters to become learnable. Different initialization strategies for the learnable filters, including random parameters and a set of predefined FIR filter-bank coefficients, are examined. Using the proposed tConv layers, we add constraints to the learnable FIR filters to ensure linear and zero phase responses. Experimental evaluations are performed on a balanced 4-fold cross-validation task prepared using the PhysioNet/CinC 2016 dataset. Results demonstrate that the proposed models yield superior performance compared to the state-of-the-art system, while the linear phase FIR filterbank method provides an absolute improvement of 9.54% over the baseline in terms of an overall accuracy metric. Timed Discrete-Event System(TDES) Timed Discrete-Event Systems are Synchronous Product Structures Time-Domain Audio Separation Network(TasNet) Robust speech processing in multitalker acoustic environments requires automatic speech separation. While single-channel, speaker-independent speech separation methods have recently seen great progress, the accuracy, latency, and computational cost of speech separation remain insufficient. The majority of the previous methods have formulated the separation problem through the time-frequency representation of the mixed signal, which has several drawbacks, including the decoupling of the phase and magnitude of the signal, the suboptimality of spectrogram representations for speech separation, and the long latency in calculating the spectrogram. To address these shortcomings, we propose the time-domain audio separation network (TasNet), which is a deep learning autoencoder framework for time-domain speech separation. TasNet uses a convolutional encoder to create a representation of the signal that is optimized for extracting individual speakers. Speaker extraction is achieved by applying a weighting function (mask) to the encoder output. The modified encoder representation is then inverted to the sound waveform using a linear decoder. The masks are found using a temporal convolutional network consisting of dilated convolutions, which allow the network to model the long-term dependencies of the speech signal. This end-to-end speech separation algorithm significantly outperforms previous time-frequency methods in terms of separating speakers in mixed audio, even when compared to the separation accuracy achieved with the ideal time-frequency mask of the speakers. In addition, TasNet has a smaller model size and a shorter minimum latency, making it a suitable solution for both offline and real-time speech separation applications. This study therefore represents a major step toward actualizing speech separation for real-world speech processing technologies. Time-Lapse Mining We introduce an approach for synthesizing time-lapse videos of popular landmarks from large community photo collections. The approach is completely automated and leverages the vast quantity of photos available online. First, we cluster 86 million photos into landmarks and popular viewpoints. Then, we sort the photos by date and warp each photo onto a common viewpoint. Finally, we stabilize the appearance of the sequence to compensate for lighting effects and minimize flicker. Our resulting time-lapses show diverse changes in the world’s most popular sites, like glaciers shrinking, skyscrapers being constructed, and waterfalls changing course. Time-Series eXplanation(TSXplain) Neural networks (NN) are considered as black-boxes due to the lack of explainability and transparency of their decisions. This significantly hampers their deployment in environments where explainability is essential along with the accuracy of the system. Recently, significant efforts have been made for the interpretability of these deep networks with the aim to open up the black-box. However, most of these approaches are specifically developed for visual modalities. In addition, the interpretations provided by these systems require expert knowledge and understanding for intelligibility. This indicates a vital gap between the explainability provided by the systems and the novice user. To bridge this gap, we present a novel framework i.e. Time-Series eXplanation (TSXplain) system which produces a natural language based explanation of the decision taken by a NN. It uses the extracted statistical features to describe the decision of a NN, merging the deep learning world with that of statistics. The two-level explanation provides ample description of the decision made by the network to aid an expert as well as a novice user alike. Our survey and reliability assessment test confirm that the generated explanations are meaningful and correct. We believe that generating natural language based descriptions of the network’s decisions is a big step towards opening up the black-box. Time-to-Event Data Time-to-event data, also often referred to as survival data, arise when interest is focused on the time elapsing before an event is experienced. By events we mean occurrences that are of interest in scientific studies from various disciplines such as medicine, epidemiology, demography, biology, sociology, economics, engineering, et cetera. Examples of such events are: death, onset of infection, divorce, unemployment, and failure of a mechanical device. All of these may be subject to scientific interest where one tries to understand their cause or establish risk factors. flexsurvcure,goftte Time-Variant System A time-variant system is a system that is not time invariant (TIV). Roughly speaking, its output characteristics depend explicitly upon time. In other words, a system in which certain quantities governing the system’s behavior change with time, so that the system will respond differently to the same input at different times. Time-Varying Survivor Average Causal Effect(TV-SACE) In semicompeting risks problems, nonterminal time-to-event outcomes such as time to hospital readmission are subject to truncation by death. These settings are often modeled with illness-death models for the hazards of the terminal and nonterminal events, but evaluating causal treatment effects with hazard models is problematic due to conditioning on survival (a post-treatment outcome) that is embedded in the definition of a hazard. Extending an existing survivor average causal effect (SACE) estimand, we frame the evaluation of treatment effects in the context of semicompeting risks with principal stratification and introduce two new causal estimands: the time-varying survivor average causal effect (TV-SACE) and the restricted mean survivor average causal effect (RM-SACE). These principal causal effects are defined among units that would survive regardless of assigned treatment. We adopt a Bayesian estimation procedure that parameterizes illness-death models for both treatment arms. We outline a frailty specification that can accommodate within-person correlation between nonterminal and terminal event times, and we discuss potential avenues for adding model flexibility. The method is demonstrated in the context of hospital readmission among late-stage pancreatic cancer patients. Time-Warp-Invariant The literature postulates that the dynamic time warping (dtw) distance can cope with temporal variations but stores and processes time series in a form as if the dtw-distance cannot cope with such variations. To address this inconsistency, we first show that the dtw-distance is not warping-invariant-despite its name and contrary to its characterization in some publications. The lack of warping-invariance contributes to the inconsistency mentioned above and to a strange behavior. To eliminate these peculiarities, we convert the dtw-distance to a warping-invariant semi-metric, called time-warp-invariant (twi) distance. Empirical results suggest that the error rates of the twi and dtw nearest-neighbor classifier are practically equivalent in a Bayesian sense. However, the twi-distance requires less storage and computation time than the dtw-distance for a broad range of problems. These results challenge the current practice of applying the dtw-distance in nearest-neighbor classification and suggest the proposed twi-distance as a more efficient and consistent option. Time-Weighted Dynamic Time Warping(TWDTW) Dynamic time warping (DTW), which finds the minimum path by providing non-linear alignments between two time series, has been widely used as a distance measure for time series classification and clustering. However, DTW does not account for the relative importance regarding the phase difference between a reference point and a testing point. This may lead to misclassification especially in applications where the shape similarity between two sequences is a major consideration for an accurate recognition. Therefore, we propose a novel distance measure, called a weighted DTW (WDTW), which is a penalty-based DTW. Our approach penalizes points with higher phase difference between a reference point and a testing point in order to prevent minimum distance distortion caused by outliers. The rationale underlying the proposed distance measure is demonstrated with some illustrative examples. A new weight function, called the modified logistic weight function (MLWF), is also proposed to systematically assign weights as a function of the phase difference between a reference point and a testing point. By applying different weights to adjacent points, the proposed algorithm can enhance the detection of similarity between two time series. We show that some popular distance measures such as DTW and Euclidean distance are special cases of our proposed WDTW measure. We extend the proposed idea to other variants of DTW such as derivative dynamic time warping (DDTW) and propose the weighted version of DDTW. We have compared the performances of our proposed procedures with other popular approaches using public data sets available through the UCR Time Series Data Mining Archive for both time series classification and clustering problems. The experimental results indicate that the proposed approaches can achieve improved accuracy for time series classification and clustering problems. ➚ “Dynamic Time Warping” dtwSat TinBiNN Reduced-precision arithmetic improves the size, cost, power and performance of neural networks in digital logic. In convolutional neural networks, the use of 1b weights can achieve state-of-the-art error rates while eliminating multiplication, reducing storage and improving power efficiency. The BinaryConnect binary-weighted system, for example, achieves 9.9% error using floating-point activations on the CIFAR-10 dataset. In this paper, we introduce TinBiNN, a lightweight vector processor overlay for accelerating inference computations with 1b weights and 8b activations. The overlay is very small — it uses about 5,000 4-input LUTs and fits into a low cost iCE40 UltraPlus FPGA from Lattice Semiconductor. To show this can be useful, we build two embedded ‘person detector’ systems by shrinking the original BinaryConnect network. The first is a 10-category classifier with a 89% smaller network that runs in 1,315ms and achieves 13.6% error. The other is a 1-category classifier that is even smaller, runs in 195ms, and has only 0.4% error. In both classifiers, the error can be attributed entirely to training and not reduced precision. Tiny Deeply Supervised Object Detection(Tiny-DSOD) Object detection has made great progress in the past few years along with the development of deep learning. However, most current object detection methods are resource hungry, which hinders their wide deployment to many resource restricted usages such as usages on always-on devices, battery-powered low-end devices, etc. This paper considers the resource and accuracy trade-off for resource-restricted usages during designing the whole object detection framework. Based on the deeply supervised object detection (DSOD) framework, we propose Tiny-DSOD dedicating to resource-restricted usages. Tiny-DSOD introduces two innovative and ultra-efficient architecture blocks: depthwise dense block (DDB) based backbone and depthwise feature-pyramid-network (D-FPN) based front-end. We conduct extensive experiments on three famous benchmarks (PASCAL VOC 2007, KITTI, and COCO), and compare Tiny-DSOD to the state-of-the-art ultra-efficient object detection solutions such as Tiny-YOLO, MobileNet-SSD (v1 & v2), SqueezeDet, Pelee, etc. Results show that Tiny-DSOD outperforms these solutions in all the three metrics (parameter-size, FLOPs, accuracy) in each comparison. For instance, Tiny-DSOD achieves 72.1% mAP with only 0.95M parameters and 1.06B FLOPs, which is by far the state-of-the-arts result with such a low resource requirement. Tiny SSD Object detection is a major challenge in computer vision, involving both object classification and object localization within a scene. While deep neural networks have been shown in recent years to yield very powerful techniques for tackling the challenge of object detection, one of the biggest challenges with enabling such object detection networks for widespread deployment on embedded devices is high computational and memory requirements. Recently, there has been an increasing focus in exploring small deep neural network architectures for object detection that are more suitable for embedded devices, such as Tiny YOLO and SqueezeDet. Inspired by the efficiency of the Fire microarchitecture introduced in SqueezeNet and the object detection performance of the single-shot detection macroarchitecture introduced in SSD, this paper introduces Tiny SSD, a single-shot detection deep convolutional neural network for real-time embedded object detection that is composed of a highly optimized, non-uniform Fire sub-network stack and a non-uniform sub-network stack of highly optimized SSD-based auxiliary convolutional feature layers designed specifically to minimize model size while maintaining object detection performance. The resulting Tiny SSD possess a model size of 2.3MB (~26X smaller than Tiny YOLO) while still achieving an mAP of 61.3% on VOC 2007 (~4.2% higher than Tiny YOLO). These experimental results show that very small deep neural network architectures can be designed for real-time object detection that are well-suited for embedded scenarios. Tipping Point Analysis Tipping point analysis of electrical resistance data with early warning signals of failure for predictive maintenance Tiramisu This paper introduces Tiramisu, an optimization framework designed to generate efficient code for high-performance systems such as multicores, GPUs, FPGAs, distributed machines, or any combination of these. Tiramisu relies on a flexible representation based on the polyhedral model and introduces a novel four-level IR that allows full separation between algorithms, schedules, data-layouts and communication. This separation simplifies targeting multiple hardware architectures from the same algorithm. We evaluate Tiramisu by writing a set of linear algebra and DNN kernels and by integrating it as a pass in the Halide compiler. We show that Tiramisu extends Halide with many new capabilities, and that Tiramisu can generate efficient code for multicores, GPUs, FPGAs and distributed heterogeneous systems. The performance of code generated by the Tiramisu backends matches or exceeds hand-optimized reference implementations. For example, the multicore backend matches the highly optimized Intel MKL library on many kernels and shows speedups reaching 4x over the original Halide. T-Net Recent advances in meta-learning demonstrate that deep representations combined with the gradient descent method have sufficient capacity to approximate any learning algorithm. A promising approach is the model-agnostic meta-learning (MAML) which embeds gradient descent into the meta-learner. It optimizes for the initial parameters of the learner to warm-start the gradient descent updates, such that new tasks can be solved using a small number of examples. In this paper we elaborate the gradient-based meta-learning, developing two new schemes. First, we present a feedforward neural network, referred to as T-net, where the linear transformation between two adjacent layers is decomposed as T W such that W is learned by task-specific learners and the transformation T, which is shared across tasks, is meta-learned to speed up the convergence of gradient updates for task-specific learners. Second, we present MT-net where gradient updates in the T-net are guided by a binary mask M that is meta-learned, restricting the updates to be performed in a subspace. Empirical results demonstrate that our method is less sensitive to the choice of initial learning rates than existing meta-learning methods, and achieves the state-of-the-art or comparable performance on few-shot classification and regression tasks. Tobit Model The Tobit model is a statistical model proposed by James Tobin (1958) to describe the relationship between a non-negative dependent variable y i {\displaystyle y_{i}} y_{i} and an independent variable (or vector) x i {\displaystyle x_{i}} x_{i}.[1] The term Tobit was derived from Tobin’s name by truncating and adding -it by analogy with the probit model.[2] The Tobit model is distinct from the truncated regression model, which is in general different and requires a different estimator.[3] The model supposes that there is a latent (i.e. unobservable) variable y i * {\displaystyle y_{i}^{*}} y_i^*. This variable linearly depends on x i {\displaystyle x_{i}} x_{i} via a parameter (vector) ß {\displaystyle \beta } \beta which determines the relationship between the independent variable (or vector) x i {\displaystyle x_{i}} x_{i} and the latent variable y i * {\displaystyle y_{i}^{*}} y_i^* (just as in a linear model). In addition, there is a normally distributed error term u i {\displaystyle u_{i}} u_{i} to capture random influences on this relationship. The observable variable y i {\displaystyle y_{i}} y_{i} is defined as the ramp function: equal to the latent variable whenever the latent variable is above zero, and zero otherwise. Tofu There is a trend towards using very large deep neural networks (DNN) to improve the accuracy of complex machine learning tasks. However, the size of DNN models that can be explored today is limited by the amount of GPU device memory. This paper presents Tofu, a system for partitioning very large DNN models across multiple GPU devices. Tofu is designed for a tensor-based dataflow system: for each operator in the dataflow graph, it partitions its input/output tensors and parallelizes its execution across workers. Tofu can automatically discover how each operator can be partitioned by analyzing its semantics expressed in a simple specification language. Tofu uses a search algorithm based on dynamic programming to determine the best partition strategy for each operator in the entire dataflow graph. Our experiments on an 8-GPU machine show that Tofu enables the training of very large CNN and RNN models. It also achieves better performance than alternative approaches to train very large models on multiple GPUs. TonY Training machine learning (ML) models on large datasets requires considerable computing power. To speed up training, it is typical to distribute training across several machines, often with specialized hardware like GPUs or TPUs. Managing a distributed training job is complex and requires dealing with resource contention, distributed configurations, monitoring, and fault tolerance. In this paper, we describe TonY, an open-source orchestrator for distributed ML jobs built at LinkedIn to address these challenges. Toolbox for Interval Reachability Analysis(TIRA) This paper presents TIRA, a Matlab library gathering several methods for the computation of interval over-approximations of the reachable sets for both continuous- and discrete-time nonlinear systems. Unlike other existing tools, the main strength of interval-based reachability analysis is its simplicity and scalability, rather than the accuracy of the over-approximations. The current implementation of TIRA contains four reachability methods covering wide classes of nonlinear systems, handled with recent results relying on contraction/growth bounds and monotonicity concepts. TIRA’s architecture features a central function working as a hub between the user-defined reachability problem and the library of available reachability methods. This design choice offers increased extensibility of the library, where users can define their own method in a separate function and add the function call in the hub function. Topic Compositional Neural Language Model(TCNLM) We propose a Topic Compositional Neural Language Model (TCNLM), a novel method designed to simultaneously capture both the global semantic meaning and the local word ordering structure in a document. The TCNLM learns the global semantic coherence of a document via a neural topic model, and the probability of each learned latent topic is further used to build a Mixture-of-Experts (MoE) language model, where each expert (corresponding to one topic) is a recurrent neural network (RNN) that accounts for learning the local structure of a word sequence. In order to train the MoE model efficiently, a matrix factorization method is applied, by extending each weight matrix of the RNN to be an ensemble of topic-dependent weight matrices. The degree to which each member of the ensemble is used is tied to the document-dependent probability of the corresponding topics. Experimental results on several corpora show that the proposed approach outperforms both a pure RNN-based model and other topic-guided language models. Further, our model yields sensible topics, and also has the capacity to generate meaningful sentences conditioned on given topics. Topic Detection and Tracking(TDT) Topic Detection and Tracking (TDT) is a Body of Research and an Evaluation Paradigm that Addresses Event-Based Organization of Broadcast News. The TDT Evaluation Tasks of Tracking, Cluster Detection, and First Story Detection are Each Information Filtering Technology in the Sense That They Require That ‘yes or no’ Decisions be Made on a Stream of News Stories Before Additional Stories Have Arrived. http://…/279-5731073-2040517 Topic Grouper We introduce Topic Grouper as a complementary approach in the field of probabilistic topic modeling. Topic Grouper creates a disjunctive partitioning of the training vocabulary in a stepwise manner such that resulting partitions represent topics. It is governed by a simple generative model, where the likelihood to generate the training documents via topics is optimized. The algorithm starts with one-word topics and joins two topics at every step. It therefore generates a solution for every desired number of topics ranging between the size of the training vocabulary and one. The process represents an agglomerative clustering that corresponds to a binary tree of topics. A resulting tree may act as a containment hierarchy, typically with more general topics towards the root of tree and more specific topics towards the leaves. Topic Grouper is not governed by a background distribution such as the Dirichlet and avoids hyper parameter optimizations. We show that Topic Grouper has reasonable predictive power and also a reasonable theoretical and practical complexity. Topic Grouper can deal well with stop words and function words and tends to push them into their own topics. Also, it can handle topic distributions, where some topics are more frequent than others. We present typical examples of computed topics from evaluation datasets, where topics appear conclusive and coherent. In this context, the fact that each word belongs to exactly one topic is not a major limitation; in some scenarios this can even be a genuine advantage, e.g.~a related shopping basket analysis may aid in optimizing groupings of articles in sales catalogs. Topic Memory Network Many classification models work poorly on short texts due to data sparsity. To address this issue, we propose topic memory networks for short text classification with a novel topic memory mechanism to encode latent topic representations indicative of class labels. Different from most prior work that focuses on extending features with external knowledge or pre-trained topics, our model jointly explores topic inference and text classification with memory networks in an end-to-end manner. Experimental results on four benchmark datasets show that our model outperforms state-of-the-art models on short text classification, meanwhile generates coherent topics. Topic Model In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract “topics” that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: “dog” and “bone” will appear more often in documents about dogs, “cat” and “meow” will appear in documents about cats, and “the” and “is” will appear equally in both. A document typically concerns multiple topics in different proportions; thus, in a document that is 10% about cats and 90% about dogs, there would probably be about 9 times more dog words than cat words. A topic model captures this intuition in a mathematical framework, which allows examining a set of documents and discovering, based on the statistics of the words in each, what the topics might be and what each document’s balance of topics is. Topic Tagging Topic-Based Convolutional Neural Network(TB-CNN) With the explosive development of mobile Internet, short text has been applied extensively. The difference between classifying short text and long documents is that short text is of shortness and sparsity. Thus, it is challenging to deal with short text classification owing to its less semantic information. In this paper, we propose a novel topic-based convolutional neural network (TB-CNN) based on Latent Dirichlet Allocation (LDA) model and convolutional neural network. Comparing to traditional CNN methods, TB-CNN generates topic words with LDA model to reduce the sparseness and combines the embedding vectors of topic words and input words to extend feature space of short text. The validation results on IMDB movie review dataset show the improvement and effectiveness of TB-CNN. TopicRNN In this paper, we propose TopicRNN, a recurrent neural network (RNN)-based language model designed to directly capture the global semantic meaning relating words in a document via latent topics. Because of their sequential nature, RNNs are good at capturing the local structure of a word sequence – both semantic and syntactic – but might face difficulty remembering long-range dependencies. Intuitively, these long-range dependencies are of semantic nature. In contrast, latent topic models are able to capture the global underlying semantic structure of a document but do not account for word ordering. The proposed TopicRNN model integrates the merits of RNNs and latent topic models: it captures local (syntactic) dependencies using an RNN and global (semantic) dependencies using latent topics. Unlike previous work on contextual RNN language modeling, our model is learned end-to-end. Empirical results on word prediction show that TopicRNN outperforms existing contextual RNN baselines. In addition, TopicRNN can be used as an unsupervised feature extractor for documents. We do this for sentiment analysis and report a new state-of-the-art error rate on the IMDB movie review dataset that amounts to a $13.3\%$ improvement over the previous best result. Finally TopicRNN also yields sensible topics, making it a useful alternative to document models such as latent Dirichlet allocation. Top-K High Utility Itemset(THUI) A comparative study of top-k high utility itemset mining methods Top-N-Rank We propose Top-N-Rank, a novel family of list-wise Learning-to-Rank models for reliably recommending the N top-ranked items. The proposed models optimize a variant of the widely used cumulative discounted gain (DCG) objective function which differs from DCG in two important aspects: (i) It limits the evaluation of DCG only on the top N items in the ranked lists, thereby eliminating the impact of low-ranked items on the learned ranking function; and (ii) it incorporates weights that allow the model to leverage multiple types of implicit feedback with differing levels of reliability or trustworthiness. Because the resulting objective function is non-smooth and hence challenging to optimize, we consider two smooth approximations of the objective function, using the traditional sigmoid function and the rectified linear unit (ReLU). We propose a family of learning-to-rank algorithms (Top-N-Rank) that work with any smooth objective function. Then, a more efficient variant, Top-N-Rank.ReLU, is introduced, which effectively exploits the properties of ReLU function to reduce the computational complexity of Top-N-Rank from quadratic to linear in the average number of items rated by users. The results of our experiments using two widely used benchmarks, namely, the MovieLens data set and the Amazon Video Games data set demonstrate that: (i) The top-N truncation’ of the objective function substantially improves the ranking quality of the top N recommendations; (ii) using the ReLU for smoothing the objective function yields significant improvement in both ranking quality as well as runtime as compared to using the sigmoid; and (iii) Top-N-Rank.ReLU substantially outperforms the well-performing list-wise ranking methods in terms of ranking quality. Topological Anomaly Detection(TAD) The technique is essentially a density based outlier detection algorithm that, instead of calculating local densities, constructs a graph of the data using nearest-neighbors. The algorithm is different from other kNN outlier detection algorithms in that instead of setting ‘k’ as a parameter, you instead set a maximal inter-observation distance (called the graph “resolution” by Gartley and Basener). If the distance between two points is less than the graph resolution, add an edge between those two observations to the graph. Once the full graph is constructed, determine which connected components comprise the “background” of the data by setting some threshold percentage of observations ‘p’: any components with fewer than ‘p’ observations is considered an anomalous component, and all the observations (nodes) in this component are outliers. Topological Data Analysis(TDA) Topological data analysis (TDA) is a new area of study aimed at having applications in areas such as data mining and computer vision. The main problems are: 1. how one infers high-dimensional structure from low-dimensional representations; and 2. how one assembles discrete points into global structure. The human brain can easily extract global structure from representations in a strictly lower dimension, i.e. we infer a 3D environment from a 2D image from each eye. The inference of global structure also occurs when converting discrete data into continuous images, e.g. dot-matrix printers and televisions communicate images via arrays of discrete points. The main method used by topological data analysis is: 1. Replace a set of data points with a family of simplicial complexes, indexed by a proximity parameter. 2. Analyse these topological complexes via algebraic topology – specifically, via the theory of persistent homology. 3. Encode the persistent homology of a data set in the form of a parameterized version of a Betti number which is called a persistence diagram or barcode. Why Topological Data Analysis Works Topological Analysis of Data Topology Data Analysis (TDA) Topological Regularizer for Classifiers(TopoReg) Regularization plays a crucial role in supervised learning. A successfully regularized model strikes a balance between a perfect description of the training data and the ability to generalize to unseen data. Most existing methods enforce a global regularization in a structure agnostic manner. In this paper, we initiate a new direction and propose to enforce the structural simplicity of the classification boundary by regularizing over its topological complexity. In particular, our measurement of topological complexity incorporates the importance of topological features (e.g., connected components, handles, and so on) in a meaningful manner, and provides a direct control over spurious topological structures. We incorporate the new measurement as a topological loss in training classifiers. We also propose an efficient algorithm to compute the gradient. Our method provides a novel way to topologically simplify the global structure of the model, without having to sacrifice too much of the flexibility of the model. We demonstrate the effectiveness of our new topological regularizer on a range of synthetic and real-world datasets. Topological Sorting In computer science, a topological sort (sometimes abbreviated topsort or toposort) or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks. A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG). Any DAG has at least one topological ordering, and algorithms are known for constructing a topological ordering of any DAG in linear time. Topology ToolKit(TTK) This system paper presents the Topology ToolKit (TTK), a software platform designed for topological data analysis in scientific visualization. TTK provides a unified, generic, efficient, and robust implementation of key algorithms for the topological analysis of scalar data, including: critical points, integral lines, persistence diagrams, persistence curves, merge trees, contour trees, Morse-Smale complexes, fiber surfaces, continuous scatterplots, Jacobi sets, Reeb spaces, and more. TTK is easily accessible to end users due to a tight integration with ParaView. It is also easily accessible to developers through a variety of bindings (Python, VTK/C++) for fast prototyping or through direct, dependence-free, C++, to ease integration into pre-existing complex systems. While developing TTK, we faced several algorithmic and software engineering challenges, which we document in this paper. In particular, we present an algorithm for the construction of a discrete gradient that complies to the critical points extracted in the piecewise-linear setting. This algorithm guarantees a combinatorial consistency across the topological abstractions supported by TTK, and importantly, a unified implementation of topological data simplification for multi-scale exploration and analysis. We also present a cached triangulation data structure, that supports time efficient and generic traversals, which self-adjusts its memory usage on demand for input simplicial meshes and which implicitly emulates a triangulation for regular grids with no memory overhead. Finally, we describe an original software architecture, which guarantees memory efficient and direct accesses to TTK features, while still allowing for researchers powerful and easy bindings and extensions. TTK is open source (BSD license) and its code, online documentation and video tutorials are available on TTK’s website. Topology-Based Pathway Enrichment Analysis(TPEA) TPEA TopoResNet Skin cancer is one of the most common cancers in the United States. As technological advancements are made, algorithmic diagnosis of skin lesions is becoming more important. In this paper, we develop algorithms for segmenting the actual diseased area of skin in a given image of a skin lesion, and for classifying different types of skin lesions pictured in a given image. The cores of the algorithms used were based in persistent homology, an algebraic topology technique that is part of the rising field of Topological Data Analysis (TDA). The segmentation algorithm utilizes a similar concept to persistent homology that captures the robustness of segmented regions. For classification, we design two families of topological features from persistence diagrams—which we refer to as {\em persistence statistics} (PS) and {\em persistence curves} (PC), and use linear support vector machine as classifiers. We also combined those topological features, PS and PC, into ResNet-101 model, which we call {\em TopoResNet-101}, the results show that PS and PC are effective in two folds—improving classification performances and stabilizing the training process. Although convolutional features are the most important learning targets in CNN models, global information of images may be lost in the training process. Because topological features were extracted globally, our results show that the global property of topological features provide additional information to machine learning models. Torch Torch is a scientific computing framework with wide support for machine learning algorithms. It is easy to use and efficient, thanks to an easy and fast scripting language, LuaJIT, and an underlying C/CUDA implementation. A summary of core features: · a powerful N-dimensional array · lots of routines for indexing, slicing, transposing, … · amazing interface to C, via LuaJIT · linear algebra routines · neural network, and energy-based models · numeric optimization routines · Fast and efficient GPU support · Embeddable, with ports to iOS, Android and FPGA backends https://…/torch7 torchbearer We introduce torchbearer, a model fitting library for pytorch aimed at researchers working on deep learning or differentiable programming. The torchbearer library provides a high level metric and callback API that can be used for a wide range of applications. We also include a series of built in callbacks that can be used for: model persistence, learning rate decay, logging, data visualization and more. The extensive documentation includes an example library for deep learning and dynamic programming problems and can be found at http://torchbearer.readthedocs.io. The code is licensed under the MIT License and available at https://…/torchbearer. TorMentor Distributed machine learning (ML) systems today use an unsophisticated threat model: data sources must trust a central ML process. We propose a brokered learning abstraction that allows data sources to contribute towards a globally-shared model with provable privacy guarantees in an untrusted setting. We realize this abstraction by building on federated learning, the state of the art in multi-party ML, to construct TorMentor: an anonymous hidden service that supports private multi-party ML. We define a new threat model by characterizing, developing and evaluating new attacks in the brokered learning setting, along with new defenses for these attacks. We show that TorMentor effectively protects data providers against known ML attacks while providing them with a tunable trade-off between model accuracy and privacy. We evaluate TorMentor with local and geo-distributed deployments on Azure/Tor. In an experiment with 200 clients and 14 MB of data per client, our prototype trained a logistic regression model using stochastic gradient descent in 65s. Total Distance Multivariance We introduce two new measures for the dependence of $n \ge 2$ random variables: distance multivariance’ and total distance multivariance’. Both measures are based on the weighted $L^2$-distance of quantities related to the characteristic functions of the underlying random variables. They extend distance covariance (introduced by Szekely, Rizzo and Bakirov) and generalized distance covariance (introduced in part I) from pairs of random variables to $n$-tuplets of random variables. We show that total distance multivariance can be used to detect the independence of $n$ random variables and has a simple finite-sample representation in terms of distance matrices of the sample points, where distance is measured by a continuous negative definite function. Based on our theoretical results, we present a test for independence of multiple random vectors which is consistent against all alternatives. Total Operating Characteristic(TOC) The relative operating characteristic (ROC) is a popular statistical method to measure the association between observed and diagnosed presence of a characteristic. The diagnosis of presence or absence depends on whether the value of an index variable is above a threshold. ROC considers multiple possible thresholds. Each threshold generates a two-by-two contingency table, which contains four central entries: hits, misses, false alarms, and correct rejections. ROC reveals for each threshold only two ratios, hits/(hits + misses) and false alarms/(false alarms + correct rejections). This article introduces the total operating characteristic (TOC), which shows the total information in the contingency table for each threshold. TOC maintains desirable properties of ROC, while TOC reveals strictly more information than ROC in a manner that makes TOC more useful than ROC. TOC Total Unduplicated Reach and Frequency(TURF) TURF Analysis, an acronym for “Total Unduplicated Reach and Frequency”, is a type of statistical analysis used for providing estimates of media or market potential and devising optimal communication and placement strategies given limited resources. TURF analysis identifies the number of users reached by a communication, and how often they are reached. Although originally used by media schedulers to maximize reach and frequency of media spending across different items (print, broadcast, etc.), TURF is also now used to provide estimates of market potential. For example, if a company plans to market a new yogurt, they may consider launching ten possible flavors, but in reality, only three might be purchased in large quantities. The TURF algorithm identifies the optimal product line to maximize the total number of consumers who will purchase at least one SKU. Typically, when T.U.R.F. is undertaken for optimizing a product range, the analysis only looks at the reach of the product range (ignoring the Frequency component of TURF). turfR Totally-Looks-Like(TTL) Perceptual judgment of image similarity by humans relies on a rich internal representations ranging from low-level features to high-level concepts, scene properties and even cultural associations. Existing methods and datasets attempting to explain perceived similarity use stimuli which arguably do not cover the full breadth of factors that affect human similarity judgments, even those geared toward this goal. We introduce a new dataset dubbed \textbf{Totally-Looks-Like} (TTL) after a popular entertainment website, which contains images paired by humans as being visually similar. The dataset contains 6016 image-pairs from the wild, shedding light upon a rich and diverse set of criteria employed by human beings. We conduct experiments to try to reproduce the pairings via features extracted from state-of-the-art deep convolutional neural networks, as well as additional human experiments to verify the consistency of the collected data. Though we create conditions to artificially make the matching task increasingly easier, we show that machine-extracted representations perform very poorly in terms of reproducing the matching selected by humans. We discuss and analyze these results, suggesting future directions for improvement of learned image representations. Touchard Model We present a novel model, which is a two-parameter extension of the Poisson distribution. Its normalizing constant is related to the Touchard polynomials, hence the name of this model. It is a flexible distribution that can account for both under- or overdispersion and concentration of zeros that are frequently found in non-Poisson count data. In contrast to some other generalizations, the Hessian matrix for maximum likelihood estimation of the Touchard parameters has a simple form. We exemplify with three data sets, showing that our suggested model is a competitive candidate for fitting non-Poisson counts. touchard ToyArchitecture Research in Artificial Intelligence (AI) has focused mostly on two extremes: either on small improvements in narrow AI domains, or on universal theoretical frameworks which are usually uncomputable, incompatible with theories of biological intelligence, or lack practical implementations. The goal of this work is to combine the main advantages of the two: to follow a big picture view, while providing a particular theory and its implementation. In contrast with purely theoretical approaches, the resulting architecture should be usable in realistic settings, but also form the core of a framework containing all the basic mechanisms, into which it should be easier to integrate additional required functionality. In this paper, we present a novel, purposely simple, and interpretable hierarchical architecture which combines multiple different mechanisms into one system: unsupervised learning of a model of the world, learning the influence of one’s own actions on the world, model-based reinforcement learning, hierarchical planning and plan execution, and symbolic/sub-symbolic integration in general. The learned model is stored in the form of hierarchical representations with the following properties: 1) they are increasingly more abstract, but can retain details when needed, and 2) they are easy to manipulate in their local and symbolic-like form, thus also allowing one to observe the learning process at each level of abstraction. On all levels of the system, the representation of the data can be interpreted in both a symbolic and a sub-symbolic manner. This enables the architecture to learn efficiently using sub-symbolic methods and to employ symbolic inference. Toybox Deep convolutional neural networks (CNNs) have enjoyed tremendous success in computer vision in the past several years, particularly for visual object recognition.However, how CNNs work remains poorly understood, and the training of deep CNNs is still considered more art than science. To better characterize deep CNNs and the training process, we introduce a new video dataset called Toybox. Images in Toybox come from first-person, wearable camera recordings of common household objects and toys being manually manipulated to undergo structured transformations like rotations and translations. We also present results from initial experiments using deep CNNs that begin to examine how different distributions of training data can affect visual object recognition performance, and how visual object concepts are represented within a trained network. Toyplot Toyplot, the kid-sized plotting toolkit for Python with grownup-sized goals: · Develop beautiful interactive, animated plots that embrace the unique capabilities of electronic publishing and support repoducibility. · Create the best possible data graphics ‘out-of-the-box’, maximizing data ink and minimizing chartjunk. · Provide a clean, minimalist interface that scientists and engineers will love. The Toyplot Tutorial T-PRISM We introduce a new logic programming language T-PRISM based on tensor embeddings. Our embedding scheme is a modification of the distribution semantics in PRISM, one of the state-of-the-art probabilistic logic programming languages, by replacing distribution functions with multidimensional arrays, i.e., tensors. T-PRISM consists of two parts: logic programming part and numerical computation part. The former provides flexible and interpretable modeling at the level of first order logic, and the latter part provides scalable computation utilizing parallelization and hardware acceleration with GPUs. Combing these two parts provides a remarkably wide range of high-level declarative modeling from symbolic reasoning to deep learning. To embody this programming language, we also introduce a new semantics, termed tensorized semantics, which combines the traditional least model semantics in logic programming with the embeddings of tensors. In T-PRISM, we first derive a set of equations related to tensors from a given program using logical inference, i.e., Prolog execution in a symbolic space and then solve the derived equations in a continuous space by TensorFlow. Using our preliminary implementation of T-PRISM, we have successfully dealt with a wide range of modeling. We have succeeded in dealing with real large-scale data in the declarative modeling. This paper presents a DistMult model for knowledge graphs using the FB15k and WN18 datasets. t-product Kilmer and Martin [Linear Algebra Appl., 435 (2011), pp. 641–658] Trace Lasso-L1 Graph Cut(TL-L1GC) This work proposes an adaptive trace lasso regularized L1-norm based graph cut method for dimensionality reduction of Hyperspectral images, called as Trace Lasso-L1 Graph Cut’ (TL-L1GC). The underlying idea of this method is to generate the optimal projection matrix by considering both the sparsity as well as the correlation of the data samples. The conventional L2-norm used in the objective function is sensitive to noise and outliers. Therefore, in this work L1-norm is utilized as a robust alternative to L2-norm. Besides, for further improvement of the results, we use a penalty function of trace lasso with the L1GC method. It adaptively balances the L2-norm and L1-norm simultaneously by considering the data correlation along with the sparsity. We obtain the optimal projection matrix by maximizing the ratio of between-class dispersion to within-class dispersion using L1-norm with trace lasso as the penalty. Furthermore, an iterative procedure for this TL-L1GC method is proposed to solve the optimization function. The effectiveness of this proposed method is evaluated on two benchmark HSI datasets. Trainable Time Warping(TTW) DTW calculates the similarity or alignment between two signals, subject to temporal warping. However, its computational complexity grows exponentially with the number of time-series. Although there have been algorithms developed that are linear in the number of time-series, they are generally quadratic in time-series length. The exception is generalized time warping (GTW), which has linear computational cost. Yet, it can only identify simple time warping functions. There is a need for a new fast, high-quality multisequence alignment algorithm. We introduce trainable time warping (TTW), whose complexity is linear in both the number and the length of time-series. TTW performs alignment in the continuous-time domain using a sinc convolutional kernel and a gradient-based optimization technique. We compare TTW and GTW on 85 UCR datasets in time-series averaging and classification. TTW outperforms GTW on 67.1% of the datasets for the averaging tasks, and 61.2% of the datasets for the classification tasks. Trained Rank Pruning The performance of Deep Neural Networks (DNNs) keeps elevating in recent years with increasing network depth and width. To enable DNNs on edge devices like mobile phones, researchers proposed several network compression methods including pruning, quantization and factorization. Among the factorization-based approaches, low-rank approximation has been widely adopted because of its solid theoretical rationale and efficient implementations. Several previous works attempted to directly approximate a pre-trained model by low-rank decomposition; however, small approximation errors in parameters can ripple a large prediction loss. As a result, performance usually drops significantly and a sophisticated fine-tuning is required to recover accuracy. We argue that it is not optimal to separate low-rank approximation from training. Unlike previous works, this paper integrates low rank approximation and regularization into the training. We propose Trained Rank Pruning (TRP), which iterates low rank approximation and training. TRP maintains the capacity of original network while imposes low-rank constraints during training. A stochastic sub-gradient descent optimized nuclear regularization is utilized to further encourage low rank in TRP. The TRP trained network has low-rank structure in nature, and can be approximated with negligible performance loss, eliminating fine-tuning after low rank approximation. The methods are comprehensively evaluated on CIFAR-10 and ImageNet, outperforming previous compression methods using low rank approximation. Training Set A training set is a set of data used in various areas of information science to discover potentially predictive relationships. Training sets are used in artificial intelligence, machine learning, genetic programming, intelligent systems, and statistics. In all these fields, a training set has much the same role and is often used in conjunction with a test set. Training, Validation, Test Divide the data set into three parts: · Training, Validation, Test (e.g. 50, 25, 25) · Fit model on the TRAINING set · Select model using VALIDATION set · Assess prediction error using TEST set Train-less Accuracy Predictor for Architecture Search(TAPAS) In recent years an increasing number of researchers and practitioners have been suggesting algorithms for large-scale neural network architecture search: genetic algorithms, reinforcement learning, learning curve extrapolation, and accuracy predictors. None of them, however, demonstrated high-performance without training new experiments in the presence of unseen datasets. We propose a new deep neural network accuracy predictor, that estimates in fractions of a second classification performance for unseen input datasets, without training. In contrast to previously proposed approaches, our prediction is not only calibrated on the topological network information, but also on the characterization of the dataset-difficulty which allows us to re-tune the prediction without any training. Our predictor achieves a performance which exceeds 100 networks per second on a single GPU, thus creating the opportunity to perform large-scale architecture search within a few minutes. We present results of two searches performed in 400 seconds on a single GPU. Our best discovered networks reach 93.67% accuracy for CIFAR-10 and 81.01% for CIFAR-100, verified by training. These networks are performance competitive with other automatically discovered state-of-the-art networks however we only needed a small fraction of the time to solution and computational resources. Introducing TAPAS Traj-clusiVAT-based TP Trajectory prediction (TP) is of great importance for a wide range of location-based applications in intelligent transport systems such as location-based advertising, route planning, traffic management, and early warning systems. In the last few years, the widespread use of GPS navigation systems and wireless communication technology enabled vehicles has resulted in huge volumes of trajectory data. The task of utilizing this data employing spatio-temporal techniques for trajectory prediction in an efficient and accurate manner is an ongoing research problem. Existing TP approaches are limited to short-term predictions. Moreover, they cannot handle a large volume of trajectory data for long-term prediction. To address these limitations, we propose a scalable clustering and Markov chain based hybrid framework, called Traj-clusiVAT-based TP, for both short-term and long-term trajectory prediction, which can handle a large number of overlapping trajectories in a dense road network. In addition, Traj-clusiVAT can also determine the number of clusters, which represent different movement behaviours in input trajectory data. In our experiments, we compare our proposed approach with a mixed Markov model (MMM)-based scheme, and a trajectory clustering, NETSCAN-based TP method for both short- and long-term trajectory predictions. We performed our experiments on two real, vehicle trajectory datasets, including a large-scale trajectory dataset consisting of 3.28 million trajectories obtained from 15,061 taxis in Singapore over a period of one month. Experimental results on two real trajectory datasets show that our proposed approach outperforms the existing approaches in terms of both short- and long-term prediction performances, based on prediction accuracy and distance error (in km). Trajectory Analysis traj Trajectory Mining Predicting transportation modes from GPS (Global Positioning System) records is a hot topic in the trajectory mining domain. Each GPS record is called a trajectory point and a trajectory is a sequence of these points. Trajectory mining has applications including but not limited to transportation mode detection, tourism, traffic congestion, smart cities management, animal behaviour analysis, environmental preservation, and traffic dynamics are some of the trajectory mining applications. Transportation modes prediction as one of the tasks in human mobility and vehicle mobility applications plays an important role in resource allocation, traffic management systems, tourism planning and accident detection. TRAJEDI The vast increase in our ability to obtain and store trajectory data necessitates trajectory analytics techniques to extract useful information from this data. Pair-wise distance functions are a foundation building block for common operations on trajectory datasets including constrained SELECT queries, k-nearest neighbors, and similarity and diversity algorithms. The accuracy and performance of these operations depend heavily on the speed and accuracy of the underlying trajectory distance function, which is in turn affected by trajectory calibration. Current methods either require calibrated data, or perform calibration of the entire relevant dataset first, which is expensive and time consuming for large datasets. We present TRAJEDI, a calibrationaware pair-wise distance calculation scheme that outperforms naive approaches while preserving accuracy. We also provide analyses of parameter tuning to trade-off between speed and accuracy. Our scheme is usable with any diversity, similarity or k-nearest neighbor algorithm. TransATT Attribute acquisition for classes is a key step in ontology construction, which is often achieved by community members manually. This paper investigates an attention-based automatic paradigm called TransATT for attribute acquisition, by learning the representation of hierarchical classes and attributes in Chinese ontology. The attributes of an entity can be acquired by merely inspecting its classes, because the entity can be regard as the instance of its classes and inherit their attributes. For explicitly describing of the class of an entity unambiguously, we propose class-path to represent the hierarchical classes in ontology, instead of the terminal class word of the hypernym-hyponym relation (i.e., is-a relation) based hierarchy. The high performance of TransATT on attribute acquisition indicates the promising ability of the learned representation of class-paths and attributes. Moreover, we construct a dataset named \textbf{BigCilin11k}. To the best of our knowledge, this is the first Chinese dataset with abundant hierarchical classes and entities with attributes. TransC Concepts, which represent a group of different instances sharing common properties, are essential information in knowledge representation. Most conventional knowledge embedding methods encode both entities (concepts and instances) and relations as vectors in a low dimensional semantic space equally, ignoring the difference between concepts and instances. In this paper, we propose a novel knowledge graph embedding model named TransC by differentiating concepts and instances. Specifically, TransC encodes each concept in knowledge graph as a sphere and each instance as a vector in the same semantic space. We use the relative positions to model the relations between concepts and instances (i.e., instanceOf), and the relations between concepts and sub-concepts (i.e., subClassOf). We evaluate our model on both link prediction and triple classification tasks on the dataset based on YAGO. Experimental results show that TransC outperforms state-of-the-art methods, and captures the semantic transitivity for instanceOf and subClassOf relation. Our codes and datasets can be obtained from h…/ github.com/davidlvxin/TransC. Trans-DLR The goal of knowledge representation learning is to embed entities and relations into a low-dimensional, continuous vector space. How to push a model to its limit and obtain better results is of great significance in knowledge graph’s applications. We propose a simple and elegant method, Trans-DLR, whose main idea is dynamic learning rate control during training. Our method achieves remarkable improvement, compared with recent GAN-based method. Moreover, we introduce a new negative sampling trick which corrupts not only entities, but also relations, in different probabilities. We also develop an efficient way, which fully utilizes multiprocessing and parallel computing, to speed up evaluation of the model in link prediction tasks. Experiments show that our method is effective. Transducer We allow database user to script a parallel relational database engine with a procedural language. Procedural language code is executed as a user defined relational query operator called transducer. Transducer is tightly integrated with relation engine, including query optimizer, query executor and can be executed in parallel like other query operators. With transducer, we can efficiently execute queries that are very difficult to express in SQL. As example, we show how to run time series and graph queries, etc, within a parallel relational database. Transduction In logic, statistical inference, and supervised learning, transduction or transductive inference is reasoning from observed, specific (training) cases to specific (test) cases. In contrast, induction is reasoning from observed training cases to general rules, which are then applied to the test cases. The distinction is most interesting in cases where the predictions of the transductive model are not achievable by any inductive model. Note that this is caused by transductive inference on different test sets producing mutually inconsistent predictions. Transductive Adversarial Network(TAN) Transductive Adversarial Networks (TAN) is a novel domain-adaptation machine learning framework that is designed for learning a conditional probability distribution on unlabelled input data in a target domain, while also only having access to: (1) easily obtained labelled data from a related source domain, which may have a different conditional probability distribution than the target domain, and (2) a marginalised prior distribution on the labels for the target domain. TAN leverages a fully adversarial training procedure and a unique generator/encoder architecture which approximates the transductive combination of the available source- and target-domain data. A benefit of TAN is that it allows the distance between the source- and target-domain label-vector marginal probability distributions to be greater than 0 (i.e. different tasks across the source and target domains) whereas other domain-adaptation algorithms require this distance to equal 0 (i.e. a single task across the source and target domains). TAN can, however, still handle the latter case and is a more generalised approach to this case. Another benefit of TAN is that due to being a fully adversarial algorithm, it has the potential to accurately approximate highly complex distributions. Theoretical analysis demonstrates the viability of the TAN framework. Transductive Boltzmann Machine(TBM) We present transductive Boltzmann machines (TBMs), which firstly achieve transductive learning of the Gibbs distribution. While exact learning of the Gibbs distribution is impossible by the family of existing Boltzmann machines due to combinatorial explosion of the sample space, TBMs overcome the problem by adaptively constructing the minimum required sample space from data to avoid unnecessary generalization. We theoretically provide bias-variance decomposition of the KL divergence in TBMs to analyze its learnability, and empirically demonstrate that TBMs are superior to the fully visible Boltzmann machines and popularly used restricted Boltzmann machines in terms of efficiency and effectiveness. Transductive Conformal Prediction(TCP) The conformalClassification package implements Transductive Conformal Prediction (TCP) and Inductive Conformal Prediction (ICP) for classification problems. Conformal Prediction (CP) is a framework that complements the predictions of machine learning algorithms with reliable measures of confidence. TCP gives results with higher validity than ICP, however ICP is computationally faster than TCP. The package conformalClassification is built upon the random forest method, where votes of the random forest for each class are considered as the conformity scores for each data point. Although the main aim of the conformalClassification package is to generate CP errors (p-values) for classification problems, the package also implements various diagnostic measures such as deviation from validity, error rate, efficiency, observed fuzziness and calibration plots. In future releases, we plan to extend the package to use other machine learning algorithms, (e.g. support vector machines) for model fitting. Transductive Propagation Network(TPN) Few-shot learning aims to build a learner that quickly generalizes to novel classes even when a limited number of labeled examples (so-called low-data problem) are available. Meta-learning is commonly deployed to mimic the test environment in a training phase for good generalization, where episodes (i.e., learning problems) are manually constructed from the training set. This framework gains a lot of attention to few-shot learning with impressive performance, though the low-data problem is not fully addressed. In this paper, we propose Transductive Propagation Network (TPN), a transductive method that classifies the entire test set at once to alleviate the low-data problem. Specifically, our proposed network explicitly learns an underlying manifold space that is appropriate to propagate labels from few-shot examples, where all parameters of feature embedding, manifold structure, and label propagation are estimated in an end-to-end way on episodes. We evaluate the proposed method on the commonly used miniImageNet and tieredImageNet benchmarks and achieve the state-of-the-art or promising results on these datasets. Transfer Adaptation Learning Transfer Adaptation Learning: A Decade Survey Transfer Automatic Machine Learning Building effective neural networks requires many design choices. These include the network topology, optimization procedure, regularization, stability methods, and choice of pre-trained parameters. This design is time consuming and requires expert input. Automatic Machine Learning aims automate this process using hyperparameter optimization. However, automatic model building frameworks optimize performance on each task independently, whereas human experts leverage prior knowledge when designing a new network. We propose Transfer Automatic Machine Learning, a method to accelerate network design using knowledge of prior tasks. For this, we build upon reinforcement learning architecture design methods to support parallel training on multiple tasks and transfer the search strategy to new tasks. Tested on NLP and Image classification tasks, Transfer Automatic Machine Learning reduces convergence time over single-task methods by almost an order of magnitude on 13 out of 14 tasks. It achieves better test set accuracy on 10 out of 13 tasks NLP tasks and improves performance on CIFAR-10 image recognition from 95.3% to 97.1%. Transfer Channel Pruning(TCP) Deep unsupervised domain adaptation (UDA) has recently received increasing attention from researchers. However, existing methods are computationally intensive due to the computation cost of Convolutional Neural Networks (CNN) adopted by most work. To date, there is no effective network compression method for accelerating these models. In this paper, we propose a unified Transfer Channel Pruning (TCP) approach for accelerating UDA models. TCP is capable of compressing the deep UDA model by pruning less important channels while simultaneously learning transferable features by reducing the cross-domain distribution divergence. Therefore, it reduces the impact of negative transfer and maintains competitive performance on the target task. To the best of our knowledge, TCP is the first approach that aims at accelerating deep UDA models. TCP is validated on two benchmark datasets-Office-31 and ImageCLEF-DA with two common backbone networks-VGG16 and ResNet50. Experimental results demonstrate that TCP achieves comparable or better classification accuracy than other comparison methods while significantly reducing the computational cost. To be more specific, in VGG16, we get even higher accuracy after pruning 26% floating point operations (FLOPs); in ResNet50, we also get higher accuracy on half of the tasks after pruning 12% FLOPs. We hope that TCP will open a new door for future research on accelerating transfer learning models. Transfer Entropy Transfer entropy is a non-parametric statistic measuring the amount of directed (time-asymmetric) transfer of information between two random processes. Transfer entropy from a process X to another process Y is the amount of uncertainty reduced in future values of Y by knowing the past values of X given past values of Y. Transfer entropy reduces to Granger causality for vector auto-regressive processes. Hence, it is advantageous when the model assumption of Granger causality doesn’t hold, for example, analysis of non-linear signals. However, it usually requires more samples for accurate estimation. The probabilities in the entropy formula can be estimated using different approaches (binning, nearest neighbors) or, in order to reduce complexity, using a non-uniform embedding. While it was originally defined for bivariate analysis, transfer entropy has been extended to multivariate forms, either conditioning on other potential source variables or considering transfer from a collection of sources, although these forms require more samples again. Transfer Feature Generating Networks With Semantic Classes Structure(TFGNSCS) Suffering from the generating feature inconsistence of seen classes training model for following the distribution of unseen classes , most of existing feature generating networks difficultly obtain satisfactory performance for the challenging generalization zero-shot learning (GZSL) by adversarial learning the distribution of semantic classes. To alleviate the negative influence of this inconsistence for zero-shot learning (ZSL), transfer feature generating networks with semantic classes structure (TFGNSCS) is proposed to construct networks model for improving the performance of ZSL and GZSL. TFGNSCS can not only consider the semantic structure relationship between seen and unseen classes but also learn the difference of generating features by balancing transfer information between seen and unseen classes in networks. The proposed method can integrate a Wasserstein generative adversarial network with classification loss and transfer loss to generate enough CNN feature, on which softmax classifiers are trained for ZSL and GZSL. Experiments demonstrate that the performance of TFGNSCS outperforms that of the state of the arts on four challenging datasets, which are CUB,FLO,SUN, AWA in GZSL. Transfer Function Model Transfer function models describe the relationship between the inputs and outputs of a system using a ratio of polynomials. The model order is equal to the order of the denominator polynomial. The roots of the denominator polynomial are referred to as the model poles. The roots of the numerator polynomial are referred to as the model zeros. The parameters of a transfer function model are its poles, zeros and transport delays. Transfer Incremental Learning using Data Augmentation(TILDA) Deep learning-based methods have reached state of the art performances, relying on large quantity of available data and computational power. Such methods still remain highly inappropriate when facing a major open machine learning problem, which consists of learning incrementally new classes and examples over time. Combining the outstanding performances of Deep Neural Networks (DNNs) with the flexibility of incremental learning techniques is a promising venue of research. In this contribution, we introduce Transfer Incremental Learning using Data Augmentation (TILDA). TILDA is based on pre-trained DNNs as feature extractor, robust selection of feature vectors in subspaces using a nearest-class-mean based technique, majority votes and data augmentation at both the training and the prediction stages. Experiments on challenging vision datasets demonstrate the ability of the proposed method for low complexity incremental learning, while achieving significantly better accuracy than existing incremental counterparts. Transfer Learning Machine learning and data mining techniques have been used in numerous real-world applications. An assumption of traditional machine learning methodologies is the training data and testing data are taken from the same domain, such that the input feature space and data distribution characteristics are the same. However, in some real-world machine learning scenarios, this assumption does not hold. There are cases where training data is expensive or difficult to collect. Therefore, there is a need to create high-performance learners trained with more easily obtained data from different domains. This methodology is referred to as transfer learning. This survey paper formally defines transfer learning, presents information on current solutions, and reviews applications applied to transfer learning. Lastly, there is information listed on software downloads for various transfer learning solutions and a discussion of possible future research work. The transfer learning solutions surveyed are independent of data size and can be applied to big data environments. Recycling Deep Learning Models with Transfer Learning Transfer Metric Learning(TML) Distance metric learning (DML) aims to find an appropriate way to reveal the underlying data relationship. It is critical in many machine learning, pattern recognition and data mining algorithms, and usually require large amount of label information (class labels or pair/triplet constraints) to achieve satisfactory performance. However, the label information may be insufficient in real-world applications due to the high-labeling cost, and DML may fail in this case. Transfer metric learning (TML) is able to mitigate this issue for DML in the domain of interest (target domain) by leveraging knowledge/information from other related domains (source domains). Although achieved a certain level of development, TML has limited success in various aspects such as selective transfer, theoretical understanding, handling complex data, big data and extreme cases. In this survey, we present a systematic review of the TML literature. In particular, we group TML into different categories according to different settings and metric transfer strategies, such as direct metric approximation, subspace approximation, distance approximation, and distribution approximation. A summarization and insightful discussion of the various TML approaches and their applications will be presented. Finally, we provide some challenges and possible future directions. Transferable BERT(TransBERT) Recent advances, such as GPT and BERT, have shown success in incorporating a pre-trained transformer language model and fine-tuning operation to improve downstream NLP systems. However, this framework still has some fundamental problems in effectively incorporating supervised knowledge from other related tasks. In this study, we investigate a transferable BERT (TransBERT) training framework, which can transfer not only general language knowledge from large-scale unlabeled data but also specific kinds of knowledge from various semantically related supervised tasks, for a target task. Particularly, we propose utilizing three kinds of transfer tasks, including natural language inference, sentiment classification, and next action prediction, to further train BERT based on a pre-trained model. This enables the model to get a better initialization for the target task. We take story ending prediction as the target task to conduct experiments. The final result, an accuracy of 91.8%, dramatically outperforms previous state-of-the-art baseline methods. Several comparative experiments give some helpful suggestions on how to select transfer tasks. Error analysis shows what are the strength and weakness of BERT-based models for story ending prediction. Transferable Dialogue State Generator(TRADE) In this thesis, we leverage the neural copy mechanism and memory-augmented neural networks (MANNs) to address existing challenge of neural task-oriented dialogue learning. We show the effectiveness of our strategy by achieving good performance in multi-domain dialogue state tracking, retrieval-based dialogue systems, and generation-based dialogue systems. We first propose a transferable dialogue state generator (TRADE) that leverages its copy mechanism to get rid of dialogue ontology and share knowledge between domains. We also evaluate unseen domain dialogue state tracking and show that TRADE enables zero-shot dialogue state tracking and can adapt to new few-shot domains without forgetting the previous domains. Second, we utilize MANNs to improve retrieval-based dialogue learning. They are able to capture dialogue sequential dependencies and memorize long-term information. We also propose a recorded delexicalization copy strategy to replace real entity values with ordered entity types. Our models are shown to surpass other retrieval baselines, especially when the conversation has a large number of turns. Lastly, we tackle generation-based dialogue learning with two proposed models, the memory-to-sequence (Mem2Seq) and global-to-local memory pointer network (GLMP). Mem2Seq is the first model to combine multi-hop memory attention with the idea of the copy mechanism. GLMP further introduces the concept of response sketching and double pointers copying. We show that GLMP achieves the state-of-the-art performance on human evaluation. Transferable Joint Attribute-Identity Deep Learning(TJ-AID) Most existing person re-identification (re-id) methods require supervised model learning from a separate large set of pairwise labelled training data for every single camera pair. This significantly limits their scalability and usability in real-world large scale deployments with the need for performing re-id across many camera views. To address this scalability problem, we develop a novel deep learning method for transferring the labelled information of an existing dataset to a new unseen (unlabelled) target domain for person re-id without any supervised learning in the target domain. Specifically, we introduce an Transferable Joint Attribute-Identity Deep Learning (TJ-AIDL) for simultaneously learning an attribute-semantic and identitydiscriminative feature representation space transferrable to any new (unseen) target domain for re-id tasks without the need for collecting new labelled training data from the target domain (i.e. unsupervised learning in the target domain). Extensive comparative evaluations validate the superiority of this new TJ-AIDL model for unsupervised person re-id over a wide range of state-of-the-art methods on four challenging benchmarks including VIPeR, PRID, Market-1501, and DukeMTMC-ReID. Transferlearning Oriented Minority Over-Sampling Technique(TOMO) Cross-project defect prediction (CPDP) aims to predict defects of projects lacking training data by using prediction models trained on historical defect data from other projects. However, since the distribution differences between datasets from different projects, it is still a challenge to build high-quality CPDP models. Unfortunately, class imbalanced nature of software defect datasets further increases the difficulty. In this paper, we propose a transferlearning oriented minority over-sampling technique (TOMO) based feature weighting transfer naive Bayes (FWTNB) approach (TOMOFWTNB) for CPDP by considering both classimbalance and feature importance problems. Differing from traditional over-sampling techniques, TOMO not only can balance the data but reduce the distribution difference. And then FWTNB is used to further increase the similarity of two distributions. Experiments are performed on 11 public defect datasets. The experimental results show that (1) TOMO improves the average G-Measure by 23.7\%$\sim$41.8\%, and the average MCC by 54.2\%$\sim$77.8\%. (2) feature weighting (FW) strategy improves the average G-Measure by 11\%, and the average MCC by 29.2\%. (3) TOMOFWTNB improves the average G-Measure value by at least 27.8\%, and the average MCC value by at least 71.5\%, compared with existing state-of-theart CPDP approaches. It can be concluded that (1) TOMO is very effective for addressing class-imbalance problem in CPDP scenario; (2) our FW strategy is helpful for CPDP; (3) TOMOFWTNB outperforms previous state-of-the-art CPDP approaches. Transferlearning Oriented Minority Over-Sampling Technique Based Feature Weighting Transfer Naive Bayes(TOMOFWTNB) Cross-project defect prediction (CPDP) aims to predict defects of projects lacking training data by using prediction models trained on historical defect data from other projects. However, since the distribution differences between datasets from different projects, it is still a challenge to build high-quality CPDP models. Unfortunately, class imbalanced nature of software defect datasets further increases the difficulty. In this paper, we propose a transferlearning oriented minority over-sampling technique (TOMO) based feature weighting transfer naive Bayes (FWTNB) approach (TOMOFWTNB) for CPDP by considering both classimbalance and feature importance problems. Differing from traditional over-sampling techniques, TOMO not only can balance the data but reduce the distribution difference. And then FWTNB is used to further increase the similarity of two distributions. Experiments are performed on 11 public defect datasets. The experimental results show that (1) TOMO improves the average G-Measure by 23.7\%$\sim$41.8\%, and the average MCC by 54.2\%$\sim$77.8\%. (2) feature weighting (FW) strategy improves the average G-Measure by 11\%, and the average MCC by 29.2\%. (3) TOMOFWTNB improves the average G-Measure value by at least 27.8\%, and the average MCC value by at least 71.5\%, compared with existing state-of-theart CPDP approaches. It can be concluded that (1) TOMO is very effective for addressing class-imbalance problem in CPDP scenario; (2) our FW strategy is helpful for CPDP; (3) TOMOFWTNB outperforms previous state-of-the-art CPDP approaches. Transferred Single and Couple Representation Learning Network(TSCN) Group re-identification (G-ReID) is an important yet less-studied task. Its challenges not only lie in appearance changes of individuals which have been well-investigated in general person re-identification (ReID), but also derive from group layout and membership changes. So the key task of G-ReID is to learn representations robust to such changes. To address this issue, we propose a Transferred Single and Couple Representation Learning Network (TSCN). Its merits are two aspects: 1) Due to the lack of labelled training samples, existing G-ReID methods mainly rely on unsatisfactory hand-crafted features. To gain the superiority of deep learning models, we treat a group as multiple persons and transfer the domain of a labeled ReID dataset to a G-ReID target dataset style to learn single representations. 2) Taking into account the neighborhood relationship in a group, we further propose learning a novel couple representation between two group members, that achieves more discriminative power in G-ReID tasks. In addition, an unsupervised weight learning method is exploited to adaptively fuse the results of different views together according to result patterns. Extensive experimental results demonstrate the effectiveness of our approach that significantly outperforms state-of-the-art methods by 11.7\% CMC-1 on the Road Group dataset and by 39.0\% CMC-1 on the DukeMCMT dataset. TransferTransfo We introduce a new approach to generative data-driven dialogue systems (e.g. chatbots) called TransferTransfo which is a combination of a Transfer learning based training scheme and a high-capacity Transformer model. Fine-tuning is performed by using a multi-task objective which combines several unsupervised prediction tasks. The resulting fine-tuned model shows strong improvements over the current state-of-the-art end-to-end conversational models like memory augmented seq2seq and information-retrieval models. On the privately held PERSONA-CHAT dataset of the Conversational Intelligence Challenge 2, this approach obtains a new state-of-the-art, with respective perplexity, Hits@1 and F1 metrics of 16.28 (45 % absolute improvement), 80.7 (46 % absolute improvement) and 19.5 (20 % absolute improvement). Transfinite Mean We define a generalization of the arithmetic mean to bounded well-ordered sequences of real numbers. We show that every probability space admits a well-ordered sequences of points such that the measure of each measurable subset is equal to the frequency with which the sequence is in this subset. We include an argument suggested by Woodin that the club filter on $\omega_1$ does not admit such a sequence of order type $\omega_1$. Transformable Bottleneck Network(TBN) We propose a novel approach to performing fine-grained 3D manipulation of image content via a convolutional neural network, which we call the Transformable Bottleneck Network (TBN). It applies given spatial transformations directly to a volumetric bottleneck within our encoder-bottleneck-decoder architecture. Multi-view supervision encourages the network to learn to spatially disentangle the feature space within the bottleneck. The resulting spatial structure can be manipulated with arbitrary spatial transformations. We demonstrate the efficacy of TBNs for novel view synthesis, achieving state-of-the-art results on a challenging benchmark. We demonstrate that the bottlenecks produced by networks trained for this task contain meaningful spatial structure that allows us to intuitively perform a variety of image manipulations in 3D, well beyond the rigid transformations seen during training. These manipulations include non-uniform scaling, non-rigid warping, and combining content from different images. Finally, we extract explicit 3D structure from the bottleneck, performing impressive 3D reconstruction from a single input image. Transformation Autoregressive Network The fundamental task of general density estimation has been of keen interest to machine learning. Recent advances in density estimation have either: a) proposed a flexible model to estimate the conditional factors of the chain rule, $p(x_{i}\, |\, x_{i-1}, \ldots)$; or b) used flexible, non-linear transformations of variables of a simple base distribution. Instead, this work jointly leverages transformations of variables and autoregressive conditional models, and proposes novel methods for both. We provide a deeper understanding of our methods, showing a considerable improvement through a comprehensive study over both real world and synthetic data. Moreover, we illustrate the use of our models in outlier detection and image modeling tasks. Transformation Forests Regression models for supervised learning problems with a continuous target are commonly understood as models for the conditional mean of the target given predictors. This notion is simple and therefore appealing for interpretation and visualisation. Information about the whole underlying conditional distribution is, however, not available from these models. A more general understanding of regression models as models for conditional distributions allows much broader inference from such models, for example the computation of prediction intervals. Several random forest-type algorithms aim at estimating conditional distributions, most prominently quantile regression forests (Meinshausen, 2006, JMLR). We propose a novel approach based on a parametric family of distributions characterised by their transformation function. A dedicated novel ‘transformation tree’ algorithm able to detect distributional changes is developed. Based on these transformation trees, we introduce ‘transformation forests’ as an adaptive local likelihood estimator of conditional distribution functions. The resulting models are fully parametric yet very general and allow broad inference procedures, such as the model-based bootstrap, to be applied in a straightforward way. trtf Transformation Invariant Graph-Based Network(TIGraNet) Learning transformation invariant representations of visual data is an important problem in computer vision. Deep convolutional networks have demonstrated remarkable results for image and video classification tasks. However, they have achieved only limited success in the classification of images that undergo geometric transformations. In this work we present a novel Transformation Invariant Graph-based Network (TIGraNet), which learns graph-based features that are inherently invariant to isometric transformations such as rotation and translation of input images. In particular, images are represented as signals on graphs, which permits to replace classical convolution and pooling layers in deep networks with graph spectral convolution and dynamic graph pooling layers that together contribute to invariance to isometric transformation. Our experiments show high performance on rotated and translated images from the test set compared to classical architectures that are very sensitive to transformations in the data. The inherent invariance properties of our framework provide key advantages, such as increased resiliency to data variability and sustained performance with limited training sets. Our code is available online. Transformation-Equivariant Representation(TER) AVT: Unsupervised Learning of Transformation Equivariant Representations by Autoencoding Variational Transformations Transformative Knowledge Discovery Big data analytics provides an interdisciplinary framework that is essential to support the current trend for solving real-world problems collaboratively. The progression of big data analytics framework must be clearly understood so that novel approaches can be developed to advance this state-of-the-art discipline. An ignorance of observing the progression of this fast-growing discipline may lead to duplications in research and waste of efforts. Its main companion field, machine learning, helps solve many big data analytics problems; therefore, it is also important to understand the progression of machine learning in the big data analytics framework. One of the current research efforts in big data analytics is the integration of deep learning and Bayesian optimization, which can help the automatic initialization and optimization of hyperparameters of deep learning and enhance the implementation of iterative algorithms in software. The hyperparameters include the weights used in deep learning, and the number of clusters in Bayesian mixture models that characterize data heterogeneity. The big data analytics research also requires computer systems and software that are capable of storing, retrieving, processing, and analyzing big data that are generally large, complex, heterogeneous, unstructured, unpredictable, and exposed to scalability problems. Therefore, it is appropriate to introduce a new research topic – transformative knowledge discovery – that provides a research ground to study and develop smart machine learning models and algorithms that are automatic, adaptive, and cognitive to address big data analytics problems and challenges. The new research domain will also create research opportunities to work on this interdisciplinary research space and develop solutions to support research in other disciplines that may not have expertise in the research area of big data analytics. For example, the research, such as detection and characterization of retinal diseases in medical sciences and the classification of highly interacting species in environmental sciences can benefit from the knowledge and expertise in big data analytics. Transformative Machine Learning The key to success in machine learning (ML) is the use of effective data representations. Traditionally, data representations were hand-crafted. Recently it has been demonstrated that, given sufficient data, deep neural networks can learn effective implicit representations from simple input representations. However, for most scientific problems, the use of deep learning is not appropriate as the amount of available data is limited, and/or the output models must be explainable. Nevertheless, many scientific problems do have significant amounts of data available on related tasks, which makes them amenable to multi-task learning, i.e. learning many related problems simultaneously. Here we propose a novel and general representation learning approach for multi-task learning that works successfully with small amounts of data. The fundamental new idea is to transform an input intrinsic data representation (i.e., handcrafted features), to an extrinsic representation based on what a pre-trained set of models predict about the examples. This transformation has the dual advantages of producing significantly more accurate predictions, and providing explainable models. To demonstrate the utility of this transformative learning approach, we have applied it to three real-world scientific problems: drug-design (quantitative structure activity relationship learning), predicting human gene expression (across different tissue types and drug treatments), and meta-learning for machine learning (predicting which machine learning methods work best for a given problem). In all three problems, transformative machine learning significantly outperforms the best intrinsic representation. Transformed Generalized Autoregressive Moving Average(TGARMA) Transformed Generalized Autoregressive Moving Average (TGARMA) models were recently proposed to deal with non-additivity, non-normality and heteroscedasticity in real time series data. In this paper, a Bayesian approach is proposed for TGARMA models, thus extending the original model. We conducted a simulation study to investigate the performance of Bayesian estimation and Bayesian model selection criteria. In addition, a real dataset was analysed using the proposed approach. Transformer The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data. Transformer Network ➘ “Transformer-XL” Transformer-XL Transformer networks have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. As a solution, we propose a novel neural architecture, \textit{Transformer-XL}, that enables Transformer to learn dependency beyond a fixed length without disrupting temporal coherence. Concretely, it consists of a segment-level recurrence mechanism and a novel positional encoding scheme. Our method not only enables capturing longer-term dependency, but also resolves the problem of context fragmentation. As a result, Transformer-XL learns dependency that is about 80\% longer than RNNs and 450\% longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+ times faster than vanilla Transformer during evaluation. Additionally, we improve the state-of-the-art (SoTA) results of bpc/perplexity from 1.06 to 0.99 on enwiki8, from 1.13 to 1.08 on text8, from 20.5 to 18.3 on WikiText-103, from 23.7 to 21.8 on One Billion Word, and from 55.3 to 54.5 on Penn Treebank (without finetuning). Our code, pretrained models, and hyperparameters are available in both Tensorflow and PyTorch. Transition-Entropy Recent years have seen rising needs for location-based services in our everyday life. Aside from the many advantages provided by these services, they have caused serious concerns regarding the location privacy of users. An adversary such as an untrusted location-based server can monitor the queried locations by a user to infer critical information such as the user’s home address, health conditions, shopping habits, etc. To address this issue, dummy-based algorithms have been developed to increase the anonymity of users, and thus, protecting their privacy. Unfortunately, the existing algorithms only consider a limited amount of side information known by an adversary which may face more serious challenges in practice. In this paper, we incorporate a new type of side information based on consecutive location changes of users and propose a new metric called transition-entropy to investigate the location privacy preservation, followed by two algorithms to improve the transition-entropy for a given dummy generation algorithm. Then, we develop an attack model based on the Viterbi algorithm which can significantly threaten the location privacy of the users. Next, in order to protect the users from Viterbi attack, we propose an algorithm called robust dummy generation (RDG) which can resist against the Viterbi attack while maintaining a high performance in terms of the privacy metrics introduced in the paper. All the algorithms are applied and analyzed on a real-life dataset. Transitory Queueing Network(TQN) Queueing networks are notoriously difficult to analyze sans both Markovian and stationarity assumptions. Much of the theoretical contribution towards performance analysis of time-inhomogeneous single class queueing networks has focused on Markovian networks, with the recent exception of work in Liu and Whitt (2011) and Mandelbaum and Ramanan (2010). In this paper, we introduce transitory queueing networks as a model of inhomogeneous queueing networks, where a large, but finite, number of jobs arrive at queues in the network over a fixed time horizon. The queues offer FIFO service, and we assume that the service rate can be time-varying. The non-Markovian dynamics of this model complicate the analysis of network performance metrics, necessitating approximations. In this paper we develop fluid and diffusion approximations to the number-in-system performance metric by scaling up the number of external arrivals to each queue, following Honnappa et al. (2014). We also discuss the implications for bottleneck detection in tandem queueing networks. translate2R Many companies realizied the advantages of the open source programming language R. translate2R allows a fast and inexpensive migration to R. The manual migration of complex SPSS® scripts has always been tedious and error-prone, but with translate2R the task of translating by hand becomes a thing of the past. The automatic and comprehensible process of translating SPSS® code to R code with translate2R offers users an enormous number of new analytical opportunities. Besides the usual migration process translate2R allows programmers an easy start into programming with R. Make use of translate2R for the translation of scripts to R. We will be pleased to help you in terms of migration projects, or starting off with R. sjPlot,translateSPSS2R Translational Recommender Networks Representing relationships as translations in vector space lives at the heart of many neural embedding models such as word embeddings and knowledge graph embeddings. In this work, we study the connections of this translational principle with collaborative filtering algorithms. We propose Translational Recommender Networks (\textsc{TransRec}), a new attentive neural architecture that utilizes the translational principle to model the relationships between user and item pairs. Our model employs a neural attention mechanism over a \emph{Latent Relational Attentive Memory} (LRAM) module to learn the latent relations between user-item pairs that best explains the interaction. By exploiting adaptive user-item specific translations in vector space, our model also alleviates the geometric inflexibility problem of other metric learning algorithms while enabling greater modeling capability and fine-grained fitting of users and items in vector space. The proposed architecture not only demonstrates the state-of-the-art performance across multiple recommendation benchmarks but also boasts of improved interpretability. Qualitative studies over the LRAM module shows evidence that our proposed model is able to infer and encode explicit sentiment, temporal and attribute information despite being only trained on implicit feedback. As such, this ascertains the ability of \textsc{TransRec} to uncover hidden relational structure within implicit datasets. TransNets Recently, deep learning methods have been shown to improve the performance of recommender systems over traditional methods, especially when review text is available. For example, a recent model, DeepCoNN, uses neural nets to learn one latent representation for the text of all reviews written by a target user, and a second latent representation for the text of all reviews for a target item, and then combines these latent representations to obtain state-of-the-art performance on recommendation tasks. We show that (unsurprisingly) much of the predictive value of review text comes from reviews of the target user for the target item. We then introduce a way in which this information can be used in recommendation, even when the target user’s review for the target item is not available. Our model, called TransNets, extends the DeepCoNN model by introducing an additional latent layer representing the target user-target item pair. We then regularize this layer, at training time, to be similar to another latent representation of the target user’s review of the target item. We show that TransNets and extensions of it improve substantially over the previous state-of-the-art. Transportation Theory In mathematics and economics, transportation theory is a name given to the study of optimal transportation and allocation of resources. The problem was formalized by the French mathematician Gaspard Monge in 1781. In the 1920s A.N. Tolstoi was one of the first to study the transportation problem mathematically. In 1930, in the collection Transportation Planning Volume I for the National Commissariat of Transportation of the Soviet Union, he published a paper ‘Methods of Finding the Minimal Kilometrage in Cargo-transportation in space’. Major advances were made in the field during World War II by the Soviet/Russian mathematician and economist Leonid Kantorovich. Consequently, the problem as it is stated is sometimes known as the Monge-Kantorovich transportation problem. The linear programming formulation of the transportation problem is also known as the Hitchcock-Koopmans transportation problem. Trawl Process The model is based on a mixed moving average process driven by Levy noise – called a trawl process – where the serial correlation and the cross-sectional dependence are modelled independently of each other. Such processes can exhibit short or long memory. trawl T-RECS An action should remain identifiable when modifying its speed: consider the contrast between an expert chef and a novice chef each chopping an onion. Here, we expect the novice chef to have a relatively measured and slow approach to chopping when compared to the expert. In general, the speed at which actions are performed, whether slower or faster than average, should not dictate how they are recognized. We explore the erratic behavior caused by this phenomena on state-of-the-art deep network-based methods for action recognition in terms of maximum performance and stability in recognition accuracy across a range of input video speeds. By observing the trends in these metrics and summarizing them based on expected temporal behaviour w.r.t. variations in input video speeds, we find two distinct types of network architectures. In this paper, we propose a preprocessing method named T-RECS, as a way to extend deep-network-based methods for action recognition to explicitly account for speed variability in the data. We do so by adaptively resampling the inputs to a given model. T-RECS is agnostic to the specific deep-network model; we apply it to four state-of-the-art action recognition architectures, C3D, I3D, TSN, and ConvNet+LSTM. On HMDB51 and UCF101, T-RECS-based I3D models show a peak improvement of at least 2.9% in performance over the baseline while T-RECS-based C3D models achieve a maximum improvement in stability by 59% over the baseline, on the HMDB51 dataset. Tree Based Context Causal Rule Discovery(TCC) With the increasing need of personalised decision making, such as personalised medicine and online recommendations, a growing attention has been paid to the discovery of the context and heterogeneity of causal relationships. Most existing methods, however, assume a known cause (e.g. a new drug) and focus on identifying from data the contexts of heterogeneous effects of the cause (e.g. patient groups with different responses to the new drug). There is no approach to efficiently detecting directly from observational data context specific causal relationships, i.e. discovering the causes and their contexts simultaneously. In this paper, by taking the advantages of highly efficient decision tree induction and the well established causal inference framework, we propose the Tree based Context Causal rule discovery (TCC) method, for efficient exploration of context specific causal relationships from data. Experiments with both synthetic and real world data sets show that TCC can effectively discover context specific causal rules from the data. Tree Boosted Varying Coefficient Framework This paper investigates the integration of gradient boosted decision trees and varying coefficient models. We introduce the tree boosted varying coefficient framework which justifies the implementation of decision tree boosting as the nonparametric effect modifiers in varying coefficient models. This framework requires no structural assumptions in the space containing the varying coefficient covariates, is easy to implement, and keeps a balance between model complexity and interpretability. To provide statistical guarantees, we prove the asymptotic consistency of the proposed method under the regression settings with $L^2$ loss. We further conduct a thorough empirical study to show that the proposed method is capable of providing accurate predictions as well as intelligible visual explanations. Tree of Parzen Estimator ➘ “Tree-structured Parzen Estimator” Speeding up the Hyperparameter Optimization of Deep Convolutional Neural Networks Tree of Predictors(ToP) We present a new approach to ensemble learning. Our approach constructs a tree of subsets of the feature space and associates a predictor (predictive model) – determined by training one of a given family of base learners on an endogenously determined training set – to each node of the tree; we call the resulting object a tree of predictors. The (locally) optimal tree of predictors is derived recursively; each step involves jointly optimizing the split of the terminal nodes of the previous tree and the choice of learner and training set (hence predictor) for each set in the split. The feature vector of a new instance determines a unique path through the optimal tree of predictors; the final prediction aggregates the predictions of the predictors along this path. We derive loss bounds for the final predictor in terms of the Rademacher complexity of the base learners. We report the results of a number of experiments on a variety of datasets, showing that our approach provides statistically significant improvements over state-of-the-art machine learning algorithms, including various ensemble learning methods. Our approach works because it allows us to endogenously create more complex learners – when needed – and endogenously match both the learner and the training set to the characteristics of the dataset while still avoiding over-fitting. Tree Recurrent Neural Network(TreeRNN) In this paper we develop a recurrent neural network (TreeRNN), which is designed to predict a tree rather than a linear sequence as is the case in conventional recurrent neural networks. Our model defines the probability of a sentence by estimating the generation probability of its dependency tree. We construct the tree incrementally by generating the left and right dependents of a node whose probability is computed using recurrent neural networks with shared hidden layers. Application of our model to two language modeling tasks shows that it outperforms or performs on par with related models. GitXiv Tree Structured Vector Quantization(TSVQ) 1. First we apply k-means to get 2 centroids or prototypes within the entire data set. This provides us with a boundary between the two clusters, which would be a straight line based on the nearest neighbor rule. 2. Next, the data are assigned to the 2 centroids. 3. Then, for the data assigned to each centroid (call them a group), apply 2 centroid k-means to each group separately. The initialization can be done by splitting the centroid into two. Note that data points channeled to different centroids are treated separately. 4. Repeat the above step. Tree Tensor Network(TTN) Matrix product states (MPS), a tensor network designed for one-dimensional quantum systems, has been recently proposed for generative modeling of natural data (such as images) in terms of `Born machine’. However, the exponential decay of correlation in MPS restricts its representation power heavily for modeling complex data such as natural images. In this work, we push forward the effort of applying tensor networks to machine learning by employing the Tree Tensor Network (TTN) which exhibits balanced performance in expressibility and efficient training and sampling. We design the tree tensor network to utilize the 2-dimensional prior of the natural images and develop sweeping learning and sampling algorithms which can be efficiently implemented utilizing Graphical Processing Units (GPU). We apply our model to random binary patterns and the binary MNIST datasets of handwritten digits. We show that TTN is superior to MPS for generative modeling in keeping correlation of pixels in natural images, as well as giving better log-likelihood scores in standard datasets of handwritten digits. We also compare its performance with state-of-the-art generative models such as the Variational AutoEncoders, Restricted Boltzmann machines, and PixelCNN. Finally, we discuss the future development of Tensor Network States in machine learning problems. Tree-Based Optimization(TBO) Designing search algorithms for finding global optima is one of the most active research fields, recently. These algorithms consist of two main categories, i.e., classic mathematical and metaheuristic algorithms. This article proposes a meta-algorithm, Tree-Based Optimization (TBO), which uses other heuristic optimizers as its sub-algorithms in order to improve the performance of search. The proposed algorithm is based on mathematical tree subject and improves performance and speed of search by iteratively removing parts of the search space having low fitness, in order to minimize and purify the search space. The experimental results on several well-known benchmarks show the outperforming performance of TBO algorithm in finding the global solution. Experiments on high dimensional search spaces show significantly better performance when using the TBO algorithm. The proposed algorithm improves the search algorithms in both accuracy and speed aspects, especially for high dimensional searching such as in VLSI CAD tools for Integrated Circuit (IC) design. Tree-Based Pipeline Optimization Tool(TPOT) As data science becomes more mainstream, there will be an ever-growing demand for data science tools that are more accessible, exible, and scalable. In response to this demand, automated machine learning (AutoML) researchers have begun building systems that automate the process of designing and optimizing machine learning pipelines. In this paper we present TPOT v0.3, an open source genetic programming-based AutoML system that optimizes a series of feature preprocessors and machine learning models with the goal of maximizing classi cation accuracy on a supervised classi cation task. We benchmark TPOT on a series of 150 supervised classi cation tasks and nd that it signi cantly outperforms a basic machine learning analysis in 21 of them, while experiencing minimal degradation in accuracy on 4 of the benchmarks|all without any domain knowledge nor human input. As such, GP-based AutoML systems show considerable promise in the AutoML domain. Tree-CNN In recent years, Convolutional Neural Networks (CNNs) have shown remarkable performance in many computer vision tasks such as object recognition and detection. However, complex training issues, such as ‘catastrophic forgetting’ and hyper-parameter tuning, make incremental learning in CNNs a difficult challenge. In this paper, we propose a hierarchical deep neural network, with CNNs at multiple levels, and a corresponding training method for lifelong learning. The network grows in a tree-like manner to accommodate the new classes of data without losing the ability to identify the previously trained classes. The proposed network was tested on CIFAR-10 and CIFAR-100 datasets, and compared against the method of fine tuning specific layers of a conventional CNN. We obtained comparable accuracies and achieved 40% and 20% reduction in training effort in CIFAR-10 and CIFAR 100 respectively. The network was able to organize the incoming classes of data into feature-driven super-classes. Our model improves upon existing hierarchical CNN models by adding the capability of self-growth and also yields important observations on feature selective classification. TreeGAN Generative Adversarial Networks (GANs) have shown great capacity on image generation, in which a discriminative model guides the training of a generative model to construct images that resemble real images. Recently, GANs have been extended from generating images to generating sequences (e.g., poems, music and codes). Existing GANs on sequence generation mainly focus on general sequences, which are grammar-free. In many real-world applications, however, we need to generate sequences in a formal language with the constraint of its corresponding grammar. For example, to test the performance of a database, one may want to generate a collection of SQL queries, which are not only similar to the queries of real users, but also follow the SQL syntax of the target database. Generating such sequences is highly challenging because both the generator and discriminator of GANs need to consider the structure of the sequences and the given grammar in the formal language. To address these issues, we study the problem of syntax-aware sequence generation with GANs, in which a collection of real sequences and a set of pre-defined grammatical rules are given to both discriminator and generator. We propose a novel GAN framework, namely TreeGAN, to incorporate a given Context-Free Grammar (CFG) into the sequence generation process. In TreeGAN, the generator employs a recurrent neural network (RNN) to construct a parse tree. Each generated parse tree can then be translated to a valid sequence of the given grammar. The discriminator uses a tree-structured RNN to distinguish the generated trees from real trees. We show that TreeGAN can generate sequences for any CFG and its generation fully conforms with the given syntax. Experiments on synthetic and real data sets demonstrated that TreeGAN significantly improves the quality of the sequence generation in context-free languages. Treelogy We propose a novel tree classification system called Treelogy, that fuses deep representations with hand-crafted features obtained from leaf images to perform leaf-based plant classification. Key to this system are segmentation of the leaf from an untextured background, using convolutional neural networks (CNNs) for learning deep representations, extracting hand-crafted features with a number of image processing techniques, training a linear SVM with feature vectors, merging SVM and CNN results, and identifying the species from a dataset of 57 trees. Our classification results show that fusion of deep representations with hand-crafted features leads to the highest accuracy. The proposed algorithm is embedded in a smart-phone application, which is publicly available. Furthermore, our novel dataset comprised of 5408 leaf images is also made public for use of other researchers. Treemapping In information visualization and computing, treemapping is a method for displaying hierarchical data by using nested rectangles. Tree-Structured Boosting Additive models, such as produced by gradient boosting, and full interaction models, such as classification and regression trees (CART), are widely used algorithms that have been investigated largely in isolation. We show that these models exist along a spectrum, revealing never-before-known connections between these two approaches. This paper introduces a novel technique called tree-structured boosting for creating a single decision tree, and shows that this method can produce models equivalent to CART or gradient boosted stumps at the extremes by varying a single parameter. Although tree-structured boosting is designed primarily to provide both the model interpretability and predictive performance needed for high-stake applications like medicine, it also can produce decision trees represented by hybrid models between CART and boosted stumps that can outperform either of these approaches. Tree-Structured Long Short-Term Memory(Tree-LSTM) For years, recursive neural networks (RvNNs) have shown to be suitable for representing text into fixed-length vectors and achieved good performance on several natural language processing tasks. However, the main drawback of RvNN is that it requires explicit tree structure (e.g. parse tree), which makes data preparation and model implementation hard. In this paper, we propose a novel tree-structured long short-term memory (Tree-LSTM) architecture that efficiently learns how to compose task-specific tree structures only from plain text data. To achieve this property, our model uses Straight-Through (ST) Gumbel-Softmax estimator to decide the parent node among candidates and to calculate gradients of the discrete decision. We evaluate the proposed model on natural language interface and sentiment analysis and show that our model outperforms or at least comparable to previous Tree-LSTM-based works. We also find that our model converges significantly faster and needs less memory than other models of complex structures. Tree-Structured Multi-Linear Principle Component Analysis(TMPCA) A novel text data dimension reduction technique, called the tree-structured multi-linear principle component analysis (TMPCA), is proposed in this work. Being different from traditional text dimension reduction methods that deal with the word-level representation, the TMPCA technique reduces the dimension of input sequences and sentences to simplify the following text classification tasks. It is shown mathematically and experimentally that the TMPCA tool demands much lower complexity (and, hence, less computing power) than the ordinary principle component analysis (PCA). Furthermore, it is demon- strated by experimental results that the support vector machine (SVM) method applied to the TMPCA-processed data achieves commensurable or better performance than the state-of-the-art recurrent neural network (RNN) approach. Tree-structured Parzen Estimator The Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization (SMBO) approach. SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements, and then subsequently choose new hyperparameters to test based on this model. Trellis Graphics Extremely useful approach for graphical exploratory data analysis (EDA). Allows to examine for complicated, multiple variable relationships. Types of plots: · xyplot: scatterplot · bwplot: boxplots · stripplot: display univariate data against a numerical variable · dotplot: similar to stripplot · histogram · densityplot: kernel density estimates · barchart · piechart: (Not available in R) · splom: scatterplot matrices · contourplot: contour plot of a surface on a regular grid · levelplot: pseudo-colour plot of a surface on a rectangular grid · wireframe: perspective plot of a surface evaluated on a regular grid · cloud: perspective plot of a cloud of points (3D scatterplot) https://…/chapter4.pdf Trellis Network We present trellis networks, a new architecture for sequence modeling. On the one hand, a trellis network is a temporal convolutional network with special structure, characterized by weight tying across depth and direct injection of the input into deep layers. On the other hand, we show that truncated recurrent networks are equivalent to trellis networks with special sparsity structure in their weight matrices. Thus trellis networks with general weight matrices generalize truncated recurrent networks. We leverage these connections to design high-performing trellis networks that absorb structural and algorithmic elements from both recurrent and convolutional models. Experiments demonstrate that trellis networks outperform the current state of the art on a variety of challenging benchmarks, including word-level language modeling on Penn Treebank and WikiText-103, character-level language modeling on Penn Treebank, and stress tests designed to evaluate long-term memory retention. The code is available at https://…/trellisnet . Trend Analysis Trend Analysis is the practice of collecting information and attempting to spot a pattern, or trend, in the information. In some fields of study, the term ‘trend analysis’ has more formally defined meanings. Although trend analysis is often used to predict future events, it could be used to estimate uncertain events in the past, such as how many ancient kings probably ruled between two dates, based on data such as the average years which other known kings reigned. Trend Feature Symbolic Aggregate Approximation(TFSAX) Symbolic Aggregate approximation (SAX) is a classical symbolic approach in many time series data mining applications. However, SAX only reflects the segment mean value feature and misses important information in a segment, namely the trend of the value change in the segment. Such a miss may cause a wrong classification in some cases, since the SAX representation cannot distinguish different time series with similar average values but different trends. In this paper, we present Trend Feature Symbolic Aggregate approximation (TFSAX) to solve this problem. First, we utilize Piecewise Aggregate Approximation (PAA) approach to reduce dimensionality and discretize the mean value of each segment by SAX. Second, extract trend feature in each segment by using trend distance factor and trend shape factor. Then, design multi-resolution symbolic mapping rules to discretize trend information into symbols. We also propose a modified distance measure by integrating the SAX distance with a weighted trend distance. We show that our distance measure has a tighter lower bound to the Euclidean distance than that of the original SAX. The experimental results on diverse time series data sets demonstrate that our proposed representation significantly outperforms the original SAX representation and an improved SAX representation for classification. TrialChain The governance of data used for biomedical research and clinical trials is an important requirement for generating accurate results. To improve the visibility of data quality and analysis, we developed TrialChain, a blockchain-based platform that can be used to validate data integrity from large, biomedical research studies. We implemented a private blockchain using the MultiChain platform and integrated it with a data science platform deployed within a large research center. An administrative web application was built with Python to manage the platform, which was built with a microservice architecture using Docker. The TrialChain platform was integrated during data acquisition into our existing data science platform. Using NiFi, data were hashed and logged within the local blockchain infrastructure. To provide public validation, the local blockchain state was periodically synchronized to the public Ethereum network. The use of a combined private/public blockchain platform allows for both public validation of results while maintaining additional security and lower cost for blockchain transactions. Original data and modifications due to downstream analysis can be logged within TrialChain and data assets or results can be rapidly validated when needed using API calls to the platform. The TrialChain platform provides a data governance solution to audit the acquisition and analysis of biomedical research data. The platform provides cryptographic assurance of data authenticity and can also be used to document data analysis. Triangle Generative Adversarial Network(Delta-GAN) A Triangle Generative Adversarial Network ($\Delta$-GAN) is developed for semi-supervised cross-domain joint distribution matching, where the training data consists of samples from each domain, and supervision of domain correspondence is provided by only a few paired samples. $\Delta$-GAN consists of four neural networks, two generators and two discriminators. The generators are designed to learn the two-way conditional distributions between the two domains, while the discriminators implicitly define a ternary discriminative function, which is trained to distinguish real data pairs and two kinds of fake data pairs. The generators and discriminators are trained together using adversarial learning. Under mild assumptions, in theory the joint distributions characterized by the two generators concentrate to the data distribution. In experiments, three different kinds of domain pairs are considered, image-label, image-image and image-attribute pairs. Experiments on semi-supervised image classification, image-to-image translation and attribute-based image generation demonstrate the superiority of the proposed approach. Triangle Lasso Recently, network lasso has drawn many attentions due to its remarkable performance on simultaneous clustering and optimization. However, it usually suffers from the imperfect data (noise, missing values etc), and yields sub-optimal solutions. The reason is that it finds the similar instances according to their features directly, which is usually impacted by the imperfect data, and thus returns sub-optimal results. In this paper, we propose triangle lasso to avoid its disadvantage. Triangle lasso finds the similar instances according to their neighbours. If two instances have many common neighbours, they tend to become similar. Although some instances are profiled by the imperfect data, it is still able to find the similar counterparts. Furthermore, we develop an efficient algorithm based on Alternating Direction Method of Multipliers (ADMM) to obtain a moderately accurate solution. In addition, we present a dual method to obtain the accurate solution with the low additional time consumption. We demonstrate through extensive numerical experiments that triangle lasso is robust to the imperfect data. It usually yields a better performance than the state-of-the-art method when performing data analysis tasks in practical scenarios. Triangular Norm(t-Norm) In mathematics, a t-norm (also T-norm or, unabbreviated, triangular norm) is a kind of binary operation used in the framework of probabilistic metric spaces and in multi-valued logic, specifically in fuzzy logic. A t-norm generalizes intersection in a lattice and conjunction in logic. The name triangular norm refers to the fact that in the framework of probabilistic metric spaces t-norms are used to generalize triangle inequality of ordinary metric spaces. Trident Network(TridentNet) Scale variation is one of the key challenges in object detection. In this work, we first present a controlled experiment to investigate the effect of receptive fields on the detection of different scale objects. Based on the findings from the exploration experiments, we propose a novel Trident Network (TridentNet) aiming to generate scale-specific feature maps with a uniform representational power. We construct a parallel multi-branch architecture in which each branch shares the same transformation parameters but with different receptive fields. Then, we propose a scale-aware training scheme to specialize each branch by sampling object instances of proper scales for training. As a bonus, a fast approximation version of TridentNet could achieve significant improvements without any additional parameters and computational cost. On the COCO dataset, our TridentNet with ResNet-101 backbone achieves state-of-the-art single-model results by obtaining an mAP of 48.4. Code will be made publicly available. TrieJax Graph pattern matching (e.g., finding all cycles and cliques) has become an important component in many critical domains such as social networks, biology, and cyber-security. This development motivated research to develop faster algorithms that target graph pattern matching. In recent years, the database community has shown that mapping graph pattern matching problems to a new class of relational join algorithms provides an efficient framework for computing these problems. In this paper, we argue that this new class of relational join algorithms is highly amenable to specialized hardware acceleration thanks to two fundamental properties: improved locality and inherent concurrency. The improved locality is a result of the provably bound number of intermediate results these algorithms generate, which results in smaller working sets. In addition, their inherent concurrency can be leveraged for effective hardware acceleration and hiding memory latency. We demonstrate the hardware amenability of this new class of algorithms by introducing TrieJax, a hardware accelerator for graph pattern matching. The TrieJax design leverages the improved locality and high concurrency properties to dramatically accelerate graph pattern matching, and can be tightly integrated into existing manycore processors. We evaluate TrieJax on a set standard graph pattern matching queries and datasets. Our evaluation shows that TrieJax outperforms recently proposed hardware accelerators for graph and database processing that do not employ the new class of algorithms by 7-63x on average (up to 539x), while consuming 15-179x less energy (up to 1750x). systems that do incorporate modern relational join algorithms by 9-20x on average (up to 45x), while consuming 59-110x less energy (up to 372x). Trigger Detection Dynamic Memory Network(TD-DMN) The task of event detection involves identifying and categorizing event triggers. Contextual information has been shown effective on the task. However, existing methods which utilize contextual information only process the context once. We argue that the context can be better exploited by processing the context multiple times, allowing the model to perform complex reasoning and to generate better context representation, thus improving the overall performance. Meanwhile, dynamic memory network (DMN) has demonstrated promising capability in capturing contextual information and has been applied successfully to various tasks. In light of the multi-hop mechanism of the DMN to model the context, we propose the trigger detection dynamic memory network (TD-DMN) to tackle the event detection problem. We performed a five-fold cross-validation on the ACE-2005 dataset and experimental results show that the multi-hop mechanism does improve the performance and the proposed model achieves best $F_1$ score compared to the state-of-the-art methods. TrIK-SVM The proposed work aims at proposing a alternative kernel decomposition in the context of kernel machines with indefinite kernels. The original paper of KSVM (SVM in Kre\v{i}n spaces) uses the eigen-decomposition, our proposition avoids this decompostion. We explain how it can help in designing an algorithm that won’t require to compute the full kernel matrix. Finally we illustrate the good behavior of the proposed method compared to KSVM. Trill Trill is a high-performance one-pass in-memory streaming analytics engine from Microsoft Research. It can handle both real-time and offline data, and is based on a temporal data and query model. Trill can be used as a streaming engine, a lightweight in-memory relational engine, and as a progressive query processor (for early query results on partial data). Trimmed Clustering tclust,trimcluster Trimmed Ensemble Kalman Filter(TEnKF) We study the ensemble Kalman filter (EnKF) algorithm for sequential data assimilation in a general situation, that is, for nonlinear forecast and measurement models with non-additive and non-Gaussian noises. Such applications traditionally force us to choose between inaccurate Gaussian assumptions that permit efficient algorithms (e.g., EnKF), or more accurate direct sampling methods which scale poorly with dimension (e.g., particle filters, or PF). We introduce a trimmed ensemble Kalman filter (TEnKF) which can interpolate between the limiting distributions of the EnKF and PF to facilitate adaptive control over both accuracy and efficiency. This is achieved by introducing a trimming function that removes non-Gaussian outliers that introduce errors in the correlation between the model and observed forecast, which otherwise prevent the EnKF from proposing accurate forecast updates. We show for specific trimming functions that the TEnKF exactly reproduces the limiting distributions of the EnKF and PF. We also develop an adaptive implementation which provides control of the effective sample size and allows the filter to overcome periods of increased model nonlinearity. This algorithm allow us to demonstrate substantial improvements over the traditional EnKF in convergence and robustness for the nonlinear Lorenz-63 and Lorenz-96 models. TrIMS Deep neural networks (DNNs) have become core computation components within low latency Function as a Service (FaaS) prediction pipelines: including image recognition, object detection, natural language processing, speech synthesis, and personalized recommendation pipelines. Cloud computing, as the de-facto backbone of modern computing infrastructure for both enterprise and consumer applications, has to be able to handle user-defined pipelines of diverse DNN inference workloads while maintaining isolation and latency guarantees, and minimizing resource waste. The current solution for guaranteeing isolation within FaaS is suboptimal — suffering from ‘cold start’ latency. A major cause of such inefficiency is the need to move large amount of model data within and across servers. We propose TrIMS as a novel solution to address these issues. Our proposed solution consists of a persistent model store across the GPU, CPU, local storage, and cloud storage hierarchy, an efficient resource management layer that provides isolation, and a succinct set of application APIs and container technologies for easy and transparent integration with FaaS, Deep Learning (DL) frameworks, and user code. We demonstrate our solution by interfacing TrIMS with the Apache MXNet framework and demonstrate up to 24x speedup in latency for image classification models and up to 210x speedup for large models. We achieve up to 8x system throughput improvement. Triple Exponential Smoothing What happens if the data show trend and seasonality? We now introduce a third equation to take care of seasonality (sometimes called periodicity). The resulting set of equations is called the ‘Holt-Winters’ (HW) method after the names of the inventors. ➚ “Holt-Winters Method” Triple Trustworthiness Measurement Frame(TTMF) The Knowledge graph (KG) uses the triples to describe the facts in the real world. It has been widely used in intelligent analysis and understanding of big data. In constructing a KG, especially in the process of automation building, some noises and errors are inevitably introduced or much knowledges is missed. However, learning tasks based on the KG and its underlying applications both assume that the knowledge in the KG is completely correct and inevitably bring about potential errors. Therefore, in this paper, we establish a unified knowledge graph triple trustworthiness measurement framework to calculate the confidence values for the triples that quantify its semantic correctness and the true degree of the facts expressed. It can be used not only to detect and eliminate errors in the KG but also to identify new triples to improve the KG. The framework is a crisscrossing neural network structure. It synthesizes the internal semantic information in the triples and the global inference information of the KG to achieve the trustworthiness measurement and fusion in the three levels of entity-level, relationship-level, and KG-global-level. We conducted experiments on the common dataset FB15K (from Freebase) and analyzed the validity of the model’s output confidence values. We also tested the framework in the knowledge graph error detection or completion tasks. The experimental results showed that compared with other models, our model achieved significant and consistent improvements on the above tasks, further confirming the capabilities of our model. Triplestore A triplestore is a purpose-built database for the storage and retrieval of triples through semantic queries. A triple is a data entity composed of subject-predicate-object, like “Bob is 35” or “Bob knows Fred”. Much like a relational database, one stores information in a triplestore and retrieves it via a query language. Unlike a relational database, a triplestore is optimized for the storage and retrieval of triples. In addition to queries, triples can usually be imported/exported using Resource Description Framework (RDF) and other formats. TripletBoost We consider the problem of classification in a comparison-based setting: given a set of objects, we only have access to triplet comparisons of the form ‘object $x_i$ is closer to object $x_j$ than to object $x_k$.” In this paper we introduce TripletBoost, a new method that can learn a classifier just from such triplet comparisons. The main idea is to aggregate the triplets information into weak classifiers, which can subsequently be boosted to a strong classifier. Our method has two main advantages: (i) it is applicable to data from any metric space, and (ii) it can deal with large scale problems using only passively obtained and noisy triplets. We derive theoretical generalization guarantees and a lower bound on the number of necessary triplets, and we empirically show that our method is both competitive with state of the art approaches and resistant to noise. Triply Supervised Decoder Network(TripleNet) Joint object detection and semantic segmentation can be applied to many fields, such as self-driving cars and unmanned surface vessels. An initial and important progress towards this goal has been achieved by simply sharing the deep convolutional features for the two tasks. However, this simple scheme is unable to make full use of the fact that detection and segmentation are mutually beneficial. To overcome this drawback, we propose a framework called TripleNet where triple supervisions including detection-oriented supervision, class-aware segmentation supervision, and class-agnostic segmentation supervision are imposed on each layer of the decoder network. Class-agnostic segmentation supervision provides an objectness prior knowledge for both semantic segmentation and object detection. Besides the three types of supervisions, two light-weight modules (i.e., inner-connected module and attention skip-layer fusion) are also incorporated into each layer of the decoder. In the proposed framework, detection and segmentation can sufficiently boost each other. Moreover, class-agnostic and class-aware segmentation on each decoder layer are not performed at the test stage. Therefore, no extra computational costs are introduced at the test stage. Experimental results on the VOC2007 and VOC2012 datasets demonstrate that the proposed TripleNet is able to improve both the detection and segmentation accuracies without adding extra computational costs. TritanDB The efficient management of data is an important prerequisite for realising the potential of the Internet of Things (IoT). Two issues given the large volume of structured time-series IoT data are, addressing the difficulties of data integration between heterogeneous Things and improving ingestion and query performance across databases on both resource-constrained Things and in the cloud. In this paper, we examine the structure of public IoT data and discover that the majority exhibit unique flat, wide and numerical characteristics with a mix of evenly and unevenly-spaced time-series. We investigate the advances in time-series databases for telemetry data and combine these findings with microbenchmarks to determine the best compression techniques and storage data structures to inform the design of a novel solution optimised for IoT data. A query translation method with low overhead even on resource-constrained Things allows us to utilise rich data models like the Resource Description Framework (RDF) for interoperability and data integration on top of the optimised storage. Our solution, TritanDB, shows an order of magnitude performance improvement across both Things and cloud hardware on many state-of-the-art databases within IoT scenarios. Finally, we describe how TritanDB supports various analyses of IoT time-series data like forecasting. Trolley Problem The trolley problem is a thought experiment in ethics. The general form of the problem is this: You see a runaway trolley moving toward five tied-up (or otherwise incapacitated) people lying on the tracks. You are standing next to a lever that controls a switch. If you pull the lever, the trolley will be redirected onto a side track, and the five people on the main track will be saved. However, there is a single person lying on the side track. You have two options: 1. Do nothing and allow the trolley to kill the five people on the main track. 2. Pull the lever, diverting the trolley onto the side track where it will kill one person. Which is the more ethical option? Tropical Linear Programming On Tropical Linear and Integer Programs Tropical Probability Theory Tropical probability theory and an application to the entropic cone TrQuery In this paper, we present an embedding-based framework (TrQuery) for recommending solutions of a SPARQL query, including approximate solutions when exact querying solutions are not available due to incompleteness or inconsistencies of real-world RDF data. Within this framework, embedding is applied to score solutions together with edit distance so that we could obtain more fine-grained recommendations than those recommendations via edit distance. For instance, graphs of two querying solutions with a similar structure can be distinguished in our proposed framework while the edit distance depending on structural difference becomes unable. To this end, we propose a novel score model built on vector space generated in embedding system to compute the similarity between an approximate subgraph matching and a whole graph matching. Finally, we evaluate our approach on large RDF datasets DBpedia and YAGO, and experimental results show that TrQuery exhibits an excellent behavior in terms of both effectiveness and efficiency. True Asymptotic Natural Gradient Optimization(TANGO) We introduce a simple algorithm, True Asymptotic Natural Gradient Optimization (TANGO), that converges to a true natural gradient descent in the limit of small learning rates, without explicit Fisher matrix estimation. For quadratic models the algorithm is also an instance of averaged stochastic gradient, where the parameter is a moving average of a ‘fast’, constant-rate gradient descent. TANGO appears as a particular de-linearization of averaged SGD, and is sometimes quite different on non-quadratic models. This further connects averaged SGD and natural gradient, both of which are arguably optimal asymptotically. In large dimension, small learning rates will be required to approximate the natural gradient well. Still, this shows it is possible to get arbitrarily close to exact natural gradient descent with a lightweight algorithm. TrueSkill Ranking System TrueSkill is a Bayesian ranking algorithm developed by Microsoft Research and used in the Xbox matchmaking system built to address some perceived flaws in the Elo rating system. It is an extension of the Glicko rating system to multiplayer games. The purpose of a ranking system is to both identify and track the skills of gamers in a game (mode) in order to be able to match them into competitive matches. The TrueSkill ranking system only uses the final standings of all teams in a game in order to update the skill estimates (ranks) of all gamers playing in this game. Ranking systems have been proposed for many sports but possibly the most prominent ranking system in use today is ELO. TrueSkill Sort(TSSort) In this paper we present TSSort, a probabilistic, noise resistant, quickly converging comparison sort algorithm based on Microsoft TrueSkill. The algorithm combines TrueSkill’s updating rules with a newly developed next item pair selection strategy, enabling it to beat standard sorting algorithms w.r.t. convergence speed and noise resistance, as shown in simulations. TSSort is useful if comparisons of items are expensive or noisy, or if intermediate results shall be approximately ordered. Truncated Normal Distribution In probability and statistics, the truncated normal distribution is the probability distribution derived from that of a normally distributed random variable by bounding the random variable from either below or above (or both). The truncated normal distribution has wide applications in statistics and econometrics. For example, it is used to model the probabilities of the binary outcomes in the probit model and to model censored data in the Tobit model. Truncated Variance Reduction(TruVaR) We present a new algorithm, truncated variance reduction (TruVaR), that treats Bayesian optimization (BO) and level-set estimation (LSE) with Gaussian processes in a unified fashion. The algorithm greedily shrinks a sum of truncated variances within a set of potential maximizers (BO) or unclassified points (LSE), which is updated based on confidence bounds. TruVaR is effective in several important settings that are typically non-trivial to incorporate into myopic algorithms, including pointwise costs and heteroscedastic noise. We provide a general theoretical guarantee for TruVaR covering these aspects, and use it to recover and strengthen existing results on BO and LSE. Moreover, we provide a new result for a setting where one can select from a number of noise levels having associated costs. We demonstrate the effectiveness of the algorithm on both synthetic and real-world data sets. Truncated-Uniform-Laplace(Tulap) We derive uniformly most powerful (UMP) tests for simple and one-sided hypotheses for a population proportion within the framework of Differential Privacy (DP), optimizing finite sample performance. We show that in general, DP hypothesis tests can be written in terms of linear constraints, and for exchangeable data can always be expressed as a function of the empirical distribution. Using this structure, we prove a ‘Neyman-Pearson lemma’ for binomial data under DP, where the DP-UMP only depends on the sample sum. Our tests can also be stated as a post-processing of a random variable, whose distribution we coin ”Truncated-Uniform-Laplace” (Tulap), a generalization of the Staircase and discrete Laplace distributions. Furthermore, we obtain exact $p$-values, which are easily computed in terms of the Tulap random variable. Using the above techniques, we show that our tests can be applied to give uniformly most accurate one-sided confidence intervals and optimal confidence distributions. We also derive uniformly most powerful unbiased (UMPU) two-sided tests, which lead to uniformly most accurate unbiased (UMAU) two-sided confidence intervals. We show that our results can be applied to distribution-free hypothesis tests for continuous data. Our simulation results demonstrate that all our tests have exact type I error, and are more powerful than current techniques. Truncation In statistics, truncation results in values that are limited above or below, resulting in a truncated sample.[1] Truncation is similar to but distinct from the concept of statistical censoring. A truncated sample can be thought of as being equivalent to an underlying sample with all values outside the bounds entirely omitted, with not even a count of those omitted being kept. With statistical censoring, a note would be recorded documenting which bound (upper or lower) had been exceeded and the value of that bound. With truncated sampling, no note is recorded. Trunc-Match Ranking functions return ranked lists of items, and users often interact with these items. How to evaluate ranking functions using historical interaction logs, also known as off-policy evaluation, is an important but challenging problem. The commonly used Inverse Propensity Scores (IPS) approaches work better for the single item case, but suffer from extremely low data efficiency for the ranked list case. In this paper, we study how to improve the data efficiency of IPS approaches in the offline comparison setting. We propose two approaches Trunc-match and Rand-interleaving for offline comparison using uniformly randomized data. We show that these methods can improve the data efficiency and also the comparison sensitivity based on one of the largest email search engines. Trust Region based Derivative Free Optimization(DFO-TR) In this work, we utilize a Trust Region based Derivative Free Optimization (DFO-TR) method to directly maximize the Area Under Receiver Operating Characteristic Curve (AUC), which is a nonsmooth, noisy function. We show that AUC is a smooth function, in expectation, if the distributions of the positive and negative data points obey a jointly normal distribution. The practical performance of this algorithm is compared to three prominent Bayesian optimization methods and random search. The presented numerical results show that DFO-TR surpasses Bayesian optimization and random search on various black-box optimization problem, such as maximizing AUC and hyperparameter tuning. Trust Score Knowing when a classifier’s prediction can be trusted is useful in many applications and critical for safely using AI. While the bulk of the effort in machine learning research has been towards improving classifier performance, understanding when a classifier’s predictions should and should not be trusted has received far less attention. The standard approach is to use the classifier’s discriminant or confidence score; however, we show there exists a considerably more effective alternative. We propose a new score, called the trust score, which measures the agreement between the classifier and a modified nearest-neighbor classifier on the testing example. We show empirically that high (low) trust scores produce surprisingly high precision at identifying correctly (incorrectly) classified examples, consistently outperforming the classifier’s confidence score as well as many other baselines. Further, under some mild distributional assumptions, we show that if the trust score for an example is high (low), the classifier will likely agree (disagree) with the Bayes-optimal classifier. Our guarantees consist of non-asymptotic rates of statistical consistency under various nonparametric settings and build on recent developments in topological data analysis. Tsallis Entropy In physics, the Tsallis entropy is a generalization of the standard Boltzmann-Gibbs entropy. It was introduced in 1988 by Constantino Tsallis[1] as a basis for generalizing the standard statistical mechanics. In the scientific literature, the physical relevance of the Tsallis entropy was occasionally debated. However, from the years 2000 on, an increasingly wide spectrum of natural, artificial and social complex systems have been identified which confirm the predictions and consequences that are derived from this nonadditive entropy, such as nonextensive statistical mechanics,[2] which generalizes the Boltzmann-Gibbs theory. Tsallis Entropy Actor-Critic(TAC) We propose a new policy iteration theory as an important extension of soft policy iteration and Soft Actor-Critic (SAC), one of the most efficient model free algorithms for deep reinforcement learning. Supported by the new theory, arbitrary entropy measures that generalize Shannon entropy, such as Tsallis entropy and Renyi entropy, can be utilized to properly randomize action selection while fulfilling the goal of maximizing expected long-term rewards. Our theory gives birth to two new algorithms, i.e., Tsallis entropy Actor-Critic (TAC) and Renyi entropy Actor-Critic (RAC). Theoretical analysis shows that these algorithms can be more effective than SAC. Moreover, they pave the way for us to develop a new Ensemble Actor-Critic (EAC) algorithm in this paper that features the use of a bootstrap mechanism for deep environment exploration as well as a new value-function based mechanism for high-level action selection. Empirically we show that TAC, RAC and EAC can achieve state-of-the-art performance on a range of benchmark control tasks, outperforming SAC and several cutting-edge learning algorithms in terms of both sample efficiency and effectiveness. Tsallis Entropy Information Metric(TEIM) The construction of efficient and effective decision trees remains a key topic in machine learning because of their simplicity and flexibility. A lot of heuristic algorithms have been proposed to construct near-optimal decision trees. Most of them, however, are greedy algorithms which have the drawback of obtaining only local optimums. Besides, common split criteria, e.g. Shannon entropy, Gain Ratio and Gini index, are also not flexible due to lack of adjustable parameters on data sets. To address the above issues, we propose a series of novel methods using Tsallis entropy in this paper. Firstly, a Tsallis Entropy Criterion (TEC) algorithm is proposed to unify Shannon entropy, Gain Ratio and Gini index, which generalizes the split criteria of decision trees. Secondly, we propose a Tsallis Entropy Information Metric (TEIM) algorithm for efficient construction of decision trees. The TEIM algorithm takes advantages of the adaptability of Tsallis conditional entropy and the reducing greediness ability of two-stage approach. Experimental results on UCI data sets indicate that the TEC algorithm achieves statistically significant improvement over the classical algorithms, and that the TEIM algorithm yields significantly better decision trees in both classification accuracy and tree complexity. Tsallis Entropy Maximization ➘ “Tsallis Markov Decision Process” Tsallis Markov Decision Process In this paper, we present a new class of Markov decision processes (MDPs), called Tsallis Markov decision process, with Tsallis entropy maximization, which generalizes existing maximum entropy reinforcement learning (RL). A Tsallis MDP provides a unified framework for the original RL problem and RL with various types of entropy, including the well-known standard Shannon-Gibbs (SG) entropy, using an additional real-valued parameter, called an entropic index. By controlling the entropic index, we can generate various types of entropy, including the SG entropy, and a different entropy results in a different class of the optimal policy in Tsallis MDPs. We also provide a full mathematical analysis of Tsallis MDPs, including the optimality condition, performance error bounds, and convergence. Our theoretical result enables us to use any positive entropic index in RL. To handle complex and large-scale problems, we propose a model-free actor-critic RL method using Tsallis entropy maximization. We evaluate the regularization effect of the Tsallis entropy with various values of entropic indices and show that the entropic index controls the exploration tendency of the proposed method. For a different type of RL problems, we find that a different value of the entropic index is desirable. The proposed method is evaluated using the MuJoCo simulator and achieves the state-of-the-art performance. TSAVE Supervised dimension reduction for time series is challenging as there may be temporal dependence between the response $y$ and the predictors $\boldsymbol x$. Recently a time series version of sliced inverse regression, TSIR, was suggested, which applies approximate joint diagonalization of several supervised lagged covariance matrices to consider the temporal nature of the data. In this paper we develop this concept further and propose a time series version of sliced average variance estimation, TSAVE. As both TSIR and TSAVE have their own advantages and disadvantages, we consider furthermore a hybrid version of TSIR and TSAVE. Based on examples and simulations we demonstrate and evaluate the differences between the three methods and show also that they are superior to apply their iid counterparts to when also using lagged values of the explaining variables as predictors. Tsetlin Machine(TM) A Tsetlin machine is a form of learning automaton based upon algorithms from reinforcement learning to learn expressions from propositional logic. Ole-Christoffer Granmo gave the method its name after Michael Lvovitch Tsetlin and his Tsetlin automata. The method uses computationally simpler and more efficient primitives compared to more ordinary artificial neural networks, but while the method may be faster it has a steep drop in signal-to-noise ratio as the signal space increases. Tshinghua-alpha-Algorithm Tshinghua-alpha algorithm which uses timestamps in the log files to construct a Petri net. It is related to the a algorithm, but uses a different approach. Details can be found in. It is interesting to note that this mining plug-in was the first plug-in developed by researchers outside of our research group. Researchers from Tshinghua University in China (Jianmin Wang and Wen Lijie) were able to develop and integrate this plug-in without any help or changes to the framework. TStream Transactional state management relieves users from managing state consistency during stream processing by themselves. This paper introduces TStream, a highly scalable data stream processing system (DSPS) with built-in transactional state management. TStream is specifically designed for modern shared-memory multicore architectures. TStream’s key contribution is a novel asynchronous state transaction processing paradigm. By detaching and postponing state accesses from the stream application computation logic, TStream minimizes unnecessary stalls caused by state management in stream processing. The postponed state accesses naturally form a batch, and we further propose an operation-chain based execution model that aggressively extracts parallelism opportunities within each batch of state access operations guaranteeing consistency without locks. To confirm the effectiveness of our proposal, we compared TStream against four alternative designs on a 40-core machine. Our extensive experiment study show that TStream yields much higher throughput and scalability with limited latency penalty when processing different types of workloads. TSViz This paper presents a novel framework for demystification of convolutional deep learning models for time series analysis. This is a step towards making informed/explainable decisions in the domain of time series, powered by deep learning. There have been numerous efforts to increase the interpretability of image-centric deep neural network models, where the learned features are more intuitive to visualize. Visualization in time-series is much more complicated as there is no direct interpretation of the filters and inputs as compared to image modality. In addition, little or no concentration has been devoted for the development of such tools in the domain of time-series in the past. The visualization engine of the presented framework provides possibilities to explore and analyze a network from different dimensions at four different levels of abstraction. This enables the user to uncover different aspects of the model which includes important filters, filter clusters, and input saliency maps. These representations allow to understand the network features so that the acceptability of deep networks for time-series data can be enhanced. This is extremely important in domains like finance, industry 4.0, self-driving cars, health-care, counter-terrorism etc., where reasons for reaching a particular prediction are equally important as the prediction itself. The framework \footnote{Framework download link: https://hidden.for.blind.review} can also aid in discovery of the filters which are contributing nothing to the final prediction, hence, can be pruned without any significant loss in performance. Tuatara GS1 The Tuatara GS1 algorithm relies on the more advanced Tuatara GS2 algorithm which generates relationships between objects based on principles in congnition related to Computational Theory of the Mind (CTM) (Pinker, S. 1997) and auto-association (Xijin Ge , Shuichi Iwata, 2002) and reinforced learning (Wenhuan, X., Nandi, A. K., Zhang, J., Evans, K. G. 2005) with exponential decays that follow the Golden Ratio F (Dunlap, Richard A. 1997). Tube Convolutional Neural Network(T-CNN) Deep learning has been demonstrated to achieve excellent results for image classification and object detection. However, the impact of deep learning on video analysis (e.g. action detection and recognition) has been limited due to complexity of video data and lack of annotations. Previous convolutional neural networks (CNN) based video action detection approaches usually consist of two major steps: frame-level action proposal detection and association of proposals across frames. Also, these methods employ two-stream CNN framework to handle spatial and temporal feature separately. In this paper, we propose an end-to-end deep network called Tube Convolutional Neural Network (T-CNN) for action detection in videos. The proposed architecture is a unified network that is able to recognize and localize action based on 3D convolution features. A video is first divided into equal length clips and for each clip a set of tube proposals are generated next based on 3D Convolutional Network (ConvNet) features. Finally, the tube proposals of different clips are linked together employing network flow and spatio-temporal action detection is performed using these linked video proposals. Extensive experiments on several video datasets demonstrate the superior performance of T-CNN for classifying and localizing actions in both trimmed and untrimmed videos compared to state-of-the-arts. TuckER Knowledge graphs are structured representations of real world facts. However, they typically contain only a small subset of all possible facts. Link prediction is a task of inferring missing facts based on existing ones. We propose TuckER, a relatively simple but powerful linear model based on Tucker decomposition of the binary tensor representation of knowledge graph triples. TuckER outperforms all previous state-of-the-art models across standard link prediction datasets. We prove that TuckER is a fully expressive model, deriving the bound on its entity and relation embedding dimensionality for full expressiveness which is several orders of magnitude smaller than the bound of previous state-of-the-art models ComplEx and SimplE. We further show that several previously introduced linear models can be viewed as special cases of TuckER. Tucker Tensor Layer(TTL) We introduce the Tucker Tensor Layer (TTL), an alternative to the dense weight-matrices of the fully connected layers of feed-forward neural networks (NNs), to answer the long standing quest to compress NNs and improve their interpretability. This is achieved by treating these weight-matrices as the unfolding of a higher order weight-tensor. This enables us to introduce a framework for exploiting the multi-way nature of the weight-tensor in order to efficiently reduce the number of parameters, by virtue of the compression properties of tensor decompositions. The Tucker Decomposition (TKD) is employed to decompose the weight-tensor into a core tensor and factor matrices. We re-derive back-propagation within this framework, by extending the notion of matrix derivatives to tensors. In this way, the physical interpretability of the TKD is exploited to gain insights into training, through the process of computing gradients with respect to each factor matrix. The proposed framework is validated on synthetic data and on the Fashion-MNIST dataset, emphasizing the relative importance of various data features in training, hence mitigating the ‘black-box’ issue inherent to NNs. Experiments on both MNIST and Fashion-MNIST illustrate the compression properties of the TTL, achieving a 66.63 fold compression whilst maintaining comparable performance to the uncompressed NN. TUCRL While designing the state space of an MDP, it is common to include states that are transient or not reachable by any policy (e.g., in mountain car, the product space of speed and position contains configurations that are not physically reachable). This leads to defining weakly-communicating or multi-chain MDPs. In this paper, we introduce \tucrl, the first algorithm able to perform efficient exploration-exploitation in any finite Markov Decision Process (MDP) without requiring any form of prior knowledge. In particular, for any MDP with $S^{\texttt{C}}$ communicating states, $A$ actions and $\Gamma^{\texttt{C}} \leq S^{\texttt{C}}$ possible communicating next states, we derive a $\widetilde{O}(D^{\texttt{C}} \sqrt{\Gamma^{\texttt{C}} S^{\texttt{C}} AT})$ regret bound, where $D^{\texttt{C}}$ is the diameter (i.e., the longest shortest path) of the communicating part of the MDP. This is in contrast with optimistic algorithms (e.g., UCRL, Optimistic PSRL) that suffer linear regret in weakly-communicating MDPs, as well as posterior sampling or regularised algorithms (e.g., REGAL), which require prior knowledge on the bias span of the optimal policy to bias the exploration to achieve sub-linear regret. We also prove that in weakly-communicating MDPs, no algorithm can ever achieve a logarithmic growth of the regret without first suffering a linear regret for a number of steps that is exponential in the parameters of the MDP. Finally, we report numerical simulations supporting our theoretical findings and showing how TUCRL overcomes the limitations of the state-of-the-art. Tukey Mean-Difference Plot The Tukey mean-difference plot is a scatter graph produced not for (x,y) values themselves, but for modified coordinates (X,Y) : X = (x+y)/2, Y = y-x. Such a plot is useful, for example, to analyze data with strong correlation between x and y – when the (x,y) dots on the plot are close to the diagonal x=y. In this case, the value of the transformed variable X is about the same as x and y; and the variable Y shows the difference between x and y. The Tukey mean-difference plot is meaningful for two similar variables – that is, when both x and y are of the same physical dimension and expressed in the same units – e.g mass in pounds (or kilograms, …), length in foots (or meters, …). Otherwise, it makes no sense to sum up or subtract values of the variables x and y. TULIP Linear discriminant analysis (LDA) is a powerful tool in building classifiers with easy computation and interpretation. Recent advancements in science technology have led to the popularity of datasets with high dimensions, high orders and complicated structure. Such datasetes motivate the generalization of LDA in various research directions. The R package TULIP integrates several popular high-dimensional LDA-based methods and provides a comprehensive and user-friendly toolbox for linear, semi-parametric and tensor-variate classification. Functions are included for model fitting, cross validation and prediction. In addition, motivated by datasets with diverse sources of predictors, we further include functions for covariate adjustment. Our package is carefully tailored for low storage and high computation efficiency. Moreover, our package is the first R package for many of these methods, providing great convenience to researchers in this area. Tunable GMM Kernel While tree methods have been popular in practice, researchers and practitioners are also looking for simple algorithms which can reach similar accuracy of trees. In 2010, (Ping Li UAI’10) developed the method of ‘abc-robust-logitboost’ and compared it with other supervised learning methods on datasets used by the deep learning literature. In this study, we propose a series of ‘tunable GMM kernels’ which are simple and perform largely comparably to tree methods on the same datasets. Note that ‘abc-robust-logitboost’ substantially improved the original ‘GDBT’ in that (a) it developed a tree-split formula based on second-order information of the derivatives of the loss function; (b) it developed a new set of derivatives for multi-class classification formulation. In the prior study in 2017, the ‘generalized min-max’ (GMM) kernel was shown to have good performance compared to the ‘radial-basis function’ (RBF) kernel. However, as demonstrated in this paper, the original GMM kernel is often not as competitive as tree methods on the datasets used in the deep learning literature. Since the original GMM kernel has no parameters, we propose tunable GMM kernels by adding tuning parameters in various ways. Three basic (i.e., with only one parameter) GMM kernels are the ‘$e$GMM kernel’, ‘$p$GMM kernel’, and ‘$\gamma$GMM kernel’, respectively. Extensive experiments show that they are able to produce good results for a large number of classification tasks. Furthermore, the basic kernels can be combined to boost the performance. Tune Modern machine learning algorithms are increasingly computationally demanding, requiring specialized hardware and distributed computation to achieve high performance in a reasonable time frame. Many hyperparameter search algorithms have been proposed for improving the efficiency of model selection, however their adaptation to the distributed compute environment is often ad-hoc. We propose Tune, a unified framework for model selection and training that provides a narrow-waist interface between training scripts and search algorithms. We show that this interface meets the requirements for a broad range of hyperparameter search algorithms, allows straightforward scaling of search to large clusters, and simplifies algorithm implementation. We demonstrate the implementation of several state-of-the-art hyperparameter search algorithms in Tune. Tune is available at http://…/tune.html. Tunnel Network Traditionally, deep learning algorithms update the network weights whereas the network architecture is chosen manually, using a process of trial and error. In this work, we propose two novel approaches that automatically update the network structure while also learning its weights. The novelty of our approach lies in our parameterization where the depth, or additional complexity, is encapsulated continuously in the parameter space through control parameters that add additional complexity. We propose two methods: In tunnel networks, this selection is done at the level of a hidden unit, and in budding perceptrons, this is done at the level of a network layer; updating this control parameter introduces either another hidden unit or another hidden layer. We show the effectiveness of our methods on the synthetic two-spirals data and on two real data sets of MNIST and MIRFLICKR, where we see that our proposed methods, with the same set of hyperparameters, can correctly adjust the network complexity to the task complexity. Tuple Plot Complex systems are described with high-dimensional data that is hard to visualise. Inselberg’s parallel coordinates are one representation technique for visualising high-dimensional data. Here we generalise Inselberg’s approach, and use it for visualising trajectories through high dimensional state spaces. We introduce two geometric projections of parallel coordinate representations — ‘plan tuple plots’ and ‘side tuple plots’ — and demonstrate a link between state space and ordinary space representations. We provide examples from many domains to illustrate use of the approach, including Cellular Automata, Random Boolean Networks, coupled logistic maps, reservoir computing, search algorithms, Turing Machines, and flocking. Turbo Filtering In this manuscript a method for developing novel filtering algorithms through the parallel concatenation of two Bayesian filters is illustrated. Our description of this method, called turbo filtering, is based on a new graphical model; this allows us to efficiently describe both the processing accomplished inside each of the constituent filter and the interactions between them. This model is exploited to develop two new filtering algorithms for conditionally linear Gaussian systems. Numerical results for a specific dynamic system evidence that such filters can achieve a better complexity-accuracy tradeoff than marginalized particle filtering. Turbo Smoothing Recently, a novel method for developing filtering algorithms, based on the parallel concatenation of Bayesian filters and called turbo filtering, has been proposed. In this manuscript we show how the same conceptual approach can be exploited to devise a new smoothing method, called turbo smoothing. A turbo smoother combines a turbo filter, employed in its forward pass, with the parallel concatenation of two backward information filters used in its backward pass. As a specific application of our general theory, a detailed derivation of two turbo smoothing algorithms for conditionally linear Gaussian systems is illustrated. Numerical results for a specific dynamic system evidence that these algorithms can achieve a better complexity-accuracy tradeoff than other smoothing techniques recently appeared in the literature. Turek-Fletcher Model Model-averaging is commonly used as a means of allowing for model uncertainty in parameter estimation. In the frequentist framework, a model-averaged estimate of a parameter is the weighted mean of the estimates from each of the candidate models, the weights typically being chosen using an information criterion. Current methods for calculating a model-averaged confidence interval assume approximate normality of the model-averaged estimate, i.e., they are Wald intervals. As in the single-model setting, we might improve the coverage performance of this interval by a one-to-one transformation of the parameter, obtaining a Wald interval, and then back-transforming the endpoints. However, a transformation that works in the single-model setting may not when model-averaging, due to the weighting and the need to estimate the weights. In the single-model setting, a natural alternative is to use a profile likelihood interval, which generally provides better coverage than a Wald interval. We propose a method for model-averaging a set of single-model profile likelihood intervals, making use of the link between profile likelihood intervals and Bayesian credible intervals. We illustrate its use in an example involving negative binomial regression, and perform two simulation studies to compare its coverage properties with the existing Wald intervals. TuRF FPGA becomes a popular technology for implementing Convolutional Neural Network (CNN) in recent years. Most CNN applications on FPGA are domain-specific, e.g., detecting objects from specific categories, in which commonly-used CNN models pre-trained on general datasets may not be efficient enough. This paper presents TuRF, an end-to-end CNN acceleration framework to efficiently deploy domain-specific applications on FPGA by transfer learning that adapts pre-trained models to specific domains, replacing standard convolution layers with efficient convolution blocks, and applying layer fusion to enhance hardware design performance. We evaluate TuRF by deploying a pre-trained VGG-16 model for a domain-specific image recognition task onto a Stratix V FPGA. Results show that designs generated by TuRF achieve better performance than prior methods for the original VGG-16 and ResNet-50 models, while for the optimised VGG-16 model TuRF designs are more accurate and easier to process. Turfjs Turf.js is a JavaScript library for spatial analysis. It helps you analyze, aggregate, and transform data in order to visualize it in new ways and answer advanced questions about it. lawn TuringBox AI researchers employ not only the scientific method, but also methodology from mathematics and engineering. However, the use of the scientific method – specifically hypothesis testing – in AI is typically conducted in service of engineering objectives. Growing interest in topics such as fairness and algorithmic bias show that engineering-focused questions only comprise a subset of the important questions about AI systems. This results in the AI Knowledge Gap: the number of unique AI systems grows faster than the number of studies that characterize these systems’ behavior. To close this gap, we argue that the study of AI could benefit from the greater inclusion of researchers who are well positioned to formulate and test hypotheses about the behavior of AI systems. We examine the barriers preventing social and behavioral scientists from conducting such studies. Our diagnosis suggests that accelerating the scientific study of AI systems requires new incentives for academia and industry, mediated by new tools and institutions. To address these needs, we propose a two-sided marketplace called TuringBox. On one side, AI contributors upload existing and novel algorithms to be studied scientifically by others. On the other side, AI examiners develop and post machine intelligence tasks designed to evaluate and characterize algorithmic behavior. We discuss this market’s potential to democratize the scientific study of AI behavior, and thus narrow the AI Knowledge Gap. Turing-Universal Computation TutorialBank The field of Natural Language Processing (NLP) is growing rapidly, with new research published daily along with an abundance of tutorials, codebases and other online resources. In order to learn this dynamic field or stay up-to-date on the latest research, students as well as educators and researchers must constantly sift through multiple sources to find valuable, relevant information. To address this situation, we introduce TutorialBank, a new, publicly available dataset which aims to facilitate NLP education and research. We have manually collected and categorized over 6,300 resources on NLP as well as the related fields of Artificial Intelligence (AI), Machine Learning (ML) and Information Retrieval (IR). Our dataset is notably the largest manually-picked corpus of resources intended for NLP education which does not include only academic papers. Additionally, we have created both a search engine and a command-line tool for the resources and have annotated the corpus to include lists of research topics, relevant resources for each topic, prerequisite relations among topics, relevant sub-parts of individual resources, among other annotations. We are releasing the dataset and present several avenues for further research. TVClust In this paper, we propose a model-based clustering method (TVClust) that robustly incorporates noisy side information as soft-constraints and aims to seek a consensus between side information and the observed data. Our method is based on a nonparametric Bayesian hierarchical model that combines the probabilistic model for the data instance and the one for the side-information. An efficient Gibbs sampling algorithm is proposed for posterior inference. Using the small-variance asymptotics of our probabilistic model, we then derive a new deterministic clustering algorithm (RDP-means). It can be viewed as an extension of K-means that allows for the inclusion of side information and has the additional property that the number of clusters does not need to be specified a priori. Empirical studies have been carried out to compare our work with many constrained clustering algorithms from the literature on both a variety of data sets and under a variety of conditions such as using noisy side information and erroneous k values. The results of our experiments show strong results for our probabilistic and deterministic approaches under these conditions when compared to other algorithms in the literature. Tweedie Distribution In probability and statistics, the Tweedie distributions are a family of probability distributions which include the purely continuous normal and gamma distributions, the purely discrete scaled Poisson distribution, and the class of mixed compound Poisson-gamma distributions which have positive mass at zero, but are otherwise continuous. For any random variable Y that obeys a Tweedie distribution, the variance var(Y) relates to the mean E(Y) by the power law, where a and p are positive constants. The Tweedie distributions were named by Bent Joergensen after Maurice Tweedie, a statistician and medical physicist at the University of Liverpool, UK, who presented the first thorough study of these distributions in 1984. Tweedie Model TDboost Tweepy An easy-to-use Python library for accessing the Twitter API. Tweet2Vec We present Tweet2Vec, a novel method for generating general-purpose vector representation of tweets. The model learns tweet embeddings using character-level CNN-LSTM encoder-decoder. We trained our model on 3 million, randomly selected English-language tweets. The model was evaluated using two methods: tweet semantic similarity and tweet sentiment categorization, outperforming the previous state-of-the-art in both tasks. The evaluations demonstrate the power of the tweet embeddings generated by our model for various tweet categorization tasks. The vector representations generated by our model are generic, and hence can be applied to a variety of tasks. Though the model presented in this paper is trained on English-language tweets, the method presented can be used to learn tweet embeddings for different languages. TweetsKB Publicly available social media archives facilitate research in a variety of fields, such as data science, sociology or the digital humanities, where Twitter has emerged as one of the most prominent sources. However, obtaining, archiving and annotating large amounts of tweets is costly. In this paper, we describe TweetsKB, a publicly available corpus of currently more than 1.5 billion tweets, spanning almost 5 years (Jan’13-Nov’17). Metadata information about the tweets as well as extracted entities, hashtags, user mentions and sentiment information are exposed using established RDF/S vocabularies. Next to a description of the extraction and annotation process, we present use cases to illustrate scenarios for entity-centric information exploration, data integration and knowledge discovery facilitated by TweetsKB. TwiInsight Social media platforms contain a great wealth of information which provides opportunities for us to explore hidden patterns or unknown correlations, and understand people’s satisfaction with what they are discussing. As one showcase, in this paper, we present a system, TwiInsight which explores the insight of Twitter data. Different from other Twitter analysis systems, TwiInsight automatically extracts the popular topics under different categories (e.g., healthcare, food, technology, sports and transport) discussed in Twitter via topic modeling and also identifies the correlated topics across different categories. Additionally, it also discovers the people’s opinions on the tweets and topics via the sentiment analysis. The system also employs an intuitive and informative visualization to show the uncovered insight. Furthermore, we also develop and compare six most popular algorithms – three for sentiment analysis and three for topic modeling. Twin Sort Technique The objective behind the Twin Sort technique is to sort the list of unordered data elements efficiently and to allow efficient and simple arrangement of data elements within the data structure with optimization of comparisons and iterations in the sorting method. This sorting technique effectively terminates the iterations when there is no need of comparison if the elements are all sorted in between the iterations. Unlike Quick sort, Merge sorting technique, this new sorting technique is based on the iterative method of sorting elements within the data structure. So it will be advantageous for optimization of iterations when there is no need for sorting elements. Finally, the Twin Sort technique is more efficient and simple method of arranging elements within a data structure and it is easy to implement when comparing to the other sorting technique. By the introduction of optimization of comparison and iterations, it will never allow the arranging task on the ordered elements. Twin Support Vector Machine(TSVM,TWSVM) Twin Support Vector Machine (TWSVM) is an emerging machine learning method suitable for both classification and regression problems. It utilizes the concept of Generalized Eigen-values Proximal Support Vector Machine (GEPSVM) and finds two non-parallel planes for each class by solving a pair of Quadratic Programming Problems. It enhances the computational speed as compared to the traditional Support Vector Machine (SVM). TWSVM was initially constructed to solve binary classification problems; later researchers successfully extended it for multi-class problem domain. TWSVM always gives promising empirical results, due to which it has many attractive features which enhance its applicability. This paper presents the research development of TWSVM in recent years. This study is divided into two main broad categories – variant based and multi-class based TWSVM methods. The paper primarily discusses the basic concept of TWSVM and highlights its applications in recent years. A comparative analysis of various research contributions based on TWSVM is also presented. This is helpful for researchers to effectively utilize the TWSVM as an emergent research methodology and encourage them to work further in the performance enhancement of TWSVM. Two Alternatives Forced Choice Score(2AFC) ➚ “Generalized Discrimination Score” Two Dimensional Stochastic Configuration Network(2DSCN) Stochastic configuration networks (SCNs) as a class of randomized learner model have been successfully employed in data analytics due to its universal approximation capability and fast modelling property. The technical essence lies in stochastically configuring hidden nodes (or basis functions) based on a supervisory mechanism rather than data-independent randomization as usually adopted for building randomized neural networks. Given image data modelling tasks, the use of one-dimensional SCNs potentially demolishes the spatial information of images, and may result in undesirable performance. This paper extends the original SCNs to two-dimensional version, termed 2DSCNs, for fast building randomized learners with matrix-inputs. Some theoretical analyses on the goodness of 2DSCNs against SCNs, including the complexity of the random parameter space, and the superiority of generalization, are presented. Empirical results over one regression, four benchmark handwritten digits classification, and two human face recognition datasets demonstrate that the proposed 2DSCNs perform favourably and show good potential for image data analytics. Two one-Sided Tests(TOST) Two one-sided tests (TOST) procedure to test equivalence for t-tests, correlations, and meta-analyses, including power analysis for t-tests and correlations. Allows you to specify equivalence bounds in raw scale units or in terms of effect sizes. TOSTER Two Stage Least Squares(2SLS,MIIV-2SLS) Two-Stage least squares (2SLS) regression analysis is a statistical technique that is used in the analysis of structural equations. This technique is the extension of the OLS method. It is used when the dependent variable’s error terms are correlated with the independent variables. Additionally, it is useful when there are feedback loops in the model. In structural equations modeling, we use the maximum likelihood method to estimate the path coefficient. This technique is an alternative in SEM modeling to estimate the path coefficient. This technique can also be applied in quasi-experimental studies. MIIVsem Two-Dimensional Linear Discriminant Analysis(2DLDA) ➚ “Generalized Lp-Norm Two-Dimensional Linear Discriminant Analysis” Two-Stage Learning(TSL) ➚ “Learning Through Deterministic Assignment of Hidden Parameters” Two-Step Importance Weighting IL(2IWIL) Imitation learning (IL) aims to learn an optimal policy from demonstrations. However, such demonstrations are often imperfect since collecting optimal ones is costly. To effectively learn from imperfect demonstrations, we propose a novel approach that utilizes confidence scores, which describe the quality of demonstrations. More specifically, we propose two confidence-based IL methods, namely two-step importance weighting IL (2IWIL) and generative adversarial IL with imperfect demonstration and confidence (IC-GAIL). We show that confidence scores given only to a small portion of sub-optimal demonstrations significantly improve the performance of IL both theoretically and empirically. Two-Wing Optimization Strategy(TwoWingOS) Determining whether a given claim is supported by evidence is a fundamental NLP problem that is best modeled as Textual Entailment. However, given a large collection of text, finding evidence that could support or refute a given claim is a challenge in itself, amplified by the fact that different evidence might be needed to support or refute a claim. Nevertheless, most prior work decouples evidence identification from determining the truth value of the claim given the evidence. We propose to consider these two aspects jointly. We develop TwoWingOS (two-wing optimization strategy), a system that, while identifying appropriate evidence for a claim, also determines whether or not the claim is supported by the evidence. Given the claim, TwoWingOS attempts to identify a subset of the evidence candidates; given the predicted evidence, it then attempts to determine the truth value of the corresponding claim. We treat this challenge as coupled optimization problems, training a joint model for it. TwoWingOS offers two advantages: (i) Unlike pipeline systems, it facilitates flexible-size evidence set, and (ii) Joint training improves both the claim entailment and the evidence identification. Experiments on a benchmark dataset show state-of-the-art performance. Code: https://…/FEVER Typed Graph Network Recently, the deep learning community has given growing attention to neural architectures engineered to learn problems in relational domains. Convolutional Neural Networks employ parameter sharing over the image domain, tying the weights of neural connections on a grid topology and thus enforcing the learning of a number of convolutional kernels. By instantiating trainable neural modules and assembling them in varied configurations (apart from grids), one can enforce parameter sharing over graphs, yielding models which can effectively be fed with relational data. In this context, vertices in a graph can be projected into a hyperdimensional real space and iteratively refined over many message-passing iterations in an end-to-end differentiable architecture. Architectures of this family have been referred to with several definitions in the literature, such as Graph Neural Networks, Message-passing Neural Networks, Relational Networks and Graph Networks. In this paper, we revisit the original Graph Neural Network model and show that it generalises many of the recent models, which in turn benefit from the insight of thinking about vertex \textbf{types}. To illustrate the generality of the original model, we present a Graph Neural Network formalisation, which partitions the vertices of a graph into a number of types. Each type represents an entity in the ontology of the problem one wants to learn. This allows – for instance – one to assign embeddings to edges, hyperedges, and any number of global attributes of the graph. As a companion to this paper we provide a Python/Tensorflow library to facilitate the development of such architectures, with which we instantiate the formalisation to reproduce a number of models proposed in the current literature. TypeSQL Interacting with relational databases through natural language helps users of any background easily query and analyze a vast amount of data. This requires a system that understands users’ questions and converts them to SQL queries automatically. In this paper we present a novel approach, TypeSQL, which views this problem as a slot filling task. Additionally, TypeSQL utilizes type information to better understand rare entities and numbers in natural language questions. We test this idea on the WikiSQL dataset and outperform the prior state-of-the-art by 5.5% in much less time. We also show that accessing the content of databases can significantly improve the performance when users’ queries are not well-formed. TypeSQL gets 82.6% accuracy, a 17.5% absolute improvement compared to the previous content-sensitive model. Typicality and Eccentricity Data Analysis(TEDA) The typicality and eccentricity data analysis (TEDA) framework was put forward by Angelov (2013) . It has been further developed into multiple different techniques since, and provides a non-parametric way of determining how similar an observation, from a process that is not purely random, is to other observations generated by the process. teda TzK We introduce TzK (pronounced ‘task’), a conditional flow-based encoder/decoder generative model, formulated in terms of maximum likelihood (ML). TzK offers efficient approximation of arbitrary data sample distributions (similar to GAN and flow-based ML), and stable training (similar to VAE and ML), while avoiding variational approximations (similar to ML). TzK exploits meta-data to facilitate a bottleneck, similar to autoencoders, thereby producing a low-dimensional representation. Unlike autoencoders, our bottleneck does not limit model expressiveness, similar to flow-based ML. Supervised, unsupervised, and semi-supervised learning are supported by replacing missing observations with samples from learned priors. We demonstrate TzK by jointly training on MNIST and Omniglot with minimal preprocessing, and weak supervision, with results which are comparable to state-of-the-art.