In data science, one is often confronted with a time series representing measurements of some quantity of interest. Usually, in a first step, features of the time series need to be extracted. These are numerical quantities that aim to succinctly describe the data and to dampen the influence of noise. In some applications, these features are also required to satisfy some invariance properties. In this paper, we concentrate on time-warping invariants. We show that these correspond to a certain family of iterated sums of the increments of the time series, known as quasisymmetric functions in the mathematics literature. We present these invariant features in an algebraic framework, and we develop some of their basic properties.
Prior work has shown that, on small amounts of training data, syntactic neural language models learn structurally sensitive generalisations more successfully than sequential language models. However, their computational complexity renders scaling difficult, and it remains an open question whether structural biases are still necessary when sequential models have access to ever larger amounts of training data. To answer this question, we introduce an efficient knowledge distillation (KD) technique that transfers knowledge from a syntactic language model trained on a small corpus to an LSTM language model, hence enabling the LSTM to develop a more structurally sensitive representation of the larger training data it learns from. On targeted syntactic evaluations, we find that, while sequential LSTMs perform much better than previously reported, our proposed technique substantially improves on this baseline, yielding a new state of the art. Our findings and analysis affirm the importance of structural biases, even in models that learn from large amounts of data.
Monotonicity reasoning is one of the important reasoning skills for any intelligent natural language inference (NLI) model in that it requires the ability to capture the interaction between lexical and syntactic structures. Since no test set has been developed for monotonicity reasoning with wide coverage, it is still unclear whether neural models can perform monotonicity reasoning in a proper way. To investigate this issue, we introduce the Monotonicity Entailment Dataset (MED). Performance by state-of-the-art NLI models on the new test set is substantially worse, under 55%, especially on downward reasoning. In addition, analysis using a monotonicity-driven data augmentation method showed that these models might be limited in their generalization ability in upward and downward reasoning.
We study non-linear data-dimension reduction. We are motivated by the classical linear framework of Principal Component Analysis. In nonlinear case, we introduce instead a new kernel-Principal Component Analysis, manifold and feature space transforms. Our results extend earlier work for probabilistic Karhunen-Lo\`eve transforms on compression of wavelet images. Our object is algorithms for optimization, selection of efficient bases, or components, which serve to minimize entropy and error; and hence to improve digital representation of images, and hence of optimal storage, and transmission. We prove several new theorems for data-dimension reduction. Moreover, with the use of frames in Hilbert space, and a new Hilbert-Schmidt analysis, we identify when a choice of Gaussian kernel is optimal.
Regression trees and their ensemble methods are popular methods for non-parametric regression — combining strong predictive performance with interpretable estimators. In order to improve their utility for smooth response surfaces, we study regression trees and random forests with linear aggregation functions. We introduce a new algorithm which finds the best axis-aligned split to fit optimal linear aggregation functions on the corresponding nodes and implement this method in the provably fastest way. This algorithm enables us to create more interpretable trees and obtain better predictive performance on a wide range of data sets. We also provide a software package that implements our algorithm. Applying the algorithm to several real-world data sets, we showcase its favorable performance in an extensive simulation study in terms of EMSE and demonstrate the improved interpretability of resulting estimators on a large real-world data set.
This paper proposes a new method for estimating the joint probability mass function of a pair of discrete random variables. This estimator is used to construct joint entropy and Shannon mutual information estimates of a pair of discrete random variables. Almost sure consistency and central limit Theorems are established. Theorical results are validated by simulations.
Graph clustering is a fundamental task which discovers communities or groups in networks. Recent studies have mostly focused on developing deep learning approaches to learn a compact graph embedding, upon which classic clustering methods like k-means or spectral clustering algorithms are applied. These two-step frameworks are difficult to manipulate and usually lead to suboptimal performance, mainly because the graph embedding is not goal-directed, i.e., designed for the specific clustering task. In this paper, we propose a goal-directed deep learning approach, Deep Attentional Embedded Graph Clustering (DAEGC for short). Our method focuses on attributed graphs to sufficiently explore the two sides of information in graphs. By employing an attention network to capture the importance of the neighboring nodes to a target node, our DAEGC algorithm encodes the topological structure and node content in a graph to a compact representation, on which an inner product decoder is trained to reconstruct the graph structure. Furthermore, soft labels from the graph embedding itself are generated to supervise a self-training graph clustering process, which iteratively refines the clustering results. The self-training process is jointly learned and optimized with the graph embedding in a unified framework, to mutually benefit both components. Experimental results compared with state-of-the-art algorithms demonstrate the superiority of our method.
In the era of information explosion, facing complex information, it is difficult for users to choose the information of interest, and businesses also need detailed information on ways to let the ad stand out. By this time, it is recommended that a good way. We firstly by using random interviews, simulations, asking experts, summarizes methods outlined the main factors affecting the scores of books that users drew. In order to further illustrate the impact of these factors, we also by combining the AHP consistency test, then fuzzy evaluation method, empowered each factor, influencing factors and the degree of influence come. For the second question, predict user evaluation of the listed books from the predict annex. First, given the books Annex labels, user data extraction scorebooks and mathematical analysis of data obtained from SPSS user preferences and then use software to nearest neighbor analysis to result in predicted value.
We consider the problem of translating, in an unsupervised manner, between two domains where one contains some additional information compared to the other. The proposed method disentangles the common and separate parts of these domains and, through the generation of a mask, focuses the attention of the underlying network to the desired augmentation alone, without wastefully reconstructing the entire target. This enables state-of-the-art quality and variety of content translation, as shown through extensive quantitative and qualitative evaluation. Furthermore, the novel mask-based formulation and regularization is accurate enough to achieve state-of-the-art performance in the realm of weakly supervised segmentation, where only class labels are given. To our knowledge, this is the first report that bridges the problems of domain disentanglement and weakly supervised segmentation. Our code is publicly available at https://…/mbu-content-tansfer.
Technological breakthroughs on smart homes, self-driving cars, health care and robotic assistants, in addition to reinforced law regulations, have critically influenced academic research on explainable machine learning. A sufficient number of researchers have implemented ways to explain indifferently any black box model for classification tasks. A drawback of building agnostic explanators is that the neighbourhood generation process is universal and consequently does not guarantee true adjacency between the generated neighbours and the instance. This paper explores a methodology on providing explanations for a neural network’s decisions, in a local scope, through a process that actively takes into consideration the neural network’s architecture on creating an instance’s neighbourhood, that assures the adjacency among the generated neighbours and the instance.
An end-to-end data integration system requires human feedback in several phases, including collecting training data for entity matching, debugging the resulting clusters, confirming transformations applied on these clusters for data standardization, and finally, reducing each cluster to a single, canonical representation (or ‘golden record’). The traditional wisdom is to sequentially apply the human feedback, obtained by asking specific questions, within some budget in each phase. However, these questions are highly correlated; the answer to one can influence the outcome of any of the phases of the pipeline. Hence, interleaving them has the potential to offer significant benefits. In this paper, we propose a human-in-the-loop framework that interleaves different types of questions to optimize human involvement. We propose benefit models to measure the quality improvement from asking a question, and cost models to measure the human time it takes to answer a question. We develop a question scheduling framework that judiciously selects questions to maximize the accuracy of the final golden records. Experimental results on three real-world datasets show that our holistic method significantly improves the quality of golden records from 70% to 90%, compared with the state-of-the-art approaches.
Human ability at solving complex tasks is helped by priors on object and event semantics of their environment. This paper investigates the use of similar prior knowledge for transfer learning in Reinforcement Learning agents. In particular, the paper proposes to use a first-order-logic language grounded in deep neural networks to represent facts about objects and their semantics in the real world. Facts are provided as background knowledge a priori to learning a policy for how to act in the world. The priors are injected with the conventional input in a single agent architecture. As proof-of-concept, the paper tests the system in simple experiments that show the importance of symbolic abstraction and flexible fact derivation. The paper shows that the proposed system can learn to take advantage of both the symbolic layer and the image layer in a single decision selection module.
We discuss Bayesian model uncertainty analysis and forecasting in sequential dynamic modeling of multivariate time series. The perspective is that of a decision-maker with a specific forecasting objective that guides thinking about relevant models. Based on formal Bayesian decision-theoretic reasoning, we develop a time-adaptive approach to exploring, weighting, combining and selecting models that differ in terms of predictive variables included. The adaptivity allows for changes in the sets of favored models over time, and is guided by the specific forecasting goals. A synthetic example illustrates how decision-guided variable selection differs from traditional Bayesian model uncertainty analysis and standard model averaging. An applied study in one motivating application of long-term macroeconomic forecasting highlights the utility of the new approach in terms of improving predictions as well as its ability to identify and interpret different sets of relevant models over time with respect to specific, defined forecasting goals.
This paper introduces the R package slm which stands for Stationary Linear Models. The package contains a set of statistical procedures for linear regression in the general context where the error process is strictly stationary with short memory. We work in the setting of Hannan (1973), who proved the asymptotic normality of the (normalized) least squares estimators (LSE) under very mild conditions on the error process. We propose different ways to estimate the asymptotic covariance matrix of the LSE, and then to correct the type I error rates of the usual tests on the parameters (as well as confidence intervals). The procedures are evaluated through different sets of simulations, and two examples of real datasets are studied.
Modern big data systems run on cloud environments where resources are shared amongst several users and applications. As a result, declarative user queries in these environments need to be optimized and executed over resources that constantly change and are provisioned on demand for each job. This requires us to rethink traditional query optimizers designed for systems that run on dedicated resources. In this paper, we show evidence that the choice of query plans depends heavily on the available resources, and the current practice of choosing query plans before picking the resources could lead to significant performance loss in two popular big data systems, namely Hive and SparkSQL. Therefore, we make a case for Resource and Query Optimization (or RAQO), i.e., choosing both the query plan and the resource configuration at the same time. We describe rule-based RAQO and present alternate decisions trees to make resource-aware query planning in Hive and Spark. We further present cost-based RAQO that integrates resource planning within a query planner, and show techniques to significantly reduce the resource planning overheads. We evaluate cost-based RAQO using state-of-the-art System R query planner as well as a recently proposed multi-objective query planner. Our evaluation on TPC-H and randomly generated schemas show that: (i) we can reduce the resource planning overhead by up to 16x, and (ii) RAQO can scale to schemas as large as 100 table joins as well as clusters as big as 100K containers with 100GB each.
Catastrophic forgetting of connectionist neural networks is caused by the global sharing of parameters among all training examples. In this study, we analyze parameter sharing under the conditional computation framework where the parameters of a neural network are conditioned on each input example. At one extreme, if each input example uses a disjoint set of parameters, there is no sharing of parameters thus no catastrophic forgetting. At the other extreme, if the parameters are the same for every example, it reduces to the conventional neural network. We then introduce a clipped version of maxout networks which lies in the middle, i.e. parameters are shared partially among examples. Based on the parameter sharing analysis, we can locate a limited set of examples that are interfered when learning a new example. We propose to perform rehearsal on this set to prevent forgetting, which is termed as conditional rehearsal. Finally, we demonstrate the effectiveness of the proposed method in an online non-stationary setup, where updates are made after each new example and the distribution of the received example shifts over time.