Decisions by Machine Learning (ML) models have become ubiquitous. Trusting these decisions requires understanding how algorithms take them. Hence interpretability methods for ML are an active focus of research. A central problem in this context is that both the quality of interpretability methods as well as trust in ML predictions are difficult to measure. Yet evaluations, comparisons and improvements of trust and interpretability require quantifiable measures. Here we propose a quantitative measure for the quality of interpretability methods. Based on that we derive a quantitative measure of trust in ML decisions. Building on previous work we propose to measure intuitive understanding of algorithmic decisions using the information transfer rate at which humans replicate ML model predictions. We provide empirical evidence from crowdsourcing experiments that the proposed metric robustly differentiates interpretability methods. The proposed metric also demonstrates the value of interpretability for ML assisted human decision making: in our experiments providing explanations more than doubled productivity in annotation tasks. However unbiased human judgement is critical for doctors, judges, policy makers and others. Here we derive a trust metric that identifies when human decisions are overly biased towards ML predictions. Our results complement existing qualitative work on trust and interpretability by quantifiable measures that can serve as objectives for further improving methods in this field of research.
In this paper, we present a new approach to interpreting deep learning models. More precisely, by coupling mutual information with network science, we explore how information flows through feed forward networks. We show that efficiently approximating mutual information via the dual representation of Kullback-Leibler divergence allows us to create an information measure that quantifies how much information flows between any two neurons of a deep learning model. To that end, we propose NIF, Neural Information Flow, a new metric for codifying information flow which exposes the internals of a deep learning model while providing feature attributions.
Given key performance indicators collected with fine granularity as time series, our aim is to predict and explain failures in storage environments. Although explainable predictive modeling based on spiky telemetry data is key in many domains, current approaches cannot tackle this problem. Deep learning methods suitable for sequence modeling and learning temporal dependencies, such as RNNs, are effective, but opaque from an explainability perspective. Our approach first extracts the anomalous spikes from time series as events and then builds an RNN classifier with attention mechanisms to embed the irregularity and frequency of these events. A preliminary evaluation on real world storage environments shows that our approach can predict failures within a 3-day prediction window with comparable accuracy as traditional RNN-based classifiers. At the same time it can explain the predictions by returning the key anomalous events which led to those failure predictions.
Many image processing tasks involve image-to-image mapping, which can be addressed well by fully convolutional networks (FCN) without any heavy preprocessing. Although empirically designing and training FCNs can achieve satisfactory results, reasons for the improvement in performance are slightly ambiguous. Our study is to make progress in understanding their generalization abilities through visualizing the optimization landscapes. The visualization of objective functions is obtained by choosing a solution and projecting its vicinity onto a 3D space. We compare three FCN-based networks (two existing models and a new proposed in this paper for comparison) on multiple datasets. It has been observed in practice that the connections from the pre-pooled feature maps to the post-upsampled can achieve better results. We investigate the cause and provide experiments to shows that the skip-layer connections in FCN can promote flat optimization landscape, which is well known to generalize better. Additionally, we explore the relationship between the models generalization ability and loss surface under different batch sizes. Results show that large-batch training makes the model converge to sharp minimizers with chaotic vicinities while small-batch method leads the model to flat minimizers with smooth and nearly convex regions. Our work may contribute to insights and analysis for designing and training FCNs.
Explaining neural network computation in terms of probabilistic/fuzzy logical operations has attracted much attention due to its simplicity and high interpretability. Different choices of logical operators such as AND, OR and XOR give rise to another dimension for network optimization, and in this paper, we study the open problem of learning a universal logical operator without prescribing to any logical operations manually. Insightful observations along this exploration furnish deep convolution networks with a novel logical interpretation.
Transfer learning which aims at utilizing knowledge learned from one problem (source domain) to solve another different but related problem (target domain) has attracted wide research attentions. However, the current transfer learning methods are mostly uninterpretable, especially to people without ML expertise. In this extended abstract, we brief introduce two knowledge graph (KG) based frameworks towards human understandable transfer learning explanation. The first one explains the transferability of features learned by Convolutional Neural Network (CNN) from one domain to another through pre-training and fine-tuning, while the second justifies the model of a target domain predicted by models from multiple source domains in zero-shot learning (ZSL). Both methods utilize KG and its reasoning capability to provide rich and human understandable explanations to the transfer procedure.
We introduce a new logic programming language T-PRISM based on tensor embeddings. Our embedding scheme is a modification of the distribution semantics in PRISM, one of the state-of-the-art probabilistic logic programming languages, by replacing distribution functions with multidimensional arrays, i.e., tensors. T-PRISM consists of two parts: logic programming part and numerical computation part. The former provides flexible and interpretable modeling at the level of first order logic, and the latter part provides scalable computation utilizing parallelization and hardware acceleration with GPUs. Combing these two parts provides a remarkably wide range of high-level declarative modeling from symbolic reasoning to deep learning. To embody this programming language, we also introduce a new semantics, termed tensorized semantics, which combines the traditional least model semantics in logic programming with the embeddings of tensors. In T-PRISM, we first derive a set of equations related to tensors from a given program using logical inference, i.e., Prolog execution in a symbolic space and then solve the derived equations in a continuous space by TensorFlow. Using our preliminary implementation of T-PRISM, we have successfully dealt with a wide range of modeling. We have succeeded in dealing with real large-scale data in the declarative modeling. This paper presents a DistMult model for knowledge graphs using the FB15k and WN18 datasets.
One of the challenges in large-scale information retrieval (IR) is to develop fine-grained and domain-specific methods to answer natural language questions. Despite the availability of numerous sources and datasets for answer retrieval, Question Answering (QA) remains a challenging problem due to the difficulty of the question understanding and answer extraction tasks. One of the promising tracks investigated in QA is to map new questions to formerly answered questions that are similar’. In this paper, we propose a novel QA approach based on Recognizing Question Entailment (RQE) and we describe the QA system and resources that we built and evaluated on real medical questions. First, we compare machine learning and deep learning methods for RQE using different kinds of datasets, including textual inference, question similarity and entailment in both the open and clinical domains. Second, we combine IR models with the best RQE method to select entailed questions and rank the retrieved answers. To study the end-to-end QA approach, we built the MedQuAD collection of 47,457 question-answer pairs from trusted medical sources, that we introduce and share in the scope of this paper. Following the evaluation process used in TREC 2017 LiveQA, we find that our approach exceeds the best results of the medical task with a 29.8% increase over the best official score. The evaluation results also support the relevance of question entailment for QA and highlight the effectiveness of combining IR and RQE for future QA efforts. Our findings also show that relying on a restricted set of reliable answer sources can bring a substantial improvement in medical QA.
Despite the recent popularity of deep generative state space models, few comparisons have been made between network architectures and the inference steps of the Bayesian filtering framework — with most models simultaneously approximating both state transition and update steps with a single recurrent neural network (RNN). In this paper, we introduce the Recurrent Neural Filter (RNF), a novel recurrent variational autoencoder architecture that learns distinct representations for each Bayesian filtering step, captured by a series of encoders and decoders. Testing this on three real-world time series datasets, we demonstrate that decoupling representations not only improves the accuracy of one-step-ahead forecasts while providing realistic uncertainty estimates, but also facilitates multistep prediction through the separation of encoder stages.
Fitting machine learning models in the low-data limit is challenging. The main challenge is to obtain suitable prior knowledge and encode it into the model, for instance in the form of a Gaussian process prior. Recent advances in meta-learning offer powerful methods for extracting such prior knowledge from data acquired in related tasks. When it comes to meta-learning in Gaussian process models, approaches in this setting have mostly focused on learning the kernel function of the prior, but not on learning its mean function. In this work, we propose to parameterize the mean function of a Gaussian process with a deep neural network and train it with a meta-learning procedure. We present analytical and empirical evidence that mean function learning can be superior to kernel learning alone, particularly if data is scarce.
We introduce a new approach to generative data-driven dialogue systems (e.g. chatbots) called TransferTransfo which is a combination of a Transfer learning based training scheme and a high-capacity Transformer model. Fine-tuning is performed by using a multi-task objective which combines several unsupervised prediction tasks. The resulting fine-tuned model shows strong improvements over the current state-of-the-art end-to-end conversational models like memory augmented seq2seq and information-retrieval models. On the privately held PERSONA-CHAT dataset of the Conversational Intelligence Challenge 2, this approach obtains a new state-of-the-art, with respective perplexity, Hits@1 and F1 metrics of 16.28 (45 % absolute improvement), 80.7 (46 % absolute improvement) and 19.5 (20 % absolute improvement).
We propose the predictability, computability, and stability (PCS) framework to extract reproducible knowledge from data that can guide scientific hypothesis generation and experimental design. The PCS framework builds on key ideas in machine learning, using predictability as a reality check and evaluating computational considerations in data collection, data storage, and algorithm design. It augments PC with an overarching stability principle, which largely expands traditional statistical uncertainty considerations. In particular, stability assesses how results vary with respect to choices (or perturbations) made across the data science life cycle, including problem formulation, pre-processing, modeling (data and algorithm perturbations), and exploratory data analysis (EDA) before and after modeling. Furthermore, we develop PCS inference to investigate the stability of data results and identify when models are consistent with relatively simple phenomena. We compare PCS inference with existing methods, such as selective inference, in high-dimensional sparse linear model simulations to demonstrate that our methods consistently outperform others in terms of ROC curves over a wide range of simulation settings. Finally, we propose a PCS documentation based on Rmarkdown, iPython, or Jupyter Notebook, with publicly available, reproducible codes and narratives to back up human choices made throughout an analysis. The PCS workflow and documentation are demonstrated in a genomics case study available on Zenodo.
Discovering and exploiting the causal structure in the environment is a crucial challenge for intelligent agents. Here we explore whether causal reasoning can emerge via meta-reinforcement learning. We train a recurrent network with model-free reinforcement learning to solve a range of problems that each contain causal structure. We find that the trained agent can perform causal reasoning in novel situations in order to obtain rewards. The agent can select informative interventions, draw causal inferences from observational data, and make counterfactual predictions. Although established formal causal reasoning algorithms also exist, in this paper we show that such reasoning can arise from model-free reinforcement learning, and suggest that causal reasoning in complex settings may benefit from the more end-to-end learning-based approaches presented here. This work also offers new strategies for structured exploration in reinforcement learning, by providing agents with the ability to perform — and interpret — experiments.
Classifying semantic relations between entity pairs in sentences is an important task in Natural Language Processing (NLP). Most previous models for relation classification rely on the high-level lexical and syntactic features obtained by NLP tools such as WordNet, dependency parser, part-of-speech (POS) tagger, and named entity recognizers (NER). In addition, state-of-the-art neural models based on attention mechanisms do not fully utilize information of entity that may be the most crucial features for relation classification. To address these issues, we propose a novel end-to-end recurrent neural model which incorporates an entity-aware attention mechanism with a latent entity typing (LET) method. Our model not only utilizes entities and their latent types as features effectively but also is more interpretable by visualizing attention mechanisms applied to our model and results of LET. Experimental results on the SemEval-2010 Task 8, one of the most popular relation classification task, demonstrate that our model outperforms existing state-of-the-art models without any high-level features.
Understanding dependence structure among extreme values plays an important role in risk assessment in environmental studies. In this work we propose the $\chi$ network and the annual extremal network for exploring the extremal dependence structure of environmental processes. A $\chi$ network is constructed by connecting pairs whose estimated upper tail dependence coefficient, $\hat \chi$, exceeds a prescribed threshold. We develop an initial $\chi$ network estimator and we use a spatial block bootstrap to assess both the bias and variance of our estimator. We then develop a method to correct the bias of the initial estimator by incorporating the spatial structure in $\chi$. In addition to the $\chi$ network, which assesses spatial extremal dependence over an extended period of time, we further introduce an annual extremal network to explore the year-to-year temporal variation of extremal connections. We illustrate the $\chi$ and the annual extremal networks by analyzing the hurricane season maximum precipitation at the US Gulf Coast and surrounding area. Analysis suggests there exists long distance extremal dependence for precipitation extremes in the study region and the strength of the extremal dependence may depend on some regional scale meteorological conditions, for example, sea surface temperature.
A new stochastic primal-dual algorithm for solving a composite optimization problem is proposed. It is assumed that all the functions/matrix used to define the optimization problem are given as statistical expectations. These expectations are unknown but revealed across time through i.i.d realizations. This covers the case of convex optimization under stochastic linear constraints. The proposed algorithm is proven to converge to a saddle point of the Lagrangian function. In the framework of the monotone operator theory, the convergence proof relies on recent results on the stochastic Forward Backward algorithm involving random monotone operators.
We consider the sequential change-point detection for asynchronous multi-sensors, where each sensor observe a signal (due to change-point) at different times. We propose an asynchronous Subspace-CUSUM procedure based on jointly estimating the unknown signal waveform and the unknown relative delays between the sensors. Using the estimated delays, we can align signals and use the subspace to combine the multiple sensor observations. We derive the optimal drift parameter for the proposed procedure, and characterize the relationship between the expected detection delay, average run length (of false alarms), and the energy of the time-varying signal. We demonstrate the good performance of the proposed procedure using simulation and real data. We also demonstrate that the proposed procedure outperforms the well-known one-shot procedure’ in detecting weak and asynchronous signals.
To improve the performance of Intensive Care Units (ICUs), the field of bio-statistics has developed scores which try to predict the likelihood of negative outcomes. These help evaluate the effectiveness of treatments and clinical practice, and also help to identify patients with unexpected outcomes. However, they have been shown by several studies to offer sub-optimal performance. Alternatively, Deep Learning offers state of the art capabilities in certain prediction tasks and research suggests deep neural networks are able to outperform traditional techniques. Nevertheless, a main impediment for the adoption of Deep Learning in healthcare is its reduced interpretability, for in this field it is crucial to gain insight on the why of predictions, to assure that models are actually learning relevant features instead of spurious correlations. To address this, we propose a deep multi-scale convolutional architecture trained on the Medical Information Mart for Intensive Care III (MIMIC-III) for mortality prediction, and the use of concepts from coalitional game theory to construct visual explanations aimed to show how important these inputs are deemed by the network. Our results show our model attains state of the art performance while remaining interpretable. Supporting code can be found at https://…/ISeeU.
We present TigerGraph, a graph database system built from the ground up to support massively parallel computation of queries and analytics. TigerGraph’s high-level query language, GSQL, is designed for compatibility with SQL, while simultaneously allowing NoSQL programmers to continue thinking in Bulk-Synchronous Processing (BSP) terms and reap the benefits of high-level specification. GSQL is sufficiently high-level to allow declarative SQL-style programming, yet sufficiently expressive to concisely specify the sophisticated iterative algorithms required by modern graph analytics and traditionally coded in general-purpose programming languages like C++ and Java. We report very strong scale-up and scale-out performance over a benchmark we published on GitHub for full reproducibility.
Predicting properties of nodes in a graph is an important problem with applications in a variety of domains. Graph-based Semi-Supervised Learning (SSL) methods aim to address this problem by labeling a small subset of the nodes as seeds and then utilizing the graph structure to predict label scores for the rest of the nodes in the graph. Recently, Graph Convolutional Networks (GCNs) have achieved impressive performance on the graph-based SSL task. In addition to label scores, it is also desirable to have confidence scores associated with them. Unfortunately, confidence estimation in the context of GCN has not been previously explored. We fill this important gap in this paper and propose ConfGCN, which estimates labels scores along with their confidences jointly in GCN-based setting. ConfGCN uses these estimated confidences to determine the influence of one node on another during neighborhood aggregation, thereby acquiring anisotropic capabilities. Through extensive analysis and experiments on standard benchmarks, we find that ConfGCN is able to outperform state-of-the-art baselines. We have made ConfGCN’s source code available to encourage reproducible research.
Large-batch training approaches have enabled researchers to utilize large-scale distributed processing and greatly accelerate deep-neural net (DNN) training. For example, by scaling the batch size from 256 to 32K, researchers have been able to reduce the training time of ResNet50 on ImageNet from 29 hours to 2.2 minutes (Ying et al., 2018). In this paper, we propose a new approach called linear-epoch gradual-warmup (LEGW) for better large-batch training. With LEGW, we are able to conduct large-batch training for both CNNs and RNNs with the Sqrt Scaling scheme. LEGW enables Sqrt Scaling scheme to be useful in practice and as a result we achieve much better results than the Linear Scaling learning rate scheme. For LSTM applications, we are able to scale the batch size by a factor of 64 without losing accuracy and without tuning the hyper-parameters. For CNN applications, LEGW is able to achieve the same accuracy even as we scale the batch size to 32K. LEGW works better than previous large-batch auto-tuning techniques. LEGW achieves a 5.3X average speedup over the baselines for four LSTM-based applications on the same hardware. We also provide some theoretical explanations for LEGW.
Recent work on Natural Language Interfaces to Databases (NLIDB) has attracted considerable attention. NLIDB allow users to search databases using natural language instead of SQL-like query languages. While saving the users from having to learn query languages, multi-turn interaction with NLIDB usually involves multiple queries where contextual information is vital to understand the users’ query intents. In this paper, we address a typical contextual understanding problem, termed as follow-up query analysis. In spite of its ubiquity, follow-up query analysis has not been well studied due to two primary obstacles: the multifarious nature of follow-up query scenarios and the lack of high-quality datasets. Our work summarizes typical follow-up query scenarios and provides a new FollowUp dataset with $1000$ query triples on 120 tables. Moreover, we propose a novel approach FANDA, which takes into account the structures of queries and employs a ranking model with weakly supervised max-margin learning. The experimental results on FollowUp demonstrate the superiority of FANDA over multiple baselines across multiple metrics.
Bayesian optimization (BO) is an effective tool for black-box optimization in which objective function evaluation is usually quite expensive. In practice, lower fidelity approximations of the objective function are often available. Recently, multi-fidelity Bayesian optimization (MFBO) has attracted considerable attention because it can dramatically accelerate the optimization process by using those cheaper observations. We propose a novel information theoretic approach to MFBO. Information-based approaches are popular and empirically successful in BO, but existing studies for information-based MFBO are plagued by difficulty for accurately estimating the information gain. Our approach is based on a variant of information-based BO called max-value entropy search (MES), which greatly facilitates evaluation of the information gain in MFBO. In fact, computations of our acquisition function is written analytically except for one dimensional integral and sampling, which can be calculated efficiently and accurately. We demonstrate effectiveness of our approach by using synthetic and benchmark datasets, and further we show a real-world application to materials science data.
In reinforcement learning, building policies of high-quality is challenging when the feature space of states is small and the training data is limited. Directly transferring data or knowledge from an agent to another agent will not work due to the privacy requirement of data and models. In this paper, we propose a novel reinforcement learning approach to considering the privacy requirement and building Q-network for each agent with the help of other agents, namely federated reinforcement learning (FRL). To protect the privacy of data and models, we exploit Gausian differentials on the information shared with each other when updating their local models. In the experiment, we evaluate our FRL framework in two diverse domains, Grid-world and Text2Action domains, by comparing to various baselines.
Time series forecasting is a crucial component of many important applications, ranging from forecasting the stock markets to energy load prediction. The high-dimensionality, velocity and variety of the data collected in these applications pose significant and unique challenges that must be carefully addressed for each of them. In this work, a novel Temporal Logistic Neural Bag-of-Features approach, that can be used to tackle these challenges, is proposed. The proposed method can be effectively combined with deep neural networks, leading to powerful deep learning models for time series analysis. However, combining existing BoF formulations with deep feature extractors pose significant challenges: the distribution of the input features is not stationary, tuning the hyper-parameters of the model can be especially difficult and the normalizations involved in the BoF model can cause significant instabilities during the training process. The proposed method is capable of overcoming these limitations by a employing a novel adaptive scaling mechanism and replacing the classical Gaussian-based density estimation involved in the regular BoF model with a logistic kernel. The effectiveness of the proposed approach is demonstrated using extensive experiments on a large-scale financial time series dataset that consists of more than 4 million limit orders.
Recently, neural models for information retrieval are becoming increasingly popular. They provide effective approaches for product search due to their competitive advantages in semantic matching. However, it is challenging to use graph-based features, though proved very useful in IR literature, in these neural approaches. In this paper, we leverage the recent advances in graph embedding techniques to enable neural retrieval models to exploit graph-structured data for automatic feature extraction. The proposed approach can not only help to overcome the long-tail problem of click-through data, but also incorporate external heterogeneous information to improve search results. Extensive experiments on a real-world e-commerce dataset demonstrate significant improvement achieved by our proposed approach over multiple strong baselines both as an individual retrieval model and as a feature used in learning-to-rank frameworks.
A graph is a powerful concept for representation of relations between pairs of entities. Data with underlying graph structure can be found across many disciplines and there is a natural desire for understanding such data better. Deep learning (DL) has achieved significant breakthroughs in a variety of machine learning tasks in recent years, especially where data is structured on a grid, such as in text, speech, or image understanding. However, surprisingly little has been done to explore the applicability of DL on arbitrary graph-structured data directly. The goal of this thesis is to investigate architectures for DL on graphs and study how to transfer, adapt or generalize concepts that work well on sequential and image data to this domain. We concentrate on two important primitives: embedding graphs or their nodes into a continuous vector space representation (encoding) and, conversely, generating graphs from such vectors back (decoding). To that end, we make the following contributions. First, we introduce Edge-Conditioned Convolutions (ECC), a convolution-like operation on graphs performed in the spatial domain where filters are dynamically generated based on edge attributes. The method is used to encode graphs with arbitrary and varying structure. Second, we propose SuperPoint Graph, an intermediate point cloud representation with rich edge attributes encoding the contextual relationship between object parts. Based on this representation, ECC is employed to segment large-scale point clouds without major sacrifice in fine details. Third, we present GraphVAE, a graph generator allowing us to decode graphs with variable but upper-bounded number of nodes making use of approximate graph matching for aligning the predictions of an autoencoder with its inputs. The method is applied to the task of molecule generation.
With the wide application of time series databases (TSDB) in big data fields like cluster monitoring and industrial IoT, there have been developed a number of TSDBs for time series data management. Different TSDBs have test reports comparing themselves with other databases to show their advantages, but the comparisons are typically based on their own tools without using a common well-recognized test framework. To the best of our knowledge, there is no mature TSDB benchmark either. With the goal of establishing a standard of evaluating TSDB systems, we present the \emph{IoTDB-Benchmark} framework, specifically designed for TSDB and IoT application scenarios. We pay close attention to some special data ingestion scenarios and summarize 10 basic queries types. We use this benchmark to compare four TSDB systems: InfluxDB, OpenTSDB, KairosDB and TimescaleDB. Our benchmark framework/tool not only measures performance metrics but also takes system resource consumption into consideration.
Four packages in R are analyzed to carry out sentiment analysis. All packages allow to define custom dictionaries. Just one – Sentiment R – properly accounts for the presence of negators.
Recovering pairwise interactions, i.e. pairs of input features whose joint effect on an output is different from the sum of their marginal effects, is central in many scientific applications. We conceptualize a solution to this problem as a two-stage procedure: first, we model the relationship between the features and the output using a flexible hybrid neural network; second, we detect feature interactions from the trained model. For the second step we propose a simple and intuitive interaction measure (IM), which has no specific requirements on the machine learning model used in the first step, only that it defines a mapping from an input to an output. And in a special case it reduces to the averaged Hessian of the input-output mapping. Importantly, our method upper bounds the interaction recovery error with the error of the learning model, which ensures that we can improve the recovered interactions by training a more accurate model. We present analyses of simulated and real-world data which demonstrate the benefits of our method compared to available alternatives, and theoretically analyse its properties and relation to other methods.
Cross-project defect prediction (CPDP) aims to predict defects of projects lacking training data by using prediction models trained on historical defect data from other projects. However, since the distribution differences between datasets from different projects, it is still a challenge to build high-quality CPDP models. Unfortunately, class imbalanced nature of software defect datasets further increases the difficulty. In this paper, we propose a transferlearning oriented minority over-sampling technique (TOMO) based feature weighting transfer naive Bayes (FWTNB) approach (TOMOFWTNB) for CPDP by considering both classimbalance and feature importance problems. Differing from traditional over-sampling techniques, TOMO not only can balance the data but reduce the distribution difference. And then FWTNB is used to further increase the similarity of two distributions. Experiments are performed on 11 public defect datasets. The experimental results show that (1) TOMO improves the average G-Measure by 23.7\%$\sim$41.8\%, and the average MCC by 54.2\%$\sim$77.8\%. (2) feature weighting (FW) strategy improves the average G-Measure by 11\%, and the average MCC by 29.2\%. (3) TOMOFWTNB improves the average G-Measure value by at least 27.8\%, and the average MCC value by at least 71.5\%, compared with existing state-of-theart CPDP approaches. It can be concluded that (1) TOMO is very effective for addressing class-imbalance problem in CPDP scenario; (2) our FW strategy is helpful for CPDP; (3) TOMOFWTNB outperforms previous state-of-the-art CPDP approaches.
Pre-training of models in pruning algorithms plays an important role in pruning decision-making. We find that excessive pre-training is not necessary for pruning algorithms. According to this idea, we propose a pruning algorithm—Incremental pruning based on less training (IPLT). Compared with the traditional pruning algorithm based on a large number of pre-training, IPLT has competitive compression effect than the traditional pruning algorithm under the same simple pruning strategy. On the premise of ensuring accuracy, IPLT can achieve 8x-9x compression for VGG-19 on CIFAR-10 and only needs to pre-train few epochs. For VGG-19 on CIFAR-10, we can not only achieve 10 times test acceleration, but also about 10 times training acceleration. At present, the research mainly focuses on the compression and acceleration in the application stage of the model, while the compression and acceleration in the training stage are few. We newly proposed a pruning algorithm that can compress and accelerate in the training stage. It is novel to consider the amount of pre-training required by pruning algorithm. Our results have implications: Too much pre-training may be not necessary for pruning algorithms.
A character-level convolutional neural network (CNN) motivated by applications in ‘automated machine learning’ (AutoML) is proposed to semantically classify columns in tabular data. Simulated data containing a set of base classes is first used to learn an initial set of weights. Hand-labeled data from the CKAN repository is then used in a transfer-learning paradigm to adapt the initial weights to a more sophisticated representation of the problem (e.g., including more classes). In doing so, realistic data imperfections are learned and the set of classes handled can be expanded from the base set with reduced labeled data and computing power requirements. Results show the effectiveness and flexibility of this approach in three diverse domains: semantic classification of tabular data, age prediction from social media posts, and email spam classification. In addition to providing further evidence of the effectiveness of transfer learning in natural language processing (NLP), our experiments suggest that analyzing the semantic structure of language at the character level without additional metadata—i.e., network structure, headers, etc.—can produce competitive accuracy for type classification, spam classification, and social media age prediction. We present our open-source toolkit SIMON, an acronym for Semantic Inference for the Modeling of ONtologies, which implements this approach in a user-friendly and scalable/parallelizable fashion.
Exploration bonus derived from the novelty of the states in an environment has become a popular approach to motivate exploration for deep reinforcement learning agents in the past few years. Recent methods such as curiosity-driven exploration usually estimate the novelty of new observations by the prediction errors of their system dynamics models. Due to the capacity limitation of the models and difficulty of performing next-frame prediction, however, these methods typically fail to balance between exploration and exploitation in high-dimensional observation tasks, resulting in the agents forgetting the visited paths and exploring those states repeatedly. Such inefficient exploration behavior causes significant performance drops, especially in large environments with sparse reward signals. In this paper, we propose to introduce the concept of optical flow estimation from the field of computer vision to deal with the above issue. We propose to employ optical flow estimation errors to examine the novelty of new observations, such that agents are able to memorize and understand the visited states in a more comprehensive fashion. We compare our method against the previous approaches in a number of experimental experiments. Our results indicate that the proposed method appears to deliver superior and long-lasting performance than the previous methods. We further provide a set of comprehensive ablative analysis of the proposed method, and investigate the impact of optical flow estimation on the learning curves of the DRL agents.
We investigate how reinforcement learning agents can learn to cooperate. Drawing inspiration from human societies, in which successful coordination of many individuals is often facilitated by hierarchical organisation, we introduce Feudal Multi-agent Hierarchies (FMH). In this framework, a ‘manager’ agent, which is tasked with maximising the environmentally-determined reward function, learns to communicate subgoals to multiple, simultaneously-operating, ‘worker’ agents. Workers, which are rewarded for achieving managerial subgoals, take concurrent actions in the world. We outline the structure of FMH and demonstrate its potential for decentralised learning and control. We find that, given an adequate set of subgoals from which to choose, FMH performs, and particularly scales, substantially better than cooperative approaches that use a shared reward function.
We consider strictly stationary stochastic processes of Hilbert space-valued random variables and focus on tests of the equality of the lag-zero autocovariance operators of several independent functional time series. A moving block bootstrap-based testing procedure is proposed which generates pseudo random elements that satisfy the null hypothesis of interest. It is based on directly bootstrapping the time series of tensor products which overcomes some common difficulties associated with applications of the bootstrap to related testing problems. The suggested methodology can be potentially applied to a broad range of test statistics of the hypotheses of interest. As an example, we establish validity for approximating the distribution under the null of a fully functional test statistic based on the Hilbert-Schmidt distance of the corresponding sample lag-zero autocovariance operators, and show consistency under the alternative. As a prerequisite, we prove a central limit theorem for the moving block bootstrap procedure applied to the sample autocovariance operator which is of interest on its own. The finite sample size and power performance of the suggested moving block bootstrap-based testing procedure is illustrated through simulations and an application to a real-life dataset is discussed.
Summary statistics of genome-wide association studies (GWAS) teach causal relationship between millions of genetic markers and tens and thousands of phenotypes. However, underlying biological mechanisms are yet to be elucidated. We can achieve necessary interpretation of GWAS in a causal mediation framework, looking to establish a sparse set of mediators between genetic and downstream variables, but there are several challenges. Unlike existing methods rely on strong and unrealistic assumptions, we tackle practical challenges within a principled summary-based causal inference framework. We analyzed the proposed methods in extensive simulations generated from real-world genetic data. We demonstrated only our approach can accurately redeem causal genes, even without knowing actual individual-level data, despite the presence of competing non-causal trails.
Machine learning has shown much promise in helping improve the quality of medical, legal, and economic decision-making. In these applications, machine learning models must satisfy two important criteria: (i) they must be causal, since the goal is typically to predict individual treatment effects, and (ii) they must be interpretable, so that human decision makers can validate and trust the model predictions. There has recently been much progress along each direction independently, yet the state-of-the-art approaches are fundamentally incompatible. We propose a framework for learning causal interpretable models—from observational data—that can be used to predict individual treatment effects. Our framework can be used with any algorithm for learning interpretable models. Furthermore, we prove an error bound on the treatment effects predicted by our model. Finally, in an experiment on real-world data, we show that the models trained using our framework significantly outperform a number of baselines.
Transformative AI technologies have the potential to reshape critical aspects of society in the near future. However, in order to properly prepare policy initiatives for the arrival of such technologies accurate forecasts and timelines are necessary. A survey was administered to attendees of three AI conferences during the summer of 2018 (ICML, IJCAI and the HLAI conference). The survey included questions for estimating AI capabilities over the next decade, questions for forecasting five scenarios of transformative AI and questions concerning the impact of computational resources in AI research. Respondents indicated a median of 21.5% of human tasks (i.e., all tasks that humans are currently paid to do) can be feasibly automated now, and that this figure would rise to 40% in 5 years and 60% in 10 years. Median forecasts indicated a 50% probability of AI systems being capable of automating 90% of current human tasks in 25 years and 99% of current human tasks in 50 years. The conference of attendance was found to have a statistically significant impact on all forecasts, with attendees of HLAI providing more optimistic timelines with less uncertainty. These findings suggest that AI experts expect major advances in AI technology to continue over the next decade to a degree that will likely have profound transformative impacts on society.