Neural Architecture Search (NAS) is a laborious process. Prior work on automated NAS targets mainly on improving accuracy, but lacks consideration of computational resource use. We propose the Resource-Efficient Neural Architect (RENA), an efficient resource-constrained NAS using reinforcement learning with network embedding. RENA uses a policy network to process the network embeddings to generate new configurations. We demonstrate RENA on image recognition and keyword spotting (KWS) problems. RENA can find novel architectures that achieve high performance even with tight resource constraints. For CIFAR10, it achieves 2.95% test error when compute intensity is greater than 100 FLOPs/byte, and 3.87% test error when model size is less than 3M parameters. For Google Speech Commands Dataset, RENA achieves the state-of-the-art accuracy without resource constraints, and it outperforms the optimized architectures with tight resource constraints.
Deep Learning methods are currently the state-of-the-art in many problems which can be tackled via machine learning, in particular classification problems. However there is still lack of understanding on how those methods work, why they work and what are the limitations involved in using them. In this chapter we will describe in detail the transition from shallow to deep networks, include examples of code on how to implement them, as well as the main issues one faces when training a deep network. Afterwards, we introduce some theoretical background behind the use of deep models, and discuss their limitations.
In this paper we introduce the class of beta seasonal autoregressive moving average ($\beta$SARMA) models for modeling and forecasting time series data that assume values in the standard unit interval. It generalizes the class of beta autoregressive moving average models [Rocha and Cribari-Neto, Test, 2009] by incorporating seasonal dynamics to the model dynamic structure. Besides introducing the new class of models, we develop parameter estimation, hypothesis testing inference, and diagnostic analysis tools. We also discuss out-of-sample forecasting. In particular, we provide closed-form expressions for the conditional score vector and for the conditional Fisher information matrix. We also evaluate the finite sample performances of conditional maximum likelihood estimators and white noise tests using Monte Carlo simulations. An empirical application is presented and discussed.
Since Bartik (1991), it has become popular in empirical studies to estimate regressions in which the variable of interest is a shift-share, such as when a regional labor market outcome is regressed on a weighted average of observed sectoral shocks, using the regional sector shares as weights. In this paper, we discuss inference in these regressions. We show that standard economic models imply that the regression residuals are likely to be correlated across regions with similar sector shares, independently of their geographic location. These correlations are ignored by inference procedures commonly used in these regressions, which can lead to severe undercoverage. In regressions studying the effect of randomly generated placebo sectoral shocks on actual labor market outcomes in U.S. commuting zones, we find that a 5% level significance test based on standard errors clustered at the state level rejects the null of no effect in up to 45% of the placebo interventions. We derive novel confidence intervals that correctly account for the potential correlation in the regression residuals.
To-do lists are a popular medium for personal information management. As to-do tasks are increasingly tracked in electronic form with mobile and desktop organizers, so does the potential for software support for the corresponding tasks by means of intelligent agents. While there has been work in the area of personal assistants for to-do tasks, no work has focused on classifying user intention and information extraction as we do. We show that our methods perform well across two corpora that span sub-domains, one of which we released.
To generate trust with their users, Explainable Artificial Intelligence (XAI) systems need to include an explanation model that can communicate the internal decisions, behaviours and actions to the interacting humans. Successful explanation involves both cognitive and social processes. In this paper we focus on the challenge of meaningful interaction between an explainer and an explainee and investigate the structural aspects of an explanation in order to propose a human explanation dialog model. We follow a bottom-up approach to derive the model by analysing transcripts of 398 different explanation dialog types. We use grounded theory to code and identify key components of which an explanation dialog consists. We carry out further analysis to identify the relationships between components and sequences and cycles that occur in a dialog. We present a generalized state model obtained by the analysis and compare it with an existing conceptual dialog model of explanation.
A cognitive model of human learning provides information about skills a learner must acquire to perform accurately in a task domain. Cognitive models of learning are not only of scientific interest, but are also valuable in adaptive online tutoring systems. A more accurate model yields more effective tutoring through better instructional decisions. Prior methods of automated cognitive model discovery have typically focused on well-structured domains, relied on student performance data or involved substantial human knowledge engineering. In this paper, we propose Cognitive Representation Learner (CogRL), a novel framework to learn accurate cognitive models in ill-structured domains with no data and little to no human knowledge engineering. Our contribution is two-fold: firstly, we show that representations learnt using CogRL can be used for accurate automatic cognitive model discovery without using any student performance data in several ill-structured domains: Rumble Blocks, Chinese Character, and Article Selection. This is especially effective and useful in domains where an accurate human-authored cognitive model is unavailable or authoring a cognitive model is difficult. Secondly, for domains where a cognitive model is available, we show that representations learned through CogRL can be used to get accurate estimates of skill difficulty and learning rate parameters without using any student performance data. These estimates are shown to highly correlate with estimates using student performance data on an Article Selection dataset.
Random forest and deep neural network are two schools of effective classification methods in machine learning. While the random forest is robust irrespective of the data domain, the deep neural network has advantages in handling high dimensional data. In view that a differentiable neural decision forest can be added to the neural network to fully exploit the benefits of both models, in our work, we further combine convolutional autoencoder with neural decision forest, where autoencoder has its advantages in finding the hidden representations of the input data. We develop a gradient boost module and embed it into the proposed convolutional autoencoder with neural decision forest to improve the performance. The idea of gradient boost is to learn and use the residual in the prediction. In addition, we design a structure to learn the parameters of the neural decision forest and gradient boost module at contiguous steps. The extensive experiments on several public datasets demonstrate that our proposed model achieves good efficiency and prediction performance compared with a series of baseline methods.
This paper evaluates eight parallel graph processing systems: Hadoop, HaLoop, Vertica, Giraph, GraphLab (PowerGraph), Blogel, Flink Gelly, and GraphX (SPARK) over four very large datasets (Twitter, World Road Network, UK 200705, and ClueWeb) using four workloads (PageRank, WCC, SSSP and K-hop). The main objective is to perform an independent scale-out study by experimentally analyzing the performance, usability, and scalability (using up to 128 machines) of these systems. In addition to performance results, we discuss our experiences in using these systems and suggest some system tuning heuristics that lead to better performance.
The past decade has seen development of many shared-memory graph processing frameworks intended to reduce the effort for creating high performance parallel applications. However, their programming models, based on Vertex-centric or Edge-centric paradigms suffer from several issues, such as poor cache utilization, irregular memory accesses, heavy use of synchronization primitives or theoretical inefficiency, that deteriorate the performance and scalability of applications. Recently, a cache-efficient partition-centric paradigm was proposed for computing PageRank. In this paper, we generalize this approach to develop a novel Partition-centric Programming Model(PPM) that is cache-efficient, scalable and work-efficient. We implement PPM as part of Graph Processing over Partitions(GPOP) framework that can efficiently execute a variety of algorithms. GPOP dramatically improves the cache performance by exploiting locality of partitioning. It achieves high scalability by enabling completely lock and atomic free computation. Its built-in analytical performance models enable it to use a hybrid of source and destination centric communication modes in a way that ensures work-efficiency of each iteration and simultaneously boosts high bandwidth sequential memory accesses. GPOP framework completely abstracts away underlying parallelism and programming model details from the user. It provides an easy to program set of APIs with the ability to selectively continue the active vertex set across iterations, which is not intrinsically supported by the current frameworks. We extensively evaluate the performance of GPOP for a variety of graph algorithms, using several large datasets. We observe that GPOP incurs upto 8.6x and 5.2x less L2 cache misses compared to Ligra and GraphMat, respectively. In terms of execution time, GPOP is upto 19x and 6.1x faster than Ligra and GraphMat, respectively.
In many natural language generation tasks, incorporating additional knowledge like lexical constraints into the model’s output is significant, which take the form of phrases or words that must be present in the output sequence. Unfortunately, existing neural language model cannot be used directly to generate lexically constrained sentences. In this paper, we propose a new algorithmic framework called BFGAN to address this challenge. We employ a backward generator and a forward generator to generate lexically constrained sentence together, and use a discriminator to guide the joint training of two generators by assigning them reward signals. Experimental results on automatic and human evaluation demonstrate significant improvements over previous baselines.
This paper presents a study on detecting cyberattacks on industrial control systems (ICS) using unsupervised deep neural networks, specifically, convolutional neural networks. The study was performed on a SecureWater Treatment testbed (SWaT) dataset, which represents a scaled-down version of a real-world industrial water treatment plant. e suggest a method for anomaly detection based on measuring the statistical deviation of the predicted value from the observed value.We applied the proposed method by using a variety of deep neural networks architectures including different variants of convolutional and recurrent networks. The test dataset from SWaT included 36 different cyberattacks. The proposed method successfully detects the vast majority of the attacks with a low false positive rate thus improving on previous works based on this data set. The results of the study show that 1D convolutional networks can be successfully applied to anomaly detection in industrial control systems and outperform more complex recurrent networks while being much smaller and faster to train.
With the rapid development of deep learning, deep reinforcement learning (DRL) began to appear in the field of resource scheduling in recent years. Based on the previous research on DRL in the literature, we introduce online resource scheduling algorithm DeepRM2 and the offline resource scheduling algorithm DeepRM_Off. Compared with the state-of-the-art DRL algorithm DeepRM and heuristic algorithms, our proposed algorithms have faster convergence speed and better scheduling efficiency with regarding to average slowdown time, job completion time and rewards.
Recent breakthroughs in Neural Architectural Search (NAS) have achieved state-of-the-art performances in applications such as image classification and language modeling. However, these techniques typically ignore device-related objectives such as inference time, memory usage, and power consumption. Optimizing neural architecture for device-related objectives is immensely crucial for deploying deep networks on portable devices with limited computing resources. We propose DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures, optimizing for both device-related (e.g., inference time and memory usage) and device-agnostic (e.g., accuracy and model size) objectives. DPP-Net employs a compact search space inspired by current state-of-the-art mobile CNNs, and further improves search efficiency by adopting progressive search (Liu et al. 2017). Experimental results on CIFAR-10 are poised to demonstrate the effectiveness of Pareto-optimal networks found by DPP-Net, for three different devices: (1) a workstation with Titan X GPU, (2) NVIDIA Jetson TX1 embedded system, and (3) mobile phone with ARM Cortex-A53. Compared to CondenseNet and NASNet-A (Mobile), DPP-Net achieves better performances: higher accuracy and shorter inference time on various devices. Additional experimental results show that models found by DPP-Net also achieve considerably-good performance on ImageNet as well.
In the scientific digital libraries, some papers from different research communities can be described by community-dependent keywords even if they share a semantically similar topic. Articles that are not tagged with enough keyword variations are poorly indexed in any information retrieval system which limits potentially fruitful exchanges between scientific disciplines. In this paper, we introduce a novel experimentally designed pipeline for multi-label semantic-based tagging developed for open-access metadata digital libraries. The approach starts by learning from a standard scientific categorization and a sample of topic tagged articles to find semantically relevant articles and enrich its metadata accordingly. Our proposed pipeline aims to enable researchers reaching articles from various disciplines that tend to use different terminologies. It allows retrieving semantically relevant articles given a limited known variation of search terms. In addition to achieving an accuracy that is higher than an expanded query based method using a topic synonym set extracted from a semantic network, our experiments also show a higher computational scalability versus other comparable techniques. We created a new benchmark extracted from the open-access metadata of a scientific digital library and published it along with the experiment code to allow further research in the topic.
We propose an efficient graph-based divisive cluster analysis approach called sampling clustering. It constructs a lite informative dendrogram by recursively dividing a graph into subgraphs. In each recursive call, a graph is sampled first with a set of vertices being removed to disconnect latent clusters, then condensed by adding edges to the remaining vertices to avoid graph fragmentation caused by vertex removals. We also present some sampling and condensing methods and discuss the effectiveness in this paper. Our implementations run in linear time and achieve outstanding performance on various types of datasets. Experimental results show that they outperform state-of-the-art clustering algorithms with significantly less computing resources requirements.
To test the effectiveness of process discovery algorithms, a Process Discovery Contest (PDC) has been set up. This PDC uses a classification approach to measure this effectiveness: The better the discovered model can classify whether or not a new trace conforms to the event log, the better the discovery algorithm is supposed to be. Unfortunately, even the state-of-the-art fully-automated discovery algorithms score poorly on this classification. Even the best of these algorithms, the Inductive Miner, scored only 147 correct classified traces out of 200 traces on the PDC of 2017. This paper introduces the rule-based log skeleton model, which is closely related to the Declare constraint model, together with a way to classify traces using this model. This classification using log skeletons is shown to score better on the PDC of 2017 than state-of-the-art discovery algorithms: 194 out of 200. As a result, one can argue that the fully-automated algorithm to construct (or: discover) a log skeleton from an event log outperforms existing state-of-the-art fully-automated discovery algorithms.
Personalized probabilistic forecasts of time to event (such as mortality) can be crucial in decision making, especially in the clinical setting. Inspired by ideas from the meteorology literature, we approach this problem through the paradigm of maximizing sharpness of prediction distributions, subject to calibration. In regression problems, it has been shown that optimizing the continuous ranked probability score (CRPS) instead of maximum likelihood leads to sharper prediction distributions while maintaining calibration. We introduce the Survival-CRPS, a generalization of the CRPS to the time to event setting, and present right-censored and interval-censored variants. To holistically evaluate the quality of predicted distributions over time to event, we present the Survival-AUPRC evaluation metric, an analog to area under the precision-recall curve. We apply these ideas by building a recurrent neural network for mortality prediction, using an Electronic Health Record dataset covering millions of patients. We demonstrate significant benefits in models trained by the Survival-CRPS objective instead of maximum likelihood.