The continually increasing number of complex datasets each year necessitates ever-improving machine learning methods for robust and accurate categorization of these data. This paper introduces Random Multimodel Deep Learning (RMDL): a new ensemble deep learning approach for classification. Deep learning models have achieved state-of-the-art results across many domains. RMDL solves the problem of finding the best deep learning structure and architecture while simultaneously improving robustness and accuracy through ensembles of deep learning architectures. RMDL can accept a variety of data as input, including text, video, images, and symbolic data. This paper describes RMDL and shows test results for image and text data including MNIST, CIFAR-10, WOS, Reuters, IMDB, and 20newsgroup. These test results show that RMDL produces consistently better performance than standard methods over a broad range of data types and classification problems.
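As a rough illustration of how an ensemble of classifiers can be combined, the sketch below applies simple majority voting to per-model label predictions. The abstract does not state the combination rule, so majority voting, the function name `majority_vote`, and the tie-breaking behavior are all assumptions made for this example.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label predictions into one label per sample.

    predictions[m][i] is the label that model m assigns to sample i.
    Majority voting is an assumed combination rule, not taken from the abstract.
    Ties go to the label that appears first among the models.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        # Count how many models voted for each label on sample i.
        votes = Counter(model_preds[i] for model_preds in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


# Three hypothetical models voting on three samples.
ensemble_labels = majority_vote([[0, 1, 2], [0, 1, 1], [1, 1, 2]])
```

Here each sample receives the label chosen by at least two of the three models.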
We present MAESTRO, a framework to describe and analyze CNN dataflows, and predict performance and energy-efficiency when running neural network layers across various hardware configurations. MAESTRO includes two components: (i) a concise language to describe arbitrary dataflows and (ii) an analysis framework that accepts the dataflow description, hardware resource description, and DNN layer description as inputs and generates buffer requirements, buffer access counts, network-on-chip (NoC) bandwidth requirements, and roofline performance information. We demonstrate both components across several dataflows as case studies.
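For readers unfamiliar with roofline performance, the sketch below computes the generic roofline bound: attainable throughput is the smaller of peak compute throughput and memory bandwidth times operational intensity. This is the textbook model, not MAESTRO's own analysis; the function name and parameters are hypothetical.

```python
def roofline_throughput(peak_ops_per_s, bandwidth_bytes_per_s, ops, bytes_moved):
    """Return the roofline bound on throughput in operations per second.

    Generic roofline model (an illustration, not MAESTRO's analysis):
    throughput <= min(peak compute, bandwidth * operational intensity).
    """
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    # Operational intensity: operations performed per byte of data moved.
    intensity = ops / bytes_moved
    return min(peak_ops_per_s, bandwidth_bytes_per_s * intensity)


# A layer with low data reuse is bandwidth-bound; one with high reuse is compute-bound.
low_reuse = roofline_throughput(100.0, 10.0, ops=50.0, bytes_moved=10.0)
high_reuse = roofline_throughput(100.0, 10.0, ops=200.0, bytes_moved=10.0)
```

In this toy setting the first call is limited by bandwidth and the second by peak compute.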
We propose a framework based on distributional reinforcement learning and recent attempts to combine Bayesian parameter updates with deep reinforcement learning. We show that our proposed framework conceptually unifies multiple previous methods in exploration. We also derive a practical algorithm that achieves efficient exploration on challenging control tasks.
Word embeddings (WE) have recently established themselves as a standard for representing word meaning in NLP. Semantic similarity between word pairs has become the most common evaluation benchmark for these representations, with vector cosine typically used as the only similarity metric. In this paper, we report experiments with a rank-based metric for WE, which performs comparably to vector cosine in similarity estimation and outperforms it in the recently introduced and challenging task of outlier detection, thus suggesting that rank-based measures can improve clustering quality.
Artificial neural networks (ANNs) may not be worth their computational/memory costs when used in mobile phones or embedded devices. Parameter-pruning algorithms combat these costs, with some algorithms capable of removing over 90% of an ANN’s weights without harming the ANN’s performance. Removing weights from an ANN is a form of regularization, but existing pruning algorithms do not significantly improve generalization error. We show that pruning ANNs can improve generalization if pruning targets large weights instead of small weights. Applying our pruning algorithm to an ANN leads to a higher image classification accuracy on CIFAR-10 data than applying the popular regularizer dropout. The pruning couples this higher accuracy with an 85% reduction of the ANN’s parameter count.
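To make the contrast between pruning small and pruning large weights concrete, the sketch below zeroes a chosen fraction of weights ranked by absolute value. The abstract does not specify the selection rule of the proposed algorithm, so this one-shot magnitude ranking and the name `prune_by_magnitude` are assumptions used only for illustration.

```python
def prune_by_magnitude(weights, fraction, target="large"):
    """Return a copy of weights with a fraction of entries set to 0.0.

    target="small" removes the smallest-magnitude weights (conventional pruning);
    target="large" removes the largest-magnitude weights.
    This ranking rule is an assumption, not the algorithm from the abstract.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if target not in ("small", "large"):
        raise ValueError("target must be 'small' or 'large'")
    n_prune = int(len(weights) * fraction)
    # Rank weight indices by absolute value, largest first when targeting large weights.
    order = sorted(
        range(len(weights)),
        key=lambda i: abs(weights[i]),
        reverse=(target == "large"),
    )
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


weights = [0.1, -2.0, 0.5, 3.0]
pruned_large = prune_by_magnitude(weights, 0.5, target="large")
pruned_small = prune_by_magnitude(weights, 0.5, target="small")
```

With half of the weights pruned, the two targets remove disjoint sets of entries from the same vector.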
This paper proposes a causal inference relation and causal programming as general frameworks for causal inference with structural causal models. A tuple, $\langle M, I, Q, F \rangle$, is an instance of the relation if a formula, $F$, computes a causal query, $Q$, as a function of known population probabilities, $I$, in every model entailed by a set of model assumptions, $M$. Many problems in causal inference can be viewed as the problem of enumerating instances of the relation that satisfy given criteria. This unifies a number of previously studied problems, including causal effect identification, causal discovery and recovery from selection bias. In addition, the relation supports formalizing new problems in causal inference with structural causal models, such as the problem of research design. Causal programming is proposed as a further generalization of causal inference as the problem of finding optimal instances of the relation, with respect to a cost function.
The problem of aspect-based sentiment analysis deals with classifying sentiments (negative, neutral, positive) for a given aspect in a sentence. A traditional sentiment classification task involves treating the entire sentence as a text document and classifying sentiments based on all the words. Suppose we have a sentence such as ‘the acceleration of this car is fast, but the reliability is horrible’. This can be a difficult sentence because it has two aspects with conflicting sentiments about the same entity. When using machine learning (or deep learning) techniques, how do we encode the information that we are interested in one aspect and its sentiment but not the other? Let us explore various pre-processing steps, features, and methods used to help solve this task.
This work presents the first algorithm for the problem of weighted online perfect bipartite matching with i.i.d. arrivals. Previous work only considered adversarial arrival sequences. In this problem, we are given a known set of workers, a distribution over job types, and non-negative utility weights for each (worker, job type) pair. At each time step, a job is drawn i.i.d. from the distribution over job types. Upon arrival, the job must be irrevocably assigned to a worker. The goal is to maximize the expected sum of utilities after all jobs are assigned. Our work is motivated by the application of ride-hailing, where jobs represent passengers and workers represent drivers. We introduce \algname{}, a 0.5-competitive, randomized algorithm, and prove that a competitive ratio of 0.5 is the best possible. \algname{} first selects a ‘preferred worker’ and assigns the job to this worker if it is available. The preferred worker is determined based on an optimal solution to a fractional transportation problem. If the preferred worker is not available, \algname{} randomly selects a worker from the available workers. We show that \algname{} maintains a uniform distribution over the workers even when the distribution over the job types is non-uniform.
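The assignment step described above can be sketched as follows. The abstract does not say how the preferred worker is derived from the fractional transportation solution, so this example assumes it is sampled from a given per-job-type probability table (`preferred_probs`, hypothetical) and that the fallback choice is uniform over available workers. Solving the transportation problem itself is out of scope here.

```python
import random


def assign_job(job_type, preferred_probs, available, rng):
    """Assign one arriving job and return the chosen worker.

    preferred_probs[job_type] maps each worker to the probability of being
    the preferred worker for that job type (assumed to come from a fractional
    transportation solution computed elsewhere).
    available is a set of unassigned workers; it is updated in place.
    """
    if not available:
        raise ValueError("no workers left to assign")
    workers = list(preferred_probs[job_type])
    probs = [preferred_probs[job_type][w] for w in workers]
    # Step 1: sample a preferred worker (sampling rule is an assumption).
    preferred = rng.choices(workers, weights=probs, k=1)[0]
    if preferred in available:
        chosen = preferred
    else:
        # Step 2: fall back to a uniformly random available worker (assumed uniform).
        chosen = rng.choice(sorted(available))
    available.remove(chosen)  # the assignment is irrevocable
    return chosen


# Toy instance: job type "a" always prefers worker "w1".
rng = random.Random(0)
table = {"a": {"w1": 1.0, "w2": 0.0}}
free_workers = {"w1", "w2"}
first = assign_job("a", table, free_workers, rng)
second = assign_job("a", table, free_workers, rng)
```

In the toy instance the first job takes the preferred worker and the second job falls back to the only remaining worker.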
Causal processes in nature may contain cycles, and real datasets may violate causal sufficiency as well as contain selection bias. No constraint-based causal discovery algorithm can currently handle cycles, latent variables and selection bias (CLS) simultaneously. I therefore introduce an algorithm called Cyclic Causal Inference (CCI) that makes sound inferences with a conditional independence oracle under CLS, provided that we can represent the cyclic causal process as a non-recursive linear structural equation model with independent errors. Empirical results show that CCI outperforms CCD in the cyclic case and rivals FCI and RFCI in the acyclic case.
We introduce an algorithmic method for population anomaly detection based on gaussianization through an adversarial autoencoder. This method is applicable to detection of ‘soft’ anomalies in arbitrarily distributed high-dimensional data. A soft, or population, anomaly is characterized by a shift in the distribution of the data set, where certain elements appear with higher probability than anticipated. Such anomalies must be detected by considering a sufficiently large sample set rather than a single sample. Applications include, but are not limited to, payment fraud trends, data exfiltration, disease clusters and epidemics, and social unrest. We evaluate the method on several domains and obtain both quantitative results and qualitative insights.
Existing Semantic Desktops are still reproached for being too complicated to use or not scaling well. Besides, a real ‘killer app’ is still missing. In this paper, we present a new prototype inspired by NEPOMUK and its successors having a semantic graph and ontologies as its basis. In addition, we introduce the idea of context spaces that users can directly interact with and work on. To make them available in all applications without further ado, the system is transparently integrated using mostly standard protocols complemented by a sidebar for advanced features. By exploiting collected context information and applying Managed Forgetting features (like hiding, condensation or deletion), the system is able to dynamically reorganize itself, which also includes a kind of tidy-up-itself functionality. We therefore expect it to be more scalable while providing new levels of user support. An early prototype has been implemented and is presented in this demo.
Extracting meaningful topics from texts requires properly considering their structure. In this paper, we aim to analyze structured time-series documents such as a collection of news articles and a series of scientific papers, wherein topics evolve over time depending on multiple topics in the past and are also related to each other at each time. To this end, we propose a dynamic and static topic model, which simultaneously considers the dynamic structures of the temporal topic evolution and the static structures of the topic hierarchy at each time. We show the results of experiments on collections of scientific papers, in which the proposed method outperformed conventional models. Moreover, we show an example of extracted topic structures, which we found helpful for analyzing research activities.