Documents

( | 1 | 4 | 5 | 7 | 8 | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | Y | Z
|Documents| = 1778

(

(Machine) Learning to Do More with Less Determining the best method for training a machine learning algorithm is critical to maximizing its ability to classify data. In this paper, we compare the standard ‘fully supervised’ approach (that relies on knowledge of event-by-event truth-level labels) with a recent proposal that instead utilizes class ratios as the only discriminating information provided during training. This so-called ‘weakly supervised’ technique has access to less information than the fully supervised method and yet is still able to yield impressive discriminating power. In addition, weak supervision seems particularly well suited to particle physics since quantum mechanics is incompatible with the notion of mapping an individual event onto any single Feynman diagram. We examine the technique in detail — both analytically and numerically — with a focus on the robustness to issues of mischaracterizing the training samples. Weakly supervised networks turn out to be remarkably insensitive to systematic mismodeling. Furthermore, we demonstrate that the event level outputs for weakly versus fully supervised networks are probing different kinematics, even though the numerical quality metrics are essentially identical. This implies that it should be possible to improve the overall classification ability by combining the output from the two types of networks. For concreteness, we apply this technology to a signature of beyond the Standard Model physics to demonstrate that all these impressive features continue to hold in a scenario of relevance to the LHC. Example code is provided at https://…/master.
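As an illustration of the idea behind weak supervision described above (not the code from the linked repository), here is a minimal sketch of a loss that uses only per-batch class fractions: the mean predicted signal probability in each training batch is pushed toward that batch's known signal fraction. All names and values are hypothetical.

```python
import numpy as np

def proportion_loss(pred_probs, batch_fractions, batch_ids):
    """Squared error between the mean predicted signal probability in each
    batch and that batch's known signal fraction (the only label used)."""
    loss = 0.0
    for b, frac in batch_fractions.items():
        mask = batch_ids == b
        loss += (pred_probs[mask].mean() - frac) ** 2
    return loss / len(batch_fractions)

# Toy usage: two batches of events with different (known) signal fractions.
rng = np.random.default_rng(0)
pred_probs = rng.uniform(size=200)           # stand-in for network outputs
batch_ids = np.repeat([0, 1], 100)
batch_fractions = {0: 0.3, 1: 0.7}
print(proportion_loss(pred_probs, batch_fractions, batch_ids))
```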

1

10 Tips to Create Useful and Beautiful Visualizations (Slide Deck)

4

4 Steps to Successfully Evaluating Business Analytics Software The goal of Business Analytics and Intelligence software is to help businesses access, analyze and visualize data, and then communicate those insights in meaningful dashboards and metrics. Unfortunately, the reality is that the majority of software options on the market today provide only a subset of that functionality. And those that provide a more comprehensive solution tend to lack the features that make them user-friendly. With a crowded marketplace, businesses need to go through a complex evaluation process and make some fundamental technology decisions before selecting a vendor. Finding business intelligence (BI) software that will scale with your organization's needs may seem like an impossible task. Here are the four questions you can ask when beginning the BI evaluation process that will save you a lot of time and help set you in the right direction.

5

5 Best Practices for Creating Effective Dashboards You've been there: no matter how many reports, formal meetings, casual conversations or emailed memos, someone important inevitably claims they didn't know about some important fact or insight and says ‘we should have a dashboard to monitor the performance of X.’ Or maybe you've been here: you've said ‘yes, let's have a dashboard. It will help us improve return on investment (ROI) if everyone can see how X is performing and be able to quickly respond. I'll update it weekly.’ Unfortunately, by week 3, you realize you're killing several hours a week integrating data from multiple sources to update a dashboard you're not sure anyone is actually using. Yet, dashboards have been all the rage and with good reason. They can help you and your coworkers achieve a better grasp on the data – one of your most important, and often overlooked assets. You've read how they help organizations get on the same page, speed decision-making and improve ROI. They help create organizational alignment because everyone is looking at the same thing. So dashboards can be effective. They can work. The question becomes: How can you get one to work for you? Focus on these 5 best practices. Equally important, keep an eye on the 7 critical mistakes you don't want to make.
50 years of Data Science More than 50 years ago, John Tukey called for a reformation of academic statistics. In ‘The Future of Data Analysis’, he pointed to the existence of an as-yet unrecognized science, whose subject of interest was learning from data, or ‘data analysis’. Ten to twenty years ago, John Chambers, Bill Cleveland and Leo Breiman independently once again urged academic statistics to expand its boundaries beyond the classical domain of theoretical statistics; Chambers called for more emphasis on data preparation and presentation rather than statistical modeling; and Breiman called for emphasis on prediction rather than inference. Cleveland even suggested the catchy name ‘Data Science’ for his envisioned field. A recent and growing phenomenon is the emergence of ‘Data Science’ programs at major universities, including UC Berkeley, NYU, MIT, and most recently the Univ. of Michigan, which on September 8, 2015 announced a $100M ‘Data Science Initiative’ that will hire 35 new faculty. Teaching in these new programs has significant overlap in curricular subject matter with traditional statistics courses; in general, though, the new initiatives steer away from close involvement with academic statistics departments. This paper reviews some ingredients of the current ‘Data Science moment’, including recent commentary about data science in the popular media, and about how/whether Data Science is really different from Statistics. The now-contemplated field of Data Science amounts to a superset of the fields of statistics and machine learning which adds some technology for ‘scaling up’ to ‘big data’. This chosen superset is motivated by commercial rather than intellectual developments. Choosing in this way is likely to miss out on the really important intellectual event of the next fifty years. Because all of science itself will soon become data that can be mined, the imminent revolution in Data Science is not about mere ‘scaling up’, but instead the emergence of scientific studies of data analysis science-wide. In the future, we will be able to predict how a proposal to change data analysis workflows would impact the validity of data analysis across all of science, even predicting the impacts field-by-field. Drawing on work by Tukey, Cleveland, Chambers and Breiman, I present a vision of data science based on the activities of people who are ‘learning from data’, and I describe an academic field dedicated to improving that activity in an evidence-based manner. This new field is a better academic enlargement of statistics and machine learning than today’s Data Science Initiatives, while being able to accommodate the same short-term goals.

7

7 Signs You Need Advanced Analytics for Salesforce.com (or any CRM) and Why They Matter Sure, customer relationship management (CRM) applications provide reports and dashboards. But if you rely on the built-in analytic capabilities of CRM, you're leaving money on the table. Because that's what the information in your CRM system is; it's money. But you can't extract the true value of that information without an analytics application that does the heavy lifting without putting your sales team through hell. You also want your sales team to stay in your CRM application. That was the point. Remember, all CRM, all the time. Directing the team to another application for analytic insight just defeats the purpose. What you need are robust, easy-to-access analytics embedded right in your CRM solution. Following are seven signs that you are not operating efficiently and making reporting and analytics more difficult for your sales team and your business less productive. Don't ignore these seven warning signs. They all carry one message: Yes, you need advanced analytics!
7 Tips to Succeed with Big Data in 2014 Just when you thought big data couldn't get any bigger, it got bigger still. Regardless of its actual size, big data is showing its value. Organizations everywhere have big data of all shapes and sizes. They recognize the importance, the opportunity, and even the imperative to pay attention. It has become clear that big data will outlive those who ignore it. Organizations that have already tamed big data – the multi-structured mass they stored before they knew its worth – are improving their operational efficiency, growing their revenues, and empowering new business models. How do they do it? Their techniques for success can be summarized in seven tips.

8

8 Critical Metrics for Measuring App User Engagement In this guide, we outline for you the eight engagement metrics critical to app success, including suggestions for running marketing campaigns and boosting ROI.

A

A Benchmark of Selected Algorithmic Differentiation Tools on Some Problems in Computer Vision and Machine Learning Algorithmic differentiation (AD) allows exact computation of derivatives given only an implementation of an objective function. Although many AD tools are available, a proper and efficient implementation of AD methods is not straightforward. The existing tools are often too different to allow for a general test suite. In this paper, we compare fifteen ways of computing derivatives including eleven automatic differentiation tools implementing various methods and written in various languages (C++, F#, MATLAB, Julia and Python), two symbolic differentiation tools, finite differences, and hand-derived computation. We look at three objective functions from computer vision and machine learning. These objectives are for the most part simple, in the sense that no iterative loops are involved, and conditional statements are encapsulated in functions such as abs or logsumexp. However, it is important for the success of algorithmic differentiation that such ‘simple’ objective functions are handled efficiently, as so many problems in computer vision and machine learning are of this form. Of course, our results depend on programmer skill, and familiarity with the tools. However, we contend that this paper presents an important datapoint: a skilled programmer devoting roughly a week to each tool produced the timings we present. We have made our implementations available as open source to allow the community to replicate and update these benchmarks.
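Purely for context on the baselines mentioned above, here is a minimal sketch of the finite-difference approach, checked against the known analytic gradient of a logsumexp objective. This is an illustrative example, not code from the benchmark suite.

```python
import numpy as np

def objective(x):
    # A small smooth test objective: numerically stable log-sum-exp.
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

def central_difference_grad(f, x, h=1e-6):
    """Approximate the gradient component by component with central differences."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([0.5, -1.0, 2.0])
analytic = np.exp(x - x.max()) / np.exp(x - x.max()).sum()   # softmax is the exact gradient of logsumexp
print(np.allclose(central_difference_grad(objective, x), analytic, atol=1e-6))
```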
A Brief Introduction to Machine Learning for Engineers This monograph aims at providing an introduction to key concepts, algorithms, and theoretical frameworks in machine learning, including supervised and unsupervised learning, statistical learning theory, probabilistic graphical models and approximate inference. The intended readership consists of electrical engineers with a background in probability and linear algebra. The treatment builds on first principles, and organizes the main ideas according to clearly defined categories, such as discriminative and generative models, frequentist and Bayesian approaches, exact and approximate inference, directed and undirected models, and convex and non-convex optimization. The mathematical framework uses information-theoretic measures as a unifying tool. The text offers simple and reproducible numerical examples providing insights into key motivations and conclusions. Rather than providing exhaustive details on the existing myriad solutions in each specific category, for which the reader is referred to textbooks and papers, this monograph is meant as an entry point for an engineer into the literature on machine learning.
A Brief Survey of Deep Reinforcement Learning Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep $Q$-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.
A Brief Survey of Text Mining: Classification, Clustering and Extraction Techniques The amount of text that is generated every day is increasing dramatically. This tremendous volume of mostly unstructured text cannot be simply processed and perceived by computers. Therefore, efficient and effective techniques and algorithms are required to discover useful patterns. Text mining is the task of extracting meaningful information from text, which has gained significant attention in recent years. In this paper, we describe several of the most fundamental text mining tasks and techniques including text pre-processing, classification and clustering. Additionally, we briefly explain text mining in biomedical and health care domains.
A Closer Look at Memorization in Deep Networks We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. real data. We also demonstrate that for appropriately tuned explicit regularization (e.g., dropout) we can degrade DNN training performance on noise datasets without compromising generalization on real data. Our analysis suggests that the notions of effective capacity which are dataset independent are unlikely to explain the generalization performance of deep networks when trained with gradient based methods because training data itself plays an important role in determining the degree of memorization.
A Comparative Study of Association Rule Mining Algorithms on Grid and Cloud Platform Association rule mining is a time-consuming process because it is both data intensive and computation intensive. In order to mine large volumes of data and to enhance the scalability and performance of existing sequential association rule mining algorithms, parallel and distributed algorithms have been developed. These traditional parallel and distributed algorithms are based on homogeneous platforms and are not lucrative for heterogeneous platforms such as grid and cloud. This requires the design of new algorithms which address the issues of good data set partition and distribution, load balancing strategy, optimization of communication and synchronization techniques among processors in such heterogeneous systems. Grid and cloud are the emerging platforms for distributed data processing, and various association rule mining algorithms have been proposed on such platforms. This survey article integrates the brief architectural aspects of distributed systems with various recent approaches of grid-based and cloud-based association rule mining algorithms, with comparative perception. We differentiate between approaches of association rule mining algorithms developed on these architectures on the basis of data locality, programming paradigm, fault tolerance, communication cost, partition and distribution of data sets. Although it does not cover all algorithms, it can be very useful for new researchers working in the direction of distributed association rule mining algorithms.
A comparative study of fuzzy c-means algorithm and entropy-based fuzzy clustering algorithms Fuzzy clustering is useful to mine complex and multi-dimensional data sets, where the members have partial or fuzzy relations. Among the various developed techniques, fuzzy-C-means (FCM) algorithm is the most popular one, where a piece of data has partial membership with each of the pre-defined cluster centers. Moreover, in FCM, the cluster centers are virtual, that is, they are chosen at random and thus might be out of the data set. The cluster centers and membership values of the data points with them are updated through some iterations. On the other hand, entropy-based fuzzy clustering (EFC) algorithm works based on a similarity-threshold value. Contrary to FCM, in EFC, the cluster centers are real, that is, they are chosen from the data points. In the present paper, the performances of these algorithms have been compared on four data sets, such as IRIS, WINES, OLITOS and psychosis (collected with the help of forty doctors), in terms of the quality of the clusters (that is, discrepancy factor, compactness, distinctness) obtained and their computational time. Moreover, the best set of clusters has been mapped into 2-D for visualization using a self-organizing map (SOM).
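To make the FCM iteration described above concrete, here is a minimal NumPy sketch of the membership and center updates (illustrative only; the data and parameter names are placeholders, not the paper's setup).

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, n_iter=50, seed=0):
    """Minimal fuzzy c-means: alternate membership and (virtual) center updates."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))          # fuzzy memberships, rows sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]    # weighted means, need not be data points
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))                  # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

X = np.random.default_rng(1).normal(size=(150, 4))       # stand-in for a data set such as IRIS
centers, U = fuzzy_c_means(X)
print(centers.shape, U.sum(axis=1)[:3])
```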
A Comparative Study of Matrix Factorization and Random Walk with Restart in Recommender Systems Between matrix factorization and Random Walk with Restart (RWR), which method works better for recommender systems? Which method handles explicit or implicit feedback data better? Does additional side information help recommendation? Recommender systems play an important role in many e-commerce services such as Amazon and Netflix to recommend new items to a user. Among various recommendation strategies, collaborative filtering has shown good performance by using rating patterns of users. Matrix factorization and random walk with restart are the most representative collaborative filtering methods. However, it is still unclear which method provides better recommendation performance despite their extensive utility. In this paper, we provide a comparative study of matrix factorization and RWR in recommender systems. We exactly formulate each correspondence of the two methods according to various tasks in recommendation. Especially, we newly devise an RWR method using a global bias term which corresponds to a matrix factorization method using biases. We describe details of the two methods in various aspects of recommendation quality such as how those methods handle the cold-start problem which typically happens in collaborative filtering. We extensively perform experiments over real-world datasets to evaluate the performance of each method in terms of various measures. We observe that matrix factorization performs better with explicit feedback ratings while RWR is better with implicit ones. We also observe that exploiting global popularities of items is advantageous in the performance and that side information produces positive synergy with explicit feedback but gives negative effects with implicit one.
A Comparative Study of Recommendation Algorithms in Ecommerce Applications We evaluate a wide range of recommendation algorithms on e-commerce-related datasets. These algorithms include the popular user-based and item-based correlation/similarity algorithms as well as methods designed to work with sparse transactional data. Data sparsity poses a significant challenge to recommendation approaches when applied in ecommerce applications. We experimented with approaches such as dimensionality reduction, generative models, and spreading activation, which are designed to meet this challenge. In addition, we report a new recommendation algorithm based on link analysis. Initial experimental results indicate that the link analysis-based algorithm achieves the best overall performance across several e-commerce datasets.
A Comparative Study on using Principle Component Analysis with Different Text Classifiers Text categorization (TC) is the task of automatically organizing a set of documents into a set of pre-defined categories. Over the last few years, increased attention has been paid to the use of documents in digital form, and this makes text categorization a challenging issue. The most significant problem of text categorization is its huge number of features. Many of these features are redundant, noisy or irrelevant and cause overfitting with most classifiers. Hence, feature extraction is an important step to improve the overall accuracy and the performance of text classifiers. In this paper, we provide an overview of using principal component analysis (PCA) for feature extraction with various classifiers. We observe that classifier performance improves after using PCA to reduce the dimensionality of the data. Experiments are conducted on three UCI data sets, Classic03, CNAE-9 and DBWorld e-mails. We compare the classification performance results of using PCA with popular and well-known text classifiers. Results show that using PCA encouragingly enhances classification performance on most of the classifiers.
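A minimal sketch of the kind of pipeline discussed above, assuming scikit-learn: TF-IDF features are reduced with PCA and fed to a standard classifier. The toy corpus and the choice of classifier are illustrative, not the paper's data sets or classifiers.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

docs = ["cheap flights and hotel deals", "hotel booking discount offer",
        "train a neural network model", "gradient descent for deep models"]
labels = [0, 0, 1, 1]                                 # two toy categories

X = TfidfVectorizer().fit_transform(docs).toarray()   # PCA needs a dense matrix
X_reduced = PCA(n_components=2).fit_transform(X)      # feature extraction step
clf = LogisticRegression().fit(X_reduced, labels)
print(clf.predict(X_reduced))
```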
A Comparative Survey of Recent Natural Language Interfaces for Databases Over the last few years natural language interfaces (NLI) for databases have gained significant traction both in academia and industry. These systems use very different approaches as described in recent survey papers. However, these systems have not been systematically compared against a set of benchmark questions in order to rigorously evaluate their functionalities and expressive power. In this paper, we give an overview of 24 recently developed NLIs for databases. Each of the systems is evaluated using a curated list of ten sample questions to show their strengths and weaknesses. We categorize the NLIs into four groups based on the methodology they are using: keyword-, pattern-, parsing-, and grammar-based NLI. Overall, we learned that keyword-based systems are enough to answer simple questions. To solve more complex questions involving subqueries, the system needs to apply some sort of parsing to identify structural dependencies. Grammar-based systems are overall the most powerful ones, but are highly dependent on their manually designed rules. In addition to providing a systematic analysis of the major systems, we derive lessons learned that are vital for designing NLIs that can answer a wide range of user questions.
A comparison of algorithms for the multivariate L1-median The L1-median is a robust estimator of multivariate location with good statistical properties. Several algorithms for computing the L1-median are available. Problem specific algorithms can be used, but also general optimization routines. The aim is to compare different algorithms with respect to their precision and runtime. This is possible because all considered algorithms have been implemented in a standardized manner in the open source environment R. In most situations, the algorithm based on the optimization routine NLM (non-linear minimization) clearly outperforms other approaches. Its low computation time makes applications for large and high-dimensional data feasible.
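The study above compares R implementations; purely as an illustration of the ‘general optimization routine’ approach, here is a sketch in Python using scipy.optimize.minimize (a rough analogue of R's nlm) to compute the L1-median. It is not one of the benchmarked algorithms.

```python
import numpy as np
from scipy.optimize import minimize

def l1_median(X):
    """Spatial (L1) median: the point minimizing the sum of Euclidean
    distances to all observations, found with a general-purpose optimizer."""
    objective = lambda m: np.linalg.norm(X - m, axis=1).sum()
    x0 = np.median(X, axis=0)                     # coordinate-wise median as a starting point
    return minimize(objective, x0, method="BFGS").x

X = np.random.default_rng(0).standard_normal((500, 3))
X[:10] += 20                                      # a few outliers barely move the estimate
print(l1_median(X))
```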
A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition Attention-based recurrent neural encoder-decoder models present an elegant solution to the automatic speech recognition problem. This approach folds the acoustic model, pronunciation model, and language model into a single network and requires only a parallel corpus of speech and text for training. However, unlike in conventional approaches that combine separate acoustic and language models, it is not clear how to use additional (unpaired) text. While there has been previous work on methods addressing this problem, a thorough comparison among methods is still lacking. In this paper, we compare a suite of past methods and some of our own proposed methods for using unpaired text data to improve encoder-decoder models. For evaluation, we use the medium-sized Switchboard data set and the large-scale Google voice search and dictation data sets. Our results confirm the benefits of using unpaired text across a range of methods and data sets. Surprisingly, for first-pass decoding, the rather simple approach of shallow fusion performs best across data sets. However, for Google data sets we find that cold fusion has a lower oracle error rate and outperforms other approaches after second-pass rescoring on the Google voice search data set.
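A minimal sketch of shallow fusion, the simple approach highlighted above: at each decoding step the end-to-end model's next-token log-probabilities are interpolated with an external language model's scores. The vocabulary, scores and weight lam are toy values, not from the paper.

```python
import numpy as np

def shallow_fusion_step(decoder_log_probs, lm_log_probs, lam=0.3):
    """One decoding step of shallow fusion: add a weighted external LM score
    to the encoder-decoder model's next-token log-probabilities."""
    fused = decoder_log_probs + lam * lm_log_probs
    return int(np.argmax(fused)), fused

# Toy vocabulary of 5 tokens: the LM shifts the greedy choice from token 2 to token 4.
dec = np.log(np.array([0.05, 0.1, 0.4, 0.1, 0.35]))
lm  = np.log(np.array([0.1, 0.1, 0.1, 0.1, 0.6]))
print(shallow_fusion_step(dec, lm))
```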
A Composite Model for Computing Similarity Between Texts Computing text similarity is a foundational technique for a wide range of tasks in natural language processing such as duplicate detection, question answering, or automatic essay grading. Just recently, text similarity received wide-spread attention in the research community by the establishment of the Semantic Textual Similarity (STS) Task at the Semantic Evaluation (SemEval) workshop in 2012 – a fact that stresses the importance of text similarity research. The goal of the STS Task is to create automated measures which are able to compute the degree of similarity between two given texts in the same way that humans do. Measures are thereby expected to output continuous text similarity scores, which are then either compared with human judgments or used as a means for solving a particular problem. We start this thesis with the observation that while the concept of similarity is well grounded in psychology, text similarity is much less well-defined in the natural language processing community. No attempt has been made yet to formalize in what way text similarity between two texts can be computed. Still, text similarity is regarded as a fixed, axiomatic notion in the community. To alleviate this shortcoming, we describe existing formal models of similarity and discuss how we can adapt them to texts. We propose to judge text similarity along multiple text dimensions, i.e. characteristics inherent to texts, and provide empirical evidence based on a set of annotation studies that the proposed dimensions are perceived by humans. We continue with a comprehensive survey of state-of-the-art text similarity measures previously proposed in the literature. To the best of our knowledge, no such survey has been done yet. We propose a classification into compositional and noncompositional text similarity measures according to their inherent properties. Compositional measures compute text similarity based on pairwise word similarity scores between all words which are then aggregated to an overall similarity score, while noncompositional measures project the complete texts onto particular models and then compare the texts based on these models. Based on our theoretical insights, we then present the implementation of a text similarity system which composes a multitude of text similarity measures along multiple text dimensions using a machine learning classifier. Depending on the concrete task at hand, we argue that such a system may need to address more than a single text dimension in order to best resemble human judgments. Our efforts culminate in the open source framework DKPro Similarity, which streamlines the development of text similarity measures and experimental setups. We apply our system in two evaluations, for which it consistently outperforms prior work and competing systems: an intrinsic and an extrinsic evaluation. In the intrinsic evaluation, the performance of text similarity measures is evaluated in an isolated setting by comparing the algorithmically produced scores with human judgments. We conducted the intrinsic evaluation in the context of the STS Task as part of the SemEval workshop. In the extrinsic evaluation, the performance of text similarity measures is evaluated with respect to a particular task at hand, where text similarity is a means for solving a particular problem. We conducted the extrinsic evaluation in the text classification task of text reuse detection. 
The results of both evaluations support our hypothesis that a composition of text similarity measures highly benefits the similarity computation process. Finally, we stress the importance of text similarity measures for real-world applications. We therefore introduce the application scenario Self-Organizing Wikis, where users of wikis, i.e. web-based collaborative content authoring systems, are supported in their everyday tasks by means of natural language processing techniques in general, and text similarity in particular. We elaborate on two use cases where text similarity computation is particularly beneficial: the detection of duplicates, and the semi-automatic insertion of hyperlinks. Moreover, we discuss two further applications where text similarity is a valuable tool: In both question answering and textual entailment recognition, text similarity has been used successfully in experiments and appears to be a promising means for further research in these fields. We conclude this thesis with an analysis of shortcomings of current text similarity research and formulate challenges which should be tackled by future work. In particular, we believe that computing text similarity along multiple text dimensions – which depend on the specific task at hand – will benefit any other task where text similarity is fundamental, as a composition of text similarity measures has shown superior performance in both the intrinsic as well as the extrinsic evaluation.
A Comprehensive Analysis of Deep Regression Deep learning revolutionized data science, and recently, its popularity has grown exponentially, as did the amount of papers employing deep networks. Vision tasks such as human pose estimation did not escape this methodological change. The large number of deep architectures leads to a plethora of methods that are evaluated under different experimental protocols. Moreover, small changes in the architecture of the network, or in the data pre-processing procedure, together with the stochastic nature of the optimization methods, lead to notably different results, making it extremely difficult to identify methods that significantly outperform others. Therefore, when proposing regression algorithms, practitioners proceed by trial-and-error. This situation motivated the current study, in which we perform a systematic evaluation and a statistical analysis of the performance of vanilla deep regression (convolutional neural networks with a linear regression top layer). To the best of our knowledge, this is the first comprehensive analysis of deep regression techniques. We perform experiments on three vision problems and report confidence intervals for the median performance as well as the statistical significance of the results, if any. Surprisingly, the variability due to different data pre-processing procedures generally eclipses the variability due to modifications in the network architecture.
A Comprehensive Analysis on Adversarial Robustness of Spiking Neural Networks In this era of machine learning models, their functionality is being threatened by adversarial attacks. In the face of this struggle to make artificial neural networks robust, finding a model resilient to these attacks is very important. In this work, we present, for the first time, a comprehensive analysis of the behavior of more bio-plausible networks, namely Spiking Neural Networks (SNNs), under state-of-the-art adversarial tests. We perform a comparative study of the accuracy degradation between a conventional VGG-9 Artificial Neural Network (ANN) and an equivalent spiking network with the CIFAR-10 dataset in both whitebox and blackbox settings for different types of single-step and multi-step FGSM (Fast Gradient Sign Method) attacks. We demonstrate that SNNs tend to show more resiliency compared to ANNs under the black-box attack scenario. Additionally, we find that SNN robustness is largely dependent on the corresponding training mechanism. We observe that SNNs trained by spike-based backpropagation are more adversarially robust than the ones obtained by ANN-to-SNN conversion rules in several whitebox and blackbox scenarios. Finally, we also propose a simple yet effective framework for crafting adversarial attacks from SNNs. Our results suggest that attacks crafted from SNNs following our proposed method are much stronger than those crafted from ANNs.
A Comprehensive Comparison of Unsupervised Network Representation Learning Methods There has been appreciable progress in unsupervised network representation learning (UNRL) approaches over graphs recently with flexible random-walk approaches, new optimization objectives and deep architectures. However, there is no common ground for systematic comparison of embeddings to understand their behavior for different graphs and tasks. In this paper we theoretically group different approaches under a unifying framework and empirically investigate the effectiveness of different network representation methods. In particular, we argue that most of the UNRL approaches either explicitly or implicitly model and exploit context information of a node. Consequently, we propose a framework that casts a variety of approaches — random walk based, matrix factorization and deep learning based — into a unified context-based optimization function. We systematically group the methods based on their similarities and differences. We study the differences among these methods in detail which we later use to explain their performance differences (on downstream tasks). We conduct a large-scale empirical study considering 9 popular and recent UNRL techniques and 11 real-world datasets with varying structural properties and two common tasks — node classification and link prediction. We find that there is no single method that is a clear winner and that the choice of a suitable method is dictated by certain properties of the embedding methods, task and structural properties of the underlying graph. In addition we also report the common pitfalls in evaluation of UNRL methods and come up with suggestions for experimental design and interpretation of results.
A Comprehensive Study of Deep Learning for Image Captioning Generating a description of an image is called image captioning. Image captioning requires recognizing the important objects, their attributes and their relationships in an image. It also needs to generate syntactically and semantically correct sentences. Deep learning-based techniques are capable of handling the complexities and challenges of image captioning. In this survey paper, we aim to present a comprehensive review of existing deep learning-based image captioning techniques. We discuss the foundation of the techniques to analyze their performances, strengths and limitations. We also discuss the datasets and the evaluation metrics popularly used in deep learning based automatic image captioning.
A Comprehensive Survey for Low Rank Regularization Low rank regularization, in essence, involves introducing a low rank or approximately low rank assumption for the matrix we aim to learn, and it has achieved great success in many fields including machine learning, data mining and computer vision. Over the last decade, much progress has been made in theories and practical applications. Nevertheless, the intersection between them is very slight. In order to construct a bridge between practical applications and theoretical research, in this paper we provide a comprehensive survey for low rank regularization. We first review several traditional machine learning models using low rank regularization, and then show their (or their variants') applications in solving practical issues, such as non-rigid structure from motion and image denoising. Subsequently, we summarize the regularizers and optimization methods that achieve great success in traditional machine learning tasks but are rarely seen in solving practical issues. Finally, we provide a discussion and comparison for some representative regularizers including convex and non-convex relaxations. Extensive experimental results demonstrate that non-convex regularizers can provide a large advantage over the nuclear norm, the regularizer widely used in solving practical issues.
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, inevitably RS draws from many of the same theories as CV; e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent new developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing DL systems.
A Comprehensive Survey of Graph Embedding: Problems, Techniques and Applications Graph is an important data representation which appears in a wide diversity of real-world scenarios. Effective graph analytics provides users a deeper understanding of what is behind the data, and thus can benefit a lot of useful applications such as node classification, node recommendation, link prediction, etc. However, most graph analytics methods suffer from high computation and space costs. Graph embedding is an effective yet efficient way to solve the graph analytics problem. It converts the graph data into a low dimensional space in which the graph structural information and graph properties are maximally preserved. In this survey, we conduct a comprehensive review of the literature in graph embedding. We first introduce the formal definition of graph embedding as well as the related concepts. After that, we propose two taxonomies of graph embedding which correspond to what challenges exist in different graph embedding problem settings and how the existing work addresses these challenges in their solutions. Finally, we summarize the applications that graph embedding enables and suggest four promising future research directions in terms of computation efficiency, problem settings, techniques and application scenarios.
A Comprehensive Survey of Ontology Summarization: Measures and Methods The Semantic Web is becoming a large-scale framework that enables data to be published, shared, and reused in the form of ontologies. The ontology, which is considered the basic building block of the Semantic Web, consists of two layers: a data layer and a schema layer. With the current exponential growth of ontologies in both data size and schema complexity, ontology understanding, which plays an important role in tasks such as ontology engineering and ontology learning, is becoming more difficult. Ontology summarization, as a way to distill knowledge from an ontology and generate an abridged version that facilitates better understanding, has recently been getting more attention. Various approaches are available for ontology summarization, each focusing on different measures in order to produce a proper summary for a given ontology. In this paper, we focus on the common metrics used for ontology summarization and review the state of the art in ontology summarization.
A Comprehensive Survey on Fog Computing: State-of-the-art and Research Challenges Cloud computing with its three key facets (i.e., IaaS, PaaS, and SaaS) and its inherent advantages (e.g., elasticity and scalability) still faces several challenges. The distance between the cloud and the end devices might be an issue for latency-sensitive applications such as disaster management and content delivery applications. Service Level Agreements (SLAs) may also impose processing at locations where the cloud provider does not have data centers. Fog computing is a novel paradigm to address such issues. It enables provisioning resources and services outside the cloud, at the edge of the network, closer to end devices or eventually, at locations stipulated by SLAs. Fog computing is not a substitute for cloud computing but a powerful complement. It enables processing at the edge while still offering the possibility to interact with the cloud. This article presents a comprehensive survey on fog computing. It critically reviews the state of the art in the light of a concise set of evaluation criteria. We cover both the architectures and the algorithms that make fog systems. Challenges and research directions are also introduced. In addition, the lessons learned are reviewed and the prospects are discussed in terms of the key role fog is likely to play in emerging technologies such as Tactile Internet.
A Comprehensive Survey on Graph Neural Networks Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into different categories. With a focus on graph convolutional networks, we review alternative architectures that have recently been developed; these learning paradigms include graph attention networks, graph autoencoders, graph generative networks, and graph spatial-temporal networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes and benchmarks of the existing algorithms on different learning tasks. Finally, we propose potential research directions in this fast-growing field.
A Comprehensive Survey on Safe Reinforcement Learning Safe Reinforcement Learning can be defined as the process of learning policies that maximize the expectation of the return in problems in which it is important to ensure reasonable system performance and/or respect safety constraints during the learning and/or deployment processes. We categorize and analyze two approaches of Safe Reinforcement Learning. The first is based on the modification of the optimality criterion, the classic discounted finite/infinite horizon, with a safety factor. The second is based on the modification of the exploration process through the incorporation of external knowledge or the guidance of a risk metric. We use the proposed classification to survey the existing literature, as well as suggesting future directions for Safe Reinforcement Learning.
A Conceptual Introduction to Markov Chain Monte Carlo Methods Markov Chain Monte Carlo (MCMC) methods have become a cornerstone of many modern scientific analyses by providing a straightforward approach to numerically estimate uncertainties in the parameters of a model using a sequence of random samples. This article provides a basic introduction to MCMC methods by establishing a strong conceptual understanding of what problems MCMC methods are trying to solve, why we want to use them, and how they work in theory and in practice. To develop these concepts, I outline the foundations of Bayesian inference, discuss how posterior distributions are used in practice, explore basic approaches to estimate posterior-based quantities, and derive their link to Monte Carlo sampling and MCMC. Using a simple toy problem, I then demonstrate how these concepts can be used to understand the benefits and drawbacks of various MCMC approaches. Exercises designed to highlight various concepts are also included throughout the article.
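As a concrete companion to the conceptual treatment described above, here is a minimal random-walk Metropolis sampler for a toy one-dimensional posterior (illustrative only; the step size and sample counts are arbitrary choices, not recommendations from the article).

```python
import numpy as np

def metropolis(log_post, x0, n_samples=5000, step=0.5, seed=0):
    """Random-walk Metropolis: propose a Gaussian step, accept with
    probability min(1, posterior ratio), otherwise keep the current point."""
    rng = np.random.default_rng(seed)
    x, chain = x0, []
    for _ in range(n_samples):
        prop = x + step * rng.standard_normal()
        if np.log(rng.uniform()) < log_post(prop) - log_post(x):
            x = prop
        chain.append(x)
    return np.array(chain)

# Toy posterior: a standard normal, known only up to a constant.
chain = metropolis(lambda t: -0.5 * t ** 2, x0=3.0)
print(chain[1000:].mean(), chain[1000:].std())   # roughly 0 and 1 after burn-in
```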
A Concise Guide to Compositional Data Analysis Why a course in compositional data analysis? Compositional data consist of vectors whose components are the proportions or percentages of some whole. Their peculiarity is that their sum is constrained to be some constant: equal to 1 for proportions, 100 for percentages, or possibly some other constant c for other situations such as parts per million (ppm) in trace element compositions. Unfortunately a cursory look at such vectors gives the appearance of vectors of real numbers, with the consequence that over the last century all sorts of sophisticated statistical methods designed for unconstrained data have been applied to compositional data with inappropriate inferences. All this despite the fact that many workers have been, or should have been, aware that the sample space for compositional vectors is radically different from the real Euclidean space associated with unconstrained data. Several substantial warnings had been given, even as early as 1897 by Karl Pearson in his seminal paper on spurious correlations, and then repeatedly in the 1960s by geologist Felix Chayes. Unfortunately little heed was paid to such warnings, and within the small circle who did pay attention the approach was essentially pathological, attempting to answer the question: what goes wrong when we apply multivariate statistical methodology designed for unconstrained data to our constrained data, and how can the unconstrained methodology be adjusted to give meaningful inferences?
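A small sketch of two basic operations implied by the constraint described above, assuming the log-ratio methodology commonly used for compositional data: closure to a constant sum, and a centered log-ratio transform that maps a composition into unconstrained real space where standard multivariate methods are appropriate. Names and data are illustrative.

```python
import numpy as np

def closure(x, total=1.0):
    """Rescale a vector of parts so that it sums to the chosen constant."""
    x = np.asarray(x, dtype=float)
    return total * x / x.sum()

def clr(composition):
    """Centered log-ratio transform: log of each part minus the mean log part."""
    logx = np.log(composition)
    return logx - logx.mean()

parts = closure([20, 30, 50], total=1.0)     # proportions summing to 1
print(parts, clr(parts))
```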
A Contemporary Overview of Probabilistic Latent Variable Models In this paper we provide a conceptual overview of latent variable models within a probabilistic modeling framework, an overview that emphasizes the compositional nature and the interconnectedness of the seemingly disparate models commonly encountered in statistical practice.
A Correspondence Between Random Neural Networks and Statistical Field Theory A number of recent papers have provided evidence that practical design questions about neural networks may be tackled theoretically by studying the behavior of random networks. However, until now the tools available for analyzing random neural networks have been relatively ad-hoc. In this work, we show that the distribution of pre-activations in random neural networks can be exactly mapped onto lattice models in statistical physics. We argue that several previous investigations of stochastic networks actually studied a particular factorial approximation to the full lattice model. For random linear networks and random rectified linear networks we show that the corresponding lattice models in the wide network limit may be systematically approximated by a Gaussian distribution with covariance between the layers of the network. In each case, the approximate distribution can be diagonalized by Fourier transformation. We show that this approximation accurately describes the results of numerical simulations of wide random neural networks. Finally, we demonstrate that in each case the large scale behavior of the random networks can be approximated by an effective field theory.
A correspondence between thermodynamics and inference A rough analogy between Bayesian statistics and statistical mechanics has long been discussed. We explore this analogy systematically and discover that it is more substantive than previously reported. We show that most canonical thermodynamic quantities have a natural correspondence with well-established statistical quantities. A novel correspondence is discovered between the heat capacity and the model complexity in information-based inference. This leads to a critical insight: We argue that the well-known mechanisms of failure of equipartition in statistical mechanics explain the nature of sloppy models in statistics. Finally, we exploit the correspondence to propose a solution to a long-standing ambiguity in Bayesian statistics: the definition of an objective or uninformative prior. In particular, we propose that the Gibbs entropy provides a natural generalization of the principle of indifference.
A Data Management System for Computational Experiments (3X) 3X, which stands for eXecuting eXploratory eXperiments, is a software tool to ease the burden of conducting computational experiments. 3X provides a standard yet configurable structure to execute a wide variety of experiments in a systematic way. 3X organizes the code, inputs, and outputs for an experiment, records results, and lets users visualize result data in a variety of ways. Its interface allows further runs of the experiment to be driven interactively. Our demonstration will illustrate how 3X eases the process of conducting computational experiments, using two complementary examples designed to quickly show the many features of 3X.
A data scientist's guide to start-ups In August 2013, we held a panel discussion at the KDD 2013 conference in Chicago on the subject of data science, data scientists, and start-ups. KDD is the premier conference on data science research and practice. The panel discussed the pros and cons for top-notch data scientists of the hot data science start-up scene. In this article, we first present background on our panelists. Our four panelists have unquestionable pedigrees in data science and substantial experience with start-ups from multiple perspectives (founders, employees, chief scientists, venture capitalists). For the casual reader, we next present a brief summary of the experts' opinions on eight of the issues the panel discussed. The rest of the article presents a lightly edited transcription of the entire panel discussion.
A Detailed Analysis of Quicksort Algorithms with Experimental Mathematics We study several variants of single-pivot and multi-pivot Quicksort algorithms and consider them as discrete probability problems. With experimental mathematics, explicit expressions for expectations, variances and even higher moments of their numbers of comparisons and swaps can be obtained. For some variants, Monte Carlo experiments are performed, the numerical results are demonstrated and the scaled limiting distribution is also discussed.
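To illustrate the Monte Carlo side of the study described above, here is a sketch that counts comparisons of a basic single-pivot Quicksort and estimates their expectation empirically. The exact value quoted in the comment is the classical 2(n+1)H_n - 4n formula for this comparison model, not a result taken from the paper.

```python
import random

def quicksort_comparisons(a):
    """Single-pivot Quicksort (first element as pivot); returns the number
    of key comparisons performed."""
    if len(a) <= 1:
        return 0
    pivot = a[0]
    left  = [x for x in a[1:] if x < pivot]
    right = [x for x in a[1:] if x >= pivot]
    return len(a) - 1 + quicksort_comparisons(left) + quicksort_comparisons(right)

# Monte Carlo estimate of the expected comparison count for n = 100;
# the exact expectation 2(n+1)H_n - 4n is about 647.85 for n = 100.
n, trials = 100, 2000
est = sum(quicksort_comparisons(random.sample(range(10**6), n)) for _ in range(trials)) / trials
print(est)
```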
A detailed comparative study of open source deep learning frameworks Deep Learning (DL) is one of the hottest trends in machine learning as DL approaches produced results superior to the state-of-the-art in problematic areas such as image processing and natural language processing (NLP). To foster the growth of DL, several open source frameworks appeared providing implementations of the most common DL algorithms. These frameworks vary in the algorithms they support and in the quality of their implementations. The purpose of this work is to provide a qualitative and quantitative comparison among three of the most popular and most comprehensive DL frameworks (namely Google’s TensorFlow, University of Montreal’s Theano and Microsoft’s CNTK). The ultimate goal of this work is to help end users make an informed decision about the best DL framework that suits their needs and resources. To ensure that our study is as comprehensive as possible, we conduct several experiments using multiple benchmark datasets from different fields (image processing, NLP, etc.) and measure the performance of the frameworks’ implementations of different DL algorithms. For most of our experiments, we find out that CNTK’s implementations are superior to the other ones under consideration.
A fast learning algorithm for deep belief nets We show how to use ‘complementary priors’ to eliminate the explaining away effects that make inference difficult in densely-connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modelled by long ravines in the free-energy landscape of the top-level associative memory and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.
A Few Useful Things to Know about Machine Learning Machine learning algorithms can figure out how to perform important tasks by generalizing from examples. This is often feasible and cost-effective where manual programming is not. As more data becomes available, more ambitious problems can be tackled. As a result, machine learning is widely used in computer science and other fields. However, developing successful machine learning applications requires a substantial amount of ‘black art’ that is hard to find in textbooks. This article summarizes twelve key lessons that machine learning researchers and practitioners have learned. These include pitfalls to avoid, important issues to focus on, and answers to common questions.
A Framework for Considering Comprehensibility Comprehensibility in modeling is the ability of stakeholders to understand relevant aspects of the modeling process. In this article, we provide a framework to help guide exploration of the space of comprehensibility challenges. We consider facets organized around key questions: Who is comprehending? Why are they trying to comprehend? Where in the process are they trying to comprehend? How can we help them comprehend? How do we measure their comprehension? With each facet we consider the broad range of options. We discuss why taking a broad view of comprehensibility in modeling is useful in identifying challenges and opportunities for solutions.
A Framework for Time-Consistent, Risk-Averse Model Predictive Control: Theory and Algorithms In this paper we present a framework for risk-averse model predictive control (MPC) of linear systems affected by multiplicative uncertainty. Our key innovation is to consider time-consistent, dynamic risk metrics as objective functions to be minimized. This framework is axiomatically justified in terms of time-consistency of risk assessments, is amenable to dynamic optimization, and is unifying in the sense that it captures a full range of risk preferences from risk-neutral to worst case. Within this framework, we propose and analyze an online risk-averse MPC algorithm that is provably stabilizing. Furthermore, by exploiting the dual representation of time-consistent, dynamic risk metrics, we cast the computation of the MPC control law as a convex optimization problem amenable to real-time implementation. Simulation results are presented and discussed.
A General Theory for Training Learning Machine Though deep learning is pushing machine learning to a new stage, basic theories of machine learning are still limited. The principle of learning, the role of prior knowledge, the role of neuron bias, and the basis for choosing the neural transfer function and cost function, etc., are still far from clear. In this paper, we present a general theoretical framework for machine learning. We classify the prior knowledge into common and problem-dependent parts, and consider that the aim of learning is to maximally incorporate them. The principle we suggest for maximizing the former is the design risk minimization principle, while the neural transfer function, the cost function, as well as pretreatment of samples, are endowed with the role of maximizing the latter. The role of the neuron bias is explained from a different angle. We develop a Monte Carlo algorithm to establish the input-output responses, and we control the input-output sensitivity of a learning machine by controlling that of individual neurons. Applications to function approximation and smoothing, pattern recognition and classification are provided to illustrate how to train general learning machines based on our theory and algorithm. Our method may in addition induce new applications, such as transductive inference.
A Generalization of Convolutional Neural Networks to Graph-Structured Data This paper introduces a generalization of Convolutional Neural Networks (CNNs) from low-dimensional grid data, such as images, to graph-structured data. We propose a novel spatial convolution utilizing a random walk to uncover the relations within the input, analogous to the way the standard convolution uses the spatial neighborhood of a pixel on the grid. The convolution has an intuitive interpretation, is efficient and scalable and can also be used on data with varying graph structure. Furthermore, this generalization can be applied to many standard regression or classification problems, by learning the underlying graph. We empirically demonstrate the performance of the proposed CNN on MNIST, and challenge the state of the art on the Merck molecular activity data set.
A generalized concept-cognitive learning: A machine learning viewpoint Concept-cognitive learning (CCL) has been a hot topic in recent years, and it has attracted much attention from the communities of formal concept analysis, granular computing and cognitive computing. However, the relationship among cognitive computing (CC), concept-cognitive computing (CCC), and CCL is not clearly described. To this end, we explain the relationship of CC, CCC, and CCL. Then, we propose a generalized CCL from the point of view of machine learning. Finally, experiments on seven data sets are conducted to evaluate concept formation and concept-cognitive processes of the proposed generalized CCL.
A Gentle Introduction to Deep Learning in Medical Image Processing This paper tries to give a gentle introduction to deep learning in medical image processing, proceeding from theoretical foundations to applications. We first discuss general reasons for the popularity of deep learning, including several major breakthroughs in computer science. Next, we start reviewing the fundamental basics of the perceptron and neural networks, along with some fundamental theory that is often omitted. Doing so allows us to understand the reasons for the rise of deep learning in many application domains. Obviously medical image processing is one of these areas which has been largely affected by this rapid progress, in particular in image detection and recognition, image segmentation, image registration, and computer-aided diagnosis. There are also recent trends in physical simulation, modelling, and reconstruction that have led to astonishing results. Yet, some of these approaches neglect prior knowledge and hence bear the risk of producing implausible results. These apparent weaknesses highlight current limitations of deep learning. However, we also briefly discuss promising approaches that might be able to resolve these problems in the future.
A Gentle Introduction to Memetic Algorithms The generic denomination of ‘Memetic Algorithms’ (MAs) is used to encompass a broad class of metaheuristics (i.e. general purpose methods aimed at guiding an underlying heuristic). The method is based on a population of agents and has proved to be of practical success in a variety of problem domains, in particular for the approximate solution of NP optimization problems. Unlike traditional Evolutionary Computation (EC) methods, MAs are intrinsically concerned with exploiting all available knowledge about the problem under study. The incorporation of problem domain knowledge is not an optional mechanism, but a fundamental feature that characterizes MAs. This functioning philosophy is perfectly illustrated by the term ‘memetic’. Coined by R. Dawkins, the word ‘meme’ denotes an analogue of the gene in the context of cultural evolution.
A Gentle Introduction to Supervised Machine Learning This tutorial is based on the lecture notes for the courses ‘Machine Learning: Basic Principles’ and ‘Artificial Intelligence’, which I have taught during fall 2017 and spring 2018 at Aalto University. The aim is to provide an accessible introduction to some of the main concepts and methods within supervised machine learning. Most of the current systems which are considered (artificially) intelligent are based on some form of supervised machine learning. After discussing the main building blocks of a formal machine learning problem, some of the most popular algorithmic design patterns for machine learning methods are presented.
A Graph Summarization: A Survey While advances in computing resources have made processing enormous amounts of data possible, human ability to identify patterns in such data has not scaled accordingly. Thus, efficient computational methods for condensing and simplifying data are becoming vital for extracting actionable insights. In particular, while data summarization techniques have been studied extensively, only recently has summarizing interconnected data, or graphs, become popular. This survey is a structured, comprehensive overview of the state-of-the-art methods for summarizing graph data. We first broach the motivation behind and the challenges of graph summarization. We then categorize summarization approaches by the type of graphs taken as input and further organize each category by core methodology. Finally, we discuss applications of summarization on real-world graphs and conclude by describing some open problems in the field.
A History of Bayesian Neural Networks (Slide Deck)
A Joint Model for Question Answering and Question Generation We propose a generative machine comprehension model that learns jointly to ask and answer questions based on documents. The proposed model uses a sequence-to-sequence framework that encodes the document and generates a question (answer) given an answer (question). Significant improvement in model performance is observed empirically on the SQuAD corpus, confirming our hypothesis that the model benefits from jointly learning to perform both tasks. We believe the joint model’s novelty offers a new perspective on machine comprehension beyond architectural engineering, and serves as a first step towards autonomous information seeking.
A joint renewal process used to model event based data In many industrial situations, where systems must be monitored using data recorded throughout a historical period of observation, one cannot fully rely on sensor data, but often only has event data to work with. This, in particular, holds for legacy data, whose evaluation is of interest to systems analysts, reliability planners, maintenance engineers etc. Event data, herein defined as a collection of triples containing a time stamp, a failure code and possibly a descriptive text, can best be evaluated by using the paradigm of joint renewal processes. The present paper formulates a model of such a process, which proceeds by means of state dependent event rates. The system state is defined, at each point in time, as the vector of backward times, whereby the backward time of an event is the time passed since the last occurrence of this event. The present paper suggests a mathematical model relating event rates linearly to the backward times. The parameters can then be estimated by means of the method of moments. In a subsequent step, these event rates can be used in a Monte-Carlo simulation to forecast the numbers of occurrences of each failure in a future time interval, based on the current system state. The model is illustrated by means of an example. As forecasting system malfunctions receives increasing attention in light of modern condition-based maintenance policies, this approach enables decision makers to use existing event data to implement state dependent maintenance measures.
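As a rough illustration of the modelling idea, the sketch below simulates a discrete-time approximation in which each event rate is a linear function of the backward times, and Monte Carlo runs forecast occurrence counts. All rate constants, the horizon and the starting state are illustrative assumptions, not the paper's estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

n_events = 3                        # number of distinct failure codes (assumed)
a = np.array([0.02, 0.01, 0.005])   # baseline rates per time step (assumed)
B = np.array([[0.000, 0.001, 0.000],    # rate_i = a_i + sum_j B[i, j] * backward_j
              [0.002, 0.000, 0.000],
              [0.000, 0.000, 0.001]])

def simulate(horizon, backward):
    """Monte Carlo forecast of event counts over `horizon` steps,
    starting from the current vector of backward times."""
    backward = backward.astype(float).copy()
    counts = np.zeros(n_events, dtype=int)
    for _ in range(horizon):
        rates = a + B @ backward           # state-dependent event rates
        probs = np.clip(rates, 0.0, 1.0)   # per-step occurrence probabilities
        occurred = rng.random(n_events) < probs
        counts += occurred
        backward = np.where(occurred, 0.0, backward + 1.0)  # reset or age
    return counts

# Average over many runs to forecast occurrences in the next 100 steps.
runs = np.array([simulate(100, np.array([5.0, 20.0, 3.0])) for _ in range(2000)])
print("expected occurrences per failure code:", runs.mean(axis=0))
```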
A Large-Scale Comparison of Historical Text Normalization Systems There is no consensus on the state-of-the-art approach to historical text normalization. Many techniques have been proposed, including rule-based methods, distance metrics, character-based statistical machine translation, and neural encoder–decoder models, but studies have used different datasets, different evaluation methods, and have come to different conclusions. This paper presents the largest study of historical text normalization done so far. We critically survey the existing literature and report experiments on eight languages, comparing systems spanning all categories of proposed normalization techniques, analysing the effect of training data quantity, and using different evaluation methods. The datasets and scripts are made publicly available.
A Learning Approach to Secure Learning Deep Neural Networks (DNNs) have been shown to be vulnerable against adversarial examples, which are data points cleverly constructed to fool the classifier. Such attacks can be devastating in practice, especially as DNNs are being applied to increasingly critical tasks like image recognition in autonomous driving. In this paper, we introduce a new perspective on the problem. We do so by first defining robustness of a classifier to adversarial exploitation. Next, we show that the problems of adversarial example generation and defense can both be posed as learning problems, which are duals of each other. We also show formally that our defense aims to increase robustness of the classifier. We demonstrate the efficacy of our techniques by experimenting with the MNIST and CIFAR-10 datasets.
A literature review on current approaches and applications of fuzzy expert systems The main purpose of this study is to identify trends in the published research on applications of fuzzy expert and knowledge-based systems, based on a classification of studies from the last decade. The present investigation covers 60 articles from related scholarly journals, international conference proceedings and some major literature review papers. Our results reveal an upward trend in the number of recent publications, which is evidence of the growing popularity of the various applications of fuzzy expert systems. This rise is mainly in medical neuro-fuzzy and fuzzy expert systems. Another critical observation is that many modern industrial applications are being extended to employ knowledge-based systems built by extracting experts’ knowledge.
A Literature Survey on Ontology of Different Computing Platforms in Smart Environments Smart environments integrate various types of technologies, including cloud computing, fog computing, and the IoT paradigm. In such environments, it is essential to efficiently organize and manage the broad and complex set of heterogeneous resources. For this reason, resource classification and categorization becomes a vital issue in the control system. In this paper we make an exhaustive literature survey of the various computing systems and architectures that define any type of ontology in the context of smart environments, considering both authors that explicitly propose a resource categorization and authors that implicitly propose some resource classification as part of their system architecture. As part of this research survey, we have built a table that summarizes all research works considered, and which provides a compact and graphical snapshot of the current classification trends. The goal and primary motivation of this literature survey has been to understand the current state of the art and identify the gaps between the different computing paradigms involved in smart environment scenarios. As a result, we have found that it is essential to consider several computing paradigms and technologies together, and that there is not yet any research work that integrates a merged resource classification, taxonomy or ontology of the kind required in such heterogeneous scenarios.
A Mathematical Theory for Clustering in Metric Spaces Clustering is one of the most fundamental problems in data analysis and it has been studied extensively in the literature. Though many clustering algorithms have been proposed, clustering theories that justify the use of these clustering algorithms are still unsatisfactory. In particular, one of the fundamental challenges is to address the following question: What is a cluster in a set of data points? In this paper, we make an attempt to address such a question by considering a set of data points associated with a distance measure (metric). We first propose a new cohesion measure in terms of the distance measure. Using the cohesion measure, we define a cluster as a set of points that are cohesive to themselves. For such a definition, we show there are various equivalent statements that have intuitive explanations. We then consider the second question: How do we find clusters and good partitions of clusters under such a definition? For such a question, we propose a hierarchical agglomerative algorithm and a partitional algorithm. Unlike standard hierarchical agglomerative algorithms, our hierarchical agglomerative algorithm has a specific stopping criterion and it stops with a partition of clusters. Our partitional algorithm, called the K-sets algorithm in the paper, appears to be a new iterative algorithm. Unlike the Lloyd iteration that needs two-step minimization, our K-sets algorithm only takes one-step minimization. One of the most interesting findings of our paper is the duality result between a distance measure and a cohesion measure. Such a duality result leads to a dual K-sets algorithm for clustering a set of data points with a cohesion measure. The dual K-sets algorithm converges in the same way as a sequential version of the classical kernel K-means algorithm. The key difference is that a cohesion measure does not need to be positive semi-definite.
A Mathematical Theory of Interpersonal Interactions and Group Behavior Emergent collective group processes and capabilities have been studied through analysis of transactive memory, measures of group task performance, and group intelligence, among others. In their approach to collective behaviors, these approaches transcend traditional studies of group decision making that focus on how individual preferences combine through power relationships, social choice by voting, negotiation and game theory. Understanding more generally how individuals contribute to group effectiveness is important to a broad set of social challenges. Here we formalize a dynamic theory of interpersonal communications that classifies individual acts, sequences of actions, group behavioral patterns, and individuals engaged in group decision making. Group decision making occurs through a sequence of communications that convey personal attitudes and preferences among members of the group. The resulting formalism is relevant to psychosocial behavior analysis, rules of order, organizational structures and personality types, as well as formalized systems such as social choice theory. More centrally, it provides a framework for quantifying and even anticipating the structure of informal dialog, allowing specific conversations to be coded and analyzed in relation to a quantitative model of the participating individuals and the parameters that govern their interactions.
A mathematical theory of semantic development in deep neural networks An extensive body of empirical research has revealed remarkable regularities in the acquisition, organization, deployment, and neural representation of human semantic knowledge, thereby raising a fundamental conceptual question: what are the theoretical principles governing the ability of neural networks to acquire, organize, and deploy abstract knowledge by integrating across many individual experiences? We address this question by mathematically analyzing the nonlinear dynamics of learning in deep linear networks. We find exact solutions to this learning dynamics that yield a conceptual explanation for the prevalence of many disparate phenomena in semantic cognition, including the hierarchical differentiation of concepts through rapid developmental transitions, the ubiquity of semantic illusions between such transitions, the emergence of item typicality and category coherence as factors controlling the speed of semantic processing, changing patterns of inductive projection over development, and the conservation of semantic similarity in neural representations across species. Thus, surprisingly, our simple neural model qualitatively recapitulates many diverse regularities underlying semantic development, while providing analytic insight into how the statistical structure of an environment can interact with nonlinear deep learning dynamics to give rise to these regularities.
A Measure of Similarity Between Graph Vertices: Applications to Synonym Extraction and Web Searching We introduce a concept of similarity between vertices of directed graphs. Let GA and GB be two directed graphs with respectively nA and nB vertices. We define a nB × nA similarity matrix S whose real entry sij expresses how similar vertex j (in GA) is to vertex i (in GB): we say that sij is their similarity score. The similarity matrix can be obtained as the limit of the normalized even iterates of S(k+1) = B S(k) Aᵀ + Bᵀ S(k) A, where A and B are the adjacency matrices of the graphs and S(0) is a matrix whose entries are all equal to one. In the special case where GA = GB = G, the matrix S is square and the score sij is the similarity score between the vertices i and j of G. We point out that Kleinberg´s ‘hub and authority’ method to identify web-pages relevant to a given query can be viewed as a special case of our definition in the case where one of the graphs has two vertices and a unique directed edge between them. In analogy to Kleinberg, we show that our similarity scores are given by the components of a dominant eigenvector of a non-negative matrix. Potential applications of our similarity concept are numerous. We illustrate an application for the automatic extraction of synonyms in a monolingual dictionary.
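The iteration above is straightforward to implement. The following sketch computes the similarity matrix for two small example graphs; the graphs themselves are arbitrary illustrations.

```python
import numpy as np

def similarity_matrix(A, B, n_iter=100):
    """A, B: adjacency matrices of graphs GA (nA x nA) and GB (nB x nB).
    Returns the nB x nA similarity matrix S obtained from the normalized
    even iterates of S <- B S A^T + B^T S A, starting from the all-ones matrix."""
    nA, nB = A.shape[0], B.shape[0]
    S = np.ones((nB, nA))
    for _ in range(2 * n_iter):          # run an even number of iterations
        S = B @ S @ A.T + B.T @ S @ A
        S /= np.linalg.norm(S)           # normalize by the Frobenius norm
    return S

# Path graph on 3 vertices vs. a 'hub-authority' graph with a single edge 0 -> 1.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]])
B = np.array([[0, 1],
              [0, 0]])
S = similarity_matrix(A, B)
print(np.round(S, 3))   # entry (i, j): similarity of GA's vertex j to GB's vertex i
```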
A Model Explanation System We propose a general model explanation system (MES) for ‘explaining’ the output of black box classifiers. In this introduction we use the motivating example of a classifier trained to detect fraud in a credit card transaction history. The key aspect is that we provide explanations applicable to a single prediction, rather than provide an interpretable set of parameters. The labels in the provided examples are usually negative. Hence, we focus on explaining positive predictions (alerts). In many classification applications, but especially in fraud detection, there is an expectation of false positives. Alerts are given to a human analyst before any further action is taken. Analysts often insist on understanding ‘why’ there was an alert, since an opaque alert makes it difficult for them to proceed. Analogous scenarios occur in computer vision , credit risk , spam detection , etc. Furthermore, the MES framework is useful for model criticism. In the world of generative models, practitioners often generate synthetic data from a trained model to get an idea of ‘what the model is doing’. Our MES framework augments such tools. As an added benefit, MES is applicable to completely non-probabilistic black boxes that only provide hard labels. In Section 3 we use MES to visualize the decisions of a face recognition system.
A Model for General Intelligence The overarching problem in artificial intelligence (AI) is that we do not understand the intelligence process well enough to enable the development of adequate computational models. Much work has been done in AI over the years at lower levels, but a big part of what has been missing involves the high level, abstract, general nature of intelligence. We address this gap by developing a model for general intelligence. To accomplish this, we focus on three basic aspects of intelligence. First, we must realize the general order and nature of intelligence at a high level. Second, we must come to know what these realizations mean with respect to the overall intelligence process. Third, we must describe these realizations as clearly as possible. We propose a hierarchical model to help capture and exploit the order within intelligence. The underlying order involves patterns of signals that become organized, stored and activated in space and time. These patterns can be described using a simple, general hierarchy, with physical signals at the lowest level, information in the middle, and abstract signal representations at the top. This high level perspective provides a big picture that literally helps us see the intelligence process, thereby enabling fundamental realizations, a better understanding and clear descriptions of the intelligence process. The resulting model can be used to support all kinds of information processing across multiple levels of abstraction. As computer technology improves, and as cooperation increases between humans and computers, people will become more efficient and more productive in performing their information processing tasks.
A Model of Modeling We propose a formal model of scientific modeling, geared to applications of decision theory and game theory. The model highlights the freedom that modelers have in conceptualizing social phenomena using general paradigms in these fields. It may shed some light on the distinctions between (i) refutation of a theory and a paradigm, (ii) notions of rationality, (iii) modes of application of decision models, and (iv) roles of economics as an academic discipline. Moreover, the model suggests that all four distinctions have some common features that are captured by the model.
A model of text for experimentation in the social sciences Statistical models of text have become increasingly popular in statistics and computer science as a method of exploring large document collections. Social scientists often want to move beyond exploration, to measurement and experimentation, and make inference about social and political processes that drive discourse and content. In this paper, we develop a model of text data that supports this type of substantive research. Our approach is to posit a hierarchical mixed membership model for analyzing topical content of documents, in which mixing weights are parameterized by observed covariates. In this model, topical prevalence and topical content are specified as a simple generalized linear model on an arbitrary number of document-level covariates, such as news source and time of release, enabling researchers to introduce elements of the experimental design that informed document collection into the model, within a generally applicable framework. We demonstrate the proposed methodology by analyzing a collection of news reports about China, where we allow the prevalence of topics to evolve over time and vary across newswire services. Our methods help quantify the effect of news wire source on both the frequency and nature of topic coverage. All the methods we describe are available as part of the open source R package stm.
A Natural Language Query Interface to Structured Information Accessing structured data such as that encoded in ontologies and knowledge bases can be done using either syntactically complex formal query languages like SPARQL or complicated form interfaces that require expensive customisation to each particular application domain. This paper presents the QuestIO system – a natural language interface for accessing structured information, that is domain independent and easy to use without training. It aims to bring the simplicity of Google´s search interface to conceptual retrieval by automatically converting short conceptual queries into formal ones, which can then be executed against any semantic repository. QuestIO was developed specifically to be robust with regard to language ambiguities, incomplete or syntactically ill-formed queries, by harnessing the structure of ontologies, fuzzy string matching, and ontology-motivated similarity metrics.
A Neural Bayesian Estimator for Conditional Probability Densities This article describes a robust algorithm to estimate a conditional probability density f(t|x) as a non-parametric smooth regression function. It is based on a neural network and the Bayesian interpretation of the network output as a posteriori probability. The network is trained using example events from history or simulation, which define the underlying probability density f(t, x). Once trained, the network is applied on new, unknown examples x, for which it can predict the probability distribution of the target variable t. Event-by-event knowledge of the smooth function f(t|x) can be very useful, e.g. in maximum likelihood fits or for forecasting tasks. No assumptions are necessary about the distribution, and non-Gaussian tails are accounted for automatically. Important quantities like the median, mean value, left and right standard deviations, moments and expectation values of any function of t are readily derived from it. The algorithm can be considered as an event-by-event unfolding and leads to statistically optimal reconstruction. The largest benefit of the method lies in complicated problems, when the measurements x are only relatively weakly correlated to the output t. To ensure optimal generalisation and to avoid overfitting, the networks are regularised by extended versions of weight decay. The regularisation parameters are determined during the online learning of the network by relations obtained from Bayesian statistics. Some toy Monte Carlo tests and first real application examples from high-energy physics and econometrics are discussed.
A new look at clustering through the lens of deep convolutional neural networks Classification and clustering have been studied separately in machine learning and computer vision. Inspired by the recent success of deep learning models in solving various vision problems (e.g., object recognition, semantic segmentation) and the fact that humans serve as the gold standard in assessing clustering algorithms, here, we advocate for a unified treatment of the two problems and suggest that hierarchical frameworks that progressively build complex patterns on top of simpler ones (e.g., convolutional neural networks) offer a promising solution. We do not dwell much on the learning mechanisms in these frameworks as they are still a matter of debate, with respect to biological constraints. Instead, we emphasize the compositionality of real-world structures and objects. In particular, we show that CNNs, trained end to end using backpropagation with noisy labels, are able to cluster data points belonging to several overlapping shapes, and do so much better than state-of-the-art algorithms. The main takeaway lesson from our study is that mechanisms of human vision, particularly the hierarchical organization of the visual ventral stream, should be taken into account in clustering algorithms (e.g., for learning representations in an unsupervised manner or with minimum supervision) to reach human-level clustering performance. This by no means suggests that other methods do not hold merit. For example, methods relying on pairwise affinities (e.g., spectral clustering) have been very successful in many cases but still fail in some cases (e.g., overlapping clusters).
A New View of Predictive State Methods for Dynamical System Learning Recently there has been substantial interest in predictive state methods for learning dynamical systems: these algorithms are popular since they often offer a good tradeoff between computational speed and statistical efficiency. Despite their desirable properties, though, predictive state methods can sometimes be difficult to use in practice. E.g., in contrast to the rich literature on supervised learning methods, which allows us to choose from an extensive menu of models and algorithms to suit the prior beliefs we have about properties of the function to be learned, predictive state dynamical system learning methods are comparatively inflexible: it is as if we were restricted to use only linear regression instead of being allowed to choose decision trees, nonparametric regression, or the lasso. To address this problem, we propose a new view of predictive state methods in terms of instrumental-variable regression. This view allows us to construct a wide variety of dynamical system learners simply by swapping in different supervised learning methods. We demonstrate the effectiveness of our proposed methods by experimenting with non-linear regression to learn a hidden Markov model, showing that the resulting algorithm outperforms its linear counterpart; the correctness of this algorithm follows directly from our general analysis.
A Non-Geek´s Big Data Playbook This Big Data Playbook demonstrates in six common ‘plays’ how Apache Hadoop supports and extends the EDW ecosystem.
A novel algorithm for fast and scalable subspace clustering of high-dimensional data Rapid growth of high dimensional datasets in recent years has created an emergent need to extract the knowledge underlying them. Clustering is the process of automatically finding groups of similar data points in the space of the dimensions or attributes of a dataset. Finding clusters in high dimensional datasets is an important and challenging data mining problem. Data points group together differently under different subsets of dimensions, called subspaces. Quite often a dataset can be better understood by clustering it in its subspaces, a process called subspace clustering. But the exponential growth in the number of these subspaces with the dimensionality of data makes the whole process of subspace clustering computationally very expensive. There is a growing demand for efficient and scalable subspace clustering solutions in many Big Data application domains like biology, computer vision, astronomy and social networking. Apriori-based hierarchical clustering is a promising approach to find all possible higher dimensional subspace clusters from the lower dimensional clusters using a bottom-up process. However, the performance of the existing algorithms based on this approach deteriorates drastically with the increase in the number of dimensions. Most of these algorithms require multiple database scans and generate a large number of redundant subspace clusters, either implicitly or explicitly, during the clustering process. In this paper, we present SUBSCALE, a novel clustering algorithm to find non-trivial subspace clusters with minimal cost; it requires only k database scans for a k-dimensional data set. Our algorithm scales very well with the dimensionality of the dataset and is highly parallelizable. We present the details of the SUBSCALE algorithm and its evaluation in this paper.
A novel algorithmic approach to Bayesian Logic Regression Logic regression was developed more than a decade ago as a tool to construct predictors from Boolean combinations of binary covariates. It has been mainly used to model epistatic effects in genetic association studies, which is very appealing due to the intuitive interpretation of logic expressions to describe the interaction between genetic variations. Nevertheless logic regression has remained less well known than other approaches to epistatic association mapping. Here we will adopt an advanced evolutionary algorithm called GMJMCMC (Genetically modified Mode Jumping Markov Chain Monte Carlo) to perform Bayesian model selection in the space of logic regression models. After describing the algorithmic details of GMJMCMC we perform a comprehensive simulation study that illustrates its performance given logic regression terms of various complexity. Specifically GMJMCMC is shown to be able to identify three-way and even four-way interactions with relatively large power, a level of complexity which has not been achieved by previous implementations of logic regression. We apply GMJMCMC to reanalyze QTL mapping data for Recombinant Inbred Lines in Arabidopsis thaliana and from a backcross population in Drosophila where we identify several interesting epistatic effects.
A novel framework to analyze road accident time series data Road accident data analysis plays an important role in identifying key factors associated with road accidents. These associated factors help in taking preventive measures against road accidents. Various studies have analyzed road accident data using traditional statistical techniques and data mining techniques, focusing on identifying key factors associated with road accidents in different countries. Road accidents are uncertain and unpredictable events that can occur in any circumstances. Moreover, road accidents do not have similar impacts in every district: the accident rate may be increasing in one district while having a lower impact in others. Hence, road safety efforts should focus on those regions or districts where the accident trend is increasing. Time series analysis is an important area of study which can help in identifying increasing or decreasing trends in different districts. In this paper, we propose a framework to analyze road accident time series data, applied to 39 time series from 39 districts of the Gujarat and Uttarakhand states of India. This framework segments the time series data into different clusters. A time series merging algorithm is proposed to find the representative time series (RTS) for each cluster. This RTS is further used for trend analysis of different clusters. The results reveal that the road accident trend is going to increase in certain clusters, and those districts should be the prime concern for preventive measures against road accidents.
A Practical Guide to Support Vector Classification The support vector machine (SVM) is a popular classification technique. However, beginners who are not familiar with SVM often get unsatisfactory results since they miss some easy but significant steps. In this guide, we propose a simple procedure which usually gives reasonable results.
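The procedure recommended in guides of this kind typically amounts to scaling the features, starting with an RBF kernel, and selecting C and gamma by cross-validation. A minimal scikit-learn sketch of that recipe follows; the dataset and parameter grid are illustrative assumptions, not prescriptions from the guide.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale features, use an RBF kernel, and cross-validate C and gamma.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [1e-3, 1e-2, 1e-1, 1]}
search = GridSearchCV(pipe, grid, cv=5)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("test accuracy:", search.best_estimator_.score(X_test, y_test))
```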
A practical tutorial on autoencoders for nonlinear feature fusion: Taxonomy, models, software and guidelines Many of the existing machine learning algorithms, both supervised and unsupervised, depend on the quality of the input characteristics to generate a good model. The number of these variables is also important, since performance tends to decline as the input dimensionality increases, hence the interest in using feature fusion techniques, able to produce feature sets that are more compact and higher level. A plethora of procedures to fuse original variables for producing new ones has been developed in the past decades. The most basic ones use linear combinations of the original variables, such as PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis), while others find manifold embeddings of lower dimensionality based on non-linear combinations, such as Isomap or LLE (Locally Linear Embedding) techniques. More recently, autoencoders (AEs) have emerged as an alternative to manifold learning for conducting nonlinear feature fusion. Dozens of AE models have been proposed lately, each with its own specific traits. Although many of them can be used to generate reduced feature sets through the fusion of the original ones, there are also AEs designed with other applications in mind. The goal of this paper is to provide the reader with a broad view of what an AE is, how they are used for feature fusion, a taxonomy gathering a broad range of models, and how they relate to other classical techniques. In addition, a set of didactic guidelines on how to choose the proper AE for a given task is supplied, together with a discussion of the software tools available. Finally, two case studies illustrate the usage of AEs with datasets of handwritten digits and breast cancer.
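As a concrete reference point for the basic undercomplete AEs discussed above, the sketch below trains a small autoencoder on the scikit-learn handwritten digits data and extracts an 8-dimensional fused feature set. The architecture and training settings are illustrative assumptions, not the tutorial's recommendations.

```python
import tensorflow as tf
from sklearn.datasets import load_digits

# Load 8x8 handwritten digits and scale pixel values to [0, 1].
X, _ = load_digits(return_X_y=True)
X = X.astype("float32") / 16.0

# Encoder: 64 -> 16 -> 8; decoder mirrors it back to 64 dimensions.
inputs = tf.keras.Input(shape=(64,))
h = tf.keras.layers.Dense(16, activation="relu")(inputs)
code = tf.keras.layers.Dense(8, activation="relu")(h)
h_dec = tf.keras.layers.Dense(16, activation="relu")(code)
outputs = tf.keras.layers.Dense(64, activation="sigmoid")(h_dec)

autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=30, batch_size=64, verbose=0)

# The trained encoder yields an 8-dimensional fused feature set.
encoder = tf.keras.Model(inputs, code)
features = encoder.predict(X, verbose=0)
print(features.shape)   # (1797, 8)
```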
A Primer on Neural Network Models for Natural Language Processing Over the past few years, neural networks have re-emerged as powerful machine-learning models, yielding state-of-the-art results in fields such as image recognition and speech processing. More recently, neural network models started to be applied also to textual natural language signals, again with very promising results. This tutorial surveys neural network models from the perspective of natural language processing research, in an attempt to bring natural-language researchers up to speed with the neural techniques. The tutorial covers input encoding for natural language tasks, feed-forward networks, convolutional networks, recurrent networks and recursive networks, as well as the computation graph abstraction for automatic gradient computation.
A Probabilistic Theory of Deep Learning A grand challenge in machine learning is the development of computational algorithms that match or outperform humans in perceptual inference tasks such as visual object and speech recognition. The key factor complicating such tasks is the presence of numerous nuisance variables, for instance, the unknown object position, orientation, and scale in object recognition or the unknown voice pronunciation, pitch, and speed in speech recognition. Recently, a new breed of deep learning algorithms have emerged for high-nuisance inference tasks; they are constructed from many layers of alternating linear and nonlinear processing units and are trained using large-scale algorithms and massive amounts of training data. The recent success of deep learning systems is impressive – they now routinely yield pattern recognition systems with near- or super-human capabilities – but a fundamental question remains: Why do they work? Intuitions abound, but a coherent framework for understanding, analyzing, and synthesizing deep learning architectures has remained elusive. We answer this question by developing a new probabilistic framework for deep learning based on a Bayesian generative probabilistic model that explicitly captures variation due to nuisance variables. The graphical structure of the model enables it to be learned from data using classical expectation-maximization techniques. Furthermore, by relaxing the generative model to a discriminative one, we can recover two of the current leading deep learning systems, deep convolutional neural networks (DCNs) and random decision forests (RDFs), providing insights into their successes and shortcomings as well as a principled route to their improvement.
A rational analysis of curiosity We present a rational analysis of curiosity, proposing that people’s curiosity is driven by seeking stimuli that maximize their ability to make appropriate responses in the future. This perspective offers a way to unify previous theories of curiosity into a single framework. Experimental results confirm our model’s predictions, showing how the relationship between curiosity and confidence can change significantly depending on the nature of the environment.
A Recent Survey on the Applications of Genetic Programming in Image Processing During the last two decades, Genetic Programming (GP) has been largely used to tackle optimization, classification, and automatic feature selection related tasks. The widespread use of GP is mainly due to its flexible and comprehensible tree-type structure. Similarly, research is also gaining momentum in the field of Image Processing (IP) because of its promising results over wide areas of applications ranging from medical IP to multispectral imaging. IP is mainly involved in applications such as computer vision, pattern recognition, image compression, storage and transmission, and medical diagnostics. This prevailing nature of images and their associated algorithmic complexities gave an impetus to the exploration of GP. GP has thus been used in different ways for IP since its inception. Many interesting GP techniques have been developed and employed in the field of IP. To give the research community an extensive view of these techniques, this paper presents the diverse applications of GP in IP and provides useful resources for further research. Also, comparisons of different parameters used in ten different applications of IP are summarized in tabular form. Moreover, analysis of different parameters used in IP related tasks is carried out to save the time needed in the future for evaluating the parameters of GP. As more advancement is made in GP methodologies, its success in solving complex tasks not only related to IP but also in other fields will increase. Additionally, guidelines are provided for applying GP in IP related tasks, the pros and cons of GP techniques are discussed, and some future directions are also set out.
A Reliability Theory of Truth Our approach is basically a coherence approach, but we avoid the well-known pitfalls of coherence theories of truth. Consistency is replaced by reliability, which expresses support and attack, and, in principle, every theory (or agent, message) counts. At the same time, we do not require a privileged access to ‘reality’. A centerpiece of our approach is that we attribute reliability also to agents, messages, etc., so an unreliable source of information will be less important in the future. Our ideas can also be extended to value systems, and even actions, e.g., of animals.
A Revealing Introduction to Hidden Markov Models Suppose we want to determine the average annual temperature at a particular location on earth over a series of years. To make it interesting, suppose the years we are concerned with lie in the distant past, before thermometers were invented. Since we can’t go back in time, we instead look for indirect evidence of the temperature…
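To make the setting concrete, the sketch below encodes a toy HMM of this kind (hidden Hot/Cold years, with observed tree-ring sizes as the indirect evidence) and evaluates an observation sequence with the forward algorithm. All probabilities are illustrative assumptions, not values from the tutorial.

```python
import numpy as np

# Hidden states: 0 = Hot year, 1 = Cold year.
# Observations: 0 = Small, 1 = Medium, 2 = Large tree ring.
A  = np.array([[0.7, 0.3],               # state transition probabilities (assumed)
               [0.4, 0.6]])
B  = np.array([[0.1, 0.4, 0.5],          # emission probabilities per state (assumed)
               [0.7, 0.2, 0.1]])
pi = np.array([0.6, 0.4])                # initial state distribution (assumed)

obs = [0, 1, 0, 2]                       # observed sequence: S, M, S, L

# Forward algorithm: alpha[i] = P(observations so far, current state = i).
alpha = pi * B[:, obs[0]]
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]

print("P(observation sequence) =", alpha.sum())
```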
A review and comparative study on functional time series techniques This paper reviews the main estimation and prediction results derived in the context of functional time series, when Hilbert and Banach spaces are considered, especially in the context of autoregressive processes of order one (ARH(1) and ARB(1) processes, for H and B being a Hilbert and a Banach space, respectively). In particular, we pay attention to the estimation and prediction results, and statistical tests, derived in both parametric and non-parametric frameworks. A comparative study between different ARH(1) prediction approaches is developed in the simulation study undertaken.
A Review for Weighted MinHash Algorithms Data similarity (or distance) computation is a fundamental research topic which underpins many high-level applications based on similarity measures in machine learning and data mining. However, in large-scale real-world scenarios, exact similarity computation has become daunting due to the ‘3V’ nature (volume, velocity and variety) of big data. In such cases, hashing techniques have been verified to efficiently conduct similarity estimation in terms of both theory and practice. Currently, MinHash is a popular technique for efficiently estimating the Jaccard similarity of binary sets, and weighted MinHash generalizes it to estimate the generalized Jaccard similarity of weighted sets. This review focuses on categorizing and discussing the existing works on weighted MinHash algorithms. We mainly categorize the weighted MinHash algorithms into quantization-based approaches, ‘active index’-based ones and others, and show the evolution and inherent connection of the weighted MinHash algorithms, from the integer weighted MinHash algorithms to real-valued weighted MinHash ones (particularly the Consistent Weighted Sampling scheme). Also, we have developed a Python toolbox for the algorithms and released it on our GitHub. Based on the toolbox, we experimentally conduct a comprehensive comparative study of the standard MinHash algorithm and the weighted MinHash ones.
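For orientation, the sketch below implements the standard (unweighted) MinHash baseline that the weighted variants generalize, estimating the Jaccard similarity of two binary sets. The universe size and number of hash functions are illustrative choices, and this is not code from the review's toolbox.

```python
import random

def make_hashes(k, universe, seed=0):
    """k random permutations of the element universe, used as hash functions."""
    rng = random.Random(seed)
    perms = []
    for _ in range(k):
        p = list(range(universe))
        rng.shuffle(p)
        perms.append(p)
    return perms

def signature(elements, perms):
    """MinHash signature: the minimum permuted index under each hash function."""
    return [min(p[e] for e in elements) for p in perms]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of hash functions on which the two signatures collide."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

A = {1, 2, 3, 4, 5, 6}
B = {4, 5, 6, 7, 8}
perms = make_hashes(k=256, universe=10)
print("estimated:", estimate_jaccard(signature(A, perms), signature(B, perms)))
print("exact:    ", len(A & B) / len(A | B))   # 3 / 8 = 0.375
```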
A Review of 40 Years of Cognitive Architecture Research: Focus on Perception, Attention, Learning and Applications In this paper we present a broad overview of the last 40 years of research on cognitive architectures. Although the number of existing architectures is nearing several hundred, most of the existing surveys do not reflect this growth and focus on a handful of well-established architectures. While their contributions are undeniable, they represent only a part of the research in the field. Thus, in this survey we wanted to shift the focus towards a more inclusive and high-level overview of the research in cognitive architectures. Our final set of 86 architectures includes 55 that are still actively developed, and they borrow from a diverse set of disciplines, spanning areas from psychoanalysis to neuroscience. To keep the length of this paper within reasonable limits we discuss only the core cognitive abilities, such as perception, attention mechanisms, learning and memory structure. To assess the breadth of practical applications of cognitive architectures we gathered information on over 700 practical projects implemented using the cognitive architectures in our list. We use various visualization techniques to highlight overall trends in the development of the field. Our analysis of practical applications shows that most architectures are very narrowly focused on a particular application domain. Furthermore, there is an apparent gap between general research in robotics and computer vision and research in these areas within the cognitive architectures field. It is very clear that biologically inspired models do not have the same range and efficiency compared to the systems based on engineering principles and heuristics. Another observation is related to a general lack of collaboration. Several factors hinder communication, such as the closed nature of the individual projects (only one-third of the architectures reviewed here are open-source) and terminological differences.
A review of change point detection methods In this work, methods to detect one or several change points in multivariate time series are reviewed. They include retrospective (off-line) procedures such as maximum likelihood estimation, regression, kernel methods, etc. In this large area of research, applications are numerous and diverse; many different models and operational constraints (on precision, complexity, …) exist. A formal framework for change point detection is introduced to give sense to this significant body of work. Precisely, all methods are described as a collection of three elements: a cost function, a search method and a constraint on the number of changes to detect. For a given method, we detail the assumed signal model, the associated algorithm, theoretical guarantees (if any) and the application domain. This approach is intended to facilitate prototyping of change point detection methods: for a given segmentation task, one can appropriately choose among the described elements to design an algorithm.
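The cost-function view is easy to illustrate: the sketch below finds a single change point by minimizing the summed squared deviation from each segment's mean, i.e. a least-squares cost with an exhaustive search and exactly one change allowed. The synthetic signal is an illustrative assumption.

```python
import numpy as np

def sse(x):
    """Cost of a segment: sum of squared deviations from the segment mean."""
    return float(np.sum((x - x.mean()) ** 2)) if len(x) else 0.0

def best_single_change_point(signal, min_size=2):
    """Exhaustive search for the split minimizing the total two-segment cost."""
    costs = [(sse(signal[:t]) + sse(signal[t:]), t)
             for t in range(min_size, len(signal) - min_size)]
    return min(costs)[1]

rng = np.random.default_rng(0)
signal = np.concatenate([rng.normal(0.0, 1.0, 120),
                         rng.normal(3.0, 1.0, 80)])    # true change at t = 120
print("estimated change point:", best_single_change_point(signal))
```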
A Review of Changepoint Detection Models The objective of changepoint detection is to discover the abrupt property changes lying behind time-series data. In this paper, we first summarize the definition and in-depth implications of changepoint detection. We then elaborate on traditional and some alternative model-based changepoint detection algorithms. Finally, we go a bit further into the theory and look into future research directions.
A Review of Cooperative Multi-Agent Deep Reinforcement Learning Deep Reinforcement Learning has made significant progress in multi-agent systems in recent years. In this review article, we have mostly focused on recent papers on Multi-Agent Reinforcement Learning (MARL) rather than older papers, unless it was necessary. Several ideas and papers are proposed with different notations, and we tried our best to unify them with a single notation and categorize them by their relevance. In particular, we have focused on five common approaches to modeling and solving multi-agent reinforcement learning problems: (I) independent learners, (II) fully observable critic, (III) value function decomposition, (IV) consensus, (V) learn to communicate. Moreover, we discuss some new emerging research areas in MARL along with the relevant recent papers. In addition, some of the recent applications of MARL in the real world are discussed. Finally, a list of available environments for MARL research is provided and the paper is concluded with proposals on possible research directions.
A Review of Data Fusion Techniques In general, all tasks that demand any type of parameter estimation from multiple sources can benefit from the use of data/information fusion methods. The terms information fusion and data fusion are typically employed as synonyms; but in some scenarios, the term data fusion is used for raw data (obtained directly from the sensors) and the term information fusion is employed to define already processed data. In this sense, the term information fusion implies a higher semantic level than data fusion. Other terms associated with data fusion that typically appear in the literature include decision fusion, data combination, data aggregation, multisensor data fusion, and sensor fusion. Researchers in this field agree that the most accepted definition of data fusion was provided by the Joint Directors of Laboratories (JDL) workshop: ‘A multi-level process dealing with the association, correlation, combination of data and information from single and multiple sources to achieve refined position, identify estimates and complete and timely assessments of situations, threats and their significance.’ Hall and Llinas provided the following well-known definition of data fusion: ‘data fusion techniques combine data from multiple sensors and related information from associated databases to achieve improved accuracy and more specific inferences than could be achieved by the use of a single sensor alone.’ Briefly, we can define data fusion as a combination of multiple sources to obtain improved information; in this context, improved information means less expensive, higher quality, or more relevant information. Data fusion techniques have been extensively employed in multisensor environments with the aim of fusing and aggregating data from different sensors; however, these techniques can also be applied to other domains, such as text processing. The goal of using data fusion in multisensor environments is to obtain a lower detection error probability and a higher reliability by using data from multiple distributed sources. The available data fusion techniques can be classified into three nonexclusive categories: (i) data association, (ii) state estimation, and (iii) decision fusion. Because of the large number of published papers on data fusion, this paper does not aim to provide an exhaustive review of all of the studies; instead, the objective is to highlight the main steps that are involved in the data fusion framework and to review the most common techniques for each step. The remainder of this paper continues as follows. The next section provides various classification categories for data fusion techniques. Then, Section 3 describes the most common methods for data association tasks. Section 4 provides a review of techniques under the state estimation category. Next, the most common techniques for decision fusion are enumerated in Section 5. Finally, the conclusions obtained from reviewing the different methods are highlighted in Section 6.
A review of data mining using big data in health informatics The amount of data produced within Health Informatics has grown to be quite vast, and analysis of this Big Data grants potentially limitless possibilities for knowledge to be gained. In addition, this information can improve the quality of healthcare offered to patients. However, there are a number of issues that arise when dealing with these vast quantities of data, especially how to analyze this data in a reliable manner. The basic goal of Health Informatics is to take in real world medical data from all levels of human existence to help advance our understanding of medicine and medical practice. This paper will present recent research using Big Data tools and approaches for the analysis of Health Informatics data gathered at multiple levels, including the molecular, tissue, patient, and population levels. In addition to gathering data at multiple levels, multiple levels of questions are addressed: human-scale biology, clinical-scale, and epidemic-scale. We will also analyze and examine possible future work for each of these areas, as well as how combining data from each level may provide the most promising approach to gain the most knowledge in Health Informatics.
A Review of Deep Learning with Special Emphasis on Architectures, Applications and Recent Trends Deep learning (DL) has solved a problem that as little as five years ago was thought by many to be intractable – the automatic recognition of patterns in data; and it can do so with accuracy that often surpasses human beings. It has solved problems beyond the realm of traditional, hand-crafted machine learning algorithms and captured the imagination of practitioners trying to make sense out of the flood of data that now inundates our society. As public awareness of the efficacy of DL increases so does the desire to make use of it. But even for highly trained professionals it can be daunting to approach the rapidly increasing body of knowledge produced by experts in the field. Where does one start? How does one determine if a particular model is applicable to their problem? How does one train and deploy such a network? A primer on the subject can be a good place to start. With that in mind, we present an overview of some of the key multilayer ANNs that comprise DL. We also discuss some new automatic architecture optimization protocols that use multi-agent approaches. Further, since guaranteeing system uptime is becoming critical to many computer applications, we include a section on using neural networks for fault detection and subsequent mitigation. This is followed by an exploratory survey of several application areas where DL has emerged as a game-changing technology: anomalous behavior detection in financial applications or in financial time-series forecasting, predictive and prescriptive analytics, medical image processing and analysis and power systems research. The thrust of this review is to outline emerging areas of application-oriented research within the DL community as well as to provide a reference to researchers seeking to use it in their work for what it does best: statistical pattern recognition with unparalleled learning capacity with the ability to scale with information.
A Review of Different Word Embeddings for Sentiment Classification using Deep Learning The web is loaded with textual content, and Natural Language Processing is one of the most important fields in Machine Learning. But when the data is huge, simple machine learning algorithms cannot handle it, and that is when deep learning, which is based on neural networks, comes into play. However, since neural networks cannot process raw text, we have to convert it using various word embedding strategies. This paper demonstrates these different word embedding strategies on an Amazon review dataset with two sentiments to be classified, Happy and Unhappy, based on numerous customer reviews. Moreover, we compare the accuracy obtained with each embedding and discuss which word embedding to apply when.
A Review of Evaluation Techniques for Social Dialogue Systems In contrast with goal-oriented dialogue, social dialogue has no clear measure of task success. Consequently, evaluation of these systems is notoriously hard. In this paper, we review current evaluation methods, focusing on automatic metrics. We conclude that turn-based metrics often ignore the context and do not account for the fact that several replies are valid, while end-of-dialogue rewards are mainly hand-crafted. Both lack grounding in human perceptions.
A Review of Features for the Discrimination of Twitter Users: Application to the Prediction of Offline Influence Many works related to Twitter aim at characterizing its users in some way: role on the service (spammers, bots, organizations, etc.), nature of the user (socio-professional category, age, etc.), topics of interest, and others. However, for a given user classification problem, it is very difficult to select a set of appropriate features, because the many features described in the literature are very heterogeneous, with name overlaps and collisions, and numerous very close variants. In this article, we review a wide range of such features. In order to present a clear state-of-the-art description, we unify their names, definitions and relationships, and we propose a new, neutral typology. We then illustrate the interest of our review by applying a selection of these features to the offline influence detection problem. This task consists in identifying users who are influential in real life, based on their Twitter account and related data. We show that most features deemed efficient for predicting online influence, such as the numbers of retweets and followers, are not relevant to this problem. However, we propose several content-based approaches to label Twitter users as influencers or not. We also rank them according to a predicted influence level. Our proposals are evaluated on the CLEF RepLab 2014 dataset, and outmatch state-of-the-art methods.
A review of instance selection methods In supervised learning, a training set providing previously known information is used to classify new instances. Commonly, many instances are stored in the training set, but some of them are not useful for classification; it is therefore possible to obtain acceptable classification rates while ignoring these non-useful cases. This process is known as instance selection. Through instance selection the training set is reduced, which lowers runtimes in the classification and/or training stages of classifiers. This work presents a survey of the main instance selection methods reported in the literature.
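As a concrete illustration of the idea described above (not a method taken from the survey itself), the sketch below implements Hart's classic condensed nearest neighbour rule in plain NumPy: an instance is retained only if the currently kept set misclassifies it under 1-NN, so the reduced set tends to concentrate near class boundaries. The toy data and parameters are arbitrary.

```python
import numpy as np

def condensed_nearest_neighbour(X, y, seed=0):
    """Hart's CNN sketch: keep only instances the retained set misclassifies (1-NN)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    keep = [order[0]]                      # seed the retained set with one instance
    changed = True
    while changed:
        changed = False
        for i in order:
            if i in keep:
                continue
            # classify instance i with 1-NN over the retained set
            d = np.linalg.norm(X[keep] - X[i], axis=1)
            if y[keep][np.argmin(d)] != y[i]:
                keep.append(i)             # misclassified -> must be retained
                changed = True
    return np.array(keep)

# toy example: two well-separated Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
kept = condensed_nearest_neighbour(X, y)
print(f"retained {len(kept)} of {len(X)} training instances")
```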
A Review of Literature on Parallel Constraint Solving As multicore computing is now standard, it seems irresponsible for constraints researchers to ignore the implications of it. Researchers need to address a number of issues to exploit parallelism, such as: investigating which constraint algorithms are amenable to parallelisation; whether to use shared memory or distributed computation; whether to use static or dynamic decomposition; and how to best exploit portfolios and cooperating search. We review the literature, and see that we can sometimes do quite well, some of the time, on some instances, but we are far from a general solution. Yet there seems to be little overall guidance that can be given on how best to exploit multicore computers to speed up constraint solving. We hope at least that this survey will provide useful pointers to future researchers wishing to correct this situation. Under consideration in Theory and Practice of Logic Programming (TPLP).
A Review of Modularization Techniques in Artificial Neural Networks Artificial neural networks (ANNs) have achieved significant success in tackling classical and modern machine learning problems. As learning problems grow in scale and complexity, and expand into multi-disciplinary territory, a more modular approach for scaling ANNs will be needed. Modular neural networks (MNNs) are neural networks that embody the concepts and principles of modularity. MNNs adopt a large number of different techniques for achieving modularization. Previous surveys of modularization techniques are relatively scarce in their systematic analysis of MNNs, focusing mostly on empirical comparisons and lacking an extensive taxonomical framework. In this review, we aim to establish a solid taxonomy that captures the essential properties and relationships of the different variants of MNNs. Based on an investigation of the different levels at which modularization techniques act, we attempt to provide a universal and systematic framework for theorists studying MNNs, also trying along the way to emphasise the strengths and weaknesses of different modularization approaches in order to highlight good practices for neural network practitioners.
A review of neuro-fuzzy systems based on intelligent control A system’s abilities to adapt and to self-organize are two key factors in how well it can survive changes to the environment and the plant it works within. Intelligent control improves these two factors in controllers. Given the increasing complexity of dynamic systems and their need for feedback control, more sophisticated controllers have become necessary, and intelligent control is a suitable response to this need. This paper briefly describes the structure of intelligent control and provides a review of fuzzy logic and neural networks, two of the base methods for intelligent control. The different aspects of these two methods are then compared and an example of a combined method is presented.
A Review of Point Cloud Semantic Segmentation 3D Point Cloud Semantic Segmentation (PCSS) is attracting increasing interest, due to its applicability in remote sensing, computer vision and robotics, and due to the new possibilities offered by deep learning techniques. In order to provide a needed up-to-date review of recent developments in PCSS, this article summarizes existing studies on this topic. Firstly, we outline the acquisition and evolution of the 3D point cloud from the perspective of remote sensing and computer vision, as well as the published benchmarks for PCSS studies. Then, traditional and advanced techniques used for Point Cloud Segmentation (PCS) and PCSS are reviewed and compared. Finally, important issues and open questions in PCSS studies are discussed.
A Review of Relational Machine Learning for Knowledge Graphs Relational machine learning studies methods for the statistical analysis of relational, or graph-structured, data. In this paper, we provide a review of how such statistical models can be ‘trained’ on large knowledge graphs, and then used to predict new facts about the world (which is equivalent to predicting new edges in the graph). In particular, we discuss two different kinds of statistical relational models, both of which can scale to massive datasets. The first is based on tensor factorization methods and related latent variable models. The second is based on mining observable patterns in the graph. We also show how to combine these latent and observable models to get improved modeling power at decreased computational cost. Finally, we discuss how such statistical models of graphs can be combined with text-based information extraction methods for automatically constructing knowledge graphs from the Web. In particular, we discuss Google’s Knowledge Vault project.
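To make the tensor-factorization idea in the entry above concrete, here is a minimal NumPy sketch of a DistMult-style scoring function, one common latent variable model in this family. The entities, relation and embeddings below are random placeholders for illustration rather than trained values: each entity and relation gets a latent vector, and a candidate edge is scored by a trilinear product.

```python
import numpy as np

rng = np.random.default_rng(42)
dim = 16
entities = ["Paris", "France", "Berlin", "Germany"]
relations = ["capital_of"]

# random placeholder embeddings; in practice these are learned from known triples
E = {e: rng.normal(size=dim) for e in entities}
R = {r: rng.normal(size=dim) for r in relations}

def score(subj, rel, obj):
    """DistMult score: sum_k e_s[k] * w_r[k] * e_o[k]; higher = more plausible edge."""
    return float(np.sum(E[subj] * R[rel] * E[obj]))

# rank candidate objects for the query ("Paris", "capital_of", ?)
cands = sorted(entities, key=lambda o: -score("Paris", "capital_of", o))
print(cands)
```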
A Review of Self-Exciting Spatio-Temporal Point Processes and Their Applications Self-exciting spatio-temporal point process models predict the rate of events as a function of space, time, and the previous history of events. These models naturally capture triggering and clustering behavior, and have been widely used in fields where spatio-temporal clustering of events is observed, such as earthquake modeling, infectious disease, and crime. In the past several decades, advances have been made in estimation, inference, simulation, and diagnostic tools for self-exciting point process models. In this review, I describe the basic theory, survey related estimation and inference techniques from each field, highlight several key applications, and suggest directions for future research.
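For readers unfamiliar with this model class, the conditional intensity of a generic self-exciting spatio-temporal point process (written here in standard textbook form, not in notation specific to the review above) is

$$\lambda(s, t \mid \mathcal{H}_t) \;=\; \mu(s) \;+\; \sum_{i\,:\,t_i < t} g\bigl(s - s_i,\; t - t_i\bigr),$$

where $\mu(s)$ is a background rate and $g$ is a non-negative triggering kernel, so every past event $(s_i, t_i)$ temporarily raises the rate of nearby future events.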
A review of single-source unsupervised domain adaptation Domain adaptation has become a prominent problem setting in machine learning and related fields. This review asks when and how a classifier can learn from a source domain and generalize to a target domain. As for when, we review conditions that allow for cross-domain generalization error bounds. As for how, we present a categorization of approaches, divided into what we refer to as sample-based, feature-based and inference-based methods. Sample-based methods focus on weighting individual observations during training based on their importance to the target domain. Feature-based methods focus on mapping, projecting and representing features such that a source classifier performs well on the target domain, and inference-based methods focus on alternative estimators, such as robust, minimax or Bayesian ones. Our categorization highlights recurring ideas and raises a number of questions important to further research.
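As one hedged illustration of the sample-based idea mentioned above, importance weights w(x) ≈ p_target(x)/p_source(x) can be estimated with a probabilistic domain classifier and passed to any learner that accepts per-sample weights. The sketch below uses scikit-learn and synthetic data purely for illustration; it is not the categorization or code of the review itself.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# labelled source data and unlabelled target inputs from shifted distributions
X_src = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
y_src = (X_src[:, 0] + X_src[:, 1] > 0).astype(int)
X_tgt = rng.normal(loc=1.0, scale=1.0, size=(500, 2))

# 1) domain classifier: source = 0, target = 1
X_dom = np.vstack([X_src, X_tgt])
d_dom = np.array([0] * len(X_src) + [1] * len(X_tgt))
dom_clf = LogisticRegression().fit(X_dom, d_dom)

# 2) importance weights on source points: p(target | x) / p(source | x)
p_tgt = dom_clf.predict_proba(X_src)[:, 1]
w = p_tgt / (1.0 - p_tgt)

# 3) weighted source classifier, hopefully better aligned with the target domain
task_clf = LogisticRegression().fit(X_src, y_src, sample_weight=w)
print("mean importance weight:", w.mean().round(3))
```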
A review of swarmalators and their potential in bio-inspired computing From fireflies to heart cells, many systems in Nature show the remarkable ability to spontaneously fall into synchrony. By imitating Nature’s success at self-synchronizing, scientists have designed cost-effective methods to achieve synchrony in the lab, with applications ranging from wireless sensor networks to radio transmission. A similar story has occurred in the study of swarms, where inspiration from the behavior of flocks of birds and schools of fish has led to ‘low-footprint’ algorithms for multi-robot systems. Here, we continue this ‘bio-inspired’ tradition, by speculating on the technological benefit of fusing swarming with synchronization. The subject of recent theoretical work, minimal models of so-called ‘swarmalator’ systems exhibit rich spatiotemporal patterns, hinting at utility in ‘bottom-up’ robotic swarms. We review the theoretical work on swarmalators, identify possible realizations in Nature, and discuss their potential applications in technology.
A Review of Theoretical and Practical Challenges of Trusted Autonomy in Big Data Despite the advances made in artificial intelligence, software agents, and robotics, there is little we see today that we can truly call a fully autonomous system. We conjecture that the main inhibitor for advancing autonomy is lack of trust. Trusted autonomy is the scientific and engineering field to establish the foundations and ground work for developing trusted autonomous systems (robotics and software agents) that can be used in our daily life, and can be integrated with humans seamlessly, naturally and efficiently. In this paper, we review this literature to reveal opportunities for researchers and practitioners to work on topics that can create a leap forward in advancing the field of trusted autonomy. We focus the paper on the `trust’ component as the uniting technology between humans and machines. Our inquiry into this topic revolves around three sub-topics: (1) reviewing and positioning the trust modelling literature for the purpose of trusted autonomy; (2) reviewing a critical subset of sensor technologies that allow a machine to sense human states; and (3) distilling some critical questions for advancing the field of trusted autonomy. The inquiry is augmented with conceptual models that we propose along the way by recompiling and reshaping the literature into forms that enables trusted autonomous systems to become a reality. The paper offers a vision for a Trusted Cyborg Swarm, an extension of our previous Cognitive Cyber Symbiosis concept, whereby humans and machines meld together in a harmonious, seamless, and coordinated manner.
A Review on Algorithms for Constraint-based Causal Discovery Causal discovery studies the problem of mining causal relationships between variables from data, which is of primary interest in science. During the past decades, a significant amount of progress has been made toward this fundamental data mining paradigm. In recent years, with the availability of abundant, large and complex observational data, constraint-based approaches have attracted a lot of interest and have been widely applied to many diverse real-world problems, owing to their fast running speed and the ease with which they generalize to the problem of causal insufficiency. In this paper, we review constraint-based causal discovery algorithms. Firstly, we discuss the learning paradigm of the constraint-based approaches. Secondly and primarily, the state-of-the-art constraint-based causal inference algorithms are surveyed with detailed analysis. Thirdly, several related open-source software packages and benchmark data repositories are briefly summarized. In conclusion, some open problems in constraint-based causal discovery are outlined for future research.
A Review on Deep Learning Techniques Applied to Semantic Segmentation Image semantic segmentation is of growing interest to computer vision and machine learning researchers. Many applications on the rise need accurate and efficient segmentation mechanisms: autonomous driving, indoor navigation, and even virtual or augmented reality systems, to name a few. This demand coincides with the rise of deep learning approaches in almost every field or application target related to computer vision, including semantic segmentation and scene understanding. This paper provides a review of deep learning methods for semantic segmentation applied to various application areas. Firstly, we describe the terminology of this field as well as mandatory background concepts. Next, the main datasets and challenges are presented to help researchers decide which ones best suit their needs and targets. Then, existing methods are reviewed, highlighting their contributions and their significance in the field. Finally, quantitative results are given for the described methods and the datasets on which they were evaluated, followed by a discussion of the results. Lastly, we point out a set of promising future works and draw our own conclusions about the state of the art of semantic segmentation using deep learning techniques.
A Review on Recommendation Systems: Context-aware to Social-based The number of Internet users has grown rapidly, enticing companies and corporations to make full use of recommendation infrastructures. Consequently, online advertisement companies have emerged to aid us in the presence of numerous items and users. Even as a user, you may find yourself drowned in a set of items that you think you might need, but you are not sure if you should try them. Those items could be online services, products, places or even a person for a friendship. Therefore, we need recommender systems that pave the way and help us make good decisions. This paper provides a review of traditional recommendation systems, recommendation system evaluations and metrics, context-aware recommendation systems, and social-based recommendation systems. While it is hard to include all the information in a brief review paper, we try to give an introductory review of the essentials of recommendation systems. More detailed information on each chapter can be found in the corresponding references. For the purpose of explaining the concept in a different way, we provide slides available at https://…/recommender-systems-97094937.
A review on statistical inference methods for discrete Markov random fields Developing satisfactory methodology for the analysis of Markov random fields is a very challenging task. Indeed, due to the Markovian dependence structure, the normalizing constant of the fields cannot be computed using standard analytical or numerical methods. This forms a central issue for any statistical approach as the likelihood is an integral part of the procedure. Furthermore, such unobserved fields cannot be integrated out and the likelihood evaluation becomes a doubly intractable problem. This report gives an overview of some of the methods used in the literature to analyse such observed or unobserved random fields.
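The computational obstacle referred to above can be stated compactly in standard exponential-family notation (generic form, not notation specific to the report). For a discrete field $x$ with parameter $\theta$ the likelihood is

$$p(x \mid \theta) \;=\; \frac{\exp\{\theta^{\top} s(x)\}}{Z(\theta)}, \qquad Z(\theta) \;=\; \sum_{x'} \exp\{\theta^{\top} s(x')\},$$

and the sum defining $Z(\theta)$ runs over every configuration of the field, which is infeasible for all but trivial lattices. One common workaround from the literature is Besag's pseudo-likelihood, which replaces the joint density by a product of tractable full conditionals:

$$\mathrm{PL}(\theta) \;=\; \prod_{i} p\bigl(x_i \mid x_{\mathcal{N}(i)}, \theta\bigr).$$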
A second-quantised Shannon theory Shannon’s theory of information was built on the assumption that the information carriers were classical systems. Its quantum counterpart, quantum Shannon theory, explores the new possibilities that arise when the information carriers are quantum particles. Traditionally, quantum Shannon theory has focussed on scenarios where the internal state of the particles is quantum, while their trajectory in spacetime is classical. Here we propose a second level of quantisation where both the information and its propagation in spacetime are treated quantum mechanically. The framework is illustrated with a number of examples, showcasing some of the counterintuitive phenomena taking place when information travels in a superposition of paths.
A Security Framework for Wireless Sensor Networks: Theory and Practice Wireless sensor networks are often deployed in public or otherwise untrusted and even hostile environments, which prompts a number of security issues. Although security is a necessity in other types of networks, it is much more so in sensor networks due to the resource-constraint, susceptibility to physical capture, and wireless nature. In this work we emphasize two security issues: (1) secure communication infrastructure and (2) secure nodes scheduling algorithm. Due to resource constraints, specific strategies are often necessary to preserve the network’s lifetime and its quality of service. For instance, to reduce communication costs nodes can go to sleep mode periodically (nodes scheduling). These strategies must be proven as secure, but protocols used to guarantee this security must be compatible with the resource preservation requirement. To achieve this goal, secure communications in such networks will be defined, together with the notions of secure scheduling. Finally, some of these security properties will be evaluated in concrete case studies.
A Selective Overview of Deep Learning Deep learning has arguably achieved tremendous success in recent years. In simple words, deep learning uses the composition of many nonlinear functions to model the complex dependency between input features and labels. While neural networks have a long history, recent advances have greatly improved their performance in computer vision, natural language processing, etc. From the statistical and scientific perspective, it is natural to ask: What is deep learning? What are the new characteristics of deep learning, compared with classical methods? What are the theoretical foundations of deep learning? To answer these questions, we introduce common neural network models (e.g., convolutional neural nets, recurrent neural nets, generative adversarial nets) and training techniques (e.g., stochastic gradient descent, dropout, batch normalization) from a statistical point of view. Along the way, we highlight new characteristics of deep learning (including depth and over-parametrization) and explain their practical and theoretical benefits. We also sample recent results on theories of deep learning, many of which are only suggestive. While a complete understanding of deep learning remains elusive, we hope that our perspectives and discussions serve as a stimulus for new statistical research.
A Short Course on Network Analysis These are lecture notes prepared for a short (6 hours) course given at the conference Methodological Advances in Statistics Related to Big Data, held in Castro Urdiales, Spain, June 8-12, 2015. The course focuses on the analysis of networks without labels at the nodes. It covers descriptive statistics for graphs, random graph models, and graph partitioning, including recent advances in spectral and semidefinite methods.
A Short Introduction to Boosting Boosting is a general method for improving the accuracy of any given learning algorithm. This short overview paper introduces the boosting algorithm AdaBoost, and explains the underlying theory of boosting, including an explanation of why boosting often does not suffer from overfitting as well as boosting´s relationship to support-vector machines. Some examples of recent applications of boosting are also described.
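To accompany the overview above, here is a compact, self-contained sketch of discrete AdaBoost with decision stumps. scikit-learn is used only for the base learner, labels are assumed to be in {-1, +1}, and the toy data is arbitrary; this is an illustrative implementation, not code from the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """Discrete AdaBoost; y must be in {-1, +1}. Returns (stumps, alphas)."""
    n = len(X)
    w = np.full(n, 1.0 / n)                 # example weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(np.sum(w[pred != y]), 1e-10, 1 - 1e-10)  # weighted error
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)       # up-weight the mistakes
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    agg = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(agg)

# toy data: an XOR-like problem that no single stump can solve
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)
stumps, alphas = adaboost_fit(X, y)
print("training accuracy:", (adaboost_predict(stumps, alphas, X) == y).mean())
```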
A Short Introduction to Local Graph Clustering Methods and Software Graph clustering has many important applications in computing, but due to the increasing sizes of graphs, even traditionally fast clustering methods can be computationally expensive for real-world graphs of interest. Scalability problems led to the development of local graph clustering algorithms that come with a variety of theoretical guarantees. Rather than return a global clustering of the entire graph, local clustering algorithms return a single cluster around a given seed node or set of seed nodes. These algorithms improve scalability because they use time and memory resources that depend only on the size of the cluster returned, instead of the size of the input graph. Indeed, for many of them, their running time grows linearly with the size of the output. In addition to scalability arguments, local graph clustering algorithms have proven to be very useful for identifying and interpreting small-scale and meso-scale structure in large-scale graphs. As opposed to heuristic operational procedures, this class of algorithms comes with strong algorithmic and statistical theory. These include statistical guarantees that prove they have implicit regularization properties. One of the challenges with the existing literature on these approaches is that they are published in a wide variety of areas, including theoretical computer science, statistics, data science, and mathematics. This has made it difficult to relate the various algorithms and ideas together into a cohesive whole. We have recently been working on unifying these diverse perspectives through the lens of optimization as well as providing software to perform these computations in a cohesive fashion. In this note, we provide a brief introduction to local graph clustering, we provide some representative examples of our perspective, and we introduce our software named Local Graph Clustering (LGC).
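As a rough illustration of the seeded style of computation described above (and not the authors' LGC package itself), the sketch below runs a personalized PageRank from a seed node with NetworkX and then performs a conductance sweep over the degree-normalized ranking. The graph, seed and teleportation parameter are arbitrary choices for the example.

```python
import networkx as nx

def local_cluster(G, seed, alpha=0.85):
    """Seeded cluster via personalized PageRank plus a conductance sweep cut."""
    ppr = nx.pagerank(G, alpha=alpha, personalization={seed: 1.0})
    # order nodes by PageRank mass divided by degree, the usual sweep ordering
    order = sorted(ppr, key=lambda v: ppr[v] / max(G.degree(v), 1), reverse=True)
    best_set, best_cond = None, float("inf")
    S = set()
    for v in order[: len(order) - 1]:       # avoid the trivial all-nodes cut
        S.add(v)
        cond = nx.conductance(G, S)
        if cond < best_cond:
            best_set, best_cond = set(S), cond
    return best_set, best_cond

G = nx.connected_caveman_graph(4, 8)        # 4 dense cliques, loosely linked
cluster, cond = local_cluster(G, seed=0)
print(sorted(cluster), round(cond, 3))
```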
A Short Survey of Topological Data Analysis in Time Series and Systems Analysis Topological Data Analysis (TDA) is the collection of mathematical tools that capture the structure of shapes in data. Compared with its use in computational topology and computational geometry, the utilization of TDA in time series and signal processing is relatively new. In some recent contributions, TDA has been utilized as an alternative to conventional signal processing methods. Specifically, TDA has been considered for dealing with noisy signals and time series. In these applications, TDA is used to find the shapes in data as the main properties, while the other properties are assumed to be much less informative. In this paper, we review recent developments and contributions where topological data analysis, especially persistent homology, has been applied to time series analysis, dynamical systems and signal processing. We cover problem statements such as stability determination, risk analysis, systems behaviour, and predicting critical transitions in financial markets.
A Short Survey On Memory Based Reinforcement Learning Reinforcement learning (RL) is a branch of machine learning which is employed to solve various sequential decision making problems without proper supervision. Due to the recent advancement of deep learning, the newly proposed Deep-RL algorithms have been able to perform extremely well in sophisticated high-dimensional environments. However, even after successes in many domains, one of the major challenges in these approaches is the high magnitude of interactions with the environment required for efficient decision making. Seeking inspiration from the brain, this problem can be addressed by incorporating instance-based learning, biasing decision making towards memories of highly rewarding experiences. This paper reviews various recent reinforcement learning methods that incorporate external memory to solve decision making problems. We provide an overview of the different methods – along with their advantages and disadvantages, applications and the standard experimentation settings used for memory based models. This review aims to be a helpful resource, providing key insight into recent advances in the field and helping its further development.
A Short Survey on Probabilistic Reinforcement Learning A reinforcement learning agent tries to maximize its cumulative payoff by interacting in an unknown environment. It is important for the agent to explore suboptimal actions as well as to pick actions with the highest known rewards. Yet, in sensitive domains, collecting more data with exploration is not always possible, but it is important to find a policy with a certain performance guarantee. In this paper, we present a brief survey of methods available in the literature for balancing the exploration-exploitation trade-off and computing robust solutions from fixed samples in reinforcement learning.
A Simple Introduction to Support Vector Machines Martin Law Lecture for CSE 802 Department of Computer Science and Engineering Michigan State University
A simple neural network module for relational reasoning Relational reasoning is a central component of generally intelligent behavior, but has proven difficult for neural networks to learn. In this paper we describe how to use Relation Networks (RNs) as a simple plug-and-play module to solve problems that fundamentally hinge on relational reasoning. We tested RN-augmented networks on three tasks: visual question answering using a challenging dataset called CLEVR, on which we achieve state-of-the-art, super-human performance; text-based question answering using the bAbI suite of tasks; and complex reasoning about dynamic physical systems. Then, using a curated dataset called Sort-of-CLEVR we show that powerful convolutional networks do not have a general capacity to solve relational questions, but can gain this capacity when augmented with RNs. Our work shows how a deep learning architecture equipped with an RN module can implicitly discover and learn to reason about entities and their relations.
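A minimal structural sketch of the RN idea described above: apply a shared function g to every ordered pair of object representations and sum the results before a final readout f. The weights below are random, untrained placeholders and the dimensions are arbitrary; a real RN is trained end to end, typically in a deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)
n_objects, obj_dim, hid = 6, 8, 32

def mlp(x, W1, W2):
    """Tiny two-layer perceptron with ReLU, used for both g and f."""
    return np.maximum(x @ W1, 0) @ W2

# random placeholder parameters for the shared pair function g and readout f
Wg1, Wg2 = rng.normal(size=(2 * obj_dim, hid)), rng.normal(size=(hid, hid))
Wf1, Wf2 = rng.normal(size=(hid, hid)), rng.normal(size=(hid, 1))

def relation_network(objects):
    """Sum g(o_i, o_j) over all ordered pairs of objects, then apply a readout f."""
    pair_sum = np.zeros(hid)
    for i in range(len(objects)):
        for j in range(len(objects)):
            if i == j:
                continue
            pair = np.concatenate([objects[i], objects[j]])
            pair_sum += mlp(pair, Wg1, Wg2)
    return mlp(pair_sum, Wf1, Wf2)

objects = rng.normal(size=(n_objects, obj_dim))   # e.g. per-object CNN features
print(relation_network(objects))
```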
A snapshot on nonstandard supervised learning problems: taxonomy, relationships and methods Machine learning is a field which studies how machines can alter and adapt their behavior, improving their actions according to the information they are given. This field is subdivided into multiple areas, among which the best known are supervised learning (e.g. classification and regression) and unsupervised learning (e.g. clustering and association rules). Within supervised learning, most studies and research are focused on well known standard tasks, such as binary classification, multiclass classification and regression with one dependent variable. However, there are many other less known problems. These are what we generically call nonstandard supervised learning problems. The literature about them is much more sparse, and each study is directed to a specific task. Therefore, the definitions, relations and applications of this kind of learners are hard to find. The goal of this paper is to provide the reader with a broad view on the distinct variations of nonstandard supervised problems. A comprehensive taxonomy summarizing their traits is proposed. A review of the common approaches followed to accomplish them and their main applications is provided as well.
A Statistical Learning Model of Text Classification for Support Vector Machines
A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions Deep neural networks (DNNs) achieve excellent performance on standard classification tasks. However, under image quality distortions such as blur and noise, classification accuracy becomes poor. In this work, we compare the performance of DNNs with human subjects on distorted images. We show that, although DNNs perform better than or on par with humans on good quality images, DNN performance is still much lower than human performance on distorted images. We additionally find that there is little correlation in errors between DNNs and human subjects. This could be an indication that the internal representations of images differ between DNNs and the human visual system. These comparisons with human performance could be used to guide future development of more robust DNNs.
A Study of Recent Contributions on Information Extraction This paper reports on modern approaches in Information Extraction (IE) and its two main sub-tasks of Named Entity Recognition (NER) and Relation Extraction (RE). Basic concepts and the most recent approaches in this area are reviewed, which mainly include Machine Learning (ML) based approaches and the more recent trend to Deep Learning (DL) based methods.
A Study of Reinforcement Learning for Neural Machine Translation Recent studies have shown that reinforcement learning (RL) is an effective approach for improving the performance of neural machine translation (NMT) systems. However, due to its instability, successful RL training is challenging, especially in real-world systems where deep models and large datasets are leveraged. In this paper, taking several large-scale translation tasks as testbeds, we conduct a systematic study on how to train better NMT models using reinforcement learning. We provide a comprehensive comparison of several important factors (e.g., baseline reward, reward shaping) in RL training. Furthermore, to address the open question of whether RL is still beneficial when monolingual data is used, we propose a new method to leverage RL to further boost the performance of NMT systems trained with source/target monolingual data. By integrating all our findings, we obtain competitive results on the WMT14 English-German, WMT17 English-Chinese, and WMT17 Chinese-English translation tasks, and in particular set state-of-the-art performance on the WMT17 Chinese-English translation task.
A Study on Neural Network Language Modeling An exhaustive study on neural network language modeling (NNLM) is performed in this paper. Different architectures of basic neural network language models are described and examined. A number of different improvements over basic neural network language models, including importance sampling, word classes, caching and bidirectional recurrent neural networks (BiRNN), are studied separately, and the advantages and disadvantages of each technique are evaluated. Then, the limits of neural network language modeling are explored from the aspects of model architecture and knowledge representation. Part of the statistical information from a word sequence will be lost when it is processed word by word in a certain order, and the mechanism of training a neural network by updating weight matrices and vectors imposes severe restrictions on any significant enhancement of NNLM. As for knowledge representation, the knowledge represented by neural network language models is the approximate probabilistic distribution of word sequences from a certain training data set, rather than the knowledge of a language itself or the information conveyed by word sequences in a natural language. Finally, some directions for further improving neural network language modeling are discussed.
A Study on Overfitting in Deep Reinforcement Learning Recent years have witnessed significant progress in deep Reinforcement Learning (RL). Empowered with large scale neural networks, carefully designed architectures, novel training algorithms and massively parallel computing devices, researchers are able to attack many challenging RL problems. However, in machine learning, more training power comes with a potential risk of more overfitting. As deep RL techniques are being applied to critical problems such as healthcare and finance, it is important to understand the generalization behaviors of the trained agents. In this paper, we conduct a systematic study of standard RL agents and find that they could overfit in various ways. Moreover, overfitting could happen “robustly”: commonly used techniques in RL that add stochasticity do not necessarily prevent or detect overfitting. In particular, the same agents and learning algorithms could have drastically different test performance, even when all of them achieve optimal rewards during training. The observations call for more principled and careful evaluation protocols in RL. We conclude with a general discussion on overfitting in RL and a study of the generalization behaviors from the perspective of inductive bias.
A summary on Maximum Likelihood Estimator A general method of building a predictive model starts with least squares estimation. We then need to work on the residuals, find confidence intervals for the parameters and test how well the model fits the data, all of which rely on the assumption that the residuals (or noise) are normally distributed. Unfortunately that assumption is not guaranteed. Most of the time, the residual plot will look like some distribution other than the normal. At that point you could add another factor term to your model to filter out the non-normally distributed noise, and then compute the LSE again, but you may run into the same problem once more. Alternatively, if you can recognize the distribution from the plot (or you otherwise know the pdf of the noise), you can compute the MLE of your model’s parameters directly. This time, the work is really finished.
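A small worked example of the point being made above (a standard textbook case, assuming a linear model with Laplace rather than Gaussian noise): the log-likelihood of parameters $\beta$ with scale $b$ is

$$\ell(\beta, b) \;=\; -\,n \log(2b) \;-\; \frac{1}{b} \sum_{i=1}^{n} \bigl| y_i - x_i^{\top} \beta \bigr|,$$

so the maximum likelihood estimator minimizes the sum of absolute residuals (least absolute deviations) rather than the sum of squares that least squares assumes; recognizing the residual distribution therefore changes the estimator itself.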
A Survey and Evaluation of Data Center Network Topologies Data centers are becoming increasingly popular for their flexibility and processing capabilities in the modern computing environment. They are managed by a single entity (administrator) and allow dynamic resource provisioning, performance optimization as well as efficient utilization of available resources. Each data center consists of massive compute, network and storage resources connected with physical wires. The large scale nature of data centers requires careful planning of compute, storage, network nodes, interconnection as well as inter-communication for their effective and efficient operations. In this paper, we present a comprehensive survey and taxonomy of network topologies either used in commercial data centers, or proposed by researchers working in this space. We also compare and evaluate some of those topologies using mininet as well as gem5 simulator for different traffic patterns, based on various metrics including throughput, latency and bisection bandwidth.
A Survey of Algorithms for Keyword Search on Graph Data In this chapter, we survey methods that perform keyword search on graph data. Keyword search provides a simple but user-friendly interface to retrieve information from complicated data structures. Since many real life datasets are represented by trees and graphs, keyword search has become an attractive mechanism for data of a variety of types. In this survey, we discuss methods of keyword search on schema graphs, which are abstract representation for XML data and relational data, and methods of keyword search on schema-free graphs. In our discussion, we focus on three major challenges of keyword search on graphs. First, what is the semantics of keyword search on graphs, or, what qualifies as an answer to a keyword search; second, what constitutes a good answer, or, how to rank the answers; third, how to perform keyword search efficiently. We also discuss some unresolved challenges and propose some new research directions on this topic.
A Survey of Autonomous Driving: Common Practices and Emerging Technologies Automated driving systems (ADSs) promise a safe, comfortable and efficient driving experience. However, fatalities involving vehicles equipped with ADSs are on the rise. The full potential of ADSs cannot be realized unless the robustness of the state of the art is improved further. This paper discusses unsolved problems and surveys the technical aspects of automated driving. Studies regarding present challenges, high-level system architectures, emerging methodologies and core functions (localization, mapping, perception, planning, and human machine interface) were thoroughly reviewed. Furthermore, the state of the art was implemented on our own platform and various algorithms were compared in a real-world driving setting. The paper concludes with an overview of available datasets and tools for ADS development.
A survey of Bayesian predictive methods for model assessment, selection and comparison To date, several methods exist in the statistical literature for model assessment, which purport themselves specifically as Bayesian predictive methods. The decision theoretic assumptions on which these methods are based are not always clearly stated in the original articles, however. The aim of this survey is to provide a unified review of Bayesian predictive model assessment and selection methods, and of methods closely related to them. We review the various assumptions that are made in this context and discuss the connections between different approaches, with an emphasis on how each method approximates the expected utility of using a Bayesian model for the purpose of predicting future data.
A Survey of Binary Similarity and Distance Measures The binary feature vector is one of the most common representations of patterns, and similarity and distance measures between such vectors play a critical role in many problems such as clustering, classification, etc. Ever since Jaccard proposed a similarity measure to classify ecological species in 1901, numerous binary similarity and distance measures have been proposed in various fields. Applying appropriate measures results in more accurate data analysis. Nevertheless, few comprehensive surveys on binary measures have been conducted. Hence we collected 76 binary similarity and distance measures used over the last century and reveal their correlations through the hierarchical clustering technique.
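For reference, two of the standard definitions involved (not results of the survey itself): with a = the count of positions where both vectors are 1, b and c the counts of the two kinds of mismatches, and d the count of shared zeros, the Jaccard similarity is a/(a+b+c) and the simple matching coefficient is (a+d)/(a+b+c+d). A short NumPy sketch:

```python
import numpy as np

def binary_counts(x, y):
    """Return (a, b, c, d): co-occurrence counts of two 0/1 vectors."""
    x, y = np.asarray(x, bool), np.asarray(y, bool)
    a = np.sum(x & y)        # 1-1 matches
    b = np.sum(x & ~y)       # 1-0 mismatches
    c = np.sum(~x & y)       # 0-1 mismatches
    d = np.sum(~x & ~y)      # 0-0 matches
    return a, b, c, d

def jaccard(x, y):
    a, b, c, _ = binary_counts(x, y)
    return a / (a + b + c) if (a + b + c) else 0.0

def simple_matching(x, y):
    a, b, c, d = binary_counts(x, y)
    return (a + d) / (a + b + c + d)

x = [1, 1, 0, 0, 1, 0]
y = [1, 0, 0, 0, 1, 1]
print(jaccard(x, y), simple_matching(x, y))   # 0.5 and 0.666...
```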
A survey of blockchain frameworks and applications The applications of the blockchain technology are still being discovered. When a new potential disruptive technology emerges, there is a tendency to try to solve every problem with that technology. However, it is still necessary to determine what approach is the best for each type of application. To find how distributed ledgers solve existing problems, this study looks for blockchain frameworks in the academic world. Identifying the existing frameworks can demonstrate where the interest in the technology exists and where it can be missing. This study encountered several blockchain frameworks in development. However, there are few references to operational needs, testing, and deployment of the technology. With the widespread use of the technology, either integrating with pre-existing solutions, replacing legacy systems, or in new implementations, the need for testing, deployment, exploration, and maintenance is expected to intensify.
A survey of Community Question Answering With the advent of numerous community forums, tasks associated with the same have gained importance in the recent past. With the influx of new questions every day on these forums, the issues of identifying methods to find answers to said questions, or even trying to detect duplicate questions, are of practical importance and are challenging in their own right. This paper aims at surveying some of the aforementioned issues, and methods proposed for tackling the same.
A Survey of Community Search Over Big Graphs With the rapid development of information technologies, various big graphs are prevalent in many real applications (e.g., social media and knowledge bases). An important component of these graphs is the network community. Essentially, a community is a group of vertices which are densely connected internally. Community retrieval can be used in many real applications, such as event organization, friend recommendation, and so on. Consequently, how to efficiently find high-quality communities from big graphs is an important research topic in the era of big data. Recently a large group of research works, called community search, have been proposed. They aim to provide efficient solutions for searching high-quality communities from large networks in real-time. Nevertheless, these works focus on different types of graphs and formulate communities in different manners, and thus it is desirable to have a comprehensive review of these works. In this survey, we conduct a thorough review of existing community search works. Moreover, we analyze and compare the quality of communities under their models, and the performance of different solutions. Furthermore, we point out new research directions. This survey does not only help researchers to have a better understanding of existing community search solutions, but also provides practitioners a better judgment on choosing the proper solutions.
A Survey of Cross-Lingual Embedding Models Cross-lingual embedding models allow us to project words from different languages into a shared embedding space. This allows us to apply models trained on languages with a lot of data, e.g. English to low-resource languages. In the following, we will survey models that seek to learn cross-lingual embeddings. We will discuss them based on the type of approach and the nature of parallel data that they employ. Finally, we will present challenges and summarize how to evaluate cross-lingual embedding models.
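One widely used family of approaches (given here as a generic illustration, not as the survey's own method) learns an orthogonal map from source to target embeddings over a seed dictionary via the orthogonal Procrustes solution. The toy data below simulates the setting with a hidden rotation; in practice the rows would be embeddings of known translation pairs.

```python
import numpy as np

def procrustes_map(X_src, Y_tgt):
    """Orthogonal W minimizing ||X_src @ W - Y_tgt||_F over a seed dictionary.
    Rows of X_src and Y_tgt are embeddings of translation pairs."""
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt

# toy data: the target space is a random rotation of the source space plus noise
rng = np.random.default_rng(0)
d, n_pairs = 50, 200
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))          # hidden true rotation
X = rng.normal(size=(n_pairs, d))
Y = X @ Q + 0.01 * rng.normal(size=(n_pairs, d))

W = procrustes_map(X, Y)
mapped = X @ W                                         # source words in target space
print("mean alignment error:", np.linalg.norm(mapped - Y, axis=1).mean().round(4))
```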
A Survey of Deep Learning Methods for Relation Extraction Relation Extraction is an important sub-task of Information Extraction which has the potential of employing deep learning (DL) models with the creation of large datasets using distant supervision. In this review, we compare the contributions and pitfalls of the various DL models that have been used for the task, to help guide the path ahead.
A Survey of Deep Learning Techniques for Autonomous Driving The last decade witnessed increasingly rapid progress in self-driving vehicle technology, mainly backed up by advances in the area of deep learning and artificial intelligence. The objective of this paper is to survey the current state-of-the-art on deep learning technologies used in autonomous driving. We start by presenting AI-based self-driving architectures, convolutional and recurrent neural networks, as well as the deep reinforcement learning paradigm. These methodologies form a base for the surveyed driving scene perception, path planning, behavior arbitration and motion control algorithms. We investigate both the modular perception-planning-action pipeline, where each module is built using deep learning methods, as well as End2End systems, which directly map sensory information to steering commands. Additionally, we tackle current challenges encountered in designing AI architectures for autonomous driving, such as their safety, training data sources and computational hardware. The comparison presented in this survey helps to gain insight into the strengths and limitations of deep learning and AI approaches for autonomous driving and assist with design choices
A Survey of Deep Learning Techniques for Mobile Robot Applications Advancements in deep learning over the years have attracted research into how deep artificial neural networks can be used in robotic systems. This research survey will present a summarization of the current research with a specific focus on the gains and obstacles for deep learning to be applied to mobile robotics.
A Survey of Deep Learning-based Object Detection Object detection is one of the most important and challenging branches of computer vision, and it has been widely applied in people's lives, for example in security monitoring and autonomous driving, with the purpose of locating instances of semantic objects of a certain class. With the rapid development of deep learning networks for detection tasks, the performance of object detectors has been greatly improved. In order to understand the main development status of the object detection pipeline thoroughly and deeply, in this survey we first analyze the methods of existing typical detection models and describe the benchmark datasets. Afterwards and primarily, we provide a comprehensive overview of a variety of object detection methods in a systematic manner, covering the one-stage and two-stage detectors. Moreover, we list the traditional and new applications. Some representative branches of object detection are analyzed as well. Finally, we discuss the architecture of exploiting these object detection methods to build an effective and efficient system and point out a set of development trends to better follow the state-of-the-art algorithms and further research.
A survey of dimensionality reduction techniques based on random projection Dimensionality reduction techniques play important roles in the analysis of big data. Traditional dimensionality reduction approaches, such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), have been studied extensively in the past few decades. However, as the dimension of huge data increases, the computational cost of traditional dimensionality reduction approaches grows dramatically and becomes prohibitive. This has triggered the development of the Random Projection (RP) technique, which maps high-dimensional data onto a low-dimensional subspace in a short time. However, RP generates the transformation matrix without considering the intrinsic structure of the original data and usually leads to relatively high distortion. Therefore, in the past few years, several approaches based on RP have been proposed to address this problem. In this paper, we summarize these approaches across different applications to help practitioners employ the proper approach in their specific application. We also enumerate their benefits and limitations to provide further references for researchers developing novel RP-based approaches.
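The baseline RP construction the survey builds on can be written in a few lines. Below is a minimal Gaussian random projection with a quick check of how well pairwise distances are preserved; the dimensions and sample sizes are arbitrary choices for the example.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n, d, k = 300, 2000, 100                  # many points, high dim -> low dim

X = rng.normal(size=(n, d))
R = rng.normal(size=(d, k)) / np.sqrt(k)  # Gaussian RP matrix, E[||xR||^2] = ||x||^2
X_low = X @ R

ratios = pdist(X_low) / pdist(X)          # per-pair distance distortion
print("distortion ratios: mean %.3f, min %.3f, max %.3f"
      % (ratios.mean(), ratios.min(), ratios.max()))
```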
A Survey of Domain Adaptation for Neural Machine Translation Neural machine translation (NMT) is a deep learning based approach for machine translation, which yields the state-of-the-art translation performance in scenarios where large-scale parallel corpora are available. Although the high-quality and domain-specific translation is crucial in the real world, domain-specific corpora are usually scarce or nonexistent, and thus vanilla NMT performs poorly in such scenarios. Domain adaptation that leverages both out-of-domain parallel corpora as well as monolingual corpora for in-domain translation, is very important for domain-specific translation. In this paper, we give a comprehensive survey of the state-of-the-art domain adaptation techniques for NMT.
A survey of hidden convex optimization Motivated by the fact that not all nonconvex optimization problems are difficult to solve, we survey in this paper three widely-used ways to reveal the hidden convex structure for different classes of nonconvex optimization problems. Finally, ten open problems are raised.
A Survey of Hierarchy Identification in Social Networks Humans are social by nature. Throughout history, people have formed communities and built relationships. Most relationships with coworkers, friends, and family are developed during face-to-face interactions. These relationships are established through explicit means of communications such as words and implicit such as intonation, body language, etc. By analyzing human interactions we can derive information about the relationships and influence among conversation participants. However, with the development of the Internet, people started to communicate through text in online social networks. Interestingly, they brought their communicational habits to the Internet. Many social network users form relationships with each other and establish communities with leaders and followers. Recognizing these hierarchical relationships is an important task because it will help to understand social networks and predict future trends, improve recommendations, better target advertisement, and improve national security by identifying leaders of anonymous terror groups. In this work, I provide an overview of current research in this area and present the state-of-the-art approaches to deal with the problem of identifying hierarchical relationships in social networks.
A Survey of Inductive Biases for Factorial Representation-Learning With the resurgence of interest in neural networks, representation learning has re-emerged as a central focus in artificial intelligence. Representation learning refers to the discovery of useful encodings of data that make domain-relevant information explicit. Factorial representations identify underlying independent causal factors of variation in data. A factorial representation is compact and faithful, makes the causal factors explicit, and facilitates human interpretation of data. Factorial representations support a variety of applications, including the generation of novel examples, indexing and search, novelty detection, and transfer learning. This article surveys various constraints that encourage a learning algorithm to discover factorial representations. I dichotomize the constraints in terms of unsupervised and supervised inductive bias. Unsupervised inductive biases exploit assumptions about the environment, such as the statistical distribution of factor coefficients, assumptions about the perturbations a factor should be invariant to (e.g. a representation of an object can be invariant to rotation, translation or scaling), and assumptions about how factors are combined to synthesize an observation. Supervised inductive biases are constraints on the representations based on additional information connected to observations. Supervisory labels come in variety of types, which vary in how strongly they constrain the representation, how many factors are labeled, how many observations are labeled, and whether or not we know the associations between the constraints and the factors they are related to. This survey brings together a wide variety of models that all touch on the problem of learning factorial representations and lays out a framework for comparing these models based on the strengths of the underlying supervised and unsupervised inductive biases.
A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress Inverse reinforcement learning is the problem of inferring the reward function of an observed agent, given its policy or behavior. Researchers perceive IRL both as a problem and as a class of methods. By categorically surveying the current literature in IRL, this article serves as a reference for researchers and practitioners in machine learning to understand the challenges of IRL and select the approaches best suited for the problem on hand. The survey formally introduces the IRL problem along with its central challenges which include accurate inference, generalizability, correctness of prior knowledge, and growth in solution complexity with problem size. The article elaborates how the current methods mitigate these challenges. We further discuss the extensions of traditional IRL methods: (i) inaccurate and incomplete perception, (ii) incomplete model, (iii) multiple rewards, and (iv) non-linear reward functions. This discussion concludes with some broad advances in the research area and currently open research questions.
A Survey of Knowledge Representation and Retrieval for Learning in Service Robotics Within the realm of service robotics, researchers have placed a great amount of effort into learning motions and manipulations for task execution by robots. The task of robot learning is very broad, as it involves many tasks such as object detection, action recognition, motion planning, localization, knowledge representation and retrieval, and the intertwining of computer vision and machine learning techniques. In this paper, we focus on how knowledge can be gathered, represented, and reproduced to solve problems as done by researchers in the past decades. We discuss the problems which have existed in robot learning and the solutions, technologies or developments (if any) which have contributed to solving them. Specifically, we look at three broad categories involved in task representation and retrieval for robotics: 1) activity recognition from demonstrations, 2) scene understanding and interpretation, and 3) task representation in robotics – datasets and networks. Within each section, we discuss major breakthroughs and how their methods address present issues in robot learning and manipulation.
A Survey of Location Prediction on Twitter Locations, e.g., countries, states, cities, and point-of-interests, are central to news, emergency events, and people’s daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on daily basis. Due to the world-wide coverage of its users and real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts are spent on dealing with new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim at offering an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we also briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we list future research directions.
A Survey of Machine and Deep Learning Methods for Internet of Things (IoT) Security The Internet of Things (IoT) integrates billions of smart devices that can communicate with one another with minimal human intervention. It is one of the fastest developing fields in the history of computing, with an estimated 50 billion devices by the end of 2020. On the one hand, IoT plays a crucial role in enhancing several real-life smart applications that can improve quality of life. On the other hand, the crosscutting nature of IoT systems and the multidisciplinary components involved in the deployment of such systems have introduced new security challenges. Implementing conventional security measures, such as encryption, authentication, access control, network security and application security, is ineffective for IoT devices given their inherent vulnerabilities. Therefore, existing security methods should be enhanced to secure the IoT system effectively. Machine learning and deep learning (ML/DL) have advanced considerably over the last few years, and machine intelligence has transitioned from laboratory curiosity to practical machinery in several important applications. Consequently, ML/DL methods are important in transforming the security of IoT systems from merely facilitating secure communication between devices to security-based intelligence systems. The goal of this work is to provide a comprehensive survey of ML/DL methods that can be used to develop enhanced security methods for IoT systems. IoT security threats that are related to inherent or newly introduced threats are presented, and various potential IoT system attack surfaces and the possible threats related to each surface are discussed. We then thoroughly review ML/DL methods for IoT security and present the opportunities, advantages and shortcomings of each method. We discuss the opportunities and challenges involved in applying ML/DL to IoT security. These opportunities and challenges can serve as potential future research directions.
A Survey of Machine Learning for Big Code and Naturalness Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit code’s abundance of patterns. In this article, we survey this work. We contrast programming languages against natural languages and discuss how these similarities and differences drive the design of probabilistic models. We present a taxonomy based on the underlying design principles of each model and use it to navigate the literature. Then, we review how researchers have adapted these models to application areas and discuss cross-cutting and application-specific challenges and opportunities.
A Survey of Methods for Collective Communication Optimization and Tuning New developments in HPC technology, in terms of increasing computing power on multi/many core processors, high-bandwidth memory/IO subsystems and communication interconnects, have a direct impact on software and runtime system development. These advancements have become useful in producing high-performance collective communication interfaces that integrate efficiently on a wide variety of platforms and environments. However, the number of optimization options that show up with each new technology or software framework has resulted in a ‘combinatorial explosion’ in the feature space for tuning collective parameters, such that finding the optimal set has become a nearly impossible task. The applicability of the algorithmic choices available for optimizing collective communication depends largely on the scalability requirement of a particular use case. This problem can be further exacerbated by any requirement to run collective problems at very large scales, such as in the case of exascale computing, at which impractical tuning by brute force may require many months of resources. Therefore, the application of statistical, data mining and artificial intelligence methods, or more general hybrid learning models, seems essential in many collective parameter optimization problems. We hope to explore the current state and the cutting edge of collective communication optimization and tuning methods, and to culminate with possible future directions for this problem.
A Survey of Mixed Data Clustering Algorithms Most datasets normally contain either numeric or categorical features. Mixed data comprises both numeric and categorical features, and such data frequently occurs in various domains, such as health, finance, marketing, etc. Clustering is often sought on mixed data to find structures and to group similar objects. However, clustering mixed data is challenging because it is difficult to directly apply mathematical operations, such as summation or averaging, to the feature values of these datasets. In this paper, we review various types of mixed data clustering techniques in detail. We present a taxonomy to identify ten types of different mixed data clustering techniques. We also compare the performance of several mixed data clustering methods on publicly available datasets. The paper further identifies challenges in developing different mixed data clustering algorithms and provides guidelines for future directions in this area.
A Survey of Model Compression and Acceleration for Deep Neural Networks Deep convolutional neural networks (CNNs) have recently achieved dramatic accuracy improvements in many visual recognition tasks. However, existing deep convolutional neural network models are computationally expensive and memory intensive, hindering their deployment in devices with low memory resources or in applications with strict latency requirements. A natural thought, therefore, is to perform model compression and acceleration in deep CNNs without significantly decreasing classification accuracy. During the past few years, tremendous progress has been made in this area. In this paper, we survey recently developed techniques for compacting and accelerating CNN models. These techniques are roughly categorized into four schemes: parameter pruning and sharing, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation. Methods of parameter pruning and sharing are described in detail first, and the other schemes are then introduced. For the methods of each scheme, we provide an insightful analysis of their performance, related applications, advantages and drawbacks. We then go through a few very recent additional successful methods, for example, dynamic networks and stochastic depth networks. After that, we survey the evaluation metrics, the main datasets used for evaluating model performance, and recent benchmarking efforts. Finally, we conclude the paper and discuss remaining challenges and possible directions on this topic.
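To make the first of these four schemes concrete, the snippet below is a minimal, hypothetical sketch of magnitude-based parameter pruning in NumPy; the function name, the global sparsity target and the thresholding rule are illustrative assumptions, not any specific method from the surveyed literature.

    import numpy as np

    def magnitude_prune(weights, sparsity=0.9):
        # Zero out the smallest-magnitude weights until roughly `sparsity` of them are zero.
        flat = np.abs(weights).ravel()
        k = int(sparsity * flat.size)                  # number of weights to remove
        if k == 0:
            return weights.copy(), np.ones_like(weights, dtype=bool)
        threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
        mask = np.abs(weights) > threshold
        return weights * mask, mask

    # Toy usage: prune a random convolution-shaped weight tensor to ~90% sparsity.
    w = np.random.randn(64, 3, 3, 3)
    w_pruned, mask = magnitude_prune(w, sparsity=0.9)
    print(1.0 - mask.mean())                           # achieved sparsity, close to 0.9

In practice such pruning is usually followed by fine-tuning and is often combined with the sharing, factorization or distillation schemes discussed in the survey.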
A Survey of Modern Object Detection Literature using Deep Learning Object detection is the identification of an object in an image along with its localisation and classification. It has widespread applications and is a critical component of vision-based software systems. This paper seeks to perform a rigorous survey of modern object detection algorithms that use deep learning. As part of the survey, the topics explored include various algorithms, quality metrics, speed/size trade-offs and training methodologies. The paper focuses on two types of object detection algorithms: the SSD class of single-step detectors and the Faster R-CNN class of two-step detectors. Techniques to construct detectors that are portable and fast on low-powered devices are also addressed by exploring new lightweight convolutional base architectures. Ultimately, a rigorous review of the strengths and weaknesses of each detector leads us to the present state of the art.
A Survey of Monte Carlo Tree Search Methods Monte Carlo Tree Search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm´s derivation, impart some structure on the many variations and enhancements that have been proposed, and summarise the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
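The core loop the survey dissects (selection, expansion, simulation, backpropagation) is compact enough to sketch. The following is a minimal, single-player UCT sketch under simplifying assumptions: it presumes a hypothetical state object exposing legal_moves(), apply(move), is_terminal() and reward(), maximizes a single scalar reward, and uses an illustrative exploration constant; it is not the exact algorithm of any particular surveyed variant.

    import math, random

    class Node:
        def __init__(self, state, parent=None):
            self.state, self.parent = state, parent
            self.children, self.visits, self.value = [], 0, 0.0
            self.untried = state.legal_moves()

    def uct_search(root_state, iterations=1000, c=1.4):
        # Minimal UCT: selection, expansion, random rollout, backpropagation.
        root = Node(root_state)
        for _ in range(iterations):
            node = root
            # 1. Selection: descend while fully expanded and non-terminal.
            while not node.untried and node.children:
                node = max(node.children,
                           key=lambda n: n.value / n.visits
                           + c * math.sqrt(math.log(node.visits) / n.visits))
            # 2. Expansion: add one child for an untried move.
            if node.untried:
                move = node.untried.pop()
                child = Node(node.state.apply(move), parent=node)
                node.children.append(child)
                node = child
            # 3. Simulation: random rollout to a terminal state.
            state = node.state
            while not state.is_terminal():
                state = state.apply(random.choice(state.legal_moves()))
            reward = state.reward()
            # 4. Backpropagation: update statistics along the visited path.
            while node is not None:
                node.visits += 1
                node.value += reward
                node = node.parent
        return max(root.children, key=lambda n: n.visits).state

Two-player and enhancement-heavy variants covered by the survey differ mainly in how rewards are interpreted during backpropagation and in how children are selected and expanded.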
A Survey of Natural Language Generation Techniques with a Focus on Dialogue Systems – Past, Present and Future Directions One of the hardest problems in the area of Natural Language Processing and Artificial Intelligence is automatically generating language that is coherent and understandable to humans. Teaching machines how to converse as humans do falls under the broad umbrella of Natural Language Generation. Recent years have seen unprecedented growth in the number of research articles published on this subject in conferences and journals both by academic and industry researchers. There have also been several workshops organized alongside top-tier NLP conferences dedicated specifically to this problem. All this activity makes it hard to clearly define the state of the field and reason about its future directions. In this work, we provide an overview of this important and thriving area, covering traditional approaches, statistical approaches and also approaches that use deep neural networks. We provide a comprehensive review towards building open domain dialogue systems, an important application of natural language generation. We find that, predominantly, the approaches for building dialogue systems use seq2seq or language models architecture. Notably, we identify three important areas of further research towards building more effective dialogue systems: 1) incorporating larger context, including conversation context and world knowledge; 2) adding personae or personality in the NLG system; and 3) overcoming dull and generic responses that affect the quality of system-produced responses. We provide pointers on how to tackle these open problems through the use of cognitive architectures that mimic human language understanding and generation capabilities.
A Survey of Neural Network Techniques for Feature Extraction from Text This paper aims to catalyze the discussions about text feature extraction techniques using neural network architectures. The research questions discussed in the paper focus on the state-of-the-art neural network techniques that have proven to be useful tools for language processing, language generation, text classification and other computational linguistics tasks.
A Survey of Neuromorphic Computing and Neural Networks in Hardware Neuromorphic computing has come to refer to a variety of brain-inspired computers, devices, and models that contrast the pervasive von Neumann computer architecture. This biologically inspired approach has created highly connected synthetic neurons and synapses that can be used to model neuroscience theories as well as solve challenging machine learning problems. The promise of the technology is to create a brain-like ability to learn and adapt, but the technical challenges are significant, starting with an accurate neuroscience model of how the brain works, to finding materials and engineering breakthroughs to build devices to support these models, to creating a programming framework so the systems can learn, to creating applications with brain-like capabilities. In this work, we provide a comprehensive survey of the research and motivations for neuromorphic computing over its history. We begin with a 35-year review of the motivations and drivers of neuromorphic computing, then look at the major research areas of the field, which we define as neuro-inspired models, algorithms and learning approaches, hardware and devices, supporting systems, and finally applications. We conclude with a broad discussion on the major research topics that need to be addressed in the coming years to see the promise of neuromorphic computing fulfilled. The goals of this work are to provide an exhaustive review of the research conducted in neuromorphic computing since the inception of the term, and to motivate further work by illuminating gaps in the field where new research is needed.
A Survey of Online Failure Prediction Methods With ever-growing complexity and dynamicity of computer systems, proactive fault management is an effective approach to enhancing availability. Online failure prediction is the key to such techniques. In contrast to classical reliability methods, online failure prediction is based on runtime monitoring and a variety of models and methods that use the current state of a system and, frequently, the past experience as well. This survey describes these methods. To capture the wide spectrum of approaches concerning this area, a taxonomy has been developed, whose different approaches are explained and major concepts are described in detail.
A Survey of Optimization Methods from a Machine Learning Perspective Machine learning has developed rapidly, producing many theoretical breakthroughs and finding wide application in various fields. Optimization, as an important part of machine learning, has attracted much attention from researchers. With the exponential growth of data volumes and the increase in model complexity, optimization methods in machine learning face more and more challenges. A great deal of work on solving optimization problems or improving optimization methods in machine learning has been proposed. A systematic retrospective and summary of optimization methods from the perspective of machine learning is therefore of great significance, as it can offer guidance for the development of both optimization and machine learning research. In this paper, we first describe the optimization problems in machine learning. Then, we introduce the principles and progress of commonly used optimization methods. Next, we summarize the applications and developments of optimization methods in some popular machine learning fields. Finally, we discuss some challenges and open problems for optimization in machine learning.
A Survey of Parallel Sequential Pattern Mining With the growing popularity of resource sharing and shared resources, large volumes of complex data of different types are collected automatically. Traditional data mining algorithms generally face problems and challenges including huge memory cost, low processing speed, and inadequate hard disk space. Sequential pattern mining (SPM) is used in a wide variety of real-life applications; however, it is more complex and challenging than frequent itemset mining and also suffers from the above challenges when handling large-scale data. To solve these problems, mining sequential patterns in a parallel computing environment has emerged as an important issue with many applications. In this paper, we provide an in-depth survey of the current status of parallel sequential pattern mining (PSPM), including a detailed categorization of traditional serial SPM approaches and of state-of-the-art parallel SPM. We review the related work on PSPM in detail, including partition-based algorithms for PSPM, Apriori-based PSPM, pattern-growth-based PSPM, and hybrid algorithms for PSPM, and provide a deep description (i.e., characteristics, advantages, and disadvantages) of each parallel approach to PSPM. Some advanced topics for PSPM and the related open-source software are further reviewed in detail. Finally, we summarize some challenges and opportunities of PSPM in the big data era.
A Survey of Point-of-interest Recommendation in Location-based Social Networks Point-of-interest (POI) recommendation, which suggests new places for users to visit, arises with the popularity of location-based social networks (LBSNs). Due to the importance of POI recommendation in LBSNs, it has attracted much academic and industrial interest. In this paper, we offer a systematic review of this field, summarizing the contributions of individual efforts and exploring their relations. We discuss the new properties and challenges in POI recommendation compared with traditional recommendation problems, e.g., movie recommendation. Then, we present a comprehensive review in three aspects: influential factors for POI recommendation, methodologies employed for POI recommendation, and different tasks in POI recommendation. Specifically, we propose three taxonomies to classify POI recommendation systems. First, we categorize the systems by the influential factors of check-in characteristics, including geographical information, social relationships, temporal influence, and content indications. Second, we categorize the systems by methodology, including systems modeled by fused methods and by joint methods. Third, we categorize the systems as general POI recommendation or successive POI recommendation according to a subtle difference in the recommendation task: whether it is biased toward the most recent check-in. For each category, we summarize the contributions and system features, and highlight the representative work. Moreover, we discuss the available data sets and the popular metrics. Finally, we point out the possible future directions in this area and conclude this survey.
A Survey of Shortest-Path Algorithms A shortest-path algorithm finds a path containing the minimal cost between two vertices in a graph. A plethora of shortest-path algorithms is studied in the literature that span across multiple disciplines. This paper presents a survey of shortest-path algorithms based on a taxonomy that is introduced in the paper. One dimension of this taxonomy is the various flavors of the shortest-path problem. There is no one general algorithm that is capable of solving all variants of the shortest-path problem due to the space and time complexities associated with each algorithm. Other important dimensions of the taxonomy include whether the shortest-path algorithm operates over a static or a dynamic graph, whether the shortest-path algorithm produces exact or approximate answers, and whether the objective of the shortest-path algorithm is to achieve time-dependence or is to only be goal directed. This survey studies and classifies shortest-path algorithms according to the proposed taxonomy. The survey also presents the challenges and proposed solutions associated with each category in the taxonomy.
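As a concrete reference point for one cell of that taxonomy (static graph, exact answers, non-negative edge weights, single source), here is a short Dijkstra sketch; the adjacency-list format is an illustrative assumption.

    import heapq

    def dijkstra(graph, source):
        # Single-source shortest paths for non-negative edge weights.
        # `graph` maps each vertex to a list of (neighbor, weight) pairs.
        dist = {source: 0}
        heap = [(0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float('inf')):
                continue                      # stale heap entry, skip it
            for v, w in graph.get(u, []):
                nd = d + w
                if nd < dist.get(v, float('inf')):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return dist

    # Toy usage on a small directed graph.
    g = {'a': [('b', 1), ('c', 4)], 'b': [('c', 2), ('d', 5)], 'c': [('d', 1)], 'd': []}
    print(dijkstra(g, 'a'))   # {'a': 0, 'b': 1, 'c': 3, 'd': 4}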
A Survey of Tensor Methods Matrix decompositions have always been at the heart of signal, circuit and system theory. In particular, the Singular Value Decomposition (SVD) has been an important tool. There is currently a shift of paradigm in the algebraic foundations of these fields. Quite recently, Nonnegative Matrix Factorization (NMF) has been shown to outperform SVD at a number of tasks. Increasing research efforts are spent on the study and application of decompositions of higher-order tensors or multi-way arrays. This paper is a partial survey on tensor generalizations of the SVD and their applications. We also touch on Nonnegative Tensor Factorizations.
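A small sketch of the two matrix decompositions named above, using NumPy and scikit-learn; the data, rank and initialization are illustrative assumptions. It contrasts the truncated SVD, which gives the best rank-r approximation in the least-squares sense, with NMF, whose nonnegativity constraints trade some reconstruction error for interpretability.

    import numpy as np
    from sklearn.decomposition import NMF

    X = np.abs(np.random.randn(100, 20))          # nonnegative data matrix
    r = 5                                         # target rank

    # Truncated SVD: optimal rank-r approximation in Frobenius norm.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_svd = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

    # NMF: factors constrained to be nonnegative.
    model = NMF(n_components=r, init='nndsvda', max_iter=500)
    W = model.fit_transform(X)
    H = model.components_
    X_nmf = W @ H

    print(np.linalg.norm(X - X_svd), np.linalg.norm(X - X_nmf))

The tensor decompositions the survey covers generalize these ideas from two-way matrices to multi-way arrays.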
A Survey of the Recent Architectures of Deep Convolutional Neural Networks Deep Convolutional Neural Networks (CNNs) are a special type of Neural Networks, which have shown state-of-the-art results on various competitive benchmarks. The powerful learning ability of deep CNN is largely achieved with the use of multiple non-linear feature extraction stages that can automatically learn hierarchical representation from the data. Availability of a large amount of data and improvements in the hardware processing units have accelerated the research in CNNs and recently very interesting deep CNN architectures are reported. The recent race in deep CNN architectures for achieving high performance on the challenging benchmarks has shown that the innovative architectural ideas, as well as parameter optimization, can improve the CNN performance on various vision-related tasks. In this regard, different ideas in the CNN design have been explored such as use of different activation and loss functions, parameter optimization, regularization, and restructuring of processing units. However, the major improvement in representational capacity is achieved by the restructuring of the processing units. Especially, the idea of using a block as a structural unit instead of a layer is gaining substantial appreciation. This survey thus focuses on the intrinsic taxonomy present in the recently reported CNN architectures and consequently, classifies the recent innovations in CNN architectures into seven different categories. These seven categories are based on spatial exploitation, depth, multi-path, width, feature map exploitation, channel boosting and attention. Additionally, it covers the elementary understanding of the CNN components and sheds light on the current challenges and applications of CNNs.
A Survey of the Usages of Deep Learning in Natural Language Processing Over the last several years, the field of natural language processing has been propelled forward by an explosion in the use of deep learning models. This survey provides a brief introduction to the field and a quick overview of deep learning architectures and methods. It then sifts through the plethora of recent studies and summarizes a large assortment of relevant contributions. Analyzed research areas include several core linguistic processing issues in addition to a number of applications of computational linguistics. A discussion of the current state of the art is then provided along with recommendations for future research in the field.
A survey of transfer learning Machine learning and data mining techniques have been used in numerous real-world applications. An assumption of traditional machine learning methodologies is that the training data and testing data are taken from the same domain, such that the input feature space and data distribution characteristics are the same. However, in some real-world machine learning scenarios, this assumption does not hold. There are cases where training data is expensive or difficult to collect. Therefore, there is a need to create high-performance learners trained with more easily obtained data from different domains. This methodology is referred to as transfer learning. This survey paper formally defines transfer learning, presents information on current solutions, and reviews applications of transfer learning. Lastly, it lists software downloads for various transfer learning solutions and discusses possible future research work. The transfer learning solutions surveyed are independent of data size and can be applied to big data environments.
A Survey of Tuning Parameter Selection for High-dimensional Regression Penalized (or regularized) regression, as represented by Lasso and its variants, has become a standard technique for analyzing high-dimensional data when the number of variables substantially exceeds the sample size. The performance of penalized regression relies crucially on the choice of the tuning parameter, which determines the amount of regularization and hence the sparsity level of the fitted model. The optimal choice of tuning parameter depends on both the structure of the design matrix and the unknown random error distribution (variance, tail behavior, etc). This article reviews the current literature of tuning parameter selection for high-dimensional regression from both theoretical and practical perspectives. We discuss various strategies that choose the tuning parameter to achieve prediction accuracy or support recovery. We also review several recently proposed methods for tuning-free high-dimensional regression.
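As a minimal illustration of tuning parameter selection aimed at prediction accuracy, the sketch below runs cross-validation over a grid of regularization values for the Lasso using scikit-learn (which calls the tuning parameter alpha); the synthetic data and grid size are illustrative assumptions, and cross-validation is only one of the strategies reviewed above.

    import numpy as np
    from sklearn.linear_model import Lasso, LassoCV

    # Synthetic high-dimensional data: n = 100 samples, p = 500 features, 5 of them active.
    rng = np.random.default_rng(0)
    n, p = 100, 500
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:5] = 2.0
    y = X @ beta + rng.standard_normal(n)

    # Cross-validation over a grid of tuning parameter values.
    cv_model = LassoCV(cv=5, n_alphas=100).fit(X, y)
    print("selected tuning parameter:", cv_model.alpha_)

    # Refit at the selected value and inspect the sparsity of the solution.
    fit = Lasso(alpha=cv_model.alpha_).fit(X, y)
    print("nonzero coefficients:", np.count_nonzero(fit.coef_))

Note that a value chosen for prediction accuracy is generally not the same as one chosen for support recovery, which is exactly the distinction the survey draws.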
A Survey of Utility-Oriented Pattern Mining The main purpose of data mining and analytics is to find novel, potentially useful patterns that can be utilized in real-world applications to derive beneficial knowledge. For identifying and evaluating the usefulness of different kinds of patterns, many techniques/constraints have been proposed, such as support, confidence, sequence order, and utility parameters (e.g., weight, price, profit, quantity, etc.). In recent years, there has been an increasing demand for utility-oriented pattern mining (UPM). UPM is a vital task, with numerous high-impact applications, including cross-marketing, e-commerce, finance, medical, and biomedical applications. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods of UPM. First, we introduce an in-depth understanding of UPM, including concepts, examples, and comparisons with related concepts. A taxonomy of the most common and state-of-the-art approaches for mining different kinds of high-utility patterns is presented, including Apriori-based, tree-based, projection-based, vertical-/horizontal-data-format-based, and other hybrid approaches. A comprehensive review of advanced topics of existing high-utility pattern mining techniques is offered, with a discussion of their pros and cons. Finally, we present several well-known open-source software packages for UPM. We conclude our survey with a discussion on open and practical challenges in this field.
A Survey of Visual Analysis of Human Motion and Its Applications This paper summarizes recent progress in human motion analysis and its applications. We begin by reviewing motion capture systems and representation models of human motion data. Next, we sketch advanced human motion data processing technologies, including motion data filtering, temporal alignment, and segmentation. The following parts overview state-of-the-art approaches to action recognition and dynamics measuring, since these two are the most active research areas in human motion analysis. The last part discusses some emerging applications of human motion analysis in healthcare, human-robot interaction, security surveillance, virtual reality and animation. Promising future research topics in human motion analysis are also summarized in the last part.
A Survey on Acceleration of Deep Convolutional Neural Networks Deep Neural Networks have achieved remarkable progress during the past few years and are currently the fundamental tools of many intelligent systems. At the same time, the computational complexity and resource consumption of these networks are continuously increasing. This poses a significant challenge to the deployment of such networks, especially for real-time applications or on resource-limited devices. Thus, network acceleration has become a hot topic within the deep learning community. As for hardware implementations of deep neural networks, a number of FPGA/ASIC-based accelerators have been proposed in recent years. In this paper, we provide a comprehensive survey of the recent advances in network acceleration, compression and accelerator design from both the algorithm and hardware sides. Specifically, we provide a thorough analysis of each of the following topics: network pruning, low-rank approximation, network quantization, teacher-student networks, compact network design and hardware accelerators. Finally, we offer a discussion and introduce a few possible future directions.
A Survey on Active Learning and Human-in-the-Loop Deep Learning for Medical Image Analysis Fully automatic deep learning has become the state-of-the-art technique for many tasks including image acquisition, analysis and interpretation, and for the extraction of clinically useful information for computer-aided detection, diagnosis, treatment planning, intervention and therapy. However, the unique challenges posed by medical image analysis suggest that retaining a human end-user in any deep learning enabled system will be beneficial. In this review we investigate the role that humans might play in the development and deployment of deep learning enabled diagnostic applications and focus on techniques that will retain a significant input from a human end user. Human-in-the-Loop computing is an area that we see as increasingly important in future research due to the safety-critical nature of working in the medical domain. We evaluate four key areas that we consider vital for deep learning in the clinical practice: (1) Active Learning – to choose the best data to annotate for optimal model performance; (2) Interpretation and Refinement – using iterative feedback to steer models to optima for a given prediction and offering meaningful ways to interpret and respond to predictions; (3) Practical considerations – developing full scale applications and the key considerations that need to be made before deployment; (4) Related Areas – research fields that will benefit human-in-the-loop computing as they evolve. We offer our opinions on the most promising directions of research and how various aspects of each area might be unified towards common goals.
A survey on Adversarial Attacks and Defenses in Text Deep neural networks (DNNs) have shown an inherent vulnerability to adversarial examples, which are maliciously crafted from real examples by attackers aiming to make target DNNs misbehave. The threat of adversarial examples exists widely in image, voice, speech, and text recognition and classification. Inspired by previous work, research on adversarial attacks and defenses in the text domain has developed rapidly. To the best of our knowledge, this article presents a comprehensive review of adversarial examples in text. We analyze the advantages and shortcomings of recent adversarial example generation methods and elaborate on the efficiency and limitations of countermeasures. Finally, we discuss the challenges in adversarial texts and provide research directions in this area.
A Survey on Adversarial Information Retrieval on the Web This survey paper discusses different forms of malicious techniques that can affect how an information retrieval model retrieves documents for a query and their remedies.
A Survey on Artificial Intelligence and Data Mining for MOOCs Massive Open Online Courses (MOOCs) have gained tremendous popularity in the last few years. Thanks to MOOCs, millions of learners from all over the world have taken thousands of high-quality courses for free. Putting together an excellent MOOC ecosystem is a multidisciplinary endeavour that requires contributions from many different fields. Artificial intelligence (AI) and data mining (DM) are two such fields that have played a significant role in making MOOCs what they are today. By exploiting the vast amount of data generated by learners engaging in MOOCs, DM improves our understanding of the MOOC ecosystem and enables MOOC practitioners to deliver better courses. Similarly, AI, supported by DM, can greatly improve student experience and learning outcomes. In this survey paper, we first review the state-of-the-art artificial intelligence and data mining research applied to MOOCs, emphasising the use of AI and DM tools and techniques to improve student engagement, learning outcomes, and our understanding of the MOOC ecosystem. We then offer an overview of key trends and important research to carry out in the fields of AI and DM so that MOOCs can reach their full potential.
A Survey on Bias and Fairness in Machine Learning With the widespread use of AI systems and applications in our everyday lives, it is important to take fairness issues into consideration while designing and engineering these types of systems. Such systems can be used in many sensitive environments to make important and life-changing decisions; thus, it is crucial to ensure that the decisions do not reflect discriminatory behavior toward certain groups or populations. We have recently seen work in machine learning, natural language processing, and deep learning that addresses such challenges in different subdomains. With the commercialization of these systems, researchers are becoming aware of the biases that these applications can contain and have attempted to address them. In this survey we investigated different real-world applications that have shown biases in various ways, and we listed different sources of biases that can affect AI applications. We then created a taxonomy for fairness definitions that machine learning researchers have defined in order to avoid the existing bias in AI systems. In addition to that, we examined different domains and subdomains in AI showing what researchers have observed with regard to unfair outcomes in the state-of-the-art methods and how they have tried to address them. There are still many future directions and solutions that can be taken to mitigate the problem of bias in AI systems. We are hoping that this survey will motivate researchers to tackle these issues in the near future by observing existing work in their respective fields.
A survey on Big Data and Machine Learning for Chemistry Herein we review aspects of leading-edge research and innovation in chemistry which exploits big data and machine learning (ML), two computer science fields that combine to yield machine intelligence. ML can accelerate the solution of intricate chemical problems and even solve problems that otherwise would not be tractable. But the potential benefits of ML come at the cost of big data production; that is, the algorithms, in order to learn, demand large volumes of data of various natures and from different sources, from materials properties to sensor data. In the survey, we propose a roadmap for future developments, with emphasis on materials discovery and chemical sensing, and within the context of the Internet of Things (IoT), both prominent research fields for ML in the context of big data. In addition to providing an overview of recent advances, we elaborate upon the conceptual and practical limitations of big data and ML applied to chemistry, outlining processes, discussing pitfalls, and reviewing cases of success and failure.
A Survey on Compressive Sensing: Classical Results and Recent Advancements Recovering sparse signals from linear measurements has demonstrated outstanding utility in a vast variety of real-world applications. Compressive sensing is the field that studies the questions raised about when such a recovery is possible. This topic is well-nourished and numerous results are available in the literature. However, their dispersion across the literature makes it challenging and time-consuming for new readers and practitioners to quickly grasp the main ideas and classical algorithms, and to further touch upon the recent advancements in this surging field. Besides, the sparsity notion has already demonstrated its effectiveness in many contemporary fields. Thus, these results are useful and inspiring for further investigation of related questions in these emerging fields from new perspectives. In this survey, we gather and overview vital classical tools and algorithms in compressive sensing and describe significant recent advancements. We conclude the survey with a numerical comparison of the performance of the described approaches on an interesting application.
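One of the classical recovery algorithms this literature builds on is Orthogonal Matching Pursuit; the sketch below is a bare-bones version with an illustrative random-measurement example, not an implementation from any particular surveyed paper.

    import numpy as np

    def omp(A, y, k):
        # Orthogonal Matching Pursuit: greedily recover a k-sparse x from y = A x.
        residual = y.copy()
        support = []
        for _ in range(k):
            j = int(np.argmax(np.abs(A.T @ residual)))   # column most correlated with residual
            if j not in support:
                support.append(j)
            x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
            residual = y - A[:, support] @ x_s
        x = np.zeros(A.shape[1])
        x[support] = x_s
        return x

    # Toy usage: 3-sparse signal of length 100 observed through 40 random measurements.
    rng = np.random.default_rng(1)
    A = rng.standard_normal((40, 100)) / np.sqrt(40)
    x_true = np.zeros(100)
    x_true[[7, 42, 90]] = [1.5, -2.0, 0.8]
    x_hat = omp(A, A @ x_true, k=3)
    print(np.allclose(x_hat, x_true, atol=1e-6))   # exact recovery expected with high probability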
A Survey on Contextual Multi-armed Bandits The nature of contextual bandits makes them suitable for many machine learning applications, such as user modeling, Internet advertising, search engines, and experiment optimization. In this survey we cover three different types of contextual bandit algorithms, and for each type we introduce several representative algorithms. We also compare the regret bounds and assumptions of these algorithms.
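As one representative of the linear-payoff family of contextual bandit algorithms, here is a compact LinUCB-style sketch; the class interface, exploration parameter and toy environment are illustrative assumptions.

    import numpy as np

    class LinUCB:
        # Disjoint-model LinUCB: one linear payoff model per arm plus an upper-confidence bonus.
        def __init__(self, n_arms, dim, alpha=1.0):
            self.alpha = alpha
            self.A = [np.eye(dim) for _ in range(n_arms)]      # per-arm ridge Gram matrix
            self.b = [np.zeros(dim) for _ in range(n_arms)]    # per-arm reward-weighted contexts

        def select(self, context):
            scores = []
            for A, b in zip(self.A, self.b):
                A_inv = np.linalg.inv(A)
                theta = A_inv @ b
                bonus = self.alpha * np.sqrt(context @ A_inv @ context)
                scores.append(context @ theta + bonus)
            return int(np.argmax(scores))

        def update(self, arm, context, reward):
            self.A[arm] += np.outer(context, context)
            self.b[arm] += reward * context

    # Toy usage: 3 arms, 5-dimensional contexts, rewards linear in the context plus noise.
    rng = np.random.default_rng(0)
    true_theta = rng.standard_normal((3, 5))
    bandit = LinUCB(n_arms=3, dim=5)
    for _ in range(500):
        x = rng.standard_normal(5)
        a = bandit.select(x)
        r = true_theta[a] @ x + 0.1 * rng.standard_normal()
        bandit.update(a, x, r)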
A Survey on Data Collection for Machine Learning: a Big Data – AI Integration Perspective Data collection is a major bottleneck in machine learning and an active research topic in multiple communities. There are largely two reasons data collection has recently become a critical issue. First, as machine learning is becoming more widely-used, we are seeing new applications that do not necessarily have enough labeled data. Second, unlike traditional machine learning where feature engineering is the bottleneck, deep learning techniques automatically generate features, but instead require large amounts of labeled data. Interestingly, recent research in data collection comes not only from the machine learning, natural language, and computer vision communities, but also from the data management community due to the importance of handling large amounts of data. In this survey, we perform a comprehensive study of data collection from a data management point of view. Data collection largely consists of data acquisition, data labeling, and improvement of existing data or models. We provide a research landscape of these operations, provide guidelines on which technique to use when, and identify interesting research challenges. The integration of machine learning and data management for data collection is part of a larger trend of Big data and Artificial Intelligence (AI) integration and opens many opportunities for new research.
A Survey on Deep Learning for Named Entity Recognition Named entity recognition (NER) is the task of identifying text spans that mention named entities and classifying them into predefined categories such as person, location, and organization. NER serves as the basis for a variety of natural language applications such as question answering, text summarization, and machine translation. Although early NER systems were successful in producing decent recognition accuracy, they often required much human effort in carefully designing rules or features. In recent years, deep learning, empowered by continuous real-valued vector representations and semantic composition through nonlinear processing, has been employed in NER systems, yielding state-of-the-art performance. In this paper, we provide a comprehensive review of existing deep learning techniques for NER. We first introduce NER resources, including tagged NER corpora and off-the-shelf NER tools. Then, we systematically categorize existing works based on a taxonomy along three axes: distributed representations for input, context encoder, and tag decoder. Next, we survey the most representative methods applying deep learning in new NER problem settings and applications. Finally, we present readers with the challenges faced by NER systems and outline future directions in this area.
A Survey on Deep Learning Methods for Robot Vision Deep learning has allowed a paradigm shift in pattern recognition, from using hand-crafted features together with statistical classifiers to using general-purpose learning procedures for learning data-driven representations, features, and classifiers together. The application of this new paradigm has been particularly successful in computer vision, in which the development of deep learning methods for vision applications has become a hot research topic. Given that deep learning has already attracted the attention of the robot vision community, the main purpose of this survey is to address the use of deep learning in robot vision. To achieve this, a comprehensive overview of deep learning and its usage in computer vision is given, that includes a description of the most frequently used neural models and their main application areas. Then, the standard methodology and tools used for designing deep-learning based vision systems are presented. Afterwards, a review of the principal work using deep learning in robot vision is presented, as well as current and future trends related to the use of deep learning in robotics. This survey is intended to be a guide for the developers of robot vision systems.
A Survey on Deep Learning Toolkits and Libraries for Intelligent User Interfaces This paper provides an overview of prominent deep learning toolkits and, in particular, reports on recent publications that contributed open source software for implementing tasks that are common in intelligent user interfaces (IUI). We provide a scientific reference for researchers and software engineers who plan to utilise deep learning techniques within their IUI research and development projects.
A Survey on Deep Transfer Learning As a new classification platform, deep learning has recently received increasing attention from researchers and has been successfully applied to many domains. In some domains, like bioinformatics and robotics, it is very difficult to construct a large-scale, well-annotated dataset due to the expense of data acquisition and costly annotation, which limits its development. Transfer learning relaxes the hypothesis that the training data must be independent and identically distributed (i.i.d.) with the test data, which motivates the use of transfer learning to solve the problem of insufficient training data. This survey focuses on reviewing current research on transfer learning using deep neural networks and its applications. We define deep transfer learning, categorize it, and review recent research works based on the techniques used in deep transfer learning.
A Survey on Dialogue Systems: Recent Advances and New Frontiers Dialogue systems have attracted more and more attention. Recent advances in dialogue systems are overwhelmingly contributed by deep learning techniques, which have been employed to enhance a wide range of big data applications such as computer vision, natural language processing, and recommender systems. For dialogue systems, deep learning can leverage a massive amount of data to learn meaningful feature representations and response generation strategies, while requiring a minimum amount of hand-crafting. In this article, we give an overview of these recent advances in dialogue systems from various perspectives and discuss some possible research directions. In particular, we generally divide existing dialogue systems into task-oriented and non-task-oriented models, then detail how deep learning techniques help them with representative algorithms, and finally discuss some appealing research directions that can bring dialogue system research into a new frontier.
A Survey on Domain-Specific Languages for Machine Learning in Big Data The amount of data generated in the modern society is increasing rapidly. New problems and novel approaches of data capture, storage, analysis and visualization are responsible for the emergence of the Big Data research field. Machine Learning algorithms can be used in Big Data to make better and more accurate inferences. However, because of the challenges Big Data imposes, these algorithms need to be adapted and optimized to specific applications. One important decision made by software engineers is the choice of the language that is used in the implementation of these algorithms. Therefore, this literature survey identifies and describes domain-specific languages and frameworks used for Machine Learning in Big Data. By doing this, software engineers can then make more informed choices and beginners have an overview of the main languages used in this domain.
A Survey on Expert Recommendation in Community Question Answering Community question answering (CQA) represents the type of Web application where people can exchange knowledge via asking and answering questions. One significant challenge of most real-world CQA systems is the lack of effective matching between questions and potential good answerers, which adversely affects efficient knowledge acquisition and circulation. On the one hand, a requester might receive many low-quality answers without getting a quality response in a brief time; on the other hand, an answerer might face numerous new questions without being able to identify the questions of interest quickly. Under this situation, expert recommendation emerges as a promising technique to address these issues. Instead of passively waiting for users to browse and find their questions of interest, an expert recommendation method actively and promptly raises the attention of users to the appropriate questions. The past few years have witnessed considerable efforts that address the expert recommendation problem from different perspectives. These methods all have issues that need to be resolved before the advantages of expert recommendation can be fully embraced. In this survey, we first present an overview of the research efforts and state-of-the-art techniques for expert recommendation in CQA. We next summarize and compare the existing methods concerning their advantages and shortcomings, followed by a discussion of the open issues and future research directions.
A Survey on Geographically Distributed Big-Data Processing using MapReduce Hadoop and Spark are widely used distributed processing frameworks for large-scale data processing in an efficient and fault-tolerant manner on private or public clouds. These big-data processing systems are extensively used by many industries, e.g., Google, Facebook, and Amazon, for solving a large class of problems, e.g., search, clustering, log analysis, different types of join operations, matrix multiplication, pattern matching, and social network analysis. However, all these popular systems have a major drawback in terms of locally distributed computations, which prevents them from implementing geographically distributed data processing. The increasing amount of geographically distributed massive data is pushing industry and academia to rethink current big-data processing systems. Novel frameworks, which go beyond the state-of-the-art architectures and technologies involved in current systems, are expected to process geographically distributed data at its location without moving entire raw datasets to a single location. In this paper, we investigate and discuss the challenges and requirements in designing geographically distributed data processing frameworks and protocols. We classify and study batch processing (MapReduce-based systems), stream processing (Spark-based systems), and SQL-style processing geo-distributed frameworks, models, and algorithms, along with their overhead issues.
A Survey on Graph Kernels Graph kernels have become an established and widely-used technique for solving classification tasks on graphs. This survey gives a comprehensive overview of techniques for kernel-based graph classification developed in the past 15 years. We describe and categorize graph kernels based on properties inherent to their design, such as the nature of their extracted graph features, their method of computation and their applicability to problems in practice. In an extensive experimental evaluation, we study the classification accuracy of a large suite of graph kernels on established benchmarks as well as new datasets. We compare the performance of popular kernels with several baseline methods and study the effect of applying a Gaussian RBF kernel to the metric induced by a graph kernel. In doing so, we find that simple baselines become competitive after this transformation on some datasets. Moreover, we study the extent to which existing graph kernels agree in their predictions (and prediction errors) and obtain a data-driven categorization of kernels as result. Finally, based on our experimental results, we derive a practitioner’s guide to kernel-based graph classification.
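The transformation studied in that experiment is simple to state: a positive semidefinite kernel matrix K induces a (pseudo)metric via d(i, j)^2 = K[i,i] + K[j,j] - 2 K[i,j], and a Gaussian RBF is then applied to that distance. A minimal sketch, with an assumed precomputed graph-kernel matrix and an illustrative bandwidth:

    import numpy as np

    def rbf_on_kernel_metric(K, sigma=1.0):
        # Apply a Gaussian RBF to the metric induced by a positive semidefinite kernel matrix K.
        diag = np.diag(K)
        d2 = diag[:, None] + diag[None, :] - 2.0 * K   # squared kernel-induced distances
        d2 = np.maximum(d2, 0.0)                       # guard against tiny negative values
        return np.exp(-d2 / (2.0 * sigma ** 2))

    # Toy usage with a hypothetical precomputed graph-kernel matrix for three graphs.
    K = np.array([[4.0, 2.0, 1.0],
                  [2.0, 3.0, 0.5],
                  [1.0, 0.5, 5.0]])
    print(rbf_on_kernel_metric(K, sigma=2.0))

The resulting matrix can be passed to a kernel classifier in place of the raw graph kernel, which is the setting in which the survey observes simple baselines becoming competitive.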
A Survey on Influence Maximization in a Social Network Given a social network with diffusion probabilities as edge weights and an integer k, which k nodes should be chosen for initial injection of information to maximize influence in the network? This problem is known as Target Set Selection in a social network (TSS Problem) and, more popularly, the Social Influence Maximization Problem (SIM Problem). It has been an active area of research in the computational social network analysis domain for roughly a decade and a half. Due to its practical importance in various domains, such as viral marketing, targeted advertisement, and personalized recommendation, the problem has been studied in different variants, and different solution methodologies have been proposed over the years. Hence, there is a need for an organized and comprehensive review of this topic. This paper presents a survey of the progress in and around the TSS Problem. Finally, it discusses current research trends and future research directions.
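To make that problem statement concrete, here is a bare-bones Monte Carlo greedy sketch under the Independent Cascade diffusion model, one standard baseline in this literature; the graph format, trial count and simulation budget are illustrative assumptions, and practical systems use far more efficient spread estimators.

    import random

    def simulate_ic(graph, seeds, trials=200):
        # Estimate expected spread under the Independent Cascade model.
        # `graph` maps node -> list of (neighbor, activation_probability).
        total = 0
        for _ in range(trials):
            active, frontier = set(seeds), list(seeds)
            while frontier:
                nxt = []
                for u in frontier:
                    for v, p in graph.get(u, []):
                        if v not in active and random.random() < p:
                            active.add(v)
                            nxt.append(v)
                frontier = nxt
            total += len(active)
        return total / trials

    def greedy_im(graph, k, trials=200):
        # Greedy seed selection: repeatedly add the node with the largest estimated marginal gain.
        seeds = []
        for _ in range(k):
            base = simulate_ic(graph, seeds, trials)
            best, best_gain = None, -1.0
            for v in graph:
                if v in seeds:
                    continue
                gain = simulate_ic(graph, seeds + [v], trials) - base
                if gain > best_gain:
                    best, best_gain = v, gain
            seeds.append(best)
        return seeds

Because the expected spread under this model is monotone and submodular, the greedy strategy is known to achieve a (1 - 1/e) approximation, which is why it serves as the usual reference point in the surveyed work.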
A Survey on Load Balancing Algorithms for VM Placement in Cloud Computing The emergence of cloud computing based on virtualization technologies brings huge opportunities to host virtual resources at low cost without the need to own any infrastructure. Virtualization technologies enable users to acquire and configure resources and be charged on a pay-per-use basis. However, cloud data centers mostly comprise heterogeneous commodity servers hosting multiple virtual machines (VMs) with potentially varied specifications and fluctuating resource usages, which may cause imbalanced resource utilization within servers and lead to performance degradation and service level agreement (SLA) violations. To achieve efficient scheduling, these challenges should be addressed and solved by using load balancing strategies, which has been proven to be an NP-hard problem. From multiple perspectives, this work identifies the challenges and analyzes existing algorithms for allocating VMs to PMs in infrastructure clouds, with a particular focus on load balancing. A detailed classification targeting load balancing algorithms for VM placement in cloud data centers is presented, and the surveyed algorithms are classified accordingly. The goal of this paper is to provide a comprehensive and comparative understanding of the existing literature and to aid researchers by providing insight into potential future enhancements.
A Survey on Monochromatic Connections of Graphs The concept of monochromatic connection of graphs was introduced by Caro and Yuster in 2011. Recently, a lot of results have been published about it. In this survey, we attempt to bring together all the results that dealt with it. We begin with an introduction, and then classify the results into the following categories: monochromatic connection coloring of edge-version, monochromatic connection coloring of vertex-version, monochromatic index, monochromatic connection coloring of total-version.
A Survey on Multi-output Learning Multi-output learning aims to simultaneously predict multiple outputs given an input. It is an important learning problem due to the pressing need for sophisticated decision making in real-world applications. Echoing big data, the 4V characteristics of multiple outputs impose a set of challenges on multi-output learning, in terms of the volume, velocity, variety and veracity of the outputs. An increasing number of works in the literature have been devoted to the study of multi-output learning and the development of novel approaches for addressing the challenges encountered. However, the literature lacks a comprehensive overview of the different types of challenges posed by the characteristics of the multiple outputs and of the techniques proposed to overcome them. This paper thus attempts to fill this gap with a comprehensive review of the area. We first introduce the different stages of the life cycle of the output labels. Then we present the multi-output learning paradigm, including its myriad output structures, definitions of its different sub-problems, model evaluation metrics and popular data repositories used in its study. Subsequently, we review a number of state-of-the-art multi-output learning methods, which are categorized based on the challenges they address.
A Survey on Multi-View Clustering With the fast development of information technology, especially the popularization of the internet, multi-view learning has become more and more popular in the machine learning and data mining fields. Multi-view semi-supervised learning, such as co-training and co-regularization, has gained considerable attention. Although multi-view clustering (MVC) has developed rapidly in recent years, there is no survey or review that summarizes and analyzes the current progress. Therefore, this paper sums up the common strategies for combining multiple views and, based on them, proposes a novel taxonomy of MVC approaches. We also discuss the relationships between MVC and multi-view representation, ensemble clustering, multi-task clustering, multi-view supervised learning and multi-view semi-supervised learning. Several representative real-world applications are elaborated. To promote the further development of MVC, we point out several open problems worth exploring in the future.
A Survey on Natural Language Processing for Fake News Detection Fake news detection is a critical yet challenging problem in Natural Language Processing (NLP). The rapid rise of social networking platforms has not only yielded a vast increase in information accessibility but has also accelerated the spread of fake news. Given the massive amount of Web content, automatic fake news detection is a practical NLP problem required by all online content providers. This paper presents a survey on fake news detection. Our survey introduces the challenges of automatic fake news detection. We systematically review the datasets and NLP solutions that have been developed for this task. We also discuss the limits of these datasets and problem formulations, our insights, and recommended solutions.
A Survey on Neural Architecture Search The growing interest in both the automation of machine learning and deep learning has inevitably led to the development of automated methods for neural architecture optimization. The choice of the network architecture has proven to be critical, and many advances in deep learning spring from its immediate improvements. However, deep learning techniques are computationally intensive and their application requires a high level of domain knowledge. Therefore, even partial automation of this process would help make deep learning more accessible to both researchers and practitioners. With this survey, we provide a formalism which unifies and categorizes the landscape of existing methods along with a detailed analysis that compares and contrasts the different approaches. We achieve this via a discussion of common architecture search spaces and architecture optimization algorithms based on principles of reinforcement learning and evolutionary algorithms along with approaches that incorporate surrogate and one-shot models. Additionally, we address the new research directions which include constrained and multi-objective architecture search as well as automated data augmentation, optimizer and activation function search.
A Survey on Neural Network Language Models As the core component of Natural Language Processing (NLP) system, Language Model (LM) can provide word representation and probability indication of word sequences. Neural Network Language Models (NNLMs) overcome the curse of dimensionality and improve the performance of traditional LMs. A survey on NNLMs is performed in this paper. The structure of classic NNLMs is described firstly, and then some major improvements are introduced and analyzed. We summarize and compare corpora and toolkits of NNLMs. Further, some research directions of NNLMs are discussed.
A Survey on Recent Advances in Named Entity Recognition from Deep Learning models Named Entity Recognition (NER) is a key component in NLP systems for question answering, information retrieval, relation extraction, etc. NER systems have been studied and developed widely for decades, but accurate systems using deep neural networks (NN) have only been introduced in the last few years. We present a comprehensive survey of deep neural network architectures for NER, and contrast them with previous approaches to NER based on feature engineering and other supervised or semi-supervised learning algorithms. Our results highlight the improvements achieved by neural networks, and show how incorporating some of the lessons learned from past work on feature-based NER systems can yield further improvements.
A Survey on Resilient Machine Learning Machine learning based systems are increasingly being used for sensitive tasks such as security surveillance, guiding autonomous vehicles, making investment decisions, and detecting and blocking network intrusions and malware. However, recent research has shown that machine learning models are vulnerable to attacks by adversaries at all phases of machine learning (e.g., training data collection, training, operation). All model classes of machine learning systems can be misled by carefully crafted inputs, making them classify inputs incorrectly. Maliciously created input samples can affect the learning process of an ML system by slowing down learning, degrading the performance of the learned model, or causing the system to make errors only in the attacker's planned scenario. Because of these developments, understanding the security of machine learning algorithms and systems is emerging as an important research area among computer security and machine learning researchers and practitioners. We present a survey of this emerging area in machine learning.
A Survey on Semantic Parsing A significant amount of information in today’s world is stored in structured and semi-structured knowledge bases. Efficient and simple methods to query these databases are essential and must not be restricted to only those who have expertise in formal query languages. The field of semantic parsing deals with converting natural language utterances to logical forms that can be easily executed on a knowledge base. In this survey, we examine the various components of a semantic parsing system and discuss prominent work ranging from the initial rule based methods to the current neural approaches to program synthesis. We also discuss methods that operate using varying levels of supervision and highlight the key challenges involved in the learning of such systems.
A Survey on Sentiment and Emotion Analysis for Computational Literary Studies Emotions have often been a crucial part of compelling narratives: literature tells about people with goals, desires, passions, and intentions. In the past, classical literary studies usually scrutinized the affective dimension of literature within the framework of hermeneutics. However, with the emergence of the research field known as Digital Humanities (DH), some studies of emotions in a literary context have taken a computational turn. Given that DH is still being formed as a science, this direction of research can be considered relatively new. At the same time, research in sentiment analysis started in computational linguistics almost two decades ago and is nowadays an established field with dedicated workshops and tracks in the main computational linguistics conferences. This leads us to the question: what are the commonalities and discrepancies between sentiment analysis research in computational linguistics and in the digital humanities? In this survey, we offer an overview of the existing body of research on sentiment and emotion analysis as applied to literature. We precede the main part of the survey with a short introduction to natural language processing, machine learning, and psychological models of emotions, and provide an overview of existing approaches to sentiment and emotion analysis in computational linguistics. The papers presented in this survey come either from DH or from computational linguistics venues and are limited to sentiment and emotion analysis as applied to literary text.
A Survey on Session-based Recommender Systems Session-based recommender systems (SBRS) are an emerging topic in the recommendation domain and have attracted much attention from both academia and industry in recent years. Most existing works only model the general item-level dependency for recommendation tasks. However, there are many other challenges at different levels, e.g., the item feature level and the session level, and from various perspectives, e.g., item heterogeneity and intra- and inter-item feature coupling relations, associated with SBRS. In this paper, we provide a systematic and comprehensive review of SBRS and create a hierarchical and in-depth understanding of the variety of challenges in SBRS. To be specific, we first illustrate the value and significance of SBRS, followed by a hierarchical framework to categorize the related research issues and methods of SBRS and to reveal its intrinsic challenges and complexities. Further, a summary together with a detailed introduction of the research progress is provided. Lastly, we share some prospects in this research area.
A Survey on Social Media Anomaly Detection Social media anomaly detection is of critical importance to prevent malicious activities such as bullying, terrorist attack planning, and fraud information dissemination. With the recent popularity of social media, new types of anomalous behaviors arise, causing concerns from various parties. While a large amount of work has been dedicated to traditional anomaly detection problems, we observe a surge of research interest in the new realm of social media anomaly detection. In this paper, we present a survey of existing approaches to address this problem. We focus on the new types of anomalous phenomena in social media and review recently developed techniques for detecting these special types of anomalies. We provide a general overview of the problem domain, common formulations, existing methodologies and potential directions. With this work, we hope to draw the attention of the research community to this challenging problem and open up new directions to which we can contribute in the future.
A survey on trajectory clustering analysis This paper comprehensively surveys the development of trajectory clustering. Considering the critical role of trajectory data mining in modern intelligent systems for surveillance security, abnormal behavior detection, crowd behavior analysis, and traffic control, trajectory clustering has attracted growing attention. Existing trajectory clustering methods can be grouped into three categories: unsupervised, supervised and semi-supervised algorithms. In spite of achieving a certain level of development, trajectory clustering is limited in its success by complex conditions such as application scenarios and data dimensions. This paper provides a holistic understanding and deep insight into trajectory clustering, and presents a comprehensive analysis of representative methods and promising future directions.
A Survey on Trust Modeling from a Bayesian Perspective This paper is concerned with trust modeling for networked computing systems. Of particular interest to this paper is the observation that trust is a subjective notion that is invisible, implicit, and uncertain in nature, and therefore may be suitably expressed by subjective probabilities and then modeled on the basis of the Bayesian principle. In spite of a few attempts to model trust in the Bayesian paradigm, the field lacks a global, comprehensive overview of Bayesian methods and their theoretical connections to other alternatives. This paper presents a study to fill this gap. It provides a comprehensive review and analysis of the literature, showing that a great deal of existing work, whether or not proposed on the basis of the Bayesian principle, can be cast into a general Bayesian paradigm termed subjective Bayesian trust (SBT) theory here. The SBT framework can thus act as a general theoretical infrastructure for comparing or analyzing theoretical ties among existing trust models, and for developing novel models. The aim of this study is twofold. One is to gain insights about the Bayesian philosophy in modeling trust. The other is to drive current research a step ahead in seeking a high-level, abstract way of modeling and evaluating trust.
A Survey on Visual Query Systems in the Web Era (extended version) As more and more collections of data become available on the web to everyone, non-expert users demand easy ways to retrieve data from these collections. One solution is so-called Visual Query Systems (VQS), where queries are represented visually and users do not have to understand query languages such as SQL or XQuery. In 1996, a paper by Catarci reviewed the Visual Query Systems available until that year. In this paper, we review VQSs from 1997 until now and try to determine whether they have been the solution for non-expert users. The short answer is no, because very few systems have in fact been used in real environments or as commercial tools. We have also gathered basic features of VQSs, such as the visual representation adopted to present the reality of interest and the visual representation adopted to express queries.
A Survey: Non-Orthogonal Multiple Access with Compressed Sensing Multiuser Detection for mMTC One objective of the 5G communication system and beyond is to support massive machine-type communication (mMTC) to propel the fast growth of diverse Internet of Things use cases. mMTC aims to provide connectivity to tens of billions of sensor nodes. The dramatic increase in sensor devices and massive connectivity imposes critical challenges on the network in handling the enormous control signaling overhead with limited radio resources. Non-Orthogonal Multiple Access (NOMA) is a new paradigm shift in the design of multiple-user detection and multiple access. NOMA with compressive sensing based multiuser detection is one of the promising candidates to address the challenges of mMTC. This survey article aims at providing an overview of the current state-of-the-art research on the various compressive sensing based techniques that enable NOMA. We present the characteristics of different algorithms and compare their pros and cons, thereby providing useful insights for researchers to make further contributions to NOMA using compressive sensing techniques.
A Survey: Time Travel in Deep Learning Space: An Introduction to Deep Learning Models and How Deep Learning Models Evolved from the Initial Ideas This report shows how deep learning evolved. It traces back as far as the initial belief in connectionist modelling of the brain, and then looks at its early-stage realization: neural networks. With the background of neural networks, we gradually introduce how convolutional neural networks, as a representative of deep discriminative models, were developed from neural networks, together with many practical techniques that help in the optimization of neural networks. On the other hand, we also trace the evolution of deep generative models, to see how researchers balance representational power and computational complexity to reach the Restricted Boltzmann Machine and eventually Deep Belief Nets. Further, we also look into the development of modelling time-series data with neural networks. We start with Time Delay Neural Networks and move on to the currently famous model named Recurrent Neural Network and its extension, Long Short-Term Memory. We also briefly look into how to construct deep recurrent neural networks. Finally, we conclude this report with some interesting open-ended questions about deep neural networks.
A System for Accessible Artificial Intelligence While artificial intelligence (AI) has become widespread, many commercial AI systems are not yet accessible to individual researchers or the general public due to the deep knowledge of the systems required to use them. We believe that AI has matured to the point where it should be an accessible technology for everyone. We present an ongoing project whose ultimate goal is to deliver an open source, user-friendly AI system that is specialized for machine learning analysis of complex data in the biomedical and health care domains. We discuss how genetic programming can aid in this endeavor, and highlight specific examples where genetic programming has automated machine learning analyses in previous projects.
A systematic review of fuzzing based on machine learning techniques Security vulnerabilities play a vital role in network security systems. Fuzzing is widely used as a vulnerability discovery technology to reduce damage in advance. However, traditional fuzzing techniques face many challenges, such as how to mutate input seed files, how to increase code coverage, and how to effectively bypass verification. Machine learning has been introduced into fuzzing as a new method to alleviate these challenges. This paper reviews the research progress of using machine learning for fuzzing in recent years, analyzes how machine learning improves the fuzzing process and its results, and sheds light on future work in fuzzing. First, this paper discusses the reasons why machine learning techniques can be used in fuzzing scenarios and identifies six different stages in which machine learning has been used. Then this paper systematically studies machine learning based fuzzing models in terms of the selection of machine learning algorithms, pre-processing methods, datasets, evaluation metrics, and hyperparameter settings. Next, this paper assesses the performance of the machine learning models based on the frequently used evaluation metrics. The results of the evaluation show that machine learning has an acceptable predictive capability for fuzzing. Finally, the capability of discovering vulnerabilities is compared between traditional fuzzing tools and machine learning based fuzzing tools. The results show that the introduction of machine learning can improve the performance of fuzzing. However, there are still some limitations, such as unbalanced training samples and the difficulty of extracting characteristics related to vulnerabilities.
A Taxonomy for Neural Memory Networks In this paper, a taxonomy for memory networks is proposed based on their memory organization. The taxonomy includes all the popular memory networks: the vanilla recurrent neural network (RNN), long short-term memory (LSTM), the neural stack, and the neural Turing machine, together with their variants. The taxonomy puts all these networks under a single umbrella and shows their relative expressive power, i.e. vanilla RNN <= LSTM <= neural stack <= neural RAM. The differences and commonality between these networks are analyzed. These differences are also connected to the requirements of different tasks, which can give the user guidance on how to choose or design an appropriate memory network for a specific task. As a conceptually simplified class of problems, four tasks on synthetic symbol sequences (counting, counting with interference, reversing, and repeat counting) are developed and tested to verify our arguments. We also use two natural language processing problems to discuss how this taxonomy helps in choosing the appropriate neural memory network for real-world problems.
A Temporal Difference Reinforcement Learning Theory of Emotion: unifying emotion, cognition and adaptive behavior Emotions are intimately tied to motivation and the adaptation of behavior, and many animal species show evidence of emotions in their behavior. Therefore, emotions must be related to powerful mechanisms that aid survival, and emotions must be evolutionarily continuous phenomena. How and why did emotions evolve in nature, how do events get emotionally appraised, how do emotions relate to cognitive complexity, and how do they impact behavior and learning? In this article I propose that all emotions are manifestations of reward processing, in particular Temporal Difference (TD) error assessment. Reinforcement Learning (RL) is a powerful computational model for the learning of goal oriented tasks by exploration and feedback. Evidence indicates that RL-like processes exist in many animal species. Key in the processing of feedback in RL is the notion of TD error, the assessment of how much better or worse a situation just became, compared to what was previously expected (or, the estimated gain or loss of utility – or well-being – resulting from new evidence). I propose a TDRL Theory of Emotion and discuss its ramifications for our understanding of emotions in humans, animals and machines, and present psychological, neurobiological and computational evidence in its support.
A Theoretical Connection Between Statistical Physics and Reinforcement Learning Sequential decision making in the presence of uncertainty and stochastic dynamics gives rise to distributions over state/action trajectories in reinforcement learning (RL) and optimal control problems. This observation has led to a variety of connections between RL and inference in probabilistic graphical models (PGMs). Here we explore a different dimension to this relationship, examining reinforcement learning using the tools and abstractions of statistical physics. The central object in the statistical physics abstraction is the idea of a partition function $\mathcal{Z}$, and here we construct a partition function from the ensemble of possible trajectories that an agent might take in a Markov decision process. Although value functions and $Q$-functions can be derived from this partition function and interpreted via average energies, the $\mathcal{Z}$-function provides an object with its own Bellman equation that can form the basis of alternative dynamic programming approaches. Moreover, when the MDP dynamics are deterministic, the Bellman equation for $\mathcal{Z}$ is linear, allowing direct solutions that are unavailable for the nonlinear equations associated with traditional value functions. The policies learned via these $\mathcal{Z}$-based Bellman updates are tightly linked to Boltzmann-like policy parameterizations. In addition to sampling actions proportionally to the exponential of the expected cumulative reward as Boltzmann policies would, these policies take entropy into account favoring states from which many outcomes are possible.
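Illustrative sketch (not from the paper above): a possible form of the linear Bellman recursion for a partition-function-style value Z on a tiny deterministic MDP, with a Boltzmann-like policy read off from Z. The recursion form, the toy states, rewards, and temperature are all assumptions made for illustration.

```python
import numpy as np

# Assumed illustrative form of a linear Z recursion for a deterministic MDP:
#   Z(s) = sum_a exp(r(s, a) / T) * Z(next(s, a)),  with Z fixed at terminal states.
n_states, T = 4, 1.0
next_state = np.array([[1, 2], [3, 3], [3, 1], [3, 3]])   # deterministic transitions
reward     = np.array([[0., 1.], [2., 0.], [1., 1.], [0., 0.]])
terminal   = np.array([False, False, False, True])

Z = np.ones(n_states)
for _ in range(50):                        # fixed-point iteration on the linear recursion
    Z_new = Z.copy()
    for s in range(n_states):
        if not terminal[s]:
            Z_new[s] = np.sum(np.exp(reward[s] / T) * Z[next_state[s]])
    Z = Z_new

# Boltzmann-like policy at state 0: action weights proportional to exp(r/T) * Z(next state)
weights = np.exp(reward[0] / T) * Z[next_state[0]]
print(Z, weights / weights.sum())
```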
A Theory of Diagnostic Interpretation in Supervised Classification Interpretable deep learning is a fundamental building block towards safer AI, especially when the deployment possibilities of deep learning-based computer-aided medical diagnostic systems are so eminent. However, without a computational formulation of black-box interpretation, general interpretability research relies heavily on subjective bias. The clear decision structure of medical diagnostics lets us approximate the decision process of a radiologist as a model removed from subjective bias. We define the process of interpretation as a finite communication between a known model and a black-box model that optimally maps the black box's decision process onto the known model. Consequently, we define interpretability as the maximal information gain over the initial uncertainty about the black box's decision within finite communication. We relax this definition based on the observation that diagnostic interpretation is typically achieved by a process of minimal querying. We derive an algorithm to calculate diagnostic interpretability. The usual question of the accuracy-interpretability tradeoff, i.e. whether a black-box model's prediction accuracy depends on its ability to be interpreted by a known source model, does not arise in this theory. With multiple example simulation experiments of various complexity levels, we demonstrate the working of such a theoretical model in synthetic supervised classification scenarios.
A Theory of Output-Side Unsupervised Domain Adaptation When learning a mapping from an input space to an output space, the assumption that the sample distribution of the training data is the same as that of the test data is often violated. Unsupervised domain shift methods adapt the learned function in order to correct for this shift. Previous work has focused on utilizing unlabeled samples from the target distribution. We consider the complementary problem in which the unlabeled samples are given post mapping, i.e., we are given the outputs of the mapping of unknown samples from the shifted domain. Two other variants are also studied: the two-sided version, in which unlabeled samples are given from both the input and the output spaces, and the Domain Transfer problem, which was recently formalized. In all cases, we derive generalization bounds that employ discrepancy terms.
A Tour of Reinforcement Learning: The View from Continuous Control This manuscript surveys reinforcement learning from the perspective of optimization and control with a focus on continuous control applications. It surveys the general formulation, terminology, and typical experimental implementations of reinforcement learning and reviews competing solution paradigms. In order to compare the relative merits of various techniques, this survey presents a case study of the Linear Quadratic Regulator (LQR) with unknown dynamics, perhaps the simplest and best studied problem in optimal control. The manuscript describes how merging techniques from learning theory and control can provide non-asymptotic characterizations of LQR performance and shows that these characterizations tend to match experimental behavior. In turn, when revisiting more complex applications, many of the observed phenomena in LQR persist. In particular, theory and experiment demonstrate the role and importance of models and the cost of generality in reinforcement learning algorithms. This survey concludes with a discussion of some of the challenges in designing learning systems that safely and reliably interact with complex and uncertain environments and how tools from reinforcement learning and controls might be combined to approach these challenges.
A Tour through the Visualization Zoo A survey of powerful visualization techniques, from the obvious to the obscure
A Tutorial for Reinforcement Learning The tutorial is written for those who would like an introduction to reinforcement learning (RL). The aim is to provide an intuitive presentation of the ideas rather than concentrate on the deeper mathematics underlying the topic. RL is generally used to solve the so-called Markov decision problem (MDP). In other words, the problem that you are attempting to solve with RL should be an MDP or its variant. The theory of RL relies on dynamic programming (DP) and artificial intelligence (AI). We will begin with a quick description of MDPs. We will discuss what we mean by ‘complex’ and ‘large-scale’ MDPs. Then we will explain why RL is needed to solve complex and large-scale MDPs. The semi-Markov decision problem (SMDP) will also be covered.
A tutorial on active learning (Slide Deck)
A Tutorial on Bayesian Belief Networks This tutorial provides an overview of Bayesian belief networks. The subject is introduced through a discussion on probabilistic models that covers probability language, dependency models, graphical representations of models, and belief networks as a particular representation of probabilistic models. The general class of causal belief networks is presented, and the concept of d-separation and its relationship with independence in probabilistic models is introduced. This leads to a description of Bayesian belief networks as a specific class of causal belief networks, with detailed discussion on belief propagation and practical network design. The target recognition problem is presented as an example of the application of Bayesian belief networks to a real problem, and the tutorial concludes with a brief summary of Bayesian belief networks.
A Tutorial on Bayesian Optimization Bayesian optimization is an approach to optimizing objective functions that take a long time (minutes or hours) to evaluate. It is best-suited for optimization over continuous domains of less than 20 dimensions, and tolerates stochastic noise in function evaluations. It builds a surrogate for the objective and quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample. In this tutorial, we describe how Bayesian optimization works, including Gaussian process regression and three common acquisition functions: expected improvement, entropy search, and knowledge gradient. We then discuss more advanced techniques, including running multiple function evaluations in parallel, multi-fidelity and multi-information source optimization, expensive-to-evaluate constraints, random environmental conditions, multi-task Bayesian optimization, and the inclusion of derivative information. We conclude with a discussion of Bayesian optimization software and future research directions in the field. Within our tutorial material we provide a generalization of expected improvement to noisy evaluations, beyond the noise-free setting where it is more commonly applied. This generalization is justified by a formal decision-theoretic argument, standing in contrast to previous ad hoc modifications.
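A minimal illustrative sketch of one Bayesian optimization step as described above: a Gaussian process surrogate with an RBF kernel and the expected improvement acquisition, using only numpy/scipy. The toy objective, kernel, and hyperparameters are assumptions, not the tutorial's reference code.

```python
import numpy as np
from scipy.stats import norm

def rbf(A, B, ls=0.3):
    # Squared-exponential kernel on 1-D inputs
    return np.exp(-0.5 * np.subtract.outer(A, B) ** 2 / ls**2)

def gp_posterior(X, y, Xq, noise=1e-6):
    # Gaussian process regression posterior mean and standard deviation at query points Xq
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks, Kss = rbf(X, Xq), rbf(Xq, Xq)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = np.diag(Kss - Ks.T @ Kinv @ Ks)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # EI for maximization: E[max(f - best, 0)] under the Gaussian posterior
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

f = lambda x: -np.sin(3 * x) - x**2 + 0.7 * x          # assumed toy objective
X = np.array([0.0, 0.5, 1.0]); y = f(X)                 # observations so far
Xq = np.linspace(-1, 2, 200)
mu, sigma = gp_posterior(X, y, Xq)
x_next = Xq[np.argmax(expected_improvement(mu, sigma, y.max()))]
print("next evaluation point:", x_next)
```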
A Tutorial on Bridge Sampling The marginal likelihood plays an important role in many areas of Bayesian statistics such as parameter estimation, model comparison, and model averaging. In most applications, however, the marginal likelihood is not analytically tractable and must be approximated using numerical methods. Here we provide a tutorial on bridge sampling (Bennett, 1976; Meng and Wong, 1996), a reliable and relatively straightforward sampling method that allows researchers to obtain the marginal likelihood for models of varying complexity. First, we introduce bridge sampling and three related sampling methods using the beta-binomial model as a running example. We then apply bridge sampling to estimate the marginal likelihood for the Expectancy Valence (EV) model—a popular model for reinforcement learning. Our results indicate that bridge sampling provides accurate estimates for both a single participant and a hierarchical version of the EV model. We conclude that bridge sampling is an attractive method for mathematical psychologists who typically aim to approximate the marginal likelihood for a limited set of possibly high-dimensional models.
A Tutorial on Canonical Correlation Methods Canonical correlation analysis is a family of multivariate statistical methods for the analysis of paired sets of variables. Since its proposition, canonical correlation analysis has for instance been extended to extract relations between two sets of variables when the sample size is insufficient in relation to the data dimensionality, when the relations have been considered to be non-linear, and when the dimensionality is too large for human interpretation. This tutorial explains the theory of canonical correlation analysis including its regularised, kernel, and sparse variants. Additionally, the deep and Bayesian CCA extensions are briefly reviewed. Together with the numerical examples, this overview provides a coherent compendium on the applicability of the variants of canonical correlation analysis. By bringing together techniques for solving the optimisation problems, evaluating the statistical significance and generalisability of the canonical correlation model, and interpreting the relations, we hope that this article can serve as a hands-on tool for applying canonical correlation methods in data analysis.
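A hedged sketch of classical linear CCA as described above, via whitening the two views and taking an SVD of the whitened cross-covariance; the regularised, kernel, sparse, deep, and Bayesian variants discussed in the tutorial are not covered here, and the data are synthetic.

```python
import numpy as np

def linear_cca(X, Y, reg=1e-6):
    # Classical linear CCA: whiten each view, then SVD the whitened cross-covariance.
    # reg is a small ridge added for numerical stability.
    X = X - X.mean(0); Y = Y - Y.mean(0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx)).T   # whitening transforms (Wx' Cxx Wx = I)
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy)).T
    U, s, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)
    return Wx @ U, Wy @ Vt.T, s                     # projection directions and canonical correlations

rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 2))                       # shared latent signal across both views
X = Z @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(500, 5))
Y = Z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(500, 4))
A, B, corrs = linear_cca(X, Y)
print("canonical correlations:", np.round(corrs, 3))   # first two should be close to 1
```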
A Tutorial on Deep Learning Part 1: Nonlinear Classi ers and The Backpropagation Algorithm In the past few years, Deep Learning has generated much excitement in Machine Learning and industry thanks to many breakthrough results in speech recognition, computer vision and text processing. So, what is Deep Learning For many researchers, Deep Learning is another name for a set of algorithms that use a neural network as an architecture. Even though neural networks have a long history, they became more successful in recent years due to the availability of inexpensive, parallel hardware (GPUs, computer clusters) and massive amounts of data. In this tutorial, we will start with the concept of a linear classi er and use that to develop the concept of neural networks. I will present two key algorithms in learning with neural networks: the stochastic gradient descent algorithm and the backpropagation algorithm. Towards the end of the tutorial, I will explain some simple tricks and recent advances that improve neural networks and their training. For that, let’s start with a simple example.
A Tutorial on Deep Learning Part 2: Autoencoders, Convolutional Neural Networks and Recurrent Neural Networks In the previous tutorial, I discussed the use of deep networks to classify nonlinear data. In addition to their ability to handle nonlinear data, deep networks also have a special strength in their exibility which sets them apart from other tranditional machine learning models: we can modify them in many ways to suit our tasks. In the following, I will discuss three most common modi cations: • Unsupervised learning and data compression via autoencoders which require modi cations in the loss function, • Translational invariance via convolutional neural networks which require modi cations in the network architecture, • Variable-sized sequence prediction via recurrent neural networks which require modi cations in the network architecture. The exibility of neural networks is a very powerful property. In many cases, these changes lead to great improvements in accuracy compared to basic models that we discussed in the previous tutorial. In the last part of the tutorial, I will also explain how to parallelize the training of neural networks. This is also an important topic because parallelizing neural networks has played an important role in the current deep learning movement.
A Tutorial on Distance Metric Learning: Mathematical Foundations, Algorithms and Software This paper describes the discipline of distance metric learning, a branch of machine learning that aims to learn distances from the data. Distance metric learning can be useful to improve similarity learning algorithms, and also has applications in dimensionality reduction. We describe the distance metric learning problem and analyze its main mathematical foundations. We discuss some of the most popular distance metric learning techniques used in classification, showing their goals and the required information to understand and use them. Furthermore, we present a Python package that collects a set of 17 distance metric learning techniques explained in this paper, with some experiments to evaluate the performance of the different algorithms. Finally, we discuss several possibilities of future work in this topic.
A Tutorial on Fisher Information In many statistical applications that concern mathematical psychologists, the concept of Fisher information plays an important role. In this tutorial we clarify the concept of Fisher information as it manifests itself across three different statistical paradigms. First, in the frequentist paradigm, Fisher information is used to construct hypothesis tests and confidence intervals using maximum likelihood estimators; second, in the Bayesian paradigm, Fisher information is used to define a default prior; finally, in the minimum description length paradigm, Fisher information is used to measure model complexity.
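A small worked check, not taken from the tutorial: for a single Bernoulli(theta) observation the Fisher information is I(theta) = 1/(theta(1-theta)); the snippet compares the analytic value with a Monte Carlo estimate of the expected negative second derivative of the log-likelihood.

```python
import numpy as np

# Bernoulli(theta) observation:
#   l(theta)  = x*log(theta) + (1-x)*log(1-theta)
#   l''(theta) = -x/theta^2 - (1-x)/(1-theta)^2
#   I(theta)  = -E[l''(theta)] = 1/(theta*(1-theta))
theta = 0.3
analytic = 1.0 / (theta * (1.0 - theta))

rng = np.random.default_rng(0)
x = rng.binomial(1, theta, size=1_000_000)
monte_carlo = np.mean(x / theta**2 + (1 - x) / (1 - theta) ** 2)

print(analytic, monte_carlo)   # both approximately 4.76
```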
A tutorial on geometric programming A geometric program (GP) is a type of mathematical optimization problem characterized by objective and constraint functions that have a special form. Recently developed solution methods can solve even large-scale GPs extremely efficiently and reliably; at the same time a number of practical problems, particularly in circuit design, have been found to be equivalent to (or well approximated by) GPs. Putting these two together, we get effective solutions for the practical problems. The basic approach in GP modeling is to attempt to express a practical problem, such as an engineering analysis or design problem, in GP format. In the best case, this formulation is exact; when this is not possible, we settle for an approximate formulation. This tutorial paper collects together in one place the basic background material needed to do GP modeling. We start with the basic definitions and facts, and some methods used to transform problems into GP format. We show how to recognize functions and problems compatible with GP, and how to approximate functions or data in a form compatible with GP (when this is possible). We give some simple and representative examples, and also describe some common extensions of GP, along with methods for solving (or approximately solving) them.
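An illustrative sketch of the standard GP transformation on an assumed toy problem (minimize 1/(x*y) subject to x + y <= 1, x, y > 0): the change of variables x = exp(u), y = exp(v) makes the problem convex, after which a generic solver suffices. The specific problem and the use of scipy are assumptions, not material from the tutorial.

```python
import numpy as np
from scipy.optimize import minimize

# Toy GP: minimize 1/(x*y)  subject to  x + y <= 1,  x, y > 0.
# With x = exp(u), y = exp(v) the problem becomes convex:
#   minimize exp(-u - v)   s.t.   log(exp(u) + exp(v)) <= 0
obj  = lambda w: np.exp(-w[0] - w[1])
cons = {"type": "ineq", "fun": lambda w: -np.log(np.exp(w[0]) + np.exp(w[1]))}

res = minimize(obj, x0=np.array([-1.0, -1.0]), constraints=[cons])
x, y = np.exp(res.x)
print(x, y, 1.0 / (x * y))   # expect roughly x = y = 0.5, optimal value 4
```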
A Tutorial on Hawkes Processes for Events in Social Media This chapter provides an accessible introduction to point processes, and especially Hawkes processes, for modeling discrete, inter-dependent events over continuous time. We start by reviewing the definitions and the key concepts in point processes. We then introduce the Hawkes process, its event intensity function, as well as schemes for event simulation and parameter estimation. We also describe a practical example drawn from social media data – we show how to model retweet cascades using a Hawkes self-exciting process. We present a design of the memory kernel, and results on estimating parameters and predicting popularity. The code and sample event data are available as an online appendix.
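A minimal illustrative sketch (not the chapter's code) of simulating a univariate Hawkes process with an exponential memory kernel via Ogata-style thinning; all parameter values are assumed.

```python
import numpy as np

def simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, T=100.0, seed=0):
    # Ogata thinning for a univariate Hawkes process with intensity
    #   lambda(t) = mu + sum_i alpha * exp(-beta * (t - t_i)).
    rng = np.random.default_rng(seed)

    def lam(t, hist):
        return mu + alpha * sum(np.exp(-beta * (t - ti)) for ti in hist)

    events, t = [], 0.0
    while True:
        lam_bar = lam(t, events)                 # upper bound: intensity only decays between events
        t += rng.exponential(1.0 / lam_bar)      # candidate event time
        if t >= T:
            break
        if rng.uniform() <= lam(t, events) / lam_bar:
            events.append(t)                     # accept candidate with prob lambda(t)/lambda_bar
    return np.array(events)

ev = simulate_hawkes()
print(f"{len(ev)} events on [0, 100]; branching ratio alpha/beta = {0.8 / 1.2:.2f}")
```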
A Tutorial on Kernel Density Estimation and Recent Advances This tutorial provides a gentle introduction to kernel density estimation (KDE) and recent advances regarding confidence bands and geometric/topological features. We begin with a discussion of basic properties of KDE: the convergence rate under various metrics, density derivative estimation, and bandwidth selection. Then, we introduce common approaches to the construction of confidence intervals/bands, and we discuss how to handle bias. Next, we talk about recent advances in the inference of geometric and topological features of a density function using KDE. Finally, we illustrate how one can use KDE to estimate a cumulative distribution function and a receiver operating characteristic curve. We provide R implementations related to this tutorial at the end.
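A minimal illustrative sketch of a Gaussian KDE with a normal-reference rule-of-thumb bandwidth; the tutorial's R implementations are not reproduced here, and the toy data and bandwidth rule are assumptions.

```python
import numpy as np

def gaussian_kde(data, grid, bandwidth=None):
    # Gaussian kernel density estimate; normal-reference rule-of-thumb bandwidth by default.
    data = np.asarray(data)
    if bandwidth is None:
        bandwidth = 1.06 * data.std(ddof=1) * len(data) ** (-1 / 5)
    u = (grid[:, None] - data[None, :]) / bandwidth
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * bandwidth * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(1, 1.0, 700)])  # bimodal toy data
grid = np.linspace(-5, 5, 400)
density = gaussian_kde(sample, grid)
print("mass of the estimate ~", density.sum() * (grid[1] - grid[0]))          # close to 1
```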
A Tutorial on Modeling and Inference in Undirected Graphical Models for Hyperspectral Image Analysis Undirected graphical models have been successfully used to jointly model the spatial and the spectral dependencies in earth observing hyperspectral images. They produce less noisy, smooth, and spatially coherent land cover maps and give top accuracies on many datasets. Moreover, they can easily be combined with other state-of-the-art approaches, such as deep learning. This has made them an essential tool for remote sensing researchers and practitioners. However, graphical models have not been easily accessible to the larger remote sensing community as they are not discussed in standard remote sensing textbooks and not included in the popular remote sensing software and toolboxes. In this tutorial, we provide a theoretical introduction to Markov random fields and conditional random fields based spatial-spectral classification for land cover mapping along with a detailed step-by-step practical guide on applying these methods using freely available software. Furthermore, the discussed methods are benchmarked on four public hyperspectral datasets for a fair comparison among themselves and easy comparison with the vast number of methods in literature which use the same datasets. The source code necessary to reproduce all the results in the paper is published on-line to make it easier for the readers to apply these techniques to different remote sensing problems.
A Tutorial on Network Embeddings Network embedding methods aim at learning low-dimensional latent representation of nodes in a network. These representations can be used as features for a wide range of tasks on graphs such as classification, clustering, link prediction, and visualization. In this survey, we give an overview of network embeddings by summarizing and categorizing recent advancements in this research field. We first discuss the desirable properties of network embeddings and briefly introduce the history of network embedding algorithms. Then, we discuss network embedding methods under different scenarios, such as supervised versus unsupervised learning, learning embeddings for homogeneous networks versus for heterogeneous networks, etc. We further demonstrate the applications of network embeddings, and conclude the survey with future work in this area.
A tutorial on Particle Swarm Optimization Clustering This paper proposes a tutorial on the Data Clustering technique using the Particle Swarm Optimization approach. Following the work proposed by Merwe et al., here we present an in-depth analysis of the algorithm together with a Matlab implementation and a short tutorial that explains how to modify the proposed implementation and the effect of the parameters of the original algorithm. Moreover, we provide a comparison against the results obtained using the well known K-Means approach. All the source code presented in this paper is publicly available under the GPL-v2 license.
A Tutorial on Spectral Clustering In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. At first glance spectral clustering appears slightly mysterious, and it is not obvious why it works at all and what it really does. The goal of this tutorial is to give some intuition on those questions. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.
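A hedged sketch of one common normalized spectral clustering recipe (RBF affinity, symmetric normalized Laplacian, k-means on the leading eigenvectors); the tutorial derives several variants, and the use of scikit-learn's KMeans and the toy data here are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=0.5):
    # RBF affinity, symmetric normalized Laplacian, then k-means on the k smallest eigenvectors.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)
    d = W.sum(1)
    L_sym = np.eye(len(X)) - W / np.sqrt(np.outer(d, d))      # I - D^{-1/2} W D^{-1/2}
    eigvals, eigvecs = np.linalg.eigh(L_sym)
    U = eigvecs[:, :k]                                         # eigenvectors of the k smallest eigenvalues
    U = U / np.linalg.norm(U, axis=1, keepdims=True)           # row-normalize the spectral embedding
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.15, size=(50, 2)) for c in [(0, 0), (2, 0), (1, 2)]])
labels = spectral_clustering(X, k=3)
print(np.bincount(labels))   # roughly three clusters of 50 points each
```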
A Tutorial on Statistically Sound Pattern Discovery Statistically sound pattern discovery harnesses the rigour of statistical hypothesis testing to overcome many of the issues that have hampered standard data mining approaches to pattern discovery. Most importantly, application of appropriate statistical tests allows precise control over the risk of false discoveries — patterns that are found in the sample data but do not hold in the wider population from which the sample was drawn. Statistical tests can also be applied to filter out patterns that are unlikely to be useful, removing uninformative variations of the key patterns in the data. This tutorial introduces the key statistical and data mining theory and techniques that underpin this fast developing field.
A Tutorial on Thompson Sampling Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance. The algorithm addresses a broad range of problems in a computationally efficient manner and is therefore enjoying wide use. This tutorial covers the algorithm and its application, illustrating concepts through a range of examples, including Bernoulli bandit problems, shortest path problems, dynamic pricing, recommendation, active learning with neural networks, and reinforcement learning in Markov decision processes. Most of these problems involve complex information structures, where information revealed by taking an action informs beliefs about other actions. We will also discuss when and why Thompson sampling is or is not effective and relations to alternative algorithms.
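A minimal illustrative sketch of Thompson sampling for the Bernoulli bandit case mentioned above, with independent Beta(1, 1) priors; the arm probabilities and horizon are assumed, and this is not the tutorial's reference code.

```python
import numpy as np

def thompson_bernoulli(true_probs, horizon=2000, seed=0):
    # Thompson sampling for a Bernoulli bandit: draw a mean from each arm's Beta posterior,
    # play the argmax, observe the reward, update that arm's posterior.
    rng = np.random.default_rng(seed)
    k = len(true_probs)
    successes, failures = np.ones(k), np.ones(k)        # Beta(1, 1) prior parameters
    total_reward = 0.0
    for _ in range(horizon):
        theta = rng.beta(successes, failures)           # one posterior draw per arm
        arm = int(np.argmax(theta))
        r = rng.binomial(1, true_probs[arm])
        successes[arm] += r
        failures[arm] += 1 - r
        total_reward += r
    return total_reward, successes / (successes + failures)

total, posterior_means = thompson_bernoulli([0.3, 0.5, 0.65])
print(total, np.round(posterior_means, 2))   # pulls concentrate on the 0.65 arm over time
```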
A tutorial on training recurrent neural networks, covering BPTT, RTRL, EKF and the ‘echo state network’ approach (Slide Deck)
A unified view of gradient-based attribution methods for Deep Neural Networks Understanding the flow of information in Deep Neural Networks is a challenging problem that has gained increasing attention over the last few years. While several methods have been proposed to explain network predictions, only a few attempts to analyze them from a theoretical perspective have been made in the past. In this work we analyze various state-of-the-art attribution methods and prove unexplored connections between them. We also show how some methods can be reformulated and more conveniently implemented. Finally, we perform an empirical evaluation with six attribution methods on a variety of tasks and architectures and discuss their strengths and limitations.
A Universal Hypercomputer This paper describes a type of infinitary computer (a hypercomputer) capable of computing truth in initial levels of the set theoretic universe, V. The proper class of such hypercomputers is called a universal hypercomputer. There are two basic variants of hypercomputer: a serial hypercomputer and a parallel hypercomputer. The set of computable functions of the two variants is identical, but the parallel hypercomputer is in general faster than a serial hypercomputer (as measured by an ordinal complexity measure). Insights into set theory using information theory and a universal hypercomputer are possible, and it is argued that the Generalised Continuum Hypothesis can be regarded as an information-theoretic principle, which follows from an information minimization principle.
A User’s Guide to Support Vector Machines The Support Vector Machine (SVM) is a widely used classifier. And yet, obtaining the best results with SVMs requires an understanding of their workings and the various ways a user can influence their accuracy. We provide the user with a basic understanding of the theory behind SVMs and focus on their use in practice. We describe the effect of the SVM parameters on the resulting classifier, how to select good values for those parameters, data normalization, factors that affect training time, and software for training SVMs.
A vector linear programming approach for certain global optimization problems Global optimization problems with a quasi-concave objective function and linear constraints are studied. We point out that various other classes of global optimization problems can be expressed in this way. We present two algorithms, which can be seen as slight modifications of Benson-type algorithms for multiple objective linear programs. The modification of the MOLP algorithms results in a more efficient treatment of the studied optimization problems. This paper generalizes and improves results of Schulz and Mittal on quasi-concave problems, Shao and Ehrgott on multiplicative linear programs and Löhne and Wagner on minimizing the difference $f=g-h$ of two convex functions $g$, $h$ where either $g$ or $h$ is polyhedral. Numerical examples are given and the results are compared with the global optimization software BARON.
A Very Brief Introduction to Machine Learning With Applications to Communication Systems Given the unprecedented availability of data and computing resources, there is widespread renewed interest in applying data-driven machine learning methods to problems for which the development of conventional engineering solutions is challenged by modelling or algorithmic deficiencies. This tutorial-style paper starts by addressing the questions of why and when such techniques can be useful. It then provides a high-level introduction to the basics of supervised and unsupervised learning with a focus on probabilistic models. For both supervised and unsupervised learning, exemplifying applications to communication networks are discussed by distinguishing tasks carried out at the edge and at the cloud segments of the network at different layers of the protocol stack.
A weakly informative default prior distribution for logistic and other regression models We propose a new prior distribution for classical (nonhierarchical) logistic regression models, constructed by first scaling all nonbinary variables to have mean 0 and standard deviation 0.5, and then placing independent Student-t prior distributions on the coefficients. As a default choice, we recommend the Cauchy distribution with center 0 and scale 2.5, which in the simplest setting is a longer-tailed version of the distribution attained by assuming one-half additional success and one-half additional failure in a logistic regression. Cross-validation on a corpus of datasets shows the Cauchy class of prior distributions to outperform existing implementations of Gaussian and Laplace priors. We recommend this prior distribution as a default choice for routine applied use. It has the advantage of always giving answers, even when there is complete separation in logistic regression (a common problem, even when the sample size is large and the number of predictors is small), and also automatically applying more shrinkage to higher-order interactions. This can be useful in routine data analysis as well as in automated procedures such as chained equations for missing-data imputation. We implement a procedure to fit generalized linear models in R with the Student-t prior distribution by incorporating an approximate EM algorithm into the usual iteratively weighted least squares. We illustrate with several applications, including a series of logistic regressions predicting voting preferences, a small bioassay experiment, and an imputation model for a public health data set.
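The paper's procedure incorporates an approximate EM step into iteratively weighted least squares in R; the sketch below is not that algorithm, only a hedged illustration of the recommended prior: predictors scaled to mean 0 and standard deviation 0.5, independent Cauchy(0, 2.5) priors on the coefficients (scale 10 on the intercept), with the posterior mode found by direct optimization on synthetic data.

```python
import numpy as np
from scipy.optimize import minimize

def fit_logistic_cauchy(X, y, scale=2.5, intercept_scale=10.0):
    # Posterior mode of logistic regression with independent Cauchy priors,
    # after scaling predictors to mean 0 and sd 0.5 (the recommendation for non-binary inputs).
    X = (X - X.mean(0)) / (2 * X.std(0))
    Z = np.column_stack([np.ones(len(X)), X])
    scales = np.r_[intercept_scale, np.full(X.shape[1], scale)]

    def neg_log_post(beta):
        eta = Z @ beta
        loglik = np.sum(y * eta - np.logaddexp(0, eta))            # Bernoulli log-likelihood
        logprior = -np.sum(np.log1p((beta / scales) ** 2))         # Cauchy log-density up to a constant
        return -(loglik + logprior)

    return minimize(neg_log_post, np.zeros(Z.shape[1]), method="BFGS").x

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + X @ np.array([1.0, -2.0, 0.0])))))
print(np.round(fit_logistic_cauchy(X, y), 2))
```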
Abandon Statistical Significance In science publishing and many areas of research, the status quo is a lexicographic decision rule in which any result is first required to have a p-value that surpasses the 0.05 threshold and only then is consideration–often scant–given to such factors as prior and related evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain. There have been recent proposals to change the p-value threshold, but instead we recommend abandoning the null hypothesis significance testing paradigm entirely, leaving p-values as just one of many pieces of information with no privileged role in scientific publication and decision making. We argue that this radical approach is both practical and sensible.
Above the Clouds: A Brief Survey Cloud Computing is a versatile technology that can support a broad spectrum of applications. The low cost of cloud computing and its dynamic scaling render it an innovation driver for small companies, particularly in the developing world. Cloud-deployed enterprise resource planning (ERP), supply chain management (SCM), customer relationship management (CRM), medical, business and mobile applications have the potential to reach millions of users. In this paper, we explore the different concepts involved in cloud computing and we also examine clouds from technical aspects. We highlight some of the opportunities in cloud computing, underlining the importance of clouds and showing why this technology must succeed, and we provide additional cloud computing problems that businesses may need to address. Finally, we discuss some of the issues that this area should deal with.
Abstraction Learning There has been a gap between artificial intelligence and human intelligence. In this paper, we identify three key elements forming human intelligence, and suggest that abstraction learning combines these elements and is thus a way to bridge the gap. Prior research in artificial intelligence either specifies abstraction by human experts, or takes abstraction as a qualitative explanation for the model. This paper aims to learn abstraction directly. We tackle three main challenges: representation, objective function, and learning algorithm. Specifically, we propose a partition structure that contains pre-allocated abstraction neurons; we formulate abstraction learning as a constrained optimization problem, which integrates abstraction properties; we develop a network evolution algorithm to solve this problem. This complete framework is named ONE (Optimization via Network Evolution). In our experiments on MNIST, ONE shows elementary human-like intelligence, including low energy consumption, knowledge sharing, and lifelong learning.
Accelerating CNN inference on FPGAs: A Survey Convolutional Neural Networks (CNNs) are currently adopted to solve an ever greater number of problems, ranging from speech recognition to image classification and segmentation. The large amount of processing required by CNNs calls for dedicated and tailored hardware support methods. Moreover, CNN workloads have a streaming nature, well suited to reconfigurable hardware architectures such as FPGAs. The amount and diversity of research on the subject of CNN FPGA acceleration within the last 3 years demonstrates the tremendous industrial and academic interest. This paper presents a state-of-the-art review of CNN inference accelerators on FPGAs. The computational workloads, their parallelism and the involved memory accesses are analyzed. At the level of neurons, optimizations of the convolutional and fully connected layers are explained and the performances of the different methods compared. At the network level, approximate computing and datapath optimization methods are covered and state-of-the-art approaches compared. The methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators and will fuel future advances in efficient hardware deep learning.
Activation Functions: Comparison of trends in Practice and Research for Deep Learning Deep neural networks have been successfully used in diverse emerging domains to solve real world complex problems, with many more deep learning (DL) architectures being developed to date. To achieve these state-of-the-art performances, the DL architectures use activation functions (AFs) to perform diverse computations between the hidden layers and the output layers of any given DL architecture. This paper presents a survey of the existing AFs used in deep learning applications and highlights the recent trends in the use of activation functions for deep learning applications. The novelty of this paper is that it compiles the majority of the AFs used in DL and outlines the current trends in the applications and usage of these functions in practical deep learning deployments against the state-of-the-art research results. This compilation will aid in making effective decisions in the choice of the most suitable and appropriate activation function for any given application, ready for deployment. This paper is timely because most research papers on AFs highlight similar works and results, while this paper will be the first to compile the trends in AF applications in practice against the research results from the literature, found in deep learning research to date.
Active Learning for Visual Question Answering: An Empirical Study We present an empirical study of active learning for Visual Question Answering, where a deep VQA model selects informative question-image pairs from a pool and queries an oracle for answers to maximally improve its performance under a limited query budget. Drawing analogies from human learning, we explore cramming (entropy), curiosity-driven (expected model change), and goal-driven (expected error reduction) active learning approaches, and propose a fast and effective goal-driven active learning scoring function to pick question-image pairs for deep VQA models under the Bayesian Neural Network framework. We find that deep VQA models need large amounts of training data before they can start asking informative questions. But once they do, all three approaches outperform the random selection baseline and achieve significant query savings. For the scenario where the model is allowed to ask generic questions about images but is evaluated only on specific questions (e.g., questions whose answer is either yes or no), our proposed goal-driven scoring function performs the best.
Ad Click Prediction: a View from the Trenches Predicting ad click-through rates (CTR) is a massive-scale learning problem that is central to the multi-billion dollar online advertising industry. We present a selection of case studies and topics drawn from recent experiments in the setting of a deployed CTR prediction system. These include improvements in the context of traditional supervised learning based on an FTRL-Proximal online learning algorithm (which has excellent sparsity and convergence properties) and the use of per-coordinate learning rates. We also explore some of the challenges that arise in a real-world system that may appear at first to be outside the domain of traditional machine learning research. These include useful tricks for memory savings, methods for assessing and visualizing performance, practical methods for providing confidence estimates for predicted probabilities, calibration methods, and methods for automated management of features. Finally, we also detail several directions that did not turn out to be beneficial for us, despite promising results elsewhere in the literature. The goal of this paper is to highlight the close relationship between theoretical advances and practical engineering in this industrial setting, and to show the depth of challenges that appear when applying traditional machine learning methods in a complex dynamic system.
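A hedged sketch of the per-coordinate FTRL-Proximal update with L1/L2 regularization for logistic loss, in the spirit of the algorithm named above; hyperparameters and the synthetic data are illustrative, and this is not the paper's production implementation.

```python
import numpy as np

class FTRLProximal:
    # Per-coordinate FTRL-Proximal with L1/L2 regularization for logistic loss (sketch).
    def __init__(self, dim, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = np.zeros(dim)   # accumulated adjusted gradients
        self.n = np.zeros(dim)   # accumulated squared gradients

    def weights(self):
        w = np.zeros_like(self.z)
        active = np.abs(self.z) > self.l1                       # L1 keeps the rest exactly zero (sparsity)
        w[active] = -(self.z[active] - np.sign(self.z[active]) * self.l1) / (
            (self.beta + np.sqrt(self.n[active])) / self.alpha + self.l2)
        return w

    def update(self, x, y):
        w = self.weights()
        p = 1.0 / (1.0 + np.exp(-x @ w))                        # predicted click probability
        g = (p - y) * x                                         # logistic-loss gradient
        sigma = (np.sqrt(self.n + g**2) - np.sqrt(self.n)) / self.alpha
        self.z += g - sigma * w
        self.n += g**2
        return p

rng = np.random.default_rng(0)
model, w_true = FTRLProximal(dim=10), np.r_[2.0, -3.0, np.zeros(8)]
for _ in range(5000):
    x = rng.normal(size=10)
    y = rng.binomial(1, 1 / (1 + np.exp(-x @ w_true)))
    model.update(x, y)
print(np.round(model.weights(), 2))   # sparse weights, mass on the informative coordinates
```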
ADADELTA: An Adaptive Learning Rate Method We present a novel per-dimension learning rate method for gradient descent called ADADELTA. The method dynamically adapts over time using only first order information and has minimal computational overhead beyond vanilla stochastic gradient descent. The method requires no manual tuning of a learning rate and appears robust to noisy gradient information, different model architecture choices, various data modalities and selection of hyperparameters. We show promising results compared to other methods on the MNIST digit classification task using a single machine and on a large scale voice dataset in a distributed cluster environment.
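The ADADELTA per-dimension rule accumulates decaying averages of squared gradients and squared updates, and scales each step by the RMS of past updates over the RMS of gradients, so no global learning rate is tuned. Below is a hedged minimal sketch on an assumed toy quadratic; the update equations follow the description above, while the test function is an illustration.

```python
import numpy as np

def adadelta(grad_fn, x0, rho=0.95, eps=1e-6, steps=2000):
    # E[g^2] <- rho*E[g^2] + (1-rho)*g^2 ;  dx = -RMS[dx]/RMS[g] * g ;
    # E[dx^2] <- rho*E[dx^2] + (1-rho)*dx^2 ;  x <- x + dx
    x = x0.astype(float)
    Eg2, Edx2 = np.zeros_like(x), np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x)
        Eg2 = rho * Eg2 + (1 - rho) * g**2
        dx = -np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * g
        Edx2 = rho * Edx2 + (1 - rho) * dx**2
        x += dx
    return x

# Toy usage on a quadratic bowl f(x) = 0.5 * x'Ax (gradient A x); the iterate moves toward the
# minimizer at the origin without any hand-tuned learning rate.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
print(adadelta(lambda x: A @ x, np.array([5.0, -3.0])))
```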
Adaptive Graph Signal Processing: Algorithms and Optimal Sampling Strategies The goal of this paper is to propose novel strategies for adaptive learning of signals defined over graphs, which are observed over a (randomly time-varying) subset of vertices. We recast two classical adaptive algorithms in the graph signal processing framework, namely, the least mean squares (LMS) and the recursive least squares (RLS) adaptive estimation strategies. For both methods, a detailed mean-square analysis illustrates the effect of random sampling on the adaptive reconstruction capability and the steady-state performance. Then, several probabilistic sampling strategies are proposed to design the sampling probability at each node in the graph, with the aim of optimizing the tradeoff between steady-state performance, graph sampling rate, and convergence rate of the adaptive algorithms. Finally, a distributed RLS strategy is derived and is shown to be convergent to its centralized counterpart. Numerical simulations carried out over both synthetic and real data illustrate the good performance of the proposed sampling and reconstruction strategies for (possibly distributed) adaptive learning of signals defined over graphs.
Addressing the ‘Big Data’ Issue: What You Need to Know These days, you´re probably hearing a lot of hype about ‘big data.’ Vendors are currently hawking a wealth of new tools, all of which promise to help your organization unlock previously inaccessible insights from your proprietary information. According to the authors, there is no doubt that big data, i.e., organization-wide data that´s being managed in a centralized repository, can yield valuable discoveries that will result in improved products and performance – if properly analyzed. Nonetheless, you must look before you leap. First, is your company culture ready for such a move? How will data managers be affected when scores of discrete data silos are gathered and reviewed as a whole? How will you involve leadership and others in ongoing decision-making processes? How will you choose your architecture and tools from the dizzying array of options that are currently available? How will you stay up-to-date in this rapidly evolving field? Finally, how will you train your company´s users so that they can actually leverage the new capabilities? This ExecBlueprint explores these and other key concerns.
Advanced Analytics with the SAP HANA Database MapReduce as a programming paradigm provides a simple-to-use yet very powerful abstraction encapsulated in two second-order functions: Map and Reduce. As such, they allow defining single sequentially processed tasks while at the same time hiding many of the framework details about how those tasks are parallelized and scaled out. In this paper we discuss four processing patterns in the context of the distributed SAP HANA database that go beyond the classic MapReduce paradigm. We illustrate them using some typical Machine Learning algorithms and present experimental results that demonstrate how the data flows scale out with the number of parallel tasks.
Advances in Artificial Intelligence Require Progress Across all of Computer Science Advances in Artificial Intelligence require progress across all of computer science.
Advances in Machine Learning for the Behavioral Sciences The areas of machine learning and knowledge discovery in databases have considerably matured in recent years. In this article, we briefly review recent developments as well as classical algorithms that stood the test of time. Our goal is to provide a general introduction into different tasks such as learning from tabular data, behavioral data, or textual data, with a particular focus on actual and potential applications in behavioral sciences. The supplemental appendix to the article also provides practical guidance for using the methods by pointing the reader to proven software implementations. The focus is on R, but we also cover some libraries in other programming languages as well as systems with easy-to-use graphical interfaces.
Advances in Natural Language Question Answering: A Review Question Answering has recently received high attention from artificial intelligence communities due to the advancements in learning technologies. Early question answering models used rule-based approaches and then moved to statistical approaches to address the vastly available information. However, statistical approaches have been shown to underperform in handling the dynamic nature and the variation of language. Learning models, in contrast, have shown the capability of handling the dynamic nature and variations in language. Many deep learning methods have been introduced to question answering. Most of the deep learning approaches have been shown to achieve higher results compared to machine learning and statistical methods. The dynamic nature of language has profited from the nonlinear learning in deep learning. This has created prominent success and a spike in work on question answering. This paper discusses the successes and challenges in question answering systems and the techniques that are used to address these challenges.
Advances in Variational Inference Many modern unsupervised or semi-supervised machine learning algorithms rely on Bayesian probabilistic models. These models are usually intractable and thus require approximate inference. Variational inference (VI) lets us approximate a high-dimensional Bayesian posterior with a simpler variational distribution by solving an optimization problem. This approach has been successfully used in various models and large-scale applications. In this review, we give an overview of recent trends in variational inference. We first introduce standard mean field variational inference, then review recent advances focusing on the following aspects: (a) scalable VI, which includes stochastic approximations, (b) generic VI, which extends the applicability of VI to a large class of otherwise intractable models, such as non-conjugate models, (c) accurate VI, which includes variational models beyond the mean field approximation or with atypical divergences, and (d) amortized VI, which implements the inference over local latent variables with inference networks. Finally, we provide a summary of promising future research directions.
Adversarial Attacks and Defences: A Survey Deep learning has emerged as a strong and efficient framework that can be applied to a broad spectrum of complex learning problems which were difficult to solve using the traditional machine learning techniques in the past. In the last few years, deep learning has advanced radically in such a way that it can surpass human-level performance on a number of tasks. As a consequence, deep learning is being extensively used in most of the recent day-to-day applications. However, deep learning systems are vulnerable to crafted adversarial examples, which may be imperceptible to the human eye but can lead the model to misclassify the output. In recent times, different types of adversaries based on their threat model leverage these vulnerabilities to compromise a deep learning system where adversaries have high incentives. Hence, it is extremely important to provide robustness to deep learning algorithms against these adversaries. However, there are only a few strong countermeasures which can be used in all types of attack scenarios to design a robust deep learning system. In this paper, we attempt to provide a detailed discussion on different types of adversarial attacks with various threat models and also elaborate on the efficiency and challenges of recent countermeasures against them.
Adversarial Examples – A Complete Characterisation of the Phenomenon We provide a complete characterisation of the phenomenon of adversarial examples – inputs intentionally crafted to fool machine learning models. We aim to cover all the important concerns in this field of study: (1) the conjectures on the existence of adversarial examples, (2) the security, safety and robustness implications, (3) the methods used to generate and (4) protect against adversarial examples and (5) the ability of adversarial examples to transfer between different machine learning models. We provide ample background information in an effort to make this document self-contained. Therefore, this document can be used as survey, tutorial or as a catalog of attacks and defences using adversarial examples.
Adversarial Examples in Modern Machine Learning: A Review Recent research has found that many families of machine learning models are vulnerable to adversarial examples: inputs that are specifically designed to cause the target model to produce erroneous outputs. In this survey, we focus on machine learning models in the visual domain, where methods for generating and detecting such examples have been most extensively studied. We explore a variety of adversarial attack methods that apply to image-space content, real world adversarial attacks, adversarial defenses, and the transferability property of adversarial examples. We also discuss strengths and weaknesses of various methods of adversarial attack and defense. Our aim is to provide an extensive coverage of the field, furnishing the reader with an intuitive understanding of the mechanics of adversarial attack and defense mechanisms and enlarging the community of researchers studying this fundamental set of problems.
Adversarial Examples: Attacks and Defenses for Deep Learning With rapid progress and great successes in a wide spectrum of applications, deep learning is being applied in many safety-critical environments. However, deep neural networks have recently been found vulnerable to well-designed input samples, called \textit{adversarial examples}. Adversarial examples are imperceptible to humans but can easily fool deep neural networks in the testing/deploying stage. The vulnerability to adversarial examples has become one of the major risks for applying deep neural networks in safety-critical scenarios. Therefore, attacks and defenses on adversarial examples draw great attention. In this paper, we review recent findings on adversarial examples against deep neural networks, summarize the methods for generating adversarial examples, and propose a taxonomy of these methods. Under the taxonomy, applications and countermeasures for adversarial examples are investigated. We further elaborate on adversarial examples and explore the challenges and the potential solutions.
Adversarial Learning in Statistical Classification: A Comprehensive Review of Defenses Against Attacks With the wide deployment of machine learning (ML) based systems for a variety of applications including medical, military, automotive, genomic, as well as multimedia and social networking, there is great potential for damage from adversarial learning (AL) attacks. In this paper, we provide a contemporary survey of AL, focused particularly on defenses against attacks on statistical classifiers. After introducing relevant terminology and the goals and range of possible knowledge of both attackers and defenders, we survey recent work on test-time evasion (TTE), data poisoning (DP), and reverse engineering (RE) attacks and particularly defenses against same. In so doing, we distinguish robust classification from anomaly detection (AD), unsupervised from supervised, and statistical hypothesis-based defenses from ones that do not have an explicit null (no attack) hypothesis; we identify the hyperparameters a particular method requires, its computational complexity, as well as the performance measures on which it was evaluated and the obtained quality. We then dig deeper, providing novel insights that challenge conventional AL wisdom and that target unresolved issues, including: 1) robust classification versus AD as a defense strategy; 2) the belief that attack success increases with attack strength, which ignores susceptibility to AD; 3) small perturbations for test-time evasion attacks: a fallacy or a requirement?; 4) validity of the universal assumption that a TTE attacker knows the ground-truth class for the example to be attacked; 5) black, grey, or white box attacks as the standard for defense evaluation; 6) susceptibility of query-based RE to an AD defense. We then present benchmark comparisons of several defenses against TTE, RE, and backdoor DP attacks on images. The paper concludes with a discussion of future work.
Advice from the Oracle: Really Intelligent Information Retrieval What is ‘intelligent’ information retrieval? Essentially this is asking what intelligence is. In this article I will attempt to show some of the aspects of human intelligence as related to information retrieval. I will do this by the device of a semi-imaginary Oracle. Every Observatory has an oracle, someone who is a distinguished scientist, has great administrative responsibilities, acts as mentor to a number of less senior people and as trusted advisor to even the most accomplished scientists, and knows essentially everyone in the field. In an appendix I will present a brief summary of the Statistical Factor Space method for text indexing and retrieval, and indicate how it will be used in the Astrophysics Data System Abstract Service. Keywords: Personal Digital Assistant; Supervised Topic Models
Agent-based computing from multi-agent systems to agent-based Models: a visual survey Agent-Based Computing is a diverse research domain concerned with the building of intelligent software based on the concept of ‘agents’. In this paper, we use Scientometric analysis to analyze all sub-domains of agent-based computing. Our data consists of 1,064 journal articles indexed in the ISI web of knowledge published during a twenty-year period: 1990-2010. These were retrieved using a topic search with various keywords commonly used in sub-domains of agent-based computing. In our proposed approach, we have employed a combination of two applications for analysis, namely Network Workbench and CiteSpace: Network Workbench allowed for the analysis of complex network aspects of the domain, while detailed visualization-based analysis of the bibliographic data was performed using CiteSpace. Our results include the identification of the largest cluster based on keywords, the timeline of publication of index terms, the core journals and key subject categories. We also identify the core authors and the top countries of origin of the manuscripts, along with core research institutes. Finally, our results have interestingly revealed the strong presence of agent-based computing in a number of non-computing related scientific domains including Life Sciences, Ecological Sciences and Social Sciences.
Agent-Based Modeling and Simulation Agent-based modeling and simulation (ABMS) is a new approach to modeling systems comprised of autonomous, interacting agents. Computational advances have made possible a growing number of agent-based models across a variety of application domains. Applications range from modeling agent behavior in the stock market, supply chains, and consumer markets, to predicting the spread of epidemics, mitigating the threat of bio-warfare, and understanding the factors that may be responsible for the fall of ancient civilizations. Such progress suggests the potential of ABMS to have far-reaching effects on the way that businesses use computers to support decision-making and researchers use agent-based models as electronic laboratories. Some contend that ABMS ‘is a third way of doing science’ and could augment traditional deductive and inductive reasoning as discovery methods. This brief tutorial introduces agent-based modeling by describing the foundations of ABMS, discussing some illustrative applications, and addressing toolkits and methods for developing agent-based models.
Agent-based models of collective intelligence Collective or group intelligence is manifested in the fact that a team of cooperating agents can solve problems more efficiently than when those agents work in isolation. Although cooperation is, in general, a successful problem solving strategy, it is not clear whether it merely speeds up the time to find the solution, or whether it alters qualitatively the statistical signature of the search for the solution. Here we review and offer insights on two agent-based models of distributed cooperative problem-solving systems, whose task is to solve a cryptarithmetic puzzle. The first model is the imitative learning search in which the agents exchange information on the quality of their partial solutions to the puzzle and imitate the most successful agent in the group. This scenario predicts very poor performance in cases where imitation is too frequent or the group is too large, a phenomenon akin to the Groupthink of social psychology. The second model is the blackboard organization in which agents read and post hints on a public blackboard. This brainstorming scenario performs the best when there is a stringent limit to the amount of information that is exhibited on the board. Both cooperative scenarios produce a substantial speed up of the time to solve the puzzle as compared with the situation where the agents work in isolation. The statistical signature of the search, however, is the same as that of the independent search.
Agile business intelligence: reshaping the landscape The last few years have brought a wave of changes for business intelligence (BI) solutions. A set of redefining technological trends is reshaping the landscape from a slow and cumbersome process practiced mainly by large enterprises to a much more flexible, agile process that mid-market companies as well as individuals can utilize. This report explores the key features that influence the evolution of agile BI and takes a look at the BI landscape under this light. At first glance, polarization seems to exist between traditional BI vendors, who are focused on extract, transform, and load (ETL) and reporting, and the newcomers, who are focused on data exploration and visualization, but a closer look reveals that, in fact, they converge as adoption of useful features is taking place across the spectrum.
AI Enabling Technologies: A Survey Artificial Intelligence (AI) has the opportunity to revolutionize the way the United States Department of Defense (DoD) and Intelligence Community (IC) address the challenges of evolving threats, data deluge, and rapid courses of action. Developing an end-to-end artificial intelligence system involves parallel development of different pieces that must work together in order to provide capabilities that can be used by decision makers, warfighters and analysts. These pieces include data collection, data conditioning, algorithms, computing, robust artificial intelligence, and human-machine teaming. While much of the popular press today surrounds advances in algorithms and computing, most modern AI systems leverage advances across numerous different fields. Further, while certain components may not be as visible to end-users as others, our experience has shown that each of these interrelated components plays a major role in the success or failure of an AI system. This article is meant to highlight many of these technologies that are involved in an end-to-end AI system. The goal of this article is to provide readers with an overview of terminology, technical details and recent highlights from academia, industry and government. Where possible, we indicate relevant resources that can be used for further reading and understanding.
AI in the media and creative industries Thanks to the Big Data revolution and increasing computing capacities, Artificial Intelligence (AI) has made an impressive revival over the past few years and is now omnipresent in both research and industry. The creative sectors have always been early adopters of AI technologies and this continues to be the case. As a matter of fact, recent technological developments keep pushing the boundaries of intelligent systems in creative applications: the critically acclaimed movie ‘Sunspring’, released in 2016, was entirely written by AI technology, and ‘Hello World’, the first-ever music album produced using AI, was released this year. Simultaneously, the exploratory nature of the creative process is raising important technical challenges for AI, such as the ability of AI-powered techniques to be accurate under limited data resources, as opposed to the conventional ‘Big Data’ approach, or the ability to process, analyse and match data from multiple modalities (text, sound, images, etc.) at the same time. The purpose of this white paper is to understand future technological advances in AI and their growing impact on creative industries. This paper addresses the following questions: Where does AI operate in creative industries? What is its operative role? How will AI transform creative industries in the next ten years? This white paper aims to provide a realistic perspective of the scope of AI actions in creative industries, proposes a vision of how this technology could contribute to research and development works in such context, and identifies research and development challenges.
AI Reasoning Systems: PAC and Applied Methods Learning and logic are distinct and remarkable approaches to prediction. Machine learning has experienced a surge in popularity because it is robust to noise and achieves high performance; however, ML experiences many issues with knowledge transfer and extrapolation. In contrast, logic is easily interpreted, and logical rules are easy to chain and transfer between systems; however, inductive logic is brittle to noise. We then explore the premise of combining learning with inductive logic into AI Reasoning Systems. Specifically, we summarize findings from PAC learning (conceptual graphs, robust logics, knowledge infusion) and deep learning (DSRL, $\partial$ILP, DeepLogic) by reproducing proofs of tractability, presenting algorithms in pseudocode, highlighting results, and synthesizing between fields. We conclude with suggestions for integrated models by combining the modules listed above and with a list of unsolved (likely intractable) problems.
AI-Powered Social Bots This paper gives an overview of impersonation bots that generate output in one, or possibly, multiple modalities. We also discuss rapidly advancing areas of machine learning and artificial intelligence that could lead to frighteningly powerful new multi-modal social bots. Our main conclusion is that most commonly known bots are one dimensional (i.e., chatterbot), and far from deceiving serious interrogators. However, using recent advances in machine learning, it is possible to unleash incredibly powerful, human-like armies of social bots, in potentially well coordinated campaigns of deception and influence.
AI-Powered Text Generation for Harmonious Human-Machine Interaction: Current State and Future Directions In the last two decades, the landscape of text generation has undergone tremendous changes and is being reshaped by the success of deep learning. New technologies for text generation, ranging from template-based methods to neural network-based methods, have emerged. Meanwhile, the research objectives have also changed from generating smooth and coherent sentences to infusing personalized traits that enrich the diversification of newly generated content. With the rapid development of text generation solutions, a comprehensive survey is urgently needed to summarize the achievements and track the state of the art. In this survey paper, we present a general systematic framework, illustrate the widely utilized models and summarize the classic applications of text generation.
AIR5: Five Pillars of Artificial Intelligence Research In this article, we provide an overview of what we consider to be some of the most pressing research questions facing the field of artificial intelligence (AI), as well as its sub-field of computational intelligence (CI). We demarcate these questions using five unique Rs – namely, (i) rationalizability, (ii) resilience, (iii) reproducibility, (iv) realism, and (v) responsibility. Just as air serves as the basic element of biological life, the term AIR5 – cumulatively referring to the five aforementioned Rs – is introduced herein to mark some of the basic elements of artificial life (supporting the sustained growth of AI and CI). A brief summary of each of the Rs is presented, highlighting their relevance as pillars of future research in this arena.
Algorithm quasi-optimal (AQ) learning The algorithm quasi-optimal (AQ) is a powerful machine learning methodology aimed at learning symbolic decision rules from a set of examples and counterexamples. It was first proposed in the late 1960s to solve the Boolean function satisfiability problem and further refined over the following decade to solve the general covering problem. In its newest implementations, it is a powerful yet little-explored methodology for symbolic machine learning classification. It has been applied to solve several problems from different domains, including the generation of individuals within an evolutionary computation framework. The current article introduces the main concepts of the AQ methodology and describes AQ for source detection (AQ4SD), a tailored implementation of the AQ methodology to solve the problem of finding the sources of atmospheric releases using distributed sensor measurements. The AQ4SD program is tested to find the sources of all the releases of the prairie grass field experiment.
Algorithms and Methods in Recommender Systems Today, there is a wide variety of approaches and algorithms for data filtering and recommendation. In this paper we describe traditional approaches and explain what kinds of modern approaches have been developed lately. Throughout the paper we explain the approaches and their problems using movie recommendations as an example. In the end we show the main challenges recommender systems come across.
Algorithms and Statistical Models for Scientific Discovery in the Petabyte Era The field of astronomy has arrived at a turning point in terms of size and complexity of both datasets and scientific collaboration. Commensurately, algorithms and statistical models have begun to adapt — e.g., via the onset of artificial intelligence — which itself presents new challenges and opportunities for growth. This white paper aims to offer guidance and ideas for how we can evolve our technical and collaborative frameworks to promote efficient algorithmic development and take advantage of opportunities for scientific discovery in the petabyte era. We discuss challenges for discovery in large and complex data sets; challenges and requirements for the next stage of development of statistical methodologies and algorithmic tool sets; how we might change our paradigms of collaboration and education; and the ethical implications of scientists’ contributions to widely applicable algorithms and computational modeling. We start with six distinct recommendations that are supported by the commentary following them. This white paper is related to a larger corpus of effort that has taken place within and around the Petabytes to Science Workshops (https://petabytestoscience.github.io ).
Algorithms for Active Learning This dissertation explores both the algorithmic and statistical aspects of active learning for binary classification. What are effective procedures for determining which data to label? How can these procedures take advantage of the interactive learning process, and in what circumstances do they yield improved learning performance compared to standard passive learners? To answer these questions, we develop and rigorously analyze a broad class of general active learning methods that address the essential algorithmic and statistical difficulties of the problem.
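The central loop of active learning can be pictured with a minimal pool-based uncertainty-sampling sketch: repeatedly fit a model on the labelled set, then query the label of the pool point the model is least sure about. The dataset, classifier, and query budget below are illustrative assumptions, not the procedures analyzed in the dissertation.

```python
# Minimal sketch: pool-based active learning with uncertainty sampling.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
# seed set with a few labels from each class so the first model can be fit
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
pool = [i for i in range(len(X)) if i not in labeled]

for _ in range(20):                                   # query 20 more labels
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])
    margin = np.abs(proba[:, 1] - 0.5)                # small margin = uncertain
    query = pool[int(np.argmin(margin))]              # most uncertain point
    labeled.append(query)                             # "ask the oracle" for y[query]
    pool.remove(query)

clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
print("accuracy with", len(labeled), "labels:", round(clf.score(X, y), 3))
```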
Algorithms for Reinforcement Learning Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner’s predictions. Further, the predictions may have long term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms’ merits and limitations. Reinforcement learning is of great interest because of the large number of practical applications that it can be used to address, ranging from problems in artificial intelligence to operations research or control engineering. In this book, we focus on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming. We give a fairly comprehensive catalog of learning problems, describe the core ideas together with a large number of state of the art algorithms, followed by the discussion of their theoretical properties and limitations.
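Since the book centres on algorithms built on dynamic programming, a minimal value-iteration sketch conveys the core Bellman backup those methods share. The toy two-state MDP below is an illustrative assumption, not an example from the book.

```python
# Minimal sketch: value iteration on a tiny 2-state, 2-action MDP.
import numpy as np

# P[a][s, s'] = transition probability, R[s, a] = expected immediate reward
P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.5, 0.5], [0.1, 0.9]])}
R = np.array([[1.0, 0.0],     # rewards for (state 0, action 0/1)
              [0.0, 2.0]])    # rewards for (state 1, action 0/1)
gamma = 0.95                   # discount factor

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality backup: Q(s, a) = R(s, a) + gamma * sum_s' P(s'|s,a) V(s')
    Q = np.stack([R[:, a] + gamma * P[a] @ V for a in (0, 1)], axis=1)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)      # greedy policy with respect to the optimal values
print("optimal values:", V, "greedy policy:", policy)
```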
Algorithms for the Greater Good! On Mental Modeling and Acceptable Symbiosis in Human-AI Collaboration Effective collaboration between humans and AI-based systems requires effective modeling of the human in the loop, both in terms of the mental state and the physical capabilities of the human. However, these models can also open up pathways for manipulating and exploiting the human in the hopes of achieving some greater good, especially when the intent or values of the AI and the human are not aligned or when they have an asymmetrical relationship with respect to knowledge or computation power. In fact, such behavior does not necessarily require any malicious intent but can rather be borne out of cooperative scenarios. It is also beyond simple misinterpretation of intents, as in the case of value alignment problems, and thus can be effectively engineered if desired. Such techniques already exist and pose several unresolved ethical and moral questions with regards to the design of autonomy. In this paper, we illustrate some of these issues in a teaming scenario and investigate how they are perceived by participants in a thought experiment.
Algorithms in Data Mining using Matrix and Tensor Methods In many fields of science, engineering, and economics large amounts of data are stored and there is a need to analyze these data in order to extract information for various purposes. Data mining is a general concept involving different tools for performing this kind of analysis. The development of mathematical models and efficient algorithms is of key importance. In this thesis we discuss algorithms for the reduced rank regression problem and algorithms for the computation of the best multilinear rank approximation of tensors.
All Neural Networks are Created Equal One of the unresolved questions in the context of deep learning is the triumph of GD based optimization, which is guaranteed to converge to one of many local minima. To shed light on the nature of the solutions that are thus being discovered, we investigate the ensemble of solutions reached by the same network architecture, with different random initialization of weights and random mini-batches. Surprisingly, we observe that these solutions are in fact very similar – more often than not, each train and test example is either classified correctly by all the networks, or by none at all. Moreover, all the networks seem to share the same learning dynamics, whereby initially the same train and test examples are incorporated into the learnt model, followed by other examples which are learnt in roughly the same order. When different neural network architectures are compared, the same learning dynamics is observed even when one architecture is significantly stronger than the other and achieves higher accuracy. Finally, when investigating other methods that involve the gradual refinement of a solution, such as boosting, once again we see the same learning pattern. In all cases, it appears as if all the classifiers start by learning to classify correctly the same train and test examples, while the more powerful classifiers continue to learn to classify correctly additional examples. These results are incredibly robust, observed for a large variety of architectures, hyperparameters and different datasets of images. Thus we observe that different classification solutions may be discovered by different means, but typically they evolve in roughly the same manner and demonstrate a similar success and failure behavior. For a given dataset, such behavior seems to be strongly correlated with effective generalization, while the induced ranking of examples may reflect inherent structure in the data.
AlphaStar: An Evolutionary Computation Perspective In January 2019, DeepMind revealed AlphaStar to the world – the first artificial intelligence (AI) system to beat a professional player at the game of StarCraft II – representing a milestone in the progress of AI. AlphaStar draws on many areas of AI research, including deep learning, reinforcement learning, game theory, and evolutionary computation (EC). In this paper we analyze AlphaStar primarily through the lens of EC, presenting a new look at the system and relating it to many concepts in the field. We highlight some of its most interesting aspects – the use of Lamarckian evolution, competitive co-evolution, and quality diversity. In doing so, we hope to provide a bridge between the wider EC community and one of the most significant AI systems developed in recent times.
Amazon.com Recommendations: Item-to-Item Collaborative Filtering Recommendation algorithms are best known for their use on e-commerce Web sites, where they use input about a customer´s interests to generate a list of recommended items. Many applications use only the items that customers purchase and explicitly rate to represent their interests, but they can also use other attributes, including items viewed, demographic data, subject interests, and favorite artists. At Amazon.com, we use recommendation algorithms to personalize the online store for each customer. The store radically changes based on customer interests, showing programming titles to a software engineer and baby toys to a new mother. The click-through and conversion rates – two important measures of Web-based and email advertising effectiveness – vastly exceed those of untargeted content such as banner advertisements and top-seller lists….
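A minimal sketch of the item-to-item idea described above: precompute an item–item cosine-similarity table from a user–item interaction matrix offline, then score unseen items by their similarity to the items a customer has already interacted with. The tiny ratings matrix is an illustrative assumption, not Amazon's production pipeline.

```python
# Minimal sketch: item-to-item collaborative filtering via cosine similarity.
import numpy as np

# rows = users, columns = items; 1 = purchased/rated, 0 = no interaction
R = np.array([[1, 1, 0, 0, 1],
              [1, 1, 1, 0, 0],
              [0, 0, 1, 1, 0],
              [0, 1, 0, 1, 1]], dtype=float)

# item-item cosine similarity (computed offline in a real system)
norms = np.linalg.norm(R, axis=0, keepdims=True)
sim = (R.T @ R) / (norms.T @ norms + 1e-12)
np.fill_diagonal(sim, 0.0)                    # an item is not its own neighbour

def recommend(user, k=2):
    scores = sim @ R[user]                    # aggregate similarity to owned items
    scores[R[user] > 0] = -np.inf             # do not re-recommend owned items
    return np.argsort(scores)[::-1][:k]

print("recommendations for user 2:", recommend(2))
```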
An Analysis of Hierarchical Text Classification Using Word Embeddings Efficient distributed numerical word representation models (word embeddings) combined with modern machine learning algorithms have recently yielded considerable improvement on automatic document classification tasks. However, the effectiveness of such techniques has not been assessed for the hierarchical text classification (HTC) yet. This study investigates the application of those models and algorithms on this specific problem by means of experimentation and analysis. We trained classification models with prominent machine learning algorithm implementations—fastText, XGBoost, SVM, and Keras’ CNN—and noticeable word embeddings generation methods—GloVe, word2vec, and fastText—with publicly available data and evaluated them with measures specifically appropriate for the hierarchical context. FastText achieved an ${}_{LCA}F_1$ of 0.893 on a single-labeled version of the RCV1 dataset. An analysis indicates that using word embeddings and its flavors is a very promising approach for HTC.
An Analysis of Machine Learning Intelligence Deep neural networks (DNNs) have set state of the art results in many machine learning and NLP tasks. However, we do not have a strong understanding of what DNN models learn. In this paper, we examine learning in DNNs through analysis of their outputs. We compare DNN performance directly to a human population, and use characteristics of individual data points such as difficulty to see how well models perform on easy and hard examples. We investigate how training size and the incorporation of noise affect a DNN’s ability to generalize and learn. Our experiments show that unlike traditional machine learning models (e.g., Naive Bayes, Decision Trees), DNNs exhibit human-like learning properties. As they are trained with more data, they are more able to distinguish between easy and difficult items, and performance on easy items improves at a higher rate than difficult items. We find that different DNN models exhibit different strengths in learning and are robust to noise in training data.
An Analysis of the t-SNE Algorithm for Data Visualization A first line of attack in exploratory data analysis is data visualization, i.e., generating a 2-dimensional representation of data that makes clusters of similar points visually identifiable. Standard Johnson-Lindenstrauss dimensionality reduction does not produce data visualizations. The t-SNE heuristic of van der Maaten and Hinton, which is based on non-convex optimization, has become the de facto standard for visualization in a wide range of applications. This work gives a formal framework for the problem of data visualization – finding a 2-dimensional embedding of clusterable data that correctly separates individual clusters to make them visually identifiable. We then give a rigorous analysis of the performance of t-SNE under a natural, deterministic condition on the ‘ground-truth’ clusters (similar to conditions assumed in earlier analyses of clustering) in the underlying data. These are the first provable guarantees on t-SNE for constructing good data visualizations. We show that our deterministic condition is satisfied by considerably general probabilistic generative models for clusterable data such as mixtures of well-separated log-concave distributions. Finally, we give theoretical evidence that t-SNE provably succeeds in partially recovering cluster structure even when the above deterministic condition is not met.
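A minimal usage sketch of the heuristic being analysed, run on exactly the kind of clusterable data covered by the paper's deterministic condition (a mixture of well-separated Gaussians). The blob parameters and the perplexity setting are illustrative assumptions.

```python
# Minimal sketch: t-SNE embedding of well-separated clusters.
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

X, labels = make_blobs(n_samples=600, centers=4, n_features=50,
                       cluster_std=1.0, random_state=0)
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=8)
plt.title("t-SNE embedding of 4 well-separated clusters")
plt.show()
```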
An Analysis of Visual Question Answering Algorithms In visual question answering (VQA), an algorithm must answer text-based questions about images. While multiple datasets for VQA have been created since late 2014, they all have flaws in both their content and the way algorithms are evaluated on them. As a result, evaluation scores are inflated and predominantly determined by answering easier questions, making it difficult to compare different methods. In this paper, we analyze existing VQA algorithms using a new dataset. It contains over 1.6 million questions organized into 12 different categories. We also introduce questions that are meaningless for a given image to force a VQA system to reason about image content. We propose new evaluation schemes that compensate for over-represented question-types and make it easier to study the strengths and weaknesses of algorithms. We analyze the performance of both baseline and state-of-the-art VQA models, including multi-modal compact bilinear pooling (MCB), neural module networks, and recurrent answering units. Our experiments establish how attention helps certain categories more than others, determine which models work better than others, and explain how simple models (e.g. MLP) can surpass more complex models (MCB) by simply learning to answer large, easy question categories.
An Attentive Survey of Attention Models The attention model has now become an important concept in neural networks and has been researched within diverse application domains. This survey provides a structured and comprehensive overview of the developments in modeling attention. In particular, we propose a taxonomy which groups existing techniques into coherent categories. We review the different neural architectures in which attention has been incorporated, and also show how attention improves the interpretability of neural models. Finally, we discuss some applications in which modeling attention has a significant impact. We hope this survey will provide a succinct introduction to attention models and guide practitioners while developing approaches for their applications.
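Most of the architectures such a survey covers share the scaled dot-product attention primitive, which can be written in a few lines. The shapes and random inputs below are illustrative assumptions.

```python
# Minimal sketch: scaled dot-product attention in plain NumPy.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Q: (n_q, d), K: (n_k, d), V: (n_k, d_v) -> output (n_q, d_v)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # similarity of queries to keys
    weights = softmax(scores, axis=-1)        # attention distribution per query
    return weights @ V, weights               # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(5, 8)), rng.normal(size=(5, 16))
out, w = attention(Q, K, V)
print(out.shape, w.sum(axis=-1))              # (3, 16); each row of w sums to 1
```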
An Economist´s Guide to Visualizing Data Once upon a time, a picture was worth a thousand words. But with online news, blogs, and social media, a good picture can now be worth so much more. Economists who want to disseminate their research, both inside and outside the seminar room, should invest some time in thinking about how to construct compelling and effective graphics. An effective graph should tap into the brain´s ‘pre-attentive visual processing’ (Few 2004; Healey and Enns 2012). Because our eyes detect a limited set of visual characteristics, such as shape or contrast, we easily combine those characteristics and unconsciously perceive them as an image. In contrast to ‘attentive processing’ – the conscious part of perception that allows us to perceive things serially – pre-attentive processing is done in parallel and is much faster. Pre-attentive processing allows the reader to perceive multiple basic visual elements simultaneously….
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling For most deep learning practitioners, sequence modeling is synonymous with recurrent networks. Yet recent results indicate that convolutional architectures can outperform recurrent networks on tasks such as audio synthesis and machine translation. Given a new sequence modeling task or dataset, which architecture should one use? We conduct a systematic evaluation of generic convolutional and recurrent architectures for sequence modeling. The models are evaluated across a broad range of standard tasks that are commonly used to benchmark recurrent networks. Our results indicate that a simple convolutional architecture outperforms canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory. We conclude that the common association between sequence modeling and recurrent networks should be reconsidered, and convolutional networks should be regarded as a natural starting point for sequence modeling tasks.
An Essay on Optimization Mystery of Deep Learning Despite the huge empirical success of deep learning, theoretical understanding of neural networks learning process is still lacking. This is the reason, why some of its features seem ‘mysterious’. We emphasize two mysteries of deep learning: generalization mystery, and optimization mystery. In this essay we review and draw connections between several selected works concerning the latter.
An Example Inference Task: Clustering Human brains are good at finding regularities in data. One way of expressing regularity is to put a set of objects into groups that are similar to each other. For example, biologists have found that most objects in the natural world fall into one of two categories: things that are brown and run away, and things that are green and don’t run away. The first group they call animals, and the second, plants. We’ll call this operation of grouping things together clustering. If the biologist further sub-divides the cluster of plants into sub-clusters, we would call this ‘hierarchical clustering’; but we won’t be talking about hierarchical clustering yet. In this chapter we’ll just discuss ways to take a set of N objects and group them into K clusters.
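A minimal K-means sketch of the operation just described: group N points into K clusters by alternating an assignment step and a mean-update step. The synthetic 2-D data and the choice K = 3 are illustrative assumptions.

```python
# Minimal sketch: K-means clustering of N points into K clusters.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(100, 2))
               for c in ([0, 0], [3, 0], [0, 3])])      # N = 300 points

K = 3
centers = X[rng.choice(len(X), size=K, replace=False)]  # random initialisation
for _ in range(100):
    # assignment step: each point joins its nearest cluster centre
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
    assign = d.argmin(axis=1)
    # update step: move each centre to the mean of its assigned points
    new_centers = np.array([X[assign == k].mean(axis=0) for k in range(K)])
    if np.allclose(new_centers, centers):
        break
    centers = new_centers

print("cluster centres:\n", centers)
```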
An Experimental Study of Algorithms for Online Bipartite Matching We perform an experimental study of algorithms for online bipartite matching under the known i.i.d. input model with integral types. In the last decade, there has been substantial effort in designing complex algorithms with the goal of improving worst-case approximation ratios. Our goal is to determine how these algorithms perform on more practical instances rather than worst-case instances. In particular, we are interested in whether the ranking of the algorithms by their worst-case performance is consistent with the ranking of the algorithms by their average-case/practical performance. We are also interested in whether preprocessing times and implementation difficulties that are introduced by these algorithms are justified in practice. To that end we evaluate these algorithms on different random inputs as well as real-life instances obtained from publicly available repositories. We compare these algorithms against several simple greedy-style algorithms. Most of the complex algorithms in the literature are presented as being non-greedy (i.e., an algorithm can intentionally skip matching a node that has available neighbors) to simplify the analysis. Every such algorithm can be turned into a greedy one without hurting its worst-case performance. On our benchmarks, non-greedy versions of these algorithms perform much worse than their greedy versions. Greedy versions perform about as well as the simplest greedy algorithm by itself. This, together with our other findings, suggests that simplest greedy algorithms are competitive with the state-of-the-art worst-case algorithms for online bipartite matching on many average-case and practical input families. Greediness is by far the most important property of online algorithms for bipartite matching.
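The simplest greedy baseline that the study above compares against can be sketched in a few lines: match each arriving online node to any free offline neighbour, never skipping a match when one is available. The random bipartite graph below is an illustrative assumption, not one of the paper's benchmark instances.

```python
# Minimal sketch: the simplest greedy algorithm for online bipartite matching.
import numpy as np

rng = np.random.default_rng(0)
n_offline, n_online = 20, 20
# adjacency[i, j] = True if online node j is connected to offline node i
adjacency = rng.random((n_offline, n_online)) < 0.2

matched_offline = set()
matching = {}
for j in range(n_online):                      # online nodes arrive one by one
    neighbours = np.flatnonzero(adjacency[:, j])
    free = [i for i in neighbours if i not in matched_offline]
    if free:                                   # greedy: never skip a free neighbour
        i = free[0]
        matching[j] = i
        matched_offline.add(i)

print("matched", len(matching), "of", n_online, "online nodes")
```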
An exploration of algorithmic discrimination in data and classification Algorithmic discrimination is an important aspect when data is used for predictive purposes. This paper analyzes the relationships between discrimination and classification, data set partitioning, and decision models, as well as correlation. The paper uses real world data sets to demonstrate the existence of discrimination and the independence between the discrimination of data sets and the discrimination of classification models.
An Impossibility Theorem for Clustering Although the study of clustering is centered around an intuitively compelling goal, it has been very difficult to develop a unified framework for reasoning about it at a technical level, and profoundly diverse approaches to clustering abound in the research community. Here we suggest a formal perspective on the difficulty in finding such a unification, in the form of an impossibility theorem: for a set of three simple properties, we show that there is no clustering function satisfying all three. Relaxations of these properties expose some of the interesting (and unavoidable) trade-offs at work in well-studied clustering techniques such as single-linkage, sum-of-pairs, k-means, and k-median.
An Information-Theoretic Analysis of Deep Latent-Variable Models We present an information-theoretic framework for understanding trade-offs in unsupervised learning of deep latent-variable models using variational inference. This framework emphasizes the need to consider latent-variable models along two dimensions: the ability to reconstruct inputs (distortion) and the communication cost (rate). We derive the optimal frontier of generative models in the two-dimensional rate-distortion plane, and show how the standard evidence lower bound objective is insufficient to select between points along this frontier. However, by performing targeted optimization to learn generative models with different rates, we are able to learn many models that can achieve similar generative performance but make vastly different trade-offs in terms of the usage of the latent variable. Through experiments on MNIST and Omniglot with a variety of architectures, we show how our framework sheds light on many recently proposed extensions to the variational autoencoder family.
An Information-Theoretic View for Deep Learning Deep learning has transformed computer vision, natural language processing and speech recognition. However, the following two critical questions remain obscure: (1) why do deep neural networks generalize better than shallow networks? (2) Does it always hold that a deeper network leads to better performance? Specifically, letting $L$ be the number of convolutional and pooling layers in a deep neural network, and $n$ be the size of the training sample, we derive the upper bound on the expected generalization error for this network, i.e., \begin{eqnarray*} \mathbb{E}[R(W)-R_S(W)] \leq \exp{\left(-\frac{L}{2}\log{\frac{1}{\eta}}\right)}\sqrt{\frac{2\sigma^2}{n}I(S,W) } \end{eqnarray*} where $\sigma >0$ is a constant depending on the loss function, $0<\eta<1$ is a constant depending on the information loss for each convolutional or pooling layer, and $I(S, W)$ is the mutual information between the training sample $S$ and the output hypothesis $W$. This upper bound shows that: (1) as the network increases its number of convolutional and pooling layers $L$, the expected generalization error decreases exponentially to zero. Layers with strict information loss, such as the convolutional layers, reduce the generalization error of deep learning algorithms. This answers the first question. However, (2) a zero expected generalization error does not imply a small test error or a small $\mathbb{E}[R(W)]$. This is because $\mathbb{E}[R_S(W)]$ will be large when the information needed to fit the data is lost as the number of layers increases. This suggests that the claim 'the deeper the better' is conditioned on a small training error or small $\mathbb{E}[R_S(W)]$.
An Interdisciplinary Comparison of Sequence Modeling Methods for Next-Element Prediction Data of sequential nature arise in many application domains in the form of, e.g., textual data, DNA sequences, and software execution traces. Different research disciplines have developed methods to learn sequence models from such datasets: (i) in the machine learning field, methods such as (hidden) Markov models and recurrent neural networks have been developed and successfully applied to a wide range of tasks, (ii) in process mining, process discovery techniques aim to generate human-interpretable descriptive models, and (iii) in the grammar inference field, the focus is on finding descriptive models in the form of formal grammars. Despite their different focuses, these fields share a common goal – learning a model that accurately describes the behavior in the underlying data. Those sequence models are generative, i.e., they can predict what elements are likely to occur after a given unfinished sequence. So far, these fields have developed mainly in isolation from each other and no comparison exists. This paper presents an interdisciplinary experimental evaluation that compares sequence modeling techniques on the task of next-element prediction on four real-life sequence datasets. The results indicate that, in terms of accuracy, machine learning techniques, which generally do not aim at interpretability, outperform the techniques from the process mining and grammar inference fields that aim to yield interpretable models.
An Interpretable Compression and Classification System: Theory and Applications This study proposes a low-complexity interpretable classification system. The proposed system contains three main modules: feature extraction, feature reduction, and classification. All of them are linear. Thanks to the linear property, the extracted and reduced features can be inverted back to the original data, like a linear transform such as the Fourier transform, so that one can quantify and visualize the contribution of individual features towards the original data. Also, the reduced features and the reversibility naturally endow the proposed system with the ability to compress data. This system can significantly compress data with a small percentage deviation between the compressed and the original data. At the same time, when the compressed data is used for classification, it still achieves high testing accuracy. Furthermore, we observe that the extracted features of the proposed system can be approximated by uncorrelated Gaussian random variables. Hence, classical theory in estimation and detection can be applied for classification. This motivates us to propose a MAP (maximum a posteriori) based classification method. As a result, the extracted features and the corresponding performance have statistical meaning and are mathematically interpretable. Simulation results show that the proposed classification system not only enjoys significantly reduced training and testing time but also achieves high testing accuracy compared to conventional schemes.
An Introduction to Advanced Analytics Advanced Analytics is ‘the analysis of all kinds of data using sophisticated quantitative methods (for example, statistics, descriptive and predictive data mining, simulation and optimization) to produce insights that traditional approaches to business intelligence (BI) – such as query and reporting – are unlikely to discover.’
An Introduction to Advanced Machine Learning : Meta Learning Algorithms, Applications and Promises In [1, 2], we have explored the theoretical aspects of feature extraction optimization processes for solving large-scale problems and overcoming machine learning limitations. The majority of the optimization algorithms introduced in [1, 2] guarantee the optimal performance of supervised learning, given offline and discrete data, to deal with the curse of dimensionality (CoD) problem. These algorithms, however, are not tailored for solving emerging learning problems. One of the important issues caused by online data is the lack of sufficient samples per class. Further, traditional machine learning algorithms cannot achieve accurate training based on limited distributed data, as data has proliferated and dispersed significantly. Machine learning employs a strict model or embedded engine to train and predict, which still fails to learn unseen classes and to make sufficient use of online data. In this chapter, we introduce these challenges elaborately. We further investigate Meta-Learning (MTL) algorithms, and their application and promise for solving the emerging problems by answering how autonomous agents can learn to learn.
An Introduction to Artificial Intelligence Applied to Multimedia In this chapter, we give an introduction to symbolic artificial intelligence (AI) and discuss its relation and application to multimedia. We begin by defining what symbolic AI is, what distinguishes it from non-symbolic approaches, such as machine learning, and how it can be used in the construction of advanced multimedia applications. We then introduce description logic (DL) and use it to discuss symbolic representation and reasoning. DL is the logical underpinning of OWL, the most successful family of ontology languages. After discussing DL, we present OWL and related Semantic Web technologies, such as RDF and SPARQL. We conclude the chapter by discussing a hybrid model for multimedia representation, called Hyperknowledge. Throughout the text, we make references to technologies and extensions specifically designed to solve the kinds of problems that arise in multimedia representation.
An Introduction to Bayesian Networks: Concepts and Learning from Data (Slide Deck)
An Introduction to Causal Inference This paper summarizes recent advances in causal inference and underscores the paradigmatic shifts that must be undertaken in moving from traditional statistical analysis to causal analysis of multivariate data. Special emphasis is placed on the assumptions that underlie all causal inferences, the languages used in formulating those assumptions, the conditional nature of all causal and counterfactual claims, and the methods that have been developed for the assessment of such claims. These advances are illustrated using a general theory of causation based on the Structural Causal Model (SCM) described in Pearl (2000a), which subsumes and unifies other approaches to causation, and provides a coherent mathematical foundation for the analysis of causes and counterfactuals. In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: those about (1) the effects of potential interventions, (2) probabilities of counterfactuals, and (3) direct and indirect effects (also known as ‘mediation’). Finally, the paper defines the formal and conceptual relationships between the structural and potential-outcome frameworks and presents tools for a symbiotic analysis that uses the strong features of both. The tools are demonstrated in the analyses of mediation, causes of effects, and probabilities of causation.
An Introduction to Cluster Analysis for Data Mining Cluster analysis divides data into meaningful or useful groups (clusters). If meaningful clusters are the goal, then the resulting clusters should capture the ‘natural’ structure of the data. For example, cluster analysis has been used to group related documents for browsing, to find genes and proteins that have similar functionality, and to provide a grouping of spatial locations prone to earthquakes. However, in other cases, cluster analysis is only a useful starting point for other purposes, e.g., data compression or efficiently finding the nearest neighbors of points. Whether for understanding or utility, cluster analysis has long been used in a wide variety of fields: psychology and other social sciences, biology, statistics, pattern recognition, information retrieval, machine learning, and data mining. The scope of this paper is modest: to provide an introduction to cluster analysis in the field of data mining, where we define data mining to be the discovery of useful, but non-obvious, information or patterns in large collections of data. Much of this paper is necessarily consumed with providing a general background for cluster analysis, but we also discuss a number of clustering techniques that have recently been developed specifically for data mining. While the paper strives to be self-contained from a conceptual point of view, many details have been omitted. Consequently, many references to relevant books and papers are provided.
An Introduction To Compressive Sampling This article surveys the theory of compressive sampling, also known as compressed sensing or CS, a novel sensing/sampling paradigm that goes against the common wisdom in data acquisition. CS theory asserts that one can recover certain signals and images from far fewer samples or measurements than traditional methods use. To make this possible, CS relies on two principles: sparsity, which pertains to the signals of interest, and incoherence, which pertains to the sensing modality.
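A minimal sketch of the recovery problem described above: a sparse signal is measured with a random (incoherent) matrix and reconstructed from far fewer measurements than its length via l1-regularised least squares. The signal length, sparsity level, measurement count, and the use of scikit-learn's Lasso as the l1 solver are illustrative assumptions.

```python
# Minimal sketch: sparse recovery from compressive measurements via the Lasso.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, k, m = 200, 5, 60                       # signal length, nonzeros, measurements
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)   # sparse signal

A = rng.normal(size=(m, n)) / np.sqrt(m)   # random, incoherent sensing matrix
y = A @ x                                   # m << n linear measurements

x_hat = Lasso(alpha=1e-3, max_iter=50000, fit_intercept=False).fit(A, y).coef_
print("relative recovery error:", np.linalg.norm(x_hat - x) / np.linalg.norm(x))
```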
An Introduction to Deep Reinforcement Learning Deep reinforcement learning is the combination of reinforcement learning (RL) and deep learning. This field of research has been able to solve a wide range of complex decision-making tasks that were previously out of reach for a machine. Thus, deep RL opens up many new applications in domains such as healthcare, robotics, smart grids, finance, and many more. This manuscript provides an introduction to deep reinforcement learning models, algorithms and techniques. Particular focus is on the aspects related to generalization and how deep RL can be used for practical applications. We assume the reader is familiar with basic machine learning concepts.
An introduction to domain adaptation and transfer learning In machine learning, if the training data is an unbiased sample of an underlying distribution, then the learned classification function will make accurate predictions for new samples. However, if the training data is not an unbiased sample, then there will be differences between how the training data is distributed and how the test data is distributed. Standard classifiers cannot cope with changes in data distributions between training and test phases, and will not perform well. Domain adaptation and transfer learning are sub-fields within machine learning that are concerned with accounting for these types of changes. Here, I present an introduction to these fields, guided by the question: when and how can a classifier generalize from a source to a target domain? I will start with a brief introduction into risk minimization, and how transfer learning and domain adaptation expand upon this framework. Following that, I discuss three special cases of data set shift, namely prior, covariate and concept shift. For more complex domain shifts, there are a wide variety of approaches. These are categorized into: importance-weighting, subspace mapping, domain-invariant spaces, feature augmentation, minimax estimators and robust algorithms. A number of points will arise, which I will discuss in the last section. I conclude with the remark that many open questions will have to be addressed before transfer learners and domain-adaptive classifiers become practical.
An Introduction to Factor Graphs A large variety of algorithms in coding, signal processing, and artificial intelligence may be viewed as instances of the summary-product algorithm (or belief/probability propagation algorithm), which operates by message passing in a graphical model. Specific instances of such algorithms include Kalman filtering and smoothing; the forward-backward algorithm for hidden Markov models; probability propagation in Bayesian networks; and decoding algorithms for error-correcting codes such as the Viterbi algorithm, the BCJR algorithm, and the iterative decoding of turbo codes, low-density parity-check (LDPC) codes, and similar codes. New algorithms for complex detection and estimation problems can also be derived as instances of the summary-product algorithm. In this article, we give an introduction to this unified perspective in terms of (Forney-style) factor graphs.
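One of the instances listed above, the forward-backward algorithm for hidden Markov models, can be written as a short message-passing sketch: forward messages alpha and backward messages beta are combined into per-step state posteriors. The two-state HMM parameters and the observation sequence are illustrative assumptions.

```python
# Minimal sketch: forward-backward (sum-product) inference in a 2-state HMM.
import numpy as np

pi = np.array([0.6, 0.4])                  # initial state distribution
A = np.array([[0.7, 0.3],                  # A[i, j] = P(next = j | current = i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],                  # B[i, o] = P(obs = o | state = i)
              [0.2, 0.8]])
obs = [0, 0, 1, 0, 1, 1]

T, S = len(obs), len(pi)
alpha = np.zeros((T, S))                   # forward messages
beta = np.ones((T, S))                     # backward messages

alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

posterior = alpha * beta
posterior /= posterior.sum(axis=1, keepdims=True)   # P(state_t | all observations)
print(posterior)
```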
An Introduction to Fuzzy and Annotated Semantic Web Languages We present the state of the art in representing and reasoning with fuzzy knowledge in Semantic Web Languages such as triple languages RDF/RDFS, conceptual languages of the OWL 2 family and rule languages. We further show how one may generalise them to so-called annotation domains, that cover also e.g. temporal and provenance extensions.
An introduction to Graph Data Management A graph database is a database where the data structures for the schema and/or instances are modeled as a (labeled)(directed) graph or generalizations of it, and where querying is expressed by graph-oriented operations and type constructors. In this article we present the basic notions of graph databases, give an historical overview of its main development, and study the main current systems that implement them.
An introduction to graphical models The following quotation, from the Preface, provides a very concise introduction to graphical models: Graphical models are a marriage between probability theory and graph theory. They provide a natural tool for dealing with two problems that occur throughout applied mathematics and engineering – uncertainty and complexity – and in particular they are playing an increasingly important role in the design and analysis of machine learning algorithms. Fundamental to the idea of a graphical model is the notion of modularity – a complex system is built by combining simpler parts. Probability theory provides the glue whereby the parts are combined, ensuring that the system as a whole is consistent, and providing ways to interface models to data. The graph theoretic side of graphical models provides both an intuitively appealing interface by which humans can model highly-interacting sets of variables as well as a data structure that lends itself naturally to the design of efficient general-purpose algorithms. Many of the classical multivariate probabilistic systems studied in fields such as statistics, systems engineering, information theory, pattern recognition and statistical mechanics are special cases of the general graphical model formalism – examples include mixture models, factor analysis, hidden Markov models, Kalman filters and Ising models. The graphical model framework provides a way to view all of these systems as instances of a common underlying formalism. This view has many advantages – in particular, specialized techniques that have been developed in one field can be transferred between research communities and exploited more widely. Moreover, the graphical model formalism provides a natural framework for the design of new systems.
An introduction to graphical models (Slide Deck)
An introduction to high-dimensional statistics In this note, we aim to give a very brief introduction to high-dimensional statistics. Rather than attempting to give an overview of this vast area, we will explain what is meant by high-dimensional data and then focus on two methods which have been introduced to deal with this sort of data. Many of the state of the art techniques used in high-dimensional statistics today are based on these two core methods. We begin with a quick recap of least squares regression.
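The least squares starting point of that recap, in code: fit the coefficients minimising the squared residual in the classical regime where the sample size exceeds the number of predictors. The simulated design and true coefficients are illustrative assumptions.

```python
# Minimal sketch: ordinary least squares in the classical n >> p regime.
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5                                  # many more samples than predictors
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.0, 0.5, 3.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)   # noisy linear responses

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # argmin ||y - X beta||^2
print("estimate:", np.round(beta_hat, 2))
# When p > n the system is underdetermined, which is what motivates the
# penalised methods discussed in the note.
```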
An Introduction to Image Synthesis with Generative Adversarial Nets There has been a drastic growth of research in Generative Adversarial Nets (GANs) in the past few years. Proposed in 2014, GAN has been applied to various applications such as computer vision and natural language processing, and achieves impressive performance. Among the many applications of GAN, image synthesis is the most well-studied one, and research in this area has already demonstrated the great potential of using GAN in image synthesis. In this paper, we provide a taxonomy of methods used in image synthesis, review different models for text-to-image synthesis and image-to-image translation, and discuss some evaluation metrics as well as possible future research directions in image synthesis with GAN.
An Introduction to Inductive Statistical Inference — from Parameter Estimation to Decision-Making These lecture notes aim at a post-Bachelor audience with a background at an introductory level in Applied Mathematics and Applied Statistics. They discuss the logic and methodology of the Bayes-Laplace approach to inductive statistical inference that places common sense and the guiding lines of the scientific method at the heart of systematic analyses of quantitative-empirical data. Following an exposition of exactly solvable cases of single- and two-parameter estimation, the main focus is laid on Markov Chain Monte Carlo (MCMC) simulations on the basis of Gibbs sampling and Hamiltonian Monte Carlo sampling of posterior joint probability distributions for regression parameters occurring in generalised linear models. The modelling of fixed as well as of varying effects (varying intercepts) is considered, and the simulation of posterior predictive distributions is outlined. The issues of model comparison with Bayes factors and the assessment of models’ relative posterior predictive accuracy with information entropy-based criteria DIC and WAIC are addressed. Concluding, a conceptual link to the behavioural subjective expected utility representation of a single decision-maker’s choice behaviour in static one-shot decision problems is established. Codes for MCMC simulations of multi-dimensional posterior joint probability distributions with the JAGS and Stan packages implemented in the statistical software R are provided. The lecture notes are fully hyperlinked. They direct the reader to original scientific research papers and to pertinent biographical information.
An Introduction to Latent Semantic Analysis The question of knowledge induction, i.e. how children are able to learn so much about, say, what words mean without any explicit instruction, is one that has vexed philosophers, linguists, and psychologists alike. Indeed, inferring the vast amount of knowledge that children learn almost effortlessly from an apparently ‘impoverished stimulus’ seems paradoxical. The Latent Semantic Analysis model (Landauer and Dumais, 1997) is a theory for how meaning representations might be learned from encountering large samples of language without explicit directions as to how it is structured. To do this, LSA makes two assumptions about how the meaning of linguistic expressions is present in the distributional patterns of simple expressions (e.g. words) within more complex expressions (e.g. sentences and paragraphs) viewed across many samples of language….
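A minimal sketch of the LSA recipe (term-document matrix plus truncated SVD), using scikit-learn on an invented toy corpus; the corpus and component count are illustrative assumptions, not taken from the paper:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# Toy corpus: LSA builds a term-document matrix and factors it with a
# truncated SVD so that words and documents appearing in similar contexts
# end up close together in the reduced "semantic" space.
docs = [
    "the cat sat on the mat",
    "a dog chased the cat",
    "stocks fell as markets reacted",
    "investors sold stocks and bonds",
]
tfidf = TfidfVectorizer().fit_transform(docs)     # documents x terms
lsa = TruncatedSVD(n_components=2, random_state=0)
doc_vectors = lsa.fit_transform(tfidf)            # documents in latent space
print(np.round(doc_vectors, 2))  # pet-related and finance-related docs separate
```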
An Introduction to Latent Variable Mixture Modeling (Part 1): Overview and Cross-Sectional Latent Class and Latent Profile Analyses Objective: Pediatric psychologists are often interested in finding patterns in heterogeneous cross-sectional data. Latent variable mixture modeling is an emerging person-centered statistical approach that models heterogeneity by classifying individuals into unobserved groupings (latent classes) with similar (more homogenous) patterns. The purpose of this article is to offer a nontechnical introduction to cross-sectional mixture modeling. Method: An overview of latent variable mixture modeling is provided and 2 cross-sectional examples are reviewed and distinguished. Results: Step-by-step pediatric psychology examples of latent class and latent profile analyses are provided using the Early Childhood Longitudinal Study-Kindergarten Class of 1998-1999 data file. Conclusions: Latent variable mixture modeling is a technique that is useful to pediatric psychologists who wish to find groupings of individuals who share similar data patterns to determine the extent to which these patterns may relate to variables of interest.
An Introduction to Latent Variable Mixture Modeling (Part 2): Longitudinal Latent Class Growth Analysis and Growth Mixture Models Objective: Pediatric psychologists are often interested in finding patterns in heterogeneous longitudinal data. Latent Variable Mixture Modeling is an emerging statistical approach that models such heterogeneity by classifying individuals into unobserved groupings (latent classes) with similar (more homogenous) patterns. The purpose of the second of a two article set is to offer a nontechnical introduction to longitudinal latent variable mixture modeling. Methods: 3 latent variable approaches to modeling longitudinal data are reviewed and distinguished. Results: Step-by-step pediatric psychology examples of latent growth curve modeling, latent class growth analysis, and growth mixture modeling are provided using the Early Childhood Longitudinal Study-Kindergarten Class of 1998-99 data file. Conclusions: Latent variable mixture modeling is a technique that is useful to pediatric psychologists who wish to find groupings of individuals who share similar longitudinal data patterns to determine the extent to which these patterns may relate to variables of interest.
An Introduction to Mathematical Optimal Control Theory Version 0.2 These notes build upon a course I taught at the University of Maryland during the fall of 1983. My great thanks go to Martino Bardi, who took careful notes, saved them all these years and recently mailed them to me. Faye Yeager typed up his notes into a first draft of these lectures as they now appear. Scott Armstrong read over the notes and suggested many improvements: thanks, Scott. Stephen Moye of the American Math Society helped me a lot with AMSTeX versus LaTeX issues. My thanks also to Atilla Yilmaz for spotting lots of typos and errors, which I have corrected. I have radically modified much of the notation (to be consistent with my other writings), updated the references, added several new examples, and provided a proof of the Pontryagin Maximum Principle. As this is a course for undergraduates, I have dispensed in certain proofs with various measurability and continuity issues, and as compensation have added various critiques as to the lack of total rigor. This current version of the notes is not yet complete, but meets I think the usual high standards for material posted on the internet. Please email me at evans@math.berkeley.edu with any corrections or comments.
An introduction to modern missing data analyses A great deal of recent methodological research has focused on two modern missing data analysis methods: maximum likelihood and multiple imputation. These approaches are advantageous relative to traditional techniques (e.g. deletion and mean imputation) because they require less stringent assumptions and mitigate the pitfalls of traditional techniques. This article explains the theoretical underpinnings of missing data analyses, gives an overview of traditional missing data techniques, and provides accessible descriptions of maximum likelihood and multiple imputation. In particular, this article focuses on maximum likelihood estimation and presents two analysis examples from the Longitudinal Study of American Youth data. One of these examples includes a description of the use of auxiliary variables. Finally, the paper illustrates ways that researchers can use intentional, or planned, missing data to enhance their research designs.
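A small sketch contrasting traditional mean imputation with a model-based (regression) imputer, using scikit-learn on synthetic data; the data and the use of IterativeImputer as a stand-in for the multiple-imputation ideas discussed in the article are assumptions for illustration only:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer

# Two correlated variables with values missing at random.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.5, size=200)
data = np.column_stack([x, y])
data[rng.random(data.shape) < 0.2] = np.nan

# Traditional approach: mean imputation (ignores the x-y relationship).
mean_filled = SimpleImputer(strategy="mean").fit_transform(data)

# Model-based approach: iterative, regression-based imputation, in the
# spirit of the modern methods the article describes.
model_filled = IterativeImputer(random_state=0).fit_transform(data)

# Mean imputation attenuates the correlation; model-based imputation preserves it.
print(np.corrcoef(mean_filled.T)[0, 1], np.corrcoef(model_filled.T)[0, 1])
```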
An Introduction to Multivariate Statistics The term ‘multivariate statistics’ is appropriately used to include all statistics where there are more than two variables simultaneously analyzed. You are already familiar with bivariate statistics such as the Pearson product moment correlation coefficient and the independent groups t-test. A one-way ANOVA with 3 or more treatment groups might also be considered a bivariate design, since there are two variables: one independent variable and one dependent variable. Statistically, one could consider the one-way ANOVA as either a bivariate curvilinear regression or as a multiple regression with the K level categorical independent variable dummy coded into K-1 dichotomous variables.
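To make the closing claim concrete, here is a minimal sketch (synthetic data assumed) of a 3-group one-way ANOVA recast as a regression on K-1 dummy-coded indicators:

```python
import numpy as np

# One-way ANOVA with K = 3 groups, recast as a regression on K-1 dummies.
rng = np.random.default_rng(2)
groups = np.repeat([0, 1, 2], 30)
y = np.array([1.0, 2.0, 3.5])[groups] + rng.normal(size=90)

# Dummy-code the 3-level factor into 2 indicator columns (group 0 = baseline).
X = np.column_stack([
    np.ones_like(y),              # intercept = mean of the baseline group
    (groups == 1).astype(float),
    (groups == 2).astype(float),
])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[1] and beta[2] estimate the group-1 and group-2 mean differences from
# the baseline group, exactly the contrasts a one-way ANOVA would test.
print(np.round(beta, 2))
```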
An Introduction to Neural Networks An accurate forecast into the future can offer tremendous value in areas as diverse as financial market price movements, financial expense budget forecasts, website clickthrough likelihoods, insurance risk, and drug compound efficacy, to name just a few. Many algorithm techniques, ranging from regression analysis to ARIMA for time series, among others, are regularly used to generate forecasts. A neural network approach provides a forecasting technique that can operate in circumstances where classical techniques cannot perform or do not generate the desired accuracy in a forecast.
An Introduction to Ontology Learning Ever since the early days of Artificial Intelligence and the development of the first knowledge-based systems in the 70s, people have dreamt of self-learning machines. When knowledge-based systems grew larger and the commercial interest in these technologies increased, people became aware of the knowledge acquisition bottleneck and the necessity to (partly) automate the creation and maintenance of knowledge bases. Today, many applications which exhibit ´intelligent´ behavior thanks to symbolic knowledge representation and logical inference rely on ontologies and the standards provided by the World Wide Web Consortium (W3C). Supporting the construction of ontologies and populating them with instantiations of both concepts and relations is commonly referred to as ontology learning. Early research in ontology learning has concentrated on the extraction of facts or schema-level knowledge from textual resources, building upon earlier work in the field of computational linguistics and lexical acquisition. However, as we will show in this book, ontology learning is a very diverse and interdisciplinary field of research. Ontology learning approaches are as heterogeneous as the sources of data on the web, and as different from one another as the types of knowledge representations called ‘ontologies’. In the remainder of this introduction, we briefly summarize the state of the art in ontology learning and elaborate on what we consider the key challenges for current and future ontology learning research.
An Introduction to Probabilistic Programming This document is designed to be a first-year graduate-level introduction to probabilistic programming. It not only provides a thorough background for anyone wishing to use a probabilistic programming system, but also introduces the techniques needed to design and build these systems. It is aimed at people who have an undergraduate-level understanding of either or, ideally, both probabilistic machine learning and programming languages. We start with a discussion of model-based reasoning and explain why conditioning as a foundational computation is central to the fields of probabilistic machine learning and artificial intelligence. We then introduce a simple first-order probabilistic programming language (PPL) whose programs define static-computation-graph, finite-variable-cardinality models. In the context of this restricted PPL we introduce fundamental inference algorithms and describe how they can be implemented in the context of models denoted by probabilistic programs. In the second part of this document, we introduce a higher-order probabilistic programming language, with a functionality analogous to that of established programming languages. This affords the opportunity to define models with dynamic computation graphs, at the cost of requiring inference methods that generate samples by repeatedly executing the program. Foundational inference algorithms for this kind of probabilistic programming language are explained in the context of an interface between program executions and an inference controller. This document closes with a chapter on advanced topics which we believe to be, at the time of writing, interesting directions for probabilistic programming research; directions that point towards a tight integration with deep neural network research and the development of systems for next-generation artificial intelligence applications.
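As a toy illustration of conditioning as a foundational computation, the following sketch (not from the document; a conjugate normal model is assumed) approximates a posterior by likelihood weighting, i.e. repeatedly executing a tiny generative "program" and weighting each run by the likelihood of the observation:

```python
import numpy as np

# Conditioning by likelihood weighting: run the generative "program" many
# times, weight each run by the likelihood of the observed data, and use
# the normalized weights to approximate the posterior.
rng = np.random.default_rng(3)
observed = 1.2                          # observed data point
n_runs = 50_000

mu = rng.normal(0.0, 1.0, size=n_runs)  # prior: mu ~ N(0, 1)
log_w = -0.5 * (observed - mu) ** 2     # likelihood: y ~ N(mu, 1), up to a constant
w = np.exp(log_w - log_w.max())
w /= w.sum()

posterior_mean = np.sum(w * mu)
print(posterior_mean)   # analytic posterior mean is observed / 2 = 0.6
```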
An introduction to ROC analysis Receiver operating characteristics (ROC) graphs are useful for organizing classifiers and visualizing their performance. ROC graphs are commonly used in medical decision making, and in recent years have been used increasingly in machine learning and data mining research. Although ROC graphs are apparently simple, there are some common misconceptions and pitfalls when using them in practice. The purpose of this article is to serve as an introduction to ROC graphs and as a guide for using them in research.
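A minimal sketch of constructing an ROC graph and its AUC with scikit-learn; the labels and scores are invented for illustration:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Scores from some classifier and the corresponding true binary labels.
y_true  = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.5, 0.7, 0.6, 0.3])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
# Each (fpr, tpr) pair is one point on the ROC graph, obtained by sweeping
# the decision threshold over the scores; the AUC summarizes the whole curve.
print(list(zip(np.round(fpr, 2), np.round(tpr, 2))), "AUC =", auc)
```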
An Introduction to the Practical and Theoretical Aspects of Mixture-of-Experts Modeling Mixture-of-experts (MoE) models are a powerful paradigm for modeling of data arising from complex data generating processes (DGPs). In this article, we demonstrate how different MoE models can be constructed to approximate the underlying DGPs of arbitrary types of data. Due to the probabilistic nature of MoE models, we propose the maximum quasi-likelihood (MQL) estimator as a method for estimating MoE model parameters from data, and we provide conditions under which MQL estimators are consistent and asymptotically normal. The blockwise minorization-maximizatoin (blockwise-MM) algorithm framework is proposed as an all-purpose method for constructing algorithms for obtaining MQL estimators. An example derivation of a blockwise-MM algorithm is provided. We then present a method for constructing information criteria for estimating the number of components in MoE models and provide justification for the classic Bayesian information criterion (BIC). We explain how MoE models can be used to conduct classification, clustering, and regression and we illustrate these applications via a pair of worked examples.
An introduction to Topological Data Analysis: fundamental and practical aspects for data scientists Topological Data Analysis (tda) is a recent and fast-growing field providing a set of new topological and geometric tools to infer relevant features for possibly complex data. This paper is a brief introduction, through a few selected topics, to basic fundamental and practical aspects of tda for non-experts. 1 Introduction and motivation Topological Data Analysis (tda) is a recent field that emerged from various works in applied (algebraic) topology and computational geometry during the first decade of the century. Although one can trace back geometric approaches for data analysis quite far in the past, tda really started as a field with the pioneering works of Edelsbrunner et al. (2002) and Zomorodian and Carlsson (2005) in persistent homology and was popularized in a landmark paper in 2009 Carlsson (2009). tda is mainly motivated by the idea that topology and geometry provide a powerful approach to infer robust qualitative, and sometimes quantitative, information about the structure of data – see, e.g. Chazal (2017). tda aims at providing well-founded mathematical, statistical and algorithmic methods to infer, analyze and exploit the complex topological and geometric structures underlying data that are often represented as point clouds in Euclidean or more general metric spaces. During the last few years, a considerable effort has been made to provide robust and efficient data structures and algorithms for tda that are now implemented, available and easy to use through standard libraries such as the Gudhi library (C++ and Python) Maria et al. (2014) and its R software interface Fasy et al. (2014a). Although it is still rapidly evolving, tda now provides a set of mature and efficient tools that can be used in combination with, or complementary to, other data science tools. The tda pipeline. tda has recently seen developments in various directions and application fields. There now exist a large variety of methods inspired by topological and geometric approaches. Providing a complete overview of all these existing approaches is beyond the scope of this introductory survey. However, most of them rely on the following basic and standard pipeline that will serve as the backbone of this paper: 1. The input is assumed to be a finite set of points coming with a notion of distance, or similarity, between them. This distance can be induced by the metric in the ambient space (e.g. the Euclidean metric when the data are embedded in R^d) or come as an intrinsic metric defined by a pairwise distance matrix. The definition of the metric on the data is usually given as an input or guided by the application. It is however important to notice that the choice of the metric may be critical to reveal interesting topological and geometric features of the data.
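A minimal sketch of step 1 of this pipeline (assuming a synthetic noisy-circle point cloud): build the pairwise distance matrix that a TDA library such as Gudhi would then consume to construct filtrations and compute persistent homology:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Step 1 of the TDA pipeline: a finite point cloud plus a notion of distance.
rng = np.random.default_rng(4)
theta = rng.uniform(0, 2 * np.pi, size=100)
points = np.column_stack([np.cos(theta), np.sin(theta)])   # noisy circle
points += 0.05 * rng.normal(size=points.shape)

# Pairwise Euclidean distance matrix; an intrinsic metric given as a matrix
# could be supplied instead.  This is the input that downstream TDA tools
# (e.g. a Rips filtration in Gudhi) start from.
dist_matrix = squareform(pdist(points, metric="euclidean"))
print(dist_matrix.shape)   # (100, 100)
```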
An Introduction to Variable and Feature Selection Variable and feature selection have become the focus of much research in areas of application for which datasets with tens or hundreds of thousands of variables are available. These areas include text processing of internet documents, gene expression array analysis, and combinatorial chemistry. The objective of variable selection is three-fold: improving the prediction performance of the predictors, providing faster and more cost-effective predictors, and providing a better understanding of the underlying process that generated the data. The contributions of this special issue cover a wide range of aspects of such problems: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.
An Introduction to Variational Autoencoders Variational autoencoders provide a principled framework for learning deep latent-variable models and corresponding inference models. In this work, we provide an introduction to variational autoencoders and some important extensions.
An Introduction to Visualizing Data The purpose of this document is to provide an introduction to the theory behind visualizing data. After studying the works of many talented people I decided to summarize the key points of information into this single paper. If you found this document interesting please take some time to look at the list of resources that I used (see Chapter 8) because I could never have created this without the excellent work done by others.
An Introductory Survey on Attention Mechanisms in NLP Problems First derived from human intuition, later adapted to machine translation for automatic token alignment, attention mechanism, a simple method that can be used for encoding sequence data based on the importance score each element is assigned, has been widely applied to and attained significant improvement in various tasks in natural language processing, including sentiment classification, text summarization, question answering, dependency parsing, etc. In this paper, we survey through recent works and conduct an introductory summary of the attention mechanism in different NLP problems, aiming to provide our readers with basic knowledge on this widely used method, discuss its different variants for different tasks, explore its association with other techniques in machine learning, and examine methods for evaluating its performance.
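A minimal numpy sketch of the scaled dot-product form of attention (one common variant, not necessarily the formulation of any single surveyed paper), where each element receives an importance score and the output is the score-weighted sum:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Importance scores via softmax(Q K^T / sqrt(d)), then a weighted sum of V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # one score per query-key pair
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V, weights

# 3 query positions attending over 4 encoded sequence elements of dimension 8.
rng = np.random.default_rng(5)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
context, attn = scaled_dot_product_attention(Q, K, V)
print(attn.round(2))   # each row sums to 1: the importance assigned to each element
```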
An Overview of Blockchain Integration with Robotics and Artificial Intelligence Blockchain technology is growing everyday at a fast-passed rhythm and it’s possible to integrate it with many systems, namely Robotics with AI services. However, this is still a recent field and there isn’t yet a clear understanding of what it could potentially become. In this paper, we conduct an overview of many different methods and platforms that try to leverage the power of blockchain into robotic systems, to improve AI services or to solve problems that are present in the major blockchains, which can lead to the ability of creating robotic systems with increased capabilities and security. We present an overview, discuss the methods and conclude the paper with our view on the future of the integration of these technologies.
An Overview of Computational Approaches for Analyzing Interpretation It is said that beauty is in the eye of the beholder. But how exactly can we characterize such discrepancies in interpretation? For example, are there any specific features of an image that makes person A regard an image as beautiful while person B finds the same image displeasing? Such questions ultimately aim at explaining our individual ways of interpretation, an intention that has been of fundamental importance to the social sciences from the beginning. More recently, advances in computer science brought up two related questions: First, can computational tools be adopted for analyzing ways of interpretation? Second, what if the ‘beholder’ is a computer model, i.e., how can we explain a computer model’s point of view? Numerous efforts have been made regarding both of these points, while many existing approaches focus on particular aspects and are still rather separate. With this paper, in order to connect these approaches we introduce a theoretical framework for analyzing interpretation, which is applicable to interpretation of both human beings and computer models. We give an overview of relevant computational approaches from various fields, and discuss the most common and promising application areas. The focus of this paper lies on interpretation of text and image data, while many of the presented approaches are applicable to other types of data as well.
An overview of deep learning based methods for unsupervised and semi-supervised anomaly detection in videos Videos represent the primary source of information for surveillance applications and are available in large amounts but in most cases contain little or no annotation for supervised learning. This article reviews the state-of-the-art deep learning based methods for video anomaly detection and categorizes them based on the type of model and criteria of detection. We also perform simple studies to understand the different approaches and provide the criteria of evaluation for spatio-temporal anomaly detection.
An Overview of Machine Teaching In this paper we try to organize machine teaching as a coherent set of ideas. Each idea is presented as varying along a dimension. The collection of dimensions then form the problem space of machine teaching, such that existing teaching problems can be characterized in this space. We hope this organization allows us to gain deeper understanding of individual teaching problems, discover connections among them, and identify gaps in the field.
An Overview of Multi-Processor Approximate Message Passing Approximate message passing (AMP) is an algorithmic framework for solving linear inverse problems from noisy measurements, with exciting applications such as reconstructing images, audio, hyperspectral images, and various other signals, including those acquired in compressive signal acquisition systems. The growing prevalence of big data systems has increased interest in large-scale problems, which may involve huge measurement matrices that are unsuitable for conventional computing systems. To address the challenge of large-scale processing, multiprocessor (MP) versions of AMP have been developed. We provide an overview of two such MP-AMP variants. In row-MP-AMP, each computing node stores a subset of the rows of the matrix and processes corresponding measurements. In column-MP-AMP, each node stores a subset of columns, and is solely responsible for reconstructing a portion of the signal. We will discuss pros and cons of both approaches, summarize recent research results for each, and explain when each one may be a viable approach. Aspects that are highlighted include some recent results on state evolution for both MP-AMP algorithms, and the use of data compression to reduce communication in the MP network.
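A toy sketch (plain numpy, synthetic sizes) of the data layout the two variants are built around, i.e. row versus column partitioning of the measurement matrix across nodes; this is not the AMP algorithm itself:

```python
import numpy as np

# Toy illustration of the partitioning behind the two MP variants: each node
# holds either a block of rows (and the matching measurements) or a block of
# columns (and reconstructs the matching portion of the signal).
rng = np.random.default_rng(6)
A = rng.normal(size=(120, 300))      # measurement matrix
x = rng.normal(size=300)             # unknown signal
y = A @ x                            # noiseless measurements for illustration

n_nodes = 4
row_blocks = np.array_split(np.arange(A.shape[0]), n_nodes)   # row-MP-AMP
col_blocks = np.array_split(np.arange(A.shape[1]), n_nodes)   # column-MP-AMP

# Row partition: node i can compute its share of the residual y_i - A_i @ x_hat.
partial_residuals = [y[r] - A[r] @ x for r in row_blocks]
# Column partition: node j contributes A[:, c] @ x[c]; summing recovers A @ x.
partial_products = sum(A[:, c] @ x[c] for c in col_blocks)
print(np.allclose(partial_products, y))   # True
```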
An Overview of Multi-Task Learning in Deep Neural Networks Multi-task learning (MTL) has led to successes in many applications of machine learning, from natural language processing and speech recognition to computer vision and drug discovery. This article aims to give a general overview of MTL, particularly in deep neural networks. It introduces the two most common methods for MTL in Deep Learning, gives an overview of the literature, and discusses recent advances. In particular, it seeks to help ML practitioners apply MTL by shedding light on how MTL works and providing guidelines for choosing appropriate auxiliary tasks.
An Overview of Open-Ended Evolution: Editorial Introduction to the Open-Ended Evolution II Special Issue Nature’s spectacular inventiveness, reflected in the enormous diversity of form and function displayed by the biosphere, is a feature of life that distinguishes living most strongly from nonliving. It is, therefore, not surprising that this aspect of life should become a central focus of artificial life. We have known since Darwin that the diversity is produced dynamically, through the process of evolution; this has led life’s creative productivity to be called Open-Ended Evolution (OEE) in the field. This article introduces the second of two special issues on current research in OEE and provides an overview of the contents of both special issues. Most of the work was presented at a workshop on open-ended evolution that was held as a part of the 2018 Conference on Artificial Life in Tokyo, and much of it had antecedents in two previous workshops on open-ended evolution at artificial life conferences in Cancun and York. We present a simplified categorization of OEE and summarize progress in the field as represented by the articles in this special issue.
An Overview of Spatial Econometrics This paper offers an expository overview of the field of spatial econometrics. It first justifies the necessity of special statistical procedures for the analysis of spatial data and then proceeds to describe the fundamentals of these procedures. In particular, this paper covers three crucial techniques for building models with spatial data. First, we discuss how to create a spatial weights matrix based on the distances between each data point in a dataset. Next, we describe the conventional methods to formally detect spatial autocorrelation, both global and local. Finally, we outline the chief components of a spatial autoregressive model, noting the circumstances under which it would be appropriate to incorporate each component into a model. This paper seeks to offer a concise introduction to spatial econometrics that will be accessible to interested individuals with a background in statistics or econometrics.
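A small sketch (synthetic coordinates assumed) of the first two techniques mentioned: an inverse-distance, row-standardized spatial weights matrix and the global Moran's I statistic computed from it:

```python
import numpy as np

# Build an inverse-distance spatial weights matrix and compute global Moran's I,
# a standard statistic for detecting spatial autocorrelation.
rng = np.random.default_rng(7)
coords = rng.uniform(0, 10, size=(50, 2))          # point locations
values = coords[:, 0] + rng.normal(size=50)        # attribute with a spatial trend

diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff ** 2).sum(-1))
W = np.where(dist > 0, 1.0 / np.maximum(dist, 1e-12), 0.0)  # inverse distance, zero diagonal
W /= W.sum(axis=1, keepdims=True)                           # row-standardize

# Moran's I = (n / sum(W)) * (z' W z) / (z' z), with z the centered attribute.
z = values - values.mean()
morans_i = (len(z) / W.sum()) * (z @ W @ z) / (z @ z)
print(round(morans_i, 3))   # values near 0 suggest no autocorrelation; here it is positive
```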
An Overview of Statistical Data Analysis The use of statistical software in academia and enterprises has been evolving over the last years. More often than not, students, professors, workers, and users in general have all had, at some point, exposure to statistical software. Sometimes, difficulties are felt when dealing with this type of software. Very few users have the theoretical knowledge needed to clearly understand software configurations or settings, and sometimes even the presented results. Very often, users are required by academia or enterprises to present reports without the time to explore or understand the results, or to carry out the tasks required for an optimal preparation of the data or software settings. In this work, we present an overview of some theoretical statistical concepts, to provide fast access to them.
An Overview of Statistical Learning Theory Statistical learning theory was introduced in the late 1960´s. Until the 1990´s it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990´s new types of learning algorithms (called support vector machines) based on the developed theory were proposed. This made statistical learning theory not only a tool for the theoretical analysis but also a tool for creating practical algorithms for estimating multidimensional functions. This article presents a very general overview of statistical learning theory including both theoretical and algorithmic aspects of the theory. The goal of this overview is to demonstrate how the abstract learning theory established conditions for generalization which are more general than those discussed in classical statistical paradigms and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems. A more detailed overview of the theory (without proofs) can be found in Vapnik (1995). In Vapnik (1998) one can find detailed description of the theory (including proofs).
An Overview on Data Representation Learning: From Traditional Feature Learning to Recent Deep Learning Over roughly the past 100 years, many representation learning approaches, both linear and nonlinear, supervised and unsupervised, have been proposed to learn the intrinsic structure of data. In particular, deep architectures have been widely applied for representation learning in recent years, and have delivered top results in many tasks, such as image classification, object detection and speech recognition. In this paper, we review the development of data representation learning methods. Specifically, we investigate both traditional feature learning algorithms and state-of-the-art deep learning models. The history of data representation learning is introduced, while available resources (e.g. online course, tutorial and book information) and toolboxes are provided. Finally, we conclude this paper with remarks and some interesting research directions on data representation learning.
Analysing spatial point patterns in R This is a detailed set of notes for a workshop on Analysing spatial point patterns in R, presented by the author in Australia and New Zealand since 2006. The goal of the workshop is to equip researchers with a range of practical techniques for the statistical analysis of spatial point patterns. Some of the techniques are well established in the applications literature, while some are very recent developments. The workshop is based on spatstat, a contributed library for the statistical package R, which is free open source software. Topics covered include: statistical formulation and methodological issues; data input and handling; R concepts such as classes and methods; exploratory data analysis; nonparametric intensity and risk estimates; goodness-of-fit testing for Complete Spatial Randomness; maximum likelihood inference for Poisson processes; spatial logistic regression; model validation for Poisson processes; exploratory analysis of dependence; distance methods and summary functions such as Ripley´s K function; simulation techniques; non-Poisson point process models; fitting models using summary statistics; LISA and local analysis; inhomogeneous K-functions; Gibbs point process models; fitting Gibbs models; simulating Gibbs models; validating Gibbs models; multitype and marked point patterns; exploratory analysis of multitype and marked point patterns; multitype Poisson process models and maximum likelihood inference; multitype Gibbs process models and maximum pseudolikelihood; line segment patterns, 3-dimensional point patterns, multidimensional space-time point patterns, replicated point patterns, and stochastic geometry methods.
Analysis and Optimization of Convolutional Neural Network Architectures Convolutional Neural Networks (CNNs) have dominated various computer vision tasks since Alex Krizhevsky showed that they can be trained effectively and reduced the top-5 error from 26.2 % to 15.3 % on the ImageNet large scale visual recognition challenge. Many aspects of CNNs are examined in various publications, but literature about the analysis and construction of neural network architectures is rare. This work is one step to close this gap. A comprehensive overview of existing techniques for CNN analysis and topology construction is provided. A novel way to visualize classification errors with confusion matrices was developed. Based on this method, hierarchical classifiers are described and evaluated. Additionally, some results are confirmed and quantified for CIFAR-100, for example the positive impact of smaller batch sizes, averaging ensembles, data augmentation and test-time transformations on accuracy. Other results, such as the positive impact of learned color transformation on test accuracy, could not be confirmed. A model which has only one million learned parameters for an input size of 32x32x3 and 100 classes and which beats the state of the art on the benchmark datasets Asirra, GTSRB, HASYv2 and STL-10 was developed.
Analysis Methods in Neural Language Processing: A Survey The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems. A plethora of new models have been proposed, many of which are thought to be opaque compared to their feature-rich counterparts. This has led researchers to analyze, interpret, and evaluate neural networks in novel and more fine-grained ways. In this survey paper, we review analysis methods in neural language processing, categorize them according to prominent research trends, highlight existing limitations, and point to potential directions for future work.
Analysis of Dropout in Online Learning Deep learning is the state-of-the-art in fields such as visual object recognition and speech recognition. This learning uses a large number of layers and a huge number of units and connections. Therefore, overfitting is a serious problem, and dropout, a kind of regularization tool, is used to address it. However, in online learning, the effect of dropout is not well known. This paper presents our investigation of the effect of dropout in online learning. We analyzed the effect of dropout on convergence speed near the singular point. Our results indicated that dropout is effective in online learning. Dropout tends to avoid the singular point, improving convergence speed near that point.
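For readers unfamiliar with the mechanism, a minimal sketch of (inverted) dropout applied to a layer's activations, written in plain numpy; the shapes and drop rate are illustrative assumptions:

```python
import numpy as np

def dropout(activations, p_drop, rng, train=True):
    """Inverted dropout: randomly zero units during training and rescale the
    survivors so the expected activation matches the no-dropout test-time pass."""
    if not train or p_drop == 0.0:
        return activations
    keep = (rng.random(activations.shape) >= p_drop).astype(activations.dtype)
    return activations * keep / (1.0 - p_drop)

rng = np.random.default_rng(8)
h = rng.normal(size=(4, 6))            # hidden-layer activations for a mini-batch
h_train = dropout(h, p_drop=0.5, rng=rng, train=True)
h_test  = dropout(h, p_drop=0.5, rng=rng, train=False)
print((h_train == 0).mean())           # roughly half the units are dropped
```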
Analysis of Evolutionary Algorithms in Dynamic and Stochastic Environments Many real-world optimization problems occur in environments that change dynamically or involve stochastic components. Evolutionary algorithms and other bio-inspired algorithms have been widely applied to dynamic and stochastic problems. This survey gives an overview of major theoretical developments in the area of runtime analysis for these problems. We review recent theoretical studies of evolutionary algorithms and ant colony optimization for problems where the objective functions or the constraints change over time. Furthermore, we consider stochastic problems under various noise models and point out some directions for future research.
Analysis of Explainers of Black Box Deep Neural Networks for Computer Vision: A Survey Deep Learning is a state-of-the-art technique to make inference on extensive or complex data. As black box models due to their multilayer nonlinear structure, Deep Neural Networks are often criticized as non-transparent and their predictions as not traceable by humans. Furthermore, the models learn from artificial datasets, often with bias or contaminated discriminating content. Through their increased distribution, decision-making algorithms can contribute to promoting prejudice and unfairness, which is not easy to notice due to the lack of transparency. Hence, scientists have developed several so-called explanators or explainers which try to point out the connection between input and output in order to represent, in a simplified way, the inner structure of machine learning black boxes. In this survey we differentiate the mechanisms and properties of explaining systems for Deep Neural Networks for Computer Vision tasks. We give a comprehensive overview of the taxonomy of related studies and compare several survey papers that deal with explainability in general. We work out the drawbacks and gaps and summarize further research ideas.
Analysis of Unstructured Data: Applications of Text Analytics and Sentiment Mining The proliferation of textual data in business is overwhelming. Unstructured textual data is being constantly generated via call center logs, emails, documents on the web, blogs, tweets, customer comments, customer reviews, and so on. While the amount of textual data is increasing rapidly, businesses´ ability to summarize, understand, and make sense of such data for making better business decisions remains challenging. This paper takes a quick look at how to organize and analyze textual data for extracting insightful customer intelligence from a large collection of documents and for using such information to improve business operations and performance. Multiple business case studies using real data are presented to demonstrate applications of text analytics and sentiment mining with SAS Text Miner and SAS Sentiment Analysis Studio. While SAS products are used as tools for demonstration only, the topics and theories covered are generic (not tool specific).
Analytical Skills, Tools and Attitudes 2013: Analytics capabilities needed now and in the future Organizations continue to invest more in analytics, but increasingly there is recognition that a shortage of analytic talent is holding back even greater investment. Lavastorm Analytics polled more than 425 people in the analytics community about whether their organization needs more analytic resources or skills and which skills are valued most and are most urgently needed. Survey respondents included business analysts, technologists, data analytics professionals, managers, and C-level executives across a broad variety of industries. The top findings were: – According to the survey respondents, a lack of skills/training/education is the biggest factor holding back organizations from using analytics more. – Skills most urgently needed in their organizations are Statistics, math or other quantitative skills; Analytic tool training; and Critical thinking. – Lack of funding or resources, however, also has a significant impact on adoption of analytics to drive day-to-day decisions. Lesser factors also include inadequate support from executives and data that is not integrated.
Analytics 3.0 In the new era, big data will power consumer products and services
Analytics for the Internet of Things: A Survey The Internet of Things (IoT) envisions a world-wide, interconnected network of smart physical entities. These physical entities generate a large amount of data in operation and as the IoT gains momentum in terms of deployment, the combined scale of those data seems destined to continue to grow. Increasingly, applications for the IoT involve analytics. Data analytics is the process of deriving knowledge from data, generating value like actionable insights from them. This article reviews work in the IoT and big data analytics from the perspective of their utility in creating efficient, effective and innovative applications and services for a wide spectrum of domains. We review the broad vision for the IoT as it is shaped in various communities, examine the application of data analytics across IoT domains, provide a categorisation of analytic approaches and propose a layered taxonomy from IoT data to analytics. This taxonomy provides us with insights on the appropriateness of analytical techniques, which in turn shapes a survey of enabling technology and infrastructure for IoT analytics. Finally, we look at some tradeoffs for analytics in the IoT that can shape future research.
Analytics: The real-world use of big data ‘Big data’ – which admittedly means many things to many people – is no longer confined to the realm of technology. Today it is a business priority, given its ability to profoundly affect commerce in the globally integrated economy. In addition to providing solutions to long-standing business challenges, big data inspires new ways to transform processes, organizations, entire industries and even society itself. Yet extensive media coverage makes it hard to distinguish hype from reality – what is really happening? Our newest research finds that organizations are using big data to target customer-centric outcomes, tap into internal data and build a better information ecosystem.
Analyzing biological and artificial neural networks: challenges with opportunities for synergy? Deep neural networks (DNNs) transform stimuli across multiple processing stages to produce representations that can be used to solve complex tasks, such as object recognition in images. However, a full understanding of how they achieve this remains elusive. The complexity of biological neural networks substantially exceeds the complexity of DNNs, making it even more challenging to understand the representations that they learn. Thus, both machine learning and computational neuroscience are faced with a shared challenge: how can we analyze their representations in order to understand how they solve complex tasks? We review how data-analysis concepts and techniques developed by computational neuroscientists can be useful for analyzing representations in DNNs, and in turn, how recently developed techniques for analysis of DNNs can be useful for understanding representations in biological neural networks. We explore opportunities for synergy between the two fields, such as the use of DNNs as in-silico model systems for neuroscience, and how this synergy can lead to new hypotheses about the operating principles of biological neural networks.
Analyzing the Analyzers Binita, Chao, Dmitri, and Rebecca are data scientists. What does that statement tell you about them? Probably not as much as you´d like. You know they probably know something about statistics, programming, and data visualization. You´d hope that they had some experience finding insights from data, maybe even ‘big data.’ But if you´re trying to find the best person for a job, you need to be more specific than just ‘doctor,’ or ‘athlete,’ or ‘data scientist.’ And that´s a problem. Finding the right people for a task is all about efficient communication and, without the appropriate shared vocabulary, data science talent and data science problems are too often kept apart….
Anomaly Detection in Road Traffic Using Visual Surveillance: A Survey Computer vision has evolved in the last decade as a key technology for numerous applications replacing human supervision. In this paper, we present a survey of relevant visual surveillance research on anomaly detection in public places, focusing primarily on roads. Firstly, we revisit the surveys done in the last 10 years in this field. Since the underlying building block of a typical anomaly detection system is learning, we emphasize learning methods applied to video scenes. We then summarize the important contributions made during the last six years on anomaly detection, primarily focusing on features, underlying techniques, applied scenarios and types of anomalies using a single static camera. Finally, we discuss the challenges in computer vision-related anomaly detection techniques and some of the important future possibilities.
Anomaly Detection: A Survey Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection. We have grouped existing techniques into different categories based on the underlying approach adopted by each technique. For each category we have identified key assumptions, which are used by the techniques to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.
Anomaly Detection: Review and preliminary Entropy method tests Anomalies are strange data points; they usually represent an unusual occurrence. Anomaly detection is presented from the perspective of Wireless sensor networks. Different approaches have been taken in the past, as we will see, not only to identify outliers, but also to establish the statistical properties of the different methods. The usual goal is to show that the approach is asymptotically efficient and that the metric used is unbiased, or perhaps biased. This project is based on a work done by [1]. The approach is based on the principle that the entropy of the data increases when an anomalous data point is measured. The entropy of the data set is thus to be estimated. In this report, however, preliminary efforts at confirming the results of [1] are presented. To estimate the entropy of the dataset, since no parametric form is assumed, the probability density function of the data set is first estimated using a data-split method. This estimated pdf value is then plugged into the entropy estimation formula to estimate the entropy of the dataset. The data (test signal) used in this report is Gaussian distributed with zero mean and variance 4. Results of pdf estimation using the k-nearest neighbour method on the entire dataset and with a data-split method are presented and compared based on how well they approximate the probability density function of a Gaussian with similar mean and variance. The number of nearest neighbours chosen for the purpose of this report is 8. This is arbitrary, but is reasonable since the number of anomalies introduced is expected to be less than this upon data-split. The data-split method is preferred, and rightly so.
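A simplified sketch of the plug-in entropy estimator described here, assuming the same test signal (Gaussian, mean 0, variance 4), k = 8 nearest neighbours and a 50/50 data split; the details of [1] are not reproduced:

```python
import numpy as np

# Plug-in entropy estimate: split the data, estimate the pdf on one half with a
# k-nearest-neighbour estimate, evaluate it at the other half, and plug the
# values into the entropy formula H ~ -mean(log f_hat(x)).
rng = np.random.default_rng(9)
data = rng.normal(0.0, 2.0, size=2000)          # test signal: N(0, variance 4)
ref, eval_pts = data[:1000], data[1000:]        # data-split
k = 8

def knn_pdf(x, reference, k):
    """1-D k-nearest-neighbour density estimate f(x) = k / (n * 2 * r_k)."""
    r_k = np.sort(np.abs(reference - x))[k - 1]
    return k / (len(reference) * 2.0 * r_k)

pdf_vals = np.array([knn_pdf(x, ref, k) for x in eval_pts])
entropy_hat = -np.mean(np.log(pdf_vals))
print(entropy_hat)   # true differential entropy of N(0, 4) is about 2.11 nats
```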
Anticipatory Thinking: A Metacognitive Capability Anticipatory thinking is a complex cognitive process for assessing and managing risk in many contexts. Humans use anticipatory thinking to identify potential future issues and proactively take actions to manage their risks. In this paper we define a cognitive systems approach to anticipatory thinking as a metacognitive goal reasoning mechanism. The contributions of this paper include (1) defining anticipatory thinking in the MIDCA cognitive architecture, (2) operationalizing anticipatory thinking as a three step process for managing risk in plans, and (3) a numeric risk assessment calculating an expected cost-benefit ratio for modifying a plan with anticipatory actions.
Any-gram Kernels for Sentence Classification: A Sentiment Analysis Case Study Any-gram kernels are a flexible and efficient way to employ bag-of-n-gram features when learning from textual data. They are also compatible with the use of word embeddings so that word similarities can be accounted for. While the original any-gram kernels are implemented on top of tree kernels, we propose a new approach which is independent of tree kernels and is more efficient. We also propose a more effective way to make use of word embeddings than the original any-gram formulation. When applied to the task of sentiment classification, our new formulation achieves significantly better performance.
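For context, a plain bag-of-n-gram baseline (scikit-learn, invented toy sentences) of the kind any-gram kernels generalize by covering all n-gram lengths and allowing soft, embedding-based matches; this is not the any-gram kernel itself:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Plain bag-of-n-gram features (unigrams through trigrams) for a tiny
# sentiment-classification toy example.
texts  = ["great acting, loved it", "terrible plot and bad acting",
          "loved the acting", "bad, boring, terrible"]
labels = [1, 0, 1, 0]   # 1 = positive, 0 = negative

clf = make_pipeline(CountVectorizer(ngram_range=(1, 3)), LogisticRegression())
clf.fit(texts, labels)
# Likely labels the first sentence positive and the second negative.
print(clf.predict(["loved the movie", "boring and bad"]))
```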
APACHE DRILL: Interactive Ad-Hoc Analysis at Scale Apache Drill is a distributed system for interactive ad-hoc analysis of large-scale datasets. Designed to handle up to petabytes of data spread across thousands of servers, the goal of Drill is to respond to ad-hoc queries in a low-latency manner. In this article, we introduce Drill´s architecture, discuss its extensibility points, and put it into the context of the emerging offerings in the interactive analytics realm.
Applications of Artificial Intelligence to Network Security Attacks on networks are becoming more complex and sophisticated every day. Beyond the so-called script-kiddies and hacking newbies, there is a myriad of professional attackers seeking to make serious profits by infiltrating corporate networks. Hostile governments, big corporations and mafias alike are constantly increasing their resources and skills in cybercrime in order to spy, steal or cause damage more effectively. Traditional approaches to Network Security seem to be hitting their limits, and the need for a smarter approach to threat detection is increasingly recognized. This paper provides an introduction to the need for evolution of Cyber Security techniques and how Artificial Intelligence could be applied to help solve some of the problems. It also provides a high-level overview of some state-of-the-art AI Network Security techniques, and finishes by analysing the foreseeable future of the application of AI to Network Security.
Applications of Deep Reinforcement Learning in Communications and Networking: A Survey This paper presents a comprehensive literature review on applications of deep reinforcement learning in communications and networking. Modern networks, e.g., Internet of Things (IoT) and Unmanned Aerial Vehicle (UAV) networks, become more decentralized and autonomous. In such networks, network entities need to make decisions locally to maximize the network performance under uncertainty of network environment. Reinforcement learning has been efficiently used to enable the network entities to obtain the optimal policy including, e.g., decisions or actions, given their states when the state and action spaces are small. However, in complex and large-scale networks, the state and action spaces are usually large, and the reinforcement learning may not be able to find the optimal policy in reasonable time. Therefore, deep reinforcement learning, a combination of reinforcement learning with deep learning, has been developed to overcome the shortcomings. In this survey, we first give a tutorial of deep reinforcement learning from fundamental concepts to advanced models. Then, we review deep reinforcement learning approaches proposed to address emerging issues in communications and networking. The issues include dynamic network access, data rate control, wireless caching, data offloading, network security, and connectivity preservation which are all important to next generation networks such as 5G and beyond. Furthermore, we present applications of deep reinforcement learning for traffic routing, resource sharing, and data collection. Finally, we highlight important challenges, open issues, and future research directions of applying deep reinforcement learning.
Applied Data Science in Europe Google Trends and other IT fever charts rate Data Science among the most rapidly emerging and promising fields that expand around computer science. Although Data Science draws on content from established fields like artificial intelligence, statistics, databases, visualization and many more, industry is demanding trained data scientists that no one seems able to deliver. This is due to the pace at which the field has expanded and the corresponding lack of curricula; the unique skill set, which is inherently multi-disciplinary; and the translation work (from the US web economy to other ecosystems) necessary to realize the recognized world-wide potential of applying analytics to all sorts of data. In this contribution we draw from our experiences in establishing an inter-disciplinary Data Science lab in order to highlight the challenges and potential remedies for Data Science in Europe. We discuss our role as academia in the light of the potential societal/economic impact as well as the challenges in organizational leadership tied to such inter-disciplinary work.
Architecting a High Performance Storage System Designing a large-scale, high-performance data storage system presents significant challenges. This paper describes a step-by-step approach to designing such a system and presents an iterative methodology that applies at both the component level and the system level. A detailed case study using the methodology described to design a Lustre storage system is presented.
Are Efficient Deep Representations Learnable? Many theories of deep learning have shown that a deep network can require dramatically fewer resources to represent a given function compared to a shallow network. But a question remains: can these efficient representations be learned using current deep learning techniques? In this work, we test whether standard deep learning methods can in fact find the efficient representations posited by several theories of deep representation. Specifically, we train deep neural networks to learn two simple functions with known efficient solutions: the parity function and the fast Fourier transform. We find that using gradient-based optimization, a deep network does not learn the parity function, unless initialized very close to a hand-coded exact solution. We also find that a deep linear neural network does not learn the fast Fourier transform, even in the best-case scenario of infinite training data, unless the weights are initialized very close to the exact hand-coded solution. Our results suggest that not every element of the class of compositional functions can be learned efficiently by a deep network, and further restrictions are necessary to understand what functions are both efficiently representable and learnable.
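A sketch in the spirit of the parity experiment (scikit-learn MLP, 8-bit inputs, held-out bit strings; the hyperparameters are arbitrary assumptions, not the paper's setup):

```python
import numpy as np
from itertools import product
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Enumerate all n-bit inputs, label each with its parity, hold some out, and
# check whether a small network trained by gradient descent generalizes.
n_bits = 8
X = np.array(list(product([0, 1], repeat=n_bits)), dtype=float)
y = X.sum(axis=1).astype(int) % 2                     # parity label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=3000, random_state=0)
net.fit(X_tr, y_tr)
print("train acc:", net.score(X_tr, y_tr), "test acc:", net.score(X_te, y_te))
# Test accuracy typically hovers near chance, consistent with the finding that
# parity is not learned unless the weights start near a hand-coded solution.
```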
Are GANs Created Equal? A Large-Scale Study Generative adversarial networks (GAN) are a powerful subclass of generative models. Despite a very rich research activity leading to numerous interesting GAN algorithms, it is still very hard to assess which algorithm(s) perform better than others. We conduct a neutral, multi-faceted large-scale empirical study on state-of-the-art models and evaluation measures. We find that most models can reach similar scores with enough hyperparameter optimization and random restarts. This suggests that improvements can arise from a higher computational budget and tuning more than fundamental algorithmic changes. To overcome some limitations of the current metrics, we also propose several data sets on which precision and recall can be computed. Our experimental results suggest that future GAN research should be based on more systematic and objective evaluation procedures. Finally, we did not find evidence that any of the tested algorithms consistently outperforms the original one.
Are profile likelihoods likelihoods? No, but sometimes they can be We contribute our two cents to the ongoing discussion on whether profile likelihoods are ‘true’ likelihood functions, by showing that the profile likelihood function can in fact be identical to a marginal likelihood in the special case of normal models. Thus, profile likelihoods can be ‘true’ likelihoods insofar as marginal likelihoods are ‘true’ likelihoods. The prior distribution that achieves this equivalence turns out to be the Jeffreys prior. We suspect, however, that normal models are the only class of models for which such an equivalence between maximization and marginalization is exact.
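A worked sketch of the normal-model case, assuming data $x_1, \dots, x_n \sim \mathcal{N}(\mu, \sigma^2)$ with $\sigma$ unknown:

```latex
\text{With } S(\mu) = \sum_{i=1}^{n} (x_i - \mu)^2:
\begin{align*}
L_p(\mu) &= \sup_{\sigma > 0} \, (2\pi\sigma^2)^{-n/2}
            \exp\!\Big(-\tfrac{S(\mu)}{2\sigma^2}\Big)
          \;\propto\; S(\mu)^{-n/2}
          \quad \text{(maximum at } \hat\sigma^2(\mu) = S(\mu)/n\text{)}, \\
L_m(\mu) &= \int_0^\infty (2\pi\sigma^2)^{-n/2}
            \exp\!\Big(-\tfrac{S(\mu)}{2\sigma^2}\Big)\, \frac{d\sigma}{\sigma}
          \;\propto\; S(\mu)^{-n/2}
          \quad \text{(Jeffreys prior } \pi(\sigma) \propto 1/\sigma\text{)}.
\end{align*}
```

Both the profiled and the marginalized functions are proportional to $S(\mu)^{-n/2}$, so in this special case maximizing over and integrating out the nuisance parameter agree up to a constant, which is the equivalence the paper establishes.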
Are Saddles Good Enough for Deep Learning? Recent years have seen a growing interest in understanding deep neural networks from an optimization perspective. It is understood now that converging to low-cost local minima is sufficient for such models to become effective in practice. However, in this work, we propose a new hypothesis based on recent theoretical findings and empirical studies that deep neural network models actually converge to saddle points with high degeneracy. Our findings from this work are new, and can have a significant impact on the development of gradient descent based methods for training deep networks. We validated our hypotheses using an extensive experimental evaluation on standard datasets such as MNIST and CIFAR-10, and also showed that recent efforts that attempt to escape saddles finally converge to saddles with high degeneracy, which we define as ‘good saddles’. We also verified the famous Wigner’s Semicircle Law in our experimental results.
Are screening methods useful in feature selection? An empirical study Filter or screening methods are often used as a preprocessing step for reducing the number of variables used by a learning algorithm in obtaining a classification or regression model. While there are many such filter methods, there is a need for an objective evaluation of these methods. Such an evaluation is needed to compare them with each other and also to answer whether they are at all useful, or a learning algorithm could do a better job without them. For this purpose, many popular screening methods are partnered in this paper with three regression learners and five classification learners and evaluated on ten real datasets to obtain accuracy criteria such as R-square and area under the ROC curve (AUC). The obtained results are compared through curve plots and comparison tables in order to find out whether screening methods help improve the performance of learning algorithms and how they fare with each other. Our findings revealed that the screening methods were only useful in one regression and three classification datasets out of the ten datasets evaluated.
Are You a Bayesian or a Frequentist? (Slide Deck)
Artificial Intelligence (AI) Methods in Optical Networks: A Comprehensive Survey Artificial intelligence (AI) is an extensive scientific discipline which enables computer systems to solve problems by emulating complex biological processes such as learning, reasoning and self-correction. This paper presents a comprehensive review of the application of AI techniques for improving performance of optical communication systems and networks. The use of AI-based techniques is first studied in applications related to optical transmission, ranging from the characterization and operation of network components to performance monitoring, mitigation of nonlinearities, and quality of transmission estimation. Then, applications related to optical network control and management are also reviewed, including topics like optical network planning and operation in both transport and access networks. Finally, the paper also presents a summary of opportunities and challenges in optical networking where AI is expected to play a key role in the near future.
Artificial Intelligence and Data Science in the Automotive Industry Data science and machine learning are the key technologies when it comes to the processes and products with automatic learning and optimization to be used in the automotive industry of the future. This article defines the terms ‘data science’ (also referred to as ‘data analytics’) and ‘machine learning’ and how they are related. In addition, it defines the term ‘optimizing analytics’ and illustrates the role of automatic optimization as a key technology in combination with data analytics. It also uses examples to explain the way that these technologies are currently being used in the automotive industry on the basis of the major subprocesses in the automotive value chain (development, procurement, logistics, production, marketing, sales and after-sales, connected customer). Since the industry is just starting to explore the broad range of potential uses for these technologies, visionary application examples are used to illustrate the revolutionary possibilities that they offer. Finally, the article demonstrates how these technologies can make the automotive industry more efficient and enhance its customer focus throughout all its operations and activities, extending from the product and its development process to the customers and their connection to the product.
Artificial Intelligence and Economic Theories The advent of artificial intelligence has changed many disciplines such as engineering, social science and economics. Artificial intelligence is a computational technique inspired by natural intelligence, such as the swarming of birds, the working of the brain and the pathfinding of ants. These techniques have an impact on economic theories. This book studies the impact of artificial intelligence on economic theories, a subject that has not been extensively studied. The theories considered are: demand and supply, asymmetrical information, pricing, rational choice, rational expectation, game theory, efficient market hypotheses, mechanism design, prospect theory, bounded rationality, portfolio theory, rational counterfactuals and causality. The benefit of this book is that it evaluates existing economic theories and updates them based on developments in the field of artificial intelligence.
Artificial Intelligence and its Role in Near Future AI technology has a long history and is actively and constantly changing and growing. It focuses on intelligent agents: devices that perceive their environment and take actions that maximize their chances of achieving a goal. In this paper, we explain the basics of modern AI and various representative applications of AI. In the context of the modern digitalized world, AI is the property of machines, computer programs, and systems to perform the intellectual and creative functions of a person, independently find ways to solve problems, draw conclusions and make decisions. Most artificial intelligence systems have the ability to learn, which allows them to improve their performance over time. Recent research on AI tools, including machine learning, deep learning and predictive analytics, is aimed at increasing their planning, learning, reasoning, thinking and action-taking abilities. Building on this, the research presented here explores how human intelligence differs from artificial intelligence. Moreover, we critically analyze what today's AI is capable of doing, why it still cannot reach human intelligence, and what open challenges stand in the way of AI reaching and outperforming human-level intelligence. Furthermore, we explore future predictions for artificial intelligence and, on that basis, recommend potential solutions to be pursued over the coming decades.
Artificial Intelligence and Robotics The recent successes of AI have captured the imagination of both the scientific community and the general public. Robotics and AI amplify human potential, increase productivity and are moving from simple reasoning towards human-like cognitive abilities. Current AI technologies are used in a wide set of application areas, ranging from healthcare, manufacturing, transport and energy to financial services, banking, advertising, management consulting and government agencies. The global AI market was around 260 billion USD in 2016 and is estimated to exceed 3 trillion by 2024. To understand the impact of AI, it is important to draw lessons from its past successes and failures, and this white paper provides a comprehensive explanation of the evolution of AI, its current status and future directions.
Artificial Intelligence Approaches Artificial Intelligence (AI) has received tremendous attention from academia, industry, and the general public in recent years. The integration of geography and AI, or GeoAI, provides novel approaches for addressing a variety of problems in the natural environment and our human society. This entry briefly reviews the recent development of AI with a focus on machine learning and deep learning approaches. We discuss the integration of AI with geography and particularly geographic information science, and present a number of GeoAI applications and possible future directions.
Artificial Intelligence Enabled Software Defined Networking: A Comprehensive Overview In recent years, the increased demand for dynamic management of network resources in modern computer networks in general, and in today’s data centers in particular, has resulted in a promising new architecture in which more flexible control functionality can be achieved at a high level of abstraction. In the software defined networking (SDN) architecture, central management of the forwarding elements (i.e. switches and routers) is accomplished by a central unit, which can be programmed directly to perform fundamental networking tasks or to implement any additional services. Combining central management and network programmability opens the door to employing more advanced techniques, such as artificial intelligence (AI), to deal with high-demand and rapidly changing networks. In this study, we provide a detailed overview of current efforts and recent advancements to include AI in SDN-based networks.
Artificial Intelligence for Long-Term Robot Autonomy: A Survey Autonomous systems will play an essential role in many applications across diverse domains including space, marine, air, field, road, and service robotics. They will assist us in our daily routines and perform dangerous, dirty and dull tasks. However, enabling robotic systems to perform autonomously in complex, real-world scenarios over extended time periods (i.e. weeks, months, or years) poses many challenges. Some of these have been investigated by sub-disciplines of Artificial Intelligence (AI) including navigation and mapping, perception, knowledge representation and reasoning, planning, interaction, and learning. The different sub-disciplines have developed techniques that, when re-integrated within an autonomous system, can enable robots to operate effectively in complex, long-term scenarios. In this paper, we survey and discuss AI techniques as ‘enablers’ for long-term robot autonomy, current progress in integrating these techniques within long-running robotic systems, and the future challenges and opportunities for AI in long-term autonomy.
Artificial Intelligence Now The phrase ‘artificial intelligence’ has a way of retreating into the future: as things that were once in the realm of imagination and fiction become reality, they lose their wonder and become ‘machine translation,’ ‘real-time traffic updates,’ ‘self-driving cars,’ and more. But the past 12 months have seen a true explosion in the capacities as well as adoption of AI technologies. While the flavor of these developments has not pointed to the ‘general AI’ of science fiction, it has come much closer to offering generalized AI tools—these tools are being deployed to solve specific problems. But now they solve them more powerfully than the complex, rule-based tools that preceded them. More importantly, they are flexible enough to be deployed in many contexts. This means that more applications and industries are ripe for transformation with AI technologies. This book, drawing from the best posts on the O´Reilly AI blog, brings you a summary of the current state of AI technologies and applications, as well as a selection of useful guides to getting started with deep learning and AI technologies. Part I covers the overall landscape of AI, focusing on the platforms, businesses, and business models that are shaping the growth of AI. We then turn to the technologies underlying AI, particularly deep learning, in Part II. Part III brings us some ‘hobbyist’ applications: intelligent robots. Even if you don´t build them, they are an incredible illustration of the low cost of entry into computer vision and autonomous operation. Part IV also focuses on one application: natural language. Part V takes us into commercial use cases: bots and autonomous vehicles. And finally, Part VI discusses a few of the interplays between human and machine intelligence, leaving you with some big issues to ponder in the coming year.
Artificial Intelligence: A Child’s Play We discuss the objectives of any endeavor in creating artificial intelligence, AI, and provide a possible alternative. Intelligence might be an unintended consequence of curiosity left to roam free, best exemplified by a frolicking infant. This suggests that our attempts at AI could have been misguided; what we actually need to strive for can be termed artificial curiosity, AC, and intelligence happens as a consequence of those efforts. For this unintentional yet welcome aftereffect to set in, a foundational list of guiding principles needs to be present. We discuss what these essential doctrines might be and why their establishment is required to form connections, possibly growing, between a knowledge store that has been built up and new pieces of information that curiosity will bring back. As more findings are acquired and more bonds are formed, we need a way to periodically reduce the amount of data; that is, it is important to capture the critical characteristics of what has been accumulated or to produce a summary of what has been gathered. We start with the intuition for this line of reasoning and formalize it with a series of models (and iterative improvements) that will be necessary to make the incubation of intelligence a reality. Our discussion provides conceptual modifications to the Turing Test and to Searle’s Chinese room argument. We discuss the future implications for society as AI becomes an integral part of life.
Artificial Intelligence-Based Techniques for Emerging Robotics Communication: A Survey and Future Perspectives This paper reviews the current development of artificial intelligence (AI) techniques for the application area of robot communication. The study of the control and operation of multiple robots collaborating toward a common goal is fast growing. Communication among the members of a robot team, and even with humans, is becoming essential in many real-world applications. The survey focuses on AI techniques for robot communication that enhance the communication capability of the multi-robot team, enabling it to carry out more complex activities, make well-informed decisions, take coordinated action, and perform its tasks efficiently.
Artificial Neural Networks These are lecture notes for my course on Artificial Neural Networks that I have given at Chalmers (FFR135) and Gothenburg University (FIM720). This course describes the use of neural networks in machine learning: deep learning, recurrent networks, and other supervised and unsupervised machine-learning algorithms.
Assessing four Neural Networks on Handwritten Digit Recognition Dataset (MNIST) Although image recognition has been a research topic for many years, many researchers still have a keen interest in it. In some papers, however, there is a tendency to compare models only on one or two datasets, either because of time constraints or because the model is tailored to a specific task. Accordingly, it is hard to understand how well a certain model generalizes across the image recognition field. In this paper, we compare four neural networks on the MNIST dataset under different data splits. Three of them are a Convolutional Neural Network (CNN), a Deep Residual Network (ResNet) and a Dense Convolutional Network (DenseNet), and the fourth is our improvement on the CNN baseline, obtained by introducing the Capsule Network (CapsNet) to the image recognition area. We show that, although the previous models do a good job in this area, our retrofitting can be applied to obtain better performance. The result obtained by CapsNet is an accuracy rate of 99.75\%, which is the best result published so far. Another encouraging result is that CapsNet needs only a small amount of data to achieve this excellent performance. In the future, we will apply CapsNet’s ability to generalize to other image recognition fields.
Assessing Your Business Analytics Initiatives – Eight Metrics That Matter It´s no secret that using analytics to uncover meaningful insights from data is crucial for making fact-based decisions. Now considered mainstream, the business analytics market worldwide is expected to exceed $50 billion by the year 2016. Yet when it comes to making analytics work, not all organizations are equal. In fact, despite the transformative power of big data and analytics, many organizations still struggle to wring value from their information. The complexities of dealing with big data, integrating technologies, finding analytical talent and challenging corporate culture are the main pitfalls to the successful use of analytics within organizations. The management of information – including the analytics used to transform it – is an evolutionary process, and organizations are at various levels of this evolution. Those wanting to advance analytics to a new level need to understand their analytics activities across the organization, from both an IT and business perspective. Toward that end, an assessment focusing on eight key analytics metrics can be used to identify strengths and areas for improvement in the analytics life cycle.
At what sample size do correlations stabilize? Sample correlations converge to the population value with increasing sample size, but the estimates are often inaccurate in small samples. In this report we use Monte-Carlo simulations to determine the critical sample size beyond which the magnitude of a correlation can be expected to be stable. The necessary sample size to achieve stable estimates for correlations depends on the effect size, the width of the corridor of stability (i.e., a corridor around the true value where deviations are tolerated), and the requested confidence that the trajectory does not leave this corridor any more. Results indicate that in typical scenarios the sample size should approach 250 for stable estimates.
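A minimal sketch of the corridor-of-stability simulation described above (our own code, not the report's; numpy is assumed, and the true correlation, corridor half-width and confidence quantile are illustrative choices):

```python
# Sketch of the corridor-of-stability simulation (ours): track the running sample
# correlation and record the sample size after which it never leaves rho +/- w again.
import numpy as np

def point_of_stability(rho=0.3, n_max=1000, w=0.1, seed=None):
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    x = rng.multivariate_normal([0.0, 0.0], cov, size=n_max)
    # running correlation for n = 3 .. n_max
    rs = np.array([np.corrcoef(x[:n, 0], x[:n, 1])[0, 1] for n in range(3, n_max + 1)])
    outside = np.abs(rs - rho) > w
    if not outside.any():
        return 3
    return int(np.nonzero(outside)[0].max()) + 4   # first n after the last excursion

# Repeat and take, e.g., the 80% quantile as the critical sample size.
points = [point_of_stability(seed=s) for s in range(200)]
print(np.percentile(points, 80))
```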
Attacking Automatic Video Analysis Algorithms: A Case Study of Google Cloud Video Intelligence API Due to the growth of video data on Internet, automatic video analysis has gained a lot of attention from academia as well as companies such as Facebook, Twitter and Google. In this paper, we examine the robustness of video analysis algorithms in adversarial settings. Specifically, we propose targeted attacks on two fundamental classes of video analysis algorithms, namely video classification and shot detection. We show that an adversary can subtly manipulate a video in such a way that a human observer would perceive the content of the original video, but the video analysis algorithm will return the adversary’s desired outputs. We then apply the attacks on the recently released Google Cloud Video Intelligence API. The API takes a video file and returns the video labels (objects within the video), shot changes (scene changes within the video) and shot labels (description of video events over time). Through experiments, we show that the API generates video and shot labels by processing only the first frame of every second of the video. Hence, an adversary can deceive the API to output only her desired video and shot labels by periodically inserting an image into the video at the rate of one frame per second. We also show that the pattern of shot changes returned by the API can be mostly recovered by an algorithm that compares the histograms of consecutive frames. Based on our equivalent model, we develop a method for slightly modifying the video frames, in order to deceive the API into generating our desired pattern of shot changes. We perform extensive experiments with different videos and show that our attacks are consistently successful across videos with different characteristics. At the end, we propose introducing randomness to video analysis algorithms as a countermeasure to our attacks.
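As a rough illustration of the histogram-comparison baseline mentioned in the abstract (our own sketch, not the authors' equivalent model; the frame format and threshold are assumptions):

```python
# Rough sketch (ours) of histogram-based shot detection: flag a cut when the
# color-histogram distance between consecutive frames exceeds a threshold.
import numpy as np

def shot_changes(frames, bins=32, threshold=0.4):
    """frames: iterable of HxWx3 uint8 arrays; returns frame indices flagged as cuts."""
    cuts, prev_hist = [], None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
        hist = hist / hist.sum()                      # normalize to a probability vector
        if prev_hist is not None and np.abs(hist - prev_hist).sum() > threshold:
            cuts.append(i)                            # L1 distance lies in [0, 2]
        prev_hist = hist
    return cuts
```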
Attend Before you Act: Leveraging human visual attention for continual learning When humans perform a task, such as playing a game, they selectively pay attention to certain parts of the visual input, gathering relevant information and sequentially combining it to build a representation from the sensory data. In this work, we explore leveraging where humans look in an image as an implicit indication of what is salient for decision making. We build on top of the UNREAL architecture in DeepMind Lab’s 3D navigation maze environment. We train the agent both with original images and foveated images, which were generated by overlaying the original images with saliency maps generated using a real-time spectral residual technique. We investigate the effectiveness of this approach in transfer learning by measuring performance in the context of noise in the environment.
Attention Models in Graphs: A Survey Graph-structured data arise naturally in many different application domains. By representing data as graphs, we can capture entities (i.e., nodes) as well as their relationships (i.e., edges) with each other. Many useful insights can be derived from graph-structured data as demonstrated by an ever-growing body of work focused on graph mining. However, in the real-world, graphs can be both large – with many complex patterns – and noisy which can pose a problem for effective graph mining. An effective way to deal with this issue is to incorporate ‘attention’ into graph mining solutions. An attention mechanism allows a method to focus on task-relevant parts of the graph, helping it to make better decisions. In this work, we conduct a comprehensive and focused survey of the literature on the emerging field of graph attention models. We introduce three intuitive taxonomies to group existing work. These are based on problem setting (type of input and output), the type of attention mechanism used, and the task (e.g., graph classification, link prediction, etc.). We motivate our taxonomies through detailed examples and use each to survey competing approaches from a unique standpoint. Finally, we highlight several challenges in the area and discuss promising directions for future work.
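For orientation, one widely used form of graph attention computes, for a node i and each neighbor j, a normalized weight and an attention-weighted update (our own illustrative summary; the survey itself groups many variants):

```latex
% One common form of graph attention (illustrative): node i attends over its
% neighborhood N(i) with normalized weights alpha_ij.
\[
\alpha_{ij}
= \frac{\exp\big(e(\mathbf{h}_i,\mathbf{h}_j)\big)}
       {\sum_{k \in \mathcal{N}(i)} \exp\big(e(\mathbf{h}_i,\mathbf{h}_k)\big)},
\qquad
\mathbf{h}_i' = \sigma\Big(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\, W \mathbf{h}_j\Big),
\]
where $e(\cdot,\cdot)$ is a learned scoring function on node features and $W$ a learned projection.
```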
Attribute-aware Collaborative Filtering: Survey and Classification Attribute-aware CF models aim at rating prediction given not only the historical ratings from users to items, but also the information associated with users (e.g. age), items (e.g. price), or even ratings (e.g. rating time). This paper surveys works from the past decade developing attribute-aware CF systems, and finds that mathematically they can be classified into four different categories. We provide readers not only with a high-level mathematical interpretation of the existing works in this area but also with mathematical insight into each category of models. Finally, we provide in-depth experimental results comparing the effectiveness of the major works in each category.
Augmented Data Science: Towards Industrialization and Democratization of Data Science Conversion of raw data into insights and knowledge requires substantial amounts of effort from data scientists. Despite breathtaking advances in Machine Learning (ML) and Artificial Intelligence (AI), data scientists still spend the majority of their effort in understanding and then preparing the raw data for ML/AI. The effort is often manual and ad hoc, and requires some level of domain knowledge. The complexity of the effort increases dramatically when data diversity, both in form and context, increases. In this paper, we introduce our solution, Augmented Data Science (ADS), towards addressing this ‘human bottleneck’ in creating value from diverse datasets. ADS is a data-driven approach and relies on statistics and ML to extract insights from any data set in a domain-agnostic way to facilitate the data science process. Key features of ADS are the replacement of rudimentary data exploration and processing steps with automation and the augmentation of data scientist judgment with automatically-generated insights. We present building blocks of our end-to-end solution and provide a case study to exemplify its capabilities.
Augmented Reality, Cyber-Physical Systems, and Feedback Control for Additive Manufacturing: A Review Our objective in this paper is to review the application of feedback ideas in the area of additive manufacturing. Both the application of feedback control to the 3D printing process, and the application of feedback theory to enable users to interact better with machines, are reviewed. Where appropriate, opportunities for future work are highlighted.
Automated Algorithm Selection: Survey and Perspectives It has long been observed that for practically any computational problem that has been intensely studied, different instances are best solved using different algorithms. This is particularly pronounced for computationally hard problems, where in most cases, no single algorithm defines the state of the art; instead, there is a set of algorithms with complementary strengths. This performance complementarity can be exploited in various ways, one of which is based on the idea of selecting, from a set of given algorithms, for each problem instance to be solved the one expected to perform best. The task of automatically selecting an algorithm from a given set is known as the per-instance algorithm selection problem and has been intensely studied over the past 15 years, leading to major improvements in the state of the art in solving a growing number of discrete combinatorial problems, including propositional satisfiability and AI planning. Per-instance algorithm selection also shows much promise for boosting performance in solving continuous and mixed discrete/continuous optimisation problems. This survey provides an overview of research in automated algorithm selection, ranging from early and seminal works to recent and promising application areas. Different from earlier work, it covers applications to discrete and continuous problems, and discusses algorithm selection in context with conceptually related approaches, such as algorithm configuration, scheduling or portfolio selection. Since informative and cheaply computable problem instance features provide the basis for effective per-instance algorithm selection systems, we also provide an overview of such features for discrete and continuous problems. Finally, we provide perspectives on future work in the area and discuss a number of open research challenges.
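A minimal sketch of the per-instance selection idea (our own illustration on synthetic data, assuming scikit-learn; not code from the survey): learn a mapping from cheap instance features to the algorithm with the best observed runtime, then dispatch unseen instances through that selector.

```python
# Minimal per-instance algorithm selection sketch on synthetic data (ours).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_instances, n_features, n_algorithms = 500, 6, 3

features = rng.normal(size=(n_instances, n_features))            # cheap instance features
runtimes = rng.lognormal(mean=1.0, sigma=0.5, size=(n_instances, n_algorithms))
best_algo = runtimes.argmin(axis=1)                               # label: fastest algorithm

selector = RandomForestClassifier(n_estimators=200, random_state=0)
selector.fit(features[:400], best_algo[:400])                     # train the selector

chosen = selector.predict(features[400:])                         # dispatch unseen instances
oracle = runtimes[400:].min(axis=1).sum()
achieved = runtimes[np.arange(400, 500), chosen].sum()
print(f"selected runtime / oracle runtime = {achieved / oracle:.2f}")
```

On real benchmark data the instance features would of course carry information about algorithm performance; the synthetic data here only illustrates the mechanics of training and using a selector.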
Automated Machine Learning – Bayesian Optimization, Meta-Learning and Applications Automating machine learning by providing techniques that autonomously find the best algorithm, hyperparameter configuration and preprocessing is helpful for both researchers and practitioners. Therefore, it is not surprising that automated machine learning has become a very interesting field of research. Bayesian optimization has proven to be a very successful tool for automated machine learning. In the first part of the thesis we present different approaches to improve Bayesian optimization by means of transfer learning. We present three different ways of considering meta-knowledge in Bayesian optimization, i.e. search space pruning, initialization and transfer surrogate models. Finally, we present a general framework for Bayesian optimization combined with meta-learning and conduct a comparison among existing work on two different meta-data sets. A conclusion is that in particular the meta-target driven approaches provide better results. Choosing algorithm configurations based on the improvement on the meta-knowledge combined with the expected improvement yields best results. The second part of this thesis is more application-oriented. Bayesian optimization is applied to large data sets and used as a tool to participate in machine learning challenges. We compare its autonomous performance and its performance in combination with a human expert. At two ECML-PKDD Discovery Challenges, we are able to show that automated machine learning outperforms human machine learning experts. Finally, we present an approach that automates the process of creating an ensemble of several layers, different algorithms and hyperparameter configurations. These kinds of ensembles are jokingly called Frankenstein ensembles and proved their benefit on versatile data sets in many machine learning challenges. We compare our approach Automatic Frankensteining with the current state of the art for automated machine learning on 80 different data sets and can show that it outperforms them on the majority using the same training time. Furthermore, we compare Automatic Frankensteining on a large-scale data set to more than 3,500 machine learning expert teams and are able to outperform more than 3,000 of them within 12 CPU hours.
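For reference, the expected-improvement criterion mentioned above has a standard closed form under a Gaussian-process surrogate (written here in our own notation, for minimization):

```latex
% Expected improvement under a Gaussian-process surrogate (standard form, minimization):
% f* is the best objective value observed so far; mu(x), sigma(x) are the GP posterior
% mean and standard deviation; Phi and phi are the standard normal CDF and density.
\[
\mathrm{EI}(x) = \big(f^{*}-\mu(x)\big)\,\Phi(z) + \sigma(x)\,\phi(z),
\qquad
z = \frac{f^{*}-\mu(x)}{\sigma(x)} .
\]
```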
Automated Machine Learning in Practice: State of the Art and Recent Results A main driver behind the digitization of industry and society is the belief that data-driven model building and decision making can contribute to higher degrees of automation and more informed decisions. Building such models from data often involves the application of some form of machine learning. Thus, there is an ever-growing demand for a workforce with the necessary skill set to do so. This demand has given rise to a new research topic concerned with fitting machine learning models fully automatically – AutoML. This paper gives an overview of the state of the art in AutoML with a focus on practical applicability in a business context, and provides recent benchmark results on the most important AutoML algorithms.
Automated Machine Learning: State-of-The-Art and Open Challenges With the continuous and vast increase in the amount of data in our digital world, it has been acknowledged that the number of knowledgeable data scientists cannot scale to address these challenges. Thus, there is a crucial need for automating the process of building good machine learning models. In the last few years, several techniques and frameworks have been introduced to tackle the challenge of automating the process of Combined Algorithm Selection and Hyper-parameter tuning (CASH) in the machine learning domain. The main aim of these techniques is to reduce the role of the human in the loop and fill the gap for non-expert machine learning users by playing the role of the domain expert. In this paper, we present a comprehensive survey of the state-of-the-art efforts in tackling the CASH problem. In addition, we highlight the research work on automating the other steps of the full complex machine learning pipeline (AutoML), from data understanding to model deployment. Furthermore, we provide comprehensive coverage of the various tools and frameworks that have been introduced in this domain. Finally, we discuss some of the research directions and open challenges that need to be addressed in order to achieve the vision and goals of the AutoML process.
Automated Problem Identification: Regression vs Classification via Evolutionary Deep Networks Regression or classification? This is perhaps the most basic question faced when tackling a new supervised learning problem. We present an Evolutionary Deep Learning (EDL) algorithm that automatically solves this by identifying the question type with high accuracy, along with a proposed deep architecture. Typically, a significant amount of human insight and preparation is required prior to executing machine learning algorithms. For example, when creating deep neural networks, the number of parameters must be selected in advance and furthermore, a lot of these choices are made based upon pre-existing knowledge of the data such as the use of a categorical cross entropy loss function. Humans are able to study a dataset and decide whether it represents a classification or a regression problem, and consequently make decisions which will be applied to the execution of the neural network. We propose the Automated Problem Identification (API) algorithm, which uses an evolutionary algorithm interface to TensorFlow to manipulate a deep neural network to decide if a dataset represents a classification or a regression problem. We test API on 16 different classification, regression and sentiment analysis datasets with up to 10,000 features and up to 17,000 unique target values. API achieves an average accuracy of $96.3\%$ in identifying the problem type without hardcoding any insights about the general characteristics of regression or classification problems. For example, API successfully identifies classification problems even with 1000 target values. Furthermore, the algorithm recommends which loss function to use and also recommends a neural network architecture. Our work is therefore a step towards fully automated machine learning.
Automatic Conversion of Tables to LongForm Dataframes TableToLongForm automatically converts hierarchical Tables intended for a human reader into a simple LongForm Dataframe that is machine readable, hence enabling much greater utilisation of the data. It does this by recognising positional cues present in the hierarchical Table (which would normally be interpreted visually by the human brain) to decompose, then reconstruct the data into a LongForm Dataframe. The article motivates the benefit of such a conversion with an example Table, followed by a short user manual, which includes a comparison between the simple one argument call to TableToLongForm, with code for an equivalent manual conversion. The article then explores the types of Tables the package can convert by providing a gallery of all recognised patterns. It finishes with a discussion of available diagnostic methods and future work.
Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities. In this survey, we classify the existing approaches based on how they conceptualize this problem, viz., models that cast description either as a generation problem or as a retrieval problem over a visual or multimodal representational space. We provide a detailed review of existing models, highlighting their advantages and disadvantages. Moreover, we give an overview of the benchmark image datasets and the evaluation measures that have been developed to assess the quality of machine-generated image descriptions. Finally, we extrapolate future directions in the area of automatic image description generation.
Automatic Extraction of Causal Relations from Natural Language Texts: A Comprehensive Survey Automatic extraction of cause-effect relationships from natural language texts is a challenging open problem in Artificial Intelligence. Most of the early attempts at its solution used manually constructed linguistic and syntactic rules on small and domain-specific data sets. However, with the advent of big data, the availability of affordable computing power and the recent popularization of machine learning, the paradigm to tackle this problem has slowly shifted. Machines are now expected to learn generic causal extraction rules from labelled data with minimal supervision, in a domain-independent manner. In this paper, we provide a comprehensive survey of causal relation extraction techniques from both paradigms, and analyse their relative strengths and weaknesses, with recommendations for future work.
Automatic Extraction of Personality from Text: Challenges and Opportunities In this study, we examined the possibility of extracting personality traits from text. We created an extensive dataset by having experts annotate personality traits in a large number of texts from multiple online sources. From these annotated texts, we selected a sample and made further annotations, ending up with a large low-reliability dataset and a small high-reliability dataset. We then used the two datasets to train and test several machine learning models to extract personality from text, including a language model. Finally, we evaluated our best models in the wild, on datasets from different domains. Our results show that the models based on the small high-reliability dataset performed better (in terms of $\textrm{R}^2$) than models based on the large low-reliability dataset. Also, the language model based on the small high-reliability dataset performed better than the random baseline. Finally, and more importantly, the results showed that our best model did not perform better than the random baseline when tested in the wild. Taken together, our results show that determining personality traits from text remains a challenge and that no firm conclusions can be made on model performance before testing in the wild.
Automatic Keyphrase Extraction: A Survey of the State of the Art While automatic keyphrase extraction has been examined extensively, state-of-the-art performance on this task is still much lower than that on many core natural language processing tasks. We present a survey of the state of the art in automatic keyphrase extraction, examining the major sources of errors made by existing systems and discussing the challenges ahead.
Automatic Keyword Extraction for Text Summarization: A Survey In recent times, data is growing rapidly in every domain, such as news, social media, banking, education, etc. Due to this excess of data, there is a need for automatic summarizers capable of summarizing data, especially textual data, in the original document without losing any of its critical content. Text summarization has emerged as an important research area in the recent past. In this regard, a review of existing work on the text summarization process is useful for carrying out further research. In this paper, recent literature on automatic keyword extraction and text summarization is presented, since the text summarization process depends heavily on keyword extraction. This literature includes a discussion of the different methodologies used for keyword extraction and text summarization. It also discusses the different datasets used for text summarization in several domains, along with evaluation metrics. Finally, it briefly discusses the issues and research challenges faced by researchers, along with future directions.
Automatic Language Identification in Texts: A Survey Language identification (LI) is the problem of determining the natural language that a document or part thereof is written in. Automatic LI has been extensively researched for over fifty years. Today, LI is a key part of many text processing pipelines, as text processing techniques generally assume that the language of the input text is known. Research in this area has recently been especially active. This article provides a brief history of LI research, and an extensive survey of the features and methods used so far in the LI literature. For describing the features and methods we introduce a unified notation. We discuss evaluation methods, applications of LI, as well as off-the-shelf LI systems that do not require training by the end user. Finally, we identify open issues, survey the work to date on each issue, and propose future directions for research in LI.
Automatic Rumor Detection on Microblogs: A Survey The ever-increasing amount of multimedia content on modern social media platforms is valuable in many applications, but the openness and convenience of social media also foster many rumors online. Without verification, these rumors can reach thousands of users immediately and cause serious damage. Many efforts have been made to defeat online rumors automatically by mining the rich content provided on the open network with machine learning techniques. Most rumor detection methods can be categorized into three paradigms: classification approaches based on hand-crafted features, propagation-based approaches and neural network approaches. In this survey, we introduce a formal definition of rumor in comparison with other definitions used in the literature. We summarize the studies of automatic rumor detection so far and present details of the three paradigms of rumor detection. We also give an introduction to existing datasets for rumor detection, which should benefit future research in this area. We conclude with suggestions for future rumor detection on microblogs.
Automatic Sarcasm Detection: A Survey Automatic detection of sarcasm has witnessed interest from the sentiment analysis research community. With diverse approaches, datasets and analyses that have been reported, there is an essential need to have a collective understanding of the research in this area. In this survey of automatic sarcasm detection, we describe datasets, approaches (both supervised and rule-based), and trends in sarcasm detection research. We also present a research matrix that summarizes past work, and list pointers to future work.
Automatic Tag Recommendation Algorithms for Social Recommender Systems The emergence of Web 2.0 and the consequent success of social network websites such as del.icio.us and Flickr introduce us to a new concept called social bookmarking, or tagging for short. Tagging can be seen as the action of connecting a relevant user-defined keyword to a document, image or video, which helps users to better organize and share their collections of interesting stuff. With the rapid growth of Web 2.0, tagged data is becoming more and more abundant on the social network websites. An interesting problem is how to automate the process of making tag recommendations to users when a new resource becomes available. In this paper, we address the issue of tag recommendation from a machine learning perspective. From our empirical observation of two large-scale data sets, we first argue that the user-centered approach for tag recommendation is not very effective in practice. Consequently, we propose two novel document-centered approaches that are capable of making effective and efficient tag recommendations in real scenarios. The first, graph-based method represents the tagged data as two bipartite graphs of (document, tag) and (document, word), then finds document topics by leveraging graph partitioning algorithms. The second, prototype-based method aims at finding the most representative documents within the data collections and advocates a sparse multi-class Gaussian process classifier for efficient document classification. For both methods, tags are ranked within each topic cluster/class by a novel ranking method. Recommendations are performed by first classifying a new document into one or more topic clusters/classes, and then selecting the most relevant tags from those clusters/classes as machine-recommended tags. Experiments on real-world data from Del.icio.us, CiteULike and BibSonomy examine the quality of tag recommendation as well as the efficiency of our recommendation algorithms. The results suggest that our document-centered models can substantially improve the performance of tag recommendations when compared to the user-centered methods, as well as to LDA topic models and SVM classifiers.
AutoML: A Survey of the State-of-the-Art Deep learning has penetrated all aspects of our lives and brought us great convenience. However, the process of building a high-quality deep learning system for a specific task is not only time-consuming but also requires lots of resources and relies on human expertise, which hinders the development of deep learning in both industry and academia. To alleviate this problem, a growing number of research projects focus on automated machine learning (AutoML). In this paper, we provide a comprehensive and up-to-date study of the state of the art in AutoML. First, we introduce the AutoML techniques in detail according to the machine learning pipeline. Then we summarize existing Neural Architecture Search (NAS) research, which is one of the most popular topics in AutoML. We also compare the models generated by NAS algorithms with human-designed models. Finally, we present several open problems for future research.
Autonomics: In Search of a Foundation for Next Generation Autonomous Systems The potential benefits of autonomous systems have been driving intensive development of such systems, and of supporting tools and methodologies. However, there are still major issues to be dealt with before such development becomes commonplace engineering practice, with accepted and trustworthy deliverables. We argue that a solid, evolving, publicly available, community-controlled foundation for developing next generation autonomous systems is a must. We discuss what is needed for such a foundation, identify a central aspect thereof, namely, decision-making, and focus on three main challenges: (i) how to specify autonomous system behavior and the associated decisions in the face of unpredictability of future events and conditions and the inadequacy of current languages for describing these; (ii) how to carry out faithful simulation and analysis of system behavior with respect to rich environments that include humans, physical artifacts, and other systems; and (iii) how to engineer systems that combine executable model-driven techniques and data-driven machine learning techniques. We argue that autonomics, i.e., the study of unique challenges presented by next generation autonomous systems, and research towards resolving them, can introduce substantial contributions and innovations in system engineering and computer science.
Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems Much research in artificial intelligence is concerned with the development of autonomous agents that can interact effectively with other agents. An important aspect of such agents is the ability to reason about the behaviours of other agents, by constructing models which make predictions about various properties of interest (such as actions, goals, beliefs) of the modelled agents. A variety of modelling approaches now exist which vary widely in their methodology and underlying assumptions, catering to the needs of the different sub-communities within which they were developed and reflecting the different practical uses for which they are intended. The purpose of the present article is to provide a comprehensive survey of the salient modelling methods which can be found in the literature. The article concludes with a discussion of open problems which may form the basis for fruitful future research.
Autonomous development and learning in artificial intelligence and robotics: Scaling up deep learning to human–like learning Autonomous lifelong development and learning is a fundamental capability of humans, differentiating them from current deep learning systems. However, other branches of artificial intelligence have designed crucial ingredients towards autonomous learning: curiosity and intrinsic motivation, social learning and natural interaction with peers, and embodiment. These mechanisms guide exploration and autonomous choice of goals, and integrating them with deep learning opens stimulating perspectives. Deep learning (DL) approaches made great advances in artificial intelligence, but are still far away from human learning. As argued convincingly by Lake et al., differences include human capabilities to learn causal models of the world from very little data, leveraging compositional representations and priors like intuitive physics and psychology. However, there are other fundamental differences between current DL systems and human learning, as well as technical ingredients to fill this gap, that are either superficially, or not adequately, discussed by Lake et al. These fundamental mechanisms relate to autonomous development and learning. They are bound to play a central role in artificial intelligence in the future. Current DL systems require engineers to manually specify a task-specific objective function for every new task, and learn through off-line processing of large training databases. On the contrary, humans learn autonomously open-ended repertoires of skills, deciding for themselves which goals to pursue or value, and which skills to explore, driven by intrinsic motivation/curiosity and social learning through natural interaction with peers. Such learning processes are incremental, online, and progressive. Human child development involves a progressive increase of complexity in a curriculum of learning where skills are explored, acquired, and built on each other, through particular ordering and timing. Finally, human learning happens in the physical world, and through bodily and physical experimentation, under severe constraints on energy, time, and computational resources. In the two last decades, the field of Developmental and Cognitive Robotics (Cangelosi and Schlesinger, 2015, Asada et al., 2009), in strong interaction with developmental psychology and neuroscience, has achieved significant advances in computational
Auto-scaling Web Applications in Clouds: A Taxonomy and Survey Web application providers have been migrating their applications to cloud data centers, attracted by the emerging cloud computing paradigm. One of the appealing features of the cloud is elasticity. It allows cloud users to acquire or release computing resources on demand, which enables web application providers to auto-scale the resources provisioned to their applications under dynamic workload in order to minimize resource cost while satisfying Quality of Service (QoS) requirements. In this paper, we comprehensively analyze the challenges that remain in auto-scaling web applications in clouds and review the developments in this field. We present a taxonomy of auto-scaling systems according to the identified challenges and key properties. We analyze the surveyed works and map them to the taxonomy to identify the weaknesses in this field. Moreover, based on the analysis, we propose new future directions.
Average Predictive Comparisons for models with nonlinearity, interactions, and variance components In a predictive model, what is the expected difference in the outcome associated with a unit difference in one of the inputs? In a linear regression model without interactions, this average predictive comparison is simply a regression coefficient (with associated uncertainty). In a model with nonlinearity or interactions, however, the average predictive comparison in general depends on the values of the predictors. We consider various definitions based on averages over a population distribution of the predictors, and we compute standard errors based on uncertainty in model parameters. We illustrate with a study of criminal justice data for urban counties in the United States. The outcome of interest measures whether a convicted felon received a prison sentence rather than a jail or non-custodial sentence, with predictors available at both individual and county levels. We fit three models: (1) a hierarchical logistic regression with varying coefficients for the within-county intercepts as well as for each individual predictor; (2) a hierarchical model with varying intercepts only; and (3) a nonhierarchical model that ignores the multilevel nature of the data. The regression coefficients have different interpretations for the different models; in contrast, the models can be compared directly using predictive comparisons. Furthermore, predictive comparisons clarify the interplay between the individual and county predictors for the hierarchical models and also illustrate the relative size of varying county effects.
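A simplified finite-difference version of an average predictive comparison for one input of a fitted logistic regression might look like this (our own sketch using scikit-learn on synthetic data; the paper's definitions and standard-error computations are more careful than this):

```python
# Simplified finite-difference average predictive comparison for input 0 of a
# logistic regression (ours): average the change in predicted probability
# associated with a one-unit increase in that input, over the observed predictors.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
p = 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
y = (rng.random(1000) < p).astype(int)

model = LogisticRegression().fit(X, y)

X_shifted = X.copy()
X_shifted[:, 0] += 1.0          # unit increase in the input of interest
apc = (model.predict_proba(X_shifted)[:, 1] - model.predict_proba(X)[:, 1]).mean()
print(f"average predictive comparison for input 0: {apc:.3f}")
```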
Avoiding the Barriers of In-Memory Business Intelligence: Making Data Discovery Scalable When looking at the growth rates of the business intelligence platform space, it is apparent that acquisitions of new business intelligence tools have shifted dramatically from traditional data visualization and aggregation use cases to newer data discovery implementations. This shift toward data discovery use cases has been driven by two key factors: faster implementation times and the ability to visualize and manipulate data as quickly as an analyst can click a mouse. The improvements in implementation speeds stem from the use of architectures that access source data directly without having to first aggregate all the data in a central location such as an enterprise data warehouse or departmental data mart. The promise of fast manipulation of data has largely been accomplished by employing in-memory data management models to exploit the speed advantage of accessing data from server memory over traditional disk-based approaches. The ‘physics’ of data access favors in-memory data management models. However, in-memory techniques are not without drawbacks. As companies attempt to evolve from small departmental projects to broader division-wide or enterprise-wide initiatives, increasing data volumes and the impact of increasing data consumer counts challenge the limits of early in-memory implementations. These challenges raise serious questions that should be considered by any organization considering in-memory techniques for business intelligence platforms.

B

Babel Storage: Uncoordinated Content Delivery from Multiple Coded Storage Systems In future content-centric networks, content is identified independently of its location. From an end-user’s perspective, individual storage systems dissolve into a seemingly omnipresent structureless `storage fog’. Content should be delivered oblivious of the network topology, using multiple storage systems simultaneously, and at minimal coordination overhead. Prior works have addressed the advantages of error correction coding for distributed storage and content delivery separately. This work takes a comprehensive approach to highlighting the tradeoff between storage overhead and transmission overhead in uncoordinated content delivery from multiple coded storage systems. Our contribution is twofold. First, we characterize the tradeoff between storage and transmission overhead when all participating storage systems employ the same code. Second, we show that the resulting stark inefficiencies can be avoided when storage systems use diverse codes. What is more, such code diversity is not just technically desirable, but presumably will be the reality in the increasingly heterogeneous networks of the future. To this end, we show that a mix of Reed-Solomon, low-density parity-check and random linear network codes achieves close-to-optimal performance at minimal coordination and operational overhead.
Basic Principles of Clustering Methods Clustering methods group a set of data points into a few coherent groups or clusters of similar data points. As an example, consider clustering pixels in an image (or video) if they belong to the same object. Different clustering methods are obtained by using different notions of similarity and different representations of data points.
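As a concrete instance of the idea (our own sketch, not taken from the notes), k-means clustering of pixel colors uses Euclidean distance in color space as the notion of similarity:

```python
# Toy k-means on pixel colors (ours): similarity = Euclidean distance in RGB space.
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # assign every point to its nearest center
        labels = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        # move each center to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers

pixels = np.random.default_rng(0).random((10000, 3))   # stand-in for an image's RGB values
labels, centers = kmeans(pixels, k=4)
print(centers)                                          # the four cluster 'colors'
```

Swapping the distance function or the data representation yields the different clustering methods the notes describe.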
Bayesian Computation Via Markov Chain Monte Carlo Markov chain Monte Carlo (MCMC) algorithms are an indispensable tool for performing Bayesian inference. This review discusses widely used sampling algorithms and illustrates their implementation on a probit regression model for lupus data. The examples considered highlight the importance of tuning the simulation parameters and underscore the important contributions of modern developments such as adaptive MCMC. We then use the theory underlying MCMC to explain the validity of the algorithms considered and to assess the variance of the resulting Monte Carlo estimators.
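A toy random-walk Metropolis sampler illustrates the basic MCMC ingredients the review discusses, including the role of the proposal scale as a tuning parameter (our own example on a normal-mean posterior, not the probit regression for the lupus data):

```python
# Toy random-walk Metropolis sampler (ours): posterior of a normal mean with a
# N(0, 10^2) prior and unit-variance likelihood; proposal_sd is the tuning parameter.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=1.0, size=50)

def log_post(mu):
    return -mu**2 / (2 * 10**2) - 0.5 * np.sum((data - mu) ** 2)

def metropolis(n_samples=5000, proposal_sd=0.5):
    mu, samples = 0.0, []
    for _ in range(n_samples):
        proposal = mu + proposal_sd * rng.normal()
        # accept with probability min(1, posterior ratio)
        if np.log(rng.random()) < log_post(proposal) - log_post(mu):
            mu = proposal
        samples.append(mu)
    return np.array(samples)

draws = metropolis()
print(draws[1000:].mean(), draws[1000:].std())   # posterior summaries after burn-in
```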
Bayesian Computational Tools This article surveys advances in the field of Bayesian computation over the past 20 years from a purely personal viewpoint, hence containing some omissions given the spectrum of the field. Monte Carlo, MCMC, and ABC themes are covered here, whereas the rapidly expanding area of particle methods is only briefly mentioned and different approximative techniques such as variational Bayes and linear Bayes methods do not appear at all. This article also contains some novel computational entries on the double-exponential model that may be of interest.
Bayesian Computing with INLA: A Review The key operation in Bayesian inference is to compute high-dimensional integrals. An old approximate technique is the Laplace method or approximation, which dates back to Pierre-Simon Laplace (1774). This simple idea approximates the integrand with a second order Taylor expansion around the mode and computes the integral analytically. By developing a nested version of this classical idea, combined with modern numerical techniques for sparse matrices, we obtain the approach of Integrated Nested Laplace Approximations (INLA) to do approximate Bayesian inference for latent Gaussian models (LGMs). LGMs represent an important model-abstraction for Bayesian inference and include a large proportion of the statistical models used today. In this review, we will discuss the reasons for the success of the INLA-approach, the R-INLA package, why it is so accurate, why the approximations are very quick to compute and why LGMs make such a useful concept for Bayesian computing.
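For reference, the classical Laplace approximation referred to above takes the following standard form (our notation):

```latex
% Classical Laplace approximation: expand log g(x) to second order around its mode x*
% and integrate the resulting Gaussian analytically (d is the dimension of x).
\[
\int g(x)\,dx \;\approx\; g(x^{*})\,(2\pi)^{d/2}\,
\big|\!-\!\nabla^{2}\log g(x^{*})\big|^{-1/2},
\qquad
x^{*} = \arg\max_{x}\,\log g(x).
\]
```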
Bayesian Decision Theory and Stochastic Independence Stochastic independence has a complex status in probability theory. It is not part of the definition of a probability measure, but it is nonetheless an essential property for the mathematical development of this theory. Bayesian decision theorists such as Savage can be criticized for being silent about stochastic independence. From their current preference axioms, they can derive no more than the definitional properties of a probability measure. In a new framework of twofold uncertainty, we introduce preference axioms that entail not only these definitional properties, but also the stochastic independence of the two sources of uncertainty. This goes some way towards filling a curious lacuna in Bayesian decision theory.
Bayesian estimation supersedes the t test Bayesian estimation for two groups provides complete distributions of credible values for the effect size, group means and their difference, standard deviations and their difference, and the normality of the data. The method handles outliers. The decision rule can accept the null value (unlike traditional t tests) when certainty in the estimate is high (unlike Bayesian model comparison using Bayes factors). The method also yields precise estimates of statistical power for various research goals. The software and programs are free, and run on Macintosh, Windows, and Linux platforms.
Bayesian Group Decisions: Algorithms and Complexity Many important real-world decision-making problems involve interactions of individuals with purely informational externalities, for example, in jury deliberations, expert committees, etc. We model such interactions of rational agents in a group, where they receive private information and act based on that information while also observing other people’s beliefs or actions. As a Bayesian agent attempts to infer the truth from her sequence of observations of actions of others and her own private signal, she recursively refines her belief on the signals that other players could have observed and actions that they could have taken given that other players are also rational. The existing literature addresses asymptotic properties of Bayesian group decisions (important questions such as convergence to consensus and learning). In this work, we address the computations that the Bayesian agent should undertake to realize the optimal actions at every decision epoch. We use the iterated eliminations of infeasible signals (IEIS) to model the thinking process as well as the calculations of a Bayesian agent in a group decision scenario. We show that IEIS algorithm runs in exponential time; however, when the group structure is a partially ordered set the Bayesian calculations simplify and polynomial-time computation of the Bayesian recommendations is possible. We next shift attention to the case where agents reveal their beliefs (instead of actions) at every decision epoch. We analyze the computational complexity of the Bayesian belief formation in groups and show that it is NP-hard. We also investigate the factors underlying this computational complexity and show how belief calculations simplify in special network structures or cases with strong inherent symmetries. We finally give insights about the statistical efficiency (optimality) of the beliefs and its relations to computational efficiency.
Bayesian Methods of Parameter Estimation In order to motivate the idea of parameter estimation we need to first understand the notion of mathematical modeling. What is the idea behind modeling real world phenomena? Mathematically modeling an aspect of the real world enables us to better understand it and better explain it, and perhaps enables us to reproduce it, either on a large scale, or on a simplified scale that characterizes only the critical parts of that phenomenon. How do we model these real life phenomena? These real life phenomena are captured by means of distribution models, which are extracted or learned directly from data gathered about them. So, what do we mean by parameter estimation? Every distribution model has a set of parameters that need to be estimated. These parameters specify any constants appearing in the model and provide a mechanism for efficient and accurate use of data. …
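As a simple worked example of the kind of Bayesian parameter estimation the abstract motivates (a standard conjugate calculation, not taken from the text): model coin flips as Bernoulli with unknown parameter $\theta$ and place a $\mathrm{Beta}(a, b)$ prior on $\theta$. After observing $k$ heads in $n$ flips, the posterior is

$$p(\theta \mid \text{data}) = \mathrm{Beta}(\theta;\, a + k,\; b + n - k),$$

so the posterior mean estimate of the parameter is $(a + k)/(a + b + n)$, which blends the prior constants with the observed data.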
Bayesian Model Averaging: A Tutorial Standard statistical practice ignores model uncertainty. Data analysts typically select a model from some class of models and then proceed as if the selected model had generated the data. This approach ignores the uncertainty in model selection, leading to over-confident inferences and decisions that are more risky than one thinks they are. Bayesian model averaging (BMA) provides a coherent mechanism for accounting for this model uncertainty. Several methods for implementing BMA have recently emerged. We discuss these methods and present a number of examples. In these examples, BMA provides improved out-of-sample predictive performance. We also provide a catalogue of currently available BMA software.
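The core identity behind BMA, in its standard form (our notation, not quoted from the tutorial): for candidate models $M_1, \dots, M_K$, data $D$ and a quantity of interest $\Delta$,

$$p(\Delta \mid D) \;=\; \sum_{k=1}^{K} p(\Delta \mid M_k, D)\, p(M_k \mid D),
\qquad
p(M_k \mid D) \;=\; \frac{p(D \mid M_k)\, p(M_k)}{\sum_{l=1}^{K} p(D \mid M_l)\, p(M_l)},$$

so predictions are averaged over models, weighted by their posterior probabilities, rather than conditioned on a single selected model.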
Bayesian model reduction This paper reviews recent developments in statistical structure learning; namely, Bayesian model reduction. Bayesian model reduction is a special but ubiquitous case of Bayesian model comparison that, in the setting of variational Bayes, furnishes an analytic solution for (a lower bound on) model evidence induced by a change in priors. This analytic solution finesses the problem of scoring large model spaces in model comparison or structure learning. This is because each new model can be cast in terms of an alternative set of priors over model parameters. Furthermore, the reduced free energy (i.e. evidence bound on the reduced model) finds an expedient application in hierarchical models, where it plays the role of a summary statistic. In other words, it contains all the necessary information contained in the posterior distributions over parameters of lower levels. In this technical note, we review Bayesian model reduction – in terms of common forms of reduced free energy – and illustrate recent applications in structure learning, hierarchical or empirical Bayes and as a metaphor for neurobiological processes like abductive reasoning and sleep.
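The analytic identity usually quoted for Bayesian model reduction (standard result in the variational-Bayes literature; notation ours, not copied from the abstract): if a reduced model $\tilde m$ differs from the full model $m$ only in its prior, $\tilde p(\theta)$ versus $p(\theta)$, then

$$\frac{p(y \mid \tilde m)}{p(y \mid m)} \;=\; \int \frac{\tilde p(\theta)}{p(\theta)}\, p(\theta \mid y, m)\, d\theta,$$

so the evidence of any reduced model can be scored from the full model's posterior alone; replacing the exact posterior with the variational posterior $q(\theta)$ yields the bound on the reduced free energy mentioned in the abstract.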
Bayesian Networks, Total Variation and Robustness Now that Bayesian Networks (BNs) have become widely used, an appreciation is developing of just how critical it is to understand how sensitive and robust certain target variables are to changes in the model. When time resources are limited, such issues impact directly on the chosen level of complexity of the BN as well as the quantity of missing probabilities we are able to elicit. Currently most such analyses are performed once the whole BN has been elicited and are based on Kullback-Leibler information measures. In this paper we argue that robustness methods based instead on the familiar total variation distance provide simple and more useful bounds on robustness to misspecification which are both formally justifiable and transparent. We demonstrate how such formal robustness considerations can be embedded within the process of building a BN. Here we focus on two particular choices a modeller needs to make: the choice of the parents of each node and the number of levels to choose for each variable within the system. Our analyses are illustrated throughout using two BNs drawn from the recent literature.
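For reference, the total variation distance the authors advocate is the standard metric (not a definition specific to this paper): for two distributions $P$ and $Q$ on the same discrete space,

$$d_{TV}(P, Q) \;=\; \sup_{A} \bigl| P(A) - Q(A) \bigr| \;=\; \tfrac{1}{2} \sum_{x} \bigl| P(x) - Q(x) \bigr|,$$

so the distance between the true and the elicited joint distribution directly bounds the error in the probability of any event computed from the misspecified BN.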
Bayesian Reinforcement Learning: A Survey Bayesian methods for machine learning have been widely investigated, yielding principled methods for incorporating prior information into inference algorithms. In this survey, we provide an in-depth review of the role of Bayesian methods for the reinforcement learning (RL) paradigm. The major incentives for incorporating Bayesian reasoning in RL are: 1) it provides an elegant approach to action-selection (exploration/exploitation) as a function of the uncertainty in learning; and 2) it provides a machinery to incorporate prior knowledge into the algorithms. We first discuss models and methods for Bayesian inference in the simple single-step Bandit model. We then review the extensive recent literature on Bayesian methods for model-based RL, where prior information can be expressed on the parameters of the Markov model. We also present Bayesian methods for model-free RL, where priors are expressed over the value function or policy class. The objective of the paper is to provide a comprehensive survey on Bayesian RL algorithms and their theoretical and empirical properties.
Bayesian Statistics Students are to choose one paper in the following list, or possibly outside of the list upon my agreement. The papers are available online. Most of them are collected in this zip file. The presentation can focus on a particular section / result / example of the paper. Evaluation of the students is based on the understanding and presentation of the chosen paper.
Bayesian Statistics Papers (Paper Collection)
Behavior Trees in Robotics and AI, an Introduction A Behavior Tree (BT) is a way to structure the switching between different tasks in an autonomous agent, such as a robot or a virtual entity in a computer game. BTs are a very efficient way of creating complex systems that are both modular and reactive. These properties are crucial in many applications, which has led to the spread of BTs from computer game programming to many branches of AI and robotics. In this book, we will first give an introduction to BTs, then we describe how BTs relate to, and in many cases generalize, earlier switching structures. These ideas are then used as a foundation for a set of efficient and easy-to-use design principles. Properties such as safety, robustness, and efficiency are important for an autonomous system, and we describe a set of tools for formally analyzing these using a state space description of BTs. With the new analysis tools, we can formalize the descriptions of how BTs generalize earlier approaches. Finally, we describe an extended set of tools to capture the behavior of Stochastic BTs, where the outcomes of actions are described by probabilities. These tools enable the computation of both success probabilities and time to completion.
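To make the switching structure concrete, here is a minimal sketch (our illustration, not code from the book) of the two classic composite nodes, Sequence and Fallback, ticking hypothetical leaf actions:

    from enum import Enum

    class Status(Enum):
        SUCCESS = 1
        FAILURE = 2
        RUNNING = 3

    class Sequence:
        """Ticks children in order; stops at the first child that fails or keeps running."""
        def __init__(self, children):
            self.children = children
        def tick(self):
            for child in self.children:
                status = child.tick()
                if status != Status.SUCCESS:
                    return status
            return Status.SUCCESS

    class Fallback:
        """Ticks children in order; stops at the first child that succeeds or keeps running."""
        def __init__(self, children):
            self.children = children
        def tick(self):
            for child in self.children:
                status = child.tick()
                if status != Status.FAILURE:
                    return status
            return Status.FAILURE

    class Action:
        """Leaf node wrapping a callable that returns a Status."""
        def __init__(self, fn):
            self.fn = fn
        def tick(self):
            return self.fn()

    # Hypothetical behaviour: work while the battery is OK, otherwise recharge.
    battery_ok = Action(lambda: Status.SUCCESS)   # condition stub
    do_work = Action(lambda: Status.RUNNING)      # long-running task stub
    recharge = Action(lambda: Status.SUCCESS)
    root = Fallback([Sequence([battery_ok, do_work]), recharge])
    print(root.tick())  # -> Status.RUNNING, because the work action is still running

Because every node exposes the same tick interface, subtrees can be swapped in and out without touching the rest of the tree, which is the modularity and reactivity the book emphasizes.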
Best Practices for Applying Deep Learning to Novel Applications This report is targeted to groups who are subject matter experts in their application but deep learning novices. It contains practical advice for those interested in testing the use of deep neural networks on applications that are novel for deep learning. We suggest making your project more manageable by dividing it into phases. For each phase this report contains numerous recommendations and insights to assist novice practitioners.
Best Practices for Scientific Computing Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists´ productivity and the reliability of their software.
Better Decisions through Science Math-based aids for making decisions in medicine and industry could improve many diagnoses – often saving lives in the process.
Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
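In the spirit of the recommendation, here is a small sketch (our illustration with made-up numbers, not the authors' Excel templates) of a univariate scatterplot that shows every observation instead of a bar of the means:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    groups = {"Control": rng.normal(10, 2, 8), "Treated": rng.normal(13, 4, 8)}  # hypothetical data

    fig, ax = plt.subplots()
    for i, (name, values) in enumerate(groups.items()):
        x = np.full(values.size, float(i)) + rng.uniform(-0.08, 0.08, values.size)  # small jitter
        ax.scatter(x, values, alpha=0.8)                 # every data point is visible
        ax.hlines(values.mean(), i - 0.2, i + 0.2)       # group mean shown as a short line
    ax.set_xticks(range(len(groups)))
    ax.set_xticklabels(groups.keys())
    ax.set_ylabel("Outcome (arbitrary units)")
    plt.show()

Unlike a bar graph of the means, this display reveals sample size, spread, and potential outliers at a glance.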
Beyond Mobile Apps: A Survey of Technologies for Mental Well-being Mental health problems are on the rise globally and strain national health systems worldwide. Mental disorders are closely associated with fear of stigma, structural barriers such as financial burden, and lack of available services and resources which often prohibit the delivery of frequent clinical advice and monitoring. Technologies for mental well-being exhibit a range of attractive properties which facilitate the delivery of state of the art clinical monitoring. This review article provides an overview of traditional techniques followed by their technological alternatives, sensing devices, behaviour changing tools, and feedback interfaces. The challenges presented by these technologies are then discussed, with data collection, privacy and battery life being some of the key issues which need to be carefully considered for the successful deployment of mental health tool-kits. Finally, the opportunities this growing research area presents are discussed, including the use of portable tangible interfaces combining sensing and feedback technologies. Capitalising on the captured data these ubiquitous devices offer, state of the art machine learning algorithms can lead to the development of …
BI forward: A full view of your business Imagine that your organization is effectively using a business intelligence (BI) solution that provides everything you need to make better decisions and improve operational efficiency. Imagine users with their fingers on the pulse of markets, customers, channels and operations at all times. And imagine that your programs, plans, services and products are being designed with full and timely insight into all the factors – past, present and future – critical to success. What would it take to make that happen? What businesses need from BI is a full picture. And that is why it is important to understand that, for now and in the future, BI should help you not only describe and diagnose your past and current performance, but also predict future performance. When your business can do all three, you have a better idea of what your business needs to do to stay competitive. You have reports that show you where you have been, scorecards and real-time monitoring that show what is happening now and predictive analytics to show where your business is headed. This paper explains the advantages of a BI solution that includes predictive analytics.
BI, Analytics and Big Data A Modern-Day Perspective (Slide Deck)
Big Data Analytics for Manufacturing Internet of Things: Opportunities, Challenges and Enabling Technologies The recent advances in information and communication technology (ICT) have promoted the evolution of the conventional computer-aided manufacturing industry to smart data-driven manufacturing. Data analytics on massive manufacturing data can extract huge business value, but can also pose research challenges due to the heterogeneous data types, enormous volume and real-time velocity of manufacturing data. This paper provides an overview of big data analytics in the manufacturing Internet of Things (MIoT). The paper starts with a discussion of the necessities and challenges of big data analytics in manufacturing data of MIoT. Then, the enabling technologies of big data analytics of manufacturing data are surveyed and discussed. Moreover, this paper also outlines the future directions in this promising area.
Big Data Analytics in Action How Your Organization Can Improve its Bottom Line through Better Measurement, Better Decisions and Faster Response to Dynamic Market Conditions.
Big Data Analytics: A Survey The age of big data is now coming. But traditional data analytics may not be able to handle such large quantities of data. The question that arises now is how to develop a high-performance platform to efficiently analyze big data and how to design an appropriate mining algorithm to find useful insights in it. To deeply discuss this issue, this paper begins with a brief introduction to data analytics, followed by a discussion of big data analytics. Some important open issues and further research directions will also be presented for the next step of big data analytics.
Big Data and Fog Computing Fog computing serves as a computing layer that sits between the edge devices and the cloud in the network topology. Fog nodes have more compute capacity than the edge but much less than cloud data centers. They typically have high uptime and always-on Internet connectivity. Applications that make use of the fog can avoid the network performance limitations of cloud computing while being less resource constrained than edge computing. As a result, they offer a useful balance of the current paradigms. This article explores various aspects of fog computing in the context of big data.
Big Data and Machine Learning with an Actuarial Perspective (Slide Deck)
Big Data and the Creative Destruction of Today’s Business Models
Big data and the democratisation of decisions In August 2012 the Economist Intelligence Unit conducted a survey sponsored by Alteryx of 241 global executives to gauge their perceptions of big data adoption. Fifty-three percent of respondents are board members or C-suite executives, including 66 CEOs, presidents or managing directors. Those polled are based in North America (34%), the Asia-Pacific region (27%), Western Europe (25%), the Middle East and Africa (6%), Latin America (5%) and Eastern Europe (4%). Half of executives work for companies with revenue that exceeds US$500m. Executives hail from 18 sectors and represent 14 functional roles, including general management (30%), strategy and business development (18%), finance (17%) and marketing and sales (10%).
Big Data and the Internet of Things Advances in sensing and computing capabilities are making it possible to embed increasing computing power in small devices. This has enabled the sensing devices not just to passively capture data at very high resolution but also to take sophisticated actions in response. Combined with advances in communication, this is resulting in an ecosystem of highly interconnected devices referred to as the Internet of Things – IoT. In conjunction, the advances in machine learning have allowed building models on these ever increasing amounts of data. Consequently, devices all the way from heavy assets such as aircraft engines to wearables such as health monitors can all now not only generate massive amounts of data but can draw on aggregate analytics to ‘improve’ their performance over time. Big data analytics has been identified as a key enabler for the IoT. In this chapter, we discuss various avenues of the IoT where big data analytics either is already making a significant impact or is on the cusp of doing so. We also discuss social implications and areas of concern.
Big Data for Big Business A Taxonomy of Data-driven Business Models used by Start-up Firms This paper reports a study which provides a series of implications that may be particularly helpful to companies already leveraging ‘big data´ for their businesses or planning to do so. The Data Driven Business Model (DDBM) framework represents a basis for the analysis and clustering of business models. For practitioners the dimensions and various features may provide guidance on possibilities to form a business model for their specific venture. The framework allows identification and assessment of available potential data sources that can be used in a new DDBM. It also provides comprehensive sets of potential key activities as well as revenue models. The identified business model types can serve as both inspiration and blueprint for companies considering creating new data-driven business models. Although the focus of this paper was on business models in the start-up world, the key findings presumably also apply to established organisations to a large extent. The DDBM can potentially be used and tested by established organisations across different sectors in future research.
Big Data for Finance According to the 2014 IDG Enterprise Big Data research report, companies are intensifying their efforts to derive value through big data initiatives, with nearly half (49%) of respondents already implementing big data projects or in the process of doing so in the future. Further, organizations are seeing exponential growth in the amount of data managed, with an expected increase of 76% within the next 12-18 months. With growth there are opportunities as well as challenges. Among those facing the big data challenge are finance executives, as this extraordinary growth presents a unique opportunity to leverage data assets like never before. As the 3 V's of big data (volume, velocity and variety) continue to grow, so too does the opportunity for finance sector firms to capitalize on this data for strategic advantage. Finance professionals are accomplished in collecting, analyzing and benchmarking data, so they are in a unique position to provide a new and critical service – making big data more manageable while condensing vast amounts of information into actionable business insights.
Big Data Gets Personal Big data and personal data are converging to shape the Internet´s most surprising consumer products. They´ll predict your needs and store your memories – if you let them.
Big Data in Big Companies Big data burst upon the scene in the first decade of the 21st century, and the first organizations to embrace it were online and startup firms. Arguably, firms like Google, eBay, LinkedIn, and Facebook were built around big data from the beginning. They didn´t have to reconcile or integrate big data with more traditional sources of data and the analytics performed upon them, because they didn´t have those traditional forms. They didn´t have to merge big data technologies with their traditional IT infrastructures because those infrastructures didn´t exist. Big data could stand alone, big data analytics could be the only focus of analytics, and big data technology architectures could be the only architecture. Consider, however, the position of large, well-established businesses. Big data in those environments shouldn´t be separate, but must be integrated with everything else that´s going on in the company. Analytics on big data have to coexist with analytics on other types of data. Hadoop clusters have to do their work alongside IBM mainframes. Data scientists must somehow get along and work jointly with mere quantitative analysts. In order to understand this coexistence, we interviewed 20 large organizations in the early months of 2013 about how big data fit in to their overall data and analytics environments. Overall, we found the expected co-existence; in not a single one of these large organizations was big data being managed separately from other types of data and analytics. The integration was in fact leading to a new management perspective on analytics, which we´ll call ‘Analytics 3.0.’ In this paper we´ll describe the overall context for how organizations think about big data, the organizational structure and skills required for it…etc. We´ll conclude by describing the Analytics 3.0 era.
Big Data Machine Learning: Patterns for Predictive Analytics (RefCard)
Big data maturity: An action plan for policymakers and executives Big data have the potential to improve or transform existing business operations and reshape entire economic sectors. Big data can pave the way for disruptive, entrepreneurial companies and allow new industries to emerge. The technological aspect is important, but insufficient to allow big data to show their full potential and to stop companies from feeling swamped by this information. What matters is to reshape internal decision-making culture so that executives base their judgments on data rather than hunches. Research already indicates that companies that have managed this are more likely to be productive and profitable than the competition. Organizations need to understand where they are in terms of big data maturity, an approach that allows them to assess progress and identify necessary initiatives. Judging maturity requires looking at environment readiness, how far governments have provided the necessary legal and regulatory frameworks, and information and communications technology (ICT) infrastructure; an organization´s internal capabilities and how ready it is to implement big data initiatives; and the many and more complicated methods for using big data, which can mean simple efficiency gains or revamping a business model. The ultimate maturity level involves transforming the business model to be data-driven, which requires significant investment over many years. Policymakers should pay particular attention to environment readiness. They should present citizens with a compelling case for the benefits of big data. This means addressing privacy concerns and seeking to harmonize regulations around data privacy globally. Policymakers should establish an environment that facilitates the business viability of the big data sector (such as data, service, or IT system providers), and they should take educational measures to address the shortage of big data specialists. As big data become ubiquitous in public and private organizations, their use will become a source of national and corporate competitive advantage.
Big Data Meet Cyber-Physical Systems: A Panoramic Survey The world is witnessing an unprecedented growth of cyber-physical systems (CPS), which are foreseen to revolutionize our world via creating new services and applications in a variety of sectors such as environmental monitoring, mobile-health systems, intelligent transportation systems and so on. The information and communication technology (ICT) sector is experiencing a significant growth in data traffic, driven by the widespread usage of smartphones, tablets and video streaming, along with the significant growth of sensor deployments that are anticipated in the near future. This is expected to markedly increase the growth rate of raw sensed data. In this paper, we present a CPS taxonomy via providing a broad overview of data collection, storage, access, processing and analysis. Compared with other survey papers, this is the first panoramic survey on big data for CPS, where our objective is to provide a panoramic summary of different CPS aspects. Furthermore, CPS require cybersecurity to protect them against malicious attacks and unauthorized intrusion, which becomes a challenge with the enormous amount of data that is continuously being generated in the network. Thus, we also provide an overview of the different security solutions proposed for CPS big data storage, access and analytics. We also discuss big data meeting green challenges in the context of CPS.
Big Data Quality: A systematic literature review and future research directions One of the challenges manifested after the global growth of social networks and the exponential growth of user-generated data is to identify user needs based on the data they share or tend to like. ‘Big Data’ is a term referring to data that exist in huge volume and various formats, i.e. structured or semi-structured. The inherent features of this data have forced organizations to seek to identify desirable patterns amongst big data and make their fundamental decisions based on this information, in order to improve their customer services and enhance their business. If the big data being used is not of good quality, the business needs cannot be expected to be met. As a result, big data quality needs to be taken into consideration seriously. Since there is no systematic review in the big data quality area, this study aims to present a systematic literature review of the research efforts on big data quality for those researchers who attempt to enter this area. In this systematic review, and after determining the basic requirements, a total of 419 studies are initially considered to be relevant. Then, with a review of the abstracts of the studies, 170 papers are included and ultimately, after the complete study, 88 papers have been added to the final papers pool. Through careful study and analysis of these papers, the desired information has been extracted. As a result, a research tree is presented that divides the studies based on the type of processing, task, and technique. Then the active venues and other interesting profiles, as well as the classification of the new challenges of this field, are discussed.
Big Data Systems Meet Machine Learning Challenges: Towards Big Data Science as a Service Recently, we have been witnessing huge advancements in the scale of data we routinely generate and collect in pretty much everything we do, as well as our ability to exploit modern technologies to process, analyze and understand this data. The intersection of these trends is what is called, nowadays, Big Data Science. Cloud computing represents a practical and cost-effective solution for supporting Big Data storage and processing and for sophisticated analytics applications. We analyze in detail the building blocks of the software stack for supporting big data science as a commodity service for data scientists. We provide various insights about the latest ongoing developments and open challenges in this domain.
Big Data Visualization Tools Data visualization is the presentation of data in a pictorial or graphical format, and a data visualization tool is the software that generates this presentation. Data visualization provides users with intuitive means to interactively explore and analyze data, enabling them to effectively identify interesting patterns, infer correlations and causalities, and supports sense-making activities.
Big Data Visualization: Turning Big Data Into Big Insights This white paper provides valuable information about visualization-based data discovery tools and how they can help IT decision-makers derive more value from big data. Topics include: • An overview of the IT landscape and the challenges that are leading more businesses to look for alternatives to traditional business intelligence tools • A description of the features and benefits of visualization-based data discovery tools • Guidance and suggestions on data governance, and ways to protect the quality of big data while facilitating self-service business intelligence • Several usage examples of visualization-based data discovery tools from TIBCO* Software, the world´s second-largest data discovery vendor
Big Data: Harnessing the Power of Big Data through Education and data-driven Decision Making Companies have realized they need to hire data scientists, academic institutions are scrambling to put together data-science programs, and publications are touting data science as a hot – even ‘sexy’ – career choice. However, there is confusion about what exactly data science is, and this confusion could lead to disillusionment as the concept diffuses into meaningless buzz. In this article, we argue that there are good reasons why it has been hard to pin down exactly what data science is. One reason is that data science is intricately intertwined with other important concepts also of growing importance, such as big data and data-driven decision making. Another reason is the natural tendency to associate what a practitioner does with the definition of the practitioner's field; this can result in overlooking the fundamentals of the field. We believe that trying to define the boundaries of data science precisely is not of the utmost importance. We can debate the boundaries of the field in an academic setting, but in order for data science to serve business effectively, it is important (i) to understand its relationships to other important related concepts, and (ii) to begin to identify the fundamental principles underlying data science. Once we embrace (ii), we can much better understand and explain exactly what data science has to offer. Furthermore, only once we embrace (ii) should we be comfortable calling it data science. In this article, we present a perspective that addresses all these concepts. We close by offering, as examples, a partial list of fundamental principles underlying data science.
Big Data: New Tricks for Econometrics Computers are now involved in many economic transactions and can capture data associated with these transactions, which can then be manipulated and analyzed. Conventional statistical and econometric techniques such as regression often work well, but there are issues unique to big datasets that may require different tools. First, the sheer size of the data involved may require more powerful data manipulation tools. Second, we may have more potential predictors than appropriate for estimation, so we need to do some kind of variable selection. Third, large datasets may allow for more flexible relationships than simple linear models. Machine learning techniques such as decision trees, support vector machines, neural nets, deep learning, and so on may allow for more effective ways to model complex relationships. In this essay, I will describe a few of these tools for manipulating and analyzing big data. I believe that these methods have a lot to offer and should be more widely known and used by economists. In fact, my standard advice to graduate students these days is go to the computer science department and take a class in machine learning. There have been very fruitful collaborations between computer scientists and statisticians in the last decade or so, and I expect collaborations between computer scientists and econometricians will also be productive in the future.
Big data: The next frontier for innovation, competition, and productivity This report contributes to MGI´s mission to help global leaders understand the forces transforming the global economy, improve company performance, and work for better national and international policies. As with all MGI research, we would like to emphasize that this work is independent and has not been commissioned or sponsored in any way by any business, government, or other institution.
Big Workflow: More than Just Intelligent Workload Management for Big Data Big data applications represent a fast-growing category of high-value applications that are increasingly employed by business and technical computing users. However, they have exposed an inconvenient dichotomy in the way resources are utilized in data centers. Conventional enterprise and web-based applications can be executed efficiently in virtualized server environments, where resource management and scheduling is generally confined to a single server. By contrast, data-intensive analytics and technical simulations demand large aggregated resources, necessitating intelligent scheduling and resource management that spans a computer cluster, cloud, or entire data center. Although these tools exist in isolation, they are not available in a general-purpose framework that allows them to interoperate easily and automatically within existing IT infrastructure. A new approach, known as ‘Big Workflow,’ is being created by Adaptive Computing to address the needs of these applications. It is designed to unify public clouds, private clouds, Map Reduce-type clusters, and technical computing clusters. Specifically, Big Workflow will: • Schedule, optimize and enforce policies across the data center • Enable data-aware workflow coordination across storage and compute silos • Integrate with external workflow automation tools Such a solution will provide a much-needed toolset for managing big data applications, shortening timelines, simplifying operations, maximizing resource utilization, and preserving existing investments.
Blending Transactions and Analytics in a Single In-Memory Platform: Key to the Real-Time Enterprise This white paper discusses the issues involved in the traditional practice of deploying transactional and analytic applications on separate platforms using separate databases. It analyzes the results from a user survey, conducted on SAP’s behalf by IDC, that explores these issues. The paper then considers how SAP HANA, with its combination of in-memory data management and its ability to handle both transactions and analytics in real time, can resolve these issues. It explores how businesses may find opportunities for innovation (such as the ability to engage in a richer dialog with a customer based on analysis of the latest transactional information), for speed (with the ability to provide faster access to information to make timely decisions), and for simplification of the IT landscape with a single in-memory platform.
Blind Source Separation: Fundamentals and Recent Advances (A Tutorial Overview Presented at SBrT-2001) Blind source separation (BSS), i.e., the decoupling of unknown signals that have been mixed in an unknown way, has been a topic of great interest in the signal processing community for the last decade, covering a wide range of applications in such diverse fields as digital communications, pattern recognition, biomedical engineering, and financial data analysis, among others. This course aims at an introduction to the BSS problem via an exposition of well-known and established as well as some more recent approaches to its solution. A unified way is followed in presenting the various results so as to more easily bring out their similarities/differences and emphasize their relative advantages/disadvantages. Only a representative sample of the existing knowledge on BSS will be included in this course. The interested readers are encouraged to consult the list of bibliographical references for more details on this exciting and always active research topic.
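The canonical problem statement behind this tutorial, in the standard linear instantaneous form (our notation, not quoted from the course): the observed signals are an unknown mixture of unknown sources,

$$\mathbf{x}(t) = \mathbf{A}\,\mathbf{s}(t), \qquad \hat{\mathbf{s}}(t) = \mathbf{W}\,\mathbf{x}(t),$$

where both the mixing matrix $\mathbf{A}$ and the sources $\mathbf{s}(t)$ are unknown; BSS methods estimate an unmixing matrix $\mathbf{W}$, typically by exploiting statistical independence or other structural properties of the sources, so that the recovered signals $\hat{\mathbf{s}}(t)$ match the true sources up to permutation and scaling.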
Blockchain and Artificial Intelligence It is undeniable that artificial intelligence (AI) and blockchain concepts are spreading at a phenomenal rate. Both technologies have distinct degrees of technological complexity and multi-dimensional business implications. However, a common misunderstanding about the blockchain concept, in particular, is that blockchain is decentralized and is not controlled by anyone. But the underlying development of a blockchain system is still attributed to a cluster of core developers. Take smart contracts as an example: a smart contract is essentially a collection of codes (or functions) and data (or states) that are programmed and deployed on a blockchain (say, Ethereum) by different human programmers. It is thus, unfortunately, less likely to be free of loopholes and flaws. In this article, through a brief overview of how artificial intelligence could be used to deliver bug-free smart contracts so as to achieve the goal of blockchain 2.0, we emphasize that the blockchain implementation can be assisted or enhanced via various AI techniques. The alliance of AI and blockchain is expected to create numerous possibilities.
Blockchain for Future Smart Grid: A Comprehensive Survey The concept of the smart grid has been introduced as a new vision of the conventional power grid to figure out an efficient way of integrating green and renewable energy technologies. In this way, the Internet-connected smart grid, also called the energy Internet, is also emerging as an innovative approach to ensure that energy is available anywhere at any time. The ultimate goal of these developments is to build a sustainable society. However, integrating and coordinating a large number of growing connections can be a challenging issue for the traditional centralized grid system. Consequently, the smart grid is undergoing a transformation from its centralized form to a decentralized topology. On the other hand, blockchain has some excellent features which make it a promising application for the smart grid paradigm. In this paper, we aim to provide a comprehensive survey on the application of blockchain in the smart grid. As such, we identify the significant security challenges of smart grid scenarios that can be addressed by blockchain. Then, we present a number of recent blockchain-based research works in the literature addressing security issues in the area of the smart grid. We also summarize several related practical projects, trials, and products that have emerged recently. Finally, we discuss essential research challenges and future directions of applying blockchain to smart grid security issues.
Blockchain for Internet of Things: A Survey The Internet of Things (IoT) is reshaping the incumbent industry into a smart industry featured with data-driven decision-making. However, intrinsic features of IoT result in a number of challenges such as decentralization, poor interoperability, and privacy and security vulnerabilities. Blockchain technology brings opportunities for addressing the challenges of IoT. In this paper, we investigate the integration of blockchain technology with IoT. We name such a synthesis of blockchain and IoT Blockchain of Things (BCoT). This paper presents an in-depth survey of BCoT and discusses the insights of this new paradigm. In particular, we first briefly introduce IoT and discuss the challenges of IoT. Then we give an overview of blockchain technology. We next concentrate on introducing the convergence of blockchain and IoT and presenting the proposal of the BCoT architecture. We further discuss the issues of using blockchain for 5G and beyond in IoT, as well as industrial applications of BCoT. Finally, we outline the open research directions in this promising area.
Blockchain for the IoT: Opportunities and Challenges Blockchain technology has been transforming the financial industry and has created a new crypto-economy in the last decade. The foundational concepts such as decentralized trust and distributed ledger are promising for distributed, and large-scale Internet of Things (IoT) applications. However, the applications of Blockchain beyond cryptocurrencies in this domain are few and far between because of the lack of understanding and inherent architectural challenges. In this paper, we describe the opportunities for applications of blockchain for the IoT and examine the challenges involved in architecting Blockchain-based IoT applications.
Blockchain Games: A Survey With the support of blockchain systems, cryptocurrency has changed the world of virtual assets. Digital games, especially those with massive multi-player scenarios, will be significantly impacted by this novel technology. However, there are insufficient academic studies on this topic. In this work, we fill this gap by surveying the state-of-the-art blockchain games. We discuss blockchain integration for games and then categorize existing blockchain games from the aspects of their genres and technical platforms. Moreover, by analyzing the industrial trend with a statistical approach, we envision the future of blockchain games from technological and commercial perspectives.
Blockchain Technology Overview Blockchains are tamper evident and tamper resistant digital ledgers implemented in a distributed fashion (i.e., without a central repository) and usually without a central authority (i.e., a bank, company, or government). At their basic level, they enable a community of users to record transactions in a shared ledger within that community, such that under normal operation of the blockchain network no transaction can be changed once published. This document provides a high-level technical overview of blockchain technology. The purpose is to help readers understand how blockchain technology works.
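A toy illustration of the tamper-evidence property described here (our sketch, not code from the NIST overview): each block commits to its predecessor through a hash, so altering an earlier block invalidates everything that follows it.

    import hashlib
    import json

    def block_hash(block):
        """Hash of a block's contents, including the previous block's hash."""
        payload = json.dumps(block, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def append_block(chain, data):
        prev = chain[-1]["hash"] if chain else "0" * 64
        block = {"data": data, "prev_hash": prev}
        block["hash"] = block_hash({"data": data, "prev_hash": prev})
        chain.append(block)

    def verify(chain):
        for i, block in enumerate(chain):
            expected_prev = chain[i - 1]["hash"] if i else "0" * 64
            if block["prev_hash"] != expected_prev:
                return False
            if block["hash"] != block_hash({"data": block["data"], "prev_hash": block["prev_hash"]}):
                return False
        return True

    chain = []
    append_block(chain, {"from": "alice", "to": "bob", "amount": 5})
    append_block(chain, {"from": "bob", "to": "carol", "amount": 2})
    print(verify(chain))               # True
    chain[0]["data"]["amount"] = 500   # tamper with an earlier transaction
    print(verify(chain))               # False: the recorded hash no longer matches

Real blockchains add distributed consensus and digital signatures on top of this hash-chaining idea, but the chaining is what makes published transactions hard to change unnoticed.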
Blockchain: Emerging Applications and Use Cases Blockchain, also known as distributed ledger technology, stores different transactions/operations in a chain of blocks in a distributed manner without needing a trusted third party. Blockchain is proven to be immutable, which helps with integrity and accountability, and, to some extent, confidentiality through a pair of public and private keys. Blockchain has been in the spotlight after the successful boom of Bitcoin. There have been efforts to leverage the salient features of Blockchain for different applications and use cases. This paper presents a comprehensive survey of applications and use cases of Blockchain technology. Specifically, readers of this paper can gain a thorough understanding of the applications and use cases of Blockchain technology.
Brain Intelligence: Go Beyond Artificial Intelligence Artificial intelligence (AI) is an important technology that supports daily social life and economic activities. It contributes greatly to the sustainable growth of Japan's economy and solves various social problems. In recent years, AI has attracted attention as a key for growth in developed regions such as Europe and the United States and in developing countries such as China and India. The attention has been focused mainly on developing new artificial intelligence information communication technology (ICT) and robot technology (RT). Although recently developed AI technology certainly excels in extracting certain patterns, there are many limitations. Most ICT models are overly dependent on big data, lack a self-idea function, and are complicated. In this paper, rather than merely developing next-generation artificial intelligence technology, we aim to develop a new concept of general-purpose intelligence cognition technology called Beyond AI. Specifically, we plan to develop an intelligent learning model called Brain Intelligence (BI) that generates new ideas about events without having experienced them, by using artificial life with an imagine function. We will also conduct demonstrations of the developed BI intelligence learning model on automatic driving, precision medical care, and industrial robots.
Breaking Data Science Open Deliver Collaboration, Self-Service and Production Deployment with Open Data Science Data science has burst into public attention over the past few years as perhaps the hottest and most lucrative technology field. No longer just a buzzword for advanced analytics, data science is poised to change everything about an organization: its potential customers, expansion plans, engineering and manufacturing process, how it chooses and interacts with suppliers and more. The leading edge of this tsunami is a combination of innovative business and technology trends that promise a more intelligent future based on Open Data Science. Open Data Science is a movement that makes the open source tools of data science (data, analytics and computation) work together as a connected ecosystem.
Bridging the gap between hierarchical network representation and functional analysis RedeR is an R-based package combined with a Java application for dynamic network visualization and manipulation. It implements a callback engine by using a low-level R-to-Java interface to build and run common plugins. In this sense, RedeR takes advantage of R to run robust statistics, while the R-to-Java interface bridges the gap between network analysis and visualization. RedeR is designed to deal with three key challenges in network analysis. Firstly, biological networks are modular and hierarchical, so network visualization needs to take advantage of such structural features. Secondly, network analysis relies on statistical methods, many of which are already available in resources like CRAN or Bioconductor. However, the missing link between advanced visualization and statistical computing makes it hard to take full advantage of R packages for network analysis. Thirdly, in larger networks user input is needed to focus the view of the network on the biologically relevant parts, rather than relying on an automatic layout function. RedeR is designed to address these challenges (additional information is available at Castro et al.).
Brief Review of Computational Intelligence Algorithms Computational Intelligence algorithms have gained a lot of attention from researchers in recent years due to their ability to deliver near-optimal solutions. In this paper we propose a new hierarchy which classifies algorithms based on their sources of inspiration. The algorithms have been divided into two broad domains, namely modeling of the human mind and nature-inspired intelligence. Algorithms in the modeling-of-the-human-mind domain take their motivation from the manner in which humans perceive and deal with information. Similarly, algorithms in the nature-inspired intelligence domain are based on ordinary phenomena occurring in nature. The latter has further been broken into swarm intelligence, geosciences and artificial immune systems. The geoscience-based domain is the newest, with algorithms based on geographic phenomena on the Earth's surface. A comprehensive tabular comparison is made amongst the algorithms in each domain across various attributes such as problem solving method, application, characteristics and more. For further insights, we examine a variant of every algorithm and its implementation for a specific application. To understand the performance and efficiency better, we compare the performance of selected algorithms on the Traveling Salesman Problem.
Brief: Real-Time Speech Analytics – Still More Sizzle Than Steak Most customer service organizations record phone interactions with their customers. If they get around to analyzing those recordings, whatever they find can't change the outcome of those calls — they are long since over. Vendors of real-time speech analytics tools promise to allow companies to intervene at the moment of truth, while the customer and the contact center agent are still talking. This brief discusses the hurdles application development and delivery (AD&D) pros will need to overcome to justify the expenditure on this technology and the steps they will need to take to prepare for a world of alerts generated in real time based on customer conversations.
Build a Powerful Business Case for Data Quality with Metrics Money and resources wasted; sales missed; extra costs incurred. Recent research by industry analyst firm Gartner shows that the shocking price that companies are paying because of poor quality data adds up to a staggering $8.2 million annually.
Build, Compute, Critique, Repeat: Data Analysis with Latent Variable Models We survey latent variable models for solving data-analysis problems. A latent variable model is a probabilistic model that encodes hidden patterns in the data. We uncover these patterns from their conditional distribution and use them to summarize data and form predictions. Latent variable models are important in many fields, including computational biology, natural language processing, and social network analysis. Our perspective is that models are developed iteratively: We build a model, use it to analyze data, assess how it succeeds and fails, revise it, and repeat. We describe how new research has transformed these essential activities. First, we describe probabilistic graphical models, a language for formulating latent variable models. Second, we describe mean field variational inference, a generic algorithm for approximating conditional distributions. Third, we describe how to use our analyses to solve problems: exploring the data, forming predictions, and pointing us in the direction of improved models.
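The two central objects the survey refers to, in standard notation (ours, not quoted from the paper): a latent variable model specifies a joint distribution over observations $x$ and hidden variables $z$, and mean field variational inference approximates the intractable posterior $p(z \mid x)$ by a fully factorized family chosen to maximize the evidence lower bound (ELBO):

$$p(x, z) = p(z)\, p(x \mid z), \qquad q(z) = \prod_{j} q_j(z_j), \qquad \mathrm{ELBO}(q) = \mathbb{E}_{q}\bigl[\log p(x, z)\bigr] - \mathbb{E}_{q}\bigl[\log q(z)\bigr] \;\le\; \log p(x).$$

Maximizing the ELBO over the factors $q_j$ makes $q(z)$ a tractable stand-in for the exact conditional distribution used to summarize data and form predictions.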
Building Data Science Teams Starting in 2008, Jeff Hammerbacher (@hackingdata) and I sat down to share our experiences building the data and analytics groups at Facebook and LinkedIn. In many ways, that meeting was the start of data science as a distinct professional specialization (see ‘What Makes a Data Scientist?’ on page 11 for the story on how we came up with the title ‘Data Scientist’). Since then, data science has taken on a life of its own. The hugely positive response to ‘What Is Data Science?,’ a great introduction to the meaning of data science in today's world, showed that we were at the start of a movement. There are now regular meetups, well-established startups, and even college curricula focusing on data science. As McKinsey's big data research report and LinkedIn's data indicate (see Figure 1), data science talent is in high demand. This increase in the demand for data scientists has been driven by the success of the major Internet companies. Google, Facebook, LinkedIn, and Amazon have all made their marks by using data creatively: not just warehousing data, but turning it into something of value. Whether that value is a search result, a targeted advertisement, or a list of possible acquaintances, data science is producing products that people want and value. And it's not just Internet companies: Walmart doesn't produce ‘data products’ as such, but they're well known for using data to optimize every aspect of their retail operations. Given how important data science has grown, it's important to think about what data scientists add to an organization, how they fit in, and how to hire and build effective data science teams.
Building High-level Features Using Large Scale Unsupervised Learning We consider the problem of building high-level, class-specific feature detectors from only unlabeled data. For example, is it possible to learn a face detector using only unlabeled images? To answer this, we train a 9-layered locally connected sparse autoencoder with pooling and local contrast normalization on a large dataset of images (the model has 1 billion connections, the dataset has 10 million 200×200 pixel images downloaded from the Internet). We train this network using model parallelism and asynchronous SGD on a cluster with 1,000 machines (16,000 cores) for three days. Contrary to what appears to be a widely-held intuition, our experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not. Control experiments show that this feature detector is robust not only to translation but also to scaling and out-of-plane rotation. We also find that the same network is sensitive to other high-level concepts such as cat faces and human bodies. Starting with these learned features, we trained our network to obtain 15.8% accuracy in recognizing 22,000 object categories from ImageNet, a leap of 70% relative improvement over the previous state-of-the-art.
Building Machines that Learn and Think for Themselves: Commentary on Lake et al., Behavioral and Brain Sciences, 2017 We agree with Lake and colleagues on their list of key ingredients for building humanlike intelligence, including the idea that model-based reasoning is essential. However, we favor an approach that centers on one additional ingredient: autonomy. In particular, we aim toward agents that can both build and exploit their own internal models, with minimal human hand-engineering. We believe an approach centered on autonomous learning has the greatest chance of success as we scale toward real-world complexity, tackling domains for which ready-made formal models are not available. Here we survey several important examples of the progress that has been made toward building autonomous agents with humanlike abilities, and highlight some outstanding challenges.
Building Production-Ready Predictive Analytics There´s a part of data science that you never hear about: the production. Everybody talks about how to build models, but not many people worry about how to actually use those models. Yet production issues are the reason many companies fail to see value come from their data science efforts. We wondered how companies handled their production processes and environments to build production-ready data products, and we figured the easiest way to find out was to ask them. We conducted a worldwide survey and asked thousands of companies. And we got our answers. After analyzing those answers, we isolated four different ways companies are dealing with production today, and we put together a series of recommendations on how to build production-ready data science projects.
Building Real-Time Data Pipelines Imagine you had a time machine that could go back one minute, or an hour. Think about what you could do with it. From the perspective of other people, it would seem like there was nothing you couldn't do, no contest you couldn't win. In the real world, there are three basic ways to win. One way is to have something, or to know something, that your competition does not. Nice work if you can get it. The second way to win is to simply be more intelligent. However, the number of people who think they are smarter is much larger than the number of people who actually are smarter. The third way is to process information faster so you can make and act on decisions faster. Being able to make more decisions in less time gives you an advantage in both information and intelligence. It allows you to try many ideas, correct the bad ones, and react to changes before your competition. If your opponent cannot react as fast as you can, it does not matter what they have, what they know, or how smart they are. Taken to extremes, it's almost like having a time machine. An example of the third way can be found in high-frequency stock trading. Every trading desk has access to a large pool of highly intelligent people, and pays them well. All of the players have access to the same information at the same time, at least in theory. Being more or less equally smart and informed, the most active area of competition is the end-to-end speed of their decision loops. In recent years, traders have gone to the trouble of building their own wireless long-haul networks, to exploit the fact that microwaves move through the air 50% faster than light can pulse through fiber optics. This allows them to execute trades a crucial millisecond faster. Finding ways to shorten end-to-end information latency is also a constant theme at leading tech companies. They are forever working to reduce the delay between something happening out there in the world or in their huge clusters of computers, and when it shows up on a graph. At Facebook in the early 2010s, it was normal to wait hours after pushing new code to discover whether everything was working efficiently. The full report came in the next day. After building their own distributed in-memory database and event pipeline, their information loop is now on the order of 30 seconds, and they push at least two full builds per day. Instead of slowing down as they got bigger, Facebook doubled down on making more decisions faster. What is your system's end-to-end latency? How long is your decision loop, compared to the competition? Imagine you had a system that was twice as fast. What could you do with it? This might be the most important question for your business. In this book we'll explore new models of quickly processing information end to end that are enabled by long-term hardware trends, learnings from some of the largest and most successful tech companies, and surprisingly powerful ideas that have survived the test of time.
Business Analytics for Manufacturing: Four Ways to Increase Efficiency and Performance Whether the economy is strong or weak, the fundamental strategies for surviving and thriving still hold true. Manufacturers have to be highly efficient to meet demand and supply requirements. Costs and resources also have to be managed carefully and intelligently. At the same time, companies are considering new tactics: inventory optimization, maintenance operations, intelligent supply chains and leveraging technology as a focal point of business strategy. In order to be successful your company needs access to critical information and visibility into how well your business, your market and your competitors are responding to today´s challenging and changing times. …
Business Models for the Data Economy Whether you call it Big Data, data science, or simply analytics, modern businesses see data as a gold mine. Sometimes they already have this data in hand and understand that it is central to their activities. Other times, they uncover new data that fills a perceived gap, or seemingly ‘useless’ data generated by other processes. Whatever the case, there is certainly value in using data to advance your business.
Business Process Deviance Mining: Review and Evaluation Business process deviance refers to the phenomenon whereby a subset of the executions of a business process deviate, in a negative or positive way, with respect to its expected or desirable outcomes. Deviant executions of a business process include those that violate compliance rules, or executions that undershoot or exceed performance targets. Deviance mining is concerned with uncovering the reasons for deviant executions by analyzing business process event logs. This article provides a systematic review and comparative evaluation of deviance mining approaches based on a family of data mining techniques known as sequence classification. Using real-life logs from multiple domains, we evaluate a range of feature types and classification methods in terms of their ability to accurately discriminate between normal and deviant executions of a process. We also analyze the interestingness of the rule sets extracted using different methods. We observe that feature sets extracted using pattern mining techniques only slightly outperform simpler feature sets based on counts of individual activity occurrences in a trace.
Business-Driven BI: Using New Technologies to Foster Self-Service Access to Insights Self-Service Business Intelligence (BI) has been the holy grail for BI professionals for a long time. Yet almost two-thirds of BI professionals (64%) rate the success of their self-service initiatives ‘average’ or lower. Newcomers to BI struggle even more, with more than half (52%) rating their attempts at self-service BI ‘fair’ or ‘poor.’ One reason for these less-than-stellar numbers is this: Implementing self-service BI is more complex than it looks. It´s not a one-size-fits-all program. BI users come in many different shapes and sizes, each with unique information requirements. This report lays out several frameworks that explain how users interact with information and then maps elements of each to BI functionality and categories of BI tools. This mapping is critical to success with self-service BI….

C

Caching and Distributing Statistical Analyses in R We present the cacher package for R, which provides tools for caching statistical analyses and for distributing these analyses to others in an efficient manner. The cacher package takes objects created by evaluating R expressions and stores them in key-value databases. These databases of cached objects can subsequently be assembled into packages for distribution over the web. The cacher package also provides tools to help readers examine the data and code in a statistical analysis and reproduce, modify, or improve upon the results. In addition, readers can easily conduct alternate analyses of the data. We describe the design and implementation of the cacher package and provide two examples of how the package can be used for reproducible research. This vignette was originally published as Peng (2008).
Calling R from .NET: a case-study using Rapid NCA, the non-compartmental analysis workflow tool (Slide Deck)
Can Autism be Catered with Artificial Intelligence-Assisted Intervention Technology? A Literature Review This article presents an extensive literature review of technology-based intervention methodologies for individuals facing Autism Spectrum Disorder (ASD). Reviewed methodologies include: contemporary Computer Aided Systems (CAS), Computer Vision Assisted Technologies (CVAT) and Virtual Reality (VR) or Artificial Intelligence-Assisted interventions. The research over the past decade has provided enough demonstrations that individuals with ASD have a strong interest in technology-based interventions and can connect with them for longer durations without facing any trouble(s). These technology-based interventions are useful for individuals facing autism in clinical settings as well as at home and in classrooms. Despite showing great promise, research in developing an advanced technology-based intervention that is clinically quantitative for ASD is minimal. Moreover, clinicians are generally not convinced of the potential of technology-based interventions due to the non-empirical nature of published results. A major reason behind this non-acceptability is that a vast majority of studies on distinct intervention methodologies do not follow any specific standard or research design. Consequently, the data produced by these studies is minimally appealing to the clinical community. This research domain has a vast social impact: as per official statistics given by the Autism Society of America, autism is the fastest-growing developmental disability in the United States (US). The estimated annual cost in the US for diagnosis and treatment for ASD is 236-262 Billion US Dollars. The cost of bringing up an ASD individual is estimated to be 1.4 million USD, while statistics show 1% of the world’s total population is suffering from ASD.
Can Deep Neural Networks Match the Related Objects? A Survey on ImageNet-trained Classification Models Deep neural networks (DNNs) have shown state-of-the-art levels of performance in a wide range of complicated tasks. In recent years, studies have been actively conducted to analyze the black-box characteristics of DNNs and to grasp the learning behaviours, tendencies, and limitations of DNNs. In this paper, we investigate the limitations of DNNs in the image classification task and verify them with a method inspired by cognitive psychology. Through analyzing the failure cases of the ImageNet classification task, we hypothesize that the DNNs do not sufficiently learn to associate related classes of objects. To verify how DNNs understand the relatedness between object classes, we conducted experiments on the image database provided in cognitive psychology. We applied the ImageNet-trained DNNs to the database consisting of pairs of related and unrelated object images to compare the feature similarities and determine whether the pairs match each other. In the experiments, we observed that the DNNs show limited performance in determining relatedness between object classes. In addition, the DNNs present somewhat improved performance in discovering relatedness based on similarity, but they perform worse in discovering relatedness based on association. Through these experiments, a novel analysis of the learning behaviour of DNNs is provided and the limitations which need to be overcome are suggested.
Can Entropy Explain Successor Surprisal Effects in Reading? Human reading behavior is sensitive to surprisal: more predictable words tend to be read faster. Unexpectedly, this applies not only to the surprisal of the word that is currently being read, but also to the surprisal of upcoming (successor) words that have not been fixated yet. This finding has been interpreted as evidence that readers can extract lexical information parafoveally. Calling this interpretation into question, Angele et al. (2015) showed that successor effects appear even in contexts in which those successor words are not yet visible. They hypothesized that successor surprisal predicts reading time because it approximates the reader’s uncertainty about upcoming words. We test this hypothesis on a reading time corpus using an LSTM language model, and find that successor surprisal and entropy are independent predictors of reading time. This independence suggests that entropy alone is unlikely to be the full explanation for successor surprisal effects.
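To make the two predictors compared in this entry concrete, here is a minimal Python sketch (not the entry's LSTM setup; the toy next-word distribution and words are invented for illustration) that computes the successor surprisal of the word actually read and the entropy of the full next-word distribution:

```python
import numpy as np

def surprisal_and_entropy(next_word_probs, actual_next_word):
    """Successor surprisal (bits) of the observed next word, and the
    entropy (bits) of the whole predicted next-word distribution."""
    p = np.array(list(next_word_probs.values()), dtype=float)
    entropy = -np.sum(p * np.log2(p))            # reader's uncertainty about what comes next
    surprisal = -np.log2(next_word_probs[actual_next_word])
    return surprisal, entropy

# Toy distribution over possible successors of "the cat sat on the ..."
probs = {"mat": 0.6, "sofa": 0.25, "roof": 0.1, "piano": 0.05}
s, h = surprisal_and_entropy(probs, "mat")
print(f"surprisal = {s:.2f} bits, entropy = {h:.2f} bits")
```

In the entry's framing, surprisal measures how unexpected the fixated word turns out to be, while entropy measures the overall uncertainty about the upcoming word before it is seen; the finding is that both contribute independently to reading time.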
Can machine learning identify interesting mathematics? An exploration using empirically observed laws We explore the possibility of using machine learning to identify interesting mathematical structures by using certain quantities that serve as fingerprints. In particular, we extract features from integer sequences using two empirical laws, Benford’s law and Taylor’s law, and experiment with various classifiers to identify whether a sequence is nice, important, multiplicative, easy to compute or related to primes or palindromes.
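As a hedged illustration of the fingerprinting idea, the sketch below extracts Benford's-law features (leading-digit frequencies plus their deviation from the Benford distribution) from an integer sequence; the feature layout and the powers-of-two example are illustrative choices, not the paper's exact pipeline, and the resulting vectors could be fed to any off-the-shelf classifier.

```python
import numpy as np

def benford_features(seq):
    """Leading-digit histogram of a positive-integer sequence, plus its
    L1 deviation from the Benford distribution P(d) = log10(1 + 1/d)."""
    leading = np.array([int(str(abs(n))[0]) for n in seq if n != 0])
    counts = np.array([(leading == d).sum() for d in range(1, 10)], dtype=float)
    freqs = counts / counts.sum()
    benford = np.log10(1 + 1 / np.arange(1, 10))
    deviation = np.abs(freqs - benford).sum()
    return np.concatenate([freqs, [deviation]])

# Example: leading digits of powers of 2 follow Benford's law quite closely
powers_of_two = [2 ** k for k in range(1, 200)]
print(benford_features(powers_of_two).round(3))
```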
Can Machines Design? An Artificial General Intelligence Approach Can machines design? Can they come up with creative solutions to problems and build tools and artifacts across a wide range of domains? Recent advances in the field of computational creativity and formal Artificial General Intelligence (AGI) provide frameworks for machines with the general ability to design. In this paper we propose to integrate a formal computational creativity framework into the Gödel machine framework. We call this machine a design Gödel machine. Such a machine could solve a variety of design problems by generating novel concepts. In addition, it could change the way these concepts are generated by modifying itself. The design Gödel machine is able to improve its initial design program, once it has proven that a modification would increase its return on the utility function. Finally, we sketch out a specific version of the design Gödel machine which specifically aims at the design of complex software and hardware systems. Future work could be the development of a more formal version of the Design Gödel machine and a potential implementation.
Can We Distinguish Machine Learning from Human Learning? What makes a task relatively more or less difficult for a machine compared to a human? Much AI/ML research has focused on expanding the range of tasks that machines can do, with a focus on whether machines can beat humans. Allowing for differences in scale, we can seek interesting (anomalous) pairs of tasks T, T’. We define interesting in this way: The ‘harder to learn’ relation is reversed when comparing human intelligence (HI) to AI. While humans seem to be able to understand problems by formulating rules, ML using neural networks does not rely on constructing rules. We discuss a novel approach where the challenge is to ‘perform well under rules that have been created by human beings.’ We suggest that this provides a rigorous and precise pathway for understanding the difference between the two kinds of learning. Specifically, we suggest a large and extensible class of learning tasks, formulated as learning under rules. With these tasks, both the AI and HI will be studied with rigor and precision. The immediate goal is to find interesting ground-truth rule pairs. In the long term, the goal will be to understand, in a generalizable way, what distinguishes interesting pairs from ordinary pairs, and to define saliency behind interesting pairs. This may open new ways of thinking about AI, and provide unexpected insights into human learning.
Canonical example of Bayes´ theorem in detail The most common elementary illustration of Bayes´ theorem is medical testing for a rare disease. The example is almost a cliché in probability and statistics books. And yet in my opinion, it´s usually presented too quickly and too abstractly. Here I´m going to risk erring on the side of going too slowly and being too concrete. I´ll work out an example with numbers and no equations before presenting Bayes´ theorem. Then I´ll include a few graphs.
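The following minimal Python version of that worked example uses illustrative numbers (the prevalence and test accuracies are chosen here, not taken from the original post) to show why a positive test for a rare disease still implies a low probability of actually having it:

```python
# Classic rare-disease illustration of Bayes' theorem (illustrative numbers).
prevalence = 0.001          # 1 in 1,000 people actually have the disease
sensitivity = 0.99          # P(test positive | disease)
false_positive_rate = 0.05  # P(test positive | no disease)

# Total probability of a positive test, by the law of total probability
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)

# Bayes' theorem: P(disease | positive test)
p_disease_given_positive = sensitivity * prevalence / p_positive

print(f"P(disease | positive) = {p_disease_given_positive:.3f}")
# ~0.019: despite a 99%-sensitive test, a positive result still means
# less than a 2% chance of disease, because the disease is so rare.
```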
Capitalizing on the power of big data for retail The retail industry is changing dramatically as consumers shop in new ways. With the growing popularity of online shopping and mobile commerce, consumers are using more retail channels than ever before to research products, compare prices, search for promotions, make purchases and provide feedback. Social media has become one of the key channels. Consumers are using social media – and the leading e-commerce platforms that integrate with social media – to find product recommendations, lavish praise, voice complaints, capitalize on product offers and engage in ongoing dialogs with their favorite brands. The multiplication of retail channels and the increasing use of social media are empowering consumers. With a wealth of information readily available online, consumers are now better able to compare products, services and prices – even as they shop in physical stores. When consumers interact with companies publicly through social media, they have greater power to influence other customers or damage a brand. These and other changes in the retail industry are creating important opportunities for retailers. But to capitalize on those opportunities, retailers need ways to collect, manage and analyze a tremendous volume, variety and velocity of data. When point-of-sale (POS) systems were first commercialized, retailers were able to collect large amounts of potentially valuable information, but most of that information remained untapped. The emergence of social media and other consumer-oriented technologies is now introducing even more data to the retail ecosystem. Retailers must handle not only the growing volume of information but also an increasing variety – including both structured and unstructured data. They must also find ways to accommodate the changing nature of this data and the velocity at which it is being produced and collected. If retailers succeed in addressing the challenges of ‘big data,’ they can use this data to generate valuable insights for personalizing marketing and improving the effectiveness of marketing campaigns, optimizing assortment and merchandising decisions, and removing inefficiencies in distribution and operations. Adopting solutions designed to capitalize on this big data allows companies to navigate the shifting retail landscape and drive a positive transformation for the business….
Career Transitions and Trajectories: A Case Study in Computing From artificial intelligence to network security to hardware design, it is well-known that computing research drives many important technological and societal advancements. However, less is known about the long-term career paths of the people behind these innovations. What do their careers reveal about the evolution of computing research? Which institutions were and are the most important in this field, and for what reasons? Can insights into computing career trajectories help predict employer retention? In this paper we analyze several decades of post-PhD computing careers using a large new dataset rich with professional information, and propose a versatile career network model, R^3, that captures temporal career dynamics. With R^3 we track important organizations in computing research history, analyze career movement between industry, academia, and government, and build a powerful predictive model for individual career transitions. Our study, the first of its kind, is a starting point for understanding computing research careers, and may inform employer recruitment and retention mechanisms at a time when the demand for specialized computational expertise far exceeds supply.
Causal inference and the data-fusion problem We review concepts, principles, and tools that unify current approaches to causal analysis and attend to new challenges presented by big data. In particular, we address the problem of data fusion – piecing together multiple datasets collected under heterogeneous conditions (i.e., different populations, regimes, and sampling methods) to obtain valid answers to queries of interest. The availability of multiple heterogeneous datasets presents new opportunities to big data analysts, because the knowledge that can be acquired from combined data would not be possible from any individual source alone. However, the biases that emerge in heterogeneous environments require new analytical tools. Some of these biases, including confounding, sampling selection, and cross-population biases, have been addressed in isolation, largely in restricted parametric models. We here present a general, nonparametric framework for handling these biases and, ultimately, a theoretical solution to the problem of data fusion in causal inference tasks.
Causal inference in statistics: An overview This review presents empirical researchers with recent advances in causal inference, and stresses the paradigmatic shifts that must be undertaken in moving from traditional statistical analysis to causal analysis of multivariate data. Special emphasis is placed on the assumptions that underlie all causal inferences, the languages used in formulating those assumptions, the conditional nature of all causal and counterfactual claims, and the methods that have been developed for the assessment of such claims. These advances are illustrated using a general theory of causation based on the Structural Causal Model (SCM) described in Pearl (2000a), which subsumes and unifies other approaches to causation, and provides a coherent mathematical foundation for the analysis of causes and counterfactuals. In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: (1) queries about the effects of potential interventions (also called ‘causal effects’ or ‘policy evaluation’), (2) queries about probabilities of counterfactuals (including assessment of ‘regret,’ ‘attribution’ or ‘causes of effects’), and (3) queries about direct and indirect effects (also known as ‘mediation’). Finally, the paper defines the formal and conceptual relationships between the structural and potential-outcome frameworks and presents tools for a symbiotic analysis that uses the strong features of both.
Causality and Statistical Learning In social science we are sometimes in the position of studying descriptive questions (In what places do working-class whites vote for Republicans? In what eras has social mobility been higher in the United States than in Europe? In what social settings are different sorts of people more likely to act strategically?). Answering descriptive questions is not easy and involves issues of data collection, data analysis, and measurement (how one should define concepts such as ‘working-class whites,’ ‘social mobility,’ and ‘strategic’) but is uncontroversial from a statistical standpoint. All becomes more difficult when we shift our focus from what to what if and why. Consider two broad classes of inferential questions: 1. Forward causal inference. What might happen if we do X? What are the effects of smoking on health, the effects of schooling on knowledge, the effect of campaigns on election outcomes, and so forth? 2. Reverse causal inference. What causes Y? Why do more attractive people earn more money? Why do many poor people vote for Republicans and rich people vote for Democrats? Why did the economy collapse?
Causality for Machine Learning Graphical causal inference as pioneered by Judea Pearl arose from research on artificial intelligence (AI), and for a long time had little connection to the field of machine learning. This article discusses where links have been and should be established, introducing key concepts along the way. It argues that the hard open problems of machine learning and AI are intrinsically related to causality, and explains how the field is beginning to understand them.
Challenges and Opportunities with Big Data The promise of data-driven decision-making is now being recognized broadly, and there is growing enthusiasm for the notion of ‘Big Data.’ While the promise of Big Data is real — for example, it is estimated that Google alone contributed 54 billion dollars to the US economy in 2009 — there is currently a wide gap between its potential and its realization. Heterogeneity, scale, timeliness, complexity, and privacy problems with Big Data impede progress at all phases of the pipeline that can create value from data. The problems start right away during data acquisition, when the data tsunami requires us to make decisions, currently in an ad hoc manner, about what data to keep and what to discard, and how to store what we keep reliably with the right metadata. Much data today is not natively in structured format; for example, tweets and blogs are weakly structured pieces of text, while images and video are structured for storage and display, but not for semantic content and search: transforming such content into a structured format for later analysis is a major challenge. The value of data explodes when it can be linked with other data, thus data integration is a major creator of value. Since most data is directly generated in digital format today, we have the opportunity and the challenge both to influence the creation to facilitate later linkage and to automatically link previously created data. Data analysis, organization, retrieval, and modeling are other foundational challenges. Data analysis is a clear bottleneck in many applications, both due to lack of scalability of the underlying algorithms and due to the complexity of the data that needs to be analyzed. Finally, presentation of the results and its interpretation by non-technical domain experts is crucial to extracting actionable knowledge. During the last 35 years, data management principles such as physical and logical independence, declarative querying and cost-based optimization have led to a multi-billion dollar industry. More importantly, these technical advances have enabled the first round of business intelligence applications and laid the foundation for managing and analyzing Big Data today. The many novel challenges and opportunities associated with Big Data necessitate rethinking many aspects of these data management platforms, while retaining other desirable aspects. We believe that appropriate investment in Big Data will lead to a new wave of fundamental technological advances that will be embodied in the next generations of Big Data management and analysis platforms, products, and systems. We believe that these research problems are not only timely, but also have the potential to create huge economic value in the US economy for years to come. However, they are also hard, requiring us to rethink data analysis systems in fundamental ways. A major investment in Big Data, properly directed, can result not only in major scientific advances, but also lay the foundation for the next generation of advances in science, medicine, and business.
Challenges of Big Data Analysis Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promises for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require a new computational and statistical paradigm. This article gives an overview of the salient features of Big Data and how these features drive a paradigm change in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that exogenous assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions.
Characterization of Fundamental Networks In the framework of coupled cell systems, a coupled cell network describes graphically the dynamical dependencies between individual dynamical systems, the cells. The fundamental network of a network reveals the hidden symmetries of that network. Subspaces defined by equalities of coordinates which are flow-invariant for any coupled cell system consistent with a network structure are called the network synchrony subspaces. Moreover, for every synchrony subspace, each network admissible system restricted to that subspace is a dynamical system consistent with a smaller network. The original network is then said to be a lift of the smaller network. We characterize networks such that: the fundamental network is a lift of the network; the network is a subnetwork of its fundamental network; and the network is a fundamental network. The size of cycles in a network and the distance of a cell to a cycle are two important properties concerning the description of the network architecture. In this paper, we relate these two architectural properties in a network and its fundamental network.
Characterizing HCI Research in China: Streams, Methodologies and Future Directions Human-computer Interaction (HCI) is an interdisciplinary research field involving multiple disciplines, such as computer science, psychology, social science and design. It studies the interaction between users and computers in order to better design technologies and solve real-life problems. This position paper characterizes HCI research in China by comparing it with international HCI research traditions. We discuss the current streams and methodologies of Chinese HCI research. We then propose future HCI research directions such as including emergent users who have less access to technology and addressing the cultural dimensions in order to provide better technical solutions and support.
Character-level Convolutional Networks for Text Classification This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several largescale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks.
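A minimal PyTorch sketch of the character-level ConvNet idea (one-hot characters, temporal convolutions, global max-pooling, then a linear classifier) is shown below; the alphabet size, channel counts and kernel sizes are illustrative and much smaller than the architectures evaluated in the article.

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Tiny character-level ConvNet: one-hot chars -> Conv1d -> global max-pool -> linear."""
    def __init__(self, alphabet_size=70, num_classes=4, channels=64, kernel_size=7):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(alphabet_size, channels, kernel_size),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size),
            nn.ReLU(),
        )
        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, x):            # x: (batch, alphabet_size, sequence_length)
        h = self.conv(x)
        h = h.max(dim=2).values      # global max-pool over the character positions
        return self.classifier(h)

# A batch of 8 texts, each 256 characters long, over a 70-symbol alphabet
x = torch.zeros(8, 70, 256)
logits = CharCNN()(x)
print(logits.shape)                  # torch.Size([8, 4])
```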
Chart Suggestions – A Thought-Starter (Cheat Sheet)
Choosing the right NoSQL database for the job: a quality attribute evaluation For over forty years, relational databases have been the leading model for data storage, retrieval and management. However, due to increasing needs for scalability and performance, alternative systems have emerged, namely NoSQL technology. The rising interest in NoSQL technology, as well as the growth in the number of use case scenarios, over the last few years resulted in an increasing number of evaluations and comparisons among competing NoSQL technologies. While most research work mostly focuses on performance evaluation using standard benchmarks, it is important to notice that the architecture of real world systems is not only driven by performance requirements, but has to comprehensively include many other quality attribute requirements. Software quality attributes form the basis from which software engineers and architects develop software and make design decisions. Yet, there has been no quality attribute focused survey or classification of NoSQL databases where databases are compared with regards to their suitability for quality attributes common on the design of enterprise systems. To fill this gap, and aid software engineers and architects, in this article, we survey and create a concise and up-to-date comparison of NoSQL engines, identifying their most beneficial use case scenarios from the software engineer point of view and the quality attributes that each of them is most suited to.
Classification and Regression Tree Methods A classification or regression tree is a prediction model that can be represented as a decision tree. This article discusses the C4.5, CART, CRUISE, GUIDE, and QUEST methods in terms of their algorithms, features, properties, and performance.
Classification and Regression Trees Classification and regression trees are machine-learning methods for constructing prediction models from data. The models are obtained by recursively partitioning the data space and fitting a simple prediction model within each partition. As a result, the partitioning can be represented graphically as a decision tree. Classification trees are designed for dependent variables that take a finite number of unordered values, with prediction error measured in terms of misclassification cost. Regression trees are for dependent variables that take continuous or ordered discrete values, with prediction error typically measured by the squared difference between the observed and predicted values. This article gives an introduction to the subject by reviewing some widely available algorithms and comparing their capabilities, strengths, and weaknesses in two examples.
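A brief scikit-learn sketch of both flavours (the dataset and tree depth are illustrative and not tied to the specific algorithms compared in the article) fits a classification tree to unordered class labels and a regression tree to a continuous response, and prints the recursive partition:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor, export_text

# Classification tree: unordered class labels, error measured by misclassification
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(clf))                       # the recursive partition shown as a decision tree

# Regression tree: continuous response, error measured by squared difference
rng = np.random.default_rng(0)
X_reg = rng.uniform(0, 10, size=(200, 1))
y_reg = np.sin(X_reg.ravel()) + rng.normal(scale=0.2, size=200)
reg = DecisionTreeRegressor(max_depth=3).fit(X_reg, y_reg)
print(reg.predict([[2.5]]))                   # piecewise-constant prediction
```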
Classification And Regression Trees : A Practical Guide for Describing a Dataset (Slide Deck)
Classification revisited: a web of knowledge The vision of the Semantic Web (SW) is gradually unfolding and taking shape through a web of linked data, a part of which is built by capturing semantics stored in existing knowledge organization systems (KOS), subject metadata and resource metadata. The content of vast bibliographic collections is currently categorized by some widely used bibliographic classification and we may soon see them being mined for information and linked in a meaningful way across the Web. Bibliographic classifications are designed for knowledge mediation which offers both a rich terminology and different ways in which concepts can be categorized and related to each other in the universe of knowledge. From 1990-2010 they have been used in various resource discovery services on the Web and continue to be used to support information integration in a number of international digital library projects. In this chapter we will revisit some of the ways in which universal classifications, as language independent concept schemes, can assist humans and computers in structuring and presenting information and formulating queries. Most importantly, we highlight issues important to understanding bibliographic classifications, both in terms of their unused potential and technical limitations.
Classification via Minimum Incremental Coding Length We present a simple new criterion for classification, based on principles from lossy data compression. The criterion assigns a test sample to the class that uses the minimum number of additional bits to code the test sample, subject to an allowable distortion. We demonstrate the asymptotic optimality of this criterion for Gaussian distributions and analyze its relationships to classical classifiers. The theoretical results clarify the connections between our approach and popular classifiers such as MAP, RDA, k-NN, and SVM, as well as unsupervised methods based on lossy coding. Our formulation induces several good effects on the resulting classifier. First, minimizing the lossy coding length induces a regularization effect which stabilizes the (implicit) density estimate in a small sample setting. Second, compression provides a uniform means of handling classes of varying dimension. The new criterion and its kernel and local versions perform competitively on synthetic examples, as well as on real imagery data such as handwritten digits and face images. On these problems, the performance of our simple classifier approaches the best reported results, without using domain-specific information. All MATLAB code and classification results are publicly available for peer evaluation at http://…/home.htm.
Classification with imperfect training labels We study the effect of imperfect training data labels on the performance of classification methods. In a general setting, where the probability that an observation in the training dataset is mislabelled may depend on both the feature vector and the true label, we bound the excess risk of an arbitrary classifier trained with imperfect labels in terms of its excess risk for predicting a noisy label. This reveals conditions under which a classifier trained with imperfect labels remains consistent for classifying uncorrupted test data points. Furthermore, under stronger conditions, we derive detailed asymptotic properties for the popular $k$-nearest neighbour ($k$nn), Support Vector Machine (SVM) and Linear Discriminant Analysis (LDA) classifiers. One consequence of these results is that the $k$nn and SVM classifiers are robust to imperfect training labels, in the sense that the rate of convergence of the excess risks of these classifiers remains unchanged; in fact, it even turns out that in some cases, imperfect labels may improve the performance of these methods. On the other hand, the LDA classifier is shown to be typically inconsistent in the presence of label noise unless the prior probabilities of each class are equal. Our theoretical results are supported by a simulation study.
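In the spirit of the entry's simulation study (though with illustrative settings rather than the authors' design), the sketch below flips a growing fraction of training labels and compares k-NN and LDA accuracy on clean test data when the class priors are unequal:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=4000, n_features=5, n_informative=3,
                           weights=[0.8, 0.2], random_state=0)   # unequal class priors
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for flip_rate in [0.0, 0.2, 0.4]:
    y_noisy = y_tr.copy()
    flip = rng.random(len(y_noisy)) < flip_rate       # mislabel a random fraction of training points
    y_noisy[flip] = 1 - y_noisy[flip]
    knn = KNeighborsClassifier(n_neighbors=15).fit(X_tr, y_noisy)
    lda = LinearDiscriminantAnalysis().fit(X_tr, y_noisy)
    print(f"flip={flip_rate:.1f}  knn acc={knn.score(X_te, y_te):.3f}  "
          f"lda acc={lda.score(X_te, y_te):.3f}")
```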
Closing the AI Knowledge Gap AI researchers employ not only the scientific method, but also methodology from mathematics and engineering. However, the use of the scientific method – specifically hypothesis testing – in AI is typically conducted in service of engineering objectives. Growing interest in topics such as fairness and algorithmic bias show that engineering-focused questions only comprise a subset of the important questions about AI systems. This results in the AI Knowledge Gap: the number of unique AI systems grows faster than the number of studies that characterize these systems’ behavior. To close this gap, we argue that the study of AI could benefit from the greater inclusion of researchers who are well positioned to formulate and test hypotheses about the behavior of AI systems. We examine the barriers preventing social and behavioral scientists from conducting such studies. Our diagnosis suggests that accelerating the scientific study of AI systems requires new incentives for academia and industry, mediated by new tools and institutions. To address these needs, we propose a two-sided marketplace called TuringBox. On one side, AI contributors upload existing and novel algorithms to be studied scientifically by others. On the other side, AI examiners develop and post machine intelligence tasks designed to evaluate and characterize algorithmic behavior. We discuss this market’s potential to democratize the scientific study of AI behavior, and thus narrow the AI Knowledge Gap.
Cloud based Predictive Analytics poised for rapid growth Rather than report survey results question by question, the results and their implications have been grouped into a number of sections. Each section highlights significant results from the survey and discusses its implications. – Business solutions are what organizations need – Predictive analytics are showing real strength – Customers are the focus for predictive analytics and cloud – Cloud-based predictive analytic scenarios are gaining momentum – Early adopters are gaining a competitive advantage – Decision Management matters to predictive analytic success – There are still some barriers and concerns with cloud-based predictive analytics – Industries vary in their adoption and concerns – A mix of clouds is appropriate – Traditional data sources dominate predictive analytic models After the survey results and implications are discussed, we will make some recommendations and identify pros and cons of the various options. Demographics and vendor profiles complete the paper.
Cloud Computing – Architecture and Applications In the era of Internet of Things and with the explosive worldwide growth of electronic data volume, and associated need of processing, analysis, and storage of such humongous volume of data, it has now become mandatory to exploit the power of massively parallel architecture for fast computation. Cloud computing provides a cheap source of such computing framework for large volume of data for real-time applications. It is, therefore, not surprising to see that cloud computing has become a buzzword in the computing fraternity over the last decade. This book presents some critical applications in cloud frameworks along with some innovation design of algorithms and architecture for deployment in cloud environment. It is a valuable source of knowledge for researchers, engineers, practitioners, and graduate and doctoral students working in the field of cloud computing. It will also be useful for faculty members of graduate schools and universities.
Cloud Service Matchmaking Approaches: A Systematic Literature Survey Service matching concerns finding suitable services according to the service requester’s requirements, which is a complex task due to the increasing number and diversity of cloud services available. Service matching is discussed in web services composition and user oriented service marketplaces contexts. The suggested approaches have different problem definitions and have to be examined closer in order to identify comparable results and to find out which approaches have built on the former ones. One of the most important use cases is service requesters with limited technical knowledge who need to compare services based on their QoS requirements in cloud service marketplaces. Our survey examines the service matching approaches in order to find out the relation between their context and their objectives. Moreover, it evaluates their applicability for the cloud service marketplaces context.
Cluster Analysis: Tutorial with R In this tutorial we inspect classification. Classification and ordination are alternative strategies of simplifying data. Ordination tries to simplify data into a map showing similarities among points. Classification simplifies data by putting similar points into the same class. The task of describing a high number of points is simplified to an easier task of describing a low number of classes.
Cluster Validation (Slide Deck)
Clustering large Data Sets with mixed numeric and Categorical Values Efficient partitioning of large data sets into homogeneous clusters is a fundamental problem in data mining. The standard hierarchical clustering methods provide no solution for this problem due to their computational inefficiency. The k-means based methods are promising for their efficiency in processing large data sets. However, their use is often limited to numeric data. In this paper we present a k-prototypes algorithm which is based on the k-means paradigm but removes the numeric data limitation whilst preserving its efficiency. In the algorithm, objects are clustered against k prototypes. A method is developed to dynamically update the k prototypes in order to maximise the intra-cluster similarity of objects. When applied to numeric data the algorithm is identical to k-means. To assist interpretation of clusters we use decision tree induction algorithms to create rules for clusters. These rules, together with other statistics about clusters, can assist data miners to understand and identify interesting clusters.
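A heavily simplified Python sketch of the k-prototypes idea described here (squared Euclidean distance on numeric attributes plus a gamma-weighted categorical mismatch count, with prototypes updated as means and modes) looks like this; it is an illustration of the concept, not the paper's implementation, and the toy data are invented:

```python
import numpy as np

def column_modes(X):
    """Most frequent value in each (categorical) column."""
    modes = []
    for col in X.T:
        values, counts = np.unique(col, return_counts=True)
        modes.append(values[np.argmax(counts)])
    return np.array(modes)

def k_prototypes(X_num, X_cat, k=2, gamma=1.0, n_iter=20, seed=0):
    """Minimal k-prototypes sketch: means + squared distance for the numeric part,
    modes + a gamma-weighted mismatch count for the categorical part."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X_num), size=k, replace=False)
    proto_num, proto_cat = X_num[idx].astype(float), X_cat[idx].copy()
    labels = np.zeros(len(X_num), dtype=int)
    for _ in range(n_iter):
        d_num = ((X_num[:, None, :] - proto_num[None, :, :]) ** 2).sum(axis=2)
        d_cat = (X_cat[:, None, :] != proto_cat[None, :, :]).sum(axis=2)
        labels = np.argmin(d_num + gamma * d_cat, axis=1)
        for j in range(k):                      # update each prototype from its cluster
            members = labels == j
            if members.any():
                proto_num[j] = X_num[members].mean(axis=0)
                proto_cat[j] = column_modes(X_cat[members])
    return labels

# Toy data: one numeric attribute and one categorical attribute per object
X_num = np.array([[1.0], [1.2], [0.9], [8.0], [8.3], [7.9]])
X_cat = np.array([["a"], ["a"], ["b"], ["c"], ["c"], ["c"]])
print(k_prototypes(X_num, X_cat, k=2))
```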
Clustering with Deep Learning: Taxonomy and New Methods Clustering is a fundamental machine learning method. The quality of its results is dependent on the data distribution. For this reason, deep neural networks can be used for learning better representations of the data. In this paper, we propose a systematic taxonomy for clustering with deep learning, in addition to a review of methods from the field. Based on our taxonomy, creating new methods is more straightforward. We also propose a new approach which is built on the taxonomy and surpasses some of the limitations of some previous work. Our experimental evaluation on image datasets shows that the method approaches state-of-the-art clustering quality, and performs better in some cases.
Cogniculture: Towards a Better Human-Machine Co-evolution Research in Artificial Intelligence is breaking technology barriers every day. New algorithms and high performance computing are making things possible which we could only have imagined earlier. Though the enhancements in AI are making life easier for human beings day by day, there is a constant fear that AI-based systems will pose a threat to humanity. People in the AI community have a diverse set of opinions regarding the pros and cons of AI mimicking human behavior. Instead of worrying about AI advancements, we propose a novel idea of cognitive agents, including both humans and machines, living together in a complex adaptive ecosystem, collaborating on human computation for producing essential social goods while promoting sustenance, survival and evolution of the agents’ life cycle. We highlight several research challenges and technology barriers in achieving this goal. We propose a governance mechanism around this ecosystem to ensure ethical behaviors of all cognitive agents. Along with a novel set of use-cases of Cogniculture, we discuss the road map ahead for this journey.
Cognitive Dynamic Systems: A Technical Review of Cognitive Radar We start with the history of cognitive radar, where the origins of the PAC, Fuster's research on cognition and principles of cognition are provided. Fuster describes five cognitive functions: perception, memory, attention, language, and intelligence. We describe the Perception-Action Cycle as it applies to cognitive radar, and then discuss long-term memory, memory storage, memory retrieval and working memory. A comparison between memory in human cognition and cognitive radar is given as well. Attention is another function described by Fuster, and we have given the comparison of attention in human cognition and cognitive radar. We talk about the four functional blocks from the PAC: Bayesian filter, feedback information, dynamic programming and state-space model for the radar environment. Then, to show that the PAC improves the tracking accuracy of Cognitive Radar over Traditional Active Radar, we have provided simulation results. In the simulation, three nonlinear filters are compared: the Cubature Kalman Filter, the Unscented Kalman Filter and the Extended Kalman Filter. Based on the results, radars implemented with CKF perform better than radars implemented with UKF or EKF. Further, radar with EKF has the worst accuracy and the biggest computation load because of the derivation and evaluation of Jacobian matrices. We suggest using the concept of risk management to better control parameters and improve performance in cognitive radar. We believe spectrum sensing can be seen as a potential interest to be used in cognitive radar, and we propose a new approach, Probabilistic ICA, which will presumably reduce noise based on estimation error in cognitive radar. Parallel computing is a concept based on a divide-and-conquer mechanism, and we suggest using the parallel computing approach in cognitive radar for complicated calculations or tasks to reduce processing time.
Collaborative Filtering Recommender Systems Recommender systems are an important part of the information and e-commerce ecosystem. They represent a powerful method for enabling users to filter through large information and product spaces. Nearly two decades of research on collaborative filtering have led to a varied set of algorithms and a rich collection of tools for evaluating their performance. Research in the field is moving in the direction of a richer understanding of how recommender technology may be embedded in specific domains. The differing personalities exhibited by different recommender algorithms show that recommendation is not a one-size-fits-all problem. Specific tasks, information needs, and item domains represent unique problems for recommenders, and design and evaluation of recommenders needs to be done based on the user tasks to be supported. Effective deployments must begin with careful analysis of prospective users and their goals. Based on this analysis, system designers have a host of options for the choice of algorithm and for its embedding in the surrounding user experience. This paper discusses a wide variety of the choices available and their implications, aiming to provide both practitioners and researchers with an introduction to the important issues underlying recommenders and current best practices for addressing these issues.
Combine Statistical Thinking With Scientific Practice: A Protocol of a Bayesian Thesis Project For Undergraduate Students Current developments in the statistics community suggest that modern statistics education should be structured holistically, i.e., by allowing students to work with real data and answer concrete statistical questions, but also by educating them about alternative statistical frameworks, such as Bayesian statistics. In this article, we describe how we incorporated such a holistic structure in a Bayesian thesis project on ordered binomial probabilities. The project was targeted at undergraduate students in psychology with basic knowledge in Bayesian statistics and programming, but no formal mathematical training. The thesis project aimed to (1) convey the basic mathematical concepts of Bayesian inference, (2) let students experience the entire empirical cycle including the collection, analysis, and interpretation of data, and (3) teach students open science practices.
Combining Predictions for Accurate Recommender Systems We analyze the application of ensemble learning to recommender systems on the Netflix Prize dataset. For our analysis we use a set of diverse state-of-the-art collaborative filtering (CF) algorithms, which include: SVD, Neighborhood Based Approaches, Restricted Boltzmann Machine, Asymmetric Factor Model and Global Effects. We show that linearly combining (blending) a set of CF algorithms increases the accuracy and outperforms any single CF algorithm. Furthermore, we show how to use ensemble methods for blending predictors in order to outperform a single blending algorithm. The dataset and the source code for the ensemble blending are available online.
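A hedged sketch of the linear blending step, using synthetic held-out predictions from three hypothetical CF models rather than actual Netflix Prize predictors, illustrates why the combination tends to beat any single model:

```python
import numpy as np

# Synthetic held-out predictions from three (hypothetical) CF models for the same ratings.
rng = np.random.default_rng(0)
true_ratings = rng.uniform(1, 5, size=1000)
preds = np.column_stack([
    true_ratings + rng.normal(scale=0.9, size=1000),   # e.g. an SVD-style model
    true_ratings + rng.normal(scale=1.0, size=1000),   # e.g. a neighborhood model
    true_ratings + rng.normal(scale=1.1, size=1000),   # e.g. an RBM-style model
])

# Linear blending: least-squares weights (plus an intercept) fitted on the held-out set.
A = np.column_stack([preds, np.ones(len(true_ratings))])
weights, *_ = np.linalg.lstsq(A, true_ratings, rcond=None)
blend = A @ weights

rmse = lambda p: np.sqrt(np.mean((p - true_ratings) ** 2))
print("single-model RMSEs:", [round(rmse(preds[:, j]), 3) for j in range(3)])
print("blended RMSE:      ", round(rmse(blend), 3))
```

Because the individual models' errors are only partly correlated, the weighted combination has lower error than each constituent, which is the effect the paper exploits and then extends by blending several blenders.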
Comment: A brief survey of the current state of play for Bayesian computation in data science at Big-Data scale We wish to contribute to the discussion of ‘Comparing Consensus Monte Carlo Strategies for Distributed Bayesian Computation’ by offering our views on the current best methods for Bayesian computation, both at big-data scale and with smaller data sets, as summarized in Table 1. This table is certainly an over-simplification of a highly complicated area of research in constant (present and likely future) flux, but we believe that constructing summaries of this type is worthwhile despite their drawbacks, if only to facilitate further discussion.
Commonsense Reasoning for Natural Language Understanding: A Survey of Benchmarks, Resources, and Approaches Commonsense knowledge and commonsense reasoning are some of the main bottlenecks in machine intelligence. In the NLP community, many benchmark datasets and tasks have been created to address commonsense reasoning for language understanding. These tasks are designed to assess machines’ ability to acquire and learn commonsense knowledge in order to reason and understand natural language text. As these tasks become instrumental and a driving force for commonsense research, this paper aims to provide an overview of existing tasks and benchmarks, knowledge resources, and learning and inference approaches toward commonsense reasoning for natural language understanding. Through this, our goal is to support a better understanding of the state of the art, its limitations, and future challenges.
Commonsense Reasoning Using WordNet and SUMO: a Detailed Analysis We describe a detailed analysis of a sample of a large benchmark of commonsense reasoning problems that has been automatically obtained from WordNet, SUMO and their mapping. The objective is to provide a better assessment of the quality of both the benchmark and the involved knowledge resources for advanced commonsense reasoning tasks. By means of this analysis, we are able to detect some knowledge misalignments, mapping errors and lack of knowledge and resources. Our final objective is the extraction of some guidelines towards a better exploitation of this commonsense knowledge framework by the improvement of the included resources.
Community Detection in Networks: The Leader-Follower Algorithm Natural networks such as those between humans observed through their interactions or biological networks predicted based on various experimental measurements contain a wealth of information about the unobserved structure of the social or biological system. However, these networks are inherently noisy in the sense that they contain spurious connections making them seemingly dense. Therefore, identifying important, refined structures such as communities or clusters becomes quite challenging. Specifically, we find that the popular, traditional method of spectral clustering does not manage to learn refined community structure. The primary reason for this is that it is based upon external community connectivity properties such as graph-cuts. Motivated to overcome this limitation, we propose a community detection algorithm, called the leader-follower algorithm, based upon identifying the natural internal structure of the expected communities. The algorithm uses the notion of network centrality in a novel manner to differentiate leaders (nodes which connect different communities) from loyal followers (nodes which only have neighbors within a single community). Using this approach, it is able to learn the communities from the network structure. A salient feature of our algorithm is that, unlike the spectral clustering, it does not require knowledge of number of communities in the network; it learns it naturally. We show that our algorithm is quite effective. We prove that it detects all of the communities exactly for any network possessing communities with the natural internal structure expected in social networks. More importantly, we demonstrate its effectiveness in the context of various real networks ranging from social networks such as Facebook to biological networks such as an fMRI based human brain network.
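The sketch below is a loose, simplified reading of the leader/follower idea using degree centrality in networkx (nodes not dominated by any neighbour become leaders; every other node follows its highest-centrality neighbour until a leader is reached); it illustrates the concept of learning communities from internal structure without fixing their number in advance, and is not the authors' exact algorithm:

```python
import networkx as nx

def leader_follower_communities(G):
    """Toy leader/follower split: a node leads if no neighbor has strictly higher
    degree centrality; every other node follows its highest-centrality neighbor."""
    centrality = nx.degree_centrality(G)
    leaders = [n for n in G
               if all(centrality[n] >= centrality[m] for m in G.neighbors(n))]
    communities = {leader: {leader} for leader in leaders}
    for n in G:
        if n not in communities:
            best = max(G.neighbors(n), key=lambda m: centrality[m])
            # climb the highest-centrality-neighbor chain until a leader is reached
            while best not in communities:
                best = max(G.neighbors(best), key=lambda m: centrality[m])
            communities[best].add(n)
    return communities

G = nx.karate_club_graph()
for leader, members in leader_follower_communities(G).items():
    print(leader, sorted(members))
```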
Comparative Analysis of K-Means and Fuzzy C-Means Algorithms In the arena of software, data mining technology has been considered a useful means for identifying patterns and trends in large volumes of data. This approach is basically used to extract unknown patterns from large sets of data for business as well as real-time applications. It is a computational intelligence discipline which has emerged as a valuable tool for data analysis, new knowledge discovery and autonomous decision making. The raw, unlabeled data from a large dataset can be classified initially in an unsupervised fashion by using cluster analysis, i.e. clustering: the assignment of a set of observations into clusters so that observations in the same cluster may in some sense be treated as similar. The outcome of the clustering process and the efficiency of its domain application are generally determined through algorithms. There are various algorithms which are used to solve this problem. In this research work, two important clustering algorithms, namely centroid-based K-Means and representative-object-based FCM (Fuzzy C-Means), are compared. These algorithms are applied and their performance is evaluated on the basis of the efficiency of the clustering output. The number of data points as well as the number of clusters are the factors upon which the behaviour patterns of both algorithms are analyzed. FCM produces results close to K-Means clustering, but it still requires more computation time than K-Means clustering.
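To make the comparison concrete, here is a hedged sketch placing scikit-learn's K-Means next to a minimal implementation of the fuzzy c-means update equations; the data, fuzzifier m and iteration count are illustrative choices, not the settings used in the paper:

```python
import numpy as np
from sklearn.cluster import KMeans

def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy c-means: alternate between membership and center updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # fuzzy memberships sum to 1 per point
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = 1.0 / d ** (2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 0.5, size=(100, 2)) for loc in [(0, 0), (4, 4), (0, 4)]])

hard_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
U, _ = fuzzy_c_means(X, c=3)
print("k-means label of the first point:    ", hard_labels[0])
print("fuzzy memberships of the first point:", U[0].round(3))
```

The hard assignment versus graded membership output makes the trade-off in the entry visible: FCM carries extra per-point membership computations, which is where its higher computation time comes from.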
Comparative Analysis of Open Source Frameworks for Machine Learning with Use Case in Single-Threaded and Multi-Threaded Modes The basic features of some of the most versatile and popular open source frameworks for machine learning (TensorFlow, Deep Learning4j, and H2O) are considered and compared. Their comparative analysis was performed and conclusions were made as to the advantages and disadvantages of these platforms. The performance tests for the de facto standard MNIST data set were carried out on H2O framework for deep learning algorithms designed for CPU and GPU platforms for single-threaded and multithreaded modes of operation.
Comparative Study on Generative Adversarial Networks In recent years, there have been tremendous advancements in the field of machine learning. These advancements have been made through both academic as well as industrial research. Lately, a fair amount of research has been dedicated to the usage of generative models in the field of computer vision and image classification. These generative models have been popularized through a new framework called Generative Adversarial Networks. Moreover, many modified versions of this framework have been proposed in the last two years. We study the original model proposed by Goodfellow et al. as well as modifications over the original model and provide a comparative analysis of these models.
Comparison of Bayesian predictive methods for model selection The goal of this paper is to compare several widely used Bayesian model selection methods in practical model selection problems, highlight their differences and give recommendations about the preferred approaches. We focus on the variable subset selection for regression and classification and perform several numerical experiments using both simulated and real world data. The results show that the optimization of a utility estimate such as the cross-validation score is liable to finding overfitted models due to relatively high variance in the utility estimates when the data is scarce. Better and much less varying results are obtained by incorporating all the uncertainties into a full encompassing model and projecting this information onto the submodels. The reference model projection also appears to outperform the maximum a posteriori model and the selection of the most probable variables. The study also demonstrates that the model selection can greatly benefit from using cross-validation outside the searching process both for guiding the model size selection and assessing the predictive performance of the finally selected model.
Comparison of PCA with ICA from data distribution perspective We performed an empirical comparison of ICA and PCA algorithms by applying them on two simulated noisy time series with varying distribution parameters and level of noise. In general, ICA shows better results than PCA because it takes into account higher moments of data distribution. On the other hand, PCA remains quite sensitive to the level of correlations among signals.
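A small scikit-learn sketch in the same spirit (two simulated sources, a fixed mixing matrix and added noise, all illustrative rather than the paper's exact setup) compares how well the PCA and FastICA components correlate with the true sources:

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

# Two simulated sources with different distributions, plus observation noise
s1 = np.sin(2 * t)                               # smooth sinusoidal source
s2 = np.sign(np.sin(3 * t))                      # square wave, strongly non-Gaussian
S = np.column_stack([s1, s2])
A = np.array([[1.0, 0.5], [0.5, 1.0]])           # mixing matrix
X = S @ A.T + 0.1 * rng.normal(size=S.shape)     # observed noisy mixtures

pca_sources = PCA(n_components=2).fit_transform(X)
ica_sources = FastICA(n_components=2, random_state=0).fit_transform(X)

# Absolute correlation of each recovered component with each true source
for name, est in [("PCA", pca_sources), ("ICA", ica_sources)]:
    corr = np.abs(np.corrcoef(est.T, S.T))[:2, 2:]
    print(name, corr.round(2))
```

Because ICA exploits higher-order statistics rather than variance alone, its components typically line up more cleanly with the non-Gaussian sources, which matches the entry's conclusion.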
Complex and Holographic Embeddings of Knowledge Graphs: A Comparison Embeddings of knowledge graphs have received significant attention due to their excellent performance for tasks like link prediction and entity resolution. In this short paper, we are providing a comparison of two state-of-the-art knowledge graph embeddings for which their equivalence has recently been established, i.e., ComplEx and HolE [Nickel, Rosasco, and Poggio, 2016; Trouillon et al., 2016; Hayashi and Shimbo, 2017]. First, we briefly review both models and discuss how their scoring functions are equivalent. We then analyze the discrepancy of results reported in the original articles, and show experimentally that they are likely due to the use of different loss functions. In further experiments, we evaluate the ability of both models to embed symmetric and antisymmetric patterns. Finally, we discuss advantages and disadvantages of both models and under which conditions one would be preferable to the other.
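For reference, the ComplEx scoring function Re(<w_r, e_s, conj(e_o)>) discussed in the cited works can be written in a few lines of numpy; the toy embeddings below are random and purely illustrative:

```python
import numpy as np

def complex_score(e_s, w_r, e_o):
    """ComplEx triple score: Re(sum_k w_r[k] * e_s[k] * conj(e_o[k]))."""
    return np.real(np.sum(w_r * e_s * np.conj(e_o)))

rng = np.random.default_rng(0)
dim = 8
# Toy complex-valued embeddings for a subject, a relation, and two candidate objects
e_subject = rng.normal(size=dim) + 1j * rng.normal(size=dim)
w_relation = rng.normal(size=dim) + 1j * rng.normal(size=dim)
e_object = rng.normal(size=dim) + 1j * rng.normal(size=dim)

# Higher score = triple considered more plausible; swapping subject and object
# generally changes the score, which is how antisymmetric patterns can be captured.
print(complex_score(e_subject, w_relation, e_object))
print(complex_score(e_object, w_relation, e_subject))
```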
Complex Contagions: A Decade in Review Since the publication of ‘Complex Contagions and the Weakness of Long Ties’ in 2007, complex contagions have been studied across an enormous variety of social domains. In reviewing this decade of research, we discuss recent advancements in applied studies of complex contagions, particularly in the domains of health, innovation diffusion, social media, and politics. We also discuss how these empirical studies have spurred complementary advancements in the theoretical modeling of contagions, which concern the effects of network topology on diffusion, as well as the effects of individual-level attributes and thresholds. In synthesizing these developments, we suggest three main directions for future research. The first concerns the study of how multiple contagions interact within the same network and across networks, in what may be called an ecology of contagions. The second concerns the study of how the structure of thresholds and their behavioral consequences can vary by individual and social context. The third area concerns the roles of diversity and homophily in the dynamics of complex contagion, including both diversity of demographic profiles among local peers, and the broader notion of structural diversity within a network. Throughout this discussion, we make an effort to highlight the theoretical and empirical opportunities that lie ahead.
Comprehensive View on Cran Packages (Cheat Sheet)
Computation of the multivariate Oja median The multivariate Oja (1983) median is an affine equivariant multivariate location estimate with high efficiency. This estimate has a bounded influence function but zero breakdown, and its computation appears to be highly intensive. We consider different exact and stochastic algorithms for calculating the value of the estimate. In the stochastic algorithms, the gradient of the objective function, the rank function, is estimated by sampling observation hyperplanes. The estimated rank function with its estimated accuracy then yields a confidence region for the true Oja sample median, and the confidence region shrinks to the sample median as the number of sampled hyperplanes increases. Regular grids and the grid given by the data points are used in the construction. Computation times of the different algorithms are discussed and compared.
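As a point of reference for what is being computed, the sketch below evaluates the 2D Oja objective by brute force over all observation pairs and minimises it numerically. It is an illustrative toy, not one of the exact or stochastic algorithms studied in the paper, and assumes NumPy and SciPy are available.

```python
# In 2D, the Oja median minimises the total area of the triangles formed by the
# candidate point and every pair of observations.
import numpy as np
from itertools import combinations
from scipy.optimize import minimize

def oja_objective(mu, X):
    total = 0.0
    for i, j in combinations(range(len(X)), 2):
        a, b = X[i] - mu, X[j] - mu
        total += 0.5 * abs(a[0] * b[1] - a[1] * b[0])   # triangle area via determinant
    return total

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 2))
res = minimize(oja_objective, X.mean(axis=0), args=(X,), method="Nelder-Mead")
print("brute-force Oja median:", res.x)
```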
Computational Intelligence in Sports: A Systematic Literature Review Recently, data mining studies have been successfully conducted to estimate several parameters in a variety of domains. Data mining techniques have attracted the attention of the information industry and society as a whole, due to the large amount of data and the imminent need to turn it into useful knowledge. However, the effective use of data in some areas is still under development, as is the case in sports, which has seen only modest growth in recent years; consequently, many sports organizations have begun to see that there is a wealth of unexplored knowledge in the data they extract. Therefore, this article presents a systematic review of sports data mining. Covering the years 2010 to 2018, 31 studies were found on this topic. Based on these studies, we present the current panorama, themes, databases used, proposals, algorithms, and research opportunities. Our findings provide a better understanding of the potential of sports data mining, besides motivating the scientific community to explore this timely and interesting topic.
Computational Machines in a Coexistence with Concrete Universals and Data Streams We discuss how the majority of traditional modeling approaches follow the idealist point of view in scientific modeling, relying on set-theoretical notions of models based on abstract universals. We show that while successful in many classical modeling domains, there are fundamental limits to the application of set-theoretical models in dealing with complex systems that have many potential aspects or properties depending on the perspective. As an alternative to abstract universals, we propose a conceptual modeling framework based on concrete universals that can be interpreted as a category-theoretical approach to modeling. We call this modeling framework pre-specific modeling. We further discuss how a certain group of mathematical and computational methods, along with ever-growing data streams, is able to operationalize the concept of pre-specific modeling.
Computational Power and the Social Impact of Artificial Intelligence Machine learning is a computational process. To that end, it is inextricably tied to computational power – the tangible material of chips and semiconductors that the algorithms of machine intelligence operate on. Most obviously, computational power and computing architectures shape the speed of training and inference in machine learning, and therefore influence the rate of progress in the technology. But, these relationships are more nuanced than that: hardware shapes the methods used by researchers and engineers in the design and development of machine learning models. Characteristics such as the power consumption of chips also define where and how machine learning can be used in the real world. Despite this, many analyses of the social impact of the current wave of progress in AI have not substantively brought the dimension of hardware into their accounts. While a common trope in both the popular press and scholarly literature is to highlight the massive increase in computational power that has enabled the recent breakthroughs in machine learning, the analysis frequently goes no further than this observation around magnitude. This paper aims to dig more deeply into the relationship between computational power and the development of machine learning. Specifically, it examines how changes in computing architectures, machine learning methodologies, and supply chains might influence the future of AI. In doing so, it seeks to trace a set of specific relationships between this underlying hardware layer and the broader social impacts and risks around AI.
Computational Theories of Curiosity-Driven Learning What are the functions of curiosity? What are the mechanisms of curiosity-driven learning? We approach these questions using concepts and tools from machine learning and developmental robotics. We argue that curiosity-driven learning enables organisms to make discoveries to solve complex problems with rare or deceptive rewards. By fostering exploration and the discovery of a diversity of behavioural skills, and ignoring these rewards, curiosity can efficiently bootstrap learning when there is no information, or deceptive information, about local improvement towards these problems. We review both normative and heuristic computational frameworks used to understand the mechanisms of curiosity in humans, conceptualizing the child as a sense-making organism. These frameworks enable us to discuss the bi-directional causal links between curiosity and learning, and to provide new hypotheses about the fundamental role of curiosity in self-organizing developmental structures through curriculum learning. We present various developmental robotics experiments that study these mechanisms in action, both supporting these hypotheses and opening new research avenues in machine learning and artificial intelligence. Finally, we discuss challenges for the design of experimental paradigms for studying curiosity in psychology and cognitive neuroscience. Keywords: Curiosity, intrinsic motivation, lifelong learning, predictions, world model, rewards, free-energy principle, learning progress, machine learning, AI, developmental robotics, development, curriculum learning, self-organization.
Computer Science and Metaphysics: A Cross-Fertilization Computational philosophy is the use of mechanized computational techniques to unearth philosophical insights that are either difficult or impossible to find using traditional philosophical methods. Computational metaphysics is computational philosophy with a focus on metaphysics. In this paper, we (a) develop results in modal metaphysics whose discovery was computer assisted, and (b) conclude that these results work not only to the obvious benefit of philosophy but also, less obviously, to the benefit of computer science, since the new computational techniques that led to these results may be more broadly applicable within computer science. The paper includes a description of our background methodology and how it evolved, and a discussion of our new results.
Computer-Assisted Text Analysis for Social Science: Topic Models and Beyond Topic models are a family of statistical algorithms for summarizing, exploring and indexing large collections of text documents. After a decade of research led by computer scientists, topic models have spread to social science as a new generation of data-driven social scientists have searched for tools to explore large collections of unstructured text. Recently, social scientists have contributed to the topic modeling literature with developments in causal inference and tools for handling the problem of multi-modality. In this paper, I provide a literature review of the evolution of topic modeling, including extensions for document covariates, methods for evaluation and interpretation, and advances in interactive visualizations, along with each aspect’s relevance and application for social science research.
Computer-Simulation Model Theory (P= NP is not provable) The simulation hypothesis says that all the materials and events in reality (including the universe, our body, our thinking, walking, etc.) are computations, and that reality is a computer simulation program like a video game. Everything we do (talking, reasoning, seeing, etc.) is a computation performed by the universe-computer which runs the simulation program. Inspired by the view of the simulation hypothesis (but independent of this hypothesis), we propose a new method of logical reasoning named ‘Computer-Simulation Model Theory’, CSMT. Computer-Simulation Model Theory is an extension of Mathematical Model Theory in which mathematical structures are replaced by computer simulations, and the activity of reasoning and computing of the reasoner is also simulated in the model. CSMT argues that: for a formula $\phi$, construct a computer simulation model $S$ such that 1- $\phi$ does not hold in $S$, and 2- the reasoner $I$ $($the human being, the one who lives inside the reality$)$ cannot distinguish $S$ from the reality $(R)$; then $I$ cannot prove $\phi$ in reality. Although $\mathrm{CSMT}$ is inspired by the simulation hypothesis, this reasoning method is independent of the acceptance of the hypothesis: as we argue, one may not accept the simulation hypothesis and yet regard $\mathrm{CSMT}$ as a valid reasoning method. As an application of Computer-Simulation Model Theory, we study the famous problem P vs NP. We let $\phi \equiv\mathrm{ [P= NP]} $ and construct a computer simulation model $E$ such that $\mathrm{P= NP}$ does not hold in $E$.
Computing the Unique Information Given a set of predictor variables and a response variable, how much information do the predictors have about the response, and how is this information distributed between unique, complementary, and shared components? Recent work has proposed to quantify the unique component of the decomposition as the minimum value of the conditional mutual information over a constrained set of information channels. We present an efficient iterative divergence minimization algorithm to solve this optimization problem with convergence guarantees, and we evaluate its performance against other techniques.
Concept Tagging for Natural Language Understanding: Two Decadelong Algorithm Development Concept tagging is a type of structured learning needed for natural language understanding (NLU) systems. In this task, meaning labels from a domain ontology are assigned to word sequences. In this paper, we review the algorithms developed over the last twenty-five years. We perform a comparative evaluation of generative, discriminative and deep learning methods on two public datasets. We report on the statistical variability of the performance measurements. The third contribution is the release of a repository of the algorithms, datasets and recipes for NLU evaluation.
Condition-Based Maintenance Using Sensor Arrays and Telematics Emergence of uniquely addressable embeddable devices has raised the bar on Telematics capabilities. Though the technology itself is not new, its application has been quite limited until now. Sensor-based telematics technologies generate volumes of data that are orders of magnitude larger than what operators have dealt with previously. Real-time big data computation capabilities have opened the flood gates for building new predictive analytics capabilities into otherwise simple data-logging systems, enabling real-time control and monitoring to take preventive action in case of any anomalies. Condition-based maintenance, usage-based insurance, smart metering and demand-based load generation are some of the predictive analytics use cases for Telematics. This paper presents the approach of condition-based maintenance using real-time sensor monitoring, Telematics and predictive data analytics.
Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife We study the variability of predictions made by bagged learners and random forests, and show how to estimate standard errors for these methods. Our work builds on variance estimates for bagging proposed by Efron (1992, 2013) that are based on the jackknife and the infinitesimal jackknife (IJ). In practice, bagged predictors are computed using a finite number B of bootstrap replicates, and working with a large B can be computationally expensive. Direct applications of jackknife and IJ estimators to bagging require B = Theta(n^1.5) bootstrap replicates to converge, where n is the size of the training set. We propose improved versions that only require B = Theta(n) replicates. Moreover, we show that the IJ estimator requires 1.7 times fewer bootstrap replicates than the jackknife to achieve a given accuracy. Finally, we study the sampling distributions of the jackknife and IJ variance estimates themselves. We illustrate our findings with multiple experiments and simulation studies.
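A minimal sketch, on invented data and with an assumed setup rather than the authors' code, of the basic infinitesimal jackknife variance estimate for a bagged predictor at a single test point: V_IJ is the sum over training points of the squared covariance between how often a point appears in a bootstrap sample and that sample's prediction.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n, B = 200, 500
X = rng.uniform(-2, 2, size=(n, 1))
y = np.sin(3 * X[:, 0]) + 0.3 * rng.standard_normal(n)
x_test = np.array([[0.5]])

preds = np.empty(B)          # t_b: prediction of the b-th bagged tree at x_test
counts = np.zeros((B, n))    # N_bi: how often training point i appears in bootstrap b
for b in range(B):
    idx = rng.integers(0, n, size=n)                # bootstrap resample
    counts[b] = np.bincount(idx, minlength=n)
    tree = DecisionTreeRegressor(max_depth=4).fit(X[idx], y[idx])
    preds[b] = tree.predict(x_test)[0]

# Cov_b(N_bi, t_b) for each training point i, then V_IJ = sum_i Cov^2.
cov = ((counts - counts.mean(axis=0)) * (preds - preds.mean())[:, None]).mean(axis=0)
var_ij = np.sum(cov ** 2)
print(f"bagged prediction = {preds.mean():.3f}, IJ variance estimate = {var_ij:.4f}")
```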
Conservation AI: Live Stream Analysis for the Detection of Endangered Species Using Convolutional Neural Networks and Drone Technology Many different species are adversely affected by poaching. In response to this escalating crisis, efforts to stop poaching using hidden cameras, drones and DNA tracking have been implemented with varying degrees of success. Limited resources, costs and logistical limitations are often the cause of most unsuccessful poaching interventions. The study presented in this paper outlines a flexible and interoperable framework for the automatic detection of animals and poaching activity to facilitate early intervention practices. Using a robust deep learning pipeline, a convolutional neural network is trained and implemented to detect rhinos and cars (considered an important tool in poaching for fast access and artefact transportation in natural habitats) within live video streamed from drones. Transfer learning with the Faster RCNN Resnet 101 is performed to train a custom model with 350 images of rhinos and 350 images of cars. Inference is performed using a frame sampling technique to address the required trade-off between control precision and processing speed and to maintain synchronisation with the live feed. Inference models are hosted on a web platform using Flask web serving, OpenCV and TensorFlow 1.13. Video streams are transmitted from a DJI Mavic Pro 2 drone using the Real-Time Messaging Protocol (RTMP). The best trained Faster RCNN model achieved a mAP of 0.83 @IOU 0.50 and 0.69 @IOU 0.75 respectively. In comparison, an SSD-MobileNet model trained under the same experimental conditions achieved a mAP of 0.55 @IOU 0.50 and 0.27 @IOU 0.75. The results demonstrate that using an FRCNN and off-the-shelf drones is a promising and scalable option for a range of conservation projects.
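The frame-sampling idea can be sketched in a few lines of Python with OpenCV; the stream URL, sampling rate and detector stub below are placeholders and assumptions, not the authors' deployment configuration.

```python
# Only every Nth frame of the live stream is passed to the (placeholder) detector,
# trading detection granularity for processing speed to keep up with the feed.
import cv2

STREAM_URL = "rtmp://example.org/live/drone"   # hypothetical RTMP endpoint
SAMPLE_EVERY = 10                              # run inference on 1 frame in 10

def detect(frame):
    # Stand-in for the object-detection model (e.g. a Faster R-CNN); returns detections.
    return []

cap = cv2.VideoCapture(STREAM_URL)
frame_idx = 0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % SAMPLE_EVERY == 0:          # sampled frame: run the detector
        detections = detect(frame)
    frame_idx += 1
cap.release()
```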
Considerations for maximising analytic performance When it comes to running business analytics, there are three key nonfunctional requirements that must be met: fast performance, usability and affordability. Bloor Research was asked by IBM to compare the performance capabilities of the leading business analytic platforms. Specifically, we were asked to evaluate how the combined capabilities of business analytic tools and the underlying database management system can affect the overall performance of your analytic applications, reports and dashboards.
Constrained Bayesian Networks: Theory, Optimization, and Applications We develop the theory and practice of an approach to modelling and probabilistic inference in causal networks that is suitable when application-specific or analysis-specific constraints should inform such inference, or when little or no data for the learning of causal network structure or probability values at nodes are available. Constrained Bayesian Networks generalize a Bayesian Network such that probabilities can be symbolic arithmetic expressions and the meaning of the network is constrained by finitely many formulas from the theory of the reals. A formal semantics for constrained Bayesian Networks over first-order logic of the reals is given, which enables non-linear and non-convex optimisation algorithms that rely on decision procedures for this logic, and supports the composition of several constrained Bayesian Networks. A non-trivial case study in arms control, where few or no data are available to assess the effectiveness of an arms inspection process, evaluates our approach. An open-access prototype implementation of these foundations and their algorithms uses the SMT solver Z3 as decision procedure, extends an open-source package for Bayesian inference to symbolic computation, and is evaluated experimentally.
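A toy illustration of the central idea, assuming the z3-solver Python bindings and a hypothetical two-node network A -> B; the paper's prototype and its arms-control case study are considerably richer than this sketch.

```python
# Node probabilities are symbolic reals constrained by formulas over the reals;
# an SMT solver such as Z3 checks whether the constraints are jointly satisfiable.
from z3 import Reals, Solver, sat

p_a, p_b_given_a, p_b_given_not_a, p_b = Reals("p_a p_b_a p_b_na p_b")

s = Solver()
for p in (p_a, p_b_given_a, p_b_given_not_a, p_b):
    s.add(p >= 0, p <= 1)                      # probabilities lie in [0, 1]
# Marginalisation constraint for the network A -> B.
s.add(p_b == p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a)
# Analysis-specific constraints supplied by the analyst (hypothetical values).
s.add(p_a >= 0.3, p_b_given_a >= 0.9, p_b <= 0.5)

if s.check() == sat:
    print(s.model())                           # one assignment consistent with all constraints
else:
    print("constraints are unsatisfiable")
```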
Content Recommendation through Semantic Annotation of User Reviews and Linked Data – An Extended Technical Report Nowadays, most recommender systems exploit user-provided ratings to infer their preferences. However, the growing popularity of social and e-commerce websites has encouraged users to also share comments and opinions through textual reviews. In this paper, we introduce a new recommendation approach which exploits the semantic annotation of user reviews to extract useful and non-trivial information about the items to recommend. It also relies on the knowledge freely available in the Web of Data, notably in DBpedia and Wikidata, to discover other resources connected with the annotated entities. We evaluated our approach in three domains, using both DBpedia and Wikidata. The results showed that our solution provides a better ranking than another recommendation method based on the Web of Data, while it improves in novelty with respect to traditional techniques based on ratings. Additionally, our method achieved a better performance with Wikidata than DBpedia.
Content Selection in Data-to-Text Systems: A Survey Data-to-text systems are powerful in generating reports from data automatically and thus they simplify the presentation of complex data. Rather than presenting data using visualisation techniques, data-to-text systems use natural (human) language, which is the most common way for human-human communication. In addition, data-to-text systems can adapt their output content to users’ preferences, background or interests and therefore they can be pleasant for users to interact with. Content selection is an important part of every data-to-text system, because it is the module that determines which from the available information should be conveyed to the user. This survey initially introduces the field of data-to-text generation, describes the general data-to-text system architecture and then it reviews the state-of-the-art content selection methods. Finally, it provides recommendations for choosing an approach and discusses opportunities for future research.
Context is Everything: Finding Meaning Statistically in Semantic Spaces This paper introduces a simple and explicit measure of word importance in a global context, including very small contexts (10+ sentences). After generating a word-vector space containing both 2-gram clauses and single tokens, it became clear that more contextually significant words disproportionately define clause meanings. Using this simple relationship in a weighted bag-of-words sentence embedding model results in sentence vectors that outperform the state-of-the-art for subjectivity/objectivity analysis, as well as paraphrase detection, and fall within those produced by state-of-the-art models for six other transfer learning tests. The metric was then extended to a sentence/document summarizer, an improved (and context-aware) cosine distance and a simple document stop word identifier. The sigmoid-global context weighted bag of words is presented as a new baseline for sentence embeddings.
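The core sentence-embedding step, a weighted average of word vectors, can be sketched generically as below; the vectors, weights and tokens are invented for illustration, and the paper's specific sigmoid-global-context weighting is not reproduced here.

```python
# Weighted bag-of-words sentence embedding: more important words contribute more.
import numpy as np

def sentence_embedding(tokens, word_vectors, word_weight):
    """Weighted average of word vectors; out-of-vocabulary words are skipped."""
    vecs, weights = [], []
    for tok in tokens:
        if tok in word_vectors:
            vecs.append(word_vectors[tok])
            weights.append(word_weight.get(tok, 1.0))
    if not vecs:
        return None
    w = np.asarray(weights)[:, None]
    return (w * np.asarray(vecs)).sum(axis=0) / w.sum()

# Toy vocabulary with 3-dimensional vectors and importance weights (assumptions).
vectors = {"cats": np.array([1.0, 0.0, 0.2]),
           "sleep": np.array([0.1, 1.0, 0.0]),
           "the": np.array([0.0, 0.1, 0.1])}
weights = {"cats": 0.9, "sleep": 0.8, "the": 0.05}   # content words dominate
print(sentence_embedding(["the", "cats", "sleep"], vectors, weights))
```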
Context-Aware Recommender Systems The importance of contextual information has been recognized by researchers and practitioners in many disciplines, including e-commerce personalization, information retrieval, ubiquitous and mobile computing, data mining, marketing, and management. While a substantial amount of research has already been performed in the area of recommender systems, most existing approaches focus on recommending the most relevant items to users without taking into account any additional contextual information, such as time, location, or the company of other people (e.g., for watching movies or dining out). In this chapter we argue that relevant contextual information does matter in recommender systems and that it is important to take this information into account when providing recommendations. We discuss the general notion of context and how it can be modeled in recommender systems. Furthermore, we introduce three different algorithmic paradigms – contextual prefiltering, post-filtering, and modeling – for incorporating contextual information into the recommendation process, discuss the possibilities of combining several context-aware recommendation techniques into a single unifying approach, and provide a case study of one such combined approach. Finally, we present additional capabilities for context-aware recommenders and discuss important and promising directions for future research.
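Of the three paradigms, contextual pre-filtering is the simplest to illustrate: filter the ratings to the current context, then apply an ordinary non-contextual recommender to the reduced data. The sketch below uses a toy pandas DataFrame with an assumed schema, not an example from the chapter.

```python
import pandas as pd

ratings = pd.DataFrame({
    "user":    ["u1", "u1", "u2", "u2", "u3", "u3"],
    "item":    ["i1", "i2", "i1", "i3", "i2", "i3"],
    "context": ["weekend", "weekday", "weekend", "weekend", "weekday", "weekend"],
    "rating":  [5, 3, 4, 2, 4, 5],
})

def prefilter_recommend(ratings, context, top_n=2):
    in_context = ratings[ratings["context"] == context]        # contextual pre-filtering
    item_scores = in_context.groupby("item")["rating"].mean()  # plain non-contextual model
    return item_scores.sort_values(ascending=False).head(top_n)

print(prefilter_recommend(ratings, "weekend"))
```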
Continual Lifelong Learning with Neural Networks: A Review Humans and animals have the ability to continually acquire and fine-tune knowledge throughout their lifespan. This ability is mediated by a rich set of neurocognitive functions that together contribute to the early development and experience-driven specialization of our sensorimotor skills. Consequently, the ability to learn from continuous streams of information is crucial for computational learning systems and autonomous agents (inter)acting in the real world. However, continual lifelong learning remains a long-standing challenge for machine learning and neural network models since the incremental acquisition of new skills from non-stationary data distributions generally leads to catastrophic forgetting or interference. This limitation represents a major drawback also for state-of-the-art deep neural network models that typically learn representations from stationary batches of training data, thus without accounting for situations in which the number of tasks is not known a priori and the information becomes incrementally available over time. In this review, we critically summarize the main challenges linked to continual lifelong learning for artificial learning systems and compare existing neural network approaches that alleviate, to different extents, catastrophic interference. Although significant advances have been made in domain-specific continual lifelong learning with neural networks, extensive research efforts are required for the development of general-purpose artificial intelligence and autonomous agents. We discuss well-established research and recent methodological trends motivated by experimentally observed lifelong learning factors in biological systems. Such factors include principles of neurosynaptic stability-plasticity, critical developmental stages, intrinsically motivated exploration, transfer learning, and crossmodal integration.
Control And Protect Sensitive Information In The Era Of Big Data This report outlines the future look of Forrester´s solution for security and risk (SandR) executives seeking to develop a holistic strategy to protect and manage sensitive data. In the never-ending race to stay ahead of the competition, companies are developing advanced capabilities to store, process, and analyze vast amounts of data from social networks, sensors, IT systems, and other sources to improve business intelligence and decisioning capabilities. ‘Big data processing’ refers to the tools and techniques that handle the extreme data volumes and velocities and wide variety of data formats resulting from implementing these capabilities. As organizations aggregate more and more data, they need to be aware that much of it could be financial, personal, and other types of sensitive data that are subject to global laws and regulations. SandR professionals need to be aware of the security issues surrounding big data so they can take an active role early in these initiatives. This report will help SandR pros understand how to control and properly protect sensitive information in the era of big data.
Convergence of Edge Computing and Deep Learning: A Comprehensive Survey Ubiquitous sensors and smart devices in factories and communities generate massive amounts of data, and ever-increasing computing power is driving the core of computation and services from the cloud to the edge of the network. As an important enabler broadly changing people’s lives, from face recognition to ambitious smart factories and cities, artificial intelligence (especially deep learning) applications and services have experienced a thriving development process. However, due to efficiency and latency issues, the current cloud computing service architecture hinders the vision of ‘providing artificial intelligence for every person and every organization at everywhere’. Thus, a better solution that has recently emerged is to unleash deep learning services from the cloud to the edge, near the data sources. Therefore, edge intelligence, aiming to facilitate the deployment of deep learning services by edge computing, has received great attention. In addition, deep learning, as the main representative of artificial intelligence, can be integrated into edge computing frameworks to build an intelligent edge for dynamic, adaptive edge maintenance and management. With regard to the mutual benefits of edge intelligence and the intelligent edge, this paper introduces and discusses: 1) the application scenarios of both; 2) the practical implementation methods and enabling technologies, namely deep learning training and inference in the customized edge computing framework; 3) existing challenges and future trends of more pervasive and fine-grained intelligence. We believe that this survey can help readers to garner information scattered across the communication, networking, and deep learning communities, understand the connections between enabling technologies, and promote further discussion on the fusion of edge intelligence and the intelligent edge.
Converging High-Throughput and High-Performance Computing: A Case Study The computing systems used by LHC experiments have historically consisted of the federation of hundreds to thousands of distributed resources, ranging from small to mid-size. In spite of the impressive scale of the existing distributed computing solutions, the federation of small to mid-size resources will be insufficient to meet projected future demands. This paper is a case study of how the ATLAS experiment has embraced Titan — a DOE leadership facility — in conjunction with traditional distributed high-throughput computing to reach sustained production scales of approximately 51M core-hours a year. The three main contributions of this paper are: (i) a critical evaluation of design and operational considerations to support the sustained, scalable and production usage of Titan; (ii) a preliminary characterization of a next generation executor for PanDA to support new workloads and advanced execution modes; and (iii) early lessons for how current and future experimental and observational systems can be integrated with production supercomputers and other platforms in a general and extensible manner.
Cooperating with Machines Since Alan Turing envisioned Artificial Intelligence (AI) [1], a major driving force behind technical progress has been competition with human cognition. Historical milestones have been frequently associated with computers matching or outperforming humans in difficult cognitive tasks (e.g. face recognition [2], personality classification [3], driving cars [4], or playing video games [5]), or defeating humans in strategic zero-sum encounters (e.g. Chess [6], Checkers [7], Jeopardy! [8], Poker [9], or Go [10]). In contrast, less attention has been given to developing autonomous machines that establish mutually cooperative relationships with people who may not share the machine’s preferences. A main challenge has been that human cooperation does not require sheer computational power, but rather relies on intuition [11], cultural norms [12], emotions and signals [13, 14, 15, 16], and pre-evolved dispositions toward cooperation [17], common-sense mechanisms that are difficult to encode in machines for arbitrary contexts. Here, we combine a state-of-the-art machine-learning algorithm with novel mechanisms for generating and acting on signals to produce a new learning algorithm that cooperates with people and other machines at levels that rival human cooperation in a variety of two-player repeated stochastic games. This is the first general-purpose algorithm that is capable, given a description of a previously unseen game environment, of learning to cooperate with people within short timescales in scenarios previously unanticipated by algorithm designers. This is achieved without complex opponent modeling or higher-order theories of mind, thus showing that flexible, fast, and general human-machine cooperation is computationally achievable using a non-trivial, but ultimately simple, set of algorithmic mechanisms.
Cooperative Multi-Agent Planning: A Survey Cooperative multi-agent planning (MAP) is a relatively recent research field that combines technologies, algorithms and techniques developed by the Artificial Intelligence Planning and Multi-Agent Systems communities. While planning has been generally treated as a single-agent task, MAP generalizes this concept by considering multiple intelligent agents that work cooperatively to develop a course of action that satisfies the goals of the group. This paper reviews the most relevant approaches to MAP, putting the focus on the solvers that took part in the 2015 Competition of Distributed and Multi-Agent Planning, and classifies them according to their key features and relative performance.
Copulas: A Personal View Copula modeling has taken the world of finance and insurance, and well beyond, by storm. Why is this? In this paper I review the early start of this development, discuss some important current research, mainly from an applications point of view, and comment on potential future developments. An alternative title of the paper would be ‘Demystifying the copula craze’. The paper also contains what I would like to call the copula must-reads.
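As a concrete reminder of what a copula does, the sketch below (an assumed textbook-style example, not taken from the paper) samples from a Gaussian copula that couples an exponential and a lognormal marginal through a chosen correlation.

```python
# Gaussian copula sampling: correlated latent normals -> uniforms -> arbitrary marginals.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rho = 0.7
cov = np.array([[1.0, rho], [rho, 1.0]])

z = rng.multivariate_normal([0.0, 0.0], cov, size=10_000)   # latent correlated Gaussians
u = stats.norm.cdf(z)                                        # uniforms sharing the Gaussian copula
x = stats.expon(scale=2.0).ppf(u[:, 0])                      # exponential marginal
y = stats.lognorm(s=0.5).ppf(u[:, 1])                        # lognormal marginal

rho_s, _ = stats.spearmanr(x, y)
print("rank correlation of (x, y):", round(rho_s, 3))        # dependence survives the marginals
```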
Copy the dynamics using a learning machine Is it possible to generally construct a dynamical system to simulate a black system without recovering the equations of motion of the latter? Here we show that this goal can be approached by a learning machine. Trained on a set of input-output responses or a segment of time series of a black system, a learning machine can serve as a copy system that mimics the dynamics of various black systems. It can not only behave as the black system at the parameter set at which the training data were generated, but also reproduce the evolution history of the black system. As a result, the learning machine provides an effective way of prediction, and enables one to probe the global dynamics of a black system. These findings are significant for practical systems whose equations of motion cannot be approached accurately. Examples of copying the dynamics of an artificial neural network, the Lorenz system, and a variable star are given. Our idea paves a possible way towards copying a living brain.
Core Decomposition in Multilayer Networks: Theory, Algorithms, and Applications Multilayer networks are a powerful paradigm to model complex systems, where various relations might occur among the same set of entities. Despite the keen interest in a variety of problems, algorithms, and analysis methods in this type of network, the problem of extracting dense subgraphs has remained largely unexplored. As a first step in this direction, we study the problem of core decomposition of a multilayer network. Unlike the single-layer counterpart in which cores are all nested into one another, in the multilayer context no total order exists among multilayer cores: they form a lattice whose size is exponential in the number of layers. In this setting we devise three algorithms which differ in the way they visit the core lattice and in their pruning techniques. We assess time and space efficiency of the three algorithms on a large variety of real-world multilayer networks. We then study the problem of extracting only the inner-most cores, i.e., the cores that are not dominated by any other core in terms of their index on all the layers. As inner-most cores are orders of magnitude fewer than all the cores, it is desirable to develop algorithms that effectively exploit the maximality property and extract inner-most cores directly, without first computing a complete decomposition. Moreover, we showcase an application of the multilayer core-decomposition tool to the problem of densest-subgraph extraction from multilayer networks. We introduce a definition of multilayer densest subgraph that trades off between high density and number of layers in which the high density holds, and show how multilayer core decomposition can be exploited to approximate this problem with quality guarantees. We also exploit multilayer core decomposition to speed up the extraction of frequent cross-graph quasi-cliques and to generalize the community-search problem to the multilayer setting.
Correlated Topic Models Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics, each of which is a distribution over the vocabulary. A limitation of LDA is the inability to model topic correlation even though, for example, a document about genetics is more likely to also be about disease than x-ray astronomy. This limitation stems from the use of the Dirichlet distribution to model the variability among the topic proportions. In this paper we develop the correlated topic model (CTM), where the topic proportions exhibit correlation via the logistic normal distribution. We derive a mean-field variational inference algorithm for approximate posterior inference in this model, which is complicated by the fact that the logistic normal is not conjugate to the multinomial. The CTM gives a better fit than LDA on a collection of OCRed articles from the journal Science. Furthermore, the CTM provides a natural way of visualizing and exploring this and other unstructured data sets.
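The modelling difference can be illustrated directly: the CTM draws topic proportions through a logistic normal (a softmax of a correlated Gaussian), whereas LDA draws them from a Dirichlet. The sketch below uses an invented covariance matrix and plain NumPy; it does not reproduce the paper's variational inference.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3
mu = np.zeros(K)
Sigma = np.array([[1.0, 0.8, -0.5],
                  [0.8, 1.0, -0.4],
                  [-0.5, -0.4, 1.0]])   # latent structure: topics 1 and 2 tend to co-occur

def logistic_normal(mu, Sigma, size):
    eta = rng.multivariate_normal(mu, Sigma, size=size)
    e = np.exp(eta - eta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)      # softmax onto the simplex

theta_ctm = logistic_normal(mu, Sigma, 5000)     # CTM-style proportions
theta_lda = rng.dirichlet(np.ones(K), size=5000) # LDA-style proportions

print("empirical corr(topic1, topic2), CTM:", np.corrcoef(theta_ctm[:, 0], theta_ctm[:, 1])[0, 1])
print("empirical corr(topic1, topic2), LDA:", np.corrcoef(theta_lda[:, 0], theta_lda[:, 1])[0, 1])
```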
Correspondence Analysis This working paper gives a comprehensive explanation of the multivariate technique called correspondence analysis, applied in the context of a large survey of a nation´s state of health, in this case the Spanish National Health Survey. It is first shown how correspondence analysis can be used to interpret a simple cross-tabulation by visualizing the table in the form of a map of points representing the rows and columns of the table. Combinations of variables can also be interpreted by coding the data in the appropriate way. The technique can also be used to deduce optimal scale values for the levels of a categorical variable, thus giving quantitative meaning to the categories. Multiple correspondence analysis can analyze several categorical variables simultaneously, and is analogous to factor analysis of continuous variables. Other uses of correspondence analysis are illustrated using different variables of the same Spanish database: for example, exploring patterns of missing data and visualizing trends across surveys from consecutive years.
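For reference, the basic computation behind simple correspondence analysis, the SVD of the standardised residuals of a contingency table, can be sketched as follows; the table is a toy example and not the Spanish National Health Survey data discussed in the working paper.

```python
import numpy as np

N = np.array([[20, 10,  5],
              [10, 30, 15],
              [ 5, 15, 40]], dtype=float)   # toy cross-tabulation

P = N / N.sum()                              # correspondence matrix
r = P.sum(axis=1)                            # row masses
c = P.sum(axis=0)                            # column masses
S = np.diag(1 / np.sqrt(r)) @ (P - np.outer(r, c)) @ np.diag(1 / np.sqrt(c))

U, sv, Vt = np.linalg.svd(S, full_matrices=False)
row_coords = (U * sv) / np.sqrt(r)[:, None]  # principal coordinates of row points
col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]

print("share of inertia on first axis:", sv[0] ** 2 / np.sum(sv ** 2))
print("row points (first two axes):\n", row_coords[:, :2])
```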
Coupled Ensembles of Neural Networks We investigate in this paper the architecture of deep convolutional networks. Building on existing state-of-the-art models, we propose a reconfiguration of the model parameters into several parallel branches at the global network level, with each branch being a standalone CNN. We show that this arrangement is an efficient way to significantly reduce the number of parameters without losing performance, or to significantly improve the performance with the same number of parameters. The use of branches brings an additional form of regularization. In addition to the split into parallel branches, we propose a tighter coupling of these branches by placing the ‘fuse (averaging) layer’ before the Log-Likelihood and SoftMax layers during training. This gives another significant performance improvement, the tighter coupling favouring the learning of better representations, even at the level of the individual branches. We refer to this branched architecture as ‘coupled ensembles’. The approach is very generic and can be applied with almost any DCNN architecture. With coupled ensembles of DenseNet-BC and a parameter budget of 25M, we obtain error rates of 2.92%, 15.68% and 1.50% respectively on the CIFAR-10, CIFAR-100 and SVHN tasks. For the same budget, DenseNet-BC has error rates of 3.46%, 17.18%, and 1.8% respectively. With ensembles of coupled ensembles of DenseNet-BC networks, with 50M total parameters, we obtain error rates of 2.72%, 15.13% and 1.42% respectively on these tasks.
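A minimal PyTorch sketch of the described coupling, with small fully connected stand-ins for the CNN branches: each branch's log-probabilities are averaged (the fuse layer) before the negative log-likelihood loss, so all branches are trained through the fused output. Layer sizes and data are invented for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoupledEnsemble(nn.Module):
    def __init__(self, n_branches=3, n_features=32, n_classes=10):
        super().__init__()
        # Stand-ins for full CNN branches; each is an independent sub-network.
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, n_classes))
            for _ in range(n_branches)
        )

    def forward(self, x):
        logps = [F.log_softmax(branch(x), dim=1) for branch in self.branches]
        return torch.stack(logps).mean(dim=0)       # fuse (average) before the NLL loss

model = CoupledEnsemble()
x = torch.randn(8, 32)
y = torch.randint(0, 10, (8,))
loss = F.nll_loss(model(x), y)                      # gradients flow into every branch
loss.backward()
print(float(loss))
```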
Credimus We believe that economic design and computational complexity—while already important to each other—should become even more important to each other with each passing year. But for that to happen, experts in areas such as social choice, economics, and political science on the one hand, and in computational complexity on the other, will have to better understand each other’s worldviews. This article, written by two complexity theorists who also work in computational social choice theory, focuses on one direction of that process by presenting a brief overview of how most computational complexity theorists view the world. Although our immediate motivation is to make the lens through which complexity theorists see the world better understood by those in the social sciences, we also feel that even within computer science it is very important for nontheoreticians to understand how theoreticians think, just as it is equally important within computer science for theoreticians to understand how nontheoreticians think.
Cross-Dataset Recognition: A Survey This paper summarises and analyses cross-dataset recognition techniques, with an emphasis on which kinds of methods can be used when the available source and target data are presented in different forms to boost the target task. The paper for the first time summarises several transferring criteria in detail at the concept level, which are the key bases guiding what kind of knowledge to transfer between datasets. In addition, a taxonomy of cross-dataset scenarios and problems is proposed according to the properties of the data that define how different datasets diverge, and the recent advances on each specific problem under different scenarios are reviewed. Moreover, some real-world applications and corresponding commonly used benchmarks of cross-dataset recognition are reviewed. Lastly, several future directions are identified.
Cross-media Similarity Metric Learning with Unified Deep Networks As a highlighting research topic in the multimedia area, cross-media retrieval aims to capture the complex correlations among multiple media types. Learning a better shared representation and distance metric for multimedia data is important to boost cross-media retrieval. Motivated by the strong ability of deep neural networks in feature representation and comparison function learning, we propose the Unified Network for Cross-media Similarity Metric (UNCSM) to associate cross-media shared representation learning with distance metric in a unified framework. First, we design a two-pathway deep network pretrained with contrastive loss, and employ double triplet similarity loss for fine-tuning to learn the shared representation for each media type by modeling the relative semantic similarity. Second, the metric network is designed for effectively calculating the cross-media similarity of the shared representation, by modeling the pairwise similar and dissimilar constraints. Compared to the existing methods, which mostly ignore the dissimilar constraints and use only a simple sample-level distance metric such as the Euclidean distance, our UNCSM approach unifies the representation learning and distance metric to preserve the relative similarity as well as embrace more complex similarity functions for further improving the cross-media retrieval accuracy. The experimental results show that our UNCSM approach outperforms 8 state-of-the-art methods on 4 widely-used cross-media datasets.
Cross-Platform Emoji Interpretation: Analysis, a Solution, and Applications Most social media platforms are largely based on text, and users often write posts to describe where they are, what they are seeing, and how they are feeling. Because written text lacks the emotional cues of spoken and face-to-face dialogue, ambiguities are common in written language. This problem is exacerbated in the short, informal nature of many social media posts. To bypass this issue, a suite of special characters called ’emojis,’ which are small pictograms, are embedded within the text. Many emojis are small depictions of facial expressions designed to help disambiguate the emotional meaning of the text. However, a new ambiguity arises in the way that emojis are rendered. Every platform (Windows, Mac, and Android, to name a few) renders emojis according to their own style. In fact, it has been shown that some emojis can be rendered so differently that they look ‘happy’ on some platforms, and ‘sad’ on others. In this work, we use real-world data to verify the existence of this problem. We verify that the usage of the same emoji can be significantly different across platforms, with some emojis exhibiting different sentiment polarities on different platforms. We propose a solution to identify the intended emoji based on the platform-specific nature of the emoji used by the author of a social media post. We apply our solution to sentiment analysis, a task that can benefit from the emoji calibration technique we use in this work. We conduct experiments to evaluate the effectiveness of the mapping in this task.
Cross-validation This text is a survey on cross-validation. We define all classical cross-validation procedures, and we study their properties for two different goals: estimating the risk of a given estimator, and selecting the best estimator among a given family. For the risk estimation problem, we compute the bias (which can also be corrected) and the variance of cross-validation methods. For estimator selection, we first provide a first-order analysis (based on expectations). Then, we explain how to take into account second-order terms (from variance computations, and by taking into account the usefulness of overpenalization). This allows us, in the end, to provide some guidelines for choosing the best cross-validation method for a given learning problem.
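The two goals distinguished in the survey, risk estimation and estimator selection, can be illustrated with a few lines of scikit-learn; the data, candidate models and fold count below are arbitrary choices for the example, not recommendations from the text.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
beta = np.r_[2.0, -1.5, np.zeros(8)]            # sparse true coefficients (toy data)
y = X @ beta + rng.standard_normal(200)

for name, est in [("ridge", Ridge(alpha=1.0)), ("lasso", Lasso(alpha=0.1))]:
    # 5-fold CV estimate of the estimator's risk (mean squared error).
    risk = -cross_val_score(est, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    print(f"{name}: estimated risk = {risk:.3f}")
# Selection: keep the estimator with the smaller cross-validated risk.
```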
Crowd-Powered Data Mining Many data mining tasks cannot be completely addressed by automated processes, such as sentiment analysis and image classification. Crowdsourcing is an effective way to harness the human cognitive ability to process these machine-hard tasks. Thanks to public crowdsourcing platforms, e.g., Amazon Mechanical Turk and CrowdFlower, we can easily involve hundreds of thousands of ordinary workers (i.e., the crowd) to address these machine-hard tasks. In this tutorial, we will survey and synthesize a wide spectrum of existing studies on crowd-powered data mining. We first give an overview of crowdsourcing, and then summarize the fundamental techniques, including quality control, cost control, and latency control, which must be considered in crowdsourced data mining. Next we review crowd-powered data mining operations, including classification, clustering, pattern mining, outlier detection, knowledge base construction and enrichment. Finally, we provide the emerging challenges in crowdsourced data mining.
Cumulative Gains Model Quality Metric This paper proposes a more comprehensive look at the ideas of the KS statistic and the Area Under the Curve (AUC) of a cumulative gains chart to develop a model quality statistic which can be used agnostically to evaluate the quality of a wide range of models in a standardized fashion. It can be used either holistically on the entire range of the model or at a given decision threshold of the model. Further, it can be extended into the model learning process.
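The two ingredients the paper builds on, the KS separation between the cumulative score distributions of positives and negatives and the area under the cumulative gains curve, can be computed as sketched below on invented scores; the paper's combined statistic itself is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 1000)                       # true binary labels (toy data)
score = y * 0.3 + rng.uniform(0, 1, 1000)          # model scores, higher = more positive

order = np.argsort(-score)                         # rank cases by descending score
y_sorted = y[order]
cum_pos = np.cumsum(y_sorted) / y.sum()            # cumulative gains: share of positives captured
cum_neg = np.cumsum(1 - y_sorted) / (len(y) - y.sum())

ks = np.max(np.abs(cum_pos - cum_neg))             # KS separation between the two curves
gains_area = cum_pos.mean()                        # Riemann-sum area under the gains curve
print(f"KS = {ks:.3f}, area under cumulative gains = {gains_area:.3f}")
```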
Customer Analytics in the age of Social Media Becoming ‘customer centric’ is a top priority today, and for good reason: as if it weren´t important enough that customers buy products and contract for services, they now do much more than simply buy. Customers participate in social media networks and chat rooms; they write blogs and contribute to comment sites; and they share information through sites such as YouTube and Flickr. Their activities and expressions not only reveal personal buying behavior and interests, but they also bring into focus their influence on purchasing by others in their social networks.
Customised Structural Elicitation Established methods for structural elicitation typically rely on code modelling standard graphical model classes, most often Bayesian networks. However, more appropriate models may arise from asking the expert questions in common language about what might relate to what and exploring the logical implications of the statements. Only after identifying the best matching structure should this be embellished into a fully quantified probability model. Examples of the efficacy and potential of this more flexible approach are shown below for four classes of graphical models: Bayesian networks, Chain Event Graphs, Multi-regression Dynamic Models, and Flow Graphs. We argue that to be fully effective any structural elicitation phase must first be customised to an application, and if necessary new types of structure with their own bespoke semantics elicited.
Cyber-Physical Systems Resilience: State of the Art, Research Issues and Future Trends Ideally, full integration is needed between the Internet and Cyber-Physical Systems (CPSs). These systems should fulfil time-sensitive functions with variable levels of integration with their environment, incorporating data storage, computation, communications, sensing, and control. There are, however, significant problems emerging from the convergence between CPS and Internet of Things (IoT) areas. The high heterogeneity, complexity, and dynamics of these resource-constrained systems bring new challenges to their robust and reliable operation, which implies the need for novel resilience management strategies. This paper surveys the state of the art in the relevant fields and discusses the research issues and future trends that emerge. Thus, we hope to provide new insights into the management of resilient CPSs, formed by IoT devices, modelled by Game Theory, and flexibly programmed using the latest software and virtualization platforms.

D

Data Acceleration: Architecture for the Modern Data Supply Chain Data technologies are evolving rapidly, but organizations have adopted most of these in piecemeal fashion. As a result, enterprise data—whether related to customer interactions, business performance, computer notifications, or external events in the business environment—is vastly underutilized. Moreover, companies´ data ecosystems have become complex and littered with data silos. This makes the data more difficult to access, which in turn limits the value that organizations can get out of it. Indeed, according to a recent Gartner, Inc. report, 85 percent of Fortune 500 organizations will be unable to exploit Big Data for competitive advantage through 2015. Furthermore, a recent Accenture study found that half of all companies have concerns about the accuracy of their data, and the majority of executives are unclear about the business outcomes they are getting from their data analytics programs. To unlock the value hidden in their data, companies must start treating data as a supply chain, enabling it to flow easily and usefully through the entire organization—and eventually throughout each company´s ecosystem of partners, including suppliers and customers. The time is right for this approach. For one thing, new external data sources are becoming available, providing fresh opportunities for data insights. In addition, the tools and technology required to build a better data platform are available and in use. These provide a foundation on which companies can construct an integrated, end-to-end data supply chain.
Data Analysis the Data.Table way (Cheat Sheet)
Data Center Infrastructure Management (DCIM) For Dummies Data Center Infrastructure Management (DCIM) is the discipline of managing the physical infrastructure of a data center and optimizing its ongoing operation. DCIM is a software suite that bridges the traditional gap between IT and the facilities groups and coordinates between the two. DCIM reduces computing costs while making it easier to quickly support new applications and other business requirements. About This Book This book explains the importance of DCIM, describes the key components of a modern DCIM system, guides you in the selection of the right DCIM solution for your particular needs, and gives you a step-by-step formula for a successful DCIM implementation. Because this is a For Dummies book, you can be sure that it´s easy to read and has touches of humor.
Data Clustering With Leaders and Subleaders Algorithm In this paper, an efficient hierarchical clustering algorithm, suitable for large data sets, is proposed for effective clustering and prototype selection for pattern classification. It is another simple and efficient technique which uses incremental clustering principles to generate a hierarchical structure for finding the subgroups/subclusters within each cluster. As an example, a two-level clustering algorithm – Leaders-Subleaders, an extension of the leader algorithm – is presented. Classification accuracy (CA) obtained using the representatives generated by the Leaders-Subleaders method is found to be better than that obtained using leaders as representatives. Even when a larger number of prototypes is generated, classification time remains low, as only a part of the hierarchical structure is searched.
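The basic leader step that the Leaders-Subleaders method extends can be sketched as follows, assuming Euclidean distance and an arbitrary threshold; in the two-level scheme, the same scan is then repeated within each cluster with a smaller threshold to obtain subleaders.

```python
# Incremental leader clustering: each point joins the first leader within the
# threshold, otherwise it becomes a new leader, in a single pass over the data.
import numpy as np

def leader_clustering(X, threshold):
    leaders, members = [], []
    for x in X:
        for k, lead in enumerate(leaders):
            if np.linalg.norm(x - lead) <= threshold:
                members[k].append(x)
                break
        else:                       # no leader close enough: x starts a new cluster
            leaders.append(x)
            members.append([x])
    return np.array(leaders), members

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
leaders, members = leader_clustering(X, threshold=1.0)
print(f"{len(leaders)} leaders found; cluster sizes: {[len(m) for m in members]}")
```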
Data Clustering: A Review Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a difficult combinatorial problem, and differences in assumptions and contexts in different communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques, and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.
Data Curation with Deep Learning [Vision]: Towards Self Driving Data Curation Past. Data curation – the process of discovering, integrating, and cleaning data – is one of the oldest data management problems. Unfortunately, it is still the most time consuming and least enjoyable work of data scientists. So far, successful data curation stories are mainly ad-hoc solutions that are either domain-specific (for example, ETL rules) or task-specific (for example, entity resolution). Present. The power of current data curation solutions is not keeping up with the ever-changing data ecosystem in terms of volume, velocity, variety and veracity, mainly due to the high human cost, instead of machine cost, needed for providing the ad-hoc solutions mentioned above. Meanwhile, deep learning is making strides in achieving remarkable successes in areas such as image recognition, natural language processing, and speech recognition. This is largely due to its ability to understand features that are neither domain-specific nor task-specific. Future. Data curation solutions need to keep pace with the fast-changing data ecosystem, where the main hope is to devise domain-agnostic and task-agnostic solutions. To this end, we start a new research project, called AutoDC, to unleash the potential of deep learning towards self-driving data curation. We will discuss how different deep learning concepts can be adapted and extended to solve various data curation problems. We showcase some low-hanging fruit from the early encounters between deep learning and data curation happening in AutoDC. We believe that the directions pointed out by this work will not only drive AutoDC towards democratizing data curation, but also serve as a cornerstone for researchers and practitioners to move to a new realm of data curation solutions.
Data Driven: Creating a Data Culture The data movement is in full swing. There are conferences (Strata +Hadoop World), bestselling books (Big Data, The Signal and the Noise, Lean Analytics), business articles (‘Data Scientist: The Sexiest Job of the 21st Century’), and training courses (An Introduction to Machine Learning with Web Data, the Insight Data Science Fellows Program) on the value of data and how to be a data scientist. Unfortunately, there is little that discusses how companies that successfully use data actually do that work. Using data effectively is not just about which database you use or how many data scientists you have on staff, but rather it´s a complex interplay between the data you have, where it is stored and how people work with it, and what problems are considered worth solving. While most people focus on the technology, the best organizations recognize that people are at the center of this complexity. In any organization, the answers to questions such as who controls the data, who they report to, and how they choose what to work on are always more important than whether to use a database like PostgreSQL or Amazon Redshift or HDFS. We want to see more organizations succeed with data. We believe data will change the way that businesses interact with the world, and we want more people to have access. To succeed with data, businesses must develop a data culture.
Data Innovation for International Development: An overview of natural language processing for qualitative data analysis Availability, collection and access to quantitative data, as well as its limitations, often make qualitative data the resource upon which development programs heavily rely. Both traditional interview data and social media analysis can provide rich contextual information and are essential for research, appraisal, monitoring and evaluation. These data may be difficult to process and analyze both systematically and at scale. This, in turn, limits the ability to make timely data-driven decisions, which is essential in fast-evolving complex social systems. In this paper, we discuss the potential of using natural language processing to systematize the analysis of qualitative data and to inform quick decision-making in the development context. We illustrate this with interview data generated in the format of micro-narratives for the UNDP Fragments of Impact project.
Data learning from big data Technology is generating a huge and growing availability of observations of diverse nature. This big data is placing data learning as a central scientific discipline. It includes collection, storage, preprocessing, visualization and, essentially, statistical analysis of enormous batches of data. In this paper, we discuss the role of statistics regarding some of the issues raised by big data in this new paradigm, and we also propose the name data learning to describe all the activities that allow one to obtain relevant knowledge from this new source of information.
Data Management: A Unified Approach Unified data management is becoming a strategic advantage in today´s business world. With the advent of big data, the volume and type of information that companies must use in near-real time to gain a competitive edge is growing at an unprecedented rate. Meanwhile, industry consolidation is leading to mergers and acquisitions that require disparate IT systems to be harmonized in order to move forward. These forces, combined with ongoing pressure to use all available data to improve employee productivity, customer satisfaction and innovation, are spurring enterprises to make data management planning a top priority. To support these plans and help achieve important business goals, enterprises are turning to data management solutions with significant urgency. According to a recent IDG Research Services study of 118 IT professionals, 87 percent of respondents said data integration tools have been deployed or are on their company´s road maps; 84 percent answered the same for data quality tools; 82 percent for master data management solutions; and 81 percent for data governance/data stewardship initiatives. Nearly three-fifths of respondents at organizations that have data management solutions in place are planning to continue making near-term investments in these types of tools.
Data Mining and Statistics: What is the Connection Data Mining is used to discover patterns and relationships in data, with an emphasis on large observational data bases. It sits at the common frontiers of several fields including Data Base Management, Artificial Intelligence, Machine Learning, Pattern Recognition, and Data Visualization. From a statistical perspective it can be viewed as computer automated exploratory data analysis of (usually) large complex data sets. In spite of (or perhaps because of) the somewhat exaggerated hype, this field is having a major impact in business, industry, and science. It also affords enormous research opportunities for new methodological developments. Despite the obvious connections between data mining and statistical data analysis, most of the methodologies used in Data Mining have so far originated in fields other than Statistics. This paper explores some of the reasons for this, and why statisticians should have an interest in Data Mining. It is argued that Statistics can potentially have a major influence on Data Mining, but in order to do so some of our basic paradigms and operating principles may have to be modified.
Data Mining Cluster Analysis: Basic Concepts and Algorithms (Slide Deck)
Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Manufacturing enterprises have been collecting and storing ever more current, detailed and accurate production-relevant data. These data stores offer enormous potential as a source of new knowledge, but the huge amount of data and its complexity far exceed our ability to reduce and analyze it without automated analysis techniques. This paper provides a brief introduction to knowledge discovery from databases and presents a methodology for data mining in time series. The relevance of data mining for manufacturing is illustrated.
Data Mining Standards In this survey paper we consolidate the current data mining standards. We categorize them into process standards, XML standards, standard APIs, web standards and grid standards, and discuss them in considerable detail. We also describe an application designed using these standards. We then analyze the standards and their influence on data mining application development, and point out areas of data mining application development that still need to be standardized. We also discuss the trend in the focus areas addressed by these standards.
Data Mining: A Conceptual Overview This tutorial provides an overview of the data mining process. The tutorial also provides a basic understanding of how to plan, evaluate and successfully refine a data mining project, particularly in terms of model building and model evaluation. Methodological considerations are discussed and illustrated. After explaining the nature of data mining and its importance in business, the tutorial describes the underlying machine learning and statistical techniques involved. It describes the CRISP-DM standard now being used in industry as the standard for a technology-neutral data mining process model. The paper concludes with a major illustration of the data mining process methodology and the unsolved problems that offer opportunities for research. The approach is both practical and conceptually sound in order to be useful to both academics and practitioners.
Data Mining: Discovering and Visualizing Patterns with Python (RefCard)
Data profit vs. Data waste: Boosting business performance every day in the real world with information optimization Companies do many things to grow profits. They discover new market opportunities. They sell more effectively. They innovate. They delight their customers. They improve productivity. They find ways to cut costs and mitigate risks. It can be difficult to do these things in today´s economic environment, because revenue opportunities are not always abundant and executives are largely disinclined to make substantial investments in new business capabilities. Despite current conditions, businesses are still finding ways to significantly improve their performance on a daily basis. One of these ways is the aggressive pursuit of data profit. Data profit is what results when companies make economically optimized use of all the structured and unstructured data already residing in existing systems across the enterprise to get better at everything the business needs to do: discovering opportunities, selling, innovating, delighting customers, improving productivity, cutting costs, and mitigating risk. Data profit has become an especially compelling business strategy today, because companies now suffer as never before from a specific problem that is the very opposite of data profit. That problem is data waste. Data waste occurs when companies do not fully utilize the wealth of data that they already have. This problem has become highly prevalent because companies have implemented so many systems over the past decade or more – from high-end databases and applications to email and basic desktop productivity tools – but have not developed effective strategies for fully leveraging their collective information output….
Data Science (Poster)
Data Science and its Relationship to Big Data and data-driven Decision Making Companies have realized they need to hire data scientists, academic institutions are scrambling to put together data-science programs, and publications are touting data science as a hot – even ‘‘sexy´´ – career choice. However, there is confusion about what exactly data science is, and this confusion could lead to disillusionment as the concept diffuses into meaningless buzz. In this article, we argue that there are good reasons why it has been hard to pin down exactly what data science is. One reason is that data science is intricately intertwined with other important concepts also of growing importance, such as big data and data-driven decision making. Another reason is the natural tendency to associate what a practitioner does with the definition of the practitioner´s field; this can result in overlooking the fundamentals of the field. We believe that trying to define the boundaries of data science precisely is not of the utmost importance. We can debate the boundaries of the field in an academic setting, but in order for data science to serve business effectively, it is important (i) to understand its relationships to other important related concepts, and (ii) to begin to identify the fundamental principles underlying data science. Once we embrace (ii), we can much better understand and explain exactly what data science has to offer. Furthermore, only once we embrace (ii) should we be comfortable calling it data science. In this article, we present a perspective that addresses all these concepts. We close by offering, as examples, a partial list of fundamental principles underlying data science.
Data science as a language: challenges for computer science – a position paper In this paper, I posit that from a research point of view, Data Science is a language. More precisely, Data Science is doing Science using computer science as a language for datafied sciences; much as mathematics is the language of, e.g., physics. From this viewpoint, three (classes of) challenges for computer science are identified, complementing the challenges the closely related Big Data problem already poses to computer science. I discuss the challenges with references to, in my opinion, related, interesting directions in computer science research; note, I claim neither that these directions are the most appropriate to solve the challenges nor that the cited references represent the best work in their field, they are inspirational to me. So, what are these challenges? Firstly, if computer science is to be a language, what should that language look like? While our traditional specifications such as pseudocode are an excellent way to convey what has been done, they fail for more mathematics-like reasoning about computations. Secondly, if computer science is to function as a foundation of other, datafied, sciences, its own foundations should be in order. While we have excellent foundations for supervised learning—e.g., by having loss functions to optimize and, more generally, by PAC learning (Valiant in Commun ACM 27(11):1134-1142, 1984)—this is far less true for unsupervised learning. Kolmogorov complexity—or, more generally, Algorithmic Information Theory—provides a solid base (Li and Vitányi in An introduction to Kolmogorov complexity and its applications, Springer, Berlin, 1993). It provides an objective criterion to choose between competing hypotheses, but it lacks, e.g., an objective measure of the uncertainty of a discovery that datafied sciences need. Thirdly, datafied sciences come with new conceptual challenges. Data-driven scientists come up with data analysis questions that sometimes do, and sometimes don´t, fit our conceptual toolkit. Clearly, computer science does not suffer from a lack of interesting, deep research problems. However, the challenges posed by data science point to a large reservoir of untapped problems. Interesting, stimulating problems, not least because they are posed by our colleagues in the datafied sciences. It is an exciting time to be a computer scientist.
Data Science Code of Professional Conduct We look at the proposed Data Science Code of Professional Conduct and nominate a ‘Golden Rule’ which summarizes the data scientist ethic.
Data Science in the Cloud with Microsoft Azure Machine Learning and R Recently, Microsoft launched the Azure Machine Learning cloud platform – Azure ML. Azure ML provides an easy-to-use and powerful set of cloud-based data transformation and machine learning tools. This report covers the basics of manipulating data, as well as constructing and evaluating models in Azure ML, illustrated with a data science example. Before we get started, here are a few of the benefits Azure ML provides for machine learning solutions: • Solutions can be quickly deployed as web services. • Models run in a highly scalable cloud environment. • Code and data are maintained in a secure cloud environment. • Available algorithms and data transformations are extendable using the R language for solution-specific functionality. Throughout this report, we’ll perform the required data manipulation then construct and evaluate a regression model for a bicycle sharing demand dataset. You can follow along by downloading the code and data provided below. Afterwards, we’ll review how to publish your trained models as web services in the Azure cloud.
Data Science in the Cloud with Microsoft Azure Machine Learning and R: 2015 Update This report covers the basics of manipulating data, constructing models, and evaluating models in the Microsoft Azure Machine Learning platform (Azure ML). The Azure ML platform has greatly simplified the development and deployment of machine learning models, with easy-to-use and powerful cloud-based data transformation and machine learning tools. In this report, we´ll explore extending Azure ML with the R language. (A companion report explores extending Azure ML using the Python language.) All of the concepts we will cover are illustrated with a data science example, using a bicycle rental demand dataset. We´ll perform the required data manipulation, or data munging. Then, we will construct and evaluate regression models for the dataset. You can follow along by downloading the code and data provided in the next section. Later in the report, we´ll discuss publishing your trained models as web services in the Azure cloud.
Data Science Revealed: A Data-Driven Glimpse into the Burgeoning new Field As the cost of computing power, data storage, and high-bandwidth Internet access have plunged exponentially over the past two decades, companies around the globe have recognized the power of harnessing data as a source of competitive advantage. But it was only recently, as social web applications and massively parallel processing became more widely available, that the nascent field of data science revealed what many are coming to understand: that data is the new oil, the source for corporate energy and differentiation in the 21st century. Companies like Facebook, LinkedIn, Yahoo, and Google are generating data not only as their primary product, but are analyzing it to continuously improve their products. Pharmaceutical and biomedical companies are using big data to find new cures and analyze genetic information, while marketers leverage the same technology to generate new customer insights. In order to tap this newfound wealth, organizations of all sizes are turning to practitioners in the new field of data science who are capable of translating massive data into predictive insights that lead to results. Data science is an emerging field, with rapid changes, great uncertainty, and exciting opportunities. Our study attempts the first ever benchmark of the data science community, looking at how they interact with their data, the tools they use, their education, and how their organizations approach data-driven problem solving. We also looked at a smaller group of business intelligence professionals to identify areas of contrast between the emerging role of data scientists and the more mature field of BI. Our findings, summarized here, show an emerging talent gap between organizational needs and current industry capabilities exemplified by the unique contributions data scientists can make to an organization and the broad expectations of data science professionals generally.
Data Science Salary Survey 2013 O´Reilly Media conducted an anonymous salary and tools survey in 2012 and 2013 with attendees of the Strata Conference: Making Data Work in Santa Clara, California and Strata + Hadoop World in New York. Respondents from 37 US states and 33 countries, representing a variety of industries in the public and private sector, completed the survey. We ran the survey to better understand which tools data analysts and data scientists use and how those tools correlate with salary. Not all respondents describe their primary role as data scientist/data analyst, but almost all respondents are exposed to data analytics. Similarly, while just over half the respondents described themselves as technical leads, almost all reported that some part of their role included technical duties (i.e., 10-20% of their responsibilities included data analysis or software development). We looked at which tools correlate with others (if respondents use one, are they more likely to use another?) and created a network graph of the positive correlations. Tools could then be compared with salary, either individually or collectively, based on where they clustered on the graph.
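As a rough illustration of the tool-correlation network described above, the sketch below computes pairwise correlations between tools from a made-up respondent-by-tool usage matrix and keeps only the positive correlations as graph edges. The data, the choice of pandas and networkx, and all parameters are assumptions of this example, not details taken from the survey.

```python
import pandas as pd
import networkx as nx

# Hypothetical respondent-by-tool usage matrix (1 = respondent reports using the tool).
usage = pd.DataFrame({
    "R":      [1, 1, 0, 1, 0, 1],
    "Python": [1, 1, 1, 0, 1, 0],
    "SQL":    [1, 0, 1, 1, 1, 1],
    "Hadoop": [0, 1, 1, 0, 1, 0],
})

corr = usage.corr()  # pairwise tool-to-tool correlations

# Keep only positively correlated pairs as edges of the network graph.
G = nx.Graph()
for a in corr.columns:
    for b in corr.columns:
        if a < b and corr.loc[a, b] > 0:
            G.add_edge(a, b, weight=round(float(corr.loc[a, b]), 2))

print(G.edges(data=True))
```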
Data Science vs. Statistics: Two Cultures Data science is the business of learning from data, which is traditionally the business of statistics. Data science, however, is often understood as a broader, task-driven and computationally-oriented version of statistics. Both the term data science and the broader idea it conveys have origins in statistics and are a reaction to a narrower view of data analysis. Expanding upon the views of a number of statisticians, this paper encourages a big-tent view of data analysis. We examine how evolving approaches to modern data analysis relate to the existing discipline of statistics (e.g. exploratory analysis, machine learning, reproducibility, computation, communication and the role of theory). Finally, we discuss what these trends mean for the future of statistics by highlighting promising directions for communication, education and research.
Data Science, an Overview of Classification Techniques (Slide Deck)
Data Science, Banking, and Fintech The financial industry today is under siege, but not from economic pressures in Europe and China. Rather, this once-impenetrable fortress is currently riding a giant entrepreneurial wave of disruption, disintermediation, and digital innovation. Behind the siege is fintech, a spunky and growing group of financial technology companies. These venture-backed new arrivals are challenging the old champions in lending, payments, money transfer, trading, wealth management, and cryptocurrencies. In this O´Reilly report, author Cornelia Lévy-Bencheton examines the disruptive megatrends taking hold at every level and juncture of the financial ecosystem. You´ll find out how fintech is reshaping the financial industry, reimagining the ways consumers manage, save, and spend money through a data-driven culture of big data analytics, mobile payment services, and robo-advising. Can traditional financial institutions evolve in time to catch up and avoid being replaced? Pick up this report to learn about the current banking and financial services industry, key participants in fintech, and some adaptive strategies being used by traditional financial organizations.
Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics An action plan to enlarge the technical areas of statistics focuses on the data analyst. The plan sets out six technical areas of work for a university department and advocates a specific allocation of resources devoted to research in each area and to courses in each area. The value of technical work is judged by the extent to which it benefits the data analyst, either directly or indirectly. The plan is also applicable to government research labs and corporate research organizations.
Data Science: The Impact of Statistics In this paper, we substantiate our premise that statistics is one of the most important disciplines to provide tools and methods to find structure in and to give deeper insight into data, and the most important discipline to analyze and quantify uncertainty. We give an overview of different proposed structures of Data Science and address the impact of statistics on such steps as data acquisition and enrichment, data exploration, data analysis and modeling, validation and representation and reporting. Also, we indicate fallacies when neglecting statistical reasoning.
Data Scientist Enablement Roadmap (Slide Deck)
Data Scientist: The Sexiest Job of the 21st Century When Jonathan Goldman arrived for work in June 2006 at LinkedIn, the business networking site, the place still felt like a startup. The company had just under 8 million accounts, and the number was growing quickly as existing members invited their friends and colleagues to join. But users weren´t seeking out connections with the people who were already on the site at the rate executives had expected. Something was apparently missing in the social experience. As one LinkedIn manager put it, ‘It was like arriving at a conference reception and realizing you don´t know anyone. So you just stand in the corner sipping your drink—and you probably leave early.’
Data Storytelling: Using visualization to share the human impact of numbers Storytelling is a cornerstone of the human experience. The universe may be full of atoms, but it´s through stories that we truly construct our world. From Greek mythology to the Bible to television series like Cosmos, stories have been shaping our experience on Earth for as long as we´ve lived on it. A key purpose of storytelling is not just understanding the world but changing it. After all, why would we study the world if we didn´t want to know how we can—and should—influence it? Though many elements of stories have remained the same throughout history, we have developed better tools and mediums for telling them, such as printed books, movies, and comics. This has changed storytelling styles—and perhaps most importantly, the impact of those stories—over the millennia. But can stories be told with data, as well as with images and words? That´s what this paper´s about.
Data Stream Mining – A Practical Approach Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA includes a collection of offline and online methods as well as tools for evaluation. In particular, it implements boosting, bagging, and Hoeffding Trees, all with and without Naïve Bayes classifiers at the leaves. MOA is related to WEKA, the Waikato Environment for Knowledge Analysis, which is an award-winning open-source workbench containing implementations of a wide range of batch machine learning methods. WEKA is also written in Java. The main benefits of Java are portability, where applications can be run on any platform with an appropriate Java virtual machine, and the strong and well-developed support libraries. Use of the language is widespread, and features such as the automatic garbage collection help to reduce programmer burden and error. This text explains the theoretical and practical foundations of the methods and streams available in MOA.
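MOA itself is a Java workbench, but the core evaluation idea for evolving streams, testing on each arriving example before training on it (often called prequential or test-then-train evaluation), can be sketched in a few lines. The snippet below is only an illustration of that idea using scikit-learn's incremental SGDClassifier and a synthetic drifting stream; it is not MOA code, and the stream, the drift point and all parameters are assumptions of this example.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Prequential ("test-then-train") evaluation over a synthetic evolving stream.
rng = np.random.default_rng(1)
model = SGDClassifier()
classes = np.array([0, 1])
correct = tested = 0

for t in range(5000):
    drift = 0.0 if t < 2500 else 1.5              # abrupt concept drift halfway through
    x = rng.normal(size=(1, 5))
    y = np.array([int(x[0, 0] + drift * x[0, 1] > 0)])
    if t > 0:                                     # test on the example first ...
        correct += int(model.predict(x)[0] == y[0])
        tested += 1
    model.partial_fit(x, y, classes=classes)      # ... then train on it

print("prequential accuracy:", correct / tested)
```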
Data Visualization Techniques – From Basics to Big Data with SAS Visual Analytics A picture is worth a thousand words – especially when you are trying to understand and gain insights from data. It is particularly relevant when you are trying to find relationships among thousands or even millions of variables and determine their relative importance. Organizations of all types and sizes generate data each minute, hour and day. Everyone – including executives, departmental decision makers, call center workers and employees on production lines – hopes to learn things from collected data that can help them make better decisions, take smarter actions and operate more efficiently. Regardless of how much data you have, one of the best ways to discern important relationships is through advanced analysis and high-performance data visualization. If sophisticated analyses can be performed quickly, even immediately, and results presented in ways that showcase patterns and allow querying and exploration, people across all levels in your organization can make faster, more effective decisions. To create meaningful visuals of your data, there are some basics you should consider. Data size and column composition play an important role when selecting graphs to represent your data. This paper discusses some of the basic issues concerning data visualization and provides suggestions for addressing those issues. In addition, big data brings a unique set of challenges for creating visualizations. This paper covers some of those challenges and potential solutions as well. If you are working with massive amounts of data, one challenge is how to display results of data exploration and analysis in a way that is not overwhelming. You may need a new way to look at the data – one that collapses and condenses the results in an intuitive fashion but still displays graphs and charts that decision makers are accustomed to seeing. And, in today´s on-the-go society, you may also need to make the results available quickly via mobile devices, and provide users with the ability to easily explore data on their own in real time. SAS Visual Analytics is a new business intelligence solution that uses intelligent autocharting to help business analysts and nontechnical users visualize data. It creates the best possible visual based on the data that is selected. The visualizations make it easy to see patterns and trends and identify opportunities for further analysis. The heart and soul of SAS Visual Analytics is the SAS LASR Analytic Server, which can execute and accelerate analytic computations in-memory with unprecedented performance. The combination of high-performance analytics and an easy-to-use data exploration interface enables different types of users to create and interact with graphs so they can understand and derive value from their data faster than ever. This creates an unprecedented ability to solve difficult problems, improve business performance and mitigate risk – rapidly and confidently.
Data Visualization with ggplot2 (Cheat Sheet)
Data Visualization: A New Language for Storytelling An Emerging Universal Medium: When was the last time you saw a business presentation that did not include at least one slide with a bar graph or a pie chart? Data visualizations have become so ubiquitous that we no longer find them remarkable.
Data Visualization: Making Big Data Approachable and Valuable Enterprises today are beginning to realize the important role Big Data plays in achieving business goals. Concepts that used to be difficult for companies to comprehend— factors that influence a customer to make a purchase, behavior patterns that point to fraud or misuse, inefficiencies slowing down business processes—now can be understood and addressed by collecting and analyzing Big Data. The insight gained from such analysis helps organizations improve operations and identify new product and service opportunities that they may have otherwise missed. In essence, Big Data promises to deliver the advantages that companies need to drive revenue growth and gain a competitive edge. However, getting to that Big Data payoff is proving a difficult challenge for many organizations. Big Data is often voluminous and tends to rapidly change and morph, making it challenging to get a handle on and difficult to access. The majority of tools available to work with Big Data are complex and hard to use, and most enterprises don´t have the in-house expertise to perform the required data analysis and manipulation to draw out the answers that the business is seeking. In fact, in a recent survey conducted by IDG Research, when asked about analyzing Big Data, respondents cite lack of skills and difficulty in making Big Data available to users as two significant challenges. ‘A lot of existing Big Data techniques require you to really get your hands dirty; I don´t think that most Big Data software is as mature as it needs to be in order to be accessible to business users at most enterprises,’ says Paul Kent, vice president of Big Data with SAS. ‘So if you´re not Google or LinkedIn or Facebook, and you don´t have thousands of engineers to work with Big Data, it can be difficult to find business answers in the information.’ What enterprises need are tools to help them easily and effectively understand and analyze Big Data. Employees who aren´t data scientists or analysts should be able to ask questions of the data based on their own business expertise and quickly and easily find patterns, spot inconsistencies, even get answers to questions they haven´t yet thought to ask. Otherwise, the effort and expense that companies invest in collecting and mining Big Data may be challenged to yield significant actionable results. And companies run the risk of missing important business opportunities if they can´t find the answers that are likely stored in their own data.
Data Visualization: When Data Speaks Business This TEC Product Analysis Report aims to provide an extensive review of the set of data visualization features that form part of the essential core of IBM Cognos Business Intelligence (BI) capabilities. The report contains the following elements: 1. An introduction to IBM Cognos Business Intelligence and data visualization for providing extensive analytics and data discovery services 2. An analyst perspective covering data visualization, its role, importance, and value in the BI lifecycle chain and examining its relationship to other elements in a reliable and best practice scenario for performing BI within an organization 3. A review of IBM Cognos data visualization capabilities 4. A general conclusion and final analyst summary
Data Warehousing: Best Practices for Collecting, Storing, and Delivering Decision-Support Data Data Warehousing is a process for collecting, storing, and delivering decision-support data for some or all of an enterprise. Data warehousing is a broad subject that is described point by point in this Refcard. A data warehouse is one of the artifacts created in the data warehousing process.
Data Wrangling with dplyr and tidyr Cheat Sheet (Cheat Sheet)
Data: Emerging Trends and Technologies What are the emerging trends and technologies that will transform the data landscape in coming months? In this report from Strata + Hadoop World co-chair Alistair Croll, you’ll learn how the ubiquity of cheap sensors, fast networks, and distributed computing has given rise to several developments that will soon have a profound effect on individuals and society as a whole. Machine learning, for example, has quickly moved from lab tool to hosted, pay-as-you-go services in the cloud. Those services, in turn, are leading to predictive apps that will provide individuals with the right functionality and content at the right time by continuously learning about them and predicting what they’ll need. Computational power can produce cognitive augmentation.
Database as a Service – Current Issues and Its Future With the prevalence of applications in the cloud, Database as a Service (DBaaS) has become a promising way to provide cloud applications with reliable and flexible data storage services. It offers a number of interesting features to cloud developers; however, it suffers from a few drawbacks: a long learning curve and development cycle, a lack of in-depth support for NoSQL, a lack of flexible configuration for security and privacy, and high-cost models. In this paper, we investigate these issues among current DBaaS providers and propose a novel Trinity Model that can significantly reduce the learning curve, improve security and privacy, and accelerate database design and development. We further elaborate our ongoing and future work on developing large real-world SaaS projects using this new DBaaS model.
Database Meets Deep Learning: Challenges and Opportunities Deep learning has recently become very popular on account of its incredible success in many complex data-driven applications, such as image classification and speech recognition. The database community has worked on data-driven applications for many years, and therefore should be playing a lead role in supporting this new wave. However, databases and deep learning are different in terms of both techniques and applications. In this paper, we discuss research problems at the intersection of the two fields. In particular, we discuss possible improvements for deep learning systems from a database perspective, and analyze database applications that may benefit from deep learning techniques.
Data-Driven Nested Stochastic Robust Optimization: A General Computational Framework and Algorithm for Optimization under Uncertainty in the Big Data Era A novel data-driven nested stochastic robust optimization (DDNSRO) framework is proposed to systematically and automatically handle labeled multi-class uncertainty data in optimization problems. Uncertainty realizations in large datasets are often collected from various conditions, which are encoded by class labels. A group of Dirichlet process mixture models is employed for uncertainty modeling from the multi-class uncertainty data. The proposed data-driven nonparametric uncertainty model could automatically adjust its complexity based on the data structure and complexity, thus accurately capturing the uncertainty information. A DDNSRO framework is further proposed based on the data-driven uncertainty model through a bi-level optimization structure. The outer optimization problem follows a two-stage stochastic programming approach to optimize the expected objective across different classes of data; robust optimization is nested as the inner problem to ensure the robustness of the solution while maintaining computational tractability. A tailored column-and-constraint generation algorithm is further developed to solve the resulting multi-level optimization problem efficiently. Case studies on strategic planning of process networks are presented to demonstrate the applicability of the proposed framework.
Data-intensive applications, challenges, techniques and technologies: A survey on Big Data It is already true that Big Data has drawn huge attention from researchers in information sciences and from policy and decision makers in governments and enterprises. As the speed of information growth exceeds Moore´s Law at the beginning of this new century, the excess of data is creating serious difficulties for human beings. However, there is great potential and highly useful value hidden in this huge volume of data. A new scientific paradigm has been born: data-intensive scientific discovery (DISD), also known as Big Data problems. A large number of fields and sectors, ranging from economic and business activities to public administration, from national security to scientific research in many areas, are involved with Big Data problems. On the one hand, Big Data is extremely valuable for producing productivity gains in businesses and evolutionary breakthroughs in scientific disciplines, giving us many opportunities to make great progress in many fields. There is no doubt that future competition in business productivity and technologies will surely converge on Big Data exploration. On the other hand, Big Data also brings many challenges, such as difficulties in data capture, data storage, data analysis and data visualization. This paper aims to provide a close-up view of Big Data, including Big Data applications, Big Data opportunities and challenges, as well as the state-of-the-art techniques and technologies we currently adopt to deal with Big Data problems. We also discuss several underlying methodologies for handling the data deluge, for example, granular computing, cloud computing, bio-inspired computing, and quantum computing.
Deciphering Big Data Stacks: An Overview of Big Data Tools With its ability to ingest, process, and decipher an abundance of incoming data, Big Data is considered by many a cornerstone of future research and development. However, the large number of available tools and the overlap between them are impeding their technological potential. In this paper, we present a systematic grouping of the available tools and a network of dependencies among them, with the aim of composing individual tools into functional software stacks required to perform Big Data analyses.
Decision Management and Cloud as a Platform for Predictive Analytics (Slide Deck)
Decision Modeling with DMN: How to Build a Decision Requirements Model using the new Decision Model and Notation (DMN) standard The goal of this paper is to describe the four iterative steps to complete a Decision Requirements Model using the forthcoming DMN standard. Before beginning, it is important to understand the value of defining decision requirements as part of your overall requirements process. Experience shows that there are three main reasons for doing so: 1. Current requirements approaches don´t tackle the decision-making that is increasingly important in information systems. 2. While important for all software development projects, decision requirements are especially important for projects adopting business rules and advanced analytic technologies. 3. Decisions are a common language across business, IT and analytic organizations improving collaboration, increasing reuse, and easing implementation.
Decision Requirements Modeling for Analytic Projects Established analytic approaches like CRISP-DM stress the importance of understanding the project objectives and requirements from a business perspective, but to date there are no formal approaches to capturing this understanding in a repeatable, understandable format. Decision Requirements Modeling closes this gap. Decision Requirements Modeling is a successful technique that develops a richer, more complete business understanding earlier. Decision Requirements Modeling results in a clear business target, an understanding of how the results will be used and deployed, and by whom. Using Decision Requirements Modeling to guide and shape analytics projects reduces reliance on constrained specialist resources by improving requirements gathering, helps teams ask the key questions and enables teams to collaborate effectively across the organization, bringing analytics, IT and business professionals together. Using Decision Requirements Modeling to document analytic project requirements enables organizations to: – Compare multiple projects for prioritization, including allowing new analytic development to be compared with updating or refining existing analytics. – Act on a specific plan to guide analytic development that is accessible to business, IT and analytic teams alike. – Reuse knowledge from project to project by creating an increasingly detailed and accurate view of decision-making and the role of analytics. – Value information sources and analytics in terms of business impact. There is an emerging consensus that Decision Requirements Modeling is the best way to specify decision-making. It is also central to a forthcoming standard, the Object Management Group´s Decision Model and Notation, which will give adopters access to a broad community and a vehicle for sharing expertise more widely.
Decision Theory – A Brief Introduction Decision theory is theory about decisions. The subject is not a very unified one. To the contrary, there are many different ways to theorize about decisions, and therefore also many different research traditions. This text attempts to reflect some of the diversity of the subject. Its emphasis lies on the less (mathematically) technical aspects of decision theory.
Decision Tree Classification with Differential Privacy: A Survey Data mining information about people is becoming increasingly important in the data-driven society of the 21st century. Unfortunately, sometimes there are real-world considerations that conflict with the goals of data mining; sometimes the privacy of the people being data mined needs to be considered. This necessitates that the output of data mining algorithms be modified to preserve privacy while simultaneously not ruining the predictive power of the outputted model. Differential privacy is a strong, enforceable definition of privacy that can be used in data mining algorithms, guaranteeing that nothing will be learned about the people in the data that could not already be discovered without their participation. In this survey, we focus on one particular data mining algorithm — decision trees — and how differential privacy interacts with each of the components that constitute decision tree algorithms. We analyze both greedy and random decision trees, and the conflicts that arise when trying to balance privacy requirements with the accuracy of the model.
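A central building block that differentially private decision-tree algorithms of this kind rely on is releasing noisy counts via the Laplace mechanism. The sketch below (plain NumPy, with an illustrative epsilon and made-up labels; it is not code from the survey) shows how a class count used to label a leaf or score a split can be privatized.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_count(true_count, epsilon, sensitivity=1.0):
    """Laplace mechanism: a counting query has sensitivity 1 (adding or
    removing one person changes the count by at most 1), so adding
    Laplace(sensitivity / epsilon) noise makes the released count
    epsilon-differentially private."""
    return true_count + rng.laplace(0.0, sensitivity / epsilon)

# Noisy class counts, e.g. for labeling a leaf in a differentially private tree.
labels = np.array([0, 1, 1, 0, 1, 1, 1, 0])          # hypothetical leaf labels
noisy_counts = {c: dp_count((labels == c).sum(), epsilon=0.5) for c in (0, 1)}
print(noisy_counts)
```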
Decision-Making with Belief Functions: a Review Approaches to decision-making under uncertainty in the belief function framework are reviewed. Most methods are shown to blend criteria for decision under ignorance with the maximum expected utility principle of Bayesian decision theory. A distinction is made between methods that construct a complete preference relation among acts, and those that allow incomparability of some acts due to lack of information. Methods developed in the imprecise probability framework are applicable in the Dempster-Shafer context and are also reviewed. Shafer’s constructive decision theory, which substitutes the notion of goal for that of utility, is described and contrasted with other approaches. The paper ends by pointing out the need to carry out deeper investigation of fundamental issues related to decision-making with belief functions and to assess the descriptive, normative and prescriptive values of the different approaches.
Declarative Data Analytics: a Survey The area of declarative data analytics explores the application of the declarative paradigm on data science and machine learning. It proposes declarative languages for expressing data analysis tasks and develops systems which optimize programs written in those languages. The execution engine can be either centralized or distributed, as the declarative paradigm advocates independence from particular physical implementations. The survey explores a wide range of declarative data analysis frameworks by examining both the programming model and the optimization techniques used, in order to provide conclusions on the current state of the art in the area and identify open challenges.
Declarative Statistics In this work we introduce declarative statistics, a suite of declarative modelling tools for statistical analysis. Statistical constraints represent the key building block of declarative statistics. First, we introduce a range of relevant counting and matrix constraints and associated decompositions, some of which are novel, that are instrumental in the design of statistical constraints. Second, we introduce a selection of novel statistical constraints and associated decompositions, which constitute a self-contained toolbox that can be used to tackle a wide range of problems typically encountered by statisticians. Finally, we apply these statistical constraints to a wide range of application areas drawn from classical statistics and we contrast our framework against established practices.
Deconstructing Blockchains: A Comprehensive Survey on Consensus, Membership and Structure It is no exaggeration to say that since the introduction of Bitcoin, blockchains have become a disruptive technology that has shaken the world. However, the rising popularity of the paradigm has led to a flurry of proposals addressing variations and/or trying to solve problems stemming from the initial specification. This has added considerable complexity to current blockchain ecosystems, amplified by the absence of detail in many accompanying blockchain whitepapers. Through this paper, we set out to explain blockchains in a simple way, taming that complexity through the deconstruction of the blockchain into three simple, critical components common to all known systems: membership selection, consensus mechanism and structure. We propose an evaluation framework with insight into system models, desired properties and analysis criteria, using the decoupled components as criteria. We use this framework to provide clear and intuitive overviews of the design principles behind the analyzed systems and the properties achieved. We hope our effort will help clarify the current state of blockchain proposals and provide directions for the analysis of future proposals.
Decorrelation of Neutral Vector Variables: Theory and Applications In this paper, we propose novel strategies for neutral vector variable decorrelation. Two fundamental invertible transformations, namely serial nonlinear transformation and parallel nonlinear transformation, are proposed to carry out the decorrelation. For a neutral vector variable, which is not multivariate Gaussian distributed, the conventional principal component analysis (PCA) cannot yield mutually independent scalar variables. With the two proposed transformations, a highly negatively correlated neutral vector can be transformed to a set of mutually independent scalar variables with the same degrees of freedom. We also evaluate the decorrelation performances for the vectors generated from a single Dirichlet distribution and a mixture of Dirichlet distributions. The mutual independence is verified with the distance correlation measurement. The advantages of the proposed decorrelation strategies are intensively studied and demonstrated with synthesized data and practical application evaluations.
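For the Dirichlet case (the canonical neutral vector), the serial nonlinear transformation mentioned above can be illustrated concretely: dividing each component by the probability mass remaining after the previous components yields variables that are (approximately) uncorrelated. The NumPy sketch below uses illustrative parameters of my own choosing rather than anything from the paper, and is only meant to show the idea of the transformation, not to reproduce the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([2.0, 3.0, 4.0, 5.0])                # illustrative Dirichlet parameters
X = rng.dirichlet(alpha, size=100_000)                # neutral vectors (components sum to 1)

# Serial nonlinear transformation: u_k = x_k / (1 - x_1 - ... - x_{k-1}).
K = X.shape[1]
U = np.empty((X.shape[0], K - 1))
remaining = np.ones(X.shape[0])
for k in range(K - 1):
    U[:, k] = X[:, k] / remaining
    remaining = remaining - X[:, k]

# The original components are negatively correlated; the transformed ones
# should be close to uncorrelated.
print(np.round(np.corrcoef(X, rowvar=False), 2))
print(np.round(np.corrcoef(U, rowvar=False), 2))
```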
Decoupling Learning Rules from Representations In the artificial intelligence field, learning often corresponds to changing the parameters of a parameterized function. A learning rule is an algorithm or mathematical expression that specifies precisely how the parameters should be changed. When creating an artificial intelligence system, we must make two decisions: what representation should be used (i.e., what parameterized function should be used) and what learning rule should be used to search through the resulting set of representable functions. Using most learning rules, these two decisions are coupled in a subtle (and often unintentional) way. That is, using the same learning rule with two different representations that can represent the same sets of functions can result in two different outcomes. After arguing that this coupling is undesirable, particularly when using artificial neural networks, we present a method for partially decoupling these two decisions for a broad class of learning rules that span unsupervised learning, reinforcement learning, and supervised learning.
Deep Active Learning for Named Entity Recognition Deep neural networks have advanced the state of the art in named entity recognition. However, under typical training procedures, advantages over classical methods emerge only with large datasets. As a result, deep learning is employed only when large public datasets or a large budget for manually labeling data is available. In this work, we show that by combining deep learning with active learning, we can outperform classical methods even with a significantly smaller amount of training data.
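The core loop referred to above, train, query the most uncertain unlabeled examples, label them, and retrain, can be sketched independently of the NER architecture. The snippet below uses a plain logistic-regression model on synthetic data purely to illustrate uncertainty-sampling active learning; the paper's deep NER models and acquisition strategy are more elaborate, and every name and number here is an assumption of this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic pool-based active learning with uncertainty sampling.
rng = np.random.default_rng(0)
X_pool = rng.normal(size=(1000, 20))
y_pool = (X_pool[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)  # stand-in oracle

labeled = list(range(20))                     # small labeled seed set
unlabeled = list(range(20, 1000))
model = LogisticRegression()

for _ in range(10):                           # 10 labeling rounds, 10 queries each
    model.fit(X_pool[labeled], y_pool[labeled])
    probs = model.predict_proba(X_pool[unlabeled])
    uncertainty = 1.0 - probs.max(axis=1)     # least-confident examples score highest
    query = [unlabeled[i] for i in np.argsort(uncertainty)[-10:]]
    labeled.extend(query)
    unlabeled = [i for i in unlabeled if i not in query]

print("labeled examples used:", len(labeled))
```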
Deep Architectures for Modulation Recognition We survey the latest advances in machine learning with deep neural networks by applying them to the task of radio modulation recognition. Results show that radio modulation recognition is not limited by network depth and further work should focus on improving learned synchronization and equalization. Advances in these areas will likely come from novel architectures designed for these tasks or through novel training methods.
Deep Belief Nets (Slide Deck)
Deep convolutional neural networks for brain image analysis on magnetic resonance imaging: a review In recent years, deep convolutional neural networks (CNNs) have shown record-shattering performance in a variety of computer vision problems, such as visual object recognition, detection and segmentation. These methods have also been utilized in medical image analysis domain for lesion segmentation, anatomical segmentation and classification. We present an extensive literature review of CNN techniques applied in brain magnetic resonance imaging (MRI) analysis, focusing on the architectures, pre-processing, data-preparation and post-processing strategies available in these works. The aim of this study is three-fold. Our primary goal is to report how different CNN architectures have evolved, now entailing state-of-the-art methods by extensive discussion of the architectures and examining the pros and cons of the models when evaluating their performance using public datasets. Second, this paper is intended to be a detailed reference of the research activity in deep CNN for brain MRI analysis. Finally, our goal is to present a perspective on the future of CNNs, which we believe will be among the growing approaches in brain image analysis in subsequent years.
Deep Dive into Anonymity: A Large Scale Analysis of Quora Questions Anonymity forms an integral and important part of our digital life. It enables us to express our true selves without the fear of judgment. In this paper, we investigate the different aspects of anonymity in the social Q&A site Quora. The choice of Quora is motivated by the fact that this is one of the rare social Q&A sites that allow users to explicitly post anonymous questions, and such activity in this forum has become normative rather than a taboo. Through an analysis of 5.1 million questions, we observe that at a global scale almost no difference manifests between the linguistic structure of the anonymous and the non-anonymous questions. We find topical mixing at the global scale to be the primary reason for this absence. However, the differences start to appear once we ‘deep dive’ and (topically) cluster the questions and compare the clusters that have high volumes of anonymous questions with those that have low volumes of anonymous questions. In particular, we observe that the choice to post the question as anonymous is dependent on the user’s perception of anonymity and they often choose to speak about depression, anxiety, social ties and personal issues under the guise of anonymity. We further perform personality trait analysis and observe that the anonymous group of users has positive correlation with extraversion and agreeableness, and negative correlation with openness. Subsequently, to gain further insights, we build an anonymity grid to identify the differences in the perception of anonymity between the user posting the question and the community of users answering it. We also look into the first response time of the questions and observe that it is lowest for topics which talk about personal and sensitive issues, which hints toward a higher degree of community support and user engagement.
Deep EHR: A Survey of Recent Advances on Deep Learning Techniques for Electronic Health Record (EHR) Analysis The past decade has seen an explosion in the amount of digital information stored in electronic health records (EHR). While primarily designed for archiving patient clinical information and administrative healthcare tasks, many researchers have found secondary use of these records for various clinical informatics tasks. Over the same period, the machine learning community has seen widespread advances in deep learning techniques, which also have been successfully applied to the vast amount of EHR data. In this paper, we review these deep EHR systems, examining architectures, technical aspects, and clinical applications. We also identify shortcomings of current techniques and discuss avenues of future research for EHR-based deep learning.
Deep Face Recognition: A Survey Driven by graphics processing units (GPUs), massive amounts of annotated data and more advanced algorithms, deep learning has recently taken the computer vision community by storm and has benefited real-world applications, including face recognition (FR). Deep FR methods leverage deep networks to learn more discriminative representations, significantly improving the state of the art and surpassing human performance (97.53%). In this paper, we provide a comprehensive survey of deep FR methods, including data, algorithms and scenes. First, we summarize the commonly used datasets for training and testing. Then, the data preprocessing methods are categorized into two classes: ‘one-to-many augmentation’ and ‘many-to-one normalization’. Second, for algorithms, we summarize the different network architectures and loss functions used in the state-of-the-art methods. Third, we review several scenes in deep FR, such as video FR, 3D FR and cross-age FR. Finally, some potential deficiencies of the current methods and several future directions are highlighted.
Deep Facial Expression Recognition: A Survey With the transition of facial expression recognition (FER) from laboratory-controlled to challenging in-the-wild conditions and the recent success of deep learning techniques in various fields, deep neural networks have increasingly been leveraged to learn discriminative representations for automatic FER. Recent deep FER systems generally focus on two important issues: overfitting caused by a lack of sufficient training data and expression-unrelated variations, such as illumination, head pose and identity bias. In this paper, we provide a comprehensive survey on deep FER, including datasets and algorithms that provide insights into these intrinsic problems. First, we describe the standard pipeline of a deep FER system with the related background knowledge and suggestions of applicable implementations for each stage. We then introduce the available datasets that are widely used in the literature and provide accepted data selection and evaluation principles for these datasets. For the state of the art in deep FER, we review existing novel deep neural networks and related training strategies that are designed for FER based on both static images and dynamic image sequences, and discuss their advantages and limitations. Competitive performances on widely used benchmarks are also summarized in this section. We then extend our survey to additional related issues and application scenarios. Finally, we review the remaining challenges and corresponding opportunities in this field as well as future directions for the design of robust deep FER systems.
Deep Generative Models with Learnable Knowledge Constraints The broad set of deep generative models (DGMs) has achieved remarkable advances. However, it is often difficult to incorporate rich structured domain knowledge with the end-to-end DGMs. Posterior regularization (PR) offers a principled framework to impose structured constraints on probabilistic models, but has limited applicability to the diverse DGMs that can lack a Bayesian formulation or even explicit density evaluation. PR also requires constraints to be fully specified a priori, which is impractical or suboptimal for complex knowledge with learnable uncertain parts. In this paper, we establish a mathematical correspondence between PR and reinforcement learning (RL), and, based on the connection, expand PR to learn constraints as the extrinsic reward in RL. The resulting algorithm is model-agnostic, applying to any DGM, and is flexible enough to adapt arbitrary constraints jointly with the model. Experiments on human image generation and templated sentence generation show that models with knowledge constraints learned by our algorithm greatly improve over the base generative models.
Deep Learning Deep learning (DL) is a high-dimensional data reduction technique for constructing high-dimensional predictors in input-output models. DL is a form of machine learning that uses hierarchical layers of latent features. In this article, we review the state of the art of deep learning from a modeling and algorithmic perspective. We provide a list of successful areas of application in Artificial Intelligence (AI), Image Processing, Robotics and Automation. Deep learning is predictive in its nature rather than inferential and can be viewed as a black-box methodology for high-dimensional function estimation.
Deep Learning (Slide Deck)
Deep Learning and Quantum Physics: A Fundamental Bridge Deep convolutional networks have witnessed unprecedented success in various machine learning applications. Formal understanding of what makes these networks so successful is gradually unfolding, but for the most part there are still significant mysteries to unravel. The inductive bias, which reflects prior knowledge embedded in the network architecture, is one of them. In this work, we establish a fundamental connection between the fields of quantum physics and deep learning. We use this connection for asserting novel theoretical observations regarding the role that the number of channels in each layer of the convolutional network fulfills in the overall inductive bias. Specifically, we show an equivalence between the function realized by a deep convolutional arithmetic circuit (ConvAC) and a quantum many-body wave function, which relies on their common underlying tensorial structure. This facilitates the use of quantum entanglement measures as well-defined quantifiers of a deep network’s expressive ability to model intricate correlation structures of its inputs. Most importantly, the construction of a deep ConvAC in terms of a Tensor Network is made available. This description enables us to carry out a graph-theoretic analysis of a convolutional network, with which we demonstrate a direct control over the inductive bias of the deep network via its channel numbers, which are related to the min-cut in the underlying graph. This result is relevant to any practitioner designing a convolutional network for a specific task. We theoretically analyze ConvACs, and empirically validate our findings on more common ConvNets which involve ReLU activations and max pooling. Beyond the results described above, the description of a deep convolutional network in well-defined graph-theoretic tools and the formal connection to quantum entanglement are two interdisciplinary bridges that are brought forth by this work.
Deep learning applications and challenges in big data analytics Big Data Analytics and Deep Learning are two high-focus areas of data science. Big Data has become important as many organizations, both public and private, have been collecting massive amounts of domain-specific information, which can contain useful information about problems such as national intelligence, cyber security, fraud detection, marketing, and medical informatics. Companies such as Google and Microsoft are analyzing large volumes of data for business analysis and decisions, impacting existing and future technology. Deep Learning algorithms extract high-level, complex abstractions as data representations through a hierarchical learning process. Complex abstractions are learnt at a given level based on relatively simpler abstractions formulated in the preceding level in the hierarchy. A key benefit of Deep Learning is the analysis and learning of massive amounts of unsupervised data, making it a valuable tool for Big Data Analytics where raw data is largely unlabeled and un-categorized. In the present study, we explore how Deep Learning can be utilized for addressing some important problems in Big Data Analytics, including extracting complex patterns from massive volumes of data, semantic indexing, data tagging, fast information retrieval, and simplifying discriminative tasks. We also investigate some aspects of Deep Learning research that need further exploration to incorporate specific challenges introduced by Big Data Analytics, including streaming data, high-dimensional data, scalability of models, and distributed computing. We conclude by presenting insights into relevant future works by posing some questions, including defining data sampling criteria, domain adaptation modeling, defining criteria for obtaining useful data abstractions, improving semantic indexing, semi-supervised learning, and active learning.
Deep Learning applied to NLP Convolutional Neural Networks (CNNs) are typically associated with Computer Vision. CNNs are responsible for major breakthroughs in Image Classification and are the core of most Computer Vision systems today. More recently, CNNs have been applied to problems in Natural Language Processing and have produced some interesting results. In this paper, we will try to explain the basics of CNNs, their different variations and how they have been applied to NLP.
Deep Learning based Recommender System: A Survey and New Perspectives With the ever-growing volume, complexity and dynamicity of online information, recommender systems are an effective key solution to overcome such information overload. In recent years, deep learning’s revolutionary advances in speech recognition, image analysis and natural language processing have drawn significant attention. Meanwhile, recent studies also demonstrate its effectiveness in coping with information retrieval and recommendation tasks. Applying deep learning techniques to recommender systems has been gaining momentum due to its state-of-the-art performance and high-quality recommendations. In contrast to traditional recommendation models, deep learning provides a better understanding of users’ demands, items’ characteristics and the historical interactions between them. This article provides a comprehensive review of recent research efforts on deep learning based recommender systems towards fostering innovation in recommender system research. A taxonomy of deep learning based recommendation models is presented and used to categorise the surveyed articles. Open problems are identified based on an insightful analysis of the reviewed works, and potential solutions are discussed.
Deep Learning for 2D and 3D Rotatable Data: An Overview of Methods One of the reasons for the success of convolutional networks is their equivariance/invariance under translations. However, rotatable data such as molecules, living cells, everyday objects, or galaxies require processing with equivariance/invariance under rotations in cases where the rotation of the coordinate system does not affect the meaning of the data (e.g. object classification). On the other hand, estimation/processing of rotations is necessary in cases where rotations are important (e.g. motion estimation). There has been recent progress in methods and theory in all these regards. Here we provide an overview of existing methods, both for 2D and 3D rotations (and translations), and identify commonalities and links between them, in the hope that our insights will be useful for choosing and perfecting the methods.
Deep Learning for Anomaly Detection: A Survey Anomaly detection is an important problem that has been well-studied within diverse research areas and application domains. The aim of this survey is two-fold: firstly, we present a structured and comprehensive overview of research methods in deep learning-based anomaly detection; secondly, we review the adoption of these methods for anomaly detection across various application domains and assess their effectiveness. We have grouped state-of-the-art research techniques into different categories based on the underlying assumptions and approach adopted. Within each category we outline the basic anomaly detection technique, along with its variants, and present the key assumptions used to differentiate between normal and anomalous behavior. For each category, we also present the advantages and limitations and discuss the computational complexity of the techniques in real application domains. Finally, we outline open issues in research and challenges faced while adopting these techniques.
Deep Learning for Case-based Reasoning through Prototypes: A Neural Network that Explains its Predictions Deep neural networks are widely used for classification. These deep models often suffer from a lack of interpretability — they are particularly difficult to understand because of their non-linear nature. As a result, neural networks are often treated as ‘black box’ models, and in the past, have been trained purely to optimize the accuracy of predictions. In this work, we create a novel network architecture for deep learning that naturally explains its own reasoning for each prediction. This architecture contains an autoencoder and a special prototype layer, where each unit of that layer stores a weight vector that resembles an encoded training input. The encoder of the autoencoder allows us to do comparisons within the latent space, while the decoder allows us to visualize the learned prototypes. The training objective has four terms: an accuracy term, a term that encourages every prototype to be similar to at least one encoded input, a term that encourages every encoded input to be close to at least one prototype, and a term that encourages faithful reconstruction by the autoencoder. The distances computed in the prototype layer are used as part of the classification process. Since the prototypes are learned during training, the learned network naturally comes with explanations for each prediction, and the explanations are faithful to what the network actually computes.
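The four-term training objective described in this entry can be sketched directly. Below is a minimal, illustrative PyTorch version; the tensor names, weighting coefficients (lam_*) and the use of squared Euclidean distances in the prototype terms are assumptions made for the sketch, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def prototype_loss(logits, labels, x, x_recon, z, prototypes,
                   lam_r=1.0, lam_p1=1.0, lam_p2=1.0):
    """Sketch of a four-term objective: accuracy + reconstruction +
    prototype-to-data and data-to-prototype closeness terms."""
    # Squared distances between encoded inputs z (N, d) and prototypes (M, d).
    d = torch.cdist(z, prototypes) ** 2             # (N, M)
    ce = F.cross_entropy(logits, labels)            # accuracy term
    recon = F.mse_loss(x_recon, x)                  # faithful reconstruction by the autoencoder
    proto_to_data = d.min(dim=0).values.mean()      # every prototype near some encoded input
    data_to_proto = d.min(dim=1).values.mean()      # every encoded input near some prototype
    return ce + lam_r * recon + lam_p1 * proto_to_data + lam_p2 * data_to_proto
```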
Deep Learning For Computer Vision Tasks: A review Deep learning has recently become one of the most popular sub-fields of machine learning owing to its distributed data representation with multiple levels of abstraction. A diverse range of deep learning algorithms are being employed to solve conventional artificial intelligence problems. This paper gives an overview of some of the most widely used deep learning algorithms applied in the field of computer vision. It first inspects the various approaches of deep learning algorithms, followed by a description of their applications in image classification, object identification, image extraction and semantic segmentation in the presence of noise. The paper concludes with the discussion of the future scope and challenges for construction and training of deep neural networks.
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments Eliminating the negative effect of highly non-stationary environmental noise is a long-standing research topic for speech recognition but remains an important challenge nowadays. To address this issue, traditional unsupervised signal processing methods seem to have touched the ceiling. However, data-driven supervised approaches, particularly the ones designed with deep learning, have recently emerged as potential alternatives. In this light, we comprehensively summarise the recently developed and most representative deep learning approaches to this problem in this article, with the aim of providing guidelines for those going deeply into the field of environmentally robust speech recognition. To better introduce these approaches, we categorise them into single- and multi-channel techniques, each of which is specifically described for the front-end, the back-end, and the joint framework of speech recognition systems. Meanwhile, we describe the pros and cons of these approaches as well as the relationships among them, which can benefit future research.
Deep Learning for Fine-Grained Image Analysis: A Survey Computer vision (CV) is the process of using machines to understand and analyze imagery, which is an integral branch of artificial intelligence. Among the various research areas of CV, fine-grained image analysis (FGIA) is a longstanding and fundamental problem, and has become ubiquitous in diverse real-world applications. The task of FGIA targets analyzing visual objects from subordinate categories, e.g., species of birds or models of cars. The small inter-class variations and the large intra-class variations caused by the fine-grained nature make it a challenging problem. With the boom of deep learning, recent years have witnessed remarkable progress in FGIA using deep learning techniques. In this paper, we aim to give a survey of recent advances in deep learning based FGIA techniques in a systematic way. Specifically, we organize the existing studies of FGIA techniques into three major categories: fine-grained image recognition, fine-grained image retrieval and fine-grained image generation. In addition, we also cover some other important issues of FGIA, such as publicly available benchmark datasets and related domain-specific applications. Finally, we conclude this survey by highlighting several directions and open problems which need to be further explored by the community in the future.
Deep Learning for Generic Object Detection: A Survey Generic object detection, aiming at locating object instances from a large number of predefined categories in natural images, is one of the most fundamental and challenging problems in computer vision. Deep learning techniques have emerged in recent years as powerful methods for learning feature representations directly from data, and have led to remarkable breakthroughs in the field of generic object detection. Given this time of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought by deep learning techniques. More than 250 key contributions are included in this survey, covering many aspects of generic object detection research: leading detection frameworks and fundamental subproblems including object feature representation, object proposal generation, context information modeling and training strategies; evaluation issues, specifically benchmark datasets, evaluation metrics, and state of the art performance. We finish by identifying promising directions for future research.
Deep Learning for Genomics: A Concise Overview Advancements in genomic research such as high-throughput sequencing techniques have driven modern genomic studies into ‘big data’ disciplines. This data explosion is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in a variety of fields such as vision, speech, and text processing. Yet genomics entails unique challenges to deep learning since we are expecting from deep learning a superhuman intelligence that explores beyond our knowledge to interpret the genome. A powerful deep learning model should rely on insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with a proper deep architecture, and remark on practical considerations of developing modern deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research, as well as pointing out potential opportunities and obstacles for future genomics applications.
Deep Learning for Hyperspectral Image Classification: An Overview Hyperspectral image (HSI) classification has become a hot topic in the field of remote sensing. In general, the complex characteristics of hyperspectral data make the accurate classification of such data challenging for traditional machine learning methods. In addition, hyperspectral imaging often deals with an inherently nonlinear relation between the captured spectral information and the corresponding materials. In recent years, deep learning has been recognized as a powerful feature-extraction tool to effectively address nonlinear problems and has been widely used in a number of image processing tasks. Motivated by those successful applications, deep learning has also been introduced to classify HSIs and has demonstrated good performance. This survey paper presents a systematic review of the deep learning-based HSI classification literature and compares several strategies for this topic. Specifically, we first summarize the main challenges of HSI classification which cannot be effectively overcome by traditional machine learning methods, and also introduce the advantages of deep learning in handling these problems. Then, we build a framework which divides the corresponding works into spectral-feature networks, spatial-feature networks, and spectral-spatial-feature networks to systematically review the recent achievements in deep learning-based HSI classification. In addition, considering the fact that available training samples in the remote sensing field are usually very limited and training deep networks requires a large number of samples, we include some strategies to improve classification performance, which can provide some guidelines for future studies on this topic. Finally, several representative deep learning-based classification methods are evaluated on real HSIs in our experiments.
Deep Learning for Image Denoising: A Survey Since the advent of big data analysis and the Graphics Processing Unit (GPU), deep learning technology has received a great deal of attention and has been widely applied in the field of image processing. In this paper, we aim to comprehensively review and summarize the deep learning technologies for image denoising proposed in recent years. Moreover, we systematically analyze the conventional machine learning methods for image denoising. Finally, we point out some research directions for deep learning technologies in image denoising.
Deep Learning for Image Super-resolution: A Survey Image Super-Resolution (SR) is an important class of image processing techniques to enhance the resolution of images and videos in computer vision. Recent years have witnessed remarkable progress in image super-resolution using deep learning techniques. In this survey, we review recent advances in image super-resolution techniques based on deep learning in a systematic way. In general, we can roughly group the existing studies of SR techniques into three major categories: supervised SR, unsupervised SR, and domain-specific SR. In addition, we also cover some other important issues, such as publicly available benchmark datasets and performance evaluation metrics. Finally, we conclude this survey by highlighting several future directions and open issues which should be further addressed by the community in the future.
Deep Learning for Sensor-based Activity Recognition: A Survey Sensor-based activity recognition seeks high-level knowledge about human activity from multitudes of low-level sensor readings. Conventional pattern recognition approaches have made tremendous progress in the past years. However, most of those approaches heavily rely on heuristic hand-crafted feature extraction methods, which dramatically hinder their generalization performance. Additionally, those methods often produce unsatisfactory results for unsupervised and incremental learning tasks. Meanwhile, the recent advancement of deep learning makes it possible to perform automatic high-level feature extraction and thus achieve promising performance in many areas. Since then, deep learning based methods have been widely adopted for sensor-based activity recognition tasks. In this paper, we survey and highlight the recent advancement of deep learning approaches for sensor-based activity recognition. Specifically, we summarize the existing literature from three aspects: sensor modality, deep model and application. We also present a detailed discussion and propose grand challenges for future directions.
Deep Learning for Sentiment Analysis : A Survey Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. Along with the success of deep learning in many other application domains, deep learning is also popularly used in sentiment analysis in recent years. This paper first gives an overview of deep learning and then provides a comprehensive survey of its current applications in sentiment analysis.
Deep Learning for Single Image Super-Resolution: A Brief Review Single image super-resolution (SISR) is a notoriously challenging ill-posed problem, which aims to obtain a high-resolution (HR) output from one of its low-resolution (LR) versions. To solve the SISR problem, recently powerful deep learning algorithms have been employed and achieved the state-of-the-art performance. In this survey, we review representative deep learning-based SISR methods, and group them into two categories according to their major contributions to two essential aspects of SISR: the exploration of efficient neural network architectures for SISR, and the development of effective optimization objectives for deep SISR learning. For each category, a baseline is firstly established and several critical limitations of the baseline are summarized. Then representative works on overcoming these limitations are presented based on their original contents as well as our critical understandings and analyses, and relevant comparisons are conducted from a variety of perspectives. Finally, we conclude this review with some vital current challenges and future trends in SISR leveraging deep learning algorithms.
Deep Learning for Spatio-Temporal Data Mining: A Survey With the fast development of various positioning techniques such as the Global Positioning System (GPS), mobile devices and remote sensing, spatio-temporal data has become increasingly available nowadays. Mining valuable knowledge from spatio-temporal data is critically important to many real world applications including human mobility understanding, smart transportation, urban planning, public safety, health care and environmental management. As the number, volume and resolution of spatio-temporal datasets increase rapidly, traditional data mining methods, especially statistics based methods for dealing with such data, are becoming overwhelmed. Recently, with the advances of deep learning techniques, deep learning models such as the convolutional neural network (CNN) and recurrent neural network (RNN) have enjoyed considerable success in various machine learning tasks due to their powerful hierarchical feature learning ability in both spatial and temporal domains, and have been widely applied in various spatio-temporal data mining (STDM) tasks such as predictive learning, representation learning, anomaly detection and classification. In this paper, we provide a comprehensive survey on recent progress in applying deep learning techniques for STDM. We first categorize the types of spatio-temporal data and briefly introduce the popular deep learning models that are used in STDM. Then a framework is introduced to show a general pipeline of the utilization of deep learning models for STDM. Next we classify the existing literature based on the types of ST data, the data mining tasks, and the deep learning models, followed by the applications of deep learning for STDM in different domains including transportation, climate science, human mobility, location based social networks, crime analysis, and neuroscience. Finally, we discuss the limitations of current research and point out future research directions.
Deep learning for time series classification: a review Time Series Classification (TSC) is an important and challenging problem in data mining. With the increase of time series data availability, hundreds of TSC algorithms have been proposed. Among these methods, only a few have considered Deep Neural Networks (DNNs) to perform this task. This is surprising as deep learning has seen very successful applications in the last years. DNNs have indeed revolutionized the field of computer vision especially with the advent of novel deeper architectures such as Residual and Convolutional Neural Networks. Apart from images, sequential data such as text and audio can also be processed with DNNs to reach state of the art performance for document classification and speech recognition. In this article, we study the current state of the art performance of deep learning algorithms for TSC by presenting an empirical study of the most recent DNN architectures for TSC. We give an overview of the most successful deep learning applications in various time series domains under a unified taxonomy of DNNs for TSC. We also provide an open source deep learning framework to the TSC community where we implemented each of the compared approaches and evaluated them on a univariate TSC benchmark (the UCR archive) and 12 multivariate time series datasets. By training 8,730 deep learning models on 97 time series datasets, we propose the most exhaustive study of DNNs for TSC to date.
Deep learning in agriculture: A survey Deep learning constitutes a recent, modern technique for image processing and data analysis, with promising results and large potential. As deep learning has been successfully applied in various domains, it has recently entered also the domain of agriculture. In this paper, we perform a survey of 40 research efforts that employ deep learning techniques, applied to various agricultural and food production challenges. We examine the particular agricultural problems under study, the specific models and frameworks employed, the sources, nature and pre-processing of data used, and the overall performance achieved according to the metrics used at each work under study. Moreover, we study comparisons of deep learning with other existing popular techniques, in respect to differences in classification or regression performance. Our findings indicate that deep learning provides high accuracy, outperforming existing commonly used image processing techniques.
Deep learning in bioinformatics: introduction, application, and perspective in big data era Deep learning, which is especially formidable in handling big data, has achieved great success in various fields, including bioinformatics. With the advances of the big data era in biology, it is foreseeable that deep learning will become increasingly important in the field and will be incorporated in the vast majority of analysis pipelines. In this review, we provide both an exoteric introduction to deep learning, and concrete examples and implementations of its representative applications in bioinformatics. We start from the recent achievements of deep learning in the bioinformatics field, pointing out the problems which are suitable for deep learning. After that, we introduce deep learning in an easy-to-understand fashion, from shallow neural networks to the classic convolutional and recurrent neural networks, graph neural networks, generative adversarial networks, variational autoencoders, and the most recent state-of-the-art architectures. After that, we provide eight examples, covering five bioinformatics research directions and all four kinds of data types, with the implementations written in Tensorflow and Keras. Finally, we discuss the common issues, such as overfitting and interpretability, that users will encounter when adopting deep learning methods and provide corresponding suggestions. The implementations are freely available at https://…/Deep_learning_examples.
Deep Learning in Mobile and Wireless Networking: A Survey The rapid uptake of mobile devices and the rising popularity of mobile applications and services pose unprecedented demands on mobile and wireless networking infrastructure. Upcoming 5G systems are evolving to support exploding mobile traffic volumes, agile management of network resources to maximize user experience, and extraction of fine-grained real-time analytics. Fulfilling these tasks is challenging, as mobile environments are increasingly complex, heterogeneous, and evolving. One potential solution is to resort to advanced machine learning techniques to help manage the rise in data volumes and algorithm-driven applications. The recent success of deep learning underpins new and powerful tools that tackle problems in this space. In this paper we bridge the gap between deep learning and mobile and wireless networking research, by presenting a comprehensive survey of the crossovers between the two areas. We first briefly introduce essential background and state-of-the-art in deep learning techniques with potential applications to networking. We then discuss several techniques and platforms that facilitate the efficient deployment of deep learning onto mobile systems. Subsequently, we provide an encyclopedic review of mobile and wireless networking research based on deep learning, which we categorize by different domains. Drawing from our experience, we discuss how to tailor deep learning to mobile environments. We complete this survey by pinpointing current challenges and open future directions for research.
Deep learning in remote sensing: a review Standing at the paradigm shift towards data-intensive science, machine learning techniques are becoming increasingly important. In particular, as a major breakthrough in the field, deep learning has proven to be an extremely powerful tool in many fields. Shall we embrace deep learning as the key to all? Or should we resist a ‘black-box’ solution? There are controversial opinions in the remote sensing community. In this article, we analyze the challenges of using deep learning for remote sensing data analysis, review the recent advances, and provide resources to make deep learning in remote sensing ridiculously simple to start with. More importantly, we advocate that remote sensing scientists bring their expertise into deep learning, and use it as an implicit general model to tackle unprecedented large-scale influential challenges, such as climate change and urbanization.
Deep Learning is Robust to Massive Label Noise Deep neural networks trained on large supervised datasets have led to impressive results in recent years. However, since well-annotated datasets can be prohibitively expensive and time-consuming to collect, recent work has explored the use of larger but noisy datasets that can be more easily obtained. In this paper, we investigate the behavior of deep neural networks on training sets with massively noisy labels. We show that successful learning is possible even with an essentially arbitrary amount of noise. For example, on MNIST we find that accuracy of above 90 percent is still attainable even when the dataset has been diluted with 100 noisy examples for each clean example. Such behavior holds across multiple patterns of label noise, even when noisy labels are biased towards confusing classes. Further, we show how the required dataset size for successful training increases with higher label noise. Finally, we present simple actionable techniques for improving learning in the regime of high label noise.
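The dilution experiment described above is easy to emulate. The numpy sketch below is one assumed way to build such a training set: each clean example is accompanied by a fixed number of examples carrying uniformly random labels (the paper also studies structured and biased noise patterns, which this sketch does not cover).

```python
import numpy as np

def dilute_with_noise(X, y, n_classes, noisy_per_clean=100, seed=0):
    """Return a training set in which each clean example is accompanied by
    `noisy_per_clean` examples carrying uniformly random labels."""
    rng = np.random.default_rng(seed)
    X_noisy = np.repeat(X, noisy_per_clean, axis=0)          # reuse inputs as noisy carriers
    y_noisy = rng.integers(0, n_classes, size=len(X_noisy))  # uniformly random labels
    X_out = np.concatenate([X, X_noisy])
    y_out = np.concatenate([y, y_noisy])
    perm = rng.permutation(len(y_out))                        # shuffle clean and noisy together
    return X_out[perm], y_out[perm]
```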
Deep learning methods in speaker recognition: a review This paper summarizes applied deep learning practices in the field of speaker recognition, covering both verification and identification. Speaker recognition has been a widely studied topic in speech technology. Many research works have been carried out, yet relatively little progress was achieved in the past 5-6 years. However, as deep learning techniques advance in most machine learning fields, the former state-of-the-art methods are being replaced by them in speaker recognition too. DL has now become the state-of-the-art solution for both speaker verification and identification. The standard x-vectors, in addition to i-vectors, are used as baselines in most of the novel works. The increasing amount of gathered data opens up the territory to DL, where it is the most effective.
Deep Learning on Graphs: A Survey Deep learning has been shown successful in a number of domains, ranging from acoustics and images to natural language processing. However, applying deep learning to the ubiquitous graph data is non-trivial because of the unique characteristics of graphs. Recently, a significant amount of research effort has been devoted to this area, greatly advancing graph analysis techniques. In this survey, we comprehensively review the different kinds of deep learning methods applied to graphs. We divide existing methods into three main categories: semi-supervised methods including Graph Neural Networks and Graph Convolutional Networks, unsupervised methods including Graph Autoencoders, and recent advancements including Graph Recurrent Neural Networks and Graph Reinforcement Learning. We then provide a comprehensive overview of these methods in a systematic manner following their history of development. We also analyze the differences between these methods and how to compose different architectures. Finally, we briefly outline their applications and discuss potential future directions.
Deep learning research landscape & roadmap in a nutshell: past, present and future — Towards deep cortical learning The past, present and future of deep learning are presented in this work. Given this landscape & roadmap, we predict that deep cortical learning will be the convergence of deep learning & cortical learning, ultimately building an artificial cortical column.
Deep Learning Techniques for Music Generation – A Survey This book is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. At first, we propose a methodology based on four dimensions for our analysis: – objective – What musical content is to be generated (e.g., melody, accompaniment…); – representation – What are the information formats used for the corpus and for the expected generated output (e.g., MIDI, piano roll, text…); – architecture – What type of deep neural network is to be used (e.g., recurrent network, autoencoder, generative adversarial networks…); – strategy – How to model and control the process of generation (e.g., direct feedforward, sampling, unit selection…). For each dimension, we conduct a comparative analysis of various models and techniques. For the strategy dimension, we propose some tentative typology of possible approaches and mechanisms. This classification is bottom-up, based on the analysis of many existing deep-learning based systems for music generation, which are described in this book. The last part of the book includes discussion and prospects.
Deep Learning Works in Practice. But Does it Work in Theory? Deep learning relies on a very specific kind of neural networks: those superposing several neural layers. In the last few years, deep learning achieved major breakthroughs in many tasks such as image analysis, speech recognition, natural language processing, and so on. Yet, there is no theoretical explanation of this success. In particular, it is not clear why the deeper the network, the better it actually performs. We argue that the explanation is intimately connected to a key feature of the data collected from our surrounding universe to feed the machine learning algorithms: large non-parallelizable logical depth. Roughly speaking, we conjecture that the shortest computational descriptions of the universe are algorithms with inherently large computation times, even when a large number of computers are available for parallelization. Interestingly, this conjecture, combined with the folklore conjecture in theoretical computer science that $P \neq NC$, explains the success of deep learning.
Deep Learning: A Bayesian Perspective Deep learning is a form of machine learning for nonlinear high dimensional data reduction and prediction. A Bayesian probabilistic perspective provides a number of advantages: specifically, statistical interpretation and properties, more efficient algorithms for optimisation and hyper-parameter tuning, and an explanation of predictive performance. Traditional high-dimensional statistical techniques such as principal component analysis (PCA), partial least squares (PLS), reduced rank regression (RRR) and projection pursuit regression (PPR) are shown to be shallow learners. Their deep learning counterparts exploit multiple layers of data reduction, which leads to performance gains. Stochastic gradient descent (SGD) training and optimisation together with Dropout (DO) provide model and variable selection. Bayesian regularization is central to finding networks and provides a framework for an optimal bias-variance trade-off to achieve good out-of-sample performance. Constructing good Bayesian predictors in high dimensions is discussed. To illustrate our methodology, we provide an analysis of first time international bookings on Airbnb. Finally, we conclude with directions for future research.
Deep Learning: A Critical Appraisal Although deep learning has historical roots going back decades, neither the term ‘deep learning’ nor the approach was popular just over five years ago, when the field was reignited by papers such as Krizhevsky, Sutskever and Hinton’s now classic (2012) deep network model of Imagenet. What has the field discovered in the five subsequent years? Against a background of considerable progress in areas such as speech recognition, image recognition, and game playing, and considerable enthusiasm in the popular press, I present ten concerns for deep learning, and suggest that deep learning must be supplemented by other techniques if we are to reach artificial general intelligence.
Deep Learning: An Introduction for Applied Mathematicians Multilayered artificial neural networks are becoming a pervasive tool in a host of application fields. At the heart of this deep learning revolution are familiar concepts from applied and computational mathematics; notably, in calculus, approximation theory, optimization and linear algebra. This article provides a very brief introduction to the basic ideas that underlie deep learning from an applied mathematics perspective. Our target audience includes postgraduate and final year undergraduate students in mathematics who are keen to learn about the area. The article may also be useful for instructors in mathematics who wish to enliven their classes with references to the application of deep learning techniques. We focus on three fundamental questions: what is a deep neural network? how is a network trained? what is the stochastic gradient method? We illustrate the ideas with a short MATLAB code that sets up and trains a network. We also show the use of state-of-the-art software on a large scale image classification problem. We finish with references to the current literature.
Deep Learning: Generalization Requires Deep Compositional Feature Space Design Generalization error defines the discriminability and the representation power of a deep model. In this work, we claim that feature space design using deep compositional functions plays a significant role in generalization, along with explicit and implicit regularizations. Our claims are established with several image classification experiments. We show that the information loss due to convolution and max pooling can be marginalized with the compositional design, improving generalization performance. We also show that learning rate decay acts as an implicit regularizer in deep model training.
Deep Learning: Past, Present and Future (Slide Deck)
Deep learning: Technical introduction This note presents in a technical though hopefully pedagogical way the three most common forms of neural network architectures: Feedforward, Convolutional and Recurrent. For each network, their fundamental building blocks are detailed. The forward pass and the update rules for the backpropagation algorithm are then derived in full.
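For a note of this kind, the forward pass and the backpropagation update can be condensed into a few lines. The numpy sketch below uses a one-hidden-layer network with a sigmoid hidden layer and squared loss; the notation and loss choice are illustrative assumptions rather than the note's own derivation.

```python
import numpy as np

def forward_backward(x, y, W1, b1, W2, b2, lr=0.1):
    """One forward pass and one backpropagation update for a one-hidden-layer
    network (sigmoid hidden layer, linear output, squared-error loss)."""
    # forward pass
    a1 = x @ W1 + b1
    h = 1.0 / (1.0 + np.exp(-a1))          # sigmoid hidden activation
    y_hat = h @ W2 + b2                     # linear output layer
    # backward pass for loss 0.5 * ||y_hat - y||^2
    delta2 = y_hat - y                      # output-layer error signal
    dW2, db2 = np.outer(h, delta2), delta2
    delta1 = (delta2 @ W2.T) * h * (1 - h)  # error propagated through the sigmoid
    dW1, db1 = np.outer(x, delta1), delta1
    # gradient-descent parameter update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    return W1, b1, W2, b2
```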
Deep Learning-based Sequential Recommender Systems: Concepts, Algorithms, and Evaluations In the field of sequential recommendation, deep learning methods have received a lot of attention in the past few years and surpassed traditional models such as Markov chain-based and factorization-based ones. However, DL-based methods also have some critical drawbacks, such as insufficient modeling of user representations and failing to distinguish the different types of interactions (i.e., user behaviors) among users and items. With this in view, this survey focuses on DL-based sequential recommender systems, taking the aforementioned issues into consideration. Specifically, we illustrate the concept of sequential recommendation, propose a categorization of existing algorithms in terms of three types of behavioral sequences, summarize the key factors affecting the performance of DL-based models, and conduct corresponding evaluations to demonstrate the effects of these factors. We conclude this survey by systematically outlining future directions and challenges in this field.
Deep Neural Decision Forests We present Deep Neural Decision Forests – a novel approach that unifies classification trees with the representation learning functionality known from deep convolutional networks, by training them in an end-to-end manner. To combine these two worlds, we introduce a stochastic and differentiable decision tree model, which steers the representation learning usually conducted in the initial layers of a (deep) convolutional network. Our model differs from conventional deep networks because a decision forest provides the final predictions, and it differs from conventional decision forests since we propose a principled, joint and global optimization of split and leaf node parameters. We show experimental results on benchmark machine learning datasets like MNIST and ImageNet and find on-par or superior results when compared to state-of-the-art deep models. Most remarkably, we obtain top-5 errors of only 7.84% / 6.38% on ImageNet validation data when integrating our forests in a single-crop, single/seven-model GoogLeNet architecture, respectively. Thus, even without any form of training data set augmentation we improve on the 6.67% error obtained by the best GoogLeNet architecture (7 models, 144 crops).
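The stochastic, differentiable decision tree at the core of this approach routes each sample left or right at every inner node with a probability given by a sigmoid decision function. The PyTorch sketch below illustrates only that routing step, assuming a complete binary tree whose inner-node decisions d_i(x) have already been computed by the network; the leaf class distributions and tree shape are illustrative.

```python
import torch

def leaf_probabilities(decisions):
    """Routing probabilities mu_l(x) for a complete binary soft decision tree.
    `decisions` holds the sigmoid outputs d_i(x) of the inner nodes in
    breadth-first order, shape (N, n_inner); returns (N, n_inner + 1)."""
    n, n_inner = decisions.shape
    reach = [torch.ones(n)]                     # probability of reaching each node; root = 1
    for i in range(n_inner):
        d = decisions[:, i]
        reach.append(reach[i] * d)              # left child, reached with probability d_i(x)
        reach.append(reach[i] * (1 - d))        # right child, reached with probability 1 - d_i(x)
    return torch.stack(reach[n_inner:], dim=1)  # the last n_inner + 1 nodes are the leaves

# A single tree's prediction then mixes leaf class distributions pi (n_leaves, n_classes):
# p(y | x) = leaf_probabilities(decisions) @ torch.softmax(pi, dim=1)
```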
Deep Neural Network Approximation Theory Deep neural networks have become state-of-the-art technology for a wide range of practical machine learning tasks such as image classification, handwritten digit recognition, speech recognition, or game intelligence. This paper develops the fundamental limits of learning in deep neural networks by characterizing what is possible if no constraints on the learning algorithm and the amount of training data are imposed. Concretely, we consider information-theoretically optimal approximation through deep neural networks with the guiding theme being a relation between the complexity of the function (class) to be approximated and the complexity of the approximating network in terms of connectivity and memory requirements for storing the network topology and the associated quantized weights. The theory we develop educes remarkable universality properties of deep networks. Specifically, deep networks are optimal approximants for vastly different function classes such as affine systems and Gabor systems. This universality is afforded by a concurrent invariance property of deep networks to time-shifts, scalings, and frequency-shifts. In addition, deep networks provide exponential approximation accuracy i.e., the approximation error decays exponentially in the number of non-zero weights in the network of vastly different functions such as the squaring operation, multiplication, polynomials, sinusoidal functions, general smooth functions, and even one-dimensional oscillatory textures and fractal functions such as the Weierstrass function, both of which do not have any known methods achieving exponential approximation accuracy. In summary, deep neural networks provide information-theoretically optimal approximation of a very wide range of functions and function classes used in mathematical signal processing.
Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images Deep neural networks (DNNs) have recently been achieving state-of-the-art performance on a variety of pattern-recognition tasks, most notably visual classification problems. Given that DNNs are now able to classify objects in images with near-human-level performance, questions naturally arise as to what differences remain between computer and human vision. A recent study revealed that changing an image (e.g. of a lion) in a way imperceptible to humans can cause a DNN to label the image as something else entirely (e.g. mislabeling a lion as a library). Here we show a related result: it is easy to produce images that are completely unrecognizable to humans, but that state-of-the-art DNNs believe to be recognizable objects with 99.99% confidence (e.g. labeling with certainty that white noise static is a lion). Specifically, we take convolutional neural networks trained to perform well on either the ImageNet or MNIST datasets and then find images with evolutionary algorithms or gradient ascent that DNNs label with high confidence as belonging to each dataset class. It is possible to produce images totally unrecognizable to human eyes that DNNs believe with near certainty are familiar objects. Our results shed light on interesting differences between human vision and current DNNs, and raise questions about the generality of DNN computer vision.
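The gradient-ascent variant of the fooling-image construction is straightforward to sketch: starting from noise, repeatedly nudge the input in the direction that increases the score of a chosen class. The PyTorch code below is a generic illustration for any differentiable classifier; the model, step size and iteration count are placeholders, and the evolutionary-algorithm variant used in the paper is not shown.

```python
import torch

def fooling_image(model, target_class, shape=(1, 3, 224, 224),
                  steps=200, lr=0.5):
    """Gradient ascent on the input pixels to maximize the logit of
    `target_class`. Starting from noise often yields images that are
    unrecognizable to humans yet classified with high confidence."""
    x = torch.rand(shape, requires_grad=True)   # random noise starting point
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        score = model(x)[0, target_class]       # score of the chosen class
        (-score).backward()                     # ascend the class score
        opt.step()
        with torch.no_grad():
            x.clamp_(0.0, 1.0)                  # keep pixels in a valid range
    return x.detach()
```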
Deep Neural Networks as Gaussian Processes A deep fully-connected neural network with an i.i.d. prior over its parameters is equivalent to a Gaussian process (GP) in the limit of infinite network width. This correspondence enables exact Bayesian inference for neural networks on regression tasks by means of straightforward matrix computations. For single hidden-layer networks, the covariance function of this GP has long been known. Recently, kernel functions for multi-layer random neural networks have been developed, but only outside of a Bayesian framework. As such, previous work has not identified the correspondence between using these kernels as the covariance function for a GP and performing fully Bayesian prediction with a deep neural network. In this work, we derive this correspondence and develop a computationally efficient pipeline to compute the covariance functions. We then use the resulting GP to perform Bayesian inference for deep neural networks on MNIST and CIFAR-10. We find that the GP-based predictions are competitive and can outperform neural networks trained with stochastic gradient descent. We observe that the trained neural network accuracy approaches that of the corresponding GP-based computation with increasing layer width, and that the GP uncertainty is strongly correlated with prediction error. We connect our observations to the recent development of signal propagation in random neural networks.
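The GP covariance of an infinitely wide fully-connected network is computed by a simple layer-wise recursion. The numpy sketch below assumes ReLU nonlinearities, for which the layer expectation has the closed-form arc-cosine kernel; the weight and bias variance values are illustrative, not the paper's tuned hyperparameters.

```python
import numpy as np

def nngp_kernel(X, depth, sigma_w2=1.6, sigma_b2=0.1):
    """Compositional GP covariance of an infinitely wide fully-connected ReLU
    network. X: (N, d) inputs; returns the (N, N) covariance after `depth` layers."""
    n, d = X.shape
    K = sigma_b2 + sigma_w2 * (X @ X.T) / d                   # layer-0 covariance
    for _ in range(depth):
        diag = np.sqrt(np.diag(K))
        cos_t = np.clip(K / np.outer(diag, diag), -1.0, 1.0)
        theta = np.arccos(cos_t)
        # E[relu(u) relu(v)] for jointly Gaussian (u, v) with covariance K (arc-cosine kernel)
        ev = (np.outer(diag, diag) / (2 * np.pi)) * (np.sin(theta) + (np.pi - theta) * cos_t)
        K = sigma_b2 + sigma_w2 * ev
    return K
```

With this covariance in hand, Bayesian prediction reduces to standard GP regression (straightforward matrix computations on K), which is the correspondence the entry describes.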
Deep Probabilistic Programming Languages: A Qualitative Study Deep probabilistic programming languages try to combine the advantages of deep learning with those of probabilistic programming languages. If successful, this would be a big step forward in machine learning and programming languages. Unfortunately, as of now, this new crop of languages is hard to use and understand. This paper addresses this problem directly by explaining deep probabilistic programming languages and indirectly by characterizing their current strengths and weaknesses.
Deep Regression Bayesian Network and Its Applications Deep directed generative models have attracted much attention recently due to their generative modeling nature and powerful data representation ability. In this paper, we review different structures of deep directed generative models and the learning and inference algorithms associated with the structures. We focus on a specific structure that consists of layers of Bayesian Networks due to the property of capturing inherent and rich dependencies among latent variables. The major difficulty of learning and inference with deep directed models with many latent variables is the intractable inference due to the dependencies among the latent variables and the exponential number of latent variable configurations. Current solutions use variational methods often through an auxiliary network to approximate the posterior probability inference. In contrast, inference can also be performed directly without using any auxiliary network to maximally preserve the dependencies among the latent variables. Specifically, by exploiting the sparse representation with the latent space, max-max instead of max-sum operation can be used to overcome the exponential number of latent configurations. Furthermore, the max-max operation and augmented coordinate ascent are applied to both supervised and unsupervised learning as well as to various inference. Quantitative evaluations on benchmark datasets of different models are given for both data representation and feature learning tasks.
Deep Reinforcement Learning We discuss deep reinforcement learning in an overview style. We draw a big picture, filled with details. We discuss six core elements, six important mechanisms, and twelve applications, focusing on contemporary work, and in historical contexts. We start with background of artificial intelligence, machine learning, deep learning, and reinforcement learning (RL), with resources. Next we discuss RL core elements, including value function, policy, reward, model, exploration vs. exploitation, and representation. Then we discuss important mechanisms for RL, including attention and memory, unsupervised learning, hierarchical RL, multi-agent RL, relational RL, and learning to learn. After that, we discuss RL applications, including games, robotics, natural language processing (NLP), computer vision, finance, business management, healthcare, education, energy, transportation, computer systems, and, science, engineering, and art. Finally we summarize briefly, discuss challenges and opportunities, and close with an epilogue.
Deep Reinforcement Learning for Clinical Decision Support: A Brief Survey Owing to recent advancements in Artificial Intelligence, especially deep learning, many data-driven decision support systems have been implemented to facilitate medical doctors in delivering personalized care. We focus on deep reinforcement learning (DRL) models in this paper. DRL models have demonstrated human-level or even superior performance in computer vision tasks and game playing, such as Go and Atari games. However, the adoption of deep reinforcement learning techniques in clinical decision optimization is still rare. We present the first survey that summarizes reinforcement learning algorithms with Deep Neural Networks (DNNs) on clinical decision support. We also discuss some case studies, where different DRL algorithms were applied to address various clinical challenges. We further compare and contrast the advantages and limitations of various DRL algorithms and present a preliminary guide on how to choose the appropriate DRL algorithm for particular clinical applications.
Deep Reinforcement Learning for Conversational AI Deep reinforcement learning is revolutionizing the artificial intelligence field. Currently, it serves as a good starting point for constructing intelligent autonomous systems which offer a better knowledge of the visual world. It is possible to scale deep reinforcement learning with the use of deep learning and do amazing tasks such as use of pixels in playing video games. In this paper, key concepts of deep reinforcement learning including reward function, differences between reinforcement learning and supervised learning and models for implementation of reinforcement are discussed. Key challenges related to the implementation of reinforcement learning in conversational AI domain are identified as well as discussed in detail. Various conversational models which are based on deep reinforcement learning (as well as deep learning) are also discussed. In summary, this paper discusses key aspects of deep reinforcement learning which are crucial for designing an efficient conversational AI.
Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications Reinforcement learning (RL) algorithms have been around for decades and have been employed to solve various sequential decision-making problems. These algorithms, however, have faced great challenges when dealing with high-dimensional environments. The recent development of deep learning has enabled RL methods to derive optimal policies for sophisticated and capable agents, which can perform efficiently in these challenging environments. This paper addresses an important aspect of deep RL related to situations that demand multiple agents to communicate and cooperate to solve complex tasks. A survey of different approaches to problems related to multi-agent deep RL (MADRL) is presented, including non-stationarity, partial observability, continuous state and action spaces, multi-agent training schemes, and multi-agent transfer learning. The merits and demerits of the reviewed methods are analyzed and discussed, with their corresponding applications explored. It is envisaged that this review provides insights about various MADRL methods and can lead to the future development of more robust and highly useful multi-agent learning methods for solving real-world problems.
Deep Reinforcement Learning: An Overview In recent years, a specific machine learning method called deep learning has gained huge attention, as it has obtained astonishing results in broad applications such as pattern recognition, speech recognition, computer vision, and natural language processing. Recent research has also shown that deep learning techniques can be combined with reinforcement learning methods to learn useful representations for problems with high dimensional raw data input. This chapter reviews the recent advances in deep reinforcement learning with a focus on the most used deep architectures, such as autoencoders, convolutional neural networks and recurrent neural networks, which have successfully been combined with the reinforcement learning framework.
Deep Reinforcement Learning: An Overview We give an overview of recent exciting achievements of deep reinforcement learning (RL). We start with background of deep learning and reinforcement learning, as well as introduction of testbeds. Next we discuss Deep Q-Network (DQN) and its extensions, asynchronous methods, policy optimization, reward, and planning. After that, we talk about attention and memory, unsupervised learning, and learning to learn. Then we discuss various applications of RL, including games, in particular, AlphaGo, robotics, spoken dialogue systems (a.k.a. chatbot), machine translation, text sequence prediction, neural architecture design, personalized web services, healthcare, finance, and music generation. We mention topics/papers not reviewed yet. After listing a collection of RL resources, we close with discussions.
Deep Reinforcement Learning: Framework, Applications, and Embedded Implementations The recent breakthroughs of the deep reinforcement learning (DRL) technique in AlphaGo and playing Atari have set a good example in handling the large state and action spaces of complicated control problems. The DRL technique is comprised of (i) an offline deep neural network (DNN) construction phase, which derives the correlation between each state-action pair of the system and its value function, and (ii) an online deep Q-learning phase, which adaptively derives the optimal action and updates value estimates. In this paper, we first present the general DRL framework, which can be widely utilized in many applications with different optimization objectives. This is followed by the introduction of three specific applications: the cloud computing resource allocation problem, the residential smart grid task scheduling problem, and the building HVAC system optimal control problem. The effectiveness of the DRL technique in these three cyber-physical applications has been validated. Finally, this paper investigates stochastic computing-based hardware implementations of the DRL framework, which yield a significant improvement in area efficiency and power consumption compared with binary-based implementation counterparts.
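The online phase of this framework is an adaptive value update of the familiar Q-learning form. As a minimal illustration, the tabular version of that update is sketched below; the DRL framework described in the entry replaces the table with a DNN that approximates the state-action value function.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q(s, a) toward the bootstrapped
    target r + gamma * max_a' Q(s', a')."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```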
Deep Retrieval-Based Dialogue Systems: A Short Review Building dialogue systems that naturally converse with humans is an attractive and active research domain. Multiple systems are being designed every day and several datasets are becoming available. For this reason, it is hard to keep an up-to-date view of the state-of-the-art. In this work, we present the latest and most relevant retrieval-based dialogue systems and the available datasets used to build and evaluate them. We discuss their limitations and provide insights and guidelines for future work.
Deep Semantic Segmentation of Natural and Medical Images: A Review The (medical) image semantic segmentation task consists of classifying each pixel of an image (or just several of them) into an instance, where each instance (or category) corresponds to a class. This task is part of the concept of scene understanding, or better explaining the global context of an image. In the medical image analysis domain, image segmentation can be used for image-guided interventions, radiotherapy, or improved radiological diagnostics. In this review, we categorize the main deep learning-based medical and non-medical image segmentation solutions into six main groups: deep architectural improvements, data synthesis-based methods, loss function-based improvements, sequenced models, weakly supervised methods, and multi-task methods. For each group we analyze its variants and discuss the limitations of the current approaches and future research directions for semantic image segmentation.
Deep Stochastic Configuration Networks: Universal Approximation and Learning Representation This paper focuses on the development of randomized approaches for building deep neural networks. A supervisory mechanism is proposed to constrain the random assignment of the hidden parameters (i.e., all biases and weights within the hidden layers). A full-rank oriented criterion is suggested and utilized as a termination condition to determine the number of nodes for each hidden layer, and a pre-defined error tolerance is used as a global indicator to decide the depth of the learner model. The read-out weights attached to all direct links from each hidden layer to the output layer are incrementally evaluated by the least squares method. Such a class of randomized learner models with deep architecture is termed deep stochastic configuration networks (DeepSCNs), for which the universal approximation property is verified with a rigorous proof. Given abundant samples from a continuous distribution, DeepSCNs can speedily produce a learning representation, that is, a collection of random basis functions with the cascaded inputs together with the read-out weights. Simulation results with comparisons on function approximation align with the theoretical findings.
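The basic building block behind such randomized learner models is a hidden layer with randomly assigned weights and a least-squares read-out. The numpy sketch below shows that block only; it deliberately omits the paper's supervisory mechanism and incremental node addition, so it is an illustration of the general idea rather than DeepSCN itself.

```python
import numpy as np

def random_layer_least_squares(X, Y, n_hidden=200, seed=0):
    """One randomized hidden layer with a least-squares read-out: the hidden
    weights are sampled at random and never trained, only the read-out is fit."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, size=(X.shape[1], n_hidden))   # random hidden weights
    b = rng.uniform(-1, 1, size=n_hidden)                  # random hidden biases
    H = np.tanh(X @ W + b)                                 # random basis functions
    beta, *_ = np.linalg.lstsq(H, Y, rcond=None)           # read-out weights by least squares
    return W, b, beta
```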
Deep Visual Domain Adaptation: A Survey Deep domain adaptation has emerged as a new learning technique to address the lack of massive amounts of labeled data. Compared to conventional methods, which learn shared feature subspaces or reuse important source instances with shallow representations, deep domain adaptation methods leverage deep networks to learn more transferable representations by embedding domain adaptation in the pipeline of deep learning. There have been comprehensive surveys of shallow domain adaptation, but few timely reviews of the emerging deep learning based methods. In this paper, we provide a comprehensive survey of deep domain adaptation methods for computer vision applications with four major contributions. First, we present a taxonomy of different deep domain adaptation scenarios according to the properties of the data that define how two domains are diverged. Second, we summarize deep domain adaptation approaches into several categories based on training loss, and briefly analyze and compare the state-of-the-art methods under these categories. Third, we overview the computer vision applications that go beyond image classification, such as face recognition, semantic segmentation and object detection. Fourth, some potential deficiencies of current methods and several future directions are highlighted.
Deep-learning in Mobile Robotics – from Perception to Control Systems: A Survey on Why and Why not Deep-learning has dramatically changed the world overnight. It greatly boosted the development of visual perception, object detection, and speech recognition, among others. This is attributed to the multiple convolutional processing layers that learn abstract representations from massive data. The advantages of deep convolutional structures in data processing have motivated the application of artificial intelligence methods to robotic problems, especially perception and control systems, the two typical and challenging problems in robotics. This paper presents a survey of the deep-learning research landscape in mobile robotics. We start by introducing the definition and development of deep-learning in related fields, especially the essential distinctions between image processing and robotic tasks. We describe and discuss several typical applications and related works in this domain, followed by the benefits of deep-learning and related existing frameworks. Besides, operation in complex dynamic environments is regarded as a critical bottleneck for mobile robots, such as in autonomous driving. We thus further emphasize recent achievements on how deep-learning contributes to navigation and control systems for mobile robots. Finally, we discuss the open challenges and research frontiers.
DeepWalk: Online Learning of Social Representations We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs. DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. We demonstrate DeepWalk’s latent representations on several multi-label network classification tasks for social networks such as BlogCatalog, Flickr, and YouTube. Our results show that DeepWalk outperforms challenging baselines which are allowed a global view of the network, especially in the presence of missing information. DeepWalk’s representations can provide F1 scores up to 10% higher than competing methods when labeled data is sparse. In some experiments, DeepWalk’s representations are able to outperform all baseline methods while using 60% less training data. DeepWalk is also scalable. It is an online learning algorithm which builds useful incremental results, and is trivially parallelizable. These qualities make it suitable for a broad class of real world applications such as network classification, and anomaly detection.
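The following sketch illustrates the core recipe described above, truncated random walks treated as sentences and fed to a skip-gram model; it is not the authors' reference implementation, and it assumes networkx and gensim (4.x API) are available, with walk length, walk count and embedding size chosen arbitrarily.

    import random
    import networkx as nx
    from gensim.models import Word2Vec

    def random_walk(graph, start, walk_length=40):
        # One truncated random walk, returned as a list of node-id strings.
        walk = [start]
        for _ in range(walk_length - 1):
            neighbors = list(graph.neighbors(walk[-1]))
            if not neighbors:
                break
            walk.append(random.choice(neighbors))
        return [str(node) for node in walk]

    G = nx.karate_club_graph()                        # small stand-in for a social network
    walks = [random_walk(G, node) for _ in range(10) for node in G.nodes()]
    model = Word2Vec(walks, vector_size=64, window=5, min_count=0, sg=1, epochs=5)
    print(model.wv[str(0)][:5])                       # latent representation of vertex 0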
Delivering Information Faster: In-Memory Technology Reboots the Big Data Analytics World In-memory technology – in which entire datasets are pre-loaded into a computer´s random access memory, alleviating the need for shuttling data between memory and disk storage every time a query is initiated – has actually been around for a number of years. However, with the onset of big data, as well as an insatiable thirst for analytics, the industry is taking a second look at this promising approach to speeding up data processing.
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%) on this visual recognition challenge.
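As a hedged illustration, the activation described above can be written as f(y) = y for y > 0 and f(y) = a·y otherwise, with the slope a learned jointly with the network weights; the sketch below uses PyTorch's built-in nn.PReLU together with its 'kaiming' initializer, which corresponds to the rectifier-aware initialization the paper derives. The layer sizes are arbitrary.

    import torch
    import torch.nn as nn

    layer = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),
        nn.PReLU(num_parameters=16, init=0.25),       # one learnable slope per channel
    )
    # Rectifier-aware ("He") initialization of the convolution weights.
    nn.init.kaiming_normal_(layer[0].weight, a=0.25, nonlinearity='leaky_relu')

    x = torch.randn(1, 3, 32, 32)
    print(layer(x).shape)                             # torch.Size([1, 16, 32, 32])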
Demystifying Fog Computing: Characterizing Architectures, Applications and Abstractions Internet of Things (IoT) has accelerated the deployment of millions of sensors at the edge of the network, through Smart City infrastructure and lifestyle devices. Cloud computing platforms are often tasked with handling these large volumes and fast streams of data from the edge. Recently, Fog computing has emerged as a concept for low-latency and resource-rich processing of these observation streams, to complement Edge and Cloud computing. In this paper, we review various dimensions of system architecture, application characteristics and platform abstractions that are manifest in this Edge, Fog and Cloud eco-system. We highlight novel capabilities of the Edge and Fog layers, such as physical and application mobility, privacy sensitivity, and a nascent runtime environment. IoT application case studies based on first-hand experiences across diverse domains drive this categorization. We also highlight the gap between the potential and the reality of Fog computing, and identify challenges that need to be overcome for the solution to be sustainable. Together, our article can help platform and application developers bridge the gap that remains in making Fog computing viable.
Density-based Clustering Clustering methods like K-means or Expectation-Maximization are suitable for finding ellipsoid-shaped clusters, or at best convex clusters. However, for non-convex clusters, such as those shown in Figure 15.1, these methods have trouble finding the true clusters, since two points from different clusters may be closer than two points in the same cluster. The density-based methods we consider in this chapter are able to mine such non-convex or shape-based clusters.
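A minimal sketch of the contrast drawn above, using scikit-learn's DBSCAN as a representative density-based method against K-means on a non-convex two-moons dataset; the dataset and the eps/min_samples values are illustrative stand-ins, not taken from the chapter.

    from sklearn.cluster import DBSCAN, KMeans
    from sklearn.datasets import make_moons

    X, _ = make_moons(n_samples=500, noise=0.05, random_state=0)
    kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)   # tends to split each moon
    dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)                    # recovers both moons as dense regions
    print(sorted(set(kmeans_labels)), sorted(set(dbscan_labels)))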
Design Principles of Massive, Robust Prediction Systems Most data mining research is concerned with building high-quality classification models in isolation. In massive production systems, however, the ability to monitor and maintain performance over time while growing in size and scope is equally important. Many external factors may degrade classification performance, including changes in data distribution, noise or bias in the source data, and the evolution of the system itself. A well-functioning system must gracefully handle all of these. This paper lays out a set of design principles for large-scale autonomous data mining systems and then demonstrates our application of these principles within the m6d automated ad targeting system. We demonstrate a comprehensive set of quality control processes that allow us to monitor and maintain thousands of distinct classification models automatically, and to add new models, take on new data, and correct poorly-performing models without manual intervention or system disruption.
Designing Great Visualizations This paper traces the history of visual representation, from early cave drawings through the computer revolution and the launch of Tableau. We will discuss some of the pioneers in data research and show how their work helped to revolutionize visualization techniques. We will also examine the different styles of data visuals, discuss some of the barriers to making effective visuals and the methods we use to overcome those barriers. In the end, we will show the power (and limits) of human perception, and how we can use data to tell stories – much like those of the earliest cave drawings.
Designing with Data: A Case Study As the Internet of Things continues to take hold in the commercial world, the teams designing these new technologies are constantly evolving and turning their hand to uncharted territory. This is especially key within the field of secondary service design as businesses attempt to utilize and find value in the sensor data being produced by connected products. This paper discusses the ways in which a commercial design team uses smart thermostat data to prototype an advice-giving chatbot. The team collaborates to produce a chat sequence through careful ordering of data and reasoning about customer reactions. The paper contributes important insights into design methods being used in practice within the under-researched areas of chatbot prototyping and secondary service design.
Detecting Dead Weights and Units in Neural Networks Deep Neural Networks are highly over-parameterized, and the size of a neural network can be reduced significantly after training without any decrease in performance. One can clearly see this phenomenon in a wide range of architectures trained for various problems. Weight/channel pruning, distillation, quantization, and matrix factorization are some of the main methods one can use to remove the redundancy and arrive at smaller and faster models. This work starts with a short informative chapter, where we motivate the pruning idea and provide the necessary notation. In the second chapter, we compare various saliency scores in the context of parameter pruning. Using the insights obtained from this comparison, and the problems it reveals, we motivate why pruning units instead of individual parameters might be a better idea. We propose a set of definitions to quantify and analyze units that do not learn or create any useful information. We propose an efficient way of detecting dead units and use it to select which units to prune. We obtain a 5x model size reduction through unit-wise pruning on MNIST.
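As a hedged sketch of the kind of quantity involved, the snippet below flags hidden units whose post-ReLU activation stays near zero across an entire evaluation batch; this is only one simple 'dead unit' criterion chosen for illustration, not necessarily the exact definitions or saliency scores proposed in the work.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
    x = torch.randn(1024, 784)                        # stand-in for an MNIST evaluation batch

    with torch.no_grad():
        hidden = model[1](model[0](x))                # post-ReLU activations, shape (1024, 256)
    dead = hidden.abs().max(dim=0).values < 1e-6      # units that never activate on the batch
    print(int(dead.sum()), 'of', hidden.shape[1], 'units look dead on this batch')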
Deterministic Distributed Matching: Simpler, Faster, Better We present improved deterministic distributed algorithms for a number of well-studied matching problems, which are simpler, faster, more accurate, and/or more general than their known counterparts. The common denominator of these results is a deterministic distributed rounding method for certain linear programs, which is the first such rounding method, to our knowledge. A sampling of our end results is as follows: — An $O(\log^2 \Delta \log n)$-round deterministic distributed algorithm for computing a maximal matching, in $n$-node graphs with maximum degree $\Delta$. This is the first improvement in about 20 years over the celebrated $O(\log^4 n)$-round algorithm of Hanckowiak, Karonski, and Panconesi [SODA’98, PODC’99]. — An $O(\log^2 \Delta \log \frac{1}{\varepsilon} + \log^* n)$-round deterministic distributed algorithm for a $(2+\varepsilon)$-approximation of maximum matching. This is exponentially faster than the classic $O(\Delta + \log^* n)$-round $2$-approximation of Panconesi and Rizzi [DIST’01]. With some modifications, the algorithm can also find an almost maximal matching which leaves only an $\varepsilon$-fraction of the edges on unmatched nodes. — An $O(\log^2 \Delta \log \frac{1}{\varepsilon} \log_{1+\varepsilon} W + \log^* n)$-round deterministic distributed algorithm for a $(2+\varepsilon)$-approximation of a maximum weighted matching, and also for the more general problem of maximum weighted $b$-matching. Here, $W$ denotes the maximum normalized weight. These improve over the $O(\log^4 n \log_{1+\varepsilon} W)$-round $(6+\varepsilon)$-approximation algorithm of Panconesi and Sozio [DIST’10].
Diachronic word embeddings and semantic shifts: a survey Recent years have witnessed a surge of publications aimed at tracing temporal changes in lexical semantics using distributional methods, particularly prediction-based word embedding models. However, this vein of research lacks the cohesion, common terminology and shared practices of more established areas of natural language processing. In this paper, we survey the current state of academic research related to diachronic word embeddings and semantic shift detection. We start by discussing the notion of semantic shifts, and then continue with an overview of the existing methods for tracing such time-related shifts with word embedding models. We propose several axes along which these methods can be compared, and outline the main challenges facing this emerging subfield of NLP, as well as prospects and possible applications.
Different Approach to the Problem of Missing Data There is a long history of development of methodology for dealing with missing data in statistical analysis. Today, the most popular methods fall into two classes, Complete Cases (CC) and Multiple Imputation (MI). Another approach, Available Cases (AC), has occasionally been mentioned in the research literature, in the context of linear regression analysis, but has generally been ignored. In this paper, we revisit the AC method, showing that it can perform better than CC and MI, and we extend its breadth of application.
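A small sketch of the distinction discussed above: a covariance matrix estimated on Complete Cases (listwise deletion) versus on Available Cases (pairwise-complete observations, which is what pandas' DataFrame.cov computes by default). The data are synthetic and the missingness is injected artificially for illustration.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.normal(size=(200, 3)), columns=['x1', 'x2', 'y'])
    df.loc[rng.random(200) < 0.3, 'x2'] = np.nan      # inject roughly 30% missingness in one column

    cov_cc = df.dropna().cov()   # Complete Cases: drop every row containing a missing value
    cov_ac = df.cov()            # Available Cases: each entry uses all pairwise-complete rows
    print(cov_cc, cov_ac, sep='\n\n')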
Different Approaches for Human Activity Recognition: A Survey Human activity recognition has gained importance in recent years due to its applications in various fields such as health, security and surveillance, entertainment, and intelligent environments. A significant amount of work has been done on human activity recognition, and researchers have leveraged different approaches, such as wearable, object-tagged, and device-free, to recognize human activities. In this article, we present a comprehensive survey of the work conducted over the period 2010-2018 in various areas of human activity recognition, with a main focus on device-free solutions. The device-free approach is becoming very popular because the subject is not required to carry anything; instead, the environment is tagged with devices to capture the required information. We propose a new taxonomy for categorizing the research work conducted in the field of activity recognition and divide the existing literature into three sub-areas: action-based, motion-based, and interaction-based. We further divide these areas into ten different sub-topics and present the latest research work in these sub-topics. Unlike previous surveys which focus only on one type of activity, to the best of our knowledge, we cover all the sub-areas in activity recognition and provide a comparison of the latest research work in these sub-areas. Specifically, we discuss the key attributes and design approaches for the work presented. Then we provide an extensive analysis based on 10 important metrics, to give the reader a complete overview of the state-of-the-art techniques and trends in different sub-areas of human activity recognition. In the end, we discuss open research issues and provide future research directions in the field of human activity recognition.
Different Stages of Wearable Health Tracking Adoption & Abandonment: A Survey Study and Analysis Health trackers are widely adopted to support users with daily health and wellness tracking. They can help increase steps taken, enhance sleeping patterns, improve healthy diet, and promote overall health. Despite the growth in the adoption of such technology, their real-life use is still questionable. While some users derive long-term value from their trackers, others face barriers to integrating them into their daily routine. Studies have analysed the technical aspects of these barriers. In this study, we analyse the behavioural factors of discouragement and wearable abandonment that are tied to user habits and living circumstances. A data analysis was conducted in two different studies, one of users' posts about wearable sales and the other a survey analysis. The two studies were used to analyse the stages of wearable adoption, use and abandonment. We therefore focused mainly on users' motives for getting a wearable tracker and for posting it for sale. We extracted insights about user motives, highlighted technology conditions and limitations, and the timeframe before abandonment. The findings revealed certain user behavioural patterns throughout wearable use and abandonment.
Differential Similarity in Higher Dimensional Spaces: Theory and Applications This paper presents an extension and an elaboration of the theory of differential similarity, which was originally proposed in arXiv:1401.2411 [cs.LG]. The goal is to develop an algorithm for clustering and coding that combines a geometric model with a probabilistic model in a principled way. For simplicity, the geometric model in the earlier paper was restricted to the three-dimensional case. The present paper removes this restriction, and considers the full $n$-dimensional case. Although the mathematical model is the same, the strategies for computing solutions in the $n$-dimensional case are different, and one of the main purposes of this paper is to develop and analyze these strategies. Another main purpose is to devise techniques for estimating the parameters of the model from sample data, again in $n$ dimensions. We evaluate the solution strategies and the estimation techniques by applying them to two familiar real-world examples: the classical MNIST dataset and the CIFAR-10 dataset.
Digital Twin: Enabling Technology, Challenges and Open Research Digital Twin technology is an emerging concept that has recently become the centre of attention for industry and, in more recent years, academia. The advancements in Industry 4.0 concepts have facilitated its growth, particularly in the manufacturing industry. The Digital Twin is defined extensively but is described as the effortless integration of data between a physical and virtual machine in either direction. The challenges, applications, and enabling technologies for Artificial Intelligence, Internet of Things and Digital Twins are presented. A review of publications relating to Digital Twins is performed, producing a categorical review of recent papers. The review categorises them by research area: Manufacturing, Healthcare and Smart Cities, discussing a range of papers that reflect these areas and the current state of research. The paper outlines the open research opportunities and challenges.
Directional Statistics in Machine Learning: a Brief Review The modern data analyst must cope with data encoded in various forms: vectors, matrices, strings, graphs, and more. Consequently, statistical and machine learning models tailored to different data encodings are important. We focus on data encoded as normalized vectors, so that their ‘direction’ is more important than their magnitude. Specifically, we consider high-dimensional vectors that lie either on the surface of the unit hypersphere or on the real projective plane. For such data, we briefly review the mathematical models prevalent in machine learning, while also outlining some technical aspects, software, applications, and open mathematical challenges.
Discovery with Data: Leveraging Statistics with Computer Science to Transform Science and Society The Big Data Research and Development Initiative is now in its third year and making great strides to address the challenges of Big Data. To further advance this initiative, we describe how statistical thinking can help tackle the many Big Data challenges, emphasizing that often the most productive approach will involve multidisciplinary teams with statistical, computational, mathematical, and scientific domain expertise.
discrete examples: genetics and spell checking
Distance and Similarity Measures Effect on the Performance of K-Nearest Neighbor Classifier – A Review The K-nearest neighbor (KNN) classifier is one of the simplest and most common classifiers, yet its performance competes with the most complex classifiers in the literature. The core of this classifier depends mainly on measuring the distance or similarity between the tested example and the training examples. This raises a major question about which distance measure should be used for the KNN classifier among the large number of distance and similarity measures available. This review attempts to answer that question by evaluating the performance (measured by accuracy, precision and recall) of KNN using a large number of distance measures, tested on a number of real-world datasets, with and without adding different levels of noise. The experimental results show that the performance of the KNN classifier depends significantly on the distance used, with large gaps between the performances of different distances. We found that a recently proposed non-convex distance performed the best on most datasets compared to the other tested distances. In addition, the performance of KNN degraded by only about $20\%$ even when the noise level reached $90\%$, and this held for all the distances used. This means that the KNN classifier using any of the top $10$ distances tolerates noise to a certain degree. Moreover, the results show that some distances are less affected by the added noise than others.
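The snippet below illustrates the kind of comparison the review performs, evaluating a 5-NN classifier under a few common distance measures with cross-validation; the dataset and the particular metrics are stand-ins and do not reproduce the review's experiments.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    for metric in ['euclidean', 'manhattan', 'chebyshev', 'cosine']:
        knn = KNeighborsClassifier(n_neighbors=5, metric=metric, algorithm='brute')
        acc = cross_val_score(knn, X, y, cv=5).mean()       # 5-fold cross-validated accuracy
        print(f'{metric:10s} mean accuracy = {acc:.3f}')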
Distance Metric Learning – A Comprehensive Survey Many machine learning algorithms, such as K Nearest Neighbor (KNN), heavily rely on the distance metric for the input data patterns. Distance metric learning is to learn a distance metric for the input space of data from a given collection of pairs of similar/dissimilar points that preserves the distance relation among the training data. In recent years, many studies have demonstrated, both empirically and theoretically, that a learned metric can significantly improve the performance in classification, clustering and retrieval tasks. This paper surveys the field of distance metric learning from a principled perspective, and includes a broad selection of recent work. In particular, distance metric learning is reviewed under different learning conditions: supervised learning versus unsupervised learning, learning in a global sense versus in a local sense, and the distance matrix based on a linear kernel versus a nonlinear kernel. In addition, this paper discusses a number of techniques that are central to distance metric learning, including convex programming, positive semi-definite programming, kernel learning, dimension reduction, K Nearest Neighbor, large margin classification, and graph-based approaches.
Distinguishing cause from effect using observational data: methods and benchmarks The discovery of causal relationships from purely observational data is a fundamental problem in science. The most elementary form of such a causal discovery problem is to decide whether X causes Y or, alternatively, Y causes X, given joint observations of two variables X, Y. This was often considered to be impossible. Nevertheless, several approaches for addressing this bivariate causal discovery problem were proposed recently. In this paper, we present the benchmark data set CauseEffectPairs that consists of 88 different ‘cause-effect pairs’ selected from 31 datasets from various domains. We evaluated the performance of several bivariate causal discovery methods on these real-world benchmark data and on artificially simulated data. Our empirical results provide evidence that additive-noise methods are indeed able to distinguish cause from effect using only purely observational data. In addition, we prove consistency of the additive-noise method proposed by Hoyer et al. (2009).
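A toy sketch of the additive-noise idea evaluated in the paper: fit a nonparametric regression in both directions and prefer the direction in which the residuals look independent of the input. Mutual information is used here only as a simple stand-in for the independence tests (e.g. HSIC) used in practice, and the data, regressor and sample split are illustrative.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.feature_selection import mutual_info_regression

    rng = np.random.default_rng(0)
    x = rng.uniform(-2, 2, size=2000)
    y = x ** 3 + rng.normal(scale=0.5, size=2000)     # ground truth: X causes Y with additive noise

    def residual_dependence(cause, effect):
        # Fit a nonparametric regression on one half of the data and score, on the other
        # half, how dependent the residuals are on the input (lower = more independent).
        model = RandomForestRegressor(n_estimators=200, min_samples_leaf=10, random_state=0)
        model.fit(cause[:1000].reshape(-1, 1), effect[:1000])
        residuals = effect[1000:] - model.predict(cause[1000:].reshape(-1, 1))
        return mutual_info_regression(cause[1000:].reshape(-1, 1), residuals, random_state=0)[0]

    score_xy = residual_dependence(x, y)              # small: residuals roughly independent of x
    score_yx = residual_dependence(y, x)              # larger: residuals retain structure in y
    print('inferred direction:', 'X -> Y' if score_xy < score_yx else 'Y -> X')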
Distributed Computation of Linear Matrix Equations: An Optimization Perspective This paper investigates the distributed computation of the well-known linear matrix equation in the form AXB = F, with the matrices A, B, X, and F of appropriate dimensions, over multi-agent networks from an optimization perspective. In this paper, we consider the standard distributed matrix-information structures, where each agent of the considered multi-agent network has access to one of the sub-block matrices of A, B, and F. To be specific, we first propose different decomposition methods to reformulate the matrix equations in standard structures as distributed constrained optimization problems by introducing substitutional variables; we show that the solutions of the reformulated distributed optimization problems are equivalent to least squares solutions of the original matrix equations; and we design distributed continuous-time algorithms for the constrained optimization problems by using augmented matrices and a derivative feedback technique. With the help of semi-stability analysis, we prove the convergence of the algorithms to a least squares solution of the matrix equation for any initial condition.
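For reference, a centralized least-squares solution of AXB = F can be computed via the vectorization identity vec(AXB) = (B^T ⊗ A) vec(X), as in the sketch below; the paper's contribution is to reach such a least-squares solution with distributed continuous-time algorithms rather than with this centralized computation. Dimensions here are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    A, B = rng.normal(size=(5, 3)), rng.normal(size=(4, 6))
    F = rng.normal(size=(5, 6))

    M = np.kron(B.T, A)                               # vec(A X B) = (B^T kron A) vec(X)
    vec_x, *_ = np.linalg.lstsq(M, F.flatten(order='F'), rcond=None)
    X = vec_x.reshape(3, 4, order='F')                # least-squares solution of A X B = F
    print(np.linalg.norm(A @ X @ B - F))              # residual norm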
Distributed Constraint Optimization Problems and Applications: A Survey The field of Multi-Agent System (MAS) is an active area of research within Artificial Intelligence, with an increasingly important impact in industrial and other real-world applications. Within a MAS, autonomous agents interact to pursue personal interests and/or to achieve common objectives. Distributed Constraint Optimization Problems (DCOPs) have emerged as one of the prominent agent architectures to govern the agents’ autonomous behavior, where both algorithms and communication models are driven by the structure of the specific problem. During the last decade, several extensions to the DCOP model have enabled them to support MAS in complex, real-time, and uncertain environments. This survey aims at providing an overview of the DCOP model, giving a classification of its multiple extensions and addressing both resolution methods and applications that find a natural mapping within each class of DCOPs. The proposed classification suggests several future perspectives for DCOP extensions, and identifies challenges in the design of efficient resolution algorithms, possibly through the adaptation of strategies from different areas.
Distributed Decision Tree Learning for Mining Big Data Streams Web companies need to effectively analyse big data in order to enhance the experiences of their users. They need systems that are capable of handling big data in terms of three dimensions: volume as data keeps growing, variety as the type of data is diverse, and velocity as the data is continuously arriving very fast into the systems. However, most of the existing systems have addressed at most only two out of the three dimensions, such as Mahout, a distributed machine learning framework that addresses the volume and variety dimensions, and Massive Online Analysis (MOA), a streaming machine learning framework that handles the variety and velocity dimensions. In this thesis, we propose and develop Scalable Advanced Massive Online Analysis (SAMOA), a distributed streaming machine learning framework to address the aforementioned challenge. SAMOA provides flexible application programming interfaces (APIs) to allow rapid development of new ML algorithms for dealing with variety. Moreover, we integrate SAMOA with Storm, a state-of-the-art stream processing engine (SPE), which allows SAMOA to inherit Storm’s scalability to address velocity and volume. The main benefits of SAMOA are that it provides flexibility in developing new ML algorithms and extensibility in integrating new SPEs. We develop a distributed online classification algorithm on top of SAMOA to verify the aforementioned features of SAMOA. The evaluation results show that the distributed algorithm is suitable for settings with a high number of attributes.
Distributed Latent Dirichlet Allocation via Tensor Factorization Latent Dirichlet Allocation (LDA) has proven extremely popular and versatile since its introduction over a decade ago. LDA is successful in part because it assigns a mixture of latent states (‘topics’) to each set of exchangeable observations (‘document’), in contrast to a hard clustering. This property complicates the estimation of latent parameters, and has led to extensive research in disparate learning techniques. Broadly speaking there are three basic strategies: variational inference; Markov chain Monte Carlo; and the method of moments, the latter having been discovered only recently. Due to high dimensional data with large vocabulary sizes, numerous documents, and many topics, computational constraints are the limiting factor in developing large scale topic models. This has motivated research into scalable computational strategies for LDA. In the single node context, stochastic variational inference is fast and accurate, but has high communication costs in the distributed setting. Batch variational inference has a more favorable ratio of communication to computation, as the E-step (but not the M-step) is embarrassingly parallel. Markov chain Monte Carlo (MCMC) techniques have also been implemented in the distributed setting, in both synchronous and asynchronous variants. Due to their recent introduction, there are no distributed implementations of method of moments based approaches to LDA. We leverage the fact that the method of moments for LDA reduces to canonical polyadic (CP) decomposition of a tensor, a problem which has received extensive study in the literature, including distributed variants. We combine ALS with whitening preprocessing (data orthogonalization and dimensionality reduction), motivated by better convergence rate and perturbation guarantees compared to previous methods. Additionally, the preprocessing has the benefit that the subsequent tensor decomposition is independent of the vocabulary size and the number of documents. Although ALS requires many iterations to converge (more than would be tolerable using map-reduce without custom support for low-overhead iteration), we utilize REEF, a distributed processing framework which runs on YARN managed clusters, e.g., a Hadoop 2 installation.
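As the abstract notes, the moment-based estimator boils down to a CP decomposition solved with ALS. The sketch below only illustrates that CP/ALS step on a small synthetic low-rank tensor using the tensorly package (assumed available, recent API); the whitening preprocessing, the LDA moment construction, and the distributed REEF/YARN pipeline are not shown.

    import numpy as np
    import tensorly as tl
    from tensorly.decomposition import parafac

    rng = np.random.default_rng(0)
    rank = 3
    true_factors = [rng.random((20, rank)) for _ in range(3)]
    T = np.einsum('ir,jr,kr->ijk', *true_factors)     # synthetic rank-3 stand-in for a moment tensor

    cp = parafac(tl.tensor(T), rank=rank, n_iter_max=200)   # ALS-based CP decomposition
    print([f.shape for f in cp.factors])                    # three recovered 20 x 3 factor matrices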
Distributed Least-Squares Iterative Methods in Networks: A Survey Many science and engineering applications involve solving a linear least-squares system formed from field measurements. In distributed cyber-physical systems (CPS), each sensor node used for measurement often knows only a partial set of independent rows of the least-squares system. To compute the least-squares solution, the nodes would need to gather all these measurements at a centralized location and then compute the solution there. Such data collection and computation is inefficient because of bandwidth and time constraints, and is sometimes infeasible because of data privacy concerns. Thus distributed computation is strongly preferred or demanded in many real-world applications, e.g., smart grids and target tracking. To compute least squares for large sparse systems of linear equations, iterative methods are natural candidates, and there are many studies on them; however, most concern the efficiency of centralized/parallel computation, and only a few are explicitly about distributed computation or have the potential to be applied in distributed networks. This paper surveys the representative iterative methods from several research communities. Some of them were not originally designed for this need, so we slightly modified them to suit our requirements and maintain consistency. In this survey, we