Documents

( | 1 | 4 | 5 | 7 | 8 | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | Y | Z
|Documents| = 1778

(

(Machine) Learning to Do More with Less Determining the best method for training a machine learning algorithm is critical to maximizing its ability to classify data. In this paper, we compare the standard ‘fully supervised’ approach (that relies on knowledge of event-by-event truth-level labels) with a recent proposal that instead utilizes class ratios as the only discriminating information provided during training. This so-called ‘weakly supervised’ technique has access to less information than the fully supervised method and yet is still able to yield impressive discriminating power. In addition, weak supervision seems particularly well suited to particle physics since quantum mechanics is incompatible with the notion of mapping an individual event onto any single Feynman diagram. We examine the technique in detail — both analytically and numerically — with a focus on the robustness to issues of mischaracterizing the training samples. Weakly supervised networks turn out to be remarkably insensitive to systematic mismodeling. Furthermore, we demonstrate that the event level outputs for weakly versus fully supervised networks are probing different kinematics, even though the numerical quality metrics are essentially identical. This implies that it should be possible to improve the overall classification ability by combining the output from the two types of networks. For concreteness, we apply this technology to a signature of beyond the Standard Model physics to demonstrate that all these impressive features continue to hold in a scenario of relevance to the LHC. Example code is provided at https://…/master.
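To make the weak-supervision idea above concrete: instead of per-event labels, each training sample carries only its known signal fraction, and the network is asked to reproduce that fraction on average. Below is a minimal sketch of such a loss, assuming PyTorch and a toy two-Gaussian "signal"/"background" mixture; it is not the authors' released code, only an illustration of the training signal.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)

    def mixed_batch(frac, n=512):
        # Toy events: "signal" and "background" are shifted 2-D Gaussians.
        n_sig = int(frac * n)
        sig = torch.randn(n_sig, 2) + 1.0
        bkg = torch.randn(n - n_sig, 2) - 1.0
        return torch.cat([sig, bkg])

    for step in range(2000):
        frac = 0.1 + 0.8 * torch.rand(()).item()     # known class ratio of this sample
        x = mixed_batch(frac)
        loss = (model(x).mean() - frac) ** 2         # only the ratio supervises the network
        opt.zero_grad()
        loss.backward()
        opt.step()

After training, model(x) can still be thresholded per event, even though no event-level labels were ever seen.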

1

10 Tips to Create Useful and Beautiful Visualizations (Slide Deck)

4

4 Steps to Successfully Evaluating Business Analytics Software The goal of Business Analytics and Intelligence software is to help businesses access, analyze and visualize data, and then communicate those insights in meaningful dashboards and metrics. Unfortunately, the reality is that the majority of software options on the market today provide only a subset of that functionality. And those that provide a more comprehensive solution tend to lack the features that make them user-friendly. With a crowded marketplace, businesses need to go through a complex evaluation process and make some fundamental technology decisions before selecting a vendor. Finding business intelligence (BI) software that will scale with your organization's needs may seem like an impossible task. Here are the four questions you can ask when beginning the BI evaluation process that will save you a lot of time and help set you in the right direction.

5

5 Best Practices for Creating Effective Dashboards You've been there: no matter how many reports, formal meetings, casual conversations or emailed memos, someone important inevitably claims they didn't know about some important fact or insight and says 'we should have a dashboard to monitor the performance of X.' Or maybe you've been here: you've said 'yes, let's have a dashboard. It will help us improve return on investment (ROI) if everyone can see how X is performing and be able to quickly respond. I'll update it weekly.' Unfortunately, by week 3, you realize you're killing several hours a week integrating data from multiple sources to update a dashboard you're not sure anyone is actually using. Yet, dashboards have been all the rage, and with good reason. They can help you and your coworkers achieve a better grasp on the data – one of your most important, and often overlooked, assets. You've read how they help organizations get on the same page, speed decision-making and improve ROI. They help create organizational alignment because everyone is looking at the same thing. So dashboards can be effective. They can work. The question becomes: how can you get one to work for you? Focus on these 5 best practices. Equally important, keep an eye on the 7 critical mistakes you don't want to make.
50 years of Data Science More than 50 years ago, John Tukey called for a reformation of academic statistics. In 'The Future of Data Analysis', he pointed to the existence of an as-yet unrecognized science, whose subject of interest was learning from data, or 'data analysis'. Ten to twenty years ago, John Chambers, Bill Cleveland and Leo Breiman independently once again urged academic statistics to expand its boundaries beyond the classical domain of theoretical statistics; Chambers called for more emphasis on data preparation and presentation rather than statistical modeling; and Breiman called for emphasis on prediction rather than inference. Cleveland even suggested the catchy name 'Data Science' for his envisioned field. A recent and growing phenomenon is the emergence of 'Data Science' programs at major universities, including UC Berkeley, NYU, MIT, and most recently the Univ. of Michigan, which on September 8, 2015 announced a $100M 'Data Science Initiative' that will hire 35 new faculty. Teaching in these new programs has significant overlap in curricular subject matter with traditional statistics courses; in general, though, the new initiatives steer away from close involvement with academic statistics departments. This paper reviews some ingredients of the current 'Data Science moment', including recent commentary about data science in the popular media, and about how/whether Data Science is really different from Statistics. The now-contemplated field of Data Science amounts to a superset of the fields of statistics and machine learning which adds some technology for 'scaling up' to 'big data'. This chosen superset is motivated by commercial rather than intellectual developments. Choosing in this way is likely to miss out on the really important intellectual event of the next fifty years. Because all of science itself will soon become data that can be mined, the imminent revolution in Data Science is not about mere 'scaling up', but instead the emergence of scientific studies of data analysis science-wide. In the future, we will be able to predict how a proposal to change data analysis workflows would impact the validity of data analysis across all of science, even predicting the impacts field-by-field. Drawing on work by Tukey, Cleveland, Chambers and Breiman, I present a vision of data science based on the activities of people who are 'learning from data', and I describe an academic field dedicated to improving that activity in an evidence-based manner. This new field is a better academic enlargement of statistics and machine learning than today's Data Science Initiatives, while being able to accommodate the same short-term goals.

7

7 Signs You Need Advanced Analytics for Salesforce.com (or any CRM) and Why They Matter Sure, customer relationship management (CRM) applications provide reports and dashboards. But if you rely on the built-in analytic capabilities of CRM, you're leaving money on the table. Because that's what the information in your CRM system is; it's money. But you can't extract the true value of that information without an analytics application that does the heavy lifting without putting your sales team through hell. You also want your sales team to stay in your CRM application. That was the point. Remember, all CRM, all the time. Directing the team to another application for analytic insight just defeats the purpose. What you need are robust, easy-to-access analytics embedded right in your CRM solution. Following are seven signs that you are not operating efficiently and making reporting and analytics more difficult for your sales team and your business less productive. Don't ignore these seven warning signs. They all carry one message: Yes, you need advanced analytics!
7 Tips to Succeed with Big Data in 2014 Just when you thought big data couldn't get any bigger, it got bigger still. Regardless of its actual size, big data is showing its value. Organizations everywhere have big data of all shapes and sizes. They recognize the importance, the opportunity, and even the imperative to pay attention. It has become clear that big data will outlive those who ignore it. Organizations that have already tamed big data – the multi-structured mass they stored before they knew its worth – are improving their operational efficiency, growing their revenues, and empowering new business models. How do they do it? Their techniques for success can be summarized in seven tips.

8

8 Critical Metrics for Measuring App User Engagement In this guide, we outline for you the eight engagement metrics critical to app success, including suggestions for running marketing campaigns and boosting ROI.

A

A Benchmark of Selected Algorithmic Differentiation Tools on Some Problems in Computer Vision and Machine Learning Algorithmic differentiation (AD) allows exact computation of derivatives given only an implementation of an objective function. Although many AD tools are available, a proper and efficient implementation of AD methods is not straightforward. The existing tools are often too different to allow for a general test suite. In this paper, we compare fifteen ways of computing derivatives including eleven automatic differentiation tools implementing various methods and written in various languages (C++, F#, MATLAB, Julia and Python), two symbolic differentiation tools, finite differences, and hand-derived computation. We look at three objective functions from computer vision and machine learning. These objectives are for the most part simple, in the sense that no iterative loops are involved, and conditional statements are encapsulated in functions such as abs or logsumexp. However, it is important for the success of algorithmic differentiation that such 'simple' objective functions are handled efficiently, as so many problems in computer vision and machine learning are of this form. Of course, our results depend on programmer skill, and familiarity with the tools. However, we contend that this paper presents an important datapoint: a skilled programmer devoting roughly a week to each tool produced the timings we present. We have made our implementations available as open source to allow the community to replicate and update these benchmarks.
A Brief Introduction to Machine Learning for Engineers This monograph aims at providing an introduction to key concepts, algorithms, and theoretical frameworks in machine learning, including supervised and unsupervised learning, statistical learning theory, probabilistic graphical models and approximate inference. The intended readership consists of electrical engineers with a background in probability and linear algebra. The treatment builds on first principles, and organizes the main ideas according to clearly defined categories, such as discriminative and generative models, frequentist and Bayesian approaches, exact and approximate inference, directed and undirected models, and convex and non-convex optimization. The mathematical framework uses information-theoretic measures as a unifying tool. The text offers simple and reproducible numerical examples providing insights into key motivations and conclusions. Rather than providing exhaustive details on the existing myriad solutions in each specific category, for which the reader is referred to textbooks and papers, this monograph is meant as an entry point for an engineer into the literature on machine learning.
A Brief Survey of Deep Reinforcement Learning Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep $Q$-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.
A Brief Survey of Text Mining: Classification, Clustering and Extraction Techniques The amount of text that is generated every day is increasing dramatically. This tremendous volume of mostly unstructured text cannot be simply processed and perceived by computers. Therefore, efficient and effective techniques and algorithms are required to discover useful patterns. Text mining is the task of extracting meaningful information from text, which has gained significant attention in recent years. In this paper, we describe several of the most fundamental text mining tasks and techniques including text pre-processing, classification and clustering. Additionally, we briefly explain text mining in the biomedical and health care domains.
A Closer Look at Memorization in Deep Networks We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. real data. We also demonstrate that for appropriately tuned explicit regularization (e.g., dropout) we can degrade DNN training performance on noise datasets without compromising generalization on real data. Our analysis suggests that the notions of effective capacity which are dataset independent are unlikely to explain the generalization performance of deep networks when trained with gradient based methods because training data itself plays an important role in determining the degree of memorization.
A Comparative Study of Association Rule Mining Algorithms on Grid and Cloud Platform Association rule mining is a time-consuming process because it is both data intensive and computation intensive. In order to mine large volumes of data and to enhance the scalability and performance of existing sequential association rule mining algorithms, parallel and distributed algorithms have been developed. These traditional parallel and distributed algorithms assume a homogeneous platform and are not well suited to heterogeneous platforms such as grid and cloud. This calls for new algorithms that address data set partitioning and distribution, load balancing strategies, and optimization of communication and synchronization among processors in such heterogeneous systems. Grid and cloud are emerging platforms for distributed data processing, and various association rule mining algorithms have been proposed on them. This survey article combines a brief architectural overview of distributed systems with a comparative view of recent grid-based and cloud-based association rule mining algorithms. We differentiate between the algorithms developed on these architectures on the basis of data locality, programming paradigm, fault tolerance, communication cost, and the partitioning and distribution of data sets. Although it does not cover all algorithms, the survey can be very useful for new researchers working on distributed association rule mining algorithms.
A comparative study of fuzzy c-means algorithm and entropy-based fuzzy clustering algorithms Fuzzy clustering is useful to mine complex and multi-dimensional data sets, where the members have partial or fuzzy relations. Among the various developed techniques, the fuzzy c-means (FCM) algorithm is the most popular one, where a piece of data has partial membership with each of the pre-defined cluster centers. Moreover, in FCM, the cluster centers are virtual, that is, they are chosen at random and thus might lie outside the data set. The cluster centers and the membership values of the data points are updated through some iterations. On the other hand, the entropy-based fuzzy clustering (EFC) algorithm works based on a similarity-threshold value. Contrary to FCM, in EFC, the cluster centers are real, that is, they are chosen from the data points. In the present paper, the performances of these algorithms have been compared on four data sets, namely IRIS, WINES, OLITOS and psychosis (collected with the help of forty doctors), in terms of the quality of the clusters (that is, discrepancy factor, compactness, distinctness) obtained and their computational time. Moreover, the best set of clusters has been mapped into 2-D for visualization using a self-organizing map (SOM).
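For reference, the FCM updates described above fit in a few lines of plain NumPy; the following is an illustrative sketch of the standard algorithm, not the implementation compared in the paper.

    import numpy as np

    def fuzzy_c_means(X, c=3, m=2.0, iters=100, seed=0):
        # Standard fuzzy c-means: alternate between updating centers and memberships.
        rng = np.random.default_rng(seed)
        U = rng.random((len(X), c))
        U /= U.sum(axis=1, keepdims=True)            # memberships of each point sum to 1
        for _ in range(iters):
            Um = U ** m
            centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
            U = d ** (-2.0 / (m - 1))
            U /= U.sum(axis=1, keepdims=True)
        return centers, U

For example, fuzzy_c_means(load_iris().data) applied to the IRIS data (one of the four sets used in the comparison) returns three fuzzy cluster centers together with the membership matrix.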
A Comparative Study of Matrix Factorization and Random Walk with Restart in Recommender Systems Between matrix factorization and Random Walk with Restart (RWR), which method works better for recommender systems? Which method handles explicit or implicit feedback data better? Does additional side information help recommendation? Recommender systems play an important role in many e-commerce services such as Amazon and Netflix to recommend new items to a user. Among various recommendation strategies, collaborative filtering has shown good performance by using rating patterns of users. Matrix factorization and random walk with restart are the most representative collaborative filtering methods. However, it is still unclear which method provides better recommendation performance despite their extensive utility. In this paper, we provide a comparative study of matrix factorization and RWR in recommender systems. We exactly formulate each correspondence of the two methods according to various tasks in recommendation. In particular, we newly devise an RWR method using a global bias term which corresponds to a matrix factorization method using biases. We describe details of the two methods in various aspects of recommendation quality, such as how those methods handle the cold-start problem which typically occurs in collaborative filtering. We extensively perform experiments over real-world datasets to evaluate the performance of each method in terms of various measures. We observe that matrix factorization performs better with explicit feedback ratings while RWR is better with implicit ones. We also observe that exploiting global popularities of items is advantageous for performance and that side information produces positive synergy with explicit feedback but gives negative effects with implicit feedback.
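As a concrete illustration of the RWR side of this comparison, a basic random-walk-with-restart scorer over an adjacency matrix can be written as a short power iteration. This is a generic sketch (NumPy assumed), not the bias-augmented variant the paper devises.

    import numpy as np

    def rwr_scores(A, seed_node, restart=0.15, iters=100):
        # Random walk with restart: repeatedly walk on the graph, jumping back
        # to the seed node with probability `restart`. High scores suggest items
        # to recommend to the seed user.
        P = A / (A.sum(axis=0, keepdims=True) + 1e-12)   # column-stochastic transitions
        e = np.zeros(A.shape[0])
        e[seed_node] = 1.0
        r = e.copy()
        for _ in range(iters):
            r = (1 - restart) * P @ r + restart * e
        return r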
A Comparative Study of Recommendation Algorithms in Ecommerce Applications We evaluate a wide range of recommendation algorithms on e-commerce-related datasets. These algorithms include the popular user-based and item-based correlation/similarity algorithms as well as methods designed to work with sparse transactional data. Data sparsity poses a significant challenge to recommendation approaches when applied in ecommerce applications. We experimented with approaches such as dimensionality reduction, generative models, and spreading activation, which are designed to meet this challenge. In addition, we report a new recommendation algorithm based on link analysis. Initial experimental results indicate that the link analysis-based algorithm achieves the best overall performance across several e-commerce datasets.
A Comparative Study on using Principle Component Analysis with Different Text Classifiers Text categorization (TC) is the task of automatically organizing a set of documents into a set of pre-defined categories. Over the last few years, increased attention has been paid to the use of documents in digital form, and this makes text categorization a challenging issue. The most significant problem of text categorization is its huge number of features. Most of these features are redundant, noisy and irrelevant, causing overfitting with most classifiers. Hence, feature extraction is an important step to improve the overall accuracy and the performance of text classifiers. In this paper, we provide an overview of using principal component analysis (PCA) as a feature extraction step with various classifiers. It was observed that the performance of the classifiers improved after using PCA to reduce the dimension of the data. Experiments are conducted on three UCI data sets: Classic03, CNAE-9 and DBWorld e-mails. We compare the classification performance results of using PCA with popular and well-known text classifiers. Results show that using PCA encouragingly enhances classification performance for most of the classifiers.
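A minimal end-to-end sketch of the PCA-as-feature-extraction setup described above, assuming scikit-learn and a tiny placeholder corpus (the paper's data sets, e.g. CNAE-9, are not bundled here):

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import FunctionTransformer
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import PCA
    from sklearn.svm import LinearSVC

    # Placeholder documents and labels; substitute your own corpus.
    docs = ["grain prices rise", "new database engine released",
            "crop yields fall sharply", "query optimizer update shipped"]
    labels = [0, 1, 0, 1]

    clf = make_pipeline(
        TfidfVectorizer(),
        FunctionTransformer(lambda X: X.toarray(), accept_sparse=True),  # PCA needs dense input
        PCA(n_components=2),                                             # feature extraction step
        LinearSVC(),
    )
    clf.fit(docs, labels)
    print(clf.predict(["harvest report"]))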
A Comparative Survey of Recent Natural Language Interfaces for Databases Over the last few years natural language interfaces (NLI) for databases have gained significant traction both in academia and industry. These systems use very different approaches as described in recent survey papers. However, these systems have not been systematically compared against a set of benchmark questions in order to rigorously evaluate their functionalities and expressive power. In this paper, we give an overview over 24 recently developed NLIs for databases. Each of the systems is evaluated using a curated list of ten sample questions to show their strengths and weaknesses. We categorize the NLIs into four groups based on the methodology they are using: keyword-, pattern-, parsing-, and grammar-based NLI. Overall, we learned that keyword-based systems are enough to answer simple questions. To solve more complex questions involving subqueries, the system needs to apply some sort of parsing to identify structural dependencies. Grammar-based systems are overall the most powerful ones, but are highly dependent on their manually designed rules. In addition to providing a systematic analysis of the major systems, we derive lessons learned that are vital for designing NLIs that can answer a wide range of user questions.
A comparison of algorithms for the multivariate L1-median The L1-median is a robust estimator of multivariate location with good statistical properties. Several algorithms for computing the L1-median are available. Problem specific algorithms can be used, but also general optimization routines. The aim is to compare different algorithms with respect to their precision and runtime. This is possible because all considered algorithms have been implemented in a standardized manner in the open source environment R. In most situations, the algorithm based on the optimization routine NLM (non-linear minimization) clearly outperforms other approaches. Its low computation time makes applications for large and high-dimensional data feasible.
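The paper benchmarks R implementations; as a rough Python analogue of the general-optimizer approach (in the spirit of the NLM-based one), the L1-median can be obtained by handing the sum-of-distances objective to a standard minimizer. A sketch assuming NumPy and SciPy:

    import numpy as np
    from scipy.optimize import minimize

    def l1_median(X):
        # The L1-median minimizes the sum of Euclidean distances to all observations.
        objective = lambda m: np.linalg.norm(X - m, axis=1).sum()
        return minimize(objective, X.mean(axis=0), method="BFGS").x

    X = np.random.default_rng(0).normal(size=(200, 5))
    print(l1_median(X))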
A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition Attention-based recurrent neural encoder-decoder models present an elegant solution to the automatic speech recognition problem. This approach folds the acoustic model, pronunciation model, and language model into a single network and requires only a parallel corpus of speech and text for training. However, unlike in conventional approaches that combine separate acoustic and language models, it is not clear how to use additional (unpaired) text. While there has been previous work on methods addressing this problem, a thorough comparison among methods is still lacking. In this paper, we compare a suite of past methods and some of our own proposed methods for using unpaired text data to improve encoder-decoder models. For evaluation, we use the medium-sized Switchboard data set and the large-scale Google voice search and dictation data sets. Our results confirm the benefits of using unpaired text across a range of methods and data sets. Surprisingly, for first-pass decoding, the rather simple approach of shallow fusion performs best across data sets. However, for Google data sets we find that cold fusion has a lower oracle error rate and outperforms other approaches after second-pass rescoring on the Google voice search data set.
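Since shallow fusion turns out to be the strongest first-pass method in the comparison, it is worth spelling out what it is: a log-linear interpolation of the encoder-decoder's next-token scores with an external language model at each beam-search step. A toy sketch (NumPy assumed; the distributions are made up for illustration):

    import numpy as np

    def shallow_fusion(asr_logprobs, lm_logprobs, lm_weight=0.3):
        # Combined score used to rank beam candidates at each decoding step.
        return asr_logprobs + lm_weight * lm_logprobs

    asr = np.log([0.6, 0.3, 0.1])   # hypothetical next-token distribution from the decoder
    lm = np.log([0.2, 0.7, 0.1])    # hypothetical next-token distribution from the external LM
    print(np.argmax(shallow_fusion(asr, lm)))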
A Composite Model for Computing Similarity Between Texts Computing text similarity is a foundational technique for a wide range of tasks in natural language processing such as duplicate detection, question answering, or automatic essay grading. Just recently, text similarity received wide-spread attention in the research community by the establishment of the Semantic Textual Similarity (STS) Task at the Semantic Evaluation (SemEval) workshop in 2012 – a fact that stresses the importance of text similarity research. The goal of the STS Task is to create automated measures which are able to compute the degree of similarity between two given texts in the same way that humans do. Measures are thereby expected to output continuous text similarity scores, which are then either compared with human judgments or used as a means for solving a particular problem. We start this thesis with the observation that while the concept of similarity is well grounded in psychology, text similarity is much less well-defined in the natural language processing community. No attempt has been made yet to formalize in what way text similarity between two texts can be computed. Still, text similarity is regarded as a fixed, axiomatic notion in the community. To alleviate this shortcoming, we describe existing formal models of similarity and discuss how we can adapt them to texts. We propose to judge text similarity along multiple text dimensions, i.e. characteristics inherent to texts, and provide empirical evidence based on a set of annotation studies that the proposed dimensions are perceived by humans. We continue with a comprehensive survey of state-of-the-art text similarity measures previously proposed in the literature. To the best of our knowledge, no such survey has been done yet. We propose a classification into compositional and noncompositional text similarity measures according to their inherent properties. Compositional measures compute text similarity based on pairwise word similarity scores between all words which are then aggregated to an overall similarity score, while noncompositional measures project the complete texts onto particular models and then compare the texts based on these models. Based on our theoretical insights, we then present the implementation of a text similarity system which composes a multitude of text similarity measures along multiple text dimensions using a machine learning classifier. Depending on the concrete task at hand, we argue that such a system may need to address more than a single text dimension in order to best resemble human judgments. Our efforts culminate in the open source framework DKPro Similarity, which streamlines the development of text similarity measures and experimental setups. We apply our system in two evaluations, for which it consistently outperforms prior work and competing systems: an intrinsic and an extrinsic evaluation. In the intrinsic evaluation, the performance of text similarity measures is evaluated in an isolated setting by comparing the algorithmically produced scores with human judgments. We conducted the intrinsic evaluation in the context of the STS Task as part of the SemEval workshop. In the extrinsic evaluation, the performance of text similarity measures is evaluated with respect to a particular task at hand, where text similarity is a means for solving a particular problem. We conducted the extrinsic evaluation in the text classification task of text reuse detection. 
The results of both evaluations support our hypothesis that a composition of text similarity measures highly benefits the similarity computation process. Finally, we stress the importance of text similarity measures for real-world applications. We therefore introduce the application scenario Self-Organizing Wikis, where users of wikis, i.e. web-based collaborative content authoring systems, are supported in their everyday tasks by means of natural language processing techniques in general, and text similarity in particular. We elaborate on two use cases where text similarity computation is particularly beneficial: the detection of duplicates, and the semi-automatic insertion of hyperlinks. Moreover, we discuss two further applications where text similarity is a valuable tool: In both question answering and textual entailment recognition, text similarity has been used successfully in experiments and appears to be a promising means for further research in these fields. We conclude this thesis with an analysis of shortcomings of current text similarity research and formulate challenges which should be tackled by future work. In particular, we believe that computing text similarity along multiple text dimensions – which depend on the specific task at hand – will benefit any other task where text similarity is fundamental, as a composition of text similarity measures has shown superior performance in both the intrinsic as well as the extrinsic evaluation.
A Comprehensive Analysis of Deep Regression Deep learning revolutionized data science, and recently its popularity has grown exponentially, as has the number of papers employing deep networks. Vision tasks such as human pose estimation did not escape this methodological change. The large number of deep architectures has led to a plethora of methods that are evaluated under different experimental protocols. Moreover, small changes in the architecture of the network, or in the data pre-processing procedure, together with the stochastic nature of the optimization methods, lead to notably different results, making it extremely difficult to identify methods that significantly outperform others. Therefore, when proposing regression algorithms, practitioners proceed by trial and error. This situation motivated the current study, in which we perform a systematic evaluation and a statistical analysis of the performance of vanilla deep regression, that is, convolutional neural networks with a linear regression top layer. To the best of our knowledge this is the first comprehensive analysis of deep regression techniques. We perform experiments on three vision problems and report confidence intervals for the median performance as well as the statistical significance of the results, if any. Surprisingly, the variability due to different data pre-processing procedures generally eclipses the variability due to modifications in the network architecture.
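For readers unfamiliar with the term, "vanilla deep regression" here simply means a convolutional backbone topped by a linear layer trained with an L2 loss. A minimal sketch follows (PyTorch assumed; a small stand-in rather than the full-size networks evaluated in the paper):

    import torch
    import torch.nn as nn

    # A tiny convolutional backbone with a linear regression layer on top, trained with MSE.
    model = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, 2),                      # e.g. regress a 2-D landmark position
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    x = torch.randn(8, 3, 64, 64)              # dummy image batch
    y = torch.randn(8, 2)                      # dummy regression targets
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()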
A Comprehensive Analysis on Adversarial Robustness of Spiking Neural Networks In this era of machine learning models, their functionality is being threatened by adversarial attacks. In this struggle to make artificial neural networks robust, finding models resilient to these attacks is very important. In this work, we present, for the first time, a comprehensive analysis of the behavior of a more bio-plausible class of networks, namely Spiking Neural Networks (SNNs), under state-of-the-art adversarial tests. We perform a comparative study of the accuracy degradation of a conventional VGG-9 Artificial Neural Network (ANN) and an equivalent spiking network on the CIFAR-10 dataset in both whitebox and blackbox settings for different types of single-step and multi-step FGSM (Fast Gradient Sign Method) attacks. We demonstrate that SNNs tend to show more resiliency than ANNs under the black-box attack scenario. Additionally, we find that SNN robustness largely depends on the corresponding training mechanism. We observe that SNNs trained by spike-based backpropagation are more adversarially robust than those obtained by ANN-to-SNN conversion rules in several whitebox and blackbox scenarios. Finally, we also propose a simple yet effective framework for crafting adversarial attacks from SNNs. Our results suggest that attacks crafted from SNNs following our proposed method are much stronger than those crafted from ANNs.
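The single-step FGSM attack used throughout such studies has a one-line core: perturb the input along the sign of the loss gradient. A generic sketch (PyTorch assumed; it applies to any differentiable classifier and is not the paper's SNN-specific framework):

    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, x, y, eps=8 / 255):
        # Fast Gradient Sign Method: one step of size eps along the sign of the input gradient.
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        return (x + eps * x.grad.sign()).clamp(0, 1).detach()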
A Comprehensive Comparison of Unsupervised Network Representation Learning Methods There has been appreciable progress in unsupervised network representation learning (UNRL) approaches over graphs recently, with flexible random-walk approaches, new optimization objectives and deep architectures. However, there is no common ground for systematic comparison of embeddings to understand their behavior for different graphs and tasks. In this paper we theoretically group different approaches under a unifying framework and empirically investigate the effectiveness of different network representation methods. In particular, we argue that most of the UNRL approaches either explicitly or implicitly model and exploit the context information of a node. Consequently, we propose a framework that casts a variety of approaches (random walk based, matrix factorization and deep learning based) into a unified context-based optimization function. We systematically group the methods based on their similarities and differences. We study the differences among these methods in detail, which we later use to explain their performance differences on downstream tasks. We conduct a large-scale empirical study considering 9 popular and recent UNRL techniques and 11 real-world datasets with varying structural properties and two common tasks: node classification and link prediction. We find that there is no single method that is a clear winner and that the choice of a suitable method is dictated by certain properties of the embedding methods, the task and the structural properties of the underlying graph. In addition, we report common pitfalls in the evaluation of UNRL methods and come up with suggestions for experimental design and interpretation of results.
A Comprehensive Study of Deep Learning for Image Captioning Generating a description of an image is called image captioning. Image captioning requires recognizing the important objects, their attributes and their relationships in an image. It also needs to generate syntactically and semantically correct sentences. Deep learning-based techniques are capable of handling the complexities and challenges of image captioning. In this survey paper, we aim to present a comprehensive review of existing deep learning-based image captioning techniques. We discuss the foundations of the techniques to analyze their performances, strengths and limitations. We also discuss the datasets and the evaluation metrics popularly used in deep learning based automatic image captioning.
A Comprehensive Survey for Low Rank Regularization Low rank regularization, in essence, involves introducing a low rank or approximately low rank assumption on the matrix we aim to learn, and it has achieved great success in many fields including machine learning, data mining and computer vision. Over the last decade, much progress has been made in both theory and practical applications. Nevertheless, the intersection between the two remains slight. In order to construct a bridge between practical applications and theoretical research, in this paper we provide a comprehensive survey of low rank regularization. We first review several traditional machine learning models using low rank regularization, and then show their (or their variants') applications in solving practical issues, such as non-rigid structure from motion and image denoising. Subsequently, we summarize the regularizers and optimization methods that achieve great success in traditional machine learning tasks but are rarely seen in solving practical issues. Finally, we provide a discussion and comparison of some representative regularizers, including convex and non-convex relaxations. Extensive experimental results demonstrate that non-convex regularizers can provide a large advantage over the nuclear norm, the regularizer widely used in solving practical issues.
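A worked example of the most common low-rank regularizer mentioned above, the nuclear norm: its proximal operator is singular value thresholding, which shrinks every singular value by a constant. A minimal sketch (NumPy assumed):

    import numpy as np

    def singular_value_thresholding(X, tau):
        # Proximal operator of tau * ||.||_* : soft-threshold the singular values of X.
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

Non-convex relaxations discussed in such surveys typically replace the uniform shrinkage s - tau with a value-dependent shrinkage that penalizes large singular values less.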
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, inevitably RS draws from many of the same theories as CV, e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent new developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing DL models.
A Comprehensive Survey of Graph Embedding: Problems, Techniques and Applications Graph is an important data representation which appears in a wide diversity of real-world scenarios. Effective graph analytics provides users a deeper understanding of what is behind the data, and thus can benefit a lot of useful applications such as node classification, node recommendation, link prediction, etc. However, most graph analytics methods suffer from high computation and space costs. Graph embedding is an effective yet efficient way to solve the graph analytics problem. It converts the graph data into a low dimensional space in which the graph structural information and graph properties are maximally preserved. In this survey, we conduct a comprehensive review of the literature in graph embedding. We first introduce the formal definition of graph embedding as well as the related concepts. After that, we propose two taxonomies of graph embedding which correspond to what challenges exist in different graph embedding problem settings and how the existing work addresses these challenges in their solutions. Finally, we summarize the applications that graph embedding enables and suggest four promising future research directions in terms of computation efficiency, problem settings, techniques and application scenarios.
A Comprehensive Survey of Ontology Summarization: Measures and Methods The Semantic Web is becoming a large scale framework that enables data to be published, shared, and reused in the form of ontologies. The ontology, considered the basic building block of the Semantic Web, consists of two layers: a data layer and a schema layer. With the current exponential growth of ontologies in both data size and schema complexity, ontology understanding, which plays an important role in tasks such as ontology engineering and ontology learning, is becoming more difficult. Ontology summarization, as a way to distill knowledge from an ontology and generate an abridged version that facilitates better understanding, has recently been getting more attention. Various approaches are available for ontology summarization, each focusing on different measures in order to produce a proper summary for a given ontology. In this paper, we mainly focus on the common metrics used for ontology summarization and review the state of the art in ontology summarization.
A Comprehensive Survey on Fog Computing: State-of-the-art and Research Challenges Cloud computing with its three key facets (i.e., IaaS, PaaS, and SaaS) and its inherent advantages (e.g., elasticity and scalability) still faces several challenges. The distance between the cloud and the end devices might be an issue for latency-sensitive applications such as disaster management and content delivery applications. Service Level Agreements (SLAs) may also impose processing at locations where the cloud provider does not have data centers. Fog computing is a novel paradigm to address such issues. It enables provisioning resources and services outside the cloud, at the edge of the network, closer to end devices or, eventually, at locations stipulated by SLAs. Fog computing is not a substitute for cloud computing but a powerful complement. It enables processing at the edge while still offering the possibility to interact with the cloud. This article presents a comprehensive survey on fog computing. It critically reviews the state of the art in the light of a concise set of evaluation criteria. We cover both the architectures and the algorithms that make up fog systems. Challenges and research directions are also introduced. In addition, the lessons learned are reviewed and the prospects are discussed in terms of the key role fog is likely to play in emerging technologies such as the Tactile Internet.
A Comprehensive Survey on Graph Neural Networks Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into different categories. With a focus on graph convolutional networks, we review alternative architectures that have recently been developed; these learning paradigms include graph attention networks, graph autoencoders, graph generative networks, and graph spatial-temporal networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes and benchmarks of the existing algorithms on different learning tasks. Finally, we propose potential research directions in this fast-growing field.
A Comprehensive Survey on Safe Reinforcement Learning Safe Reinforcement Learning can be defined as the process of learning policies that maximize the expectation of the return in problems in which it is important to ensure reasonable system performance and/or respect safety constraints during the learning and/or deployment processes. We categorize and analyze two approaches of Safe Reinforcement Learning. The first is based on the modification of the optimality criterion, the classic discounted finite/infinite horizon, with a safety factor. The second is based on the modification of the exploration process through the incorporation of external knowledge or the guidance of a risk metric. We use the proposed classification to survey the existing literature, as well as suggesting future directions for Safe Reinforcement Learning.
A Conceptual Introduction to Markov Chain Monte Carlo Methods Markov Chain Monte Carlo (MCMC) methods have become a cornerstone of many modern scientific analyses by providing a straightforward approach to numerically estimate uncertainties in the parameters of a model using a sequence of random samples. This article provides a basic introduction to MCMC methods by establishing a strong conceptual understanding of what problems MCMC methods are trying to solve, why we want to use them, and how they work in theory and in practice. To develop these concepts, I outline the foundations of Bayesian inference, discuss how posterior distributions are used in practice, explore basic approaches to estimate posterior-based quantities, and derive their link to Monte Carlo sampling and MCMC. Using a simple toy problem, I then demonstrate how these concepts can be used to understand the benefits and drawbacks of various MCMC approaches. Exercises designed to highlight various concepts are also included throughout the article.
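In the spirit of the article's toy problem, the essentials of MCMC fit in a few lines: propose a random move and accept it with a probability that depends only on the ratio of posterior densities. A random-walk Metropolis sketch for a 1-D posterior (NumPy assumed; not the article's own code):

    import numpy as np

    def metropolis(log_post, x0=0.0, steps=10_000, scale=1.0, seed=0):
        rng = np.random.default_rng(seed)
        x, lp = x0, log_post(x0)
        samples = []
        for _ in range(steps):
            prop = x + scale * rng.normal()          # symmetric random-walk proposal
            lp_prop = log_post(prop)
            if np.log(rng.random()) < lp_prop - lp:  # accept with probability min(1, ratio)
                x, lp = prop, lp_prop
            samples.append(x)
        return np.array(samples)

    draws = metropolis(lambda x: -0.5 * x ** 2)      # posterior proportional to a standard normal
    print(draws.mean(), draws.std())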
A Concise Guide to Compositional Data Analysis Why a course in compositional data analysis? Compositional data consist of vectors whose components are the proportions or percentages of some whole. Their peculiarity is that their sum is constrained to be some constant: equal to 1 for proportions, 100 for percentages, or possibly some other constant c for other situations such as parts per million (ppm) in trace element compositions. Unfortunately, a cursory look at such vectors gives the appearance of vectors of real numbers, with the consequence that over the last century all sorts of sophisticated statistical methods designed for unconstrained data have been applied to compositional data with inappropriate inferences. All this despite the fact that many workers have been, or should have been, aware that the sample space for compositional vectors is radically different from the real Euclidean space associated with unconstrained data. Several substantial warnings had been given, even as early as 1897 by Karl Pearson in his seminal paper on spurious correlations, and then repeatedly in the 1960s by geologist Felix Chayes. Unfortunately, little heed was paid to such warnings, and within the small circle who did pay attention the approach was essentially pathological, attempting to answer the question: what goes wrong when we apply multivariate statistical methodology designed for unconstrained data to our constrained data, and how can the unconstrained methodology be adjusted to give meaningful inferences?
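The standard remedy in compositional data analysis is to move to log-ratio coordinates before applying unconstrained multivariate methods; for instance, the centered log-ratio (clr) transform divides each part by the geometric mean of the composition. A minimal sketch (NumPy assumed):

    import numpy as np

    def clr(x):
        # Centered log-ratio transform: maps a composition to unconstrained coordinates.
        x = np.asarray(x, dtype=float)
        g = np.exp(np.mean(np.log(x)))               # geometric mean of the parts
        return np.log(x / g)

    print(clr([0.1, 0.3, 0.6]))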
A Contemporary Overview of Probabilistic Latent Variable Models In this paper we provide a conceptual overview of latent variable models within a probabilistic modeling framework, an overview that emphasizes the compositional nature and the interconnectedness of the seemingly disparate models commonly encountered in statistical practice.
A Correspondence Between Random Neural Networks and Statistical Field Theory A number of recent papers have provided evidence that practical design questions about neural networks may be tackled theoretically by studying the behavior of random networks. However, until now the tools available for analyzing random neural networks have been relatively ad-hoc. In this work, we show that the distribution of pre-activations in random neural networks can be exactly mapped onto lattice models in statistical physics. We argue that several previous investigations of stochastic networks actually studied a particular factorial approximation to the full lattice model. For random linear networks and random rectified linear networks we show that the corresponding lattice models in the wide network limit may be systematically approximated by a Gaussian distribution with covariance between the layers of the network. In each case, the approximate distribution can be diagonalized by Fourier transformation. We show that this approximation accurately describes the results of numerical simulations of wide random neural networks. Finally, we demonstrate that in each case the large scale behavior of the random networks can be approximated by an effective field theory.
A correspondence between thermodynamics and inference A rough analogy between Bayesian statistics and statistical mechanics has long been discussed. We explore this analogy systematically and discover that it is more substantive than previously reported. We show that most canonical thermodynamic quantities have a natural correspondence with well-established statistical quantities. A novel correspondence is discovered between the heat capacity and the model complexity in information-based inference. This leads to a critical insight: We argue that the well-known mechanisms of failure of equipartition in statistical mechanics explain the nature of sloppy models in statistics. Finally, we exploit the correspondence to propose a solution to a long-standing ambiguity in Bayesian statistics: the definition of an objective or uninformative prior. In particular, we propose that the Gibbs entropy provides a natural generalization of the principle of indifference.
A Data Management System for Computational Experiments (3X) 3X, which stands for eXecuting eXploratory eXperiments, is a software tool to ease the burden of conducting computational experiments. 3X provides a standard yet configurable structure to execute a wide variety of experiments in a systematic way. 3X organizes the code, inputs, and outputs for an experiment, records results, and lets users visualize result data in a variety of ways. Its interface allows further runs of the experiment to be driven interactively. Our demonstration will illustrate how 3X eases the process of conducting computational experiments, using two complementary examples designed to quickly show the many features of 3X.
A data scientist's guide to start-ups In August 2013, we held a panel discussion at the KDD 2013 conference in Chicago on the subject of data science, data scientists, and start-ups. KDD is the premier conference on data science research and practice. The panel discussed the pros and cons for top-notch data scientists of the hot data science start-up scene. In this article, we first present background on our panelists. Our four panelists have unquestionable pedigrees in data science and substantial experience with start-ups from multiple perspectives (founders, employees, chief scientists, venture capitalists). For the casual reader, we next present a brief summary of the experts' opinions on eight of the issues the panel discussed. The rest of the article presents a lightly edited transcription of the entire panel discussion.
A Detailed Analysis of Quicksort Algorithms with Experimental Mathematics We study several variants of single-pivot and multi-pivot Quicksort algorithms and consider them as discrete probability problems. With experimental mathematics, explicit expressions for expectations, variances and even higher moments of their numbers of comparisons and swaps can be obtained. For some variants, Monte Carlo experiments are performed, the numerical results are demonstrated and the scaled limiting distribution is also discussed.
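A small Monte Carlo experiment in the spirit of the paper: count the key comparisons made by basic single-pivot Quicksort over many random permutations and compare with the exact expectation 2(n+1)H_n - 4n (about 648 for n = 100). An illustrative plain-Python sketch, not the paper's experimental-mathematics machinery:

    import random

    def quicksort(a, counter):
        # Single-pivot Quicksort that counts key comparisons in counter[0].
        if len(a) <= 1:
            return a
        pivot, rest = a[0], a[1:]
        counter[0] += len(rest)                      # pivot is compared with every other element
        left = [x for x in rest if x < pivot]
        right = [x for x in rest if x >= pivot]
        return quicksort(left, counter) + [pivot] + quicksort(right, counter)

    n, trials, total = 100, 2000, 0
    for _ in range(trials):
        counter = [0]
        quicksort(random.sample(range(n), n), counter)
        total += counter[0]
    print(total / trials)                            # should be close to 647.85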
A detailed comparative study of open source deep learning frameworks Deep Learning (DL) is one of the hottest trends in machine learning as DL approaches produced results superior to the state-of-the-art in problematic areas such as image processing and natural language processing (NLP). To foster the growth of DL, several open source frameworks appeared providing implementations of the most common DL algorithms. These frameworks vary in the algorithms they support and in the quality of their implementations. The purpose of this work is to provide a qualitative and quantitative comparison among three of the most popular and most comprehensive DL frameworks (namely Google’s TensorFlow, University of Montreal’s Theano and Microsoft’s CNTK). The ultimate goal of this work is to help end users make an informed decision about the best DL framework that suits their needs and resources. To ensure that our study is as comprehensive as possible, we conduct several experiments using multiple benchmark datasets from different fields (image processing, NLP, etc.) and measure the performance of the frameworks’ implementations of different DL algorithms. For most of our experiments, we find out that CNTK’s implementations are superior to the other ones under consideration.
A fast learning algorithm for deep belief nets We show how to use ‘complementary priors’ to eliminate the explaining away effects that make inference difficult in densely-connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modelled by long ravines in the free-energy landscape of the top-level associative memory and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.
A Few Useful Things to Know about Machine Learning Machine learning algorithms can figure out how to perform important tasks by generalizing from examples. This is often feasible and cost-effective where manual programming is not. As more data becomes available, more ambitious problems can be tackled. As a result, machine learning is widely used in computer science and other fields. However, developing successful machine learning applications requires a substantial amount of ‘black art’ that is hard to find in textbooks. This article summarizes twelve key lessons that machine learning researchers and practitioners have learned. These include pitfalls to avoid, important issues to focus on, and answers to common questions.
A Framework for Considering Comprehensibility Comprehensibility in modeling is the ability of stakeholders to understand relevant aspects of the modeling process. In this article, we provide a framework to help guide exploration of the space of comprehensibility challenges. We consider facets organized around key questions: Who is comprehending? Why are they trying to comprehend? Where in the process are they trying to comprehend? How can we help them comprehend? How do we measure their comprehension? With each facet we consider the broad range of options. We discuss why taking a broad view of comprehensibility in modeling is useful in identifying challenges and opportunities for solutions.
A Framework for Time-Consistent, Risk-Averse Model Predictive Control: Theory and Algorithms In this paper we present a framework for risk-averse model predictive control (MPC) of linear systems affected by multiplicative uncertainty. Our key innovation is to consider time-consistent, dynamic risk metrics as objective functions to be minimized. This framework is axiomatically justified in terms of time-consistency of risk assessments, is amenable to dynamic optimization, and is unifying in the sense that it captures a full range of risk preferences from risk-neutral to worst case. Within this framework, we propose and analyze an online risk-averse MPC algorithm that is provably stabilizing. Furthermore, by exploiting the dual representation of time-consistent, dynamic risk metrics, we cast the computation of the MPC control law as a convex optimization problem amenable to real-time implementation. Simulation results are presented and discussed.
A General Theory for Training Learning Machine Though deep learning is pushing machine learning to a new stage, basic theories of machine learning are still limited. The principle of learning, the role of prior knowledge, the role of neuron bias, and the basis for choosing the neural transfer function and cost function, etc., are still far from clear. In this paper, we present a general theoretical framework for machine learning. We classify prior knowledge into common and problem-dependent parts, and consider that the aim of learning is to maximally incorporate both. The principle we suggest for maximizing the former is the design risk minimization principle, while the neural transfer function, the cost function, as well as the pretreatment of samples, are given the role of maximizing the latter. The role of the neuron bias is explained from a different angle. We develop a Monte Carlo algorithm to establish the input-output responses, and we control the input-output sensitivity of a learning machine by controlling that of individual neurons. Applications to function approximation and smoothing, pattern recognition and classification are provided to illustrate how to train general learning machines based on our theory and algorithm. Our method may also enable new applications, such as transductive inference.
A Generalization of Convolutional Neural Networks to Graph-Structured Data This paper introduces a generalization of Convolutional Neural Networks (CNNs) from low-dimensional grid data, such as images, to graph-structured data. We propose a novel spatial convolution utilizing a random walk to uncover the relations within the input, analogous to the way the standard convolution uses the spatial neighborhood of a pixel on the grid. The convolution has an intuitive interpretation, is efficient and scalable, and can also be used on data with varying graph structure. Furthermore, this generalization can be applied to many standard regression or classification problems, by learning the underlying graph. We empirically demonstrate the performance of the proposed CNN on MNIST, and challenge the state of the art on the Merck molecular activity data set.
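As a rough, hedged sketch of the general idea of using a random walk to define a convolution support on a graph (an illustration of the concept, not the paper's architecture; the function names, graph, and tensor sizes below are made up):

    import numpy as np

    def random_walk_supports(adj, num_steps=3, support_size=5):
        """For each node, pick the nodes most visited by a short random walk.
        Illustrative sketch of random-walk proximity defining a convolution
        neighbourhood on a graph; not the paper's exact method."""
        deg = adj.sum(axis=1, keepdims=True)
        P = adj / np.maximum(deg, 1e-12)          # row-stochastic transition matrix
        walk = np.zeros_like(P)
        Pk = np.eye(adj.shape[0])
        for _ in range(num_steps):
            Pk = Pk @ P                           # k-step transition probabilities
            walk += Pk
        # most-visited nodes per row define the "receptive field" of each node
        return np.argsort(-walk, axis=1)[:, :support_size]

    def graph_conv(features, supports, weights):
        """Shared-weight convolution over each node's random-walk support."""
        gathered = features[supports]             # (n_nodes, support_size, n_feats)
        return np.einsum('nsf,sfo->no', gathered, weights)

    # Toy usage with a random graph and random weights (all sizes illustrative).
    rng = np.random.default_rng(0)
    adj = (rng.random((20, 20)) < 0.2).astype(float)
    feats = rng.standard_normal((20, 8))
    W = rng.standard_normal((5, 8, 4))
    out = graph_conv(feats, random_walk_supports(adj, support_size=5), W)   # (20, 4)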
A generalized concept-cognitive learning: A machine learning viewpoint Concept-cognitive learning (CCL) has been a hot topic in recent years, and it has attracted much attention from the communities of formal concept analysis, granular computing and cognitive computing. However, the relationship among cognitive computing (CC), concept-cognitive computing (CCC), and CCL has not been clearly described. To this end, we explain the relationship of CC, CCC, and CCL. Then, we propose a generalized CCL from the point of view of machine learning. Finally, experiments on seven data sets are conducted to evaluate the concept formation and concept-cognitive processes of the proposed generalized CCL.
A Gentle Introduction to Deep Learning in Medical Image Processing This paper tries to give a gentle introduction to deep learning in medical image processing, proceeding from theoretical foundations to applications. We first discuss general reasons for the popularity of deep learning, including several major breakthroughs in computer science. Next, we start reviewing the fundamental basics of the perceptron and neural networks, along with some fundamental theory that is often omitted. Doing so allows us to understand the reasons for the rise of deep learning in many application domains. Obviously medical image processing is one of these areas which has been largely affected by this rapid progress, in particular in image detection and recognition, image segmentation, image registration, and computer-aided diagnosis. There are also recent trends in physical simulation, modelling, and reconstruction that have led to astonishing results. Yet, some of these approaches neglect prior knowledge and hence bear the risk of producing implausible results. These apparent weaknesses highlight current limitations of deep learning. However, we also briefly discuss promising approaches that might be able to resolve these problems in the future.
A Gentle Introduction to Memetic Algorithms The generic denomination of ‘Memetic Algorithms’ (MAs) is used to encompass a broad class of metaheuristics (i.e. general-purpose methods aimed at guiding an underlying heuristic). The method is based on a population of agents and has proved to be of practical success in a variety of problem domains, in particular for the approximate solution of NP optimization problems. Unlike traditional Evolutionary Computation (EC) methods, MAs are intrinsically concerned with exploiting all available knowledge about the problem under study. The incorporation of problem domain knowledge is not an optional mechanism, but a fundamental feature that characterizes MAs. This functioning philosophy is perfectly illustrated by the term ‘memetic’. Coined by R. Dawkins, the word ‘meme’ denotes an analogue of the gene in the context of cultural evolution.
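To make the interplay between population-based search and individual learning concrete, here is a minimal sketch of a memetic algorithm in Python: a plain genetic algorithm whose offspring are refined by a bit-flip hill-climbing local search before rejoining the population. The OneMax fitness and all parameter values are illustrative choices, not taken from the tutorial.

    import random

    def memetic_optimize(fitness, n_bits=30, pop_size=20, generations=50, mutation_rate=0.02):
        """Minimal memetic algorithm sketch: GA + hill-climbing refinement of offspring."""
        def local_search(ind):
            # Bit-flip hill climbing: the 'memetic' individual-learning step.
            improved = True
            while improved:
                improved = False
                for i in range(len(ind)):
                    neighbour = ind[:i] + [1 - ind[i]] + ind[i + 1:]
                    if fitness(neighbour) > fitness(ind):
                        ind, improved = neighbour, True
            return ind

        pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            parents = pop[:pop_size // 2]
            offspring = []
            while len(offspring) < pop_size - len(parents):
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, n_bits)
                child = a[:cut] + b[cut:]                                    # one-point crossover
                child = [1 - g if random.random() < mutation_rate else g for g in child]
                offspring.append(local_search(child))                        # local refinement
            pop = parents + offspring
        return max(pop, key=fitness)

    # Example: maximise the number of ones (OneMax).
    best = memetic_optimize(fitness=sum)

In a real MA the local-search step would encode problem-specific knowledge, which is exactly the point the tutorial stresses.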
A Gentle Introduction to Supervised Machine Learning This tutorial is based on the lecture notes for the courses ‘Machine Learning: Basic Principles’ and ‘Artificial Intelligence’, which I have taught during fall 2017 and spring 2018 at Aalto University. The aim is to provide an accessible introduction to some of the main concepts and methods within supervised machine learning. Most of the current systems which are considered as (artificially) intelligent are based on some form of supervised machine learning. After discussing the main building blocks of a formal machine learning problem, some of the most popular algorithmic design patterns for machine learning methods are presented.
A Graph Summarization: A Survey While advances in computing resources have made processing enormous amounts of data possible, human ability to identify patterns in such data has not scaled accordingly. Thus, efficient computational methods for condensing and simplifying data are becoming vital for extracting actionable insights. In particular, while data summarization techniques have been studied extensively, only recently has summarizing interconnected data, or graphs, become popular. This survey is a structured, comprehensive overview of the state-of-the-art methods for summarizing graph data. We first broach the motivation behind and the challenges of graph summarization. We then categorize summarization approaches by the type of graphs taken as input and further organize each category by core methodology. Finally, we discuss applications of summarization on real-world graphs and conclude by describing some open problems in the field.
A History of Bayesian Neural Networks (Slide Deck)
A Joint Model for Question Answering and Question Generation We propose a generative machine comprehension model that learns jointly to ask and answer questions based on documents. The proposed model uses a sequence-to-sequence framework that encodes the document and generates a question (answer) given an answer (question). Significant improvement in model performance is observed empirically on the SQuAD corpus, confirming our hypothesis that the model benefits from jointly learning to perform both tasks. We believe the joint model’s novelty offers a new perspective on machine comprehension beyond architectural engineering, and serves as a first step towards autonomous information seeking.
A joint renewal process used to model event based data In many industrial situations, where systems must be monitored using data recorded throughout a historical period of observation, one cannot fully rely on sensor data, but often only has event data to work with. This, in particular, holds for legacy data, whose evaluation is of interest to systems analysts, reliability planners, maintenance engineers etc. Event data, herein defined as a collection of triples containing a time stamp, a failure code and possibly a descriptive text, can best be evaluated by using the paradigm of joint renewal processes. The present paper formulates a model of such a process, which proceeds by means of state dependent event rates. The system state is defined, at each point in time, as the vector of backward times, whereby the backward time of an event is the time passed since the last occurrence of this event. The present paper suggests a mathematical model relating event rates linearly to the backward times. The parameters can then be estimated by means of the method of moments. In a subsequent step, these event rates can be used in a Monte-Carlo simulation to forecast the numbers of occurrences of each failure in a future time interval, based on the current system state. The model is illustrated by means of an example. As forecasting system malfunctions receives increasing attention in light of modern condition-based maintenance policies, this approach enables decision makers to use existing event data to implement state dependent maintenance measures.
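The forecasting step sketched above (event rates linear in the backward times, rolled forward by Monte Carlo to estimate failure counts) can be illustrated roughly as follows. This is a discretized, illustrative approximation under assumed parameters, not the paper's estimation or simulation code; in practice base_rates and coupling would be fitted by the method of moments.

    import numpy as np

    def simulate_event_counts(base_rates, coupling, horizon, dt=1.0, n_runs=1000, seed=0):
        """Monte-Carlo sketch of a joint renewal process whose event rates depend
        linearly on the backward times (time since each event type last occurred).
        base_rates: (k,) baseline rate per event type; coupling: (k, k) linear
        influence of each backward time on each rate.  Illustrative only."""
        rng = np.random.default_rng(seed)
        k = len(base_rates)
        counts = np.zeros((n_runs, k))
        for run in range(n_runs):
            backward = np.zeros(k)                          # time since last occurrence
            t = 0.0
            while t < horizon:
                rates = np.maximum(base_rates + coupling @ backward, 0.0)
                events = rng.random(k) < rates * dt         # thinned Bernoulli approximation
                counts[run] += events
                backward = np.where(events, 0.0, backward + dt)
                t += dt
        return counts.mean(axis=0)                          # expected counts per event type

    # Example with assumed parameters: two event types, weak mutual influence.
    mean_counts = simulate_event_counts(np.array([0.02, 0.01]),
                                        np.array([[0.0, 0.001], [0.0005, 0.0]]),
                                        horizon=500)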
A Large-Scale Comparison of Historical Text Normalization Systems There is no consensus on the state-of-the-art approach to historical text normalization. Many techniques have been proposed, including rule-based methods, distance metrics, character-based statistical machine translation, and neural encoder–decoder models, but studies have used different datasets, different evaluation methods, and have come to different conclusions. This paper presents the largest study of historical text normalization done so far. We critically survey the existing literature and report experiments on eight languages, comparing systems spanning all categories of proposed normalization techniques, analysing the effect of training data quantity, and using different evaluation methods. The datasets and scripts are made publicly available.
A Learning Approach to Secure Learning Deep Neural Networks (DNNs) have been shown to be vulnerable against adversarial examples, which are data points cleverly constructed to fool the classifier. Such attacks can be devastating in practice, especially as DNNs are being applied to ever more critical tasks like image recognition in autonomous driving. In this paper, we introduce a new perspective on the problem. We do so by first defining the robustness of a classifier to adversarial exploitation. Next, we show that the problems of adversarial example generation and defense can both be posed as learning problems, which are duals of each other. We also show formally that our defense aims to increase the robustness of the classifier. We demonstrate the efficacy of our techniques by experimenting with the MNIST and CIFAR-10 datasets.
A literature review on current approaches and applications of fuzzy expert systems The main purpose of this study is to identify trends in the published research on applications of fuzzy expert and knowledge-based systems, based on a classification of studies from the last decade. The present investigation covers 60 articles from related scholarly journals, international conference proceedings and some major literature review papers. Our findings reveal an upward trend in the number of recent publications, which is evidence of the growing popularity of the various applications of fuzzy expert systems. This rise is mainly in medical neuro-fuzzy and fuzzy expert systems. Moreover, another critical observation is that many modern industrial applications are being extended to employ knowledge-based systems built by extracting experts' knowledge.
A Literature Survey on Ontology of Different Computing Platforms in Smart Environments Smart environments integrate various types of technologies, including cloud computing, fog computing, and the IoT paradigm. In such environments, it is essential to organize and manage the broad and complex set of heterogeneous resources efficiently. For this reason, resource classification and categorization become a vital issue in the control system. In this paper we make an exhaustive literature survey of the various computing systems and architectures that define any type of ontology in the context of smart environments, considering both authors that explicitly propose a resource categorization and authors that implicitly propose some resource classification as part of their system architecture. As part of this research survey, we have built a table that summarizes all the research works considered, and which provides a compact and graphical snapshot of the current classification trends. The goal and primary motivation of this literature survey has been to understand the current state of the art and identify the gaps between the different computing paradigms involved in smart environment scenarios. As a result, we have found that it is essential to consider several computing paradigms and technologies together, and that there is not yet any research work that integrates a merged resource classification, taxonomy or ontology as required in such heterogeneous scenarios.
A Mathematical Theory for Clustering in Metric Spaces Clustering is one of the most fundamental problems in data analysis and it has been studied extensively in the literature. Though many clustering algorithms have been proposed, clustering theories that justify the use of these clustering algorithms are still unsatisfactory. In particular, one of the fundamental challenges is to address the following question: What is a cluster in a set of data points? In this paper, we make an attempt to address such a question by considering a set of data points associated with a distance measure (metric). We first propose a new cohesion measure in terms of the distance measure. Using the cohesion measure, we define a cluster as a set of points that are cohesive to themselves. For such a definition, we show there are various equivalent statements that have intuitive explanations. We then consider the second question: How do we find clusters and good partitions of clusters under such a definition? For such a question, we propose a hierarchical agglomerative algorithm and a partitional algorithm. Unlike standard hierarchical agglomerative algorithms, our hierarchical agglomerative algorithm has a specific stopping criterion and it stops with a partition of clusters. Our partitional algorithm, called the K-sets algorithm in the paper, appears to be a new iterative algorithm. Unlike the Lloyd iteration that needs two-step minimization, our K-sets algorithm only takes one-step minimization. One of the most interesting findings of our paper is the duality result between a distance measure and a cohesion measure. Such a duality result leads to a dual K-sets algorithm for clustering a set of data points with a cohesion measure. The dual K-sets algorithm converges in the same way as a sequential version of the classical kernel K-means algorithm. The key difference is that a cohesion measure does not need to be positive semi-definite.
A Mathematical Theory of Interpersonal Interactions and Group Behavior Emergent collective group processes and capabilities have been studied through analysis of transactive memory, measures of group task performance, and group intelligence, among others. In their approach to collective behaviors, these approaches transcend traditional studies of group decision making that focus on how individual preferences combine through power relationships, social choice by voting, negotiation and game theory. Understanding more generally how individuals contribute to group effectiveness is important to a broad set of social challenges. Here we formalize a dynamic theory of interpersonal communications that classifies individual acts, sequences of actions, group behavioral patterns, and individuals engaged in group decision making. Group decision making occurs through a sequence of communications that convey personal attitudes and preferences among members of the group. The resulting formalism is relevant to psychosocial behavior analysis, rules of order, organizational structures and personality types, as well as formalized systems such as social choice theory. More centrally, it provides a framework for quantifying and even anticipating the structure of informal dialog, allowing specific conversations to be coded and analyzed in relation to a quantitative model of the participating individuals and the parameters that govern their interactions.
A mathematical theory of semantic development in deep neural networks An extensive body of empirical research has revealed remarkable regularities in the acquisition, organization, deployment, and neural representation of human semantic knowledge, thereby raising a fundamental conceptual question: what are the theoretical principles governing the ability of neural networks to acquire, organize, and deploy abstract knowledge by integrating across many individual experiences? We address this question by mathematically analyzing the nonlinear dynamics of learning in deep linear networks. We find exact solutions to this learning dynamics that yield a conceptual explanation for the prevalence of many disparate phenomena in semantic cognition, including the hierarchical differentiation of concepts through rapid developmental transitions, the ubiquity of semantic illusions between such transitions, the emergence of item typicality and category coherence as factors controlling the speed of semantic processing, changing patterns of inductive projection over development, and the conservation of semantic similarity in neural representations across species. Thus, surprisingly, our simple neural model qualitatively recapitulates many diverse regularities underlying semantic development, while providing analytic insight into how the statistical structure of an environment can interact with nonlinear deep learning dynamics to give rise to these regularities.
A Measure of Similarity Between Graph Vertices: Applications to Synonym Extraction and Web Searching We introduce a concept of similarity between vertices of directed graphs. Let G_A and G_B be two directed graphs with respectively n_A and n_B vertices. We define an n_B × n_A similarity matrix S whose real entry s_ij expresses how similar vertex j (in G_A) is to vertex i (in G_B): we say that s_ij is their similarity score. The similarity matrix can be obtained as the limit of the normalized even iterates of S(k+1) = B S(k) A^T + B^T S(k) A, where A and B are the adjacency matrices of the graphs and S(0) is a matrix whose entries are all equal to one. In the special case where G_A = G_B = G, the matrix S is square and the score s_ij is the similarity score between the vertices i and j of G. We point out that Kleinberg´s ‘hub and authority’ method to identify web pages relevant to a given query can be viewed as a special case of our definition in the case where one of the graphs has two vertices and a unique directed edge between them. In analogy to Kleinberg, we show that our similarity scores are given by the components of a dominant eigenvector of a non-negative matrix. Potential applications of our similarity concept are numerous. We illustrate an application for the automatic extraction of synonyms in a monolingual dictionary.
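A minimal numerical sketch of the iteration above (for illustration only): it performs two updates per loop so that every stored iterate is an even one, and uses a simple Frobenius normalization and stopping test.

    import numpy as np

    def similarity_matrix(A, B, n_iter=100, tol=1e-9):
        """Vertex-to-vertex similarity scores between two directed graphs.
        A, B: adjacency matrices of G_A (n_A x n_A) and G_B (n_B x n_B).
        Returns the n_B x n_A matrix S, the limit of the normalized even
        iterates of S <- B S A^T + B^T S A, started from the all-ones matrix."""
        nA, nB = A.shape[0], B.shape[0]
        S = np.ones((nB, nA))
        for _ in range(n_iter):
            S_new = B @ S @ A.T + B.T @ S @ A
            S_new = B @ S_new @ A.T + B.T @ S_new @ A   # two steps = one even iterate
            S_new /= np.linalg.norm(S_new)              # Frobenius normalization
            if np.linalg.norm(S_new - S) < tol:
                return S_new
            S = S_new
        return S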
A Model Explanation System We propose a general model explanation system (MES) for ‘explaining’ the output of black box classifiers. In this introduction we use the motivating example of a classifier trained to detect fraud in a credit card transaction history. The key aspect is that we provide explanations applicable to a single prediction, rather than provide an interpretable set of parameters. The labels in the provided examples are usually negative. Hence, we focus on explaining positive predictions (alerts). In many classification applications, but especially in fraud detection, there is an expectation of false positives. Alerts are given to a human analyst before any further action is taken. Analysts often insist on understanding ‘why’ there was an alert, since an opaque alert makes it difficult for them to proceed. Analogous scenarios occur in computer vision, credit risk, spam detection, etc. Furthermore, the MES framework is useful for model criticism. In the world of generative models, practitioners often generate synthetic data from a trained model to get an idea of ‘what the model is doing’. Our MES framework augments such tools. As an added benefit, MES is applicable to completely non-probabilistic black boxes that only provide hard labels. In Section 3 we use MES to visualize the decisions of a face recognition system.
A Model for General Intelligence The overarching problem in artificial intelligence (AI) is that we do not understand the intelligence process well enough to enable the development of adequate computational models. Much work has been done in AI over the years at lower levels, but a big part of what has been missing involves the high level, abstract, general nature of intelligence. We address this gap by developing a model for general intelligence. To accomplish this, we focus on three basic aspects of intelligence. First, we must realize the general order and nature of intelligence at a high level. Second, we must come to know what these realizations mean with respect to the overall intelligence process. Third, we must describe these realizations as clearly as possible. We propose a hierarchical model to help capture and exploit the order within intelligence. The underlying order involves patterns of signals that become organized, stored and activated in space and time. These patterns can be described using a simple, general hierarchy, with physical signals at the lowest level, information in the middle, and abstract signal representations at the top. This high level perspective provides a big picture that literally helps us see the intelligence process, thereby enabling fundamental realizations, a better understanding and clear descriptions of the intelligence process. The resulting model can be used to support all kinds of information processing across multiple levels of abstraction. As computer technology improves, and as cooperation increases between humans and computers, people will become more efficient and more productive in performing their information processing tasks.
A Model of Modeling We propose a formal model of scientific modeling, geared to applications of decision theory and game theory. The model highlights the freedom that modelers have in conceptualizing social phenomena using general paradigms in these fields. It may shed some light on the distinctions between (i) refutation of a theory and a paradigm, (ii) notions of rationality, (iii) modes of application of decision models, and (iv) roles of economics as an academic discipline. Moreover, the model suggests that all four distinctions have some common features that are captured by the model.
A model of text for experimentation in the social sciences Statistical models of text have become increasingly popular in statistics and computer science as a method of exploring large document collections. Social scientists often want to move beyond exploration, to measurement and experimentation, and make inference about social and political processes that drive discourse and content. In this paper, we develop a model of text data that supports this type of substantive research. Our approach is to posit a hierarchical mixed membership model for analyzing topical content of documents, in which mixing weights are parameterized by observed covariates. In this model, topical prevalence and topical content are specified as a simple generalized linear model on an arbitrary number of document-level covariates, such as news source and time of release, enabling researchers to introduce elements of the experimental design that informed document collection into the model, within a generally applicable framework. We demonstrate the proposed methodology by analyzing a collection of news reports about China, where we allow the prevalence of topics to evolve over time and vary across newswire services. Our methods help quantify the effect of news wire source on both the frequency and nature of topic coverage. All the methods we describe are available as part of the open source R package stm.
A Natural Language Query Interface to Structured Information Accessing structured data such as that encoded in ontologies and knowledge bases can be done using either syntactically complex formal query languages like SPARQL or complicated form interfaces that require expensive customisation to each particular application domain. This paper presents the QuestIO system – a natural language interface for accessing structured information, which is domain independent and easy to use without training. It aims to bring the simplicity of Google´s search interface to conceptual retrieval by automatically converting short conceptual queries into formal ones, which can then be executed against any semantic repository. QuestIO was developed specifically to be robust with regard to language ambiguities, incomplete or syntactically ill-formed queries, by harnessing the structure of ontologies, fuzzy string matching, and ontology-motivated similarity metrics.
A Neural Bayesian Estimator for Conditional Probability Densities This article describes a robust algorithm to estimate a conditional probability density f(t|x) as a non-parametric smooth regression function. It is based on a neural network and the Bayesian interpretation of the network output as an a posteriori probability. The network is trained using example events from history or simulation, which define the underlying probability density f(t, x). Once trained, the network is applied on new, unknown examples x, for which it can predict the probability distribution of the target variable t. Event-by-event knowledge of the smooth function f(t|x) can be very useful, e.g. in maximum likelihood fits or for forecasting tasks. No assumptions are necessary about the distribution, and non-Gaussian tails are accounted for automatically. Important quantities like the median, mean value, left and right standard deviations, moments and expectation values of any function of t are readily derived from it. The algorithm can be considered as an event-by-event unfolding and leads to statistically optimal reconstruction. The largest benefit of the method lies in complicated problems, when the measurements x are only relatively weakly correlated to the output t. To ensure optimal generalisation and to avoid overfitting, the networks are regularised by extended versions of weight decay. The regularisation parameters are determined during the online learning of the network by relations obtained from Bayesian statistics. Some toy Monte Carlo tests and first real application examples from high-energy physics and econometrics are discussed.
A new look at clustering through the lens of deep convolutional neural networks Classification and clustering have been studied separately in machine learning and computer vision. Inspired by the recent success of deep learning models in solving various vision problems (e.g., object recognition, semantic segmentation) and the fact that humans serve as the gold standard in assessing clustering algorithms, here, we advocate for a unified treatment of the two problems and suggest that hierarchical frameworks that progressively build complex patterns on top of the simpler ones (e.g., convolutional neural networks) offer a promising solution. We do not dwell much on the learning mechanisms in these frameworks as they are still a matter of debate, with respect to biological constraints. Instead, we emphasize the compositionality of the real world structures and objects. In particular, we show that CNNs, trained end to end using back propagation with noisy labels, are able to cluster data points belonging to several overlapping shapes, and do so much better than state-of-the-art algorithms. The main takeaway lesson from our study is that mechanisms of human vision, particularly the hierarchical organization of the visual ventral stream, should be taken into account in clustering algorithms (e.g., for learning representations in an unsupervised manner or with minimum supervision) to reach human level clustering performance. This, by no means, suggests that other methods do not hold merits. For example, methods relying on pairwise affinities (e.g., spectral clustering) have been very successful in many cases but still fail in some cases (e.g., overlapping clusters).
A New View of Predictive State Methods for Dynamical System Learning Recently there has been substantial interest in predictive state methods for learning dynamical systems: these algorithms are popular since they often offer a good tradeoff between computational speed and statistical efficiency. Despite their desirable properties, though, predictive state methods can sometimes be difficult to use in practice. E.g., in contrast to the rich literature on supervised learning methods, which allows us to choose from an extensive menu of models and algorithms to suit the prior beliefs we have about properties of the function to be learned, predictive state dynamical system learning methods are comparatively inflexible: it is as if we were restricted to use only linear regression instead of being allowed to choose decision trees, nonparametric regression, or the lasso. To address this problem, we propose a new view of predictive state methods in terms of instrumental-variable regression. This view allows us to construct a wide variety of dynamical system learners simply by swapping in different supervised learning methods. We demonstrate the effectiveness of our proposed methods by experimenting with non-linear regression to learn a hidden Markov model, showing that the resulting algorithm outperforms its linear counterpart; the correctness of this algorithm follows directly from our general analysis.
A Non-Geek´s Big Data Playbook This Big Data Playbook demonstrates in six common ‘plays’ how Apache Hadoop supports and extends the EDW ecosystem.
A novel algorithm for fast and scalable subspace clustering of high-dimensional data Rapid growth of high dimensional datasets in recent years has created an emergent need to extract the knowledge underlying them. Clustering is the process of automatically finding groups of similar data points in the space of the dimensions or attributes of a dataset. Finding clusters in the high dimensional datasets is an important and challenging data mining problem. Data group together differently under different subsets of dimensions, called subspaces. Quite often a dataset can be better understood by clustering it in its subspaces, a process called subspace clustering. But the exponential growth in the number of these subspaces with the dimensionality of data makes the whole process of subspace clustering computationally very expensive. There is a growing demand for efficient and scalable subspace clustering solutions in many Big data application domains like biology, computer vision, astronomy and social networking. Apriori based hierarchical clustering is a promising approach to find all possible higher dimensional subspace clusters from the lower dimensional clusters using a bottom-up process. However, the performance of the existing algorithms based on this approach deteriorates drastically with the increase in the number of dimensions. Most of these algorithms require multiple database scans and generate a large number of redundant subspace clusters, either implicitly or explicitly, during the clustering process. In this paper, we present SUBSCALE, a novel clustering algorithm to find non-trivial subspace clusters with minimal cost and it requires only k database scans for a k-dimensional data set. Our algorithm scales very well with the dimensionality of the dataset and is highly parallelizable. We present the details of the SUBSCALE algorithm and its evaluation in this paper.
A novel algorithmic approach to Bayesian Logic Regression Logic regression was developed more than a decade ago as a tool to construct predictors from Boolean combinations of binary covariates. It has been mainly used to model epistatic effects in genetic association studies, which is very appealing due to the intuitive interpretation of logic expressions to describe the interaction between genetic variations. Nevertheless logic regression has remained less well known than other approaches to epistatic association mapping. Here we will adopt an advanced evolutionary algorithm called GMJMCMC (Genetically modified Mode Jumping Markov Chain Monte Carlo) to perform Bayesian model selection in the space of logic regression models. After describing the algorithmic details of GMJMCMC we perform a comprehensive simulation study that illustrates its performance given logic regression terms of various complexity. Specifically GMJMCMC is shown to be able to identify three-way and even four-way interactions with relatively large power, a level of complexity which has not been achieved by previous implementations of logic regression. We apply GMJMCMC to reanalyze QTL mapping data for Recombinant Inbred Lines in Arabidopsis thaliana and from a backcross population in Drosophila where we identify several interesting epistatic effects.
A novel framework to analyze road accident time series data Road accident data analysis plays an important role in identifying the key factors associated with road accidents. These factors help in taking preventive measures against road accidents. Various studies have analyzed road accident data using traditional statistical techniques and data mining techniques, all focusing on identifying key factors associated with road accidents in different countries. Road accidents are uncertain and unpredictable events which can occur in any circumstances. Also, road accidents do not have similar impacts in every region or district: the accident rate may be increasing in one district while remaining lower in others. Hence, road safety efforts should focus on those regions or districts where the road accident trend is increasing. Time series analysis is an important area of study that can help in identifying increasing or decreasing trends in different districts. In this paper, we propose a framework to analyze road accident time series data that takes 39 time series from 39 districts of the Gujarat and Uttarakhand states of India. This framework segments the time series into different clusters. A time series merging algorithm is proposed to find the representative time series (RTS) for each cluster. This RTS is further used for trend analysis of the different clusters. The results reveal that the road accident trend is set to increase in certain clusters, and those districts should be the prime concern for preventive measures against road accidents.
A Practical Guide to Support Vector Classification The support vector machine (SVM) is a popular classification technique. However, beginners who are not familiar with SVM often get unsatisfactory results since they miss some easy but significant steps. In this guide, we propose a simple procedure which usually gives reasonable results.
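A hedged sketch of the kind of recipe such guides recommend (the guide itself targets LIBSVM; this is an analogous scikit-learn pipeline, and the dataset and parameter grids are illustrative): scale the features, use an RBF kernel, and pick C and gamma by cross-validated grid search over a coarse exponential grid.

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)     # stand-in dataset for illustration
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Scale features, then tune C and gamma on an exponential grid with 5-fold CV.
    pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
    param_grid = {'svc__C': [2 ** k for k in range(-5, 16, 4)],
                  'svc__gamma': [2 ** k for k in range(-15, 4, 4)]}
    search = GridSearchCV(pipe, param_grid, cv=5).fit(X_train, y_train)
    print(search.best_params_, search.score(X_test, y_test))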
A practical tutorial on autoencoders for nonlinear feature fusion: Taxonomy, models, software and guidelines Many of the existing machine learning algorithms, both supervised and unsupervised, depend on the quality of the input characteristics to generate a good model. The amount of these variables is also important, since performance tends to decline as the input dimensionality increases, hence the interest in using feature fusion techniques, able to produce feature sets that are more compact and higher level. A plethora of procedures to fuse original variables for producing new ones has been developed in the past decades. The most basic ones use linear combinations of the original variables, such as PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis), while others find manifold embeddings of lower dimensionality based on non-linear combinations, such as Isomap or LLE (Locally Linear Embedding) techniques. More recently, autoencoders (AEs) have emerged as an alternative to manifold learning for conducting nonlinear feature fusion. Dozens of AE models have been proposed lately, each with its own specific traits. Although many of them can be used to generate reduced feature sets through the fusion of the original ones, there are also AEs designed with other applications in mind. The goal of this paper is to provide the reader with a broad view of what an AE is, how they are used for feature fusion, a taxonomy gathering a broad range of models, and how they relate to other classical techniques. In addition, a set of didactic guidelines on how to choose the proper AE for a given task is supplied, together with a discussion of the software tools available. Finally, two case studies illustrate the usage of AEs with datasets of handwritten digits and breast cancer.
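For concreteness, a minimal undercomplete autoencoder sketch in PyTorch in which the encoder output serves as the fused, lower-dimensional feature set. The layer sizes, stand-in data and training loop are placeholders; the paper's taxonomy covers many richer variants (denoising, sparse, variational and others).

    import torch
    from torch import nn

    class BasicAutoencoder(nn.Module):
        """Minimal undercomplete autoencoder; the encoder output is the fused feature set."""
        def __init__(self, n_inputs=30, n_fused=5):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(n_inputs, 16), nn.ReLU(),
                                         nn.Linear(16, n_fused))
            self.decoder = nn.Sequential(nn.Linear(n_fused, 16), nn.ReLU(),
                                         nn.Linear(16, n_inputs))

        def forward(self, x):
            return self.decoder(self.encoder(x))

    model = BasicAutoencoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.randn(256, 30)                        # stand-in data; replace with real features
    for _ in range(100):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), x)  # reconstruction objective
        loss.backward()
        opt.step()
    fused_features = model.encoder(x).detach()      # compact representation for downstream use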
A Primer on Neural Network Models for Natural Language Processing Over the past few years, neural networks have re-emerged as powerful machine-learning models, yielding state-of-the-art results in fields such as image recognition and speech processing. More recently, neural network models started to be applied also to textual natural language signals, again with very promising results. This tutorial surveys neural network models from the perspective of natural language processing research, in an attempt to bring natural-language researchers up to speed with the neural techniques. The tutorial covers input encoding for natural language tasks, feed-forward networks, convolutional networks, recurrent networks and recursive networks, as well as the computation graph abstraction for automatic gradient computation.
A Probabilistic Theory of Deep Learning A grand challenge in machine learning is the development of computational algorithms that match or outperform humans in perceptual inference tasks such as visual object and speech recognition. The key factor complicating such tasks is the presence of numerous nuisance variables, for instance, the unknown object position, orientation, and scale in object recognition or the unknown voice pronunciation, pitch, and speed in speech recognition. Recently, a new breed of deep learning algorithms have emerged for high-nuisance inference tasks; they are constructed from many layers of alternating linear and nonlinear processing units and are trained using large-scale algorithms and massive amounts of training data. The recent success of deep learning systems is impressive – they now routinely yield pattern recognition systems with near- or super-human capabilities – but a fundamental question remains: Why do they work? Intuitions abound, but a coherent framework for understanding, analyzing, and synthesizing deep learning architectures has remained elusive. We answer this question by developing a new probabilistic framework for deep learning based on a Bayesian generative probabilistic model that explicitly captures variation due to nuisance variables. The graphical structure of the model enables it to be learned from data using classical expectation-maximization techniques. Furthermore, by relaxing the generative model to a discriminative one, we can recover two of the current leading deep learning systems, deep convolutional neural networks (DCNs) and random decision forests (RDFs), providing insights into their successes and shortcomings as well as a principled route to their improvement.
A rational analysis of curiosity We present a rational analysis of curiosity, proposing that people’s curiosity is driven by seeking stimuli that maximize their ability to make appropriate responses in the future. This perspective offers a way to unify previous theories of curiosity into a single framework. Experimental results confirm our model’s predictions, showing how the relationship between curiosity and confidence can change significantly depending on the nature of the environment.
A Recent Survey on the Applications of Genetic Programming in Image Processing During the last two decades, Genetic Programming (GP) has been largely used to tackle optimization, classification, and automatic feature selection related tasks. The widespread use of GP is mainly due to its flexible and comprehensible tree-type structure. Similarly, research is also gaining momentum in the field of Image Processing (IP) because of its promising results over wide areas of applications ranging from medical IP to multispectral imaging. IP is mainly involved in applications such as computer vision, pattern recognition, image compression, storage and transmission, and medical diagnostics. This prevailing nature of images and their associated algorithmic complexities gave an impetus to the exploration of GP. GP has thus been used in different ways for IP since its inception. Many interesting GP techniques have been developed and employed in the field of IP. To give the research community an extensive view of these techniques, this paper presents the diverse applications of GP in IP and provides useful resources for further research. Also, a comparison of the different parameters used in ten different applications of IP is summarized in tabular form. Moreover, an analysis of the different parameters used in IP related tasks is carried out to save the time needed in future for evaluating the parameters of GP. As more advancement is made in GP methodologies, its success in solving complex tasks not only related to IP but also in other fields will increase. Additionally, guidelines are provided for applying GP in IP related tasks, pros and cons of GP techniques are discussed, and some future directions are also set.
A Reliability Theory of Truth Our approach is basically a coherence approach, but we avoid the well-known pitfalls of coherence theories of truth. Consistency is replaced by reliability, which expresses support and attack, and, in principle, every theory (or agent, message) counts. At the same time, we do not require a privileged access to ‘reality’. A centerpiece of our approach is that we attribute reliability also to agents, messages, etc., so an unreliable source of information will be less important in the future. Our ideas can also be extended to value systems, and even actions, e.g., of animals.
A Revealing Introduction to Hidden Markov Models Suppose we want to determine the average annual temperature at a particular location on earth over a series of years. To make it interesting, suppose the years we are concerned with lie in the distant past, before thermometers were invented. Since we can’t go back in time, we instead look for indirect evidence of the temperature…
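As a minimal sketch of the kind of model this introduction builds toward, assume (for illustration) two hidden annual-temperature states, hot and cold, observed only indirectly through a discrete proxy such as tree-ring size (small, medium, large). The probability values below are placeholders, and the forward algorithm shown is the standard way to score an observation sequence under such a model.

    import numpy as np

    # Placeholder HMM parameters: rows are hidden states (hot, cold),
    # observation symbols are tree-ring sizes (0=small, 1=medium, 2=large).
    A  = np.array([[0.7, 0.3],          # state transition probabilities
                   [0.4, 0.6]])
    B  = np.array([[0.1, 0.4, 0.5],     # emission probabilities P(ring size | state)
                   [0.7, 0.2, 0.1]])
    pi = np.array([0.6, 0.4])           # initial state distribution

    def forward(obs):
        """Probability of an observation sequence under the HMM (forward algorithm)."""
        alpha = pi * B[:, obs[0]]
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]
        return alpha.sum()

    print(forward([0, 1, 0, 2]))        # e.g. small, medium, small, large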
A review and comparative study on functional time series techniques This paper reviews the main estimation and prediction results derived in the context of functional time series, when Hilbert and Banach spaces are considered, especially in the context of autoregressive processes of order one (ARH(1) and ARB(1) processes, for H and B being a Hilbert and Banach space, respectively). Particularly, we pay attention to the estimation and prediction results, and statistical tests, derived in both parametric and non-parametric frameworks. A comparative study between different ARH(1) prediction approaches is developed in the simulation study undertaken.
A Review for Weighted MinHash Algorithms Data similarity (or distance) computation is a fundamental research topic which underpins many high-level applications based on similarity measures in machine learning and data mining. However, in large-scale real-world scenarios, exact similarity computation has become daunting due to the ‘3V’ nature (volume, velocity and variety) of big data. In such cases, hashing techniques have been verified to efficiently conduct similarity estimation in terms of both theory and practice. Currently, MinHash is a popular technique for efficiently estimating the Jaccard similarity of binary sets, and furthermore, weighted MinHash is generalized to estimate the generalized Jaccard similarity of weighted sets. This review focuses on categorizing and discussing the existing works on weighted MinHash algorithms. In this review, we mainly categorize the weighted MinHash algorithms into quantization-based approaches, ‘active index’-based ones and others, and show the evolution and inherent connection of the weighted MinHash algorithms, from the integer weighted MinHash algorithms to real-valued weighted MinHash ones (particularly the Consistent Weighted Sampling scheme). Also, we have developed a Python toolbox for the algorithms and released it on our GitHub. Based on the toolbox, we experimentally conduct a comprehensive comparative study of the standard MinHash algorithm and the weighted MinHash ones.
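For context, here is a tiny sketch of the standard (unweighted) MinHash estimator that the weighted variants generalize. The salted built-in hash is a simple illustrative choice; it is not one of the surveyed weighted algorithms or the released toolbox.

    import random

    def minhash_signature(items, num_hashes=128, seed=0):
        """Standard (unweighted) MinHash signature of a set of hashable items."""
        rng = random.Random(seed)
        salts = [rng.getrandbits(64) for _ in range(num_hashes)]
        return [min(hash((salt, x)) for x in items) for salt in salts]

    def estimate_jaccard(sig_a, sig_b):
        """Fraction of matching minima is an unbiased estimate of the Jaccard similarity."""
        return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

    A, B = set('abcdefgh'), set('defghijk')
    print(estimate_jaccard(minhash_signature(A), minhash_signature(B)))   # true Jaccard = 5/11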
A Review of 40 Years of Cognitive Architecture Research: Focus on Perception, Attention, Learning and Applications In this paper we present a broad overview of the last 40 years of research on cognitive architectures. Although the number of existing architectures is nearing several hundred, most of the existing surveys do not reflect this growth and focus on a handful of well-established architectures. While their contributions are undeniable, they represent only a part of the research in the field. Thus, in this survey we wanted to shift the focus towards a more inclusive and high-level overview of the research in cognitive architectures. Our final set of 86 architectures includes 55 that are still actively developed, and borrow from a diverse set of disciplines, spanning areas from psychoanalysis to neuroscience. To keep the length of this paper within reasonable limits we discuss only the core cognitive abilities, such as perception, attention mechanisms, learning and memory structure. To assess the breadth of practical applications of cognitive architectures we gathered information on over 700 practical projects implemented using the cognitive architectures in our list. We use various visualization techniques to highlight overall trends in the development of the field. Our analysis of practical applications shows that most architectures are very narrowly focused on a particular application domain. Furthermore, there is an apparent gap between general research in robotics and computer vision and research in these areas within the cognitive architectures field. It is very clear that biologically inspired models do not have the same range and efficiency compared to the systems based on engineering principles and heuristics. Another observation is related to a general lack of collaboration. Several factors hinder communication, such as the closed nature of the individual projects (only one-third of the architectures reviewed here are open-source) and terminological differences.
A review of change point detection methods In this work, methods to detect one or several change points in multivariate time series are reviewed. They include retrospective (off-line) procedures such as maximum likelihood estimation, regression, kernel methods, etc. In this large area of research, applications are numerous and diverse; many different models and operational constraints (on precision, complexity, …) exist. A formal framework for change point detection is introduced to make sense of this significant body of work. Precisely, all methods are described as a collection of three elements: a cost function, a search method and a constraint on the number of changes to detect. For a given method, we detail the assumed signal model, the associated algorithm, theoretical guarantees (if any) and the application domain. This approach is intended to facilitate prototyping of change point detection methods: for a given segmentation task, one can appropriately choose among the described elements to design an algorithm.
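A hedged sketch of the cost-function / search-method / constraint decomposition described above, using an L2 segment cost, greedy binary segmentation as the search method, and a fixed number of changes as the constraint. It is a generic illustration, not code from the review.

    import numpy as np

    def l2_cost(x):
        """Cost of a segment: sum of squared deviations from the segment mean."""
        return ((x - x.mean(axis=0)) ** 2).sum()

    def best_split(x, min_size=5):
        """Exhaustive scan for the single split that minimises the total cost."""
        n = len(x)
        costs = [l2_cost(x[:t]) + l2_cost(x[t:]) for t in range(min_size, n - min_size)]
        t = int(np.argmin(costs)) + min_size
        return t, l2_cost(x) - costs[t - min_size]          # split index and cost reduction

    def binary_segmentation(x, n_changes=2, min_size=5):
        """Greedy search under a fixed budget of change points."""
        segments, changes = [(0, len(x))], []
        for _ in range(n_changes):
            gains = []
            for (a, b) in segments:
                if b - a > 2 * min_size:
                    t, gain = best_split(x[a:b], min_size)
                    gains.append((gain, a + t, (a, b)))
            if not gains:
                break
            _, t, (a, b) = max(gains)
            changes.append(t)
            segments.remove((a, b))
            segments += [(a, t), (t, b)]
        return sorted(changes)

    # Toy signal with mean shifts near indices 100 and 200.
    signal = np.concatenate([np.random.randn(100), np.random.randn(100) + 3,
                             np.random.randn(100) - 2])
    print(binary_segmentation(signal, n_changes=2))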
A Review of Changepoint Detection Models The objective of change-point detection is to discover the abrupt property changes lying behind time-series data. In this paper, we first summarize the definition and in-depth implications of changepoint detection. The next stage is to elaborate on traditional and some alternative model-based changepoint detection algorithms. Finally, we try to go a bit further in the theory and look into future research directions.
A Review of Cooperative Multi-Agent Deep Reinforcement Learning Deep Reinforcement Learning has made significant progress in multi-agent systems in recent years. In this review article, we have mostly focused on recent papers on Multi-Agent Reinforcement Learning (MARL) rather than on older papers, unless it was necessary. Several ideas and papers are proposed with different notations, and we tried our best to unify them with a single notation and categorize them by their relevance. In particular, we have focused on five common approaches to modeling and solving multi-agent reinforcement learning problems: (I) independent learners, (II) fully observable critic, (III) value function decomposition, (IV) consensus, and (V) learning to communicate. Moreover, we discuss some new emerging research areas in MARL along with the relevant recent papers. In addition, some of the recent applications of MARL in the real world are discussed. Finally, a list of available environments for MARL research is provided and the paper is concluded with proposals on possible research directions.
A Review of Data Fusion Techniques In general, all tasks that demand any type of parameter estimation from multiple sources can benefit from the use of data/information fusion methods. The terms information fusion and data fusion are typically employed as synonyms; but in some scenarios, the term data fusion is used for raw data (obtained directly from the sensors) and the term information fusion is employed to define already processed data. In this sense, the term information fusion implies a higher semantic level than data fusion. Other terms associated with data fusion that typically appear in the literature include decision fusion, data combination, data aggregation, multisensor data fusion, and sensor fusion. Researchers in this field agree that the most accepted definition of data fusion was provided by the Joint Directors of Laboratories (JDL) workshop: ‘A multi-level process dealing with the association, correlation, combination of data and information from single and multiple sources to achieve refined position, identify estimates and complete and timely assessments of situations, threats and their significance.’ Hall and Llinas provided the following well-known definition of data fusion: ‘data fusion techniques combine data from multiple sensors and related information from associated databases to achieve improved accuracy and more specific inferences than could be achieved by the use of a single sensor alone.’ Briefly, we can define data fusion as a combination of multiple sources to obtain improved information; in this context, improved information means less expensive, higher quality, or more relevant information. Data fusion techniques have been extensively employed in multisensor environments with the aim of fusing and aggregating data from different sensors; however, these techniques can also be applied to other domains, such as text processing. The goal of using data fusion in multisensor environments is to obtain a lower detection error probability and a higher reliability by using data from multiple distributed sources. The available data fusion techniques can be classified into three nonexclusive categories: (i) data association, (ii) state estimation, and (iii) decision fusion. Because of the large number of published papers on data fusion, this paper does not aim to provide an exhaustive review of all of the studies; instead, the objective is to highlight the main steps that are involved in the data fusion framework and to review the most common techniques for each step. The remainder of this paper continues as follows. The next section provides various classification categories for data fusion techniques. Then, Section 3 describes the most common methods for data association tasks. Section 4 provides a review of techniques under the state estimation category. Next, the most common techniques for decision fusion are enumerated in Section 5. Finally, the conclusions obtained from reviewing the different methods are highlighted in Section 6.
A review of data mining using big data in health informatics The amount of data produced within Health Informatics has grown to be quite vast, and analysis of this Big Data grants potentially limitless possibilities for knowledge to be gained. In addition, this information can improve the quality of healthcare offered to patients. However, there are a number of issues that arise when dealing with these vast quantities of data, especially how to analyze this data in a reliable manner. The basic goal of Health Informatics is to take in real world medical data from all levels of human existence to help advance our understanding of medicine and medical practice. This paper will present recent research using Big Data tools and approaches for the analysis of Health Informatics data gathered at multiple levels, including the molecular, tissue, patient, and population levels. In addition to gathering data at multiple levels, multiple levels of questions are addressed: human-scale biology, clinical-scale, and epidemic-scale. We will also analyze and examine possible future work for each of these areas, as well as how combining data from each level may provide the most promising approach to gain the most knowledge in Health Informatics.
A Review of Deep Learning with Special Emphasis on Architectures, Applications and Recent Trends Deep learning (DL) has solved a problem that as little as five years ago was thought by many to be intractable – the automatic recognition of patterns in data; and it can do so with accuracy that often surpasses human beings. It has solved problems beyond the realm of traditional, hand-crafted machine learning algorithms and captured the imagination of practitioners trying to make sense out of the flood of data that now inundates our society. As public awareness of the efficacy of DL increases so does the desire to make use of it. But even for highly trained professionals it can be daunting to approach the rapidly increasing body of knowledge produced by experts in the field. Where does one start? How does one determine if a particular model is applicable to their problem? How does one train and deploy such a network? A primer on the subject can be a good place to start. With that in mind, we present an overview of some of the key multilayer ANNs that comprise DL. We also discuss some new automatic architecture optimization protocols that use multi-agent approaches. Further, since guaranteeing system uptime is becoming critical to many computer applications, we include a section on using neural networks for fault detection and subsequent mitigation. This is followed by an exploratory survey of several application areas where DL has emerged as a game-changing technology: anomalous behavior detection in financial applications or in financial time-series forecasting, predictive and prescriptive analytics, medical image processing and analysis and power systems research. The thrust of this review is to outline emerging areas of application-oriented research within the DL community as well as to provide a reference to researchers seeking to use it in their work for what it does best: statistical pattern recognition with unparalleled learning capacity with the ability to scale with information.
A Review of Different Word Embeddings for Sentiment Classification using Deep Learning The web is loaded with textual content, and Natural Language Processing is one of the most important fields in Machine Learning. But when the data is huge, simple Machine Learning algorithms cannot handle it, and that is when Deep Learning, which is based on Neural Networks, comes into play. However, since neural networks cannot process raw text, we have to convert it using various word embedding strategies. This paper demonstrates these different word embedding strategies on an Amazon Review Dataset, which has two sentiments to be classified, Happy and Unhappy, based on numerous customer reviews. Moreover, we demonstrate the differences in accuracy along with a discussion about which word embedding to apply when.
A Review of Evaluation Techniques for Social Dialogue Systems In contrast with goal-oriented dialogue, social dialogue has no clear measure of task success. Consequently, evaluation of these systems is notoriously hard. In this paper, we review current evaluation methods, focusing on automatic metrics. We conclude that turn-based metrics often ignore the context and do not account for the fact that several replies are valid, while end-of-dialogue rewards are mainly hand-crafted. Both lack grounding in human perceptions.
A Review of Features for the Discrimination of Twitter Users: Application to the Prediction of Offline Influence Many works related to Twitter aim at characterizing its users in some way: role on the service (spammers, bots, organizations, etc.), nature of the user (socio-professional category, age, etc.), topics of interest, and others. However, for a given user classification problem, it is very difficult to select a set of appropriate features, because the many features described in the literature are very heterogeneous, with name overlaps and collisions, and numerous very close variants. In this article, we review a wide range of such features. In order to present a clear state-of-the-art description, we unify their names, definitions and relationships, and we propose a new, neutral, typology. We then illustrate the interest of our review by applying a selection of these features to the offline influence detection problem. This task consists in identifying users which are influential in real life, based on their Twitter account and related data. We show that most features deemed efficient to predict online influence, such as the numbers of retweets and followers, are not relevant to this problem. However, we propose several content-based approaches to label Twitter users as Influencers or not. We also rank them according to a predicted influence level. Our proposals are evaluated over the CLEF RepLab 2014 dataset, and outmatch state-of-the-art methods.
A review of instance selection methods In supervised learning, a training set providing previously known information is used to classify new instances. Typically, many instances are stored in the training set, but some of them are not useful for classification, so acceptable classification rates can be obtained while ignoring these non-useful cases; this process is known as instance selection. Through instance selection the training set is reduced, which lowers the runtimes of the classification and/or training stages of classifiers. This work presents a survey of the main instance selection methods reported in the literature.
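As a minimal illustration of the instance selection idea described above, the sketch below implements the classic condensed nearest neighbour rule, which keeps only the instances a 1-NN classifier needs to label the rest of the training set correctly. The function name and the toy data are illustrative, not taken from the surveyed methods.

```python
import numpy as np

def condensed_nearest_neighbour(X, y, rng=None):
    """Greedy CNN rule: keep only the instances needed to classify the rest correctly with 1-NN."""
    rng = np.random.default_rng(rng)
    order = rng.permutation(len(X))
    keep = [order[0]]                      # start with one arbitrary instance
    changed = True
    while changed:
        changed = False
        for i in order:
            if i in keep:
                continue
            # 1-NN prediction using only the currently kept instances
            d = np.linalg.norm(X[keep] - X[i], axis=1)
            if y[keep][np.argmin(d)] != y[i]:
                keep.append(i)             # misclassified, so it must be kept
                changed = True
    return np.array(keep)

# Toy usage: the reduced set often classifies as well as the full one.
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y = np.array([0, 0, 0, 1, 1, 1])
print(condensed_nearest_neighbour(X, y, rng=0))
```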
A Review of Literature on Parallel Constraint Solving As multicore computing is now standard, it seems irresponsible for constraints researchers to ignore the implications of it. Researchers need to address a number of issues to exploit parallelism, such as: investigating which constraint algorithms are amenable to parallelisation; whether to use shared memory or distributed computation; whether to use static or dynamic decomposition; and how to best exploit portfolios and cooperating search. We review the literature, and see that we can sometimes do quite well, some of the time, on some instances, but we are far from a general solution. Yet there seems to be little overall guidance that can be given on how best to exploit multicore computers to speed up constraint solving. We hope at least that this survey will provide useful pointers to future researchers wishing to correct this situation. Under consideration in Theory and Practice of Logic Programming (TPLP).
A Review of Modularization Techniques in Artificial Neural Networks Artificial neural networks (ANNs) have achieved significant success in tackling classical and modern machine learning problems. As learning problems grow in scale and complexity, and expand into multi-disciplinary territory, a more modular approach for scaling ANNs will be needed. Modular neural networks (MNNs) are neural networks that embody the concepts and principles of modularity. MNNs adopt a large number of different techniques for achieving modularization. Previous surveys of modularization techniques are relatively scarce in their systematic analysis of MNNs, focusing mostly on empirical comparisons and lacking an extensive taxonomical framework. In this review, we aim to establish a solid taxonomy that captures the essential properties and relationships of the different variants of MNNs. Based on an investigation of the different levels at which modularization techniques act, we attempt to provide a universal and systematic framework for theorists studying MNNs, also trying along the way to emphasise the strengths and weaknesses of different modularization approaches in order to highlight good practices for neural network practitioners.
A review of neuro-fuzzy systems based on intelligent control A system’s abilities to adapt and to self-organize are two key factors in how well it can survive changes to the environment and the plant it works within. Intelligent control improves these two factors in controllers. Considering the increasing complexity of dynamic systems along with their need for feedback control, using more sophisticated controllers has become necessary, and intelligent control can be a suitable response to this need. This paper briefly describes the structure of intelligent control and provides a review of fuzzy logic and neural networks, which are among the base methods for intelligent control. The different aspects of these two methods are then compared, and an example of a combined method is presented.
A Review of Point Cloud Semantic Segmentation 3D Point Cloud Semantic Segmentation (PCSS) is attracting increasing interest, due to its applicability in remote sensing, computer vision and robotics, and due to the new possibilities offered by deep learning techniques. In order to provide a needed up-to-date review of recent developments in PCSS, this article summarizes existing studies on this topic. Firstly, we outline the acquisition and evolution of the 3D point cloud from the perspective of remote sensing and computer vision, as well as the published benchmarks for PCSS studies. Then, traditional and advanced techniques used for Point Cloud Segmentation (PCS) and PCSS are reviewed and compared. Finally, important issues and open questions in PCSS studies are discussed.
A Review of Relational Machine Learning for Knowledge Graphs Relational machine learning studies methods for the statistical analysis of relational, or graph-structured, data. In this paper, we provide a review of how such statistical models can be ‘trained’ on large knowledge graphs, and then used to predict new facts about the world (which is equivalent to predicting new edges in the graph). In particular, we discuss two different kinds of statistical relational models, both of which can scale to massive datasets. The first is based on tensor factorization methods and related latent variable models. The second is based on mining observable patterns in the graph. We also show how to combine these latent and observable models to get improved modeling power at decreased computational cost. Finally, we discuss how such statistical models of graphs can be combined with text-based information extraction methods for automatically constructing knowledge graphs from the Web. In particular, we discuss Google’s Knowledge Vault project.
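To make the tensor-factorization family mentioned above concrete, here is a hedged sketch of a DistMult-style bilinear-diagonal scoring function, one simple member of that family; the entities, relations and random vectors are purely illustrative stand-ins for embeddings that would be learned from a knowledge graph.

```python
import numpy as np

rng = np.random.default_rng(0)
entities = ["Obama", "USA", "Hawaii"]
relations = ["born_in", "located_in"]
dim = 16

# Random latent embeddings stand in for vectors learned from a knowledge graph.
E = {e: rng.normal(size=dim) for e in entities}
R = {r: rng.normal(size=dim) for r in relations}

def score(head, relation, tail):
    """DistMult-style score: higher means the triple is judged more plausible."""
    return float(np.sum(E[head] * R[relation] * E[tail]))

# Candidate edges (new facts) can be ranked by this score.
print(score("Obama", "born_in", "Hawaii"))
```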
A Review of Self-Exciting Spatio-Temporal Point Processes and Their Applications Self-exciting spatio-temporal point process models predict the rate of events as a function of space, time, and the previous history of events. These models naturally capture triggering and clustering behavior, and have been widely used in fields where spatio-temporal clustering of events is observed, such as earthquake modeling, infectious disease, and crime. In the past several decades, advances have been made in estimation, inference, simulation, and diagnostic tools for self-exciting point process models. In this review, I describe the basic theory, survey related estimation and inference techniques from each field, highlight several key applications, and suggest directions for future research.
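For reference, the conditional intensity of a self-exciting spatio-temporal point process typically takes the following form; the specific background rate and triggering kernel vary by application, and this is only the generic structure the review describes:

\lambda(s, t \mid \mathcal{H}_t) = \mu(s) + \sum_{i \,:\, t_i < t} g(s - s_i,\; t - t_i),

where \mu(s) is a background rate, g is a triggering kernel that decays in space and time, and \mathcal{H}_t is the history of events up to time t.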
A review of single-source unsupervised domain adaptation Domain adaptation has become a prominent problem setting in machine learning and related fields. This review asks the questions: when and how a classifier can learn from a source domain and generalize to a target domain. As for when, we review conditions that allow for cross-domain generalization error bounds. As for how, we present a categorization of approaches, divided into, what we refer to as, sample-based, feature-based and inference-based methods. Sample-based methods focus on weighting individual observations during training based on their importance to the target domain. Feature-based methods focus on mapping, projecting and representing features such that a source classifier performs well on the target domain and inference-based methods focus on alternative estimators, such as robust, minimax or Bayesian. Our categorization highlights recurring ideas and raises a number of questions important to further research.
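As a minimal sketch of the sample-based idea described above, the snippet below estimates importance weights for source examples with a probabilistic domain classifier (a common density-ratio trick) so that a downstream estimator can be trained with per-example weights; the data and names are illustrative assumptions, not the method of any specific paper in the review.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def importance_weights(X_source, X_target):
    """Estimate p_target(x) / p_source(x) with a probabilistic domain classifier."""
    X = np.vstack([X_source, X_target])
    d = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p_t = clf.predict_proba(X_source)[:, 1]          # P(domain = target | x)
    return p_t / np.clip(1.0 - p_t, 1e-6, None)      # odds, proportional to the density ratio

rng = np.random.default_rng(0)
X_s = rng.normal(0.0, 1.0, size=(200, 2))            # source samples
X_t = rng.normal(0.5, 1.0, size=(200, 2))            # shifted target samples
w = importance_weights(X_s, X_t)
print(w[:5])
# The weights can then be passed to any estimator that accepts sample_weight,
# e.g. LogisticRegression().fit(X_s, y_s, sample_weight=w).
```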
A review of swarmalators and their potential in bio-inspired computing From fireflies to heart cells, many systems in Nature show the remarkable ability to spontaneously fall into synchrony. By imitating Nature’s success at self-synchronizing, scientists have designed cost-effective methods to achieve synchrony in the lab, with applications ranging from wireless sensor networks to radio transmission. A similar story has occurred in the study of swarms, where inspiration from the behavior of flocks of birds and schools of fish has led to ‘low-footprint’ algorithms for multi-robot systems. Here, we continue this ‘bio-inspired’ tradition, by speculating on the technological benefit of fusing swarming with synchronization. The subject of recent theoretical work, minimal models of so-called ‘swarmalator’ systems exhibit rich spatiotemporal patterns, hinting at utility in ‘bottom-up’ robotic swarms. We review the theoretical work on swarmalators, identify possible realizations in Nature, and discuss their potential applications in technology.
A Review of Theoretical and Practical Challenges of Trusted Autonomy in Big Data Despite the advances made in artificial intelligence, software agents, and robotics, there is little we see today that we can truly call a fully autonomous system. We conjecture that the main inhibitor for advancing autonomy is lack of trust. Trusted autonomy is the scientific and engineering field to establish the foundations and groundwork for developing trusted autonomous systems (robotics and software agents) that can be used in our daily life, and can be integrated with humans seamlessly, naturally and efficiently. In this paper, we review this literature to reveal opportunities for researchers and practitioners to work on topics that can create a leap forward in advancing the field of trusted autonomy. We focus the paper on the ‘trust’ component as the uniting technology between humans and machines. Our inquiry into this topic revolves around three sub-topics: (1) reviewing and positioning the trust modelling literature for the purpose of trusted autonomy; (2) reviewing a critical subset of sensor technologies that allow a machine to sense human states; and (3) distilling some critical questions for advancing the field of trusted autonomy. The inquiry is augmented with conceptual models that we propose along the way by recompiling and reshaping the literature into forms that enable trusted autonomous systems to become a reality. The paper offers a vision for a Trusted Cyborg Swarm, an extension of our previous Cognitive Cyber Symbiosis concept, whereby humans and machines meld together in a harmonious, seamless, and coordinated manner.
A Review on Algorithms for Constraint-based Causal Discovery Causal discovery studies the problem of mining causal relationships between variables from data, which is of primary interest in science. During the past decades, a significant amount of progress has been made toward this fundamental data mining paradigm. In recent years, with the availability of abundant large-sized and complex observational data, constraint-based approaches have gradually attracted a lot of interest and have been widely applied to many diverse real-world problems, owing to their fast running speed and the ease with which they generalize to the problem of causal insufficiency. In this paper, we aim to review the constraint-based causal discovery algorithms. Firstly, we discuss the learning paradigm of the constraint-based approaches. Secondly and primarily, the state-of-the-art constraint-based causal inference algorithms are surveyed with detailed analysis. Thirdly, several related open-source software packages and benchmark data repositories are briefly summarized. In conclusion, some open problems in constraint-based causal discovery are outlined for future research.
A Review on Deep Learning Techniques Applied to Semantic Segmentation Image semantic segmentation is attracting more and more interest from computer vision and machine learning researchers. Many applications on the rise need accurate and efficient segmentation mechanisms: autonomous driving, indoor navigation, and even virtual or augmented reality systems to name a few. This demand coincides with the rise of deep learning approaches in almost every field or application target related to computer vision, including semantic segmentation or scene understanding. This paper provides a review on deep learning methods for semantic segmentation applied to various application areas. Firstly, we describe the terminology of this field as well as mandatory background concepts. Next, the main datasets and challenges are presented to help researchers decide which ones best suit their needs and targets. Then, existing methods are reviewed, highlighting their contributions and their significance in the field. Finally, quantitative results are given for the described methods and the datasets in which they were evaluated, following up with a discussion of the results. Lastly, we point out a set of promising future works and draw our own conclusions about the state of the art of semantic segmentation using deep learning techniques.
A Review on Recommendation Systems: Context-aware to Social-based The number of Internet users has grown rapidly, enticing companies and corporations to make full use of recommendation infrastructures. Consequently, online advertisement companies emerged to aid us in the presence of numerous items and users. Even as a user, you may find yourself drowned in a set of items that you think you might need, but you are not sure if you should try them. Those items could be online services, products, places or even a person for a friendship. Therefore, we need recommender systems that pave the way and help us make good decisions. This paper provides a review on traditional recommendation systems, recommendation system evaluations and metrics, context-aware recommendation systems, and social-based recommendation systems. While it is hard to include all the information in a brief review paper, we try to have an introductory review over the essentials of recommendation systems. More detailed information on each chapter will be found in the corresponding references. For the purpose of explaining the concept in a different way, we provide slides, available at https://…/recommender-systems-97094937.
A review on statistical inference methods for discrete Markov random fields Developing satisfactory methodology for the analysis of Markov random fields is a very challenging task. Indeed, due to the Markovian dependence structure, the normalizing constant of these fields cannot be computed using standard analytical or numerical methods. This is a central issue for any statistical approach, as the likelihood is an integral part of the procedure. Furthermore, such unobserved fields cannot be integrated out and the likelihood evaluation becomes a doubly intractable problem. This report gives an overview of some of the methods used in the literature to analyse such observed or unobserved random fields.
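The intractability referred to above is visible directly in the likelihood. For a discrete MRF written in a generic exponential-family form with parameter \theta and sufficient statistic S(\mathbf{x}) (stated here only for orientation, not as the report's own notation):

p(\mathbf{x} \mid \theta) = \frac{\exp\{\theta^{\top} S(\mathbf{x})\}}{Z(\theta)}, \qquad Z(\theta) = \sum_{\mathbf{x}' \in \mathcal{X}} \exp\{\theta^{\top} S(\mathbf{x}')\},

where the sum defining Z(\theta) runs over all configurations and is generally unavailable in closed form; when the field is unobserved, a further sum over the hidden configurations is needed, hence the likelihood is doubly intractable.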
A second-quantised Shannon theory Shannon’s theory of information was built on the assumption that the information carriers were classical systems. Its quantum counterpart, quantum Shannon theory, explores the new possibilities that arise when the information carriers are quantum particles. Traditionally, quantum Shannon theory has focussed on scenarios where the internal state of the particles is quantum, while their trajectory in spacetime is classical. Here we propose a second level of quantisation where both the information and its propagation in spacetime are treated quantum mechanically. The framework is illustrated with a number of examples, showcasing some of the counterintuitive phenomena taking place when information travels in a superposition of paths.
A Security Framework for Wireless Sensor Networks: Theory and Practice Wireless sensor networks are often deployed in public or otherwise untrusted and even hostile environments, which prompts a number of security issues. Although security is a necessity in other types of networks, it is much more so in sensor networks due to their resource constraints, susceptibility to physical capture, and wireless nature. In this work we emphasize two security issues: (1) a secure communication infrastructure and (2) a secure node scheduling algorithm. Due to resource constraints, specific strategies are often necessary to preserve the network’s lifetime and its quality of service. For instance, to reduce communication costs, nodes can go to sleep mode periodically (node scheduling). These strategies must be proven secure, but the protocols used to guarantee this security must be compatible with the resource preservation requirement. To achieve this goal, secure communications in such networks will be defined, together with the notions of secure scheduling. Finally, some of these security properties will be evaluated in concrete case studies.
A Selective Overview of Deep Learning Deep learning has arguably achieved tremendous success in recent years. In simple words, deep learning uses the composition of many nonlinear functions to model the complex dependency between input features and labels. While neural networks have a long history, recent advances have greatly improved their performance in computer vision, natural language processing, etc. From the statistical and scientific perspective, it is natural to ask: What is deep learning? What are the new characteristics of deep learning, compared with classical methods? What are the theoretical foundations of deep learning? To answer these questions, we introduce common neural network models (e.g., convolutional neural nets, recurrent neural nets, generative adversarial nets) and training techniques (e.g., stochastic gradient descent, dropout, batch normalization) from a statistical point of view. Along the way, we highlight new characteristics of deep learning (including depth and over-parametrization) and explain their practical and theoretical benefits. We also sample recent results on theories of deep learning, many of which are only suggestive. While a complete understanding of deep learning remains elusive, we hope that our perspectives and discussions serve as a stimulus for new statistical research.
A Short Course on Network Analysis These are lecture notes prepared for a short (6 hours) course given at the conference Methodological Advances in Statistics Related to Big Data, held in Castro Urdiales, Spain, June 8-12, 2015. The course focuses on the analysis of networks without labels at the nodes. It covers descriptive statistics for graphs, random graph models, and graph partitioning, including recent advances in spectral and semidefinite methods.
A Short Introduction to Boosting Boosting is a general method for improving the accuracy of any given learning algorithm. This short overview paper introduces the boosting algorithm AdaBoost, and explains the underlying theory of boosting, including an explanation of why boosting often does not suffer from overfitting as well as boosting´s relationship to support-vector machines. Some examples of recent applications of boosting are also described.
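A minimal sketch of the AdaBoost loop introduced in the paper, using depth-one decision trees ('stumps') as the weak learner; the helper names and the toy data are illustrative, and the weak learner could be swapped for any classifier that accepts example weights.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=20):
    """Binary AdaBoost with labels in {-1, +1}; returns (stumps, alphas)."""
    n = len(X)
    w = np.full(n, 1.0 / n)                 # example weights, re-weighted each round
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)      # up-weight mistakes, down-weight correct cases
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def predict(stumps, alphas, X):
    agg = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(agg)

# Toy usage on linearly separable data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
stumps, alphas = adaboost(X, y)
print((predict(stumps, alphas, X) == y).mean())
```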
A Short Introduction to Local Graph Clustering Methods and Software Graph clustering has many important applications in computing, but due to the increasing sizes of graphs, even traditionally fast clustering methods can be computationally expensive for real-world graphs of interest. Scalability problems led to the development of local graph clustering algorithms that come with a variety of theoretical guarantees. Rather than return a global clustering of the entire graph, local clustering algorithms return a single cluster around a given seed node or set of seed nodes. These algorithms improve scalability because they use time and memory resources that depend only on the size of the cluster returned, instead of the size of the input graph. Indeed, for many of them, their running time grows linearly with the size of the output. In addition to scalability arguments, local graph clustering algorithms have proven to be very useful for identifying and interpreting small-scale and meso-scale structure in large-scale graphs. As opposed to heuristic operational procedures, this class of algorithms comes with strong algorithmic and statistical theory. These include statistical guarantees that prove they have implicit regularization properties. One of the challenges with the existing literature on these approaches is that they are published in a wide variety of areas, including theoretical computer science, statistics, data science, and mathematics. This has made it difficult to relate the various algorithms and ideas together into a cohesive whole. We have recently been working on unifying these diverse perspectives through the lens of optimization as well as providing software to perform these computations in a cohesive fashion. In this note, we provide a brief introduction to local graph clustering, we provide some representative examples of our perspective, and we introduce our software named Local Graph Clustering (LGC).
A Short Survey of Topological Data Analysis in Time Series and Systems Analysis Topological Data Analysis (TDA) is the collection of mathematical tools that capture the structure of shapes in data. Although TDA is well established in computational topology and computational geometry, its utilization in time series and signal processing is relatively new. In some recent contributions, TDA has been utilized as an alternative to conventional signal processing methods. Specifically, TDA has been considered for dealing with noisy signals and time series. In these applications, TDA is used to find the shapes in data as the main properties, while the other properties are assumed to be much less informative. In this paper, we review recent developments and contributions in which topological data analysis, especially persistent homology, has been applied to time series analysis, dynamical systems and signal processing. We cover problem statements such as stability determination, risk analysis, systems behaviour, and predicting critical transitions in financial markets.
A Short Survey On Memory Based Reinforcement Learning Reinforcement learning (RL) is a branch of machine learning which is employed to solve various sequential decision making problems without proper supervision. Due to the recent advancement of deep learning, the newly proposed Deep-RL algorithms have been able to perform extremely well in sophisticated high-dimensional environments. However, even after successes in many domains, one of the major challenges in these approaches is the sheer number of interactions with the environment required for efficient decision making. Taking inspiration from the brain, this problem can be addressed by incorporating instance-based learning, biasing decision making towards the memories of highly rewarding experiences. This paper surveys various recent reinforcement learning methods that incorporate external memory for decision making. We provide an overview of the different methods, along with their advantages and disadvantages, applications and the standard experimental settings used for memory-based models. This review hopes to be a helpful resource that provides key insight into recent advances in the field and supports its further development.
A Short Survey on Probabilistic Reinforcement Learning A reinforcement learning agent tries to maximize its cumulative payoff by interacting in an unknown environment. It is important for the agent to explore suboptimal actions as well as to pick actions with the highest known rewards. Yet, in sensitive domains, collecting more data with exploration is not always possible, but it is important to find a policy with a certain performance guarantee. In this paper, we present a brief survey of methods available in the literature for balancing the exploration-exploitation trade-off and computing robust solutions from fixed samples in reinforcement learning.
A Simple Introduction to Support Vector Machines Martin Law Lecture for CSE 802 Department of Computer Science and Engineering Michigan State University
A simple neural network module for relational reasoning Relational reasoning is a central component of generally intelligent behavior, but has proven difficult for neural networks to learn. In this paper we describe how to use Relation Networks (RNs) as a simple plug-and-play module to solve problems that fundamentally hinge on relational reasoning. We tested RN-augmented networks on three tasks: visual question answering using a challenging dataset called CLEVR, on which we achieve state-of-the-art, super-human performance; text-based question answering using the bAbI suite of tasks; and complex reasoning about dynamic physical systems. Then, using a curated dataset called Sort-of-CLEVR we show that powerful convolutional networks do not have a general capacity to solve relational questions, but can gain this capacity when augmented with RNs. Our work shows how a deep learning architecture equipped with an RN module can implicitly discover and learn to reason about entities and their relations.
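A hedged numpy sketch of the RN composition described above, RN(O) = f_phi(sum over ordered pairs (i, j) of g_theta(o_i, o_j)); the learned MLPs g_theta and f_phi are replaced here by fixed random linear maps purely to show the shape of the computation, so this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_objects, obj_dim, hidden, out = 5, 8, 32, 10
objects = rng.normal(size=(n_objects, obj_dim))   # e.g. CNN feature-map cells or sentence encodings

# Stand-ins for the learned MLPs g_theta and f_phi (here: single random linear layers + ReLU).
Wg = rng.normal(size=(2 * obj_dim, hidden))
Wf = rng.normal(size=(hidden, out))
relu = lambda x: np.maximum(x, 0.0)

def relation_network(objects):
    # g_theta is applied to every ordered pair of objects, and the results are summed.
    pair_sum = np.zeros(hidden)
    for i in range(len(objects)):
        for j in range(len(objects)):
            pair = np.concatenate([objects[i], objects[j]])
            pair_sum += relu(pair @ Wg)
    # f_phi maps the aggregated relation vector to the task output (e.g. answer logits).
    return relu(pair_sum) @ Wf

print(relation_network(objects).shape)   # (10,)
```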
A snapshot on nonstandard supervised learning problems: taxonomy, relationships and methods Machine learning is a field which studies how machines can alter and adapt their behavior, improving their actions according to the information they are given. This field is subdivided into multiple areas, among which the best known are supervised learning (e.g. classification and regression) and unsupervised learning (e.g. clustering and association rules). Within supervised learning, most studies and research are focused on well known standard tasks, such as binary classification, multiclass classification and regression with one dependent variable. However, there are many other less known problems. These are what we generically call nonstandard supervised learning problems. The literature about them is much more sparse, and each study is directed to a specific task. Therefore, the definitions, relations and applications of this kind of learners are hard to find. The goal of this paper is to provide the reader with a broad view on the distinct variations of nonstandard supervised problems. A comprehensive taxonomy summarizing their traits is proposed. A review of the common approaches followed to accomplish them and their main applications is provided as well.
A Statistical Learning Model of Text Classification for Support Vector Machines
A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions Deep neural networks (DNNs) achieve excellent performance on standard classification tasks. However, under image quality distortions such as blur and noise, classification accuracy becomes poor. In this work, we compare the performance of DNNs with human subjects on distorted images. We show that, although DNNs perform better than or on par with humans on good quality images, DNN performance is still much lower than human performance on distorted images. We additionally find that there is little correlation in errors between DNNs and human subjects. This could be an indication that the internal representations of images are different between DNNs and the human visual system. These comparisons with human performance could be used to guide future development of more robust DNNs.
A Study of Recent Contributions on Information Extraction This paper reports on modern approaches in Information Extraction (IE) and its two main sub-tasks of Named Entity Recognition (NER) and Relation Extraction (RE). Basic concepts and the most recent approaches in this area are reviewed, which mainly include Machine Learning (ML) based approaches and the more recent trend to Deep Learning (DL) based methods.
A Study of Reinforcement Learning for Neural Machine Translation Recent studies have shown that reinforcement learning (RL) is an effective approach for improving the performance of neural machine translation (NMT) systems. However, due to its instability, successful RL training is challenging, especially in real-world systems where deep models and large datasets are leveraged. In this paper, taking several large-scale translation tasks as testbeds, we conduct a systematic study on how to train better NMT models using reinforcement learning. We provide a comprehensive comparison of several important factors (e.g., baseline reward, reward shaping) in RL training. Furthermore, since it remains unclear whether RL is still beneficial when monolingual data is used, we propose a new method that leverages RL to further boost the performance of NMT systems trained with source/target monolingual data. By integrating all our findings, we obtain competitive results on the WMT14 English-German, WMT17 English-Chinese, and WMT17 Chinese-English translation tasks, especially setting a state-of-the-art performance on the WMT17 Chinese-English translation task.
A Study on Neural Network Language Modeling An exhaustive study on neural network language modeling (NNLM) is performed in this paper. Different architectures of basic neural network language models are described and examined. A number of different improvements over basic neural network language models, including importance sampling, word classes, caching and bidirectional recurrent neural networks (BiRNN), are studied separately, and the advantages and disadvantages of every technique are evaluated. Then, the limits of neural network language modeling are explored from the aspects of model architecture and knowledge representation. Part of the statistical information from a word sequence is lost when it is processed word by word in a certain order, and the mechanism of training neural networks by updating weight matrices and vectors imposes severe restrictions on any significant enhancement of NNLM. For knowledge representation, the knowledge represented by neural network language models is the approximate probabilistic distribution of word sequences from a certain training data set rather than the knowledge of a language itself or the information conveyed by word sequences in a natural language. Finally, some directions for further improving neural network language modeling are discussed.
A Study on Overfitting in Deep Reinforcement Learning Recent years have witnessed significant progress in deep Reinforcement Learning (RL). Empowered with large scale neural networks, carefully designed architectures, novel training algorithms and massively parallel computing devices, researchers are able to attack many challenging RL problems. However, in machine learning, more training power comes with a potential risk of more overfitting. As deep RL techniques are being applied to critical problems such as healthcare and finance, it is important to understand the generalization behaviors of the trained agents. In this paper, we conduct a systematic study of standard RL agents and find that they could overfit in various ways. Moreover, overfitting could happen “robustly”: commonly used techniques in RL that add stochasticity do not necessarily prevent or detect overfitting. In particular, the same agents and learning algorithms could have drastically different test performance, even when all of them achieve optimal rewards during training. The observations call for more principled and careful evaluation protocols in RL. We conclude with a general discussion on overfitting in RL and a study of the generalization behaviors from the perspective of inductive bias.
A summary on Maximum likelihood Estimator A general method of building a predictive model starts with least squares estimation. We then need to work on the residuals: finding the confidence intervals of the parameters and testing how well the model fits the data both rely on the assumption that the residuals (or noise) are normally distributed. Unfortunately, this assumption is not guaranteed; most of the time, the residual plot looks like some distribution other than the normal. At this point you could add one more factor term to your model to filter out the non-normally distributed noise and then compute the LSE again, but you may well run into the same problem. Alternatively, if you can recognize the distribution in the plot (or you otherwise know the pdf of the noise), you can simply compute the MLE of the parameters of your model. This time, your work is really finished.
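A numerical sketch of the step recommended above: if the residuals look Laplace- rather than normally-distributed, write down that log-likelihood and maximize it directly. The data, the Laplace noise model and the optimizer choice are illustrative assumptions, not part of the summary itself.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
y = 1.5 * x + 0.5 + rng.laplace(scale=0.3, size=200)   # non-Gaussian noise

def neg_log_likelihood(params):
    a, b, scale = params
    resid = y - (a * x + b)
    # Laplace log-density: -log(2*scale) - |resid|/scale, summed over observations
    return np.sum(np.log(2 * scale) + np.abs(resid) / scale)

mle = minimize(neg_log_likelihood, x0=[0.0, 0.0, 1.0],
               bounds=[(None, None), (None, None), (1e-6, None)])
print(mle.x)   # estimates of slope, intercept and noise scale
```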
A Survey and Evaluation of Data Center Network Topologies Data centers are becoming increasingly popular for their flexibility and processing capabilities in the modern computing environment. They are managed by a single entity (administrator) and allow dynamic resource provisioning, performance optimization as well as efficient utilization of available resources. Each data center consists of massive compute, network and storage resources connected with physical wires. The large scale nature of data centers requires careful planning of compute, storage, network nodes, interconnection as well as inter-communication for their effective and efficient operations. In this paper, we present a comprehensive survey and taxonomy of network topologies either used in commercial data centers, or proposed by researchers working in this space. We also compare and evaluate some of those topologies using mininet as well as gem5 simulator for different traffic patterns, based on various metrics including throughput, latency and bisection bandwidth.
A Survey of Algorithms for Keyword Search on Graph Data In this chapter, we survey methods that perform keyword search on graph data. Keyword search provides a simple but user-friendly interface to retrieve information from complicated data structures. Since many real life datasets are represented by trees and graphs, keyword search has become an attractive mechanism for data of a variety of types. In this survey, we discuss methods of keyword search on schema graphs, which are abstract representation for XML data and relational data, and methods of keyword search on schema-free graphs. In our discussion, we focus on three major challenges of keyword search on graphs. First, what is the semantics of keyword search on graphs, or, what qualifies as an answer to a keyword search; second, what constitutes a good answer, or, how to rank the answers; third, how to perform keyword search efficiently. We also discuss some unresolved challenges and propose some new research directions on this topic.
A Survey of Autonomous Driving: Common Practices and Emerging Technologies Automated driving systems (ADSs) promise a safe, comfortable and efficient driving experience. However, fatalities involving vehicles equipped with ADSs are on the rise. The full potential of ADSs cannot be realized unless the robustness of the state of the art is improved further. This paper discusses unsolved problems and surveys the technical aspects of automated driving. Studies regarding present challenges, high-level system architectures, emerging methodologies and core functions (localization, mapping, perception, planning, and the human-machine interface) were thoroughly reviewed. Furthermore, the state-of-the-art was implemented on our own platform and various algorithms were compared in a real-world driving setting. The paper concludes with an overview of available datasets and tools for ADS development.
A survey of Bayesian predictive methods for model assessment, selection and comparison To date, several methods exist in the statistical literature for model assessment, which purport themselves specifically as Bayesian predictive methods. The decision theoretic assumptions on which these methods are based are not always clearly stated in the original articles, however. The aim of this survey is to provide a unified review of Bayesian predictive model assessment and selection methods, and of methods closely related to them. We review the various assumptions that are made in this context and discuss the connections between different approaches, with an emphasis on how each method approximates the expected utility of using a Bayesian model for the purpose of predicting future data.
A Survey of Binary Similarity and Distance Measures The binary feature vector is one of the most common representations of patterns, and similarity and distance measures play a critical role in many problems such as clustering, classification, etc. Ever since Jaccard proposed a similarity measure to classify ecological species in 1901, numerous binary similarity and distance measures have been proposed in various fields. Applying appropriate measures results in more accurate data analysis. Notwithstanding, few comprehensive surveys on binary measures have been conducted. Hence we collected 76 binary similarity and distance measures used over the last century and reveal their correlations through the hierarchical clustering technique.
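For concreteness, the 1901 Jaccard measure mentioned above can be computed on binary vectors as follows; this is a minimal sketch and the helper name is illustrative.

```python
import numpy as np

def jaccard_similarity(a, b):
    """Jaccard similarity for binary vectors: |a AND b| / |a OR b|."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 1.0

print(jaccard_similarity([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))   # 0.5
```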
A survey of blockchain frameworks and applications The applications of the blockchain technology are still being discovered. When a new potentially disruptive technology emerges, there is a tendency to try to solve every problem with that technology. However, it is still necessary to determine what approach is the best for each type of application. To find how distributed ledgers solve existing problems, this study looks for blockchain frameworks in the academic world. Identifying the existing frameworks can demonstrate where the interest in the technology exists and where it can be missing. This study encountered several blockchain frameworks in development. However, there are few references to operational needs, testing, and deployment of the technology. With the widespread use of the technology, either integrating with pre-existing solutions, replacing legacy systems, or new implementations, the need for testing, deploying, exploration, and maintenance is expected to intensify.
A survey of Community Question Answering With the advent of numerous community forums, tasks associated with the same have gained importance in the recent past. With the influx of new questions every day on these forums, the issues of identifying methods to find answers to said questions, or even trying to detect duplicate questions, are of practical importance and are challenging in their own right. This paper aims at surveying some of the aforementioned issues, and methods proposed for tackling the same.
A Survey of Community Search Over Big Graphs With the rapid development of information technologies, various big graphs are prevalent in many real applications (e.g., social media and knowledge bases). An important component of these graphs is the network community. Essentially, a community is a group of vertices which are densely connected internally. Community retrieval can be used in many real applications, such as event organization, friend recommendation, and so on. Consequently, how to efficiently find high-quality communities from big graphs is an important research topic in the era of big data. Recently a large group of research works, called community search, have been proposed. They aim to provide efficient solutions for searching high-quality communities from large networks in real-time. Nevertheless, these works focus on different types of graphs and formulate communities in different manners, and thus it is desirable to have a comprehensive review of these works. In this survey, we conduct a thorough review of existing community search works. Moreover, we analyze and compare the quality of communities under their models, and the performance of different solutions. Furthermore, we point out new research directions. This survey not only helps researchers to gain a better understanding of existing community search solutions, but also gives practitioners a better basis for choosing the proper solution.
A Survey of Cross-Lingual Embedding Models Cross-lingual embedding models allow us to project words from different languages into a shared embedding space. This allows us to apply models trained on languages with a lot of data, e.g. English, to low-resource languages. In the following, we will survey models that seek to learn cross-lingual embeddings. We will discuss them based on the type of approach and the nature of parallel data that they employ. Finally, we will present challenges and summarize how to evaluate cross-lingual embedding models.
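One common family of such models learns a linear map from the source to the target embedding space using a small seed dictionary. The sketch below uses the closed-form orthogonal Procrustes solution on random stand-in embeddings; it illustrates this mapping-based approach in general, not any specific model from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_pairs = 50, 500
X_src = rng.normal(size=(n_pairs, dim))     # e.g. source-language vectors for seed-dictionary words
true_Q = np.linalg.qr(rng.normal(size=(dim, dim)))[0]
X_tgt = X_src @ true_Q + 0.01 * rng.normal(size=(n_pairs, dim))   # aligned target-language vectors

# Orthogonal Procrustes: W = argmin ||X_src W - X_tgt||_F subject to W orthogonal.
U, _, Vt = np.linalg.svd(X_src.T @ X_tgt)
W = U @ Vt

# New source-language words can now be projected into the shared space with x @ W.
print(np.linalg.norm(X_src @ W - X_tgt) / np.linalg.norm(X_tgt))
```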
A Survey of Deep Learning Methods for Relation Extraction Relation Extraction is an important sub-task of Information Extraction which has the potential of employing deep learning (DL) models with the creation of large datasets using distant supervision. In this review, we compare the contributions and pitfalls of the various DL models that have been used for the task, to help guide the path ahead.
A Survey of Deep Learning Techniques for Autonomous Driving The last decade witnessed increasingly rapid progress in self-driving vehicle technology, mainly backed up by advances in the area of deep learning and artificial intelligence. The objective of this paper is to survey the current state-of-the-art on deep learning technologies used in autonomous driving. We start by presenting AI-based self-driving architectures, convolutional and recurrent neural networks, as well as the deep reinforcement learning paradigm. These methodologies form a base for the surveyed driving scene perception, path planning, behavior arbitration and motion control algorithms. We investigate both the modular perception-planning-action pipeline, where each module is built using deep learning methods, as well as End2End systems, which directly map sensory information to steering commands. Additionally, we tackle current challenges encountered in designing AI architectures for autonomous driving, such as their safety, training data sources and computational hardware. The comparison presented in this survey helps to gain insight into the strengths and limitations of deep learning and AI approaches for autonomous driving and assist with design choices
A Survey of Deep Learning Techniques for Mobile Robot Applications Advancements in deep learning over the years have attracted research into how deep artificial neural networks can be used in robotic systems. This research survey will present a summarization of the current research with a specific focus on the gains and obstacles for deep learning to be applied to mobile robotics.
A Survey of Deep Learning-based Object Detection Object detection is one of the most important and challenging branches of computer vision, which has been widely applied in people’s lives, for example in security monitoring and autonomous driving, with the purpose of locating instances of semantic objects of a certain class. With the rapid development of deep learning networks for detection tasks, the performance of object detectors has been greatly improved. In order to understand the main development status of the object detection pipeline thoroughly and deeply, in this survey we first analyze the methods of existing typical detection models and describe the benchmark datasets. Afterwards and primarily, we provide a comprehensive overview of a variety of object detection methods in a systematic manner, covering the one-stage and two-stage detectors. Moreover, we list the traditional and new applications. Some representative branches of object detection are analyzed as well. Finally, we discuss the architecture of exploiting these object detection methods to build an effective and efficient system and point out a set of development trends to better follow the state-of-the-art algorithms and further research.
A survey of dimensionality reduction techniques based on random projection Dimensionality reduction techniques play important roles in the analysis of big data. Traditional dimensionality reduction approaches, such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), have been studied extensively in the past few decades. However, as the dimension of huge data increases, the computational cost of traditional dimensionality reduction approaches grows dramatically and becomes prohibitive. This has also triggered the development of the Random Projection (RP) technique, which maps high-dimensional data onto a low-dimensional subspace in a short time. However, RP generates the transformation matrix without considering the intrinsic structure of the original data and usually leads to relatively high distortion. Therefore, in the past few years, some approaches based on RP have been proposed to address this problem. In this paper, we summarize these approaches in different applications to help practitioners employ the proper approaches in their specific applications. Also, we enumerate their benefits and limitations to provide further references for researchers seeking to develop novel RP-based approaches.
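The basic RP step that the approaches above build on can be written in a few lines: multiply the data by a random Gaussian matrix scaled so that pairwise distances are approximately preserved (in the Johnson-Lindenstrauss sense). The dimensions and data below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 1000, 5000, 200                       # n points, original dimension d, target dimension k
X = rng.normal(size=(n, d))

R = rng.normal(size=(d, k)) / np.sqrt(k)        # random projection matrix
X_low = X @ R                                   # k-dimensional representation

# Pairwise distances should be roughly preserved for most pairs.
i, j = 0, 1
print(np.linalg.norm(X[i] - X[j]), np.linalg.norm(X_low[i] - X_low[j]))
```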
A Survey of Domain Adaptation for Neural Machine Translation Neural machine translation (NMT) is a deep learning based approach for machine translation, which yields the state-of-the-art translation performance in scenarios where large-scale parallel corpora are available. Although the high-quality and domain-specific translation is crucial in the real world, domain-specific corpora are usually scarce or nonexistent, and thus vanilla NMT performs poorly in such scenarios. Domain adaptation that leverages both out-of-domain parallel corpora as well as monolingual corpora for in-domain translation, is very important for domain-specific translation. In this paper, we give a comprehensive survey of the state-of-the-art domain adaptation techniques for NMT.
A survey of hidden convex optimization Motivated by the fact that not all nonconvex optimization problems are difficult to solve, we survey in this paper three widely-used ways to reveal the hidden convex structure for different classes of nonconvex optimization problems. Finally, ten open problems are raised.
A Survey of Hierarchy Identification in Social Networks Humans are social by nature. Throughout history, people have formed communities and built relationships. Most relationships with coworkers, friends, and family are developed during face-to-face interactions. These relationships are established through explicit means of communications such as words and implicit such as intonation, body language, etc. By analyzing human interactions we can derive information about the relationships and influence among conversation participants. However, with the development of the Internet, people started to communicate through text in online social networks. Interestingly, they brought their communicational habits to the Internet. Many social network users form relationships with each other and establish communities with leaders and followers. Recognizing these hierarchical relationships is an important task because it will help to understand social networks and predict future trends, improve recommendations, better target advertisement, and improve national security by identifying leaders of anonymous terror groups. In this work, I provide an overview of current research in this area and present the state-of-the-art approaches to deal with the problem of identifying hierarchical relationships in social networks.
A Survey of Inductive Biases for Factorial Representation-Learning With the resurgence of interest in neural networks, representation learning has re-emerged as a central focus in artificial intelligence. Representation learning refers to the discovery of useful encodings of data that make domain-relevant information explicit. Factorial representations identify underlying independent causal factors of variation in data. A factorial representation is compact and faithful, makes the causal factors explicit, and facilitates human interpretation of data. Factorial representations support a variety of applications, including the generation of novel examples, indexing and search, novelty detection, and transfer learning. This article surveys various constraints that encourage a learning algorithm to discover factorial representations. I dichotomize the constraints in terms of unsupervised and supervised inductive bias. Unsupervised inductive biases exploit assumptions about the environment, such as the statistical distribution of factor coefficients, assumptions about the perturbations a factor should be invariant to (e.g. a representation of an object can be invariant to rotation, translation or scaling), and assumptions about how factors are combined to synthesize an observation. Supervised inductive biases are constraints on the representations based on additional information connected to observations. Supervisory labels come in variety of types, which vary in how strongly they constrain the representation, how many factors are labeled, how many observations are labeled, and whether or not we know the associations between the constraints and the factors they are related to. This survey brings together a wide variety of models that all touch on the problem of learning factorial representations and lays out a framework for comparing these models based on the strengths of the underlying supervised and unsupervised inductive biases.
A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress Inverse reinforcement learning is the problem of inferring the reward function of an observed agent, given its policy or behavior. Researchers perceive IRL both as a problem and as a class of methods. By categorically surveying the current literature in IRL, this article serves as a reference for researchers and practitioners in machine learning to understand the challenges of IRL and select the approaches best suited for the problem on hand. The survey formally introduces the IRL problem along with its central challenges which include accurate inference, generalizability, correctness of prior knowledge, and growth in solution complexity with problem size. The article elaborates how the current methods mitigate these challenges. We further discuss the extensions of traditional IRL methods: (i) inaccurate and incomplete perception, (ii) incomplete model, (iii) multiple rewards, and (iv) non-linear reward functions. This discussion concludes with some broad advances in the research area and currently open research questions.
A Survey of Knowledge Representation and Retrieval for Learning in Service Robotics Within the realm of service robotics, researchers have placed a great amount of effort into learning motions and manipulations for task execution by robots. The task of robot learning is very broad, as it involves many tasks such as object detection, action recognition, motion planning, localization, knowledge representation and retrieval, and the intertwining of computer vision and machine learning techniques. In this paper, we focus on how knowledge can be gathered, represented, and reproduced to solve problems as done by researchers in the past decades. We discuss the problems which have existed in robot learning and the solutions, technologies or developments (if any) which have contributed to solving them. Specifically, we look at three broad categories involved in task representation and retrieval for robotics: 1) activity recognition from demonstrations, 2) scene understanding and interpretation, and 3) task representation in robotics – datasets and networks. Within each section, we discuss major breakthroughs and how their methods address present issues in robot learning and manipulation.
A Survey of Location Prediction on Twitter Locations, e.g., countries, states, cities, and point-of-interests, are central to news, emergency events, and people’s daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on daily basis. Due to the world-wide coverage of its users and real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts are spent on dealing with new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim at offering an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we also briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we list future research directions.
A Survey of Machine and Deep Learning Methods for Internet of Things (IoT) Security The Internet of Things (IoT) integrates billions of smart devices that can communicate with one another with minimal human intervention. It is one of the fastest developing fields in the history of computing, with an estimated 50 billion devices by the end of 2020. On the one hand, IoT plays a crucial role in enhancing several real-life smart applications that can improve life quality. On the other hand, the crosscutting nature of IoT systems and the multidisciplinary components involved in the deployment of such systems have introduced new security challenges. Implementing conventional security measures, such as encryption, authentication, access control, network security and application security, is ineffective for IoT devices given their inherent vulnerabilities. Therefore, existing security methods should be enhanced to secure the IoT system effectively. Machine learning and deep learning (ML/DL) have advanced considerably over the last few years, and machine intelligence has transitioned from laboratory curiosity to practical machinery in several important applications. Consequently, ML/DL methods are important in transforming the security of IoT systems from merely facilitating secure communication between devices to security-based intelligence systems. The goal of this work is to provide a comprehensive survey of ML/DL methods that can be used to develop enhanced security methods for IoT systems. IoT security threats that are related to inherent or newly introduced threats are presented, and various potential IoT system attack surfaces and the possible threats related to each surface are discussed. We then thoroughly review ML/DL methods for IoT security and present the opportunities, advantages and shortcomings of each method. We discuss the opportunities and challenges involved in applying ML/DL to IoT security. These opportunities and challenges can serve as potential future research directions.
A Survey of Machine Learning for Big Code and Naturalness Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit code’s abundance of patterns. In this article, we survey this work. We contrast programming languages against natural languages and discuss how these similarities and differences drive the design of probabilistic models. We present a taxonomy based on the underlying design principles of each model and use it to navigate the literature. Then, we review how researchers have adapted these models to application areas and discuss cross-cutting and application-specific challenges and opportunities.
A Survey of Methods for Collective Communication Optimization and Tuning New developments in HPC technology, in terms of increasing computing power on multi/many-core processors, high-bandwidth memory/IO subsystems and communication interconnects, have a direct impact on software and runtime system development. These advancements have become useful in producing high-performance collective communication interfaces that integrate efficiently on a wide variety of platforms and environments. However, the number of optimization options that show up with each new technology or software framework has resulted in a combinatorial explosion in the feature space for tuning collective parameters, such that finding the optimal set has become a nearly impossible task. The applicability of the algorithmic choices available for optimizing collective communication depends largely on the scalability requirement for a particular use case. This problem can be further exacerbated by any requirement to run collective problems at very large scales, such as in the case of exascale computing, at which impractical tuning by brute force may require many months of resources. Therefore, the application of statistical, data mining and artificial intelligence models, or more general hybrid learning models, seems essential in many collective parameter optimization problems. We hope to explore the current cutting edge of collective communication optimization and tuning methods and culminate with possible future directions for this problem.
A Survey of Mixed Data Clustering Algorithms Most datasets normally contain either numeric or categorical features. Mixed data comprises both numeric and categorical features, and it frequently occurs in various domains, such as health, finance, marketing, etc. Clustering is often sought on mixed data to find structures and to group similar objects. However, clustering mixed data is challenging because it is difficult to directly apply mathematical operations, such as summation or averaging, to the feature values of these datasets. In this paper, we review various types of mixed data clustering techniques in detail. We present a taxonomy to identify ten types of different mixed data clustering techniques. We also compare the performance of several mixed data clustering methods on publicly available datasets. The paper further identifies challenges in developing different mixed data clustering algorithms and provides guidelines for future directions in this area.
A Survey of Model Compression and Acceleration for Deep Neural Networks Deep convolutional neural networks (CNNs) have recently achieved dramatic accuracy improvements in many visual recognition tasks. However, existing deep convolutional neural network models are computationally expensive and memory intensive, hindering their deployment in devices with low memory resources or in applications with strict latency requirements. Therefore, a natural thought is to perform model compression and acceleration in deep CNNs without significantly decreasing the classification accuracy. During the past few years, tremendous progress has been made in this area. In this paper, we survey recently developed advanced techniques for compacting and accelerating CNN models. These techniques are roughly categorized into four schemes: parameter pruning and sharing, low-rank factorization, transferred/compact convolutional filters and knowledge distillation. Methods of parameter pruning and sharing are described in detail at the beginning, and the other schemes are introduced afterwards. For the methods of each scheme, we provide insightful analysis regarding their performance, related applications, advantages and drawbacks. Then we go through a few very recent additional successful methods, for example, dynamic networks and stochastic depth networks. After that, we survey the evaluation metrics, the main datasets used for evaluating model performance, and recent benchmarking efforts. Finally, we conclude the paper and discuss remaining challenges and possible directions on this topic.
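To make the parameter pruning and sharing scheme concrete, the following minimal sketch (assuming only NumPy; the layer and the magnitude-threshold rule are illustrative choices, not a specific method from the survey) zeroes out the smallest-magnitude weights of a layer and reports the resulting sparsity:

    import numpy as np

    def magnitude_prune(weights, sparsity=0.9):
        """Zero out the smallest-magnitude entries until roughly `sparsity` fraction is zero."""
        flat = np.abs(weights).ravel()
        k = int(sparsity * flat.size)
        if k == 0:
            return weights.copy(), np.ones_like(weights, dtype=bool)
        threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
        mask = np.abs(weights) > threshold             # keep only larger weights
        return weights * mask, mask

    # Toy example: prune a random 256x128 fully connected layer to ~90% sparsity.
    W = np.random.randn(256, 128)
    W_pruned, mask = magnitude_prune(W, sparsity=0.9)
    print("sparsity:", 1.0 - mask.mean())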
A Survey of Modern Object Detection Literature using Deep Learning Object detection is the identification of an object in the image along with its localisation and classification. It has widespread applications and is a critical component of vision-based software systems. This paper seeks to perform a rigorous survey of modern object detection algorithms that use deep learning. As part of the survey, the topics explored include various algorithms, quality metrics, speed/size trade-offs and training methodologies. This paper focuses on two types of object detection algorithms: the SSD class of single-step detectors and the Faster R-CNN class of two-step detectors. Techniques to construct detectors that are portable and fast on low-powered devices are also addressed by exploring new lightweight convolutional base architectures. Ultimately, a rigorous review of the strengths and weaknesses of each detector leads us to the present state of the art.
A Survey of Monte Carlo Tree Search Methods Monte Carlo Tree Search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarise the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
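As a concrete illustration of the core loop (selection, expansion, simulation, backpropagation) with the usual UCT selection rule, here is a minimal sketch; the tiny Nim-style game and all class names are made up for illustration and are not taken from the survey:

    import math, random

    class Nim:
        """Tiny two-player game used only to exercise the MCTS loop: remove 1-3 stones, taking the last stone wins."""
        def __init__(self, stones=10, player=1):
            self.stones, self.player = stones, player
        def legal_moves(self):
            return [m for m in (1, 2, 3) if m <= self.stones]
        def play(self, move):
            return Nim(self.stones - move, -self.player)
        def is_terminal(self):
            return self.stones == 0
        def winner(self):
            return -self.player  # the player who just moved took the last stone

    class Node:
        def __init__(self, state, parent=None, move=None):
            self.state, self.parent, self.move = state, parent, move
            self.children, self.visits, self.wins = [], 0, 0.0
            self.untried = state.legal_moves()

    def uct_select(node, c=1.4):
        return max(node.children,
                   key=lambda ch: ch.wins / ch.visits + c * math.sqrt(math.log(node.visits) / ch.visits))

    def mcts(root_state, iterations=2000):
        root = Node(root_state)
        for _ in range(iterations):
            node = root
            # 1. Selection: descend while fully expanded and non-terminal.
            while not node.untried and node.children:
                node = uct_select(node)
            # 2. Expansion: add one child for an untried move.
            if node.untried:
                move = node.untried.pop()
                child = Node(node.state.play(move), parent=node, move=move)
                node.children.append(child)
                node = child
            # 3. Simulation: random rollout to a terminal state.
            state = node.state
            while not state.is_terminal():
                state = state.play(random.choice(state.legal_moves()))
            winner = state.winner()
            # 4. Backpropagation: credit a win when the player who moved into the node won.
            while node is not None:
                node.visits += 1
                if winner != node.state.player:
                    node.wins += 1
                node = node.parent
        return max(root.children, key=lambda ch: ch.visits).move

    print("MCTS move from 10 stones:", mcts(Nim(10)))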
A Survey of Natural Language Generation Techniques with a Focus on Dialogue Systems – Past, Present and Future Directions One of the hardest problems in the area of Natural Language Processing and Artificial Intelligence is automatically generating language that is coherent and understandable to humans. Teaching machines how to converse as humans do falls under the broad umbrella of Natural Language Generation. Recent years have seen unprecedented growth in the number of research articles published on this subject in conferences and journals both by academic and industry researchers. There have also been several workshops organized alongside top-tier NLP conferences dedicated specifically to this problem. All this activity makes it hard to clearly define the state of the field and reason about its future directions. In this work, we provide an overview of this important and thriving area, covering traditional approaches, statistical approaches and also approaches that use deep neural networks. We provide a comprehensive review towards building open domain dialogue systems, an important application of natural language generation. We find that, predominantly, approaches for building dialogue systems use seq2seq or language model architectures. Notably, we identify three important areas of further research towards building more effective dialogue systems: 1) incorporating larger context, including conversation context and world knowledge; 2) adding personae or personality to the NLG system; and 3) overcoming dull and generic responses that affect the quality of system-produced responses. We provide pointers on how to tackle these open problems through the use of cognitive architectures that mimic human language understanding and generation capabilities.
A Survey of Neural Network Techniques for Feature Extraction from Text This paper aims to catalyze the discussions about text feature extraction techniques using neural network architectures. The research questions discussed in the paper focus on the state-of-the-art neural network techniques that have proven to be useful tools for language processing, language generation, text classification and other computational linguistics tasks.
A Survey of Neuromorphic Computing and Neural Networks in Hardware Neuromorphic computing has come to refer to a variety of brain-inspired computers, devices, and models that contrast the pervasive von Neumann computer architecture. This biologically inspired approach has created highly connected synthetic neurons and synapses that can be used to model neuroscience theories as well as solve challenging machine learning problems. The promise of the technology is to create a brain-like ability to learn and adapt, but the technical challenges are significant, starting with an accurate neuroscience model of how the brain works, to finding materials and engineering breakthroughs to build devices to support these models, to creating a programming framework so the systems can learn, to creating applications with brain-like capabilities. In this work, we provide a comprehensive survey of the research and motivations for neuromorphic computing over its history. We begin with a 35-year review of the motivations and drivers of neuromorphic computing, then look at the major research areas of the field, which we define as neuro-inspired models, algorithms and learning approaches, hardware and devices, supporting systems, and finally applications. We conclude with a broad discussion on the major research topics that need to be addressed in the coming years to see the promise of neuromorphic computing fulfilled. The goals of this work are to provide an exhaustive review of the research conducted in neuromorphic computing since the inception of the term, and to motivate further work by illuminating gaps in the field where new research is needed.
A Survey of Online Failure Prediction Methods With ever-growing complexity and dynamicity of computer systems, proactive fault management is an effective approach to enhancing availability. Online failure prediction is the key to such techniques. In contrast to classical reliability methods, online failure prediction is based on runtime monitoring and a variety of models and methods that use the current state of a system and, frequently, the past experience as well. This survey describes these methods. To capture the wide spectrum of approaches concerning this area, a taxonomy has been developed, whose different approaches are explained and major concepts are described in detail.
A Survey of Optimization Methods from a Machine Learning Perspective Machine learning is developing rapidly; it has achieved many theoretical breakthroughs and is widely applied in various fields. Optimization, as an important part of machine learning, has attracted much attention from researchers. With the exponential growth of data volume and the increase in model complexity, optimization methods in machine learning face more and more challenges. A great deal of work on solving optimization problems or improving optimization methods in machine learning has been proposed. A systematic retrospective and summary of optimization methods from the machine learning perspective is therefore of great significance and can offer guidance for the development of both optimization and machine learning research. In this paper, we first describe the optimization problems in machine learning. Then, we introduce the principles and progress of commonly used optimization methods. Next, we summarize the applications and developments of optimization methods in some popular machine learning fields. Finally, we explore and give some challenges and open problems for optimization in machine learning.
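As a small concrete instance of the commonly used first-order methods such surveys cover, the sketch below (assuming NumPy; the quadratic objective is a made-up toy problem) runs gradient descent with classical momentum:

    import numpy as np

    def sgd_momentum(grad, x0, lr=0.1, beta=0.9, steps=200):
        """Minimize a differentiable function given its gradient, using momentum:
           v <- beta * v + grad(x);  x <- x - lr * v."""
        x = np.array(x0, dtype=float)
        v = np.zeros_like(x)
        for _ in range(steps):
            v = beta * v + grad(x)
            x = x - lr * v
        return x

    # Toy quadratic f(x) = 0.5 * x^T A x - b^T x, whose gradient is A x - b.
    A = np.array([[3.0, 0.2], [0.2, 2.0]])
    b = np.array([1.0, 1.0])
    x_star = sgd_momentum(lambda x: A @ x - b, x0=[0.0, 0.0])
    print("approximate minimizer:", x_star, "exact:", np.linalg.solve(A, b))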
A Survey of Parallel Sequential Pattern Mining With the growing popularity of resource sharing and shared resources, large volumes of complex data of different types are collected automatically. Traditional data mining algorithms generally have problems and challenges including huge memory cost, low processing speed, and inadequate hard disk space. Sequential pattern mining (SPM) is used in a wide variety of real-life applications. However, it is more complex and challenging than frequent itemset mining, and also suffers from the above challenges when handling large-scale data. To solve these problems, mining sequential patterns in a parallel computing environment has emerged as an important issue with many applications. In this paper, an in-depth survey of the current status of parallel sequential pattern mining (PSPM) is provided, including a detailed categorization of traditional serial SPM approaches and state-of-the-art parallel SPM. We review the related work of PSPM in detail, including partition-based algorithms for PSPM, Apriori-based PSPM, pattern-growth-based PSPM, and hybrid algorithms for PSPM, and provide a deep description (i.e., characteristics, advantages, and disadvantages) of each parallel approach of PSPM. Some advanced topics for PSPM and the related open-source software are further reviewed in detail. Finally, we summarize some challenges and opportunities of PSPM in the big data era.
A Survey of Point-of-interest Recommendation in Location-based Social Networks Point-of-interest (POI) recommendation that suggests new places for users to visit arises with the popularity of location-based social networks (LBSNs). Due to the importance of POI recommendation in LBSNs, it has attracted much academic and industrial interest. In this paper, we offer a systematic review of this field, summarizing the contributions of individual efforts and exploring their relations. We discuss the new properties and challenges in POI recommendation, compared with traditional recommendation problems, e.g., movie recommendation. Then, we present a comprehensive review in three aspects: influential factors for POI recommendation, methodologies employed for POI recommendation, and different tasks in POI recommendation. Specifically, we propose three taxonomies to classify POI recommendation systems. First, we categorize the systems by the influential factors of check-in characteristics, including geographical information, social relationships, temporal influence, and content indications. Second, we categorize the systems by the methodology, including systems modeled by fused methods and joint methods. Third, we categorize the systems as general POI recommendation or successive POI recommendation according to a subtle difference in the recommendation task, namely whether it is biased toward the most recent check-in. For each category, we summarize the contributions and system features, and highlight the representative work. Moreover, we discuss the available data sets and the popular metrics. Finally, we point out the possible future directions in this area and conclude this survey.
A Survey of Shortest-Path Algorithms A shortest-path algorithm finds a path containing the minimal cost between two vertices in a graph. A plethora of shortest-path algorithms spanning multiple disciplines has been studied in the literature. This paper presents a survey of shortest-path algorithms based on a taxonomy that is introduced in the paper. One dimension of this taxonomy is the various flavors of the shortest-path problem. There is no one general algorithm that is capable of solving all variants of the shortest-path problem due to the space and time complexities associated with each algorithm. Other important dimensions of the taxonomy include whether the shortest-path algorithm operates over a static or a dynamic graph, whether the shortest-path algorithm produces exact or approximate answers, and whether the objective of the shortest-path algorithm is to achieve time-dependence or only to be goal-directed. This survey studies and classifies shortest-path algorithms according to the proposed taxonomy. The survey also presents the challenges and proposed solutions associated with each category in the taxonomy.
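As a baseline instance of the exact, static, single-source flavor within such a taxonomy, here is a short Dijkstra sketch using a binary heap (the example graph is hypothetical):

    import heapq

    def dijkstra(graph, source):
        """Single-source shortest-path costs on a non-negatively weighted graph.
        `graph` maps each vertex to a list of (neighbor, edge_cost) pairs."""
        dist = {source: 0.0}
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue                      # stale heap entry
            for v, w in graph.get(u, []):
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return dist

    # Hypothetical example graph.
    graph = {"a": [("b", 2), ("c", 5)], "b": [("c", 1), ("d", 4)], "c": [("d", 1)], "d": []}
    print(dijkstra(graph, "a"))   # {'a': 0.0, 'b': 2.0, 'c': 3.0, 'd': 4.0}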
A Survey of Tensor Methods Matrix decompositions have always been at the heart of signal, circuit and system theory. In particular, the Singular Value Decomposition (SVD) has been an important tool. There is currently a shift of paradigm in the algebraic foundations of these fields. Quite recently, Nonnegative Matrix Factorization (NMF) has been shown to outperform SVD at a number of tasks. Increasing research efforts are spent on the study and application of decompositions of higher-order tensors or multi-way arrays. This paper is a partial survey on tensor generalizations of the SVD and their applications. We also touch on Nonnegative Tensor Factorizations.
A Survey of the Recent Architectures of Deep Convolutional Neural Networks Deep Convolutional Neural Networks (CNNs) are a special type of Neural Networks, which have shown state-of-the-art results on various competitive benchmarks. The powerful learning ability of deep CNNs is largely achieved with the use of multiple non-linear feature extraction stages that can automatically learn hierarchical representations from the data. The availability of large amounts of data and improvements in hardware processing units have accelerated research in CNNs, and recently very interesting deep CNN architectures have been reported. The recent race in deep CNN architectures for achieving high performance on challenging benchmarks has shown that innovative architectural ideas, as well as parameter optimization, can improve CNN performance on various vision-related tasks. In this regard, different ideas in CNN design have been explored, such as the use of different activation and loss functions, parameter optimization, regularization, and restructuring of processing units. However, the major improvement in representational capacity is achieved by the restructuring of the processing units. In particular, the idea of using a block as a structural unit instead of a layer is gaining substantial appreciation. This survey thus focuses on the intrinsic taxonomy present in the recently reported CNN architectures and, consequently, classifies the recent innovations in CNN architectures into seven different categories. These seven categories are based on spatial exploitation, depth, multi-path, width, feature map exploitation, channel boosting and attention. Additionally, it covers the elementary understanding of the CNN components and sheds light on the current challenges and applications of CNNs.
A Survey of the Usages of Deep Learning in Natural Language Processing Over the last several years, the field of natural language processing has been propelled forward by an explosion in the use of deep learning models. This survey provides a brief introduction to the field and a quick overview of deep learning architectures and methods. It then sifts through the plethora of recent studies and summarizes a large assortment of relevant contributions. Analyzed research areas include several core linguistic processing issues in addition to a number of applications of computational linguistics. A discussion of the current state of the art is then provided along with recommendations for future research in the field.
A survey of transfer learning Machine learning and data mining techniques have been used in numerous real-world applications. An assumption of traditional machine learning methodologies is that the training data and testing data are taken from the same domain, such that the input feature space and data distribution characteristics are the same. However, in some real-world machine learning scenarios, this assumption does not hold. There are cases where training data is expensive or difficult to collect. Therefore, there is a need to create high-performance learners trained with more easily obtained data from different domains. This methodology is referred to as transfer learning. This survey paper formally defines transfer learning, presents information on current solutions, and reviews applications of transfer learning. Lastly, there is information listed on software downloads for various transfer learning solutions and a discussion of possible future research work. The transfer learning solutions surveyed are independent of data size and can be applied to big data environments.
A Survey of Tuning Parameter Selection for High-dimensional Regression Penalized (or regularized) regression, as represented by Lasso and its variants, has become a standard technique for analyzing high-dimensional data when the number of variables substantially exceeds the sample size. The performance of penalized regression relies crucially on the choice of the tuning parameter, which determines the amount of regularization and hence the sparsity level of the fitted model. The optimal choice of tuning parameter depends on both the structure of the design matrix and the unknown random error distribution (variance, tail behavior, etc.). This article reviews the current literature of tuning parameter selection for high-dimensional regression from both theoretical and practical perspectives. We discuss various strategies that choose the tuning parameter to achieve prediction accuracy or support recovery. We also review several recently proposed methods for tuning-free high-dimensional regression.
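For a concrete example of data-driven tuning parameter selection, the sketch below picks the Lasso penalty by cross-validated prediction error (assuming NumPy and scikit-learn's LassoCV, neither of which is mentioned in the article; the synthetic data are made up):

    import numpy as np
    from sklearn.linear_model import LassoCV

    # Synthetic high-dimensional data: n = 100 samples, p = 500 variables, 5 truly active.
    rng = np.random.default_rng(0)
    n, p = 100, 500
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]
    y = X @ beta + 0.5 * rng.standard_normal(n)

    # LassoCV chooses the tuning parameter by k-fold cross-validated prediction error.
    model = LassoCV(cv=5, n_alphas=100).fit(X, y)
    print("selected lambda:", model.alpha_)
    print("nonzero coefficients:", np.sum(model.coef_ != 0))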
A Survey of Utility-Oriented Pattern Mining The main purpose of data mining and analytics is to find novel, potentially useful patterns that can be utilized in real-world applications to derive beneficial knowledge. For identifying and evaluating the usefulness of different kinds of patterns, many techniques/constraints have been proposed, such as support, confidence, sequence order, and utility parameters (e.g., weight, price, profit, quantity, etc.). In recent years, there has been an increasing demand for utility-oriented pattern mining (UPM). UPM is a vital task, with numerous high-impact applications, including cross-marketing, e-commerce, finance, medical, and biomedical applications. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods of UPM. First, we introduce an in-depth understanding of UPM, including concepts, examples, and comparisons with related concepts. A taxonomy of the most common and state-of-the-art approaches for mining different kinds of high-utility patterns is presented, including Apriori-based, tree-based, projection-based, vertical-/horizontal-data-format-based, and other hybrid approaches. A comprehensive review of advanced topics of existing high-utility pattern mining techniques is offered, with a discussion of their pros and cons. Finally, we present several well-known open-source software packages for UPM. We conclude our survey with a discussion on open and practical challenges in this field.
A Survey of Visual Analysis of Human Motion and Its Applications This paper summarizes recent progress in human motion analysis and its applications. We first review motion capture systems and representation models of human motion data. Next, we sketch advanced human motion data processing technologies, including motion data filtering, temporal alignment, and segmentation. The following parts overview state-of-the-art approaches to action recognition and dynamics measurement, since these two are the most active research areas in human motion analysis. The last part discusses emerging applications of human motion analysis in healthcare, human-robot interaction, security surveillance, virtual reality and animation, and also summarizes promising future research topics in human motion analysis.
A Survey on Acceleration of Deep Convolutional Neural Networks Deep Neural Networks have achieved remarkable progress during the past few years and are currently the fundamental tools of many intelligent systems. At the same time, the computational complexity and resource consumption of these networks are also continuously increasing. This poses a significant challenge to the deployment of such networks, especially for real-time applications or on resource-limited devices. Thus, network acceleration has become a hot topic within the deep learning community. As for the hardware implementation of deep neural networks, a number of FPGA/ASIC-based accelerators have been proposed in recent years. In this paper, we provide a comprehensive survey of recent advances in network acceleration, compression and accelerator design from both the algorithm and hardware sides. Specifically, we provide a thorough analysis of each of the following topics: network pruning, low-rank approximation, network quantization, teacher-student networks, compact network design and hardware accelerators. Finally, we conclude with a discussion and a few possible future directions.
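One of the surveyed directions, network quantization, can be illustrated at its simplest with uniform 8-bit post-training quantization of a weight tensor (a minimal sketch assuming NumPy; real systems pair this with calibration data and quantized compute kernels):

    import numpy as np

    def quantize_uint8(w):
        """Uniform affine quantization of a float tensor to 8-bit integers."""
        lo, hi = w.min(), w.max()
        scale = (hi - lo) / 255.0 if hi > lo else 1.0
        q = np.round((w - lo) / scale).astype(np.uint8)   # stored 8-bit weights
        return q, scale, lo

    def dequantize(q, scale, lo):
        return q.astype(np.float32) * scale + lo

    W = np.random.randn(64, 64).astype(np.float32)
    q, scale, lo = quantize_uint8(W)
    W_hat = dequantize(q, scale, lo)
    print("max abs error:", np.abs(W - W_hat).max(), "- storage: 1 byte/weight instead of 4")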
A Survey on Active Learning and Human-in-the-Loop Deep Learning for Medical Image Analysis Fully automatic deep learning has become the state-of-the-art technique for many tasks including image acquisition, analysis and interpretation, and for the extraction of clinically useful information for computer-aided detection, diagnosis, treatment planning, intervention and therapy. However, the unique challenges posed by medical image analysis suggest that retaining a human end-user in any deep learning enabled system will be beneficial. In this review we investigate the role that humans might play in the development and deployment of deep learning enabled diagnostic applications and focus on techniques that will retain a significant input from a human end user. Human-in-the-Loop computing is an area that we see as increasingly important in future research due to the safety-critical nature of working in the medical domain. We evaluate four key areas that we consider vital for deep learning in the clinical practice: (1) Active Learning – to choose the best data to annotate for optimal model performance; (2) Interpretation and Refinement – using iterative feedback to steer models to optima for a given prediction and offering meaningful ways to interpret and respond to predictions; (3) Practical considerations – developing full scale applications and the key considerations that need to be made before deployment; (4) Related Areas – research fields that will benefit human-in-the-loop computing as they evolve. We offer our opinions on the most promising directions of research and how various aspects of each area might be unified towards common goals.
A survey on Adversarial Attacks and Defenses in Text Deep neural networks (DNNs) have shown an inherent vulnerability to adversarial examples, which are maliciously crafted from real examples by attackers aiming to make target DNNs misbehave. Threats from adversarial examples exist widely in image, voice, speech, and text recognition and classification. Inspired by previous work, research on adversarial attacks and defenses in the text domain has developed rapidly. To the best of our knowledge, this article presents a comprehensive review of adversarial examples in text. We analyze the advantages and shortcomings of recent adversarial example generation methods and elaborate on the efficiency and limitations of countermeasures. Finally, we discuss the challenges of adversarial text and outline research directions in this area.
A Survey on Adversarial Information Retrieval on the Web This survey paper discusses different forms of malicious techniques that can affect how an information retrieval model retrieves documents for a query and their remedies.
A Survey on Artificial Intelligence and Data Mining for MOOCs Massive Open Online Courses (MOOCs) have gained tremendous popularity in the last few years. Thanks to MOOCs, millions of learners from all over the world have taken thousands of high-quality courses for free. Putting together an excellent MOOC ecosystem is a multidisciplinary endeavour that requires contributions from many different fields. Artificial intelligence (AI) and data mining (DM) are two such fields that have played a significant role in making MOOCs what they are today. By exploiting the vast amount of data generated by learners engaging in MOOCs, DM improves our understanding of the MOOC ecosystem and enables MOOC practitioners to deliver better courses. Similarly, AI, supported by DM, can greatly improve student experience and learning outcomes. In this survey paper, we first review the state-of-the-art artificial intelligence and data mining research applied to MOOCs, emphasising the use of AI and DM tools and techniques to improve student engagement, learning outcomes, and our understanding of the MOOC ecosystem. We then offer an overview of key trends and important research to carry out in the fields of AI and DM so that MOOCs can reach their full potential.
A Survey on Bias and Fairness in Machine Learning With the widespread use of AI systems and applications in our everyday lives, it is important to take fairness issues into consideration while designing and engineering these types of systems. Such systems can be used in many sensitive environments to make important and life-changing decisions; thus, it is crucial to ensure that the decisions do not reflect discriminatory behavior toward certain groups or populations. We have recently seen work in machine learning, natural language processing, and deep learning that addresses such challenges in different subdomains. With the commercialization of these systems, researchers are becoming aware of the biases that these applications can contain and have attempted to address them. In this survey we investigated different real-world applications that have shown biases in various ways, and we listed different sources of biases that can affect AI applications. We then created a taxonomy for fairness definitions that machine learning researchers have defined in order to avoid the existing bias in AI systems. In addition to that, we examined different domains and subdomains in AI showing what researchers have observed with regard to unfair outcomes in the state-of-the-art methods and how they have tried to address them. There are still many future directions and solutions that can be taken to mitigate the problem of bias in AI systems. We are hoping that this survey will motivate researchers to tackle these issues in the near future by observing existing work in their respective fields.
A survey on Big Data and Machine Learning for Chemistry Herein we review aspects of leading-edge research and innovation in chemistry which exploits big data and machine learning (ML), two computer science fields that combine to yield machine intelligence. ML can accelerate the solution of intricate chemical problems and even solve problems that otherwise would not be tractable. But the potential benefits of ML come at the cost of big data production; that is, the algorithms, in order to learn, demand large volumes of data of various natures and from different sources, from materials properties to sensor data. In the survey, we propose a roadmap for future developments, with emphasis on materials discovery and chemical sensing, and within the context of the Internet of Things (IoT), both prominent research fields for ML in the context of big data. In addition to providing an overview of recent advances, we elaborate upon the conceptual and practical limitations of big data and ML applied to chemistry, outlining processes, discussing pitfalls, and reviewing cases of success and failure.
A Survey on Compressive Sensing: Classical Results and Recent Advancements Recovering sparse signals from linear measurements has demonstrated outstanding utility in a vast variety of real-world applications. Compressive sensing is the field that studies the questions raised about when such a recovery is possible. This topic is well developed, and numerous results are available in the literature. However, their dispersion makes it challenging and time-consuming for new readers and practitioners to quickly grasp the main ideas and classical algorithms, let alone the recent advancements in this surging field. Besides, the sparsity notion has already demonstrated its effectiveness in many contemporary fields. Thus, these results are useful and inspiring for further investigation of related questions in these emerging fields from new perspectives. In this survey, we gather and overview vital classical tools and algorithms in compressive sensing and describe significant recent advancements. We conclude this survey with a numerical comparison of the performance of the described approaches on an interesting application.
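As one classical recovery algorithm from this literature, here is a minimal Orthogonal Matching Pursuit sketch (assuming NumPy; the measurement sizes and sparsity level are made-up choices) that recovers a sparse vector from a few random linear measurements:

    import numpy as np

    def omp(A, y, k):
        """Greedy Orthogonal Matching Pursuit: pick k columns of A that best explain y."""
        residual, support = y.copy(), []
        for _ in range(k):
            correlations = np.abs(A.T @ residual)
            correlations[support] = 0                     # do not reselect columns
            support.append(int(np.argmax(correlations)))
            x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
            residual = y - A[:, support] @ x_s
        x = np.zeros(A.shape[1])
        x[support] = x_s
        return x

    # Hypothetical setup: 5-sparse signal of length 200, recovered from 60 Gaussian measurements.
    rng = np.random.default_rng(1)
    n, m, k = 200, 60, 5
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    x_true = np.zeros(n)
    x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    y = A @ x_true                                        # noiseless measurements
    x_hat = omp(A, y, k)
    print("recovery error:", np.linalg.norm(x_hat - x_true))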
A Survey on Contextual Multi-armed Bandits The nature of contextual bandits makes them suitable for many machine learning applications, such as user modeling, Internet advertising, search engines, and experiment optimization. In this survey we cover three different types of contextual bandit algorithms, and for each type we introduce several representative algorithms. We also compare the regret bounds and assumptions of these algorithms.
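For concreteness, the sketch below implements one representative linear-payoff algorithm, disjoint LinUCB, against a made-up simulated environment (assuming NumPy; the reward model and dimensions are illustrative assumptions):

    import numpy as np

    class LinUCB:
        """Disjoint LinUCB: one ridge-regression model per arm, with an upper-confidence bonus."""
        def __init__(self, n_arms, dim, alpha=1.0):
            self.alpha = alpha
            self.A = [np.eye(dim) for _ in range(n_arms)]    # X^T X + I per arm
            self.b = [np.zeros(dim) for _ in range(n_arms)]  # X^T r per arm

        def choose(self, x):
            scores = []
            for A, b in zip(self.A, self.b):
                A_inv = np.linalg.inv(A)
                theta = A_inv @ b
                scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
            return int(np.argmax(scores))

        def update(self, arm, x, reward):
            self.A[arm] += np.outer(x, x)
            self.b[arm] += reward * x

    # Toy simulation: 3 arms with hidden linear reward functions over 5-dimensional contexts.
    rng = np.random.default_rng(0)
    true_theta = rng.standard_normal((3, 5))
    bandit, total = LinUCB(n_arms=3, dim=5), 0.0
    for t in range(2000):
        x = rng.standard_normal(5)
        arm = bandit.choose(x)
        reward = true_theta[arm] @ x + 0.1 * rng.standard_normal()
        bandit.update(arm, x, reward)
        total += reward
    print("average reward:", total / 2000)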
A Survey on Data Collection for Machine Learning: a Big Data – AI Integration Perspective Data collection is a major bottleneck in machine learning and an active research topic in multiple communities. There are largely two reasons data collection has recently become a critical issue. First, as machine learning is becoming more widely-used, we are seeing new applications that do not necessarily have enough labeled data. Second, unlike traditional machine learning where feature engineering is the bottleneck, deep learning techniques automatically generate features, but instead require large amounts of labeled data. Interestingly, recent research in data collection comes not only from the machine learning, natural language, and computer vision communities, but also from the data management community due to the importance of handling large amounts of data. In this survey, we perform a comprehensive study of data collection from a data management point of view. Data collection largely consists of data acquisition, data labeling, and improvement of existing data or models. We provide a research landscape of these operations, provide guidelines on which technique to use when, and identify interesting research challenges. The integration of machine learning and data management for data collection is part of a larger trend of Big data and Artificial Intelligence (AI) integration and opens many opportunities for new research.
A Survey on Deep Learning for Named Entity Recognition Named entity recognition (NER) is the task of identifying text spans that mention named entities and classifying them into predefined categories such as person, location, and organization. NER serves as the basis for a variety of natural language applications such as question answering, text summarization, and machine translation. Although early NER systems are successful in producing decent recognition accuracy, they often require much human effort in carefully designing rules or features. In recent years, deep learning, empowered by continuous real-valued vector representations and semantic composition through nonlinear processing, has been employed in NER systems, yielding state-of-the-art performance. In this paper, we provide a comprehensive review of existing deep learning techniques for NER. We first introduce NER resources, including tagged NER corpora and off-the-shelf NER tools. Then, we systematically categorize existing works based on a taxonomy along three axes: distributed representations for input, context encoder, and tag decoder. Next, we survey the most representative methods for recently applied deep learning techniques in new NER problem settings and applications. Finally, we present readers with the challenges faced by NER systems and outline future directions in this area.
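Whatever the input representation, context encoder, or tag decoder, an NER system ultimately turns per-token tags into typed spans; the small helper below (a hypothetical utility, not taken from the survey) decodes a BIO tag sequence into (type, start, end) spans:

    def bio_to_spans(tags):
        """Convert a BIO tag sequence into (entity_type, start, end_exclusive) spans."""
        spans, start, etype = [], None, None
        for i, tag in enumerate(tags + ["O"]):          # sentinel flushes the last span
            if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and tag[2:] != etype):
                if start is not None:
                    spans.append((etype, start, i))
                start, etype = (i, tag[2:]) if tag.startswith("B-") else (None, None)
            # a continuing I- tag of the same type just extends the open span
        return spans

    tokens = ["Barack", "Obama", "visited", "Paris", "."]
    tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
    print([(t, tokens[s:e]) for t, s, e in bio_to_spans(tags)])
    # [('PER', ['Barack', 'Obama']), ('LOC', ['Paris'])]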
A Survey on Deep Learning Methods for Robot Vision Deep learning has allowed a paradigm shift in pattern recognition, from using hand-crafted features together with statistical classifiers to using general-purpose learning procedures for learning data-driven representations, features, and classifiers together. The application of this new paradigm has been particularly successful in computer vision, in which the development of deep learning methods for vision applications has become a hot research topic. Given that deep learning has already attracted the attention of the robot vision community, the main purpose of this survey is to address the use of deep learning in robot vision. To achieve this, a comprehensive overview of deep learning and its usage in computer vision is given, that includes a description of the most frequently used neural models and their main application areas. Then, the standard methodology and tools used for designing deep-learning based vision systems are presented. Afterwards, a review of the principal work using deep learning in robot vision is presented, as well as current and future trends related to the use of deep learning in robotics. This survey is intended to be a guide for the developers of robot vision systems.
A Survey on Deep Learning Toolkits and Libraries for Intelligent User Interfaces This paper provides an overview of prominent deep learning toolkits and, in particular, reports on recent publications that contributed open source software for implementing tasks that are common in intelligent user interfaces (IUI). We provide a scientific reference for researchers and software engineers who plan to utilise deep learning techniques within their IUI research and development projects.
A Survey on Deep Transfer Learning As a new classification platform, deep learning has recently received increasing attention from researchers and has been successfully applied to many domains. In some domains, like bioinformatics and robotics, it is very difficult to construct a large-scale well-annotated dataset due to the expense of data acquisition and costly annotation, which limits its development. Transfer learning relaxes the hypothesis that the training data must be independent and identically distributed (i.i.d.) with the test data, which motivates us to use transfer learning to solve the problem of insufficient training data. This survey focuses on reviewing current research on transfer learning using deep neural networks and its applications. We define deep transfer learning, categorize it, and review recent research works based on the techniques used in deep transfer learning.
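A minimal sketch of the network-based variety of deep transfer learning, reusing a pretrained feature extractor and training only a new task head, might look as follows (assuming PyTorch; the layer sizes and data are made up, and in practice the backbone weights would come from a source task):

    import torch
    import torch.nn as nn

    # Hypothetical "pretrained" feature extractor; its weights would normally come from a source task.
    backbone = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 128), nn.ReLU())
    head = nn.Linear(128, 3)                      # new head for a 3-class target task

    for p in backbone.parameters():               # freeze the transferred layers
        p.requires_grad = False

    model = nn.Sequential(backbone, head)
    optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)   # only the head is trained
    loss_fn = nn.CrossEntropyLoss()

    # One toy training step on random target-task data.
    x, y = torch.randn(32, 64), torch.randint(0, 3, (32,))
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    print("target-task loss:", float(loss))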
A Survey on Dialogue Systems: Recent Advances and New Frontiers Dialogue systems have attracted more and more attention. Recent advances on dialogue systems are overwhelmingly contributed by deep learning techniques, which have been employed to enhance a wide range of big data applications such as computer vision, natural language processing, and recommender systems. For dialogue systems, deep learning can leverage a massive amount of data to learn meaningful feature representations and response generation strategies, while requiring a minimum amount of hand-crafting. In this article, we give an overview of these recent advances in dialogue systems from various perspectives and discuss some possible research directions. In particular, we generally divide existing dialogue systems into task-oriented and non-task-oriented models, then detail how deep learning techniques help them with representative algorithms, and finally discuss some appealing research directions that can bring dialogue system research to a new frontier.
A Survey on Domain-Specific Languages for Machine Learning in Big Data The amount of data generated in the modern society is increasing rapidly. New problems and novel approaches of data capture, storage, analysis and visualization are responsible for the emergence of the Big Data research field. Machine Learning algorithms can be used in Big Data to make better and more accurate inferences. However, because of the challenges Big Data imposes, these algorithms need to be adapted and optimized to specific applications. One important decision made by software engineers is the choice of the language that is used in the implementation of these algorithms. Therefore, this literature survey identifies and describes domain-specific languages and frameworks used for Machine Learning in Big Data. By doing this, software engineers can then make more informed choices and beginners have an overview of the main languages used in this domain.
A Survey on Expert Recommendation in Community Question Answering Community question answering (CQA) represents the type of Web applications where people can exchange knowledge via asking and answering questions. One significant challenge of most real-world CQA systems is the lack of effective matching between questions and the potential good answerers, which adversely affects efficient knowledge acquisition and circulation. On the one hand, a requester might receive many low-quality answers without getting a quality response in a brief time; on the other hand, an answerer might face numerous new questions without being able to identify the questions of interest quickly. Under this situation, expert recommendation emerges as a promising technique to address the above issues. Instead of passively waiting for users to browse and find their questions of interest, an expert recommendation method raises the attention of users to the appropriate questions actively and promptly. The past few years have witnessed considerable efforts that address the expert recommendation problem from different perspectives. These methods all have their issues that need to be resolved before the advantages of expert recommendation can be fully embraced. In this survey, we first present an overview of the research efforts and state-of-the-art techniques for expert recommendation in CQA. We next summarize and compare the existing methods concerning their advantages and shortcomings, followed by discussing the open issues and future research directions.
A Survey on Geographically Distributed Big-Data Processing using MapReduce Hadoop and Spark are widely used distributed processing frameworks for large-scale data processing in an efficient and fault-tolerant manner on private or public clouds. These big-data processing systems are extensively used by many industries, e.g., Google, Facebook, and Amazon, for solving a large class of problems, e.g., search, clustering, log analysis, different types of join operations, matrix multiplication, pattern matching, and social network analysis. However, all these popular systems have a major drawback in terms of locally distributed computations, which prevents them from implementing geographically distributed data processing. The increasing amount of geographically distributed massive data is pushing industries and academia to rethink the current big-data processing systems. Novel frameworks, which go beyond the state-of-the-art architectures and technologies involved in current systems, are expected to process geographically distributed data at their locations without moving entire raw datasets to a single location. In this paper, we investigate and discuss challenges and requirements in designing geographically distributed data processing frameworks and protocols. We classify and study batch processing (MapReduce-based systems), stream processing (Spark-based systems), and SQL-style processing geo-distributed frameworks, models, and algorithms with their overhead issues.
A Survey on Graph Kernels Graph kernels have become an established and widely-used technique for solving classification tasks on graphs. This survey gives a comprehensive overview of techniques for kernel-based graph classification developed in the past 15 years. We describe and categorize graph kernels based on properties inherent to their design, such as the nature of their extracted graph features, their method of computation and their applicability to problems in practice. In an extensive experimental evaluation, we study the classification accuracy of a large suite of graph kernels on established benchmarks as well as new datasets. We compare the performance of popular kernels with several baseline methods and study the effect of applying a Gaussian RBF kernel to the metric induced by a graph kernel. In doing so, we find that simple baselines become competitive after this transformation on some datasets. Moreover, we study the extent to which existing graph kernels agree in their predictions (and prediction errors) and obtain a data-driven categorization of kernels as a result. Finally, based on our experimental results, we derive a practitioner’s guide to kernel-based graph classification.
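The transformation examined in that evaluation, applying a Gaussian RBF kernel to the metric induced by a graph kernel, amounts to the short computation below (a sketch assuming NumPy; K stands in for any precomputed graph kernel matrix):

    import numpy as np

    def rbf_on_kernel_metric(K, gamma=1.0):
        """Given a positive semi-definite kernel matrix K, form the induced squared distances
        d(i, j)^2 = K[i, i] + K[j, j] - 2 K[i, j] and return exp(-gamma * d^2)."""
        diag = np.diag(K)
        d2 = diag[:, None] + diag[None, :] - 2.0 * K
        return np.exp(-gamma * np.clip(d2, 0.0, None))   # clip guards against tiny negatives

    # Hypothetical precomputed graph kernel matrix for 4 graphs.
    K = np.array([[4.0, 2.0, 1.0, 0.5],
                  [2.0, 3.0, 1.5, 0.5],
                  [1.0, 1.5, 2.0, 0.8],
                  [0.5, 0.5, 0.8, 1.0]])
    print(rbf_on_kernel_metric(K, gamma=0.1))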
A Survey on Influence Maximization in a Social Network Given a social network with diffusion probabilities as edge weights and an integer k, which k nodes should be chosen for initial injection of information to maximize influence in the network? This problem is known as Target Set Selection in a social network (TSS Problem) and, more popularly, the Social Influence Maximization Problem (SIM Problem). This has been an active area of research in the computational social network analysis domain for the past decade and a half. Due to its practical importance in various domains, such as viral marketing, targeted advertisement, and personalized recommendation, the problem has been studied in different variants, and different solution methodologies have been proposed over the years. Hence, there is a need for an organized and comprehensive review on this topic. This paper presents a survey on the progress in and around the TSS Problem. Finally, it discusses current research trends and future research directions.
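A minimal sketch of the standard greedy approach to this problem, estimating influence spread by Monte Carlo simulation of the independent cascade model, is shown below (plain Python; the toy graph and activation probability are made-up illustrations, not taken from the survey):

    import random

    def simulate_ic(graph, seeds, p=0.1):
        """One independent-cascade simulation; returns the number of activated nodes."""
        active, frontier = set(seeds), list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in graph.get(u, []):
                    if v not in active and random.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        return len(active)

    def greedy_im(graph, k, p=0.1, runs=200):
        """Greedy hill-climbing: repeatedly add the node with the largest estimated marginal gain."""
        seeds = []
        for _ in range(k):
            best, best_spread = None, -1.0
            for v in graph:
                if v in seeds:
                    continue
                spread = sum(simulate_ic(graph, seeds + [v], p) for _ in range(runs)) / runs
                if spread > best_spread:
                    best, best_spread = v, spread
            seeds.append(best)
        return seeds

    # Toy directed graph given as adjacency lists (hypothetical example).
    graph = {0: [1, 2, 3], 1: [4], 2: [4, 5], 3: [6], 4: [7], 5: [7], 6: [7], 7: []}
    print("selected seed set:", greedy_im(graph, k=2, p=0.3))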
A Survey on Load Balancing Algorithms for VM Placement in Cloud Computing The emergence of cloud computing based on virtualization technologies brings huge opportunities to host virtual resources at low cost without the need to own any infrastructure. Virtualization technologies enable users to acquire, configure and be charged on a pay-per-use basis. However, cloud data centers mostly comprise heterogeneous commodity servers hosting multiple virtual machines (VMs) with potentially varying specifications and fluctuating resource usage, which may cause imbalanced resource utilization within servers and lead to performance degradation and service level agreement (SLA) violations. To achieve efficient scheduling, these challenges should be addressed and solved by using load balancing strategies, which has been proved to be an NP-hard problem. From multiple perspectives, this work identifies the challenges and analyzes existing algorithms for allocating VMs to PMs in infrastructure clouds, with a particular focus on load balancing. A detailed classification of load balancing algorithms for VM placement in cloud data centers is presented, and the surveyed algorithms are classified accordingly. The goal of this paper is to provide a comprehensive and comparative understanding of existing literature and aid researchers by providing an insight for potential future enhancements.
A Survey on Monochromatic Connections of Graphs The concept of monochromatic connection of graphs was introduced by Caro and Yuster in 2011. Recently, a lot of results have been published about it. In this survey, we attempt to bring together all the results that dealt with it. We begin with an introduction, and then classify the results into the following categories: monochromatic connection coloring of edge-version, monochromatic connection coloring of vertex-version, monochromatic index, monochromatic connection coloring of total-version.
A Survey on Multi-output Learning Multi-output learning aims to simultaneously predict multiple outputs given an input. It is an important learning problem due to the pressing need for sophisticated decision making in real-world applications. Inspired by big data, the 4V characteristics of multi-output data impose a set of challenges on multi-output learning, in terms of the volume, velocity, variety and veracity of the outputs. An increasing number of works in the literature have been devoted to the study of multi-output learning and the development of novel approaches for addressing the challenges encountered. However, the literature lacks a comprehensive overview of the different types of challenges in multi-output learning brought about by the characteristics of the multiple outputs, and of the techniques proposed to overcome them. This paper thus attempts to fill this gap and provide a comprehensive review of the area. We first introduce different stages of the life cycle of the output labels. Then we present the paradigm of multi-output learning, including its myriad output structures, definitions of its different sub-problems, model evaluation metrics and popular data repositories used in its study. Subsequently, we review a number of state-of-the-art multi-output learning methods, which are categorized based on the challenges.
A Survey on Multi-View Clustering With the fast development of information technology, especially the popularization of the internet, multi-view learning has become more and more popular in the machine learning and data mining fields. Multi-view semi-supervised learning methods, such as co-training and co-regularization, have gained considerable attention. Although multi-view clustering (MVC) has recently developed rapidly, there is no survey or review summarizing and analyzing the current progress. Therefore, this paper sums up the common strategies for combining multiple views and, based on that, proposes a novel taxonomy of MVC approaches. We also discuss the relationships between MVC and multi-view representation, ensemble clustering, multi-task clustering, and multi-view supervised and semi-supervised learning. Several representative real-world applications are elaborated upon. To promote the further development of MVC, we point out several open problems worth exploring in the future.
A Survey on Natural Language Processing for Fake News Detection Fake news detection is a critical yet challenging problem in Natural Language Processing (NLP). The rapid rise of social networking platforms has not only yielded a vast increase in information accessibility but has also accelerated the spread of fake news. Given the massive amount of Web content, automatic fake news detection is a practical NLP problem required by all online content providers. This paper presents a survey on fake news detection. Our survey introduces the challenges of automatic fake news detection. We systematically review the datasets and NLP solutions that have been developed for this task. We also discuss the limits of these datasets and problem formulations, our insights, and recommended solutions.
A Survey on Neural Architecture Search The growing interest in both the automation of machine learning and deep learning has inevitably led to the development of automated methods for neural architecture optimization. The choice of the network architecture has proven to be critical, and many advances in deep learning spring from its immediate improvements. However, deep learning techniques are computationally intensive and their application requires a high level of domain knowledge. Therefore, even partial automation of this process would help make deep learning more accessible to both researchers and practitioners. With this survey, we provide a formalism which unifies and categorizes the landscape of existing methods along with a detailed analysis that compares and contrasts the different approaches. We achieve this via a discussion of common architecture search spaces and architecture optimization algorithms based on principles of reinforcement learning and evolutionary algorithms along with approaches that incorporate surrogate and one-shot models. Additionally, we address the new research directions which include constrained and multi-objective architecture search as well as automated data augmentation, optimizer and activation function search.
A Survey on Neural Network Language Models As a core component of a Natural Language Processing (NLP) system, a Language Model (LM) can provide word representations and probability estimates for word sequences. Neural Network Language Models (NNLMs) overcome the curse of dimensionality and improve the performance of traditional LMs. A survey on NNLMs is performed in this paper. The structure of classic NNLMs is described first, and then some major improvements are introduced and analyzed. We summarize and compare corpora and toolkits of NNLMs. Further, some research directions of NNLMs are discussed.
A Survey on Recent Advances in Named Entity Recognition from Deep Learning models Named Entity Recognition (NER) is a key component in NLP systems for question answering, information retrieval, relation extraction, etc. NER systems have been studied and developed widely for decades, but accurate systems using deep neural networks (NN) have only been introduced in the last few years. We present a comprehensive survey of deep neural network architectures for NER, and contrast them with previous approaches to NER based on feature engineering and other supervised or semi-supervised learning algorithms. Our results highlight the improvements achieved by neural networks, and show how incorporating some of the lessons learned from past work on feature-based NER systems can yield further improvements.
A Survey on Resilient Machine Learning Machine learning based systems are increasingly being used for sensitive tasks such as security surveillance, guiding autonomous vehicles, making investment decisions, and detecting and blocking network intrusions and malware. However, recent research has shown that machine learning models are vulnerable to attacks by adversaries at all phases of machine learning (e.g., training data collection, training, operation). All model classes of machine learning systems can be misled by carefully crafted inputs that cause them to classify incorrectly. Maliciously created input samples can affect the learning process of an ML system by slowing down learning, degrading the performance of the learned model, or causing the system to make errors only in the attacker's planned scenario. Because of these developments, understanding the security of machine learning algorithms and systems is emerging as an important research area among computer security and machine learning researchers and practitioners. We present a survey of this emerging area in machine learning.
A Survey on Semantic Parsing A significant amount of information in today’s world is stored in structured and semi-structured knowledge bases. Efficient and simple methods to query these databases are essential and must not be restricted to only those who have expertise in formal query languages. The field of semantic parsing deals with converting natural language utterances to logical forms that can be easily executed on a knowledge base. In this survey, we examine the various components of a semantic parsing system and discuss prominent work ranging from the initial rule based methods to the current neural approaches to program synthesis. We also discuss methods that operate using varying levels of supervision and highlight the key challenges involved in the learning of such systems.
A Survey on Sentiment and Emotion Analysis for Computational Literary Studies Emotions have often been a crucial part of compelling narratives: literature tells about people with goals, desires, passions, and intentions. In the past, classical literary studies usually scrutinized the affective dimension of literature within the framework of hermeneutics. However, with the emergence of the research field known as Digital Humanities (DH), some studies of emotions in a literary context have taken a computational turn. Given that DH is still being formed as a science, this direction of research can be considered relatively new. At the same time, research in sentiment analysis started in computational linguistics almost two decades ago and is nowadays an established field that has dedicated workshops and tracks in the main computational linguistics conferences. This leads us to the question of what the commonalities and discrepancies are between sentiment analysis research in computational linguistics and in digital humanities. In this survey, we offer an overview of the existing body of research on sentiment and emotion analysis as applied to literature. We precede the main part of the survey with a short introduction to natural language processing, machine learning and psychological models of emotions, and provide an overview of existing approaches to sentiment and emotion analysis in computational linguistics. The papers presented in this survey come either directly from DH or from computational linguistics venues and are limited to sentiment and emotion analysis as applied to literary text.
A Survey on Session-based Recommender Systems Session-based recommender systems (SBRS) are an emerging topic in the recommendation domain and have attracted much attention from both academia and industry in recent years. Most existing works only model the general item-level dependency for recommendation tasks. However, there are many other challenges associated with SBRS at different levels, e.g., the item feature level and session level, and from various perspectives, e.g., item heterogeneity and intra- and inter-item feature coupling relations. In this paper, we provide a systematic and comprehensive review on SBRS and create a hierarchical and in-depth understanding of a variety of challenges in SBRS. To be specific, we first illustrate the value and significance of SBRS, followed by a hierarchical framework to categorize the related research issues and methods of SBRS and to reveal its intrinsic challenges and complexities. Further, a summary together with a detailed introduction of the research progress is provided. Lastly, we share some prospects in this research area.
A Survey on Social Media Anomaly Detection Social media anomaly detection is of critical importance to prevent malicious activities such as bullying, terrorist attack planning, and fraud information dissemination. With the recent popularity of social media, new types of anomalous behaviors arise, causing concerns from various parties. While a large amount of work has been dedicated to traditional anomaly detection problems, we observe a surge of research interest in the new realm of social media anomaly detection. In this paper, we present a survey on existing approaches to address this problem. We focus on the new types of anomalous phenomena in social media and review the recently developed techniques for detecting these special types of anomalies. We provide a general overview of the problem domain, common formulations, existing methodologies and potential directions. With this work, we hope to draw the attention of the research community to this challenging problem and open up new directions to which we can contribute in the future.
A survey on trajectory clustering analysis This paper comprehensively surveys the development of trajectory clustering. Considering the critical role of trajectory data mining in modern intelligent systems for surveillance security, abnormal behavior detection, crowd behavior analysis, and traffic control, trajectory clustering has attracted growing attention. Existing trajectory clustering methods can be grouped into three categories: unsupervised, supervised and semi-supervised algorithms. In spite of achieving a certain level of development, trajectory clustering is limited in its success by complex conditions such as application scenarios and data dimensions. This paper provides a holistic understanding and deep insight into trajectory clustering, and presents a comprehensive analysis of representative methods and promising future directions.
A Survey on Trust Modeling from a Bayesian Perspective This paper is concerned with trust modeling for networked computing systems. Of particular interest to this paper is the observation that trust is a subjective notion that is invisible, implicit and uncertain in nature, and may therefore be suitably expressed by subjective probabilities and modeled on the basis of Bayesian principles. In spite of a few attempts to model trust in the Bayesian paradigm, the field lacks a comprehensive global overview of Bayesian methods and their theoretical connections to other alternatives. This paper presents a study that fills this gap. It provides a comprehensive review and analysis of the literature, showing that a great deal of existing work, whether or not proposed on the basis of Bayesian principles, can be cast into a general Bayesian paradigm termed subjective Bayesian trust (SBT) theory here. The SBT framework can thus act as a general theoretical infrastructure for comparing or analyzing theoretical ties among existing trust models, and for developing novel models. The aim of this study is twofold. One is to gain insight into the Bayesian philosophy of modeling trust. The other is to drive current research a step ahead in seeking a high-level, abstract way of modeling and evaluating trust.
A Survey on Visual Query Systems in the Web Era (extended version) As more and more collections of data are becoming available on the web to everyone, non-expert users demand easy ways to retrieve data from these collections. One solution is the so-called Visual Query Systems (VQS), in which queries are represented visually and users do not have to understand query languages such as SQL or XQuery. In 1996, a paper by Catarci reviewed the Visual Query Systems available until that year. In this paper, we review VQSs from 1997 until now and try to determine whether they have been the solution for non-expert users. The short answer is no, because very few systems have in fact been used in real environments or as commercial tools. We have also gathered basic features of VQSs, such as the visual representation adopted to present the reality of interest or the visual representation adopted to express queries.
A Survey: Non-Orthogonal Multiple Access with Compressed Sensing Multiuser Detection for mMTC One objective of the 5G communication system and beyond is to support massive machine-type communication (mMTC) to propel the fast growth of diverse Internet of Things use cases. mMTC aims to provide connectivity to tens of billions of sensor nodes. The dramatic increase in sensor devices and massive connectivity imposes critical challenges for the network in handling the enormous control signaling overhead with limited radio resources. Non-Orthogonal Multiple Access (NOMA) is a new paradigm shift in the design of multiple user detection and multiple access. NOMA with compressive sensing based multiuser detection is one of the promising candidates to address the challenges of mMTC. This survey article aims to provide an overview of the current state-of-the-art research on the various compressive sensing based techniques that enable NOMA. We present the characteristics of the different algorithms and compare their pros and cons, thereby providing useful insights for researchers to make further contributions to NOMA using compressive sensing techniques.
A Survey: Time Travel in Deep Learning Space: An Introduction to Deep Learning Models and How Deep Learning Models Evolved from the Initial Ideas This report will show how deep learning has evolved. It will trace back as far as the initial belief in connectionist modelling of the brain, and then come back to look at its early-stage realization: neural networks. With this background on neural networks, we will gradually introduce how convolutional neural networks, as representatives of deep discriminative models, developed from neural networks, together with many practical techniques that can help in the optimization of neural networks. On the other hand, we will also trace back the evolution of deep generative models, to see how researchers balance representational power and computational complexity to reach the Restricted Boltzmann Machine and eventually Deep Belief Nets. Further, we will also look into the history of modelling time series data with neural networks. We will start with Time Delay Neural Networks and move on to the currently famous model named Recurrent Neural Network and its extension, Long Short-Term Memory. We will also briefly look into how to construct deep recurrent neural networks. Finally, we will conclude this report with some interesting open-ended questions about deep neural networks.
A System for Accessible Artificial Intelligence While artificial intelligence (AI) has become widespread, many commercial AI systems are not yet accessible to individual researchers or the general public due to the deep knowledge of the systems required to use them. We believe that AI has matured to the point where it should be an accessible technology for everyone. We present an ongoing project whose ultimate goal is to deliver an open source, user-friendly AI system that is specialized for machine learning analysis of complex data in the biomedical and health care domains. We discuss how genetic programming can aid in this endeavor, and highlight specific examples where genetic programming has automated machine learning analyses in previous projects.
A systematic review of fuzzing based on machine learning techniques Security vulnerabilities play a vital role in network security systems. Fuzzing is widely used as a vulnerability discovery technology to reduce damage in advance. However, traditional fuzzing techniques face many challenges, such as how to mutate input seed files, how to increase code coverage, and how to effectively bypass verification. Machine learning has been introduced into fuzzing as a new method to alleviate these challenges. This paper reviews the research progress of using machine learning for fuzzing in recent years, analyzes how machine learning improves the fuzzing process and results, and sheds light on future work in fuzzing. Firstly, this paper discusses the reasons why machine learning techniques can be used in fuzzing scenarios and identifies six different stages in which machine learning has been used. Then this paper systematically studies machine learning based fuzzing models in terms of the selection of machine learning algorithms, pre-processing methods, datasets, evaluation metrics, and hyperparameter settings. Next, this paper assesses the performance of the machine learning models based on the frequently used evaluation metrics. The results of the evaluation show that machine learning has an acceptable predictive capability for fuzzing. Finally, the capability of discovering vulnerabilities is compared between traditional fuzzing tools and machine learning based fuzzing tools. The results show that the introduction of machine learning can improve the performance of fuzzing. However, there are still some limitations, such as unbalanced training samples and difficulty in extracting the characteristics related to vulnerabilities.
A Taxonomy for Neural Memory Networks In this paper, a taxonomy for memory networks is proposed based on their memory organization. The taxonomy includes all the popular memory networks: the vanilla recurrent neural network (RNN), long short-term memory (LSTM), the neural stack and the neural Turing machine and their variants. The taxonomy puts all these networks under a single umbrella and shows their relative expressive power, i.e. vanilla RNN <= LSTM <= neural stack <= neural RAM. The differences and commonalities between these networks are analyzed. These differences are also connected to the requirements of different tasks, which can give the user guidance on how to choose or design an appropriate memory network for a specific task. As a conceptually simplified class of problems, four tasks on synthetic symbol sequences are developed and tested to verify our arguments: counting, counting with interference, reversing and repeat counting. We also use two natural language processing problems to discuss how this taxonomy helps in choosing the appropriate neural memory network for a real-world problem.
A Temporal Difference Reinforcement Learning Theory of Emotion: unifying emotion, cognition and adaptive behavior Emotions are intimately tied to motivation and the adaptation of behavior, and many animal species show evidence of emotions in their behavior. Therefore, emotions must be related to powerful mechanisms that aid survival, and emotions must be evolutionarily continuous phenomena. How and why did emotions evolve in nature, how do events get emotionally appraised, how do emotions relate to cognitive complexity, and how do they impact behavior and learning? In this article I propose that all emotions are manifestations of reward processing, in particular Temporal Difference (TD) error assessment. Reinforcement Learning (RL) is a powerful computational model for the learning of goal-oriented tasks by exploration and feedback. Evidence indicates that RL-like processes exist in many animal species. Key in the processing of feedback in RL is the notion of TD error, the assessment of how much better or worse a situation just became compared to what was previously expected (or, the estimated gain or loss of utility – or well-being – resulting from new evidence). I propose a TDRL Theory of Emotion and discuss its ramifications for our understanding of emotions in humans, animals and machines, and present psychological, neurobiological and computational evidence in its support.
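To make the TD error referred to above concrete, here is a minimal tabular TD(0) sketch on a hypothetical five-state chain; the environment, rewards and parameters are invented for illustration and are not taken from the article.

```python
import numpy as np

# Hypothetical 5-state chain: each step moves right; reaching the terminal
# state pays reward 1. State values are learned with tabular TD(0).
n_states, gamma, alpha = 5, 0.9, 0.1
V = np.zeros(n_states + 1)          # V[n_states] is the terminal state

for episode in range(500):
    s = 0
    while s < n_states:
        s_next = s + 1
        r = 1.0 if s_next == n_states else 0.0
        td_error = r + gamma * V[s_next] - V[s]   # "better or worse than expected"
        V[s] += alpha * td_error
        s = s_next

print(np.round(V[:n_states], 3))    # values decay with distance from the reward
```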
A Theoretical Connection Between Statistical Physics and Reinforcement Learning Sequential decision making in the presence of uncertainty and stochastic dynamics gives rise to distributions over state/action trajectories in reinforcement learning (RL) and optimal control problems. This observation has led to a variety of connections between RL and inference in probabilistic graphical models (PGMs). Here we explore a different dimension to this relationship, examining reinforcement learning using the tools and abstractions of statistical physics. The central object in the statistical physics abstraction is the idea of a partition function $\mathcal{Z}$, and here we construct a partition function from the ensemble of possible trajectories that an agent might take in a Markov decision process. Although value functions and $Q$-functions can be derived from this partition function and interpreted via average energies, the $\mathcal{Z}$-function provides an object with its own Bellman equation that can form the basis of alternative dynamic programming approaches. Moreover, when the MDP dynamics are deterministic, the Bellman equation for $\mathcal{Z}$ is linear, allowing direct solutions that are unavailable for the nonlinear equations associated with traditional value functions. The policies learned via these $\mathcal{Z}$-based Bellman updates are tightly linked to Boltzmann-like policy parameterizations. In addition to sampling actions proportionally to the exponential of the expected cumulative reward as Boltzmann policies would, these policies take entropy into account favoring states from which many outcomes are possible.
A Theory of Diagnostic Interpretation in Supervised Classification Interpretable deep learning is a fundamental building block towards safer AI, especially now that the deployment of deep learning-based computer-aided medical diagnostic systems is so imminent. However, without a computational formulation of black-box interpretation, general interpretability research relies heavily on subjective bias. The clear decision structure of medical diagnostics lets us approximate the decision process of a radiologist as a model, removed from subjective bias. We define the process of interpretation as a finite communication between a known model and a black-box model that optimally maps the black box's decision process onto the known model. Consequently, we define interpretability as maximal information gain over the initial uncertainty about the black box's decision within finite communication. We relax this definition based on the observation that diagnostic interpretation is typically achieved by a process of minimal querying. We derive an algorithm to calculate diagnostic interpretability. The usual question of an accuracy-interpretability tradeoff, i.e. whether a black-box model's prediction accuracy depends on its ability to be interpreted by a known source model, does not arise in this theory. With multiple example simulation experiments of various complexity levels, we demonstrate the working of such a theoretical model in synthetic supervised classification scenarios.
A Theory of Output-Side Unsupervised Domain Adaptation When learning a mapping from an input space to an output space, the assumption that the sample distribution of the training data is the same as that of the test data is often violated. Unsupervised domain shift methods adapt the learned function in order to correct for this shift. Previous work has focused on utilizing unlabeled samples from the target distribution. We consider the complementary problem in which the unlabeled samples are given post mapping, i.e., we are given the outputs of the mapping of unknown samples from the shifted domain. Two other variants are also studied: the two-sided version, in which unlabeled samples are given from both the input and the output spaces, and the Domain Transfer problem, which was recently formalized. In all cases, we derive generalization bounds that employ discrepancy terms.
A Tour of Reinforcement Learning: The View from Continuous Control This manuscript surveys reinforcement learning from the perspective of optimization and control with a focus on continuous control applications. It surveys the general formulation, terminology, and typical experimental implementations of reinforcement learning and reviews competing solution paradigms. In order to compare the relative merits of various techniques, this survey presents a case study of the Linear Quadratic Regulator (LQR) with unknown dynamics, perhaps the simplest and best studied problem in optimal control. The manuscript describes how merging techniques from learning theory and control can provide non-asymptotic characterizations of LQR performance and shows that these characterizations tend to match experimental behavior. In turn, when revisiting more complex applications, many of the observed phenomena in LQR persist. In particular, theory and experiment demonstrate the role and importance of models and the cost of generality in reinforcement learning algorithms. This survey concludes with a discussion of some of the challenges in designing learning systems that safely and reliably interact with complex and uncertain environments and how tools from reinforcement learning and controls might be combined to approach these challenges.
A Tour through the Visualization Zoo A survey of powerful visualization techniques, from the obvious to the obscure
A Tutorial for Reinforcement Learning The tutorial is written for those who would like an introduction to reinforcement learning (RL). The aim is to provide an intuitive presentation of the ideas rather than concentrate on the deeper mathematics underlying the topic. RL is generally used to solve the so-called Markov decision problem (MDP). In other words, the problem that you are attempting to solve with RL should be an MDP or its variant. The theory of RL relies on dynamic programming (DP) and artificial intelligence (AI). We will begin with a quick description of MDPs. We will discuss what we mean by ‘complex’ and ‘large-scale’ MDPs. Then we will explain why RL is needed to solve complex and large-scale MDPs. The semi-Markov decision problem (SMDP) will also be covered.
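As a companion to the tutorial entry above, here is a minimal tabular Q-learning sketch for a small MDP; the one-dimensional environment, rewards and hyperparameters are made up for illustration and are not the tutorial's own examples.

```python
import numpy as np

# Hypothetical 1-D MDP with 6 states; action 0 moves left, action 1 moves
# right, and reaching the rightmost state pays reward 1.
n_states, n_actions = 6, 2
gamma, alpha, eps = 0.95, 0.1, 0.1
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(1)

def step(s, a):
    s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r, s_next == n_states - 1

for episode in range(2000):
    s, done = 0, False
    while not done:
        # epsilon-greedy exploration
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        target = r + (0.0 if done else gamma * np.max(Q[s_next]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

print(np.argmax(Q, axis=1))  # learned greedy policy: move right everywhere
```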
A tutorial on active learning (Slide Deck)
A Tutorial on Bayesian Belief Networks This tutorial provides an overview of Bayesian belief networks. The subject is introduced through a discussion on probabilistic models that covers probability language, dependency models, graphical representations of models, and belief networks as a particular representation of probabilistic models. The general class of causal belief networks is presented, and the concept of d-separation and its relationship with independence in probabilistic models is introduced. This leads to a description of Bayesian belief networks as a specific class of causal belief networks, with detailed discussion on belief propagation and practical network design. The target recognition problem is presented as an example of the application of Bayesian belief networks to a real problem, and the tutorial concludes with a brief summary of Bayesian belief networks.
A Tutorial on Bayesian Optimization Bayesian optimization is an approach to optimizing objective functions that take a long time (minutes or hours) to evaluate. It is best-suited for optimization over continuous domains of less than 20 dimensions, and tolerates stochastic noise in function evaluations. It builds a surrogate for the objective and quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample. In this tutorial, we describe how Bayesian optimization works, including Gaussian process regression and three common acquisition functions: expected improvement, entropy search, and knowledge gradient. We then discuss more advanced techniques, including running multiple function evaluations in parallel, multi-fidelity and multi-information source optimization, expensive-to-evaluate constraints, random environmental conditions, multi-task Bayesian optimization, and the inclusion of derivative information. We conclude with a discussion of Bayesian optimization software and future research directions in the field. Within our tutorial material we provide a generalization of expected improvement to noisy evaluations, beyond the noise-free setting where it is more commonly applied. This generalization is justified by a formal decision-theoretic argument, standing in contrast to previous ad hoc modifications.
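To make the surrogate-plus-acquisition loop described above concrete, here is a small sketch using a Gaussian process surrogate and the (noise-free) expected improvement acquisition on a made-up one-dimensional objective; it assumes scikit-learn and SciPy are available and does not implement the tutorial's noisy-evaluation generalization.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Toy 1-D objective (invented for illustration); pretend each call is expensive.
def f(x):
    return -np.sin(3 * x) - x**2 + 0.7 * x

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(4, 1))            # initial design
y = f(X).ravel()
grid = np.linspace(-2, 2, 400).reshape(-1, 1)  # candidate points

for it in range(15):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)                               # surrogate model of the objective
    mu, sigma = gp.predict(grid, return_std=True)
    best = y.max()
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)   # expected improvement
    x_next = grid[np.argmax(ei)].reshape(1, 1)             # where to sample next
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next).ravel())

print(X[np.argmax(y)], y.max())                # best point found so far
```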
A Tutorial on Bridge Sampling The marginal likelihood plays an important role in many areas of Bayesian statistics such as parameter estimation, model comparison, and model averaging. In most applications, however, the marginal likelihood is not analytically tractable and must be approximated using numerical methods. Here we provide a tutorial on bridge sampling (Bennett, 1976; Meng and Wong, 1996), a reliable and relatively straightforward sampling method that allows researchers to obtain the marginal likelihood for models of varying complexity. First, we introduce bridge sampling and three related sampling methods using the beta-binomial model as a running example. We then apply bridge sampling to estimate the marginal likelihood for the Expectancy Valence (EV) model—a popular model for reinforcement learning. Our results indicate that bridge sampling provides accurate estimates for both a single participant and a hierarchical version of the EV model. We conclude that bridge sampling is an attractive method for mathematical psychologists who typically aim to approximate the marginal likelihood for a limited set of possibly high-dimensional models.
A Tutorial on Canonical Correlation Methods Canonical correlation analysis is a family of multivariate statistical methods for the analysis of paired sets of variables. Since its proposition, canonical correlation analysis has for instance been extended to extract relations between two sets of variables when the sample size is insufficient in relation to the data dimensionality, when the relations have been considered to be non-linear, and when the dimensionality is too large for human interpretation. This tutorial explains the theory of canonical correlation analysis including its regularised, kernel, and sparse variants. Additionally, the deep and Bayesian CCA extensions are briefly reviewed. Together with the numerical examples, this overview provides a coherent compendium on the applicability of the variants of canonical correlation analysis. By bringing together techniques for solving the optimisation problems, evaluating the statistical significance and generalisability of the canonical correlation model, and interpreting the relations, we hope that this article can serve as a hands-on tool for applying canonical correlation methods in data analysis.
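As a minimal hands-on illustration of the classical (unregularized) variant discussed above, the sketch below extracts canonical correlations from two synthetic "views" sharing one latent signal; it assumes scikit-learn is available and the data are invented.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Two synthetic views that share a single latent signal (illustrative only).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
X = np.hstack([latent + 0.3 * rng.normal(size=(500, 1)) for _ in range(4)])
Y = np.hstack([latent + 0.3 * rng.normal(size=(500, 1)) for _ in range(3)])

cca = CCA(n_components=2)
cca.fit(X, Y)
X_c, Y_c = cca.transform(X, Y)

# Canonical correlations: correlation between paired projected components.
for k in range(2):
    print(np.corrcoef(X_c[:, k], Y_c[:, k])[0, 1])
```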
A Tutorial on Deep Learning Part 1: Nonlinear Classifiers and The Backpropagation Algorithm In the past few years, Deep Learning has generated much excitement in Machine Learning and industry thanks to many breakthrough results in speech recognition, computer vision and text processing. So, what is Deep Learning? For many researchers, Deep Learning is another name for a set of algorithms that use a neural network as an architecture. Even though neural networks have a long history, they became more successful in recent years due to the availability of inexpensive, parallel hardware (GPUs, computer clusters) and massive amounts of data. In this tutorial, we will start with the concept of a linear classifier and use that to develop the concept of neural networks. I will present two key algorithms in learning with neural networks: the stochastic gradient descent algorithm and the backpropagation algorithm. Towards the end of the tutorial, I will explain some simple tricks and recent advances that improve neural networks and their training. For that, let’s start with a simple example.
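To illustrate the two algorithms named above, here is a tiny one-hidden-layer network trained with per-example stochastic gradient descent and hand-written backpropagation on the XOR problem; the data, architecture and hyperparameters are chosen for illustration and are not taken from the tutorial.

```python
import numpy as np

# Tiny two-layer network trained with SGD + backpropagation on XOR.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for step in range(5000):
    i = rng.integers(len(X))                  # stochastic: one example at a time
    x, t = X[i:i+1], y[i:i+1]
    h = np.tanh(x @ W1 + b1)                  # forward pass
    p = sigmoid(h @ W2 + b2)
    # backward pass (cross-entropy loss with a sigmoid output unit)
    d_out = p - t
    d_h = (d_out @ W2.T) * (1 - h**2)
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(0)
    W1 -= lr * x.T @ d_h;    b1 -= lr * d_h.sum(0)

# predictions for the four XOR inputs (should approach 0, 1, 1, 0)
print(sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2).round(2).ravel())
```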
A Tutorial on Deep Learning Part 2: Autoencoders, Convolutional Neural Networks and Recurrent Neural Networks In the previous tutorial, I discussed the use of deep networks to classify nonlinear data. In addition to their ability to handle nonlinear data, deep networks also have a special strength in their flexibility which sets them apart from other traditional machine learning models: we can modify them in many ways to suit our tasks. In the following, I will discuss the three most common modifications: • Unsupervised learning and data compression via autoencoders, which require modifications in the loss function, • Translational invariance via convolutional neural networks, which require modifications in the network architecture, • Variable-sized sequence prediction via recurrent neural networks, which require modifications in the network architecture. The flexibility of neural networks is a very powerful property. In many cases, these changes lead to great improvements in accuracy compared to basic models that we discussed in the previous tutorial. In the last part of the tutorial, I will also explain how to parallelize the training of neural networks. This is also an important topic because parallelizing neural networks has played an important role in the current deep learning movement.
A Tutorial on Distance Metric Learning: Mathematical Foundations, Algorithms and Software This paper describes the discipline of distance metric learning, a branch of machine learning that aims to learn distances from the data. Distance metric learning can be useful to improve similarity learning algorithms, and also has applications in dimensionality reduction. We describe the distance metric learning problem and analyze its main mathematical foundations. We discuss some of the most popular distance metric learning techniques used in classification, showing their goals and the required information to understand and use them. Furthermore, we present a Python package that collects a set of 17 distance metric learning techniques explained in this paper, with some experiments to evaluate the performance of the different algorithms. Finally, we discuss several possibilities of future work in this topic.
A Tutorial on Fisher Information In many statistical applications that concern mathematical psychologists, the concept of Fisher information plays an important role. In this tutorial we clarify the concept of Fisher information as it manifests itself across three different statistical paradigms. First, in the frequentist paradigm, Fisher information is used to construct hypothesis tests and confidence intervals using maximum likelihood estimators; second, in the Bayesian paradigm, Fisher information is used to define a default prior; finally, in the minimum description length paradigm, Fisher information is used to measure model complexity.
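A small worked example (mine, not the tutorial's): for a single Bernoulli(theta) observation, the Fisher information is 1 / (theta * (1 - theta)), i.e. the expected negative second derivative of the log-likelihood. The sketch below checks the analytic value numerically.

```python
import numpy as np

theta = 0.3

def d2_loglik(x, th, h=1e-5):
    # numerical second derivative of the Bernoulli log-likelihood at th
    ll = lambda t: x * np.log(t) + (1 - x) * np.log(1 - t)
    return (ll(th + h) - 2 * ll(th) + ll(th - h)) / h**2

analytic = 1.0 / (theta * (1 - theta))
# expectation over x in {0, 1} of minus the second derivative
numeric = -(theta * d2_loglik(1, theta) + (1 - theta) * d2_loglik(0, theta))
print(analytic, numeric)   # both ~4.76
```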
A tutorial on geometric programming A geometric program (GP) is a type of mathematical optimization problem characterized by objective and constraint functions that have a special form. Recently developed solution methods can solve even large-scale GPs extremely efficiently and reliably; at the same time a number of practical problems, particularly in circuit design, have been found to be equivalent to (or well approximated by) GPs. Putting these two together, we get effective solutions for the practical problems. The basic approach in GP modeling is to attempt to express a practical problem, such as an engineering analysis or design problem, in GP format. In the best case, this formulation is exact; when this is not possible, we settle for an approximate formulation. This tutorial paper collects together in one place the basic background material needed to do GP modeling. We start with the basic definitions and facts, and some methods used to transform problems into GP format. We show how to recognize functions and problems compatible with GP, and how to approximate functions or data in a form compatible with GP (when this is possible). We give some simple and representative examples, and also describe some common extensions of GP, along with methods for solving (or approximately solving) them.
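A minimal sketch of a geometric program in code, assuming the cvxpy package is available (its disciplined geometric programming mode performs the usual log transformation internally); the particular objective and constraints are invented for illustration and are not from the tutorial.

```python
import cvxpy as cp

# A small GP: minimize a posynomial subject to monomial/posynomial constraints.
x = cp.Variable(pos=True)
y = cp.Variable(pos=True)
z = cp.Variable(pos=True)

objective = cp.Minimize(x / y + y * z)   # posynomial objective
constraints = [
    x * y * z == 1.0,                    # monomial equality
    x + 2 * y <= 10.0,                   # posynomial inequality
    z >= 0.1,
]
problem = cp.Problem(objective, constraints)
problem.solve(gp=True)                   # solve after the log-log transformation
print(problem.value, x.value, y.value, z.value)
```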
A Tutorial on Hawkes Processes for Events in Social Media This chapter provides an accessible introduction to point processes, and especially Hawkes processes, for modeling discrete, inter-dependent events over continuous time. We start by reviewing the definitions and the key concepts in point processes. We then introduce the Hawkes process, its event intensity function, as well as schemes for event simulation and parameter estimation. We also describe a practical example drawn from social media data – we show how to model retweet cascades using a Hawkes self-exciting process. We present a design of the memory kernel, and results on estimating parameters and predicting popularity. The code and sample event data are available as an online appendix.
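A minimal simulation sketch of the intensity function and event generation described above, assuming an exponential memory kernel and Ogata-style thinning; the parameter values are invented and the chapter's own code (in the online appendix) is the authoritative reference.

```python
import numpy as np

# Univariate Hawkes process with exponential kernel, simulated by thinning.
# Intensity: lambda(t) = mu + sum_i alpha * exp(-beta * (t - t_i)) over past events.
mu, alpha, beta, T = 0.5, 0.8, 1.2, 100.0
rng = np.random.default_rng(0)

def intensity(t, events):
    past = events[events < t]
    return mu + alpha * np.exp(-beta * (t - past)).sum()

events = np.array([])
t = 0.0
while t < T:
    lam_bar = intensity(t, events) + alpha       # upper bound on intensity after t
    t += rng.exponential(1.0 / lam_bar)          # candidate next event time
    if t < T and rng.random() <= intensity(t, events) / lam_bar:
        events = np.append(events, t)            # accept with prob lambda(t)/lam_bar

# branching ratio alpha/beta < 1, so the process is stationary
print(len(events), "events; expected mean rate ~", mu / (1 - alpha / beta))
```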
A Tutorial on Kernel Density Estimation and Recent Advances This tutorial provides a gentle introduction to kernel density estimation (KDE) and recent advances regarding confidence bands and geometric/topological features. We begin with a discussion of basic properties of KDE: the convergence rate under various metrics, density derivative estimation, and bandwidth selection. Then, we introduce common approaches to the construction of confidence intervals/bands, and we discuss how to handle bias. Next, we talk about recent advances in the inference of geometric and topological features of a density function using KDE. Finally, we illustrate how one can use KDE to estimate a cumulative distribution function and a receiver operating characteristic curve. We provide R implementations related to this tutorial at the end.
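As a minimal companion to the basic properties discussed above, here is a Gaussian KDE with the normal-reference (Silverman-style) bandwidth on synthetic data; the tutorial itself provides R implementations, so this Python sketch is only illustrative.

```python
import numpy as np

# Synthetic bimodal sample (invented for illustration).
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(1, 1.0, 700)])

n = len(data)
h = 1.06 * data.std(ddof=1) * n ** (-1 / 5)      # normal-reference rule of thumb

def kde(x_grid, data, h):
    # average of Gaussian kernels centred at each observation
    u = (x_grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

grid = np.linspace(-4, 4, 400)
density = kde(grid, data, h)
print(density.sum() * (grid[1] - grid[0]))        # Riemann sum, integrates to ~1
```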
A Tutorial on Modeling and Inference in Undirected Graphical Models for Hyperspectral Image Analysis Undirected graphical models have been successfully used to jointly model the spatial and the spectral dependencies in earth observing hyperspectral images. They produce less noisy, smooth, and spatially coherent land cover maps and give top accuracies on many datasets. Moreover, they can easily be combined with other state-of-the-art approaches, such as deep learning. This has made them an essential tool for remote sensing researchers and practitioners. However, graphical models have not been easily accessible to the larger remote sensing community as they are not discussed in standard remote sensing textbooks and not included in the popular remote sensing software and toolboxes. In this tutorial, we provide a theoretical introduction to Markov random fields and conditional random fields based spatial-spectral classification for land cover mapping along with a detailed step-by-step practical guide on applying these methods using freely available software. Furthermore, the discussed methods are benchmarked on four public hyperspectral datasets for a fair comparison among themselves and easy comparison with the vast number of methods in literature which use the same datasets. The source code necessary to reproduce all the results in the paper is published on-line to make it easier for the readers to apply these techniques to different remote sensing problems.
A Tutorial on Network Embeddings Network embedding methods aim at learning low-dimensional latent representation of nodes in a network. These representations can be used as features for a wide range of tasks on graphs such as classification, clustering, link prediction, and visualization. In this survey, we give an overview of network embeddings by summarizing and categorizing recent advancements in this research field. We first discuss the desirable properties of network embeddings and briefly introduce the history of network embedding algorithms. Then, we discuss network embedding methods under different scenarios, such as supervised versus unsupervised learning, learning embeddings for homogeneous networks versus for heterogeneous networks, etc. We further demonstrate the applications of network embeddings, and conclude the survey with future work in this area.
A tutorial on Particle Swarm Optimization Clustering This paper proposes a tutorial on the Data Clustering technique using the Particle Swarm Optimization approach. Following the work proposed by Merwe et al., here we present an in-depth analysis of the algorithm together with a Matlab implementation and a short tutorial that explains how to modify the proposed implementation and the effect of the parameters of the original algorithm. Moreover, we provide a comparison against the results obtained using the well-known K-Means approach. All the source code presented in this paper is publicly available under the GPL-v2 license.
A Tutorial on Spectral Clustering In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. At first glance spectral clustering appears slightly mysterious, and it is not obvious why it works at all and what it really does. The goal of this tutorial is to give some intuition on those questions. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.
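A from-scratch sketch of one common variant (normalized spectral clustering with the symmetric Laplacian) on toy Gaussian blobs; the data, affinity bandwidth and cluster count are invented for illustration, and scikit-learn's k-means is assumed for the final step.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: three Gaussian blobs in the plane.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.3, size=(50, 2)) for loc in ([0, 0], [3, 0], [0, 3])])

# RBF affinity matrix and symmetric normalized Laplacian.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq_dists / (2 * 0.5**2))
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt

# Eigenvectors of the 3 smallest eigenvalues, row-normalized, then k-means.
eigvals, eigvecs = np.linalg.eigh(L_sym)
U = eigvecs[:, :3]
U = U / np.linalg.norm(U, axis=1, keepdims=True)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(U)
print(np.bincount(labels))   # roughly 50 points per cluster
```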
A Tutorial on Statistically Sound Pattern Discovery Statistically sound pattern discovery harnesses the rigour of statistical hypothesis testing to overcome many of the issues that have hampered standard data mining approaches to pattern discovery. Most importantly, application of appropriate statistical tests allows precise control over the risk of false discoveries — patterns that are found in the sample data but do not hold in the wider population from which the sample was drawn. Statistical tests can also be applied to filter out patterns that are unlikely to be useful, removing uninformative variations of the key patterns in the data. This tutorial introduces the key statistical and data mining theory and techniques that underpin this fast developing field.
A Tutorial on Thompson Sampling Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance. The algorithm addresses a broad range of problems in a computationally efficient manner and is therefore enjoying wide use. This tutorial covers the algorithm and its application, illustrating concepts through a range of examples, including Bernoulli bandit problems, shortest path problems, dynamic pricing, recommendation, active learning with neural networks, and reinforcement learning in Markov decision processes. Most of these problems involve complex information structures, where information revealed by taking an action informs beliefs about other actions. We will also discuss when and why Thompson sampling is or is not effective and relations to alternative algorithms.
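A minimal sketch of the Bernoulli bandit example mentioned above, with Beta(1, 1) priors and invented success probabilities; the tutorial covers far richer information structures than this.

```python
import numpy as np

# Thompson sampling on a 3-armed Bernoulli bandit.
true_p = np.array([0.3, 0.5, 0.6])   # unknown to the agent
a = np.ones(3)                       # Beta posterior: successes + 1
b = np.ones(3)                       # Beta posterior: failures  + 1
rng = np.random.default_rng(0)

for t in range(5000):
    samples = rng.beta(a, b)         # one posterior draw per arm
    arm = int(np.argmax(samples))    # play the arm that looks best this round
    reward = rng.random() < true_p[arm]
    a[arm] += reward
    b[arm] += 1 - reward

print(a / (a + b))                   # posterior means concentrate near true_p
print(a + b - 2)                     # pull counts: most pulls go to the best arm
```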
A tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the ‘echo state network’ approach (Slide Deck)
A unified view of gradient-based attribution methods for Deep Neural Networks Understanding the flow of information in Deep Neural Networks is a challenging problem that has gained increasing attention over the last few years. While several methods have been proposed to explain network predictions, only a few attempts to analyze them from a theoretical perspective have been made in the past. In this work we analyze various state-of-the-art attribution methods and prove unexplored connections between them. We also show how some methods can be reformulated and more conveniently implemented. Finally, we perform an empirical evaluation with six attribution methods on a variety of tasks and architectures and discuss their strengths and limitations.
A Universal Hypercomputer This paper describes a type of infinitary computer (a hypercomputer) capable of computing truth in initial levels of the set theoretic universe, V. The proper class of such hypercomputers is called a universal hypercomputer. There are two basic variants of hypercomputer: a serial hypercomputer and a parallel hypercomputer. The set of computable functions of the two variants is identical, but the parallel hypercomputer is in general faster than a serial hypercomputer (as measured by an ordinal complexity measure). Insights into set theory using information theory and a universal hypercomputer are possible, and it is argued that the Generalised Continuum Hypothesis can be regarded as an information-theoretic principle, which follows from an information minimization principle.
A User’s Guide to Support Vector Machines The Support Vector Machine (SVM) is a widely used classifier. And yet, obtaining the best results with SVMs requires an understanding of their workings and the various ways a user can influence their accuracy. We provide the user with a basic understanding of the theory behind SVMs and focus on their use in practice. We describe the effect of the SVM parameters on the resulting classifier, how to select good values for those parameters, data normalization, factors that affect training time, and software for training SVMs.
A vector linear programming approach for certain global optimization problems Global optimization problems with a quasi-concave objective function and linear constraints are studied. We point out that various other classes of global optimization problems can be expressed in this way. We present two algorithms, which can be seen as slight modifications of Benson-type algorithms for multiple objective linear programs. The modification of the MOLP algorithms results in a more efficient treatment of the studied optimization problems. This paper generalizes and improves results of Schulz and Mittal on quasi-concave problems, Shao and Ehrgott on multiplicative linear programs and Löhne and Wagner on minimizing the difference $f=g-h$ of two convex functions $g$, $h$ where either $g$ or $h$ is polyhedral. Numerical examples are given and the results are compared with the global optimization software BARON.
A Very Brief Introduction to Machine Learning With Applications to Communication Systems Given the unprecedented availability of data and computing resources, there is widespread renewed interest in applying data-driven machine learning methods to problems for which the development of conventional engineering solutions is challenged by modelling or algorithmic deficiencies. This tutorial-style paper starts by addressing the questions of why and when such techniques can be useful. It then provides a high-level introduction to the basics of supervised and unsupervised learning with a focus on probabilistic models. For both supervised and unsupervised learning, exemplifying applications to communication networks are discussed by distinguishing tasks carried out at the edge and at the cloud segments of the network at different layers of the protocol stack.
A weakly informative default prior distribution for logistic and other regression models We propose a new prior distribution for classical (nonhierarchical) logistic regression models, constructed by first scaling all nonbinary variables to have mean 0 and standard deviation 0.5, and then placing independent Student-t prior distributions on the coefficients. As a default choice, we recommend the Cauchy distribution with center 0 and scale 2.5, which in the simplest setting is a longer-tailed version of the distribution attained by assuming one-half additional success and one-half additional failure in a logistic regression. Cross-validation on a corpus of datasets shows the Cauchy class of prior distributions to outperform existing implementations of Gaussian and Laplace priors. We recommend this prior distribution as a default choice for routine applied use. It has the advantage of always giving answers, even when there is complete separation in logistic regression (a common problem, even when the sample size is large and the number of predictors is small), and also automatically applying more shrinkage to higher-order interactions. This can be useful in routine data analysis as well as in automated procedures such as chained equations for missing-data imputation. We implement a procedure to fit generalized linear models in R with the Student-t prior distribution by incorporating an approximate EM algorithm into the usual iteratively weighted least squares. We illustrate with several applications, including a series of logistic regressions predicting voting preferences, a small bioassay experiment, and an imputation model for a public health data set.
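A simplified MAP-style sketch of the recommendation above: rescale nonbinary predictors to mean 0 and standard deviation 0.5, then place independent Cauchy(0, 2.5) priors on the coefficients (and a wider Cauchy(0, 10) prior on the intercept, a common companion choice). The data are synthetic, only the posterior mode is computed, and the paper's own approach (an approximate EM algorithm inside iteratively weighted least squares, in R) goes further than this.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic logistic-regression data (invented for illustration).
rng = np.random.default_rng(0)
n, p = 200, 3
X_raw = rng.normal(size=(n, p))
logit = 1.5 * X_raw[:, 0] - X_raw[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

X = 0.5 * (X_raw - X_raw.mean(0)) / X_raw.std(0)     # mean 0, sd 0.5
X = np.hstack([np.ones((n, 1)), X])                  # prepend intercept column
scales = np.array([10.0] + [2.5] * p)                # prior scales per coefficient

def neg_log_posterior(beta):
    eta = X @ beta
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))
    # Cauchy log-density up to an additive constant
    logprior = -np.sum(np.log1p((beta / scales) ** 2))
    return -(loglik + logprior)

fit = minimize(neg_log_posterior, np.zeros(p + 1), method="BFGS")
print(np.round(fit.x, 3))   # posterior mode of intercept and scaled coefficients
```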
Abandon Statistical Significance In science publishing and many areas of research, the status quo is a lexicographic decision rule in which any result is first required to have a p-value that surpasses the 0.05 threshold and only then is consideration–often scant–given to such factors as prior and related evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain. There have been recent proposals to change the p-value threshold, but instead we recommend abandoning the null hypothesis significance testing paradigm entirely, leaving p-values as just one of many pieces of information with no privileged role in scientific publication and decision making. We argue that this radical approach is both practical and sensible.
Above the Clouds: A Brief Survey Cloud Computing is a versatile technology that can support a broad spectrum of applications. The low cost of cloud computing and its dynamic scaling render it an innovation driver for small companies, particularly in the developing world. Cloud-deployed enterprise resource planning (ERP), supply chain management (SCM), customer relationship management (CRM), medical, business and mobile applications have the potential to reach millions of users. In this paper, we explore the different concepts involved in cloud computing and we also examine clouds from technical aspects. We highlight some of the opportunities in cloud computing, underlining the importance of clouds and showing why this technology must succeed, and we present additional cloud computing problems that businesses may need to address. Finally, we discuss some of the issues that this area should deal with.
Abstraction Learning There has been a gap between artificial intelligence and human intelligence. In this paper, we identify three key elements forming human intelligence, and suggest that abstraction learning combines these elements and is thus a way to bridge the gap. Prior research in artificial intelligence either specifies abstraction by human experts or takes abstraction as a qualitative explanation for the model. This paper aims to learn abstraction directly. We tackle three main challenges: representation, objective function, and learning algorithm. Specifically, we propose a partition structure that contains pre-allocated abstraction neurons; we formulate abstraction learning as a constrained optimization problem, which integrates abstraction properties; we develop a network evolution algorithm to solve this problem. This complete framework is named ONE (Optimization via Network Evolution). In our experiments on MNIST, ONE shows elementary human-like intelligence, including low energy consumption, knowledge sharing, and lifelong learning.
Accelerating CNN inference on FPGAs: A Survey Convolutional Neural Networks (CNNs) are currently adopted to solve an ever greater number of problems, ranging from speech recognition to image classification and segmentation. The large amount of processing required by CNNs calls for dedicated and tailored hardware support methods. Moreover, CNN workloads have a streaming nature, well suited to reconfigurable hardware architectures such as FPGAs. The amount and diversity of research on the subject of CNN FPGA acceleration within the last 3 years demonstrates the tremendous industrial and academic interest. This paper surveys the state of the art of CNN inference accelerators over FPGAs. The computational workloads, their parallelism and the involved memory accesses are analyzed. At the level of neurons, optimizations of the convolutional and fully connected layers are explained and the performances of the different methods are compared. At the network level, approximate computing and datapath optimization methods are covered and state-of-the-art approaches are compared. The methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators and will fuel future advances in efficient hardware deep learning.
Activation Functions: Comparison of trends in Practice and Research for Deep Learning Deep neural networks have been successfully used in diverse emerging domains to solve real-world complex problems, with many more deep learning (DL) architectures being developed to date. To achieve these state-of-the-art performances, DL architectures use activation functions (AFs) to perform diverse computations between the hidden layers and the output layers of any given DL architecture. This paper presents a survey of the existing AFs used in deep learning applications and highlights the recent trends in the use of activation functions for deep learning applications. The novelty of this paper is that it compiles the majority of the AFs used in DL and outlines the current trends in the applications and usage of these functions in practical deep learning deployments against the state-of-the-art research results. This compilation will aid in making effective decisions in the choice of the most suitable and appropriate activation function for any given application, ready for deployment. This paper is timely because most research papers on AFs highlight similar works and results, while this paper will be the first to compile the trends in AF applications in practice against the research results from the literature found in deep learning research to date.
Active Learning for Visual Question Answering: An Empirical Study We present an empirical study of active learning for Visual Question Answering, where a deep VQA model selects informative question-image pairs from a pool and queries an oracle for answers to maximally improve its performance under a limited query budget. Drawing analogies from human learning, we explore cramming (entropy), curiosity-driven (expected model change), and goal-driven (expected error reduction) active learning approaches, and propose a fast and effective goal-driven active learning scoring function to pick question-image pairs for deep VQA models under the Bayesian Neural Network framework. We find that deep VQA models need large amounts of training data before they can start asking informative questions. But once they do, all three approaches outperform the random selection baseline and achieve significant query savings. For the scenario where the model is allowed to ask generic questions about images but is evaluated only on specific questions (e.g., questions whose answer is either yes or no), our proposed goal-driven scoring function performs the best.
Ad Click Prediction: a View from the Trenches Predicting ad click-through rates (CTR) is a massive-scale learning problem that is central to the multi-billion dollar online advertising industry. We present a selection of case studies and topics drawn from recent experiments in the setting of a deployed CTR prediction system. These include improvements in the context of traditional supervised learning based on an FTRL-Proximal online learning algorithm (which has excellent sparsity and convergence properties) and the use of per-coordinate learning rates. We also explore some of the challenges that arise in a real-world system that may appear at first to be outside the domain of traditional machine learning research. These include useful tricks for memory savings, methods for assessing and visualizing performance, practical methods for providing confidence estimates for predicted probabilities, calibration methods, and methods for automated management of features. Finally, we also detail several directions that did not turn out to be beneficial for us, despite promising results elsewhere in the literature. The goal of this paper is to highlight the close relationship between theoretical advances and practical engineering in this industrial setting, and to show the depth of challenges that appear when applying traditional machine learning methods in a complex dynamic system.
ADADELTA: An Adaptive Learning Rate Method We present a novel per-dimension learning rate method for gradient descent called ADADELTA. The method dynamically adapts over time using only first order information and has minimal computational overhead beyond vanilla stochastic gradient descent. The method requires no manual tuning of a learning rate and appears robust to noisy gradient information, different model architecture choices, various data modalities and selection of hyperparameters. We show promising results compared to other methods on the MNIST digit classification task using a single machine and on a large scale voice dataset in a distributed cluster environment.
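The per-dimension update rule described above can be written in a few lines; the sketch below applies it to a toy quadratic with commonly used values rho = 0.95 and eps = 1e-6 (the objective and number of steps are chosen only for illustration).

```python
import numpy as np

rho, eps = 0.95, 1e-6
x = np.array([3.0, -4.0])                 # parameters to optimize
Eg2 = np.zeros_like(x)                    # running average of squared gradients
Edx2 = np.zeros_like(x)                   # running average of squared updates

def grad(x):
    return x                              # gradient of f(x) = 0.5 * ||x||^2

for step in range(3000):
    g = grad(x)
    Eg2 = rho * Eg2 + (1 - rho) * g**2
    # update scaled by the ratio of accumulated update RMS to gradient RMS,
    # so no global learning rate needs to be tuned
    dx = -np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * g
    Edx2 = rho * Edx2 + (1 - rho) * dx**2
    x = x + dx

print(np.round(x, 4))   # both coordinates shrink toward the minimum at the origin
```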
Adaptive Graph Signal Processing: Algorithms and Optimal Sampling Strategies The goal of this paper is to propose novel strategies for adaptive learning of signals defined over graphs, which are observed over a (randomly time-varying) subset of vertices. We recast two classical adaptive algorithms in the graph signal processing framework, namely, the least mean squares (LMS) and the recursive least squares (RLS) adaptive estimation strategies. For both methods, a detailed mean-square analysis illustrates the effect of random sampling on the adaptive reconstruction capability and the steady-state performance. Then, several probabilistic sampling strategies are proposed to design the sampling probability at each node in the graph, with the aim of optimizing the tradeoff between steady-state performance, graph sampling rate, and convergence rate of the adaptive algorithms. Finally, a distributed RLS strategy is derived and is shown to be convergent to its centralized counterpart. Numerical simulations carried out over both synthetic and real data illustrate the good performance of the proposed sampling and reconstruction strategies for (possibly distributed) adaptive learning of signals defined over graphs.
Addressing the ‘Big Data’ Issue: What You Need to Know These days, you´re probably hearing a lot of hype about ‘big data.’ Vendors are currently hawking a wealth of new tools, all of which promise to help your organization unlock previously inaccessible insights from your proprietary information. According to the authors, there is no doubt that big data, i.e., organization-wide data that´s being managed in a centralized repository, can yield valuable discoveries that will result in improved products and performance – if properly analyzed. Nonetheless, you must look before you leap. First, is your company culture ready for such a move? How will data managers be affected when scores of discrete data silos are gathered and reviewed as a whole? How will you involve leadership and others in ongoing decision-making processes? How will you choose your architecture and tools from the dizzying array of options that are currently available? How will you stay up-to-date in this rapidly evolving field? Finally, how will you train your company´s users so that they can actually leverage the new capabilities? This ExecBlueprint explores these and other key concerns.
Advanced Analytics with the SAP HANA Database MapReduce as a programming paradigm provides a simple-to-use yet very powerful abstraction encapsulated in two second-order functions: Map and Reduce. As such, they allow defining single sequentially processed tasks while at the same time hiding many of the framework details about how those tasks are parallelized and scaled out. In this paper we discuss four processing patterns in the context of the distributed SAP HANA database that go beyond the classic MapReduce paradigm. We illustrate them using some typical Machine Learning algorithms and present experimental results that demonstrate how the data flows scale out with the number of parallel tasks.
Advances in Artificial Intelligence Require Progress Across all of Computer Science Advances in Artificial Intelligence require progress across all of computer science.
Advances in Machine Learning for the Behavioral Sciences The areas of machine learning and knowledge discovery in databases have considerably matured in recent years. In this article, we briefly review recent developments as well as classical algorithms that stood the test of time. Our goal is to provide a general introduction into different tasks such as learning from tabular data, behavioral data, or textual data, with a particular focus on actual and potential applications in behavioral sciences. The supplemental appendix to the article also provides practical guidance for using the methods by pointing the reader to proven software implementations. The focus is on R, but we also cover some libraries in other programming languages as well as systems with easy-to-use graphical interfaces.
Advances in Natural Language Question Answering: A Review Question Answering has recently received high attention from artificial intelligence communities due to the advancements in learning technologies. Early question answering models used rule-based approaches and moved to the statistical approach to address the vastly available information. However, statistical approaches are shown to underperform in handling the dynamic nature and the variation of language. Therefore, learning models have shown the capability of handling the dynamic nature and variations in language. Many deep learning methods have been introduced to question answering. Most of the deep learning approaches have been shown to achieve higher results compared to machine learning and statistical methods. The dynamic nature of language has profited from the nonlinear learning in deep learning. This has created prominent success and a spike in work on question answering. This paper discusses the successes and challenges in question answering systems and the techniques that are used to address these challenges.
Advances in Variational Inference Many modern unsupervised or semi-supervised machine learning algorithms rely on Bayesian probabilistic models. These models are usually intractable and thus require approximate inference. Variational inference (VI) lets us approximate a high-dimensional Bayesian posterior with a simpler variational distribution by solving an optimization problem. This approach has been successfully used in various models and large-scale applications. In this review, we give an overview of recent trends in variational inference. We first introduce standard mean field variational inference, then review recent advances focusing on the following aspects: (a) scalable VI, which includes stochastic approximations, (b) generic VI, which extends the applicability of VI to a large class of otherwise intractable models, such as non-conjugate models, (c) accurate VI, which includes variational models beyond the mean field approximation or with atypical divergences, and (d) amortized VI, which implements the inference over local latent variables with inference networks. Finally, we provide a summary of promising future research directions.
Adversarial Attacks and Defences: A Survey Deep learning has emerged as a strong and efficient framework that can be applied to a broad spectrum of complex learning problems which were difficult to solve using the traditional machine learning techniques in the past. In the last few years, deep learning has advanced radically in such a way that it can surpass human-level performance on a number of tasks. As a consequence, deep learning is being extensively used in most of the recent day-to-day applications. However, deep learning systems are vulnerable to crafted adversarial examples, which may be imperceptible to the human eye but can lead the model to misclassify the output. In recent times, different types of adversaries, based on their threat model, leverage these vulnerabilities to compromise deep learning systems where the incentives are high. Hence, it is extremely important to provide robustness to deep learning algorithms against these adversaries. However, there are only a few strong countermeasures which can be used in all types of attack scenarios to design a robust deep learning system. In this paper, we attempt to provide a detailed discussion on different types of adversarial attacks with various threat models and also elaborate on the efficiency and challenges of recent countermeasures against them.
Adversarial Examples – A Complete Characterisation of the Phenomenon We provide a complete characterisation of the phenomenon of adversarial examples – inputs intentionally crafted to fool machine learning models. We aim to cover all the important concerns in this field of study: (1) the conjectures on the existence of adversarial examples, (2) the security, safety and robustness implications, (3) the methods used to generate and (4) protect against adversarial examples and (5) the ability of adversarial examples to transfer between different machine learning models. We provide ample background information in an effort to make this document self-contained. Therefore, this document can be used as survey, tutorial or as a catalog of attacks and defences using adversarial examples.
Adversarial Examples in Modern Machine Learning: A Review Recent research has found that many families of machine learning models are vulnerable to adversarial examples: inputs that are specifically designed to cause the target model to produce erroneous outputs. In this survey, we focus on machine learning models in the visual domain, where methods for generating and detecting such examples have been most extensively studied. We explore a variety of adversarial attack methods that apply to image-space content, real world adversarial attacks, adversarial defenses, and the transferability property of adversarial examples. We also discuss strengths and weaknesses of various methods of adversarial attack and defense. Our aim is to provide an extensive coverage of the field, furnishing the reader with an intuitive understanding of the mechanics of adversarial attack and defense mechanisms and enlarging the community of researchers studying this fundamental set of problems.
Adversarial Examples: Attacks and Defenses for Deep Learning With rapid progress and great successes in a wide spectrum of applications, deep learning is being applied in many safety-critical environments. However, deep neural networks have recently been found vulnerable to well-designed input samples, called \textit{adversarial examples}. Adversarial examples are imperceptible to humans but can easily fool deep neural networks in the testing/deploying stage. The vulnerability to adversarial examples has become one of the major risks for applying deep neural networks in safety-critical scenarios. Therefore, attacks based on adversarial examples and the defenses against them have drawn great attention. In this paper, we review recent findings on adversarial examples against deep neural networks, summarize the methods for generating adversarial examples, and propose a taxonomy of these methods. Under the taxonomy, applications and countermeasures for adversarial examples are investigated. We further elaborate on adversarial examples and explore the challenges and the potential solutions.
Adversarial Learning in Statistical Classification: A Comprehensive Review of Defenses Against Attacks With the wide deployment of machine learning (ML) based systems for a variety of applications including medical, military, automotive, genomic, as well as multimedia and social networking, there is great potential for damage from adversarial learning (AL) attacks. In this paper, we provide a contemporary survey of AL, focused particularly on defenses against attacks on statistical classifiers. After introducing relevant terminology and the goals and range of possible knowledge of both attackers and defenders, we survey recent work on test-time evasion (TTE), data poisoning (DP), and reverse engineering (RE) attacks and particularly defenses against same. In so doing, we distinguish robust classification from anomaly detection (AD), unsupervised from supervised, and statistical hypothesis-based defenses from ones that do not have an explicit null (no attack) hypothesis; we identify the hyperparameters a particular method requires, its computational complexity, as well as the performance measures on which it was evaluated and the obtained quality. We then dig deeper, providing novel insights that challenge conventional AL wisdom and that target unresolved issues, including: 1) robust classification versus AD as a defense strategy; 2) the belief that attack success increases with attack strength, which ignores susceptibility to AD; 3) small perturbations for test-time evasion attacks: a fallacy or a requirement?; 4) validity of the universal assumption that a TTE attacker knows the ground-truth class for the example to be attacked; 5) black, grey, or white box attacks as the standard for defense evaluation; 6) susceptibility of query-based RE to an AD defense. We then present benchmark comparisons of several defenses against TTE, RE, and backdoor DP attacks on images. The paper concludes with a discussion of future work.
Advice from the Oracle: Really Intelligent Information Retrieval What is ‘intelligent’ information retrieval? Essentially, this is asking what intelligence is. In this article I will attempt to show some of the aspects of human intelligence, as related to information retrieval. I will do this by the device of a semi-imaginary Oracle. Every Observatory has an oracle, someone who is a distinguished scientist, has great administrative responsibilities, acts as mentor to a number of less senior people, and as trusted advisor to even the most accomplished scientists, and knows essentially everyone in the field. In an appendix I will present a brief summary of the Statistical Factor Space method for text indexing and retrieval, and indicate how it will be used in the Astrophysics Data System Abstract Service. Keywords: Personal Digital Assistant; Supervised Topic Models
Agent-based computing from multi-agent systems to agent-based Models: a visual survey Agent-Based Computing is a diverse research domain concerned with the building of intelligent software based on the concept of ‘agents’. In this paper, we use Scientometric analysis to analyze all sub-domains of agent-based computing. Our data consists of 1,064 journal articles indexed in the ISI Web of Knowledge published during a twenty-year period: 1990-2010. These were retrieved using a topic search with various keywords commonly used in sub-domains of agent-based computing. In our proposed approach, we have employed a combination of two applications for analysis, namely Network Workbench and CiteSpace: Network Workbench allowed for the analysis of complex network aspects of the domain, while detailed visualization-based analysis of the bibliographic data was performed using CiteSpace. Our results include the identification of the largest cluster based on keywords, the timeline of publication of index terms, the core journals and key subject categories. We also identify the core authors and top countries of origin of the manuscripts, along with core research institutes. Finally, our results have interestingly revealed the strong presence of agent-based computing in a number of non-computing related scientific domains including Life Sciences, Ecological Sciences and Social Sciences.
Agent-Based Modeling and Simulation Agent-based modeling and simulation (ABMS) is a new approach to modeling systems composed of autonomous, interacting agents. Computational advances have made possible a growing number of agent-based models across a variety of application domains. Applications range from modeling agent behavior in the stock market, supply chains, and consumer markets, to predicting the spread of epidemics, mitigating the threat of bio-warfare, and understanding the factors that may be responsible for the fall of ancient civilizations. Such progress suggests the potential of ABMS to have far-reaching effects on the way that businesses use computers to support decision-making and researchers use agent-based models as electronic laboratories. Some contend that ABMS ‘is a third way of doing science’ and could augment traditional deductive and inductive reasoning as discovery methods. This brief tutorial introduces agent-based modeling by describing the foundations of ABMS, discussing some illustrative applications, and addressing toolkits and methods for developing agent-based models.
Agent-based models of collective intelligence Collective or group intelligence is manifested in the fact that a team of cooperating agents can solve problems more efficiently than when those agents work in isolation. Although cooperation is, in general, a successful problem solving strategy, it is not clear whether it merely speeds up the time to find the solution, or whether it alters qualitatively the statistical signature of the search for the solution. Here we review and offer insights on two agent-based models of distributed cooperative problem-solving systems, whose task is to solve a cryptarithmetic puzzle. The first model is the imitative learning search in which the agents exchange information on the quality of their partial solutions to the puzzle and imitate the most successful agent in the group. This scenario predicts a very poor performance in the case imitation is too frequent or the group is too large, a phenomenon akin to Groupthink of social psychology. The second model is the blackboard organization in which agents read and post hints on a public blackboard. This brainstorming scenario performs the best when there is a stringent limit to the amount of information that is exhibited on the board. Both cooperative scenarios produce a substantial speed up of the time to solve the puzzle as compared with the situation where the agents work in isolation. The statistical signature of the search, however, is the same as that of the independent search.
Agile business intelligence: reshaping the landscape The last few years have brought a wave of changes for business intelligence (BI) solutions. A set of redefining technological trends is reshaping the landscape from a slow and cumbersome process practiced mainly by large enterprises to a much more flexible, agile process that mid-market companies as well as individuals can utilize. This report explores the key features that influence the evolution of agile BI and takes a look at the BI landscape under this light. At first glance, polarization seems to exist between traditional BI vendors, who are focused on extract, transform, and load (ETL) and reporting, and the newcomers, who are focused on data exploration and visualization, but a closer look reveals that, in fact, they converge as adoption of useful features is taking place across the spectrum.
AI Enabling Technologies: A Survey Artificial Intelligence (AI) has the opportunity to revolutionize the way the United States Department of Defense (DoD) and Intelligence Community (IC) address the challenges of evolving threats, data deluge, and rapid courses of action. Developing an end-to-end artificial intelligence system involves parallel development of different pieces that must work together in order to provide capabilities that can be used by decision makers, warfighters and analysts. These pieces include data collection, data conditioning, algorithms, computing, robust artificial intelligence, and human-machine teaming. While much of the popular press today surrounds advances in algorithms and computing, most modern AI systems leverage advances across numerous different fields. Further, while certain components may not be as visible to end-users as others, our experience has shown that each of these interrelated components plays a major role in the success or failure of an AI system. This article is meant to highlight many of these technologies that are involved in an end-to-end AI system. The goal of this article is to provide readers with an overview of terminology, technical details and recent highlights from academia, industry and government. Where possible, we indicate relevant resources that can be used for further reading and understanding.
AI in the media and creative industries Thanks to the Big Data revolution and increasing computing capacities, Artificial Intelligence (AI) has made an impressive revival over the past few years and is now omnipresent in both research and industry. The creative sectors have always been early adopters of AI technologies and this continues to be the case. As a matter of fact, recent technological developments keep pushing the boundaries of intelligent systems in creative applications: the critically acclaimed movie ‘Sunspring’, released in 2016, was entirely written by AI technology, and the first-ever music album produced using AI, called ‘Hello World’, was released this year. Simultaneously, the exploratory nature of the creative process is raising important technical challenges for AI, such as the ability for AI-powered techniques to be accurate under limited data resources, as opposed to the conventional ‘Big Data’ approach, or the ability to process, analyse and match data from multiple modalities (text, sound, images, etc.) at the same time. The purpose of this white paper is to understand future technological advances in AI and their growing impact on creative industries. This paper addresses the following questions: Where does AI operate in creative Industries? What is its operative role? How will AI transform creative industries in the next ten years? This white paper aims to provide a realistic perspective of the scope of AI actions in creative industries, proposes a vision of how this technology could contribute to research and development works in such context, and identifies research and development challenges.
AI Reasoning Systems: PAC and Applied Methods Learning and logic are distinct and remarkable approaches to prediction. Machine learning has experienced a surge in popularity because it is robust to noise and achieves high performance; however, ML experiences many issues with knowledge transfer and extrapolation. In contrast, logic is easily interpreted, and logical rules are easy to chain and transfer between systems; however, inductive logic is brittle to noise. We explore the premise of combining learning with inductive logic into AI Reasoning Systems. Specifically, we summarize findings from PAC learning (conceptual graphs, robust logics, knowledge infusion) and deep learning (DSRL, $\partial$ILP, DeepLogic) by reproducing proofs of tractability, presenting algorithms in pseudocode, highlighting results, and synthesizing between fields. We conclude with suggestions for integrated models by combining the modules listed above and with a list of unsolved (likely intractable) problems.
AI-Powered Social Bots This paper gives an overview of impersonation bots that generate output in one, or possibly, multiple modalities. We also discuss rapidly advancing areas of machine learning and artificial intelligence that could lead to frighteningly powerful new multi-modal social bots. Our main conclusion is that most commonly known bots are one dimensional (i.e., chatterbot), and far from deceiving serious interrogators. However, using recent advances in machine learning, it is possible to unleash incredibly powerful, human-like armies of social bots, in potentially well coordinated campaigns of deception and influence.
AI-Powered Text Generation for Harmonious Human-Machine Interaction: Current State and Future Directions In the last two decades, the landscape of text generation has undergone tremendous changes and is being reshaped by the success of deep learning. New technologies for text generation, ranging from template-based methods to neural network-based methods, have emerged. Meanwhile, the research objectives have also changed from generating smooth and coherent sentences to infusing personalized traits to enrich the diversification of newly generated content. With the rapid development of text generation solutions, a comprehensive survey is urgently needed to summarize the achievements and track the state of the art. In this survey paper, we present the general systematical framework, illustrate the widely utilized models and summarize the classic applications of text generation.
AIR5: Five Pillars of Artificial Intelligence Research In this article, we provide an overview of what we consider to be some of the most pressing research questions facing the field of artificial intelligence (AI), as well as its sub-field of computational intelligence (CI). We demarcate these questions using five unique Rs – namely, (i) rationalizability, (ii) resilience, (iii) reproducibility, (iv) realism, and (v) responsibility. Just as air serves as the basic element of biological life, the term AIR5 – cumulatively referring to the five aforementioned Rs – is introduced herein to mark some of the basic elements of artificial life (supporting the sustained growth of AI and CI). A brief summary of each of the Rs is presented, highlighting their relevance as pillars of future research in this arena.
Algorithm quasi-optimal (AQ) learning The algorithm quasi-optimal (AQ) is a powerful machine learning methodology aimed at learning symbolic decision rules from a set of examples and counterexamples. It was first proposed in the late 1960s to solve the Boolean function satisfiability problem and further refined over the following decade to solve the general covering problem. In its newest implementations, it is a powerful yet little-explored methodology for symbolic machine learning classification. It has been applied to solve several problems from different domains, including the generation of individuals within an evolutionary computation framework. The current article introduces the main concepts of the AQ methodology and describes AQ for source detection (AQ4SD), a tailored implementation of the AQ methodology to solve the problem of finding the sources of atmospheric releases using distributed sensor measurements. The AQ4SD program is tested to find the sources of all the releases of the prairie grass field experiment.
Algorithms and Methods in Recommender Systems Today, there is a wide variety of approaches and algorithms for data filtering and recommendation. In this paper we describe traditional approaches and explain what kinds of modern approaches have been developed lately. Throughout the paper we illustrate the approaches and their problems using movie recommendations as an example. In the end we show the main challenges recommender systems come across.
Algorithms and Statistical Models for Scientific Discovery in the Petabyte Era The field of astronomy has arrived at a turning point in terms of size and complexity of both datasets and scientific collaboration. Commensurately, algorithms and statistical models have begun to adapt — e.g., via the onset of artificial intelligence — which itself presents new challenges and opportunities for growth. This white paper aims to offer guidance and ideas for how we can evolve our technical and collaborative frameworks to promote efficient algorithmic development and take advantage of opportunities for scientific discovery in the petabyte era. We discuss challenges for discovery in large and complex data sets; challenges and requirements for the next stage of development of statistical methodologies and algorithmic tool sets; how we might change our paradigms of collaboration and education; and the ethical implications of scientists’ contributions to widely applicable algorithms and computational modeling. We start with six distinct recommendations that are supported by the commentary following them. This white paper is related to a larger corpus of effort that has taken place within and around the Petabytes to Science Workshops (https://petabytestoscience.github.io ).
Algorithms for Active Learning This dissertation explores both the algorithmic and statistical aspects of active learning for binary classification. What are effective procedures for determining which data to label? How can these procedures take advantage of the interactive learning process, and in what circumstances do they yield improved learning performance compared to standard passive learners? To answer these questions, we develop and rigorously analyze a broad class of general active learning methods that address the essential algorithmic and statistical difficulties of the problem.
Algorithms for Reinforcement Learning Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner’s predictions. Further, the predictions may have long-term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms’ merits and limitations. Reinforcement learning is of great interest because of the large number of practical applications that it can be used to address, ranging from problems in artificial intelligence to operations research or control engineering. In this book, we focus on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming. We give a fairly comprehensive catalog of learning problems, describe the core ideas together with a large number of state of the art algorithms, followed by the discussion of their theoretical properties and limitations.
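Since the book's focus is on algorithms built on dynamic programming, here is a minimal illustrative value-iteration sketch on a made-up three-state MDP (the transition and reward numbers are invented for the example, not taken from the book):

```python
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
# P[a][s][s'] = transition probability, R[a][s] = expected immediate reward
P = np.array([
    [[0.8, 0.2, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]],   # action 0
    [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.0, 0.0, 1.0]],   # action 1
])
R = np.array([
    [0.0, 0.0, 1.0],
    [0.1, 0.2, 1.0],
])

V = np.zeros(n_states)
for _ in range(500):
    # Bellman optimality backup: V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    Q = R + gamma * (P @ V)              # shape (n_actions, n_states)
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
policy = Q.argmax(axis=0)                # greedy policy w.r.t. the converged values
print("V*:", V, "greedy policy:", policy)
```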
Algorithms for the Greater Good! On Mental Modeling and Acceptable Symbiosis in Human-AI Collaboration Effective collaboration between humans and AI-based systems requires effective modeling of the human in the loop, both in terms of the mental state as well as the physical capabilities of the latter. However, these models can also open up pathways for manipulating and exploiting the human in the hopes of achieving some greater good, especially when the intent or values of the AI and the human are not aligned or when they have an asymmetrical relationship with respect to knowledge or computation power. In fact, such behavior does not necessarily require any malicious intent but can rather be borne out of cooperative scenarios. It is also beyond simple misinterpretation of intents, as in the case of value alignment problems, and thus can be effectively engineered if desired. Such techniques already exist and pose several unresolved ethical and moral questions with regards to the design of autonomy. In this paper, we illustrate some of these issues in a teaming scenario and investigate how they are perceived by participants in a thought experiment.
Algorithms in Data Mining using Matrix and Tensor Methods In many fields of science, engineering, and economics, large amounts of data are stored and there is a need to analyze these data in order to extract information for various purposes. Data mining is a general concept involving different tools for performing this kind of analysis. The development of mathematical models and efficient algorithms is of key importance. In this thesis we discuss algorithms for the reduced rank regression problem and algorithms for the computation of the best multilinear rank approximation of tensors.
All Neural Networks are Created Equal One of the unresolved questions in the context of deep learning is the triumph of GD based optimization, which is guaranteed to converge to one of many local minima. To shed light on the nature of the solutions that are thus being discovered, we investigate the ensemble of solutions reached by the same network architecture, with different random initialization of weights and random mini-batches. Surprisingly, we observe that these solutions are in fact very similar – more often than not, each train and test example is either classified correctly by all the networks, or by none at all. Moreover, all the networks seem to share the same learning dynamics, whereby initially the same train and test examples are incorporated into the learnt model, followed by other examples which are learnt in roughly the same order. When different neural network architectures are compared, the same learning dynamics is observed even when one architecture is significantly stronger than the other and achieves higher accuracy. Finally, when investigating other methods that involve the gradual refinement of a solution, such as boosting, once again we see the same learning pattern. In all cases, it appears as if all the classifiers start by learning to classify correctly the same train and test examples, while the more powerful classifiers continue to learn to classify correctly additional examples. These results are incredibly robust, observed for a large variety of architectures, hyperparameters and different datasets of images. Thus we observe that different classification solutions may be discovered by different means, but typically they evolve in roughly the same manner and demonstrate a similar success and failure behavior. For a given dataset, such behavior seems to be strongly correlated with effective generalization, while the induced ranking of examples may reflect inherent structure in the data.
AlphaStar: An Evolutionary Computation Perspective In January 2019, DeepMind revealed AlphaStar to the world: the first artificial intelligence (AI) system to beat a professional player at the game of StarCraft II, representing a milestone in the progress of AI. AlphaStar draws on many areas of AI research, including deep learning, reinforcement learning, game theory, and evolutionary computation (EC). In this paper we analyze AlphaStar primarily through the lens of EC, presenting a new look at the system and relating it to many concepts in the field. We highlight some of its most interesting aspects: the use of Lamarckian evolution, competitive co-evolution, and quality diversity. In doing so, we hope to provide a bridge between the wider EC community and one of the most significant AI systems developed in recent times.
Amazon.com Recommendations: Item-to-Item Collaborative Filtering Recommendation algorithms are best known for their use on e-commerce Web sites, where they use input about a customer´s interests to generate a list of recommended items. Many applications use only the items that customers purchase and explicitly rate to represent their interests, but they can also use other attributes, including items viewed, demographic data, subject interests, and favorite artists. At Amazon.com, we use recommendation algorithms to personalize the online store for each customer. The store radically changes based on customer interests, showing programming titles to a software engineer and baby toys to a new mother. The click-through and conversion rates – two important measures of Web-based and email advertising effectiveness – vastly exceed those of untargeted content such as banner advertisements and top-seller lists….
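The abstract describes item-to-item collaborative filtering only at a high level; the sketch below is a hedged, toy illustration of the general idea (recommend items similar to those a customer already bought), not Amazon's production algorithm.

```python
import numpy as np

# Toy 0/1 purchase matrix: rows = users, columns = items (invented data)
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Cosine similarity between item column vectors
norms = np.linalg.norm(X, axis=0)
sim = (X.T @ X) / np.outer(norms, norms)
np.fill_diagonal(sim, 0.0)                # never recommend an item as similar to itself

def recommend(user_row, k=2):
    """Score unseen items by summed similarity to the items the user bought."""
    scores = sim @ user_row
    scores[user_row > 0] = -np.inf        # mask already-purchased items
    return np.argsort(scores)[::-1][:k]

print(recommend(X[0]))                    # top-k item indices for the first user
```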
An Analysis of Hierarchical Text Classification Using Word Embeddings Efficient distributed numerical word representation models (word embeddings) combined with modern machine learning algorithms have recently yielded considerable improvement on automatic document classification tasks. However, the effectiveness of such techniques has not yet been assessed for hierarchical text classification (HTC). This study investigates the application of those models and algorithms on this specific problem by means of experimentation and analysis. We trained classification models with prominent machine learning algorithm implementations—fastText, XGBoost, SVM, and Keras’ CNN—and noticeable word embeddings generation methods—GloVe, word2vec, and fastText—with publicly available data and evaluated them with measures specifically appropriate for the hierarchical context. FastText achieved an ${}_{LCA}F_1$ of 0.893 on a single-labeled version of the RCV1 dataset. An analysis indicates that using word embeddings and their variants is a very promising approach for HTC.
An Analysis of Machine Learning Intelligence Deep neural networks (DNNs) have set state of the art results in many machine learning and NLP tasks. However, we do not have a strong understanding of what DNN models learn. In this paper, we examine learning in DNNs through analysis of their outputs. We compare DNN performance directly to a human population, and use characteristics of individual data points such as difficulty to see how well models perform on easy and hard examples. We investigate how training size and the incorporation of noise affect a DNN’s ability to generalize and learn. Our experiments show that unlike traditional machine learning models (e.g., Naive Bayes, Decision Trees), DNNs exhibit human-like learning properties. As they are trained with more data, they are more able to distinguish between easy and difficult items, and performance on easy items improves at a higher rate than difficult items. We find that different DNN models exhibit different strengths in learning and are robust to noise in training data.
An Analysis of the t-SNE Algorithm for Data Visualization A first line of attack in exploratory data analysis is data visualization, i.e., generating a 2-dimensional representation of data that makes clusters of similar points visually identifiable. Standard Johnson-Lindenstrauss dimensionality reduction does not produce data visualizations. The t-SNE heuristic of van der Maaten and Hinton, which is based on non-convex optimization, has become the de facto standard for visualization in a wide range of applications. This work gives a formal framework for the problem of data visualization – finding a 2-dimensional embedding of clusterable data that correctly separates individual clusters to make them visually identifiable. We then give a rigorous analysis of the performance of t-SNE under a natural, deterministic condition on the ‘ground-truth’ clusters (similar to conditions assumed in earlier analyses of clustering) in the underlying data. These are the first provable guarantees on t-SNE for constructing good data visualizations. We show that our deterministic condition is satisfied by considerably general probabilistic generative models for clusterable data such as mixtures of well-separated log-concave distributions. Finally, we give theoretical evidence that t-SNE provably succeeds in partially recovering cluster structure even when the above deterministic condition is not met.
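For readers who want to try the heuristic discussed above, a minimal usage sketch assuming scikit-learn's t-SNE implementation and synthetic clusterable data:

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Embed clusterable 50-dimensional data into 2-D for visualization
X, y = make_blobs(n_samples=300, n_features=50, centers=4, random_state=0)
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)   # (300, 2); plot emb[:, 0] vs emb[:, 1] colored by y to inspect clusters
```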
An Analysis of Visual Question Answering Algorithms In visual question answering (VQA), an algorithm must answer text-based questions about images. While multiple datasets for VQA have been created since late 2014, they all have flaws in both their content and the way algorithms are evaluated on them. As a result, evaluation scores are inflated and predominantly determined by answering easier questions, making it difficult to compare different methods. In this paper, we analyze existing VQA algorithms using a new dataset. It contains over 1.6 million questions organized into 12 different categories. We also introduce questions that are meaningless for a given image to force a VQA system to reason about image content. We propose new evaluation schemes that compensate for over-represented question-types and make it easier to study the strengths and weaknesses of algorithms. We analyze the performance of both baseline and state-of-the-art VQA models, including multi-modal compact bilinear pooling (MCB), neural module networks, and recurrent answering units. Our experiments establish how attention helps certain categories more than others, determine which models work better than others, and explain how simple models (e.g. MLP) can surpass more complex models (MCB) by simply learning to answer large, easy question categories.
An Attentive Survey of Attention Models Attention Model has now become an important concept in neural networks that has been researched within diverse application domains. This survey provides a structured and comprehensive overview of the developments in modeling attention. In particular, we propose a taxonomy which groups existing techniques into coherent categories. We review the different neural architectures in which attention has been incorporated, and also show how attention improves interpretability of neural models. Finally, we discuss some applications in which modeling attention has a significant impact. We hope this survey will provide a succinct introduction to attention models and guide practitioners while developing approaches for their applications.
An Economist´s Guide to Visualizing Data Once upon a time, a picture was worth a thousand words. But with online news, blogs, and social media, a good picture can now be worth so much more. Economists who want to disseminate their research, both inside and outside the seminar room, should invest some time in thinking about how to construct compelling and effective graphics. An effective graph should tap into the brain´s ‘pre-attentive visual processing’ (Few 2004; Healey and Enns 2012). Because our eyes detect a limited set of visual characteristics, such as shape or contrast, we easily combine those characteristics and unconsciously perceive them as an image. In contrast to ‘attentive processing’ – the conscious part of perception that allows us to perceive things serially – pre-attentive processing is done in parallel and is much faster. Pre-attentive processing allows the reader to perceive multiple basic visual elements simultaneously….
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling For most deep learning practitioners, sequence modeling is synonymous with recurrent networks. Yet recent results indicate that convolutional architectures can outperform recurrent networks on tasks such as audio synthesis and machine translation. Given a new sequence modeling task or dataset, which architecture should one use? We conduct a systematic evaluation of generic convolutional and recurrent architectures for sequence modeling. The models are evaluated across a broad range of standard tasks that are commonly used to benchmark recurrent networks. Our results indicate that a simple convolutional architecture outperforms canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory. We conclude that the common association between sequence modeling and recurrent networks should be reconsidered, and convolutional networks should be regarded as a natural starting point for sequence modeling tasks.
An Essay on Optimization Mystery of Deep Learning Despite the huge empirical success of deep learning, theoretical understanding of the neural network learning process is still lacking. This is the reason why some of its features seem ‘mysterious’. We emphasize two mysteries of deep learning: the generalization mystery and the optimization mystery. In this essay we review and draw connections between several selected works concerning the latter.
An Example Inference Task: Clustering Human brains are good at finding regularities in data. One way of expressing regularity is to put a set of objects into groups that are similar to each other. For example, biologists have found that most objects in the natural world fall into one of two categories: things that are brown and run away, and things that are green and don’t run away. The first group they call animals, and the second, plants. We’ll call this operation of grouping things together clustering. If the biologist further sub-divides the cluster of plants into sub-clusters, we would call this ‘hierarchical clustering’; but we won’t be talking about hierarchical clustering yet. In this chapter we’ll just discuss ways to take a set of N objects and group them into K clusters.
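A minimal sketch of the kind of grouping operation described in the chapter, using Lloyd's k-means algorithm on synthetic two-dimensional data (this illustrates clustering N objects into K clusters in general, not any specific algorithm from the chapter):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]   # random initial centers
    for _ in range(n_iter):
        # assignment step: each point joins its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update step: each center moves to the mean of its assigned points
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(K)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])  # two blobs
labels, centers = kmeans(X, K=2)
print(centers)
```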
An Experimental Study of Algorithms for Online Bipartite Matching We perform an experimental study of algorithms for online bipartite matching under the known i.i.d. input model with integral types. In the last decade, there has been substantial effort in designing complex algorithms with the goal of improving worst-case approximation ratios. Our goal is to determine how these algorithms perform on more practical instances rather than worst-case instances. In particular, we are interested in whether the ranking of the algorithms by their worst-case performance is consistent with the ranking of the algorithms by their average-case/practical performance. We are also interested in whether preprocessing times and implementation difficulties that are introduced by these algorithms are justified in practice. To that end we evaluate these algorithms on different random inputs as well as real-life instances obtained from publicly available repositories. We compare these algorithms against several simple greedy-style algorithms. Most of the complex algorithms in the literature are presented as being non-greedy (i.e., an algorithm can intentionally skip matching a node that has available neighbors) to simplify the analysis. Every such algorithm can be turned into a greedy one without hurting its worst-case performance. On our benchmarks, non-greedy versions of these algorithms perform much worse than their greedy versions. Greedy versions perform about as well as the simplest greedy algorithm by itself. This, together with our other findings, suggests that simplest greedy algorithms are competitive with the state-of-the-art worst-case algorithms for online bipartite matching on many average-case and practical input families. Greediness is by far the most important property of online algorithms for bipartite matching.
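As a point of reference for the 'simplest greedy algorithm' discussed above, a minimal sketch on a toy input (not one of the paper's benchmarks): each arriving online node is matched to the first available offline neighbor.

```python
def greedy_online_matching(arrivals):
    """arrivals: one neighbor list per arriving online node, each listing
    indices of offline nodes it can be matched to."""
    matched_offline = set()
    matching = {}                                  # online index -> offline index
    for online, neighbors in enumerate(arrivals):
        for offline in neighbors:
            if offline not in matched_offline:     # first available neighbor wins
                matched_offline.add(offline)
                matching[online] = offline
                break
    return matching

# Example: 3 online nodes arrive with the listed offline neighbors
print(greedy_online_matching([[0, 1], [0], [1, 2]]))   # e.g. {0: 0, 2: 1}
```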
An exploration of algorithmic discrimination in data and classification Algorithmic discrimination is an important aspect when data is used for predictive purposes. This paper analyzes the relationships between discrimination and classification, data set partitioning, and decision models, as well as correlation. The paper uses real world data sets to demonstrate the existence of discrimination and the independence between the discrimination of data sets and the discrimination of classification models.
An Impossibility Theorem for Clustering Although the study of clustering is centered around an intuitively compelling goal, it has been very difficult to develop a unified framework for reasoning about it at a technical level, and profoundly diverse approaches to clustering abound in the research community. Here we suggest a formal perspective on the difficulty in finding such a unification, in the form of an impossibility theorem: for a set of three simple properties, we show that there is no clustering function satisfying all three. Relaxations of these properties expose some of the interesting (and unavoidable) trade-offs at work in well-studied clustering techniques such as single-linkage, sum-of-pairs, k-means, and k-median.
An Information-Theoretic Analysis of Deep Latent-Variable Models We present an information-theoretic framework for understanding trade-offs in unsupervised learning of deep latent-variable models using variational inference. This framework emphasizes the need to consider latent-variable models along two dimensions: the ability to reconstruct inputs (distortion) and the communication cost (rate). We derive the optimal frontier of generative models in the two-dimensional rate-distortion plane, and show how the standard evidence lower bound objective is insufficient to select between points along this frontier. However, by performing targeted optimization to learn generative models with different rates, we are able to learn many models that can achieve similar generative performance but make vastly different trade-offs in terms of the usage of the latent variable. Through experiments on MNIST and Omniglot with a variety of architectures, we show how our framework sheds light on many recently proposed extensions to the variational autoencoder family.
An Information-Theoretic View for Deep Learning Deep learning has transformed computer vision, natural language processing and speech recognition. However, the following two critical questions remain obscure: (1) why do deep neural networks generalize better than shallow networks? (2) does it always hold that a deeper network leads to better performance? Specifically, letting $L$ be the number of convolutional and pooling layers in a deep neural network, and $n$ be the size of the training sample, we derive the upper bound on the expected generalization error for this network, i.e., \begin{eqnarray*} \mathbb{E}[R(W)-R_S(W)] \leq \exp{\left(-\frac{L}{2}\log{\frac{1}{\eta}}\right)}\sqrt{\frac{2\sigma^2}{n}I(S,W) } \end{eqnarray*} where $\sigma >0$ is a constant depending on the loss function, $0<\eta<1$ is a constant depending on the information loss for each convolutional or pooling layer, and $I(S, W)$ is the mutual information between the training sample $S$ and the output hypothesis $W$. This upper bound shows: (1) as the network increases its number of convolutional and pooling layers $L$, the expected generalization error will decrease exponentially to zero. Layers with strict information loss, such as the convolutional layers, reduce the generalization error of deep learning algorithms. This answers the first question. However, (2) an algorithm with zero expected generalization error does not imply a small test error or small $\mathbb{E}[R(W)]$. This is because $\mathbb{E}[R_S(W)]$ will be large when the information needed to fit the data is lost as the number of layers increases. This suggests that the claim 'the deeper the better' is conditioned on a small training error or small $\mathbb{E}[R_S(W)]$.
An Interdisciplinary Comparison of Sequence Modeling Methods for Next-Element Prediction Data of sequential nature arise in many application domains in forms of, e.g., textual data, DNA sequences, and software execution traces. Different research disciplines have developed methods to learn sequence models from such datasets: (i) in the machine learning field, methods such as (hidden) Markov models and recurrent neural networks have been developed and successfully applied to a wide range of tasks, (ii) in process mining, process discovery techniques aim to generate human-interpretable descriptive models, and (iii) in the grammar inference field, the focus is on finding descriptive models in the form of formal grammars. Despite their different focuses, these fields share a common goal – learning a model that accurately describes the behavior in the underlying data. Those sequence models are generative, i.e., they can predict what elements are likely to occur after a given unfinished sequence. So far, these fields have developed mainly in isolation from each other and no comparison exists. This paper presents an interdisciplinary experimental evaluation that compares sequence modeling techniques on the task of next-element prediction on four real-life sequence datasets. The results indicate that, in terms of accuracy, machine learning techniques that generally do not aim at interpretability outperform techniques from the process mining and grammar inference fields that aim to yield interpretable models.
An Interpretable Compression and Classification System: Theory and Applications This study proposes a low-complexity interpretable classification system. The proposed system contains three main modules: feature extraction, feature reduction, and classification. All of them are linear. Thanks to the linear property, the extracted and reduced features can be inverted back to the original data, like a linear transform such as the Fourier transform, so that one can quantify and visualize the contribution of individual features towards the original data. Also, the reduced features and reversibility naturally endow the proposed system with the ability to compress data. This system can significantly compress data with a small percent deviation between the compressed and the original data. At the same time, when the compressed data is used for classification, it still achieves high testing accuracy. Furthermore, we observe that the extracted features of the proposed system can be approximated by uncorrelated Gaussian random variables. Hence, classical theory in estimation and detection can be applied for classification. This motivates us to propose a MAP (maximum a posteriori) based classification method. As a result, the extracted features and the corresponding performance have statistical meaning and are mathematically interpretable. Simulation results show that the proposed classification system not only enjoys significantly reduced training and testing time but also achieves high testing accuracy compared to the conventional schemes.
An Introduction to Advanced Analytics Advanced Analytics is ‘the analysis of all kinds of data using sophisticated quantitative methods (for example, statistics, descriptive and predictive data mining, simulation and optimization) to produce insights that traditional approaches to business intelligence (BI) – such as query and reporting – are unlikely to discover.’
An Introduction to Advanced Machine Learning : Meta Learning Algorithms, Applications and Promises In [1, 2], we have explored the theoretical aspects of feature extraction optimization processes for solving large-scale problems and overcoming machine learning limitations. The majority of the optimization algorithms introduced in [1, 2] guarantee the optimal performance of supervised learning, given offline and discrete data, to deal with the curse of dimensionality (CoD) problem. These algorithms, however, are not tailored for solving emerging learning problems. One of the important issues caused by online data is the lack of sufficient samples per class. Further, traditional machine learning algorithms cannot achieve accurate training based on limited distributed data, as data has proliferated and dispersed significantly. Machine learning employs a strict model or embedded engine to train and predict, which still fails to learn unseen classes and to sufficiently use online data. In this chapter, we introduce these challenges elaborately. We further investigate Meta-Learning (MTL) algorithms, and their applications and promise for solving the emerging problems, by answering how autonomous agents can learn to learn.
An Introduction to Artificial Intelligence Applied to Multimedia In this chapter, we give an introduction to symbolic artificial intelligence (AI) and discuss its relation and application to multimedia. We begin by defining what symbolic AI is, what distinguishes it from non-symbolic approaches, such as machine learning, and how it can be used in the construction of advanced multimedia applications. We then introduce description logic (DL) and use it to discuss symbolic representation and reasoning. DL is the logical underpinning of OWL, the most successful family of ontology languages. After discussing DL, we present OWL and related Semantic Web technologies, such as RDF and SPARQL. We conclude the chapter by discussing a hybrid model for multimedia representation, called Hyperknowledge. Throughout the text, we make references to technologies and extensions specifically designed to solve the kinds of problems that arise in multimedia representation.
An Introduction to Bayesian Networks: Concepts and Learning from Data (Slide Deck)
An Introduction to Causal Inference This paper summarizes recent advances in causal inference and underscores the paradigmatic shifts that must be undertaken in moving from traditional statistical analysis to causal analysis of multivariate data. Special emphasis is placed on the assumptions that underlie all causal inferences, the languages used in formulating those assumptions, the conditional nature of all causal and counterfactual claims, and the methods that have been developed for the assessment of such claims. These advances are illustrated using a general theory of causation based on the Structural Causal Model (SCM) described in Pearl (2000a), which subsumes and unifies other approaches to causation, and provides a coherent mathematical foundation for the analysis of causes and counterfactuals. In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: those about (1) the effects of potential interventions, (2) probabilities of counterfactuals, and (3) direct and indirect effects (also known as ‘mediation’). Finally, the paper defines the formal and conceptual relationships between the structural and potential-outcome frameworks and presents tools for a symbiotic analysis that uses the strong features of both. The tools are demonstrated in the analyses of mediation, causes of effects, and probabilities of causation.
An Introduction to Cluster Analysis for Data Mining Cluster analysis divides data into meaningful or useful groups (clusters). If meaningful clusters are the goal, then the resulting clusters should capture the ‘natural’ structure of the data. For example, cluster analysis has been used to group related documents for browsing, to find genes and proteins that have similar functionality, and to provide a grouping of spatial locations prone to earthquakes. However, in other cases, cluster analysis is only a useful starting point for other purposes, e.g., data compression or efficiently finding the nearest neighbors of points. Whether for understanding or utility, cluster analysis has long been used in a wide variety of fields: psychology and other social sciences, biology, statistics, pattern recognition, information retrieval, machine learning, and data mining. The scope of this paper is modest: to provide an introduction to cluster analysis in the field of data mining, where we define data mining to be the discovery of useful, but non-obvious, information or patterns in large collections of data. Much of this paper is necessarily consumed with providing a general background for cluster analysis, but we also discuss a number of clustering techniques that have recently been developed specifically for data mining. While the paper strives to be self-contained from a conceptual point of view, many details have been omitted. Consequently, many references to relevant books and papers are provided.
An Introduction To Compressive Sampling This article surveys the theory of compressive sampling, also known as compressed sensing or CS, a novel sensing/sampling paradigm that goes against the common wisdom in data acquisition. CS theory asserts that one can recover certain signals and images from far fewer samples or measurements than traditional methods use. To make this possible, CS relies on two principles: sparsity, which pertains to the signals of interest, and incoherence, which pertains to the sensing modality.
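A minimal illustrative sketch of sparse recovery in the spirit of CS (not from the article): reconstruct a sparse signal from few random measurements by solving an l1-regularized least-squares problem with iterative soft thresholding; the sizes and the regularization weight are arbitrary choices for the example.

```python
import numpy as np

# Recover a sparse x from y = A x by minimizing 0.5*||A x - y||^2 + lam*||x||_1 (ISTA)
rng = np.random.default_rng(0)
n, m, k = 200, 60, 5                          # signal length, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(0, 1, k)
A = rng.normal(0, 1.0 / np.sqrt(m), (m, n))   # random (incoherent) sensing matrix
y = A @ x_true

lam = 0.01
L = np.linalg.norm(A, 2) ** 2                 # Lipschitz constant of the smooth part
x = np.zeros(n)
for _ in range(2000):
    grad = A.T @ (A @ x - y)                  # gradient of the least-squares term
    z = x - grad / L
    x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft-thresholding step
print("recovery error:", np.linalg.norm(x - x_true))
```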
An Introduction to Deep Reinforcement Learning Deep reinforcement learning is the combination of reinforcement learning (RL) and deep learning. This field of research has been able to solve a wide range of complex decision-making tasks that were previously out of reach for a machine. Thus, deep RL opens up many new applications in domains such as healthcare, robotics, smart grids, finance, and many more. This manuscript provides an introduction to deep reinforcement learning models, algorithms and techniques. Particular focus is on the aspects related to generalization and how deep RL can be used for practical applications. We assume the reader is familiar with basic machine learning concepts.
An introduction to domain adaptation and transfer learning In machine learning, if the training data is an unbiased sample of an underlying distribution, then the learned classification function will make accurate predictions for new samples. However, if the training data is not an unbiased sample, then there will be differences between how the training data is distributed and how the test data is distributed. Standard classifiers cannot cope with changes in data distributions between training and test phases, and will not perform well. Domain adaptation and transfer learning are sub-fields within machine learning that are concerned with accounting for these types of changes. Here, I present an introduction to these fields, guided by the question: when and how can a classifier generalize from a source to a target domain? I will start with a brief introduction into risk minimization, and how transfer learning and domain adaptation expand upon this framework. Following that, I discuss three special cases of data set shift, namely prior, covariate and concept shift. For more complex domain shifts, there are a wide variety of approaches. These are categorized into: importance-weighting, subspace mapping, domain-invariant spaces, feature augmentation, minimax estimators and robust algorithms. A number of points will arise, which I will discuss in the last section. I conclude with the remark that many open questions will have to be addressed before transfer learners and domain-adaptive classifiers become practical.
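One of the importance-weighting approaches mentioned above can be sketched as follows (an illustrative setup with synthetic data, not the paper's own experiment): estimate the density ratio p_target(x)/p_source(x) with a domain classifier and reweight the source examples when training.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_src = rng.normal(0.0, 1.0, (500, 2))
y_src = (X_src[:, 0] > 0).astype(int)        # labeled source data
X_tgt = rng.normal(1.0, 1.0, (500, 2))       # shifted, unlabeled target data (covariate shift)

# 1) Domain classifier: label source points 0 and target points 1
X_dom = np.vstack([X_src, X_tgt])
d = np.r_[np.zeros(len(X_src)), np.ones(len(X_tgt))]
dom = LogisticRegression().fit(X_dom, d)
p_tgt = dom.predict_proba(X_src)[:, 1]
weights = p_tgt / (1.0 - p_tgt)              # estimated density ratio on source points

# 2) Train the task classifier on source data, weighted to mimic the target distribution
clf = LogisticRegression().fit(X_src, y_src, sample_weight=weights)
print(clf.score(X_tgt, (X_tgt[:, 0] > 0).astype(int)))
```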
An Introduction to Factor Graphs A large variety of algorithms in coding, signal processing, and artificial intelligence may be viewed as instances of the summary-product algorithm (or belief/probability propagation algorithm), which operates by message passing in a graphical model. Specific instances of such algorithms include Kalman filtering and smoothing; the forward-backward algorithm for hidden Markov models; probability propagation in Bayesian networks; and decoding algorithms for error-correcting codes such as the Viterbi algorithm, the BCJR algorithm, and the iterative decoding of turbo codes, low-density parity-check (LDPC) codes, and similar codes. New algorithms for complex detection and estimation problems can also be derived as instances of the summary-product algorithm. In this article, we give an introduction to this unified perspective in terms of (Forney-style) factor graphs.
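As a small concrete instance of the summary-product algorithm named above, a sketch of the forward-backward algorithm for a toy discrete HMM (all probabilities are invented for the example):

```python
import numpy as np

pi = np.array([0.6, 0.4])                     # initial state distribution
T = np.array([[0.7, 0.3], [0.4, 0.6]])        # T[i, j] = P(next state j | state i)
E = np.array([[0.9, 0.1], [0.2, 0.8]])        # E[i, o] = P(observation o | state i)
obs = [0, 0, 1, 0]

alpha = np.zeros((len(obs), 2))               # forward messages
alpha[0] = pi * E[:, obs[0]]
for t in range(1, len(obs)):
    alpha[t] = (alpha[t - 1] @ T) * E[:, obs[t]]

beta = np.ones((len(obs), 2))                 # backward messages
for t in range(len(obs) - 2, -1, -1):
    beta[t] = T @ (E[:, obs[t + 1]] * beta[t + 1])

posterior = alpha * beta
posterior /= posterior.sum(axis=1, keepdims=True)   # P(state_t | all observations)
print(posterior)
```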
An Introduction to Fuzzy and Annotated Semantic Web Languages We present the state of the art in representing and reasoning with fuzzy knowledge in Semantic Web Languages such as triple languages RDF/RDFS, conceptual languages of the OWL 2 family and rule languages. We further show how one may generalise them to so-called annotation domains, which also cover, for example, temporal and provenance extensions.
An introduction to Graph Data Management A graph database is a database where the data structures for the schema and/or instances are modeled as a (labeled) (directed) graph or generalizations of it, and where querying is expressed by graph-oriented operations and type constructors. In this article we present the basic notions of graph databases, give a historical overview of their main developments, and study the main current systems that implement them.
An introduction to graphical models The following quotation, from the Preface, provides a very concise introduction to graphical models: Graphical models are a marriage between probability theory and graph theory. They provide a natural tool for dealing with two problems that occur throughout applied mathematics and engineering, uncertainty and complexity, and in particular they are playing an increasingly important role in the design and analysis of machine learning algorithms. Fundamental to the idea of a graphical model is the notion of modularity: a complex system is built by combining simpler parts. Probability theory provides the glue whereby the parts are combined, ensuring that the system as a whole is consistent, and providing ways to interface models to data. The graph theoretic side of graphical models provides both an intuitively appealing interface by which humans can model highly-interacting sets of variables as well as a data structure that lends itself naturally to the design of efficient general-purpose algorithms. Many of the classical multivariate probabilistic systems studied in fields such as statistics, systems engineering, information theory, pattern recognition and statistical mechanics are special cases of the general graphical model formalism; examples include mixture models, factor analysis, hidden Markov models, Kalman filters and Ising models. The graphical model framework provides a way to view all of these systems as instances of a common underlying formalism. This view has many advantages; in particular, specialized techniques that have been developed in one field can be transferred between research communities and exploited more widely. Moreover, the graphical model formalism provides a natural framework for the design of new systems.
An introduction to graphical models (Slide Deck)
An introduction to high-dimensional statistics In this note, we aim to give a very brief introduction to high-dimensional statistics. Rather than attempting to give an overview of this vast area, we will explain what is meant by high-dimensional data and then focus on two methods which have been introduced to deal with this sort of data. Many of the state of the art techniques used in high-dimensional statistics today are based on these two core methods. We begin with a quick recap of least squares regression.
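A quick sketch of the least squares recap the note starts from, on synthetic data (when the number of predictors exceeds the number of observations this estimator is no longer unique, which is what motivates the high-dimensional methods the note goes on to cover):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5                                # n observations, p predictors (here n > p)
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Ordinary least squares: solves min_beta ||y - X beta||^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```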
An Introduction to Image Synthesis with Generative Adversarial Nets There has been a drastic growth of research in Generative Adversarial Nets (GANs) in the past few years. Proposed in 2014, GAN has been applied to various applications such as computer vision and natural language processing, and achieves impressive performance. Among the many applications of GAN, image synthesis is the most well-studied one, and research in this area has already demonstrated the great potential of using GAN in image synthesis. In this paper, we provide a taxonomy of methods used in image synthesis, review different models for text-to-image synthesis and image-to-image translation, and discuss some evaluation metrics as well as possible future research directions in image synthesis with GAN.
An Introduction to Inductive Statistical Inference — from Parameter Estimation to Decision-Making These lecture notes aim at a post-Bachelor audience with a background at an introductory level in Applied Mathematics and Applied Statistics. They discuss the logic and methodology of the Bayes-Laplace approach to inductive statistical inference that places common sense and the guiding lines of the scientific method at the heart of systematic analyses of quantitative-empirical data. Following an exposition of exactly solvable cases of single- and two-parameter estimation, the main focus is laid on Markov Chain Monte Carlo (MCMC) simulations on the basis of Gibbs sampling and Hamiltonian Monte Carlo sampling of posterior joint probability distributions for regression parameters occurring in generalised linear models. The modelling of fixed as well as of varying effects (varying intercepts) is considered, and the simulation of posterior predictive distributions is outlined. The issues of model comparison with Bayes factors and the assessment of models’ relative posterior predictive accuracy with information entropy-based criteria DIC and WAIC are addressed. Concluding, a conceptual link to the behavioural subjective expected utility representation of a single decision-maker’s choice behaviour in static one-shot decision problems is established. Codes for MCMC simulations of multi-dimensional posterior joint probability distributions with the JAGS and Stan packages implemented in the statistical software R are provided. The lecture notes are fully hyperlinked. They direct the reader to original scientific research papers and to pertinent biographical information.
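The notes provide code in R with JAGS and Stan; purely as an illustration of MCMC posterior simulation, here is a minimal random-walk Metropolis sketch in Python for a toy Gaussian-mean model (the model and constants are invented for the example):

```python
import numpy as np

# Model: y_i ~ N(theta, 1), prior theta ~ N(0, 10^2); sample the posterior over theta
rng = np.random.default_rng(0)
y = rng.normal(1.5, 1.0, size=30)

def log_post(theta):
    return -0.5 * np.sum((y - theta) ** 2) - 0.5 * theta ** 2 / 100.0

samples, theta = [], 0.0
for _ in range(5000):
    prop = theta + rng.normal(scale=0.5)                  # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop                                      # accept the proposal
    samples.append(theta)
print(np.mean(samples[1000:]), np.std(samples[1000:]))    # posterior mean / sd after burn-in
```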
An Introduction to Latent Semantic Analysis The question of knowledge induction, i.e. how children are able to learn so much about, say, what words mean without any explicit instruction, is one that has vexed philosophers, linguists, and psychologists alike. Indeed, inferring the vast amount of knowledge that children learn almost effortlessly from an apparently ‘impoverished stimulus’ seems paradoxical. The Latent Semantic Analysis model (Landauer and Dumais, 1997) is a theory for how meaning representations might be learned from encountering large samples of language without explicit directions as to how it is structured. To do this, LSA makes two assumptions about how the meaning of linguistic expressions is present in the distributional patterns of simple expressions (e.g. words) within more complex expressions (e.g. sentences and paragraphs) viewed across many samples of language….
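A minimal sketch of the core LSA computation, a truncated SVD of a term-document count matrix, may help make the idea concrete; the toy corpus, the use of scikit-learn and numpy, and the choice of two latent dimensions are illustrative assumptions rather than details of Landauer and Dumais (1997).

    # Hedged sketch: LSA as a truncated SVD of a document-term count matrix.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer

    docs = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "stocks fell as markets tumbled",
        "markets rallied and stocks rose",
    ]
    X = CountVectorizer().fit_transform(docs).toarray().astype(float)  # docs x terms

    # SVD of the document-term matrix; keep k latent 'semantic' dimensions.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = 2
    doc_vectors = U[:, :k] * s[:k]      # documents in the latent space

    # Documents about similar topics end up close together in this space.
    sim = doc_vectors @ doc_vectors.T
    print(np.round(sim, 2))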
An Introduction to Latent Variable Mixture Modeling (Part 1): Overview and Cross-Sectional Latent Class and Latent Profile Analyses Objective: Pediatric psychologists are often interested in finding patterns in heterogeneous cross-sectional data. Latent variable mixture modeling is an emerging person-centered statistical approach that models heterogeneity by classifying individuals into unobserved groupings (latent classes) with similar (more homogenous) patterns. The purpose of this article is to offer a nontechnical introduction to cross-sectional mixture modeling. Method: An overview of latent variable mixture modeling is provided and 2 cross-sectional examples are reviewed and distinguished. Results: Step-by-step pediatric psychology examples of latent class and latent profile analyses are provided using the Early Childhood Longitudinal Study-Kindergarten Class of 1998-1999 data file. Conclusions: Latent variable mixture modeling is a technique that is useful to pediatric psychologists who wish to find groupings of individuals who share similar data patterns to determine the extent to which these patterns may relate to variables of interest.
An Introduction to Latent Variable Mixture Modeling (Part 2): Longitudinal Latent Class Growth Analysis and Growth Mixture Models Objective: Pediatric psychologists are often interested in finding patterns in heterogeneous longitudinal data. Latent Variable Mixture Modeling is an emerging statistical approach that models such heterogeneity by classifying individuals into unobserved groupings (latent classes) with similar (more homogenous) patterns. The purpose of the second of a two article set is to offer a nontechnical introduction to longitudinal latent variable mixture modeling. Methods: 3 latent variable approaches to modeling longitudinal data are reviewed and distinguished. Results: Step-by-step pediatric psychology examples of latent growth curve modeling, latent class growth analysis, and growth mixture modeling are provided using the Early Childhood Longitudinal Study-Kindergarten Class of 1998-99 data file. Conclusions: Latent variable mixture modeling is a technique that is useful to pediatric psychologists who wish to find groupings of individuals who share similar longitudinal data patterns to determine the extent to which these patterns may relate to variables of interest.
An Introduction to Mathematical Optimal Control Theory Version 0.2 These notes build upon a course I taught at the University of Maryland during the fall of 1983. My great thanks go to Martino Bardi, who took careful notes, saved them all these years and recently mailed them to me. Faye Yeager typed up his notes into a first draft of these lectures as they now appear. Scott Armstrong read over the notes and suggested many improvements: thanks, Scott. Stephen Moye of the American Math Society helped me a lot with AMSTeX versus LaTeX issues. My thanks also to Atilla Yilmaz for spotting lots of typos and errors, which I have corrected. I have radically modified much of the notation (to be consistent with my other writings), updated the references, added several new examples, and provided a proof of the Pontryagin Maximum Principle. As this is a course for undergraduates, I have dispensed in certain proofs with various measurability and continuity issues, and as compensation have added various critiques as to the lack of total rigor. This current version of the notes is not yet complete, but meets I think the usual high standards for material posted on the internet. Please email me at evans@math.berkeley.edu with any corrections or comments.
An introduction to modern missing data analyses A great deal of recent methodological research has focused on two modern missing data analysis methods: maximum likelihood and multiple imputation. These approaches are advantageous over traditional techniques (e.g. deletion and mean imputation techniques) because they require less stringent assumptions and mitigate the pitfalls of traditional techniques. This article explains the theoretical underpinnings of missing data analyses, gives an overview of traditional missing data techniques, and provides accessible descriptions of maximum likelihood and multiple imputation. In particular, this article focuses on maximum likelihood estimation and presents two analysis examples from the Longitudinal Study of American Youth data. One of these examples includes a description of the use of auxiliary variables. Finally, the paper illustrates ways that researchers can use intentional, or planned, missing data to enhance their research designs.
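A rough sketch of the contrast the article draws: under a missing-at-random mechanism, deletion and mean imputation can bias estimates, while a model-based imputer that uses other variables does better. The simulated data and the use of scikit-learn's imputers are assumptions for illustration, not the article's own analyses.

    # Hedged sketch contrasting deletion/mean imputation with a model-based imputer.
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import SimpleImputer, IterativeImputer

    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    y = 2.0 * x + rng.normal(scale=0.5, size=500)
    data = np.column_stack([x, y])
    # Make y missing at random, more often when x is large (an MAR mechanism).
    miss = rng.random(500) < 1.0 / (1.0 + np.exp(-x))
    data[miss, 1] = np.nan

    deleted = data[~np.isnan(data[:, 1])]                    # listwise deletion
    mean_imp = SimpleImputer(strategy="mean").fit_transform(data)
    model_imp = IterativeImputer(random_state=0).fit_transform(data)

    for name, d in [("deletion", deleted), ("mean", mean_imp), ("iterative", model_imp)]:
        print(name, "estimated mean of y:", round(float(d[:, 1].mean()), 3))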
An Introduction to Multivariate Statistics The term ‘multivariate statistics’ is appropriately used to include all statistics where there are more than two variables simultaneously analyzed. You are already familiar with bivariate statistics such as the Pearson product moment correlation coefficient and the independent groups t-test. A one-way ANOVA with 3 or more treatment groups might also be considered a bivariate design, since there are two variables: one independent variable and one dependent variable. Statistically, one could consider the one-way ANOVA as either a bivariate curvilinear regression or as a multiple regression with the K level categorical independent variable dummy coded into K-1 dichotomous variables.
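The dummy-coding remark can be made concrete with a small sketch: a three-level factor coded into two indicators and fitted by ordinary least squares, so the regression coefficients reproduce the group means. The data and the use of pandas/numpy are assumptions for illustration.

    # Hedged sketch: one-way ANOVA expressed as a regression with K-1 dummy variables.
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "group": ["A", "A", "B", "B", "C", "C", "C"],
        "score": [10.0, 12.0, 15.0, 14.0, 20.0, 22.0, 21.0],
    })
    # K = 3 groups -> K - 1 = 2 dummy variables (group A is the reference level).
    dummies = pd.get_dummies(df["group"], drop_first=True).to_numpy(dtype=float)
    X = np.column_stack([np.ones(len(df)), dummies])     # intercept + dummies
    coef, *_ = np.linalg.lstsq(X, df["score"].to_numpy(), rcond=None)
    print(coef)   # [mean of A, mean of B - mean of A, mean of C - mean of A]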
An Introduction to Neural Networks An accurate forecast into the future can offer tremendous value in areas as diverse as financial market price movements, financial expense budget forecasts, website clickthrough likelihoods, insurance risk, and drug compound efficacy, to name just a few. Many algorithm techniques, ranging from regression analysis to ARIMA for time series, among others, are regularly used to generate forecasts. A neural network approach provides a forecasting technique that can operate in circumstances where classical techniques cannot perform or do not generate the desired accuracy in a forecast.
An Introduction to Ontology Learning Ever since the early days of Artificial Intelligence and the development of the first knowledge-based systems in the 70s, people have dreamt of self-learning machines. When knowledge-based systems grew larger and the commercial interest in these technologies increased, people became aware of the knowledge acquisition bottleneck and the necessity to (partly) automatize the creation and maintenance of knowledge bases. Today, many applications which exhibit ´intelligent´ behavior thanks to symbolic knowledge representation and logical inference rely on ontologies and the standards provided by the World Wide Web Consortium (W3C). Supporting the construction of ontologies and populating them with instantiations of both concepts and relations is commonly referred to as ontology learning. Early research in ontology learning concentrated on the extraction of facts or schema-level knowledge from textual resources, building upon earlier work in the field of computational linguistics and lexical acquisition. However, as we will show in this book, ontology learning is a very diverse and interdisciplinary field of research. Ontology learning approaches are as heterogeneous as the sources of data on the web, and as different from one another as the types of knowledge representations called ‘ontologies’. In the remainder of this introduction, we briefly summarize the state-of-the-art in ontology learning and elaborate on what we consider as the key challenges for current and future ontology learning research.
An Introduction to Probabilistic Programming This document is designed to be a first-year graduate-level introduction to probabilistic programming. It not only provides a thorough background for anyone wishing to use a probabilistic programming system, but also introduces the techniques needed to design and build these systems. It is aimed at people who have an undergraduate-level understanding of either or, ideally, both probabilistic machine learning and programming languages. We start with a discussion of model-based reasoning and explain why conditioning as a foundational computation is central to the fields of probabilistic machine learning and artificial intelligence. We then introduce a simple first-order probabilistic programming language (PPL) whose programs define static-computation-graph, finite-variable-cardinality models. In the context of this restricted PPL we introduce fundamental inference algorithms and describe how they can be implemented in the context of models denoted by probabilistic programs. In the second part of this document, we introduce a higher-order probabilistic programming language, with a functionality analogous to that of established programming languages. This affords the opportunity to define models with dynamic computation graphs, at the cost of requiring inference methods that generate samples by repeatedly executing the program. Foundational inference algorithms for this kind of probabilistic programming language are explained in the context of an interface between program executions and an inference controller. This document closes with a chapter on advanced topics which we believe to be, at the time of writing, interesting directions for probabilistic programming research; directions that point towards a tight integration with deep neural network research and the development of systems for next-generation artificial intelligence applications.
An introduction to ROC analysis Receiver operating characteristics (ROC) graphs are useful for organizing classifiers and visualizing their performance. ROC graphs are commonly used in medical decision making, and in recent years have been used increasingly in machine learning and data mining research. Although ROC graphs are apparently simple, there are some common misconceptions and pitfalls when using them in practice. The purpose of this article is to serve as an introduction to ROC graphs and as a guide for using them in research.
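A minimal sketch of how an ROC curve is traced out by sweeping a decision threshold over classifier scores may be useful alongside the article; the labels and scores are invented and numpy is assumed as tooling.

    # Hedged sketch: building an ROC curve and AUC by sweeping a threshold.
    import numpy as np

    y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 0])
    scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.3, 0.9, 0.5])

    thresholds = np.sort(np.unique(scores))[::-1]
    tpr, fpr = [], []
    for t in thresholds:
        pred = scores >= t
        tp = np.sum(pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        tpr.append(tp / np.sum(y_true == 1))   # true positive rate (sensitivity)
        fpr.append(fp / np.sum(y_true == 0))   # false positive rate (1 - specificity)

    # Area under the curve via the trapezoidal rule over the (FPR, TPR) points.
    auc = np.trapz([0.0] + tpr + [1.0], [0.0] + fpr + [1.0])
    print("ROC points:", list(zip(np.round(fpr, 2), np.round(tpr, 2))))
    print("AUC:", round(float(auc), 3))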
An Introduction to the Practical and Theoretical Aspects of Mixture-of-Experts Modeling Mixture-of-experts (MoE) models are a powerful paradigm for modeling data arising from complex data generating processes (DGPs). In this article, we demonstrate how different MoE models can be constructed to approximate the underlying DGPs of arbitrary types of data. Due to the probabilistic nature of MoE models, we propose the maximum quasi-likelihood (MQL) estimator as a method for estimating MoE model parameters from data, and we provide conditions under which MQL estimators are consistent and asymptotically normal. The blockwise minorization-maximization (blockwise-MM) algorithm framework is proposed as an all-purpose method for constructing algorithms for obtaining MQL estimators. An example derivation of a blockwise-MM algorithm is provided. We then present a method for constructing information criteria for estimating the number of components in MoE models and provide justification for the classic Bayesian information criterion (BIC). We explain how MoE models can be used to conduct classification, clustering, and regression and we illustrate these applications via a pair of worked examples.
An introduction to Topological Data Analysis: fundamental and practical aspects for data scientists Topological Data Analysis (TDA) is a recent and fast-growing field providing a set of new topological and geometric tools to infer relevant features for possibly complex data. This paper is a brief introduction, through a few selected topics, to basic fundamental and practical aspects of TDA for non-experts. 1 Introduction and motivation Topological Data Analysis (TDA) is a recent field that emerged from various works in applied (algebraic) topology and computational geometry during the first decade of the century. Although one can trace back geometric approaches for data analysis quite far in the past, TDA really started as a field with the pioneering works of Edelsbrunner et al. (2002) and Zomorodian and Carlsson (2005) in persistent homology and was popularized in a landmark paper in 2009, Carlsson (2009). TDA is mainly motivated by the idea that topology and geometry provide a powerful approach to infer robust qualitative, and sometimes quantitative, information about the structure of data; see, e.g., Chazal (2017). TDA aims at providing well-founded mathematical, statistical and algorithmic methods to infer, analyze and exploit the complex topological and geometric structures underlying data that are often represented as point clouds in Euclidean or more general metric spaces. During the last few years, a considerable effort has been made to provide robust and efficient data structures and algorithms for TDA that are now implemented, available and easy to use through standard libraries such as the Gudhi library (C++ and Python), Maria et al. (2014), and its R software interface, Fasy et al. (2014a). Although it is still rapidly evolving, TDA now provides a set of mature and efficient tools that can be used in combination with or complementary to other data science tools. The TDA pipeline. TDA has recently known developments in various directions and application fields. There now exist a large variety of methods inspired by topological and geometric approaches. Providing a complete overview of all these existing approaches is beyond the scope of this introductory survey. However, most of them rely on the following basic and standard pipeline that will serve as the backbone of this paper: 1. The input is assumed to be a finite set of points coming with a notion of distance, or similarity, between them. This distance can be induced by the metric in the ambient space (e.g. the Euclidean metric when the data are embedded in R^d) or come as an intrinsic metric defined by a pairwise distance matrix. The definition of the metric on the data is usually given as an input or guided by the application. It is however important to notice that the choice of the metric may be critical to reveal interesting topological and geometric features of the data.
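A small sketch of step 1 of this pipeline, a finite point cloud with a pairwise distance matrix and the neighbourhood graph at one scale, is given below; the noisy-circle data and the use of numpy/scipy are illustrative assumptions, and libraries such as Gudhi would consume this kind of input to build filtrations and persistence diagrams.

    # Hedged sketch of the TDA pipeline's input: a point cloud and its distance matrix.
    import numpy as np
    from scipy.spatial.distance import pdist, squareform

    rng = np.random.default_rng(0)
    theta = rng.uniform(0, 2 * np.pi, size=60)
    points = np.column_stack([np.cos(theta), np.sin(theta)])
    points += rng.normal(scale=0.05, size=points.shape)      # noisy circle in R^2

    D = squareform(pdist(points, metric="euclidean"))         # pairwise distance matrix

    # At scale eps, connect points closer than eps: the 1-skeleton of a Rips complex.
    eps = 0.4
    adjacency = (D < eps) & ~np.eye(len(points), dtype=bool)
    print("points:", len(points), "edges at eps = 0.4:", int(adjacency.sum() // 2))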
An Introduction to Variable and Feature Selection Variable and feature selection have become the focus of much research in areas of application for which datasets with tens or hundreds of thousands of variables are available. These areas include text processing of internet documents, gene expression array analysis, and combinatorial chemistry. The objective of variable selection is three-fold: improving the prediction performance of the predictors, providing faster and more cost-effective predictors, and providing a better understanding of the underlying process that generated the data. The contributions of this special issue cover a wide range of aspects of such problems: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.
An Introduction to Variational Autoencoders Variational autoencoders provide a principled framework for learning deep latent-variable models and corresponding inference models. In this work, we provide an introduction to variational autoencoders and some important extensions.
An Introduction to Visualizing Data The purpose of this document is to provide an introduction to the theory behind visualizing data. After studying the works of many talented people I decided to summarize the key points of information into this single paper. If you found this document interesting please take some time to look at the list of resources that I used (see Chapter 8) because I could never have created this without the excellent work done by others.
An Introductory Survey on Attention Mechanisms in NLP Problems First derived from human intuition and later adapted to machine translation for automatic token alignment, the attention mechanism, a simple method for encoding sequence data based on the importance score assigned to each element, has been widely applied to, and has attained significant improvements in, various tasks in natural language processing, including sentiment classification, text summarization, question answering, dependency parsing, etc. In this paper, we survey recent works and conduct an introductory summary of the attention mechanism in different NLP problems, aiming to provide our readers with basic knowledge of this widely used method, discuss its different variants for different tasks, explore its association with other techniques in machine learning, and examine methods for evaluating its performance.
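A minimal sketch of the basic scaled dot-product attention computation the survey covers is given below; the toy shapes and the use of numpy are assumptions for illustration.

    # Hedged sketch: scaled dot-product attention. Each query assigns an importance
    # score to every element of the sequence and takes a weighted sum of the values.
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    rng = np.random.default_rng(0)
    seq_len, d = 5, 8
    Q = rng.standard_normal((seq_len, d))       # queries
    K = rng.standard_normal((seq_len, d))       # keys
    V = rng.standard_normal((seq_len, d))       # values

    scores = Q @ K.T / np.sqrt(d)               # importance score for each pair
    weights = softmax(scores, axis=-1)          # each row sums to 1
    context = weights @ V                       # attention-weighted encoding

    print("attention weights for the first token:", np.round(weights[0], 3))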
An Overview of Blockchain Integration with Robotics and Artificial Intelligence Blockchain technology is growing every day at a fast-paced rhythm, and it’s possible to integrate it with many systems, namely robotics with AI services. However, this is still a recent field and there isn’t yet a clear understanding of what it could potentially become. In this paper, we conduct an overview of many different methods and platforms that try to leverage the power of blockchain in robotic systems, to improve AI services or to solve problems that are present in the major blockchains, which can lead to the ability to create robotic systems with increased capabilities and security. We present an overview, discuss the methods and conclude the paper with our view on the future of the integration of these technologies.
An Overview of Computational Approaches for Analyzing Interpretation It is said that beauty is in the eye of the beholder. But how exactly can we characterize such discrepancies in interpretation? For example, are there any specific features of an image that makes person A regard an image as beautiful while person B finds the same image displeasing? Such questions ultimately aim at explaining our individual ways of interpretation, an intention that has been of fundamental importance to the social sciences from the beginning. More recently, advances in computer science brought up two related questions: First, can computational tools be adopted for analyzing ways of interpretation? Second, what if the ‘beholder’ is a computer model, i.e., how can we explain a computer model’s point of view? Numerous efforts have been made regarding both of these points, while many existing approaches focus on particular aspects and are still rather separate. With this paper, in order to connect these approaches we introduce a theoretical framework for analyzing interpretation, which is applicable to interpretation of both human beings and computer models. We give an overview of relevant computational approaches from various fields, and discuss the most common and promising application areas. The focus of this paper lies on interpretation of text and image data, while many of the presented approaches are applicable to other types of data as well.
An overview of deep learning based methods for unsupervised and semi-supervised anomaly detection in videos Videos represent the primary source of information for surveillance applications and are available in large amounts but in most cases contain little or no annotation for supervised learning. This article reviews the state-of-the-art deep learning based methods for video anomaly detection and categorizes them based on the type of model and criteria of detection. We also perform simple studies to understand the different approaches and provide the criteria of evaluation for spatio-temporal anomaly detection.
An Overview of Machine Teaching In this paper we try to organize machine teaching as a coherent set of ideas. Each idea is presented as varying along a dimension. The collection of dimensions then forms the problem space of machine teaching, such that existing teaching problems can be characterized in this space. We hope this organization allows us to gain deeper understanding of individual teaching problems, discover connections among them, and identify gaps in the field.
An Overview of Multi-Processor Approximate Message Passing Approximate message passing (AMP) is an algorithmic framework for solving linear inverse problems from noisy measurements, with exciting applications such as reconstructing images, audio, hyperspectral images, and various other signals, including those acquired in compressive signal acquisition systems. The growing prevalence of big data systems has increased interest in large-scale problems, which may involve huge measurement matrices that are unsuitable for conventional computing systems. To address the challenge of large-scale processing, multiprocessor (MP) versions of AMP have been developed. We provide an overview of two such MP-AMP variants. In row-MP-AMP, each computing node stores a subset of the rows of the matrix and processes corresponding measurements. In column-MP-AMP, each node stores a subset of columns, and is solely responsible for reconstructing a portion of the signal. We will discuss pros and cons of both approaches, summarize recent research results for each, and explain when each one may be a viable approach. Aspects that are highlighted include some recent results on state evolution for both MP-AMP algorithms, and the use of data compression to reduce communication in the MP network.
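A rough sketch of the row partitioning behind row-MP-AMP follows; it simulates the per-node storage of row blocks and a single fused correction step, not the AMP iterations themselves, and all sizes are illustrative assumptions.

    # Hedged sketch of the data partitioning in row-MP-AMP: each 'node' holds a
    # block of rows of the measurement matrix and its measurements; partial
    # results are combined at a fusion step. This is not the AMP algorithm itself.
    import numpy as np

    rng = np.random.default_rng(0)
    m, n, nodes = 120, 200, 4
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    x = np.zeros(n)
    x[rng.choice(n, 10, replace=False)] = 1.0      # sparse signal
    y = A @ x + 0.01 * rng.standard_normal(m)

    A_blocks = np.array_split(A, nodes, axis=0)    # row blocks, one per node
    y_blocks = np.array_split(y, nodes)

    x_hat = np.zeros(n)
    # One illustrative step: each node computes a local residual correction,
    # and the fusion center sums the contributions.
    corrections = [Ab.T @ (yb - Ab @ x_hat) for Ab, yb in zip(A_blocks, y_blocks)]
    x_hat = x_hat + sum(corrections)
    print("largest entries found at indices:", np.argsort(-np.abs(x_hat))[:10])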
An Overview of Multi-Task Learning in Deep Neural Networks Multi-task learning (MTL) has led to successes in many applications of machine learning, from natural language processing and speech recognition to computer vision and drug discovery. This article aims to give a general overview of MTL, particularly in deep neural networks. It introduces the two most common methods for MTL in Deep Learning, gives an overview of the literature, and discusses recent advances. In particular, it seeks to help ML practitioners apply MTL by shedding light on how MTL works and providing guidelines for choosing appropriate auxiliary tasks.
An Overview of Open-Ended Evolution: Editorial Introduction to the Open-Ended Evolution II Special Issue Nature’s spectacular inventiveness, reflected in the enormous diversity of form and function displayed by the biosphere, is a feature of life that distinguishes living most strongly from nonliving. It is, therefore, not surprising that this aspect of life should become a central focus of artificial life. We have known since Darwin that the diversity is produced dynamically, through the process of evolution; this has led life’s creative productivity to be called Open-Ended Evolution (OEE) in the field. This article introduces the second of two special issues on current research in OEE and provides an overview of the contents of both special issues. Most of the work was presented at a workshop on open-ended evolution that was held as a part of the 2018 Conference on Artificial Life in Tokyo, and much of it had antecedents in two previous workshops on open-ended evolution at artificial life conferences in Cancun and York. We present a simplified categorization of OEE and summarize progress in the field as represented by the articles in this special issue.
An Overview of Spatial Econometrics This paper offers an expository overview of the field of spatial econometrics. It first justifies the necessity of special statistical procedures for the analysis of spatial data and then proceeds to describe the fundamentals of these procedures. In particular, this paper covers three crucial techniques for building models with spatial data. First, we discuss how to create a spatial weights matrix based on the distances between each data point in a dataset. Next, we describe the conventional methods to formally detect spatial autocorrelation, both global and local. Finally, we outline the chief components of a spatial autoregressive model, noting the circumstances under which it would be appropriate to incorporate each component into a model. This paper seeks to offer a concise introduction to spatial econometrics that will be accessible to interested individuals with a background in statistics or econometrics.
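A small sketch of the first two techniques the paper covers, a distance-based spatial weights matrix and global Moran's I for detecting spatial autocorrelation, is given below; the coordinates, the cutoff distance and the use of numpy/scipy are assumptions for illustration.

    # Hedged sketch: a distance-based spatial weights matrix and global Moran's I.
    import numpy as np
    from scipy.spatial.distance import pdist, squareform

    rng = np.random.default_rng(0)
    coords = rng.uniform(0, 10, size=(50, 2))
    # A value with a smooth spatial trend, so positive autocorrelation is expected.
    x = coords[:, 0] + coords[:, 1] + rng.normal(scale=1.0, size=50)

    D = squareform(pdist(coords))
    W = ((D > 0) & (D < 3.0)).astype(float)             # neighbours within distance 3
    W = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)   # row-standardize

    z = x - x.mean()
    n, S0 = len(x), W.sum()
    morans_I = (n / S0) * (z @ W @ z) / (z @ z)
    print("Moran's I:", round(float(morans_I), 3))   # clearly positive for this trending surface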
An Overview of Statistical Data Analysis The use of statistical software in academia and enterprises has been evolving over recent years. More often than not, students, professors, workers, and users in general have all had, at some point, exposure to statistical software. Sometimes, difficulties are felt when dealing with this type of software. Very few people have the theoretical knowledge to clearly understand software configurations or settings, and sometimes even the presented results. Very often, users are required by academies or enterprises to present reports without the time to explore or understand the results, or the tasks required to optimally prepare the data or software settings. In this work, we present a statistical overview of some theoretical concepts, to provide fast access to some concepts.
An Overview of Statistical Learning Theory Statistical learning theory was introduced in the late 1960´s. Until the 1990´s it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990´s new types of learning algorithms (called support vector machines) based on the developed theory were proposed. This made statistical learning theory not only a tool for the theoretical analysis but also a tool for creating practical algorithms for estimating multidimensional functions. This article presents a very general overview of statistical learning theory including both theoretical and algorithmic aspects of the theory. The goal of this overview is to demonstrate how the abstract learning theory established conditions for generalization which are more general than those discussed in classical statistical paradigms and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems. A more detailed overview of the theory (without proofs) can be found in Vapnik (1995). In Vapnik (1998) one can find detailed description of the theory (including proofs).
An Overview on Data Representation Learning: From Traditional Feature Learning to Recent Deep Learning Over roughly the last 100 years, many representation learning approaches have been proposed to learn the intrinsic structure of data, including both linear and nonlinear ones, supervised and unsupervised ones. In particular, deep architectures have been widely applied to representation learning in recent years, and have delivered top results in many tasks, such as image classification, object detection and speech recognition. In this paper, we review the development of data representation learning methods. Specifically, we investigate both traditional feature learning algorithms and state-of-the-art deep learning models. The history of data representation learning is introduced, and available resources (e.g. online courses, tutorial and book information) and toolboxes are provided. Finally, we conclude this paper with remarks and some interesting research directions on data representation learning.
Analysing spatial point patterns in R This is a detailed set of notes for a workshop on Analysing spatial point patterns in R, presented by the author in Australia and New Zealand since 2006. The goal of the workshop is to equip researchers with a range of practical techniques for the statistical analysis of spatial point patterns. Some of the techniques are well established in the applications literature, while some are very recent developments. The workshop is based on spatstat, a contributed library for the statistical package R, which is free open source software. Topics covered include: statistical formulation and methodological issues; data input and handling; R concepts such as classes and methods; exploratory data analysis; nonparametric intensity and risk estimates; goodness-of-fit testing for Complete Spatial Randomness; maximum likelihood inference for Poisson processes; spatial logistic regression; model validation for Poisson processes; exploratory analysis of dependence; distance methods and summary functions such as Ripley´s K function; simulation techniques; non-Poisson point process models; fitting models using summary statistics; LISA and local analysis; inhomogeneous K-functions; Gibbs point process models; fitting Gibbs models; simulating Gibbs models; validating Gibbs models; multitype and marked point patterns; exploratory analysis of multitype and marked point patterns; multitype Poisson process models and maximum likelihood inference; multitype Gibbs process models and maximum pseudolikelihood; line segment patterns, 3-dimensional point patterns, multidimensional space-time point patterns, replicated point patterns, and stochastic geometry methods.
Analysis and Optimization of Convolutional Neural Network Architectures Convolutional Neural Networks (CNNs) have dominated various computer vision tasks since Alex Krizhevsky showed that they can be trained effectively and reduced the top-5 error from 26.2 % to 15.3 % on the ImageNet large scale visual recognition challenge. Many aspects of CNNs are examined in various publications, but literature about the analysis and construction of neural network architectures is rare. This work is one step towards closing this gap. A comprehensive overview of existing techniques for CNN analysis and topology construction is provided. A novel way to visualize classification errors with confusion matrices was developed. Based on this method, hierarchical classifiers are described and evaluated. Additionally, some results are confirmed and quantified for CIFAR-100, for example the positive impact of smaller batch sizes, averaging ensembles, data augmentation and test-time transformations on accuracy. Other results, such as the positive impact of learned color transformations on test accuracy, could not be confirmed. A model which has only one million learned parameters for an input size of 32x32x3 and 100 classes, and which beats the state of the art on the benchmark datasets Asirra, GTSRB, HASYv2 and STL-10, was developed.
Analysis Methods in Neural Language Processing: A Survey The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems. A plethora of new models have been proposed, many of which are thought to be opaque compared to their feature-rich counterparts. This has led researchers to analyze, interpret, and evaluate neural networks in novel and more fine-grained ways. In this survey paper, we review analysis methods in neural language processing, categorize them according to prominent research trends, highlight existing limitations, and point to potential directions for future work.
Analysis of Dropout in Online Learning Deep learning is the state-of-the-art in fields such as visual object recognition and speech recognition. This learning uses a large number of layers and a huge number of units and connections. Therefore, overfitting is a serious problem with it, and dropout, a kind of regularization tool, is used to address it. However, in online learning, the effect of dropout is not well known. This paper presents our investigation of the effect of dropout in online learning. We analyzed the effect of dropout on convergence speed near the singular point. Our results indicated that dropout is effective in online learning, as it tends to avoid the singular point and thereby improves convergence speed near that point.
Analysis of Evolutionary Algorithms in Dynamic and Stochastic Environments Many real-world optimization problems occur in environments that change dynamically or involve stochastic components. Evolutionary algorithms and other bio-inspired algorithms have been widely applied to dynamic and stochastic problems. This survey gives an overview of major theoretical developments in the area of runtime analysis for these problems. We review recent theoretical studies of evolutionary algorithms and ant colony optimization for problems where the objective functions or the constraints change over time. Furthermore, we consider stochastic problems under various noise models and point out some directions for future research.
Analysis of Explainers of Black Box Deep Neural Networks for Computer Vision: A Survey Deep Learning is a state-of-the-art technique for making inference on extensive or complex data. As black box models due to their multilayer nonlinear structure, Deep Neural Networks are often criticized for being non-transparent and their predictions not being traceable by humans. Furthermore, the models learn from artificial datasets, often with bias or contaminated discriminating content. Through their increased distribution, decision-making algorithms can contribute to promoting prejudice and unfairness, which is not easy to notice due to the lack of transparency. Hence, scientists have developed several so-called explanators or explainers which try to point out the connection between input and output, to represent in a simplified way the inner structure of machine learning black boxes. In this survey we differentiate the mechanisms and properties of explaining systems for Deep Neural Networks for Computer Vision tasks. We give a comprehensive overview of the taxonomy of related studies and compare several survey papers that deal with explainability in general. We work out the drawbacks and gaps and summarize further research ideas.
Analysis of Unstructured Data: Applications of Text Analytics and Sentiment Mining The proliferation of textual data in business is overwhelming. Unstructured textual data is being constantly generated via call center logs, emails, documents on the web, blogs, tweets, customer comments, customer reviews, and so on. While the amount of textual data is increasing rapidly, businesses´ ability to summarize, understand, and make sense of such data for making better business decisions remain challenging. This paper takes a quick look at how to organize and analyze textual data for extracting insightful customer intelligence from a large collection of documents and for using such information to improve business operations and performance. Multiple business applications of case studies using real data that demonstrate applications of text analytics and sentiment mining using SAS Text Miner and SAS Sentiment Analysis Studio are presented. While SAS products are used as tools for demonstration only, the topics and theories covered are generic (not tool specific).
Analytical Skills, Tools and Attitudes 2013: Analytics capabilities needed now and in the future Organizations continue to invest more in analytics, but increasingly there is recognition that a shortage of analytic talent is holding back even greater investment. Lavastorm Analytics polled more than 425 people in the analytics community about whether their organization needs more analytic resources or skills and which skills are valued most and are most urgently needed. Survey respondents included business analysts, technologists, data analytics professionals, managers, and C-level executives across a broad variety of industries. The top findings were: – According to the survey respondents, a lack of skills/training/education is the biggest factor holding back organizations from using analytics more. – Skills most urgently needed in their organizations are Statistics, math or other quantitative skills; Analytic tool training; and Critical thinking. – Lack of funding or resources, however, also has a significant impact on adoption of analytics to drive day-to-day decisions. Lesser factors also include inadequate support from executives and data that is not integrated.
Analytics 3.0 In the new era, big data will power consumer products and services
Analytics for the Internet of Things: A Survey The Internet of Things (IoT) envisions a world-wide, interconnected network of smart physical entities. These physical entities generate a large amount of data in operation and as the IoT gains momentum in terms of deployment, the combined scale of those data seems destined to continue to grow. Increasingly, applications for the IoT involve analytics. Data analytics is the process of deriving knowledge from data, generating value like actionable insights from them. This article reviews work in the IoT and big data analytics from the perspective of their utility in creating efficient, effective and innovative applications and services for a wide spectrum of domains. We review the broad vision for the IoT as it is shaped in various communities, examine the application of data analytics across IoT domains, provide a categorisation of analytic approaches and propose a layered taxonomy from IoT data to analytics. This taxonomy provides us with insights on the appropriateness of analytical techniques, which in turn shapes a survey of enabling technology and infrastructure for IoT analytics. Finally, we look at some tradeoffs for analytics in the IoT that can shape future research.
Analytics: The real-world use of big data ‘Big data’ – which admittedly means many things to many people – is no longer confined to the realm of technology. Today it is a business priority, given its ability to profoundly affect commerce in the globally integrated economy. In addition to providing solutions to long-standing business challenges, big data inspires new ways to transform processes, organizations, entire industries and even society itself. Yet extensive media coverage makes it hard to distinguish hype from reality – what is really happening? Our newest research finds that organizations are using big data to target customer-centric outcomes, tap into internal data and build a better information ecosystem.
Analyzing biological and artificial neural networks: challenges with opportunities for synergy? Deep neural networks (DNNs) transform stimuli across multiple processing stages to produce representations that can be used to solve complex tasks, such as object recognition in images. However, a full understanding of how they achieve this remains elusive. The complexity of biological neural networks substantially exceeds the complexity of DNNs, making it even more challenging to understand the representations that they learn. Thus, both machine learning and computational neuroscience are faced with a shared challenge: how can we analyze their representations in order to understand how they solve complex tasks? We review how data-analysis concepts and techniques developed by computational neuroscientists can be useful for analyzing representations in DNNs, and in turn, how recently developed techniques for analysis of DNNs can be useful for understanding representations in biological neural networks. We explore opportunities for synergy between the two fields, such as the use of DNNs as in-silico model systems for neuroscience, and how this synergy can lead to new hypotheses about the operating principles of biological neural networks.
Analyzing the Analyzers Binita, Chao, Dmitri, and Rebecca are data scientists. What does that statement tell you about them? Probably not as much as you´d like. You know they probably know something about statistics, programming, and data visualization. You´d hope that they had some experience finding insights from data, maybe even ‘big data.’ But if you´re trying to find the best person for a job, you need to be more specific than just ‘doctor,’ or ‘athlete,’ or ‘data scientist.’ And that´s a problem. Finding the right people for a task is all about efficient communication and, without the appropriate shared vocabulary, data science talent and data science problems are too often kept apart….
Anomaly Detection in Road Traffic Using Visual Surveillance: A Survey Computer vision has evolved in the last decade as a key technology for numerous applications, replacing human supervision. In this paper, we present a survey of visual-surveillance-related research on anomaly detection in public places, focusing primarily on roads. Firstly, we revisit the surveys done in the last 10 years in this field. Since the underlying building block of a typical anomaly detection system is learning, we place more emphasis on learning methods applied to video scenes. We then summarize the important contributions made during the last six years on anomaly detection, primarily focusing on features, underlying techniques, applied scenarios and types of anomalies using a single static camera. Finally, we discuss the challenges in computer-vision-related anomaly detection techniques and some of the important future possibilities.
Anomaly Detection: A Survey Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection. We have grouped existing techniques into different categories based on the underlying approach adopted by each technique. For each category we have identified key assumptions, which are used by the techniques to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.
Anomaly Detection: Review and preliminary Entropy method tests Anomalies are strange data points; they usually represent an unusual occurrence. Anomaly detection is presented here from the perspective of wireless sensor networks. Different approaches have been taken in the past, as we will see, not only to identify outliers, but also to establish the statistical properties of the different methods. The usual goal is to show that the approach is asymptotically efficient and that the metric used is unbiased or perhaps biased. This project is based on work done by [1], whose approach rests on the principle that the entropy of the data increases when an anomalous data point is measured; the entropy of the data set is therefore to be estimated. In this report, preliminary efforts at confirming the results of [1] are presented. To estimate the entropy of the dataset, since no parametric form is assumed, the probability density function of the data set is first estimated using a data-split method. This estimated pdf value is then plugged into the entropy estimation formula to estimate the entropy of the dataset. The data (test signal) used in this report is Gaussian distributed with zero mean and variance 4. Results of pdf estimation using the k-nearest neighbour method on the entire dataset and on a data split are presented and compared based on how well they approximate the probability density function of a Gaussian with the same mean and variance. The number of nearest neighbours chosen for the purpose of this report is 8. This choice is arbitrary, but reasonable, since the number of anomalies introduced is expected to be smaller than this upon data split. The data-split method is preferred, and rightly so.
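A rough sketch of the plug-in procedure described above, a k-nearest-neighbour density estimate built on one half of a data split, evaluated on the other half and substituted into the entropy formula, follows; the Gaussian test signal (mean 0, variance 4) and k = 8 follow the report, while everything else is an illustrative assumption.

    # Hedged sketch: kNN density estimate plugged into an entropy estimate.
    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(loc=0.0, scale=2.0, size=2000)     # variance 4
    ref, eval_pts = data[:1000], data[1000:]              # data split

    k = 8

    def knn_pdf(x, sample, k):
        # 1-D kNN density: k points fall inside an interval of half-width r_k(x).
        r_k = np.sort(np.abs(sample[None, :] - x[:, None]), axis=1)[:, k - 1]
        return k / (len(sample) * 2.0 * r_k)

    p_hat = knn_pdf(eval_pts, ref, k)
    H_hat = -np.mean(np.log(p_hat))                       # plug-in entropy estimate
    H_true = 0.5 * np.log(2 * np.pi * np.e * 4.0)         # entropy of N(0, 4), in nats
    print("estimated entropy:", round(float(H_hat), 3), "true:", round(float(H_true), 3))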
Anticipatory Thinking: A Metacognitive Capability Anticipatory thinking is a complex cognitive process for assessing and managing risk in many contexts. Humans use anticipatory thinking to identify potential future issues and proactively take actions to manage their risks. In this paper we define a cognitive systems approach to anticipatory thinking as a metacognitive goal reasoning mechanism. The contributions of this paper include (1) defining anticipatory thinking in the MIDCA cognitive architecture, (2) operationalizing anticipatory thinking as a three step process for managing risk in plans, and (3) a numeric risk assessment calculating an expected cost-benefit ratio for modifying a plan with anticipatory actions.
Any-gram Kernels for Sentence Classification: A Sentiment Analysis Case Study Any-gram kernels are a flexible and efficient way to employ bag-of-n-gram features when learning from textual data. They are also compatible with the use of word embeddings so that word similarities can be accounted for. While the original any-gram kernels are implemented on top of tree kernels, we propose a new approach which is independent of tree kernels and is more efficient. We also propose a more effective way to make use of word embeddings than the original any-gram formulation. When applied to the task of sentiment classification, our new formulation achieves significantly better performance.
APACHE DRILL: Interactive Ad-Hoc Analysis at Scale Apache Drill is a distributed system for interactive ad-hoc analysis of large-scale datasets. Designed to handle up to petabytes of data spread across thousands of servers, the goal of Drill is to respond to ad-hoc queries in a low-latency manner. In this article, we introduce Drill´s architecture, discuss its extensibility points, and put it into the context of the emerging offerings in the interactive analytics realm.
Applications of Artificial Intelligence to Network Security Attacks on networks are becoming more complex and sophisticated every day. Beyond the so-called script-kiddies and hacking newbies, there is a myriad of professional attackers seeking to make serious profits by infiltrating corporate networks. Hostile governments, big corporations and mafias alike are constantly increasing their resources and skills in cybercrime in order to spy, steal or cause damage more effectively. Traditional approaches to network security seem to be hitting their limits, and the need for a smarter approach to threat detection is increasingly recognized. This paper provides an introduction to the need for an evolution of cyber security techniques and to how artificial intelligence could be applied to help solve some of the problems. It also provides a high-level overview of some state-of-the-art AI network security techniques, and finishes by analysing the foreseeable future of the application of AI to network security.
Applications of Deep Reinforcement Learning in Communications and Networking: A Survey This paper presents a comprehensive literature review on applications of deep reinforcement learning in communications and networking. Modern networks, e.g., Internet of Things (IoT) and Unmanned Aerial Vehicle (UAV) networks, become more decentralized and autonomous. In such networks, network entities need to make decisions locally to maximize the network performance under uncertainty of network environment. Reinforcement learning has been efficiently used to enable the network entities to obtain the optimal policy including, e.g., decisions or actions, given their states when the state and action spaces are small. However, in complex and large-scale networks, the state and action spaces are usually large, and the reinforcement learning may not be able to find the optimal policy in reasonable time. Therefore, deep reinforcement learning, a combination of reinforcement learning with deep learning, has been developed to overcome the shortcomings. In this survey, we first give a tutorial of deep reinforcement learning from fundamental concepts to advanced models. Then, we review deep reinforcement learning approaches proposed to address emerging issues in communications and networking. The issues include dynamic network access, data rate control, wireless caching, data offloading, network security, and connectivity preservation which are all important to next generation networks such as 5G and beyond. Furthermore, we present applications of deep reinforcement learning for traffic routing, resource sharing, and data collection. Finally, we highlight important challenges, open issues, and future research directions of applying deep reinforcement learning.
Applied Data Science in Europe Google Trends and other IT fever charts rate Data Science among the most rapidly emerging and promising fields that expand around computer science. Although Data Science draws on content from established fields like artificial intelligence, statistics, databases, visualization and many more, industry is demanding trained data scientists that no one seems able to deliver. This is due to the pace at which the field has expanded and the corresponding lack of curricula; the unique skill set, which is inherently multi-disciplinary; and the translation work (from the US web economy to other ecosystems) necessary to realize the recognized world-wide potential of applying analytics to all sorts of data. In this contribution we draw from our experiences in establishing an inter-disciplinary Data Science lab in order to highlight the challenges and potential remedies for Data Science in Europe. We discuss our role as academia in the light of the potential societal/economic impact as well as the challenges in organizational leadership tied to such inter-disciplinary work.
Architecting a High Performance Storage System Designing a large-scale, high-performance data storage system presents significant challenges. This paper describes a step-by-step approach to designing such a system and presents an iterative methodology that applies at both the component level and the system level. A detailed case study using the methodology described to design a Lustre storage system is presented.
Are Efficient Deep Representations Learnable? Many theories of deep learning have shown that a deep network can require dramatically fewer resources to represent a given function compared to a shallow network. But a question remains: can these efficient representations be learned using current deep learning techniques? In this work, we test whether standard deep learning methods can in fact find the efficient representations posited by several theories of deep representation. Specifically, we train deep neural networks to learn two simple functions with known efficient solutions: the parity function and the fast Fourier transform. We find that using gradient-based optimization, a deep network does not learn the parity function, unless initialized very close to a hand-coded exact solution. We also find that a deep linear neural network does not learn the fast Fourier transform, even in the best-case scenario of infinite training data, unless the weights are initialized very close to the exact hand-coded solution. Our results suggest that not every element of the class of compositional functions can be learned efficiently by a deep network, and further restrictions are necessary to understand what functions are both efficiently representable and learnable.
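A small sketch of the parity experiment described above, using scikit-learn's MLPClassifier as a stand-in for the paper's setup (hyperparameters, data split and network size are assumptions, not the paper's), lets readers try the behaviour at small n.

    # Hedged sketch: gradient-trained MLP on the n-bit parity function.
    import itertools
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    n_bits = 8
    X = np.array(list(itertools.product([0, 1], repeat=n_bits)), dtype=float)
    y = X.sum(axis=1).astype(int) % 2                 # parity of the bits

    rng = np.random.default_rng(0)
    perm = rng.permutation(len(X))
    X, y = X[perm], y[perm]
    X_train, X_test = X[:200], X[200:]
    y_train, y_test = y[:200], y[200:]

    clf = MLPClassifier(hidden_layer_sizes=(32, 32), activation="relu",
                        max_iter=2000, random_state=0)
    clf.fit(X_train, y_train)
    print("train accuracy:", round(clf.score(X_train, y_train), 3))
    print("test accuracy:", round(clf.score(X_test, y_test), 3))
    # The network can typically memorize the training set; whether it generalizes
    # to unseen bit strings is the question the paper investigates at larger n.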
Are GANs Created Equal? A Large-Scale Study Generative adversarial networks (GAN) are a powerful subclass of generative models. Despite a very rich research activity leading to numerous interesting GAN algorithms, it is still very hard to assess which algorithm(s) perform better than others. We conduct a neutral, multi-faceted large-scale empirical study on state-of-the-art models and evaluation measures. We find that most models can reach similar scores with enough hyperparameter optimization and random restarts. This suggests that improvements can arise from a higher computational budget and tuning more than fundamental algorithmic changes. To overcome some limitations of the current metrics, we also propose several data sets on which precision and recall can be computed. Our experimental results suggest that future GAN research should be based on more systematic and objective evaluation procedures. Finally, we did not find evidence that any of the tested algorithms consistently outperforms the original one.
Are profile likelihoods likelihoods? No, but sometimes they can be We contribute our two cents to the ongoing discussion on whether profile likelihoods are ‘true’ likelihood functions, by showing that the profile likelihood function can in fact be identical to a marginal likelihood in the special case of normal models. Thus, profile likelihoods can be ‘true’ likelihoods insofar as marginal likelihoods are ‘true’ likelihoods. The prior distribution that achieves this equivalence turns out to be the Jeffreys prior. We suspect, however, that normal models are the only class of models for which such an equivalence between maximization and marginalization is exact.
Are Saddles Good Enough for Deep Learning? Recent years have seen a growing interest in understanding deep neural networks from an optimization perspective. It is understood now that converging to low-cost local minima is sufficient for such models to become effective in practice. However, in this work, we propose a new hypothesis based on recent theoretical findings and empirical studies that deep neural network models actually converge to saddle points with high degeneracy. Our findings from this work are new, and can have a significant impact on the development of gradient descent based methods for training deep networks. We validated our hypotheses using an extensive experimental evaluation on standard datasets such as MNIST and CIFAR-10, and also showed that recent efforts that attempt to escape saddles finally converge to saddles with high degeneracy, which we define as `good saddles’. We also verified the famous Wigner’s Semicircle Law in our experimental results.
Are screening methods useful in feature selection? An empirical study Filter or screening methods are often used as a preprocessing step for reducing the number of variables used by a learning algorithm in obtaining a classification or regression model. While there are many such filter methods, there is a need for an objective evaluation of these methods. Such an evaluation is needed to compare them with each other and also to answer whether they are at all useful, or a learning algorithm could do a better job without them. For this purpose, many popular screening methods are partnered in this paper with three regression learners and five classification learners and evaluated on ten real datasets to obtain accuracy criteria such as R-square and area under the ROC curve (AUC). The obtained results are compared through curve plots and comparison tables in order to find out whether screening methods help improve the performance of learning algorithms and how they fare with each other. Our findings revealed that the screening methods were only useful in one regression and three classification datasets out of the ten datasets evaluated.
Are You a Bayesian or a Frequentist (Slide Deck)
Artificial Intelligence (AI) Methods in Optical Networks: A Comprehensive Survey Artificial intelligence (AI) is an extensive scientific discipline which enables computer systems to solve problems by emulating complex biological processes such as learning, reasoning and self-correction. This paper presents a comprehensive review of the application of AI techniques for improving performance of optical communication systems and networks. The use of AI-based techniques is first studied in applications related to optical transmission, ranging from the characterization and operation of network components to performance monitoring, mitigation of nonlinearities, and quality of transmission estimation. Then, applications related to optical network control and management are also reviewed, including topics like optical network planning and operation in both transport and access networks. Finally, the paper also presents a summary of opportunities and challenges in optical networking where AI is expected to play a key role in the near future.
Artificial Intelligence and Data Science in the Automotive Industry Data science and machine learning are the key technologies when it comes to the processes and products with automatic learning and optimization to be used in the automotive industry of the future. This article defines the terms ‘data science’ (also referred to as ‘data analytics’) and ‘machine learning’ and how they are related. In addition, it defines the term ‘optimizing analytics’ and illustrates the role of automatic optimization as a key technology in combination with data analytics. It also uses examples to explain the way that these technologies are currently being used in the automotive industry on the basis of the major subprocesses in the automotive value chain (development, procurement; logistics, production, marketing, sales and after-sales, connected customer). Since the industry is just starting to explore the broad range of potential uses for these technologies, visionary application examples are used to illustrate the revolutionary possibilities that they offer. Finally, the article demonstrates how these technologies can make the automotive industry more efficient and enhance its customer focus throughout all its operations and activities, extending from the product and its development process to the customers and their connection to the product.
Artificial Intelligence and Economic Theories The advent of artificial intelligence has changed many disciplines such as engineering, social science and economics. Artificial intelligence is a computational technique which is inspired by natural intelligence such as the swarming of birds, the working of the brain and the pathfinding of the ants. These techniques have an impact on economic theories. This book studies the impact of artificial intelligence on economic theories, a subject that has not been extensively studied. The theories that are considered are: demand and supply, asymmetrical information, pricing, rational choice, rational expectation, game theory, efficient market hypotheses, mechanism design, prospect, bounded rationality, portfolio theory, rational counterfactual and causality. The benefit of this book is that it evaluates existing theories of economics and updates them based on developments in the field of artificial intelligence.
Artificial Intelligence and its Role in Near Future AI technology has a long history and is actively and constantly changing and growing. It focuses on intelligent agents: devices that perceive their environment and take actions that maximize their chances of achieving a goal. In this paper, we explain the basics of modern AI and various representative applications of AI. In the context of the modern digitalized world, AI is the property of machines, computer programs, and systems to perform the intellectual and creative functions of a person, independently find ways to solve problems, draw conclusions and make decisions. Most artificial intelligence systems have the ability to learn, which allows them to improve their performance over time. Recent research on AI tools, including machine learning, deep learning and predictive analytics, is aimed at increasing the ability to plan, learn, reason, think and act. On that basis, the proposed research explores how human intelligence differs from artificial intelligence. Moreover, we critically analyze what today’s AI is capable of doing, why it still cannot reach human intelligence, and what open challenges AI faces in reaching and outperforming human-level intelligence. Finally, the paper explores predictions for the future of artificial intelligence and recommends potential solutions for the coming decades.
Artificial Intelligence and Robotics The recent successes of AI have captured the wildest imagination of both the scientific community and the general public. Robotics and AI amplify human potential, increase productivity and are moving from simple reasoning towards human-like cognitive abilities. Current AI technologies are used in a wide range of applications, from healthcare, manufacturing, transport and energy to financial services, banking, advertising, management consulting and government agencies. The global AI market was around 260 billion USD in 2016 and is estimated to exceed 3 trillion USD by 2024. To understand the impact of AI, it is important to draw lessons from its past successes and failures, and this white paper provides a comprehensive explanation of the evolution of AI, its current status and future directions.
Artificial Intelligence Approaches Artificial Intelligence (AI) has received tremendous attention from academia, industry, and the general public in recent years. The integration of geography and AI, or GeoAI, provides novel approaches for addressing a variety of problems in the natural environment and our human society. This entry briefly reviews the recent development of AI with a focus on machine learning and deep learning approaches. We discuss the integration of AI with geography and particularly geographic information science, and present a number of GeoAI applications and possible future directions.
Artificial Intelligence Enabled Software Defined Networking: A Comprehensive Overview In recent years, the increased demand for dynamic management of network resources in modern computer networks in general, and in today’s data centers in particular, has resulted in a promising new architecture in which more flexible control functionality can be achieved with a high level of abstraction. In the software defined networking (SDN) architecture, management of the forwarding elements (i.e. switches and routers) is accomplished by a central unit, which can be programmed directly to perform fundamental networking tasks or to implement any other additional services. Combining central management and network programmability opens the door to employing more advanced techniques such as artificial intelligence (AI) in order to deal with high-demand and rapidly changing networks. In this study, we provide a detailed overview of current efforts and recent advancements to include AI in SDN-based networks.
Artificial Intelligence for Long-Term Robot Autonomy: A Survey Autonomous systems will play an essential role in many applications across diverse domains including space, marine, air, field, road, and service robotics. They will assist us in our daily routines and perform dangerous, dirty and dull tasks. However, enabling robotic systems to perform autonomously in complex, real-world scenarios over extended time periods (i.e. weeks, months, or years) poses many challenges. Some of these have been investigated by sub-disciplines of Artificial Intelligence (AI) including navigation and mapping, perception, knowledge representation and reasoning, planning, interaction, and learning. The different sub-disciplines have developed techniques that, when re-integrated within an autonomous system, can enable robots to operate effectively in complex, long-term scenarios. In this paper, we survey and discuss AI techniques as ‘enablers’ for long-term robot autonomy, current progress in integrating these techniques within long-running robotic systems, and the future challenges and opportunities for AI in long-term autonomy.
Artificial Intelligence Now The phrase ‘artificial intelligence’ has a way of retreating into the future: as things that were once in the realm of imagination and fiction become reality, they lose their wonder and become ‘machine translation,’ ‘real-time traffic updates,’ ‘self-driving cars,’ and more. But the past 12 months have seen a true explosion in the capacities as well as adoption of AI technologies. While the flavor of these developments has not pointed to the ‘general AI’ of science fiction, it has come much closer to offering generalized AI tools—these tools are being deployed to solve specific problems. But now they solve them more powerfully than the complex, rule-based tools that preceded them. More importantly, they are flexible enough to be deployed in many contexts. This means that more applications and industries are ripe for transformation with AI technologies. This book, drawing from the best posts on the O´Reilly AI blog, brings you a summary of the current state of AI technologies and applications, as well as a selection of useful guides to getting started with deep learning and AI technologies. Part I covers the overall landscape of AI, focusing on the platforms, businesses, and business models that are shaping the growth of AI. We then turn to the technologies underlying AI, particularly deep learning, in Part II. Part III brings us some ‘hobbyist’ applications: intelligent robots. Even if you don´t build them, they are an incredible illustration of the low cost of entry into computer vision and autonomous operation. Part IV also focuses on one application: natural language. Part V takes us into commercial use cases: bots and autonomous vehicles. And finally, Part VI discusses a few of the interplays between human and machine intelligence, leaving you with some big issues to ponder in the coming year.
Artificial Intelligence: A Child’s Play We discuss the objectives of any endeavor in creating artificial intelligence, AI, and provide a possible alternative. Intelligence might be an unintended consequence of curiosity left to roam free, best exemplified by a frolicking infant. This suggests that our attempts at AI could have been misguided; what we actually need to strive for can be termed artificial curiosity, AC, and intelligence happens as a consequence of those efforts. For this unintentional yet welcome aftereffect to set in, a foundational list of guiding principles needs to be present. We discuss what these essential doctrines might be and why their establishment is required to form connections, possibly growing, between a knowledge store that has been built up and new pieces of information that curiosity will bring back. As more findings are acquired and more bonds are formed, we need a way to periodically reduce the amount of data; that is, it is important to capture the critical characteristics of what has been accumulated or produce a summary of what has been gathered. We start with the intuition for this line of reasoning and formalize it with a series of models (and iterative improvements) that will be necessary to make the incubation of intelligence a reality. Our discussion provides conceptual modifications to the Turing Test and to Searle’s Chinese room argument. We discuss the future implications for society as AI becomes an integral part of life.
Artificial Intelligence-Based Techniques for Emerging Robotics Communication: A Survey and Future Perspectives This paper reviews the current development of artificial intelligence (AI) techniques for the application area of robot communication. The study of the control and operation of multiple robots collaboratively toward a common goal is fast growing. Communication among members of a robot team, and even with humans, is becoming essential in many real-world applications. The survey focuses on AI techniques for robot communication that enhance the communication capability of the multi-robot team, enabling more complex activities, better decisions, coordinated actions, and efficient task performance.
Artificial Neural Networks These are lecture notes for my course on Artificial Neural Networks that I have given at Chalmers (FFR135) and Gothenburg University (FIM720). This course describes the use of neural networks in machine learning: deep learning, recurrent networks, and other supervised and unsupervised machine-learning algorithms.
Assessing four Neural Networks on Handwritten Digit Recognition Dataset (MNIST) Although image recognition has been a research topic for many years, many researchers still have a keen interest in it. In some papers, however, there is a tendency to compare models only on one or two datasets, either because of time constraints or because the model is tailored to a specific task. Accordingly, it is hard to understand how well a certain model generalizes across the image recognition field. In this paper, we compare four neural networks on the MNIST dataset under different data splits. Three of them are a Convolutional Neural Network (CNN), a Deep Residual Network (ResNet) and a Dense Convolutional Network (DenseNet), and the fourth is our improvement on the CNN baseline obtained by introducing a Capsule Network (CapsNet) to the image recognition area. We show that although the previous models do a quite good job in this area, our retrofitting can be applied to obtain better performance. The result obtained by CapsNet is an accuracy rate of 99.75%, the best result published so far. Another inspiring result is that CapsNet only needs a small amount of data to achieve excellent performance. Finally, we plan to apply CapsNet’s ability to generalize to other image recognition fields in the future.
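Purely as an illustrative baseline of the kind the paper compares against (not the authors’ code; architecture and hyperparameters are our own choices), a minimal CNN for MNIST could look like this in PyTorch:

    # A minimal CNN baseline for MNIST; illustrative only, with an architecture
    # and hyperparameters chosen by us (this is not the paper's model).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    class SmallCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 32, kernel_size=3)   # 28x28 -> 26x26
            self.conv2 = nn.Conv2d(32, 64, kernel_size=3)  # 26x26 -> 24x24
            self.fc1 = nn.Linear(64 * 12 * 12, 128)        # after 2x2 max pooling
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            x = F.max_pool2d(F.relu(self.conv2(x)), 2)
            x = torch.flatten(x, 1)
            return self.fc2(F.relu(self.fc1(x)))

    if __name__ == "__main__":
        device = "cuda" if torch.cuda.is_available() else "cpu"
        train_set = datasets.MNIST("data", train=True, download=True,
                                   transform=transforms.ToTensor())
        loader = DataLoader(train_set, batch_size=128, shuffle=True)
        model = SmallCNN().to(device)
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        model.train()
        for images, labels in loader:                      # one training epoch
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            optimizer.step()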
Assessing Your Business Analytics Initiatives – Eight Metrics That Matter It´s no secret that using analytics to uncover meaningful insights from data is crucial for making fact-based decisions. Now considered mainstream, the business analytics market worldwide is expected to exceed $50 billion by the year 2016. Yet when it comes to making analytics work, not all organizations are equal. In fact, despite the transformative power of big data and analytics, many organizations still struggle to wring value from their information. The complexities of dealing with big data, integrating technologies, finding analytical talent and challenging corporate culture are the main pitfalls to the successful use of analytics within organizations. The management of information – including the analytics used to transform it – is an evolutionary process, and organizations are at various levels of this evolution. Those wanting to advance analytics to a new level need to understand their analytics activities across the organization, from both an IT and business perspective. Toward that end, an assessment focusing on eight key analytics metrics can be used to identify strengths and areas for improvement in the analytics life cycle.
At what sample size do correlations stabilize? Sample correlations converge to the population value with increasing sample size, but the estimates are often inaccurate in small samples. In this report we use Monte-Carlo simulations to determine the critical sample size beyond which the magnitude of a correlation can be expected to be stable. The necessary sample size to achieve stable estimates for correlations depends on the effect size, the width of the corridor of stability (i.e., a corridor around the true value where deviations are tolerated), and the requested confidence that the trajectory does not leave this corridor any more. Results indicate that in typical scenarios the sample size should approach 250 for stable estimates.
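A rough Monte-Carlo sketch of the procedure described above (our own illustration; the true correlation rho, corridor half-width w and number of trajectories are arbitrary choices, not values from the report):

    # For each simulated trajectory, find the sample size after which the running
    # correlation never again leaves rho +/- w (the 'point of stability').
    import numpy as np

    def point_of_stability(rho=0.3, n_max=1000, n_min=20, w=0.1, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        x = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n_max)
        last_exit = n_min
        for n in range(n_min, n_max + 1):
            r = np.corrcoef(x[:n, 0], x[:n, 1])[0, 1]
            if abs(r - rho) > w:
                last_exit = n
        return last_exit

    rng = np.random.default_rng(0)
    points = [point_of_stability(rng=rng) for _ in range(200)]
    print(np.quantile(points, 0.95))   # sample size needed with ~95% confidence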
Attacking Automatic Video Analysis Algorithms: A Case Study of Google Cloud Video Intelligence API Due to the growth of video data on the Internet, automatic video analysis has gained a lot of attention from academia as well as companies such as Facebook, Twitter and Google. In this paper, we examine the robustness of video analysis algorithms in adversarial settings. Specifically, we propose targeted attacks on two fundamental classes of video analysis algorithms, namely video classification and shot detection. We show that an adversary can subtly manipulate a video in such a way that a human observer would perceive the content of the original video, but the video analysis algorithm will return the adversary’s desired outputs. We then apply the attacks on the recently released Google Cloud Video Intelligence API. The API takes a video file and returns the video labels (objects within the video), shot changes (scene changes within the video) and shot labels (description of video events over time). Through experiments, we show that the API generates video and shot labels by processing only the first frame of every second of the video. Hence, an adversary can deceive the API to output only her desired video and shot labels by periodically inserting an image into the video at the rate of one frame per second. We also show that the pattern of shot changes returned by the API can be mostly recovered by an algorithm that compares the histograms of consecutive frames. Based on our equivalent model, we develop a method for slightly modifying the video frames, in order to deceive the API into generating our desired pattern of shot changes. We perform extensive experiments with different videos and show that our attacks are consistently successful across videos with different characteristics. Finally, we propose introducing randomness to video analysis algorithms as a countermeasure to our attacks.
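The histogram-comparison ‘equivalent model’ mentioned above can be sketched roughly as follows (our illustration; the frame representation and the threshold value are assumptions, not values from the paper):

    # Flag a shot change whenever the distance between consecutive frame histograms
    # exceeds a threshold. Frames are assumed to be HxWx3 uint8 numpy arrays.
    import numpy as np

    def gray_histogram(frame, bins=64):
        hist, _ = np.histogram(frame.mean(axis=2), bins=bins, range=(0, 255))
        return hist / hist.sum()

    def shot_changes(frames, threshold=0.3):
        changes, prev = [], gray_histogram(frames[0])
        for i, frame in enumerate(frames[1:], start=1):
            cur = gray_histogram(frame)
            if 0.5 * np.abs(cur - prev).sum() > threshold:   # total-variation-style distance
                changes.append(i)
            prev = cur
        return changes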
Attend Before you Act: Leveraging human visual attention for continual learning When humans perform a task, such as playing a game, they selectively pay attention to certain parts of the visual input, gathering relevant information and sequentially combining it to build a representation from the sensory data. In this work, we explore leveraging where humans look in an image as an implicit indication of what is salient for decision making. We build on top of the UNREAL architecture in DeepMind Lab’s 3D navigation maze environment. We train the agent both with original images and foveated images, which were generated by overlaying the original images with saliency maps generated using a real-time spectral residual technique. We investigate the effectiveness of this approach in transfer learning by measuring performance in the context of noise in the environment.
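As a rough sketch of the spectral residual technique referred to above (a common formulation in the spirit of Hou and Zhang; filter sizes and normalization are our own choices, not necessarily the authors’):

    # Spectral-residual saliency map: remove the smooth part of the log-amplitude
    # spectrum, reconstruct with the original phase, and smooth the result.
    import numpy as np
    from scipy.ndimage import uniform_filter, gaussian_filter

    def spectral_residual_saliency(gray):
        """gray: 2-D float array; returns a saliency map of the same shape in [0, 1]."""
        spectrum = np.fft.fft2(gray)
        log_amplitude = np.log(np.abs(spectrum) + 1e-8)
        phase = np.angle(spectrum)
        residual = log_amplitude - uniform_filter(log_amplitude, size=3)
        saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
        saliency = gaussian_filter(saliency, sigma=2.5)
        return (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-8)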
Attention Models in Graphs: A Survey Graph-structured data arise naturally in many different application domains. By representing data as graphs, we can capture entities (i.e., nodes) as well as their relationships (i.e., edges) with each other. Many useful insights can be derived from graph-structured data as demonstrated by an ever-growing body of work focused on graph mining. However, in the real-world, graphs can be both large – with many complex patterns – and noisy which can pose a problem for effective graph mining. An effective way to deal with this issue is to incorporate ‘attention’ into graph mining solutions. An attention mechanism allows a method to focus on task-relevant parts of the graph, helping it to make better decisions. In this work, we conduct a comprehensive and focused survey of the literature on the emerging field of graph attention models. We introduce three intuitive taxonomies to group existing work. These are based on problem setting (type of input and output), the type of attention mechanism used, and the task (e.g., graph classification, link prediction, etc.). We motivate our taxonomies through detailed examples and use each to survey competing approaches from a unique standpoint. Finally, we highlight several challenges in the area and discuss promising directions for future work.
Attribute-aware Collaborative Filtering: Survey and Classification Attribute-aware CF models aim at rating prediction given not only the historical ratings from users to items, but also the information associated with users (e.g. age), items (e.g. price), or even ratings (e.g. rating time). This paper surveys works from the past decade developing attribute-aware CF systems, and finds that mathematically they can be classified into four different categories. We provide readers not only with a high-level mathematical interpretation of the existing works in this area but also with the mathematical insight behind each category of models. Finally, we provide in-depth experimental results comparing the effectiveness of the major works in each category.
Augmented Data Science: Towards Industrialization and Democratization of Data Science Conversion of raw data into insights and knowledge requires substantial amounts of effort from data scientists. Despite breathtaking advances in Machine Learning (ML) and Artificial Intelligence (AI), data scientists still spend the majority of their effort in understanding and then preparing the raw data for ML/AI. The effort is often manual and ad hoc, and requires some level of domain knowledge. The complexity of the effort increases dramatically when data diversity, both in form and context, increases. In this paper, we introduce our solution, Augmented Data Science (ADS), towards addressing this ‘human bottleneck’ in creating value from diverse datasets. ADS is a data-driven approach and relies on statistics and ML to extract insights from any data set in a domain-agnostic way to facilitate the data science process. Key features of ADS are the replacement of rudimentary data exploration and processing steps with automation and the augmentation of data scientist judgment with automatically-generated insights. We present building blocks of our end-to-end solution and provide a case study to exemplify its capabilities.
Augmented Reality, Cyber-Physical Systems, and Feedback Control for Additive Manufacturing: A Review Our objective in this paper is to review the application of feedback ideas in the area of additive manufacturing. Both the application of feedback control to the 3D printing process, and the application of feedback theory to enable users to interact better with machines, are reviewed. Where appropriate, opportunities for future work are highlighted.
Automated Algorithm Selection: Survey and Perspectives It has long been observed that for practically any computational problem that has been intensely studied, different instances are best solved using different algorithms. This is particularly pronounced for computationally hard problems, where in most cases, no single algorithm defines the state of the art; instead, there is a set of algorithms with complementary strengths. This performance complementarity can be exploited in various ways, one of which is based on the idea of selecting, from a set of given algorithms, for each problem instance to be solved the one expected to perform best. The task of automatically selecting an algorithm from a given set is known as the per-instance algorithm selection problem and has been intensely studied over the past 15 years, leading to major improvements in the state of the art in solving a growing number of discrete combinatorial problems, including propositional satisfiability and AI planning. Per-instance algorithm selection also shows much promise for boosting performance in solving continuous and mixed discrete/continuous optimisation problems. This survey provides an overview of research in automated algorithm selection, ranging from early and seminal works to recent and promising application areas. Different from earlier work, it covers applications to discrete and continuous problems, and discusses algorithm selection in context with conceptually related approaches, such as algorithm configuration, scheduling or portfolio selection. Since informative and cheaply computable problem instance features provide the basis for effective per-instance algorithm selection systems, we also provide an overview of such features for discrete and continuous problems. Finally, we provide perspectives on future work in the area and discuss a number of open research challenges.
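A minimal sketch of per-instance algorithm selection as framed above, assuming instance features and benchmark costs are already available (all data below are synthetic placeholders):

    # Learn a mapping from instance features to the algorithm that performed best
    # on that instance, then use it to pick an algorithm for a new instance.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    features = rng.normal(size=(500, 10))        # cheap-to-compute instance features
    runtimes = rng.exponential(size=(500, 3))    # observed cost of 3 algorithms per instance

    best_algorithm = runtimes.argmin(axis=1)     # training label: per-instance winner
    selector = RandomForestClassifier(n_estimators=200, random_state=0)
    selector.fit(features, best_algorithm)

    new_instance = rng.normal(size=(1, 10))
    print("selected algorithm:", selector.predict(new_instance)[0])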
Automated Machine Learning – Bayesian Optimization, Meta-Learning and Applications Automating machine learning by providing techniques that autonomously find the best algorithm, hyperparameter configuration and preprocessing is helpful for both researchers and practitioners. Therefore, it is not surprising that automated machine learning has become a very interesting field of research. Bayesian optimization has proven to be a very successful tool for automated machine learning. In the first part of the thesis we present different approaches to improve Bayesian optimization by means of transfer learning. We present three different ways of considering meta-knowledge in Bayesian optimization, i.e. search space pruning, initialization and transfer surrogate models. Finally, we present a general framework for Bayesian optimization combined with meta-learning and conduct a comparison among existing work on two different meta-data sets. A conclusion is that in particular the meta-target driven approaches provide better results. Choosing algorithm configurations based on the improvement on the meta-knowledge combined with the expected improvement yields best results. The second part of this thesis is more application-oriented. Bayesian optimization is applied to large data sets and used as a tool to participate in machine learning challenges. We compare its autonomous performance and its performance in combination with a human expert. At two ECML-PKDD Discovery Challenges, we are able to show that automated machine learning outperforms human machine learning experts. Finally, we present an approach that automates the process of creating an ensemble of several layers, different algorithms and hyperparameter configurations. These kinds of ensembles are jokingly called Frankenstein ensembles and proved their benefit on versatile data sets in many machine learning challenges. We compare our approach Automatic Frankensteining with the current state of the art for automated machine learning on 80 different data sets and can show that it outperforms them on the majority using the same training time. Furthermore, we compare Automatic Frankensteining on a large-scale data set to more than 3,500 machine learning expert teams and are able to outperform more than 3,000 of them within 12 CPU hours.
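For orientation, here is a generic sketch of Bayesian optimization with the expected-improvement acquisition mentioned above, run on a toy objective (our illustration, not the thesis’ implementation):

    # Sequential Bayesian optimization: fit a GP surrogate, maximize expected
    # improvement over a candidate grid, evaluate, and repeat.
    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def objective(x):
        return np.sin(3 * x) + 0.1 * x ** 2          # toy function to minimize

    def expected_improvement(candidates, gp, f_best):
        mu, sigma = gp.predict(candidates, return_std=True)
        sigma = np.maximum(sigma, 1e-9)
        z = (f_best - mu) / sigma
        return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(4, 1))              # initial design
    y = objective(X).ravel()
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

    for _ in range(20):                              # sequential evaluations
        gp.fit(X, y)
        candidates = np.linspace(-3, 3, 500).reshape(-1, 1)
        x_next = candidates[expected_improvement(candidates, gp, y.min()).argmax()]
        X = np.vstack([X, x_next.reshape(1, -1)])
        y = np.append(y, objective(x_next))

    print("best x, f(x):", X[y.argmin()].item(), y.min())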
Automated Machine Learning in Practice: State of the Art and Recent Results A main driver behind the digitization of industry and society is the belief that data-driven model building and decision making can contribute to higher degrees of automation and more informed decisions. Building such models from data often involves the application of some form of machine learning. Thus, there is an ever-growing demand for a workforce with the necessary skill set to do so. This demand has given rise to a new research topic concerned with fitting machine learning models fully automatically – AutoML. This paper gives an overview of the state of the art in AutoML with a focus on practical applicability in a business context, and provides recent benchmark results on the most important AutoML algorithms.
Automated Machine Learning: State-of-The-Art and Open Challenges With the continuous and vast increase in the amount of data in our digital world, it has been acknowledged that the number of knowledgeable data scientists cannot scale to address these challenges. Thus, there was a crucial need for automating the process of building good machine learning models. In the last few years, several techniques and frameworks have been introduced to tackle the challenge of automating the process of Combined Algorithm Selection and Hyper-parameter tuning (CASH) in the machine learning domain. The main aim of these techniques is to reduce the role of the human in the loop and fill the gap for non-expert machine learning users by playing the role of the domain expert. In this paper, we present a comprehensive survey of the state-of-the-art efforts in tackling the CASH problem. In addition, we highlight the research work on automating the other steps of the full complex machine learning pipeline (AutoML), from data understanding to model deployment. Furthermore, we provide comprehensive coverage of the various tools and frameworks that have been introduced in this domain. Finally, we discuss some of the research directions and open challenges that need to be addressed in order to achieve the vision and goals of the AutoML process.
Automated Problem Identification: Regression vs Classification via Evolutionary Deep Networks Regression or classification? This is perhaps the most basic question faced when tackling a new supervised learning problem. We present an Evolutionary Deep Learning (EDL) algorithm that automatically solves this by identifying the question type with high accuracy, along with a proposed deep architecture. Typically, a significant amount of human insight and preparation is required prior to executing machine learning algorithms. For example, when creating deep neural networks, the number of parameters must be selected in advance and furthermore, a lot of these choices are made based upon pre-existing knowledge of the data such as the use of a categorical cross entropy loss function. Humans are able to study a dataset and decide whether it represents a classification or a regression problem, and consequently make decisions which will be applied to the execution of the neural network. We propose the Automated Problem Identification (API) algorithm, which uses an evolutionary algorithm interface to TensorFlow to manipulate a deep neural network to decide if a dataset represents a classification or a regression problem. We test API on 16 different classification, regression and sentiment analysis datasets with up to 10,000 features and up to 17,000 unique target values. API achieves an average accuracy of $96.3\%$ in identifying the problem type without hardcoding any insights about the general characteristics of regression or classification problems. For example, API successfully identifies classification problems even with 1000 target values. Furthermore, the algorithm recommends which loss function to use and also recommends a neural network architecture. Our work is therefore a step towards fully automated machine learning.
Automatic Conversion of Tables to LongForm Dataframes TableToLongForm automatically converts hierarchical Tables intended for a human reader into a simple LongForm Dataframe that is machine readable, hence enabling much greater utilisation of the data. It does this by recognising positional cues present in the hierarchical Table (which would normally be interpreted visually by the human brain) to decompose, then reconstruct the data into a LongForm Dataframe. The article motivates the benefit of such a conversion with an example Table, followed by a short user manual, which includes a comparison between the simple one argument call to TableToLongForm, with code for an equivalent manual conversion. The article then explores the types of Tables the package can convert by providing a gallery of all recognised patterns. It finishes with a discussion of available diagnostic methods and future work.
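TableToLongForm itself is an R package; as a language-agnostic illustration of the wide-to-long reshaping it automates, the same idea on a deliberately simple (non-hierarchical) table looks like this with pandas (the example table is ours, not taken from the article):

    # Reshape a wide table (one column per year) into a long-form dataframe
    # with one row per (Region, Year) observation.
    import pandas as pd

    wide = pd.DataFrame({
        "Region": ["North", "South"],
        "2019": [120, 95],
        "2020": [130, 90],
    })
    long = wide.melt(id_vars="Region", var_name="Year", value_name="Value")
    print(long)   # one row per (Region, Year) pair with its Value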
Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities. In this survey, we classify the existing approaches based on how they conceptualize this problem, viz., models that cast description as either generation problem or as a retrieval problem over a visual or multimodal representational space. We provide a detailed review of existing models, highlighting their advantages and disadvantages. Moreover, we give an overview of the benchmark image datasets and the evaluation measures that have been developed to assess the quality of machine-generated image descriptions. Finally we extrapolate future directions in the area of automatic image description generation.
Automatic Extraction of Causal Relations from Natural Language Texts: A Comprehensive Survey Automatic extraction of cause-effect relationships from natural language texts is a challenging open problem in Artificial Intelligence. Most of the early attempts at its solution used manually constructed linguistic and syntactic rules on small and domain-specific data sets. However, with the advent of big data, the availability of affordable computing power and the recent popularization of machine learning, the paradigm to tackle this problem has slowly shifted. Machines are now expected to learn generic causal extraction rules from labelled data with minimal supervision, in a domain-independent manner. In this paper, we provide a comprehensive survey of causal relation extraction techniques from both paradigms, and analyse their relative strengths and weaknesses, with recommendations for future work.
Automatic Extraction of Personality from Text: Challenges and Opportunities In this study, we examined the possibility of extracting personality traits from text. We created an extensive dataset by having experts annotate personality traits in a large number of texts from multiple online sources. From these annotated texts, we selected a sample and made further annotations, ending up with a large low-reliability dataset and a small high-reliability dataset. We then used the two datasets to train and test several machine learning models to extract personality from text, including a language model. Finally, we evaluated our best models in the wild, on datasets from different domains. Our results show that the models based on the small high-reliability dataset performed better (in terms of $\textrm{R}^2$) than models based on the large low-reliability dataset. Also, the language model based on the small high-reliability dataset performed better than the random baseline. Finally, and more importantly, the results showed that our best model did not perform better than the random baseline when tested in the wild. Taken together, our results show that determining personality traits from text remains a challenge and that no firm conclusions can be made on model performance before testing in the wild.
Automatic Keyphrase Extraction: A Survey of the State of the Art While automatic keyphrase extraction has been examined extensively, state-of-the-art performance on this task is still much lower than that on many core natural language processing tasks. We present a survey of the state of the art in automatic keyphrase extraction, examining the major sources of errors made by existing systems and discussing the challenges ahead.
Automatic Keyword Extraction for Text Summarization: A Survey In recent times, data has been growing rapidly in every domain, such as news, social media, banking, education, etc. Due to this excess of data, there is a need for automatic summarizers capable of summarizing data, especially textual data, without losing the critical points of the original document. Text summarization has emerged as an important research area in the recent past. In this regard, a review of existing work on the text summarization process is useful for carrying out further research. In this paper, recent literature on automatic keyword extraction and text summarization is presented, since the text summarization process is highly dependent on keyword extraction. This literature includes a discussion of the different methodologies used for keyword extraction and text summarization. It also discusses the different databases used for text summarization in several domains, along with evaluation metrics. Finally, it briefly discusses the issues and research challenges faced by researchers, along with future directions.
Automatic Language Identification in Texts: A Survey Language identification (LI) is the problem of determining the natural language that a document or part thereof is written in. Automatic LI has been extensively researched for over fifty years. Today, LI is a key part of many text processing pipelines, as text processing techniques generally assume that the language of the input text is known. Research in this area has recently been especially active. This article provides a brief history of LI research, and an extensive survey of the features and methods used so far in the LI literature. For describing the features and methods we introduce a unified notation. We discuss evaluation methods, applications of LI, as well as off-the-shelf LI systems that do not require training by the end user. Finally, we identify open issues, survey the work to date on each issue, and propose future directions for research in LI.
Automatic Rumor Detection on Microblogs: A Survey The ever-increasing amount of multimedia content on modern social media platforms is valuable in many applications, but the openness and convenience of social media also foster many rumors online. Without verification, these rumors can reach thousands of users immediately and cause serious damage. Many efforts have been made to defeat online rumors automatically by mining the rich content provided on the open network with machine learning techniques. Most rumor detection methods can be categorized into three paradigms: classification approaches based on hand-crafted features, propagation-based approaches and neural network approaches. In this survey, we introduce a formal definition of rumor in comparison with other definitions used in the literature. We summarize the studies of automatic rumor detection so far and present details of the three paradigms of rumor detection. We also give an introduction to existing datasets for rumor detection that would benefit future research in this area. We conclude with suggestions for future rumor detection on microblogs.
Automatic Sarcasm Detection: A Survey Automatic detection of sarcasm has witnessed interest from the sentiment analysis research community. With diverse approaches, datasets and analyses that have been reported, there is an essential need to have a collective understanding of the research in this area. In this survey of automatic sarcasm detection, we describe datasets, approaches (both supervised and rule-based), and trends in sarcasm detection research. We also present a research matrix that summarizes past work, and list pointers to future work.
Automatic Tag Recommendation Algorithms for Social Recommender Systems The emergence of Web 2.0 and the consequent success of social network websites such as del.icio.us and Flickr introduce us to a new concept called social bookmarking, or tagging in short. Tagging can be seen as the action of connecting a relevant user-defined keyword to a document, image or video, which helps users to better organize and share their collections of interesting stuff. With the rapid growth of Web 2.0, tagged data is becoming more and more abundant on the social network websites. An interesting problem is how to automate the process of making tag recommendations to users when a new resource becomes available. In this paper, we address the issue of tag recommendation from a machine learning perspective. From our empirical observation of two large-scale data sets, we first argue that the user-centered approach for tag recommendation is not very effective in practice. Consequently, we propose two novel document-centered approaches that are capable of making effective and efficient tag recommendations in real scenarios. The first, graph-based method represents the tagged data as two bipartite graphs of (document, tag) and (document, word), then finds document topics by leveraging graph partitioning algorithms. The second, prototype-based method aims at finding the most representative documents within the data collections and advocates a sparse multi-class Gaussian process classifier for efficient document classification. For both methods, tags are ranked within each topic cluster/class by a novel ranking method. Recommendations are performed by first classifying a new document into one or more topic clusters/classes, and then selecting the most relevant tags from those clusters/classes as machine-recommended tags. Experiments on real-world data from Del.icio.us, CiteULike and BibSonomy examine the quality of tag recommendation as well as the efficiency of our recommendation algorithms. The results suggest that our document-centered models can substantially improve the performance of tag recommendations when compared to the user-centered methods, as well as to the LDA topic model and SVM classifiers.
AutoML: A Survey of the State-of-the-Art Deep learning has penetrated all aspects of our lives and brought us great convenience. However, the process of building a high-quality deep learning system for a specific task is not only time-consuming but also requires lots of resources and relies on human expertise, which hinders the development of deep learning in both industry and academia. To alleviate this problem, a growing number of research projects focus on automated machine learning (AutoML). In this paper, we provide a comprehensive and up-to-date study of state-of-the-art AutoML. First, we introduce the AutoML techniques in detail according to the machine learning pipeline. Then we summarize existing Neural Architecture Search (NAS) research, which is one of the most popular topics in AutoML. We also compare the models generated by NAS algorithms with human-designed models. Finally, we present several open problems for future research.
Autonomics: In Search of a Foundation for Next Generation Autonomous Systems The potential benefits of autonomous systems have been driving intensive development of such systems, and of supporting tools and methodologies. However, there are still major issues to be dealt with before such development becomes commonplace engineering practice, with accepted and trustworthy deliverables. We argue that a solid, evolving, publicly available, community-controlled foundation for developing next generation autonomous systems is a must. We discuss what is needed for such a foundation, identify a central aspect thereof, namely, decision-making, and focus on three main challenges: (i) how to specify autonomous system behavior and the associated decisions in the face of unpredictability of future events and conditions and the inadequacy of current languages for describing these; (ii) how to carry out faithful simulation and analysis of system behavior with respect to rich environments that include humans, physical artifacts, and other systems; and (iii) how to engineer systems that combine executable model-driven techniques and data-driven machine learning techniques. We argue that autonomics, i.e., the study of unique challenges presented by next generation autonomous systems, and research towards resolving them, can introduce substantial contributions and innovations in system engineering and computer science.
Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems Much research in artificial intelligence is concerned with the development of autonomous agents that can interact effectively with other agents. An important aspect of such agents is the ability to reason about the behaviours of other agents, by constructing models which make predictions about various properties of interest (such as actions, goals, beliefs) of the modelled agents. A variety of modelling approaches now exist which vary widely in their methodology and underlying assumptions, catering to the needs of the different sub-communities within which they were developed and reflecting the different practical uses for which they are intended. The purpose of the present article is to provide a comprehensive survey of the salient modelling methods which can be found in the literature. The article concludes with a discussion of open problems which may form the basis for fruitful future research.
Autonomous development and learning in artificial intelligence and robotics: Scaling up deep learning to human–like learning Autonomous lifelong development and learning is a fundamental capability of humans, differentiating them from current deep learning systems. However, other branches of artificial intelligence have designed crucial ingredients towards autonomous learning: curiosity and intrinsic motivation, social learning and natural interaction with peers, and embodiment. These mechanisms guide exploration and autonomous choice of goals, and integrating them with deep learning opens stimulating perspectives. Deep learning (DL) approaches made great advances in artificial intelligence, but are still far away from human learning. As argued convincingly by Lake et al., differences include human capabilities to learn causal models of the world from very little data, leveraging compositional representations and priors like intuitive physics and psychology. However, there are other fundamental differences between current DL systems and human learning, as well as technical ingredients to fill this gap, that are either superficially, or not adequately, discussed by Lake et al. These fundamental mechanisms relate to autonomous development and learning. They are bound to play a central role in artificial intelligence in the future. Current DL systems require engineers to manually specify a task-specific objective function for every new task, and learn through off-line processing of large training databases. On the contrary, humans learn autonomously open-ended repertoires of skills, deciding for themselves which goals to pursue or value, and which skills to explore, driven by intrinsic motivation/curiosity and social learning through natural interaction with peers. Such learning processes are incremental, online, and progressive. Human child development involves a progressive increase of complexity in a curriculum of learning where skills are explored, acquired, and built on each other, through particular ordering and timing. Finally, human learning happens in the physical world, and through bodily and physical experimentation, under severe constraints on energy, time, and computational resources. In the two last decades, the field of Developmental and Cognitive Robotics (Cangelosi and Schlesinger, 2015, Asada et al., 2009), in strong interaction with developmental psychology and neuroscience, has achieved significant advances in computational
Auto-scaling Web Applications in Clouds: A Taxonomy and Survey Web application providers have been migrating their applications to cloud data centers, attracted by the emerging cloud computing paradigm. One of the appealing features of the cloud is elasticity. It allows cloud users to acquire or release computing resources on demand, which enables web application providers to auto-scale the resources provisioned to their applications under dynamic workload in order to minimize resource cost while satisfying Quality of Service (QoS) requirements. In this paper, we comprehensively analyze the challenges that remain in auto-scaling web applications in clouds and review the developments in this field. We present a taxonomy of auto-scaling systems according to the identified challenges and key properties. We analyze the surveyed works and map them to the taxonomy to identify the weaknesses in this field. Moreover, based on the analysis, we propose new future directions.
Average Predictive Comparisons for models with nonlinearity, interactions, and variance components In a predictive model, what is the expected difference in the outcome associated with a unit difference in one of the inputs? In a linear regression model without interactions, this average predictive comparison is simply a regression coefficient (with associated uncertainty). In a model with nonlinearity or interactions, however, the average predictive comparison in general depends on the values of the predictors. We consider various definitions based on averages over a population distribution of the predictors, and we compute standard errors based on uncertainty in model parameters. We illustrate with a study of criminal justice data for urban counties in the United States. The outcome of interest measures whether a convicted felon received a prison sentence rather than a jail or non-custodial sentence, with predictors available at both individual and county levels. We fit three models: (1) a hierarchical logistic regression with varying coefficients for the within-county intercepts as well as for each individual predictor; (2) a hierarchical model with varying intercepts only; and (3) a nonhierarchical model that ignores the multilevel nature of the data. The regression coefficients have different interpretations for the different models; in contrast, the models can be compared directly using predictive comparisons. Furthermore, predictive comparisons clarify the interplay between the individual and county predictors for the hierarchical models and also illustrate the relative size of varying county effects.
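A minimal sketch of computing an average predictive comparison for a fitted nonlinear model, averaging over the observed predictor values (our illustration on simulated data, not the criminal justice analysis from the paper):

    # Mean change in P(y=1) per unit increase in input j, averaged over the data;
    # unlike a raw coefficient, this depends on where the predictors actually sit.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 2))
    p = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 1.2 * X[:, 1])))
    y = (rng.uniform(size=1000) < p).astype(int)
    model = LogisticRegression().fit(X, y)

    def average_predictive_comparison(model, X, j, delta=1.0):
        X_hi = X.copy()
        X_hi[:, j] += delta
        return (model.predict_proba(X_hi)[:, 1] - model.predict_proba(X)[:, 1]).mean() / delta

    print(average_predictive_comparison(model, X, j=0))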
Avoiding the Barriers of In-Memory Business Intelligence: Making Data Discovery Scalable When looking at the growth rates of the business intelligence platform space, it is apparent that acquisitions of new business intelligence tools have shifted dramatically from traditional data visualization and aggregation use cases to newer data discovery implementations. This shift toward data discovery use cases has been driven by two key factors: faster implementation times and the ability to visualize and manipulate data as quickly as an analyst can click a mouse. The improvements in implementation speeds stem from the use of architectures that access source data directly without having to first aggregate all the data in a central location such as an enterprise data warehouse or departmental data mart. The promise of fast manipulation of data has largely been accomplished by employing in-memory data management models to exploit the speed advantage of accessing data from server memory over traditional disk-based approaches. The ‘physics’ of data access favors in-memory data management models. However, in-memory techniques are not without drawbacks. As companies attempt to evolve from small departmental projects to broader division-wide or enterprise-wide initiatives, increasing data volumes and the impact of increasing data consumer counts challenge the limits of early in-memory implementations. These challenges raise serious questions that should be considered by any organization considering in-memory techniques for business intelligence platforms.

B

Babel Storage: Uncoordinated Content Delivery from Multiple Coded Storage Systems In future content-centric networks, content is identified independently of its location. From an end-user’s perspective, individual storage systems dissolve into a seemingly omnipresent structureless `storage fog’. Content should be delivered oblivious of the network topology, using multiple storage systems simultaneously, and at minimal coordination overhead. Prior works have addressed the advantages of error correction coding for distributed storage and content delivery separately. This work takes a comprehensive approach to highlighting the tradeoff between storage overhead and transmission overhead in uncoordinated content delivery from multiple coded storage systems. Our contribution is twofold. First, we characterize the tradeoff between storage and transmission overhead when all participating storage systems employ the same code. Second, we show that the resulting stark inefficiencies can be avoided when storage systems use diverse codes. What is more, such code diversity is not just technically desirable, but presumably will be the reality in the increasingly heterogeneous networks of the future. To this end, we show that a mix of Reed-Solomon, low-density parity-check and random linear network codes achieves close-to-optimal performance at minimal coordination and operational overhead.
Basic Principles of Clustering Methods Clustering methods group a set of data points into a few coherent groups or clusters of similar data points. As an example, consider clustering pixels in an image (or video) if they belong to the same object. Different clustering methods are obtained by using different notions of similarity and different representations of data points.
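As one concrete instance of the methods described, a minimal k-means clustering example (the toy data and the choice of two clusters are ours):

    # Group 2-D points into two clusters with k-means.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    points = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
                        rng.normal(3.0, 0.5, size=(50, 2))])
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
    print(labels)   # cluster index for each point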
Bayesian Computation Via Markov Chain Monte Carlo Markov chain Monte Carlo (MCMC) algorithms are an indispensable tool for performing Bayesian inference. This review discusses widely used sampling algorithms and illustrates their implementation on a probit regression model for lupus data. The examples considered highlight the importance of tuning the simulation parameters and underscore the important contributions of modern developments such as adaptive MCMC. We then use the theory underlying MCMC to explain the validity of the algorithms considered and to assess the variance of the resulting Monte Carlo estimators.
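To make the flavor of such samplers concrete, here is a minimal random-walk Metropolis sketch for a Bayesian probit regression on simulated data (illustrative only; the prior scale and proposal step size are arbitrary tuning choices, and this is not the lupus analysis from the review):

    # Random-walk Metropolis for probit regression coefficients.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    n = 200
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    beta_true = np.array([-0.5, 1.0])
    y = (rng.uniform(size=n) < norm.cdf(X @ beta_true)).astype(float)

    def log_posterior(beta):
        loglik = norm.logcdf((2 * y - 1) * (X @ beta)).sum()   # probit likelihood
        return loglik + norm.logpdf(beta, scale=10.0).sum()    # N(0, 10^2) prior

    beta = np.zeros(2)
    current = log_posterior(beta)
    samples = []
    for _ in range(5000):
        proposal = beta + 0.2 * rng.normal(size=2)             # proposal scale needs tuning
        cand = log_posterior(proposal)
        if np.log(rng.uniform()) < cand - current:             # Metropolis accept/reject
            beta, current = proposal, cand
        samples.append(beta)

    print("posterior means:", np.mean(samples[1000:], axis=0))  # discard burn-in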
Bayesian Computational Tools This article surveys advances in the field of Bayesian computation over the past 20 years from a purely personal viewpoint, hence containing some omissions given the spectrum of the field. Monte Carlo, MCMC, and ABC themes are covered here, whereas the rapidly expanding area of particle methods is only briefly mentioned and different approximative techniques such as variational Bayes and linear Bayes methods do not appear at all. This article also contains some novel computational entries on the double-exponential model that may be of interest.
Bayesian Computing with INLA: A Review The key operation in Bayesian inference is to compute high-dimensional integrals. An old approximate technique is the Laplace method or approximation, which dates back to Pierre-Simon Laplace (1774). This simple idea approximates the integrand with a second-order Taylor expansion around the mode and computes the integral analytically. By developing a nested version of this classical idea, combined with modern numerical techniques for sparse matrices, we obtain the approach of Integrated Nested Laplace Approximations (INLA) to do approximate Bayesian inference for latent Gaussian models (LGMs). LGMs represent an important model-abstraction for Bayesian inference and include a large proportion of the statistical models used today. In this review, we will discuss the reasons for the success of the INLA-approach, the R-INLA package, why it is so accurate, why the approximations are very quick to compute and why LGMs make such a useful concept for Bayesian computing.
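For reference, the classical Laplace approximation on which INLA builds can be stated in one line: for a smooth, unimodal log-integrand $g$ with mode $\hat{x}$,
\[
\int e^{\,g(x)}\,dx \;\approx\; e^{\,g(\hat{x})}\sqrt{\frac{2\pi}{-g''(\hat{x})}},
\]
obtained by replacing $g$ with its second-order Taylor expansion around $\hat{x}$ and integrating the resulting Gaussian; INLA applies a nested, multivariate version of this idea to the latent field of an LGM.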
Bayesian Decision Theory and Stochastic Independence Stochastic independence has a complex status in probability theory. It is not part of the definition of a probability measure, but it is nonetheless an essential property for the mathematical development of this theory. Bayesian decision theorists such as Savage can be criticized for being silent about stochastic independence. From their current preference axioms, they can derive no more than the definitional properties of a probability measure. In a new framework of twofold uncertainty, we introduce preference axioms that entail not only these definitional properties, but also the stochastic independence of the two sources of uncertainty. This goes some way towards filling a curious lacuna in Bayesian decision theory.
Bayesian estimation supersedes the t test Bayesian estimation for two groups provides complete distributions of credible values for the effect size, group means and their difference, standard deviations and their difference, and the normality of the data. The method handles outliers. The decision rule can accept the null value (unlike traditional t tests) when certainty in the estimate is high (unlike Bayesian model comparison using Bayes factors). The method also yields precise estimates of statistical power for various research goals. The software and programs are free, and run on Macintosh, Windows, and Linux platforms.
Bayesian Group Decisions: Algorithms and Complexity Many important real-world decision-making problems involve interactions of individuals with purely informational externalities, for example, in jury deliberations, expert committees, etc. We model such interactions of rational agents in a group, where they receive private information and act based on that information while also observing other people’s beliefs or actions. As a Bayesian agent attempts to infer the truth from her sequence of observations of actions of others and her own private signal, she recursively refines her belief on the signals that other players could have observed and actions that they could have taken given that other players are also rational. The existing literature addresses asymptotic properties of Bayesian group decisions (important questions such as convergence to consensus and learning). In this work, we address the computations that the Bayesian agent should undertake to realize the optimal actions at every decision epoch. We use the iterated eliminations of infeasible signals (IEIS) to model the thinking process as well as the calculations of a Bayesian agent in a group decision scenario. We show that IEIS algorithm runs in exponential time; however, when the group structure is a partially ordered set the Bayesian calculations simplify and polynomial-time computation of the Bayesian recommendations is possible. We next shift attention to the case where agents reveal their beliefs (instead of actions) at every decision epoch. We analyze the computational complexity of the Bayesian belief formation in groups and show that it is NP-hard. We also investigate the factors underlying this computational complexity and show how belief calculations simplify in special network structures or cases with strong inherent symmetries. We finally give insights about the statistical efficiency (optimality) of the beliefs and its relations to computational efficiency.
Bayesian Methods of Parameter Estimation In order to motivate the idea of parameter estimation we need to first understand the notion of mathematical modeling. What is the idea behind modeling real world phenomena? Mathematically modeling an aspect of the real world enables us to better understand it and better explain it, and perhaps enables us to reproduce it, either on a large scale, or on a simplified scale that characterizes only the critical parts of that phenomenon. How do we model these real life phenomena? These real life phenomena are captured by means of distribution models, which are extracted or learned directly from data gathered about them. So, what do we mean by parameter estimation? Every distribution model has a set of parameters that need to be estimated. These parameters specify any constants appearing in the model and provide a mechanism for efficient and accurate use of data. …
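As a toy illustration of estimating a distribution model's parameter from data (an invented example, not taken from the notes above), the sketch below updates a conjugate Beta prior on a coin's success probability with simulated Bernoulli observations and reports both the maximum likelihood and the Bayesian point estimate.

import numpy as np

rng = np.random.default_rng(5)
true_theta = 0.65                       # hypothetical parameter we want to estimate
data = rng.random(50) < true_theta      # 50 simulated Bernoulli observations

# Beta(a0, b0) prior on theta; the Beta is conjugate to the Bernoulli likelihood,
# so the posterior is Beta(a0 + successes, b0 + failures).
a0, b0 = 1.0, 1.0
successes, failures = data.sum(), (~data).sum()
a_post, b_post = a0 + successes, b0 + failures

mle = successes / data.size                       # maximum likelihood estimate
posterior_mean = a_post / (a_post + b_post)       # Bayesian point estimate
print(f"MLE: {mle:.3f}, posterior mean: {posterior_mean:.3f}")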
Bayesian Model Averaging: A Tutorial Standard statistical practice ignores model uncertainty. Data analysts typically select a model from some class of models and then proceed as if the selected model had generated the data. This approach ignores the uncertainty in model selection, leading to over-confident inferences and decisions that are more risky than one thinks they are. Bayesian model averaging (BMA) provides a coherent mechanism for accounting for this model uncertainty. Several methods for implementing BMA have recently emerged. We discuss these methods and present a number of examples. In these examples, BMA provides improved out-of-sample predictive performance. We also provide a catalogue of currently available BMA software.
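As a rough illustration of the averaging idea described in the entry above (not code from the tutorial), the sketch below weights a set of candidate linear models by BIC-approximated posterior model probabilities and averages their predictions; the data and the candidate model space are made up for the example.

import numpy as np
from itertools import combinations
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, p = 200, 4
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(size=n)   # true model uses x0 and x2

def fit_and_bic(cols):
    # Fit OLS on a subset of columns and return the model and its BIC.
    m = LinearRegression().fit(X[:, cols], y)
    resid = y - m.predict(X[:, cols])
    sigma2 = np.mean(resid ** 2)
    k = len(cols) + 2                       # coefficients + intercept + noise variance
    return m, n * np.log(sigma2) + k * np.log(n)

# Candidate models: all non-empty subsets of the predictors.
models = []
for r in range(1, p + 1):
    for cols in combinations(range(p), r):
        m, bic = fit_and_bic(list(cols))
        models.append((list(cols), m, bic))

# Posterior model probabilities via the BIC approximation: w_j proportional to exp(-BIC_j / 2).
bics = np.array([b for _, _, b in models])
w = np.exp(-0.5 * (bics - bics.min()))
w /= w.sum()

# Model-averaged prediction at a new point.
x_new = np.ones((1, p))
y_bma = sum(wi * m.predict(x_new[:, cols])[0] for (cols, m, _), wi in zip(models, w))
print("BMA prediction:", y_bma)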
Bayesian model reduction This paper reviews recent developments in statistical structure learning; namely, Bayesian model reduction. Bayesian model reduction is a special but ubiquitous case of Bayesian model comparison that, in the setting of variational Bayes, furnishes an analytic solution for (a lower bound on) model evidence induced by a change in priors. This analytic solution finesses the problem of scoring large model spaces in model comparison or structure learning. This is because each new model can be cast in terms of an alternative set of priors over model parameters. Furthermore, the reduced free energy (i.e. evidence bound on the reduced model) finds an expedient application in hierarchical models, where it plays the role of a summary statistic. In other words, it contains all the necessary information from the posterior distributions over parameters of lower levels. In this technical note, we review Bayesian model reduction – in terms of common forms of reduced free energy – and illustrate recent applications in structure learning, hierarchical or empirical Bayes and as a metaphor for neurobiological processes like abductive reasoning and sleep.
Bayesian Networks, Total Variation and Robustness Now that Bayesian Networks (BNs) have become widely used, an appreciation is developing of just how sensitive certain target variables can be to changes in the model, and of how important it is to assess this robustness. When time resources are limited, such issues impact directly on the chosen level of complexity of the BN as well as the quantity of missing probabilities we are able to elicit. Currently most such analyses are performed once the whole BN has been elicited and are based on Kullback-Leibler information measures. In this paper we argue that robustness methods based instead on the familiar total variation distance provide simple and more useful bounds on robustness to misspecification which are both formally justifiable and transparent. We demonstrate how such formal robustness considerations can be embedded within the process of building a BN. Here we focus on two particular choices a modeller needs to make: the choice of the parents of each node and the number of levels to choose for each variable within the system. Our analyses are illustrated throughout using two BNs drawn from the recent literature.
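To make the distance concrete (this is the generic textbook definition, not code from the paper), here is a minimal computation of the total variation distance between two discrete distributions over the same finite set of states; the example probabilities are invented.

import numpy as np

def total_variation(p, q):
    # Total variation distance between two discrete distributions:
    # TV(p, q) = 0.5 * sum_i |p_i - q_i|, which lies in [0, 1].
    p, q = np.asarray(p, float), np.asarray(q, float)
    return 0.5 * np.abs(p - q).sum()

# Hypothetical elicited vs. perturbed conditional distributions for one BN node.
p = [0.70, 0.20, 0.10]
q = [0.60, 0.25, 0.15]
print(total_variation(p, q))  # 0.10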
Bayesian Reinforcement Learning: A Survey Bayesian methods for machine learning have been widely investigated, yielding principled methods for incorporating prior information into inference algorithms. In this survey, we provide an in-depth review of the role of Bayesian methods for the reinforcement learning (RL) paradigm. The major incentives for incorporating Bayesian reasoning in RL are: 1) it provides an elegant approach to action-selection (exploration/exploitation) as a function of the uncertainty in learning; and 2) it provides a machinery to incorporate prior knowledge into the algorithms. We first discuss models and methods for Bayesian inference in the simple single-step Bandit model. We then review the extensive recent literature on Bayesian methods for model-based RL, where prior information can be expressed on the parameters of the Markov model. We also present Bayesian methods for model-free RL, where priors are expressed over the value function or policy class. The objective of the paper is to provide a comprehensive survey on Bayesian RL algorithms and their theoretical and empirical properties.
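For the single-step bandit setting mentioned in the survey entry above, the following is a minimal sketch of one common Bayesian approach, Thompson sampling with independent Beta-Bernoulli arms; the arm reward probabilities are invented for illustration and this is not code from the survey.

import numpy as np

rng = np.random.default_rng(1)
true_p = [0.3, 0.5, 0.7]           # hypothetical Bernoulli reward probabilities
alpha = np.ones(3)                 # Beta(1, 1) priors on each arm's success rate
beta = np.ones(3)

for t in range(2000):
    theta = rng.beta(alpha, beta)          # sample one plausible success rate per arm
    arm = int(np.argmax(theta))            # act greedily w.r.t. the sampled beliefs
    reward = rng.random() < true_p[arm]    # observe a Bernoulli reward
    alpha[arm] += reward                   # conjugate posterior update
    beta[arm] += 1 - reward

print("posterior means:", alpha / (alpha + beta))  # should concentrate near true_p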
Bayesian Statistics Students are to choose one paper in the following list, or possibly outside of the list upon my agreement. The papers are available online. Most of them are collected in this zip file. The presentation can focus on a particular section / result / example of the paper. Evaluation of the students is based on the understanding and presentation of the chosen paper.
Bayesian Statistics Papers (Paper Collection)
Behavior Trees in Robotics and AI, an Introduction A Behavior Tree (BT) is a way to structure the switching between different tasks in an autonomous agent, such as a robot or a virtual entity in a computer game. BTs are a very efficient way of creating complex systems that are both modular and reactive. These properties are crucial in many applications, which has led to the spread of BTs from computer game programming to many branches of AI and Robotics. In this book, we will first give an introduction to BTs, then we describe how BTs relate to, and in many cases generalize, earlier switching structures. These ideas are then used as a foundation for a set of efficient and easy-to-use design principles. Properties such as safety, robustness, and efficiency are important for an autonomous system, and we describe a set of tools for formally analyzing these using a state space description of BTs. With the new analysis tools, we can formalize the descriptions of how BTs generalize earlier approaches. Finally, we describe an extended set of tools to capture the behavior of Stochastic BTs, where the outcomes of actions are described by probabilities. These tools enable the computation of both success probabilities and time to completion.
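A minimal, made-up Python sketch of the switching structure described in the entry above, with Sequence and Fallback composites ticking leaf actions; the status values, node names and example behaviour are assumptions for illustration, not the book's notation.

SUCCESS, FAILURE, RUNNING = "SUCCESS", "FAILURE", "RUNNING"

class Action:
    # Leaf node wrapping a function that returns a status when ticked.
    def __init__(self, fn):
        self.fn = fn
    def tick(self):
        return self.fn()

class Sequence:
    # Succeeds only if all children succeed; stops at the first non-success.
    def __init__(self, *children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != SUCCESS:
                return status
        return SUCCESS

class Fallback:
    # Tries children in order and returns the first non-failure.
    def __init__(self, *children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != FAILURE:
                return status
        return FAILURE

# Hypothetical robot behaviour: recharge if the battery is low, otherwise patrol.
battery_low = Action(lambda: SUCCESS)      # condition: pretend the battery is low
recharge    = Action(lambda: RUNNING)      # recharge action still in progress
patrol      = Action(lambda: SUCCESS)

root = Fallback(Sequence(battery_low, recharge), patrol)
print(root.tick())  # RUNNING: the recharge branch is active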
Best Practices for Applying Deep Learning to Novel Applications This report is targeted to groups who are subject matter experts in their application but deep learning novices. It contains practical advice for those interested in testing the use of deep neural networks on applications that are novel for deep learning. We suggest making your project more manageable by dividing it into phases. For each phase this report contains numerous recommendations and insights to assist novice practitioners.
Best Practices for Scientific Computing Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists´ productivity and the reliability of their software.
Better Decisions through Science Math-based aids for making decisions in medicine and industry could improve many diagnoses – often saving lives in the process.
Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
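A small, hypothetical matplotlib example of the kind of presentation change recommended in the entry above: showing every observation in a univariate scatterplot rather than collapsing two small groups into a bar graph of means. The data are simulated purely for illustration and are not related to the reviewed studies.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
control = rng.normal(10, 2, size=8)        # small, made-up samples
treated = rng.normal(12, 4, size=8)

fig, (ax_bar, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))

# Common but problematic: a bar graph of group means hides the distribution.
ax_bar.bar([0, 1], [control.mean(), treated.mean()], tick_label=["control", "treated"])
ax_bar.set_title("Bar graph of means")

# Recommended for small samples: plot every data point.
for i, group in enumerate([control, treated]):
    jitter = rng.uniform(-0.05, 0.05, size=group.size)
    ax_scatter.scatter(np.full(group.size, i) + jitter, group)
ax_scatter.set_xticks([0, 1])
ax_scatter.set_xticklabels(["control", "treated"])
ax_scatter.set_title("Univariate scatterplot")

plt.tight_layout()
plt.show()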
Beyond Mobile Apps: A Survey of Technologies for Mental Well-being Mental health problems are on the rise globally and strain national health systems worldwide. Mental disorders are closely associated with fear of stigma, structural barriers such as financial burden, and lack of available services and resources which often prohibit the delivery of frequent clinical advice and monitoring. Technologies for mental well-being exhibit a range of attractive properties which facilitate the delivery of state of the art clinical monitoring. This review article provides an overview of traditional techniques followed by their technological alternatives, sensing devices, behaviour changing tools, and feedback interfaces. The challenges presented by these technologies are then discussed with data collection, privacy and battery life being some of the key issues which need to be carefully considered for the successful deployment of mental health tool-kits. Finally, the opportunities this growing research area presents are discussed including the use of portable tangible interfaces combining sensing and feedback technologies. Capitalising on the captured data these ubiquitous devices offer, state of the art machine learning algorithms can lead to the development of …
BI forward: A full view of your business Imagine that your organization is effectively using a business intelligence (BI) solution that provides everything you need to make better decisions and improve operational efficiency. Imagine users with their fingers on the pulse of markets, customers, channels and operations at all times. And imagine that your programs, plans, services and products are being designed with full and timely insight into all the factors – past, present and future – critical to success. What would it take to make that happen? What businesses need from BI is a full picture. And that is why it is important to understand that, for now and in the future, BI should help you not only describe and diagnose your past and current performance, but also predict future performance. When your business can do all three, you have a better idea of what your business needs to do to stay competitive. You have reports that show you where you have been, scorecards and real-time monitoring that show what is happening now and predictive analytics to show where your business is headed. This paper explains the advantages of a BI solution that includes predictive analytics.
BI, Analytics and Big Data A Modern-Day Perspective (Slide Deck)
Big Data Analytics for Manufacturing Internet of Things: Opportunities, Challenges and Enabling Technologies The recent advances in information and communication technology (ICT) have promoted the evolution of the conventional computer-aided manufacturing industry to smart data-driven manufacturing. Data analytics on massive manufacturing data can extract huge business value, but it also raises research challenges due to the heterogeneous data types, enormous volume and real-time velocity of manufacturing data. This paper provides an overview of big data analytics in the manufacturing Internet of Things (MIoT). It starts with a discussion of the necessities and challenges of big data analytics on manufacturing data in MIoT. Then, the enabling technologies of big data analytics of manufacturing data are surveyed and discussed. Finally, this paper outlines the future directions in this promising area.
Big Data Analytics in Action How Your Organization Can Improve its Bottom Line through Better Measurement, Better Decisions and Faster Response to Dynamic Market Conditions.
Big Data Analytics: A Survey The age of big data is now upon us, but traditional data analytics may not be able to handle such large quantities of data. The questions that arise now are how to develop a high-performance platform to efficiently analyze big data and how to design appropriate mining algorithms to extract useful knowledge from it. To discuss this issue in depth, this paper begins with a brief introduction to data analytics, followed by a discussion of big data analytics. Some important open issues and further research directions are also presented for the next steps in big data analytics.
Big Data and Fog Computing Fog computing serves as a computing layer that sits between the edge devices and the cloud in the network topology. Fog nodes have more compute capacity than edge devices but much less than cloud data centers. They typically have high uptime and always-on Internet connectivity. Applications that make use of the fog can avoid the network performance limitations of cloud computing while being less resource constrained than edge computing. As a result, they offer a useful balance of the current paradigms. This article explores various aspects of fog computing in the context of big data.
Big Data and Machine Learning with an Actuarial Perspective (Slide Deck)
Big Data and the Creative Destruction of Today’s Business Models
Big data and the democratisation of decisions In August 2012 the Economist Intelligence Unit conducted a survey sponsored by Alteryx of 241 global executives to gauge their perceptions of big data adoption. Fifty-three percent of respondents are board members or C-suite executives, including 66 CEOs, presidents or managing directors. Those polled are based in North America (34%), the Asia-Pacific region (27%), Western Europe (25%), the Middle East and Africa (6%), Latin America (5%) and Eastern Europe (4%). Half of executives work for companies with revenue that exceeds US$500m. Executives hail from 18 sectors and represent 14 functional roles, including general management (30%), strategy and business development (18%), finance (17%) and marketing and sales (10%).
Big Data and the Internet of Things Advances in sensing and computing capabilities are making it possible to embed increasing computing power in small devices. This has enabled the sensing devices not just to passively capture data at very high resolution but also to take sophisticated actions in response. Combined with advances in communication, this is resulting in an ecosystem of highly interconnected devices referred to as the Internet of Things – IoT. In conjunction, advances in machine learning have allowed building models on these ever-increasing amounts of data. Consequently, devices all the way from heavy assets such as aircraft engines to wearables such as health monitors can now not only generate massive amounts of data but also draw on aggregate analytics to ‘improve’ their performance over time. Big data analytics has been identified as a key enabler for the IoT. In this chapter, we discuss various avenues of the IoT where big data analytics either is already making a significant impact or is on the cusp of doing so. We also discuss social implications and areas of concern.
Big Data for Big Business A Taxonomy of Data-driven Business Models used by Start-up Firms This paper reports a study which provides a series of implications that may be particularly helpful to companies already leveraging ‘big data´ for their businesses or planning to do so. The Data Driven Business Model (DDBM) framework represents a basis for the analysis and clustering of business models. For practitioners the dimensions and various features may provide guidance on possibilities to form a business model for their specific venture. The framework allows identification and assessment of available potential data sources that can be used in a new DDBM. It also provides comprehensive sets of potential key activities as well as revenue models. The identified business model types can serve as both inspiration and blueprint for companies considering creating new data-driven business models. Although the focus of this paper was on business models in the start-up world, the key findings presumably also apply to established organisations to a large extent. The DDBM can potentially be used and tested by established organisations across different sectors in future research.
Big Data for Finance According to the 2014 IDG Enterprise Big Data research report, companies are intensifying their efforts to derive value through big data initiatives with nearly half (49%) of respondents already implementing big data projects or in the process of doing so in the future. Further, organizations are seeing exponential growth in the amount of data managed with an expected increase of 76% within the next 12-18 months. With growth there are opportunities as well as challenges. Among those facing the big data challenge are finance executives, as this extraordinary growth presents a unique opportunity to leverage data assets like never before. As the 3 V´s of big data (volume, velocity and variety) continue to grow, so too does the opportunity for finance sector firms to capitalize on this data for strategic advantage. Finance professionals are accomplished in collecting, analyzing and benchmarking data, so they are in a unique position to provide a new and critical service – making big data more manageable while condensing vast amounts of information into actionable business insights.
Big Data Gets Personal Big data and personal data are converging to shape the Internet´s most surprising consumer products. They´ll predict your needs and store your memories – if you let them.
Big Data in Big Companies Big data burst upon the scene in the first decade of the 21st century, and the first organizations to embrace it were online and startup firms. Arguably, firms like Google, eBay, LinkedIn, and Facebook were built around big data from the beginning. They didn´t have to reconcile or integrate big data with more traditional sources of data and the analytics performed upon them, because they didn´t have those traditional forms. They didn´t have to merge big data technologies with their traditional IT infrastructures because those infrastructures didn´t exist. Big data could stand alone, big data analytics could be the only focus of analytics, and big data technology architectures could be the only architecture. Consider, however, the position of large, well-established businesses. Big data in those environments shouldn´t be separate, but must be integrated with everything else that´s going on in the company. Analytics on big data have to coexist with analytics on other types of data. Hadoop clusters have to do their work alongside IBM mainframes. Data scientists must somehow get along and work jointly with mere quantitative analysts. In order to understand this coexistence, we interviewed 20 large organizations in the early months of 2013 about how big data fit in to their overall data and analytics environments. Overall, we found the expected co-existence; in not a single one of these large organizations was big data being managed separately from other types of data and analytics. The integration was in fact leading to a new management perspective on analytics, which we´ll call ‘Analytics 3.0.’ In this paper we´ll describe the overall context for how organizations think about big data, the organizational structure and skills required for it…etc. We´ll conclude by describing the Analytics 3.0 era.
Big Data Machine Learning: Patterns for Predictive Analytics (RefCard)
Big data maturity: An action plan for policymakers and executives Big data have the potential to improve or transform existing business operations and reshape entire economic sectors. Big data can pave the way for disruptive, entrepreneurial companies and allow new industries to emerge. The technological aspect is important, but insufficient to allow big data to show their full potential and to stop companies from feeling swamped by this information. What matters is to reshape internal decision-making culture so that executives base their judgments on data rather than hunches. Research already indicates that companies that have managed this are more likely to be productive and profitable than the competition. Organizations need to understand where they are in terms of big data maturity, an approach that allows them to assess progress and identify necessary initiatives. Judging maturity requires looking at environment readiness, how far governments have provided the necessary legal and regulatory frameworks, and information and communications technology (ICT) infrastructure; an organization´s internal capabilities and how ready it is to implement big data initiatives; and the many and more complicated methods for using big data, which can mean simple efficiency gains or revamping a business model. The ultimate maturity level involves transforming the business model to be data-driven, which requires significant investment over many years. Policymakers should pay particular attention to environment readiness. They should present citizens with a compelling case for the benefits of big data. This means addressing privacy concerns and seeking to harmonize regulations around data privacy globally. Policymakers should establish an environment that facilitates the business viability of the big data sector (such as data, service, or IT system providers), and they should take educational measures to address the shortage of big data specialists. As big data become ubiquitous in public and private organizations, their use will become a source of national and corporate competitive advantage.
Big Data Meet Cyber-Physical Systems: A Panoramic Survey The world is witnessing an unprecedented growth of cyber-physical systems (CPS), which are foreseen to revolutionize our world via creating new services and applications in a variety of sectors such as environmental monitoring, mobile-health systems, intelligent transportation systems and so on. The information and communication technology (ICT) sector is experiencing a significant growth in data traffic, driven by the widespread usage of smartphones, tablets and video streaming, along with the significant growth of sensor deployments that are anticipated in the near future. This is expected to dramatically increase the growth rate of raw sensed data. In this paper, we present the CPS taxonomy via providing a broad overview of data collection, storage, access, processing and analysis. Compared with other survey papers, this is the first panoramic survey on big data for CPS, where our objective is to provide a panoramic summary of different CPS aspects. Furthermore, CPS require cybersecurity to protect them against malicious attacks and unauthorized intrusion, which becomes a challenge with the enormous amount of data that is continuously being generated in the network. Thus, we also provide an overview of the different security solutions proposed for CPS big data storage, access and analytics. We also discuss big data meeting green challenges in the context of CPS.
Big Data Quality: A systematic literature review and future research directions One of the challenges manifested after the global growth of social networks and the exponential growth of user-generated data is to identify user needs based on the data they share or tend to like. ‘Big Data’ is a term referring to data that exist in huge volume and various formats, i.e. structured or semi-structured. The inherent features of this data have forced organizations to seek to identify desirable patterns amongst big data and make their fundamental decisions based on this information, in order to improve their customer services and enhance their business. Unless the big data being used is of good quality, business needs cannot be expected to be met. As a result, big data quality needs to be taken seriously. Since there is no systematic review in the big data quality area, this study aims to present a systematic literature review of the research efforts on big data quality for those researchers who attempt to enter this area. In this systematic review, and after determining the basic requirements, a total of 419 studies are initially considered to be relevant. Then, with a review of the abstracts of the studies, 170 papers are included and ultimately, after the complete study, 88 papers have been added to the final papers pool. Through careful study and analysis of these papers, the desired information has been extracted. As a result, a research tree is presented that divides the studies based on the type of processing, task, and technique. Then the active venues and other interesting profiles, as well as the classification of the new challenges of this field, are discussed.
Big Data Systems Meet Machine Learning Challenges: Towards Big Data Science as a Service Recently, we have been witnessing huge advancements in the scale of data we routinely generate and collect in pretty much everything we do, as well as our ability to exploit modern technologies to process, analyze and understand this data. The intersection of these trends is what is nowadays called Big Data Science. Cloud computing represents a practical and cost-effective solution for supporting Big Data storage and processing and for sophisticated analytics applications. We analyze in detail the building blocks of the software stack for supporting big data science as a commodity service for data scientists. We provide various insights about the latest ongoing developments and open challenges in this domain.
Big Data Visualization Tools Data visualization is the presentation of data in a pictorial or graphical format, and a data visualization tool is the software that generates this presentation. Data visualization provides users with intuitive means to interactively explore and analyze data, enabling them to effectively identify interesting patterns, infer correlations and causalities, and supports sense-making activities.
Big Data Visualization: Turning Big Data Into Big Insights This white paper provides valuable information about visualization-based data discovery tools and how they can help IT decision-makers derive more value from big data. Topics include: • An overview of the IT landscape and the challenges that are leading more businesses to look for alternatives to traditional business intelligence tools • A description of the features and benefits of visualization-based data discovery tools • Guidance and suggestions on data governance, and ways to protect the quality of big data while facilitating self-service business intelligence • Several usage examples of visualization-based data discovery tools from TIBCO* Software, the world´s second-largest data discovery vendor
Big Data: Harnessing the Power of Big Data through Education and data-driven Decision Making Companies have realized they need to hire data scientists, academic institutions are scrambling to put together data-science programs, and publications are touting data science as a hot – even ‘sexy’ – career choice. However, there is confusion about what exactly data science is, and this confusion could lead to disillusionment as the concept diffuses into meaningless buzz. In this article, we argue that there are good reasons why it has been hard to pin down exactly what data science is. One reason is that data science is intricately intertwined with other important concepts also of growing importance, such as big data and data-driven decision making. Another reason is the natural tendency to associate what a practitioner does with the definition of the practitioner´s field; this can result in overlooking the fundamentals of the field. We believe that trying to define the boundaries of data science precisely is not of the utmost importance. We can debate the boundaries of the field in an academic setting, but in order for data science to serve business effectively, it is important (i) to understand its relationships to other important related concepts, and (ii) to begin to identify the fundamental principles underlying data science. Once we embrace (ii), we can much better understand and explain exactly what data science has to offer. Furthermore, only once we embrace (ii) should we be comfortable calling it data science. In this article, we present a perspective that addresses all these concepts. We close by offering, as examples, a partial list of fundamental principles underlying data science.
Big Data: New Tricks for Econometrics Computers are now involved in many economic transactions and can capture data associated with these transactions, which can then be manipulated and analyzed. Conventional statistical and econometric techniques such as regression often work well, but there are issues unique to big datasets that may require different tools. First, the sheer size of the data involved may require more powerful data manipulation tools. Second, we may have more potential predictors than appropriate for estimation, so we need to do some kind of variable selection. Third, large datasets may allow for more flexible relationships than simple linear models. Machine learning techniques such as decision trees, support vector machines, neural nets, deep learning, and so on may allow for more effective ways to model complex relationships. In this essay, I will describe a few of these tools for manipulating and analyzing big data. I believe that these methods have a lot to offer and should be more widely known and used by economists. In fact, my standard advice to graduate students these days is to go to the computer science department and take a class in machine learning. There have been very fruitful collaborations between computer scientists and statisticians in the last decade or so, and I expect collaborations between computer scientists and econometricians will also be productive in the future.
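A brief, hypothetical illustration of the variable-selection point made in the essay entry above: with more candidate predictors than are useful, an L1-penalized regression (the lasso) shrinks most coefficients to exactly zero. The data are simulated and scikit-learn is used purely for convenience; the essay itself discusses such tools only in general terms.

import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(6)
n, p = 300, 50                               # many candidate predictors...
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[[0, 3, 7]] = [2.0, -1.5, 1.0]           # ...but only three actually matter
y = X @ beta + rng.normal(size=n)

# Cross-validated lasso picks the penalty strength and zeroes out irrelevant predictors.
model = LassoCV(cv=5, random_state=0).fit(X, y)
selected = np.flatnonzero(model.coef_)
print("selected predictors:", selected)      # typically close to [0, 3, 7]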
Big data: The next frontier for innovation, competition, and productivity This report contributes to MGI´s mission to help global leaders understand the forces transforming the global economy, improve company performance, and work for better national and international policies. As with all MGI research, we would like to emphasize that this work is independent and has not been commissioned or sponsored in any way by any business, government, or other institution.
Big Workflow: More than Just Intelligent Workload Management for Big Data Big data applications represent a fast-growing category of high-value applications that are increasingly employed by business and technical computing users. However, they have exposed an inconvenient dichotomy in the way resources are utilized in data centers. Conventional enterprise and web-based applications can be executed efficiently in virtualized server environments, where resource management and scheduling is generally confined to a single server. By contrast, data-intensive analytics and technical simulations demand large aggregated resources, necessitating intelligent scheduling and resource management that spans a computer cluster, cloud, or entire data center. Although these tools exist in isolation, they are not available in a general-purpose framework that allows them to interoperate easily and automatically within existing IT infrastructure. A new approach, known as ‘Big Workflow,’ is being created by Adaptive Computing to address the needs of these applications. It is designed to unify public clouds, private clouds, Map Reduce-type clusters, and technical computing clusters. Specifically, Big Workflow will: • Schedule, optimize and enforce policies across the data center • Enable data-aware workflow coordination across storage and compute silos • Integrate with external workflow automation tools Such a solution will provide a much-needed toolset for managing big data applications, shortening timelines, simplifying operations, maximizing resource utilization, and preserving existing investments.
Blending Transactions and Analytics in a Single In-Memory Platform: Key to the Real-Time Enterprise This white paper discusses the issues involved in the traditional practice of deploying transactional and analytic applications on separate platforms using separate databases. It analyzes the results from a user survey, conducted on SAP’s behalf by IDC, that explores these issues. The paper then considers how SAP HANA, with its combination of in-memory data management and its ability to handle both transactions and analytics in real time, can resolve these issues. It explores how businesses may find opportunities for innovation (such as the ability to engage in a richer dialog with a customer based on analysis of the latest transactional information), for speed (with the ability to provide faster access to information to make timely decisions), and for simplification of the IT landscape with a single in-memory platform.
Blind Source Separation: Fundamentals and Recent Advances (A Tutorial Overview Presented at SBrT-2001) Blind source separation (BSS), i.e., the decoupling of unknown signals that have been mixed in an unknown way, has been a topic of great interest in the signal processing community for the last decade, covering a wide range of applications in such diverse fields as digital communications, pattern recognition, biomedical engineering, and financial data analysis, among others. This course aims at an introduction to the BSS problem via an exposition of well-known and established as well as some more recent approaches to its solution. A unified way is followed in presenting the various results so as to more easily bring out their similarities/differences and emphasize their relative advantages/disadvantages. Only a representative sample of the existing knowledge on BSS will be included in this course. The interested readers are encouraged to consult the list of bibliographical references for more details on this exciting and always active research topic.
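As a generic illustration of the BSS problem described in the entry above (not material from the tutorial itself), the following sketch mixes two synthetic sources and recovers them, up to the usual scaling and permutation ambiguity, with FastICA from scikit-learn; the sources and mixing matrix are invented for the example.

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(3)
t = np.linspace(0, 8, 2000)

# Two hypothetical source signals: a sinusoid and a square wave.
s1 = np.sin(2 * t)
s2 = np.sign(np.sin(3 * t))
S = np.c_[s1, s2]
S += 0.02 * rng.normal(size=S.shape)       # small observation noise

# Mix the sources with a mixing matrix unknown to the separator.
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])
X = S @ A.T                                 # observed mixtures

# Recover the sources blindly; order and scale are inherently ambiguous.
ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)
print("estimated mixing matrix:\n", ica.mixing_)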
Blockchain and Artificial Intelligence It is undeniable that artificial intelligence (AI) and blockchain concepts are spreading at a phenomenal rate. Both technologies have distinct degrees of technological complexity and multi-dimensional business implications. However, a common misunderstanding about the blockchain concept, in particular, is that blockchain is decentralized and is not controlled by anyone. But the underlying development of a blockchain system is still attributed to a cluster of core developers. Take a smart contract as an example: it is essentially a collection of code (or functions) and data (or states) that are programmed and deployed on a blockchain (say, Ethereum) by different human programmers. It is thus, unfortunately, less likely to be free of loopholes and flaws. In this article, through a brief overview of how artificial intelligence could be used to deliver bug-free smart contracts so as to achieve the goal of blockchain 2.0, we emphasize that the blockchain implementation can be assisted or enhanced via various AI techniques. The alliance of AI and blockchain is expected to create numerous possibilities.
Blockchain for Future Smart Grid: A Comprehensive Survey The concept of the smart grid has been introduced as a new vision of the conventional power grid to figure out an efficient way of integrating green and renewable energy technologies. In this way, the Internet-connected smart grid, also called the energy Internet, is emerging as an innovative approach to ensure that energy is available from anywhere at any time. The ultimate goal of these developments is to build a sustainable society. However, integrating and coordinating a large number of growing connections can be a challenging issue for the traditional centralized grid system. Consequently, the smart grid is undergoing a transformation from its centralized form to a decentralized topology. On the other hand, blockchain has some excellent features which make it a promising application for the smart grid paradigm. In this paper, we aim to provide a comprehensive survey on the application of blockchain in the smart grid. As such, we identify the significant security challenges of smart grid scenarios that can be addressed by blockchain. Then, we present a number of recent blockchain-based research works in the literature addressing security issues in the area of smart grid. We also summarize several related practical projects, trials, and products that have emerged recently. Finally, we discuss essential research challenges and future directions of applying blockchain to smart grid security issues.
Blockchain for Internet of Things: A Survey The Internet of Things (IoT) is reshaping the incumbent industry into a smart industry featuring data-driven decision-making. However, intrinsic features of IoT result in a number of challenges such as decentralization, poor interoperability, privacy and security vulnerabilities. Blockchain technology brings opportunities for addressing the challenges of IoT. In this paper, we investigate the integration of blockchain technology with IoT. We name this synthesis of blockchain and IoT Blockchain of Things (BCoT). This paper presents an in-depth survey of BCoT and discusses the insights of this new paradigm. In particular, we first briefly introduce IoT and discuss the challenges of IoT. Then we give an overview of blockchain technology. We next concentrate on introducing the convergence of blockchain and IoT and presenting a proposal for a BCoT architecture. We further discuss the issues of using blockchain for 5G and beyond in IoT as well as industrial applications of BCoT. Finally, we outline the open research directions in this promising area.
Blockchain for the IoT: Opportunities and Challenges Blockchain technology has been transforming the financial industry and has created a new crypto-economy in the last decade. The foundational concepts such as decentralized trust and distributed ledger are promising for distributed, and large-scale Internet of Things (IoT) applications. However, the applications of Blockchain beyond cryptocurrencies in this domain are few and far between because of the lack of understanding and inherent architectural challenges. In this paper, we describe the opportunities for applications of blockchain for the IoT and examine the challenges involved in architecting Blockchain-based IoT applications.
Blockchain Games: A Survey With the support of blockchain systems, cryptocurrency has changed the world of virtual assets. Digital games, especially those with massive multi-player scenarios, will be significantly impacted by this novel technology. However, there are insufficient academic studies on this topic. In this work, we fill this gap by surveying state-of-the-art blockchain games. We discuss blockchain integration for games and then categorize existing blockchain games by their genres and technical platforms. Moreover, by analyzing the industrial trend with a statistical approach, we envision the future of blockchain games from technological and commercial perspectives.
Blockchain Technology Overview Blockchains are tamper evident and tamper resistant digital ledgers implemented in a distributed fashion (i.e., without a central repository) and usually without a central authority (i.e., a bank, company, or government). At their basic level, they enable a community of users to record transactions in a shared ledger within that community, such that under normal operation of the blockchain network no transaction can be changed once published. This document provides a high-level technical overview of blockchain technology. The purpose is to help readers understand how blockchain technology works.
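The tamper-evidence property sketched in the entry above can be illustrated with a toy hash chain; this is a deliberately simplified, hypothetical example and not how any production blockchain is implemented. Each block stores the hash of its predecessor, so altering an earlier record invalidates every later link.

import hashlib
import json

def block_hash(block):
    # Hash the block's contents deterministically.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def add_block(chain, transactions):
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"transactions": transactions, "prev_hash": prev_hash})

def is_valid(chain):
    # Check that every block still points to the hash of its predecessor.
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain = []
add_block(chain, ["alice pays bob 5"])
add_block(chain, ["bob pays carol 2"])
print(is_valid(chain))                               # True

chain[0]["transactions"] = ["alice pays bob 500"]    # tamper with history
print(is_valid(chain))                               # False: the chain no longer verifies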
Blockchain: Emerging Applications and Use Cases Blockchain, also known as distributed ledger technology, stores transactions/operations in a chain of blocks in a distributed manner without needing a trusted third party. Blockchain is proven to be immutable, which helps with integrity and accountability and, to some extent, confidentiality through a pair of public and private keys. Blockchain has been in the spotlight after the successful boom of Bitcoin. There have been efforts to leverage the salient features of Blockchain for different applications and use cases. This paper presents a comprehensive survey of applications and use cases of Blockchain technology. Specifically, readers of this paper can gain a thorough understanding of the applications and use cases of Blockchain technology.
Brain Intelligence: Go Beyond Artificial Intelligence Artificial intelligence (AI) is an important technology that supports daily social life and economic activities. It contributes greatly to the sustainable growth of Japan’s economy and solves various social problems. In recent years, AI has attracted attention as a key for growth in developed regions such as Europe and the United States and in developing countries such as China and India. The attention has been focused mainly on developing new artificial intelligence information communication technology (ICT) and robot technology (RT). Although recently developed AI technology certainly excels in extracting certain patterns, there are many limitations. Most ICT models are overly dependent on big data, lack a self-idea function, and are complicated. In this paper, rather than merely developing next-generation artificial intelligence technology, we aim to develop a new concept of general-purpose intelligence cognition technology called Beyond AI. Specifically, we plan to develop an intelligent learning model called Brain Intelligence (BI) that generates new ideas about events without having experienced them by using artificial life with an imagine function. We will also conduct demonstrations of the developed BI intelligence learning model on automatic driving, precision medical care, and industrial robots.
Breaking Data Science Open Deliver Collaboration, Self-Service and Production Deployment with Open Data Science Data science has burst into public attention over the past few years as perhaps the hottest and most lucrative technology field. No longer just a buzzword for advanced analytics, data science is poised to change everything about an organization: its potential customers, expansion plans, engineering and manufacturing process, how it chooses and interacts with suppliers and more. The leading edge of this tsunami is a combination of innovative business and technology trends that promise a more intelligent future based on Open Data Science. Open Data Science is a movement that makes the open source tools of data science (data, analytics and computation) work together as a connected ecosystem. (Author bio: Christine Doig, @ch_doig, is a senior data scientist at Continuum Analytics, where she has worked on several projects, including MEMEX, a DARPA-funded open data science project to help stop human trafficking. She has 5+ years of experience in analytics, operations research, and machine learning in a variety of industries.)
Bridging the gap between hierarchical network representation and functional analysis RedeR is an R-based package combined with a Java application for dynamic network visualization and manipulation. It implements a callback engine by using a low-level R-to-Java interface to build and run common plugins. In this sense, RedeR takes advantage of R to run robust statistics, while the R-to-Java interface bridges the gap between network analysis and visualization. RedeR is designed to deal with three key challenges in network analysis. Firstly, biological networks are modular and hierarchical, so network visualization needs to take advantage of such structural features. Secondly, network analysis relies on statistical methods, many of which are already available in resources like CRAN or Bioconductor. However, the missing link between advanced visualization and statistical computing makes it hard to take full advantage of R packages for network analysis. Thirdly, in larger networks user input is needed to focus the view of the network on the biologically relevant parts, rather than relying on an automatic layout function. RedeR is designed to address these challenges (additional information is available at Castro et al.).
Brief Review of Computational Intelligence Algorithms Computational Intelligence algorithms have gained a lot of attention from researchers in recent years due to their ability to deliver near-optimal solutions. In this paper we propose a new hierarchy which classifies algorithms based on their sources of inspiration. The algorithms have been divided into two broad domains, namely modeling of the human mind and nature-inspired intelligence. Algorithms in the modeling-of-the-human-mind domain take their motivation from the manner in which humans perceive and deal with information. Similarly, algorithms in the nature-inspired intelligence domain are based on ordinary phenomena occurring in nature. The latter has further been broken into swarm intelligence, geosciences and artificial immune systems. Geoscience-based algorithms form the newest domain, with algorithms based on geographic phenomena on the Earth's surface. A comprehensive tabular comparison is made amongst algorithms in each domain across various attributes such as problem-solving method, application, characteristics and more. For further insights, we examine a variant of every algorithm and its implementation for a specific application. To better understand performance and efficiency, we compare the performance of selected algorithms on the Traveling Salesman Problem.
Brief: Real-Time Speech Analytics – Still More Sizzle Than Steak Most customer service organizations record phone interactions with their customers. If they get around to analyzing those recordings, whatever they find can´t change the outcome of those calls — they are long since over. Vendors of real-time speech analytics tools promise to allow companies to intervene at the moment of truth, while the customer and the contact center agent are still talking. This brief discusses the hurdles application development and delivery (ADandD) pros will need to overcome to justify the expenditure on this technology and the steps they will need to take to prepare for a world of alerts generated in real-time based on customer conversations.
Build a Powerful Business Case for Data Quality with Metrics Money and resources wasted; sales missed; extra costs incurred. Recent research by industry analyst firm Gartner shows that the shocking price that companies are paying because of poor quality data adds up to a staggering $8.2 million annually.
Build, Compute, Critique, Repeat: Data Analysis with Latent Variable Models We survey latent variable models for solving data-analysis problems. A latent variable model is a probabilistic model that encodes hidden patterns in the data. We uncover these patterns from their conditional distribution and use them to summarize data and form predictions. Latent variable models are important in many fields, including computational biology, natural language processing, and social network analysis. Our perspective is that models are developed iteratively: We build a model, use it to analyze data, assess how it succeeds and fails, revise it, and repeat. We describe how new research has transformed these essential activities. First, we describe probabilistic graphical models, a language for formulating latent variable models. Second, we describe mean field variational inference, a generic algorithm for approximating conditional distributions. Third, we describe how to use our analyses to solve problems: exploring the data, forming predictions, and pointing us in the direction of improved models.
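As a small, generic illustration of the build-analyze-critique loop for a latent variable model (not code from the survey above), the sketch below fits a Gaussian mixture, whose mixture assignments play the role of the hidden variables, and inspects the conditional (posterior) probabilities of those assignments; the data are simulated for the example.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)

# Simulated data from two hypothetical clusters; the cluster labels are latent.
X = np.vstack([rng.normal(-2.0, 0.7, size=(150, 2)),
               rng.normal(+2.0, 1.0, size=(150, 2))])

# Build: a two-component Gaussian mixture model fit by expectation-maximization.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# Analyze: posterior responsibilities p(z | x) summarize the hidden pattern.
resp = gmm.predict_proba(X[:5])
print("posterior over latent components for 5 points:\n", resp.round(3))

# Critique: an information criterion such as BIC guides model revision, e.g. more components.
print("BIC with 2 components:", gmm.bic(X))
print("BIC with 3 components:", GaussianMixture(n_components=3, random_state=0).fit(X).bic(X))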
Building Data Science Teams Starting in 2008, Jeff Hammerbacher (@hackingdata) and I sat down to share our experiences building the data and analytics groups at Facebook and LinkedIn. In many ways, that meeting was the start of data science as a distinct professional specialization (see ‘What Makes a Data Scientist?’ on page 11 for the story on how we came up with the title ‘Data Scientist’). Since then, data science has taken on a life of its own. The hugely positive response to ‘What Is Data Science?,’ a great introduction to the meaning of data science in today´s world, showed that we were at the start of a movement. There are now regular meetups, well-established startups, and even college curricula focusing on data science. As McKinsey´s big data research report and LinkedIn´s data indicate (see Figure 1), data science talent is in high demand. This increase in the demand for data scientists has been driven by the success of the major Internet companies. Google, Facebook, LinkedIn, and Amazon have all made their marks by using data creatively: not just warehousing data, but turning it into something of value. Whether that value is a search result, a targeted advertisement, or a list of possible acquaintances, data science is producing products that people want and value. And it´s not just Internet companies: Walmart doesn´t produce ‘data products’ as such, but they´re well known for using data to optimize every aspect of their retail operations. Given how important data science has grown, it´s important to think about what data scientists add to an organization, how they fit in, and how to hire and build effective data science teams.
Building High-level Features Using Large Scale Unsupervised Learning We consider the problem of building high-level, class-specific feature detectors from only unlabeled data. For example, is it possible to learn a face detector using only unlabeled images? To answer this, we train a 9-layered locally connected sparse autoencoder with pooling and local contrast normalization on a large dataset of images (the model has 1 billion connections, the dataset has 10 million 200×200 pixel images downloaded from the Internet). We train this network using model parallelism and asynchronous SGD on a cluster with 1,000 machines (16,000 cores) for three days. Contrary to what appears to be a widely-held intuition, our experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not. Control experiments show that this feature detector is robust not only to translation but also to scaling and out-of-plane rotation. We also find that the same network is sensitive to other high-level concepts such as cat faces and human bodies. Starting with these learned features, we trained our network to obtain 15.8% accuracy in recognizing 22,000 object categories from ImageNet, a leap of 70% relative improvement over the previous state-of-the-art.
Building Machines that Learn and Think for Themselves: Commentary on Lake et al., Behavioral and Brain Sciences, 2017 We agree with Lake and colleagues on their list of key ingredients for building humanlike intelligence, including the idea that model-based reasoning is essential. However, we favor an approach that centers on one additional ingredient: autonomy. In particular, we aim toward agents that can both build and exploit their own internal models, with minimal human hand-engineering. We believe an approach centered on autonomous learning has the greatest chance of success as we scale toward real-world complexity, tackling domains for which ready-made formal models are not available. Here we survey several important examples of the progress that has been made toward building autonomous agents with humanlike abilities, and highlight some outstanding challenges.
Building Production-Ready Predictive Analytics There´s a part of data science that you never hear about: the production. Everybody talks about how to build models, but not many people worry about how to actually use those models. Yet production issues are the reason many companies fail to see value come from their data science efforts. We wondered how companies handled their production processes and environments to build production-ready data products, and we figured the easiest way to find out was to ask them. We conducted a worldwide survey and asked thousands of companies. And we got our answers. After analyzing those answers, we isolated four different ways companies are dealing with production today, and we put together a series of recommendations on how to build production-ready data science projects.
Building Real-Time Data Pipelines Imagine you had a time machine that could go back one minute, or an hour. Think about what you could do with it. From the perspective of other people, it would seem like there was nothing you couldn´t do, no contest you couldn´t win. In the real world, there are three basic ways to win. One way is to have something, or to know something, that your competition does not. Nice work if you can get it. The second way to win is to simply be more intelligent. However, the number of people who think they are smarter is much larger than the number of people who actually are smarter. The third way is to process information faster so you can make and act on decisions faster. Being able to make more decisions in less time gives you an advantage in both information and intelligence. It allows you to try many ideas, correct the bad ones, and react to changes before your competition. If your opponent cannot react as fast as you can, it does not matter what they have, what they know, or how smart they are. Taken to extremes, it´s almost like having a time machine. An example of the third way can be found in high-frequency stock trading. Every trading desk has access to a large pool of highly intelligent people, and pays them well. All of the players have access to the same information at the same time, at least in theory. Being more or less equally smart and informed, the most active area of competition is the end-to-end speed of their decision loops. In recent years, traders have gone to the trouble of building their own wireless long-haul networks, to exploit the fact that microwaves move through the air 50% faster than light can pulse through fiber optics. This allows them to execute trades a crucial millisecond faster. Finding ways to shorten end-to-end information latency is also a constant theme at leading tech companies. They are forever working to reduce the delay between something happening out there in the world or in their huge clusters of computers, and when it shows up on a graph. At Facebook in the early 2010s, it was normal to wait hours after pushing new code to discover whether everything was working efficiently. The full report came in the next day. After building their own distributed in-memory database and event pipeline, their information loop is now on the order of 30 seconds, and they push at least two full builds per day. Instead of slowing down as they got bigger, Facebook doubled down on making more decisions faster. What is your system´s end-to-end latency? How long is your decision loop, compared to the competition? Imagine you had a system that was twice as fast. What could you do with it? This might be the most important question for your business. In this book we´ll explore new models of quickly processing information end to end that are enabled by long-term hardware trends, learnings from some of the largest and most successful tech companies, and surprisingly powerful ideas that have survived the test of time.
Business Analytics for Manufacturing: Four Ways to Increase Efficiency and Performance Whether the economy is strong or weak, the fundamental strategies for surviving and thriving still hold true. Manufacturers have to be highly efficient to meet demand and supply requirements. Costs and resources also have to be managed carefully and intelligently. At the same time, companies are considering new tactics: inventory optimization, maintenance operations, intelligent supply chains and leveraging technology as a focal point of business strategy. In order to be successful your company needs access to critical information and visibility into how well your business, your market and your competitors are responding to today´s challenging and changing times. …
Business Models for the Data Economy Whether you call it Big Data, data science, or simply analytics, modern businesses see data as a gold mine. Sometimes they already have this data in hand and understand that it is central to their activities. Other times, they uncover new data that fills a perceived gap, or seemingly ‘useless’ data generated by other processes. Whatever the case, there is certainly value in using data to advance your business.
Business Process Deviance Mining: Review and Evaluation Business process deviance refers to the phenomenon whereby a subset of the executions of a business process deviate, in a negative or positive way, with respect to its expected or desirable outcomes. Deviant executions of a business process include those that violate compliance rules, or executions that undershoot or exceed performance targets. Deviance mining is concerned with uncovering the reasons for deviant executions by analyzing business process event logs. This article provides a systematic review and comparative evaluation of deviance mining approaches based on a family of data mining techniques known as sequence classification. Using real-life logs from multiple domains, we evaluate a range of feature types and classification methods in terms of their ability to accurately discriminate between normal and deviant executions of a process. We also analyze the interestingness of the rule sets extracted using different methods. We observe that feature sets extracted using pattern mining techniques only slightly outperform simpler feature sets based on counts of individual activity occurrences in a trace.
Business-Driven BI: Using New Technologies to Foster Self-Service Access to Insights Self-Service Business Intelligence (BI) has been the holy grail for BI professionals for a long time. Yet almost two-thirds of BI professionals (64%) rate the success of their self-service initiatives ‘average’ or lower. Newcomers to BI struggle even more, with more than half (52%) rating their attempts at self-service BI ‘fair’ or ‘poor.’ One reason for these less-than-stellar numbers is this: Implementing self-service BI is more complex than it looks. It´s not a one-size-fits-all program. BI users come in many different shapes and sizes, each with unique information requirements. This report lays out several frameworks that explain how users interact with information and then maps elements of each to BI functionality and categories of BI tools. This mapping is critical to success with self-service BI….

C

Caching and Distributing Statistical Analyses in R We present the cacher package for R, which provides tools for caching statistical analyses and for distributing these analyses to others in an efficient manner. The cacher package takes objects created by evaluating R expressions and stores them in key-value databases. These databases of cached objects can subsequently be assembled into packages for distribution over the web. The cacher package also provides tools to help readers examine the data and code in a statistical analysis and reproduce, modify, or improve upon the results. In addition, readers can easily conduct alternate analyses of the data. We describe the design and implementation of the cacher package and provide two examples of how the package can be used for reproducible research. This vignette was originally published as Peng (2008).
Calling R from .NET: a case-study using Rapid NCA, the non-compartmental analysis workflow tool (Slide Deck)
Can Autism be Catered with Artificial Intelligence-Assisted Intervention Technology? A Literature Review This article presents an extensive literature review of technology-based intervention methodologies for individuals facing Autism Spectrum Disorder (ASD). Reviewed methodologies include: contemporary Computer Aided Systems (CAS), Computer Vision Assisted Technologies (CVAT) and Virtual Reality (VR) or Artificial Intelligence-Assisted interventions. The research over the past decade has provided enough demonstrations that individuals with ASD have a strong interest in technology-based interventions and can connect with them for longer durations without facing any trouble. These technology-based interventions are useful for individuals facing autism in clinical settings as well as at home and in classrooms. Despite showing great promise, research in developing an advanced technology-based intervention that is clinically quantitative for ASD is minimal. Moreover, clinicians are generally not convinced about the potential of technology-based interventions due to the non-empirical nature of published results. A major reason behind this non-acceptance is that a vast majority of studies on distinct intervention methodologies do not follow any specific standard or research design. Consequently, the data produced by these studies is minimally appealing to the clinical community. This research domain has a vast social impact: as per official statistics given by the Autism Society of America, autism is the fastest-growing developmental disability in the United States (US). The estimated annual cost in the US for diagnosis and treatment of ASD is 236-262 billion US dollars. The cost of bringing up an ASD individual is estimated to be 1.4 million USD, while statistics show 1% of the world's total population is suffering from ASD.
Can Deep Neural Networks Match the Related Objects? A Survey on ImageNet-trained Classification Models Deep neural networks (DNNs) have shown state-of-the-art performance in a wide range of complicated tasks. In recent years, studies have been actively conducted to analyze the black box characteristics of DNNs and to grasp the learning behaviours, tendencies, and limitations of DNNs. In this paper, we investigate the limitations of DNNs in the image classification task and verify them with a method inspired by cognitive psychology. Through analyzing the failure cases of the ImageNet classification task, we hypothesize that DNNs do not sufficiently learn to associate related classes of objects. To verify how DNNs understand the relatedness between object classes, we conducted experiments on an image database from cognitive psychology. We applied the ImageNet-trained DNNs to the database, consisting of pairs of related and unrelated object images, to compare the feature similarities and determine whether the pairs match each other. In the experiments, we observed that the DNNs show limited performance in determining relatedness between object classes. In addition, the DNNs present somewhat improved performance in discovering relatedness based on similarity, but they perform worse in discovering relatedness based on association. Through these experiments, a novel analysis of the learning behaviour of DNNs is provided and the limitation which needs to be overcome is suggested.
Can Entropy Explain Successor Surprisal Effects in Reading? Human reading behavior is sensitive to surprisal: more predictable words tend to be read faster. Unexpectedly, this applies not only to the surprisal of the word that is currently being read, but also to the surprisal of upcoming (successor) words that have not been fixated yet. This finding has been interpreted as evidence that readers can extract lexical information parafoveally. Calling this interpretation into question, Angele et al. (2015) showed that successor effects appear even in contexts in which those successor words are not yet visible. They hypothesized that successor surprisal predicts reading time because it approximates the reader’s uncertainty about upcoming words. We test this hypothesis on a reading time corpus using an LSTM language model, and find that successor surprisal and entropy are independent predictors of reading time. This independence suggests that entropy alone is unlikely to be the full explanation for successor surprisal effects.
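As a minimal illustration of the two quantities involved (not the paper's LSTM setup), the Python sketch below computes the surprisal of an observed next word and the entropy of the predictive distribution from an assumed, hand-picked probability vector.

```python
import numpy as np

# Hypothetical next-word distribution from some language model, given a context.
# The numbers are illustrative only; the paper derives them from an LSTM.
vocab = ["the", "cat", "sat", "mat"]
probs = np.array([0.55, 0.25, 0.15, 0.05])

# Surprisal of the word that actually occurs next (here "cat"), in bits.
next_word = "cat"
surprisal = -np.log2(probs[vocab.index(next_word)])

# Entropy of the whole predictive distribution: the expected surprisal, i.e.
# the reader's uncertainty about the upcoming word before seeing it.
entropy = -np.sum(probs * np.log2(probs))

print(f"surprisal({next_word}) = {surprisal:.2f} bits")
print(f"entropy of distribution = {entropy:.2f} bits")
```

Successor surprisal depends on which word actually follows, while entropy is defined before the successor is known, which is what allows the two to enter a reading-time regression as separate predictors.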
Can machine learning identify interesting mathematics? An exploration using empirically observed laws We explore the possibility of using machine learning to identify interesting mathematical structures by using certain quantities that serve as fingerprints. In particular, we extract features from integer sequences using two empirical laws, Benford’s law and Taylor’s law, and experiment with various classifiers to identify whether a sequence is nice, important, multiplicative, easy to compute, or related to primes or palindromes.
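As a rough illustration of the Benford-style fingerprint idea (illustrative only, not the paper's exact feature set), the sketch below compares the leading-digit profile of an integer sequence against Benford's law.

```python
import numpy as np

def leading_digit_profile(seq):
    """Frequency of leading digits 1-9 in a sequence of integers."""
    digits = [int(str(abs(n))[0]) for n in seq if n != 0]
    counts = np.bincount(digits, minlength=10)[1:10]
    return counts / counts.sum()

# Benford's law: P(d) = log10(1 + 1/d) for d = 1..9.
benford = np.log10(1 + 1 / np.arange(1, 10))

# Example sequence: powers of 2, which are known to follow Benford's law closely.
powers_of_two = [2 ** k for k in range(1, 200)]
profile = leading_digit_profile(powers_of_two)

# A simple "fingerprint" feature: total variation distance from Benford's law.
tv_distance = 0.5 * np.abs(profile - benford).sum()
print(profile.round(3), benford.round(3), f"TV distance = {tv_distance:.3f}")
```

Such distances (one per empirical law) could then serve as low-dimensional features fed to an ordinary classifier.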
Can Machines Design? An Artificial General Intelligence Approach Can machines design? Can they come up with creative solutions to problems and build tools and artifacts across a wide range of domains? Recent advances in the field of computational creativity and formal Artificial General Intelligence (AGI) provide frameworks for machines with the general ability to design. In this paper we propose to integrate a formal computational creativity framework into the Gödel machine framework. We call this machine a design Gödel machine. Such a machine could solve a variety of design problems by generating novel concepts. In addition, it could change the way these concepts are generated by modifying itself. The design Gödel machine is able to improve its initial design program, once it has proven that a modification would increase its return on the utility function. Finally, we sketch out a specific version of the design Gödel machine which specifically aims at the design of complex software and hardware systems. Future work could be the development of a more formal version of the design Gödel machine and a potential implementation.
Can We Distinguish Machine Learning from Human Learning? What makes a task relatively more or less difficult for a machine compared to a human? Much AI/ML research has focused on expanding the range of tasks that machines can do, with a focus on whether machines can beat humans. Allowing for differences in scale, we can seek interesting (anomalous) pairs of tasks T, T’. We define interesting in this way: the ‘harder to learn’ relation is reversed when comparing human intelligence (HI) to AI. While humans seem to be able to understand problems by formulating rules, ML using neural networks does not rely on constructing rules. We discuss a novel approach where the challenge is to ‘perform well under rules that have been created by human beings.’ We suggest that this provides a rigorous and precise pathway for understanding the difference between the two kinds of learning. Specifically, we suggest a large and extensible class of learning tasks, formulated as learning under rules. With these tasks, both the AI and HI will be studied with rigor and precision. The immediate goal is to find interesting ground-truth rule pairs. In the long term, the goal will be to understand, in a generalizable way, what distinguishes interesting pairs from ordinary pairs, and to define the saliency behind interesting pairs. This may open new ways of thinking about AI, and provide unexpected insights into human learning.
Canonical example of Bayes´ theorem in detail The most common elementary illustration of Bayes´ theorem is medical testing for a rare disease. The example is almost a cliché in probability and statistics books. And yet in my opinion, it´s usually presented too quickly and too abstractly. Here I´m going to risk erring on the side of going too slowly and being too concrete. I´ll work out an example with numbers and no equations before presenting Bayes´ theorem. Then I´ll include a few graphs.
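A minimal numeric version of that canonical example, with assumed prevalence, sensitivity and specificity (the concrete numbers are illustrative, not taken from the article):

```python
# Assumed, illustrative numbers: 0.1% prevalence, 99% sensitivity, 95% specificity.
prevalence = 0.001        # P(disease)
sensitivity = 0.99        # P(test positive | disease)
specificity = 0.95        # P(test negative | no disease)

# Work with a concrete population, as the article suggests.
population = 100_000
sick = population * prevalence                 # 100 people
healthy = population - sick                    # 99,900 people

true_positives = sick * sensitivity            # 99 people
false_positives = healthy * (1 - specificity)  # 4,995 people

# Bayes' theorem in counting form:
# P(disease | positive) = true positives / all positives
posterior = true_positives / (true_positives + false_positives)
print(f"P(disease | positive test) = {posterior:.3f}")   # about 0.019
```

Even with an accurate test, most positives come from the much larger healthy group, so the posterior probability of disease is only about 2%.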
Capitalizing on the power of big data for retail The retail industry is changing dramatically as consumers shop in new ways. With the growing popularity of online shopping and mobile commerce, consumers are using more retail channels than ever before to research products, compare prices, search for promotions, make purchases and provide feedback. Social media has become one of the key channels. Consumers are using social media – and the leading e-commerce platforms that integrate with social media – to find product recommendations, lavish praise, voice complaints, capitalize on product offers and engage in ongoing dialogs with their favorite brands. The multiplication of retail channels and the increasing use of social media are empowering consumers. With a wealth of information readily available online, consumers are now better able to compare products, services and prices – even as they shop in physical stores. When consumers interact with companies publicly through social media, they have greater power to influence other customers or damage a brand. These and other changes in the retail industry are creating important opportunities for retailers. But to capitalize on those opportunities, retailers need ways to collect, manage and analyze a tremendous volume, variety and velocity of data. When point-of-sale (POS) systems were first commercialized, retailers were able to collect large amounts of potentially valuable information, but most of that information remained untapped. The emergence of social media and other consumer-oriented technologies is now introducing even more data to the retail ecosystem. Retailers must handle not only the growing volume of information but also an increasing variety – including both structured and unstructured data. They must also find ways to accommodate the changing nature of this data and the velocity at which it is being produced and collected. If retailers succeed in addressing the challenges of ‘big data,’ they can use this data to generate valuable insights for personalizing marketing and improving the effectiveness of marketing campaigns, optimizing assortment and merchandising decisions, and removing inefficiencies in distribution and operations. Adopting solutions designed to capitalize on this big data allows companies to navigate the shifting retail landscape and drive a positive transformation for the business….
Career Transitions and Trajectories: A Case Study in Computing From artificial intelligence to network security to hardware design, it is well-known that computing research drives many important technological and societal advancements. However, less is known about the long-term career paths of the people behind these innovations. What do their careers reveal about the evolution of computing research? Which institutions were and are the most important in this field, and for what reasons? Can insights into computing career trajectories help predict employer retention? In this paper we analyze several decades of post-PhD computing careers using a large new dataset rich with professional information, and propose a versatile career network model, R^3, that captures temporal career dynamics. With R^3 we track important organizations in computing research history, analyze career movement between industry, academia, and government, and build a powerful predictive model for individual career transitions. Our study, the first of its kind, is a starting point for understanding computing research careers, and may inform employer recruitment and retention mechanisms at a time when the demand for specialized computational expertise far exceeds supply.
Causal inference and the data-fusion problem We review concepts, principles, and tools that unify current approaches to causal analysis and attend to new challenges presented by big data. In particular, we address the problem of data fusion – piecing together multiple datasets collected under heterogeneous conditions (i.e., different populations, regimes, and sampling methods) to obtain valid answers to queries of interest. The availability of multiple heterogeneous datasets presents new opportunities to big data analysts, because the knowledge that can be acquired from combined data would not be possible from any individual source alone. However, the biases that emerge in heterogeneous environments require new analytical tools. Some of these biases, including confounding, sampling selection, and cross-population biases, have been addressed in isolation, largely in restricted parametric models. We here present a general, nonparametric framework for handling these biases and, ultimately, a theoretical solution to the problem of data fusion in causal inference tasks.
Causal inference in statistics: An overview This review presents empirical researchers with recent advances in causal inference, and stresses the paradigmatic shifts that must be undertaken in moving from traditional statistical analysis to causal analysis of multivariate data. Special emphasis is placed on the assumptions that underlie all causal inferences, the languages used in formulating those assumptions, the conditional nature of all causal and counterfactual claims, and the methods that have been developed for the assessment of such claims. These advances are illustrated using a general theory of causation based on the Structural Causal Model (SCM) described in Pearl (2000a), which subsumes and unifies other approaches to causation, and provides a coherent mathematical foundation for the analysis of causes and counterfactuals. In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: (1) queries about the effects of potential interventions (also called ‘causal effects’ or ‘policy evaluation’), (2) queries about probabilities of counterfactuals (including assessment of ‘regret,’ ‘attribution’ or ‘causes of effects’), and (3) queries about direct and indirect effects (also known as ‘mediation’). Finally, the paper defines the formal and conceptual relationships between the structural and potential-outcome frameworks and presents tools for a symbiotic analysis that uses the strong features of both.
Causality and Statistical Learning In social science we are sometimes in the position of studying descriptive questions (In what places do working-class whites vote for Republicans? In what eras has social mobility been higher in the United States than in Europe? In what social settings are different sorts of people more likely to act strategically?). Answering descriptive questions is not easy and involves issues of data collection, data analysis, and measurement (how one should define concepts such as ‘working-class whites,’ ‘social mobility,’ and ‘strategic’) but is uncontroversial from a statistical standpoint. All becomes more difficult when we shift our focus from what to what if and why. Consider two broad classes of inferential questions: 1. Forward causal inference. What might happen if we do X? What are the effects of smoking on health, the effects of schooling on knowledge, the effect of campaigns on election outcomes, and so forth? 2. Reverse causal inference. What causes Y? Why do more attractive people earn more money? Why do many poor people vote for Republicans and rich people vote for Democrats? Why did the economy collapse?
Causality for Machine Learning Graphical causal inference as pioneered by Judea Pearl arose from research on artificial intelligence (AI), and for a long time had little connection to the field of machine learning. This article discusses where links have been and should be established, introducing key concepts along the way. It argues that the hard open problems of machine learning and AI are intrinsically related to causality, and explains how the field is beginning to understand them.
Challenges and Opportunities with Big Data The promise of data-driven decision-making is now being recognized broadly, and there is growing enthusiasm for the notion of ‘Big Data.’ While the promise of Big Data is real — for example, it is estimated that Google alone contributed 54 billion dollars to the US economy in 2009 — there is currently a wide gap between its potential and its realization. Heterogeneity, scale, timeliness, complexity, and privacy problems with Big Data impede progress at all phases of the pipeline that can create value from data. The problems start right away during data acquisition, when the data tsunami requires us to make decisions, currently in an ad hoc manner, about what data to keep and what to discard, and how to store what we keep reliably with the right metadata. Much data today is not natively in structured format; for example, tweets and blogs are weakly structured pieces of text, while images and video are structured for storage and display, but not for semantic content and search: transforming such content into a structured format for later analysis is a major challenge. The value of data explodes when it can be linked with other data, thus data integration is a major creator of value. Since most data is directly generated in digital format today, we have the opportunity and the challenge both to influence the creation to facilitate later linkage and to automatically link previously created data. Data analysis, organization, retrieval, and modeling are other foundational challenges. Data analysis is a clear bottleneck in many applications, both due to lack of scalability of the underlying algorithms and due to the complexity of the data that needs to be analyzed. Finally, presentation of the results and its interpretation by non-technical domain experts is crucial to extracting actionable knowledge. During the last 35 years, data management principles such as physical and logical independence, declarative querying and cost-based optimization have led to a multi-billion dollar industry. More importantly, these technical advances have enabled the first round of business intelligence applications and laid the foundation for managing and analyzing Big Data today. The many novel challenges and opportunities associated with Big Data necessitate rethinking many aspects of these data management platforms, while retaining other desirable aspects. We believe that appropriate investment in Big Data will lead to a new wave of fundamental technological advances that will be embodied in the next generations of Big Data management and analysis platforms, products, and systems. We believe that these research problems are not only timely, but also have the potential to create huge economic value in the US economy for years to come. However, they are also hard, requiring us to rethink data analysis systems in fundamental ways. A major investment in Big Data, properly directed, can result not only in major scientific advances, but also lay the foundation for the next generation of advances in science, medicine, and business.
Challenges of Big Data Analysis Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promises for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinct and require a new computational and statistical paradigm. This article gives an overview of the salient features of Big Data and how these features drive a paradigm change in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that exogenous assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions.
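One of the listed challenges, spurious correlation, is easy to demonstrate with a toy simulation (illustrative only, not from the article): with a fixed sample size, the largest sample correlation between a response and a growing number of completely independent noise features keeps increasing.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50  # sample size

def max_abs_correlation(n_features):
    """Largest |correlation| between a response and independent noise features."""
    y = rng.standard_normal(n)
    X = rng.standard_normal((n, n_features))
    corrs = [np.corrcoef(y, X[:, j])[0, 1] for j in range(n_features)]
    return max(abs(c) for c in corrs)

for p in [10, 100, 1000, 10000]:
    print(p, round(max_abs_correlation(p), 3))
# The maximum spurious correlation grows with dimensionality even though
# every feature is pure noise, illustrating the risk of false discoveries.
```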
Characterization of Fundamental Networks In the framework of coupled cell systems, a coupled cell network describes graphically the dynamical dependencies between individual dynamical systems, the cells. The fundamental network of a network reveals the hidden symmetries of that network. Subspaces defined by equalities of coordinates which are flow-invariant for any coupled cell system consistent with a network structure are called the network synchrony subspaces. Moreover, for every synchrony subspace, each network admissible system restricted to that subspace is a dynamical system consistent with a smaller network. The original network is then said to be a lift of the smaller network. We characterize networks such that: the fundamental network is a lift of the network; the network is a subnetwork of its fundamental network; and the network is a fundamental network. The size of cycles in a network and the distance of a cell to a cycle are two important properties concerning the description of the network architecture. In this paper, we relate these two architectural properties in a network and its fundamental network.
Characterizing HCI Research in China: Streams, Methodologies and Future Directions Human-computer Interaction (HCI) is an interdisciplinary research field involving multiple disciplines, such as computer science, psychology, social science and design. It studies the interaction between users and computer in order to better design technologies and solve real-life problems. This position paper characterizes HCI research in China by comparing it with international HCI research traditions. We discuss the current streams and methodologies of Chinese HCI research. We then propose future HCI research directions such as including emergent users who have less access to technology and addressing the cultural dimensions in order to provide better technical solutions and support.
Character-level Convolutional Networks for Text Classification This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several largescale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks.
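A minimal sketch of the character quantization step that feeds such a ConvNet, with an assumed alphabet and sequence length (the paper's exact alphabet and input length differ):

```python
import numpy as np

# Assumed alphabet and sequence length; illustrative values only.
alphabet = "abcdefghijklmnopqrstuvwxyz0123456789 .,;:!?'\""
char_to_idx = {c: i for i, c in enumerate(alphabet)}
max_len = 128

def quantize(text):
    """One-hot encode a text as a (len(alphabet), max_len) matrix for a 1-D ConvNet."""
    mat = np.zeros((len(alphabet), max_len), dtype=np.float32)
    for pos, ch in enumerate(text.lower()[:max_len]):
        idx = char_to_idx.get(ch)
        if idx is not None:           # unknown characters stay as all-zero columns
            mat[idx, pos] = 1.0
    return mat

x = quantize("Character-level ConvNets treat text as a raw signal.")
print(x.shape)  # (len(alphabet), 128)
```

The resulting matrix plays the role of a raw signal on which one-dimensional convolutions operate, with no tokenization or word-level features involved.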
Chart Suggestions – A Thought-Starter (Cheat Sheet)
Choosing the right NoSQL database for the job: a quality attribute evaluation For over forty years, relational databases have been the leading model for data storage, retrieval and management. However, due to increasing needs for scalability and performance, alternative systems have emerged, namely NoSQL technology. The rising interest in NoSQL technology, as well as the growth in the number of use case scenarios, over the last few years resulted in an increasing number of evaluations and comparisons among competing NoSQL technologies. While most research work mostly focuses on performance evaluation using standard benchmarks, it is important to notice that the architecture of real world systems is not only driven by performance requirements, but has to comprehensively include many other quality attribute requirements. Software quality attributes form the basis from which software engineers and architects develop software and make design decisions. Yet, there has been no quality attribute focused survey or classification of NoSQL databases where databases are compared with regards to their suitability for quality attributes common on the design of enterprise systems. To fill this gap, and aid software engineers and architects, in this article, we survey and create a concise and up-to-date comparison of NoSQL engines, identifying their most beneficial use case scenarios from the software engineer point of view and the quality attributes that each of them is most suited to.
Classification and Regression Tree Methods A classification or regression tree is a prediction model that can be represented as a decision tree. This article discusses the C4.5, CART, CRUISE, GUIDE, and QUEST methods in terms of their algorithms, features, properties, and performance.
Classification and Regression Trees Classification and regression trees are machine-learning methods for constructing prediction models from data. The models are obtained by recursively partitioning the data space and fitting a simple prediction model within each partition. As a result, the partitioning can be represented graphically as a decision tree. Classification trees are designed for dependent variables that take a finite number of unordered values, with prediction error measured in terms of misclassification cost. Regression trees are for dependent variables that take continuous or ordered discrete values, with prediction error typically measured by the squared difference between the observed and predicted values. This article gives an introduction to the subject by reviewing some widely available algorithms and comparing their capabilities, strengths, and weaknesses in two examples.
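For a quick hands-on counterpart to the article (using scikit-learn's CART-style implementation rather than any of the specific algorithms it reviews), a classification tree can be grown and printed in a few lines:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Classification tree on the iris data: recursive partitioning of the feature
# space, with an impurity measure guiding each split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))

# The fitted partition can be read off directly as a decision tree.
print(export_text(tree, feature_names=["sepal_len", "sepal_wid", "petal_len", "petal_wid"]))
```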
Classification And Regression Trees : A Practical Guide for Describing a Dataset (Slide Deck)
Classification revisited: a web of knowledge The vision of the Semantic Web (SW) is gradually unfolding and taking shape through a web of linked data, a part of which is built by capturing semantics stored in existing knowledge organization systems (KOS), subject metadata and resource metadata. The content of vast bibliographic collections is currently categorized by some widely used bibliographic classification and we may soon see them being mined for information and linked in a meaningful way across the Web. Bibliographic classifications are designed for knowledge mediation which offers both a rich terminology and different ways in which concepts can be categorized and related to each other in the universe of knowledge. From 1990-2010 they have been used in various resource discovery services on the Web and continue to be used to support information integration in a number of international digital library projects. In this chapter we will revisit some of the ways in which universal classifications, as language independent concept schemes, can assist humans and computers in structuring and presenting information and formulating queries. Most importantly, we highlight issues important to understanding bibliographic classifications, both in terms of their unused potential and technical limitations.
Classification via Minimum Incremental Coding Length We present a simple new criterion for classification, based on principles from lossy data compression. The criterion assigns a test sample to the class that uses the minimum number of additional bits to code the test sample, subject to an allowable distortion. We demonstrate the asymptotic optimality of this criterion for Gaussian distributions and analyze its relationships to classical classifiers. The theoretical results clarify the connections between our approach and popular classifiers such as MAP, RDA, k-NN, and SVM, as well as unsupervised methods based on lossy coding. Our formulation induces several good effects on the resulting classifier. First, minimizing the lossy coding length induces a regularization effect which stabilizes the (implicit) density estimate in a small sample setting. Second, compression provides a uniform means of handling classes of varying dimension. The new criterion and its kernel and local versions perform competitively on synthetic examples, as well as on real imagery data such as handwritten digits and face images. On these problems, the performance of our simple classifier approaches the best reported results, without using domain-specific information. All MATLAB code and classification results are publicly available for peer evaluation at http://…/home.htm.
Classification with imperfect training labels We study the effect of imperfect training data labels on the performance of classification methods. In a general setting, where the probability that an observation in the training dataset is mislabelled may depend on both the feature vector and the true label, we bound the excess risk of an arbitrary classifier trained with imperfect labels in terms of its excess risk for predicting a noisy label. This reveals conditions under which a classifier trained with imperfect labels remains consistent for classifying uncorrupted test data points. Furthermore, under stronger conditions, we derive detailed asymptotic properties for the popular $k$-nearest neighbour ($k$nn), Support Vector Machine (SVM) and Linear Discriminant Analysis (LDA) classifiers. One consequence of these results is that the $k$nn and SVM classifiers are robust to imperfect training labels, in the sense that the rate of convergence of the excess risks of these classifiers remains unchanged; in fact, it even turns out that in some cases, imperfect labels may improve the performance of these methods. On the other hand, the LDA classifier is shown to be typically inconsistent in the presence of label noise unless the prior probabilities of each class are equal. Our theoretical results are supported by a simulation study.
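A small simulation in the spirit of the paper's setting, though much simpler (homogeneous label noise rather than feature-dependent noise, and scikit-learn defaults throughout): flip a fraction of training labels and check how kNN test accuracy on clean labels degrades.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for flip_rate in [0.0, 0.1, 0.3]:
    # Homogeneous label noise: flip each training label with probability flip_rate.
    noisy = y_train.copy()
    flip = rng.random(len(noisy)) < flip_rate
    noisy[flip] = 1 - noisy[flip]

    knn = KNeighborsClassifier(n_neighbors=15).fit(X_train, noisy)
    # Test labels are left uncorrupted, matching the setting of classifying
    # clean test points from imperfectly labelled training data.
    print(f"flip rate {flip_rate:.1f}: test accuracy {knn.score(X_test, y_test):.3f}")
```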
Closing the AI Knowledge Gap AI researchers employ not only the scientific method, but also methodology from mathematics and engineering. However, the use of the scientific method – specifically hypothesis testing – in AI is typically conducted in service of engineering objectives. Growing interest in topics such as fairness and algorithmic bias show that engineering-focused questions only comprise a subset of the important questions about AI systems. This results in the AI Knowledge Gap: the number of unique AI systems grows faster than the number of studies that characterize these systems’ behavior. To close this gap, we argue that the study of AI could benefit from the greater inclusion of researchers who are well positioned to formulate and test hypotheses about the behavior of AI systems. We examine the barriers preventing social and behavioral scientists from conducting such studies. Our diagnosis suggests that accelerating the scientific study of AI systems requires new incentives for academia and industry, mediated by new tools and institutions. To address these needs, we propose a two-sided marketplace called TuringBox. On one side, AI contributors upload existing and novel algorithms to be studied scientifically by others. On the other side, AI examiners develop and post machine intelligence tasks designed to evaluate and characterize algorithmic behavior. We discuss this market’s potential to democratize the scientific study of AI behavior, and thus narrow the AI Knowledge Gap.
Cloud based Predictive Analytics poised for rapid growth Rather than reporting survey results question by question, the results and their implications have been grouped into a number of sections. Each section highlights significant results from the survey and discusses its implications. – Business solutions are what organizations need – Predictive analytics are showing real strength – Customers are the focus for predictive analytics and cloud – Cloud-based predictive analytic scenarios are gaining momentum – Early adopters are gaining a competitive advantage – Decision Management matters to predictive analytic success – There are still some barriers and concerns with cloud-based predictive analytics – Industries vary in their adoption and concerns – A mix of clouds is appropriate – Traditional data sources dominate predictive analytic models After the survey results and implications are discussed we will make some recommendations and identify pros and cons of the various options. Demographics and vendor profiles complete the paper.
Cloud Computing – Architecture and Applications In the era of Internet of Things and with the explosive worldwide growth of electronic data volume, and associated need of processing, analysis, and storage of such humongous volume of data, it has now become mandatory to exploit the power of massively parallel architecture for fast computation. Cloud computing provides a cheap source of such computing framework for large volume of data for real-time applications. It is, therefore, not surprising to see that cloud computing has become a buzzword in the computing fraternity over the last decade. This book presents some critical applications in cloud frameworks along with some innovation design of algorithms and architecture for deployment in cloud environment. It is a valuable source of knowledge for researchers, engineers, practitioners, and graduate and doctoral students working in the field of cloud computing. It will also be useful for faculty members of graduate schools and universities.
Cloud Service Matchmaking Approaches: A Systematic Literature Survey Service matching concerns finding suitable services according to the service requester’s requirements, which is a complex task due to the increasing number and diversity of cloud services available. Service matching is discussed in web services composition and user oriented service marketplaces contexts. The suggested approaches have different problem definitions and have to be examined closer in order to identify comparable results and to find out which approaches have built on the former ones. One of the most important use cases is service requesters with limited technical knowledge who need to compare services based on their QoS requirements in cloud service marketplaces. Our survey examines the service matching approaches in order to find out the relation between their context and their objectives. Moreover, it evaluates their applicability for the cloud service marketplaces context.
Cluster Analysis: Tutorial with R In this tutorial we inspect classification. Classification and ordination are alternative strategies of simplifying data. Ordination tries to simplify data into a map showing similarities among points. Classification simplifies data by putting similar points into the same class. The task of describing a high number of points is simplified to the easier task of describing a low number of classes.
Cluster Validation (Slide Deck)
Clustering large Data Sets with mixed numeric and Categorical Values Efficient partitioning of large data sets into homogeneous clusters is a fundamental problem in data mining. The standard hierarchical clustering methods provide no solution for this problem due to their computational inefficiency. The k-means based methods are promising for their efficiency in processing large data sets. However, their use is often limited to numeric data. In this paper we present a k-prototypes algorithm which is based on the k-means paradigm but removes the numeric data limitation whilst preserving its efficiency. In the algorithm, objects are clustered against k prototypes. A method is developed to dynamically update the k prototypes in order to maximise the intra-cluster similarity of objects. When applied to numeric data the algorithm is identical to k-means. To assist interpretation of clusters we use decision tree induction algorithms to create rules for clusters. These rules, together with other statistics about clusters, can assist data miners to understand and identify interesting clusters.
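A minimal sketch of the k-prototypes dissimilarity and assignment step, with made-up data and an assumed weight gamma for the categorical part (the full algorithm also re-estimates numeric means and categorical modes each iteration):

```python
import numpy as np

def mixed_distance(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """k-prototypes dissimilarity: squared Euclidean distance on numeric
    attributes plus gamma times the number of categorical mismatches."""
    numeric_part = np.sum((x_num - proto_num) ** 2)
    categorical_part = np.sum(x_cat != proto_cat)
    return numeric_part + gamma * categorical_part

# Toy record and two prototypes (numeric features, categorical features).
x_num, x_cat = np.array([1.0, 2.0]), np.array(["red", "small"])
prototypes = [
    (np.array([0.9, 2.1]), np.array(["red", "large"])),
    (np.array([5.0, 7.0]), np.array(["blue", "small"])),
]

# Assignment step: the object joins the nearest prototype; the update step
# (not shown) moves numeric parts to cluster means and categorical parts to modes.
dists = [mixed_distance(x_num, x_cat, pn, pc) for pn, pc in prototypes]
print("assigned to cluster", int(np.argmin(dists)), "with distances", np.round(dists, 2))
```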
Clustering with Deep Learning: Taxonomy and New Methods Clustering is a fundamental machine learning method. The quality of its results is dependent on the data distribution. For this reason, deep neural networks can be used for learning better representations of the data. In this paper, we propose a systematic taxonomy for clustering with deep learning, in addition to a review of methods from the field. Based on our taxonomy, creating new methods is more straightforward. We also propose a new approach which is built on the taxonomy and surpasses some of the limitations of some previous work. Our experimental evaluation on image datasets shows that the method approaches state-of-the-art clustering quality, and performs better in some cases.
Cogniculture: Towards a Better Human-Machine Co-evolution Research in Artificial Intelligence is breaking technology barriers every day. New algorithms and high performance computing are making things possible which we could only have imagined earlier. Though the enhancements in AI are making life easier for human beings day by day, there is constant fear that AI based systems will pose a threat to humanity. People in the AI community have a diverse set of opinions regarding the pros and cons of AI mimicking human behavior. Instead of worrying about AI advancements, we propose a novel idea of cognitive agents, including both humans and machines, living together in a complex adaptive ecosystem, collaborating on human computation for producing essential social goods while promoting sustenance, survival and evolution of the agents’ life cycle. We highlight several research challenges and technology barriers in achieving this goal. We propose a governance mechanism around this ecosystem to ensure ethical behaviors of all cognitive agents. Along with a novel set of use-cases of Cogniculture, we discuss the road map ahead for this journey.
Cognitive Dynamic Systems: A Technical Review of Cognitive Radar We start with the history of cognitive radar, where the origins of the Perception-Action Cycle (PAC), Fuster's research on cognition and principles of cognition are provided. Fuster describes five cognitive functions: perception, memory, attention, language, and intelligence. We describe the PAC as it applies to cognitive radar, and then discuss long-term memory, memory storage, memory retrieval and working memory. A comparison between memory in human cognition and cognitive radar is given as well. Attention is another function described by Fuster, and we give a comparison of attention in human cognition and cognitive radar. We talk about the four functional blocks from the PAC: Bayesian filter, feedback information, dynamic programming and state-space model for the radar environment. Then, to show that the PAC improves the tracking accuracy of cognitive radar over traditional active radar, we provide simulation results. In the simulation, three nonlinear filters are compared: the Cubature Kalman Filter (CKF), the Unscented Kalman Filter (UKF) and the Extended Kalman Filter (EKF). Based on the results, radars implemented with the CKF perform better than radars implemented with the UKF or the EKF. Further, the radar with the EKF has the worst accuracy and the biggest computation load because of the derivation and evaluation of Jacobian matrices. We suggest using the concept of risk management to better control parameters and improve performance in cognitive radar. We believe spectrum sensing is of potential interest for use in cognitive radar, and we propose a new approach, Probabilistic ICA, which will presumably reduce noise based on estimation error in cognitive radar. Parallel computing is a concept based on a divide-and-conquer mechanism, and we suggest using the parallel computing approach in cognitive radar for complicated calculations or tasks to reduce processing time.
Collaborative Filtering Recommender Systems Recommender systems are an important part of the information and e-commerce ecosystem. They represent a powerful method for enabling users to filter through large information and product spaces. Nearly two decades of research on collaborative filtering have led to a varied set of algorithms and a rich collection of tools for evaluating their performance. Research in the field is moving in the direction of a richer understanding of how recommender technology may be embedded in specific domains. The differing personalities exhibited by different recommender algorithms show that recommendation is not a one-size-fits-all problem. Specific tasks, information needs, and item domains represent unique problems for recommenders, and design and evaluation of recommenders needs to be done based on the user tasks to be supported. Effective deployments must begin with careful analysis of prospective users and their goals. Based on this analysis, system designers have a host of options for the choice of algorithm and for its embedding in the surrounding user experience. This paper discusses a wide variety of the choices available and their implications, aiming to provide both practitioners and researchers with an introduction to the important issues underlying recommenders and current best practices for addressing these issues.
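As one concrete instance of the design choices discussed, a tiny user-based neighborhood recommender on a made-up rating matrix (illustrative only):

```python
import numpy as np

# Toy user-item rating matrix (0 = unrated). Rows: users, columns: items.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine(u, v):
    mask = (u > 0) & (v > 0)              # compare only co-rated items
    if not mask.any():
        return 0.0
    return float(u[mask] @ v[mask] / (np.linalg.norm(u[mask]) * np.linalg.norm(v[mask]) + 1e-12))

def predict(user, item):
    """User-based neighborhood prediction: similarity-weighted average of
    the ratings other users gave to this item."""
    sims = np.array([cosine(R[user], R[v]) if v != user and R[v, item] > 0 else 0.0
                     for v in range(R.shape[0])])
    if sims.sum() == 0:
        return 0.0
    return float(sims @ R[:, item] / sims.sum())

print(round(predict(user=0, item=2), 2))   # user 0's predicted rating for item 2
```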
Combine Statistical Thinking With Scientific Practice: A Protocol of a Bayesian Thesis Project For Undergraduate Students Current developments in the statistics community suggest that modern statistics education should be structured holistically, i.e., by allowing students to work with real data and answer concrete statistical questions, but also by educating them about alternative statistical frameworks, such as Bayesian statistics. In this article, we describe how we incorporated such a holistic structure in a Bayesian thesis project on ordered binomial probabilities. The project was targeted at undergraduate students in psychology with basic knowledge in Bayesian statistics and programming, but no formal mathematical training. The thesis project aimed to (1) convey the basic mathematical concepts of Bayesian inference, (2) let students experience the entire empirical cycle including the collection, analysis, and interpretation of data, and (3) teach students open science practices.
Combining Predictions for Accurate Recommender Systems We analyze the application of ensemble learning to recommender systems on the Netflix Prize dataset. For our analysis we use a set of diverse state-of-the-art collaborative filtering (CF) algorithms, which include: SVD, Neighborhood Based Approaches, Restricted Boltzmann Machine, Asymmetric Factor Model and Global Effects. We show that linearly combining (blending) a set of CF algorithms increases the accuracy and outperforms any single CF algorithm. Furthermore, we show how to use ensemble methods for blending predictors in order to outperform a single blending algorithm. The dataset and the source code for the ensemble blending are available online.
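A minimal sketch of linear blending on synthetic held-out predictions (stand-ins for probe-set predictions; the actual paper blends many more CF models):

```python
import numpy as np

rng = np.random.default_rng(0)

# Held-out predictions of two hypothetical CF models and the true ratings.
true = rng.uniform(1, 5, size=1000)
pred_a = true + rng.normal(0, 0.9, size=1000)   # e.g. an SVD-style model
pred_b = true + rng.normal(0, 1.1, size=1000)   # e.g. a neighborhood model

def rmse(p):
    return float(np.sqrt(np.mean((p - true) ** 2)))

# Linear blending: least-squares fit of blending weights (plus an intercept)
# on held-out predictions; the learned blend is then applied to new predictions.
X = np.column_stack([np.ones_like(pred_a), pred_a, pred_b])
weights, *_ = np.linalg.lstsq(X, true, rcond=None)
blend = X @ weights

print("RMSE A:", round(rmse(pred_a), 3),
      "RMSE B:", round(rmse(pred_b), 3),
      "RMSE blend:", round(rmse(blend), 3))
```

Because the two models make partly independent errors, the blend has a lower RMSE than either model alone, which is the core effect the paper exploits at much larger scale.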
Comment: A brief survey of the current state of play for Bayesian computation in data science at Big-Data scale We wish to contribute to the discussion of ‘Comparing Consensus Monte Carlo Strategies for Distributed Bayesian Computation’ by offering our views on the current best methods for Bayesian computation, both at big-data scale and with smaller data sets, as summarized in Table 1. This table is certainly an over-simplification of a highly complicated area of research in constant (present and likely future) flux, but we believe that constructing summaries of this type is worthwhile despite their drawbacks, if only to facilitate further discussion.
Commonsense Reasoning for Natural Language Understanding: A Survey of Benchmarks, Resources, and Approaches Commonsense knowledge and commonsense reasoning are some of the main bottlenecks in machine intelligence. In the NLP community, many benchmark datasets and tasks have been created to address commonsense reasoning for language understanding. These tasks are designed to assess machines’ ability to acquire and learn commonsense knowledge in order to reason and understand natural language text. As these tasks become instrumental and a driving force for commonsense research, this paper aims to provide an overview of existing tasks and benchmarks, knowledge resources, and learning and inference approaches toward commonsense reasoning for natural language understanding. Through this, our goal is to support a better understanding of the state of the art, its limitations, and future challenges.
Commonsense Reasoning Using WordNet and SUMO: a Detailed Analysis We describe a detailed analysis of a sample of large benchmark of commonsense reasoning problems that has been automatically obtained from WordNet, SUMO and their mapping. The objective is to provide a better assessment of the quality of both the benchmark and the involved knowledge resources for advanced commonsense reasoning tasks. By means of this analysis, we are able to detect some knowledge misalignments, mapping errors and lack of knowledge and resources. Our final objective is the extraction of some guidelines towards a better exploitation of this commonsense knowledge framework by the improvement of the included resources.
Community Detection in Networks: The Leader-Follower Algorithm Natural networks such as those between humans observed through their interactions or biological networks predicted based on various experimental measurements contain a wealth of information about the unobserved structure of the social or biological system. However, these networks are inherently noisy in the sense that they contain spurious connections making them seemingly dense. Therefore, identifying important, refined structures such as communities or clusters becomes quite challenging. Specifically, we find that the popular, traditional method of spectral clustering does not manage to learn refined community structure. The primary reason for this is that it is based upon external community connectivity properties such as graph-cuts. Motivated to overcome this limitation, we propose a community detection algorithm, called the leader-follower algorithm, based upon identifying the natural internal structure of the expected communities. The algorithm uses the notion of network centrality in a novel manner to differentiate leaders (nodes which connect different communities) from loyal followers (nodes which only have neighbors within a single community). Using this approach, it is able to learn the communities from the network structure. A salient feature of our algorithm is that, unlike the spectral clustering, it does not require knowledge of number of communities in the network; it learns it naturally. We show that our algorithm is quite effective. We prove that it detects all of the communities exactly for any network possessing communities with the natural internal structure expected in social networks. More importantly, we demonstrate its effectiveness in the context of various real networks ranging from social networks such as Facebook to biological networks such as an fMRI based human brain network.
Comparative Analysis of K-Means and Fuzzy C-Means Algorithms In the arena of software, data mining technology has been considered as a useful means for identifying patterns and trends of large volumes of data. This approach is basically used to extract unknown patterns from large sets of data for business as well as real-time applications. It is a computational intelligence discipline which has emerged as a valuable tool for data analysis, new knowledge discovery and autonomous decision making. The raw, unlabeled data from a large volume dataset can be classified initially in an unsupervised fashion by using cluster analysis, i.e. clustering: the assignment of a set of observations into clusters so that observations in the same cluster may in some sense be treated as similar. The outcome of the clustering process and the efficiency of its domain application are generally determined through algorithms. There are various algorithms which are used to solve this problem. In this research work, two important clustering algorithms, namely centroid-based K-Means and representative-object-based FCM (Fuzzy C-Means), are compared. These algorithms are applied and their performance is evaluated on the basis of the efficiency of the clustering output. The number of data points as well as the number of clusters are the factors upon which the behaviour patterns of both algorithms are analyzed. FCM produces results close to those of K-Means clustering, but it still requires more computation time than K-Means.
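To make the contrast concrete, the sketch below implements a minimal Fuzzy C-Means loop with NumPy; hardening its soft memberships by argmax gives K-Means-style labels, while the extra membership computation is where FCM's additional cost comes from. (Toy data and parameter choices are illustrative only.)

```python
import numpy as np

rng = np.random.default_rng(0)
# Two Gaussian blobs in 2-D.
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

def fuzzy_c_means(X, k=2, m=2.0, n_iter=100):
    """Minimal FCM: soft memberships u[i, j] instead of K-Means' hard assignments."""
    n = len(X)
    u = rng.dirichlet(np.ones(k), size=n)           # random initial memberships
    for _ in range(n_iter):
        # Centroid update: membership-weighted means.
        w = u ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
        # Membership update from distances to all centroids.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
    return centers, u

centers, u = fuzzy_c_means(X)
hard_labels = u.argmax(axis=1)                      # defuzzified, K-Means-like labels
print(np.round(centers, 2))
print("points with ambiguous membership (max u < 0.9):", int((u.max(axis=1) < 0.9).sum()))
```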
Comparative Analysis of Open Source Frameworks for Machine Learning with Use Case in Single-Threaded and Multi-Threaded Modes The basic features of some of the most versatile and popular open source frameworks for machine learning (TensorFlow, Deep Learning4j, and H2O) are considered and compared. Their comparative analysis was performed and conclusions were made as to the advantages and disadvantages of these platforms. The performance tests for the de facto standard MNIST data set were carried out on H2O framework for deep learning algorithms designed for CPU and GPU platforms for single-threaded and multithreaded modes of operation.
Comparative Study on Generative Adversarial Networks In recent years, there have been tremendous advancements in the field of machine learning. These advancements have been made through both academic as well as industrial research. Lately, a fair amount of research has been dedicated to the usage of generative models in the field of computer vision and image classification. These generative models have been popularized through a new framework called Generative Adversarial Networks. Moreover, many modified versions of this framework have been proposed in the last two years. We study the original model proposed by Goodfellow et al. as well as modifications over the original model and provide a comparative analysis of these models.
Comparison of Bayesian predictive methods for model selection The goal of this paper is to compare several widely used Bayesian model selection methods in practical model selection problems, highlight their differences and give recommendations about the preferred approaches. We focus on the variable subset selection for regression and classification and perform several numerical experiments using both simulated and real world data. The results show that the optimization of a utility estimate such as the cross-validation score is liable to finding overfitted models due to relatively high variance in the utility estimates when the data is scarce. Better and much less varying results are obtained by incorporating all the uncertainties into a full encompassing model and projecting this information onto the submodels. The reference model projection appears to outperform also the maximum a posteriori model and the selection of the most probable variables. The study also demonstrates that the model selection can greatly benefit from using cross-validation outside the searching process both for guiding the model size selection and assessing the predictive performance of the finally selected model.
Comparison of PCA with ICA from data distribution perspective We performed an empirical comparison of ICA and PCA algorithms by applying them on two simulated noisy time series with varying distribution parameters and level of noise. In general, ICA shows better results than PCA because it takes into account higher moments of data distribution. On the other hand, PCA remains quite sensitive to the level of correlations among signals.
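A small reproduction of the general idea with scikit-learn (toy signals and noise level assumed, not the paper's simulation settings): PCA, which only decorrelates, recovers mixed non-Gaussian sources less cleanly than FastICA, which exploits higher moments of the distribution.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

# Two non-Gaussian sources: a sinusoid and a square wave, plus observation noise.
s1 = np.sin(3 * t)
s2 = np.sign(np.sin(5 * t))
S = np.column_stack([s1, s2])

A = np.array([[1.0, 0.6], [0.4, 1.0]])          # mixing matrix
X = S @ A.T + 0.1 * rng.standard_normal((len(t), 2))

pca_sources = PCA(n_components=2).fit_transform(X)
ica_sources = FastICA(n_components=2, random_state=0).fit_transform(X)

def best_abs_corr(est, true):
    # Recovered components may be permuted and sign-flipped, so take the best match.
    return max(abs(np.corrcoef(est[:, i], true)[0, 1]) for i in range(est.shape[1]))

print("PCA recovery of square wave:", round(best_abs_corr(pca_sources, s2), 3))
print("ICA recovery of square wave:", round(best_abs_corr(ica_sources, s2), 3))
```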
Complex and Holographic Embeddings of Knowledge Graphs: A Comparison Embeddings of knowledge graphs have received significant attention due to their excellent performance for tasks like link prediction and entity resolution. In this short paper, we are providing a comparison of two state-of-the-art knowledge graph embeddings for which their equivalence has recently been established, i.e., ComplEx and HolE [Nickel, Rosasco, and Poggio, 2016; Trouillon et al., 2016; Hayashi and Shimbo, 2017]. First, we briefly review both models and discuss how their scoring functions are equivalent. We then analyze the discrepancy of results reported in the original articles, and show experimentally that they are likely due to the use of different loss functions. In further experiments, we evaluate the ability of both models to embed symmetric and antisymmetric patterns. Finally, we discuss advantages and disadvantages of both models and under which conditions one would be preferable to the other.
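For reference, the two scoring functions under comparison can be written in a few lines of NumPy (random embeddings, raw scores before any logistic link; the dimension and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension (illustrative)

# Random subject/object entity embeddings and a relation embedding for each model.
s_c, o_c, r_c = (rng.standard_normal(d) + 1j * rng.standard_normal(d) for _ in range(3))
s_h, o_h, r_h = (rng.standard_normal(d) for _ in range(3))

def complex_score(s, r, o):
    """ComplEx: Re(<r, s, conj(o)>), the real part of a trilinear product."""
    return float(np.real(np.sum(r * s * np.conj(o))))

def hole_score(s, r, o):
    """HolE: <r, s (circular correlation) o>, computed efficiently via the FFT."""
    corr = np.real(np.fft.ifft(np.conj(np.fft.fft(s)) * np.fft.fft(o)))
    return float(r @ corr)

print("ComplEx score:", round(complex_score(s_c, r_c, o_c), 3))
print("HolE score   :", round(hole_score(s_h, r_h, o_h), 3))
```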
Complex Contagions: A Decade in Review Since the publication of ‘Complex Contagions and the Weakness of Long Ties’ in 2007, complex contagions have been studied across an enormous variety of social domains. In reviewing this decade of research, we discuss recent advancements in applied studies of complex contagions, particularly in the domains of health, innovation diffusion, social media, and politics. We also discuss how these empirical studies have spurred complementary advancements in the theoretical modeling of contagions, which concern the effects of network topology on diffusion, as well as the effects of individual-level attributes and thresholds. In synthesizing these developments, we suggest three main directions for future research. The first concerns the study of how multiple contagions interact within the same network and across networks, in what may be called an ecology of contagions. The second concerns the study of how the structure of thresholds and their behavioral consequences can vary by individual and social context. The third area concerns the roles of diversity and homophily in the dynamics of complex contagion, including both diversity of demographic profiles among local peers, and the broader notion of structural diversity within a network. Throughout this discussion, we make an effort to highlight the theoretical and empirical opportunities that lie ahead.
Comprehensive View on CRAN Packages (Cheat Sheet)
Computation of the multivariate Oja median The multivariate Oja (1983) median is an affine equivariant multivariate location estimate with high efficiency. This estimate has a bounded influence function but zero breakdown. The computation of the estimate appears to be highly computationally intensive. We consider different, exact and stochastic, algorithms for the calculation of the value of the estimate. In the stochastic algorithms, the gradient of the objective function, the rank function, is estimated by sampling observation hyperplanes. The estimated rank function with its estimated accuracy then yields a confidence region for the true Oja sample median, and the confidence region shrinks to the sample median as the number of sampled hyperplanes increases. Regular grids and the grid given by the data points are used in the construction. Computation times of different algorithms are discussed and compared.
Computational Intelligence in Sports: A Systematic Literature Review Recently, data mining studies have been successfully conducted to estimate several parameters in a variety of domains. Data mining techniques have attracted the attention of the information industry and society as a whole, due to the large amount of available data and the imminent need to turn it into useful knowledge. However, the effective use of data in some areas is still under development, as is the case in sports, which has seen only modest growth in this respect in recent years; consequently, many sports organizations have begun to realize that there is a wealth of unexplored knowledge in the data they collect. Therefore, this article presents a systematic review of sports data mining. Covering the years 2010 to 2018, 31 studies were found on this topic. Based on these studies, we present the current panorama, themes, databases used, proposals, algorithms, and research opportunities. Our findings provide a better understanding of the potential of sports data mining, besides motivating the scientific community to explore this timely and interesting topic.
Computational Machines in a Coexistence with Concrete Universals and Data Streams We discuss how the majority of traditional modeling approaches follow the idealist point of view in scientific modeling, resting on set-theoretical notions of models based on abstract universals. We show that, while successful in many classical modeling domains, there are fundamental limits to the application of set-theoretical models in dealing with complex systems that have many potential aspects or properties depending on the perspective. As an alternative to abstract universals, we propose a conceptual modeling framework based on concrete universals that can be interpreted as a category-theoretical approach to modeling. We call this modeling framework pre-specific modeling. We further discuss how a certain group of mathematical and computational methods, along with ever-growing data streams, are able to operationalize the concept of pre-specific modeling.
Computational Power and the Social Impact of Artificial Intelligence Machine learning is a computational process. To that end, it is inextricably tied to computational power – the tangible material of chips and semiconductors that the algorithms of machine intelligence operate on. Most obviously, computational power and computing architectures shape the speed of training and inference in machine learning, and therefore influence the rate of progress in the technology. But, these relationships are more nuanced than that: hardware shapes the methods used by researchers and engineers in the design and development of machine learning models. Characteristics such as the power consumption of chips also define where and how machine learning can be used in the real world. Despite this, many analyses of the social impact of the current wave of progress in AI have not substantively brought the dimension of hardware into their accounts. While a common trope in both the popular press and scholarly literature is to highlight the massive increase in computational power that has enabled the recent breakthroughs in machine learning, the analysis frequently goes no further than this observation around magnitude. This paper aims to dig more deeply into the relationship between computational power and the development of machine learning. Specifically, it examines how changes in computing architectures, machine learning methodologies, and supply chains might influence the future of AI. In doing so, it seeks to trace a set of specific relationships between this underlying hardware layer and the broader social impacts and risks around AI.
Computational Theories of Curiosity-Driven Learning What are the functions of curiosity? What are the mechanisms of curiosity-driven learning? We approach these questions using concepts and tools from machine learning and developmental robotics. We argue that curiosity-driven learning enables organisms to make discoveries to solve complex problems with rare or deceptive rewards. By fostering exploration and discovery of a diversity of behavioural skills, and ignoring these rewards, curiosity can efficiently bootstrap learning when there is no information, or deceptive information, about local improvement towards these problems. We review both normative and heuristic computational frameworks used to understand the mechanisms of curiosity in humans, conceptualizing the child as a sense-making organism. These frameworks enable us to discuss the bi-directional causal links between curiosity and learning, and to provide new hypotheses about the fundamental role of curiosity in self-organizing developmental structures through curriculum learning. We present various developmental robotics experiments that study these mechanisms in action, both supporting these hypotheses and opening new research avenues in machine learning and artificial intelligence. Finally, we discuss challenges for the design of experimental paradigms for studying curiosity in psychology and cognitive neuroscience. Keywords: Curiosity, intrinsic motivation, lifelong learning, predictions, world model, rewards, free-energy principle, learning progress, machine learning, AI, developmental robotics, development, curriculum learning, self-organization.
Computer Science and Metaphysics: A Cross-Fertilization Computational philosophy is the use of mechanized computational techniques to unearth philosophical insights that are either difficult or impossible to find using traditional philosophical methods. Computational metaphysics is computational philosophy with a focus on metaphysics. In this paper, we (a) develop results in modal metaphysics whose discovery was computer assisted, and (b) conclude that these results work not only to the obvious benefit of philosophy but also, less obviously, to the benefit of computer science, since the new computational techniques that led to these results may be more broadly applicable within computer science. The paper includes a description of our background methodology and how it evolved, and a discussion of our new results.
Computer-Assisted Text Analysis for Social Science: Topic Models and Beyond Topic models are a family of statistical algorithms to summarize, explore and index large collections of text documents. After a decade of research led by computer scientists, topic models have spread to social science as a new generation of data-driven social scientists have searched for tools to explore large collections of unstructured text. Recently, social scientists have contributed to the topic modeling literature with developments in causal inference and tools for handling the problem of multi-modality. In this paper, I provide a literature review on the evolution of topic modeling including extensions for document covariates, methods for evaluation and interpretation, and advances in interactive visualizations, along with each aspect’s relevance and application for social science research.
Computer-Simulation Model Theory (P= NP is not provable) The simulation hypothesis says that all materials and events in reality (including the universe, our bodies, our thinking, walking, etc.) are computations, and that reality is a computer simulation program like a video game. All the work we do (talking, reasoning, seeing, etc.) consists of computations performed by the universe-computer which runs the simulation program. Inspired by the view of the simulation hypothesis (but independent of this hypothesis), we propose a new method of logical reasoning named ‘Computer-Simulation Model Theory’ (CSMT). Computer-Simulation Model Theory is an extension of Mathematical Model Theory in which mathematical structures are replaced by computer simulations, and the activity of reasoning and computing of the reasoner is itself simulated in the model. CSMT argues that: for a formula $\phi$, construct a computer simulation model $S$ such that 1) $\phi$ does not hold in $S$, and 2) the reasoner $I$ (a human being, one who lives inside the reality) cannot distinguish $S$ from the reality $R$; then $I$ cannot prove $\phi$ in reality. Although $\mathrm{CSMT}$ is inspired by the simulation hypothesis, this reasoning method is independent of the acceptance of that hypothesis. As we argue, one may not accept the simulation hypothesis and yet still regard $\mathrm{CSMT}$ as a valid reasoning method. As an application of Computer-Simulation Model Theory, we study the famous problem P vs NP. We let $\phi \equiv \mathrm{[P = NP]}$ and construct a computer simulation model $E$ such that $\mathrm{P = NP}$ does not hold in $E$.
Computing the Unique Information Given a set of predictor variables and a response variable, how much information do the predictors have about the response, and how is this information distributed between unique, complementary, and shared components? Recent work has proposed to quantify the unique component of the decomposition as the minimum value of the conditional mutual information over a constrained set of information channels. We present an efficient iterative divergence minimization algorithm to solve this optimization problem with convergence guarantees, and we evaluate its performance against other techniques.
Concept Tagging for Natural Language Understanding: Two Decadelong Algorithm Development Concept tagging is a type of structured learning needed for natural language understanding (NLU) systems. In this task, meaning labels from a domain ontology are assigned to word sequences. In this paper, we review the algorithms developed over the last twenty-five years. We perform a comparative evaluation of generative, discriminative and deep learning methods on two public datasets. We report on the statistical variability of the performance measurements. The third contribution is the release of a repository of the algorithms, datasets and recipes for NLU evaluation.
Condition-Based Maintenance Using Sensor Arrays and Telematics The emergence of uniquely addressable embeddable devices has raised the bar on Telematics capabilities. Though the technology itself is not new, its application has been quite limited until now. Sensor-based telematics technologies generate volumes of data that are orders of magnitude larger than what operators have dealt with previously. Real-time big data computation capabilities have opened the flood gates for adding new predictive analytics capabilities to otherwise simple data log systems, enabling real-time control and monitoring to take preventive action in case of any anomalies. Condition-based maintenance, usage-based insurance, smart metering, demand-based load generation, etc., are some of the predictive analytics use cases for Telematics. This paper presents an approach to condition-based maintenance using real-time sensor monitoring, Telematics and predictive data analytics.
Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife We study the variability of predictions made by bagged learners and random forests, and show how to estimate standard errors for these methods. Our work builds on variance estimates for bagging proposed by Efron (1992, 2013) that are based on the jackknife and the infinitesimal jackknife (IJ). In practice, bagged predictors are computed using a finite number B of bootstrap replicates, and working with a large B can be computationally expensive. Direct applications of jackknife and IJ estimators to bagging require B = Theta(n^1.5) bootstrap replicates to converge, where n is the size of the training set. We propose improved versions that only require B = Theta(n) replicates. Moreover, we show that the IJ estimator requires 1.7 times fewer bootstrap replicates than the jackknife to achieve a given accuracy. Finally, we study the sampling distributions of the jackknife and IJ variance estimates themselves. We illustrate our findings with multiple experiments and simulation studies.
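A minimal Python sketch of the infinitesimal-jackknife variance estimate for a bagged prediction as set up above: for each training point i, the covariance over bootstrap replicates between its inclusion count N_bi and the replicate's prediction t_b is squared and summed. The data, base learner, test point and B are illustrative assumptions.
# IJ variance estimate for a bagged regression tree at a single test point
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n, B = 200, 1000
X = rng.uniform(-2, 2, size=(n, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(n)
x_test = np.array([[0.5]])

preds = np.empty(B)          # t_b: prediction of the b-th bagged tree
counts = np.empty((B, n))    # N_bi: times observation i appears in replicate b
for b in range(B):
    idx = rng.integers(0, n, size=n)              # bootstrap sample
    counts[b] = np.bincount(idx, minlength=n)
    preds[b] = DecisionTreeRegressor().fit(X[idx], y[idx]).predict(x_test)[0]

cov_i = ((counts - counts.mean(0)) * (preds - preds.mean())[:, None]).mean(0)
var_ij = np.sum(cov_i ** 2)  # infinitesimal-jackknife variance of the bagged prediction
print(var_ij)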
Conservation AI: Live Stream Analysis for the Detection of Endangered Species Using Convolutional Neural Networks and Drone Technology Many different species are adversely affected by poaching. In response to this escalating crisis, efforts to stop poaching using hidden cameras, drones and DNA tracking have been implemented with varying degrees of success. Limited resources, costs and logistical limitations are often the cause of most unsuccessful poaching interventions. The study presented in this paper outlines a flexible and interoperable framework for the automatic detection of animals and poaching activity to facilitate early intervention practices. Using a robust deep learning pipeline, a convolutional neural network is trained and implemented to detect rhinos and cars (considered an important tool in poaching for fast access and artefact transportation in natural habitats) within live video streamed from drones. Transfer learning with Faster RCNN ResNet 101 is performed to train a custom model with 350 images of rhinos and 350 images of cars. Inference is performed using a frame sampling technique to address the required trade-off between control precision and processing speed and to maintain synchronisation with the live feed. Inference models are hosted on a web platform using Flask web serving, OpenCV and TensorFlow 1.13. Video streams are transmitted from a DJI Mavic Pro 2 drone using the Real-Time Messaging Protocol (RTMP). The best trained Faster RCNN model achieved a mAP of 0.83 @IOU 0.50 and 0.69 @IOU 0.75 respectively. In comparison, an SSD-MobileNet model trained under the same experimental conditions achieved a mAP of 0.55 @IOU 0.50 and 0.27 @IOU 0.75. The results demonstrate that using a Faster RCNN and off-the-shelf drones is a promising and scalable option for a range of conservation projects.
Considerations for maximising analytic performance When it comes to running business analytics, there are three key nonfunctional requirements that must be met: fast performance, usability and affordability. Bloor Research was asked by IBM to compare the performance capabilities of the leading business analytic platforms. Specifically, we were asked to evaluate how the combined capabilities of business analytic tools and the underlying database management system can affect the overall performance of your analytic applications, reports and dashboards.
Constrained Bayesian Networks: Theory, Optimization, and Applications We develop the theory and practice of an approach to modelling and probabilistic inference in causal networks that is suitable when application-specific or analysis-specific constraints should inform such inference or when little or no data for the learning of causal network structure or probability values at nodes are available. Constrained Bayesian Networks generalize a Bayesian Network such that probabilities can be symbolic, arithmetic expressions and where the meaning of the network is constrained by finitely many formulas from the theory of the reals. A formal semantics for constrained Bayesian Networks over first-order logic of the reals is given, which enables non-linear and non-convex optimisation algorithms that rely on decision procedures for this logic, and supports the composition of several constrained Bayesian Networks. A non-trivial case study in arms control, where few or no data are available to assess the effectiveness of an arms inspection process, evaluates our approach. An open-access prototype implementation of these foundations and their algorithms uses the SMT solver Z3 as decision procedure, extends an open-source package for Bayesian inference to symbolic computation, and is evaluated experimentally.
Content Recommendation through Semantic Annotation of User Reviews and Linked Data – An Extended Technical Report Nowadays, most recommender systems exploit user-provided ratings to infer their preferences. However, the growing popularity of social and e-commerce websites has encouraged users to also share comments and opinions through textual reviews. In this paper, we introduce a new recommendation approach which exploits the semantic annotation of user reviews to extract useful and non-trivial information about the items to recommend. It also relies on the knowledge freely available in the Web of Data, notably in DBpedia and Wikidata, to discover other resources connected with the annotated entities. We evaluated our approach in three domains, using both DBpedia and Wikidata. The results showed that our solution provides a better ranking than another recommendation method based on the Web of Data, while it improves in novelty with respect to traditional techniques based on ratings. Additionally, our method achieved a better performance with Wikidata than DBpedia.
Content Selection in Data-to-Text Systems: A Survey Data-to-text systems are powerful in generating reports from data automatically and thus they simplify the presentation of complex data. Rather than presenting data using visualisation techniques, data-to-text systems use natural (human) language, which is the most common way for human-human communication. In addition, data-to-text systems can adapt their output content to users’ preferences, background or interests and therefore they can be pleasant for users to interact with. Content selection is an important part of every data-to-text system, because it is the module that determines which from the available information should be conveyed to the user. This survey initially introduces the field of data-to-text generation, describes the general data-to-text system architecture and then it reviews the state-of-the-art content selection methods. Finally, it provides recommendations for choosing an approach and discusses opportunities for future research.
Context is Everything: Finding Meaning Statistically in Semantic Spaces This paper introduces a simple and explicit measure of word importance in a global context, including very small contexts (10+ sentences). After generating a word-vector space containing both 2-gram clauses and single tokens, it became clear that more contextually significant words disproportionately define clause meanings. Using this simple relationship in a weighted bag-of-words sentence embedding model results in sentence vectors that outperform the state-of-the-art for subjectivity/objectivity analysis, as well as paraphrase detection, and fall within those produced by state-of-the-art models for six other transfer learning tests. The metric was then extended to a sentence/document summarizer, an improved (and context-aware) cosine distance and a simple document stop word identifier. The sigmoid-global context weighted bag of words is presented as a new baseline for sentence embeddings.
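As a rough illustration of the weighted bag-of-words idea described above, the Python sketch below scales each word vector by an importance weight and averages; the toy vectors and weights are assumptions, whereas the paper derives its weights from a global-context statistic over a 2-gram/token vector space.
# Weighted bag-of-words sentence embedding (toy vectors and weights)
import numpy as np

word_vecs = {"cats": np.array([0.9, 0.1]),
             "chase": np.array([0.2, 0.8]),
             "mice": np.array([0.7, 0.3])}             # assumed word embeddings
importance = {"cats": 1.5, "chase": 0.6, "mice": 1.2}  # assumed importance weights

def sentence_embedding(tokens):
    vecs = [importance.get(w, 1.0) * word_vecs[w] for w in tokens if w in word_vecs]
    return np.mean(vecs, axis=0)       # weighted average of the word vectors

print(sentence_embedding(["cats", "chase", "mice"]))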
Context-Aware Recommender Systems The importance of contextual information has been recognized by researchers and practitioners in many disciplines, including e-commerce personalization, information retrieval, ubiquitous and mobile computing, data mining, marketing, and management. While a substantial amount of research has already been performed in the area of recommender systems, most existing approaches focus on recommending the most relevant items to users without taking into account any additional contextual information, such as time, location, or the company of other people (e.g., for watching movies or dining out). In this chapter we argue that relevant contextual information does matter in recommender systems and that it is important to take this information into account when providing recommendations. We discuss the general notion of context and how it can be modeled in recommender systems. Furthermore, we introduce three different algorithmic paradigms – contextual prefiltering, post-filtering, and modeling – for incorporating contextual information into the recommendation process, discuss the possibilities of combining several context-aware recommendation techniques into a single unifying approach, and provide a case study of one such combined approach. Finally, we present additional capabil- ities for context-aware recommenders and discuss important and promising directions for future research.
Continual Lifelong Learning with Neural Networks: A Review Humans and animals have the ability to continually acquire and fine-tune knowledge throughout their lifespan. This ability is mediated by a rich set of neurocognitive functions that together contribute to the early development and experience-driven specialization of our sensorimotor skills. Consequently, the ability to learn from continuous streams of information is crucial for computational learning systems and autonomous agents (inter)acting in the real world. However, continual lifelong learning remains a long-standing challenge for machine learning and neural network models since the incremental acquisition of new skills from non-stationary data distributions generally leads to catastrophic forgetting or interference. This limitation represents a major drawback also for state-of-the-art deep neural network models that typically learn representations from stationary batches of training data, thus without accounting for situations in which the number of tasks is not known a priori and the information becomes incrementally available over time. In this review, we critically summarize the main challenges linked to continual lifelong learning for artificial learning systems and compare existing neural network approaches that alleviate, to different extents, catastrophic interference. Although significant advances have been made in domain-specific continual lifelong learning with neural networks, extensive research efforts are required for the development of general-purpose artificial intelligence and autonomous agents. We discuss well-established research and recent methodological trends motivated by experimentally observed lifelong learning factors in biological systems. Such factors include principles of neurosynaptic stability-plasticity, critical developmental stages, intrinsically motivated exploration, transfer learning, and crossmodal integration.
Control And Protect Sensitive Information In The Era Of Big Data This report outlines the future look of Forrester´s solution for security and risk (SandR) executives seeking to develop a holistic strategy to protect and manage sensitive data. In the never-ending race to stay ahead of the competition, companies are developing advanced capabilities to store, process, and analyze vast amounts of data from social networks, sensors, IT systems, and other sources to improve business intelligence and decisioning capabilities. ‘Big data processing’ refers to the tools and techniques that handle the extreme data volumes and velocities and wide variety of data formats resulting from implementing these capabilities. As organizations aggregate more and more data, they need to be aware that much of it could be financial, personal, and other types of sensitive data that are subject to global laws and regulations. SandR professionals need to be aware of the security issues surrounding big data so they can take an active role early in these initiatives. This report will help SandR pros understand how to control and properly protect sensitive information in the era of big data.
Convergence of Edge Computing and Deep Learning: A Comprehensive Survey Ubiquitous sensors and smart devices from factories and communities guarantee massive amounts of data, and ever-increasing computing power is driving the core of computation and services from the cloud to the edge of the network. As an important enabler broadly changing people’s lives, from face recognition to ambitious smart factories and cities, artificial intelligence (especially deep learning) applications and services have experienced a thriving development process. However, due to efficiency and latency issues, the current cloud computing service architecture hinders the vision of ‘providing artificial intelligence for every person and every organization at everywhere’. Thus, recently, a better solution is unleashing deep learning services from the cloud to the edge near to data sources. Therefore, edge intelligence, aiming to facilitate the deployment of deep learning services by edge computing, has received great attention. In addition, deep learning, as the main representative of artificial intelligence, can be integrated into edge computing frameworks to build intelligent edge for dynamic, adaptive edge maintenance and management. With regard to mutually beneficial edge intelligence and intelligent edge, this paper introduces and discusses: 1) the application scenarios of both; 2) the practical implementation methods and enabling technologies, namely deep learning training and inference in the customized edge computing framework; 3) existing challenges and future trends of more pervasive and fine-grained intelligence. We believe that this survey can help readers to garner information scattered across communication, networking, and deep learning, understand the connections between the enabling technologies, and promote further discussions on the fusion of edge intelligence and intelligent edge.
Converging High-Throughput and High-Performance Computing: A Case Study The computing systems used by LHC experiments have historically consisted of the federation of hundreds to thousands of distributed resources, ranging from small to mid-size. In spite of the impressive scale of the existing distributed computing solutions, the federation of small to mid-size resources will be insufficient to meet projected future demands. This paper is a case study of how the ATLAS experiment has embraced Titan, a DOE leadership facility, in conjunction with traditional distributed high-throughput computing to reach sustained production scales of approximately 51M core-hours a year. The three main contributions of this paper are: (i) a critical evaluation of design and operational considerations to support the sustained, scalable and production usage of Titan; (ii) a preliminary characterization of a next generation executor for PanDA to support new workloads and advanced execution modes; and (iii) early lessons for how current and future experimental and observational systems can be integrated with production supercomputers and other platforms in a general and extensible manner.
Cooperating with Machines Since Alan Turing envisioned Artificial Intelligence (AI) [1], a major driving force behind technical progress has been competition with human cognition. Historical milestones have been frequently associated with computers matching or outperforming humans in difficult cognitive tasks (e.g. face recognition [2], personality classification [3], driving cars [4], or playing video games [5]), or defeating humans in strategic zero-sum encounters (e.g. Chess [6], Checkers [7], Jeopardy! [8], Poker [9], or Go [10]). In contrast, less attention has been given to developing autonomous machines that establish mutually cooperative relationships with people who may not share the machine’s preferences. A main challenge has been that human cooperation does not require sheer computational power, but rather relies on intuition [11], cultural norms [12], emotions and signals [13, 14, 15, 16], and pre-evolved dispositions toward cooperation [17], common-sense mechanisms that are difficult to encode in machines for arbitrary contexts. Here, we combine a state-of-the-art machine-learning algorithm with novel mechanisms for generating and acting on signals to produce a new learning algorithm that cooperates with people and other machines at levels that rival human cooperation in a variety of two-player repeated stochastic games. This is the first general-purpose algorithm that is capable, given a description of a previously unseen game environment, of learning to cooperate with people within short timescales in scenarios previously unanticipated by algorithm designers. This is achieved without complex opponent modeling or higher-order theories of mind, thus showing that flexible, fast, and general human-machine cooperation is computationally achievable using a non-trivial, but ultimately simple, set of algorithmic mechanisms.
Cooperative Multi-Agent Planning: A Survey Cooperative multi-agent planning (MAP) is a relatively recent research field that combines technologies, algorithms and techniques developed by the Artificial Intelligence Planning and Multi-Agent Systems communities. While planning has been generally treated as a single-agent task, MAP generalizes this concept by considering multiple intelligent agents that work cooperatively to develop a course of action that satisfies the goals of the group. This paper reviews the most relevant approaches to MAP, putting the focus on the solvers that took part in the 2015 Competition of Distributed and Multi-Agent Planning, and classifies them according to their key features and relative performance.
Copulas: A Personal View Copula modeling has taken the world of finance and insurance, and well beyond, by storm. Why is this? In this paper I review the early start of this development, discuss some important current research, mainly from an applications point of view, and comment on potential future developments. An alternative title for the paper would be ‘Demystifying the copula craze’. The paper also contains what I would like to call the copula must-reads.
Copy the dynamics using a learning machine Is it possible to construct a dynamical system that simulates a black-box system without recovering the equations of motion of the latter? Here we show that this goal can be approached by a learning machine. Trained on a set of input-output responses or a segment of time series of a black-box system, a learning machine can serve as a copy system that mimics the dynamics of various black-box systems. It not only behaves like the black-box system at the parameter set from which the training data were generated, but can also reproduce the evolution history of the black-box system. As a result, the learning machine provides an effective way to make predictions, and enables one to probe the global dynamics of a black-box system. These findings have significance for practical systems whose equations of motion cannot be obtained accurately. Examples of copying the dynamics of an artificial neural network, the Lorenz system, and a variable star are given. Our idea paves a possible way towards copying a living brain.
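A minimal sketch of the 'copy system' idea described above: simulate the Lorenz system, fit a regressor that maps the state at one step to the state at the next, then iterate the fitted map as an autonomous copy. The integrator, step size and the MLP regressor are assumptions standing in for the paper's learning machine.
# Learn a one-step map of the Lorenz system and run the learned copy autonomously
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.neural_network import MLPRegressor

def lorenz(t, s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

ts = np.arange(0, 40, 0.01)
traj = solve_ivp(lorenz, (0, 40), [1.0, 1.0, 1.0], t_eval=ts).y.T   # (T, 3) states

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
model.fit(traj[:-1], traj[1:])          # learn state_t -> state_{t+1}

copy = [traj[0]]
for _ in range(2000):                   # iterate the learned map as the copy system
    copy.append(model.predict(copy[-1].reshape(1, -1))[0])
copy = np.array(copy)                   # should reproduce the attractor's geometry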
Core Decomposition in Multilayer Networks: Theory, Algorithms, and Applications Multilayer networks are a powerful paradigm to model complex systems, where various relations might occur among the same set of entities. Despite the keen interest in a variety of problems, algorithms, and analysis methods in this type of network, the problem of extracting dense subgraphs has remained largely unexplored. As a first step in this direction, we study the problem of core decomposition of a multilayer network. Unlike the single-layer counterpart in which cores are all nested into one another, in the multilayer context no total order exists among multilayer cores: they form a lattice whose size is exponential in the number of layers. In this setting we devise three algorithms which differ in the way they visit the core lattice and in their pruning techniques. We assess time and space efficiency of the three algorithms on a large variety of real-world multilayer networks. We then study the problem of extracting only the inner-most cores, i.e., the cores that are not dominated by any other core in terms of their index on all the layers. As inner-most cores are orders of magnitude less than all the cores, it is desirable to develop algorithms that effectively exploit the maximality property and extract inner-most cores directly, without first computing a complete decomposition. Moreover, we showcase an application of the multilayer core-decomposition tool to the problem of densest-subgraph extraction from multilayer networks. We introduce a definition of multilayer densest subgraph that trades-off between high density and number of layers in which the high density holds, and show how multilayer core decomposition can be exploited to approximate this problem with quality guarantees. We also exploit multilayer core decomposition to speed-up the extraction of frequent cross-graph quasi-cliques and to generalize the community-search problem to the multilayer setting.
Correlated Topic Models Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics, each of which is a distribution over the vocabulary. A limitation of LDA is the inability to model topic correlation even though, for example, a document about genetics is more likely to also be about disease than x-ray astronomy. This limitation stems from the use of the Dirichlet distribution to model the variability among the topic proportions. In this paper we develop the correlated topic model (CTM), where the topic proportions exhibit correlation via the logistic normal distribution. We derive a mean-field variational inference algorithm for approximate posterior inference in this model, which is complicated by the fact that the logistic normal is not conjugate to the multinomial. The CTM gives a better fit than LDA on a collection of OCRed articles from the journal Science. Furthermore, the CTM provides a natural way of visualizing and exploring this and other unstructured data sets.
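A minimal generative sketch, in Python, of the CTM step described above: topic proportions come from a logistic-normal, so correlation between topics enters through the covariance matrix Sigma (all numbers below, including Sigma and the per-topic word distributions, are illustrative assumptions).
# One document from a correlated topic model: eta ~ N(mu, Sigma), theta = softmax(eta)
import numpy as np

rng = np.random.default_rng(0)
K, V, doc_len = 3, 6, 20
mu = np.zeros(K)
Sigma = np.array([[1.0, 0.8, -0.5],
                  [0.8, 1.0, -0.4],
                  [-0.5, -0.4, 1.0]])      # correlated topics, unlike LDA's Dirichlet
beta = rng.dirichlet(np.ones(V), size=K)   # per-topic word distributions (K x V)

eta = rng.multivariate_normal(mu, Sigma)
theta = np.exp(eta) / np.exp(eta).sum()    # logistic (softmax) map onto the simplex
words = [rng.choice(V, p=beta[rng.choice(K, p=theta)]) for _ in range(doc_len)]
print(theta, words)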
Correspondence Analysis This working paper gives a comprehensive explanation of the multivariate technique called correspondence analysis, applied in the context of a large survey of a nation´s state of health, in this case the Spanish National Health Survey. It is first shown how correspondence analysis can be used to interpret a simple cross-tabulation by visualizing the table in the form of a map of points representing the rows and columns of the table. Combinations of variables can also be interpreted by coding the data in the appropriate way. The technique can also be used to deduce optimal scale values for the levels of a categorical variable, thus giving quantitative meaning to the categories. Multiple correspondence analysis can analyze several categorical variables simultaneously, and is analogous to factor analysis of continuous variables. Other uses of correspondence analysis are illustrated using different variables of the same Spanish database: for example, exploring patterns of missing data and visualizing trends across surveys from consecutive years.
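A minimal Python sketch of simple correspondence analysis on a cross-tabulation, as described above: the SVD of the standardized residuals of the table yields principal coordinates for rows and columns that can be plotted as a map. The toy contingency table is an assumption.
# Simple correspondence analysis via SVD of standardized residuals
import numpy as np

N = np.array([[30.0, 10.0, 5.0],
              [10.0, 40.0, 15.0],
              [5.0, 15.0, 45.0]])                     # toy cross-tabulation
P = N / N.sum()                                       # correspondence matrix
r, c = P.sum(axis=1), P.sum(axis=0)                   # row and column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))    # standardized residuals

U, sv, Vt = np.linalg.svd(S, full_matrices=False)
row_coords = (U * sv) / np.sqrt(r)[:, None]           # principal row coordinates
col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]        # principal column coordinates
inertia = sv ** 2                                     # principal inertias
print(row_coords[:, :2], col_coords[:, :2], inertia)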
Coupled Ensembles of Neural Networks We investigate in this paper the architecture of deep convolutional networks. Building on existing state-of-the-art models, we propose a reconfiguration of the model parameters into several parallel branches at the global network level, with each branch being a standalone CNN. We show that this arrangement is an efficient way to significantly reduce the number of parameters without losing performance, or to significantly improve the performance with the same number of parameters. The use of branches brings an additional form of regularization. In addition to the split into parallel branches, we propose a tighter coupling of these branches by placing the ‘fuse (averaging) layer’ before the Log-Likelihood and SoftMax layers during training. This gives another significant performance improvement, the tighter coupling favouring the learning of better representations, even at the level of the individual branches. We refer to this branched architecture as ‘coupled ensembles’. The approach is very generic and can be applied with almost any DCNN architecture. With coupled ensembles of DenseNet-BC and a parameter budget of 25M, we obtain error rates of 2.92%, 15.68% and 1.50% respectively on the CIFAR-10, CIFAR-100 and SVHN tasks. For the same budget, DenseNet-BC has error rates of 3.46%, 17.18%, and 1.8% respectively. With ensembles of coupled ensembles of DenseNet-BC networks, with 50M total parameters, we obtain error rates of 2.72%, 15.13% and 1.42% respectively on these tasks.
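A minimal PyTorch sketch of the fusion described above: each branch produces class scores, the branches are averaged (the 'fuse' layer), and the log-softmax of the fused scores feeds the negative log-likelihood loss so that all branches are trained jointly. Tiny fully-connected branches stand in for the DenseNet-BC branches used in the paper.
# Coupled-ensemble style fusion: average branch outputs before log-softmax / NLL loss
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoupledEnsemble(nn.Module):
    def __init__(self, n_branches=3, in_dim=32, n_classes=10):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, n_classes))
            for _ in range(n_branches))

    def forward(self, x):
        scores = torch.stack([b(x) for b in self.branches], dim=0)
        fused = scores.mean(dim=0)             # 'fuse (averaging)' layer
        return F.log_softmax(fused, dim=-1)    # log-probabilities for the NLL loss

model = CoupledEnsemble()
x = torch.randn(8, 32)
y = torch.randint(0, 10, (8,))
loss = F.nll_loss(model(x), y)   # gradients couple all branches through the fused output
loss.backward()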
Credimus We believe that economic design and computational complexity—while already important to each other—should become even more important to each other with each passing year. But for that to happen, experts in such areas as social choice, economics, and political science on the one hand, and in computational complexity on the other, will have to better understand each other’s worldviews. This article, written by two complexity theorists who also work in computational social choice theory, focuses on one direction of that process by presenting a brief overview of how most computational complexity theorists view the world. Although our immediate motivation is to make the lens through which complexity theorists see the world better understood by those in the social sciences, we also feel that even within computer science it is very important for nontheoreticians to understand how theoreticians think, just as it is equally important within computer science for theoreticians to understand how nontheoreticians think.
Cross-Dataset Recognition: A Survey This paper summarises and analyses cross-dataset recognition techniques, with an emphasis on what kinds of methods can be used when the available source and target data are presented in different forms for boosting the target task. This paper for the first time summarises several transfer criteria in detail at the concept level, which are the key bases guiding what kind of knowledge to transfer between datasets. In addition, a taxonomy of cross-dataset scenarios and problems is proposed according to the properties of the data that define how different datasets diverge, and the recent advances on each specific problem under different scenarios are thereby reviewed. Moreover, some real-world applications and corresponding commonly used benchmarks of cross-dataset recognition are reviewed. Lastly, several future directions are identified.
Cross-media Similarity Metric Learning with Unified Deep Networks As a highlighted research topic in the multimedia area, cross-media retrieval aims to capture the complex correlations among multiple media types. Learning better shared representations and distance metrics for multimedia data is important to boost cross-media retrieval. Motivated by the strong ability of deep neural networks in feature representation and comparison-function learning, we propose the Unified Network for Cross-media Similarity Metric (UNCSM) to associate cross-media shared representation learning with distance metric learning in a unified framework. First, we design a two-pathway deep network pretrained with contrastive loss, and employ a double triplet similarity loss for fine-tuning to learn the shared representation for each media type by modeling the relative semantic similarity. Second, the metric network is designed for effectively calculating the cross-media similarity of the shared representation, by modeling the pairwise similar and dissimilar constraints. Compared to the existing methods, which mostly ignore the dissimilar constraints and separately use a simple distance metric such as Euclidean distance, our UNCSM approach unifies representation learning and distance metric learning to preserve the relative similarity as well as embrace more complex similarity functions for further improving cross-media retrieval accuracy. The experimental results show that our UNCSM approach outperforms 8 state-of-the-art methods on 4 widely-used cross-media datasets.
Cross-Platform Emoji Interpretation: Analysis, a Solution, and Applications Most social media platforms are largely based on text, and users often write posts to describe where they are, what they are seeing, and how they are feeling. Because written text lacks the emotional cues of spoken and face-to-face dialogue, ambiguities are common in written language. This problem is exacerbated in the short, informal nature of many social media posts. To bypass this issue, a suite of special characters called ’emojis,’ which are small pictograms, are embedded within the text. Many emojis are small depictions of facial expressions designed to help disambiguate the emotional meaning of the text. However, a new ambiguity arises in the way that emojis are rendered. Every platform (Windows, Mac, and Android, to name a few) renders emojis according to their own style. In fact, it has been shown that some emojis can be rendered so differently that they look ‘happy’ on some platforms, and ‘sad’ on others. In this work, we use real-world data to verify the existence of this problem. We verify that the usage of the same emoji can be significantly different across platforms, with some emojis exhibiting different sentiment polarities on different platforms. We propose a solution to identify the intended emoji based on the platform-specific nature of the emoji used by the author of a social media post. We apply our solution to sentiment analysis, a task that can benefit from the emoji calibration technique we use in this work. We conduct experiments to evaluate the effectiveness of the mapping in this task.
Cross-validation This text is a survey on cross-validation. We define all classical cross-validation procedures, and we study their properties for two different goals: estimating the risk of a given estimator, and selecting the best estimator among a given family. For the risk estimation problem, we compute the bias (which can also be corrected) and the variance of cross-validation methods. For estimator selection, we first provide a first-order analysis (based on expectations). Then, we explain how to take into account second-order terms (from variance computations, and by taking into account the usefulness of overpenalization). This allows us, in the end, to provide some guidelines for choosing the best cross-validation method for a given learning problem.
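A minimal Python sketch of cross-validation used as a risk estimator for a given estimator, the first of the two goals discussed above; the data, the ridge estimator and the number of folds are illustrative assumptions.
# 5-fold cross-validation estimate of the prediction risk of a ridge estimator
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.5, 0.0]) + 0.5 * rng.standard_normal(200)

fold_risks = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = Ridge(alpha=1.0).fit(X[train_idx], y[train_idx])
    fold_risks.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

cv_risk = np.mean(fold_risks)        # CV estimate of the risk
print(cv_risk, np.var(fold_risks))   # fold-to-fold spread hints at the estimate's variance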
Crowd-Powered Data Mining Many data mining tasks cannot be completely addressed by automated processes, such as sentiment analysis and image classification. Crowdsourcing is an effective way to harness the human cognitive ability to process these machine-hard tasks. Thanks to public crowdsourcing platforms, e.g., Amazon Mechanical Turk and CrowdFlower, we can easily involve hundreds of thousands of ordinary workers (i.e., the crowd) to address these machine-hard tasks. In this tutorial, we will survey and synthesize a wide spectrum of existing studies on crowd-powered data mining. We first give an overview of crowdsourcing, and then summarize the fundamental techniques, including quality control, cost control, and latency control, which must be considered in crowdsourced data mining. Next we review crowd-powered data mining operations, including classification, clustering, pattern mining, outlier detection, knowledge base construction and enrichment. Finally, we provide the emerging challenges in crowdsourced data mining.
Cumulative Gains Model Quality Metric This paper proposes a more comprehensive look at the ideas of the KS statistic and the Area Under the Curve (AUC) of a cumulative gains chart to develop a model quality statistic which can be used agnostically to evaluate the quality of a wide range of models in a standardized fashion. It can be used either holistically on the entire range of the model or at a given decision threshold of the model. Further, it can be extended into the model learning process.
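A minimal Python sketch of the cumulative-gains quantities the paper builds on: after sorting by model score, the cumulative capture rates of positives and negatives give both the KS statistic (their maximum separation) and an area under the gains curve. The scores and labels below are illustrative assumptions, not the paper's proposed statistic itself.
# Cumulative gains, KS and an area-under-the-gains-curve summary from scores and labels
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random(1000)
labels = (rng.random(1000) < scores).astype(int)   # toy labels correlated with the score

order = np.argsort(-scores)                        # highest scores first
y = labels[order]
cum_pos = np.cumsum(y) / y.sum()                   # cumulative gains for positives
cum_neg = np.cumsum(1 - y) / (1 - y).sum()         # cumulative capture of negatives

ks = np.max(np.abs(cum_pos - cum_neg))             # KS: maximum separation of the curves
depth = np.arange(1, y.size + 1) / y.size
auc_gains = np.trapz(cum_pos, depth)               # area under the cumulative gains curve
print(ks, auc_gains)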
Customer Analytics in the age of Social Media Becoming ‘customer centric’ is a top priority today, and for good reason: as if it weren´t important enough that customers buy products and contract for services, they now do much more than simply buy. Customers participate in social media networks and chat rooms; they write blogs and contribute to comment sites; and they share information through sites such as YouTube and Flickr. Their activities and expressions not only reveal personal buying behavior and interests, but they also bring into focus their influence on purchasing by others in their social networks.
Customised Structural Elicitation Established methods for structural elicitation typically rely on code modelling standard graphical model classes, most often Bayesian networks. However, more appropriate models may arise from asking the expert questions in common language about what might relate to what and exploring the logical implications of the statements. Only after identifying the best matching structure should this be embellished into a fully quantified probability model. Examples of the efficacy and potential of this more flexible approach are shown below for four classes of graphical models: Bayesian networks, Chain Event Graphs, Multi-regression Dynamic Models, and Flow Graphs. We argue that to be fully effective any structural elicitation phase must first be customised to an application and, if necessary, new types of structure with their own bespoke semantics elicited.
Cyber-Physical Systems Resilience: State of the Art, Research Issues and Future Trends Ideally, full integration is needed between the Internet and Cyber-Physical Systems (CPSs). These systems should fulfil time-sensitive functions with variable levels of integration with their environment, incorporating data storage, computation, communications, sensing, and control. There are, however, significant problems emerging from the convergence between the CPS and Internet of Things (IoT) areas. The high heterogeneity, complexity, and dynamics of these resource-constrained systems bring new challenges to their robust and reliable operation, which implies the need for novel resilience management strategies. This paper surveys the state of the art in the relevant fields and discusses the research issues and future trends that emerge. Thus, we hope to provide new insights into the management of resilient CPSs, formed by IoT devices, modelled by Game Theory, and flexibly programmed using the latest software and virtualization platforms.

D

Data Acceleration: Architecture for the Modern Data Supply Chain Data technologies are evolving rapidly, but organizations have adopted most of these in piecemeal fashion. As a result, enterprise data—whether related to customer interactions, business performance, computer notifications, or external events in the business environment—is vastly underutilized. Moreover, companies´ data ecosystems have become complex and littered with data silos. This makes the data more difficult to access, which in turn limits the value that organizations can get out of it. Indeed, according to a recent Gartner, Inc. report, 85 percent of Fortune 500 organizations will be unable to exploit Big Data for competitive advantage through 2015. Furthermore, a recent Accenture study found that half of all companies have concerns about the accuracy of their data, and the majority of executives are unclear about the business outcomes they are getting from their data analytics programs. To unlock the value hidden in their data, companies must start treating data as a supply chain, enabling it to flow easily and usefully through the entire organization—and eventually throughout each company´s ecosystem of partners, including suppliers and customers. The time is right for this approach. For one thing, new external data sources are becoming available, providing fresh opportunities for data insights. In addition, the tools and technology required to build a better data platform are available and in use. These provide a foundation on which companies can construct an integrated, end-to-end data supply chain.
Data Analysis the Data.Table way (Cheat Sheet)
Data Center Infrastructure Management (DCIM) For Dummies Data Center Infrastructure Management (DCIM) is the discipline of managing the physical infrastructure of a data center and optimizing its ongoing operation. DCIM is a software suite that bridges the traditional gap between IT and the facilities groups and coordinates between the two. DCIM reduces computing costs while making it easier to quickly support new applications and other business requirements. About This Book This book explains the importance of DCIM, describes the key components of a modern DCIM system, guides you in the selection of the right DCIM solution for your particular needs, and gives you a step-by-step formula for a successful DCIM implementation. Because this is a For Dummies book, you can be sure that it´s easy to read and has touches of humor.
Data Clustering With Leaders and Subleaders Algorithm In this paper, an efficient hierarchical clustering algorithm, suitable for large data sets, is proposed for effective clustering and prototype selection for pattern classification. It is another simple and efficient technique which uses incremental clustering principles to generate a hierarchical structure for finding the subgroups/subclusters within each cluster. As an example, a two-level clustering algorithm, Leaders-Subleaders, an extension of the leader algorithm, is presented. The classification accuracy (CA) obtained using the representatives generated by the Leaders-Subleaders method is found to be better than that obtained using leaders as representatives. Even if a larger number of prototypes is generated, classification time is lower because only a part of the hierarchical structure is searched.
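A minimal Python sketch of the incremental leader step that Leaders-Subleaders builds on: scan the data once, assign each point to the first leader within a distance threshold, otherwise promote the point to a new leader. The threshold and the toy data are assumptions; the paper then repeats the same idea within each cluster to obtain subleaders.
# One-pass leader clustering: leaders act as cluster prototypes
import numpy as np

def leader_clustering(X, threshold):
    leaders, assignments = [], []
    for x in X:
        for j, l in enumerate(leaders):
            if np.linalg.norm(x - l) <= threshold:
                assignments.append(j)
                break
        else:                                   # no existing leader is close enough
            leaders.append(x)
            assignments.append(len(leaders) - 1)
    return np.array(leaders), np.array(assignments)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
leaders, labels = leader_clustering(X, threshold=1.0)
print(len(leaders))   # prototypes; Leaders-Subleaders would refine each into subleaders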
Data Clustering: A Review Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a combinatorially difficult problem, and differences in assumptions and contexts in different communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques, and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.
Data Curation with Deep Learning [Vision]: Towards Self Driving Data Curation Past. Data curation – the process of discovering, integrating, and cleaning data – is one of the oldest data management problems. Unfortunately, it is still the most time consuming and least enjoyable work of data scientists. So far, successful data curation stories are mainly ad-hoc solutions that are either domain-specific (for example, ETL rules) or task-specific (for example, entity resolution). Present. The power of current data curation solutions is not keeping up with the ever-changing data ecosystem in terms of volume, velocity, variety and veracity, mainly due to the high human cost, instead of machine cost, needed for providing the ad-hoc solutions mentioned above. Meanwhile, deep learning is making strides in achieving remarkable successes in areas such as image recognition, natural language processing, and speech recognition. This is largely due to its ability to understand features that are neither domain-specific nor task-specific. Future. Data curation solutions need to keep pace with the fast-changing data ecosystem, where the main hope is to devise domain-agnostic and task-agnostic solutions. To this end, we start a new research project, called AutoDC, to unleash the potential of deep learning towards self-driving data curation. We will discuss how different deep learning concepts can be adapted and extended to solve various data curation problems. We showcase some low-hanging fruit from the early encounters between deep learning and data curation happening in AutoDC. We believe that the directions pointed out by this work will not only drive AutoDC towards democratizing data curation, but also serve as a cornerstone for researchers and practitioners to move to a new realm of data curation solutions.
Data Driven: Creating a Data Culture The data movement is in full swing. There are conferences (Strata +Hadoop World), bestselling books (Big Data, The Signal and the Noise, Lean Analytics), business articles (‘Data Scientist: The Sexiest Job of the 21st Century’), and training courses (An Introduction to Machine Learning with Web Data, the Insight Data Science Fellows Program) on the value of data and how to be a data scientist. Unfortunately, there is little that discusses how companies that successfully use data actually do that work. Using data effectively is not just about which database you use or how many data scientists you have on staff, but rather it´s a complex interplay between the data you have, where it is stored and how people work with it, and what problems are considered worth solving. While most people focus on the technology, the best organizations recognize that people are at the center of this complexity. In any organization, the answers to questions such as who controls the data, who they report to, and how they choose what to work on are always more important than whether to use a database like PostgreSQL or Amazon Redshift or HDFS. We want to see more organizations succeed with data. We believe data will change the way that businesses interact with the world, and we want more people to have access. To succeed with data, businesses must develop a data culture.
Data Innovation for International Development: An overview of natural language processing for qualitative data analysis Availability, collection and access to quantitative data, as well as its limitations, often make qualitative data the resource upon which development programs heavily rely. Both traditional interview data and social media analysis can provide rich contextual information and are essential for research, appraisal, monitoring and evaluation. These data may be difficult to process and analyze both systematically and at scale. This, in turn, limits the ability of timely data driven decision-making which is essential in fast evolving complex social systems. In this paper, we discuss the potential of using natural language processing to systematize analysis of qualitative data, and to inform quick decision-making in the development context. We illustrate this with interview data generated in a format of micro-narratives for the UNDP Fragments of Impact project.
Data learning from big data Technology is generating a huge and growing availability of observations of diverse nature. This big data is placing data learning as a central scientific discipline. It includes collection, storage, preprocessing, visualization and, essentially, statistical analysis of enormous batches of data. In this paper, we discuss the role of statistics regarding some of the issues raised by big data in this new paradigm, and we also propose the name of data learning to describe all the activities that allow one to obtain relevant knowledge from this new source of information.
Data Management: A Unified Approach Unified data management is becoming a strategic advantage in today´s business world. With the advent of big data, the volume and type of information that companies must use in near-real time to gain a competitive edge is growing at an unprecedented rate. Meanwhile, industry consolidation is leading to mergers and acquisitions that require disparate IT systems to be harmonized in order to move forward. These forces, combined with ongoing pressure to use all available data to improve employee productivity, customer satisfaction and innovation, are spurring enterprises to make data management planning a top priority. To support these plans and help achieve important business goals, enterprises are turning to data management solutions with significant urgency. According to a recent IDG Research Services study of 118 IT professionals, 87 percent of respondents said data integration tools have been deployed or are on their company´s road maps; 84 percent answered the same for data quality tools; 82 percent for master data management solutions; and 81 percent for data governance/data stewardship initiatives. Nearly three-fifths of respondents at organizations that have data management solutions in place are planning to continue making near-term investments in these types of tools.
Data Mining and Statistics: What is the Connection? Data Mining is used to discover patterns and relationships in data, with an emphasis on large observational databases. It sits at the common frontiers of several fields including Database Management, Artificial Intelligence, Machine Learning, Pattern Recognition, and Data Visualization. From a statistical perspective it can be viewed as computer-automated exploratory data analysis of (usually) large complex data sets. In spite of (or perhaps because of) the somewhat exaggerated hype, this field is having a major impact in business, industry, and science. It also affords enormous research opportunities for new methodological developments. Despite the obvious connections between data mining and statistical data analysis, most of the methodologies used in Data Mining have so far originated in fields other than Statistics. This paper explores some of the reasons for this, and why statisticians should have an interest in Data Mining. It is argued that Statistics can potentially have a major influence on Data Mining, but in order to do so some of our basic paradigms and operating principles may have to be modified.
Data Mining Cluster Analysis: Basic Concepts and Algorithms (Slide Deck)
Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Manufacturing enterprises have been collecting and storing more and more current, detailed and accurate production-relevant data. These data stores offer enormous potential as a source of new knowledge, but the huge amount of data and its complexity far exceed our ability to reduce and analyze it without automated analysis techniques. This paper provides a brief introduction to knowledge discovery from databases and presents a methodology for data mining in time series, illustrating the relevance of data mining for manufacturing.
Data Mining Standards In this survey paper we consolidate the current data mining standards. We categorize them into process standards, XML standards, standard APIs, web standards and grid standards, and discuss them in considerable detail. We also describe an application designed using these standards. We then analyze the standards and their influence on data mining application development, and point out areas of data mining application development that still need to be standardized. We also discuss the trend in the focus areas addressed by these standards.
Data Mining: A Conceptual Overview This tutorial provides an overview of the data mining process. The tutorial also provides a basic understanding of how to plan, evaluate and successfully refine a data mining project, particularly in terms of model building and model evaluation. Methodological considerations are discussed and illustrated. After explaining the nature of data mining and its importance in business, the tutorial describes the underlying machine learning and statistical techniques involved. It describes the CRISP-DM standard now being used in industry as the standard for a technology-neutral data mining process model. The paper concludes with a major illustration of the data mining process methodology and the unsolved problems that offer opportunities for research. The approach is both practical and conceptually sound in order to be useful to both academics and practitioners.
Data Mining: Discovering and Visualizing Patterns with Python (RefCard)
Data profit vs. Data waste: Boosting business performance every day in the real world with information optimization Companies do many things to grow profits. They discover new market opportunities. They sell more effectively. They innovate. They delight their customers. They improve productivity. They find ways to cut costs and mitigate risks. It can be difficult to do these things in today´s economic environment, because revenue opportunities are not always abundant and executives are largely disinclined to make substantial investments in new business capabilities. Despite current conditions, businesses are still finding ways to significantly improve their performance on a daily basis. One of these ways is the aggressive pursuit of data profit. Data profit is what results when companies make economically optimized use of all the structured and unstructured data already residing in existing systems across the enterprise to get better at everything the business needs to do: discovering opportunities, selling, innovating, delighting customers, improving productivity, cutting costs, and mitigating risk. Data profit has become an especially compelling business strategy today, because companies now suffer as never before from a specific problem that is the very opposite of data profit. That problem is data waste. Data waste occurs when companies do not fully utilize the wealth of data that they already have. This problem has become highly prevalent because companies have implemented so many systems over the past decade or more – from high-end databases and applications to email and basic desktop productivity tools – but have not developed effective strategies for fully leveraging their collective information output….
Data Science (Poster)
Data Science and its Relationship to Big Data and data-driven Decision Making Companies have realized they need to hire data scientists, academic institutions are scrambling to put together data-science programs, and publications are touting data science as a hot – even ‘sexy’ – career choice. However, there is confusion about what exactly data science is, and this confusion could lead to disillusionment as the concept diffuses into meaningless buzz. In this article, we argue that there are good reasons why it has been hard to pin down exactly what data science is. One reason is that data science is intricately intertwined with other important concepts also of growing importance, such as big data and data-driven decision making. Another reason is the natural tendency to associate what a practitioner does with the definition of the practitioner´s field; this can result in overlooking the fundamentals of the field. We believe that trying to define the boundaries of data science precisely is not of the utmost importance. We can debate the boundaries of the field in an academic setting, but in order for data science to serve business effectively, it is important (i) to understand its relationships to other important related concepts, and (ii) to begin to identify the fundamental principles underlying data science. Once we embrace (ii), we can much better understand and explain exactly what data science has to offer. Furthermore, only once we embrace (ii) should we be comfortable calling it data science. In this article, we present a perspective that addresses all these concepts. We close by offering, as examples, a partial list of fundamental principles underlying data science.
Data science as a language: challenges for computer science – a position paper In this paper, I posit that from a research point of view, Data Science is a language. More precisely, Data Science is doing Science using computer science as a language for datafied sciences; much as mathematics is the language of, e.g., physics. From this viewpoint, three (classes of) challenges for computer science are identified, complementing the challenges that the closely related Big Data problem already poses to computer science. I discuss the challenges with references to, in my opinion, related, interesting directions in computer science research; note, I claim neither that these directions are the most appropriate to solve the challenges nor that the cited references represent the best work in their field; they are inspirational to me. So, what are these challenges? Firstly, if computer science is to be a language, what should that language look like? While our traditional specifications such as pseudocode are an excellent way to convey what has been done, they fail for more mathematics-like reasoning about computations. Secondly, if computer science is to function as a foundation of other, datafied, sciences, its own foundations should be in order. While we have excellent foundations for supervised learning—e.g., by having loss functions to optimize and, more generally, by PAC learning (Valiant in Commun ACM 27(11):1134-1142, 1984)—this is far less true for unsupervised learning. Kolmogorov complexity—or, more generally, Algorithmic Information Theory—provides a solid base (Li and Vitányi in An introduction to Kolmogorov complexity and its applications, Springer, Berlin, 1993). It provides an objective criterion to choose between competing hypotheses, but it lacks, e.g., an objective measure of the uncertainty of a discovery that datafied sciences need. Thirdly, datafied sciences come with new conceptual challenges. Data-driven scientists come up with data analysis questions that sometimes do, and sometimes don´t, fit our conceptual toolkit. Clearly, computer science does not suffer from a lack of interesting, deep research problems. However, the challenges posed by data science point to a large reservoir of untapped problems: interesting, stimulating problems, not least because they are posed by our colleagues in the datafied sciences. It is an exciting time to be a computer scientist.
Data Science Code of Professional Conduct We look at the proposed Data Science Code of Professional Conduct and nominate a ‘Golden Rule’ which summarizes the data scientist ethic.
Data Science in the Cloud with Microsoft Azure Machine Learning and R Recently, Microsoft launched the Azure Machine Learning cloud platform – Azure ML. Azure ML provides an easy-to-use and powerful set of cloud-based data transformation and machine learning tools. This report covers the basics of manipulating data, as well as constructing and evaluating models in Azure ML, illustrated with a data science example. Before we get started, here are a few of the benefits Azure ML provides for machine learning solutions: • Solutions can be quickly deployed as web services. • Models run in a highly scalable cloud environment. • Code and data are maintained in a secure cloud environment. • Available algorithms and data transformations are extendable using the R language for solution-specific functionality. Throughout this report, we’ll perform the required data manipulation then construct and evaluate a regression model for a bicycle sharing demand dataset. You can follow along by downloading the code and data provided below. Afterwards, we’ll review how to publish your trained models as web services in the Azure cloud.
Data Science in the Cloud with Microsoft Azure Machine Learning and R: 2015 Update This report covers the basics of manipulating data, constructing models, and evaluating models in the Microsoft Azure Machine Learning platform (Azure ML). The Azure ML platform has greatly simplified the development and deployment of machine learning models, with easy-to-use and powerful cloud-based data transformation and machine learning tools. In this report, we´ll explore extending Azure ML with the R language. (A companion report explores extending Azure ML using the Python language.) All of the concepts we will cover are illustrated with a data science example, using a bicycle rental demand dataset. We´ll perform the required data manipulation, or data munging. Then, we will construct and evaluate regression models for the dataset. You can follow along by downloading the code and data provided in the next section. Later in the report, we´ll discuss publishing your trained models as web services in the Azure cloud.
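The regression workflow that both of these reports walk through (ingest the bicycle rental demand data, transform it, fit a model, evaluate it) is not specific to Azure ML. Purely as a rough illustration (the reports themselves work in R, or Python in the companion report, inside Azure ML; the file name and column names below are hypothetical), the same fit-and-evaluate loop might look like this in scikit-learn:

```python
# Minimal sketch of the regression workflow described above, outside Azure ML.
# Assumptions: a CSV with hypothetical columns 'temp', 'humidity', 'windspeed',
# 'hour', 'workingday' and target 'rental_count'; file name is illustrative.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

df = pd.read_csv("bike_demand.csv")  # hypothetical file
features = ["temp", "humidity", "windspeed", "hour", "workingday"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["rental_count"], test_size=0.25, random_state=42
)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("RMSE:", mean_squared_error(y_test, pred) ** 0.5)
print("R^2 :", r2_score(y_test, pred))
```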
Data Science Revealed: A Data-Driven Glimpse into the Burgeoning new Field As the costs of computing power, data storage, and high-bandwidth Internet access have plunged exponentially over the past two decades, companies around the globe recognized the power of harnessing data as a source of competitive advantage. But it was only recently, as social web applications and massive, parallel processing have become more widely available, that the nascent field of data science revealed what many are coming to understand: that data is the new oil, the source of corporate energy and differentiation in the 21st century. Companies like Facebook, LinkedIn, Yahoo, and Google are generating data not only as their primary product, but are analyzing it to continuously improve their products. Pharmaceutical and biomedical companies are using big data to find new cures and analyze genetic information, while marketers leverage the same technology to generate new customer insights. In order to tap this newfound wealth, organizations of all sizes are turning to practitioners in the new field of data science who are capable of translating massive data into predictive insights that lead to results. Data science is an emerging field, with rapid changes, great uncertainty, and exciting opportunities. Our study attempts the first ever benchmark of the data science community, looking at how they interact with their data, the tools they use, their education, and how their organizations approach data-driven problem solving. We also looked at a smaller group of business intelligence professionals to identify areas of contrast between the emerging role of data scientists and the more mature field of BI. Our findings, summarized here, show an emerging talent gap between organizational needs and current industry capabilities exemplified by the unique contributions data scientists can make to an organization and the broad expectations of data science professionals generally.
Data Science Salary Survey 2013 O´Reilly Media conducted an anonymous salary and tools survey in 2012 and 2013 with attendees of the Strata Conference: Making Data Work in Santa Clara, California and Strata + Hadoop World in New York. Respondents from 37 US states and 33 countries, representing a variety of industries in the public and private sector, completed the survey. We ran the survey to better understand which tools data analysts and data scientists use and how those tools correlate with salary. Not all respondents describe their primary role as data scientist/data analyst, but almost all respondents are exposed to data analytics. Similarly, while just over half the respondents described themselves as technical leads, almost all reported that some part of their role included technical duties (i.e., 10-20% of their responsibilities included data analysis or software development). We looked at which tools correlate with others (if respondents use one, are they more likely to use another?) and created a network graph of the positive correlations. Tools could then be compared with salary, either individually or collectively, based on where they clustered on the graph.
Data Science vs. Statistics: Two Cultures Data science is the business of learning from data, which is traditionally the business of statistics. Data science, however, is often understood as a broader, task-driven and computationally-oriented version of statistics. Both the term data science and the broader idea it conveys have origins in statistics and are a reaction to a narrower view of data analysis. Expanding upon the views of a number of statisticians, this paper encourages a big-tent view of data analysis. We examine how evolving approaches to modern data analysis relate to the existing discipline of statistics (e.g. exploratory analysis, machine learning, reproducibility, computation, communication and the role of theory). Finally, we discuss what these trends mean for the future of statistics by highlighting promising directions for communication, education and research.
Data Science, an Overview of Classification Techniques (Slide Deck)
Data Science, Banking, and Fintech The financial industry today is under siege, but not from economic pressures in Europe and China. Rather, this once-impenetrable fortress is currently riding a giant entrepreneurial wave of disruption, disintermediation, and digital innovation. Behind the siege is fintech, a spunky and growing group of financial technology companies. These venture-backed new arrivals are challenging the old champions in lending, payments, money transfer, trading, wealth management, and cryptocurrencies. In this O´Reilly report, author Cornelia Lévy-Bencheton examines the disruptive megatrends taking hold at every level and juncture of the financial ecosystem. You´ll find out how fintech is reshaping the financial industry, reimagining the ways consumers manage, save, and spend money through a data-driven culture of big data analytics, mobile payment services, and robo-advising. Can traditional financial institutions evolve in time to catch up and avoid being replaced? Pick up this report to learn about the current banking and financial services industry, key participants in fintech, and some adaptive strategies being used by traditional financial organizations.
Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics An action plan to enlarge the technical areas of statistics focuses on the data analyst. The plan sets out six technical areas of work for a university department and advocates a specific allocation of resources devoted to research in each area and to courses in each area. The value of technical work is judged by the extent to which it benefits the data analyst, either directly or indirectly. The plan is also applicable to government research labs and corporate research organizations.
Data Science: The Impact of Statistics In this paper, we substantiate our premise that statistics is one of the most important disciplines to provide tools and methods to find structure in and to give deeper insight into data, and the most important discipline to analyze and quantify uncertainty. We give an overview of different proposed structures of Data Science and address the impact of statistics on such steps as data acquisition and enrichment, data exploration, data analysis and modeling, validation and representation and reporting. We also indicate fallacies when neglecting statistical reasoning.
Data Scientist Enablement Roadmap (Slide Deck)
Data Scientist: The Sexiest Job of the 21st Century When Jonathan Goldman arrived for work in June 2006 at LinkedIn, the business networking site, the place still felt like a startup. The company had just under 8 million accounts, and the number was growing quickly as existing members invited their friends and colleagues to join. But users weren´t seeking out connections with the people who were already on the site at the rate executives had expected. Something was apparently missing in the social experience. As one LinkedIn manager put it, ‘It was like arriving at a conference reception and realizing you don´t know anyone. So you just stand in the corner sipping your drink—and you probably leave early.’
Data Storytelling: Using visualization to share the human impact of numbers Storytelling is a cornerstone of the human experience. The universe may be full of atoms, but it´s through stories that we truly construct our world. From Greek mythology to the Bible to television series like Cosmos, stories have been shaping our experience on Earth for as long as we´ve lived on it. A key purpose of storytelling is not just understanding the world but changing it. After all, why would we study the world if we didn´t want to know how we can—and should—influence it? Though many elements of stories have remained the same throughout history, we have developed better tools and mediums for telling them, such as printed books, movies, and comics. This has changed storytelling styles—and perhaps most importantly, the impact of those stories—over the millennia. But can stories be told with data, as well as with images and words? That´s what this paper´s about.
Data Stream Mining – A Practical Approach Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA includes a collection of offline and online methods as well as tools for evaluation. In particular, it implements boosting, bagging, and Hoeffding Trees, all with and without Naïve Bayes classifiers at the leaves. MOA is related to WEKA, the Waikato Environment for Knowledge Analysis, which is an award-winning open-source workbench containing implementations of a wide range of batch machine learning methods. WEKA is also written in Java. The main benefits of Java are portability, where applications can be run on any platform with an appropriate Java virtual machine, and the strong and well-developed support libraries. Use of the language is widespread, and features such as automatic garbage collection help to reduce programmer burden and error. This text explains the theoretical and practical foundations of the methods and streams available in MOA.
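The Hoeffding Trees mentioned in this entry decide whether to split a node by checking whether the best attribute’s advantage over the runner-up exceeds the Hoeffding bound. MOA itself is a Java workbench; purely as an illustrative Python sketch of that bound (the range R, the confidence delta and the gain values below are made-up inputs, not MOA’s API):

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability 1 - delta, the true mean of a quantity
    with range `value_range` is within epsilon of the mean observed over n samples."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain: float, second_gain: float,
                 value_range: float = 1.0, delta: float = 1e-7, n: int = 1000) -> bool:
    # Split when the best attribute beats the runner-up by more than epsilon.
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)

print(should_split(best_gain=0.30, second_gain=0.25, n=5000))
```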
Data Visualization Techniques – From Basics to Big Data with SAS Visual Analytics A picture is worth a thousand words – especially when you are trying to understand and gain insights from data. It is particularly relevant when you are trying to find relationships among thousands or even millions of variables and determine their relative importance. Organizations of all types and sizes generate data each minute, hour and day. Everyone – including executives, departmental decision makers, call center workers and employees on production lines – hopes to learn things from collected data that can help them make better decisions, take smarter actions and operate more efficiently. Regardless of how much data you have, one of the best ways to discern important relationships is through advanced analysis and high-performance data visualization. If sophisticated analyses can be performed quickly, even immediately, and results presented in ways that showcase patterns and allow querying and exploration, people across all levels in your organization can make faster, more effective decisions. To create meaningful visuals of your data, there are some basics you should consider. Data size and column composition play an important role when selecting graphs to represent your data. This paper discusses some of the basic issues concerning data visualization and provides suggestions for addressing those issues. In addition, big data brings a unique set of challenges for creating visualizations. This paper covers some of those challenges and potential solutions as well. If you are working with massive amounts of data, one challenge is how to display results of data exploration and analysis in a way that is not overwhelming. You may need a new way to look at the data – one that collapses and condenses the results in an intuitive fashion but still displays graphs and charts that decision makers are accustomed to seeing. And, in today´s on-the-go society, you may also need to make the results available quickly via mobile devices, and provide users with the ability to easily explore data on their own in real time. SAS Visual Analytics is a new business intelligence solution that uses intelligent autocharting to help business analysts and nontechnical users visualize data. It creates the best possible visual based on the data that is selected. The visualizations make it easy to see patterns and trends and identify opportunities for further analysis. The heart and soul of SAS Visual Analytics is the SAS LASR Analytic Server, which can execute and accelerate analytic computations in-memory with unprecedented performance. The combination of high-performance analytics and an easy-to-use data exploration interface enables different types of users to create and interact with graphs so they can understand and derive value from their data faster than ever. This creates an unprecedented ability to solve difficult problems, improve business performance and mitigate risk – rapidly and confidently.
Data Visualization with ggplot2 (Cheat Sheet)
Data Visualization: A New Language for Storytelling An Emerging Universal Medium: When was the last time you saw a business presentation that did not include at least one slide with a bar graph or a pie chart? Data visualizations have become so ubiquitous that we no longer find them remarkable.
Data Visualization: Making Big Data Approachable and Valuable Enterprises today are beginning to realize the important role Big Data plays in achieving business goals. Concepts that used to be difficult for companies to comprehend— factors that influence a customer to make a purchase, behavior patterns that point to fraud or misuse, inefficiencies slowing down business processes—now can be understood and addressed by collecting and analyzing Big Data. The insight gained from such analysis helps organizations improve operations and identify new product and service opportunities that they may have otherwise missed. In essence, Big Data promises to deliver the advantages that companies need to drive revenue growth and gain a competitive edge. However, getting to that Big Data payoff is proving a difficult challenge for many organizations. Big Data is often voluminous and tends to rapidly change and morph, making it challenging to get a handle on and difficult to access. The majority of tools available to work with Big Data are complex and hard to use, and most enterprises don´t have the in-house expertise to perform the required data analysis and manipulation to draw out the answers that the business is seeking. In fact, in a recent survey conducted by IDG Research, when asked about analyzing Big Data, respondents cite lack of skills and difficulty in making Big Data available to users as two significant challenges. ‘A lot of existing Big Data techniques require you to really get your hands dirty; I don´t think that most Big Data software is as mature as it needs to be in order to be accessible to business users at most enterprises,’ says Paul Kent, vice president of Big Data with SAS. ‘So if you´re not Google or LinkedIn or Facebook, and you don´t have thousands of engineers to work with Big Data, it can be difficult to find business answers in the information.’ What enterprises need are tools to help them easily and effectively understand and analyze Big Data. Employees who aren´t data scientists or analysts should be able to ask questions of the data based on their own business expertise and quickly and easily find patterns, spot inconsistencies, even get answers to questions they haven´t yet thought to ask. Otherwise, the effort and expense that companies invest in collecting and mining Big Data may be challenged to yield significant actionable results. And companies run the risk of missing important business opportunities if they can´t find the answers that are likely stored in their own data.
Data Visualization: When Data Speaks Business This TEC Product Analysis Report aims to provide an extensive review of the set of data visualization features that form part of the essential core of IBM Cognos Business Intelligence (BI) capabilities. The report contains the following elements: 1. An introduction to IBM Cognos Business Intelligence and data visualization for providing extensive analytics and data discovery services 2. An analyst perspective covering data visualization, its role, importance, and value in the BI lifecycle chain and examining its relationship to other elements in a reliable and best practice scenario for performing BI within an organization 3. A review of IBM Cognos data visualization capabilities 4. A general conclusion and final analyst summary
Data Warehousing: Best Practices for Collecting, Storing, and Delivering Decision-Support Data Data Warehousing is a process for collecting, storing, and delivering decision-support data for some or all of an enterprise. Data warehousing is a broad subject that is described point by point in this Refcard. A data warehouse is one of the artifacts created in the data warehousing process.
Data Wrangling with dplyr and tidyr Cheat Sheet (Cheat Sheet)
Data: Emerging Trends and Technologies What are the emerging trends and technologies that will transform the data landscape in coming months? In this report from Strata + Hadoop World co-chair Alistair Croll, you’ll learn how the ubiquity of cheap sensors, fast networks, and distributed computing have given rise to several developments that will soon have a profound effect on individuals and society as a whole. Machine learning, for example, has quickly moved from lab tool to hosted, pay-as-you-go services in the cloud. Those services, in turn, are leading to predictive apps that will provide individuals with the right functionality and content at the right time by continuously learning about them and predicting what they’ll need. Computational power can produce cognitive augmentation.
Database as a Service – Current Issues and Its Future With the prevalence of applications in the cloud, Database as a Service (DBaaS) has become a promising method to provide cloud applications with reliable and flexible data storage services. It provides a number of interesting features to cloud developers; however, it suffers from a few drawbacks: a long learning curve and development cycle, a lack of in-depth support for NoSQL, a lack of flexible configuration for security and privacy, and high-cost models. In this paper, we investigate these issues among current DBaaS providers and propose a novel Trinity Model that can significantly reduce the learning curve, improve security and privacy, and accelerate database design and development. We further elaborate our ongoing and future work on developing large real-world SaaS projects using this new DBaaS model.
Database Meets Deep Learning: Challenges and Opportunities Deep learning has recently become very popular on account of its incredible success in many complex data-driven applications, such as image classification and speech recognition. The database community has worked on data-driven applications for many years, and therefore should be playing a lead role in supporting this new wave. However, databases and deep learning are different in terms of both techniques and applications. In this paper, we discuss research problems at the intersection of the two fields. In particular, we discuss possible improvements for deep learning systems from a database perspective, and analyze database applications that may benefit from deep learning techniques.
Data-Driven Nested Stochastic Robust Optimization: A General Computational Framework and Algorithm for Optimization under Uncertainty in the Big Data Era A novel data-driven nested stochastic robust optimization (DDNSRO) framework is proposed to systematically and automatically handle labeled multi-class uncertainty data in optimization problems. Uncertainty realizations in large datasets are often collected from various conditions, which are encoded by class labels. A group of Dirichlet process mixture models is employed for uncertainty modeling from the multi-class uncertainty data. The proposed data-driven nonparametric uncertainty model could automatically adjust its complexity based on the data structure and complexity, thus accurately capturing the uncertainty information. A DDNSRO framework is further proposed based on the data-driven uncertainty model through a bi-level optimization structure. The outer optimization problem follows a two-stage stochastic programming approach to optimize the expected objective across different classes of data; robust optimization is nested as the inner problem to ensure the robustness of the solution while maintaining computational tractability. A tailored column-and-constraint generation algorithm is further developed to solve the resulting multi-level optimization problem efficiently. Case studies on strategic planning of process networks are presented to demonstrate the applicability of the proposed framework.
Data-intensive applications, challenges, techniques and technologies: A survey on Big Data Big Data has already drawn huge attention from researchers in the information sciences and from policy and decision makers in governments and enterprises. As the speed of information growth exceeds Moore´s Law at the beginning of this new century, the sheer volume of data is creating serious challenges for human beings. At the same time, enormous potential and highly useful value lie hidden in that data. A new scientific paradigm has been born: data-intensive scientific discovery (DISD), also known as the Big Data problem. A large number of fields and sectors, ranging from economic and business activities to public administration, from national security to scientific research in many areas, involve Big Data problems. On the one hand, Big Data is extremely valuable for improving productivity in businesses and enabling evolutionary breakthroughs in scientific disciplines, giving us many opportunities to make great progress in many fields. There is little doubt that future competition in business productivity and technology will converge on Big Data exploration. On the other hand, Big Data also brings many challenges, such as difficulties in data capture, data storage, data analysis and data visualization. This paper aims to give a close-up view of Big Data, including Big Data applications, opportunities and challenges, as well as the state-of-the-art techniques and technologies currently adopted to deal with Big Data problems. We also discuss several underlying methodologies for handling the data deluge, for example granular computing, cloud computing, bio-inspired computing, and quantum computing.
Deciphering Big Data Stacks: An Overview of Big Data Tools With its ability to ingest, process, and decipher an abundance of incoming data, Big Data is considered by many to be a cornerstone of future research and development. However, the large number of available tools and the overlap among them impede their technological potential. In this paper, we present a systematic grouping of the available tools and a network of dependencies among them, with the aim of composing individual tools into the functional software stacks required to perform Big Data analyses.
Decision Management and Cloud as a Platform for Predictive Analytics (Slide Deck)
Decision Modeling with DMN: How to Build a Decision Requirements Model using the new Decision Model and Notation (DMN) standard The goal of this paper is to describe the four iterative steps to complete a Decision Requirements Model using the forthcoming DMN standard. Before beginning, it is important to understand the value of defining decision requirements as part of your overall requirements process. Experience shows that there are three main reasons for doing so: 1. Current requirements approaches don´t tackle the decision-making that is increasingly important in information systems. 2. While important for all software development projects, decision requirements are especially important for projects adopting business rules and advanced analytic technologies. 3. Decisions are a common language across business, IT and analytic organizations improving collaboration, increasing reuse, and easing implementation.
Decision Requirements Modeling for Analytic Projects Established analytic approaches like CRISP-DM stress the importance of understanding the project objectives and requirements from a business perspective, but to date there are no formal approaches to capturing this understanding in a repeatable, understandable format. Decision Requirements Modeling closes this gap. Decision Requirements Modeling is a successful technique that develops a richer, more complete business understanding earlier. Decision Requirements Modeling results in a clear business target, an understanding of how the results will be used and deployed, and by whom. Using Decision Requirements Modeling to guide and shape analytics projects reduces reliance on constrained specialist resources by improving requirements gathering, helps teams ask the key questions and enables teams to collaborate effectively across the organization, bringing analytics, IT and business professionals together. Using Decision Requirements Modeling to document analytic project requirements enables organizations to: – Compare multiple projects for prioritization, including allowing new analytic development to be compared with updating or refining existing analytics. – Act on a specific plan to guide analytic development that is accessible to business, IT and analytic teams alike. – Reuse knowledge from project to project by creating an increasingly detailed and accurate view of decision-making and the role of analytics. – Value information sources and analytics in terms of business impact. There is an emerging consensus that Decision Requirements Modeling is the best way to specify decision-making. It is also central to a forthcoming standard, the Object Management Group´s Decision Model and Notation, which will give adopters access to a broad community and a vehicle for sharing expertise more widely.
Decision Theory – A Brief Introduction Decision theory is theory about decisions. The subject is not a very unified one. To the contrary, there are many different ways to theorize about decisions, and therefore also many different research traditions. This text attempts to reflect some of the diversity of the subject. Its emphasis lies on the less (mathematically) technical aspects of decision theory.
Decision Tree Classification with Differential Privacy: A Survey Data mining information about people is becoming increasingly important in the data-driven society of the 21st century. Unfortunately, sometimes there are real-world considerations that conflict with the goals of data mining; sometimes the privacy of the people being data mined needs to be considered. This necessitates that the output of data mining algorithms be modified to preserve privacy while simultaneously not ruining the predictive power of the outputted model. Differential privacy is a strong, enforceable definition of privacy that can be used in data mining algorithms, guaranteeing that nothing will be learned about the people in the data that could not already be discovered without their participation. In this survey, we focus on one particular data mining algorithm — decision trees — and how differential privacy interacts with each of the components that constitute decision tree algorithms. We analyze both greedy and random decision trees, and the conflicts that arise when trying to balance privacy requirements with the accuracy of the model.
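One building block that recurs in differentially private decision tree learning is the Laplace mechanism applied to the class counts consulted at a node. The following is only a toy sketch of that mechanism under assumed values (epsilon, the counts, and a sensitivity of 1 for counting queries), not any specific algorithm analyzed in the survey:

```python
import numpy as np

def noisy_counts(counts: np.ndarray, epsilon: float, sensitivity: float = 1.0) -> np.ndarray:
    """Laplace mechanism: add Laplace(sensitivity / epsilon) noise to each class
    count so that releasing the counts satisfies epsilon-differential privacy."""
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon, size=counts.shape)
    return counts + noise

true_counts = np.array([120.0, 35.0, 8.0])        # class counts at a tree node
private = noisy_counts(true_counts, epsilon=0.5)  # what the algorithm actually sees
predicted_class = int(np.argmax(private))         # majority class from noisy counts
print(private, predicted_class)
```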
Decision-Making with Belief Functions: a Review Approaches to decision-making under uncertainty in the belief function framework are reviewed. Most methods are shown to blend criteria for decision under ignorance with the maximum expected utility principle of Bayesian decision theory. A distinction is made between methods that construct a complete preference relation among acts, and those that allow incomparability of some acts due to lack of information. Methods developed in the imprecise probability framework are applicable in the Dempster-Shafer context and are also reviewed. Shafer’s constructive decision theory, which substitutes the notion of goal for that of utility, is described and contrasted with other approaches. The paper ends by pointing out the need to carry out deeper investigation of fundamental issues related to decision-making with belief functions and to assess the descriptive, normative and prescriptive values of the different approaches.
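Most of the decision methods reviewed there start from mass functions that have already been combined with Dempster’s rule before any decision criterion is applied. A minimal sketch of that combination step, with a made-up two-element frame of discernment:

```python
from itertools import product

def dempster_combine(m1: dict, m2: dict) -> dict:
    """Dempster's rule of combination for two mass functions whose focal sets
    are frozensets mapping to masses; conflicting mass is renormalized away."""
    combined, conflict = {}, 0.0
    for (a, w1), (b, w2) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + w1 * w2
        else:
            conflict += w1 * w2
    if conflict >= 1.0:
        raise ValueError("Total conflict: Dempster's rule is undefined.")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

# Two sources of evidence over the illustrative frame {rain, sun}
m1 = {frozenset({"rain"}): 0.6, frozenset({"rain", "sun"}): 0.4}
m2 = {frozenset({"sun"}): 0.3, frozenset({"rain", "sun"}): 0.7}
print(dempster_combine(m1, m2))
```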
Declarative Data Analytics: a Survey The area of declarative data analytics explores the application of the declarative paradigm on data science and machine learning. It proposes declarative languages for expressing data analysis tasks and develops systems which optimize programs written in those languages. The execution engine can be either centralized or distributed, as the declarative paradigm advocates independence from particular physical implementations. The survey explores a wide range of declarative data analysis frameworks by examining both the programming model and the optimization techniques used, in order to provide conclusions on the current state of the art in the area and identify open challenges.
Declarative Statistics In this work we introduce declarative statistics, a suite of declarative modelling tools for statistical analysis. Statistical constraints represent the key building block of declarative statistics. First, we introduce a range of relevant counting and matrix constraints and associated decompositions, some of which are novel, that are instrumental in the design of statistical constraints. Second, we introduce a selection of novel statistical constraints and associated decompositions, which constitute a self-contained toolbox that can be used to tackle a wide range of problems typically encountered by statisticians. Finally, we apply these statistical constraints to a wide range of application areas drawn from classical statistics, and we contrast our framework against established practices.
Deconstructing Blockchains: A Comprehensive Survey on Consensus, Membership and Structure It is no exaggeration to say that since the introduction of Bitcoin, blockchains have become a disruptive technology that has shaken the world. However, the rising popularity of the paradigm has led to a flurry of proposals addressing variations and/or trying to solve problems stemming from the initial specification. This has added considerable complexity to current blockchain ecosystems, amplified by the absence of detail in many accompanying blockchain whitepapers. Through this paper, we set out to explain blockchains in a simple way, taming that complexity through the deconstruction of the blockchain into three simple, critical components common to all known systems: membership selection, consensus mechanism and structure. We propose an evaluation framework with insight into system models, desired properties and analysis criteria, using the decoupled components as criteria. We use this framework to provide clear and intuitive overviews of the design principles behind the analyzed systems and the properties achieved. We hope our effort will help clarify the current state of blockchain proposals and provide directions for the analysis of future proposals.
Decorrelation of Neutral Vector Variables: Theory and Applications In this paper, we propose novel strategies for neutral vector variable decorrelation. Two fundamental invertible transformations, namely serial nonlinear transformation and parallel nonlinear transformation, are proposed to carry out the decorrelation. For a neutral vector variable, which is not multivariate Gaussian distributed, the conventional principal component analysis (PCA) cannot yield mutually independent scalar variables. With the two proposed transformations, a highly negatively correlated neutral vector can be transformed to a set of mutually independent scalar variables with the same degrees of freedom. We also evaluate the decorrelation performances for the vectors generated from a single Dirichlet distribution and a mixture of Dirichlet distributions. The mutual independence is verified with the distance correlation measurement. The advantages of the proposed decorrelation strategies are intensively studied and demonstrated with synthesized data and practical application evaluations.
Decoupling Learning Rules from Representations In the artificial intelligence field, learning often corresponds to changing the parameters of a parameterized function. A learning rule is an algorithm or mathematical expression that specifies precisely how the parameters should be changed. When creating an artificial intelligence system, we must make two decisions: what representation should be used (i.e., what parameterized function should be used) and what learning rule should be used to search through the resulting set of representable functions. Using most learning rules, these two decisions are coupled in a subtle (and often unintentional) way. That is, using the same learning rule with two different representations that can represent the same sets of functions can result in two different outcomes. After arguing that this coupling is undesirable, particularly when using artificial neural networks, we present a method for partially decoupling these two decisions for a broad class of learning rules that span unsupervised learning, reinforcement learning, and supervised learning.
Deep Active Learning for Named Entity Recognition Deep neural networks have advanced the state of the art in named entity recognition. However, under typical training procedures, advantages over classical methods emerge only with large datasets. As a result, deep learning is employed only when large public datasets or a large budget for manually labeling data is available. In this work, we show that by combining deep learning with active learning, we can outperform classical methods even with a significantly smaller amount of training data.
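The savings reported there come from pairing a deep NER model with an active learning acquisition step. The acquisition loop itself is generic; as a hedged, model-agnostic sketch (using a simple scikit-learn classifier on synthetic data rather than the paper’s deep NER architecture), least-confidence sampling looks roughly like this:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
labeled = list(range(20))                  # small seed set of "annotated" examples
pool = [i for i in range(len(X)) if i not in labeled]

for round_ in range(5):
    model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    # Least-confidence acquisition: query the pool items the model is least sure about.
    proba = model.predict_proba(X[pool])
    uncertainty = 1.0 - proba.max(axis=1)
    query = [pool[i] for i in np.argsort(-uncertainty)[:20]]
    labeled += query                       # in practice these go to a human annotator
    pool = [i for i in pool if i not in query]
    print(f"round {round_}: {len(labeled)} labeled, accuracy on remaining pool "
          f"{model.score(X[pool], y[pool]):.3f}")
```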
Deep Architectures for Modulation Recognition We survey the latest advances in machine learning with deep neural networks by applying them to the task of radio modulation recognition. Results show that radio modulation recognition is not limited by network depth and further work should focus on improving learned synchronization and equalization. Advances in these areas will likely come from novel architectures designed for these tasks or through novel training methods.
Deep Belief Nets (Slide Deck)
Deep convolutional neural networks for brain image analysis on magnetic resonance imaging: a review In recent years, deep convolutional neural networks (CNNs) have shown record-shattering performance in a variety of computer vision problems, such as visual object recognition, detection and segmentation. These methods have also been utilized in medical image analysis domain for lesion segmentation, anatomical segmentation and classification. We present an extensive literature review of CNN techniques applied in brain magnetic resonance imaging (MRI) analysis, focusing on the architectures, pre-processing, data-preparation and post-processing strategies available in these works. The aim of this study is three-fold. Our primary goal is to report how different CNN architectures have evolved, now entailing state-of-the-art methods by extensive discussion of the architectures and examining the pros and cons of the models when evaluating their performance using public datasets. Second, this paper is intended to be a detailed reference of the research activity in deep CNN for brain MRI analysis. Finally, our goal is to present a perspective on the future of CNNs, which we believe will be among the growing approaches in brain image analysis in subsequent years.
Deep Dive into Anonymity: A Large Scale Analysis of Quora Questions Anonymity forms an integral and important part of our digital life. It enables us to express our true selves without the fear of judgment. In this paper, we investigate the different aspects of anonymity in the social Q&A site Quora. The choice of Quora is motivated by the fact that this is one of the rare social Q&A sites that allow users to explicitly post anonymous questions, and such activity in this forum has become normative rather than a taboo. Through an analysis of 5.1 million questions, we observe that at a global scale almost no difference manifests between the linguistic structure of the anonymous and the non-anonymous questions. We find topical mixing at the global scale to be the primary reason for this absence. However, the differences start to appear once we ‘deep dive’ and (topically) cluster the questions and compare the clusters that have high volumes of anonymous questions with those that have low volumes of anonymous questions. In particular, we observe that the choice to post a question anonymously depends on the user’s perception of anonymity, and users often choose to speak about depression, anxiety, social ties and personal issues under the guise of anonymity. We further perform personality trait analysis and observe that the anonymous group of users has a positive correlation with extraversion and agreeableness, and a negative correlation with openness. Subsequently, to gain further insights, we build an anonymity grid to identify the differences in the perception of anonymity between the user posting the question and the community of users answering it. We also look into the first response time of the questions and observe that it is lowest for topics which talk about personal and sensitive issues, which hints at a higher degree of community support and user engagement.
Deep EHR: A Survey of Recent Advances on Deep Learning Techniques for Electronic Health Record (EHR) Analysis The past decade has seen an explosion in the amount of digital information stored in electronic health records (EHR). While primarily designed for archiving patient clinical information and administrative healthcare tasks, many researchers have found secondary use of these records for various clinical informatics tasks. Over the same period, the machine learning community has seen widespread advances in deep learning techniques, which also have been successfully applied to the vast amount of EHR data. In this paper, we review these deep EHR systems, examining architectures, technical aspects, and clinical applications. We also identify shortcomings of current techniques and discuss avenues of future research for EHR-based deep learning.
Deep Face Recognition: A Survey Driven by graphics processing units (GPUs), massive amounts of annotated data and more advanced algorithms, deep learning has recently taken the computer vision community by storm and has benefited real-world applications, including face recognition (FR). Deep FR methods leverage deep networks to learn more discriminative representations, significantly improving the state of the art and surpassing human performance (97.53%). In this paper, we provide a comprehensive survey of deep FR methods, including data, algorithms and scenes. First, we summarize the commonly used datasets for training and testing. Then, the data preprocessing methods are categorized into two classes: ‘one-to-many augmentation’ and ‘many-to-one normalization’. Second, for algorithms, we summarize the different network architectures and loss functions used in the state-of-the-art methods. Third, we review several scenes in deep FR, such as video FR, 3D FR and cross-age FR. Finally, some potential deficiencies of the current methods and several future directions are highlighted.
Deep Facial Expression Recognition: A Survey With the transition of facial expression recognition (FER) from laboratory-controlled to challenging in-the-wild conditions and the recent success of deep learning techniques in various fields, deep neural networks have increasingly been leveraged to learn discriminative representations for automatic FER. Recent deep FER systems generally focus on two important issues: overfitting caused by a lack of sufficient training data and expression-unrelated variations, such as illumination, head pose and identity bias. In this paper, we provide a comprehensive survey on deep FER, including datasets and algorithms that provide insights into these intrinsic problems. First, we describe the standard pipeline of a deep FER system with the related background knowledge and suggestions of applicable implementations for each stage. We then introduce the available datasets that are widely used in the literature and provide accepted data selection and evaluation principles for these datasets. For the state of the art in deep FER, we review existing novel deep neural networks and related training strategies that are designed for FER based on both static images and dynamic image sequences, and discuss their advantages and limitations. Competitive performances on widely used benchmarks are also summarized in this section. We then extend our survey to additional related issues and application scenarios. Finally, we review the remaining challenges and corresponding opportunities in this field as well as future directions for the design of robust deep FER systems.
Deep Generative Models with Learnable Knowledge Constraints The broad set of deep generative models (DGMs) has achieved remarkable advances. However, it is often difficult to incorporate rich structured domain knowledge with the end-to-end DGMs. Posterior regularization (PR) offers a principled framework to impose structured constraints on probabilistic models, but has limited applicability to the diverse DGMs that can lack a Bayesian formulation or even explicit density evaluation. PR also requires constraints to be fully specified a priori, which is impractical or suboptimal for complex knowledge with learnable uncertain parts. In this paper, we establish a mathematical correspondence between PR and reinforcement learning (RL), and, based on the connection, expand PR to learn constraints as the extrinsic reward in RL. The resulting algorithm is model-agnostic and applies to any DGMs, and is flexible enough to adapt arbitrary constraints jointly with the model. Experiments on human image generation and templated sentence generation show that models with knowledge constraints learned by our algorithm greatly improve over base generative models.
Deep Learning Deep learning (DL) is a high-dimensional data reduction technique for constructing high-dimensional predictors in input-output models. DL is a form of machine learning that uses hierarchical layers of latent features. In this article, we review the state of the art of deep learning from a modeling and algorithmic perspective. We provide a list of successful areas of application in Artificial Intelligence (AI), Image Processing, Robotics and Automation. Deep learning is predictive in nature rather than inferential and can be viewed as a black-box methodology for high-dimensional function estimation.
Deep Learning (Slide Deck)
Deep Learning and Quantum Physics: A Fundamental Bridge Deep convolutional networks have witnessed unprecedented success in various machine learning applications. Formal understanding of what makes these networks so successful is gradually unfolding, but for the most part there are still significant mysteries to unravel. The inductive bias, which reflects prior knowledge embedded in the network architecture, is one of them. In this work, we establish a fundamental connection between the fields of quantum physics and deep learning. We use this connection to assert novel theoretical observations regarding the role that the number of channels in each layer of the convolutional network fulfills in the overall inductive bias. Specifically, we show an equivalence between the function realized by a deep convolutional arithmetic circuit (ConvAC) and a quantum many-body wave function, which relies on their common underlying tensorial structure. This facilitates the use of quantum entanglement measures as well-defined quantifiers of a deep network’s expressive ability to model intricate correlation structures of its inputs. Most importantly, the construction of a deep ConvAC in terms of a Tensor Network is made available. This description enables us to carry out a graph-theoretic analysis of a convolutional network, with which we demonstrate direct control over the inductive bias of the deep network via its channel numbers, which are related to the min-cut in the underlying graph. This result is relevant to any practitioner designing a convolutional network for a specific task. We theoretically analyze ConvACs, and empirically validate our findings on more common ConvNets which involve ReLU activations and max pooling. Beyond the results described above, the description of a deep convolutional network in well-defined graph-theoretic tools and the formal connection to quantum entanglement are two interdisciplinary bridges that are brought forth by this work.
Deep learning applications and challenges in big data analytics Big Data Analytics and Deep Learning are two high-focus areas of data science. Big Data has become important as many organizations, both public and private, have been collecting massive amounts of domain-specific information, which can contain useful information about problems such as national intelligence, cyber security, fraud detection, marketing, and medical informatics. Companies such as Google and Microsoft are analyzing large volumes of data for business analysis and decisions, impacting existing and future technology. Deep Learning algorithms extract high-level, complex abstractions as data representations through a hierarchical learning process. Complex abstractions are learnt at a given level based on relatively simpler abstractions formulated in the preceding level in the hierarchy. A key benefit of Deep Learning is the analysis and learning of massive amounts of unsupervised data, making it a valuable tool for Big Data Analytics where raw data is largely unlabeled and uncategorized. In the present study, we explore how Deep Learning can be utilized for addressing some important problems in Big Data Analytics, including extracting complex patterns from massive volumes of data, semantic indexing, data tagging, fast information retrieval, and simplifying discriminative tasks. We also investigate some aspects of Deep Learning research that need further exploration to incorporate specific challenges introduced by Big Data Analytics, including streaming data, high-dimensional data, scalability of models, and distributed computing. We conclude by presenting insights into relevant future work by posing some questions, including defining data sampling criteria, domain adaptation modeling, defining criteria for obtaining useful data abstractions, improving semantic indexing, semi-supervised learning, and active learning.
Deep Learning applied to NLP Convolutional Neural Networks (CNNs) are typically associated with Computer Vision. CNNs are responsible for major breakthroughs in Image Classification and are the core of most Computer Vision systems today. More recently, CNNs have been applied to problems in Natural Language Processing and have produced some interesting results. In this paper, we will try to explain the basics of CNNs, their different variations and how they have been applied to NLP.
Deep Learning based Recommender System: A Survey and New Perspectives With the ever-growing volume, complexity and dynamicity of online information, recommender systems have become an effective key solution to overcome such information overload. In recent years, deep learning’s revolutionary advances in speech recognition, image analysis and natural language processing have drawn significant attention. Meanwhile, recent studies also demonstrate its effectiveness in coping with information retrieval and recommendation tasks. Applying deep learning techniques to recommender systems has been gaining momentum due to its state-of-the-art performance and high-quality recommendations. In contrast to traditional recommendation models, deep learning provides a better understanding of users’ demands, items’ characteristics and the historical interactions between them. This article provides a comprehensive review of recent research efforts on deep learning based recommender systems towards fostering innovations of recommender system research. A taxonomy of deep learning based recommendation models is presented and used to categorise surveyed articles. Open problems are identified based on the insightful analytics of the reviewed works and potential solutions are discussed.
Deep Learning for 2D and 3D Rotatable Data: An Overview of Methods One of the reasons for the success of convolutional networks is their equivariance/invariance under translations. However, rotatable data such as molecules, living cells, everyday objects, or galaxies require processing with equivariance/invariance under rotations in cases where the rotation of the coordinate system does not affect the meaning of the data (e.g. object classification). On the other hand, estimation/processing of rotations is necessary in cases where rotations are important (e.g. motion estimation). There has been recent progress in methods and theory in all these regards. Here we provide an overview of existing methods, both for 2D and 3D rotations (and translations), and identify commonalities and links between them, in the hope that our insights will be useful for choosing and perfecting the methods.
Deep Learning for Anomaly Detection: A Survey Anomaly detection is an important problem that has been well-studied within diverse research areas and application domains. The aim of this survey is two-fold: first, we present a structured and comprehensive overview of research methods in deep learning-based anomaly detection; second, we review the adoption of these methods for anomaly detection across various application domains and assess their effectiveness. We have grouped state-of-the-art research techniques into different categories based on the underlying assumptions and approach adopted. Within each category we outline the basic anomaly detection technique, along with its variants, and present the key assumptions used to differentiate between normal and anomalous behavior. For each category, we also present the advantages and limitations and discuss the computational complexity of the techniques in real application domains. Finally, we outline open issues in research and challenges faced while adopting these techniques.
Deep Learning for Case-based Reasoning through Prototypes: A Neural Network that Explains its Predictions Deep neural networks are widely used for classification. These deep models often suffer from a lack of interpretability — they are particularly difficult to understand because of their non-linear nature. As a result, neural networks are often treated as ‘black box’ models, and in the past, have been trained purely to optimize the accuracy of predictions. In this work, we create a novel network architecture for deep learning that naturally explains its own reasoning for each prediction. This architecture contains an autoencoder and a special prototype layer, where each unit of that layer stores a weight vector that resembles an encoded training input. The encoder of the autoencoder allows us to do comparisons within the latent space, while the decoder allows us to visualize the learned prototypes. The training objective has four terms: an accuracy term, a term that encourages every prototype to be similar to at least one encoded input, a term that encourages every encoded input to be close to at least one prototype, and a term that encourages faithful reconstruction by the autoencoder. The distances computed in the prototype layer are used as part of the classification process. Since the prototypes are learned during training, the learned network naturally comes with explanations for each prediction, and the explanations are loyal to what the network actually computes.
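A minimal illustration of the four-term objective described in the entry above, sketched in NumPy under assumed shapes; the variable names and the weighting coefficients lam_* are illustrative and not taken from the paper.

    import numpy as np

    def prototype_loss(logits, labels, z, prototypes, x, x_recon,
                       lam_p=0.05, lam_z=0.05, lam_r=0.05):
        """logits: (n, c) class scores; labels: (n,) integer classes; z: (n, d) encoded
        inputs; prototypes: (m, d); x and x_recon: (n, D) inputs and reconstructions."""
        # 1) accuracy term: cross-entropy on the class scores
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        ce = -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
        # pairwise squared distances between prototypes and encoded inputs
        d2 = ((prototypes[:, None, :] - z[None, :, :]) ** 2).sum(-1)  # shape (m, n)
        # 2) every prototype should be close to at least one encoded input
        proto_to_data = d2.min(axis=1).mean()
        # 3) every encoded input should be close to at least one prototype
        data_to_proto = d2.min(axis=0).mean()
        # 4) faithful reconstruction by the autoencoder
        recon = ((x - x_recon) ** 2).sum(axis=1).mean()
        return ce + lam_p * proto_to_data + lam_z * data_to_proto + lam_r * recon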
Deep Learning For Computer Vision Tasks: A review Deep learning has recently become one of the most popular sub-fields of machine learning owing to its distributed data representation with multiple levels of abstraction. A diverse range of deep learning algorithms are being employed to solve conventional artificial intelligence problems. This paper gives an overview of some of the most widely used deep learning algorithms applied in the field of computer vision. It first inspects the various approaches of deep learning algorithms, followed by a description of their applications in image classification, object identification, image extraction and semantic segmentation in the presence of noise. The paper concludes with the discussion of the future scope and challenges for construction and training of deep neural networks.
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments Eliminating the negative effect of highly non-stationary environmental noise is a long-standing research topic for speech recognition but remains an important challenge nowadays. To address this issue, traditional unsupervised signal processing methods seem to have touched the ceiling. However, data-driven supervised approaches, particularly the ones designed with deep learning, have recently emerged as potential alternatives. In this light, this article comprehensively summarises the recently developed and most representative deep learning approaches to this problem, with the aim of providing guidelines for those going deep into the field of environmentally robust speech recognition. To better introduce these approaches, we categorise them into single- and multi-channel techniques, each of which is specifically described at the front-end, the back-end, and the joint framework of speech recognition systems. Meanwhile, we describe the pros and cons of these approaches as well as the relationships among them, which may benefit future research.
Deep Learning for Fine-Grained Image Analysis: A Survey Computer vision (CV) is the process of using machines to understand and analyze imagery, which is an integral branch of artificial intelligence. Among various research areas of CV, fine-grained image analysis (FGIA) is a longstanding and fundamental problem, and has become ubiquitous in diverse real-world applications. The task of FGIA targets analyzing visual objects from subordinate categories, e.g., species of birds or models of cars. The small inter-class variations and the large intra-class variations caused by the fine-grained nature make it a challenging problem. With the boom of deep learning, recent years have witnessed remarkable progress in FGIA using deep learning techniques. In this paper, we aim to give a survey of recent advances in deep learning based FGIA techniques in a systematic way. Specifically, we organize the existing studies of FGIA techniques into three major categories: fine-grained image recognition, fine-grained image retrieval and fine-grained image generation. In addition, we also cover some other important issues of FGIA, such as publicly available benchmark datasets and related domain-specific applications. Finally, we conclude this survey by highlighting several directions and open problems which need to be further explored by the community in the future.
Deep Learning for Generic Object Detection: A Survey Generic object detection, aiming at locating object instances from a large number of predefined categories in natural images, is one of the most fundamental and challenging problems in computer vision. Deep learning techniques have emerged in recent years as powerful methods for learning feature representations directly from data, and have led to remarkable breakthroughs in the field of generic object detection. Given this time of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought by deep learning techniques. More than 250 key contributions are included in this survey, covering many aspects of generic object detection research: leading detection frameworks and fundamental subproblems including object feature representation, object proposal generation, context information modeling and training strategies; evaluation issues, specifically benchmark datasets, evaluation metrics, and state of the art performance. We finish by identifying promising directions for future research.
Deep Learning for Genomics: A Concise Overview Advancements in genomic research such as high-throughput sequencing techniques have driven modern genomic studies into ‘big data’ disciplines. This data explosion is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in a variety of fields such as vision, speech, and text processing. Yet genomics entails unique challenges to deep learning since we are expecting from deep learning a superhuman intelligence that explores beyond our knowledge to interpret the genome. A powerful deep learning model should rely on insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with a proper deep architecture, and remark on practical considerations of developing modern deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research, as well as pointing out potential opportunities and obstacles for future genomics applications.
Deep Learning for Hyperspectral Image Classification: An Overview Hyperspectral image (HSI) classification has become a hot topic in the field of remote sensing. In general, the complex characteristics of hyperspectral data make the accurate classification of such data challenging for traditional machine learning methods. In addition, hyperspectral imaging often deals with an inherently nonlinear relation between the captured spectral information and the corresponding materials. In recent years, deep learning has been recognized as a powerful feature-extraction tool to effectively address nonlinear problems and has been widely used in a number of image processing tasks. Motivated by those successful applications, deep learning has also been introduced to classify HSIs and has demonstrated good performance. This survey paper presents a systematic review of the deep learning-based HSI classification literature and compares several strategies for this topic. Specifically, we first summarize the main challenges of HSI classification which cannot be effectively overcome by traditional machine learning methods, and also introduce the advantages of deep learning in handling these problems. Then, we build a framework which divides the corresponding works into spectral-feature networks, spatial-feature networks, and spectral-spatial-feature networks to systematically review the recent achievements in deep learning-based HSI classification. In addition, considering the fact that available training samples in the remote sensing field are usually very limited and training deep networks requires a large number of samples, we include some strategies to improve classification performance, which can provide some guidelines for future studies on this topic. Finally, several representative deep learning-based classification methods are evaluated on real HSIs in our experiments.
Deep Learning for Image Denoising: A Survey Since the advent of big data analysis and the Graphics Processing Unit (GPU), deep learning technology has received a great deal of attention and has been widely applied in the field of image processing. In this paper, we aim to comprehensively review and summarize the deep learning technologies for image denoising proposed in recent years. Moreover, we systematically analyze the conventional machine learning methods for image denoising. Finally, we point out some research directions for deep learning technologies in image denoising.
Deep Learning for Image Super-resolution: A Survey Image Super-Resolution (SR) is an important class of image processing techniques to enhance the resolution of images and videos in computer vision. Recent years have witnessed remarkable progress in image super-resolution using deep learning techniques. In this survey, we review recent advances in image super-resolution techniques based on deep learning in a systematic way. In general, we can roughly group the existing studies of SR techniques into three major categories: supervised SR, unsupervised SR, and domain-specific SR. In addition, we also cover some other important issues, such as publicly available benchmark datasets and performance evaluation metrics. Finally, we conclude this survey by highlighting several future directions and open issues which should be further addressed by the community in the future.
Deep Learning for Sensor-based Activity Recognition: A Survey Sensor-based activity recognition seeks profound high-level knowledge about human activity from multitudes of low-level sensor readings. Conventional pattern recognition approaches have made tremendous progress in the past years. However, most of those approaches heavily rely on heuristic hand-crafted feature extraction methods, which dramatically hinder their generalization performance. Additionally, those methods often produce unsatisfactory results for unsupervised and incremental learning tasks. Meanwhile, the recent advancement of deep learning makes it possible to perform automatic high-level feature extraction and thus achieve promising performance in many areas. Since then, deep learning based methods have been widely adopted for sensor-based activity recognition tasks. In this paper, we survey and highlight the recent advancement of deep learning approaches for sensor-based activity recognition. Specifically, we summarize the existing literature from three aspects: sensor modality, deep model and application. We also present a detailed discussion and propose grand challenges for future directions.
Deep Learning for Sentiment Analysis : A Survey Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. Along with the success of deep learning in many other application domains, deep learning is also popularly used in sentiment analysis in recent years. This paper first gives an overview of deep learning and then provides a comprehensive survey of its current applications in sentiment analysis.
Deep Learning for Single Image Super-Resolution: A Brief Review Single image super-resolution (SISR) is a notoriously challenging ill-posed problem, which aims to obtain a high-resolution (HR) output from one of its low-resolution (LR) versions. To solve the SISR problem, powerful deep learning algorithms have recently been employed and have achieved state-of-the-art performance. In this survey, we review representative deep learning-based SISR methods, and group them into two categories according to their major contributions to two essential aspects of SISR: the exploration of efficient neural network architectures for SISR, and the development of effective optimization objectives for deep SISR learning. For each category, a baseline is first established and several critical limitations of the baseline are summarized. Then representative works on overcoming these limitations are presented based on their original contents as well as our critical understandings and analyses, and relevant comparisons are conducted from a variety of perspectives. Finally, we conclude this review with some vital current challenges and future trends in SISR leveraging deep learning algorithms.
Deep Learning for Spatio-Temporal Data Mining: A Survey With the fast development of various positioning techniques such as the Global Positioning System (GPS), mobile devices and remote sensing, spatio-temporal data has become increasingly available nowadays. Mining valuable knowledge from spatio-temporal data is critically important to many real world applications including human mobility understanding, smart transportation, urban planning, public safety, health care and environmental management. As the number, volume and resolution of spatio-temporal datasets increase rapidly, traditional data mining methods, especially statistics based methods for dealing with such data, are becoming overwhelmed. Recently, with the advances of deep learning techniques, deep learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have enjoyed considerable success in various machine learning tasks due to their powerful hierarchical feature learning ability in both spatial and temporal domains, and have been widely applied in various spatio-temporal data mining (STDM) tasks such as predictive learning, representation learning, anomaly detection and classification. In this paper, we provide a comprehensive survey on recent progress in applying deep learning techniques for STDM. We first categorize the types of spatio-temporal data and briefly introduce the popular deep learning models that are used in STDM. Then a framework is introduced to show a general pipeline of the utilization of deep learning models for STDM. Next we classify the existing literature based on the types of ST data, the data mining tasks, and the deep learning models, followed by the applications of deep learning for STDM in different domains including transportation, climate science, human mobility, location based social networks, crime analysis, and neuroscience. Finally, we discuss the limitations of current research and point out future research directions.
Deep learning for time series classification: a review Time Series Classification (TSC) is an important and challenging problem in data mining. With the increase of time series data availability, hundreds of TSC algorithms have been proposed. Among these methods, only a few have considered Deep Neural Networks (DNNs) to perform this task. This is surprising as deep learning has seen very successful applications in the last years. DNNs have indeed revolutionized the field of computer vision especially with the advent of novel deeper architectures such as Residual and Convolutional Neural Networks. Apart from images, sequential data such as text and audio can also be processed with DNNs to reach state of the art performance for document classification and speech recognition. In this article, we study the current state of the art performance of deep learning algorithms for TSC by presenting an empirical study of the most recent DNN architectures for TSC. We give an overview of the most successful deep learning applications in various time series domains under a unified taxonomy of DNNs for TSC. We also provide an open source deep learning framework to the TSC community where we implemented each of the compared approaches and evaluated them on a univariate TSC benchmark (the UCR archive) and 12 multivariate time series datasets. By training 8,730 deep learning models on 97 time series datasets, we propose the most exhaustive study of DNNs for TSC to date.
Deep learning in agriculture: A survey Deep learning constitutes a recent, modern technique for image processing and data analysis, with promising results and large potential. As deep learning has been successfully applied in various domains, it has recently entered also the domain of agriculture. In this paper, we perform a survey of 40 research efforts that employ deep learning techniques, applied to various agricultural and food production challenges. We examine the particular agricultural problems under study, the specific models and frameworks employed, the sources, nature and pre-processing of data used, and the overall performance achieved according to the metrics used at each work under study. Moreover, we study comparisons of deep learning with other existing popular techniques, in respect to differences in classification or regression performance. Our findings indicate that deep learning provides high accuracy, outperforming existing commonly used image processing techniques.
Deep learning in bioinformatics: introduction, application, and perspective in big data era Deep learning, which is especially formidable in handling big data, has achieved great success in various fields, including bioinformatics. With the advances of the big data era in biology, it is foreseeable that deep learning will become increasingly important in the field and will be incorporated into the vast majority of analysis pipelines. In this review, we provide both an accessible introduction to deep learning, and concrete examples and implementations of its representative applications in bioinformatics. We start from the recent achievements of deep learning in the bioinformatics field, pointing out the problems which are suitable for deep learning. After that, we introduce deep learning in an easy-to-understand fashion, from shallow neural networks to the legendary convolutional and recurrent neural networks, graph neural networks, generative adversarial networks, variational autoencoders, and the most recent state-of-the-art architectures. After that, we provide eight examples, covering five bioinformatics research directions and all four kinds of data types, with the implementations written in Tensorflow and Keras. Finally, we discuss the common issues, such as overfitting and interpretability, that users will encounter when adopting deep learning methods and provide corresponding suggestions. The implementations are freely available at \url{https://…/Deep_learning_examples}.
Deep Learning in Mobile and Wireless Networking: A Survey The rapid uptake of mobile devices and the rising popularity of mobile applications and services pose unprecedented demands on mobile and wireless networking infrastructure. Upcoming 5G systems are evolving to support exploding mobile traffic volumes, agile management of network resources to maximize user experience, and extraction of fine-grained real-time analytics. Fulfilling these tasks is challenging, as mobile environments are increasingly complex, heterogeneous, and evolving. One potential solution is to resort to advanced machine learning techniques to help manage the rise in data volumes and algorithm-driven applications. The recent success of deep learning underpins new and powerful tools that tackle problems in this space. In this paper we bridge the gap between deep learning and mobile and wireless networking research, by presenting a comprehensive survey of the crossovers between the two areas. We first briefly introduce essential background and state-of-the-art in deep learning techniques with potential applications to networking. We then discuss several techniques and platforms that facilitate the efficient deployment of deep learning onto mobile systems. Subsequently, we provide an encyclopedic review of mobile and wireless networking research based on deep learning, which we categorize by different domains. Drawing from our experience, we discuss how to tailor deep learning to mobile environments. We complete this survey by pinpointing current challenges and open future directions for research.
Deep learning in remote sensing: a review Standing at the paradigm shift towards data-intensive science, machine learning techniques are becoming increasingly important. In particular, as a major breakthrough in the field, deep learning has proven to be an extremely powerful tool in many fields. Shall we embrace deep learning as the key to all? Or should we resist a ‘black-box’ solution? There are controversial opinions in the remote sensing community. In this article, we analyze the challenges of using deep learning for remote sensing data analysis, review the recent advances, and provide resources to make deep learning in remote sensing ridiculously simple to start with. More importantly, we advocate that remote sensing scientists bring their expertise into deep learning, and use it as an implicit general model to tackle unprecedented large-scale influential challenges, such as climate change and urbanization.
Deep Learning is Robust to Massive Label Noise Deep neural networks trained on large supervised datasets have led to impressive results in recent years. However, since well-annotated datasets can be prohibitively expensive and time-consuming to collect, recent work has explored the use of larger but noisy datasets that can be more easily obtained. In this paper, we investigate the behavior of deep neural networks on training sets with massively noisy labels. We show that successful learning is possible even with an essentially arbitrary amount of noise. For example, on MNIST we find that accuracy of above 90 percent is still attainable even when the dataset has been diluted with 100 noisy examples for each clean example. Such behavior holds across multiple patterns of label noise, even when noisy labels are biased towards confusing classes. Further, we show how the required dataset size for successful training increases with higher label noise. Finally, we present simple actionable techniques for improving learning in the regime of high label noise.
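To make the dilution setup of the entry above concrete, here is a rough sketch (not the authors' code) of how a clean dataset can be diluted with uniformly mislabeled copies; alpha, the reuse of clean inputs as carriers of noise, and the variable names are illustrative assumptions.

    import numpy as np

    def dilute_with_noise(x, y, alpha, num_classes, seed=0):
        """x: (n, ...) clean inputs; y: (n,) clean labels; alpha: noisy copies per clean example."""
        rng = np.random.default_rng(seed)
        noisy_x = np.repeat(x, alpha, axis=0)                        # carriers for the noisy labels
        noisy_y = rng.integers(0, num_classes, size=len(noisy_x))    # uniformly random labels
        x_all = np.concatenate([x, noisy_x], axis=0)
        y_all = np.concatenate([y, noisy_y], axis=0)
        perm = rng.permutation(len(y_all))
        return x_all[perm], y_all[perm]

    # e.g. alpha=100 mimics the '100 noisy examples per clean example' regime described above.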
Deep learning methods in speaker recognition: a review This paper summarizes applied deep learning practices in the field of speaker recognition, covering both verification and identification. Speaker recognition has been a widely studied topic in speech technology. Many research works have been carried out, yet little progress was achieved in the past 5-6 years. However, as deep learning techniques advance in most machine learning fields, the former state-of-the-art methods are being replaced by them in speaker recognition too. DL now appears to be the state-of-the-art solution for both speaker verification and identification. The standard x-vectors, in addition to i-vectors, are used as baselines in most of the novel works. The increasing amount of gathered data opens up the territory to DL, where it is most effective.
Deep Learning on Graphs: A Survey Deep learning has been shown successful in a number of domains, ranging from acoustics and images to natural language processing. However, applying deep learning to the ubiquitous graph data is non-trivial because of the unique characteristics of graphs. Recently, a significant amount of research effort has been devoted to this area, greatly advancing graph analyzing techniques. In this survey, we comprehensively review different kinds of deep learning methods applied to graphs. We divide existing methods into three main categories: semi-supervised methods, including Graph Neural Networks and Graph Convolutional Networks; unsupervised methods, including Graph Autoencoders; and recent advancements, including Graph Recurrent Neural Networks and Graph Reinforcement Learning. We then provide a comprehensive overview of these methods in a systematic manner following their history of developments. We also analyze the differences between these methods and how to compose different architectures. Finally, we briefly outline their applications and discuss potential future directions.
Deep learning research landscape & roadmap in a nutshell: past, present and future — Towards deep cortical learning The past, present and future of deep learning are presented in this work. Given this landscape & roadmap, we predict that deep cortical learning will be the convergence of deep learning & cortical learning, ultimately building an artificial cortical column.
Deep Learning Techniques for Music Generation – A Survey This book is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. At first, we propose a methodology based on four dimensions for our analysis: – objective – What musical content is to be generated (e.g., melody, accompaniment…); – representation – What are the information formats used for the corpus and for the expected generated output (e.g., MIDI, piano roll, text…); – architecture – What type of deep neural network is to be used (e.g., recurrent network, autoencoder, generative adversarial networks…); – strategy – How to model and control the process of generation (e.g., direct feedforward, sampling, unit selection…). For each dimension, we conduct a comparative analysis of various models and techniques. For the strategy dimension, we propose some tentative typology of possible approaches and mechanisms. This classification is bottom-up, based on the analysis of many existing deep-learning based systems for music generation, which are described in this book. The last part of the book includes discussion and prospects.
Deep Learning Works in Practice. But Does it Work in Theory? Deep learning relies on a very specific kind of neural networks: those superposing several neural layers. In the last few years, deep learning achieved major breakthroughs in many tasks such as image analysis, speech recognition, natural language processing, and so on. Yet, there is no theoretical explanation of this success. In particular, it is not clear why the deeper the network, the better it actually performs. We argue that the explanation is intimately connected to a key feature of the data collected from our surrounding universe to feed the machine learning algorithms: large non-parallelizable logical depth. Roughly speaking, we conjecture that the shortest computational descriptions of the universe are algorithms with inherently large computation times, even when a large number of computers are available for parallelization. Interestingly, this conjecture, combined with the folklore conjecture in theoretical computer science that $P \neq NC$, explains the success of deep learning.
Deep Learning: A Bayesian Perspective Deep learning is a form of machine learning for nonlinear high dimensional data reduction and prediction. A Bayesian probabilistic perspective provides a number of advantages; specifically, statistical interpretation and properties, more efficient algorithms for optimisation and hyper-parameter tuning, and an explanation of predictive performance. Traditional high-dimensional statistical techniques, namely principal component analysis (PCA), partial least squares (PLS), reduced rank regression (RRR), and projection pursuit regression (PPR), are shown to be shallow learners. Their deep learning counterparts exploit multiple layers of data reduction, which leads to performance gains. Stochastic gradient descent (SGD) training and optimisation and Dropout (DO) provide model and variable selection. Bayesian regularization is central to finding networks and provides a framework for optimal bias-variance trade-off to achieve good out-of-sample performance. Constructing good Bayesian predictors in high dimensions is discussed. To illustrate our methodology, we provide an analysis of first time international bookings on Airbnb. Finally, we conclude with directions for future research.
Deep Learning: A Critical Appraisal Although deep learning has historical roots going back decades, neither the term ‘deep learning’ nor the approach was popular just over five years ago, when the field was reignited by papers such as Krizhevsky, Sutskever and Hinton’s now classic (2012) deep network model of ImageNet. What has the field discovered in the five subsequent years? Against a background of considerable progress in areas such as speech recognition, image recognition, and game playing, and considerable enthusiasm in the popular press, I present ten concerns for deep learning, and suggest that deep learning must be supplemented by other techniques if we are to reach artificial general intelligence.
Deep Learning: An Introduction for Applied Mathematicians Multilayered artificial neural networks are becoming a pervasive tool in a host of application fields. At the heart of this deep learning revolution are familiar concepts from applied and computational mathematics; notably, in calculus, approximation theory, optimization and linear algebra. This article provides a very brief introduction to the basic ideas that underlie deep learning from an applied mathematics perspective. Our target audience includes postgraduate and final year undergraduate students in mathematics who are keen to learn about the area. The article may also be useful for instructors in mathematics who wish to enliven their classes with references to the application of deep learning techniques. We focus on three fundamental questions: what is a deep neural network? How is a network trained? What is the stochastic gradient method? We illustrate the ideas with a short MATLAB code that sets up and trains a network. We also show the use of state-of-the-art software on a large scale image classification problem. We finish with references to the current literature.
Deep Learning: Generalization Requires Deep Compositional Feature Space Design Generalization error defines the discriminability and the representation power of a deep model. In this work, we claim that feature space design using deep compositional functions plays a significant role in generalization, along with explicit and implicit regularizations. Our claims are established with several image classification experiments. We show that the information loss due to convolution and max pooling can be marginalized with the compositional design, improving generalization performance. We also show that learning rate decay acts as an implicit regularizer in deep model training.
Deep Learning: Past, Present and Future (Slide Deck)
Deep learning: Technical introduction This note presents in a technical though hopefully pedagogical way the three most common forms of neural network architectures: Feedforward, Convolutional and Recurrent. For each network, their fundamental building blocks are detailed. The forward pass and the update rules for the backpropagation algorithm are then derived in full.
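As a compact companion to the note above, the following sketch (not taken from it) shows a forward pass and the backpropagation update rules for a one-hidden-layer feedforward network with sigmoid activations and squared-error loss; all names, sizes and the learning rate are local to the example.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(4, 2)), np.zeros((4, 1))   # hidden layer parameters
    W2, b2 = rng.normal(size=(1, 4)), np.zeros((1, 1))   # output layer parameters
    x, y = rng.normal(size=(2, 1)), np.array([[1.0]])
    lr = 0.1

    for _ in range(100):
        # forward pass
        a1 = sigmoid(W1 @ x + b1)
        a2 = sigmoid(W2 @ a1 + b2)
        # backward pass: delta terms from the chain rule
        delta2 = (a2 - y) * a2 * (1 - a2)
        delta1 = (W2.T @ delta2) * a1 * (1 - a1)
        # gradient-descent update rules
        W2 -= lr * delta2 @ a1.T; b2 -= lr * delta2
        W1 -= lr * delta1 @ x.T;  b1 -= lr * delta1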
Deep Learning-based Sequential Recommender Systems: Concepts, Algorithms, and Evaluations In the field of sequential recommendation, deep learning methods have received a lot of attention in the past few years and surpassed traditional models such as Markov chain-based and factorization-based ones. However, DL-based methods also have some critical drawbacks, such as insufficient modeling of user representations and failure to distinguish between the different types of interactions (i.e., user behaviors) among users and items. In view of this, this survey focuses on DL-based sequential recommender systems by taking the aforementioned issues into consideration. Specifically, we illustrate the concept of sequential recommendation, propose a categorization of existing algorithms in terms of three types of behavioral sequences, summarize the key factors affecting the performance of DL-based models, and conduct corresponding evaluations to demonstrate the effects of these factors. We conclude this survey by systematically outlining future directions and challenges in this field.
Deep Neural Decision Forests We present Deep Neural Decision Forests – a novel approach that unifies classification trees with the representation learning functionality known from deep convolutional networks, by training them in an end-to-end manner. To combine these two worlds, we introduce a stochastic and differentiable decision tree model, which steers the representation learning usually conducted in the initial layers of a (deep) convolutional network. Our model differs from conventional deep networks because a decision forest provides the final predictions, and it differs from conventional decision forests since we propose a principled, joint and global optimization of split and leaf node parameters. We show experimental results on benchmark machine learning datasets like MNIST and ImageNet and find on-par or superior results when compared to state-of-the-art deep models. Most remarkably, we obtain top-5 errors of only 7.84%/6.38% on ImageNet validation data when integrating our forests in a single-crop, single/seven model GoogLeNet architecture, respectively. Thus, even without any form of training data set augmentation we improve on the 6.67% error obtained by the best GoogLeNet architecture (7 models, 144 crops).
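The following is an illustrative sketch (not the authors' implementation) of how a single stochastic, differentiable decision tree produces a prediction: each internal node routes the input left with probability sigmoid(w.x), and the output is the set of leaf class distributions weighted by the probability of reaching each leaf; shapes and the breadth-first indexing are assumptions made for the example.

    import numpy as np

    def soft_tree_predict(x, node_weights, leaf_dists):
        """x: (d,) features; node_weights: (2**D - 1, d) internal nodes in breadth-first
        order; leaf_dists: (2**D, num_classes) leaf class distributions."""
        depth = int(np.log2(leaf_dists.shape[0]))
        d = 1.0 / (1.0 + np.exp(-(node_weights @ x)))   # probability of routing left at each node
        num_leaves = leaf_dists.shape[0]
        path_prob = np.ones(num_leaves)
        for leaf in range(num_leaves):
            node = 0
            for level in range(depth):
                go_left = ((leaf >> (depth - 1 - level)) & 1) == 0
                path_prob[leaf] *= d[node] if go_left else 1.0 - d[node]
                node = 2 * node + (1 if go_left else 2)  # breadth-first child index
            # path_prob[leaf] is now the probability that x reaches this leaf
        return path_prob @ leaf_dists                    # mixture of leaf class distributions

    # toy usage: a depth-2 tree with 3 internal nodes and 4 leaves over 3 classes
    rng = np.random.default_rng(0)
    p = soft_tree_predict(rng.normal(size=5), rng.normal(size=(3, 5)),
                          rng.dirichlet(np.ones(3), size=4))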
Deep Neural Network Approximation Theory Deep neural networks have become state-of-the-art technology for a wide range of practical machine learning tasks such as image classification, handwritten digit recognition, speech recognition, or game intelligence. This paper develops the fundamental limits of learning in deep neural networks by characterizing what is possible if no constraints on the learning algorithm and the amount of training data are imposed. Concretely, we consider information-theoretically optimal approximation through deep neural networks with the guiding theme being a relation between the complexity of the function (class) to be approximated and the complexity of the approximating network in terms of connectivity and memory requirements for storing the network topology and the associated quantized weights. The theory we develop educes remarkable universality properties of deep networks. Specifically, deep networks are optimal approximants for vastly different function classes such as affine systems and Gabor systems. This universality is afforded by a concurrent invariance property of deep networks to time-shifts, scalings, and frequency-shifts. In addition, deep networks provide exponential approximation accuracy i.e., the approximation error decays exponentially in the number of non-zero weights in the network of vastly different functions such as the squaring operation, multiplication, polynomials, sinusoidal functions, general smooth functions, and even one-dimensional oscillatory textures and fractal functions such as the Weierstrass function, both of which do not have any known methods achieving exponential approximation accuracy. In summary, deep neural networks provide information-theoretically optimal approximation of a very wide range of functions and function classes used in mathematical signal processing.
Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images Deep neural networks (DNNs) have recently been achieving state-of-the-art performance on a variety of pattern-recognition tasks, most notably visual classification problems. Given that DNNs are now able to classify objects in images with near-human-level performance, questions naturally arise as to what differences remain between computer and human vision. A recent study revealed that changing an image (e.g. of a lion) in a way imperceptible to humans can cause a DNN to label the image as something else entirely (e.g. mislabeling a lion a library). Here we show a related result: it is easy to produce images that are completely unrecognizable to humans, but that state-of-the-art DNNs believe to be recognizable objects with 99.99% confidence (e.g. labeling with certainty that white noise static is a lion). Specifically, we take convolutional neural networks trained to perform well on either the ImageNet or MNIST datasets and then find images with evolutionary algorithms or gradient ascent that DNNs label with high confidence as belonging to each dataset class. It is possible to produce images totally unrecognizable to human eyes that DNNs believe with near certainty are familiar objects. Our results shed light on interesting differences between human vision and current DNNs, and raise questions about the generality of DNN computer vision.
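The gradient-ascent route to such fooling images can be illustrated with a toy differentiable classifier; the softmax-regression 'model' below stands in for a trained convolutional network so the sketch stays self-contained, and all names, sizes and the frozen weights are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    num_classes, dim = 10, 784
    W, b = rng.normal(scale=0.1, size=(num_classes, dim)), np.zeros(num_classes)  # frozen "model"

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    target = 3                               # class we want reported with high confidence
    x = rng.normal(scale=0.01, size=dim)     # start from near-noise
    lr = 0.5
    for _ in range(500):
        p = softmax(W @ x + b)
        grad = W[target] - p @ W             # gradient of log p(target | x) w.r.t. the input
        x = np.clip(x + lr * grad, 0.0, 1.0) # ascend, keep pixel values in a valid range

    print(softmax(W @ x + b)[target])        # typically driven close to 1, yet x still looks like noise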
Deep Neural Networks as Gaussian Processes A deep fully-connected neural network with an i.i.d. prior over its parameters is equivalent to a Gaussian process (GP) in the limit of infinite network width. This correspondence enables exact Bayesian inference for neural networks on regression tasks by means of straightforward matrix computations. For single hidden-layer networks, the covariance function of this GP has long been known. Recently, kernel functions for multi-layer random neural networks have been developed, but only outside of a Bayesian framework. As such, previous work has not identified the correspondence between using these kernels as the covariance function for a GP and performing fully Bayesian prediction with a deep neural network. In this work, we derive this correspondence and develop a computationally efficient pipeline to compute the covariance functions. We then use the resulting GP to perform Bayesian inference for deep neural networks on MNIST and CIFAR-10. We find that the GP-based predictions are competitive and can outperform neural networks trained with stochastic gradient descent. We observe that the trained neural network accuracy approaches that of the corresponding GP-based computation with increasing layer width, and that the GP uncertainty is strongly correlated with prediction error. We connect our observations to the recent development of signal propagation in random neural networks.
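The layer-wise covariance recursion behind this correspondence can be sketched for ReLU activations using the closed-form Gaussian expectation (the arc-cosine kernel); sigma_w2, sigma_b2 and the depth below are illustrative hyperparameters, not values taken from the paper.

    import numpy as np

    def nngp_kernel(X, depth=3, sigma_w2=1.6, sigma_b2=0.1):
        """X: (n, d) inputs. Returns the (n, n) GP covariance after `depth` ReLU layers."""
        K = sigma_b2 + sigma_w2 * (X @ X.T) / X.shape[1]          # layer-0 covariance
        for _ in range(depth):
            diag = np.sqrt(np.diag(K))
            outer = np.outer(diag, diag)
            cos_theta = np.clip(K / outer, -1.0, 1.0)
            theta = np.arccos(cos_theta)
            # E[relu(u) relu(v)] for (u, v) jointly Gaussian with covariance K
            expect = (outer / (2 * np.pi)) * (np.sin(theta) + (np.pi - theta) * np.cos(theta))
            K = sigma_b2 + sigma_w2 * expect
        return K

    # The resulting K can then be plugged into standard GP regression/classification
    # in place of training the corresponding infinitely wide network.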
Deep Probabilistic Programming Languages: A Qualitative Study Deep probabilistic programming languages try to combine the advantages of deep learning with those of probabilistic programming languages. If successful, this would be a big step forward in machine learning and programming languages. Unfortunately, as of now, this new crop of languages is hard to use and understand. This paper addresses this problem directly by explaining deep probabilistic programming languages and indirectly by characterizing their current strengths and weaknesses.
Deep Regression Bayesian Network and Its Applications Deep directed generative models have attracted much attention recently due to their generative modeling nature and powerful data representation ability. In this paper, we review different structures of deep directed generative models and the learning and inference algorithms associated with the structures. We focus on a specific structure that consists of layers of Bayesian Networks due to the property of capturing inherent and rich dependencies among latent variables. The major difficulty of learning and inference with deep directed models with many latent variables is the intractable inference due to the dependencies among the latent variables and the exponential number of latent variable configurations. Current solutions use variational methods often through an auxiliary network to approximate the posterior probability inference. In contrast, inference can also be performed directly without using any auxiliary network to maximally preserve the dependencies among the latent variables. Specifically, by exploiting the sparse representation with the latent space, max-max instead of max-sum operation can be used to overcome the exponential number of latent configurations. Furthermore, the max-max operation and augmented coordinate ascent are applied to both supervised and unsupervised learning as well as to various inference. Quantitative evaluations on benchmark datasets of different models are given for both data representation and feature learning tasks.
Deep Reinforcement Learning We discuss deep reinforcement learning in an overview style. We draw a big picture, filled with details. We discuss six core elements, six important mechanisms, and twelve applications, focusing on contemporary work, and in historical contexts. We start with background of artificial intelligence, machine learning, deep learning, and reinforcement learning (RL), with resources. Next we discuss RL core elements, including value function, policy, reward, model, exploration vs. exploitation, and representation. Then we discuss important mechanisms for RL, including attention and memory, unsupervised learning, hierarchical RL, multi-agent RL, relational RL, and learning to learn. After that, we discuss RL applications, including games, robotics, natural language processing (NLP), computer vision, finance, business management, healthcare, education, energy, transportation, computer systems, and, science, engineering, and art. Finally we summarize briefly, discuss challenges and opportunities, and close with an epilogue.
Deep Reinforcement Learning for Clinical Decision Support: A Brief Survey Owing to the recent advancements in Artificial Intelligence, especially deep learning, many data-driven decision support systems have been implemented to facilitate medical doctors in delivering personalized care. We focus on deep reinforcement learning (DRL) models in this paper. DRL models have demonstrated human-level or even superior performance in computer vision tasks and game playing, such as Go and Atari games. However, the adoption of deep reinforcement learning techniques in clinical decision optimization is still rare. We present the first survey that summarizes reinforcement learning algorithms with Deep Neural Networks (DNNs) on clinical decision support. We also discuss some case studies in which different DRL algorithms were applied to address various clinical challenges. We further compare and contrast the advantages and limitations of various DRL algorithms and present a preliminary guide on how to choose the appropriate DRL algorithm for particular clinical applications.
Deep Reinforcement Learning for Conversational AI Deep reinforcement learning is revolutionizing the artificial intelligence field. Currently, it serves as a good starting point for constructing intelligent autonomous systems which offer a better knowledge of the visual world. It is possible to scale deep reinforcement learning with the use of deep learning and accomplish impressive tasks such as playing video games directly from pixels. In this paper, key concepts of deep reinforcement learning, including the reward function, the differences between reinforcement learning and supervised learning, and models for the implementation of reinforcement learning, are discussed. Key challenges related to the implementation of reinforcement learning in the conversational AI domain are identified and discussed in detail. Various conversational models which are based on deep reinforcement learning (as well as deep learning) are also discussed. In summary, this paper discusses key aspects of deep reinforcement learning which are crucial for designing an efficient conversational AI.
Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications Reinforcement learning (RL) algorithms have been around for decades and been employed to solve various sequential decision-making problems. These algorithms however have faced great challenges when dealing with high-dimensional environments. The recent development of deep learning has enabled RL methods to derive optimal policies for sophisticated and capable agents, which can perform efficiently in these challenging environments. This paper addresses an important aspect of deep RL related to situations that demand multiple agents to communicate and cooperate to solve complex tasks. A survey of different approaches to problems related to multi-agent deep RL (MADRL) is presented, including non-stationarity, partial observability, continuous state and action spaces, multi-agent training schemes, and multi-agent transfer learning. The merits and demerits of the reviewed methods are analyzed and discussed, with their corresponding applications explored. It is envisaged that this review provides insights about various MADRL methods and can lead to the future development of more robust and highly useful multi-agent learning methods for solving real-world problems.
Deep Reinforcement Learning: An Overview In recent years, a specific machine learning method called deep learning has gained huge attention, as it has obtained astonishing results in broad applications such as pattern recognition, speech recognition, computer vision, and natural language processing. Recent research has also shown that deep learning techniques can be combined with reinforcement learning methods to learn useful representations for problems with high-dimensional raw data input. This chapter reviews the recent advances in deep reinforcement learning with a focus on the most used deep architectures, such as autoencoders, convolutional neural networks and recurrent neural networks, which have successfully been combined with the reinforcement learning framework.
Deep Reinforcement Learning: An Overview We give an overview of recent exciting achievements of deep reinforcement learning (RL). We start with background of deep learning and reinforcement learning, as well as introduction of testbeds. Next we discuss Deep Q-Network (DQN) and its extensions, asynchronous methods, policy optimization, reward, and planning. After that, we talk about attention and memory, unsupervised learning, and learning to learn. Then we discuss various applications of RL, including games, in particular, AlphaGo, robotics, spoken dialogue systems (a.k.a. chatbot), machine translation, text sequence prediction, neural architecture design, personalized web services, healthcare, finance, and music generation. We mention topics/papers not reviewed yet. After listing a collection of RL resources, we close with discussions.
Deep Reinforcement Learning: Framework, Applications, and Embedded Implementations The recent breakthroughs of the deep reinforcement learning (DRL) technique in AlphaGo and playing Atari have set a good example in handling the large state and action spaces of complicated control problems. The DRL technique is comprised of (i) an offline deep neural network (DNN) construction phase, which derives the correlation between each state-action pair of the system and its value function, and (ii) an online deep Q-learning phase, which adaptively derives the optimal action and updates value estimates. In this paper, we first present the general DRL framework, which can be widely utilized in many applications with different optimization objectives. This is followed by the introduction of three specific applications: the cloud computing resource allocation problem, the residential smart grid task scheduling problem, and the building HVAC system optimal control problem. The effectiveness of the DRL technique in these three cyber-physical applications has been validated. Finally, this paper investigates the stochastic computing-based hardware implementations of the DRL framework, which achieve a significant improvement in area efficiency and power consumption compared with binary-based implementation counterparts.
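For reference, the online value-update step mentioned in (ii) has the familiar temporal-difference form; the sketch below shows it as tabular Q-learning for readability, whereas a DRL framework replaces the table with a deep neural network approximator. The table sizes and hyperparameters are placeholders.

    import numpy as np

    def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
        """Q: (num_states, num_actions) value table; performs one temporal-difference update."""
        td_target = r + gamma * Q[s_next].max()     # bootstrapped estimate of the return
        Q[s, a] += alpha * (td_target - Q[s, a])    # move the current estimate toward the target
        return Q

    Q = np.zeros((5, 2))                            # toy problem: 5 states, 2 actions
    Q = q_learning_step(Q, s=0, a=1, r=1.0, s_next=2)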
Deep Retrieval-Based Dialogue Systems: A Short Review Building dialogue systems that naturally converse with humans is an attractive and active research domain. Multiple systems are being designed every day and several datasets are becoming available. For this reason, it is hard to maintain an up-to-date view of the state-of-the-art. In this work, we present the latest and most relevant retrieval-based dialogue systems and the available datasets used to build and evaluate them. We discuss their limitations and provide insights and guidelines for future work.
Deep Semantic Segmentation of Natural and Medical Images: A Review The (medical) image semantic segmentation task consists of classifying each pixel of an image (or just several of them) into an instance, where each instance (or category) corresponds to a class. This task is a part of the concept of scene understanding, or better explaining the global context of an image. In the medical image analysis domain, image segmentation can be used for image-guided interventions, radiotherapy, or improved radiological diagnostics. In this review, we categorize the main deep learning-based medical and non-medical image segmentation solutions into six main groups: deep architectural improvements, data synthesis-based methods, loss function-based improvements, sequenced models, weakly supervised methods, and multi-task methods. For each group, we analyze its variants and discuss the limitations of the current approaches and future research directions for semantic image segmentation.
Deep Stochastic Configuration Networks: Universal Approximation and Learning Representation This paper focuses on the development of randomized approaches for building deep neural networks. A supervisory mechanism is proposed to constrain the random assignment of the hidden parameters (i.e., all biases and weights within the hidden layers). A full-rank oriented criterion is suggested and utilized as a termination condition to determine the number of nodes for each hidden layer, and a pre-defined error tolerance is used as a global indicator to decide the depth of the learner model. The read-out weights attached to all direct links from each hidden layer to the output layer are incrementally evaluated by the least squares method. Such a class of randomized learner models with deep architecture is termed deep stochastic configuration networks (DeepSCNs), of which the universal approximation property is verified with rigorous proof. Given abundant samples from a continuous distribution, DeepSCNs can speedily produce a learning representation, that is, a collection of random basis functions with the cascaded inputs together with the read-out weights. Simulation results with comparisons on function approximation align with the theoretical findings.
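An illustrative sketch (not the authors' algorithm) of the ingredients named above: randomly assigned hidden parameters, direct links from every hidden layer to the output, read-out weights evaluated by least squares, and a global error tolerance deciding the depth. The layer width, the sampling ranges and the tanh activation are assumptions; the paper's supervisory mechanism and full-rank criterion are not reproduced here.

    import numpy as np

    def deep_random_net(X, y, max_layers=5, width=50, tol=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        layer_in, feats, params = X, [], []
        for _ in range(max_layers):
            W = rng.uniform(-1, 1, size=(layer_in.shape[1], width))  # random hidden weights
            b = rng.uniform(-1, 1, size=width)                       # random hidden biases
            layer_in = np.tanh(layer_in @ W + b)                     # new hidden layer output
            feats.append(layer_in); params.append((W, b))
            Phi = np.hstack(feats)                                   # direct links from every hidden layer
            beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)           # least-squares read-out weights
            if np.mean((Phi @ beta - y) ** 2) < tol:                 # global error tolerance stops growth
                break
        return params, beta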
Deep Visual Domain Adaptation: A Survey Deep domain adaptation has emerged as a new learning technique to address the lack of massive amounts of labeled data. Compared to conventional methods, which learn shared feature subspaces or reuse important source instances with shallow representations, deep domain adaptation methods leverage deep networks to learn more transferable representations by embedding domain adaptation in the pipeline of deep learning. There have been comprehensive surveys of shallow domain adaptation, but few timely reviews of the emerging deep learning based methods. In this paper, we provide a comprehensive survey of deep domain adaptation methods for computer vision applications with four major contributions. First, we present a taxonomy of different deep domain adaptation scenarios according to the properties of the data that define how two domains diverge. Second, we summarize deep domain adaptation approaches into several categories based on training loss, and analyze and compare briefly the state-of-the-art methods under these categories. Third, we overview the computer vision applications that go beyond image classification, such as face recognition, semantic segmentation and object detection. Fourth, some potential deficiencies of current methods and several future directions are highlighted.
Deep-learning in Mobile Robotics – from Perception to Control Systems: A Survey on Why and Why not Deep-learning has dramatically changed the world overnight. It greatly boosted the development of visual perception, object detection, and speech recognition, among others. This is attributed to the multiple convolutional processing layers that abstract learning representations from massive data. The advantages of deep convolutional structures in data processing motivated the application of artificial intelligence methods in robotic problems, especially perception and control systems, the two typical and challenging problems in robotics. This paper presents a survey of the deep-learning research landscape in mobile robotics. We start by introducing the definition and development of deep-learning in related fields, especially the essential distinctions between image processing and robotic tasks. We then describe and discuss several typical applications and related works in this domain, followed by the benefits of deep-learning and related existing frameworks. Besides, operation in complex dynamic environments is regarded as a critical bottleneck for mobile robots, such as those for autonomous driving. We thus further emphasize recent achievements on how deep-learning contributes to navigation and control systems for mobile robots. Finally, we discuss the open challenges and research frontiers.
DeepWalk: Online Learning of Social Representations We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs. DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. We demonstrate DeepWalk’s latent representations on several multi-label network classification tasks for social networks such as BlogCatalog, Flickr, and YouTube. Our results show that DeepWalk outperforms challenging baselines which are allowed a global view of the network, especially in the presence of missing information. DeepWalk’s representations can provide F1 scores up to 10% higher than competing methods when labeled data is sparse. In some experiments, DeepWalk’s representations are able to outperform all baseline methods while using 60% less training data. DeepWalk is also scalable. It is an online learning algorithm which builds useful incremental results, and is trivially parallelizable. These qualities make it suitable for a broad class of real world applications such as network classification, and anomaly detection.
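A minimal sketch of the DeepWalk idea described above: truncated random walks over a graph are treated as "sentences" and fed to a skip-gram learner. The graph, walk counts, and hyperparameters below are illustrative assumptions, not the paper's settings, and the sketch assumes the gensim 4.x Word2Vec API and networkx.

```python
# Illustrative sketch of DeepWalk: random walks as "sentences" fed to skip-gram.
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walks(graph, num_walks=10, walk_length=40, seed=0):
    """Generate truncated random walks starting from every node, num_walks times."""
    rng = random.Random(seed)
    walks, nodes = [], list(graph.nodes())
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = list(graph.neighbors(walk[-1]))
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append([str(n) for n in walk])   # tokens must be strings for Word2Vec
    return walks

G = nx.karate_club_graph()                         # small example graph
corpus = random_walks(G)
model = Word2Vec(corpus, vector_size=64, window=5, min_count=0, sg=1, workers=1)
print(model.wv.most_similar("0", topn=5))          # nodes embedded near node 0
```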
Delivering Information Faster: In-Memory Technology Reboots the Big Data Analytics World In-memory technology – in which entire datasets are pre-loaded into a computer´s random access memory, alleviating the need for shuttling data between memory and disk storage every time a query is initiated – has actually been around for a number of years. However, with the onset of big data, as well as an insatiable thirst for analytics, the industry is taking a second look at this promising approach to speeding up data processing.
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%) on this visual recognition challenge.
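For concreteness, a small numpy sketch of the two ingredients named in the abstract: the PReLU activation, which is the identity for positive inputs and a learnable slope a for negative ones, and a rectifier-aware initialization with variance 2/fan_in. The layer sizes and the slope value are arbitrary examples.

```python
import numpy as np

def prelu(x, a):
    """Parametric ReLU: identity for positive inputs, learnable slope `a` for negative ones."""
    return np.where(x > 0, x, a * x)

def he_init(fan_in, fan_out, rng=np.random.default_rng(0)):
    """Rectifier-aware initialization: std sqrt(2 / fan_in) keeps activation variance stable."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

x = np.linspace(-3, 3, 7)
print(prelu(x, a=0.25))     # a = 0.25 is an arbitrary example of a learned slope
W = he_init(512, 256)       # illustrative layer sizes
print(W.std())              # close to sqrt(2/512)
```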
Demystifying Fog Computing: Characterizing Architectures, Applications and Abstractions Internet of Things (IoT) has accelerated the deployment of millions of sensors at the edge of the network, through Smart City infrastructure and lifestyle devices. Cloud computing platforms are often tasked with handling these large volumes and fast streams of data from the edge. Recently, Fog computing has emerged as a concept for low-latency and resource-rich processing of these observation streams, to complement Edge and Cloud computing. In this paper, we review various dimensions of system architecture, application characteristics and platform abstractions that are manifest in this Edge, Fog and Cloud eco-system. We highlight novel capabilities of the Edge and Fog layers, such as physical and application mobility, privacy sensitivity, and a nascent runtime environment. IoT application case studies based on first-hand experiences across diverse domains drive this categorization. We also highlight the gap between the potential and the reality of Fog computing, and identify challenges that need to be overcome for the solution to be sustainable. Together, our article can help platform and application developers bridge the gap that remains in making Fog computing viable.
Density-based Clustering Clustering methods like K-means or Expectation-Maximization are suitable for finding ellipsoid-shaped clusters, or at best convex clusters. However, for non-convex clusters, such as those shown in Figure 15.1, these methods have trouble finding the true clusters, since two points from different clusters may be closer than two points in the same cluster. The density-based methods we consider in this chapter are able to mine such non-convex or shape-based clusters.
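A hedged illustration of the point above, using scikit-learn's DBSCAN as a representative density-based method on the classic non-convex "two moons" data that K-means handles poorly; the dataset and the eps/min_samples values are illustrative choices, not taken from the chapter.

```python
# Density-based clustering recovers non-convex "moon" clusters where K-means does not.
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN, KMeans

X, y_true = make_moons(n_samples=500, noise=0.05, random_state=0)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

print("k-means clusters:", set(kmeans_labels))
print("DBSCAN clusters :", set(dbscan_labels))   # -1 marks noise points, if any
```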
Design Principles of Massive, Robust Prediction Systems Most data mining research is concerned with building high-quality classification models in isolation. In massive production systems, however, the ability to monitor and maintain performance over time while growing in size and scope is equally important. Many external factors may degrade classification performance including changes in data distribution, noise or bias in the source data, and the evolution of the system itself. A well-functioning system must gracefully handle all of these. This paper lays out a set of design principles for large-scale autonomous data mining systems and then demonstrates our application of these principles within the m6d automated ad targeting system. We demonstrate a comprehensive set of quality control processes that allow us to monitor and maintain thousands of distinct classification models automatically, and to add new models, take on new data, and correct poorly-performing models without manual intervention or system disruption.
Designing Great Visualizations This paper traces the history of visual representation, from early cave drawings through the computer revolution and the launch of Tableau. We will discuss some of the pioneers in data research and show how their work helped to revolutionize visualization techniques. We will also examine the different styles of data visuals, discuss some of the barriers to making effective visuals and the methods we use to overcome those barriers. In the end, we will show the power (and limits) of human perception, and how we can use data to tell stories – much like those of the earliest cave drawings.
Designing with Data: A Case Study As the Internet of Things continues to take hold in the commercial world, the teams designing these new technologies are constantly evolving and turning their hand to uncharted territory. This is especially true within the field of secondary service design, as businesses attempt to utilize and find value in the sensor data produced by connected products. This paper discusses the ways in which a commercial design team uses smart thermostat data to prototype an advice-giving chatbot. The team collaborates to produce a chat sequence through careful ordering of data and reasoning about customer reactions. The paper contributes important insights into design methods being used in practice within the under-researched areas of chatbot prototyping and secondary service design.
Detecting Dead Weights and Units in Neural Networks Deep neural networks are highly over-parameterized, and their size can be reduced significantly after training without any decrease in performance. One can clearly see this phenomenon in a wide range of architectures trained for various problems. Weight/channel pruning, distillation, quantization and matrix factorization are some of the main methods one can use to remove the redundancy and obtain smaller and faster models. This work starts with a short informative chapter, where we motivate the pruning idea and provide the necessary notation. In the second chapter, we compare various saliency scores in the context of parameter pruning. Using the insights obtained from this comparison, and stating the problems it brings, we motivate why pruning units instead of individual parameters might be a better idea. We propose a set of definitions to quantify and analyze units that do not learn or create any useful information. We propose an efficient way of detecting dead units and use it to select which units to prune. We obtain a 5x model-size reduction through unit-wise pruning on MNIST.
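One simple way to operationalize "dead" ReLU units, in the spirit of the abstract: record activations over a dataset and flag units that are almost never active. The toy network, the forced-dead units, and the 1e-3 threshold are illustrative assumptions, not the authors' exact criterion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-layer ReLU network; weights and data stand in for a trained model.
W = rng.normal(size=(20, 50))
b = rng.normal(size=50)
b[[3, 17, 42]] = -100.0                     # force a few units to be permanently inactive
X = rng.normal(size=(1000, 20))

activations = np.maximum(X @ W + b, 0.0)    # ReLU activations over the dataset

# A unit is "dead" if it fires on (almost) no examples; 1e-3 is an arbitrary threshold.
active_fraction = (activations > 0).mean(axis=0)
dead_units = np.where(active_fraction < 1e-3)[0]
print("candidate units to prune:", dead_units)
```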
Deterministic Distributed Matching: Simpler, Faster, Better We present improved deterministic distributed algorithms for a number of well-studied matching problems, which are simpler, faster, more accurate, and/or more general than their known counterparts. The common denominator of these results is a deterministic distributed rounding method for certain linear programs, which is the first such rounding method, to our knowledge. A sampling of our end results is as follows: — An $O(\log^2 \Delta \log n)$-round deterministic distributed algorithm for computing a maximal matching, in $n$-node graphs with maximum degree $\Delta$. This is the first improvement in about 20 years over the celebrated $O(\log^4 n)$-round algorithm of Hanckowiak, Karonski, and Panconesi [SODA’98, PODC’99]. — An $O(\log^2 \Delta \log \frac{1}{\varepsilon} + \log^* n)$-round deterministic distributed algorithm for a $(2+\varepsilon)$-approximation of maximum matching. This is exponentially faster than the classic $O(\Delta +\log^* n)$-round $2$-approximation of Panconesi and Rizzi [DIST’01]. With some modifications, the algorithm can also find an almost maximal matching which leaves only an $\varepsilon$-fraction of the edges on unmatched nodes. — An $O(\log^2 \Delta \log \frac{1}{\varepsilon} \log_{1+\varepsilon} W + \log^* n)$-round deterministic distributed algorithm for a $(2+\varepsilon)$-approximation of a maximum weighted matching, and also for the more general problem of maximum weighted $b$-matching. Here, $W$ denotes the maximum normalized weight. These improve over the $O(\log^4 n \log_{1+\varepsilon} W)$-round $(6+\varepsilon)$-approximation algorithm of Panconesi and Sozio [DIST’10].
Diachronic word embeddings and semantic shifts: a survey Recent years have witnessed a surge of publications aimed at tracing temporal changes in lexical semantics using distributional methods, particularly prediction-based word embedding models. However, this vein of research lacks the cohesion, common terminology and shared practices of more established areas of natural language processing. In this paper, we survey the current state of academic research related to diachronic word embeddings and semantic shift detection. We start with discussing the notion of semantic shifts, and then continue with an overview of the existing methods for tracing such time-related shifts with word embedding models. We propose several axes along which these methods can be compared, and outline the main challenges facing this emerging subfield of NLP, as well as prospects and possible applications.
Different Approach to the Problem of Missing Data There is a long history of development of methodology for dealing with missing data in statistical analysis. Today, the most popular methods fall into two classes, Complete Cases (CC) and Multiple Imputation (MI). Another approach, Available Cases (AC), has occasionally been mentioned in the research literature, in the context of linear regression analysis, but has generally been ignored. In this paper, we revisit the AC method, showing that it can perform better than CC and MI, and we extend its breadth of application.
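To make the distinction concrete, a small sketch contrasting Complete Cases with the Available Cases idea for estimating a covariance matrix: CC drops every row containing any missing value, while AC uses, for each pair of variables, all rows where both are observed. pandas' pairwise-complete cov() is used here as a convenient stand-in; this is illustrative, not the paper's code, and the data and missingness rate are arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])

# Knock out 20% of the entries at random to simulate missing data.
df = df.mask(rng.random(df.shape) < 0.2)

cc_cov = df.dropna().cov()   # Complete Cases: only rows with no missing values
ac_cov = df.cov()            # Available Cases: pairwise-complete observations per entry

print("rows used by CC:", len(df.dropna()), "out of", len(df))
print(cc_cov, "\n", ac_cov)
```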
Different Approaches for Human Activity Recognition: A Survey Human activity recognition has gained importance in recent years due to its applications in various fields such as health, security and surveillance, entertainment, and intelligent environments. A significant amount of work has been done on human activity recognition and researchers have leveraged different approaches, such as wearable, object-tagged, and device-free, to recognize human activities. In this article, we present a comprehensive survey of the work conducted over the period 2010-2018 in various areas of human activity recognition, with a main focus on device-free solutions. The device-free approach is becoming very popular due to the fact that the subject is not required to carry anything; instead, the environment is tagged with devices to capture the required information. We propose a new taxonomy for categorizing the research work conducted in the field of activity recognition and divide the existing literature into three sub-areas: action-based, motion-based, and interaction-based. We further divide these areas into ten different sub-topics and present the latest research work in these sub-topics. Unlike previous surveys which focus only on one type of activity, to the best of our knowledge, we cover all the sub-areas in activity recognition and provide a comparison of the latest research work in these sub-areas. Specifically, we discuss the key attributes and design approaches for the work presented. Then we provide extensive analysis based on 10 important metrics, to give the reader a complete overview of the state-of-the-art techniques and trends in different sub-areas of human activity recognition. In the end, we discuss open research issues and provide future research directions in the field of human activity recognition.
Different Stages of Wearable Health Tracking Adoption & Abandonment: A Survey Study and Analysis Health trackers are widely adopted to support users with daily health and wellness tracking. They can help increase steps taken, enhance sleeping patterns, improve diet, and promote overall health. Despite the growth in the adoption of such technology, their real-life use is still questionable. While some users derive long-term value from their trackers, others face barriers to integrating them into their daily routine. Studies have analysed the technical aspects of these barriers. In this study, we analyse the behavioural factors of discouragement and wearable abandonment that are strictly tied to user habits and living circumstances. Data analysis was conducted in two different studies, one on users’ posts about wearable sales and the other a survey analysis. The two studies were used to analyse the stages of wearable adoption, use and abandonment. We therefore mainly focused on users’ motives for getting a wearable tracker and for posting it for sale. We extracted insights about user motives, highlighted technology conditions and limitations, and identified the timeframe before abandonment. The findings revealed certain user behavioural patterns throughout wearable use and abandonment.
Differential Similarity in Higher Dimensional Spaces: Theory and Applications This paper presents an extension and an elaboration of the theory of differential similarity, which was originally proposed in arXiv:1401.2411 [cs.LG]. The goal is to develop an algorithm for clustering and coding that combines a geometric model with a probabilistic model in a principled way. For simplicity, the geometric model in the earlier paper was restricted to the three-dimensional case. The present paper removes this restriction, and considers the full $n$-dimensional case. Although the mathematical model is the same, the strategies for computing solutions in the $n$-dimensional case are different, and one of the main purposes of this paper is to develop and analyze these strategies. Another main purpose is to devise techniques for estimating the parameters of the model from sample data, again in $n$ dimensions. We evaluate the solution strategies and the estimation techniques by applying them to two familiar real-world examples: the classical MNIST dataset and the CIFAR-10 dataset.
Digital Twin: Enabling Technology, Challenges and Open Research Digital Twin technology is an emerging concept that has recently become the centre of attention for industry and, in more recent years, academia. Advancements in Industry 4.0 concepts have facilitated its growth, particularly in the manufacturing industry. The Digital Twin has been defined in many ways, but is broadly described as the effortless integration of data between a physical and virtual machine in either direction. The challenges, applications, and enabling technologies for Artificial Intelligence, the Internet of Things and Digital Twins are presented. A review of publications relating to Digital Twins is performed, producing a categorical review of recent papers organised by research area: manufacturing, healthcare and smart cities. A range of papers reflecting these areas and the current state of research is discussed. The paper outlines the open research opportunities and challenges.
Directional Statistics in Machine Learning: a Brief Review The modern data analyst must cope with data encoded in various forms, vectors, matrices, strings, graphs, or more. Consequently, statistical and machine learning models tailored to different data encodings are important. We focus on data encoded as normalized vectors, so that their ‘direction’ is more important than their magnitude. Specifically, we consider high-dimensional vectors that lie either on the surface of the unit hypersphere or on the real projective plane. For such data, we briefly review common mathematical models prevalent in machine learning, while also outlining some technical aspects, software, applications, and open mathematical challenges.
Discovery with Data: Leveraging Statistics with Computer Science to Transform Science and Society The Big Data Research and Development Initiative is now in its third year and making great strides to address the challenges of Big Data. To further advance this initiative, we describe how statistical thinking can help tackle the many Big Data challenges, emphasizing that often the most productive approach will involve multidisciplinary teams with statistical, computational, mathematical, and scientific domain expertise.
discrete examples: genetics and spell checking
Distance and Similarity Measures Effect on the Performance of K-Nearest Neighbor Classifier – A Review The K-nearest neighbor (KNN) classifier is one of the simplest and most common classifiers, yet its performance competes with the most complex classifiers in the literature. The core of this classifier depends mainly on measuring the distance or similarity between the tested example and the training examples. This raises a major question about which distance measure should be used for the KNN classifier among the large number of distance and similarity measures available. This review attempts to answer that question by evaluating the performance (measured by accuracy, precision and recall) of KNN using a large number of distance measures, tested on a number of real-world datasets, with and without adding different levels of noise. The experimental results show that the performance of the KNN classifier depends significantly on the distance used; there were large gaps between the performances of different distances. We found that a recently proposed non-convex distance performed best on most datasets compared to the other tested distances. In addition, the performance of KNN degraded by only about $20\%$ when the noise level reached $90\%$, and this held for all the distances used. This means that the KNN classifier using any of the top $10$ distances tolerates noise to a certain degree. Moreover, the results show that some distances are less affected by the added noise than others.
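The experiment outlined above is easy to reproduce in miniature: swap the distance metric in a KNN classifier and compare accuracy, with and without added noise. A hedged scikit-learn sketch follows; the dataset, the three metrics, and the noise level are illustrative choices, not the review's protocol.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
X_noisy = X + rng.normal(scale=0.5, size=X.shape)   # illustrative noise level

for metric in ["euclidean", "manhattan", "chebyshev"]:
    for name, data in [("clean", X), ("noisy", X_noisy)]:
        acc = cross_val_score(KNeighborsClassifier(n_neighbors=5, metric=metric),
                              data, y, cv=5).mean()
        print(f"{metric:10s} {name:5s} accuracy = {acc:.3f}")
```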
Distance Metric Learning – A Comprehensive Survey Many machine learning algorithms, such as K Nearest Neighbor (KNN), rely heavily on the distance metric for the input data patterns. Distance metric learning aims to learn a distance metric for the input space of data, from a given collection of pairs of similar/dissimilar points, that preserves the distance relations among the training data. In recent years, many studies have demonstrated, both empirically and theoretically, that a learned metric can significantly improve performance in classification, clustering and retrieval tasks. This paper surveys the field of distance metric learning from a principled perspective, and includes a broad selection of recent work. In particular, distance metric learning is reviewed under different learning conditions: supervised versus unsupervised learning, learning in a global versus a local sense, and distance matrices based on linear versus nonlinear kernels. In addition, this paper discusses a number of techniques that are central to distance metric learning, including convex programming, positive semi-definite programming, kernel learning, dimension reduction, K Nearest Neighbor, large margin classification, and graph-based approaches.
Distinguishing cause from effect using observational data: methods and benchmarks The discovery of causal relationships from purely observational data is a fundamental problem in science. The most elementary form of such a causal discovery problem is to decide whether X causes Y or, alternatively, Y causes X, given joint observations of two variables X, Y. This was often considered to be impossible. Nevertheless, several approaches for addressing this bivariate causal discovery problem were proposed recently. In this paper, we present the benchmark data set CauseEffectPairs that consists of 88 different ‘cause-effect pairs’ selected from 31 datasets from various domains. We evaluated the performance of several bivariate causal discovery methods on these real-world benchmark data and on artificially simulated data. Our empirical results provide evidence that additive-noise methods are indeed able to distinguish cause from effect using only purely observational data. In addition, we prove consistency of the additive-noise method proposed by Hoyer et al. (2009).
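A heavily simplified sketch of the additive-noise idea mentioned above: fit a nonparametric regression in each direction and prefer the direction whose residuals look less dependent on the input. Here dependence is crudely proxied by how much the residual spread varies across quantile bins of the input, whereas the paper uses proper independence tests such as HSIC; the data, the regressor, and the proxy are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=2000)
y = np.tanh(x) + 0.2 * rng.normal(size=x.size)     # ground truth: X causes Y, additive noise

def dependence_score(cause, effect, bins=10):
    """Fit effect = f(cause) + residual, then score how strongly the residual spread
    varies across quantile bins of the cause (a crude stand-in for an independence test)."""
    model = GradientBoostingRegressor(random_state=0).fit(cause.reshape(-1, 1), effect)
    residual = effect - model.predict(cause.reshape(-1, 1))
    edges = np.quantile(cause, np.linspace(0, 1, bins + 1))
    bin_id = np.digitize(cause, edges[1:-1])       # bin index 0 .. bins-1
    bin_std = np.array([residual[bin_id == b].std() for b in range(bins)])
    return bin_std.std() / bin_std.mean()          # near 0 if residuals look independent

score_xy = dependence_score(x, y)                  # causal direction: homoscedastic residuals
score_yx = dependence_score(y, x)                  # anticausal direction: residuals depend on y
print("inferred direction:", "X -> Y" if score_xy < score_yx else "Y -> X")
```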
Distributed Computation of Linear Matrix Equations: An Optimization Perspective This paper investigates the distributed computation of the well-known linear matrix equation AXB = F, with the matrices A, B, X, and F of appropriate dimensions, over multi-agent networks from an optimization perspective. We consider the standard distributed matrix-information structures, where each agent of the multi-agent network has access to one of the sub-block matrices of A, B, and F. To be specific, we first propose different decomposition methods to reformulate the matrix equations in standard structures as distributed constrained optimization problems by introducing substitutional variables; we show that the solutions of the reformulated distributed optimization problems are equivalent to least squares solutions of the original matrix equations; and we design distributed continuous-time algorithms for the constrained optimization problems, using augmented matrices and a derivative feedback technique. With the help of semi-stability analysis, we prove the convergence of the algorithms to a least squares solution of the matrix equation for any initial condition.
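For orientation, the (centralized) least-squares baseline that such distributed schemes target can be written down with the vec/Kronecker identity vec(AXB) = (Bᵀ ⊗ A) vec(X). A small numpy check, with arbitrary dimensions, of that identity used as a direct solver; this is only the centralized reference point, not the paper's distributed algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.normal(size=(5, 3)), rng.normal(size=(4, 6))
F = rng.normal(size=(5, 6))

# vec(A X B) = (B^T kron A) vec(X), with vec = column stacking (Fortran order).
M = np.kron(B.T, A)
x_vec, *_ = np.linalg.lstsq(M, F.flatten(order="F"), rcond=None)
X = x_vec.reshape(3, 4, order="F")

# X is a least squares solution: the normal-equation residual is numerically zero.
print(np.linalg.norm(M.T @ (M @ x_vec - F.flatten(order="F"))))
```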
Distributed Constraint Optimization Problems and Applications: A Survey The field of Multi-Agent System (MAS) is an active area of research within Artificial Intelligence, with an increasingly important impact in industrial and other real-world applications. Within a MAS, autonomous agents interact to pursue personal interests and/or to achieve common objectives. Distributed Constraint Optimization Problems (DCOPs) have emerged as one of the prominent agent architectures to govern the agents’ autonomous behavior, where both algorithms and communication models are driven by the structure of the specific problem. During the last decade, several extensions to the DCOP model have enabled them to support MAS in complex, real-time, and uncertain environments. This survey aims at providing an overview of the DCOP model, giving a classification of its multiple extensions and addressing both resolution methods and applications that find a natural mapping within each class of DCOPs. The proposed classification suggests several future perspectives for DCOP extensions, and identifies challenges in the design of efficient resolution algorithms, possibly through the adaptation of strategies from different areas.
Distributed Decision Tree Learning for Mining Big Data Streams Web companies need to effectively analyse big data in order to enhance the experiences of their users. They need systems that are capable of handling big data in terms of three dimensions: volume, as data keeps growing; variety, as the types of data are diverse; and velocity, as data continuously arrives very fast into the systems. However, most existing systems address at most two of the three dimensions, such as Mahout, a distributed machine learning framework that addresses the volume and variety dimensions, and Massive Online Analysis (MOA), a streaming machine learning framework that handles the variety and velocity dimensions. In this thesis, we propose and develop Scalable Advanced Massive Online Analysis (SAMOA), a distributed streaming machine learning framework to address the aforementioned challenge. SAMOA provides flexible application programming interfaces (APIs) to allow rapid development of new ML algorithms for dealing with variety. Moreover, we integrate SAMOA with Storm, a state-of-the-art stream processing engine (SPE), which allows SAMOA to inherit Storm’s scalability to address velocity and volume. The main benefits of SAMOA are: it provides flexibility in developing new ML algorithms and extensibility in integrating new SPEs. We develop a distributed online classification algorithm on top of SAMOA to verify the aforementioned features of SAMOA. The evaluation results show that the distributed algorithm is suitable for settings with a high number of attributes.
Distributed Latent Dirichlet Allocation via Tensor Factorization Latent Dirichlet Allocation (LDA) has proven extremely popular and versatile since its introduction over a decade ago. LDA is successful in part because it assigns a mixture of latent states (‘topics’) to each set of exchangeable observations (‘document’), in contrast to a hard clustering. This property complicates the estimation of latent parameters, and has led to extensive research into disparate learning techniques. Broadly speaking, there are three basic strategies: variational inference; Markov chain Monte Carlo; and the method of moments, the latter having been discovered only recently. Due to high dimensional data with large vocabulary sizes, numerous documents, and many topics, computational constraints are the limiting factor in developing large scale topic models. This has motivated research into scalable computational strategies for LDA. In the single node context, stochastic variational inference is fast and accurate, but has high communication costs in the distributed setting. Batch variational inference has a more favorable ratio of communication to computation, as the E-step (but not the M-step) is embarrassingly parallel. Markov chain Monte Carlo (MCMC) techniques have also been implemented in the distributed setting, in both synchronous and asynchronous variants. Due to their recent introduction, there are no distributed implementations of method-of-moments based approaches to LDA. We leverage the fact that the method of moments for LDA reduces to canonical polyadic (CP) decomposition of a tensor, a problem which has received extensive study in the literature, including distributed variants. We combine ALS with whitening preprocessing (data orthogonalization and dimensionality reduction), motivated by better convergence rate and perturbation guarantees compared to previous methods. Additionally, the preprocessing has the benefit that the subsequent tensor decomposition is independent of the vocabulary size and the number of documents. Although ALS requires many iterations to converge (more than would be tolerable using map-reduce without custom support for low-overhead iteration), we utilize REEF, a distributed processing framework which runs on YARN-managed clusters, e.g., a Hadoop 2 installation.
Distributed Least-Squares Iterative Methods in Networks: A Survey Many science and engineering applications involve solving a linear least-squares system formed from field measurements. In distributed cyber-physical systems (CPS), each sensor node used for measurement often knows only a subset of the independent rows of the least-squares system. To compute the least-squares solution, the nodes would need to gather all these measurements at a centralized location and then compute the solution there. Such data collection and computation are inefficient because of bandwidth and time constraints, and are sometimes infeasible because of data privacy concerns. Thus distributed computations are strongly preferred or demanded in many real-world applications, e.g., smart grids and target tracking. For computing least-squares solutions of large sparse systems of linear equations, iterative methods are natural candidates, and there are many studies of them; however, most concern the efficiency of centralized/parallel computations, and only a few are explicitly about distributed computation or have the potential to apply to distributed networks. This paper surveys the representative iterative methods from several research communities. Some of them were not originally designed for this setting, so we slightly modify them to suit our requirement and maintain consistency. In this survey, we first sketch the skeleton of each algorithm and then analyze its time-to-completion and communication cost. To the best of our knowledge, this is the first survey of distributed least-squares methods in distributed networks.
Distributed Machine Learning with Apache Mahout (RefCard) Apache Mahout is a library for scalable machine learning. Originally a subproject of Apache Lucene (a high-performance text search engine library), Mahout has progressed to be a top-level Apache project. While Mahout has only been around for a few years, it has established itself as a frontrunner in the field of machine learning technologies. This Refcard will present the basics of Mahout by studying two possible applications: • Training and testing a Random Forest for handwriting recognition using Amazon Web Services EMR. • Running a recommendation engine on a standalone Spark cluster.
Distributionally robust optimization with polynomial densities: theory, models and algorithms In distributionally robust optimization the probability distribution of the uncertain problem parameters is itself uncertain, and a fictitious adversary, e.g., nature, chooses the worst distribution from within a known ambiguity set. A common shortcoming of most existing distributionally robust optimization models is that their ambiguity sets contain pathological discrete distributions that give nature too much freedom to inflict damage. We thus introduce a new class of ambiguity sets that contain only distributions with sum-of-squares polynomial density functions of known degrees. We show that these ambiguity sets are highly expressive as they conveniently accommodate distributional information about higher-order moments, conditional probabilities, conditional moments or marginal distributions. Exploiting the theoretical properties of a measure-based hierarchy for polynomial optimization due to Lasserre [SIAM J. Optim. 21(3) (2011), pp. 864–885], we prove that certain worst-case expectation constraints are computationally tractable under these new ambiguity sets. We showcase the practical applicability of the proposed approach in the context of a stylized portfolio optimization problem and a risk aggregation problem of an insurance company.
Distributionally Robust Optimization: A Review The concepts of risk-aversion, chance-constrained optimization, and robust optimization have developed significantly over the last decade. The statistical learning community has also witnessed rapid theoretical and applied growth by relying on these concepts. A modeling framework, called distributionally robust optimization (DRO), has recently received significant attention in both the operations research and statistical learning communities. This paper surveys the main concepts and contributions to DRO, and its relationships with robust optimization, risk-aversion, chance-constrained optimization, and function regularization.
Distribution-Based Categorization of Classifier Transfer Learning Transfer Learning (TL) aims to transfer knowledge acquired in one problem, the source problem, onto another problem, the target problem, dispensing with the bottom-up construction of the target model. Due to its relevance, TL has gained significant interest in the Machine Learning community since it paves the way to devising intelligent learning models that can easily be tailored to many different applications. As is natural in a fast evolving area, a wide variety of TL methods, settings and nomenclature have been proposed so far. However, many works have reported different names for the same concepts. This mixture of concepts and terminology obscures the TL field and hinders its proper consideration. In this paper we present a review of the literature on the majority of classification TL methods, and also a distribution-based categorization of TL with a common nomenclature suitable for classification problems. Under this perspective three main TL categories are presented, discussed and illustrated with examples.
Divergence, Entropy, Information: An Opinionated Introduction to Information Theory Information theory is a mathematical theory of learning with deep connections with topics as diverse as artificial intelligence, statistical physics, and biological evolution. Many primers on the topic paint a broad picture with relatively little mathematical sophistication, while many others develop specific application areas in detail. In contrast, these informal notes aim to outline some elements of the information-theoretic ‘way of thinking,’ by cutting a rapid and interesting path through some of the theory’s foundational concepts and theorems. We take the Kullback-Leibler divergence as our foundational concept, and then proceed to develop the entropy and mutual information. We discuss some of the main foundational results, including the Chernoff bounds as a characterization of the divergence; Gibbs’ Theorem; and the Data Processing Inequality. A recurring theme is that the definitions of information theory support natural theorems that sound ‘obvious’ when translated into English. More pithily, ‘information theory makes common sense precise.’ Since the focus of the notes is not primarily on technical details, proofs are provided only where the relevant techniques are illustrative of broader themes. Otherwise, proofs and intriguing tangents are referenced in liberally-sprinkled footnotes. The notes close with a highly nonexhaustive list of references to resources and other perspectives on the field.
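The three central quantities of these notes are easy to state in code for discrete distributions: the Kullback-Leibler divergence, the entropy, and the mutual information expressed as the divergence between a joint and the product of its marginals. A minimal numpy sketch; the example distributions are arbitrary.

```python
import numpy as np

def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) log(p(x)/q(x)); assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def entropy(p):
    """H(p) = -sum_x p(x) log p(x); equals log|X| minus D(p || uniform)."""
    p = np.asarray(p, float)
    return -np.sum(p[p > 0] * np.log(p[p > 0]))

def mutual_information(joint):
    """I(X;Y) = D( p(x,y) || p(x) p(y) ) for a joint probability table."""
    joint = np.asarray(joint, float)
    independent = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    return kl_divergence(joint.ravel(), independent.ravel())

p, q = [0.5, 0.3, 0.2], [0.4, 0.4, 0.2]
joint = np.array([[0.3, 0.1], [0.1, 0.5]])
print(kl_divergence(p, q), entropy(p), mutual_information(joint))
```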
Do Convolutional Networks need to be Deep for Text Classification We study in this work the importance of depth in convolutional models for text classification, either when character or word inputs are considered. We show on 5 standard text classification and sentiment analysis tasks that deep models indeed give better performances than shallow networks when the text input is represented as a sequence of characters. However, a simple shallow-and-wide network outperforms deep models such as DenseNet with word inputs. Our shallow word model further establishes new state-of-the-art performances on two datasets: Yelp Binary (95.9\%) and Yelp Full (64.9\%).
Do Deep Learning Models Have Too Many Parameters An Information Theory Viewpoint Deep learning models often have more parameters than observations, and still perform well. This is sometimes described as a paradox. In this work, we show experimentally that despite their huge number of parameters, deep neural networks can compress the data losslessly even when taking the cost of encoding the parameters into account. Such a compression viewpoint originally motivated the use of variational methods in neural networks. However, we show that these variational methods provide surprisingly poor compression bounds, despite being explicitly built to minimize such bounds. This might explain the relatively poor practical performance of variational methods in deep learning. Better encoding methods, imported from the Minimum Description Length (MDL) toolbox, yield much better compression values on deep networks, corroborating the hypothesis that good compression on the training set correlates with good test performance.
Do GANs actually learn the distribution An empirical study Do GANs (Generative Adversarial Nets) actually learn the target distribution? The foundational paper of (Goodfellow et al 2014) suggested they do, if they are given sufficiently large deep nets, sample size, and computation time. A recent theoretical analysis in Arora et al (to appear at ICML 2017) raised doubts whether the same holds when the discriminator has finite size. It showed that the training objective can approach its optimum value even if the generated distribution has very low support — in other words, the training objective is unable to prevent mode collapse. The current note reports experiments suggesting that such problems are not merely theoretical. It presents empirical evidence that well-known GAN approaches do learn distributions of fairly low support, and thus presumably are not learning the target distribution. The main technical contribution is a new proposed test, based upon the famous birthday paradox, for estimating the support size of the generated distribution.
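The birthday-paradox test is simple to sketch: if a batch of n generated samples already contains near-duplicates with substantial probability, the effective support size is only on the order of n squared. Below, duplicates are detected by a pairwise-distance threshold; the placeholder "generator", the batch size, and the threshold are illustrative assumptions, not the note's experimental setup.

```python
import numpy as np

def closest_pair_distance(batch):
    """Smallest pairwise Euclidean distance within a batch of samples."""
    diffs = batch[:, None, :] - batch[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(-1))
    np.fill_diagonal(dists, np.inf)
    return dists.min()

def birthday_collision_rate(sample_fn, batch_size=100, trials=200, threshold=1e-6):
    """Fraction of batches containing a near-duplicate pair. If this is around 1/2, the
    birthday-paradox heuristic puts the effective support size near batch_size**2 / (2 ln 2)."""
    hits = sum(closest_pair_distance(sample_fn(batch_size)) < threshold
               for _ in range(trials))
    return hits / trials

# Placeholder "generator" with a deliberately small support of 2000 distinct points;
# a trained GAN generator would be plugged in here instead.
rng = np.random.default_rng(0)
support = rng.normal(size=(2000, 16))
sample_fn = lambda n: support[rng.integers(0, len(support), size=n)]
print("collision rate:", birthday_collision_rate(sample_fn))
```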
Do Neural Nets Learn Statistical Laws behind Natural Language The performance of deep learning in natural language processing has been spectacular, but the reason for this success remains unclear because of the inherent complexity of deep learning. This paper provides empirical evidence of its effectiveness and of a limitation of neural networks for language engineering. Precisely, we demonstrate that a Long Short-Term Memory (LSTM)-based neural language model effectively reproduces Zipf’s law and Heaps’ law, two representative statistical properties underlying natural language. We discuss the quality of the reproducibility and the emergence of Zipf’s law and Heaps’ law as training progresses. We also point out that the neural language model has a limitation in reproducing long-range correlation, another statistical law of natural language. This understanding could provide a direction of improvement of architectures of neural networks.
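Checking Zipf's law on generated text amounts to fitting the slope of log frequency against log rank, which is close to -1 for natural language. A minimal sketch; the "generated text" below is a synthetic Zipfian token stream standing in for samples from a trained language model.

```python
import numpy as np
from collections import Counter

def zipf_slope(text):
    """Slope of log(frequency) vs log(rank) for the word-frequency distribution of `text`."""
    freqs = np.array(sorted(Counter(text.split()).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(freqs) + 1)
    slope, intercept = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return slope

# Placeholder corpus: tokens drawn from a Zipfian distribution stand in for model samples.
rng = np.random.default_rng(0)
tokens = [f"w{int(i)}" for i in rng.zipf(a=1.2, size=50_000) if i < 10_000]
print("Zipf exponent (close to -1 for natural language):", zipf_slope(" ".join(tokens)))
```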
Do we Need Hundreds of Classifiers to Solve Real World Classification Problems We evaluate 179 classifiers arising from 17 families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest neighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines and other methods), implemented in Weka, R (with and without the caret package), C and Matlab, including all the relevant classifiers available today. We use 121 data sets, which represent the whole UCI data base (excluding the large-scale problems) and our own real problems, in order to achieve significant conclusions about the classifier behavior, not dependent on the data set collection. The classifiers most likely to be the best are the random forest (RF) versions, the best of which (implemented in R and accessed via caret) achieves 94.1% of the maximum accuracy, exceeding 90% in 84.3% of the data sets. However, the difference is not statistically significant with respect to the second best, the SVM with Gaussian kernel implemented in C using LibSVM, which achieves 92.3% of the maximum accuracy. A few models are clearly better than the remaining ones: random forest, SVM with Gaussian and polynomial kernels, extreme learning machine with Gaussian kernel, C5.0 and avNNet (a committee of multi-layer perceptrons implemented in R with the caret package). The random forest is clearly the best family of classifiers (3 out of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top-10), neural networks and boosting ensembles (5 and 3 members in the top-20, respectively).
Do you know Big Data (Cheat Sheet)
Does modelling need a Reformation Ideas for a new grammar of modelling The quality of mathematical modelling is looked at from the perspective of science’s own quality control arrangements and recent crises. It is argued that the crisis in the quality of modelling is at least as serious as that which has come to light in fields such as medicine, economics, psychology, and nutrition. In the context of the nascent sociology of quantification, the linkages between big data, algorithms, and mathematical and statistical modelling (use and misuse of p-values) are evident. Looking at existing proposals for best practices, the suggestion is put forward that the field needs a thorough Reformation, leading to a new grammar for modelling. Quantitative methodologies such as uncertainty and sensitivity analysis can form the bedrock on which the new grammar is built, while incorporating important normative and ethical elements. To this effect we introduce sensitivity auditing, quantitative storytelling, and ethics of quantification.
Does putting your emotions into words make you feel better? Measuring the minute-scale dynamics of emotions from online data Studies of affect labeling, i.e. putting your feelings into words, indicate that it can attenuate positive and negative emotions. Here we track the evolution of individual emotions for tens of thousands of Twitter users by analyzing the emotional content of their tweets before and after they explicitly report having a strong emotion. Our results reveal how emotions and their expression evolve at the temporal resolution of one minute. While the expression of positive emotions is preceded by a short but steep increase in positive valence and followed by short decay to normal levels, negative emotions build up more slowly, followed by a sharp reversal to previous levels, matching earlier findings of the attenuating effects of affect labeling. We estimate that positive and negative emotions last approximately 1.25 and 1.5 hours from onset to evanescence. A separate analysis for male and female subjects is suggestive of possible gender-specific differences in emotional dynamics.
Doing the impossible: Why neural networks can be trained at all As deep neural networks grow in size, from thousands to millions to billions of weights, the performance of those networks becomes limited by our ability to accurately train them. A common naive question arises: if we have a system with billions of degrees of freedom, don’t we also need billions of samples to train it? Of course, the success of deep learning indicates that reliable models can be learned with reasonable amounts of data. Similar questions arise in protein folding, spin glasses and biological neural networks. With effectively infinite potential folding/spin/wiring configurations, how does the system find the precise arrangement that leads to useful and robust results? Simple sampling of the possible configurations until an optimal one is reached is not a viable option even if one waited for the age of the universe. On the contrary, there appears to be a mechanism in the above phenomena that forces them to achieve configurations that live on a low-dimensional manifold, avoiding the curse of dimensionality. In the current work we use the concept of mutual information between successive layers of a deep neural network to elucidate this mechanism and suggest possible ways of exploiting it to accelerate training. We show that adding structure to the neural network that enforces higher mutual information between layers speeds training and leads to more accurate results. High mutual information between layers implies that the effective number of free parameters is exponentially smaller than the raw number of tunable weights.
Domain Adaptation for Visual Applications: A Comprehensive Survey The aim of this paper is to give an overview of domain adaptation and transfer learning with a specific view on visual applications. After a general motivation, we first position domain adaptation in the larger transfer learning problem. Second, we try to address and analyze briefly the state-of-the-art methods for different types of scenarios, first describing the historical shallow methods, addressing both the homogeneous and the heterogeneous domain adaptation methods. Third, we discuss the effect of the success of deep convolutional architectures which led to new type of domain adaptation methods that integrate the adaptation within the deep architecture. Fourth, we overview the methods that go beyond image categorization, such as object detection or image segmentation, video analyses or learning visual attributes. Finally, we conclude the paper with a section where we relate domain adaptation to other machine learning solutions.
Don´t Be Overwhelmed by Big Data Big. Data. The CPG industry is abuzz with those two words. And for good reason: as both brick-and-mortar retailers and online retailers attempt to create the ideal omni-channel consumer experience that will drive increased sales, they look to Big Data for actionable insights that can be measured against key KPIs. And while it´s understandable that the CPG industry is excited by the prospect of more data they can use to better understand the who, what, why and when of consumer purchasing behavior, it´s critical CPG organizations pause and ask themselves, ‘Are we providing retail team and executive team members with ‘quality’ data?’ POS, Big Data, order data or shipment summary data. It doesn´t matter. Is the right (i.e., ‘quality’) data getting to the right people at the right time? In essence, it´s the same question retail and executive teams face when considering how to best merchandise their SKUs. If CPG organizations understand the importance of getting the right product to the right people at the right time, then surely they understand the importance of applying the same forethought to their demand data …
Duality of Graphical Models and Tensor Networks In this article we show the duality between tensor networks and undirected graphical models with discrete variables. We study tensor networks on hypergraphs, which we call tensor hypernetworks. We show that the tensor hypernetwork on a hypergraph exactly corresponds to the graphical model given by the dual hypergraph. We translate various notions under duality. For example, marginalization in a graphical model is dual to contraction in the tensor network. Algorithms also translate under duality. We show that belief propagation corresponds to a known algorithm for tensor network contraction. This article is a reminder that the research areas of graphical models and tensor networks can benefit from interaction.
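The marginalization-is-contraction correspondence discussed above is easy to see in a two-factor example: contracting the shared index of two tensors with einsum is exactly summing out the shared variable of the corresponding factor graph. A small numpy illustration; the factors are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two nonnegative factors of an undirected graphical model over binary variables:
# phi_AB(a, b) and phi_BC(b, c); the unnormalized joint is their product.
phi_ab = rng.random((2, 2))
phi_bc = rng.random((2, 2))

# Marginalizing out B in the graphical model ...
joint = phi_ab[:, :, None] * phi_bc[None, :, :]   # p(a, b, c) up to normalization
marg_ac = joint.sum(axis=1)

# ... is the same operation as contracting the shared index in the tensor network.
contracted = np.einsum("ab,bc->ac", phi_ab, phi_bc)

print(np.allclose(marg_ac, contracted))           # True
```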
Dykstra’s Algorithm, ADMM, and Coordinate Descent: Connections, Insights, and Extensions We study connections between Dykstra’s algorithm for projecting onto an intersection of convex sets, the augmented Lagrangian method of multipliers or ADMM, and block coordinate descent. We prove that coordinate descent for a regularized regression problem, in which the (separable) penalty functions are seminorms, is exactly equivalent to Dykstra’s algorithm applied to the dual problem. ADMM on the dual problem is also seen to be equivalent, in the special case of two sets, with one being a linear subspace. These connections, aside from being interesting in their own right, suggest new ways of analyzing and extending coordinate descent. For example, from existing convergence theory on Dykstra’s algorithm over polyhedra, we discern that coordinate descent for the lasso problem converges at an (asymptotically) linear rate. We also develop two parallel versions of coordinate descent, based on the Dykstra and ADMM connections.
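For readers unfamiliar with Dykstra's algorithm, a compact numpy sketch of projecting a point onto the intersection of two convex sets, here a box and a halfspace chosen only for illustration. Unlike plain alternating projections, the correction terms make the iterates converge to the actual projection of the starting point.

```python
import numpy as np

def project_box(x, lo=-1.0, hi=1.0):
    return np.clip(x, lo, hi)

def project_halfspace(x, a, b):
    """Projection onto the halfspace {x : a.x <= b}."""
    viol = a @ x - b
    return x if viol <= 0 else x - viol * a / (a @ a)

def dykstra(x0, proj1, proj2, iters=100):
    """Dykstra's alternating projections with correction terms p and q."""
    x = x0.astype(float)
    p, q = np.zeros_like(x), np.zeros_like(x)
    for _ in range(iters):
        y = proj1(x + p)
        p = x + p - y
        x = proj2(y + q)
        q = y + q - x
    return x

a, b = np.array([1.0, 1.0]), 0.5
x_star = dykstra(np.array([2.0, 2.0]),
                 project_box,
                 lambda z: project_halfspace(z, a, b))
print(x_star)   # in the box, satisfies a.x <= b, and is the closest such point to (2, 2)
```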
Dynamic Bayesian Networks: A State of the Art
Dynamic Decision Networks for Decision-Making in Self-Adaptive Systems: A Case Study Bayesian decision theory is increasingly applied to support decision-making processes under environmental variability and uncertainty. Researchers from application areas like psychology and biomedicine have applied these techniques successfully. However, in the area of software engineering and specifically in the area of self-adaptive systems (SASs), little progress has been made in the application of Bayesian decision theory. We believe that techniques based on Bayesian Networks (BNs) are useful for systems that dynamically adapt themselves at runtime to a changing environment, which is usually uncertain. In this paper, we discuss the case for the use of BNs, specifically Dynamic Decision Networks (DDNs), to support the decision-making of self-adaptive systems. We present how such a probabilistic model can be used to support the decision-making in SASs and justify its applicability. We have applied our DDN-based approach to the case of an adaptive remote data mirroring system. We discuss results, implications and potential benefits of the DDN to enhance the development and operation of self-adaptive systems, by providing mechanisms to cope with uncertainty and automatically make the best decision.
Dynamic Shortest Path and Transitive Closure Algorithms: A Survey Algorithms which compute properties over graphs have always been of interest in computer science, with some of the fundamental algorithms, such as Dijkstra’s algorithm, dating back to the 50s. Since the 70s there has been interest in computing over graphs which are constantly changing, in a way that is more efficient than simply recomputing from scratch each time the graph changes. In this paper we provide a survey of both the foundational and the state-of-the-art algorithms which solve either shortest path or transitive closure problems in either fully or partially dynamic graphs. We balance this with the known conditional lower bounds.
Dynamo: Amazon´s Highly Available Key-value Store Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The Amazon.com platform, which provides services for many web sites worldwide, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components fail continuously and the way persistent state is managed in the face of these failures drives the reliability and scalability of the software systems. This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon´s core services use to provide an ‘always-on’ experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.

E

Easy over Hard: A Case Study on Deep Learning While deep learning is an exciting new technique, the benefits of this method need to be assessed with respect to its computational cost. This is particularly important for deep learning since these learners need hours (to weeks) to train the model. Such long CPU times limit the ability of (a) a researcher to test the stability of their conclusions via repeated runs with different random seeds; and (b) other researchers to repeat, improve, or even refute that original work. For example, recently, deep learning was used to find which questions in the Stack Overflow programmer discussion forum can be linked together. That system took 14 hours to execute. We show here that a very simple optimizer called DE (differential evolution) can achieve similar (and sometimes better) results. The DE approach terminated in 10 minutes, i.e. 84 times faster than deep learning. We offer these results as a cautionary tale to the software analytics community and suggest that not every new innovation should be applied without critical analysis. If researchers deploy some new and expensive process, that work should be baselined against some simpler and faster alternatives.
Econometrics in R A more advanced tutorial on econometrics with R
Economic Forecasting Forecasts guide decisions in all areas of economics and finance and their value can only be understood in relation to, and in the context of, such decisions. We discuss the central role of the loss function in helping determine the forecaster´s objectives. Decision theory provides a framework for both the construction and evaluation of forecasts. This framework allows an understanding of the challenges that arise from the explosion in the sheer volume of predictor variables under consideration and the forecaster´s ability to entertain an endless array of forecasting models and time-varying specifications, none of which may coincide with the ‘true´ model. We show this along with reviewing methods for comparing the forecasting performance of pairs of models or evaluating the ability of the best of many models to beat a benchmark specification.
EDISON Data Science Framework The EDISON Data Science Framework is a collection of documents that define the Data Science profession. Freely available, these documents have been developed to guide educators and trainers, employers and managers, and Data Scientists themselves. Collectively, these documents break down the complexity of the skills and competences needed to define Data Science as a professional practice.
Effective optimization using sample persistence: A case study on quantum annealers and various Monte Carlo optimization methods We present and apply a general-purpose, multi-start algorithm for improving the performance of low-energy samplers used for solving optimization problems. The algorithm iteratively fixes the value of a large portion of the variables to values that have a high probability of being optimal. The resulting problems are smaller and less connected, and samplers tend to give better low-energy samples for these problems. The algorithm is trivially parallelizable, since each start in the multi-start algorithm is independent, and could be applied to any heuristic solver that can be run multiple times to give a sample. We present results for several classes of hard problems solved using simulated annealing, path-integral quantum Monte Carlo, parallel tempering with isoenergetic cluster moves, and a quantum annealer, and show that the success metrics as well as the scaling are improved substantially. When combined with this algorithm, the quantum annealer’s scaling was substantially improved for native Chimera graph problems. In addition, with this algorithm the scaling of the time to solution of the quantum annealer is comparable to the Hamze–de Freitas–Selby algorithm on the weak-strong cluster problems introduced by Boixo et al. Parallel tempering with isoenergetic cluster moves was able to consistently solve 3D spin glass problems with 8000 variables when combined with our method, whereas without our method it could not solve any.
Efficient Dimensionality Reduction for High-Dimensional Network Estimation We propose module graphical lasso (MGL), an aggressive dimensionality reduction and network estimation technique for a high-dimensional Gaussian graphical model (GGM). MGL achieves scalability, interpretability and robustness by exploiting the modularity property of many real-world networks. Variables are organized into tightly coupled modules and a graph structure is estimated to determine the conditional independencies among modules. MGL iteratively learns the module assignment of variables, the latent variables, each corresponding to a module, and the parameters of the GGM of the latent variables. In synthetic data experiments, MGL outperforms the standard graphical lasso and three other methods that incorporate latent variables into GGMs. When applied to gene expression data from ovarian cancer, MGL outperforms standard clustering algorithms in identifying functionally coherent gene sets and predicting survival time of patients. The learned modules and their dependencies provide novel insights into cancer biology as well as identifying possible novel drug targets.
Efficient Estimation of Word Representations in Vector Space We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
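As a rough illustration of the kind of model the paper describes, the sketch below trains skip-gram word vectors with the gensim library on a toy corpus. The corpus, the parameter values, and gensim itself (4.x parameter names) are assumptions for illustration and not part of the paper; with so little data the analogy query only demonstrates the intended usage, not meaningful semantics.

```python
# Minimal sketch, assuming the gensim library (4.x parameter names) and a toy corpus.
from gensim.models import Word2Vec

sentences = [
    ["king", "rules", "the", "kingdom"],
    ["queen", "rules", "the", "kingdom"],
    ["man", "walks", "in", "the", "city"],
    ["woman", "walks", "in", "the", "city"],
]

# sg=1 selects the skip-gram architecture; vector_size is the embedding dimension.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=200)

# Syntactic/semantic regularities are probed with vector arithmetic,
# e.g. vec("king") - vec("man") + vec("woman") should be close to vec("queen").
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```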
Efficient Forecasting for Hierarchical Time Series Forecasting is used as the basis for business planning in many application areas such as energy, sales and traffic management. Time series data used in these areas are often hierarchically organized and aggregated along the hierarchy levels based on their dimensional features. Calculating forecasts in these environments is very time consuming, due to the need to ensure forecasting consistency between hierarchy levels. To increase the forecasting efficiency for hierarchically organized time series, we introduce a novel forecasting approach that takes advantage of the hierarchical organization. We reuse the forecast models maintained at the lowest level of the hierarchy to almost instantly create forecast models at higher hierarchical levels. In addition, we define a hierarchical communication framework, increasing the communication flexibility and efficiency. Our experiments show significant runtime improvements for creating a forecast model at higher hierarchical levels, while still providing very high accuracy.
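A minimal sketch of the consistency idea: forecasts at a higher hierarchy level can be obtained by aggregating the base-level forecasts, so the hierarchy stays consistent by construction. The paper's contribution goes further by reusing the fitted base-level models themselves; the toy series below are purely illustrative.

```python
# Minimal sketch of the bottom-up idea, assuming a toy two-level hierarchy.
import numpy as np

# Base-level (leaf) series: forecast demand of three regions over three periods.
leaf_forecasts = {
    "region_a": np.array([10.0, 11.0, 12.0]),
    "region_b": np.array([ 5.0,  5.5,  6.0]),
    "region_c": np.array([ 8.0,  7.5,  7.0]),
}

# The forecast for the aggregate node is the sum of its children's forecasts,
# so the hierarchy is consistent by construction.
total_forecast = np.sum(list(leaf_forecasts.values()), axis=0)
print(total_forecast)  # [23. 24. 25.]
```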
Efficient Optimization Algorithms for Robust Principal Component Analysis and Its Variants Robust PCA has drawn significant attention in the last decade due to its success in numerous application domains, ranging from bio-informatics, statistics, and machine learning to image and video processing in computer vision. Robust PCA and its variants such as sparse PCA and stable PCA can be formulated as optimization problems with exploitable special structures. Many specialized efficient optimization methods have been proposed to solve robust PCA and related problems. In this paper we review existing optimization methods for solving convex and nonconvex relaxations/variants of robust PCA, discuss their advantages and disadvantages, and elaborate on their convergence behaviors. We also provide some insights for possible future research directions, including new algorithmic frameworks that might be suitable for implementation in multi-processor settings to handle large-scale problems.
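For readers unfamiliar with the convex relaxation being reviewed, the sketch below runs a basic principal component pursuit iteration (alternating singular-value and soft thresholding with a dual update). The parameter heuristics, fixed iteration count, and synthetic data are simplifications compared to the specialized solvers the paper surveys.

```python
# Compact sketch of Robust PCA via Principal Component Pursuit; illustrative only.
import numpy as np

def shrink(X, tau):                       # soft-thresholding (prox of the l1 norm)
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):                          # singular value thresholding (prox of nuclear norm)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def robust_pca(M, n_iter=200):
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))                 # common default weight
    mu = m * n / (4.0 * np.abs(M).sum() + 1e-12)   # common step-size heuristic
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(n_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)          # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)       # sparse update
        Y = Y + mu * (M - L - S)                   # dual update
    return L, S

# Low-rank matrix plus a few large sparse corruptions.
rng = np.random.default_rng(0)
L0 = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 40))
S0 = np.zeros_like(L0); S0[rng.random(L0.shape) < 0.05] = 10.0
L, S = robust_pca(L0 + S0)
print(np.linalg.norm(L - L0) / np.linalg.norm(L0))  # relative recovery error
```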
Efficient Processing of Deep Neural Networks: A Tutorial and Survey Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, this accuracy comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of deep neural networks to improve energy-efficiency and throughput without sacrificing accuracy or increasing hardware cost are critical to enabling the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey about the recent advances towards the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various platforms and architectures that support DNNs, and highlight key trends in recent efficient processing techniques that reduce the computation cost of DNNs either solely via hardware design changes or via joint hardware design and network algorithm changes. It will also summarize various development resources that can enable researchers and practitioners to quickly get started on DNN design, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic co-design, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand trade-offs between various architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.
Eigenvalue and Generalized Eigenvalue Problems: Tutorial This paper is a tutorial for eigenvalue and generalized eigenvalue problems. We first introduce the eigenvalue problem, eigen-decomposition (spectral decomposition), and the generalized eigenvalue problem. Then, we discuss the optimization problems that lead to eigenvalue and generalized eigenvalue problems. We also provide examples from machine learning, including principal component analysis, kernel supervised principal component analysis, and Fisher discriminant analysis, which result in eigenvalue and generalized eigenvalue problems. Finally, we introduce the solutions to both eigenvalue and generalized eigenvalue problems.
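A brief sketch, assuming NumPy and SciPy, of the two problem types the tutorial covers: the symmetric eigenvalue problem A v = λ v and the generalized problem A v = λ B v (with B symmetric positive definite) that arises, for example, in Fisher discriminant analysis. The matrices below are arbitrary illustrations.

```python
import numpy as np
from scipy.linalg import eigh

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[1.0, 0.2],
              [0.2, 1.0]])

w, V = eigh(A)        # ordinary symmetric eigenvalue problem A v = w v
wg, Vg = eigh(A, B)   # generalized eigenvalue problem A v = w B v
print(w, wg)
```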
Elements and Principles of Data Analysis The data revolution has led to an increased interest in the practice of data analysis. As a result, there has been a proliferation of ‘data science’ training programs. Because data science has been previously defined as an intersection of already-established fields or a union of emerging technologies, the following problems arise: (1) There is little agreement about what data science is; (2) Data science becomes secondary to established fields in a university setting; and (3) It is difficult to have discussions on what it means to learn about data science, to teach data science courses and to be a data scientist. To address these problems, we propose to define the field from first principles based on the activities of people who analyze data, with a language and taxonomy for describing a data analysis in a manner spanning disciplines. Here, we describe the elements and principles of data analysis. This leads to two insights: it suggests a formal mechanism to evaluate data analyses based on objective characteristics, and it provides a framework to teach students how to build data analyses. We argue that the elements and principles of data analysis lay the foundational framework for a more general theory of data science.
Elements of nonlinear analysis of information streams This review considers methods of nonlinear dynamics applied to the analysis of time series corresponding to information streams on the Internet. In the main, these methods are based on correlation, fractal, multifractal, wavelet, and Fourier analysis. The article is dedicated to a detailed description of these approaches and the interconnections among them. The methods and corresponding algorithms presented can be used for detecting key points in the dynamics of information processes; identifying periodicity, anomalies, self-similarity, and correlations; and forecasting various information processes. The methods discussed can form the basis for detecting information attacks, campaigns, operations, and wars.
Elite Bases Regression: A Real-time Algorithm for Symbolic Regression Symbolic regression is an important but challenging research topic in data mining. It can detect the underlying mathematical models from data. Genetic programming (GP) is one of the most popular methods for symbolic regression. However, its convergence speed might be too slow for large scale problems with a large number of variables. This drawback has become a bottleneck in practical applications. In this paper, a new non-evolutionary real-time algorithm for symbolic regression, Elite Bases Regression (EBR), is proposed. EBR generates a set of candidate basis functions coded with parse-matrix in specific mapping rules. Meanwhile, a certain number of elite bases are preserved and updated iteratively according to their correlation coefficients with respect to the target model. The regression model is then spanned by the elite bases. A comparative study between EBR and a recently proposed machine learning method for symbolic regression, Fast Function eXtraction (FFX), is conducted. Numerical results indicate that EBR can solve symbolic regression problems more effectively.
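The core idea of keeping the candidate bases most correlated with the target and spanning the model with them can be sketched as below. This is not the paper's parse-matrix encoding or its iterative elite update; the candidate set, target function, and one-shot selection rule here are illustrative assumptions.

```python
# Minimal sketch of correlation-based elite-basis selection; illustrative only.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
y = 1.5 * np.sin(x) + 0.5 * x**2          # "unknown" target model

candidates = {                              # candidate basis functions
    "x": x, "x^2": x**2, "x^3": x**3,
    "sin(x)": np.sin(x), "cos(x)": np.cos(x), "exp(x)": np.exp(x),
}

# Keep the n_elite bases with the highest absolute correlation to y.
n_elite = 2
corr = {name: abs(np.corrcoef(f, y)[0, 1]) for name, f in candidates.items()}
elite = sorted(corr, key=corr.get, reverse=True)[:n_elite]

# The regression model is spanned by the elite bases (least-squares coefficients).
Phi = np.column_stack([candidates[name] for name in elite])
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(elite, coef)   # inspect which bases were selected and their weights
```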
Embedded Analytics – Empower the Citizen Data Scientist: A guide for analytics teams and line-of-business managers trying to do more with embedded analytics With the advent of technologies that connect more people, machines and processes to one another, the importance of extending advanced analytics and machine learning to your users is growing fast. But the effort to derive maximum benefit from advanced analytics is still limited in most organizations by the human element. Data scientists, seen as the only people sufficiently trained to navigate big data successfully, become the bottleneck. This paper examines how the citizen data scientist can use embedded analytics in your software applications, and how the line of business (LOB) can benefit. Powerful software tools like TIBCO Statistica™ allow trained data scientists to develop models around advanced analytics, machine learning and algorithmic business, then make them available to the LOB managers and staff who use your applications to make better decisions.
Embodied Artificial Intelligence through Distributed Adaptive Control: An Integrated Framework In this paper, we argue that the future of Artificial Intelligence research resides in two keywords: integration and embodiment. We support this claim by analyzing the recent advances of the field. Regarding integration, we note that the most impactful recent contributions have been made possible through the integration of recent Machine Learning methods (based in particular on Deep Learning and Recurrent Neural Networks) with more traditional ones (e.g. Monte-Carlo tree search, goal babbling exploration or addressable memory systems). Regarding embodiment, we note that the traditional benchmark tasks (e.g. visual classification or board games) are becoming obsolete as state-of-the-art learning algorithms approach or even surpass human performance in most of them, which has recently encouraged the development of first-person 3D game platforms embedding realistic physics. Building upon this analysis, we first propose an embodied cognitive architecture integrating heterogeneous sub-fields of Artificial Intelligence into a unified framework. We demonstrate the utility of our approach by showing how major contributions of the field can be expressed within the proposed framework. We then claim that benchmarking environments need to reproduce ecologically-valid conditions for bootstrapping the acquisition of increasingly complex cognitive skills through the concept of a cognitive arms race between embodied agents.
Emotion Detection in Text: a Review In recent years, emotion detection in text has become more popular due to its vast potential applications in marketing, political science, psychology, human-computer interaction, artificial intelligence, etc. Access to a huge amount of textual data, especially opinionated and self-expressive text, has also played a special role in bringing attention to this field. In this paper, we review the work that has been done in identifying emotion expressions in text and argue that, although many techniques, methodologies, and models have been created to detect emotion in text, there are various reasons that make these methods insufficient. Although there is an essential need to improve the design and architecture of current systems, factors such as the complexity of human emotions and the use of implicit and metaphorical language in expressing them lead us to think that just re-purposing standard methodologies will not be enough to capture these complexities, and it is important to pay attention to the linguistic intricacies of emotion expression.
Emotion in Reinforcement Learning Agents and Robots: A Survey This article provides the first survey of computational models of emotion in reinforcement learning (RL) agents. The survey focuses on agent/robot emotions, and mostly ignores human user emotions. Emotions are recognized as functional in decision-making by influencing motivation and action selection. Therefore, computational emotion models are usually grounded in the agent’s decision making architecture, of which RL is an important subclass. Studying emotions in RL-based agents is useful for three research fields. For machine learning (ML) researchers, emotion models may improve learning efficiency. For the interactive ML and human-robot interaction (HRI) community, emotions can communicate state and enhance user investment. Lastly, it allows affective modelling (AM) researchers to investigate their emotion theories in a successful AI agent class. This survey provides background on emotion theory and RL. It systematically addresses 1) from what underlying dimensions (e.g., homeostasis, appraisal) emotions can be derived and how these can be modelled in RL-agents, 2) what types of emotions have been derived from these dimensions, and 3) how these emotions may either influence the learning efficiency of the agent or be useful as social signals. We also systematically compare evaluation criteria, and draw connections to important RL sub-domains like (intrinsic) motivation and model-based RL. In short, this survey provides both a practical overview for engineers wanting to implement emotions in their RL agents, and identifies challenges and directions for future emotion-RL research.
Emotionally-Aware Chatbots: A Survey The development of textual conversational agents, or chatbots, has gathered tremendous traction from both academia and industry in recent years. Nowadays, chatbots are widely used as agents to communicate with humans in services such as booking assistance, customer service, and even as personal companions. The biggest challenge in building a chatbot is to humanize the machine in order to improve user engagement. Some studies show that emotion is an important aspect of humanizing machines, including chatbots. In this paper, we provide a systematic review of approaches to building an emotionally-aware chatbot (EAC). To the best of our knowledge, there is still no work focusing on this area. We propose three research questions regarding EAC studies. We start with the history and evolution of EAC, then cover several approaches to building EAC proposed in previous studies, and finally the resources available for building EAC. Based on our investigation, we found that early EAC exploited simple rule-based approaches, while most current EAC use neural-based approaches. We also notice that most EAC contain an emotion classifier in their architecture, which utilizes several available affective resources. We predict that the development of EAC will continue to gain more and more attention from scholars, as indicated by recent studies proposing new datasets for building EAC in various languages.
Empirically Grounded Agent-Based Models of Innovation Diffusion: A Critical Review Innovation diffusion has been studied extensively in a variety of disciplines, including sociology, economics, marketing, ecology, and computer science. Traditional literature on innovation diffusion has been dominated by models of aggregate behavior and trends. However, the agent-based modeling (ABM) paradigm is gaining popularity as it captures agent heterogeneity and enables fine-grained modeling of interactions mediated by social and geographic networks. While most ABM work on innovation diffusion is theoretical, empirically grounded models are increasingly important, particularly in guiding policy decisions. We present a critical review of empirically grounded agent-based models of innovation diffusion, developing a categorization of this research based on types of agent models as well as applications. By connecting the modeling methodologies in the fields of information and innovation diffusion, we suggest that the maximum likelihood estimation framework widely used in the former is a promising paradigm for calibration of agent-based models for innovation diffusion. Although many advances have been made to standardize ABM methodology, we identify four major issues in model calibration and validation, and suggest potential solutions. Finally, we discuss open problems that are critical for the future development of empirically grounded agent-based models of innovation diffusion that enable reliable decision support for stakeholders.
End-to-End Entity Resolution for Big Data: A Survey One of the most important tasks for improving data quality and the reliability of data analytics results is Entity Resolution (ER). ER aims to identify different descriptions that refer to the same real-world entity, and remains a challenging problem. While previous works have studied specific aspects of ER (and mostly in traditional settings), in this survey, we provide for the first time an end-to-end view of modern ER workflows, and of the novel aspects of entity indexing and matching methods in order to cope with more than one of the Big Data characteristics simultaneously. We present the basic concepts, processing steps and execution strategies that have been proposed by different communities, i.e., database, semantic Web and machine learning, in order to cope with the loose structuredness, extreme diversity, high speed and large scale of entity descriptions used by real-world applications. Finally, we provide a synthetic discussion of the existing approaches, and conclude with a detailed presentation of open research directions.
Enhancing Teaching and Learning Through Educational Data Mining and Learning Analytics: An Issue Brief In data mining and data analytics, tools and techniques once confined to research laboratories are being adopted by forward-looking industries to generate business intelligence for improving decision making. Higher education institutions are beginning to use analytics for improving the services they provide and for increasing student grades and retention. The U.S. Department of Education´s National Education Technology Plan, as one part of its model for 21st-century learning powered by technology, envisions ways of using data from online learning systems to improve instruction.
Entering the Era of Data Science: Targeted Learning and the Integration of Statistics and Computational Data Analysis This outlook paper reviews the research of van der Laan´s group on Targeted Learning, a subfield of statistics that is concerned with the construction of data adaptive estimators of user-supplied target parameters of the probability distribution of the data and corresponding confidence intervals, aiming at only relying on realistic statistical assumptions. Targeted Learning fully utilizes the state of the art in machine learning tools, while still preserving the important identity of statistics as a field that is concerned with both accurate estimation of the true target parameter value and assessment of uncertainty in order to make sound statistical conclusions. We also provide a philosophical and historical perspective on Targeted Learning, relating it to the new developments in Big Data. We conclude with some remarks explaining the immediate relevance of Targeted Learning to the current Big Data movement.
Entropy in Quantum Information Theory — Communication and Cryptography In this Thesis, several results in quantum information theory are collected, most of which use entropy as the main mathematical tool. * While the von Neumann entropy is a direct generalization of the Shannon entropy to density matrices, it behaves differently. A long-standing open question is whether there are quantum analogues of unconstrained non-Shannon type inequalities. Here, a new constrained non-von-Neumann type inequality is proven, a step towards a conjectured unconstrained inequality by Linden and Winter. * IID quantum state merging can be optimally achieved using the decoupling technique. The one-shot results by Berta et al. and Anshu et al., however, had to bring in additional mathematical machinery. We introduce a natural generalized decoupling paradigm, catalytic decoupling, that can reproduce the aforementioned results when used analogously to the application of standard decoupling in the asymptotic case. * Port-based teleportation, a variant of the standard quantum teleportation protocol, cannot be implemented perfectly. We prove several lower bounds on the necessary number of output ports N to achieve port-based teleportation for given error and input dimension, showing that N diverges uniformly in the dimension of the teleported quantum system, for vanishing error. As a byproduct, a new lower bound for the size of the program register for an approximate universal programmable quantum processor is derived. * In the last part, we give a new definition for information-theoretic quantum non-malleability, strengthening the previous definition by Ambainis et al. We show that quantum non-malleability implies secrecy, analogous to quantum authentication. Furthermore, non-malleable encryption schemes can be used as a primitive to build authenticating encryption schemes. We also show that the strong notion of authentication recently proposed by Garg et al. can be fulfilled using 2-designs.
Error Statistics Error statistics, as we are using that term, has a dual dimension involving philosophy and methodology. It refers to a standpoint regarding both: 1. a cluster of statistical tools, their interpretation and justification, 2. a general philosophy of science, and the roles probability plays in inductive inference.
Evaluating a classification model – What do precision and recall tell me Once you have created a predictive model, you always need to find out how good it is. RDS helps you with this by reporting the performance of the model. All measures reported by RDS are estimated from data not used for creating the model.
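For reference, the two measures mentioned above can be computed directly from the counts of true positives, false positives, and false negatives on held-out data. The sketch below uses made-up labels and is independent of RDS.

```python
# precision = TP / (TP + FP), recall = TP / (TP + FN), on held-out data.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # actual classes
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # model predictions

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)   # of the cases predicted positive, how many really are
recall    = tp / (tp + fn)   # of the truly positive cases, how many were found
print(precision, recall)     # 0.8, 0.8
```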
Evaluating Machine Learning Models This report on evaluating machine learning models arose out of a sense of need. The content was first published as a series of six technical posts on the Dato Machine Learning Blog. I was the editor of the blog, and I needed something to publish for the next day. Dato builds machine learning tools that help users build intelligent data products. In our conversations with the community, we sometimes ran into a confusion in terminology. For example, people would ask for cross-validation as a feature, when what they really meant was hyperparameter tuning, a feature we already had. So I thought, ‘Aha! I´ll just quickly explain what these concepts mean and point folks to the relevant sections in the user guide.’
Evaluating the Design of the R Language R is a dynamic language for statistical computing that combines lazy functional features and object-oriented programming. This rather unlikely linguistic cocktail would probably never have been prepared by computer scientists, yet the language has become surprisingly popular. With millions of lines of R code available in repositories, we have an opportunity to evaluate the fundamental choices underlying the R language design. Using a combination of static and dynamic program analysis we assess the success of different language features.
Evaluation of Multidisciplinary Effects of Artificial Intelligence with Optimization Perspective Artificial Intelligence has an important place in the scientific community as a result of its successful outputs in different fields. Over time, the field of Artificial Intelligence has been divided into many sub-fields because of the increasing number of different solution approaches, methods, and techniques. Machine Learning has the most remarkable role, with its ability to learn from samples of the environment. On the other hand, intelligent optimization, inspired by nature and swarms, has its own scientific literature, with effective solutions provided for optimization problems from different fields. Because intelligent optimization can be applied effectively in different fields, this study aims to provide a general discussion on the multidisciplinary effects of Artificial Intelligence by considering its optimization-oriented solutions. The study briefly covers the background of intelligent optimization and then gives application examples of intelligent optimization from a multidisciplinary perspective.
Evaluation of Session-based Recommendation Algorithms Recommender systems help users find relevant items of interest, for example on e-commerce or media streaming sites. Most academic research is concerned with approaches that personalize the recommendations according to long-term user profiles. In many real-world applications, however, such long-term profiles often do not exist and recommendations therefore have to be made solely based on the observed behavior of a user during an ongoing session. Given the high practical relevance of the problem, an increased interest in this problem can be observed in recent years, leading to a number of proposals for session-based recommendation algorithms that typically aim to predict the user’s immediate next actions. In this work, we present the results of an in-depth performance comparison of a number of such algorithms, using a variety of datasets and evaluation measures. Our comparison includes the most recent approaches based on recurrent neural networks like GRU4REC, factorized Markov model approaches such as FISM or Fossil, as well as more simple methods based, e.g., on nearest neighbor schemes. Our experiments reveal that algorithms of this latter class, despite their sometimes almost trivial nature, often perform equally well or significantly better than today’s more complex approaches based on deep neural networks. Our results therefore suggest that there is substantial room for improvement regarding the development of more sophisticated session-based recommendation algorithms.
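The "almost trivial" nearest-neighbor family referred to above can be sketched in a few lines: score items by their occurrence in past sessions most similar to the current one. The session data, similarity measure, and weighting below are illustrative assumptions, not the exact variants evaluated in the paper.

```python
# Minimal session-based kNN sketch; illustrative only.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

past_sessions = [
    ["shoes", "socks", "shirt"],
    ["shoes", "laces"],
    ["phone", "case", "charger"],
]
current = ["shoes", "socks"]

# Take the k most similar past sessions and recommend their unseen items,
# weighted by session similarity.
k = 2
neighbors = sorted(past_sessions, key=lambda s: jaccard(s, current), reverse=True)[:k]
scores = {}
for sess in neighbors:
    for item in sess:
        if item not in current:
            scores[item] = scores.get(item, 0) + jaccard(sess, current)
print(sorted(scores, key=scores.get, reverse=True))  # e.g. ['shirt', 'laces']
```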
Evaluation-as-a-Service: Overview and Outlook Evaluation in empirical computer science is essential to show progress and assess technologies developed. Several research domains such as information retrieval have long relied on systematic evaluation to measure progress: here, the Cranfield paradigm of creating shared test collections, defining search tasks, and collecting ground truth for these tasks has persisted up until now. In recent years, however, several new challenges have emerged that do not fit this paradigm very well: extremely large data sets, confidential data sets as found in the medical domain, and rapidly changing data sets as often encountered in industry. Also, crowdsourcing has changed the way that industry approaches problem-solving, with companies now organizing challenges and handing out monetary awards to incentivize people to work on their challenges, particularly in the field of machine learning. This white paper is based on discussions at a workshop on Evaluation-as-a-Service (EaaS). EaaS is the paradigm of not providing data sets to participants and having them work on the data locally, but keeping the data central and allowing access via Application Programming Interfaces (API), Virtual Machines (VM) or other possibilities to ship executables. The objectives of this white paper are to summarize and compare the current approaches and to consolidate the experiences of these approaches in order to outline the next steps of EaaS, particularly towards sustainable research infrastructures. This white paper summarizes several existing approaches to EaaS and analyzes their usage scenarios as well as their advantages and disadvantages. The many factors influencing EaaS are overviewed, as is the environment in terms of motivations for the various stakeholders, from funding agencies to challenge organizers, researchers and participants, to industry interested in supplying real-world problems for which they require solutions.
EvaluationNet: Can Human Skill be Evaluated by Deep Networks With the recent substantial growth of media such as YouTube, a considerable number of instructional videos covering a wide variety of tasks are available online. Therefore, online instructional videos have become a rich resource for humans to learn everyday skills. In order to improve the effectiveness of learning from instructional videos, observation and evaluation of the activity are required. However, it is difficult for an expert to observe and evaluate every step of an activity. In this study, a novel deep learning framework which targets human activity evaluation for learning from instructional video is proposed. In order to deal with the inherent variability of activities, we propose to model an activity as a structured process. First, action units are encoded from dense trajectories with an LSTM network. The variable-length action unit features are then evaluated by a Siamese LSTM network. Comparative experiments on a public dataset demonstrate the effectiveness of the proposed method.
Event History Analysis Event history analysis deals with data obtained by observing individuals over time, focusing on events occurring for the individuals under observation. Important applications are to life events of humans in demography, life insurance mathematics, epidemiology, and sociology. The basic data are the times of occurrence of the events and the types of events that occur. The standard approach to the analysis of such data is to use multistate models; a basic example is finite-state Markov processes in continuous time. Censoring and truncation are defining features of the area. This review comments specifically on three areas that are current subjects of active development, all motivated by demands from applications: sampling patterns, the possibility of causal interpretation of the analyses, and the levels and interpretation of variability.
Event-based Vision: A Survey Event cameras are bio-inspired sensors that work radically differently from traditional cameras. Instead of capturing images at a fixed rate, they measure per-pixel brightness changes asynchronously. This results in a stream of events, which encode the time, location and sign of the brightness changes. Event cameras possess outstanding properties compared to traditional cameras: very high dynamic range (140 dB vs. 60 dB), high temporal resolution (on the order of microseconds), low power consumption, and no motion blur. Hence, event cameras have a large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as high speed and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
Evolution of Ant Colony Optimization Algorithm — A Brief Literature Review Ant Colony Optimization (ACO) is a metaheuristic proposed by Marco Dorigo in 1991 based on the behavior of biological ants. Pheromone laying and the selection of the shortest route with the help of pheromone inspired the development of the first ACO algorithm. Since the presentation of the first such algorithm, many researchers have worked and published in this field. Though initial results were not so promising, recent developments have made this metaheuristic a significant algorithm in Swarm Intelligence. This research presents a brief overview of recent developments in ACO algorithms in terms of both applications and algorithmic developments. For application developments, multi-objective optimization, continuous optimization and time-varying NP-hard problems are presented. For algorithmic developments, hybridization and parallel architectures are investigated.
Evolutionary Algorithms Evolutionary algorithms (EAs) are population-based metaheuristics, originally inspired by aspects of natural evolution. Modern varieties incorporate a broad mixture of search mechanisms, and tend to blend inspiration from nature with pragmatic engineering concerns; however, all EAs essentially operate by maintaining a population of potential solutions and in some way artificially ‘evolving’ that population over time. Particularly well-known categories of EAs include genetic algorithms (GAs), Genetic Programming (GP), and Evolution Strategies (ES). EAs have proven very successful in practical applications, particularly those requiring solutions to combinatorial problems. EAs are highly flexible and can be configured to address any optimization task, without the requirements for reformulation and/or simplification that would be needed for other techniques. However, this flexibility goes hand in hand with a cost: the tailoring of an EA’s configuration and parameters, so as to provide robust performance for a given class of tasks, is often a complex and time-consuming process. This tailoring process is one of the many ongoing research areas associated with EAs.
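As a concrete illustration of the basic loop shared by these methods (maintain a population, select, recombine, mutate), here is a deliberately small genetic algorithm for the OneMax toy problem; the operator choices and parameters are illustrative only, not a recommended configuration.

```python
# Minimal genetic algorithm sketch: maximize the number of 1s in a bit string.
import random
random.seed(0)

def fitness(bits):                 # objective: count of 1s ("OneMax")
    return sum(bits)

def mutate(bits, p=0.05):          # bit-flip mutation
    return [b ^ (random.random() < p) for b in bits]

def crossover(a, b):               # one-point crossover
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

pop = [[random.randint(0, 1) for _ in range(30)] for _ in range(40)]
for _ in range(60):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]             # truncation selection of the fittest
    pop = parents + [mutate(crossover(random.choice(parents), random.choice(parents)))
                     for _ in range(30)]
print(fitness(max(pop, key=fitness)))  # typically close to 30 after a few generations
```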
Evolutionary Computation, Optimization and Learning Algorithms for Data Science A large number of engineering, science and computational problems have yet to be solved in a computationally efficient way. One of the emerging challenges is how evolving technologies grow towards autonomy and intelligent decision making. This leads to the collection of large amounts of data from various sensing and measurement technologies, e.g., cameras, smart phones, health sensors, smart electricity meters, and environment sensors. Hence, it is imperative to develop efficient algorithms for the generation, analysis, classification, and illustration of data. Meanwhile, data is structured purposefully through different representations, such as large-scale networks and graphs. We focus on data science as a crucial area, specifically on the curse of dimensionality (CoD), which arises from the large amount of generated/sensed/collected data. This motivates researchers to think about optimization and to apply nature-inspired algorithms, such as evolutionary algorithms (EAs), to solve optimization problems. Although these algorithms appear non-deterministic, they are robust enough to reach an optimal solution. Researchers typically do not adopt evolutionary algorithms unless they face a problem in which other methods become trapped in local rather than global optima. In this chapter, we first develop a clear and formal definition of the CoD problem, next we focus on feature extraction techniques and categories, then we provide a general overview of meta-heuristic algorithms, their terminology, and the desirable properties of evolutionary algorithms.
Evolutionary Multimodal Optimization: A Short Survey Real-world problems often have multiple distinct solutions. For instance, optical engineers need to tune the recording parameters to get as many optimal solutions as possible for multiple trials in the varied-line-spacing holographic grating design problem. Unfortunately, most traditional optimization techniques focus on finding a single optimal solution. They need to be applied several times, yet all solutions are not guaranteed to be found. Thus the multimodal optimization problem was proposed. In that problem, we are interested in not only a single optimal point, but also the others. With strong parallel search capability, evolutionary algorithms are shown to be particularly effective in solving this type of problem. In particular, the evolutionary algorithms for multimodal optimization usually not only locate multiple optima in a single run, but also preserve their population diversity throughout a run, resulting in their global optimization ability on multimodal functions. In addition, the techniques for multimodal optimization are borrowed as diversity maintenance techniques in other problems. In this chapter, we describe and review the state-of-the-art evolutionary algorithms for multimodal optimization in terms of methodology, benchmarking, and application.
Evolving Academia/Industry Relations in Computing Research: Interim Report In 2015, the CCC co-sponsored an industry round table that produced the document ‘The Future of Computing Research: Industry-Academic Collaborations’. Since then, several important trends in computing research have emerged, and this document considers how those trends impact the interaction between academia and industry in computing fields. We reach the following conclusions: – In certain computing disciplines, such as currently artificial intelligence, we observe significant increases in the level of interaction between professors and companies, which take the form of extended joint appointments. – Increasingly, companies are highly motivated to engage both professors and graduate students working in specific technical areas because companies view computing research and technical talent as a core aspect of their business success. – This increasing connection between faculty, students, and companies has the potential to change (either positively or negatively) numerous things, including: the academic culture in computing research universities, the research topics that faculty and students pursue, the ability of universities to train undergraduate and graduate students, how companies and universities cooperate, share, and interact – This report outlines areas of further investigation to help direct the computing research community to achieve the maximum benefits from these changes and minimize the negative outcomes. – We actively seek input from all the constituencies that can influence and are affected by these changes including industry, universities, and government agencies and in the roles of students, professors, industry representatives, and administrators.
Examining the Use of Neural Networks for Feature Extraction: A Comparative Analysis using Deep Learning, Support Vector Machines, and K-Nearest Neighbor Classifiers Neural networks in many varieties are touted as very powerful machine learning tools because of their ability to distill large amounts of information from different forms of data, extracting complex features and enabling powerful classification abilities. In this study, we use neural networks to extract features from both images and numeric data and use these extracted features as inputs for other machine learning models, namely support vector machines (SVMs) and k-nearest neighbor classifiers (KNNs), in order to see if neural-network-extracted features enhance the capabilities of these models. We tested 7 different neural network architectures in this manner, 4 for images and 3 for numeric data, training each for varying lengths of time and then comparing the results of the neural network independently to those of an SVM and KNN on the data, and finally comparing these results to models of SVM and KNN trained using features extracted via the neural network architecture. This process was repeated on 3 different image datasets and 2 different numeric datasets. The results show that, in many cases, the features extracted using the neural network significantly improve the capabilities of SVMs and KNNs compared to running these algorithms on the raw features, and in some cases also surpass the performance of the neural network alone. This in turn suggests that it may be a reasonable practice to use neural networks as a means to extract features for classification by other machine learning models for some datasets.
Expectation propagation: a probabilistic view of Deep Feed Forward Networks We present a statistical mechanics model of deep feed forward neural networks (FFN). Our energy-based approach naturally explains several known results and heuristics, providing a solid theoretical framework and new instruments for a systematic development of FFN. We infer that FFN can be understood as performing three basic steps: encoding, representation validation and propagation. We obtain a set of natural activations — such as sigmoid, $\tanh$ and ReLU — together with a state-of-the-art one, recently obtained by Ramachandran et al. (arXiv:1710.05941) using an extensive search algorithm. We term this activation ESP (Expected Signal Propagation), explain its probabilistic meaning, and study the eigenvalue spectrum of the associated Hessian on classification tasks. We find that ESP allows for faster training and more consistent performance over a wide range of network architectures.
Experiencing SAX: a Novel Symbolic Representation of Time Series Many high level representations of time series have been proposed for data mining, including Fourier transforms, wavelets, eigenwaves, piecewise polynomial models etc. Many researchers have also considered symbolic representations of time series, noting that such representations would potentially allow researchers to avail themselves of the wealth of data structures and algorithms from the text processing and bioinformatics communities. While many symbolic representations of time series have been introduced over the past decades, they all suffer from two fatal flaws. Firstly, the dimensionality of the symbolic representation is the same as the original data, and virtually all data mining algorithms scale poorly with dimensionality. Secondly, although distance measures can be defined on the symbolic approaches, these distance measures have little correlation with distance measures defined on the original time series. In this work we formulate a new symbolic representation of time series. Our representation is unique in that it allows dimensionality/numerosity reduction, and it also allows distance measures to be defined on the symbolic approach that lower bound corresponding distance measures defined on the original series. As we shall demonstrate, this latter feature is particularly exciting because it allows one to run certain data mining algorithms on the efficiently manipulated symbolic representation, while producing identical results to the algorithms that operate on the original data. In particular, we will demonstrate the utility of our representation on various data mining tasks of clustering, classification, query by content, anomaly detection, motif discovery, and visualization.
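SAX, the representation introduced in this paper, is usually described as a three-step pipeline: z-normalize the series, reduce it with piecewise aggregate approximation (PAA), and map segment means to symbols via breakpoints chosen under a Gaussian assumption. The sketch below illustrates that pipeline on a synthetic series; the alphabet size, breakpoints, and series are illustrative choices.

```python
# Compact SAX sketch: z-normalize, PAA, then discretize against breakpoints.
import numpy as np

def sax(series, n_segments=8, breakpoints=(-0.67, 0.0, 0.67), alphabet="abcd"):
    x = (series - series.mean()) / series.std()   # z-normalization
    # PAA segment means (series length must be divisible by n_segments here).
    paa = x.reshape(n_segments, -1).mean(axis=1)
    symbols = np.searchsorted(breakpoints, paa)   # which breakpoint bin
    return "".join(alphabet[i] for i in symbols)

t = np.linspace(0, 2 * np.pi, 64)
print(sax(np.sin(t)))   # a short string such as 'cddcbaab' (low = a ... high = d)
```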
Experimental Analysis of Design Elements of Scalarizing Functions-based Multiobjective Evolutionary Algorithms In this paper we systematically study the importance, i.e., the influence on performance, of the main design elements that differentiate scalarizing functions-based multiobjective evolutionary algorithms (MOEAs). This class of MOEAs includes Multiobjective Genetic Local Search (MOGLS) and Multiobjective Evolutionary Algorithm Based on Decomposition (MOEA/D) and has proved to be very successful in multiple computational experiments and practical applications. The two algorithms share the same common structure and differ only in two main aspects. Using three different multiobjective combinatorial optimization problems, i.e., the multiobjective symmetric traveling salesperson problem, the traveling salesperson problem with profits, and the multiobjective set covering problem, we show that the main differentiating design element is the mechanism for parent selection, while the selection of weight vectors, either random or uniformly distributed, is practically negligible if the number of uniform weight vectors is sufficiently large.
Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models With the availability of large databases and recent improvements in deep learning methodology, the performance of AI systems is reaching or even exceeding the human level on an increasing number of complex tasks. Impressive examples of this development can be found in domains such as image classification, sentiment analysis, speech understanding or strategic game playing. However, because of their nested non-linear structure, these highly successful machine learning and artificial intelligence models are usually applied in a black box manner, i.e., no information is provided about what exactly makes them arrive at their predictions. Since this lack of transparency can be a major drawback, e.g., in medical applications, the development of methods for visualizing, explaining and interpreting deep learning models has recently attracted increasing attention. This paper summarizes recent developments in this field and makes a plea for more interpretability in artificial intelligence. Furthermore, it presents two approaches to explaining predictions of deep learning models, one method which computes the sensitivity of the prediction with respect to changes in the input and one approach which meaningfully decomposes the decision in terms of the input variables. These methods are evaluated on three classification tasks.
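The first of the two explanation approaches mentioned above, sensitivity of the prediction to input changes, is commonly implemented as the gradient of the predicted class score with respect to the input. A minimal sketch using PyTorch and an untrained toy network (both assumptions for illustration, not the paper's setup) is shown below; the second approach, meaningful decomposition of the decision, is not covered here.

```python
# Gradient-based sensitivity map for a single input; illustrative only.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

x = torch.rand(1, 1, 28, 28, requires_grad=True)   # stand-in for an input image
scores = model(x)
scores[0, scores.argmax()].backward()               # backprop the winning class score

sensitivity = x.grad.abs().squeeze()                # |d score / d pixel|, shape (28, 28)
print(sensitivity.shape, sensitivity.max())
```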
Explainable Deterministic MDPs We present a method for a certain class of Markov Decision Processes (MDPs) that can relate the optimal policy back to one or more reward sources in the environment. For a given initial state, without fully computing the value function, q-value function, or the optimal policy the algorithm can determine which rewards will and will not be collected, whether a given reward will be collected only once or continuously, and which local maximum within the value function the initial state will ultimately lead to. We demonstrate that the method can be used to map the state space to identify regions that are dominated by one reward source and can fully analyze the state space to explain all actions. We provide a mathematical framework to show how all of this is possible without first computing the optimal policy or value function.
Explainable Machine Learning for Scientific Insights and Discoveries Machine learning methods have been remarkably successful for a wide range of application areas in the extraction of essential information from data. An exciting and relatively recent development is the uptake of machine learning in the natural sciences, where the major goal is to obtain novel scientific insights and discoveries from observational or simulated data. A prerequisite for obtaining a scientific outcome is domain knowledge, which is needed to gain explainability, but also to enhance scientific consistency. In this article we review explainable machine learning in view of applications in the natural sciences and discuss three core elements which we identified as relevant in this context: transparency, interpretability, and explainability. With respect to these core elements, we provide a survey of recent scientific works incorporating machine learning, and in particular to the way that explainable machine learning is used in their respective application areas.
Explainable Recommendation: A Survey and New Perspectives Explainable Recommendation refers to personalized recommendation algorithms that address the problem of why — they not only provide the user with recommendations, but also make the user aware of why such items are recommended by generating recommendation explanations, which help to improve the effectiveness, efficiency, persuasiveness, and user satisfaction of recommender systems. In recent years, a large number of explainable recommendation approaches — especially model-based explainable recommendation algorithms — have been proposed and adopted in real-world systems. In this survey, we review the work on explainable recommendation that has been published in or before the year 2018. We first highlight the position of explainable recommendation in recommender system research by categorizing recommendation problems into the 5W, i.e., what, when, who, where, and why. We then conduct a comprehensive survey of explainable recommendation itself in terms of three aspects: 1) We provide a chronological research line of explanations in recommender systems, including the user study approaches in the early years, as well as the more recent model-based approaches. 2) We provide a taxonomy for explainable recommendation algorithms, including user-based, item-based, model-based, and post-model explanations. 3) We summarize the application of explainable recommendation in different recommendation tasks, including product recommendation, social recommendation, POI recommendation, etc. We devote a chapter to discuss the explanation perspectives in the broader IR and machine learning settings, as well as their relationship with explainable recommendation research. We end the survey by discussing potential future research directions to promote the explainable recommendation research area.
Explainable Recommendation: Theory and Applications Although personalized recommendation has been investigated for decades, the wide adoption of Latent Factor Models (LFM) has made the explainability of recommendations a critical issue to both the research community and practical application of recommender systems. For example, in many practical systems the algorithm just provides a personalized item recommendation list to the users, without persuasive personalized explanation about why such an item is recommended while another is not. Unexplainable recommendations introduce negative effects to the trustworthiness of recommender systems, and thus affect the effectiveness of recommendation engines. In this work, we investigate explainable recommendation in aspects of data explainability, model explainability, and result explainability, and the main contributions are as follows: 1. Data Explainability: We propose the Localized Matrix Factorization (LMF) framework based on Bordered Block Diagonal Form (BBDF) matrices, and further apply this technique for parallelized matrix factorization. 2. Model Explainability: We propose Explicit Factor Models (EFM) based on phrase-level sentiment analysis, as well as dynamic user preference modeling based on time series analysis. In this work, we extract product features and user opinions towards different features from large-scale user textual reviews based on phrase-level sentiment analysis techniques, and introduce the EFM approach for explainable model learning and recommendation. 3. Economic Explainability: We propose the Total Surplus Maximization (TSM) framework for personalized recommendation, as well as the model specification in different types of online applications. Based on basic economic concepts, we provide the definitions of utility, cost, and surplus in the application scenario of Web services, and propose the general framework of web total surplus calculation and maximization.
Explaining Reinforcement Learning to Mere Mortals: An Empirical Study We present a user study to investigate the impact of explanations on non-experts’ understanding of reinforcement learning (RL) agents. We investigate both a common RL visualization, saliency maps (the focus of attention), and a more recent explanation type, reward-decomposition bars (predictions of future types of rewards). We designed a 124-participant, four-treatment experiment to compare participants’ mental models of an RL agent in a simple Real-Time Strategy (RTS) game. Our results show that the combination of both saliency and reward bars was needed to achieve a statistically significant improvement in mental model score over the control. In addition, our qualitative analysis of the data reveals a number of effects for further study.
Explanation in Artificial Intelligence: Insights from the Social Sciences There has been a recent resurgence in the area of explainable artificial intelligence as researchers and practitioners seek to provide more transparency to their algorithms. Much of this research is focused on explicitly explaining decisions or actions to a human observer, and it should not be controversial to say that, if these techniques are to succeed, the explanations they generate should have a structure that humans accept. However, it is fair to say that most work in explainable artificial intelligence uses only the researchers’ intuition of what constitutes a `good’ explanation. There exist vast and valuable bodies of research in philosophy, psychology, and cognitive science on how people define, generate, select, evaluate, and present explanations. This paper argues that the field of explainable artificial intelligence should build on this existing research, and reviews relevant papers from philosophy, cognitive psychology/science, and social psychology, which study these topics. It draws out some important findings, and discusses ways that these can be infused with work on explainable artificial intelligence.
Explanation in Human-AI Systems: A Literature Meta-Review, Synopsis of Key Ideas and Publications, and Bibliography for Explainable AI This is an integrative review that addresses the question, ‘What makes for a good explanation?’ with reference to AI systems. The pertinent literatures are vast. Thus, this review is necessarily selective. That said, most of the key concepts and issues are expressed in this Report. The Report encapsulates the history of computer science efforts to create systems that explain and instruct (intelligent tutoring systems and expert systems). The Report expresses the explainability issues and challenges in modern AI, and presents capsule views of the leading psychological theories of explanation. Certain articles stand out by virtue of their particular relevance to XAI, and their methods, results, and key points are highlighted. It is recommended that AI/XAI researchers be encouraged to include in their research reports fuller details on their empirical or experimental methods, in the fashion of experimental psychology research reports: details on Participants, Instructions, Procedures, Tasks, Dependent Variables (operational definitions of the measures and metrics), Independent Variables (conditions), and Control Conditions.
Exploiting Innovative Technologies in BI and Big Data Analytics Data is worth nothing without the right technologies to facilitate its transformation into meaningful information – delivered to the right people in a timely manner – for improved decision making. Forrester recently conducted a survey with 330 global business intelligence (BI) decision makers and found strong correlations between overall company success and adoption of innovative BI, analytics, and Big Data tools. Is your company harnessing the latest innovations in BI technology to achieve your business goals?
Exponential scaling of neural algorithms – a future beyond Moore’s Law Although the brain has long been considered a potential inspiration for future computing, Moore’s Law – the scaling property that has seen revolutions in technologies ranging from supercomputers to smart phones – has largely been driven by advances in materials science. As the ability to miniaturize transistors is coming to an end, there is increasing attention on new approaches to computation, including renewed enthusiasm around the potential of neural computation. Recent advances in neurotechnologies, many of which have been aided by computing’s rapid progression over recent decades, are now reigniting this opportunity to bring neural computation insights into broader computing applications. As we understand more about the brain, our ability to motivate new computing paradigms will continue to progress. These new approaches to computing, which we are already seeing in techniques such as deep learning, will themselves improve our ability to learn about the brain and accordingly can be projected to give rise to even further insights. Such a positive feedback has the potential to change the complexion of how computing sciences and neurosciences interact, and suggests that the next form of exponential scaling in computing may emerge from our progressive understanding of the brain.
Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values. The k-means algorithm is well known for its efficiency in clustering large data sets. However, working only on numeric values prohibits it from being used to cluster real world data containing categorical values. In this paper we present two algorithms which extend the k-means algorithm to categorical domains and domains with mixed numeric and categorical values. The k-modes algorithm uses a simple matching dissimilarity measure to deal with categorical objects, replaces the means of clusters with modes, and uses a frequency-based method to update modes in the clustering process to minimise the clustering cost function. With these extensions the k-modes algorithm enables the clustering of categorical data in a fashion similar to k-means. The k-prototypes algorithm, through the definition of a combined dissimilarity measure, further integrates the k-means and k-modes algorithms to allow for clustering objects described by mixed numeric and categorical attributes.
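For concreteness, here is a minimal sketch (my own illustration, not the authors' code) of the two ingredients the abstract describes for k-modes: the simple matching dissimilarity between categorical objects and the frequency-based mode update; the toy data and the two-cluster setup are assumptions.

```python
# Minimal sketch (not the authors' code) of the k-modes ingredients described above:
# simple matching dissimilarity for categorical objects and frequency-based mode updates.
import numpy as np

def matching_dissimilarity(x, modes):
    # number of attributes on which object x disagrees with each cluster mode
    return (x != modes).sum(axis=1)

def update_modes(X, labels, k):
    # replace each cluster centre by the most frequent category per attribute
    modes = np.empty((k, X.shape[1]), dtype=X.dtype)
    for c in range(k):
        members = X[labels == c]
        for j in range(X.shape[1]):
            values, counts = np.unique(members[:, j], return_counts=True)
            modes[c, j] = values[np.argmax(counts)]
    return modes

# toy categorical data: rows are objects, columns are categorical attributes
X = np.array([["red", "s"], ["red", "m"], ["blue", "l"], ["blue", "l"]])
modes = X[[0, 2]].copy()                                  # initialise with two distinct objects
labels = np.array([np.argmin(matching_dissimilarity(x, modes)) for x in X])
modes = update_modes(X, labels, k=2)                      # one assignment + update iteration
print(labels, modes)
```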
Extraction of food consumption systems by non-negative matrix factorization (NMF) for the assessment of food choices. In Western countries where food supply is satisfactory, consumers organize their diets around a large combination of foods. It is the purpose of this paper to examine how recent nonnegative matrix factorization (NMF) techniques can be applied to food consumption data in order to understand these combinations. Such data are nonnegative by nature and of high dimension. The NMF model provides a representation of consumption data through a small number of latent vectors with nonnegative coefficients, which we call consumption systems. As the NMF approach may encourage sparsity of the data representation produced, the resulting consumption systems are easily interpretable. Beyond the illustration of its properties through a simple simulation result, the NMF method is applied to data from a French consumption survey. The numerical results thus obtained are displayed and thoroughly discussed. A clustering based on the k-means method is also performed in the resulting latent consumption space, in order to recover food consumption patterns easily usable by nutritionists.
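As a hedged illustration of the pipeline described above (synthetic counts and scikit-learn's NMF rather than the authors' survey data or implementation): decompose a nonnegative consumption matrix into a few latent consumption systems, then run k-means in that latent space.

```python
# Hedged illustration with synthetic counts (not the paper's survey data):
# NMF extracts a few latent "consumption systems"; k-means then clusters
# consumers in that latent space, as the abstract describes.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(200, 40)).astype(float)   # 200 consumers x 40 food groups

nmf = NMF(n_components=5, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)        # consumer loadings on the 5 consumption systems
H = nmf.components_             # each row: one consumption system over the food groups

patterns = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(W)
print(W.shape, H.shape, np.bincount(patterns))
```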
Extremal Quantile Regression: An Overview This chapter provides an overview of extremal quantile regression. It is forthcoming in the Handbook of Quantile Regression.
Eyes Wide Open: Open Source Analytics 1. The total cost of owning and managing analytics technology consists of hardware (price per CPU, price per unit of storage), software (price per unit/license) and human capital (price per output) costs. Human capital costs are divided between line of business (LOB) users and IT support costs. 2. Transformational advances in data storage and compute power over the past 20 years have driven hardware costs so low that adoption is nearly universal. At the same time, managing these systems has become easier, resulting in lower human capital expense in the form of training time (LOB users) and maintenance and management costs (IT). Resilient and reliable storage and compute power is now a commodity. 3. Open source storage (Apache Hadoop) and operating system (Linux) options have proliferated over the past 3+ years, leading many firms to reliably experiment with low/no cost open source options to supplement or replace licensed commercial solutions. 4. In contrast, firms venturing down the open source analytics software path are not always seeing the expected cost reductions, due to higher human capital expenses and the increased risk introduced into the enterprise through open source software. 5. IIA recommends firms take a blended approach to software selection, matching the correct tool to analytics user type/role, and that firms recalculate total costs, specifically incorporating potential risks associated with open source tools, particularly in mission critical applications.

F

Face Recognition Techniques: A Survey Nowadays research has expanded to extracting auxiliary information from various biometric techniques like fingerprints, face, iris, palm and voice. This information contains some major features like gender, age, beard, mustache, scars, height, hair, skin color, glasses, weight, facial marks and tattoos. All of this information contributes strongly to the identification of a person. Major challenges in face recognition include finding the age and gender of the person. This paper contributes a survey of various face recognition techniques for finding the age and gender. The existing techniques are discussed based on their performances. This paper also provides future directions for further research.
Factorization tricks for LSTM networks We present two simple ways of reducing the number of parameters and accelerating the training of large Long Short-Term Memory (LSTM) networks: the first one is ‘matrix factorization by design’ of the LSTM matrix into the product of two smaller matrices, and the second one is partitioning of the LSTM matrix, its inputs and states into independent groups. Both approaches allow us to train large LSTM networks significantly faster to state-of-the-art perplexity. On the One Billion Word Benchmark we improve single model perplexity down to 24.29.
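A rough sketch of the first trick, written by me to make the idea concrete (the sizes n, p and rank r are arbitrary assumptions): the 4n x (n + p) LSTM weight matrix is replaced by the product of two smaller matrices, which cuts the parameter count while preserving the output shape.

```python
# Rough illustration of 'matrix factorization by design' (sizes and rank are assumed):
# the 4n x (n + p) LSTM weight matrix W is replaced by W2 @ W1 with inner rank r,
# reducing parameters while keeping the same gate pre-activation shape.
import numpy as np

n, p, r = 1024, 1024, 256                          # hidden size, input size, factorization rank

full_params = 4 * n * (n + p)                      # standard LSTM affine map (biases ignored)
factored_params = r * (n + p) + 4 * n * r          # W1: r x (n + p), W2: 4n x r

rng = np.random.default_rng(0)
W1 = rng.standard_normal((r, n + p)) * 0.01
W2 = rng.standard_normal((4 * n, r)) * 0.01
z = rng.standard_normal(n + p)                     # concatenated [input, previous hidden state]
gates_preact = W2 @ (W1 @ z)                       # same (4n,) output as the full matrix

print(full_params, factored_params, gates_preact.shape)
```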
Failures of Deep Learning In recent years, Deep Learning has become the go-to solution for a broad range of applications, often outperforming state-of-the-art methods. However, it is important, for both theoreticians and practitioners, to gain a deeper understanding of the difficulties and limitations associated with common approaches and algorithms. We describe four families of problems for which some of the commonly used existing algorithms fail or suffer significant difficulty. We illustrate the failures through practical experiments, provide theoretical insights explaining their source, and suggest how they might be remedied.
Fairness in Deep Learning: A Computational Perspective Deep learning is increasingly being used in high-stake decision making applications that affect individual lives. However, deep learning models might exhibit algorithmic discrimination behaviors with respect to protected groups, potentially posing negative impacts on individuals and society. Therefore, fairness in deep learning has attracted tremendous attention recently. We provide a comprehensive review covering existing techniques to tackle algorithmic fairness problems from the computational perspective. Specifically, we show that interpretability can serve as a useful ingredient that can be augmented into bias detection and mitigation pipelines. We also discuss open research problems and future research directions, aiming to push forward the area of fairness in deep learning and build genuinely fair, accountable, and transparent deep learning systems.
Fairness in Supervised Learning: An Information Theoretic Approach Automated decision making systems are increasingly being used in real-world applications. In these systems for the most part, the decision rules are derived by minimizing the training error on the available historical data. Therefore, if there is a bias related to a sensitive attribute such as gender, race, religion, etc. in the data, say, due to cultural/historical discriminatory practices against a certain demographic, the system could continue discrimination in decisions by including the said bias in its decision rule. We present an information theoretic framework for designing fair predictors from data, which aim to prevent discrimination against a specified sensitive attribute in a supervised learning setting. We use equalized odds as the criterion for discrimination, which demands that the prediction should be independent of the protected attribute conditioned on the actual label. To ensure fairness and generalization simultaneously, we compress the data to an auxiliary variable, which is used for the prediction task. This auxiliary variable is chosen such that it is decontaminated from the discriminatory attribute in the sense of equalized odds. The final predictor is obtained by applying a Bayesian decision rule to the auxiliary variable.
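The paper's information-theoretic construction is not reproduced here, but the equalized odds criterion it relies on can be checked in a few lines; the sketch below (my illustration, with synthetic data) compares positive-prediction rates across groups separately for each true label.

```python
# Illustration only (synthetic data): equalized odds asks the prediction to be independent
# of the sensitive attribute given the true label, so we compare positive-prediction rates
# across groups separately for y = 1 (TPR) and y = 0 (FPR).
import numpy as np

def equalized_odds_gaps(y_true, y_pred, group):
    gaps = {}
    for label in (0, 1):                            # condition on the actual label
        rates = []
        for g in np.unique(group):
            mask = (y_true == label) & (group == g)
            rates.append(y_pred[mask].mean())       # P(y_pred = 1 | y = label, group = g)
        gaps["TPR gap" if label == 1 else "FPR gap"] = max(rates) - min(rates)
    return gaps

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, 1000)
group = rng.integers(0, 2, 1000)
y_pred = (y_true ^ (rng.random(1000) < 0.2)).astype(int)   # noisy, group-blind predictor
print(equalized_odds_gaps(y_true, y_pred, group))           # both gaps should be small
```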
Fairness-aware machine learning: a perspective Algorithms learned from data are increasingly used for deciding many aspects of our life: from the movies we see, to the prices we pay, or the medicine we get. Yet there is growing evidence that decision making by inappropriately trained algorithms may unintentionally discriminate against people. For example, in automated matching of candidate CVs with job descriptions, algorithms may capture and propagate ethnicity related biases. Several repairs for selected algorithms have already been proposed, but the underlying mechanisms of how such discrimination happens are not yet scientifically understood from the computational perspective. We need to develop a theoretical understanding of how algorithms may become discriminatory, and establish fundamental machine learning principles for prevention. We need to analyze the machine learning process as a whole to systematically explain the roots of discrimination occurrence, which will allow us to devise global machine learning optimization criteria for guaranteed prevention, as opposed to pushing empirical constraints into existing algorithms case-by-case. As a result, the state-of-the-art will advance from heuristic repairing to proactive and theoretically supported prevention. This is needed not only because the law requires us to protect vulnerable people. Penetration of big data initiatives will only increase, and computer science needs to provide solid explanations and accountability to the public before public concerns lead to unnecessarily restrictive regulations against machine learning.
Fake News Detection on Social Media: A Data Mining Perspective Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of ‘fake news’, i.e., low quality news with intentionally false information. The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Therefore, fake news detection on social media has recently become an emerging research area that is attracting tremendous attention. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. First, fake news is intentionally written to mislead readers to believe false information, which makes it difficult and nontrivial to detect based on news content; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination. Second, exploiting this auxiliary information is challenging in and of itself as users’ social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. Because the issue of fake news detection on social media is both challenging and relevant, we conducted this survey to further facilitate research on the problem. In this survey, we present a comprehensive review of detecting fake news on social media, including fake news characterizations on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics and representative datasets. We also discuss related research areas, open problems, and future research directions for fake news detection on social media.
False Information on Web and Social Media: A Survey False information can be created and spread easily through the web and social media platforms, resulting in widespread real-world impact. Characterizing how false information proliferates on social platforms and why it succeeds in deceiving readers are critical to develop efficient detection algorithms and tools for early detection. A recent surge of research in this area has aimed to address the key issues using methods based on feature engineering, graph mining, and information modeling. The majority of the research has primarily focused on two broad categories of false information: opinion-based (e.g., fake reviews), and fact-based (e.g., false news and hoaxes). Therefore, in this work, we present a comprehensive survey spanning diverse aspects of false information, namely (i) the actors involved in spreading false information, (ii) the rationale behind successfully deceiving readers, (iii) quantifying the impact of false information, (iv) measuring its characteristics across different dimensions, and finally, (v) algorithms developed to detect false information. In doing so, we create a unified framework to describe these recent methods and highlight a number of important directions for future research.
Fast unfolding of communities in large networks We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2.6 million customers and by analyzing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad-hoc modular networks.
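A small usage sketch, not the authors' original implementation: recent versions of NetworkX (2.8 or later) ship a louvain_communities function implementing this modularity-optimising method, so the partition and its modularity can be inspected on a toy graph.

```python
# Usage sketch (requires NetworkX >= 2.8 for louvain_communities): run the
# modularity-optimising method on a small benchmark graph and report the
# modularity of the partition it finds.
import networkx as nx

G = nx.karate_club_graph()
communities = nx.community.louvain_communities(G, seed=0)
Q = nx.community.modularity(G, communities)
print(len(communities), round(Q, 3))
```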
Feature Dimensionality Reduction for Video Affect Classification: A Comparative Study Affective computing has become a very important research area in human-machine interaction. However, affects are subjective, subtle, and uncertain. So, it is very difficult to obtain a large number of labeled training samples, compared with the number of possible features we could extract. Thus, dimensionality reduction is critical in affective computing. This paper presents our preliminary study on dimensionality reduction for affect classification. Five popular dimensionality reduction approaches are introduced and compared. Experiments on the DEAP dataset showed that no approach can universally outperform others, and performing classification using the raw features directly may not always be a bad choice.
Feature Engineering Tips for Data Scientists and Business Analysts Most data scientists and statisticians agree that predictive modeling is both art and science, yet relatively little air time is given to describing the art. This post describes one piece of the art of modeling called feature engineering, which expands the number of variables you have to build a model. I offer six ways to implement feature engineering and provide examples of each. Using methods like these is important because additional relevant variables increase model accuracy, which makes feature engineering an essential part of the modeling process.
Feature Selection and Feature Extraction in Pattern Analysis: A Literature Review Pattern analysis often requires a pre-processing stage for extracting or selecting features in order to help the classification, prediction, or clustering stage discriminate or represent the data in a better way. The reason for this requirement is that the raw data are complex and difficult to process without extracting or selecting appropriate features beforehand. This paper reviews theory and motivation of different common methods of feature selection and extraction and introduces some of their applications. Some numerical implementations are also shown for these methods. Finally, the methods in feature selection and extraction are compared.
Federated Learning: Challenges, Methods, and Future Directions Federated learning involves training statistical models over remote devices or siloed data centers, such as mobile phones or hospitals, while keeping data localized. Training in heterogeneous and potentially massive networks introduces novel challenges that require a fundamental departure from standard approaches for large-scale machine learning, distributed optimization, and privacy-preserving data analysis. In this article, we discuss the unique characteristics and challenges of federated learning, provide a broad overview of current approaches, and outline several directions of future work that are relevant to a wide range of research communities.
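As one concrete baseline from this literature (federated averaging, not a method specific to this article), the sketch below uses synthetic data and a plain linear model to show the core loop: each device updates a model on its own private data and only the parameters are averaged centrally.

```python
# Toy federated averaging loop (an illustration of one common baseline, not this
# article's method): each device takes a few local gradient steps on its private
# data and only the model parameters are averaged centrally, weighted by data size.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

def make_device(n_samples):
    X = rng.standard_normal((n_samples, 2))
    return X, X @ true_w + 0.1 * rng.standard_normal(n_samples)

def local_update(w, X, y, lr=0.1, epochs=5):
    # a few local gradient steps on one device's private data
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

devices = [make_device(n) for n in (50, 120, 30)]        # three devices, unequal data sizes
w_global = np.zeros(2)
for _ in range(20):                                      # communication rounds
    local_models = [local_update(w_global, X, y) for X, y in devices]
    sizes = np.array([len(y) for _, y in devices], dtype=float)
    w_global = np.average(local_models, axis=0, weights=sizes)   # only parameters are shared
print(np.round(w_global, 3))                             # approaches the true coefficients
```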
Few-shot Learning: A Survey The questions ‘can machines think’ and ‘can machines do what humans do’ drive the development of artificial intelligence. Although recent artificial intelligence succeeds in many data intensive applications, it still lacks the ability to learn from limited exemplars and generalize rapidly to new tasks. To tackle this problem, one has to turn to machine learning, which supports the scientific study of artificial intelligence. In particular, a machine learning problem called Few-Shot Learning (FSL) targets this case. It can rapidly generalize to new tasks of limited supervised experience by turning to prior knowledge, which mimics humans’ ability to acquire knowledge from few examples through generalization and analogy. It has been seen as a test-bed for real artificial intelligence, a way to reduce laborious data gathering and computationally costly training, and an antidote for learning from rare cases. With extensive works on FSL emerging, we give a comprehensive survey of it. We first give the formal definition of FSL. Then we point out the core issues of FSL, which turn the problem from ‘how to solve FSL’ to ‘how to deal with the core issues’. Accordingly, existing works from the birth of FSL to the most recently published ones are categorized in a unified taxonomy, with a thorough discussion of the pros and cons for different categories. Finally, we envision possible future directions for FSL in terms of problem setup, techniques, applications and theory, hoping to provide insights to both beginners and experienced researchers.
Financial Series Prediction: Comparison Between Precision of Time Series Models and Machine Learning Methods Precise financial series prediction has long been a difficult problem because of the instability and noise within the series. Although traditional time series models like ARIMA and GARCH have been researched and proven to be effective in predicting, their performances are still far from satisfying. Machine learning, as an emerging research field in recent years, has brought about many incredible improvements in tasks such as regression and classification, and it is also promising to exploit this methodology in financial time series prediction. In this paper, the predictive precision on financial time series of traditional time series models and mainstream machine learning models, including some state-of-the-art deep learning models, is compared through experiments using historical stock index data. The results show that machine learning, as a modern method, far surpasses traditional models in precision.
Finite Mixture Models and Model-Based Clustering Finite mixture models have a long history in statistics, having been used to model population heterogeneity, generalize distributional assumptions, and lately, to provide a convenient yet formal framework for clustering and classification. This paper provides a detailed review of mixture models and model-based clustering. Recent trends in the area, as well as open problems, are also discussed.
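For a concrete illustration (mine, not from the paper): model-based clustering with a Gaussian finite mixture in scikit-learn, selecting the number of components by BIC on synthetic two-group data.

```python
# Illustration (not from the paper): model-based clustering with a Gaussian finite
# mixture, choosing the number of components by BIC on synthetic two-group data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (150, 2)), rng.normal(4, 1, (150, 2))])

bic = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X) for k in range(1, 6)}
best_k = min(bic, key=bic.get)                       # BIC should favour two components here
labels = GaussianMixture(n_components=best_k, random_state=0).fit_predict(X)
print(best_k, np.bincount(labels))
```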
Firefly Algorithm for optimization problems with non-continuous variables: A Review and Analysis The firefly algorithm is a swarm based metaheuristic algorithm inspired by the flashing behavior of fireflies. It is an effective and easy to implement algorithm. It has been tested on different problems from different disciplines and found to be effective. Even though the algorithm was proposed for optimization problems with continuous variables, it has been modified and used for problems with non-continuous variables, including binary and integer valued problems. In this paper, a detailed review of these modifications of the firefly algorithm for problems with non-continuous variables is given. The strengths and weaknesses of the modifications, along with possible future work, are presented.
Fisher and Kernel Fisher Discriminant Analysis: Tutorial This is a detailed tutorial paper which explains Fisher discriminant analysis (FDA) and kernel FDA. We start with projection and reconstruction. Then, one- and multi-dimensional FDA subspaces are covered. Scatters in two- and then multi-class settings are explained in FDA. Then, we discuss the rank of the scatters and the dimensionality of the subspace. A real-life example is also provided for interpreting FDA. Then, possible singularity of the scatter is discussed to introduce robust FDA. PCA and FDA directions are also compared. We also prove that FDA and linear discriminant analysis are equivalent. Fisher forest is also introduced as an ensemble of Fisher subspaces useful for handling data with different features and dimensionality. Afterwards, kernel FDA is explained for both one- and multi-dimensional subspaces with both two and multiple classes. Finally, some simulations are performed on the AT&T face dataset to illustrate FDA and compare it with PCA.
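A brief sketch to accompany the tutorial (my own, using scikit-learn and the iris data rather than the paper's AT&T face experiments): projecting labelled data onto the Fisher/linear discriminant directions and onto PCA directions, the two subspaces the paper compares.

```python
# Sketch accompanying the tutorial (iris data instead of the paper's AT&T face
# experiments): project labelled data onto Fisher/linear discriminant directions
# and onto PCA directions, the two subspaces the paper compares.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
X_fda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # supervised directions
X_pca = PCA(n_components=2).fit_transform(X)                            # unsupervised directions
print(X_fda.shape, X_pca.shape)
```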
Five big data challenges And how to overcome them with visual analytics Big data is set to offer companies tremendous insight. But with terabytes and petabytes of data pouring in to organizations today, traditional architectures and infrastructures are not up to the challenge. IT teams are burdened with ever-growing requests for data, ad hoc analyses and one-off reports. Decision makers become frustrated because it takes hours or days to get answers to questions, if at all. More users are expecting self-service access to information in a form they can easily understand and share with others. This begs the question: How do you present big data in a way that business leaders can quickly understand and use? This is not a minor consideration. Mining millions of rows of data creates a big headache for analysts tasked with sorting and presenting data. Organizations often approach the problem in one of two ways: Build ‘samples’ so that it is easier to both analyze and present the data, or create template charts and graphs that can accept certain types of information. Both approaches miss the potential for big data. Instead, consider pairing big data with visual analytics so that you use all the data and receive automated help in selecting the best ways to present the data. This frees staff to deploy insights from data. Think of your data as a great, but messy, story. Visual analytics is the master filmmaker and the gifted editor who bring the story to life.
Five pillars of prescriptive analytics success As the Big Data Analytics space continues to evolve, one of the breakthrough technologies that businesses will be talking about in the coming years is prescriptive analytics. The promise of prescriptive analytics is certainly alluring: it enables decision-makers to not only look into the future of their mission critical processes and see the opportunities (and issues) that are potentially out there, but it also presents the best course of action to take advantage of that foresight in a timely manner. What should we look for in a prescriptive analytics solution to ensure it will deliver business value today and tomorrow?
Five Ways to Empower Business Analysts and Succeed in Your Self-Service BI Program The term self-service is ubiquitous in today´s business intelligence (BI) market. BI vendors and organizations alike constantly work to expand BI´s use and value proposition within the organization by making it more accessible to a wider variety of people. This push has created a series of BI offerings that are easy to interact and design with, without the help of IT departments. However, there are many types of BI users that are still underserved. Primary among them are business analysts that could make self-service BI more successful if they are empowered with higher levels of interactivity and the capabilities to design their own BI applications. Traditionally, business analysts have a broader understanding of data relationships and know how to develop their own analytics. They interact with spreadsheets, develop their own SQL scripts, and create their own databases. In many cases, business analysts are power users because they are tasked with taking ownership over data due to their level of expertise. The business analyst is the person who understands the intricacies of data and how it interrelates within the organization´s ecosystem, represents the link between their departments and IT, and develops analytical insights based on their subject matter expertise. Because of this skill set, these users are tasked with developing their own complex set of analytics. They also create BI models and reports that will be consumed by employees across the organization. Luckily, some self-service BI offerings support this two-tiered approach, providing business analysts with access to the components required to do so successfully. The importance of the power user/business analyst role cannot be overlooked, as creating a successful self-service BI strategy requires consumption of analytics, design that supports this consumption, and business control to ensure that the right information is delivered to the right people and that the right business rules are applied. This marks the intersection of business and IT roles and represents the value of the power user. Now, more than ever, it is important to take advantage of these skill sets to drive self-service BI. The reality is that many BI projects fail when not controlled by the business. Lack of proper requirements gathering and the inability to meet the needs of users creates a lack of adoption. Also, information is becoming more varied and complex. Simple tools such as Excel and Access no longer handle the complexities and increasing volumes inherent in big data or maintain the validity of analytic models. Between this and the increasing demand for in-house programmers and software developers, organizations need to have business analysts that understand the needs of the business, can perform robust analytics, and provide consumable applications for a wide variety of business users, supporting the more technical roles required. This paper looks at five key enablers of self-service BI for business analysts. These are: 1. Building a collaborative relationship between the business analyst and IT 2. Design flexibility 3. Cohesion between technology, people, and business processes 4. Data diversity and preparation 5. Data quality
Fog Computing Architecture: Survey and Challenges Emerging technologies that generate a huge amount of data, such as Internet of Things (IoT) services, need latency aware computing platforms to support time-critical applications. Due to the on-demand services and scalability features of cloud computing, Big Data application processing is done in the cloud infrastructure. Managing Big Data applications exclusively in the cloud is not an efficient solution for latency-sensitive applications related to smart transportation systems, healthcare solutions, emergency response systems and content delivery applications. Thus, the Fog computing paradigm, which allows applications to perform computing operations in between the cloud and the end devices, has emerged. In the Fog architecture, IoT devices and sensors are connected to Fog devices, which are located in close proximity to the users and are responsible for intermediate computation and storage. Most computations will be done on the edge, eliminating full dependence on cloud resources. In this chapter, we investigate and survey Fog computing architectures which have been proposed over the past few years. Moreover, we study the requirements of IoT applications and platforms, and the limitations faced by cloud systems when executing IoT applications. Finally, we review current research works that particularly focus on Big Data application execution on Fog and address several open challenges as well as future research directions.
Fog Computing: A Taxonomy, Survey and Future Directions In recent years, the number of Internet of Things (IoT) devices/sensors has increased to a great extent. To support the computational demand of real-time latency-sensitive applications of largely geo-distributed IoT devices/sensors, a new computing paradigm named ‘Fog computing’ has been introduced. Generally, Fog computing resides closer to the IoT devices/sensors and extends the Cloud-based computing, storage and networking facilities. In this chapter, we comprehensively analyse the challenges in Fogs acting as an intermediate layer between IoT devices/sensors and Cloud datacentres and review the current developments in this field. We present a taxonomy of Fog computing according to the identified challenges and its key features. We also map the existing works to the taxonomy in order to identify current research gaps in the area of Fog computing. Moreover, based on the observations, we propose future directions for research.
Fog Computing: Survey of Trends, Architectures, Requirements, and Research Directions Emerging technologies like the Internet of Things (IoT) require latency-aware computation for real-time application processing. In IoT environments, connected things generate a huge amount of data, which are generally referred to as big data. Data generated from IoT devices are generally processed in a cloud infrastructure because of the on-demand services and scalability features of the cloud computing paradigm. However, processing IoT application requests on the cloud exclusively is not an efficient solution for some IoT applications, especially time-sensitive ones. To address this issue, Fog computing, which resides in between cloud and IoT devices, was proposed. In general, in the Fog computing environment, IoT devices are connected to Fog devices. These Fog devices are located in close proximity to users and are responsible for intermediate computation and storage. Fog computing research is still in its infancy, and taxonomy-based investigation into the requirements of Fog infrastructure, platform, and applications mapped to current research is still required. This paper starts with an overview of Fog computing in which the definition of Fog computing, research trends, and the technical differences between Fog and cloud are reviewed. Then, we investigate numerous proposed Fog computing architectures and describe the components of these architectures in detail. From this, the role of each component will be defined, which will help in the deployment of Fog computing. Next, a taxonomy of Fog computing is proposed by considering the requirements of the Fog computing paradigm. We also discuss existing research works and gaps in resource allocation and scheduling, fault tolerance, simulation tools, and Fog-based microservices. Finally, by addressing the limitations of current research works, we present some open issues, which will determine the future research direction.
Forecasting Time Series With Complex Seasonal Patterns Using Exponential Smoothing An innovations state space modeling framework is introduced for forecasting complex seasonal time series such as those with multiple seasonal periods, high-frequency seasonality, non-integer seasonality, and dual-calendar effects. The new framework incorporates Box-Cox transformations, Fourier representations with time varying coefficients, and ARMA error correction. Likelihood evaluation and analytical expressions for point forecasts and interval predictions under the assumption of Gaussian errors are derived, leading to a simple, comprehensive approach to forecasting complex seasonal time series. A key feature of the framework is that it relies on a new method that greatly reduces the computational burden in the maximum likelihood estimation. The modeling framework is useful for a broad range of applications, its versatility being illustrated in three empirical studies. In addition, the proposed trigonometric formulation is presented as a means of decomposing complex seasonal time series, and it is shown that this decomposition leads to the identification and extraction of seasonal components which are otherwise not apparent in the time series plot itself.
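A toy sketch of one ingredient of this framework, the Fourier representation of multiple seasonal periods, fitted here by ordinary least squares on synthetic data; the Box-Cox transform, time-varying coefficients and ARMA error correction of the full model are omitted, and the seasonal periods chosen are assumptions.

```python
# Toy sketch of the Fourier representation of multiple seasonal periods, fitted by
# ordinary least squares on synthetic half-hourly data (daily period 48, weekly 336);
# the framework's Box-Cox transform, time-varying coefficients and ARMA errors are omitted.
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(4 * 336)                                   # four weeks of half-hourly observations
y = (10 + np.sin(2 * np.pi * t / 48) + 0.5 * np.sin(2 * np.pi * t / 336)
     + 0.1 * rng.standard_normal(t.size))

def fourier(t, period, K):
    # 2K sine/cosine regressors for one seasonal period
    cols = []
    for k in range(1, K + 1):
        cols += [np.sin(2 * np.pi * k * t / period), np.cos(2 * np.pi * k * t / period)]
    return np.column_stack(cols)

X = np.column_stack([np.ones(t.size), fourier(t, 48, 3), fourier(t, 336, 3)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residual_mse = np.mean((y - X @ beta) ** 2)
print(round(residual_mse, 4))                            # close to the 0.01 noise variance
```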
Forecasting Transformative AI: An Expert Survey Transformative AI technologies have the potential to reshape critical aspects of society in the near future. However, in order to properly prepare policy initiatives for the arrival of such technologies, accurate forecasts and timelines are necessary. A survey was administered to attendees of three AI conferences during the summer of 2018 (ICML, IJCAI and the HLAI conference). The survey included questions for estimating AI capabilities over the next decade, questions for forecasting five scenarios of transformative AI and questions concerning the impact of computational resources in AI research. Respondents indicated a median of 21.5% of human tasks (i.e., all tasks that humans are currently paid to do) can be feasibly automated now, and that this figure would rise to 40% in 5 years and 60% in 10 years. Median forecasts indicated a 50% probability of AI systems being capable of automating 90% of current human tasks in 25 years and 99% of current human tasks in 50 years. The conference of attendance was found to have a statistically significant impact on all forecasts, with attendees of HLAI providing more optimistic timelines with less uncertainty. These findings suggest that AI experts expect major advances in AI technology to continue over the next decade to a degree that will likely have profound transformative impacts on society.
Fostering a data-driven culture Fostering a data-driven culture is an Economist Intelligence Unit report, sponsored by Tableau Software. It explores the challenges in nurturing a data-driven culture, and what companies can do to meet them. The Economist Intelligence Unit bears sole responsibility for the content of this report. The findings do not necessarily reflect the views of the sponsor. The paper draws on two main sources for its research and findings: * A survey, conducted in October 2012, of 530 senior executives from around the world. More than 40% of respondents are C-Level executives, including 23% from the CEO, president or managing director ranks and 9%, CIOs. Responses come from a wide range of regions: 50% North America, 15% Asia-Pacific, 26% Western Europe and 9% Latin America. The range of company sizes is also diverse, from those with revenue of less than US$500m (53%) through to those with revenue of US$10bn or more (20%). The survey covers nearly all industries, including IT and technology (18%), financial services (17%), professional services (11%) and manufacturing (7%). * A series of in-depth interviews with the following senior executives: Sidney Minassian, CEO, Contexti; Jerry O´Dwyer, principal, Deloitte Consulting; William Schmarzo, CTO, EMC; Colin Hill, CEO, GNS Healthcare. We would like to thank all interviewees and survey respondents for their time and insight. The report was written by Jim Giles and edited by Gilda Stahl.
Foundations of Complex Event Processing Complex Event Processing (CEP) has emerged as the unifying field for technologies that require processing and correlating heterogeneous distributed data sources in real-time. CEP finds applications in diverse domains, which has resulted in a large number of proposals for expressing and processing complex events. However, existing CEP frameworks are based on ad-hoc solutions that do not rely on solid theoretical ground, making them hard to understand, extend or generalize. Moreover, they are usually presented as application programming interfaces documented by examples, and using each of them requires learning a different set of skills. In this paper we embark on the task of giving a rigorous framework to CEP. As a starting point, we propose a formal language for specifying complex events, called CEPL, that contains the common features used in the literature and has a simple and denotational semantics. We also formalize the so-called selection strategies, which are the cornerstone of CEP and had only been presented as by-design extensions to existing frameworks. With a well-defined semantics at hand, we study how to efficiently evaluate CEPL for processing complex events. We provide optimization results based on rewriting formulas to a normal form that simplifies the evaluation of filters. Furthermore, we introduce a formal computational model for CEP based on transducers and symbolic automata, called match automata, that captures the regular core of CEPL, i.e. formulas with unary predicates. By using rewriting techniques and automata-based translations, we show that formulas in the regular core of CEPL can be evaluated using constant time per event followed by constant-delay enumeration of the output (under data complexity). By gathering these results together, we propose a framework for efficiently evaluating CEPL, establishing solid foundations for future CEP systems.
Foundations of Constructive Probability Theory We provide a systematic, thorough treatment of the foundations of probability theory and stochastic processes along the lines of E. Bishop’s constructive analysis. Every existence result presented shall be a construction; and the input data, the construction procedure, and the output objects shall be regarded as integral parts of the theorem. A brief description of this approach is in Part I of this book. Part II develops basic topics in probability theory in this constructive framework, expanding on [Bishop and Bridges 1985, Springer], and in terms familiar to probabilists. Part III, the main part of the book, builds on Part II to provide a new constructive treatment of stochastic processes, in the spirit and style of Kolmogorov’s constructive methods for Brownian motion. Topics include a Daniell-Kolmogorov-Skorokhod construction of random fields, measurable random fields, a.u. continuous processes, a.u. càdlàg processes, martingales, and a.u. càdlàg and strongly Markov processes with Feller semigroups. This text also contains some new theorems in classical probability theory. Each construction theorem is accompanied by a metrical continuity theorem. For example, the construction of Markov processes from Feller semigroups is shown to be metrically continuous, which strengthens the sequential weak convergence in the classical approach. Another new result is a maximal inequality for $L_p$-martingales for $p \ge 1$. In addition to providing explicit rates of convergence, this maximal inequality also provides a unified proof of a.u. convergence of martingales, which previously required separate proofs for the cases $p > 1$ and $p = 1$. A third new result is a proof that a familiar condition on the triple-joint distributions implies that a process is not only a.u. càdlàg, but also right Hoelder, in a sense made precise in the text.
Four Fundamental Questions in Probability Theory and Statistics This study has the purpose of addressing four questions that lie at the base of probability theory and statistics, and includes two main steps. First, we conduct a textual analysis of the most significant works written by eminent probability theorists. The textual analysis turns out to be a rather innovative method of study in this domain, and shows how the sampled writers, no matter whether they are frequentists or subjectivists, share a similar approach. Each author discusses the manifold aspects of probability and then establishes the mathematical theory on the basis of his intellectual conclusions. It may be said that mathematics ranks second. Hilbert foresees an approach far different from that used by the sampled authors. He proposes to axiomatize the probability calculus, notably to describe the probability concepts using purely mathematical criteria. In the second stage of the present research we address the four issues of probability theory and statistics following the recommendations of Hilbert. Specifically, we use two theorems that prove how the frequentist and the subjectivist models are not incompatible, as many believe. Probability has distinct meanings under different hypotheses, and in turn classical statistics and Bayesian statistics are available for adoption in different circumstances. Subsequently, these results are commented upon, followed by our conclusions.
FPGA-based Accelerators of Deep Learning Networks for Learning and Classification: A Review Due to recent advances in digital technologies, and availability of credible data, an area of artificial intelligence, deep learning, has emerged, and has demonstrated its ability and effectiveness in solving complex learning problems not possible before. In particular, convolutional neural networks (CNNs) have demonstrated their effectiveness in image detection and recognition applications. However, they require intensive CPU operations and memory bandwidth that make general CPUs fail to achieve desired performance levels. Consequently, hardware accelerators that use application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), and graphic processing units (GPUs) have been employed to improve the throughput of CNNs. More precisely, FPGAs have been recently adopted for accelerating the implementation of deep learning networks due to their ability to maximize parallelism as well as due to their energy efficiency. In this paper, we review recent existing techniques for accelerating deep learning networks on FPGAs. We highlight the key features employed by the various techniques for improving the acceleration performance. In addition, we provide recommendations for enhancing the utilization of FPGAs for CNNs acceleration. The techniques investigated in this paper represent the recent trends in FPGA-based accelerators of deep learning networks. Thus, this review is expected to direct the future advances on efficient hardware accelerators and to be useful for deep learning researchers.
Fractal AI: A fragile theory of intelligence Fractal AI is a theory for general artificial intelligence. It allows one to derive new mathematical tools that constitute the foundations for a new kind of stochastic calculus, by modelling information using cellular automaton-like structures instead of smooth functions. In the accompanying repository we present a new Agent, derived from the first principles of the theory, which is capable of solving Atari games several orders of magnitude more efficiently than other similar techniques, like Monte Carlo Tree Search. The code provided shows how it is now possible to beat some of the current state-of-the-art benchmarks on Atari games, without previous learning and using less than 1000 samples to calculate each of the actions, when standard MCTS uses 3 million samples. Among other things, Fractal AI makes it possible to generate a huge database of top performing examples with very little computation required, transforming Reinforcement Learning into a supervised problem. The algorithm presented is capable of solving the exploration vs exploitation dilemma in both the discrete and continuous cases, while maintaining control over any aspect of the behavior of the Agent. From a general approach, the new techniques presented here have direct applications to other areas such as: non-equilibrium thermodynamics, chemistry, quantum physics, economics, information theory, and non-linear control theory.
Frankenstein’s Legacy – Four Conversations About Artificial Intelligence, Machine Learning, and the Modern World Frankenstein´s Legacy: Four Conversations about Artificial Intelligence, Machine Learning, and the Modern World is a collaboration between Carnegie Mellon University´s ETC Press, dSHARP, and the Carnegie Mellon University Libraries in collaboration with the Alumni Association´s CMUThink program. This book is part of a university-wide series celebrating the two-hundredth anniversary of the publication of Mary Shelley´s Frankenstein. This book project specifically sprung from the panel Frankenstein 200: Perils and Potential panel hosted by Digital Scholarship Strategist Rikk Mulligan. Each of the four panel participants – Jeffrey Bigham, David Danks, Barry Luokkala, and Molly Wright Steenson – sat down with ETC Editor Brad King for wide-ranging discussions about artificial intelligence, machine learning, and the impact of these technologies on the world in which we live. Those conversations were edited into the manuscript that you´re now reading. The book – part of the ETC Press ‘In Conversation With’ series – is a conversational examination and explanation of some of the most powerful technological tools in our society.
From Data Mining to Knowledge Discovery in Databases Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges involved in real-world applications of knowledge discovery, and current and future research directions in the field.
From Data Scientist to Data Artist: Data Sculpting to Shape Business Insights Discover the new role of data artist and understand the need, value and availability of powerful, flexible and affordable analytics tools that do not require an advanced degree in mathematics nor a team of information technology experts to use. Learn about the professional requirements of a data artist and how to change corporate culture with the right analytics tools in the right hands. Read about the exploits of a news organization that used those tools to change its culture and become profitable.
From Data to City Indicators: A Knowledge Graph for Supporting Automatic Generation of Dashboards In the context of Smart Cities, indicator definitions have been used to calculate values that enable the comparison among different cities. The calculation of indicator values is challenging, as the calculation may need to combine aspects of quality while addressing different levels of abstraction. Knowledge graphs (KGs) have been used successfully to support flexible representation, which can support improved understanding and data analysis in similar settings. This paper presents an operational description for a city KG, an indicator ontology that supports indicator discovery and data visualization, and an application capable of performing metadata analysis to automatically build and display dashboards according to discovered indicators. We describe our implementation in an urban mobility setting.
From Linear Models to Machine Learning Regression analysis is both one of the oldest branches of statistics, with least-squares analysis having been first proposed way back in 1805, and also one of the newest areas, in the form of the machine learning techniques being vigorously researched today. Not surprisingly, then, there is a vast literature on the subject. Well, then, why write yet another regression book? Many books are out there already, with titles using words like regression, classification, predictive analytics, machine learning and so on. They are written by authors whom I greatly admire, and whose work I myself have found useful. Yet, I did not feel that any existing books covered the material in a manner that sufficiently provided insight for the practicing data analyst.
From statistical inference to a differential learning rule for stochastic neural networks Stochastic neural networks are a prototypical computational device able to build a probabilistic representation of an ensemble of external stimuli. Building on the relation between inference and learning, we derive a synaptic plasticity rule that relies only on delayed activity correlations, and that shows a number of remarkable features. Our ‘delayed-correlations matching’ (DCM) rule satisfies some basic requirements for biological feasibility: finite and noisy afferent signals, Dale’s principle and asymmetry of synaptic connections, locality of the weight update computations. Nevertheless, the DCM rule is capable of storing a large, extensive number of patterns as attractors in a stochastic recurrent neural network, under general scenarios without requiring any modification: it can deal with correlated patterns, a broad range of architectures (with or without hidden neuronal states), one-shot learning with the palimpsest property, all the while avoiding the proliferation of spurious attractors. When hidden units are present, our learning rule can be employed to construct Boltzmann-Machine-like generative models, exploiting the addition of hidden neurons in feature extraction and classification tasks.
From Word to Sense Embeddings: A Survey on Vector Representations of Meaning Over the past years, distributed representations have proven effective and flexible keepers of prior knowledge to be integrated into downstream applications. This survey is focused on semantic representation of meaning. We start from the theoretical background behind word vector space models and highlight one of its main limitations: the meaning conflation deficiency. Then, we explain how this deficiency can be addressed through a transition from word level to the more fine-grained level of word senses (in its broader acceptation) as a method for modelling unambiguous lexical meaning. We present a comprehensive overview of the wide range of techniques in the two main branches of sense representation, i.e., unsupervised and knowledge-based. Finally, this survey covers the main evaluation procedures and an analysis of five important aspects: interpretability, sense granularity, adaptability to different domains, compositionality and integration into downstream applications.
From Yawn to YARN: Why You Should be Excited About Hadoop 2 By now almost everyone has heard the story of the yellow elephant who never forgets data, consumes whatever data you have from any source, and magically produces a big data treasure trove of business insights for you, including tweets, telemetry, customer sentiment, sensor readings, mobile app activity, and more! In fact, the story has been told and re-told so many times now that most people´s natural reaction is… yawn. Hadoop. Big Data. Yeah, yeah. I have heard this story too many times. I google Hadoop and get almost one billion results, but I can´t yell ‘yahoo!’ about getting paid big bucks to code big data applications in MapReduce, which those cool kids in Silicon Valley used to ‘money-expand’ into billionaires. So why should I be excited about Hadoop 2? After all, no sequel is as good as the original. Well, in this case, the sequel is better. The story has changed. The script has flipped. Even though the new protagonist´s name sounds like yawn, the yarn about YARN is much more than yet another chapter in the same old story. More reimagining than sequel, it will take you from yawn to YARN and get you excited about Hadoop 2. This paper explains why.
Functional data clustering: a survey The main contributions to functional data clustering are reviewed. Most approaches used for clustering functional data are based on the following three methodologies: dimension reduction before clustering, nonparametric methods using specific distances or dissimilarities between curves, and model-based clustering methods. The latter assume a probabilistic distribution on either the principal components or the coefficients of the functional data expansion into a finite dimensional basis of functions. Numerical illustrations as well as a software review are presented.
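To illustrate the first of these methodologies (dimension reduction before clustering), here is a small sketch of my own: each observed curve is projected onto a low-dimensional polynomial basis and the coefficient vectors are clustered with k-means; the basis and the synthetic curves are arbitrary choices.

```python
# Sketch of "dimension reduction before clustering" on synthetic curves: each curve is
# projected onto a small polynomial basis and the coefficient vectors are clustered.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
curves = np.vstack([np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(50) for _ in range(30)]
                   + [t**2 + 0.1 * rng.standard_normal(50) for _ in range(30)])

basis = np.vander(t, N=4, increasing=True)                  # cubic polynomial basis, shape (50, 4)
coefs = np.linalg.lstsq(basis, curves.T, rcond=None)[0].T   # one coefficient vector per curve
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coefs)
print(np.bincount(labels))                                  # recovers the two curve families
```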
Functional Regression: A New Model for Predicting Market Penetration of New Products The Bass model has been a standard for analyzing and predicting the market penetration of new products. We demonstrate the insights to be gained from, and the predictive performance of, functional data analysis (FDA), a new class of nonparametric techniques that has shown impressive results within the statistics community, on the market penetration of 760 categories drawn from 21 products and 70 countries. We propose a new model called Functional Regression and compare its performance to several models, including the Classic Bass model, Estimated Means, Last Observation Projection, a Meta-Bass model, and an Augmented Meta-Bass model for predicting eight aspects of market penetration. Results (a) validate the logic of FDA in integrating information across categories, (b) show that Augmented Functional Regression is superior to the above models, and (c) show that product-specific effects are more important than country-specific effects when predicting penetration of an evolving new product.
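For reference, a short sketch (my addition, not the paper's Functional Regression model) of the classic Bass diffusion curve that serves as the benchmark here, with illustrative coefficients of innovation p and imitation q.

```python
# Reference sketch of the classic Bass diffusion curve used as the benchmark; the
# coefficients of innovation p and imitation q are arbitrary illustrative values.
import numpy as np

def bass_cumulative(t, p, q, m=1.0):
    # cumulative adoption m * F(t) under the Bass model
    e = np.exp(-(p + q) * t)
    return m * (1 - e) / (1 + (q / p) * e)

t = np.linspace(0, 20, 200)
F = bass_cumulative(t, p=0.03, q=0.4)
print(round(float(F[0]), 3), round(float(F[-1]), 3))   # starts near 0, saturates toward m
```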
Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation Many machine learning approaches are characterized by information constraints on how they interact with the training data. These include memory and sequential access constraints (e.g. fast first-order methods to solve stochastic optimization problems); communication constraints (e.g. distributed learning); partial access to the underlying data (e.g. missing features and multi-armed bandits) and more. However, we currently have little understanding of how such information constraints fundamentally affect our performance, independent of the learning problem semantics. For example, are there learning problems where any algorithm which has a small memory footprint (or can use any bounded number of bits from each example, or has certain communication constraints) will perform worse than what is possible without such constraints? In this paper, we describe how a single set of results implies positive answers to the above, for several different settings.
Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network Because of their effectiveness in broad practical applications, LSTM networks have received a wealth of coverage in scientific journals, technical blogs, and implementation guides. However, in most articles, the inference formulas for the LSTM network and its parent, RNN, are stated axiomatically, while the training formulas are omitted altogether. In addition, the technique of ‘unrolling’ an RNN is routinely presented without justification throughout the literature. The goal of this paper is to explain the essential RNN and LSTM fundamentals in a single document. Drawing from concepts in signal processing, we formally derive the canonical RNN formulation from differential equations. We then propose and prove a precise statement, which yields the RNN unrolling technique. We also review the difficulties with training the standard RNN and address them by transforming the RNN into the ‘Vanilla LSTM’ network through a series of logical arguments. We provide all equations pertaining to the LSTM system together with detailed descriptions of its constituent entities. Albeit unconventional, our choice of notation and the method for presenting the LSTM system emphasizes ease of understanding. As part of the analysis, we identify new opportunities to enrich the LSTM system and incorporate these extensions into the Vanilla LSTM network, producing the most general LSTM variant to date. The target reader has already been exposed to RNNs and LSTM networks through numerous available resources and is open to an alternative pedagogical approach. A Machine Learning practitioner seeking guidance for implementing our new augmented LSTM model in software for experimentation and research will find the insights and derivations in this tutorial valuable as well.
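As a companion to the derivations described above, here is a compact numpy sketch (my own, using the conventional i/f/o gate notation rather than the paper's exact symbols) of one Vanilla LSTM cell step, unrolled over a few time steps.

```python
# Compact sketch of one "Vanilla LSTM" cell step with conventional gate notation;
# this is an illustration, not the paper's derivation or training procedure.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    # W maps the concatenated [x, h_prev] to the 4 stacked gate pre-activations
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c_prev + i * g          # cell state: forget old content, write new candidate
    h = o * np.tanh(c)              # hidden state exposed to the next layer / time step
    return h, c

n, p = 8, 4                                   # hidden size, input size
rng = np.random.default_rng(0)
W, b = rng.standard_normal((4 * n, n + p)) * 0.1, np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for x in rng.standard_normal((5, p)):         # unroll over 5 time steps
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)
```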
Future of Computing is Boring (and that is exciting!) or How to get to Computing Nirvana in 20 years or less We see a trend where computing becomes a metered utility, similar to how the electric grid evolved. Initially electricity was generated locally, but economies of scale (and standardization) made it more efficient and economical to have utility companies manage the electric grid. Similar developments can be seen in computing, where scientific grids paved the way for commercial cloud computing offerings. However, in our opinion, that evolution is far from finished, and in this paper we bring forward the remaining challenges and propose a vision for the future of computing. In particular, we focus on the changing cost of computing and the comparatively high cost of human time, which indicate that saving developer time is the most important goal for the future of computing.

G

Game theory models for communication between agents: a review In the real world, agents or entities are in a continuous state of interaction. These interactions lead to various types of complex dynamics. One key difficulty in the study of complex agent interactions is modeling agent communication on the basis of rewards. Game theory offers a perspective for analyzing and modeling these interactions. While a large amount of literature is available on game theory, most of it comes from specific domains and does not address these concepts from an agent-based perspective. In this paper, we present a comprehensive multidisciplinary state-of-the-art review and taxonomy of game theory models of complex interactions between agents.
Gaussian Processes and Kernel Methods: A Review on Connections and Equivalences This paper is an attempt to bridge the conceptual gaps between researchers working on the two widely used approaches based on positive definite kernels: Bayesian learning or inference using Gaussian processes on the one side, and frequentist kernel methods based on reproducing kernel Hilbert spaces on the other. It is widely known in machine learning that these two formalisms are closely related; for instance, the estimator of kernel ridge regression is identical to the posterior mean of Gaussian process regression. However, they have been studied and developed almost independently by two essentially separate communities, and this makes it difficult to seamlessly transfer results between them. Our aim is to overcome this potential difficulty. To this end, we review several old and new results and concepts from either side, and juxtapose algorithmic quantities from each framework to highlight close similarities. We also provide discussions on subtle philosophical and theoretical differences between the two approaches.
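The kernel ridge regression / Gaussian process equivalence mentioned above is easy to verify numerically. The following sketch assumes an RBF kernel and identifies the GP noise variance with the ridge penalty (the standard correspondence); the toy data and the scikit-learn usage are illustrative, not drawn from the paper.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def rbf(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between two sets of points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)
Xs = np.linspace(-3, 3, 50)[:, None]

noise = 0.01                     # GP noise variance, reused as the ridge penalty

# GP regression posterior mean: k(X*, X) (K + sigma^2 I)^{-1} y
K = rbf(X, X)
gp_mean = rbf(Xs, X) @ np.linalg.solve(K + noise * np.eye(len(X)), y)

# Kernel ridge regression with the same kernel and penalty
krr = KernelRidge(alpha=noise, kernel="rbf", gamma=0.5)   # gamma = 1/(2 lengthscale^2)
krr_pred = krr.fit(X, y).predict(Xs)

print(np.allclose(gp_mean, krr_pred))    # True (up to numerical precision)
```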
Gaussian Processes for Regression A Quick Introduction
Gaussian Processes in Machine Learning We give a basic introduction to Gaussian Process regression models. We focus on understanding the role of the stochastic process and how it is used to define a distribution over functions. We present the simple equations for incorporating training data and examine how to learn the hyperparameters using the marginal likelihood. We explain the practical advantages of Gaussian Processes and end with conclusions and a look at the current trends in GP work.
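A minimal sketch of the equations mentioned in the abstract: the GP posterior mean and variance with a Gaussian likelihood, and hyperparameter selection by maximizing the log marginal likelihood. The RBF kernel, noise level, and grid search below are illustrative assumptions.

```python
import numpy as np

def rbf(A, B, ell):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def log_marginal_likelihood(X, y, ell, noise):
    """log p(y | X, ell) = -0.5 y^T K^{-1} y - 0.5 log|K| - n/2 log(2 pi)."""
    K = rbf(X, X, ell) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(X) * np.log(2 * np.pi)

rng = np.random.default_rng(2)
X = rng.uniform(-4, 4, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

# Pick the lengthscale by maximizing the marginal likelihood over a small grid.
noise = 0.01
grid = [0.1, 0.3, 1.0, 3.0, 10.0]
ell = max(grid, key=lambda l: log_marginal_likelihood(X, y, l, noise))

# Posterior mean and variance at test points.
Xs = np.linspace(-4, 4, 100)[:, None]
K = rbf(X, X, ell) + noise * np.eye(len(X))
Ks = rbf(Xs, X, ell)
mean = Ks @ np.linalg.solve(K, y)
var = rbf(Xs, Xs, ell).diagonal() - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
print(ell, mean[:3], var[:3])
```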
Generalization Error in Deep Learning Deep learning models have lately shown great performance in various fields such as computer vision, speech recognition, speech translation, and natural language processing. However, alongside their state-of-the-art performance, it is still generally unclear what the source of their generalization ability is. Thus, an important question is what makes deep neural networks able to generalize well from the training set to new data. In this article, we provide an overview of the existing theory and bounds for the characterization of the generalization error of deep neural networks, combining both classical and more recent theoretical and empirical results.
Generalization in Deep Learning This paper explains why deep learning can generalize well, despite large capacity and possible algorithmic instability, nonrobustness, and sharp minima, effectively addressing an open problem in the literature. Based on our theoretical insight, this paper also proposes a family of new regularization methods. Its simplest member was empirically shown to improve base models and achieve state-of-the-art performance on MNIST and CIFAR-10 benchmarks. Moreover, this paper presents both data-dependent and data-independent generalization guarantees with improved convergence rates. Our results suggest several new open areas of research.
Generalization in Machine Learning via Analytical Learning Theory This paper introduces a novel measure-theoretic learning theory to analyze generalization behaviors of practical interest. The proposed learning theory has the following abilities: 1) to utilize the qualities of each learned representation on the path from raw inputs to outputs in representation learning, 2) to guarantee good generalization errors possibly with arbitrarily rich hypothesis spaces (e.g., arbitrarily large capacity and Rademacher complexity) and non-stable/non-robust learning algorithms, and 3) to clearly distinguish each individual problem instance from each other. Our generalization bounds are relative to a representation of the data, and hold true even if the representation is learned. We discuss several consequences of our results on deep learning, one-shot learning and curriculum learning. Unlike statistical learning theory, the proposed learning theory analyzes each problem instance individually via measure theory, rather than a set of problem instances via statistics. Because of the differences in the assumptions and the objectives, the proposed learning theory is meant to be complementary to previous learning theory and is not designed to compete with it.
Generalization of Dempster-Shafer theory: A complex belief function Dempster-Shafer evidence theory has been widely used in various fields of applications, because of its flexibility and effectiveness in modeling uncertainties without prior information. However, the existing evidence theory is unable to express the fluctuations of data at a given phase of time during their execution, or the uncertainty and imprecision which are inevitably involved in the data and occur concurrently with changes to the phase or periodicity of the data. In this paper, therefore, a generalized Dempster-Shafer evidence theory is proposed. To be specific, a mass function in the generalized Dempster-Shafer evidence theory is modeled by a complex number, called a complex basic belief assignment, which has a more powerful ability to express uncertain information. Based on that, a generalized Dempster's combination rule is exploited. In contrast to the classical Dempster's combination rule, the condition that the conflict coefficient between the evidences satisfies K<1 is relaxed in the generalized Dempster's combination rule. Hence, it is more general and applicable than the classical Dempster's combination rule. When the complex mass function is degenerated from complex numbers to real numbers, the generalized Dempster's combination rule degenerates to the classical evidence theory under the condition that the conflict coefficient between the evidences K is less than 1. In a word, this generalized Dempster-Shafer evidence theory provides a promising way to model and handle more uncertain information.
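For reference, the classical Dempster's combination rule that the paper generalizes can be sketched as follows; the frame of discernment and the mass values are made-up toy inputs, and the complex-valued generalization described in the abstract is not implemented here.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Classical Dempster's rule: combine two mass functions given as
    dicts mapping frozenset hypotheses to basic belief assignments."""
    conflict = 0.0
    combined = {}
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        inter = A & B
        if inter:
            combined[inter] = combined.get(inter, 0.0) + a * b
        else:
            conflict += a * b            # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: K = 1, rule undefined")
    # Normalize by 1 - K, which requires the conflict coefficient K < 1.
    return {A: v / (1.0 - conflict) for A, v in combined.items()}

# Toy example over the frame {x, y, z}.
m1 = {frozenset({"x"}): 0.6, frozenset({"x", "y"}): 0.4}
m2 = {frozenset({"y"}): 0.5, frozenset({"x", "y", "z"}): 0.5}
print(dempster_combine(m1, m2))
```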
Generalized Gradient Descent (Slide Deck)
Generalized Power Method for Sparse Principal Component Analysis In this paper we develop a new approach to sparse principal component analysis (sparse PCA). We propose two single-unit and two block optimization formulations of the sparse PCA problem, aimed at extracting a single sparse dominant principal component of a data matrix, or more components at once, respectively. While the initial formulations involve nonconvex functions, and are therefore computationally intractable, we rewrite them into the form of an optimization program involving maximization of a convex function on a compact set. The dimension of the search space is decreased enormously if the data matrix has many more columns (variables) than rows. We then propose and analyze a simple gradient method suited for the task. It appears that our algorithm has the best convergence properties when either the objective function or the feasible set is strongly convex, which is the case with our single-unit formulations and can be enforced in the block case. Finally, we demonstrate numerically on a set of random and gene expression test problems that our approach outperforms existing algorithms both in quality of the obtained solution and in computational speed.
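To give a flavor of single-unit sparse PCA, here is a generic thresholded power iteration that extracts one sparse dominant component. This is a simplified sketch of the general idea, not the generalized power method formulated in the paper; the cardinality constraint, iteration count, and toy data are illustrative choices.

```python
import numpy as np

def sparse_pc(X, k, n_iter=200, seed=0):
    """Leading sparse principal component by a thresholded power iteration:
    multiply by the sample covariance, keep the k largest-magnitude entries,
    renormalize. A generic sketch of single-unit sparse PCA."""
    X = X - X.mean(axis=0)
    C = X.T @ X / len(X)                      # sample covariance
    rng = np.random.default_rng(seed)
    z = rng.normal(size=C.shape[0])
    z /= np.linalg.norm(z)
    for _ in range(n_iter):
        z = C @ z
        keep = np.argsort(np.abs(z))[-k:]     # hard-threshold to k entries
        mask = np.zeros_like(z)
        mask[keep] = 1.0
        z = z * mask
        z /= np.linalg.norm(z)
    return z

# Data whose dominant component is driven by only 3 of 20 variables.
rng = np.random.default_rng(1)
signal = rng.normal(size=(500, 1)) @ np.array([[3, 3, 3] + [0] * 17])
X = signal + rng.normal(size=(500, 20))
print(np.round(sparse_pc(X, k=3), 2))
```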
Generalized uncertain theory: concepts and fundamental principles Although there are many mathematical theories to address uncertain phenomena, these theories are presented under the implicit presupposition that the uncertainty of objects is accurately measurable, without considering that the measure of uncertainty itself may be inaccurate. Considering this evident but critical oversight, and on the basis of reviewing and commenting on several widely used mathematical theories of uncertainty, the fundamental concepts and axiomatic system of generalized uncertain theory (GUT) are proposed for describing and analyzing the case where the imprecision of objects itself has inaccurate attributes. We show that current mainstream theories for studying uncertain phenomena, such as probability theory, fuzzy mathematics, etc., are special cases of the generalized uncertain theory. The generalized uncertain theory could therefore cover previous mainstream theories of uncertainty. Further research directions and possible application areas are discussed. This may be a beneficial endeavor for enriching and developing current mathematical theories of uncertainty.
Generating Textual Adversarial Examples for Deep Learning Models: A Survey With the development of high-performance computational devices, deep neural networks (DNNs) have in recent years gained significant popularity in many Artificial Intelligence (AI) applications. However, previous efforts have shown that DNNs are vulnerable to strategically modified samples, named adversarial examples. These samples are generated with some imperceptible perturbations but can fool the DNNs into giving false predictions. Inspired by the popularity of generating adversarial examples for image DNNs, research efforts on attacking DNNs for textual applications have emerged in recent years. However, existing perturbation methods for images cannot be directly applied to texts as text data is discrete. In this article, we review research works that address this difference and generate textual adversarial examples on DNNs. We collect, select, summarize, discuss and analyze these works in a comprehensive way and cover all the related information to make the article self-contained. Finally, drawing on the reviewed literature, we provide further discussions and suggestions on this topic.
Generative Adversarial Active Learning We propose a new active learning approach using Generative Adversarial Networks (GAN). Different from regular active learning, we adaptively synthesize training instances for querying to increase learning speed. Our approach outperforms random generation using GAN alone in active learning experiments. We demonstrate the effectiveness of the proposed algorithm on various datasets when compared to other algorithms. To the best of our knowledge, this is the first active learning work using GAN.
Generative Adversarial Nets for Information Retrieval: Fundamentals and Advances Generative adversarial nets (GANs) have been widely studied during the recent development of deep learning and unsupervised learning. With an adversarial training mechanism, GAN manages to train a generative model to fit the underlying unknown real data distribution under the guidance of the discriminative model estimating whether a data instance is real or generated. Such a framework is originally proposed for fitting continuous data distribution such as images, thus it is not straightforward to be directly applied to information retrieval scenarios where the data is mostly discrete, such as IDs, text and graphs. In this tutorial, we focus on discussing the GAN techniques and the variants on discrete data fitting in various information retrieval scenarios. (i) We introduce the fundamentals of GAN framework and its theoretic properties; (ii) we carefully study the promising solutions to extend GAN onto discrete data generation; (iii) we introduce IRGAN, the fundamental GAN framework of fitting single ID data distribution and the direct application on information retrieval; (iv) we further discuss the task of sequential discrete data generation tasks, e.g., text generation, and the corresponding GAN solutions; (v) we present the most recent work on graph/network data fitting with node embedding techniques by GANs. Meanwhile, we also introduce the relevant open-source platforms such as IRGAN and Texygen to help audience conduct research experiments on GANs in information retrieval. Finally, we conclude this tutorial with a comprehensive summarization and a prospect of further research directions for GANs in information retrieval.
Generative Adversarial Networks: A Survey and Taxonomy Generative adversarial networks (GANs) have been extensively studied in the past few years. Arguably the most revolutionary techniques are in the area of computer vision, such as plausible image generation, image-to-image translation, facial attribute manipulation and similar domains. Despite the significant success achieved in the computer vision field, applying GANs to real-world problems still faces three main challenges: (1) High quality image generation; (2) Diverse image generation; and (3) Stable training. Considering the numerous GAN-related research works in the literature, we provide a study of the architecture-variants and loss-variants which are proposed to handle these three challenges from two perspectives. We propose loss- and architecture-variants for classifying the most popular GANs, and discuss the potential improvements with a focus on these two aspects. While several reviews for GANs have been presented, there is no work focusing on reviewing GAN-variants based on how they handle the challenges mentioned above. In this paper, we review and critically discuss 7 architecture-variant GANs and 9 loss-variant GANs for remedying those three challenges. The objective of this review is to provide insight into how current GAN research focuses on performance improvement. Code related to the GAN-variants studied in this work is summarized on https://…/GAN_Review.
Generative Adversarial Networks: An Overview Generative adversarial networks (GANs) provide a way to learn deep representations without extensively annotated training data. They achieve this through deriving backpropagation signals through a competitive process involving a pair of networks. The representations that can be learned by GANs may be used in a variety of applications, including image synthesis, semantic image editing, style transfer, image super-resolution and classification. The aim of this review paper is to provide an overview of GANs for the signal processing community, drawing on familiar analogies and concepts where possible. In addition to identifying different methods for training and constructing GANs, we also point to remaining challenges in their theory and application.
Generative and Discriminative Text Classification with Recurrent Neural Networks We empirically characterize the performance of discriminative and generative LSTM models for text classification. We find that although RNN-based generative models are more powerful than their bag-of-words ancestors (e.g., they account for conditional dependencies across words in a document), they have higher asymptotic error rates than discriminatively trained RNN models. However we also find that generative models approach their asymptotic error rate more rapidly than their discriminative counterparts—the same pattern that Ng and Jordan (2001) proved holds for linear classification models that make more naive conditional independence assumptions. Building on this finding, we hypothesize that RNN-based generative classification models will be more robust to shifts in the data distribution. This hypothesis is confirmed in a series of experiments in zero-shot and continual learning settings that show that generative models substantially outperform discriminative models.
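The Ng and Jordan (2001) pattern cited above can be reproduced in miniature with the classic linear-model pair, a generative naive Bayes classifier versus a discriminative logistic regression, rather than the paper's LSTM models. The synthetic data and sample sizes below are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

# Synthetic two-class problem; compare test accuracy as the training set grows.
# The generative model tends to approach its asymptote with few examples,
# while the discriminative model tends to win once data is plentiful.
X, y = make_classification(n_samples=20000, n_features=30, n_informative=10,
                           random_state=0)
X_tr, y_tr, X_te, y_te = X[:10000], y[:10000], X[10000:], y[10000:]

for n in [20, 50, 200, 1000, 10000]:
    gen = GaussianNB().fit(X_tr[:n], y_tr[:n])
    dis = LogisticRegression(max_iter=2000).fit(X_tr[:n], y_tr[:n])
    print(f"n={n:5d}  generative={gen.score(X_te, y_te):.3f}  "
          f"discriminative={dis.score(X_te, y_te):.3f}")
```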
Generative Deep Neural Networks for Dialogue: A Short Review Researchers have recently started investigating deep neural networks for dialogue applications. In particular, generative sequence-to-sequence (Seq2Seq) models have shown promising results for unstructured tasks, such as word-level dialogue response generation. The hope is that such models will be able to leverage massive amounts of data to learn meaningful natural language representations and response generation strategies, while requiring a minimum amount of domain knowledge and hand-crafting. An important challenge is to develop models that can effectively incorporate dialogue context and generate meaningful and diverse responses. In support of this goal, we review recently proposed models based on generative encoder-decoder neural network architectures, and show that these models have better ability to incorporate long-term dialogue history, to model uncertainty and ambiguity in dialogue, and to generate responses with high-level compositional structure.
Generative learning for deep networks Learning that takes into account the full distribution of the data, referred to as generative, is not feasible with deep neural networks (DNNs) because they model only the conditional distribution of the outputs given the inputs. Current solutions are either based on joint probability models facing difficult estimation problems or learn two separate networks, mapping inputs to outputs (recognition) and vice-versa (generation). We propose an intermediate approach. First, we show that forward computation in DNNs with logistic sigmoid activations corresponds to a simplified approximate Bayesian inference in a directed probabilistic multi-layer model. This connection allows us to interpret a DNN as a probabilistic model of the output and all hidden units given the input. Second, we propose that in order for the recognition and generation networks to be more consistent with the joint model of the data, the weights of the recognition and generator networks should be related by transposition. We demonstrate in a tentative experiment that such a coupled pair can be learned generatively, modelling the full distribution of the data, and has enough capacity to perform well in both recognition and generation.
Generativity and Interactional Effects: an Overview We propose a means to relate properties of an interconnected system to its separate component systems in the presence of cascade-like phenomena. Building on a theory of interconnection reminiscent of the behavioral approach to systems theory, we introduce the notion of generativity, and its byproduct, generative effects. Cascade effects, enclosing contagion phenomena and cascading failures, are seen as instances of generative effects. The latter are precisely the instances where properties of interest are not preserved or behave very badly when systems interact. The goal is to overcome that obstruction. We will show how to extract mathematical objects from the systems, that encode their generativity: their potential to generate new phenomena upon interaction. Those objects may then be used to link the properties of the interconnected system to its separate systems. Such a link will be executed through the use of exact sequences from commutative algebra.
Getting Started with Apache Hadoop This Refcard presents Apache Hadoop, a software framework that enables distributed storage and processing of large datasets using simple high-level programming models. We cover the most important concepts of Hadoop, describe its architecture, and explain how to start using it as well as how to write and execute various applications on Hadoop. In a nutshell, Hadoop is an open-source project of the Apache Software Foundation that can be installed on a set of standard machines, so that these machines can communicate and work together to store and process large datasets. …
Getting Started with Spark (Slide Deck)
GIS with R (Slide Deck)
Global overview of Imitation Learning Imitation Learning is a sequential task where the learner tries to mimic an expert’s action in order to achieve the best performance. Several algorithms have been proposed recently for this task. In this project, we aim at proposing a wide review of these algorithms, presenting their main features and comparing them on their performance and their regret bounds.
Glossary (Cheat Sheet)
Grades of Evidence (Cheat Sheet)
Gradient Boosting Machine: A Survey In this survey, we discuss several different types of gradient boosting algorithms and illustrate their mathematical frameworks in detail: 1. the introduction of gradient boosting, which leads to 2. objective function optimization, 3. loss function estimation, 4. model construction, and 5. the application of boosting in ranking.
Gradient boosting machines, a tutorial Gradient boosting machines are a family of powerful machine-learning techniques that have shown considerable success in a wide range of practical applications. They are highly customizable to the particular needs of the application, such as being learned with respect to different loss functions. This article gives a tutorial introduction to the methodology of gradient boosting methods with a strong focus on the machine-learning aspects of modeling. The theoretical material is complemented with descriptive examples and illustrations which cover all the stages of the gradient boosting model design. Considerations on handling model complexity are discussed. Three practical examples of gradient boosting applications are presented and comprehensively analyzed.
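In the spirit of the tutorial, a minimal least-squares gradient boosting loop with shallow regression trees might look like the following sketch; the shrinkage value, tree depth, and toy data are illustrative choices, not the article's code.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbm(X, y, n_rounds=200, learning_rate=0.1, max_depth=1):
    """Least-squares gradient boosting: each round fits a small tree to the
    current residuals (the negative gradient of the squared loss)."""
    f0 = y.mean()
    pred = np.full_like(y, f0, dtype=float)
    trees = []
    for _ in range(n_rounds):
        residual = y - pred                          # negative gradient
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        pred += learning_rate * tree.predict(X)      # shrunken update
        trees.append(tree)
    return f0, trees

def predict_gbm(model, X, learning_rate=0.1):
    f0, trees = model
    return f0 + learning_rate * sum(t.predict(X) for t in trees)

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + 0.2 * rng.normal(size=300)
model = fit_gbm(X, y)
print(np.mean((predict_gbm(model, X) - y) ** 2))     # training MSE
```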
Graph Based Recommendations: From Data Representation to Feature Extraction and Application Modeling users for the purpose of identifying their preferences and then personalizing services on the basis of these models is a complex task, primarily due to the need to take into consideration various explicit and implicit signals, missing or uncertain information, contextual aspects, and more. In this study, a novel generic approach for uncovering latent preference patterns from user data is proposed and evaluated. The approach relies on representing the data using graphs, and then systematically extracting graph-based features and using them to enrich the original user models. The extracted features encapsulate complex relationships between users, items, and metadata. The enhanced user models can then serve as an input to any recommendation algorithm. The proposed approach is domain-independent (demonstrated on data from movies, music, and business recommender systems), and is evaluated using several state-of-the-art machine learning methods, on different recommendation tasks, and using different evaluation metrics. The results show a unanimous improvement in the recommendation accuracy across tasks and domains. In addition, the evaluation provides a deeper analysis regarding the performance of the approach in special scenarios, including high sparsity and variability of ratings.
Graph Kernels: A Survey Graph kernels have attracted a lot of attention during the last decade, and have evolved into a rapidly developing branch of learning on structured data. During the past 20 years, the considerable research activity that occurred in the field resulted in the development of dozens of graph kernels, each focusing on specific structural properties of graphs. Graph kernels have proven successful in a wide range of domains, ranging from social networks to bioinformatics. The goal of this survey is to provide a unifying view of the literature on graph kernels. In particular, we present a comprehensive overview of a wide range of graph kernels. Furthermore, we perform an experimental evaluation of several of those kernels on publicly available datasets, and provide a comparative study. Finally, we discuss key applications of graph kernels, and outline some challenges that remain to be addressed.
Graph Neural Networks for Small Graph and Giant Network Representation Learning: An Overview Graph neural networks denote a group of neural network models introduced for the representation learning tasks on graph data specifically. Graph neural networks have been demonstrated to be effective for capturing network structure information, and the learned representations can achieve the state-of-the-art performance on node and graph classification tasks. Besides the different application scenarios, the architectures of graph neural network models also depend on the studied graph types a lot. Graph data studied in research can be generally categorized into two main types, i.e., small graphs vs. giant networks, which differ from each other a lot in the size, instance number and label annotation. Several different types of graph neural network models have been introduced for learning the representations from such different types of graphs already. In this paper, for these two different types of graph data, we will introduce the graph neural networks introduced in recent years. To be more specific, the graph neural networks introduced in this paper include IsoNN, SDBN, LF&ER, GCN, GAT, DifNN, GNL, GraphSage and seGEN. Among these graph neural network models, IsoNN, SDBN and LF&ER are initially proposed for small graphs and the remaining ones are initially proposed for giant networks instead. The readers are also suggested to refer to these papers for detailed information when reading this tutorial paper.
Graph Neural Networks: A Review of Methods and Applications Lots of learning tasks require dealing with graph data which contains rich relation information among elements. Modeling physics systems, learning molecular fingerprints, predicting protein interfaces, and classifying diseases all require a model that learns from graph inputs. In other domains, such as learning from non-structural data like texts and images, reasoning on extracted structures, like the dependency tree of sentences and the scene graph of images, is an important research topic which also needs graph reasoning models. Graph neural networks (GNNs) are connectionist models that capture the dependence of graphs via message passing between the nodes of graphs. Unlike standard neural networks, graph neural networks retain a state that can represent information from its neighborhood with an arbitrary depth. Although the primitive graph neural networks have been found difficult to train for a fixed point, recent advances in network architectures, optimization techniques, and parallel computation have enabled successful learning with them. In recent years, systems based on graph convolutional network (GCN) and gated graph neural network (GGNN) have demonstrated ground-breaking performance on many tasks mentioned above. In this survey, we provide a detailed review over existing graph neural network models, systematically categorize the applications, and propose four open problems for future research.
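As a concrete instance of the message passing described above, here is a sketch of a graph convolutional (GCN-style) layer with symmetric normalization applied to a small hand-made graph; the weights are random and the example is purely illustrative.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])                  # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Toy graph with 4 nodes and 3-dimensional node features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H0 = rng.normal(size=(4, 3))
W1 = rng.normal(size=(3, 8))
W2 = rng.normal(size=(8, 2))
H1 = gcn_layer(A, H0, W1)        # first round of message passing
H2 = gcn_layer(A, H1, W2)        # node embeddings after two hops
print(H2)
```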
Graph Neural Processes: Towards Bayesian Graph Neural Networks We introduce Graph Neural Processes (GNP), inspired by the recent work in conditional and latent neural processes. A Graph Neural Process is defined as a Conditional Neural Process that operates on arbitrary graph data. It takes features of sparsely observed context points as input, and outputs a distribution over target points. We demonstrate graph neural processes in edge imputation and discuss benefits and drawbacks of the method for other application areas. One major benefit of GNPs is the ability to quantify uncertainty in deep learning on graph structures. An additional benefit of this method is the ability to extend graph neural networks to inputs of dynamic sized graphs.
Graph Representation Learning: A Survey Research on graph representation learning has received a lot of attention in recent years since many data in real-world applications come in form of graphs. High-dimensional graph data are often in irregular form, which makes them more difficult to analyze than image/video/audio data defined on regular lattices. Various graph embedding techniques have been developed to convert the raw graph data into a low-dimensional vector representation while preserving the intrinsic graph properties. In this review, we first explain the graph embedding task and its challenges. Next, we review a wide range of graph embedding techniques with insights. Then, we evaluate several state-of-the-art methods against small and large datasets and compare their performance. Finally, potential applications and future directions are presented.
Graph Spanners: A Tutorial Review This tutorial review provides a guiding reference to researchers who want to have an overview of the large body of literature about graph spanners. It reviews the current literature covering various research streams about graph spanners, such as different formulations, sparsity and lightness results, computational complexity, dynamic algorithms, and applications. As an additional contribution, we offer a list of open problems on graph spanners.
Graph-based Ontology Summarization: A Survey Ontologies have been widely used in numerous and varied applications, e.g., to support data modeling, information integration, and knowledge management. With the increasing size of ontologies, ontology understanding, which is playing an important role in different tasks, is becoming more difficult. Consequently, ontology summarization, as a way to distill key information from an ontology and generate an abridged version to facilitate a better understanding, is getting growing attention. In this survey paper, we review existing ontology summarization techniques and focus mainly on graph-based methods, which represent an ontology as a graph and apply centrality-based and other measures to identify the most important elements of an ontology as its summary. After analyzing their strengths and weaknesses, we highlight a few potential directions for future research.
Graphical Models Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve large-scale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for approaching these problems, and indeed many of the models developed by researchers in these applied fields are instances of the general graphical model formalism. We review some of the basic ideas underlying graphical models, including the algorithmic ideas that allow graphical models to be deployed in large-scale data analysis problems. We also present examples of graphical models in bioinformatics, error-control coding and language processing.
Graphical Models for Processing Missing Data This paper reviews recent advances in missing data research using graphical models to represent multivariate dependencies. We first examine the limitations of traditional frameworks from three different perspectives: transparency, estimability and testability. We then show how procedures based on graphical models can overcome these limitations and provide meaningful performance guarantees even when data are Missing Not At Random (MNAR). In particular, we identify conditions that guarantee consistent estimation in broad categories of missing data problems, and derive procedures for implementing this estimation. Finally we derive testable implications for missing data models in both MAR (Missing At Random) and MNAR categories.
Graphical Models in a Nutshell Probabilistic graphical models are an elegant framework which combines uncertainty (probabilities) and logical structure (independence constraints) to compactly represent complex, real-world phenomena. The framework is quite general in that many of the commonly proposed statistical models (Kalman filters, hidden Markov models, Ising models) can be described as graphical models. Graphical models have enjoyed a surge of interest in the last two decades, due both to the flexibility and power of the representation and to the increased ability to effectively learn and perform inference in large networks.
Graphical Models: An Extension to Random Graphs, Trees, and Other Objects In this work, we consider an extension of graphical models to random graphs, trees, and other objects. To do this, many fundamental concepts for multivariate random variables (e.g., marginal variables, Gibbs distribution, Markov properties) must be extended to other mathematical objects; it turns out that this extension is possible, as we will discuss, if we have a consistent, complete system of projections on a given object. Each projection defines a marginal random variable, allowing one to specify independence assumptions between them. Furthermore, these independencies can be specified in terms of a small subset of these marginal variables (which we call the atomic variables), allowing the compact representation of independencies by a directed graph. Projections also define factors, functions on the projected object space, and hence a projection family defines a set of possible factorizations for a distribution; these can be compactly represented by an undirected graph. The invariances used in graphical models are essential for learning distributions, not just on multivariate random variables, but also on other objects. When they are applied to random graphs and random trees, the result is a general class of models that is applicable to a broad range of problems, including those in which the graphs and trees have complicated edge structures. These models need not be conditioned on a fixed number of vertices, as is often the case in the literature for random graphs, and can be used for problems in which attributes are associated with vertices and edges. For graphs, applications include the modeling of molecules, neural networks, and relational real-world scenes; for trees, applications include the modeling of infectious diseases, cell fusion, the structure of language, and the structure of objects in visual scenes. Many classic models are particular instances of this framework.
Group theoretical methods in machine learning Ever since its discovery in 1807, the Fourier transform has been one of the mainstays of pure mathematics, theoretical physics, and engineering. The ease with which it connects the analytical and algebraic properties of function spaces; the particle and wave descriptions of matter; and the time and frequency domain descriptions of waves and vibrations make the Fourier transform one of the great unifying concepts of mathematics. Deeper examination reveals that the logic of the Fourier transform is dictated by the structure of the underlying space itself. Hence, the classical cases of functions on the real line, the unit circle, and the integers modulo n are only the beginning: harmonic analysis can be generalized to functions on any space on which a group of transformations acts. Here the emphasis is on the word group in the mathematical sense of an algebraic system obeying specific axioms. The group might even be non-commutative: the fundamental principles behind harmonic analysis are so general that they apply equally to commutative and non-commutative structures. Thus, the humble Fourier transform leads us into the depths of group theory and abstract algebra, arguably the most extensive formal system ever explored by humans. Should this be of any interest to the practitioner who has his eyes set on concrete applications of machine learning and statistical inference Hopefully, the present thesis will convince the reader that the answer is an emphatic ‘yes’. One of the reasons why this is so is that groups are the mathematician´s way of capturing symmetries, and symmetries are all around us. Twentieth century physics has taught us just how powerful a tool symmetry principles are for prying open the secrets of nature. One could hardly ask for a better example of the power of mathematics than particle physics, which translates the abstract machinery of group theory into predictions about the behavior of the elementary building blocks of our universe. I believe that algebra will prove to be just as crucial to the science of data as it has proved to be to the sciences of the physical world. In probability theory and statistics it was Persi Diaconis who did much of the pioneering work in this realm, brilliantly expounded in his little book [Diaconis, 1988]. Since then, several other authors have also contributed to the field. In comparison, the algebraic side of machine learning has until now remained largely unexplored. The present thesis is a first step towards filling this gap. The two main themes of the thesis are (a) learning on domains which have non-trivial algebraic structure; and (b) learning in the presence of invariances. Learning rankings/matchings are the classic example of the first situation, whilst rotation/translation/scale invariance in machine vision is probably the most immediate example of the latter. The thesis presents examples addressing real world problems in these two domains. However, the beauty of the algebraic approach is that it allows us to discuss these matters on a more general, abstract, level, so most of our results apply equally well to a large range of learning scenarios. The generality of our approach also means that we do not have to commit to just one learning paradigm (frequentist/Bayesian) or one group of algorithms (SVMs/graphical models/boosting/etc.). 
We do find that some of our ideas regarding symmetrization and learning on groups mesh best with the Hilbert space learning framework, so in Chapters 4 and 5 we focus on this methodology, but even there we take a comparative stance, contrasting the SVM with Gaussian Processes and a modified version of the Perceptron. One of the reasons why up until now abstract algebra has not had a larger impact on the applied side of computer science is that it is often perceived as a very theoretical field, where computations are difficult if not impossible due to the sheer size of the objects at hand. For example, while permutations obviously enter many applied problems, calculations on the full symmetric group (permutation group) are seldom viable, since it has n! elements. However, recently abstract algebra has developed a strong computational side [Bürgisser et al., 1997]. The core algorithms of this new computational algebra, such as the non-commutative FFTs discussed in detail in Chapter 3, are the backbone of the bridge between applied computations and abstract theory. In addition to our machine learning work, the present thesis offers some modest additions to this field by deriving some useful generalizations of Clausen´s FFT for the symmetric group, and presenting an efficient, expandable software library implementing the transform. To the best of our knowledge, this is the first time that such a library has been made publicly available. Clearly, a thesis like this one is only a first step towards building a bridge between the theory of groups/representations and machine learning. My hope is that it will offer ideas and inspiration to both sides, as well as a few practical algorithms that I believe are directly applicable to real world problems.
Guide to Big Data 2014 Big Data, NoSQL, and NewSQL – these are the high-level concepts relating to the new, unprecedented data management and analysis challenges that enterprises and startups are now facing. Some estimates expect the amount of digital data in the world to double every two years, while other estimates suggest that 90% of the world´s current data was created in the last two years. The predictions for data growth are staggering no matter where you look, but what does that mean practically for you, the developer, the sysadmin, the product manager, or C-level leader? DZone´s 2014 Guide to Big Data is the definitive resource for learning how industry experts are handling the massive growth and diversity of data. It contains resources that will help you navigate and excel in the world of Big Data management.
Guide to Big Data Business Solutions in the Cloud The recent release of a commercial version of the Lustre* parallel file system running on Amazon Web Services (AWS) was big news for business data centers facing ever-expanding data analysis and storage demands. Now, Lustre, the predominant high-performing file system installed in most of the supercomputer installations around the world, could be deployed to business customers in a hardened, tested, easy to manage and fully supported distribution in the cloud. Proven to scale up to extreme levels of storage performance and capacity, as measured in tens or even hundreds of petabytes, shared with and accessible to tens of thousands of clients, Lustre combines high throughput with high availability using vendor-neutral server, storage and interconnect hardware coupled with various distributions of Linux. In this Guide, we take a look at what Lustre on infrastructure AWS delivers for a broad community of business and commercial organizations struggling with the challenge of big data and demanding storage growth.
Guide to Machine Learning As the primary facilitator of data science and big data, machine learning has garnered much interest by a broad range of industries as a way to increase value of enterprise data assets. Through techniques of supervised and unsupervised statistical learning, organizations can make important predictions and discover previously unknown knowledge to provide actionable business intelligence. In this guide, we´ll examine the principles underlying machine learning based on the R statistical environment. We´ll explore machine learning with R from the open source R perspective as well as the more robust commercial perspective using Revolution Analytics´ Revolution R Enterprise (RRE) for big data deployments….
Guidelines for Producing Useful Synthetic Data We report on our experiences of helping staff of the Scottish Longitudinal Study to create synthetic extracts that can be released to users. In particular, we focus on how the synthesis process can be tailored to produce synthetic extracts that will provide users with similar results to those that would be obtained from the original data. We make recommendations for synthesis methods and illustrate how the staff creating synthetic extracts can evaluate their utility at the time they are being produced. We discuss measures of utility for synthetic data and show that one tabular utility measure is exactly equivalent to a measure calculated from a propensity score. The methods are illustrated by using the R package $synthpop$ to create synthetic versions of data from the 1901 Census of Scotland.
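The propensity-score utility measure referred to above is straightforward to sketch: pool real and synthetic records, train a classifier to predict which is which, and measure how far its predicted probabilities stray from the synthetic share (often called pMSE). The logistic-regression model and toy Gaussian data below are illustrative assumptions, not the synthpop implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pmse(real, synth):
    """Propensity-score mean-squared-error utility: values near 0 mean the
    classifier cannot tell real from synthetic records; larger values mean
    the synthetic data is easier to distinguish, i.e. lower utility."""
    X = np.vstack([real, synth])
    t = np.r_[np.zeros(len(real)), np.ones(len(synth))]    # 1 = synthetic
    p = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    c = len(synth) / len(X)                                 # synthetic share
    return np.mean((p - c) ** 2)

rng = np.random.default_rng(0)
real = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=1000)
good = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=1000)
poor = rng.multivariate_normal([0.5, -0.5], [[1, 0.0], [0.0, 1]], size=1000)
print(pmse(real, good))   # typically close to 0
print(pmse(real, poor))   # typically noticeably larger
```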

H

Hadoop Buyer’s Guide Everything you need to know about choosing the right Hadoop distribution for production
Hadoop´s Limitations for Big Data Analytics The era of ‘big data´ represents new challenges to businesses. Incoming data volumes are exploding in complexity, variety, speed and volume, while legacy tools have not kept pace. In recent years, a new tool – Apache Hadoop – has appeared on the scene. And while it solves some big data problems, it is not magic. In order to act effectively on big data, businesses must be able to assimilate data quickly, but also must be able to explore this data for value, allowing analysts to ask and iterate their business questions quickly. Hadoop – purpose built to facilitate certain forms of batch-oriented distributed data processing – lends itself readily to the assimilation process. But it was built on fundamentals which severely limit its ability to act as an analytic database. With the rise of big data has come the rise of the analytic database platform. Even five years ago, a company could leverage a DBMS such as Oracle for a data warehouse. However, Oracle was built in a time when databases rarely exceeded a few gigabytes in size. Along with other legacy DBMSs, it cannot perform at the scale now required. Enter the analytic platform. The analytic platform allows analysts to use their existing tools and skillsets to ask new questions of big data quickly, easily, and at scales unseen previously. The de facto best practice infrastructure for big data today often consists of a processing infrastructure of systems such as Hadoop to acquire and archive the data, and an analytic platform to enable the highly iterative analysis process. But because Hadoop is still relatively new, there is a great deal of confusion about its strengths and weaknesses. This paper will discuss those topics, and concludes with guidance on how to build the complete ecosystem for big data analytics.
Handling Missing Data in Within-Trial Cost-Effectiveness Analysis: a Review with Future Guidelines Cost-Effectiveness Analyses (CEAs) alongside randomised controlled trials (RCTs) are increasingly often designed to collect resource use and preference-based health status data for the purpose of healthcare technology assessment. However, because of the way these measures are collected, they are prone to missing data, which can ultimately affect the decision of whether an intervention is good value for money. We examine how missing cost and effect outcome data are handled in RCT-based CEAs, complementing a previous review (covering 2003-2009, 88 articles) with a new systematic review (2009-2015, 81 articles) focussing on two different perspectives. First, we review the description of the missing data, the statistical methods used to deal with them, and the quality of the judgement underpinning the choice of these methods. Second, we provide guidelines on how the information about missingness and related methods should be presented to improve the reporting and handling of missing data. Our review shows that missing data in within-RCT CEAs are still often inadequately handled and the overall level of information provided to support the chosen methods is rarely satisfactory.
Hands-On Data Science with R – Text Mining Text Mining or Text Analytics applies analytic tools to learn from collections of text documents like books, newspapers, emails, etc. The goal is similar to humans learning by reading books. Using automated algorithms we can learn from massive amounts of text, much more than a human can. The material could consist of millions of newspaper articles, used perhaps to summarise the main themes and to identify those that are of most interest to particular people.
HARK Side of Deep Learning — From Grad Student Descent to Automated Machine Learning Recent advancements in machine learning research, i.e., deep learning, introduced methods that surpass conventional algorithms as well as humans in several complex tasks, ranging from detection of objects in images and speech recognition to playing difficult strategic games. However, the current methodology of machine learning research, and consequently the implementation of real-world applications of such algorithms, seems to have a recurring HARKing (Hypothesizing After the Results are Known) issue. In this work, we elaborate on the algorithmic, economic and social reasons for and consequences of this phenomenon. We present examples from current common practices in machine learning research (e.g. the avoidance of reporting negative results) and of the failure of proposed algorithms and datasets to generalize in actual real-life usage. Furthermore, a potential future trajectory of machine learning research and development from the perspective of accountable, unbiased, ethical and privacy-aware algorithmic decision making is discussed. We would like to emphasize that with this discussion we neither claim to provide an exhaustive argumentation nor blame any specific institution or individual on the raised issues. This is simply a discussion put forth by us, insiders of the machine learning field, reflecting on ourselves.
Harness the Power of Data Visualization to Transform Your Business Data underpins the operations and strategic decisions of every business. Yet these days, data is generated faster than it can be consumed and digested, making it challenging for small to mid-size organizations to extract maximum value from this vital asset. Many decision makers – whether data analysts or senior-level executives – struggle to draw meaningful conclusions in a timely manner from the array of data available to them. Reliance on spreadsheets and specialized reporting and analysis tools only limits their flexibility and output. After all, spreadsheets were not designed for data analysis. And specialized reporting and analysis tools often lack integration with other critical business applications and processes. Moreover, dependence on the IT group for ad hoc reports slows insights and decisions, putting the company at a disadvantage. Business decision makers feel a loss of control waiting for the already overburdened IT department to generate critical reports. Savvy companies are moving beyond static graphs, spreadsheets, and reports by harnessing the power of business visualization to transform how they see, discover, and share insights hidden in their data. Because business visualization spans a broad range of options, from static to dynamic and interactive, it serves a variety of needs within organizations. As a result, those companies adopting business visualization are able to extract maximum value from the information captured throughout their environments.
Hidden Technical Debt in Machine Learning Systems Machine learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. This paper argues it is dangerous to think of these quick wins as coming for free. Using the software engineering framework of technical debt, we find it is common to incur massive ongoing maintenance costs in real-world ML systems. We explore several ML-specific risk factors to account for in system design. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a variety of system-level anti-patterns.
Hierarchical Bayesian Survival Analysis and Projective Covariate Selection in Cardiovascular Event Risk Prediction Identifying biomarkers with predictive value for disease risk stratification is an important task in epidemiology. This paper describes an application of Bayesian linear survival regression to model cardiovascular event risk in diabetic individuals with measurements available on 55 candidate biomarkers. We extend the survival model to include data from a larger set of non-diabetic individuals in an effort to increase the predictive performance for the diabetic subpopulation. We compare the Gaussian, Laplace and horseshoe shrinkage priors, and find that the last has the best predictive performance and shrinks strong predictors less than the others. We implement the projection predictive covariate selection approach of Dupuis and Robert (2003) to further search for small sets of predictive biomarkers that could provide cost-efficient prediction without significant loss in performance. In passing, we present a derivation of the projective covariate selection in a Bayesian decision theoretic framework.
Hierarchical Clustering: Objective Functions and Algorithms Hierarchical clustering is a recursive partitioning of a dataset into clusters at an increasingly finer granularity. Motivated by the fact that most work on hierarchical clustering was based on providing algorithms, rather than optimizing a specific objective, Dasgupta framed similarity-based hierarchical clustering as a combinatorial optimization problem, where a `good’ hierarchical clustering is one that minimizes some cost function. He showed that this cost function has certain desirable properties. We take an axiomatic approach to defining `good’ objective functions for both similarity and dissimilarity-based hierarchical clustering. We characterize a set of ‘admissible’ objective functions (that includes Dasgupta’s one) that have the property that when the input admits a `natural’ hierarchical clustering, it has an optimal value. Equipped with a suitable objective function, we analyze the performance of practical algorithms, as well as develop better algorithms. For similarity-based hierarchical clustering, Dasgupta showed that the divisive sparsest-cut approach achieves an $O(\log^{3/2} n)$-approximation. We give a refined analysis of the algorithm and show that it in fact achieves an $O(\sqrt{\log n})$-approx. (Charikar and Chatziafratis independently proved that it is a $O(\sqrt{\log n})$-approx.). This improves upon the LP-based $O(\log n)$-approx. of Roy and Pokutta. For dissimilarity-based hierarchical clustering, we show that the classic average-linkage algorithm gives a factor 2 approx., and provide a simple and better algorithm that gives a factor 3/2 approx.. Finally, we consider `beyond-worst-case’ scenario through a generalisation of the stochastic block model for hierarchical clustering. We show that Dasgupta’s cost function has desirable properties for these inputs and we provide a simple 1 + o(1)-approximation in this setting.
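To make Dasgupta's objective concrete, the sketch below builds a tree with classic average linkage and evaluates the similarity-weighted cost on a toy dataset. It only illustrates the cost function; it does not reproduce the sparsest-cut or beyond-worst-case algorithms analyzed in the paper, and the similarity conversion is an arbitrary choice.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def dasgupta_cost(similarity, Z):
    """Dasgupta's cost of the binary tree Z (a scipy linkage matrix) on a
    similarity matrix: sum over pairs of w(i, j) * |leaves at their lca|."""
    n = similarity.shape[0]
    members = {i: [i] for i in range(n)}          # leaves under each cluster id
    cost = 0.0
    for t, (a, b, _, _) in enumerate(Z):
        left, right = members.pop(int(a)), members.pop(int(b))
        size = len(left) + len(right)
        # All cross pairs between the two merged subtrees have their lca here.
        cost += size * similarity[np.ix_(left, right)].sum()
        members[n + t] = left + right
    return cost

# Toy points in two clumps; turn distances into similarities for the objective.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.3, (5, 2)), rng.normal(3, 0.3, (5, 2))])
dist = np.sqrt(((pts[:, None] - pts[None]) ** 2).sum(-1))
sim = 1.0 / (1.0 + dist)                           # any monotone conversion

Z = linkage(squareform(dist), method="average")    # classic average linkage
print(dasgupta_cost(sim, Z))
```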
Hierarchical Temporal Memory including HTM Cortical Learning Algorithms There are many things humans find easy to do that computers are currently unable to do. Tasks such as visual pattern recognition, understanding spoken language, recognizing and manipulating objects by touch, and navigating in a complex world are easy for humans. Yet despite decades of research, we have few viable algorithms for achieving human-like performance on a computer. In humans, these capabilities are largely performed by the neocortex. Hierarchical Temporal Memory (HTM) is a technology modeled on how the neocortex performs these functions. HTM offers the promise of building machines that approach or exceed human level performance for many cognitive tasks. This document describes HTM technology. Chapter 1 provides a broad overview of HTM, outlining the importance of hierarchical organization, sparse distributed representations, and learning time-based transitions. Chapter 2 describes the HTM cortical learning algorithms in detail. Chapters 3 and 4 provide pseudocode for the HTM learning algorithms divided in two parts called the spatial pooler and temporal pooler. After reading chapters 2 through 4, experienced software engineers should be able to reproduce and experiment with the algorithms. Hopefully, some readers will go further and extend our work.
Hierarchically Supervised Latent Dirichlet Allocation We introduce hierarchically supervised latent Dirichlet allocation (HSLDA), a model for hierarchically and multiply labeled bag-of-word data. Examples of such data include web pages and their placement in directories, product descriptions and associated categories from product hierarchies, and free-text clinical records and their assigned diagnosis codes. Out-of-sample label prediction is the primary goal of this work, but improved lower-dimensional representations of the bag-of-word data are also of interest. We demonstrate HSLDA on large-scale data from clinical document labeling and retail product categorization tasks. We show that leveraging the structure from hierarchical labels improves out-of-sample label prediction substantially when compared to models that do not.
High Dimensional Classification via Empirical Risk Minimization: Improvements and Optimality In this article, we investigate a family of classification algorithms defined by the principle of empirical risk minimization, in the high dimensional regime where the feature dimension $p$ and data number $n$ are both large and comparable. Based on recent advances in high dimensional statistics and random matrix theory, we provide under mixture data model a unified stochastic characterization of classifiers learned with different loss functions. Our results are instrumental to an in-depth understanding as well as practical improvements on this fundamental classification approach. As the main outcome, we demonstrate the existence of a universally optimal loss function which yields the best high dimensional performance at any given $n/p$ ratio.
High Dimensional Data Clustering Clustering in high-dimensional spaces is a recurrent problem in many domains, for example in object recognition. High-dimensional data usually live in different low-dimensional subspaces hidden in the original space. This paper presents a clustering approach which estimates the specific subspace and the intrinsic dimension of each class. Our approach adapts the Gaussian mixture model framework to high-dimensional data and estimates the parameters which best fit the data. We obtain a robust clustering method called High-Dimensional Data Clustering (HDDC). We apply HDDC to locate objects in natural images in a probabilistic framework. Experiments on a recently proposed database demonstrate the effectiveness of our clustering method for category localization.
High-Performance Support Vector Machines and Its Applications The support vector machines (SVM) algorithm is a popular classification technique in data mining and machine learning. In this paper, we propose a distributed SVM algorithm and demonstrate its use in a number of applications. The algorithm is named high-performance support vector machines (HPSVM). The major contribution of HPSVM is two-fold. First, HPSVM provides a new way to distribute computations to the machines in the cloud without shuffling the data. Second, HPSVM minimizes the inter-machine communications in order to maximize the performance. We apply HPSVM to some real-world classification problems and compare it with the state-of-the-art SVM technique implemented in R on several public data sets. HPSVM achieves similar or better results.
How an Electrical Engineer Became an Artificial Intelligence Researcher, a Multiphase Active Contours Analysis This essay examines how what is considered to be artificial intelligence (AI) has changed over time and come to intersect with the expertise of the author. Initially, AI developed on a separate trajectory, both topically and institutionally, from pattern recognition, neural information processing, decision and control systems, and allied topics by focusing on symbolic systems within computer science departments rather than on continuous systems in electrical engineering departments. The separate evolutions continued throughout the author’s lifetime, with some crossover in reinforcement learning and graphical models, but were shocked into converging by the virality of deep learning, thus making an electrical engineer into an AI researcher. Now that this convergence has happened, opportunity exists to pursue an agenda that combines learning and reasoning bridged by interpretable machine learning models.
How Complex is your classification problem? A survey on measuring classification complexity Extracting characteristics from the training datasets of classification problems has proven effective in a number of meta-analyses. Among them, measures of classification complexity can estimate the difficulty in separating the data points into their expected classes. Descriptors of the spatial distribution of the data and estimates of the shape and size of the decision boundary are among the existing measures for this characterization. This information can support the formulation of new data-driven pre-processing and pattern recognition techniques, which can in turn be focused on challenging characteristics of the problems. This paper surveys and analyzes measures which can be extracted from the training datasets in order to characterize the complexity of the respective classification problems. Their use in recent literature is also reviewed and discussed, allowing us to identify opportunities for future work in the area. Finally, descriptions are given on an R package named Extended Complexity Library (ECoL) that implements a set of complexity measures and is made publicly available.
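As a concrete illustration of the kind of descriptor this survey covers, the hedged sketch below computes one classic complexity measure, the maximum Fisher's discriminant ratio, with plain NumPy on the iris data set. The dataset and function name are chosen purely for illustration; this is not the ECoL implementation and the package may normalize the measure differently.

```python
# Minimal sketch of one classic complexity measure: the maximum Fisher's
# discriminant ratio. Higher values mean some single feature already
# separates the classes well (an "easier" problem). Illustrative only.
import numpy as np
from sklearn.datasets import load_iris

def fisher_discriminant_ratio(X, y):
    """Max over features of between-class variance / within-class variance."""
    classes, ratios = np.unique(y), []
    for j in range(X.shape[1]):
        overall_mean = X[:, j].mean()
        between = sum((y == c).sum() * (X[y == c, j].mean() - overall_mean) ** 2
                      for c in classes)
        within = sum(((X[y == c, j] - X[y == c, j].mean()) ** 2).sum()
                     for c in classes)
        ratios.append(between / within)
    return max(ratios)

X, y = load_iris(return_X_y=True)
print(f"max Fisher's discriminant ratio: {fisher_discriminant_ratio(X, y):.2f}")
```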
How convolutional neural network see the world – A survey of convolutional neural network visualization methods Nowadays, Convolutional Neural Networks (CNNs) have achieved impressive performance on many computer vision related tasks, such as object detection, image recognition, image retrieval, etc. These achievements benefit from the CNN's outstanding capability to learn input features with deep layers of neuron structures and an iterative training process. However, these learned features are hard to identify and interpret from a human vision perspective, causing a lack of understanding of the CNN's internal working mechanism. To improve CNN interpretability, CNN visualization is widely used as a qualitative analysis method that translates the internal features into visually perceptible patterns. Many CNN visualization works have been proposed in the literature to interpret the CNN from the perspectives of network structure, operation, and semantic concept. In this paper, we provide a comprehensive survey of several representative CNN visualization methods, including Activation Maximization, Network Inversion, Deconvolutional Neural Networks (DeconvNet), and Network Dissection based visualization. These methods are presented in terms of motivations, algorithms, and experiment results. Based on these visualization methods, we also discuss their practical applications to demonstrate the significance of CNN interpretability in areas of network design, optimization, security enhancement, etc.
How deep is deep enough? – Optimizing deep neural network architecture Deep neural networks use stacked layers of feature detectors to repeatedly transform the input data, so that structurally different classes of input become well separated in the final layer. While the method has turned out extremely powerful in many applications, its success depends critically on the correct choice of hyperparameters, in particular the number of network layers. Here, we introduce a new measure, called the generalized discrimination value (GDV), which quantifies how well different object classes separate in each layer. Due to its definition, the GDV is invariant to translation and scaling of the input data, independent of the number of features, as well as independent of the number and permutation of the neurons within a layer. We compute the GDV in each layer of a Deep Belief Network that was trained unsupervised on the MNIST data set. Strikingly, we find that the GDV first improves with each successive network layer, but then gets worse again beyond layer 30, thus indicating the optimal network depth for this data classification task. Our further investigations suggest that the GDV can serve as a universal tool to determine the optimal number of layers in deep neural networks for any type of input data.
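The hedged sketch below illustrates the general idea of a layer-wise class-separability score: z-score a layer's activations, then compare mean within-class to mean between-class distances. The exact GDV definition and normalization in the paper may differ, and the toy "activations" here are synthetic.

```python
# Rough sketch of a layer-wise separability score in the spirit of the GDV:
# z-score the activations, then take mean between-class distance minus mean
# within-class distance. Larger values = classes better separated in this layer.
# This is an assumption-laden illustration, not the paper's exact formula.
import numpy as np
from itertools import combinations

def separability(activations, labels):
    Z = (activations - activations.mean(0)) / (activations.std(0) + 1e-12)
    classes = np.unique(labels)
    within = np.mean([np.linalg.norm(a - b)
                      for c in classes
                      for a, b in combinations(Z[labels == c], 2)])
    between = np.mean([np.linalg.norm(a - b)
                       for c1, c2 in combinations(classes, 2)
                       for a in Z[labels == c1] for b in Z[labels == c2]])
    return between - within

rng = np.random.default_rng(0)
acts = rng.normal(size=(100, 32)) + np.repeat([0, 3], 50)[:, None]  # toy "layer"
labels = np.repeat([0, 1], 50)
print(f"separability: {separability(acts, labels):.2f}")
```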
How deep learning works – The geometry of deep learning Why and how deep learning works well on different tasks remains a mystery from a theoretical perspective. In this paper we draw a geometric picture of the deep learning system by finding its analogies with two existing geometric structures, the geometry of quantum computations and the geometry of diffeomorphic template matching. In this framework, we give the geometric structures of different deep learning systems including convolutional neural networks, residual networks, recursive neural networks, recurrent neural networks and the equilibrium propagation framework. We can also analyze the relationship between the geometric structures of different networks and their performance at an algorithmic level, so that the geometric framework may guide the design of the structures and algorithms of deep learning systems.
How do we choose our default methods The field of statistics continues to be divided into competing schools of thought. In theory one might imagine choosing the uniquely best method for each problem as it arises, but in practice we choose for ourselves (and recommend to others) default principles, models, and methods to be used in a wide variety of settings. This article briefly considers the informal criteria we use to decide what methods to use and what principles to apply in statistics problems.
How Far are we from Data Mining Democratisation? A Systematic Review Context: Data mining techniques have proven to be powerful tools for discovering insights hidden in domain data. However, these techniques demand very specialised skills. People willing to analyse data often lack these skills, so they must rely on data scientists, which hinders data mining democratisation. Different approaches have appeared in the last years to address this issue. Objective: Analyse the state of the art to determine how far we are from effective data mining democratisation, what has already been accomplished, and what should be done in the upcoming years. Method: We performed a state-of-the-art review following a systematic and objective procedure, which included works both from academia and industry. The reviewed works were grouped into four categories. Each category was then evaluated in detail using well-defined evaluation criteria to identify its strengths and weaknesses. Results: Around 700 works were initially considered, from which 43 were finally selected for a more in-depth analysis. Only two out of the four identified categories provide effective solutions to data mining democratisation. From these two categories, one always requires a minimum intervention of a data scientist, whereas the other one does not provide support for all the stages of the data mining process and might exhibit accuracy problems in some contexts. Conclusion: In all analysed approaches, a data scientist is still required to perform some steps of the analysis process. Moreover, automated approaches that do not require data scientists for some steps exhibit problems in other quality attributes, such as accuracy. Therefore, although existing work shows some promising initial steps, we are still far from data mining democratisation.
How Generative Adversarial Nets and its variants Work: An Overview of GAN Generative Adversarial Networks (GANs) have received wide attention in the machine learning field because of their massive potential to learn high-dimensional, complex real data. Specifically, they do not rely on explicit distributional assumptions and can simply generate real-like samples from a latent space. This powerful property has led GANs to be applied to various tasks such as image synthesis, image attribute editing and semantic image decomposition. In this review paper, we look into the details of GANs: we first show how they operate and explain the fundamental meaning of their objective functions, and then point to GAN variants applied to a wide range of tasks.
How Good Are Machine Learning Clouds for Binary Classification with Good Features We conduct an empirical study of machine learning functionalities provided by major cloud service providers, which we call machine learning clouds. Machine learning clouds hold the promise of hiding all the sophistication of running large-scale machine learning: Instead of specifying how to run a machine learning task, users only specify what machine learning task to run and the cloud figures out the rest. Raising the level of abstraction, however, rarely comes free — a performance penalty is possible. How good, then, are current machine learning clouds on real-world machine learning workloads? We study this question by presenting mlbench, a novel benchmark dataset constructed with the top winning code for all available competitions on Kaggle, as well as the results we obtained by running mlbench on machine learning clouds from both Azure and Amazon. We analyze the strengths and weaknesses of existing machine learning clouds and discuss potential future directions.
How Important Is a Neuron? The problem of attributing a deep network’s prediction to its input/base features is well-studied. We introduce the notion of conductance to extend the notion of attribution to understanding the importance of hidden units. Informally, the conductance of a hidden unit of a deep network is the flow of attribution via this hidden unit. We use conductance to understand the importance of a hidden unit to the prediction for a specific input, or over a set of inputs. We evaluate the effectiveness of conductance in multiple ways, including theoretical properties, ablation studies, and a feature selection task. The empirical evaluations are done using the Inception network over ImageNet data, and a sentiment analysis network over reviews. In both cases, we demonstrate the effectiveness of conductance in identifying interesting insights about the internal workings of these networks.
How Important is Syntactic Parsing Accuracy? An Empirical Evaluation on Sentiment Analysis Syntactic parsing, the process of obtaining the internal structure of sentences in natural languages, is a crucial task for artificial intelligence applications that need to extract meaning from natural language text or speech. Sentiment analysis is one example of an application for which parsing has recently proven useful. In recent years, there have been significant advances in the accuracy of parsing algorithms. In this article, we perform an empirical, task-oriented evaluation to determine how parsing accuracy influences the performance of a state-of-the-art sentiment analysis system that determines the polarity of sentences from their parse trees. In particular, we evaluate the system using four well-known dependency parsers, including both current models with state-of-the-art accuracy and less accurate models which, however, require fewer computational resources. The experiments show that all of the parsers produce similarly good results in the sentiment analysis task, without their accuracy having any relevant influence on the results. Since parsing is currently a task with a relatively high computational cost that varies strongly between algorithms, this suggests that sentiment analysis researchers and users should prioritize speed over accuracy when choosing a parser; and parsing researchers should investigate models that improve speed further, even at some cost to accuracy.
How intelligent are convolutional neural networks? Motivated by the Gestalt pattern theory and the Winograd Challenge for language understanding, we design synthetic experiments to investigate a deep learning algorithm’s ability to infer simple (at least for humans) visual concepts, such as symmetry, from examples. A visual concept is represented by randomly generated, positive as well as negative, example images. We then test the ability and speed of algorithms (and humans) to learn the concept from these images. The training and testing are performed progressively in multiple rounds, with each subsequent round deliberately designed to be more complex and confusing than the previous round(s), especially if the concept was not grasped by the learner. However, if the concept was understood, all the deliberate tests would become trivially easy. Our experiments show that humans can often infer a semantic concept quickly after looking at only a very small number of examples (this is often referred to as an ‘aha moment’: a moment of sudden realization), and perform perfectly during all testing rounds (except for careless mistakes). On the contrary, deep convolutional neural networks (DCNN) can approximate some concepts statistically, but only after seeing many more examples (on the order of 10^4), and they still make obvious mistakes, especially during deliberate testing rounds or on samples outside the training distributions. This signals a lack of true ‘understanding’, or a failure to reach the right ‘formula’ for the semantics. We did find that some concepts are easier for DCNN than others. For example, simple ‘counting’ is more learnable than ‘symmetry’, while ‘uniformity’ or ‘conformance’ are much more difficult for DCNN to learn. To conclude, we propose an ‘Aha Challenge’ for visual perception, calling for focused and quantitative research on Gestalt-style machine intelligence using limited training examples.
How Intelligent is your Intelligent Robot? How intelligent is robot A compared with robot B? And how intelligent are robots A and B compared with animals (or plants) X and Y? These are both interesting and deeply challenging questions. In this paper we address the question ‘how intelligent is your intelligent robot?’ by proposing that embodied intelligence emerges from the interaction and integration of four different and distinct kinds of intelligence. We then suggest a simple diagrammatic representation on which these kinds of intelligence are shown as four axes in a star diagram. A crude qualitative comparison of the intelligence graphs of animals and robots both exposes and helps to explain the chronic intelligence deficit of intelligent robots. Finally we examine the options for determining numerical values for the four kinds of intelligence in an effort to move toward a quantifiable intelligence vector.
How IT allows E-Participation in Policy-Making Process In support of the art and practice of government policy-making, public work, and citizen participation, many governments adopt information and communication technologies (ICT) as a vehicle to facilitate their relationship with citizens. This participation process is widely known as E-Participation or Electronic Participation. This article focuses on different performance indicators and the relevant tools for each level. Despite the growing scientific and pragmatic significance of e-participation, the area has not been able to grow as expected. Our knowledge of e-participation policies and their implementation is still very limited. This is the key reason why e-participation initiatives in practice often fall short of expectations. This study collects existing perceptions from the interdisciplinary scientific literature to determine a unifying definition and demonstrates the strong abilities of e-participation and other related components which have great potential in the coming years.
How to Build Dashboards That Persuade, Inform and Engage Flow is powerful. Think about a great conversation you´ve had, with no awkwardness or self-consciousness: just effortless communication. In data visualization, flow is crucial. Your audience should smoothly absorb and use the information in a dashboard without distractions or turbulence. Lack of flow means lack of communication, which means failure. Psychologist Mihaly Czikszentmihalyi has studied flow extensively. Czikszentmihalyi and other researchers have found that flow is correlated with happiness, creativity, and productivity. People experience flow when their skills are engaged and they´re being challenged just the right amount. The experience is not too challenging or too easy: flow is a just-right, Goldilocks state of being. So how do you create flow for an audience? By tailoring the presentation of data to that audience. If you focus on the skills, motivations, and needs of an audience, you´ll have a better chance of creating a positive experience of flow with your dashboards. And by creating that flow, you´ll be able to persuade, inform, and engage.
How to capitalize on a priori contrasts in linear (mixed) models: A tutorial Factorial experiments in research on memory, language, and in other areas are often analyzed using analysis of variance (ANOVA). However, for experimental factors with more than two levels, the ANOVA omnibus F-test is not informative about the source of a main effect or interaction. This is unfortunate as researchers typically have specific hypotheses about which condition means differ from each other. A priori contrasts (i.e., comparisons planned before the sample means are known) between specific conditions or combinations of conditions are the appropriate way to represent such hypotheses in the statistical model. Many researchers have pointed out that contrasts should be ‘tested instead of, rather than as a supplement to, the ordinary `omnibus’ F test’ (Hayes, 1973, p. 601). In this tutorial, we explain the mathematics underlying different kinds of contrasts (i.e., treatment, sum, repeated, Helmert, and polynomial contrasts), discuss their properties, and demonstrate how they are applied in the R System for Statistical Computing (R Core Team, 2018). In this context, we explain the generalized inverse which is needed to compute the weight coefficients for contrasts that test hypotheses that are not covered by the default set of contrasts. A detailed understanding of contrast coding is crucial for successful and correct specification in linear models (including linear mixed models). Contrasts defined a priori yield far more precise confirmatory tests of experimental hypotheses than standard omnibus F-test.
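The tutorial itself works in R; as a language-neutral sketch of the key step it describes, the snippet below uses NumPy's Moore-Penrose pseudoinverse to turn a hypothesis matrix for a hypothetical three-level factor into contrast codes for the design matrix. The hypotheses, condition labels and data are invented for illustration.

```python
# Hedged sketch: rows of the hypothesis matrix are weights on the three
# condition means for (1) the intercept (grand mean), (2) A vs. the average
# of B and C, and (3) B vs. C. The generalized inverse of this matrix gives
# the contrast codes to place in the design matrix.
import numpy as np

H = np.array([
    [1/3,  1/3,  1/3],   # intercept: grand mean
    [2/3, -1/3, -1/3],   # condition A vs. average of B and C
    [0.0,  1.0, -1.0],   # condition B vs. C
])
Xc = np.linalg.pinv(H)   # columns: intercept codes plus the two contrast codes
print(np.round(Xc, 3))

conditions = np.array([0, 0, 1, 1, 2, 2])   # toy data: two observations per level
design = Xc[conditions]                     # per-observation design matrix
print(design)
```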
How to estimate time-varying Vector Autoregressive Models A comparison of two methods The ubiquity of mobile devices led to a surge in intensive longitudinal (or time series) data of individuals. This is an exciting development because personalized models both naturally tackle the issue of heterogeneities between people and increase the validity of models for applications. A popular model for time series is the Vector Autoregressive (VAR) model, in which each variable is modeled as a linear function of all variables at previous time points. A key assumption of this model is that the parameters of the true data generating model are constant (or stationary) across time. The most straightforward way to check for time-varying parameters is to fit a model that allows for time-varying parameters. In the present paper we compare two methods to estimate time-varying VAR models: the first method uses a spline-approach to allow for time-varying parameters, the second uses kernel-smoothing. We report the performance of both methods and their stationary counterparts in an extensive simulation study that reflects the situations typically encountered in practice. We compare the performance of stationary and time-varying models and discuss the theoretical characteristics of all methods in the light of the simulation results. In addition, we provide a step-by-step tutorial for both methods showing how to estimate a time-varying VAR model on an openly available individual time series dataset.
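A minimal, hedged sketch of the kernel-smoothing idea (not the paper's estimators or its R tutorial): estimate a VAR(1) coefficient matrix at each time point by least squares weighted with a Gaussian kernel over time, so nearby observations count more. The simulated drift, bandwidth and seed are arbitrary illustration choices.

```python
# Kernel-weighted OLS estimate of a time-varying VAR(1) matrix at a chosen
# time point. Illustrative sketch only; bandwidth selection is not addressed.
import numpy as np

rng = np.random.default_rng(1)
T, p = 300, 2
Y = np.zeros((T, p))
for t in range(1, T):                        # simulate data with a drifting coefficient
    a = 0.1 + 0.6 * t / T                    # cross-lagged effect grows over time
    A_true = np.array([[0.3, a], [0.0, 0.3]])
    Y[t] = A_true @ Y[t - 1] + rng.normal(scale=0.5, size=p)

def tv_var1(Y, t_star, bandwidth=30.0):
    X, Z = Y[:-1], Y[1:]                                  # predictors, responses
    w = np.exp(-0.5 * ((np.arange(1, len(Y)) - t_star) / bandwidth) ** 2)
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ Z).T    # rows: equations

print(np.round(tv_var1(Y, 50), 2))    # early estimate
print(np.round(tv_var1(Y, 250), 2))   # late estimate: larger [0, 1] entry
```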
How to Grow a Mind: Statistics, Structure, and Abstraction In coming to understand the world—in learning concepts, acquiring language, and grasping causal relations—our minds make inferences that appear to go far beyond the data available. How do we do it? This review describes recent approaches to reverse-engineering human learning and cognitive development and, in parallel, engineering more humanlike machine learning systems. Computational models that perform probabilistic inference over hierarchies of flexibly structured representations can address some of the deepest questions about the nature and origins of human thought: How does abstract knowledge guide learning and reasoning from sparse data? What forms does our knowledge take, across different domains and tasks? And how is that abstract knowledge itself acquired?
How to Implement an Effective Decision Management System – Embedding Analytics into Real-Time Business Decisions, Operations and Processes Once used mostly in traditional batch-type environments, analytic techniques are now being embedded into real-time business decisions, operations and processes. In fact, decision support insight should be embedded very consistently in operational systems. For example: • When a credit card organization is processing a card swipe, fraud detection analytics should be embedded in that process. • When analysis of sensor data over time indicates an impending problem with a mechanical process, utility grid or manufacturing system, the system should trigger some proactive intervention. • When call center agents have a customer on the phone, or tellers have a customer at the counter, analytics behind the scenes should be giving them the information they need to customize the interaction – right now. This ideal has historically been a challenge to implement because the niche applications used for different business functions have not been on great speaking terms. Unlike niche tools, an enterprise decision management framework extracts information from multiple sources, runs it through analytical processes, and delivers the results directly into business applications or operational systems. ‘Enterprise decision management is one of the hottest topics in business analytics today,’ said David Duling, Director of Enterprise Decision Management R&D at SAS. ‘You see it on the front page of many journals, and a lot of conferences are being organized around the topic. Enterprise decision management marries the analytics that we´ve been doing at SAS with the product environments within your organization to automate routine business decisions.’ This means better decisions, delivered right to the point of decision.
How to Maximize the Spread of Social Influence: A Survey This survey presents the main results achieved for the influence maximization problem in social networks. This problem is well studied in the literature and, thanks to its recent applications, some of which are currently deployed in the field, it is receiving more and more attention in the scientific community. The problem can be formulated as follows: given a graph, with each node having a certain probability of influencing its neighbors, select a subset of vertices so that the number of nodes in the network that are influenced is maximized. Starting from this model, we introduce the main theoretical developments and computational results that have been achieved, taking into account different diffusion models describing how the information spreads throughout the network, various ways in which the sources of information could be placed, and how to tackle the problem in the presence of uncertainties affecting the network. Finally, we present one of the main applications that has been developed and deployed, exploiting the tools and techniques previously discussed.
How to Speed up R Code: An Introduction Most calculations performed by the average R user are unremarkable in the sense that nowadays, any computer can crunch through the related code in a matter of seconds. But more and more often, heavy calculations are also performed using R, something especially true in some fields such as statistics. The user then faces total execution times for their code that are hard to work with: hours, days, even weeks. In this paper, how to reduce the total execution time of various codes will be shown and typical bottlenecks will be discussed. As a last resort, how to run your code on a cluster of computers (most workplaces have one) in order to make use of more processing power than is available on an average computer will also be discussed through two examples.
How to Use an Uncommon-Sense Approach to Big Data Quality Organizations are inundated in data – terabytes, petabytes and exabytes of it. Data pours in from every conceivable direction: from operational and transactional systems, from scanning and facilities management systems, from inbound and outbound customer contact points, from mobile media and the Web. The hopeful vision of big data is that organizations will be able to harvest every byte of relevant data and use it to make supremely informed decisions. We now have the technologies to collect and store big data, but more importantly, to understand and take advantage of its full value. ‘The financial services industry has led the way in using analytics and big data to manage risk and curb fraud, waste and abuse – especially important in that regulatory environment,’ said Scott Chastain, Director of Information Management and Delivery at SAS. ‘We´re also seeing a transference of big data analytics into other areas, such as health care and government. The ability to find that needle in the haystack becomes very important when you´re examining things like costs, outcomes, utilization and fraud for large populations.
How to Use Hadoop as a Piece of the Big Data Puzzle Imagine you have a jar of multicolored candies, and you need to learn something from them, perhaps the count of blue candies relative to red and yellow ones. You could empty the jar onto a plate, sift through them and tally up your answer. If the jar held only a few hundred candies, this process would take only a few minutes. Now imagine you have four plates and four helpers. You pour out about one-fourth of the candies onto each plate. Everybody sifts through their set and arrives at an answer that they share with the others to arrive at a total. Much faster, no? That is what Hadoop does for data. Hadoop is an open-source software framework for running applications on large clusters of commodity hardware. Hadoop delivers enormous processing power – the ability to handle virtually limitless concurrent tasks and jobs – making it a remarkably low-cost complement to a traditional enterprise data infrastructure. Organizations are embracing Hadoop for several notable merits: • Hadoop is distributed. Bringing a high-tech twist to the adage, ‘Many hands make light work,’ data is stored on local disks of a distributed cluster of servers. • Hadoop runs on commodity hardware. Based on the average cost per terabyte of compute capacity of a prepackaged system, Hadoop is easily 10 times cheaper for comparable computing capacity compared to higher-cost specialized hardware. • Hadoop is fault-tolerant. Hardware failure is expected and is mitigated by data replication and speculative processing. If capacity is available, Hadoop runs multiple copies of the same task, accepting the results from the task that finishes first. • Hadoop does not require a predefined data schema. A key benefit of Hadoop is the ability to just upload any unstructured files without having to ‘schematize’ them first. You can dump any type of data into Hadoop and allow the consuming programs to determine and apply structure when necessary. • Hadoop scales to handle big data. Hadoop clusters can scale to between 6,000 and 10,000 nodes and handle more than 100,000 concurrent tasks and 10,000 concurrent jobs. Yahoo! runs thousands of clusters and more than 42,000 Hadoop nodes storing more than 200 petabytes of data. • Hadoop is fast. In a performance test, a 1,400-node cluster sorted a terabyte of data in 62 seconds; a 3,400-node cluster sorted 100 terabytes in 173 minutes. To put it in context, one terabyte contains 2,000 hours of CD-quality music; 10 terabytes could store the entire US Library of Congress print collection. You get the idea. Hadoop handles big data. It does it fast. It redefines the possible when it comes to analyzing large volumes of data, particularly semi-structured and unstructured data (text).
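The candy analogy is essentially a map-and-reduce job. The toy sketch below mimics it in Python with a process pool standing in for the cluster: each worker tallies one "plate" of the data, and the partial counts are merged at the end. It illustrates only the programming model, not Hadoop itself.

```python
# Toy map/reduce version of the candy-counting analogy: four workers each
# count one partition of the "jar", then the partial tallies are merged.
from collections import Counter
from multiprocessing import Pool
import random

def count_colors(partition):          # the "map" step: tally one plate of candies
    return Counter(partition)

if __name__ == "__main__":
    random.seed(0)
    candies = random.choices(["blue", "red", "yellow"], k=100_000)   # the jar
    plates = [candies[i::4] for i in range(4)]                       # four plates

    with Pool(4) as pool:
        partial = pool.map(count_colors, plates)

    total = sum(partial, Counter())   # the "reduce" step: merge the four tallies
    print(total.most_common())
```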
How YARN Opens Doors to Easier Programming Tools for Hadoop 2.0 Users The emergence of YARN for the Hadoop 2.0 platform has opened the door to new tools and applications that promise to allow more companies to reap the benefits of big data in ways never before possible with outcomes possibly never imagined. By separating the problem of cluster resource management from the data processing function, YARN offers a world beyond MapReduce: lessencumbered by complex programming protocols, faster, and at a lower cost….
Human vs Machine Attention in Neural Networks: A Comparative Study Recent years have witnessed a surge in the popularity of attention mechanisms encoded within deep neural networks. Inspired by the selective attention in the visual cortex, artificial attention is designed to focus a neural network on the most task-relevant input signal. Many works claim that the attention mechanism offers an extra dimension of interpretability by explaining where the neural networks look. However, recent studies demonstrate that artificial attention maps do not always coincide with common intuition. In view of this conflicting evidence, here we make a systematic study of using artificial attention and human attention in neural network design. With three example computer vision tasks (i.e., salient object segmentation, video action recognition, and fine-grained image classification), diverse representative network backbones (i.e., AlexNet, VGGNet, ResNet) and famous architectures (i.e., Two-stream, FCN), corresponding real human gaze data, and systematically conducted large-scale quantitative studies, we offer novel insights into existing artificial attention mechanisms and give preliminary answers to several key questions related to human and artificial attention mechanisms. Our overall results demonstrate that human attention is capable of benchmarking the meaningful ‘ground truth’ in attention-driven tasks, where the closer the artificial attention is to the human attention, the better the performance; for higher-level vision tasks, it is case-by-case. We believe it would be advisable for attention-driven tasks to explicitly force a better alignment between artificial and human attentions to boost the performance; such alignment would also benefit making the deep networks more transparent and explainable for higher-level computer vision tasks.
Human Action Recognition and Prediction: A Survey Derived from rapid advances in computer vision and machine learning, video analysis tasks have been moving from inferring the present state to predicting the future state. Vision-based action recognition and prediction from videos are such tasks, where action recognition is to infer human actions (present state) based upon complete action executions, and action prediction to predict human actions (future state) based upon incomplete action executions. These two tasks have become particularly prevalent topics recently because of their explosively emerging real-world applications, such as visual surveillance, autonomous driving, entertainment, and video retrieval. Many attempts have been made over the last few decades to build a robust and effective framework for action recognition and prediction. In this paper, we survey the state-of-the-art techniques in action recognition and prediction. Existing models, popular algorithms, technical difficulties, popular action databases, evaluation protocols, and promising future directions are also provided with systematic discussions.
Human Motion Trajectory Prediction: A Survey With growing numbers of intelligent systems in human environments, the ability of such systems to perceive, understand and anticipate human behavior becomes increasingly important. Specifically, predicting future positions of dynamic agents and planning considering such predictions are key tasks for self-driving vehicles, service robots and advanced surveillance systems. This paper provides a survey of human motion trajectory prediction. We review, analyze and structure a large selection of work from different communities and propose a taxonomy that categorizes existing approaches based on the motion modeling approach and level of contextual information used. We provide an overview of the existing datasets and performance metrics. We discuss limitations of the state of the art and outline directions for further research.
Human Perception of Surprise: A User Study Understanding how to engage users is a critical question in many applications. Previous research has shown that unexpected or astonishing events can attract user attention, leading to positive outcomes such as engagement and learning. In this work, we investigate the similarity and differences in how people and algorithms rank the surprisingness of facts. Our crowdsourcing study, involving 106 participants, shows that computational models of surprise can be used to artificially induce surprise in humans.
Human-Centric Data Cleaning [Vision] Data Cleaning refers to the process of detecting and fixing errors in the data. Human involvement is instrumental at several stages of this process, e.g., to identify and repair errors, to validate computed repairs, etc. There is currently a plethora of data cleaning algorithms addressing a wide range of data errors (e.g., detecting duplicates, violations of integrity constraints, missing values, etc.). Many of these algorithms involve a human in the loop; however, this involvement is usually tightly coupled to the underlying cleaning algorithms. There is currently no end-to-end data cleaning framework that systematically involves humans in the cleaning pipeline regardless of the underlying cleaning algorithms. In this paper, we highlight key challenges that need to be addressed to realize such a framework. We present a design vision and discuss scenarios that motivate the need for such a framework to judiciously assist humans in the cleaning process. Finally, we present directions to implement such a framework.
Human-Machine Inference Networks For Smart Decision Making: Opportunities and Challenges The emerging paradigm of Human-Machine Inference Networks (HuMaINs) combines complementary cognitive strengths of humans and machines in an intelligent manner to tackle various inference tasks and achieves higher performance than either humans or machines by themselves. While inference performance optimization techniques for human-only or sensor-only networks are quite mature, HuMaINs require novel signal processing and machine learning solutions. In this paper, we present an overview of the HuMaINs architecture with a focus on three main issues that include architecture design, inference algorithms including security/privacy challenges, and application areas/use cases.
Hybrid Recommender Systems: A Systematic Literature Review Recommender systems are software tools used to generate and provide suggestions for items and other entities to the users by exploiting various strategies. Hybrid recommender systems combine two or more recommendation strategies in different ways to benefit from their complementary advantages. This systematic literature review presents the state of the art in hybrid recommender systems of the last decade. It is the first quantitative review work completely focused on hybrid recommenders. We address the most relevant problems considered and present the associated data mining and recommendation techniques used to overcome them. We also explore the hybridization classes each hybrid recommender belongs to, the application domains, the evaluation process and proposed future research directions. Based on our findings, most of the studies combine collaborative filtering with another technique, often in a weighted way. Cold-start and data sparsity are the two traditional and top problems being addressed, in 23 and 22 studies respectively, while movies and movie datasets are still widely used by most of the authors. As most of the studies are evaluated by comparisons with similar methods using accuracy metrics, providing more credible and user oriented evaluations remains a typical challenge. Besides this, newer challenges were also identified, such as responding to the variation of user context, evolving user tastes or providing cross-domain recommendations. Being a hot topic, hybrid recommenders represent a good basis with which to respond accordingly by exploring newer opportunities such as contextualizing recommendations, involving parallel hybrid algorithms, processing larger datasets, etc.
HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm This extended abstract describes and analyses a near-optimal probabilistic algorithm, HYPERLOGLOG, dedicated to estimating the number of distinct elements (the cardinality) of very large data ensembles. Using an auxiliary memory of m units (typically, ‘short bytes’), HYPERLOGLOG performs a single pass over the data and produces an estimate of the cardinality such that the relative accuracy (the standard error) is typically about 1.04/sqrt(m). This improves on the best previously known cardinality estimator, LOGLOG, whose accuracy can be matched by consuming only 64% of the original memory. For instance, the new algorithm makes it possible to estimate cardinalities well beyond 10^9 with a typical accuracy of 2% while using a memory of only 1.5 kilobytes. The algorithm parallelizes optimally and adapts to the sliding window model.
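A minimal sketch of the estimator described above: hash each item into one of m registers, keep the maximum rank of the leftmost 1-bit per register, and combine the registers with a harmonic mean. It omits the small- and large-range corrections and the sliding-window variant, and the choice of hash function here is an arbitrary assumption.

```python
# Minimal HyperLogLog sketch (no range corrections): relative error is
# roughly 1.04/sqrt(m), so b=10 (m=1024) gives about 3% typical error.
import hashlib

def hyperloglog(items, b=10):
    m = 1 << b                                  # number of registers
    registers = [0] * m
    for item in items:
        x = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        j = x & (m - 1)                         # low b bits pick the register
        w = x >> b                              # remaining 64 - b bits
        rho = (64 - b) - w.bit_length() + 1     # rank of the leftmost 1-bit
        registers[j] = max(registers[j], rho)
    alpha = 0.7213 / (1 + 1.079 / m)            # bias-correction constant
    return alpha * m * m / sum(2.0 ** -r for r in registers)

print(round(hyperloglog(range(100_000))))       # roughly 100,000, give or take ~3%
```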
Hypervariate Data Visualization Both scientists and normal users face enormous amounts of data, which might be useless if no insight is gained from it. To achieve this, visualization techniques can be used. Many datasets have a dimensionality higher than three. Such data is called ‘hypervariate’ and cannot be visualized directly in the three-dimensional space that we inhabit. Therefore, a wide variety of specialized techniques have been created for rendering hypervariate data. These techniques are based on very different principles and are designed for very different areas of application. This paper gives an overview of six representative techniques. For most techniques a rendering of a common dataset is provided to allow an easier comparison. Furthermore, an evaluation of the strengths and weaknesses of each technique is given. As an outlook, two papers dealing with quantitative analysis of visualization methods are presented.
Hypervariate Information Visualization In the last 20 years improvements in the computer sciences made it possible to store large data sets containing a plethora of different data attributes and data values, which could be applied in different application domains, for example, in the natural sciences, in law enforcement or in social studies. Due to this increasing data complexity in modern times, it is crucial to support the exploration of hypervariate data with different visualization techniques. These facts are the foundation for this paper, which reveals how information visualization can support the understanding of data with high dimensionality. Furthermore, it gives an overview and a comparison of the different categories of hypervariate information visualization, in order to analyse the advantages and the disadvantages of each category. We also address the different interaction methods which help to create an understandable visualization and thus facilitate the user´s visual exploration. Interactive techniques are useful to create an understandable visualization of the relationships in a large data set. At the end, we also discuss the possibility of merging different interactions and visualization techniques.

I

I can see clearly now: reinterpreting statistical significance Null hypothesis significance testing remains popular despite decades of concern about misuse and misinterpretation. We believe that much of the problem is due to language: significance testing has little to do with other meanings of the word ‘significance’. Despite the limitations of null-hypothesis tests, we argue here that they remain useful in many contexts as a guide to whether a certain effect can be seen clearly in that context (e.g. whether we can clearly see that a correlation or between-group difference is positive or negative). We therefore suggest that researchers describe the conclusions of null-hypothesis tests in terms of statistical ‘clarity’ rather than statistical ‘significance’. This simple semantic change could substantially enhance clarity in statistical communication.
Idealised Bayesian Neural Networks Cannot Have Adversarial Examples: Theoretical and Empirical Study We prove that idealised discriminative Bayesian neural networks, capturing perfect epistemic uncertainty, cannot have adversarial examples: Techniques for crafting adversarial examples will necessarily fail to generate perturbed images which fool the classifier. This suggests why MC dropout-based techniques have been observed to be fairly robust to adversarial examples. We support our claims mathematically and empirically. We experiment with HMC on synthetic data derived from MNIST for which we know the ground truth image density, showing that near-perfect epistemic uncertainty correlates to density under image manifold, and that adversarial images lie off the manifold. Using our new-found insights we suggest a new attack for MC dropout-based models by looking for imperfections in uncertainty estimation, and also suggest a mitigation. Lastly, we demonstrate our mitigation on a cats-vs-dogs image classification task with a VGG13 variant.
ILNumerics: Numeric Computing for Industry Most enterprise software nowadays gets created by means of managed software frameworks. In the past they have often failed to deliver the speed required for professional data analysis and scientific computing. The ILNumerics Computing Engine offers a new approach for the integration of numerical algorithms into technical applications.
Image and Video Compression with Neural Networks: A Review In recent years, the image and video coding technologies have advanced by leaps and bounds. However, due to the popularization of image and video acquisition devices, the growth rate of image and video data is far beyond the improvement of the compression ratio. In particular, it has been widely recognized that there are increasing challenges of pursuing further coding performance improvement within the traditional hybrid coding framework. Deep convolution neural network (CNN) which makes the neural network resurge in recent years and has achieved great success in both artificial intelligent and signal processing fields, also provides a novel and promising solution for image and video compression. In this paper, we provide a systematic, comprehensive and up-to-date review of neural network based image and video compression techniques. The evolution and development of neural network based compression methodologies are introduced for images and video respectively. More specifically, the cutting-edge video coding techniques by leveraging deep learning and HEVC framework are presented and discussed, which promote the state-of-the-art video coding performance substantially. Moreover, the end-to-end image and video coding frameworks based on neural networks are also reviewed, revealing interesting explorations on next generation image and video coding frameworks/standards. The most significant research works on the image and video coding related topics using neural networks are highlighted, and future trends are also envisioned. In particular, the joint compression on semantic and visual information is tentatively explored to formulate high efficiency signal representation structure for both human vision and machine vision, which are the two dominant signal receptor in the age of artificial intelligence.
Image Captioning based on Deep Learning Methods: A Survey Image captioning is a challenging task and attracting more and more attention in the field of Artificial Intelligence, and which can be applied to efficient image retrieval, intelligent blind guidance and human-computer interaction, etc. In this paper, we present a survey on advances in image captioning based on Deep Learning methods, including Encoder-Decoder structure, improved methods in Encoder, improved methods in Decoder, and other improvements. Furthermore, we discussed future research directions.
Image Segmentation Algorithms Overview The technology of image segmentation is widely used in medical image processing, face recognition, pedestrian detection, etc. The current image segmentation techniques include region-based segmentation, edge detection segmentation, segmentation based on clustering, segmentation based on weakly-supervised learning in CNN, etc. This paper analyzes and summarizes these algorithms of image segmentation, and compares the advantages and disadvantages of different algorithms. Finally, we make a prediction of the development trend of image segmentation with the combination of these algorithms.
Implementation of a Practical Distributed Calculation System with Browsers Deep learning can achieve outstanding results in various fields. However, it requires so significant computational power that graphics processing units (GPUs) and/or numerous computers are often required for the practical application. We have developed a new distributed calculation framework called ‘Sashimi’ that allows any computer to be used as a distribution node only by accessing a website. We have also developed a new JavaScript neural network framework called ‘Sukiyaki’ that uses general purpose GPUs with web browsers. Sukiyaki performs 30 times faster than a conventional JavaScript library for deep convolutional neural networks (deep CNNs) learning. The combination of Sashimi and Sukiyaki, as well as new distribution algorithms, demonstrates the distributed deep learning of deep CNNs only with web browsers on various devices. The libraries that comprise the proposed methods are available under MIT license at http://…/.
Importance of the Mathematical Foundations of Machine Learning Methods for Scientific and Engineering Applications There has been a lot of recent interest in adopting machine learning methods for scientific and engineering applications. This has in large part been inspired by recent successes and advances in the domains of Natural Language Processing (NLP) and Image Classification (IC). However, scientific and engineering problems have their own unique characteristics and requirements, raising new challenges for effective design and deployment of machine learning approaches. There is a strong need for further mathematical developments on the foundations of machine learning methods to increase the level of rigor of employed methods and to ensure more reliable and interpretable results. Also, as reported in the recent literature on state-of-the-art results and indicated by the No Free Lunch Theorems of statistical learning theory, incorporating some form of inductive bias and domain knowledge is essential to success. Consequently, even for existing and widely used methods there is a strong need for further mathematical work to facilitate ways to incorporate prior scientific knowledge and related inductive biases into learning frameworks and algorithms. We briefly discuss these topics and discuss some ideas proceeding in this direction.
Improved Bayesian Information Criterion for Linear Regression While the Bayesian Information Criterion (BIC) and Akaike Information Criterion (AIC) are powerful tools for model selection in linear regression, they are built on different prior assumptions and thereby apply to different data generation scenarios. We show that their respective assumptions can be unified within an augmented model-plus-noise space and construct a prior in this space which inherits the beneficial properties of both AIC and BIC. The performance of our ‘Noncentral Information Criterion’ (NIC) matches or exceeds that of the AIC and BIC both for weak and strong signal cases.
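For reference, the hedged sketch below computes the standard Gaussian-likelihood AIC and BIC (up to an additive constant) for least-squares fits on synthetic data; the proposed NIC itself is not reproduced here, and the example data and model sizes are invented for illustration.

```python
# Standard AIC and BIC for linear regression under a Gaussian likelihood:
# AIC = n*log(RSS/n) + 2k,  BIC = n*log(RSS/n) + k*log(n)  (up to a constant).
import numpy as np

def aic_bic(X, y):
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    loglik_term = n * np.log(rss / n)
    return loglik_term + 2 * k, loglik_term + k * np.log(n)

rng = np.random.default_rng(0)
n = 200
X_small = np.column_stack([np.ones(n), rng.normal(size=n)])      # true model
X_big = np.column_stack([X_small, rng.normal(size=(n, 5))])      # plus 5 noise terms
y = X_small @ np.array([1.0, 2.0]) + rng.normal(size=n)

print("small model (AIC, BIC):", np.round(aic_bic(X_small, y), 1))
print("big model   (AIC, BIC):", np.round(aic_bic(X_big, y), 1))  # usually both higher
```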
Improving Deep Learning using Generic Data Augmentation Deep artificial neural networks require a large corpus of training data in order to effectively learn, where collection of such training data is often expensive and laborious. Data augmentation overcomes this issue by artificially inflating the training set with label preserving transformations. Recently there has been extensive use of generic data augmentation to improve Convolutional Neural Network (CNN) task performance. This study benchmarks various popular data augmentation schemes to allow researchers to make informed decisions as to which training methods are most appropriate for their data sets. Various geometric and photometric schemes are evaluated on a coarse-grained data set using a relatively simple CNN. Experimental results, run using 4-fold cross-validation and reported in terms of Top-1 and Top-5 accuracy, indicate that cropping in geometric augmentation significantly increases CNN task performance.
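As a hedged illustration of the two augmentation families benchmarked in the study, the snippet below applies one geometric transformation (a random crop) and one photometric transformation (random brightness scaling) to a NumPy image array. Sizes and parameters are arbitrary examples, not the paper's settings.

```python
# Two generic label-preserving augmentations on an image array in [0, 1].
import numpy as np

rng = np.random.default_rng(0)

def random_crop(img, out_h, out_w):
    """Geometric augmentation: crop a random out_h x out_w window."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - out_h + 1)
    left = rng.integers(0, w - out_w + 1)
    return img[top:top + out_h, left:left + out_w]

def random_brightness(img, max_delta=0.2):
    """Photometric augmentation: scale intensities by a random factor."""
    scale = 1.0 + rng.uniform(-max_delta, max_delta)
    return np.clip(img * scale, 0.0, 1.0)

image = rng.random((32, 32, 3))                      # stand-in for a training image
augmented = random_brightness(random_crop(image, 28, 28))
print(augmented.shape)                               # (28, 28, 3)
```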
IMSL C Math Library Version 8.5.0 The IMSL C Math Library, a component of the IMSL C Numerical Library, is a library of C functions useful in scientific programming. Each function is designed and documented for use in research activities as well as by technical specialists. A number of the example programs also show graphs of resulting output.
IMSL C Stat Library – Version 8.5.0 The IMSL C Stat Library, a component of the IMSL C Numerical Library, is a library of C functions useful in scientific programming. Each function is designed and documented to be used in research activities as well as by technical specialists. A number of the example programs also show graphs of resulting output.
In a Nutshell: Sequential Parameter Optimization The performance of optimization algorithms relies crucially on their parameterizations. Finding good parameter settings is called algorithm tuning. Using a simple simulated annealing algorithm, we will demonstrate how optimization algorithms can be tuned using the sequential parameter optimization toolbox (SPOT). SPOT provides several tools for automated and interactive tuning. The underlying concepts of the SPOT approach are explained. This includes key techniques such as exploratory fitness landscape analysis and response surface methodology. Many examples illustrate how SPOT can be used for understanding the performance of algorithms and gaining insight into an algorithm’s behavior. Furthermore, we demonstrate how SPOT can be used as an optimizer and how a sophisticated ensemble approach is able to combine several meta models via stacking.
Independent Component Analysis: Algorithms and Applications A fundamental problem in neural network research, as well as in many other disciplines, is finding a suitable representation of multivariate data, i.e. random vectors. For reasons of computational and conceptual simplicity, the representation is often sought as a linear transformation of the original data. In other words, each component of the representation is a linear combination of the original variables. Well-known linear transformation methods include principal component analysis, factor analysis, and projection pursuit. Independent component analysis (ICA) is a recently developed method in which the goal is to find a linear representation of nongaussian data so that the components are statistically independent, or as independent as possible. Such a representation seems to capture the essential structure of the data in many applications, including feature extraction and signal separation. In this paper, we present the basic theory and applications of ICA, and our recent work on the subject.
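A small, hedged example of the signal-separation use case using scikit-learn's FastICA (one common ICA implementation, not necessarily the exact algorithm of the paper): two independent toy sources are linearly mixed and then recovered, up to permutation and scaling, from the mixtures alone. The source shapes and mixing matrix are invented for illustration.

```python
# Blind source separation with ICA: recover independent sources from mixtures.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
sources = np.column_stack([np.sin(2 * t),                  # smooth source
                           np.sign(np.sin(5 * t))])        # square-wave source
mixing = np.array([[1.0, 0.5],
                   [0.4, 1.0]])
mixed = sources @ mixing.T                                 # observed signals only

ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(mixed)                       # estimated sources
print(recovered.shape)                                     # (2000, 2)
```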
Industrial Internet of Things (IIoT) Applications of Edge and Fog Computing: A Review and Future Directions With rapid technological advancements within the domain of the Internet of Things (IoT), strong trends have emerged which indicate rapid growth in the number of smart devices connected to IoT networks, and this growth cannot be supported by traditional cloud computing platforms. In response to the increased volume of data being transferred over networks, the edge and fog computing paradigms have emerged as extremely viable frameworks that shift computational and storage resources towards the edge of the network, thereby migrating processing power from centralized cloud servers to distributed LAN resources and powerful embedded devices within the network. These computing paradigms, therefore, have the potential to support massive IoT networks of the future and have also fueled the advancement of IoT systems within industrial settings, leading to the creation of the Industrial Internet of Things (IIoT) technology that is revolutionizing industrial processes in a variety of domains. In this paper, we elaborate on the impact of the edge and fog computing paradigms on IIoT. We also highlight how edge and fog computing are poised to bring about a turnaround in several industrial applications through a use-case approach. Finally, we conclude with the current issues and challenges faced by these paradigms in IIoT and suggest some research directions that should be followed to solve these problems and accelerate the adoption of edge and fog computing in IIoT.
Inference Over Programs That Make Predictions This abstract extends on the previous work (arXiv:1407.2646, arXiv:1606.00075) on program induction using probabilistic programming. It describes possible further steps to extend that work, such that, ultimately, automatic probabilistic program synthesis can generalise over any reasonable set of inputs and outputs, in particular in regard to text, image and video data.
Inferential Methods to Assess the Difference The area under the curve (AUC) is the most common statistical approach to evaluate the discriminatory power of a set of factors in a binary regression model. A nested model framework is used to ascertain whether the AUC increases when new factors enter the model. Two statistical tests are proposed for the difference in the AUC parameters from these nested models. The asymptotic null distributions for the two test statistics are derived from the scenarios: (A) the difference in the AUC parameters is zero and the new factors are not associated with the binary outcome, (B) the difference in the AUC parameters is less than a strictly positive value. A confidence interval for the difference in AUC parameters is developed. Simulations are generated to determine the finite sample operating characteristics of the tests and a pancreatic cancer data example is used to illustrate this approach.
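To make the quantity of interest concrete, the hedged sketch below fits nested logistic models on synthetic data, computes the in-sample AUC difference with scikit-learn, and attaches a naive bootstrap interval. The paper's asymptotic tests and confidence interval construction are not reproduced, and all data and model choices here are illustrative assumptions.

```python
# AUC of a reduced model vs. a full model with an added factor, plus a
# naive bootstrap interval for the difference (illustration only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 1000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = (rng.random(n) < 1 / (1 + np.exp(-(0.8 * x1 + 0.8 * x2)))).astype(int)

X_red = x1[:, None]                          # reduced (nested) model: x1 only
X_full = np.column_stack([x1, x2])           # full model: x1 and the new factor x2
p_red = LogisticRegression().fit(X_red, y).predict_proba(X_red)[:, 1]
p_full = LogisticRegression().fit(X_full, y).predict_proba(X_full)[:, 1]
diff = roc_auc_score(y, p_full) - roc_auc_score(y, p_red)

boot = []
for _ in range(500):                         # resample subjects with replacement
    idx = rng.integers(0, n, n)
    boot.append(roc_auc_score(y[idx], p_full[idx]) - roc_auc_score(y[idx], p_red[idx]))
print(f"AUC difference: {diff:.3f}, 95% CI: {np.percentile(boot, [2.5, 97.5]).round(3)}")
```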
Inferring User Interests in Microblogging Social Networks: A Survey With the popularity of microblogging services such as Twitter in recent years, an increasing number of users use these services in their daily lives. The huge volume of information generated by users raises new opportunities in various applications and areas. Inferring user interests plays a significant role in providing personalized recommendations on microblogging services, and third-party applications providing social logins via these services, especially in cold-start situations. In this survey, we review user modeling strategies with respect to inferring user interests in previous studies. To this end, we focus on four dimensions of inferring user interest profiles: (1) data collection, (2) representation of user interest profiles, (3) construction and enhancement of user interest profiles, and (4) the evaluation of the constructed profiles. Through this survey, we aim to provide an overview of state-of-the-art user modeling strategies for inferring user interest profiles on microblogging social networks with respect to the four dimensions. For each dimension, we review and summarize previous studies based on specified criteria. Finally, we discuss some challenges and opportunities for future work in this research domain.
Information Limits of Aggregate Data This paper uses a small model in the Cowles Commission (CC) tradition to examine the limits of aggregate data. It argues that more can be learned about the macroeconomy following the CC approach than the reduced form and VAR approaches allow, but less than the DSGE approach tries to do.
Information Theory: A Tutorial Introduction Shannon’s mathematical theory of communication defines fundamental limits on how much information can be transmitted between the different components of any man-made or biological system. This paper is an informal but rigorous introduction to the main ideas implicit in Shannon’s theory. An annotated reading list is provided for further reading.
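As a small illustration of the quantities such a tutorial covers, the following sketch computes the entropy of a discrete source and the capacity of a binary symmetric channel (all numbers are toy values):
# Sketch: Shannon entropy of a discrete source and the capacity of a
# binary symmetric channel, two basic quantities from Shannon's theory.
import numpy as np

def entropy(p):
    """Entropy in bits of a discrete distribution p (zero entries are ignored)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def bsc_capacity(eps):
    """Capacity in bits per use of a binary symmetric channel with flip probability eps."""
    return 1.0 - entropy([eps, 1.0 - eps])

print(entropy([0.5, 0.5]))      # 1.0 bit: a fair coin
print(entropy([0.9, 0.1]))      # about 0.469 bits
print(bsc_capacity(0.11))       # about 0.5 bits per channel use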
Information Visualization with Self-Organizing Maps The Self-Organizing Map (SOM) is an unsupervised neural network algorithm that projects high-dimensional data onto a two-dimensional map. The projection preserves the topology of the data so that similar data items will be mapped to nearby locations on the map. Despite the popular use of the algorithm for clustering and information visualisation, a system has been lacking that combines the fast execution of the algorithm with powerful visualisation of the maps and effective tools for their interactive analysis. Powerful methods for interactive exploration and search from collections of free-form textual documents are needed to manage the ever-increasing flood of digital information. In this article we present a method, SOM, for automatic organization of full-text document collections using the self-organizing map (SOM) algorithm. The document collection is ordered onto a map in an unsupervised manner utilizing statistical information of short word contexts. The resulting ordered map where similar documents lie near each other thus presents a general view of the document space. With the aid of a suitable (SVG) interface, documents in interesting areas of the map can be browsed.
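A minimal sketch of the underlying algorithm, assuming random toy vectors instead of documents and a hand-rolled NumPy implementation rather than the system described here:
# Sketch: a minimal online Self-Organizing Map; similar inputs end up at
# nearby cells of the 2-D grid, which is the property used for visualization.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                 # toy "document" vectors
grid_h, grid_w, dim = 8, 8, X.shape[1]
W = rng.normal(size=(grid_h, grid_w, dim))     # codebook vectors on a 2-D grid
coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij"), axis=-1)

n_iter = 3000
for t in range(n_iter):
    x = X[rng.integers(len(X))]
    lr = 0.5 * (1 - t / n_iter) + 0.01          # decaying learning rate
    sigma = 3.0 * (1 - t / n_iter) + 0.5        # decaying neighbourhood radius
    dists = np.linalg.norm(W - x, axis=-1)      # find the best-matching unit
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    grid_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
    h = np.exp(-grid_d2 / (2 * sigma ** 2))     # neighbourhood function on the grid
    W += lr * h[..., None] * (x - W)            # move units towards the input

# map coordinates of the first data point after training
print(np.unravel_index(np.argmin(np.linalg.norm(W - X[0], axis=-1)), (grid_h, grid_w)))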
Informing Artificial Intelligence Generative Techniques using Cognitive Theories of Human Creativity The common view that our creativity is what makes us uniquely human suggests that incorporating research on human creativity into generative deep learning techniques might be a fruitful avenue for making their outputs more compelling and human-like. Using an original synthesis of Deep Dream-based convolutional neural networks and cognitive based computational art rendering systems, we show how honing theory, intrinsic motivation, and the notion of a ‘seed incident’ can be implemented computationally, and demonstrate their impact on the resulting generative art. Conversely, we discuss how explorations in deep learning convolutional neural net generative systems can inform our understanding of human creativity. We conclude with ideas for further cross-fertilization between AI based computational creativity and psychology of creativity.
Infovis and Statistical Graphics: Different Goals, Different Looks The importance of graphical displays in statistical practice has been recognized sporadically in the statistical literature over the past century, with wider awareness following Tukey´s Exploratory Data Analysis (1977) and Tufte´s books in the succeeding decades. But statistical graphics still occupies an awkward in-between position: Within statistics, exploratory and graphical methods represent a minor subfield and are not well integrated with larger themes of modeling and inference. Outside of statistics, infographics (also called information visualization or Infovis) is huge, but their purveyors and enthusiasts appear largely to be uninterested in statistical principles. We present here a set of goals for graphical displays discussed primarily from the statistical point of view and discuss some inherent contradictions in these goals that may be impeding communication between the fields of statistics and Infovis. One of our constructive suggestions, to Infovis practitioners and statisticians alike, is to try not to cram into a single graph what can be better displayed in two or more. We recognize that we offer only one perspective and intend this article to be a starting point for a wide-ranging discussion among graphics designers, statisticians, and users of statistical methods. The purpose of this article is not to criticize but to explore the different goals that lead researchers in different fields to value different aspects of data visualization.
Infrastructure for Usable Machine Learning: The Stanford DAWN Project Despite incredible recent advances in machine learning, building machine learning applications remains prohibitively time-consuming and expensive for all but the best-trained, best-funded engineering organizations. This expense comes not from a need for new and improved statistical models but instead from a lack of systems and tools for supporting end-to-end machine learning application development, from data preparation and labeling to productionization and monitoring. In this document, we outline opportunities for infrastructure supporting usable, end-to-end machine learning applications in the context of the nascent DAWN (Data Analytics for What’s Next) project at Stanford.
Innateness, AlphaZero, and Artificial Intelligence The concept of innateness is rarely discussed in the context of artificial intelligence. When it is discussed, or hinted at, it is often in the context of trying to reduce the amount of innate machinery in a given system. In this paper, I consider as a test case a recent series of papers by Silver et al (Silver et al., 2017a) on AlphaGo and its successors that have been presented as an argument that ‘even in the most challenging of domains: it is possible to train to superhuman level, without human examples or guidance’, ‘starting tabula rasa.’ I argue that these claims are overstated, for multiple reasons. I close by arguing that artificial intelligence needs greater attention to innateness, and I point to some proposals about what that innateness might look like.
InsideBIGDATA Guide to In-Memory Computing In-memory computing (IMC) is an emerging field of importance in the big data industry. It is a quickly evolving technology, seen by many as an effective way to address the proverbial 3 V´s of big data – volume, velocity, and variety. Big data requires ever more powerful means to process and analyze growing stores of data, being collected at more rapid rates, and with increasing diversity in the types of data being sought – both structured and unstructured. In-memory computing´s rapid rise in the marketplace has the big data community on alert. In fact, Gartner picked in-memory computing as one of the Top Ten Strategic Initiatives.
InsideBIGDATA Guide to Predictive Analytics Predictive analytics, sometimes called advanced analytics, is a term used to describe a range of analytical and statistical techniques to predict future actions or behaviors. In business, predictive analytics are used to make proactive decisions and determine actions, by using statistical models to discover patterns in historical and transactional data to uncover likely risks and opportunities. Predictive analytics incorporates a range of activities which we will explore in this paper, including data access, exploratory data analysis and visualization, developing assumptions and data models, applying predictive models, then estimating and/or predicting future outcomes.
Installing R and Optional RStudio R is quickly becoming the statistical software of choice for researchers and analysts in a variety of disciplines. In recent years, it has surpassed many commonly used statistical programs in both number of users and availability of statistical methods. A fundamental difference between R and other statistical software packages is that R is open-source, meaning it is both free for download and the source code is available under the GNU General Public License. Anyone can contribute new techniques or analytical methods, which has been a primary factor enabling the growth of R. These contributions are called ‘packages’. Currently, almost 6000 packages are available for R.
Instance-Level Explanations for Fraud Detection: A Case Study Fraud detection is a difficult problem that can benefit from predictive modeling. However, the verification of a prediction is challenging; for a single insurance policy, the model only provides a prediction score. We present a case study where we reflect on different instance-level model explanation techniques to aid a fraud detection team in their work. To this end, we designed two novel dashboards combining various state-of-the-art explanation techniques. These enable the domain expert to analyze and understand predictions, dramatically speeding up the process of filtering potential fraud cases. Finally, we discuss the lessons learned and outline open research issues.
Integrated Analytics – Platforms and Principles for Centralizing Your Data Companies are collecting more data than ever. But, given how difficult it is to unify the many internal and external data streams they´ve built, more data doesn´t necessarily translate into better analytics. The real challenge is to provide deep and broad access to ‘a single source of truth’ in their data that the typically slow ETL process for data warehousing cannot achieve. More than just fast access, analysts need the ability to explore data at a granular level. In this O´Reilly report, author Courtney Webster presents a roadmap to data centralization that will help your organization make data accessible, flexible, and actionable. Building a genuine data-driven culture depends on your company´s ability to quickly act upon new findings. This report explains how.
Intelligence Quotient and Intelligence Grade of Artificial Intelligence Although artificial intelligence is currently one of the most interesting areas in scientific research, the potential threats posed by emerging AI systems remain a source of persistent controversy. To address the issue of AI threat, this study proposes a standard intelligence model that unifies AI and human characteristics in terms of four aspects of knowledge, i.e., input, output, mastery, and creation. Using this model, we observe three challenges, namely, expanding the von Neumann architecture; testing and ranking the intelligence quotient of naturally and artificially intelligent systems, including humans, Google, Bing, Baidu, and Siri; and finally, dividing artificially intelligent systems into seven grades from robots to Google Brain. Based on this, we conclude that AlphaGo belongs to the third grade.
Intelligent Choice of the Number of Clusters in K-Means Clustering: An Experimental Study with Different Cluster Spreads The issue of determining ‘the right number of clusters’ in K-Means has attracted considerable interest, especially in recent years. Cluster intermix appears to be a factor most affecting the clustering results. This paper proposes an experimental setting for comparing different approaches on data generated from Gaussian clusters with the controlled parameters of between- and within-cluster spread to model cluster intermix. The setting allows for evaluating the centroid recovery on par with conventional evaluation of the cluster recovery. The subjects of our interest are two versions of the ‘intelligent’ K-Means method, ik-Means, that find the ‘right’ number of clusters by extracting ‘anomalous patterns’ from the data one-by-one. We compare them with seven other methods, including Hartigan’s rule, averaged Silhouette width and Gap statistic, under different between- and within-cluster spread-shape conditions. There are several consistent patterns in the results of our experiments, such as that the right K is reproduced best by Hartigan’s rule – but not clusters or their centroids. This leads us to propose an adjusted version of iK-Means, which performs well in the current experiment setting.
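A minimal sketch of the kind of comparison described here, assuming synthetic Gaussian clusters from scikit-learn and two of the rules mentioned above, Hartigan's rule and the average silhouette width (the paper's full experimental design is not reproduced):
# Sketch: choosing K on synthetic Gaussian clusters with two selection rules.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=600, centers=4, cluster_std=1.2, random_state=0)
n = len(X)
ks = range(1, 11)
W = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks}

# Hartigan's rule: H(k) = (W_k / W_{k+1} - 1)(n - k - 1); pick the smallest k
# with H(k) <= 10 (on large samples H(k) may never drop below 10, hence the default).
hartigan = {k: (W[k] / W[k + 1] - 1.0) * (n - k - 1) for k in range(1, 10)}
k_hartigan = min((k for k, h in hartigan.items() if h <= 10), default=max(ks))

# Average silhouette width: pick the k that maximizes it (needs k >= 2).
sil = {k: silhouette_score(X, KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X))
       for k in range(2, 11)}
k_sil = max(sil, key=sil.get)

print("Hartigan's rule:", k_hartigan, " silhouette:", k_sil)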
Intelligent Data Analysis (Slide Deck)
Interaction Mining: The New Frontier of Call Center Analytics In this paper, we present our solution for pragmatic analysis of call center conversations in order to provide useful insights for enhancing Call Center Analytics to a level that will enable new metrics and key performance indicators (KPIs) beyond the standard approach. These metrics rely on understanding the dynamics of conversations by highlighting the way participants discuss topics. By doing so, we can detect situations that are simply impossible to detect with standard approaches, such as controversial topics and customer-oriented behaviors, and also predict customer ratings.
Interactive Text Ranking with Bayesian Optimisation: A Case Study on Community QA and Summarisation For many NLP applications, such as question answering and summarisation, the goal is to select the best solution from a large space of candidates to meet a particular user’s needs. To address the lack of user-specific training data, we propose an interactive text ranking approach that actively selects pairs of candidates, from which the user selects the best. Unlike previous strategies, which attempt to learn a ranking across the whole candidate space, our method employs Bayesian optimisation to focus the user’s labelling effort on high quality candidates and integrates prior knowledge in a Bayesian manner to cope better with small data scenarios. We apply our method to community question answering (cQA) and extractive summarisation, finding that it significantly outperforms existing interactive approaches. We also show that the ranking function learned by our method is an effective reward function for reinforcement learning, which improves the state of the art for interactive summarisation.
Interactive Visual Data Exploration with Subjective Feedback: An Information-Theoretic Approach Visual exploration of high-dimensional real-valued datasets is a fundamental task in exploratory data analysis (EDA). Existing methods use predefined criteria to choose the representation of data. There is a lack of methods that (i) elicit from the user what she has learned from the data and (ii) show patterns that she does not know yet. We construct a theoretical model where identified patterns can be input as knowledge to the system. The knowledge syntax here is intuitive, such as ‘this set of points forms a cluster’, and requires no knowledge of maths. This background knowledge is used to find a Maximum Entropy distribution of the data, after which the system provides the user data projections in which the data and the Maximum Entropy distribution differ the most, hence showing the user aspects of the data that are maximally informative given the user’s current knowledge. We provide an open source EDA system with tailored interactive visualizations to demonstrate these concepts. We study the performance of the system and present use cases on both synthetic and real data. We find that the model and the prototype system allow the user to learn information efficiently from various data sources and the system works sufficiently fast in practice. We conclude that the information theoretic approach to exploratory data analysis where patterns observed by a user are formalized as constraints provides a principled, intuitive, and efficient basis for constructing an EDA system.
Interactive Web Apps with shiny (Cheat Sheet)
Intercomparison of Machine Learning Methods for Statistical Downscaling: The Case of Daily and Extreme Precipitation Statistical downscaling of global climate models (GCMs) allows researchers to study local climate change effects decades into the future. A wide range of statistical models have been applied to downscaling GCMs but recent advances in machine learning have not been explored. In this paper, we compare four fundamental statistical methods, Bias Correction Spatial Disaggregation (BCSD), Ordinary Least Squares, Elastic-Net, and Support Vector Machine, with three more advanced machine learning methods, Multi-task Sparse Structure Learning (MSSL), BCSD coupled with MSSL, and Convolutional Neural Networks to downscale daily precipitation in the Northeast United States. Metrics evaluating each method’s ability to capture daily anomalies, large-scale climate shifts, and extremes are analyzed. We find that linear methods, led by BCSD, consistently outperform non-linear approaches. The direct application of state-of-the-art machine learning methods to statistical downscaling does not provide improvements over simpler, longstanding approaches.
Internet of NanoThings: Concepts and Applications This chapter focuses on Internet of Things from the nanoscale point of view. The chapter starts with section 1 which provides an introduction of nanothings and nanotechnologies. The nanoscale communication paradigms and the different approaches are discussed for nanodevices development. Nanodevice characteristics are discussed and the architecture of wireless nanodevices is outlined. Section 2 describes the Internet of NanoThings (IoNT), its network architecture, and the challenges of nanoscale communication which is essential for enabling IoNT. Section 3 gives some practical applications of IoNT. The Internet of Bio-NanoThings (IoBNT) and relevant biomedical applications are discussed. Other applications such as military, industrial, and environmental applications are also outlined.
Internet of Things: An Overview As technology advances and the number of smart devices continues to grow substantially, the need for ubiquitous, context-aware platforms that support interconnected, heterogeneous, and distributed networks of devices has given rise to what is referred to today as the Internet of Things. However, paving the path for achieving the aforementioned objectives and making the IoT paradigm more tangible requires integration and convergence of different knowledge and research domains, covering aspects from identification and communication to resource discovery and service integration. Through this chapter, we aim to highlight research on topics including proposed architectures, security and privacy, network communication means and protocols, and eventually conclude by providing future directions and open challenges facing the IoT development.
Interpretable Convolutional Neural Networks This paper proposes a method to modify traditional convolutional neural networks (CNNs) into interpretable CNNs, in order to clarify knowledge representations in high conv-layers of CNNs. In an interpretable CNN, each filter in a high conv-layer represents a certain object part. We do not need any annotations of object parts or textures to supervise the learning process. Instead, the interpretable CNN automatically assigns each filter in a high conv-layer with an object part during the learning process. Our method can be applied to different types of CNNs with different structures. The clear knowledge representation in an interpretable CNN can help people understand the logics inside a CNN, i.e., based on which patterns the CNN makes the decision. Experiments showed that filters in an interpretable CNN were more semantically meaningful than those in traditional CNNs.
Interpretation of Neural Networks is Fragile In order for machine learning to be deployed and trusted in many applications, it is crucial to be able to reliably explain why the machine learning algorithm makes certain predictions. For example, if an algorithm classifies a given pathology image to be a malignant tumor, then the doctor may need to know which parts of the image led the algorithm to this classification. How to interpret black-box predictors is thus an important and active area of research. A fundamental question is: how much can we trust the interpretation itself? In this paper, we show that interpretation of deep learning predictions is extremely fragile in the following sense: two perceptively indistinguishable inputs with the same predicted label can be assigned very different interpretations. We systematically characterize the fragility of several widely-used feature-importance interpretation methods (saliency maps, relevance propagation, and DeepLIFT) on ImageNet and CIFAR-10. Our experiments show that even small random perturbations can change the feature importance, and new systematic perturbations can lead to dramatically different interpretations without changing the label. We extend these results to show that interpretations based on exemplars (e.g. influence functions) are similarly fragile. Our analysis of the geometry of the Hessian matrix gives insight on why fragility could be a fundamental challenge to the current interpretation approaches.
Interpreting Blackbox Models via Model Extraction Interpretability has become an important issue as machine learning is increasingly used to inform consequential decisions. We propose an approach for interpreting a blackbox model by extracting a decision tree that approximates the model. Our model extraction algorithm avoids overfitting by leveraging blackbox model access to actively sample new training points. We prove that as the number of samples goes to infinity, the decision tree learned using our algorithm converges to the exact greedy decision tree. In our evaluation, we use our algorithm to interpret random forests and neural nets trained on several datasets from the UCI Machine Learning Repository, as well as control policies learned for three classical reinforcement learning problems. We show that our algorithm improves over a baseline based on CART on every problem instance. Furthermore, we show how an interpretation generated by our approach can be used to understand and debug these models.
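A minimal sketch of the general idea, assuming a scikit-learn random forest as the blackbox and plain uniform sampling in place of the paper's active sampling procedure:
# Sketch: approximate a blackbox model with a decision tree, using extra points
# labelled by the blackbox itself (uniform sampling stands in for active sampling).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

blackbox = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Sample new inputs inside the bounding box of the training data, label them
# with the blackbox, then fit an interpretable surrogate tree to the labels.
rng = np.random.default_rng(0)
lo, hi = X_tr.min(axis=0), X_tr.max(axis=0)
X_extra = rng.uniform(lo, hi, size=(5000, X_tr.shape[1]))
X_surr = np.vstack([X_tr, X_extra])
y_surr = blackbox.predict(X_surr)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_surr, y_surr)

print("blackbox accuracy:", accuracy_score(y_te, blackbox.predict(X_te)))
print("surrogate fidelity:", accuracy_score(blackbox.predict(X_te), tree.predict(X_te)))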
Interpreting Deep Learning: The Machine Learning Rorschach Test Theoretical understanding of deep learning is one of the most important tasks facing the statistics and machine learning communities. While deep neural networks (DNNs) originated as engineering methods and models of biological networks in neuroscience and psychology, they have quickly become a centerpiece of the machine learning toolbox. Unfortunately, DNN adoption, powered by recent successes and the open-source nature of the machine learning community, has outpaced our theoretical understanding. We cannot reliably identify when and why DNNs will make mistakes. While in some applications like text translation these mistakes may be comical and provide fun fodder for research talks, a single error can be very costly in tasks like medical imaging. As we utilize DNNs in increasingly sensitive applications, a better understanding of their properties is thus imperative. Recent advances in DNN theory are numerous and include many different sources of intuition, such as learning theory, sparse signal analysis, physics, chemistry, and psychology. An interesting pattern begins to emerge in the breadth of possible interpretations. The seemingly limitless approaches are mostly constrained by the lens with which the mathematical operations are viewed. Ultimately, the interpretation of DNNs appears to mimic a type of Rorschach test: a psychological test wherein subjects interpret a series of seemingly ambiguous ink-blots. Validation for DNN theory requires a convergence of the literature. We must distinguish between universal results that are invariant to the analysis perspective and those that are specific to a particular network configuration. Simultaneously we must deal with the fact that many standard statistical tools for quantifying generalization or empirically assessing important network features are difficult to apply to DNNs.
Interpreting the Ising Model: The Input Matters The Ising model is a widely used model for multivariate binary data. It was first introduced as a theoretical model for the alignment between positive (+1) and negative (-1) atom spins, but is now estimated from data in many applications. A popular way to estimate the Ising model is the pseudo-likelihood approach which reduces estimation to a sequence of logistic regression problems. However, the latter is defined on the domain $\{0,1\}$. In this paper we investigate the subtleties of using $\{0,1\}$ instead of $\{-1, 1\}$ as the domain for the Ising model. We show that the two domains are statistically equivalent, but imply different interpretations of both threshold and interaction parameters and discuss in which situation which domain is more appropriate. Next, we show that the qualitative behavior of the dynamic Ising model depends on the choice of domains. Finally, we present a transformation that allows one to obtain the parameters in one domain from the parameters in the other domain.
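A minimal worked mapping between the two domains, assuming the usual parameterization $P(s) \propto \exp\big(\sum_i \tau_i s_i + \sum_{i<j} \omega_{ij} s_i s_j\big)$ for the $\{-1,1\}$ model (the paper's own notation may differ): substituting $s_i = 2x_i - 1$ with $x_i \in \{0,1\}$ gives $\tau_i s_i = 2\tau_i x_i - \tau_i$ and $\omega_{ij} s_i s_j = 4\omega_{ij} x_i x_j - 2\omega_{ij} x_i - 2\omega_{ij} x_j + \omega_{ij}$, so, up to an additive constant, $P(x) \propto \exp\big(\sum_i \alpha_i x_i + \sum_{i<j} \beta_{ij} x_i x_j\big)$ with $\beta_{ij} = 4\omega_{ij}$ and $\alpha_i = 2\tau_i - 2\sum_{j \neq i} \omega_{ij}$. The two models assign the same probabilities to the same configurations, yet the threshold and interaction parameters take different values, which is precisely the interpretational subtlety examined here.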
Introducing Connection Analytics Connection Analytics provides a new way of looking at people, products, physical phenomena, or events. It provides insights by dissecting the types of relationships between entities to determine causation and can be used for generating predictive intelligence based on the patterns of interactions. Connection Analytics can address queries such as identifying influencers, the groups that they influence, and where promotions or other forms of marketing are best directed. It can be utilized for product affinity analysis by taking a bottom up look at how the decisions to buy different items are linked. Likewise, this approach can help analyze networks by patterns of activity, and fraud and money laundering through the actions (rather than identities) of involved actors. It can help segment customers based on behavior patterns like past purchase behavior or reviews vs. traditional segmentation techniques like income and demographics. Graph analytics is one of the most promising approaches to performing Connection Analytics. Teradata is the first analytics data platform provider to make graph computing accessible to the existing base of data scientists, database developers and business analysts by introducing a SQL-friendly approach. Underneath the hood, Teradata Aster is using a compute approach that allows the data to leverage the power and performance of massively parallel analytic processing engines and pre-built algorithms.
Introducing R R is a powerful environment for statistical computing which runs on several platforms. These notes are written especially for users running the Windows version, but most of the material applies to the Mac and Linux versions as well.
Introduction to Boosted Trees (Slide Deck)
Introduction to Convolutional Neural Networks This is a note that describes how a Convolutional Neural Network (CNN) operates from a mathematical perspective. This note is self-contained, and the focus is to make it comprehensible to beginners in the CNN field. The Convolutional Neural Network (CNN) has shown excellent performance in many computer vision and machine learning problems. Many solid papers have been published on this topic, and quite a few high-quality open-source CNN software packages have been made available. There are also well-written CNN tutorials and CNN software manuals. However, I believe that an introductory CNN material specifically prepared for beginners is still needed. Research papers are usually very terse and lack details. It might be difficult for beginners to read such papers. A tutorial targeting experienced researchers may not cover all the necessary details to understand how a CNN runs.
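A minimal sketch of the core operation such a note builds on, a 'valid' 2-D cross-correlation of one input channel with one filter, written directly in NumPy (real conv-layers add a bias, a nonlinearity, and many channels in parallel):
# Sketch: the basic operation of a convolutional layer on a single channel.
import numpy as np

def conv2d_valid(image, kernel):
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # inner product of the filter with one image patch
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_filter = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])   # a simple vertical-edge detector
print(conv2d_valid(image, edge_filter))   # a 3x3 feature map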
Introduction to Machine Learning and Soft Computing • Introduction • Single-layer Neural Networks • Linear Classification • Linear Regression • Kernel • Multi-layer Neural Networks • Nonlinear Classification • Nonlinear Regression • Model Selection • GA-based Frameworks • PSO-based Frameworks • Conclusion • Epilogue
Introduction to Markov Random Fields This book sets out to demonstrate the power of the Markov random field (MRF) in vision. It treats the MRF both as a tool for modeling image data and, coupled with a set of recently developed algorithms, as a means of making inferences about images. The inferences concern underlying image and scene structure to solve problems such as image reconstruction, image segmentation, 3D vision, and object labeling. This chapter is designed to present some of the main concepts used in MRFs, both as a taster and as a gateway to the more detailed chapters that follow, as well as a stand-alone introduction to MRFs.
Introduction to Multi-Armed Bandits Multi-armed bandits are a simple but very powerful framework for algorithms that make decisions over time under uncertainty. An enormous body of work has accumulated over the years, covered in several books and surveys. This book provides a more introductory, textbook-like treatment of the subject. Each chapter tackles a particular line of work, providing a self-contained, teachable technical introduction and a review of the more advanced results. The chapters are as follows: Stochastic bandits; Lower bounds; Bayesian Bandits and Thompson Sampling; Lipschitz Bandits; Full Feedback and Adversarial Costs; Adversarial Bandits; Linear Costs and Semi-bandits; Contextual Bandits; Bandits and Zero-Sum Games; Bandits with Knapsacks; Incentivized Exploration and Connections to Mechanism Design. Status of the manuscript: essentially complete (modulo some polishing), except for the last chapter, which the author plans to add over the next few months.
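A minimal sketch of one of the basic algorithms treated in the early chapters, UCB1 on Bernoulli arms (the arm means and horizon below are toy values):
# Sketch: UCB1 for stochastic bandits; pull the arm with the highest
# empirical mean plus an exploration bonus, after playing each arm once.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])    # unknown to the algorithm
K, T = len(true_means), 5000
counts = np.zeros(K)
sums = np.zeros(K)

for t in range(1, T + 1):
    if t <= K:
        arm = t - 1                        # play every arm once first
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    reward = rng.binomial(1, true_means[arm])
    counts[arm] += 1
    sums[arm] += reward

regret = T * true_means.max() - sums.sum()
print("pulls per arm:", counts, " cumulative regret:", regret)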
Introduction to Network Theory (and Graph Theory) (Slide Deck)
Introduction to Neural Networks In this lab we are going to have a look at some very basic neural networks on a new data set which relates various covariates about cheese samples to a taste response.
Introduction to Nonnegative Matrix Factorization In this paper, we introduce and provide a short overview of nonnegative matrix factorization (NMF). Several aspects of NMF are discussed, namely, the application in hyperspectral imaging, geometry and uniqueness of NMF solutions, complexity, algorithms, and its link with extended formulations of polyhedra. In order to put NMF into perspective, the more general problem class of constrained low-rank matrix approximation problems is first briefly introduced.
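A minimal sketch of the most common baseline algorithm for the problem, the Lee-Seung multiplicative updates for the Frobenius-norm objective, on a random nonnegative matrix (not the hyperspectral application discussed here):
# Sketch: NMF X ~ W H with W, H >= 0 via multiplicative updates.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((50, 40))          # nonnegative data matrix
r = 5                             # factorization rank
W = rng.random((50, r))
H = rng.random((r, 40))
eps = 1e-9                        # avoid division by zero

for _ in range(500):
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)

print("relative error:", np.linalg.norm(X - W @ H) / np.linalg.norm(X))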
Introduction to Probabilistic Topic Models Probabilistic topic models are a suite of algorithms whose aim is to discover the hidden thematic structure in large archives of documents. In this article, we review the main ideas of this field, survey the current state-of-the-art, and describe some promising future directions. We first describe latent Dirichlet allocation (LDA) [8], which is the simplest kind of topic model. We discuss its connections to probabilistic modeling, and describe two kinds of algorithms for topic discovery. We then survey the growing body of research that extends and applies topic models in interesting ways. These extensions have been developed by relaxing some of the statistical assumptions of LDA, incorporating meta-data into the analysis of the documents, and using similar kinds of models on a diversity of data types such as social networks, images and genetics. Finally, we give our thoughts as to some of the important unexplored directions for topic modeling. These include rigorous methods for checking models built for data exploration, new approaches to visualizing text and other high dimensional data, and moving beyond traditional information engineering applications towards using topic models for more scientific ends.
Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R In recent years, data streams have become an increasingly important area of research for the computer science, database and statistics communities. Data streams are ordered and potentially unbounded sequences of data points created by a typically non-stationary data generating process. Common data mining tasks associated with data streams include clustering, classification and frequent pattern mining. New algorithms for these types of data are proposed regularly and it is important to evaluate them thoroughly under standardized conditions. In this paper we introduce stream, a research tool that includes modeling and simulating data streams as well as an extensible framework for implementing, interfacing and experimenting with algorithms for various data stream mining tasks. The main advantage of stream is that it seamlessly integrates with the large existing infrastructure provided by R. In addition to data handling, plotting and easy scripting capabilities, R also provides many existing algorithms and enables users to interface code written in many programming languages popular among data mining researchers (e.g., C/C++, Java and Python). In this paper we describe the architecture of stream and focus on its use for data stream clustering research. stream was implemented with extensibility in mind and will be extended in the future to cover additional data stream mining tasks like classification and frequent pattern mining.
Introduction to Tensor Decompositions and their Applications in Machine Learning Tensors are multidimensional arrays of numerical values and therefore generalize matrices to multiple dimensions. While tensors first emerged in the psychometrics community in the $20^{\text{th}}$ century, they have since then spread to numerous other disciplines, including machine learning. Tensors and their decompositions are especially beneficial in unsupervised learning settings, but are gaining popularity in other sub-disciplines like temporal and multi-relational data analysis, too. The scope of this paper is to give a broad overview of tensors, their decompositions, and how they are used in machine learning. As part of this, we are going to introduce basic tensor concepts, discuss why tensors can be considered more rigid than matrices with respect to the uniqueness of their decomposition, explain the most important factorization algorithms and their properties, provide concrete examples of tensor decomposition applications in machine learning, conduct a case study on tensor-based estimation of mixture models, talk about the current state of research, and provide references to available software libraries.
Introduction to YARN Apache Hadoop 2.0 includes YARN, which separates the resource management and processing components. The YARN-based architecture is not constrained to MapReduce. This article describes YARN and its advantages over the previous distributed processing layer in Hadoop. Learn how to enhance your clusters with YARN’s scalability, efficiency, and flexibility.
Introduction: Credibility, Models, and Parameters The goal of this chapter is to introduce the conceptual framework of Bayesian data analysis. Bayesian data analysis has two foundational ideas. The first idea is that Bayesian inference is reallocation of credibility across possibilities. The second foundational idea is that the possibilities, over which we allocate credibility, are parameter values in meaningful mathematical models. These two fundamental ideas form the conceptual foundation for every analysis in this book. Simple examples of these ideas are presented in this chapter. The rest of the book merely fills in the mathematical and computational details for specific applications of these two ideas. This chapter also explains the basic procedural steps shared by every Bayesian analysis.
IoT Technologies for Augmented Human: a Survey Internet of Things (IoT) technology has delivered new enablers for improving human abilities. These enablers promise an enhanced quality of life and professional efficiency; however, the synthesis of IoT and human augmentation technologies has also extended IoT-related challenges far beyond the current scope. These potential challenges associated with IoT-empowered Augmented Human (AH) have so far not been well-investigated. Thus, this article attempts to introduce readers to the AH concept as well as summarize notable research challenges raised by such systems, in order to facilitate the reader’s further interest in this topic. The article considers emerging IoT applications for human augmentation, devices and design principles, connectivity demands, and security aspects.
Is Epicurus the father of Reinforcement Learning? The Epicurean Philosophy is commonly thought of as simplistic and hedonistic. Here I discuss how this is a misconception and explore its link to Reinforcement Learning. Based on the letters of Epicurus, I construct an objective function for hedonism which turns out to be equivalent to the Reinforcement Learning objective function when omitting the discount factor. I then discuss how Plato’s and Aristotle’s views can also be loosely linked to Reinforcement Learning, as well as their weaknesses in relation to it. Finally, I emphasise the close affinity of the Epicurean views and the Bellman equation.
Is ‘Unsupervised Learning’ a Misconceived Term? Is all of machine learning supervised to some degree? The field of machine learning has traditionally been categorized pedagogically into $supervised~vs~unsupervised~learning$; where supervised learning has typically referred to learning from labeled data, while unsupervised learning has typically referred to learning from unlabeled data. In this paper, we assert that all machine learning is in fact supervised to some degree, and that the scope of supervision is necessarily commensurate to the scope of learning potential. In particular, we argue that clustering algorithms such as k-means, and dimensionality reduction algorithms such as principal component analysis, variational autoencoders, and deep belief networks are each internally supervised by the data itself to learn their respective representations of its features. Furthermore, these algorithms are not capable of external inference until their respective outputs (clusters, principal components, or representation codes) have been identified and externally labeled in effect. As such, they do not suffice as examples of unsupervised learning. We propose that the categorization `supervised vs unsupervised learning’ be dispensed with, and instead, learning algorithms be categorized as either $internally~or~externally~supervised$ (or both). We believe this change in perspective will yield new fundamental insights into the structure and character of data and of learning algorithms.
It Takes Two to Tango: Towards Theory of AI’s Mind Theory of Mind is the ability to attribute mental states (beliefs, intents, knowledge, perspectives, etc.) to others and recognize that these mental states may differ from one’s own. Theory of Mind is critical to effective communication and to teams demonstrating higher collective performance. To effectively leverage the progress in Artificial Intelligence (AI) to make our lives more productive, it is important for humans and AI to work well together in a team. Traditionally, there has been much emphasis on research to make AI more accurate, and (to a lesser extent) on having it better understand human intentions, tendencies, beliefs, and contexts. The latter involves making AI more human-like and having it develop a theory of our minds. In this work, we argue that for human-AI teams to be effective, humans must also develop a theory of AI’s mind – get to know its strengths, weaknesses, beliefs, and quirks. We instantiate these ideas within the domain of Visual Question Answering (VQA). We find that using just a few examples (50), lay people can be trained to better predict responses and oncoming failures of a complex VQA model. Surprisingly, we find that having access to the model’s internal states – its confidence in its top-k predictions, explicit or implicit attention maps which highlight regions in the image (and words in the question) the model is looking at (and listening to) while answering a question about an image – does not help people better predict its behavior.

J

JAGS: A Program for Analysis of Bayesian Graphical Models Using Gibbs Sampling JAGS is a program for Bayesian Graphical modelling which aims for compatibility with classic BUGS. The program could eventually be developed as an R package. This article explains the motivations for this program, briefly describes the architecture and then discusses some ideas for a vectorized form of the BUGS language.
Judge the Judges: A Large-Scale Evaluation Study of Neural Language Models for Online Review Generation Recent advances in deep learning have resulted in a resurgence in the popularity of natural language generation (NLG). Many deep learning based models, including recurrent neural networks and generative adversarial networks, have been proposed and applied to generating various types of text. Despite the fast development of methods, how to better evaluate the quality of these natural language generators remains a significant challenge. We conduct an in-depth empirical study to evaluate the existing evaluation methods for natural language generation. We compare human-based evaluators with a variety of automated evaluation procedures, including discriminative evaluators that measure how well the generated text can be distinguished from human-written text, as well as text overlap metrics that measure how similar the generated text is to human-written references. We measure to what extent these different evaluators agree on the ranking of a dozen state-of-the-art generators for online product reviews. We find that human evaluators do not correlate well with discriminative evaluators, leaving a bigger question of whether adversarial accuracy is the correct objective for natural language generation. In general, distinguishing machine-generated text is a challenging task even for human evaluators, and their decisions tend to correlate better with text overlap metrics. We also find that diversity is an intriguing metric that is indicative of the assessments of different evaluators.
Julia for R Programmers (Slide Deck)

K

Kernel clustering: Breiman’s bias and solutions Clustering is widely used in data analysis where kernel methods are particularly popular due to their generality and discriminating power. However, kernel clustering has a practically significant bias to small dense clusters, e.g. empirically observed in (Shi and Malik, TPAMI’00). Its causes have never been analyzed and understood theoretically, even though many attempts were made to improve the results. We provide conditions and formally prove this bias in kernel clustering. Moreover, we show a general class of locally adaptive kernels directly addressing these conditions. Previously, (Breiman, ML’96) proved a bias to histogram mode isolation in discrete Gini criterion for decision tree learning. We found that kernel clustering reduces to a continuous generalization of Gini criterion for a common class of kernels where we prove a bias to density mode isolation and call it Breiman’s bias. These theoretical findings suggest that a principal solution for the bias should directly address data density inhomogeneity. In particular, our density law shows how density equalization can be done implicitly using certain locally adaptive geodesic kernels. Interestingly, a popular heuristic kernel in (Zelnik-Manor and Perona, NIPS’04) approximates a special case of our Riemannian kernel framework. Our general ideas are relevant to any algorithms for kernel clustering. We show many synthetic and real data experiments illustrating Breiman’s bias and its solution. We anticipate that theoretical understanding of kernel clustering limitations and their principled solutions will be important for a broad spectrum of data analysis applications in diverse disciplines.
Kernel Density Estimation with Ripley’s Circumferential Correction In this paper, we investigate (and extend) Ripley’s circumference method to correct the bias of density estimation at edges (or frontiers) of regions. The idea of the method was theoretical and difficult to implement. We provide a simple technique based on properties of Gaussian kernels to efficiently compute weights to correct border bias on frontiers of the region of interest, with an automatic selection of an optimal radius for the method. We illustrate the use of that technique to visualize hot spots of car accidents and campsite locations, as well as locations of bike thefts.
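A minimal one-dimensional sketch of the renormalization idea, reweighting each Gaussian kernel by the mass it places inside the observation window; the paper's two-dimensional construction and automatic radius selection are not reproduced:
# Sketch: boundary-corrected kernel density estimation on a bounded interval.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
a, b = 0.0, 1.0
x = rng.uniform(a, b, size=500)           # data supported on [a, b]
h = 0.05                                  # bandwidth
grid = np.linspace(a, b, 200)

def kde(grid, x, h, correct=False):
    k = norm.pdf((grid[:, None] - x[None, :]) / h) / h     # kernel matrix
    if correct:
        # renormalize each kernel so that its mass on [a, b] equals one
        inside = norm.cdf((b - x) / h) - norm.cdf((a - x) / h)
        k = k / inside[None, :]
    return k.mean(axis=1)

print("uncorrected density at the left edge:", kde(grid, x, h)[0])
print("corrected density at the left edge:  ", kde(grid, x, h, correct=True)[0])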
Kernel Mean Embedding of Distributions: A Review and Beyond A Hilbert space embedding of distributions (in short, kernel mean embedding) has recently emerged as a powerful machinery for probabilistic modeling, statistical inference, machine learning, and causal discovery. The basic idea behind this framework is to map distributions into a reproducing kernel Hilbert space (RKHS) in which the whole arsenal of kernel methods can be extended to probability measures. It gave rise to a great deal of research and novel applications of positive definite kernels. The goal of this survey is to give a comprehensive review of existing works and recent advances in this research area, and to discuss some of the most challenging issues and open problems that could potentially lead to new research directions. The survey begins with a brief introduction to the RKHS and positive definite kernels which forms the backbone of this survey, followed by a thorough discussion of the Hilbert space embedding of marginal distributions, theoretical guarantees, and review of its applications. The embedding of distributions enables us to apply RKHS methods to probability measures which prompts a wide range of applications such as kernel two-sample testing, independence testing, group anomaly detection, and learning on distributional data. Next, we discuss the Hilbert space embedding for conditional distributions, give theoretical insights, and review some applications. The conditional mean embedding enables us to perform sum, product, and Bayes’ rules, which are ubiquitous in graphical models, probabilistic inference, and reinforcement learning, in a non-parametric way using the new representation of distributions in RKHS. We then discuss relationships between this framework and other related areas. Lastly, we give some suggestions on future research directions.
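A minimal sketch of one application mentioned above, kernel two-sample testing: the (biased) Maximum Mean Discrepancy compares the empirical mean embeddings of two samples under an RBF kernel (all data here are synthetic):
# Sketch: squared MMD between two samples via their kernel mean embeddings.
import numpy as np

def rbf(A, B, gamma=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def mmd2_biased(X, Y, gamma=1.0):
    # squared RKHS distance between the empirical mean embeddings of X and Y
    return rbf(X, X, gamma).mean() + rbf(Y, Y, gamma).mean() - 2 * rbf(X, Y, gamma).mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(300, 2))
Y_same = rng.normal(0.0, 1.0, size=(300, 2))
Y_shift = rng.normal(0.5, 1.0, size=(300, 2))
print("same distribution:   ", mmd2_biased(X, Y_same))    # close to zero
print("shifted distribution:", mmd2_biased(X, Y_shift))   # clearly larger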
Keynote: Small Neural Nets Are Beautiful: Enabling Embedded Systems with Small Deep-Neural-Network Architectures Over the last five years Deep Neural Nets have offered more accurate solutions to many problems in speech recognition, and computer vision, and these solutions have surpassed a threshold of acceptability for many applications. As a result, Deep Neural Networks have supplanted other approaches to solving problems in these areas, and enabled many new applications. While the design of Deep Neural Nets is still something of an art form, in our work we have found basic principles of design space exploration used to develop embedded microprocessor architectures to be highly applicable to the design of Deep Neural Net architectures. In particular, we have used these design principles to create a novel Deep Neural Net called SqueezeNet that requires as little as 480KB of storage for its model parameters. We have further integrated all these experiences to develop something of a playbook for creating small Deep Neural Nets for embedded systems.
Know-Evolve: Deep Reasoning in Temporal Knowledge Graphs Knowledge Graphs are important tools to model multi-relational data that serves as an information pool for various applications. Traditionally, these graphs are considered to be static in nature. However, recent availability of large scale event-based interaction data has given rise to dynamically evolving knowledge graphs that contain temporal information for each edge. Reasoning over time in such graphs is not yet well understood. In this paper, we present a novel deep evolutionary knowledge network architecture to learn entity embeddings that can dynamically and non-linearly evolve over time. We further propose a multivariate point process framework to model the occurrence of a fact (edge) in continuous time. To facilitate temporal reasoning, the learned embeddings are used to compute a relationship score that further parametrizes the intensity function of the point process. We demonstrate improved performance over various existing relational learning models on two large scale real-world datasets. Further, our method effectively predicts occurrence or recurrence time of a fact, which is novel compared to any prior reasoning approaches in the multi-relational setting.
Knowledge Fusion via Embeddings from Text, Knowledge Graphs, and Images We present a baseline approach for cross-modal knowledge fusion. Different basic fusion methods are evaluated on existing embedding approaches to show the potential of joining knowledge about certain concepts across modalities in a fused concept representation.
Knowledge Representation Learning: A Quantitative Review Knowledge representation learning (KRL) aims to represent entities and relations in a knowledge graph in a low-dimensional semantic space; such representations have been widely used in massive knowledge-driven tasks. In this article, we introduce the reader to the motivations for KRL, and overview existing approaches for KRL. Afterwards, we conduct an extensive quantitative comparison and analysis of several typical KRL methods on three evaluation tasks of knowledge acquisition including knowledge graph completion, triple classification, and relation extraction. We also review the real-world applications of KRL, such as language modeling, question answering, information retrieval, and recommender systems. Finally, we discuss the remaining challenges and outline future directions for KRL. The codes and datasets used in the experiments can be found in https://…/OpenKE.
Knowledge Transfer Between Artificial Intelligence Systems We consider the fundamental question: how a legacy ‘student’ Artificial Intelligent (AI) system could learn from a legacy ‘teacher’ AI system or a human expert without complete re-training and, most importantly, without requiring significant computational resources. Here ‘learning’ is understood as an ability of one system to mimic responses of the other and vice-versa. We call such learning an Artificial Intelligence knowledge transfer. We show that if internal variables of the ‘student’ Artificial Intelligent system have the structure of an $n$-dimensional topological vector space and $n$ is sufficiently high then, with probability close to one, the required knowledge transfer can be implemented by simple cascades of linear functionals. In particular, for $n$ sufficiently large, with probability close to one, the ‘student’ system can successfully and non-iteratively learn $k\ll n$ new examples from the ‘teacher’ (or correct the same number of mistakes) at the cost of two additional inner products. The concept is illustrated with an example of knowledge transfer from a pre-trained convolutional neural network to a simple linear classifier with HOG features.
Knowledge Transfer for Out-of-Knowledge-Base Entities: A Graph Neural Network Approach Knowledge base completion (KBC) aims to predict missing information in a knowledge base. In this paper, we address the out-of-knowledge-base (OOKB) entity problem in KBC: how to answer queries concerning test entities not observed at training time. Existing embedding-based KBC models assume that all test entities are available at training time, making it unclear how to obtain embeddings for new entities without costly retraining. To solve the OOKB entity problem without retraining, we use graph neural networks (Graph-NNs) to compute the embeddings of OOKB entities, exploiting the limited auxiliary knowledge provided at test time. The experimental results show the effectiveness of our proposed model in the OOKB setting. Additionally, in the standard KBC setting in which OOKB entities are not involved, our model achieves state-of-the-art performance on the WordNet dataset. The code and dataset are available at https://…/GNN-for-OOKB. This paper has been accepted by IJCAI17.
Knowledge-Driven Wireless Networks with Artificial Intelligence: Design, Challenges and Opportunities This paper discusses technology challenges and opportunities to embrace the artificial intelligence (AI) era in the design of wireless networks. We aim to provide readers with motivation and general methodology for adoption of AI in the context of next-generation networks. First, we discuss the rise of network intelligence and then introduce a brief overview of AI with machine learning (ML) and their relationship to self-organization designs. Finally, we discuss the design of an intelligent agent and its functions to enable knowledge-driven wireless networks with AI.

L

L2 Regularization versus Batch and Weight Normalization Batch Normalization is a commonly used trick to improve the training of deep neural networks. These neural networks use L2 regularization, also called weight decay, ostensibly to prevent overfitting. However, we show that L2 regularization has no regularizing effect when combined with normalization. Instead, regularization has an influence on the scale of weights, and thereby on the effective learning rate. We investigate this dependence, both in theory, and experimentally. We show that popular optimization methods such as ADAM only partially eliminate the influence of normalization on the learning rate. This leads to a discussion on other ways to mitigate this issue.
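A minimal numerical sketch of the paper's core observation, assuming a single linear unit followed by batch normalization over a mini-batch: rescaling the incoming weights leaves the normalized output essentially unchanged, so an L2 penalty can only change the weight scale and hence the effective learning rate, not the represented function:
# Sketch: batch-normalized output is invariant to rescaling the weights.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 20))     # a mini-batch
w = rng.normal(size=20)

def batchnorm(z, eps=1e-5):
    return (z - z.mean()) / np.sqrt(z.var() + eps)

out = batchnorm(X @ w)
out_scaled = batchnorm(X @ (10.0 * w))   # same weights, scaled by 10
print(np.max(np.abs(out - out_scaled)))  # ~0, up to the eps in the denominator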
Large Linear Classification When Data Cannot Fit In Memory Recent advances in linear classification have shown that for applications such as document classification, the training can be extremely efficient. However, most of the existing training methods are designed by assuming that data can be stored in the computer memory. These methods cannot be easily applied to data larger than the memory capacity due to the random access to the disk. We propose and analyze a block minimization framework for data larger than the memory size. At each step a block of data is loaded from the disk and handled by certain learning methods. We investigate two implementations of the proposed framework for primal and dual SVMs, respectively. As data cannot fit in memory, many design considerations are very different from those for traditional algorithms. Experiments using data sets 20 times larger than the memory demonstrate the effectiveness of the proposed method.
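A minimal sketch of the general pattern, loading one block at a time so only that block is ever in memory; scikit-learn's SGDClassifier with partial_fit stands in here for the paper's block-minimization SVM solvers, and the block loader below is a synthetic stand-in for reading chunks of a larger-than-memory file from disk:
# Sketch: out-of-core training of a linear classifier, one block at a time.
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(loss="hinge", alpha=1e-4)
classes = np.array([0, 1])

def load_block(i, n=10_000, d=100):
    # stand-in for reading one block from disk
    r = np.random.default_rng(i)
    X = r.normal(size=(n, d))
    y = (X[:, 0] + 0.5 * X[:, 1] + 0.1 * r.normal(size=n) > 0).astype(int)
    return X, y

for epoch in range(3):                 # several passes over the blocks
    for i in range(20):                # 20 blocks per pass
        X_blk, y_blk = load_block(i)
        clf.partial_fit(X_blk, y_blk, classes=classes)

X_test, y_test = load_block(999)
print("held-out accuracy:", clf.score(X_test, y_test))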
Large-Scale Graph Visualization and Analytics Novel approaches to network visualization and analytics use sophisticated metrics that enable rich interactive network views and node grouping and filtering. A survey of graph layout and simplification methods reveals considerable progress in these new directions.
Latent Dirichlet Allocation We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
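A minimal sketch of fitting LDA in practice, assuming scikit-learn's variational implementation, a toy corpus, and illustrative hyperparameters (not the paper's own inference code):
# Sketch: fit a 2-topic LDA model and print the top words per topic.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the cat sat on the mat with another cat",
    "dogs and cats are popular pets",
    "stock markets fell as interest rates rose",
    "the central bank raised interest rates again",
    "the dog chased the cat around the garden",
    "investors worried about inflation and markets",
]

counts = CountVectorizer(stop_words="english").fit(docs)
X = counts.transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[::-1][:5]
    print(f"topic {k}:", ", ".join(terms[i] for i in top))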
Latent Dirichlet Allocation (LDA) and Topic modeling: models, applications, a survey Topic modeling is one of the most powerful techniques in text mining for data mining, latent data discovery, and finding relationships among text documents. Researchers have published many articles in the field of topic modeling and applied it in various fields such as software engineering, political science, and medical and linguistic science. There are various methods for topic modeling, of which Latent Dirichlet allocation (LDA) is one of the most popular. Researchers have proposed various models based on LDA for topic modeling. This paper therefore serves as a useful introduction to LDA-based approaches to topic modeling. We investigated scholarly articles (published between 2003 and 2016) highly related to LDA-based topic modeling to discover the research development, current trends, and intellectual structure of the field. We also summarize challenges and introduce well-known tools and datasets for topic modeling based on LDA.
Latent Semantic Analysis and Topic Modeling: Roads to Text Meaning (Slide Deck)
Latent Variable Mixture Modeling The aim of this study was to provide an overview of mixture modeling techniques, specifically as applied to nursing research, and to present examples from two studies to illustrate how these techniques may be used cross-sectionally and longitudinally.
Latent Variable Models A powerful approach to probabilistic modelling involves supplementing a set of observed variables with additional latent, or hidden, variables. By defining a joint distribution over visible and latent variables, the corresponding distribution of the observed variables is then obtained by marginalization. This allows relatively complex distributions to be expressed in terms of more tractable joint distributions over the expanded variable space. One well-known example of a hidden variable model is the mixture distribution in which the hidden variable is the discrete component label. In the case of continuous latent variables we obtain models such as factor analysis. The structure of such probabilistic models can be made particularly transparent by giving them a graphical representation, usually in terms of a directed acyclic graph, or Bayesian network. In this chapter we provide an overview of latent variable models for representing continuous variables. We show how a particular form of linear latent variable model can be used to provide a probabilistic formulation of the well-known technique of principal components analysis (PCA). By extending this technique to mixtures, and hierarchical mixtures, of probabilistic PCA models we are led to a powerful interactive algorithm for data visualization. We also show how the probabilistic PCA approach can be generalized to non-linear latent variable models leading to the Generative Topographic Mapping algorithm (GTM). Finally, we show how GTM can itself be extended to model temporal data.
Layerwise Systematic Scan: Deep Boltzmann Machines and Beyond For Markov chain Monte Carlo methods, one of the greatest discrepancies between theory and system is the scan order – while most theoretical development on the mixing time analysis deals with random updates, real-world systems are implemented with systematic scans. We bridge this gap for models that exhibit a bipartite structure, including, most notably, the Restricted/Deep Boltzmann Machine. The de facto implementation for these models scans variables in a layerwise fashion. We show that the Gibbs sampler with a layerwise alternating scan order has its relaxation time (in terms of epochs) no larger than that of a random-update Gibbs sampler (in terms of variable updates). We also construct examples to show that this bound is asymptotically tight. Through standard inequalities, our result also implies a comparison on the mixing times.
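For concreteness, a layerwise alternating scan for a small binary RBM might look like the following numpy sketch; the weights and biases are random placeholders rather than a trained model.
    # Layerwise (alternating) Gibbs scan for a binary RBM with random parameters.
    import numpy as np

    rng = np.random.default_rng(0)
    n_visible, n_hidden = 6, 4
    W = rng.normal(scale=0.1, size=(n_visible, n_hidden))   # assumed weights
    b, c = np.zeros(n_visible), np.zeros(n_hidden)          # visible / hidden biases

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    v = rng.integers(0, 2, size=n_visible).astype(float)
    for epoch in range(100):   # one epoch = update the whole hidden layer, then the whole visible layer
        h = (rng.random(n_hidden) < sigmoid(v @ W + c)).astype(float)
        v = (rng.random(n_visible) < sigmoid(h @ W.T + b)).astype(float)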
LDAvis: A method for visualizing and interpreting topics We present LDAvis, a web-based interactive visualization of topics estimated using Latent Dirichlet Allocation that is built using a combination of R and D3. Our visualization provides a global view of the topics (and how they differ from each other), while at the same time allowing for a deep inspection of the terms most highly associated with each individual topic. First, we propose a novel method for choosing which terms to present to a user to aid in the task of topic interpretation, in which we define the relevance of a term to a topic. Second, we present results from a user study that suggest that ranking terms purely by their probability under a topic is suboptimal for topic interpretation. Last, we describe LDAvis, our visualization system that allows users to flexibly explore topic-term relationships using relevance to better understand a fitted LDA model.
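The term-ranking idea can be written down directly: the relevance of term w to topic k under a weight lambda in [0, 1] is lambda * log p(w|k) + (1 - lambda) * log(p(w|k) / p(w)). A small sketch with hypothetical probability arrays; in practice the marginal p(w) would come from the corpus rather than from an average over topics as it does here.
    # Relevance-based term ranking in the spirit of LDAvis.
    import numpy as np

    def relevance(topic_word_probs, word_probs, lambda_=0.6):
        """topic_word_probs: (n_topics, n_words) p(w|k); word_probs: (n_words,) marginal p(w)."""
        lift = topic_word_probs / word_probs
        return lambda_ * np.log(topic_word_probs) + (1 - lambda_) * np.log(lift)

    # Hypothetical example: rank the words of topic 0 by relevance.
    phi = np.array([[0.5, 0.3, 0.2], [0.1, 0.2, 0.7]])
    p_w = phi.mean(axis=0)            # stand-in for the empirical marginal term probability
    print(np.argsort(-relevance(phi, p_w)[0]))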
Leakage in Data Mining: Formulation, Detection, and Avoidance Deemed ‘one of the top ten data mining mistakes’, leakage is essentially the introduction of information about the data mining target, which should not be legitimately available to mine from. In addition to our own industry experience with real-life projects, controversies around several major public data mining competitions held recently such as the INFORMS 2010 Data Mining Challenge and the IJCNN 2011 Social Network Challenge are evidence that this issue is as relevant today as it has ever been. While acknowledging the importance and prevalence of leakage in both synthetic competitions and real-life data mining projects, existing literature has largely left this idea unexplored. What little has been said turns out not to be broad enough to cover more complex cases of leakage, such as those where the classical i.i.d. assumption is violated, that have been recently documented. In our new approach, these cases and others are explained by explicitly defining modeling goals and analyzing the broader framework of the data mining problem. The resulting definition enables us to derive general methodology for dealing with the issue. We show that it is possible to avoid leakage with a simple specific approach to data management followed by what we call a learn-predict separation, and present several ways of detecting leakage when the modeler has no control over how the data have been collected.
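One common concrete instance of leakage, preprocessing statistics computed on the full dataset, is avoided by exactly the kind of learn-predict separation described here; a minimal scikit-learn sketch in which the scaler only ever sees the training fold.
    # Learn-predict separation in practice: fit every preprocessing step on training data only.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
    pipe.fit(X_train, y_train)            # scaler statistics come from the training fold only
    print(pipe.score(X_test, y_test))     # the test fold never influences fitting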
Learn to use R – Your Hands-on Guide R is hot. Whether measured by more than 6,100 add-on packages, the 41,000+ members of LinkedIn´s R group or the 170+ R Meetup groups currently in existence, there can be little doubt that interest in the R statistics language, especially for data analysis, is soaring. Why R? It´s free, open source, powerful and highly extensible. ‘You have a lot of prepackaged stuff that´s already available, so you´re standing on the shoulders of giants,’ Google´s chief economist told The New York Times back in 2009. Because it´s a programmable environment that uses command-line scripting, you can store a series of complex data-analysis steps in R. That lets you re-use your analysis work on similar data more easily than if you were using a point-and-click interface, notes Hadley Wickham, author of several popular R packages and chief scientist with RStudio. That also makes it easier for others to validate research results and check your work for errors — an issue that cropped up in the news recently after an Excel coding error was among several flaws found in an influential economics analysis report known as Reinhart/Rogoff. The error itself wasn´t a surprise, blogs Christopher Gandrud, who earned a doctorate in quantitative research methodology from the London School of Economics. ‘Despite our best efforts we always will’ make errors, he notes. ‘The problem is that we often use tools and practices that make it difficult to find and correct our mistakes.’ Sure, you can easily examine complex formulas on a spreadsheet. But it´s not nearly as easy to run multiple data sets through spreadsheet formulas to check results as it is to put several data sets through a script, he explains. Indeed, the mantra of ‘Make sure your work is reproducible!’ is a common theme among R enthusiasts.
Learnable: Theory vs Applications Two different views on the machine learning problem: Applied learning (machine learning with business applications) and Agnostic PAC learning are formalized and compared here. I show that, under some conditions, the theory of PAC Learnable provides a way to solve the Applied learning problem. However, the theory requires training sets so large that it would make the learning practically useless. I suggest shedding some theoretical misconceptions about learning to make the theory more aligned with the needs and experience of practitioners.
Learning Deep Architectures for AI Theoretical results strongly suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g. in vision, language, and other AI-level tasks), one needs deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers or in complicated propositional formulae re-using many sub-formulae. Searching the parameter space of deep architectures is a difficult optimization task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success, beating the state-of-the-art in certain areas. This paper discusses the motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.
Learning Deep Representations for Semantic Image Parsing: a Comprehensive Overview Semantic image parsing, which refers to the process of decomposing images into semantic regions and constructing the structure representation of the input, has recently aroused widespread interest in the field of computer vision. The recent application of deep representation learning has driven this field into a new stage of development. In this paper, we summarize three aspects of the progress of research on semantic image parsing, i.e., category-level semantic segmentation, instance-level semantic segmentation, and beyond segmentation. Specifically, we first review the general frameworks for each task and introduce the relevant variants. The advantages and limitations of each method are also discussed. Moreover, we present a comprehensive comparison of different benchmark datasets and evaluation metrics. Finally, we explore the future trends and challenges of semantic image parsing.
Learning Features from Co-occurrences: A Theoretical Analysis Representing a word by its co-occurrences with other words in context is an effective way to capture the meaning of the word. However, the theory behind it remains a challenge. In this work, taking the example of a word classification task, we give a theoretical analysis of the approaches that represent a word X by a function f(P(C|X)), where C is a context feature, P(C|X) is the conditional probability estimated from a text corpus, and the function f maps the co-occurrence measure to a prediction score. We investigate the impact of the context feature C and the function f. We also explain the reasons why using the co-occurrences with multiple context features may be better than just using a single one. In addition, some of the results shed light on the theory of feature learning and machine learning in general.
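To make the object of study concrete, the sketch below estimates P(C|X) from a toy corpus by counting co-occurrences within a context window and applies one simple choice of f (an element-wise log with smoothing); the window size and smoothing constant are arbitrary.
    # Estimate P(C|X) from windowed co-occurrence counts and map it through f = log.
    from collections import Counter, defaultdict
    import math

    corpus = [["the", "cat", "sat", "on", "the", "mat"],
              ["the", "dog", "sat", "on", "the", "rug"]]

    window = 2
    cooc = defaultdict(Counter)
    for sent in corpus:
        for i, word in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    cooc[word][sent[j]] += 1

    def features(word, contexts, eps=1e-6):
        counts = cooc[word]
        total = sum(counts.values())
        # f(P(C|X)) with f = log, smoothed so unseen contexts stay finite
        return [math.log(counts[c] / total + eps) for c in contexts]

    print(features("cat", ["the", "sat", "dog"]))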
Learning From Brains How to Regularize Machines Despite impressive performance on numerous visual tasks, Convolutional Neural Networks (CNNs) — unlike brains — are often highly sensitive to small perturbations of their input, e.g. adversarial noise leading to erroneous decisions. We propose to regularize CNNs using large-scale neuroscience data to learn more robust neural features in terms of representational similarity. We presented natural images to mice and measured the responses of thousands of neurons from cortical visual areas. Next, we denoised the notoriously variable neural activity using strong predictive models trained on this large corpus of responses from the mouse visual system, and calculated the representational similarity for millions of pairs of images from the model’s predictions. We then used the neural representation similarity to regularize CNNs trained on image classification by penalizing intermediate representations that deviated from neural ones. This preserved performance of baseline models when classifying images under standard benchmarks, while maintaining substantially higher performance compared to baseline or control models when classifying noisy images. Moreover, the models regularized with cortical representations also improved model robustness in terms of adversarial attacks. This demonstrates that regularizing with neural data can be an effective tool to create an inductive bias towards more robust inference.
Learning from Dyadic Data Dyadic data refers to a domain with two finite sets of objects in which observations are made for dyads, i.e., pairs with one element from either set. This type of data arises naturally in many applications ranging from computational linguistics and information retrieval to preference analysis and computer vision. In this paper, we present a systematic, domain-independent framework of learning from dyadic data by statistical mixture models. Our approach covers different models with flat and hierarchical latent class structures. We propose an annealed version of the standard EM algorithm for model fitting which is empirically evaluated on a variety of data sets from different domains.
Learning From Positive and Unlabeled Data: A Survey Learning from positive and unlabeled data or PU learning is the setting where a learner only has access to positive examples and unlabeled data. The assumption is that the unlabeled data can contain both positive and negative examples. This setting has attracted increasing interest within the machine learning literature as this type of data naturally arises in applications such as medical diagnosis and knowledge base completion. This article provides a survey of the current state of the art in PU learning. It proposes seven key research questions that commonly arise in this field and provides a broad overview of how the field has tried to address them.
Learning from the machine: interpreting machine learning algorithms for point- and extended- source classification We investigate star-galaxy classification for astronomical surveys in the context of four methods enabling the interpretation of black-box machine learning systems. The first is outputting and exploring the decision boundaries as given by decision tree based methods, which enables the visualization of the classification categories. Secondly, we investigate how the Mutual Information based Transductive Feature Selection (MINT) algorithm can be used to perform feature pre-selection. If one would like to provide only a small number of input features to a machine learning classification algorithm, feature pre-selection provides a method to determine which of the many possible input properties should be selected. Third is the use of the tree-interpreter package to enable popular decision tree based ensemble methods to be opened, visualized, and understood. This is done by additional analysis of the tree based model, determining not only which features are important to the model, but how important a feature is for a particular classification given its value. Lastly, we use decision boundaries from the model to revise an already existing method of classification, essentially asking the tree based method where decision boundaries are best placed and defining a new classification method. We showcase these techniques by applying them to the problem of star-galaxy separation using data from the Sloan Digital Sky Survey (hereafter SDSS). We use the output of MINT and the ensemble methods to demonstrate how more complex decision boundaries improve star-galaxy classification accuracy over the standard SDSS frames approach (reducing misclassifications by up to $\approx33\%$). We then show how tree-interpreter can be used to explore how relevant each photometric feature is when making a classification on an object by object basis.
Learning How to Self-Learn: Enhancing Self-Training Using Neural Reinforcement Learning Self-training is a useful strategy for semi-supervised learning, leveraging raw texts for enhancing model performances. Traditional self-training methods depend on heuristics such as model confidence for instance selection, the manual adjustment of which can be expensive. To address these challenges, we propose a deep reinforcement learning method to learn the self-training strategy automatically. Based on neural network representation of sentences, our model automatically learns an optimal policy for instance selection. Experimental results show that our approach outperforms the baseline solutions in terms of better tagging performances and stability.
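The confidence-threshold baseline that this work replaces with a learned policy looks roughly like the following sketch (numpy arrays and any scikit-learn classifier with predict_proba are assumed); the threshold is the hand-tuned heuristic in question.
    # Classic confidence-based self-training: iteratively add confidently predicted unlabeled points.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def self_train(X_lab, y_lab, X_unlab, threshold=0.95, rounds=5):
        model = LogisticRegression(max_iter=1000)
        for _ in range(rounds):
            model.fit(X_lab, y_lab)
            if len(X_unlab) == 0:
                break
            proba = model.predict_proba(X_unlab)
            confident = proba.max(axis=1) >= threshold            # heuristic instance selection
            if not confident.any():
                break
            pseudo = model.classes_[proba[confident].argmax(axis=1)]  # pseudo-labels
            X_lab = np.vstack([X_lab, X_unlab[confident]])
            y_lab = np.concatenate([y_lab, pseudo])
            X_unlab = X_unlab[~confident]
        return model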
Learning Models over Relational Data: A Brief Tutorial This tutorial overviews the state of the art in learning models over relational databases and makes the case for a first-principles approach that exploits recent developments in database research. The input to learning classification and regression models is a training dataset defined by feature extraction queries over relational databases. The mainstream approach to learning over relational data is to materialize the training dataset, export it out of the database, and then learn over it using a statistical package. This approach can be expensive as it requires the materialization of the training dataset. An alternative approach is to cast the machine learning problem as a database problem by transforming the data-intensive component of the learning task into a batch of aggregates over the feature extraction query and by computing this batch directly over the input database. The tutorial highlights a variety of techniques developed by the database theory and systems communities to improve the performance of the learning task. They rely on structural properties of the relational data and of the feature extraction query, including algebraic (semi-ring), combinatorial (hypertree width), statistical (sampling), or geometric (distance) structure. They also rely on factorized computation, code specialization, query compilation, and parallelization.
Learning Representations of Graph Data — A Survey Deep Neural Networks have shown tremendous success in the area of object recognition, image classification and natural language processing. However, designing optimal Neural Network architectures that can learn and output arbitrary graphs is an ongoing research problem. The objective of this survey is to summarize and discuss the latest advances in methods to Learn Representations of Graph Data. We start by identifying commonly used types of graph data and review basics of graph theory. This is followed by a discussion of the relationships between graph kernel methods and neural networks. Next we identify the major approaches used for learning representations of graph data namely: Kernel approaches, Convolutional approaches, Graph neural networks approaches, Graph embedding approaches and Probabilistic approaches. A variety of methods under each of the approaches are discussed and the survey is concluded with a brief discussion of the future of learning representation of graph data.
Learning Sparse Structural Changes in High-dimensional Markov Networks: A Review on Methodologies and Theories Recent years have seen an increasing popularity of learning the sparse changes in Markov Networks. Changes in the structure of Markov Networks reflect alternations of interactions between random variables under different regimes and provide insights into the underlying system. While each individual network structure can be complicated and difficult to learn, the overall change from one network to another can be simple. This intuition gave birth to an approach that directly learns the sparse changes without modelling and learning the individual (possibly dense) networks. In this paper, we review such a direct learning method with some latest developments along this line of research.
Learning the k in k-means When clustering a dataset, the right number k of clusters to use is often not obvious, and choosing k automatically is a hard algorithmic problem. In this paper we present an improved algorithm for learning k while clustering. The G-means algorithm is based on a statistical test for the hypothesis that a subset of data follows a Gaussian distribution. G-means runs k-means with increasing k in a hierarchical fashion until the test accepts the hypothesis that the data assigned to each k-means center are Gaussian. Two key advantages are that the hypothesis test does not limit the covariance of the data and does not compute a full covariance matrix. Additionally, G-means only requires one intuitive parameter, the standard statistical significance level α. We present results from experiments showing that the algorithm works well, and better than a recent method based on the BIC penalty for model complexity. In these experiments, we show that the BIC is ineffective as a scoring function, since it does not penalize the model´s complexity strongly enough.
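A rough sketch of the split test at the heart of this approach: cluster into two, project onto the direction between the child centers, and test the projection for normality. The original uses the Anderson-Darling statistic; the sketch below substitutes scipy's normaltest purely for brevity, so treat it as an approximation of the idea rather than the paper's exact procedure.
    # Sketch of a G-means-style split decision (normality test on a 1-D projection).
    import numpy as np
    from scipy import stats
    from sklearn.cluster import KMeans

    def should_split(cluster_points, alpha=0.0001):
        """Split a cluster into two and test the projected data for Gaussianity."""
        two = KMeans(n_clusters=2, n_init=10, random_state=0).fit(cluster_points)
        direction = two.cluster_centers_[0] - two.cluster_centers_[1]
        projected = cluster_points @ direction / np.linalg.norm(direction)
        _, p_value = stats.normaltest(projected)   # original paper uses Anderson-Darling
        return p_value < alpha                     # non-Gaussian projection -> accept the split

    rng = np.random.default_rng(0)
    blob = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(6, 1, (200, 2))])
    print(should_split(blob))   # True: two well-separated blobs are not one Gaussian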
Learning the parts of objects by non-negative matrix factorization Is perception of the whole based on perception of its parts? There is psychological [1] and physiological [2, 3] evidence for parts-based representations in the brain, and certain computational theories of object recognition rely on such representations [4, 5]. But little is known about how brains or computers might learn the parts of objects. Here we demonstrate an algorithm for non-negative matrix factorization that is able to learn parts of faces and semantic features of text. This is in contrast to other methods, such as principal components analysis and vector quantization, that learn holistic, not parts-based, representations. Non-negative matrix factorization is distinguished from the other methods by its use of non-negativity constraints. These constraints lead to a parts-based representation because they allow only additive, not subtractive, combinations. When non-negative matrix factorization is implemented as a neural network, parts-based representations emerge by virtue of two properties: the firing rates of neurons are never negative and synaptic strengths do not change sign.
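A minimal illustration of the parts-based factorization on a toy bag-of-words matrix, assuming scikit-learn's NMF; the non-negativity constraints are what make the learned factors purely additive.
    # Non-negative matrix factorization: V ~ W H with W, H >= 0.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import NMF

    docs = ["apples and oranges are fruit", "cats and dogs are pets",
            "fruit salad with apples", "pets like dogs need walks"]
    V = CountVectorizer().fit_transform(docs)
    model = NMF(n_components=2, init="nndsvda", random_state=0)
    W = model.fit_transform(V)      # document-by-part weights (non-negative)
    H = model.components_           # part-by-term "semantic features" (non-negative)
    print(W.round(2))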
Learning Theory and Support Vector Machines – a primer The main goal of statistical learning theory is to provide a fundamental framework for the problem of decision making and model construction based on sets of data. Here, we present a brief introduction to the fundamentals of statistical learning theory, in particular the difference between empirical and structural risk minimization, including one of its most prominent implementations, i.e. the Support Vector Machine.
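To make the empirical-versus-structural-risk trade-off concrete, here is a minimal scikit-learn example in which the regularization parameter C controls how much margin is traded against training accuracy; the dataset and values of C are arbitrary.
    # Minimal support vector machine example; C trades empirical risk against margin width.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, n_features=5, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for C in (0.01, 1.0, 100.0):                    # small C = stronger regularization, wider margin
        clf = SVC(kernel="rbf", C=C).fit(X_train, y_train)
        print(C, clf.score(X_test, y_test))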
Learning to Communicate in Multi-Agent Reinforcement Learning: A Review We consider the issue of multiple agents learning to communicate through reinforcement learning within partially observable environments, with a focus on information asymmetry in the second part of our work. We provide a review of the recent algorithms developed to improve the agents’ policy by allowing the sharing of information between agents and the learning of communication strategies, with a focus on Deep Recurrent Q-Network-based models. We also describe recent efforts to interpret the languages generated by these agents and study their properties in an attempt to generate human-language-like sentences. We discuss the metrics used to evaluate the generated communication strategies and propose a novel entropy-based evaluation metric. Finally, we address the issue of the cost of communication and introduce the idea of an experimental setup to expose this cost in cooperative-competitive games.
Learning to Extract International Relations from Political Context We describe a new probabilistic model for extracting events between major political actors from news corpora. Our unsupervised model brings together familiar components in natural language processing (like parsers and topic models) with contextual political information— temporal and dyad dependence—to infer latent event classes. We quantitatively evaluate the model´s performance on political science benchmarks: recovering expert-assigned event class valences, and detecting real-world conflict. We also conduct a small case study based on our model´s inferences.
Learning to Hash for Indexing Big Data – A Survey The explosive growth in big data has attracted much attention in designing efficient indexing and search methods recently. In many critical applications such as large-scale search and pattern matching, finding the nearest neighbors to a query is a fundamental research problem. However, the straightforward solution using exhaustive comparison is infeasible due to the prohibitive computational complexity and memory requirement. In response, Approximate Nearest Neighbor (ANN) search based on hashing techniques has become popular due to its promising performance in both efficiency and accuracy. Prior randomized hashing methods, e.g., Locality-Sensitive Hashing (LSH), explore data-independent hash functions with random projections or permutations. Although having elegant theoretic guarantees on the search quality in certain metric spaces, performance of randomized hashing has been shown insufficient in many real-world applications. As a remedy, new approaches incorporating data-driven learning methods in development of advanced hash functions have emerged. Such learning to hash methods exploit information such as data distributions or class labels when optimizing the hash codes or functions. Importantly, the learned hash codes are able to preserve the proximity of neighboring data in the original feature spaces in the hash code spaces. The goal of this paper is to provide readers with systematic understanding of insights, pros and cons of the emerging techniques. We provide a comprehensive survey of the learning to hash framework and representative techniques of various types, including unsupervised, semi-supervised, and supervised. In addition, we also summarize recent hashing approaches utilizing the deep learning models. Finally, we discuss the future direction and trends of research in this area.
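The data-independent baseline the survey starts from, random-hyperplane LSH for cosine similarity, fits in a few lines of numpy; the dimensionality and bit count below are arbitrary.
    # Random-hyperplane LSH: nearby vectors (in cosine distance) tend to share hash bits.
    import numpy as np

    rng = np.random.default_rng(0)
    dim, n_bits = 64, 16
    planes = rng.normal(size=(n_bits, dim))          # data-independent random projections

    def hash_code(x):
        return tuple((planes @ x > 0).astype(int))   # one bit per hyperplane

    x = rng.normal(size=dim)
    y = x + 0.05 * rng.normal(size=dim)              # a near neighbor of x
    z = rng.normal(size=dim)                         # an unrelated vector
    print(sum(a != b for a, b in zip(hash_code(x), hash_code(y))))  # near neighbor: usually very few differing bits
    print(sum(a != b for a, b in zip(hash_code(x), hash_code(z))))  # unrelated: typically about half the bits differ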
Learning to Optimize Neural Nets Learning to Optimize is a recently proposed framework for learning optimization algorithms using reinforcement learning. In this paper, we explore learning an optimization algorithm for training shallow neural nets. Such high-dimensional stochastic optimization problems present interesting challenges for existing reinforcement learning algorithms. We develop an extension that is suited to learning optimization algorithms in this setting and demonstrate that the learned optimization algorithm consistently outperforms other known optimization algorithms even on unseen tasks and is robust to changes in stochasticity of gradients and the neural net architecture. More specifically, we show that an optimization algorithm trained with the proposed method on the problem of training a neural net on MNIST generalizes to the problems of training neural nets on the Toronto Faces Dataset, CIFAR-10 and CIFAR-100.
Learning to Reason Automated theorem proving has long been a key task of artificial intelligence. Proofs form the bedrock of rigorous scientific inquiry. Many tools for both partially and fully automating their derivations have been developed over the last half-century. Some examples of state-of-the-art provers are E (Schulz, 2013), VAMPIRE (Kovács and Voronkov, 2013), and Prover9 (McCune, 2005-2010). Newer theorem provers, such as E, use superposition calculus in place of more traditional resolution and tableau based methods. There have also been a number of past attempts to apply machine learning methods to guiding proof search. Suttner and Ertel proposed a multilayer-perceptron based method using hand-engineered features as far back as 1990; Urban et al (2011) apply machine learning to tableau calculus; and Loos et al (2017) recently proposed a method for guiding the E theorem prover using deep neural networks. All of this prior work, however, has one common limitation: they all rely on the axioms of classical first-order logic. Very little attention has been paid to automated theorem proving for non-classical logics. One of the only recent examples is McLaughlin and Pfenning (2008) who applied the polarized inverse method to intuitionistic propositional logic. The literature is otherwise mostly silent. This is truly unfortunate, as there are many reasons to desire non-classical proofs over classical. Constructive/intuitionistic proofs should be of particular interest to computer scientists thanks to the well-known Curry-Howard correspondence (Howard, 1980) which tells us that all terminating programs correspond to a proof in intuitionistic logic and vice versa. This work explores using Q-learning (Watkins, 1989) to inform proof search for a specific non-classical logic called Core Logic (Tennant, 2017).
Learning to Succeed while Teaching to Fail: Privacy in Closed Machine Learning Systems Security, privacy, and fairness have become critical in the era of data science and machine learning. More and more we see that achieving universally secure, private, and fair systems is practically impossible. We have seen for example how generative adversarial networks can be used to learn about the expected private training data; how the exploitation of additional data can reveal private information in the original one; and how seemingly unrelated features can teach us about each other. Confronted with this challenge, in this paper we open a new line of research, where security, privacy, and fairness are learned and used in a closed environment. The goal is to ensure that a given entity (e.g., the company or the government), trusted to infer certain information with our data, is blocked from inferring protected information from it. For example, a hospital might be allowed to produce a diagnosis for the patient (the positive task), without being able to infer the gender of the subject (negative task). Similarly, a company can guarantee that internally it is not using the provided data for any undesired task, an important goal that does not contradict the virtually impossible challenge of blocking everybody from the undesired task. We design a system that learns to succeed on the positive task while simultaneously failing at the negative one, and illustrate this with challenging cases where the positive task is actually harder than the negative one being blocked. Fairness with respect to the information in the negative task is often automatically obtained as a result of this proposed approach. The particular framework and examples open the door to security, privacy, and fairness in very important closed scenarios, ranging from private data accumulation companies like social networks to law-enforcement and hospitals.
Learning Tree Distributions by Hidden Markov Models Hidden tree Markov models allow learning distributions for tree structured data while being interpretable as nondeterministic automata. We provide a concise summary of the main approaches in literature, focusing in particular on the causality assumptions introduced by the choice of a specific tree visit direction. We will then sketch a novel non-parametric generalization of the bottom-up hidden tree Markov model with its interpretation as a nondeterministic tree automaton with infinite states.
Learning Whenever Learning is Possible: Universal Learning under General Stochastic Processes This work initiates a general study of learning and generalization without the i.i.d. assumption, starting from first principles. While the standard approach to statistical learning theory is based on assumptions chosen largely for their convenience (e.g., i.i.d. or stationary ergodic), in this work we are interested in developing a theory of learning based only on the most fundamental and natural assumptions implicit in the requirements of the learning problem itself. We specifically study universally consistent function learning, where the objective is to obtain low long-run average loss for any target function, when the data follow a given stochastic process. We are then interested in the question of whether there exist learning rules guaranteed to be universally consistent given only the assumption that universally consistent learning is possible for the given data process. The reasoning that motivates this criterion emanates from a kind of optimist’s decision theory, and so we refer to such learning rules as being optimistically universal. We study this question in three natural learning settings: inductive, self-adaptive, and online. Remarkably, as our strongest positive result, we find that optimistically universal learning rules do indeed exist in the self-adaptive learning setting. Establishing this fact requires us to develop new approaches to the design of learning algorithms. Along the way, we also identify concise characterizations of the family of processes under which universally consistent learning is possible in the inductive and self-adaptive settings. We additionally pose a number of enticing open problems, particularly for the online learning setting.
Learning Word Representation Considering Proximity and Ambiguity Distributed representations of words (aka word embedding) have proven helpful in solving natural language processing (NLP) tasks. Training distributed representations of words with neural networks has lately been a major focus of researchers in the field. Recent work on word embedding, the Continuous Bag-of-Words (CBOW) model and the Continuous Skip-gram (Skip-gram) model, has produced particularly impressive results, significantly speeding up the training process to enable word representation learning from large-scale data. However, both CBOW and Skip-gram do not pay enough attention to word proximity in terms of model or word ambiguity in terms of linguistics. In this paper, we propose Proximity-Ambiguity Sensitive (PAS) models (i.e. PAS CBOW and PAS Skip-gram) to produce high quality distributed representations of words considering both word proximity and ambiguity. From the model perspective, we introduce proximity weights as parameters to be learned in PAS CBOW and used in PAS Skip-gram. By better modeling word proximity, we reveal the strength of pooling-structured neural networks in word representation learning. The proximity sensitive pooling layer can also be applied to other neural network applications that employ pooling layers. From the linguistics perspective, we train multiple representation vectors per word. Each representation vector corresponds to a particular group of POS tags of the word. By using PAS models, we achieved a 16.9% increase in accuracy over state-of-the-art models.
Least Square Projection: a fast high precision multidimensional projection technique and its application to document mapping The problem of projecting multidimensional data into lower dimensions has been pursued by many researchers due to its potential application to data analysis of various kinds. This paper presents a novel multidimensional projection technique based on least square approximations. The approximations compute the coordinates of a set of projected points based on the coordinates of a reduced number of control points with defined geometry. We name the technique Least Square Projections (LSP). From an initial projection of the control points, LSP defines the positioning of their neighboring points through a numerical solution that aims at preserving a similarity relationship between the points given by a metric in mD. In order to perform the projection, a small number of distance calculations is necessary and no repositioning of the points is required to obtain a final solution with satisfactory precision. The results show the capability of the technique to form groups of points by degree of similarity in 2D. We illustrate that capability through its application to mapping collections of textual documents from varied sources, a strategic yet difficult application. LSP is faster and more accurate than other existing high quality methods, particularly where it was mostly tested, that is, for mapping text sets.
Lecture Notes on ‘Free Probability Theory’ This is an introduction to free probability theory, covering the basic combinatorial and analytic theory, as well as the relations to random matrices and operator algebras. The material is mainly based on the two books of the lecturer, one joint with Nica and one joint with Mingo. Free probability is here restricted to the scalar-valued setting; the operator-valued version is treated in the subsequent lecture series on ‘Non-Commutative Distributions’. The material here was presented in the winter term 2018/19 at Saarland University in 26 lectures of 90 minutes each. The lectures were recorded and can be found online at https://…/index.html
Lecture Notes on Stochastic Processes These are lecture notes for the course “Stochastic Processes”. In this format, the course was taught in the spring semesters of 2017 and 2018 to third-year bachelor students of the Department of Control and Applied Mathematics, School of Applied Mathematics and Informatics, at the Moscow Institute of Physics and Technology. The core of this course was developed and taught for decades by professors of the Department of Mathematical Foundations of Control: A.A. Natan, S.A. Guz, and O.G. Gorbachev. Besides standard chapters of stochastic processes theory (correlation theory, Markov processes), this book (and the lectures) also covers the von Neumann–Birkhoff–Khinchin ergodic theorem, the macrosystem equilibrium concept, Markov Chain Monte Carlo, Markov decision processes, and the secretary problem.
Lecture Notes: Selected topics on robust statistical learning theory These notes gather recent results on robust statistical learning theory. The goal is to stress the main principles underlying the construction and theoretical analysis of these estimators rather than provide an exhaustive account on this rapidly growing field. The notes are the basis of lectures given at the conference StatMathAppli 2019.
Lecture Notes: Temporal Point Processes and the Conditional Intensity Function These short lecture notes contain a not too technical introduction to point processes on the time line. The focus lies on defining these processes using the conditional intensity function. Furthermore, likelihood inference, methods of simulation and residual analysis for temporal point processes specified by a conditional intensity function are considered.
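The conditional intensity also gives a direct simulation recipe (thinning): propose points from a dominating homogeneous Poisson process and accept each with probability lambda(t)/lambda_max. A minimal sketch for an illustrative, history-free intensity; the functional form and constants are placeholders.
    # Simulating a temporal point process by thinning a dominating homogeneous Poisson process.
    import numpy as np

    def intensity(t):
        return 5.0 * (1.0 + np.sin(t))       # example conditional intensity (here history-free)

    rng = np.random.default_rng(0)
    t_max, lam_max = 10.0, 10.0               # lam_max must upper-bound intensity(t) on [0, t_max]
    t, events = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam_max)   # candidate point from the dominating process
        if t > t_max:
            break
        if rng.random() < intensity(t) / lam_max:   # accept with probability lambda(t)/lambda_max
            events.append(t)
    print(len(events), "events")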
Lectures on Statistics in Theory: Prelude to Statistics in Practice This is a writeup of lectures on ‘statistics’ that have evolved from the 2009 Hadron Collider Physics Summer School at CERN to the forthcoming 2018 school at Fermilab. The emphasis is on foundations, using simple examples to illustrate the points that are still debated in the professional statistics literature. The three main approaches to interval estimation (Neyman confidence, Bayesian, likelihood ratio) are discussed and compared in detail, with and without nuisance parameters. Hypothesis testing is discussed mainly from the frequentist point of view, with pointers to the Bayesian literature. Various foundational issues are emphasized, including the conditionality principle and the likelihood principle.
Legible Normativity for AI Alignment: The Value of Silly Rules It has become commonplace to assert that autonomous agents will have to be built to follow human rules of behavior–social norms and laws. But human laws and norms are complex and culturally varied systems, and in many cases agents will have to learn the rules. This requires autonomous agents to have models of how human rule systems work so that they can make reliable predictions about rules. In this paper we contribute to the building of such models by analyzing an overlooked distinction between important rules and what we call silly rules–rules with no discernible direct impact on welfare. We show that silly rules render a normative system both more robust and more adaptable in response to shocks to perceived stability. They make normativity more legible for humans, and can increase legibility for AI systems as well. For AI systems to integrate into human normative systems, we suggest, it may be important for them to have models that include representations of silly rules.
Lenia – Biology of Artificial Life We report a new model of artificial life called Lenia (from Latin lenis ‘smooth’), a two-dimensional cellular automaton with continuous space-time-state and a generalized local rule. Computer simulations show that Lenia supports a great diversity of complex autonomous patterns or ‘lifeforms’ bearing resemblance to real-world microscopic organisms. More than 400 species in 18 families have been identified, many discovered via interactive evolutionary computation. We present basic observations of the model regarding the properties of space-time and basic settings. We provide a broad survey of the lifeforms, categorize them into a hierarchical taxonomy, and map their distribution in the parameter hyperspace. We describe their morphological structures and behavioral dynamics, propose possible mechanisms of their self-propulsion, self-organization and plasticity. Finally, we discuss how the study of Lenia would be related to biology, artificial life, and artificial intelligence.
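The general shape of such a continuous cellular automaton update can be sketched as convolve, grow, clip; the kernel and growth parameters below are illustrative stand-ins, not Lenia's published settings.
    # Sketch of a Lenia-style continuous cellular automaton step (illustrative parameters).
    import numpy as np
    from scipy.signal import convolve2d

    rng = np.random.default_rng(0)
    A = rng.random((64, 64))                                  # continuous state in [0, 1]

    r = 5
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    dist = np.sqrt(xx**2 + yy**2) / r
    K = np.exp(-((dist - 0.5) ** 2) / 0.02) * (dist <= 1)     # smooth ring-shaped kernel
    K /= K.sum()

    def growth(u, mu=0.15, sigma=0.015):
        return 2.0 * np.exp(-((u - mu) ** 2) / (2 * sigma**2)) - 1.0

    dt = 0.1
    for _ in range(10):                                        # a few time steps
        U = convolve2d(A, K, mode="same", boundary="wrap")     # local neighborhood potential
        A = np.clip(A + dt * growth(U), 0.0, 1.0)              # grow and clip back to [0, 1]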
Let´s Debunk the Myths about Data Mining Data mining is about knowledge and information, but only occasionally about predicting the future. For as long as the field has existed, data miners have worked to explain the difference between data mining and other forms of data analysis. The terms ‘predictive analysis´ and ‘predictive modelling´ have been adopted widely to distinguish data mining and its modelling from other kinds. Unfortunately, this has led to the erroneous belief among non-practitioners that data mining is all about prediction, which it is not. Rather, data mining is about information and knowledge. Take a look at the diagram: On the left, we have the myth which has grown up around data mining: the idea that starting from data we create models which make predictions to guide action. This places a false emphasis on models; a more accurate picture of what really happens is shown on the right. Knowledge is applied to data, producing new knowledge which can again be applied to the data: an iterative process. At any point in this cycle, knowledge and data can be used together to produce new information. This creation of new information is sometimes called ‘prediction´, but it is often not information about the future. It may have some implications for the future, as many pieces of information do, but it is not a prediction in the usual sense of the word. In summary, the left hand diagram is erroneous because it leaves out knowledge, which is both an essential prerequisite and a product of data mining, and is used at every step. Data mining often produces models but these are only one kind of knowledge that it can produce, the other being human knowledge (knowledge in the head).
Let’s Push Things Forward: A Survey on Robot Pushing As robots make their way out of factories into human environments, outer space, and beyond, they require the skill to manipulate their environment in multifarious, unforeseeable circumstances. In this regard, pushing is an essential motion primitive that dramatically extends a robot’s manipulation repertoire. In this work, we review the robotic pushing literature. While focusing on work concerned with predicting the motion of pushed objects, we also cover relevant applications of pushing for planning and control. Beginning with analytical approaches, under which we also subsume physics engines, we then proceed to discuss work on learning models from data. In doing so, we dedicate a separate section to deep learning approaches which have seen a recent upsurge in the literature. Concluding remarks and further research perspectives are given at the end of the paper.
Leveraging Flexible Data Management with Graph Databases Integrating up-to-date information into databases from different heterogeneous data sources is still a time-consuming and mostly manual job that can only be accomplished by skilled experts. For this reason, enterprises often lack information regarding the current market situation, preventing a holistic view that is needed to conduct sound data analysis and market predictions. Ironically, the Web offers a huge and growing amount of valuable information from diverse organizations and data providers, such as the Linked Open Data cloud, common knowledge sources like Freebase, and social networks. One desirable usage scenario for this kind of data is its integration into a single database in order to apply data analytics. However, in today’s business intelligence tools there is an evident lack of support for so-called situational or ad-hoc data integration. What we need is a system which 1) provides flexible storage of heterogeneous information with different degrees of structure in an ad-hoc manner, and 2) supports mass data operations suited for data analytics. In this paper, we will provide our vision of such a system and describe an extension of the well-studied property graph model that allows us to ‘integrate and analyze as you go’, incorporating external data exposed in the RDF format in a seamless manner. The proposed integration approach extends the internal graph model with external data from the Linked Open Data cloud, which stores over 31 billion RDF triples (September 2011) from a variety of domains.
lfe: Linear Group Fixed Effects Linear models with fixed effects and many dummy variables are common in some fields. Such models are straightforward to estimate unless the factors have too many levels. The R package lfe solves this problem by implementing a generalization of the within transformation to multiple factors, tailored for large problems.
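The basic idea of the within transformation for a single factor (lfe's contribution is generalizing it to several factors at once) can be sketched in a few lines: subtract group means from both sides and run ordinary least squares on the demeaned data. The sketch below illustrates this in Python rather than through the package's R interface.
    # Within transformation for one fixed-effect factor: demean by group, then run OLS.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n, n_groups = 1000, 50
    group = rng.integers(0, n_groups, size=n)
    x = rng.normal(size=n)
    group_effect = rng.normal(size=n_groups)[group]        # the dummy-variable effects to sweep out
    y = 2.0 * x + group_effect + rng.normal(scale=0.1, size=n)

    df = pd.DataFrame({"y": y, "x": x, "g": group})
    demeaned = df[["y", "x"]] - df.groupby("g")[["y", "x"]].transform("mean")
    beta = np.linalg.lstsq(demeaned[["x"]].to_numpy(), demeaned["y"].to_numpy(), rcond=None)[0]
    print(beta)   # close to the true coefficient 2.0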
Lifelong Metric Learning State-of-the-art online learning approaches are only capable of learning the metric for predefined tasks. In this paper, we consider the lifelong learning problem to mimic ‘human learning’, i.e., endowing the learned metric with a new capability for a new task from new online samples while incorporating previous experiences and knowledge. Therefore, we propose a new framework: lifelong metric learning (LML), which only utilizes the data of the new task to train the metric model while preserving the original capabilities. More specifically, the proposed LML maintains a common subspace for all learned metrics, named the lifelong dictionary, transfers knowledge from the common subspace to each new metric task with task-specific idiosyncrasy, and redefines the common subspace over time to maximize performance across all metric tasks. We apply online Passive Aggressive optimization to solve the proposed LML framework. Finally, we evaluate our approach by analyzing several multi-task metric learning datasets. Extensive experimental results demonstrate the effectiveness and efficiency of the proposed framework.
Limits of End-to-End Learning End-to-end learning refers to training a possibly complex learning system by applying gradient-based learning to the system as a whole. An end-to-end learning system is specifically designed so that all modules are differentiable. In effect, not only a central learning machine, but also all ‘peripheral’ modules like representation learning and memory formation are covered by a holistic learning process. The power of end-to-end learning has been demonstrated on many tasks, like playing a whole array of Atari video games with a single architecture. While pushing for solutions to more challenging tasks, network architectures keep growing more and more complex. In this paper we ask whether and to what extent end-to-end learning is a future-proof technique in the sense of scaling to complex and diverse data processing architectures. We point out potential inefficiencies, and we argue in particular that end-to-end learning does not make optimal use of the modular design of present neural networks. Our surprisingly simple experiments demonstrate these inefficiencies, up to the complete breakdown of learning.
Linear and Quadratic Discriminant Analysis: Tutorial This tutorial explains Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) as two fundamental classification methods in statistical and probabilistic learning. We start with the optimization of decision boundary on which the posteriors are equal. Then, LDA and QDA are derived for binary and multiple classes. The estimation of parameters in LDA and QDA are also covered. Then, we explain how LDA and QDA are related to metric learning, kernel principal component analysis, Mahalanobis distance, logistic regression, Bayes optimal classifier, Gaussian naive Bayes, and likelihood ratio test. We also prove that LDA and Fisher discriminant analysis are equivalent. We finally clarify some of the theoretical concepts with simulations we provide.
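The practical distinction, one shared covariance (linear boundary) for LDA versus per-class covariances (quadratic boundary) for QDA, is easy to compare with scikit-learn; the synthetic dataset below is just a placeholder.
    # LDA vs QDA: shared covariance gives a linear boundary, per-class covariance a quadratic one.
    from sklearn.datasets import make_classification
    from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                                QuadraticDiscriminantAnalysis)
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, n_features=6, random_state=0)
    for name, clf in [("LDA", LinearDiscriminantAnalysis()),
                      ("QDA", QuadraticDiscriminantAnalysis())]:
        print(name, cross_val_score(clf, X, y, cv=5).mean().round(3))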
Linear Dimensionality Reduction (Slide Deck)
Linear models and linear mixed effects models in R with linguistic applications Part 1: Linear modeling Part 2: A very basic tutorial for performing linear mixed effects analyses
Linked Open Data: The Essentials A Quick Start Guide for Decision Makers
Listen First! Turning Social Media Conversations into Business Advantage Nearly anywhere you turn online, people are talking about your products and categories, what they like and dislike, what they want, what pleases them or ticks them off, and what they would like you to do, or stop doing. Twitter, Facebook, YouTube, millions of blogs, forums, Web sites, and review sites make most of these conversations public, accessible, and researchable to every company. You also hear individuals talk about the richness and texture of their lives and your role in them. You learn about their aspirations, families, relationships, and homes; music and movies; vacations, hobbies, and sports; finances, jobs, and careers; education and technology; what they had for lunch, what they crave; and much more. By listening in on those conversations, you position yourself to develop powerful insights into people that, coupled with strategy, drive your business forward and create an enduring advantage. Listen First! Turning Social Media Conversations into Business Advantage will show you how.
Listen, Interact and Talk: Learning to Speak via Interaction One of the long-term goals of artificial intelligence is to build an agent that can communicate intelligently with human in natural language. Most existing work on natural language learning relies heavily on training over a pre-collected dataset with annotated labels, leading to an agent that essentially captures the statistics of the fixed external training data. As the training data is essentially a static snapshot representation of the knowledge from the annotator, the agent trained this way is limited in adaptiveness and generalization of its behavior. Moreover, this is very different from the language learning process of humans, where language is acquired during communication by taking speaking action and learning from the consequences of speaking action in an interactive manner. This paper presents an interactive setting for grounded natural language learning, where an agent learns natural language by interacting with a teacher and learning from feedback, thus learning and improving language skills while taking part in the conversation. To achieve this goal, we propose a model which incorporates both imitation and reinforcement by leveraging jointly sentence and reward feedbacks from the teacher. Experiments are conducted to validate the effectiveness of the proposed approach.
Living Together: Mind and Machine Intelligence In this paper we consider the nature of the machine intelligences we have created in the context of our human intelligence. We suggest that the fundamental difference between human and machine intelligence comes down to embodiment factors. We define embodiment factors as the ratio between an entity’s ability to communicate information vs compute information. We speculate on the role of embodiment factors in driving our own intelligence and consciousness. We briefly review dual process models of cognition and cast machine intelligence within that framework, characterising it as a dominant System Zero, which can drive behaviour through interfacing with us subconsciously. Driven by concerns about the consequence of such a system we suggest prophylactic courses of action that could be considered. Our main conclusion is that it is not sentient intelligence we should fear but non-sentient intelligence.
Logistic Regression, Neural Networks and Dempster-Shafer Theory: a New Perspective We revisit logistic regression and its nonlinear extensions, including multilayer feedforward neural networks, by showing that these classifiers can be viewed as converting input or higher-level features into Dempster-Shafer mass functions and aggregating them by Dempster’s rule of combination. The probabilistic outputs of these classifiers are the normalized plausibilities corresponding to the underlying combined mass function. This mass function is more informative than the output probability distribution. In particular, it makes it possible to distinguish between lack of evidence (when none of the features provides discriminant information) from conflicting evidence (when different features support different classes). This expressivity of mass functions allows us to gain insight into the role played by each input feature in logistic regression, and to interpret hidden unit outputs in multilayer neural networks. It also makes it possible to use alternative decision rules, such as interval dominance, which select a set of classes when the available evidence does not unambiguously point to a single class, thus trading reduced error rate for higher imprecision.
Low Impact Artificial Intelligences There are many goals for an AI that could become dangerous if the AI becomes superintelligent or otherwise powerful. Much work on the AI control problem has been focused on constructing AI goals that are safe even for such AIs. This paper looks at an alternative approach: defining a general concept of ‘low impact’. The aim is to ensure that a powerful AI which implements low impact will not modify the world extensively, even if it is given a simple or dangerous goal. The paper proposes various ways of defining and grounding low impact, and discusses methods for ensuring that the AI can still be allowed to have a (desired) impact despite the restriction. The end of the paper addresses known issues with this approach and avenues for future research.
Low-Power Neuromorphic Hardware for Signal Processing Applications Machine learning has emerged as the dominant tool for implementing complex cognitive tasks that require supervised, unsupervised, and reinforcement learning. While the resulting machines have demonstrated in some cases even super-human performance, their energy consumption has often proved to be prohibitive in the absence of costly super-computers. Most state-of-the-art machine learning solutions are based on memory-less models of neurons. This is unlike the neurons in the human brain, which encode and process information using temporal information in spike events. The different computing principles underlying biological neurons and how they combine together to efficiently process information is believed to be a key factor behind their superior efficiency compared to current machine learning systems. Inspired by the time-encoding mechanism used by the brain, third generation spiking neural networks (SNNs) are being studied for building a new class of information processing engines. Modern computing systems based on the von Neumann architecture, however, are ill-suited for efficiently implementing SNNs, since their performance is limited by the need to constantly shuttle data between physically separated logic and memory units. Hence, novel computational architectures that address the von Neumann bottleneck are necessary in order to build systems that can implement SNNs with low energy budgets. In this paper, we review some of the architectural and system level design aspects involved in developing a new class of brain-inspired information processing engines that mimic the time-based information encoding and processing aspects of the brain.
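As a reminder of what a spiking (as opposed to rate-based) unit computes, here is a minimal leaky integrate-and-fire neuron in numpy; the constants are illustrative and unrelated to any particular hardware discussed in the paper.
    # Minimal leaky integrate-and-fire neuron: information is carried by spike times.
    import numpy as np

    dt, tau, v_thresh, v_reset = 1e-3, 20e-3, 1.0, 0.0   # illustrative constants (seconds, a.u.)
    rng = np.random.default_rng(0)
    current = 1.2 + 0.3 * rng.normal(size=1000)           # noisy input current over one second

    v, spikes = 0.0, []
    for step, i_in in enumerate(current):
        v += dt / tau * (-v + i_in)                       # leaky integration of the input
        if v >= v_thresh:                                 # threshold crossing emits a spike
            spikes.append(step * dt)
            v = v_reset
    print(len(spikes), "spikes; first few times:", [round(t, 3) for t in spikes[:5]])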
Luck is Hard to Beat: The Difficulty of Sports Prediction Predicting the outcome of sports events is a hard task. We quantify this difficulty with a coefficient that measures the distance between the observed final results of sports leagues and idealized perfectly balanced competitions in terms of skill. This indicates the relative presence of luck and skill. We collected and analyzed all games from 198 sports leagues comprising 1503 seasons from 84 countries of 4 different sports: basketball, soccer, volleyball and handball. We measured the competitiveness by countries and sports. We also identify in each season which teams, if removed from their league, result in a completely random tournament. Surprisingly, not many of them are needed. As another contribution of this paper, we propose a probabilistic graphical model to learn about the teams’ skills and to decompose the relative weights of luck and skill in each game. We break down the skill component into factors associated with the teams’ characteristics. The model also allows us to estimate the probability that an underdog team wins in the NBA league as 0.36, with home advantage adding 0.09 to this probability. As shown in the first part of the paper, luck is substantially present even in the most competitive championships, which partially explains why sophisticated and complex feature-based models hardly beat simple models in the task of forecasting sports’ outcomes.

M

Machine Learned Learning Machines There are two common approaches for optimizing the performance of a machine: genetic algorithms and machine learning. A genetic algorithm is applied over many generations whereas machine learning works by applying feedback until the system meets a performance threshold. Though these methods typically operate separately, we combine evolutionary adaptation and machine learning into one approach. Our focus is on machines that can learn during their lifetime, but instead of equipping them with a machine learning algorithm we aim to let them evolve their ability to learn by themselves. We use evolvable networks of probabilistic and deterministic logic gates, known as Markov Brains, as our computational model organism. The ability of Markov Brains to learn is augmented by a novel adaptive component that can change its computational behavior based on feedback. We show that Markov Brains can indeed evolve to incorporate these feedback gates to improve their adaptability to variable environments. By combining these two methods, we have also implemented a computational model that can be used to study the evolution of learning.
Machine Learning This book is based partly on content from the 2013 session of the on-line Machine Learning course run by Andrew Ng (Stanford University). The on-line course is provided for free via the Coursera platform (www.coursera.org). The author is in no way affiliated with Coursera, Stanford University or Andrew Ng.
Machine Learning – The Complete Guide This is a Wikipedia book, a collection of Wikipedia articles that can be easily saved, rendered electronically, and ordered as a printed book.
Machine Learning and Applied Linguistics This entry introduces the topic of machine learning and provides an overview of its relevance for applied linguistics and language learning. The discussion will focus on giving an introduction to the methods and applications of machine learning in applied linguistics, and will provide references for further study.
Machine Learning and Cloud Computing: Survey of Distributed and SaaS Solutions Applying popular machine learning algorithms to large amounts of data has raised new challenges for ML practitioners. Traditional ML libraries do not handle huge datasets well, so new approaches were needed. Parallelization using modern parallel computing frameworks, such as MapReduce, CUDA, or Dryad, gained in popularity and acceptance, resulting in new ML libraries developed on top of these frameworks. We will briefly introduce the most prominent industrial and academic outcomes, such as Apache Mahout, GraphLab or Jubatus. We will investigate how the cloud computing paradigm has impacted the field of ML. The first direction is popular statistics tools and libraries (the R system, Python) deployed in the cloud. A second line of products augments existing tools with plugins that allow users to create a Hadoop cluster in the cloud and run jobs on it. Next on the list are libraries of distributed implementations for ML algorithms, and on-premise deployments of complex systems for data analytics and data mining. The last approach on the radar of this survey is ML as Software-as-a-Service, with several Big Data start-ups (and large companies as well) already opening their solutions to the market.
Machine Learning and Cognitive Technology for Intelligent Wireless Networks The ability to dynamically and efficiently allocate resources to meet the need of growing diversity in services and user behavior marks the future of wireless networks, giving rise to intelligent processing, which aims at enabling the system to perceive and assess the available resources, to autonomously learn to adapt to the perceived wireless environment, and to reconfigure its operating mode to maximize the utility of the available resources. The perception capability and reconfigurability are the essential features of cognitive technology while modern machine learning techniques project effectiveness in system adaptation. In this paper, we discuss the development of the cognitive technology and machine learning techniques and emphasize their roles in improving both spectrum and energy efficiency of the future wireless networks. We describe in detail the state-of-the-art of cognitive technology, covering spectrum sensing and access approaches that may enhance spectrum utilization and curtail energy consumption. We discuss powerful machine learning algorithms that enable spectrum- and energy-efficient communications in dynamic wireless environments. We also present practical applications of these techniques to the existing and future wireless communication systems, such as heterogeneous networks and device-to-device communications, and identify some research opportunities and challenges in cognitive technology and machine learning as applied to future wireless networks.
Machine Learning and the Future of Realism The preceding three decades have seen the emergence, rise, and proliferation of machine learning (ML). From half-recognised beginnings in perceptrons, neural nets, and decision trees, algorithms that extract correlations (that is, patterns) from a set of data points have broken free from their origin in computational cognition to embrace all forms of problem solving, from voice recognition to medical diagnosis to automated scientific research and driverless cars, and it is now widely opined that the real industrial revolution lies less in the mobile phone and similar devices than in the maturation and universal application of ML. Among the consequences just might be the triumph of anti-realism over realism.
Machine Learning as Ecology Machine learning methods have had spectacular success on numerous problems. Here we show that a prominent class of learning algorithms, including Support Vector Machines (SVMs), has a natural interpretation in terms of ecological dynamics. We use these ideas to design new online SVM algorithms that exploit ecological invasions, and benchmark performance using the MNIST dataset. Our work provides a new ecological lens through which we can view statistical learning and opens the possibility of designing ecosystems for machine learning. Supplemental code is found at https://…/EcoSVM.
Machine Learning at the Network Edge: A Survey Devices comprising the Internet of Things, such as sensors and small cameras, usually have small memories and limited computational power. The proliferation of such resource-constrained devices in recent years has led to the generation of large quantities of data. These data-producing devices are appealing targets for machine learning applications but struggle to run machine learning algorithms due to their limited computing capability. They typically offload input data to external computing systems (such as cloud servers) for further processing. The results of the machine learning computations are communicated back to the resource-scarce devices, but this worsens latency, leads to increased communication costs, and adds to privacy concerns. Therefore, efforts have been made to place additional computing devices at the edge of the network, i.e., close to the IoT devices where the data is generated. Deploying machine learning systems on such edge devices alleviates the above issues by allowing computations to be performed close to the data sources. This survey describes major research efforts where machine learning has been deployed at the edge of computer networks.
Machine Learning for Business: Eight Best Practices to Get Started As organizations look to advance with analytics, predictive analytics is frequently on their road map. Businesses are interested in better understanding their customers, predicting behavior, and improving operational processes. They want more accurate insights and the ability to respond faster to change. Machine learning—building systems that can learn from data to identify patterns and predict future outcomes with minimal human intervention—is often on their radar. Data scientists who engage in analysis are an important piece of the equation. Data scientists can build new models, develop algorithms and applications, and help the organization innovate. However, these data scientists are not always easy to find. TDWI research indicates that organizations are often looking to supplement the data science team by growing the skills of business analysts to use tools such as machine learning. For example, in a recent TDWI survey, 51 percent of respondents said that enhancing business analysts´ skills was one of their top two strategies for growing their data science competencies in the organization.1 That means that organizations need productivity tools for data scientists as well as a way to equip power users and business analysts to perform advanced analytics. These business analysts can work together with data scientists and other team members to bring machine learning into the organization. How do businesses get started with machine learning? How do organizations equip business analysts to use machine learning techniques and work in conjunction with data scientists? What do these organizations need to know? This Checklist defines machine learning and discusses best practices for the business as it takes the next step on its analytics journey toward using machine learning.
Machine Learning for Data-Driven Movement Generation: a Review of the State of the Art The rise of non-linear and interactive media such as video games has increased the need for automatic movement animation generation. In this survey, we review and analyze different aspects of building automatic movement generation systems using machine learning techniques and motion capture data. We cover topics such as high-level movement characterization, training data, features representation, machine learning models, and evaluation methods. We conclude by presenting a discussion of the reviewed literature and outlining the research gaps and remaining challenges for future work.
Machine Learning for E-mail Spam Filtering: Review, Techniques and Trends We present a comprehensive review of the most effective content-based e-mail spam filtering techniques. We focus primarily on Machine Learning-based spam filters and their variants, and report on a broad review covering the relevant ideas, efforts, effectiveness, and current progress. The initial exposition of the background examines the basics of e-mail spam filtering, the evolving nature of spam, spammers playing cat-and-mouse with e-mail service providers (ESPs), and the Machine Learning front in fighting spam. We conclude by measuring the impact of Machine Learning-based filters and exploring the promising offshoots of the latest developments.
Machine Learning for Fluid Mechanics The field of fluid mechanics is rapidly advancing, driven by unprecedented volumes of data from experiments, field measurements, and large-scale simulations at multiple spatiotemporal scales. Machine learning presents us with a wealth of techniques to extract information from data that can be translated into knowledge about the underlying fluid mechanics. Moreover, machine learning algorithms can augment domain knowledge and automate tasks related to flow control and optimization. This article presents an overview of past history, current developments, and emerging opportunities of machine learning for fluid mechanics. We outline fundamental machine learning methodologies and discuss their uses for understanding, modeling, optimizing, and controlling fluid flows. The strengths and limitations of these methods are addressed from the perspective of scientific inquiry that links data with modeling, experiments, and simulations. Machine learning provides a powerful information processing framework that can augment, and possibly even transform, current lines of fluid mechanics research and industrial applications.
Machine learning for Internet of Things data analysis: A survey Rapid developments in hardware, software, and communication technologies have allowed the emergence of Internet-connected sensory devices that provide observation and data measurement from the physical world. By 2020, it is estimated that the total number of Internet-connected devices being used will be between 25 and 50 billion. As the numbers grow and technologies become more mature, the volume of data published will increase. Internet-connected device technology, referred to as the Internet of Things (IoT), continues to extend the current Internet by providing connectivity and interaction between the physical and cyber worlds. In addition to increased volume, the IoT generates Big Data characterized by velocity in terms of time and location dependency, with a variety of multiple modalities and varying data quality. Intelligent processing and analysis of this Big Data is the key to developing smart IoT applications. This article assesses the different machine learning methods that deal with the challenges in IoT data by considering smart cities as the main use case. The key contribution of this study is the presentation of a taxonomy of machine learning algorithms explaining how different techniques are applied to the data in order to extract higher-level information. The potential and challenges of machine learning for IoT data analytics will also be discussed. A use case of applying a Support Vector Machine (SVM) to Aarhus Smart City traffic data is presented for a more detailed exploration.
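To make that final use case concrete, the following is a minimal, hypothetical sketch of classifying traffic congestion with an SVM in scikit-learn. The synthetic features (vehicle counts and average speeds) and the congestion rule are invented stand-ins, not the Aarhus Smart City data.

```python
# Hypothetical SVM sketch for a smart-city traffic use case; the features,
# thresholds, and data are synthetic stand-ins, not the Aarhus dataset.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 1000
vehicle_count = rng.poisson(30, n)                                  # vehicles per interval
avg_speed = np.clip(rng.normal(50, 15, n) - 0.5 * vehicle_count, 5, None)
X = np.column_stack([vehicle_count, avg_speed])
y = ((vehicle_count > 35) & (avg_speed < 30)).astype(int)           # 1 = congested

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```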
Machine Learning for Spatiotemporal Sequence Forecasting: A Survey Spatiotemporal systems are common in the real-world. Forecasting the multi-step future of these spatiotemporal systems based on the past observations, or, Spatiotemporal Sequence Forecasting (STSF), is a significant and challenging problem. Although lots of real-world problems can be viewed as STSF and many research works have proposed machine learning based methods for them, no existing work has summarized and compared these methods from a unified perspective. This survey aims to provide a systematic review of machine learning for STSF. In this survey, we define the STSF problem and classify it into three subcategories: Trajectory Forecasting of Moving Point Cloud (TF-MPC), STSF on Regular Grid (STSF-RG) and STSF on Irregular Grid (STSF-IG). We then introduce the two major challenges of STSF: 1) how to learn a model for multi-step forecasting and 2) how to adequately model the spatial and temporal structures. After that, we review the existing works for solving these challenges, including the general learning strategies for multi-step forecasting, the classical machine learning based methods for STSF, and the deep learning based methods for STSF. We also compare these methods and point out some potential research directions.
Machine Learning for Survival Analysis: A Survey Accurately predicting the time of occurrence of an event of interest is a critical problem in longitudinal data analysis. One of the main challenges in this context is the presence of instances whose event outcomes become unobservable after a certain time point or when some instances do not experience any event during the monitoring period. Such a phenomenon is called censoring which can be effectively handled using survival analysis techniques. Traditionally, statistical approaches have been widely developed in the literature to overcome this censoring issue. In addition, many machine learning algorithms are adapted to effectively handle survival data and tackle other challenging problems that arise in real-world data. In this survey, we provide a comprehensive and structured review of the representative statistical methods along with the machine learning techniques used in survival analysis and provide a detailed taxonomy of the existing methods. We also discuss several topics that are closely related to survival analysis and illustrate several successful applications in various real-world application domains. We hope that this paper will provide a more thorough understanding of the recent advances in survival analysis and offer some guidelines on applying these approaches to solve new problems that arise in applications with censored data.
Machine Learning for Wireless Networks with Artificial Intelligence: A Tutorial on Neural Networks Next-generation wireless networks must support ultra-reliable, low-latency communication and intelligently manage a massive number of Internet of Things (IoT) devices in real-time, within a highly dynamic environment. This need for stringent communication quality-of-service (QoS) requirements as well as mobile edge and core intelligence can only be realized by integrating fundamental notions of artificial intelligence (AI) and machine learning across the wireless infrastructure and end-user devices. In this context, this paper provides a comprehensive tutorial that introduces the main concepts of machine learning, in general, and artificial neural networks (ANNs), in particular, and their potential applications in wireless communications. For this purpose, we present a comprehensive overview of a number of key types of neural networks that include feed-forward, recurrent, spiking, and deep neural networks. For each type of neural network, we present the basic architecture and training procedure, as well as the associated challenges and opportunities. Then, we provide an in-depth overview of the variety of wireless communication problems that can be addressed using ANNs, ranging from communication using unmanned aerial vehicles to virtual reality and edge caching. For each individual application, we present the main motivation for using ANNs along with the associated challenges while also providing a detailed example for a use case scenario and outlining future works that can be addressed using ANNs. In a nutshell, this article constitutes one of the first holistic tutorials on the development of machine learning techniques tailored to the needs of future wireless networks.
Machine learning in acoustics: a review Acoustic data provide scientific and engineering insights in fields ranging from biology and communications to ocean and Earth science. We survey the recent advances and transformative potential of machine learning (ML), including deep learning, in the field of acoustics. ML is a broad family of statistical techniques for automatically detecting and utilizing patterns in data. Relative to conventional acoustics and signal processing, ML is data-driven. Given sufficient training data, ML can discover complex relationships between features. With large volumes of training data, ML can discover models describing complex acoustic phenomena such as human speech and reverberation. ML in acoustics is rapidly developing with compelling results and significant future promise. We first introduce ML, then highlight ML developments in five acoustics research areas: source localization in speech processing, source localization in ocean acoustics, bioacoustics, seismic exploration, and environmental sounds in everyday scenes.
Machine Learning in Official Statistics In the first half of 2018, the Federal Statistical Office of Germany (Destatis) carried out a ‘Proof of Concept Machine Learning’ as part of its Digital Agenda. A major component of this was surveys on the use of machine learning methods in official statistics, which were conducted at selected national and international statistical institutions and among the divisions of Destatis. It was of particular interest to find out in which statistical areas and for which tasks machine learning is used and which methods are applied. This paper is intended to make the results of the surveys publicly accessible.
Machine Learning Interpretability: A Science rather than a tool The term ‘interpretability’ is often used by machine learning researchers, each with their own intuitive understanding of it. There is no universal, well-agreed-upon definition of interpretability in machine learning. Any scientific discipline is mainly driven by the set of questions it formulates rather than by the tools it uses: astrophysics, for example, is the discipline that studies the composition of stars, not the discipline that uses spectroscopes. Similarly, we propose that machine learning interpretability should be a discipline that answers specific questions related to interpretability. These questions can be of a statistical, causal, or counterfactual nature. Therefore, there is a need to look at the interpretability problem of machine learning in the context of the questions that need to be addressed rather than the available tools. We discuss a hypothetical interpretability framework driven by a question-based scientific approach rather than by some specific machine learning model. Using a question-based notion of interpretability, we can step towards understanding the science of machine learning rather than its engineering. This notion will also help us understand any specific problem in more depth rather than relying solely on machine learning methods.
Machine Learning Methods Economists Should Know About We discuss the relevance of the recent Machine Learning (ML) literature for economics and econometrics. First we discuss the differences in goals, methods and settings between the ML literature and the traditional econometrics and statistics literatures. Then we discuss some specific methods from the machine learning literature that we view as important for empirical researchers in economics. These include supervised learning methods for regression and classification, unsupervised learning methods, as well as matrix completion methods. Finally, we highlight newly developed methods at the intersection of ML and econometrics, methods that typically perform better than either off-the-shelf ML or more traditional econometric methods when applied to particular classes of problems, problems that include causal inference for average treatment effects, optimal policy estimation, and estimation of the counterfactual effect of price changes in consumer choice models.
Machine Learning Methods for Computer Security The study of learning in adversarial environments is an emerging discipline at the juncture between machine learning and computer security that raises new questions within both fields. The interest in learning-based methods for security and system design applications comes from the high degree of complexity of phenomena underlying the security and reliability of computer systems. As it becomes increasingly difficult to reach the desired properties by design alone, learning methods are being used to obtain a better understanding of various data collected from these complex systems. However, learning approaches can be co-opted or evaded by adversaries, who adapt in order to counter them. To date, there has been limited research into learning techniques that are resilient to attacks with provable robustness guarantees, making the task of designing secure learning-based systems a lucrative open research area with many challenges. The Perspectives Workshop, ‘Machine Learning Methods for Computer Security’, was convened to bring together interested researchers from both the computer security and machine learning communities to discuss techniques, challenges, and future research directions for secure learning and learning-based security applications. This workshop featured twenty-two invited talks from leading researchers within the secure learning community covering topics in adversarial learning, game-theoretic learning, collective classification, privacy-preserving learning, security evaluation metrics, digital forensics, authorship identification, adversarial advertisement detection, learning for offensive security, and data sanitization. The workshop also featured workgroup sessions organized into three topics: machine learning for computer security, secure learning, and future applications of secure learning.
Machine Learning Testing: Survey, Landscapes and Horizons This paper provides a comprehensive survey of Machine Learning Testing (ML testing) research. It covers 128 papers on testing properties (e.g., correctness, robustness, and fairness), testing components (e.g., the data, learning program, and framework), testing workflow (e.g., test generation and test evaluation), and application scenarios (e.g., autonomous driving, machine translation). The paper also analyses trends concerning datasets, research trends, and research focus, concluding with research challenges and promising research directions in ML testing.
Machine learning the thermodynamic arrow of time The mechanism by which thermodynamics sets the direction of time’s arrow has long fascinated scientists. Here, we show that a machine learning algorithm can learn to discern the direction of time’s arrow when provided with a system’s microscopic trajectory as input. The performance of our algorithm matches fundamental bounds predicted by nonequilibrium statistical mechanics. Examination of the algorithm’s decision-making process reveals that it discovers the underlying thermodynamic mechanism and the relevant physical observables. Our results indicate that machine learning techniques can be used to study systems out of equilibrium, and ultimately to uncover physical principles.
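As a toy illustration of the idea (not the paper’s actual setup or data), the sketch below trains a logistic-regression classifier to decide whether a short trajectory of a driven Brownian particle is being played forward or in reverse; the drive strength, noise level, and trajectory length are arbitrary choices.

```python
# Toy sketch: learn the direction of time's arrow from simulated trajectories
# of a particle pulled by a constant force plus thermal noise (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

def trajectory(steps=50, drive=0.1, noise=0.3):
    # Cumulative sum of (drift + noise) increments: an overdamped, driven particle.
    return np.cumsum(drive + noise * rng.standard_normal(steps))

X, y = [], []
for _ in range(2000):
    traj = trajectory()
    if rng.random() < 0.5:
        X.append(np.diff(traj)); y.append(1)          # played forward in time
    else:
        X.append(np.diff(traj[::-1])); y.append(0)    # time-reversed copy

X_train, X_test, y_train, y_test = train_test_split(np.array(X), np.array(y), random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```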
Machine Learning with World Knowledge: The Position and Survey Machine learning has become pervasive in multiple domains, impacting a wide variety of applications, such as knowledge discovery and data mining, natural language processing, information retrieval, computer vision, social and health informatics, ubiquitous computing, etc. Two essential problems of machine learning are how to generate features and how to acquire labels for machines to learn. Particularly, labeling large amount of data for each domain-specific problem can be very time consuming and costly. It has become a key obstacle in making learning protocols realistic in applications. In this paper, we will discuss how to use the existing general-purpose world knowledge to enhance machine learning processes, by enriching the features or reducing the labeling work. We start from the comparison of world knowledge with domain-specific knowledge, and then introduce three key problems in using world knowledge in learning processes, i.e., explicit and implicit feature representation, inference for knowledge linking and disambiguation, and learning with direct or indirect supervision. Finally we discuss the future directions of this research topic.
Machine Learning, Big Data, And Smart Buildings: A Comprehensive Survey Future buildings will offer new convenience, comfort, and efficiency possibilities to their residents. Changes will occur to the way people live as technology becomes woven into people’s lives and information processing is fully integrated into their daily living activities and objects. The future expectation of smart buildings includes making the residents’ experience as easy and comfortable as possible. The massive streaming data generated and captured by smart building appliances and devices contains valuable information that needs to be mined to facilitate timely actions and better decision making. Machine learning and big data analytics will undoubtedly play a critical role in enabling the delivery of such smart services. In this paper, we survey the area of smart buildings with a special focus on the role of techniques from machine learning and big data analytics. This survey also reviews the current trends and challenges faced in the development of smart building services.
Machine Learning, Deepest Learning: Statistical Data Assimilation Problems We formulate a strong equivalence between machine learning, artificial intelligence methods and the formulation of statistical data assimilation as used widely in physical and biological sciences. The correspondence is that layer number in the artificial network setting is the analog of time in the data assimilation setting. Within the discussion of this equivalence we show that adding more layers (making the network deeper) is analogous to adding temporal resolution in a data assimilation framework. How one can find a candidate for the global minimum of the cost functions in the machine learning context using a method from data assimilation is discussed. Calculations on simple models from each side of the equivalence are reported. Also discussed is a framework in which the time or layer label is taken to be continuous, providing a differential equation, the Euler-Lagrange equation, which shows that the problem being solved is a two-point boundary value problem familiar in the discussion of variational methods. The use of continuous layers is denoted ‘deepest learning’. These problems respect a symplectic symmetry in continuous time/layer phase space. Both Lagrangian versions and Hamiltonian versions of these problems are presented. Their well-studied implementation in discrete time/layers, while respecting the symplectic structure, is addressed. The Hamiltonian version provides a direct rationale for back propagation as a solution method for the canonical momentum.
Machine Learning: A Probabilistic Perspective This book adopts the view that the best way to solve such problems is to use the tools of probability theory. Probability theory can be applied to any problem involving uncertainty. In machine learning, uncertainty comes in many forms: what is the best prediction about the future given some past data? what is the best model to explain some data? what measurement should I perform next? etc. The probabilistic approach to machine learning is closely related to the field of statistics, but differs slightly in terms of its emphasis and terminology. We will describe a wide variety of probabilistic models, suitable for a wide variety of data and tasks. We will also describe a wide variety of algorithms for learning and using such models. The goal is not to develop a cookbook of ad hoc techniques, but instead to present a unified view of the field through the lens of probabilistic modeling and inference. Although we will pay attention to computational efficiency, details on how to scale these methods to truly massive datasets are better described in other books, such as (Rajaraman and Ullman 2011; Bekkerman et al. 2011).
Machine Learning: The High-Interest Credit Card of Technical Debt Machine learning offers a fantastically powerful toolkit for building complex systems quickly. This paper argues that it is dangerous to think of these quick wins as coming for free. Using the framework of technical debt, we note that it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning. The goal of this paper is to highlight several machine learning-specific risk factors and design patterns to be avoided or refactored where possible. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, changes in the external world, and a variety of system-level anti-patterns.
Machine Teaching: A New Paradigm for Building Machine Learning Systems The current processes for building machine learning systems require practitioners with deep knowledge of machine learning. This significantly limits the number of machine learning systems that can be created and has led to a mismatch between the demand for machine learning systems and the ability for organizations to build them. We believe that in order to meet this growing demand for machine learning systems we must significantly increase the number of individuals that can teach machines. We postulate that we can achieve this goal by making the process of teaching machines easy, fast and above all, universally accessible. While machine learning focuses on creating new algorithms and improving the accuracy of learners, the machine teaching discipline focuses on the efficacy of the teachers. Machine teaching as a discipline is a paradigm shift that follows and extends principles of software engineering and programming languages. We put a strong emphasis on the teacher and the teacher’s interaction with data, as well as crucial components such as techniques and design principles of interaction and visualization. In this paper, we present our position regarding the discipline of machine teaching and articulate fundamental machine teaching principles. We also describe how, by decoupling knowledge about machine learning algorithms from the process of teaching, we can accelerate innovation and empower millions of new uses for machine learning models.
Machine Translation Evaluation: A Survey This paper introduces the state-of-the-art MT evaluation survey that contains both manual and automatic evaluation methods. The traditional human evaluation criteria mainly include the intelligibility, fidelity, fluency, adequacy, comprehension, and informativeness. We classify the automatic evaluation methods into two categories, including lexical similarity and linguistic features application. The lexical similarity methods contain edit distance, precision, recall, and word order, etc. The linguistic features can be divided into syntactic features and semantic features. Subsequently, we also introduce the evaluation methods for MT evaluation and the recent quality estimation tasks for MT.
Machine Translation Using Semantic Web Technologies: A Survey A large number of machine translation approaches have been developed recently with the aim of migrating content easily across languages. However, the literature suggests that many obstacles have to be dealt with to achieve better automatic translations. A central issue that machine translation systems must handle is ambiguity. A promising way of overcoming this problem is using semantic web technologies. This article presents the results of a systematic review of approaches that rely on semantic web technologies within machine translation approaches for translating natural-language sentences. Overall, our survey suggests that while semantic web technologies can enhance the quality of machine translation outputs for various problems, the combination of both is still in its infancy.
Machine Translation: A Literature Review Machine translation (MT) plays an important role in benefiting linguists, sociologists, computer scientists, etc. by processing natural language to translate it into some other natural language. This demand has grown exponentially over the past couple of years, considering the enormous exchange of information between different regions with different regional languages. Machine Translation poses numerous challenges, some of which are: a) not all words in one language have an equivalent word in another language; b) two given languages may have completely different structures; c) words can have more than one meaning. Owing to these challenges, along with many others, MT has been an active area of research for more than five decades. Numerous methods have been proposed in the past which either aim at improving the quality of the translations they generate or study the robustness of these systems by measuring their performance on many different languages. In this literature review, we discuss statistical approaches (in particular word-based and phrase-based) and neural approaches which have gained widespread prominence owing to their state-of-the-art results across multiple major languages.
Machine Vision in the Context of Robotics: A Systematic Literature Review Machine vision is critical to robotics due to a wide range of applications, such as autonomous mobile robots and smart production systems, which rely on input from visual sensors. To create the smart homes and systems of tomorrow, an overview of current challenges in the research field, assembled in a systematic and reproducible manner, would be useful for identifying further possible directions. In this work, a systematic literature review was conducted covering research from the last 10 years. We screened 172 papers from four databases and selected 52 relevant papers. While robustness and computation time were improved greatly, occlusion and lighting variance are still the biggest problems faced. From the number of recent publications, we conclude that the observed field is of relevance and interest to the research community. Further challenges arise in many areas of the field.
MAGIX: Model Agnostic Globally Interpretable Explanations Explaining the behavior of a black box machine learning model at the instance level is useful for building trust. However, what is also important is understanding how the model behaves globally. Such an understanding provides insight into both the data on which the model was trained and the generalization power of the rules it learned. We present here an approach that learns rules to explain globally the behavior of black box machine learning models. Collectively these rules represent the logic learned by the model and are hence useful for gaining insight into its behavior. We demonstrate the power of the approach on three publicly available data sets.
Making AI meaningful again Artificial intelligence (AI) research enjoyed an initial period of enthusiasm in the 1970s and 80s. But this enthusiasm was tempered by a long interlude of frustration when genuinely useful AI applications failed to be forthcoming. Today, we are experiencing once again a period of enthusiasm, fired above all by the successes of the technology of deep neural networks or deep machine learning. In this paper we draw attention to what we take to be serious problems underlying current views of artificial intelligence encouraged by these successes, especially in the domain of language processing. We then show an alternative approach to language-centric AI, in which we identify a role for philosophy.
Making Predictive Analytics More Practical With Alteryx Businesses today are in a conundrum. While the current economic climate has made organizations hesitant to take a risk on a strategic investment that could be a ‘bad bet´, the pace of business and the competitive marketplace dictate that organizations move quickly to take advantage of opportunities that could create a huge revenue windfall—and create a significant competitive advantage. Unfortunately, although the vast majority of organizations have deep visibility into the past, thanks to traditional Business Intelligence (BI) tools that analyze the historical performance of the business, many still depend on intuition or simple optimism to better understand the present and future, forcing them to ‘fly blind´ when mapping their company´s future strategy. Why? There are two primary reasons: First, traditional BI tools and platforms do not provide forward-looking insight, rendering them useless in anticipating future performance. While they are able to deliver a wide range of detailed reports, sophisticated dashboards, and complex visualizations, all are based on historical information, leaving organizations stuck using guesswork, ‘gut feel´, or simple spreadsheets to anticipate the future, thereby ignoring the tremendous potential of their information assets that could give them predictive insight for competitive advantage. Second, most of the predictive analytical tools on the market today are complex, time-consuming, and expensive to use. Requiring multiple different technology systems and highly trained, specialized personnel to get from business question to predictive answer, companies that use the predictive analytical tools available today often find that the business opportunity is already in their rear-view mirror—by several weeks or months—before they have an answer. Clearly, this is unacceptable in today´s competitive reality. What today´s cutting-edge businesses need is a powerful yet easy-to-use predictive analytics platform that enables them to gain significant business value from predicting future business performance—quickly and inexpensively—using forward-looking insight rather than historical data. This paper will examine today´s business reality of information overload, the hurdles organizations must overcome using today´s complex, expensive, and time-consuming predictive analytical tools, and the new approach to agile predictive analytics offered by Alteryx.
Making the business case for text analytics This report hopes to establish some of the key barriers that prevent successful commercial deployments while providing real-world assistance so obstacles can be overcome. It will focus on the different needs of an initial text analytics adoption, including what our contributors all cited as the top company need: strong high-level executive support to help ensure necessary long-term funding. Text analytics can be applied in almost every business case and multiple units within the same organization can benefit from a centralized analytics division. The market´s future is still a concern because of the shortage in text analytics professionals, and this reality is a guiding force for today´s successful pilots and programs.
Malicious URL Detection using Machine Learning: A Survey Malicious URL, a.k.a. malicious website, is a common and serious threat to cybersecurity. Malicious URLs host unsolicited content (spam, phishing, drive-by exploits, etc.) and lure unsuspecting users to become victims of scams (monetary loss, theft of private information, and malware installation), and cause losses of billions of dollars every year. It is imperative to detect and act on such threats in a timely manner. Traditionally, this detection is done mostly through the usage of blacklists. However, blacklists cannot be exhaustive, and lack the ability to detect newly generated malicious URLs. To improve the generality of malicious URL detectors, machine learning techniques have been explored with increasing attention in recent years. This article aims to provide a comprehensive survey and a structural understanding of Malicious URL Detection techniques using machine learning. We present the formal formulation of Malicious URL Detection as a machine learning task, and categorize and review the contributions of literature studies that addresses different dimensions of this problem (feature representation, algorithm design, etc.). Further, this article provides a timely and comprehensive survey for a range of different audiences, not only for machine learning researchers and engineers in academia, but also for professionals and practitioners in cybersecurity industry, to help them understand the state of the art and facilitate their own research and practical applications. We also discuss practical issues in system design, open research challenges, and point out some important directions for future research.
Managing Big Data: A TDWI Best Practices Report The emerging phenomenon called big data is forcing numerous changes in businesses and other organizations. Many struggle just to manage the massive data sets and non-traditional data structures that are typical of big data. Others are managing big data by extending their data management skills and their portfolios of data management software. This empowers them to automate more business processes, operate closer to real time, and through analytics, learn valuable new facts about business operations, customers, partners, and so on. The result is big data management (BDM), an amalgam of old and new best practices, skills, teams, data types, and home-grown or vendor-built functionality. All of these are expanding and realigning so that businesses can fully leverage big data, not merely manage it. At the same time, big data must eventually find a permanent place in enterprise data management. BDM is well worth doing because managing big data leads to a number of benefits. According to this report´s survey, the business and technology tasks that improve most are analytic insights, the completeness of analytic data sets, business value drawn from big data, and all sales and marketing activities. BDM also has challenges, and common barriers include low organizational maturity relative to big data, weak business support, and the need to learn new technology approaches. Despite the newness of big data, half of organizations surveyed are actively managing big data today. For a quarter of organizations, big data mostly takes the form of the relational and structured data that comes from traditional applications, whereas another quarter manages traditional data along with big data from new sources such as Web servers, machines, sensors, customer interactions, and social media. A quarter of surveyed organizations have managed to scale up preexisting applications and databases to handle burgeoning volumes of relational big data. Another quarter has gone out on the leading edge by acquiring new data management platforms that are purpose-built for managing and analyzing multi-structured big data. Many more are evaluating such big data platforms now, creating a brisk market of vendor products and services for managing big data. According to the survey, the Hadoop Distributed File System (HDFS), MapReduce, and various Hadoop tools will be the software products most aggressively adopted for BDM in the next three years. Others include complex event processing (for streaming big data), NoSQL databases (for schema-free big data), in-memory databases (for real-time analytic processing of big data), private clouds, in-database analytics, and grid computing. Organizations are adjusting their technical best practices to accommodate BDM. Most are schooled in extract, transform, and load (ETL) in support of data warehousing (DW) and reporting. Preparing big data for analytics is similar, but different. Organizations are retraining existing personnel, augmenting their teams with consultants, and hiring new personnel. The focus is on data analysts, data scientists, and data architects who can develop the applications for data exploration and discovery analytics that organizations need for getting value from big data. This report accelerates users´ understanding of the many options that are available for big data management (BDM), including old, new, and upcoming options. 
The report brings readers up to date so they can make intelligent decisions about which tools, techniques, and team structures to apply to their next-generation solutions for BDM.
Many perspectives on Deborah Mayo’s ‘Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars’ The new book by philosopher Deborah Mayo is relevant to data science for topical reasons, as she takes various controversial positions regarding hypothesis testing and statistical practice, and also as an entry point to thinking about the philosophy of statistics. The present article is a slightly expanded version of a series of informal reviews and comments on Mayo’s book. We hope this discussion will introduce people to Mayo’s ideas along with other perspectives on the topics she addresses.
Map-Reduce for Machine Learning on Multicore We are at the beginning of the multicore era. Computers will have increasingly many cores (processors), but there is still no good programming framework for these architectures, and thus no simple and unified way for machine learning to take advantage of the potential speed up. In this paper, we develop a broadly applicable parallel programming method, one that is easily applied to many different learning algorithms. Our work is in distinct contrast to the tradition in machine learning of designing (often ingenious) ways to speed up a single algorithm at a time. Specifically, we show that algorithms that fit the Statistical Query model can be written in a certain ‘summation form,’ which allows them to be easily parallelized on multicore computers. We adapt Google’s map-reduce paradigm to demonstrate this parallel speed up technique on a variety of learning algorithms including locally weighted linear regression (LWLR), k-means, logistic regression (LR), naive Bayes (NB), SVM, ICA, PCA, gaussian discriminant analysis (GDA), EM, and backpropagation (NN). Our experimental results show basically linear speedup with an increasing number of processors.
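A minimal sketch of the ‘summation form’ idea, using Python’s multiprocessing rather than a map-reduce runtime: the sufficient statistics of ordinary least squares (X^T X and X^T y) decompose into sums over data shards, so each shard can be mapped in parallel and the partial results reduced by addition. The shard sizes, weights, and data below are made up for the example.

```python
# Sketch of summation-form parallelism: per-shard sufficient statistics (map)
# are added together (reduce) before solving the normal equations once.
import numpy as np
from multiprocessing import Pool

def partial_sums(shard):
    X, y = shard
    return X.T @ X, X.T @ y                     # map: local sums for this shard

def fit_linear_regression(shards, workers=4):
    with Pool(workers) as pool:
        parts = pool.map(partial_sums, shards)
    XtX = sum(p[0] for p in parts)              # reduce: add the partial sums
    Xty = sum(p[1] for p in parts)
    return np.linalg.solve(XtX, Xty)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0, 0.5])
    shards = []
    for _ in range(8):
        X = rng.standard_normal((1000, 3))
        shards.append((X, X @ true_w + 0.01 * rng.standard_normal(1000)))
    print(fit_linear_regression(shards))        # should be close to true_w
```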
MapReduce: Simplified Data Processing on Large Clusters MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real-world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program’s execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google’s clusters every day.
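The canonical illustration of this model is word counting. The single-process sketch below mimics the two phases described above (it is not Google’s distributed implementation): map emits (word, 1) pairs, the values are grouped by key, and reduce sums each group.

```python
# Single-machine word count in the MapReduce style: map emits (key, value)
# pairs, values are grouped by key, and reduce merges each group.
from collections import defaultdict

def map_fn(document):
    for word in document.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    return word, sum(counts)

def run_mapreduce(documents):
    groups = defaultdict(list)
    for doc in documents:                          # map + shuffle phase
        for key, value in map_fn(doc):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())   # reduce phase

print(run_mapreduce(["the quick brown fox", "the lazy dog", "the fox"]))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```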
Marketing Analytics: Methods, Practice, Implementation, and Links to Other Fields Marketing analytics is a diverse field, with both academic researchers and practitioners coming from a range of backgrounds including marketing, operations research, statistics, and computer science. This paper provides an integrative review at the boundary of these areas. The topics of visualization, segmentation, and class prediction are featured. Links between the disciplines are emphasized. For each of these topics, a historical overview is given, starting with initial work in the 1960s and carrying through to the present day. Recent innovations for modern large and complex ‘big data’ sets are described. Practical implementation advice is given, along with a directory of open source R routines for implementing marketing analytics techniques.
Markov Brains: A Technical Introduction Markov Brains are a class of evolvable artificial neural networks (ANN). They differ from conventional ANNs in many aspects, but the key difference is that instead of a layered architecture, with each node performing the same function, Markov Brains are networks built from individual computational components. These computational components interact with each other, receive inputs from sensors, and control motor outputs. The function of the computational components, their connections to each other, as well as connections to sensors and motors are all subject to evolutionary optimization. Here we describe in detail how a Markov Brain works, what techniques can be used to study them, and how they can be evolved.
Markov Chains Most of our study of probability has dealt with independent trials processes. These processes are the basis of classical probability theory and much of statistics. We have discussed two of the principal theorems for these processes: the Law of Large Numbers and the Central Limit Theorem. We have seen that when a sequence of chance experiments forms an independent trials process, the possible outcomes for each experiment are the same and occur with the same probability. Further, knowledge of the outcomes of the previous experiments does not influence our predictions for the outcomes of the next experiment. The distribution for the outcomes of a single experiment is sufficient to construct a tree and a tree measure for a sequence of n experiments, and we can answer any probability question about these experiments by using this tree measure. Modern probability theory studies chance processes for which the knowledge of previous outcomes influences predictions for future experiments. In principle, when we observe a sequence of chance experiments, all of the past outcomes could influence our predictions for the next experiment. For example, this should be the case in predicting a student’s grades on a sequence of exams in a course. But to allow this much generality would make it very difficult to prove general results. In 1907, A. A. Markov began the study of an important new type of chance process. In this process, the outcome of a given experiment can affect the outcome of the next experiment. This type of process is called a Markov chain.
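A small numerical illustration of the dependence structure described above: the next state depends on the past only through the current state, via a fixed transition matrix. The two-state ‘weather’ chain and its probabilities are invented for the example.

```python
# Simulate a two-state Markov chain and compare empirical long-run frequencies
# with the stationary distribution (the left eigenvector of P for eigenvalue 1).
import numpy as np

P = np.array([[0.8, 0.2],   # transition probabilities out of state 0 ("sunny")
              [0.4, 0.6]])  # transition probabilities out of state 1 ("rainy")

rng = np.random.default_rng(0)
state, counts = 0, np.zeros(2)
for _ in range(100_000):
    state = rng.choice(2, p=P[state])   # next state depends only on current state
    counts[state] += 1

print("empirical frequencies:", counts / counts.sum())   # approximately [2/3, 1/3]
```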
Matchbox: Large Scale Online Bayesian Recommendations We present a probabilistic model for generating personalised recommendations of items to users of a web service. The Matchbox system makes use of content information in the form of user and item meta data in combination with collaborative filtering information from previous user behavior in order to predict the value of an item for a user. Users and items are represented by feature vectors which are mapped into a low-dimensional ‘trait space´ in which similarity is measured in terms of inner products. The model can be trained from different types of feedback in order to learn user-item preferences. Here we present three alternatives: direct observation of an absolute rating each user gives to some items, observation of a binary preference (like / don´t like) and observation of a set of ordinal ratings on a user-specific scale. Efficient inference is achieved by approximate message passing involving a combination of Expectation Propagation (EP) and Variational Message Passing. We also include a dynamics model which allows an item´s popularity, a user´s taste or a user´s personal rating scale to drift over time. By using Assumed-Density Filtering (ADF) for training, the model requires only a single pass through the training data. This is an on-line learning algorithm capable of incrementally taking account of new data so the system can immediately reflect the latest user preferences. We evaluate the performance of the algorithm on the MovieLens and Netflix data sets consisting of approximately 1,000,000 and 100,000,000 ratings respectively. This demonstrates that training the model using the on-line ADF approach yields state-of-the-art performance with the option of improving performance further if computational resources are available by performing multiple EP passes over the training data.
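A stripped-down sketch of the bilinear ‘trait space’ idea described above: user and item feature vectors are projected into a shared low-dimensional space and affinity is their inner product there. The message-passing inference of Matchbox (EP, VMP, ADF) is not reproduced; plain squared-error gradient updates stand in for it, and all dimensions and rates are arbitrary.

```python
# Illustrative bilinear trait-space model (not the Matchbox inference itself):
# affinity(user, item) = (x_user @ U) . (x_item @ V) in a k-dimensional trait space.
import numpy as np

rng = np.random.default_rng(0)
n_user_feats, n_item_feats, k = 20, 30, 4
U = 0.1 * rng.standard_normal((n_user_feats, k))   # maps user features to traits
V = 0.1 * rng.standard_normal((n_item_feats, k))   # maps item features to traits

def predict(x_user, x_item):
    return (x_user @ U) @ (x_item @ V)

def sgd_step(x_user, x_item, rating, lr=0.05):
    # One pass over an observed rating, standing in for Matchbox's online update.
    global U, V
    err = predict(x_user, x_item) - rating
    s, t = x_user @ U, x_item @ V
    U -= lr * err * np.outer(x_user, t)
    V -= lr * err * np.outer(x_item, s)
```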
Math for Machine Learning The goal of this document is to provide a ‘refresher’ on continuous mathematics for computer science students. It is by no means a rigorous course on these topics. The presentation, motivation, etc., are all from a machine learning perspective. The hope, however, is that it’s useful in other contexts. The two major topics covered are linear algebra and calculus (probability is currently left off).
Mathematics of Deep Learning Recently there has been a dramatic increase in the performance of recognition systems due to the introduction of deep architectures for representation learning and classification. However, the mathematical reasons for this success remain elusive. This tutorial will review recent work that aims to provide a mathematical justification for several properties of deep networks, such as global optimality, geometric stability, and invariance of the learned representations.
Matrix decompositions for regression analysis
Matrix Differentiation Throughout this presentation I have chosen to use a symbolic matrix notation. This choice was not made lightly. I am a strong advocate of index notation, when appropriate. For example, index notation greatly simplifies the presentation and manipulation of differential geometry. As a rule of thumb, if your work is going to primarily involve differentiation with respect to the spatial coordinates, then index notation is almost surely the appropriate choice. In the present case, however, I will be manipulating large systems of equations in which the matrix calculus is relatively simple while the matrix algebra and matrix arithmetic are messy and more involved. Thus, I have chosen to use symbolic notation.
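Two standard identities, written in the symbolic matrix notation the note advocates; these are textbook results quoted for illustration, not excerpts from the note itself.

```latex
\[
\frac{\partial}{\partial \mathbf{x}}\left(\mathbf{x}^{\top} A\, \mathbf{x}\right)
   = \left(A + A^{\top}\right)\mathbf{x},
\qquad
\frac{\partial}{\partial X}\,\operatorname{tr}\!\left(A X\right) = A^{\top}.
\]
```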
Matrix Factorization Techniques for Recommender Systems Modern consumers are inundated with choices. Electronic retailers and content providers offer a huge selection of products, with unprecedented opportunities to meet a variety of special needs and tastes. Matching consumers with the most appropriate products is key to enhancing user satisfaction and loyalty. Therefore, more retailers have become interested in recommender systems, which analyze patterns of user interest in products to provide personalized recommendations that suit a user´s taste. Because good personalized recommendations can add another dimension to the user experience, e-commerce leaders like Amazon.com and Netflix have made recommender systems a salient part of their websites. Such systems are particularly useful for entertainment products such as movies, music, and TV shows. Many customers will view the same movie, and each customer is likely to view numerous different movies. Customers have proven willing to indicate their level of satisfaction with particular movies, so a huge volume of data is available about which movies appeal to which customers. Companies can analyze this data to recommend movies to particular customers.
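A minimal sketch of the latent-factor approach described above: each user and each item gets a small factor vector, a predicted rating is their dot product, and the factors are fit to observed ratings by regularized stochastic gradient descent. The toy (user, item, rating) triples and hyperparameters below are invented for the example.

```python
# Toy matrix factorization: predicted rating = dot(user_factors, item_factors),
# fit to observed ratings with SGD and L2 regularization (illustrative data only).
import numpy as np

ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
n_users, n_items, k = 3, 3, 2

rng = np.random.default_rng(0)
P = 0.1 * rng.standard_normal((n_users, k))   # user latent factors
Q = 0.1 * rng.standard_normal((n_items, k))   # item latent factors

lr, reg = 0.05, 0.02
for epoch in range(500):
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * P[u] - reg * Q[i])

print("predicted rating for user 0 on unrated item 2:", round(P[0] @ Q[2], 2))
```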
Maximize the Effectiveness of your Text Analytics Initiatives (Slide Deck)
Maximizing the value provided by a Big Data Platform
Max-Margin Markov Networks In typical classification tasks, we seek a function which assigns a label to a single object. Kernel-based approaches, such as support vector machines (SVMs), which maximize the margin of confidence of the classifier, are the method of choice for many such tasks. Their popularity stems both from the ability to use high-dimensional feature spaces, and from their strong theoretical guarantees. However, many real-world tasks involve sequential, spatial, or structured data, where multiple labels must be assigned. Existing kernel-based methods ignore structure in the problem, assigning labels independently to each object, losing much useful information. Conversely, probabilistic graphical models, such as Markov networks, can represent correlations between labels, by exploiting problem structure, but cannot handle high-dimensional feature spaces, and lack strong theoretical generalization guarantees. In this paper, we present a new framework that combines the advantages of both approaches: Maximum margin Markov (M3) networks incorporate both kernels, which efficiently deal with high-dimensional features, and the ability to capture correlations in structured data. We present an efficient algorithm for learning M3 networks based on a compact quadratic program formulation. We provide a new theoretical bound for generalization in structured domains. Experiments on the task of handwritten character recognition and collective hypertext classification demonstrate very significant gains over previous approaches.
Measure, Manifold, Learning, and Optimization: A Theory Of Neural Networks We present a formal measure-theoretical theory of neural networks (NN) built on probability coupling theory. Our main contributions are summarized as follows. * Built on the formalism of probability coupling theory, we derive an algorithm framework, named Hierarchical Measure Group and Approximate System (HMGAS), nicknamed S-System, that is designed to learn the complex hierarchical, statistical dependency in the physical world. * We show that NNs are special cases of S-System when the probability kernels assume certain exponential family distributions. Activation Functions are derived formally. We further endow geometry on NNs through information geometry, show that intermediate feature spaces of NNs are stochastic manifolds, and prove that ‘distance’ between samples is contracted as layers stack up. * S-System shows NNs are inherently stochastic, and under a set of realistic boundedness and diversity conditions, it enables us to prove that for large size nonlinear deep NNs with a class of losses, including the hinge loss, all local minima are global minima with zero loss errors, and regions around the minima are flat basins where all eigenvalues of Hessians are concentrated around zero, using tools and ideas from mean field theory, random matrix theory, and nonlinear operator equations. * S-System, the information-geometry structure and the optimization behaviors combined complete the analogy between the Renormalization Group (RG) and NNs. It shows that an NN is a complex adaptive system that estimates the statistical dependency of microscopic objects, e.g., pixels, at multiple scales. Unlike the clear-cut physical quantities produced by RG in physics, e.g., temperature, NNs renormalize/recompose manifolds emerging through learning/optimization that divide the sample space into highly semantically meaningful groups that are dictated by supervised labels (in supervised NNs).
Measurement Drives Behavior Measurement impacts our personal lives every single day. If we want to lose some weight, we start by standing on the scale. Based on the outcome, we decide how much weight we need to lose, and every other day we check our progress. If there is enough progress, we become encouraged to lose more, and if we are disappointed, we´re driven to add even more effort in order to achieve our goal. In short, measurement drives our behavior. It can be witnessed in countless ways in our private lives. In fact, it is an important principle in the social sciences, often called the Hawthorne effect. In the business world this is no different; measurement also drives our professional behavior. Once your business starts measuring the results of a certain process, your employees will start focusing on it. There are numerous examples: If the CFO starts tracking the days-sales-outstanding (DSO—i.e., the average number of days it takes customers to pay their bills) on a daily basis, instead of assuming that customers will pay within 14 days or so, the people in the accounts receivable departments are more likely to pay attention and exert greater effort to make collections. If hotel managers and their front desk staff are held accountable for the percentage of guests that fill out the customer satisfaction survey, they will be more likely to remind guests of the survey. Measurement helps us not only to focus on our goals and objectives, but also to balance our actions. If you measure production speed alone in a manufacturing process, it is likely that quality issues will arise. For balance, you also need to measure how many produced units need rework. If a procurement department is only measured on how much additional discount it can squeeze out of contract manufacturers, it becomes hard to avoid unethical practices, such as the use of child labor in low-wage countries and the use of cheaper and environmentally unfriendly materials and production processes. Procurement departments need to identify a balanced set of metrics that includes ethical issues as well as price. In each of the functional disciplines within an organization—finance, sales, marketing, logistics, manufacturing, procurement, human resources (HR) or information technology (IT)—measurement is a key element of management, and ultimately of bottom-line performance.
Measures of dissimilarity Patterns or objects analysed using the techniques described in this book are usually represented by a vector of measurements. Many of the techniques require some measure of dissimilarity or distance between two pattern vectors, although sometimes data can arise directly in the form of a dissimilarity matrix….
Measuring Distances Applied multivariate statistics
Measuring Predictability: Theory and Macroeconomic Applications We propose a measure of predictability based on the ratio of the expected loss of a short-run forecast to the expected loss of a long-run forecast. This predictability measure can be tailored to the forecast horizons of interest, and it allows for general loss functions, univariate or multivariate information sets, and covariance stationary or difference stationary processes. We propose a simple estimator, and we suggest resampling methods for inference. We then provide several macroeconomic applications. First, we illustrate the implementation of predictability measures based on fitted parametric models for several U.S. macroeconomic time series. Second, we analyze the internal propagation mechanism of a standard dynamic macroeconomic model by comparing the predictability of model inputs and model outputs. Third, we use predictability as a metric for assessing the similarity of data simulated from the model and actual data. Finally, we outline several nonparametric extensions of our approach.
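Reconstructed from the abstract's description (the exact notation is the authors'): with a loss function $L$, an information set $\Omega$, and forecast errors $e_{t+j,t}$ and $e_{t+k,t}$ at a short horizon $j$ and a long horizon $k > j$, a predictability measure of this ratio form can be written as

$$
P(L,\Omega,j,k) \;=\; 1 \;-\; \frac{\mathbb{E}\!\left[L\!\left(e_{t+j,t}\right)\right]}{\mathbb{E}\!\left[L\!\left(e_{t+k,t}\right)\right]},
$$

so a series is highly predictable at horizon $j$ (relative to $k$) when short-run expected losses are small compared with long-run expected losses.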
mediation: R Package for Causal Mediation Analysis In this paper, we describe the R package mediation for conducting causal mediation analysis in applied empirical research. In many scientific disciplines, the goal of researchers is not only estimating causal effects of a treatment but also understanding the process in which the treatment causally affects the outcome. Causal mediation analysis is frequently used to assess potential causal mechanisms. The mediation package implements a comprehensive suite of statistical tools for conducting such an analysis. The package is organized into two distinct approaches. Using the model-based approach, researchers can estimate causal mediation effects and conduct sensitivity analysis under the standard research design. Furthermore, the design-based approach provides several analysis tools that are applicable under different experimental designs. This approach requires weaker assumptions than the model-based approach. We also implement a statistical method for dealing with multiple (causally dependent) mediators, which are often encountered in practice. Finally, the package also offers a methodology for assessing causal mediation in the presence of treatment noncompliance, a common problem in randomized trials.
Memory Aware Synapses: Learning what (not) to forget Humans can learn in a continuous manner. Old rarely utilized knowledge can be overwritten by new incoming information while important, frequently used knowledge is prevented from being erased. In artificial learning systems, lifelong learning so far has focused mainly on accumulating knowledge over tasks and overcoming catastrophic forgetting. In this paper, we argue that, given the limited model capacity and the unlimited new information to be learned, knowledge has to be preserved or erased selectively. Inspired by neuroplasticity, we propose an online method to compute the importance of the parameters of a neural network, based on the data that the network is actively applied to, in an unsupervised manner. After learning a task, whenever a sample is fed to the network, we accumulate an importance measure for each parameter of the network, based on how sensitive the predicted output is to a change in this parameter. When learning a new task, changes to important parameters are penalized. We show that a local version of our method is a direct application of Hebb’s rule in identifying the important connections between neurons. We test our method on a sequence of object recognition tasks and on the challenging problem of learning an embedding in a continuous manner. We show state of the art performance and the ability to adapt the importance of the parameters towards what the network needs (not) to forget, which may be different for different test conditions.
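The importance-accumulation step described above can be sketched in a few lines. The snippet below is a minimal PyTorch-style illustration, assuming the sensitivity target is the squared L2 norm of the network output (one common choice); the network, the data, and the penalty form are placeholders, and the full method includes details not shown here.

```python
import torch
import torch.nn as nn

# Placeholder network and unlabeled data; a minimal sketch of accumulating an
# importance weight per parameter from the sensitivity of the (squared) output
# norm to that parameter, as the data the network is applied to streams by.
net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 5))
data = torch.randn(100, 10)

importance = {n: torch.zeros_like(p) for n, p in net.named_parameters()}

for x in data:
    net.zero_grad()
    out = net(x.unsqueeze(0))
    out.pow(2).sum().backward()            # sensitivity of the output norm
    for n, p in net.named_parameters():
        importance[n] += p.grad.abs() / len(data)

# When learning a new task, changes to important parameters would be penalized,
# e.g. loss = task_loss + lam * sum(importance[n] * (p - p_old[n]) ** 2).
```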
Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm Learning to learn is a powerful paradigm for enabling models to learn from data more effectively and efficiently. A popular approach to meta-learning is to train a recurrent model to read in a training dataset as input and output the parameters of a learned model, or output predictions for new test inputs. Alternatively, a more recent approach to meta-learning aims to acquire deep representations that can be effectively fine-tuned, via standard gradient descent, to new tasks. In this paper, we consider the meta-learning problem from the perspective of universality, formalizing the notion of learning algorithm approximation and comparing the expressive power of the aforementioned recurrent models to the more recent approaches that embed gradient descent into the meta-learner. In particular, we seek to answer the following question: does deep representation combined with standard gradient descent have sufficient capacity to approximate any learning algorithm? We find that this is indeed true, and further find, in our experiments, that gradient-based meta-learning consistently leads to learning strategies that generalize more widely compared to those represented by recurrent models.
Metalearning for Feature Selection A general formulation of optimization problems in which various candidate solutions may use different feature-sets is presented, encompassing supervised classification, automated program learning and other cases. A novel characterization of the concept of a ‘good quality feature’ for such an optimization problem is provided; and a proposal regarding the integration of quality based feature selection into metalearning is suggested, wherein the quality of a feature for a problem is estimated using knowledge about related features in the context of related problems. Results are presented regarding extensive testing of this ‘feature metalearning’ approach on supervised text classification problems; it is demonstrated that, in this context, feature metalearning can provide significant and sometimes dramatic speedup over standard feature selection heuristics.
Meta-Learning: A Survey Meta-learning, or learning to learn, is the science of systematically observing how different machine learning approaches perform on a wide range of learning tasks, and then learning from this experience, or meta-data, to learn new tasks much faster than otherwise possible. Not only does this dramatically speed up and improve the design of machine learning pipelines or neural architectures, it also allows us to replace hand-engineered algorithms with novel approaches learned in a data-driven way. In this chapter, we provide an overview of the state of the art in this fascinating and continuously evolving field.
Methods for Analyzing Large Spatial Data: A Review and Comparison The Gaussian process is an indispensable tool for spatial data analysts. The onset of the ‘big data’ era, however, has led to the traditional Gaussian process being computationally infeasible for modern spatial data. As such, various alternatives to the full Gaussian process that are more amenable to handling big spatial data have been proposed. These modern methods often exploit low rank structures and/or multi-core and multi-threaded computing environments to facilitate computation. This study provides, first, an introductory overview of several methods for analyzing large spatial data. Second, this study describes the results of a predictive competition among the described methods as implemented by different groups with strong expertise in the methodology. Specifically, each research group was provided with two training datasets (one simulated and one observed) along with a set of prediction locations. Each group then wrote their own implementation of their method to produce predictions at the given locations, and each implementation was subsequently run on a common computing environment. The methods were then compared in terms of various predictive diagnostics. Supplementary materials regarding implementation details of the methods and code are available for this article online.
Methods for Interpreting and Understanding Deep Neural Networks This paper provides an entry point to the problem of interpreting a deep neural network model and explaining its predictions. It is based on a tutorial given at ICASSP 2017. It introduces some recently proposed techniques of interpretation, along with theory, tricks and recommendations, to make most efficient use of these techniques on real data. It also discusses a number of practical applications.
Metrics for Graph Comparison: A Practitioner’s Guide Comparison of graph structure is a ubiquitous task in data analysis and machine learning, with diverse applications in fields such as neuroscience, cyber security, social network analysis, and bioinformatics, among others. Discovery and comparison of structures such as modular communities, rich clubs, hubs, and trees in data in these fields yields insight into the generative mechanisms and functional properties of the graph. Often, two graphs are compared via a pairwise distance measure, with a small distance indicating structural similarity and vice versa. Common choices include spectral distances (also known as $\lambda$ distances) and distances based on node affinities. However, there has as yet been no comparative study of the efficacy of these distance measures in discerning between common graph topologies and different structural scales. In this work, we compare commonly used graph metrics and distance measures, and demonstrate their ability to discern between common topological features found in both random graph models and empirical datasets. We put forward a multi-scale picture of graph structure, in which the effect of global and local structure upon the distance measures is considered. We make recommendations on the applicability of different distance measures to empirical graph data problems based on this multi-scale view. Finally, we introduce the Python library NetComp which implements the graph distances used in this work.
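As an illustration of the spectral (‘lambda´) distances mentioned above, the sketch below compares two graphs through the Euclidean distance between their Laplacian spectra. It is a generic illustration written against networkx and numpy, not the NetComp library´s API; the padding and truncation choices are assumptions made for the example.

```python
import numpy as np
import networkx as nx

# Generic spectral ('lambda') distance: compare two graphs via the Euclidean
# distance between their (sorted) Laplacian eigenvalue spectra.
def laplacian_spectrum_distance(G1, G2, k=None):
    s1 = np.sort(nx.laplacian_spectrum(G1))
    s2 = np.sort(nx.laplacian_spectrum(G2))
    n = max(len(s1), len(s2))
    s1 = np.pad(s1, (0, n - len(s1)))    # pad the smaller spectrum with zeros
    s2 = np.pad(s2, (0, n - len(s2)))
    if k is not None:                    # optionally keep only the k largest eigenvalues
        s1, s2 = s1[-k:], s2[-k:]
    return float(np.linalg.norm(s1 - s2))

G_er = nx.erdos_renyi_graph(100, 0.05, seed=0)       # random graph model
G_ba = nx.barabasi_albert_graph(100, 3, seed=0)      # hub-dominated model
print(laplacian_spectrum_distance(G_er, G_ba, k=10))
```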
Mining Approximate Top K Subspace Anomalies in MultiDimensional TimeSeries Data Market analysis is a representative data analysis process with many applications. In such an analysis, critical numerical measures, such as profit and sales, fluctuate over time and form time-series data. Moreover, the time series data correspond to market segments, which are described by a set of attributes, such as age, gender, education, income level, and product-category, that form a multi-dimensional structure. To better understand market dynamics and predict future trends, it is crucial to study the dynamics of time-series in multi-dimensional market segments. This is a topic that has been largely ignored in time series and data cube research. In this study, we examine the issues of anomaly detection in multi-dimensional time-series data. We propose a time-series data cube to capture the multidimensional space formed by the attribute structure. This facilitates the detection of anomalies based on expected values derived from higher level, ‘more general´ time-series. Anomaly detection in a time-series data cube poses computational challenges, especially for high-dimensional, large data sets. To this end, we also propose an efficient search algorithm to iteratively select subspaces in the original high-dimensional space and detect anomalies within each one. Our experiments with both synthetic and real-world data demonstrate the effectiveness and efficiency of the proposed solution.
Mining Software Quality from Software Reviews: Research Trends and Open Issues Software review text fragments contain highly valuable information about users´ experience. This covers a wide range of properties, including software quality. Opinion mining or sentiment analysis is concerned with analyzing textual user judgments. The application of sentiment analysis to software reviews can yield a quantitative value that represents software quality. Although many software quality methods have been proposed, they are considered difficult to customize and many of them are limited. This article investigates the application of opinion mining as an approach to extract software quality properties. We found that the major issues of mining software reviews using sentiment analysis are due to the software lifecycle and the diverse users and teams.
Missing Data and Prediction Missing data are a common problem for both the construction and implementation of a prediction algorithm. Pattern mixture kernel submodels (PMKS) – a series of submodels for every missing data pattern that are fit using only data from that pattern – are a computationally efficient remedy for both stages. Here we show that PMKS yield the most predictive algorithm among all standard missing data strategies. Specifically, we show that the expected loss of a forecasting algorithm is minimized when each pattern-specific loss is minimized. Simulations and a re-analysis of the SUPPORT study confirm that PMKS generally outperform zero-imputation, mean-imputation, complete-case analysis, complete-case submodels, and even multiple imputation (MI). The degree of improvement is highly dependent on the missingness mechanism and the effect size of missing predictors. When the data are Missing at Random (MAR), MI can yield comparable forecasting performance but generally requires a larger computational cost. We see that predictions from the PMKS are equivalent to the limiting predictions for a MI procedure that uses a mean model dependent on missingness indicators (the MIMI model). Consequently, the MIMI model can be used to assess the MAR assumption in practice. The focus of this paper is on out-of-sample prediction behavior; implications for model inference are only briefly explored.
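The pattern-submodel idea is simple to sketch: group training rows by their missingness pattern and fit one model per pattern using only that pattern´s observed columns. The snippet below is an illustrative sketch with synthetic data and plain linear submodels; the paper´s kernel submodels and any fallback for patterns unseen at training time are not shown.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with missing values injected completely at random.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=500)
X[rng.random(X.shape) < 0.2] = np.nan

# Fit one submodel per missingness pattern, using that pattern's observed columns.
submodels = {}
for pat in np.unique(np.isnan(X), axis=0):
    rows = np.all(np.isnan(X) == pat, axis=1)
    cols = ~pat
    if cols.any() and rows.sum() > cols.sum() + 1:   # need enough rows to fit
        submodels[tuple(pat)] = LinearRegression().fit(X[rows][:, cols], y[rows])

def predict(x_new):
    pat = tuple(np.isnan(x_new))
    model = submodels[pat]        # a fallback would be needed for unseen patterns
    return model.predict(x_new[~np.isnan(x_new)].reshape(1, -1))[0]

print(predict(np.array([0.3, np.nan, 1.2, -0.7])))
```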
Mobile big data analysis with machine learning This paper identifies the requirements for, and reviews the development of, machine learning-based mobile big data analysis by discussing the challenges inherent in mobile big data (MBD). Furthermore, it reviews the state-of-the-art applications of data analysis in the area of MBD. Firstly, we introduce the development of MBD. Secondly, the frequently adopted methods of data analysis are reviewed. Three typical applications of MBD analysis, namely wireless channel modeling, human online and offline behavior analysis, and speech recognition in the internet of vehicles, are then introduced. Finally, we summarize the main challenges and future development directions of mobile big data analysis.
Mobile Edge Computing: Survey and Research Outlook Driven by the visions of Internet of Things and 5G communications, recent years have seen a paradigm shift in mobile computing, from the centralized Mobile Cloud Computing towards Mobile Edge Computing (MEC). The main feature of MEC is to push mobile computing, network control and storage to the network edges (e.g., base stations and access points) so as to enable computation-intensive and latency-critical applications at the resource-limited mobile devices. MEC promises dramatic reduction in latency and mobile energy consumption, tackling the key challenges for materializing 5G vision. The promised gains of MEC have motivated extensive efforts in both academia and industry on developing the technology. A main thrust of MEC research is to seamlessly merge the two disciplines of wireless communications and mobile computing, resulting in a wide-range of new designs ranging from techniques for computation offloading to network architectures. This paper provides a comprehensive survey of the state-of-the-art MEC research with a focus on joint radio-and-computational resource management. We also present a research outlook consisting of a set of promising directions for MEC research, including MEC system deployment, cache-enabled MEC, mobility management for MEC, green MEC, as well as privacy-aware MEC. Advancements in these directions will facilitate the transformation of MEC from theory to practice. Finally, we introduce recent standardization efforts on MEC as well as some typical MEC application scenarios.
Modal Regression using Kernel Density Estimation: a Review We review recent advances in modal regression studies using kernel density estimation. Modal regression is an alternative approach for investigating the relationship between a response variable and its covariates. Specifically, modal regression summarizes the interactions between the response variable and covariates using the conditional mode or local modes. We first describe the underlying model of modal regression and its estimators based on kernel density estimation. We then review the asymptotic properties of the estimators and strategies for choosing the smoothing bandwidth. We also discuss useful algorithms and similar alternative approaches for modal regression, and propose future directions in this field.
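The conditional-mode estimators based on kernel density estimation can be illustrated with a partial mean-shift iteration: fix the covariate value and repeatedly move the response value toward a local mode of the estimated conditional density. The sketch below uses Gaussian product kernels on synthetic data; the bandwidths and starting value are arbitrary choices for the example, not the selection strategies the review discusses.

```python
import numpy as np

# Synthetic regression data with a single conditional mode near sin(x).
rng = np.random.default_rng(3)
X = rng.uniform(0, 2 * np.pi, 400)
Y = np.sin(X) + rng.normal(scale=0.2, size=400)

def conditional_mode(x0, y0, hx=0.3, hy=0.3, iters=50):
    """Partial mean-shift: iterate y toward a mode of the KDE-based conditional density at x0."""
    y = y0
    wx = np.exp(-0.5 * ((x0 - X) / hx) ** 2)      # kernel weights in x (fixed)
    for _ in range(iters):
        wy = np.exp(-0.5 * ((y - Y) / hy) ** 2)   # kernel weights in y (updated)
        w = wx * wy
        y = np.sum(w * Y) / np.sum(w)             # mean-shift step in y only
    return y

print(round(conditional_mode(x0=1.0, y0=0.0), 2))  # should be near sin(1.0) ~ 0.84
```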
Model Selection Techniques — An Overview In the era of big data, analysts usually explore various statistical models or machine learning methods for observed data in order to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model or method from a set of candidates. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction, and thus central to scientific studies in fields such as ecology, economics, engineering, finance, political science, biology, and epidemiology. There has been a long history of model selection techniques that arise from research in statistics, information theory, and signal processing. A considerable number of methods have been proposed, following different philosophies and exhibiting varying performances. The purpose of this article is to provide a comprehensive overview of them, in terms of their motivation, large sample performance, and applicability. We provide integrated and practically relevant discussions on theoretical properties of state-of-the-art model selection approaches. We also share our thoughts on some controversial views on the practice of model selection.
Model-based Machine Learning Several decades of research in the field of machine learning have resulted in a multitude of different algorithms for solving a broad range of problems. To tackle a new application, a researcher typically tries to map their problem onto one of these existing methods, often influenced by their familiarity with specific algorithms and by the availability of corresponding software implementations. In this study, we describe an alternative methodology for applying machine learning, in which a bespoke solution is formulated for each new application. The solution is expressed through a compact modelling language, and the corresponding custom machine learning code is then generated automatically. This model-based approach offers several major advantages, including the opportunity to create highly tailored models for specific scenarios, as well as rapid prototyping and comparison of a range of alternative models. Furthermore, newcomers to the field of machine learning do not have to learn about the huge range of traditional methods, but instead can focus their attention on understanding a single modelling environment. In this study, we show how probabilistic graphical models, coupled with efficient inference algorithms, provide a very flexible foundation for model-based machine learning, and we outline a large-scale commercial application of this framework involving tens of millions of users. We also describe the concept of probabilistic programming as a powerful software environment for model-based machine learning, and we discuss a specific probabilistic programming language called Infer.NET, which has been widely used in practical applications.
Model-free, Model-based, and General Intelligence During the 60s and 70s, AI researchers explored intuitions about intelligence by writing programs that displayed intelligent behavior. Many good ideas came out from this work but programs written by hand were not robust or general. After the 80s, research increasingly shifted to the development of learners capable of inferring behavior and functions from experience and data, and solvers capable of tackling well-defined but intractable models like SAT, classical planning, Bayesian networks, and POMDPs. The learning approach has achieved considerable success but results in black boxes that do not have the flexibility, transparency, and generality of their model-based counterparts. Model-based approaches, on the other hand, require models and scalable algorithms. Model-free learners and model-based solvers have close parallels with Systems 1 and 2 in current theories of the human mind: the first, a fast, opaque, and inflexible intuitive mind; the second, a slow, transparent, and flexible analytical mind. In this paper, I review developments in AI and draw on these theories to discuss the gap between model-free learners and model-based solvers, a gap that needs to be bridged in order to have intelligent systems that are robust and general.
Modeling and Optimization for Big Data Analytics With pervasive sensors continuously collecting and storing massive amounts of information, there is no doubt this is an era of data deluge. Learning from these large volumes of data is expected to bring significant science and engineering advances along with improvements in quality of life. However, with such a big blessing come big challenges. Running analytics on voluminous data sets by central processors and storage units seems infeasible, and with the advent of streaming data sources, learning must often be performed in real time, typically without a chance to revisit past entries. ‘Workhorse’ signal processing (SP) and statistical learning tools have to be re-examined in today´s high-dimensional data regimes. This article contributes to the ongoing cross-disciplinary efforts in data science by putting forth encompassing models capturing a wide range of SP-relevant data analytic tasks, such as principal component analysis (PCA), dictionary learning (DL), compressive sampling (CS), and subspace clustering. It offers scalable architectures and optimization algorithms for decentralized and online learning problems, while revealing fundamental insights into the various analytic and implementation tradeoffs involved. Extensions of the encompassing models to timely data-sketching, tensor- and kernel-based learning tasks are also provided. Finally, the close connections of the presented framework with several big data tasks, such as network visualization, decentralized and dynamic estimation, prediction, and imputation of network link load traffic, as well as imputation in tensor-based medical imaging are highlighted.
Modeling Influence with Semantics in Social Networks: a Survey The discovery of influential entities in all kinds of networks (e.g. social, digital, or computer) has always been an important field of study. In recent years, Online Social Networks (OSNs) have been established as a basic means of communication and often influencers and opinion makers promote politics, events, brands or products through viral content. In this work, we present a systematic review across i) online social influence metrics, properties, and applications and ii) the role of semantics in modeling OSN information. We end up with the conclusion that both areas can jointly provide useful insights towards the qualitative assessment of viral user-generated content, as well as for modeling the dynamic properties of influential content and its flow dynamics.
Modern Deep Reinforcement Learning Algorithms Recent advances in Reinforcement Learning, grounded on combining classical theoretical results with the Deep Learning paradigm, led to breakthroughs in many artificial intelligence tasks and gave birth to Deep Reinforcement Learning (DRL) as a field of research. In this work, the latest DRL algorithms are reviewed with a focus on their theoretical justification, practical limitations and observed empirical properties.
Monotonic classification: an overview on algorithms, performance measures and data sets Currently, knowledge discovery in databases is an essential step to identify valid, novel and useful patterns for decision making. There are many real-world scenarios, such as bankruptcy prediction, option pricing or medical diagnosis, where the classification models to be learned need to fulfil restrictions of monotonicity (i.e. the target class label should not decrease when input attributes values increase). For instance, it is rational to assume that a higher debt ratio of a company should never result in a lower level of bankruptcy risk. Consequently, there is a growing interest from the data mining research community concerning monotonic predictive models. This paper aims to present an overview about the literature in the field, analyzing existing techniques and proposing a taxonomy of the algorithms based on the type of model generated. For each method, we review the quality metrics considered in the evaluation and the different data sets and monotonic problems used in the analysis. In this way, this paper serves as an overview of the research about monotonic classification in specialized literature and can be used as a functional guide of the field.
Monte Carlo Gradient Estimation in Machine Learning This paper is a broad and accessible survey of the methods we have at our disposal for Monte Carlo gradient estimation in machine learning and across the statistical sciences: the problem of computing the gradient of an expectation of a function with respect to parameters defining the distribution that is integrated; the problem of sensitivity analysis. In machine learning research, this gradient problem lies at the core of many learning problems, in supervised, unsupervised and reinforcement learning. We will generally seek to rewrite such gradients in a form that allows for Monte Carlo estimation, allowing them to be easily and efficiently used and analysed. We explore three strategies–the pathwise, score function, and measure-valued gradient estimators–exploring their historical developments, derivation, and underlying assumptions. We describe their use in other fields, show how they are related and can be combined, and expand on their possible generalisations. Wherever Monte Carlo gradient estimators have been derived and deployed in the past, important advances have followed. A deeper and more widely-held understanding of this problem will lead to further advances, and it is these advances that we wish to support.
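A toy example makes two of the surveyed estimators concrete. For the expectation of x^2 under a normal distribution with mean mu and standard deviation sigma, the gradient with respect to mu is exactly 2*mu, so both estimators below can be checked against the truth; the setup is invented purely for illustration.

```python
import numpy as np

# Estimate d/dmu E_{x ~ N(mu, sigma^2)}[x^2], whose exact value is 2*mu.
rng = np.random.default_rng(4)
mu, sigma, n = 1.5, 2.0, 200_000

# Score-function (REINFORCE) estimator: E[f(x) * d/dmu log p(x | mu)],
# with d/dmu log N(x; mu, sigma^2) = (x - mu) / sigma^2.
x = rng.normal(mu, sigma, n)
score_grad = np.mean((x ** 2) * (x - mu) / sigma ** 2)

# Pathwise (reparameterisation) estimator: x = mu + sigma * eps, so
# d f(x)/d mu = 2 * (mu + sigma * eps).
eps = rng.standard_normal(n)
path_grad = np.mean(2 * (mu + sigma * eps))

print(2 * mu, round(score_grad, 2), round(path_grad, 3))
```

On this example the pathwise estimate is typically much less noisy than the score-function estimate for the same number of samples, which is one of the trade-offs the survey discusses.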
Moving Objects Analytics: Survey on Future Location and Trajectory Prediction Methods The tremendous growth of positioning technologies and GPS enabled devices has produced huge volumes of tracking data during the recent years. This source of information constitutes a rich input for data analytics processes, either offline (e.g. cluster analysis, hot motion discovery) or online (e.g. short-term forecasting of forthcoming positions). This paper focuses on predictive analytics for moving objects (could be pedestrians, cars, vessels, planes, animals, etc.) and surveys the state-of-the-art in the context of future location and trajectory prediction. We provide an extensive review of over 50 works, also proposing a novel taxonomy of predictive algorithms over moving objects. We also list the properties of several real datasets used in the past for validation purposes of those works and, motivated by this, we discuss challenges that arise in the transition from conventional to Big Data applications.
mtk: A General-Purpose and Extensible R Environment for Uncertainty and Sensitivity Analyses of Numerical Experiments Along with increased complexity of the models used for scientific activities and engineering, come diverse and greater uncertainties. Today, effectively quantifying the uncertainties contained in a model appears to be more important than ever. Scientific fellows know how serious it is to calibrate their model in a robust way, and decision-makers describe how critical it is to keep the best effort to reduce the uncertainties about the model. Effectively accessing the uncertainties about the model requires mastering all the tasks involved in the numerical experiments, from optimizing the experimental design to managing the very time consuming aspect of model simulation and choosing the adequate indicators and analysis methods. In this paper, we present an open framework for organizing the complexity associated with numerical model simulation and analyses. Named mtk (Mexico Toolkit), the developed system aims at providing practitioners from different disciplines with a systematic and easy way to compare and to find the best method to effectively uncover and quantify the uncertainties contained in the model and further to evaluate their impact on the performance of the model. Such requirements imply that the system must be generic, universal, homogeneous, and extensible. This paper discusses such an implementation using the R scientific computing platform and demonstrates its functionalities with examples from agricultural modeling. The package mtk is of general purpose and easy to extend. Numerous methods are already available in the actual release version, including Fast, Sobol, Morris, Basic Monte-Carlo, Regression, LHS (Latin Hypercube Sampling), PLMM (Polynomial Linear metamodel). Most of them are compiled from available R packages with extension tools delivered by package mtk.
Multi-Agent Reinforcement Learning: A Report on Challenges and Approaches Reinforcement Learning (RL) is a learning paradigm concerned with learning to control a system so as to maximize an objective over the long term. This approach to learning has received immense interest in recent times and success manifests itself in the form of human-level performance on games like \textit{Go}. While RL is emerging as a practical component in real-life systems, most successes have been in Single Agent domains. This report will instead specifically focus on challenges that are unique to Multi-Agent Systems interacting in mixed cooperative and competitive environments. The report concludes with advances in the paradigm of training Multi-Agent Systems called \textit{Decentralized Actor, Centralized Critic}, based on an extension of MDPs called \textit{Decentralized Partially Observable MDP}s, which has seen a renewed interest lately.
Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms Recent years have witnessed significant advances in reinforcement learning (RL), which has registered great success in solving various sequential decision-making problems in machine learning. Most of the successful RL applications, e.g., the games of Go and Poker, robotics, and autonomous driving, involve the participation of more than one single agent, which naturally fall into the realm of multi-agent RL (MARL), a domain with a relatively long history, and has recently re-emerged due to advances in single-agent RL techniques. Though empirically successful, theoretical foundations for MARL are relatively lacking in the literature. In this chapter, we provide a selective overview of MARL, with focus on algorithms backed by theoretical analysis. More specifically, we review the theoretical results of MARL algorithms mainly within two representative frameworks, Markov/stochastic games and extensive-form games, in accordance with the types of tasks they address, i.e., fully cooperative, fully competitive, and a mix of the two. We also introduce several significant but challenging applications of these algorithms. Orthogonal to the existing reviews on MARL, we highlight several new angles and taxonomies of MARL theory, including learning in extensive-form games, decentralized MARL with networked agents, MARL in the mean-field regime, (non-)convergence of policy-based methods for learning in games, etc. Some of the new angles extrapolate from our own research endeavors and interests. Our overall goal with this chapter is, beyond providing an assessment of the current state of the field on the mark, to identify fruitful future research directions on theoretical studies of MARL. We expect this chapter to serve as continuing stimulus for researchers interested in working on this exciting while challenging topic.
Multidimensional Constellations for Uplink SCMA Systems — A Comparative Study Sparse code multiple access (SCMA) is a class of non-orthogonal multiple access (NOMA) that is proposed to support uplink machine-type communication services. In an SCMA system, the design of multidimensional constellations plays an important role in the performance of the system. Since the behaviour of multidimensional constellations highly depends on the type of the channel, it is crucial to employ a constellation that is suitable for a certain application. In this paper, we first highlight and review the key performance indicators (KPIs) of multidimensional constellations that should be considered in their design process for various channel scenarios. We then provide a survey on the known multidimensional constellations in the context of SCMA systems with their design criteria. The performance of some of those constellations is evaluated for uncoded, high-rate, and low-rate LTE turbo-coded SCMA systems under different channel conditions through extensive simulations. All turbo-coded comparisons are performed for bit-interleaved coded modulation, with a concatenated detection and decoding scheme. Simulation results confirm that multidimensional constellations that satisfy the KPIs of a certain channel scenario outperform others. Moreover, the bit error rate performance of uncoded systems, and the performance of the coded systems, are coupled to their bit-labeling. The performance of the systems also remarkably depends on the behavior of the multi-user detector at different signal-to-noise ratio regions.
Multidimensional Scaling by Majorization: A Review A major breakthrough in the visualization of dissimilarities between pairs of objects was the formulation of the least-squares multidimensional scaling (MDS) model as defined by the Stress function. This function is quite flexible in that it allows possibly nonlinear transformations of the dissimilarities to be represented by distances between points in a low dimensional space. To obtain the visualization, the Stress function should be minimized over the coordinates of the points and the over the transformation. In a series of papers, Jan de Leeuw has made a significant contribution to majorization methods for the minimization of Stress in least-squares MDS. In this paper, we present a review of the majorization algorithm for MDS as implemented in the smacof package and related approaches. We present several illustrative examples and special cases.
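The majorization update at the heart of this approach (the Guttman transform) is compact enough to sketch. The snippet below runs the unit-weight metric-MDS version on synthetic dissimilarities; it illustrates the idea only and is not the smacof package´s implementation, which also handles weights and transformations of the dissimilarities.

```python
import numpy as np

# Synthetic dissimilarities from a hidden 2D configuration, so Stress can reach ~0.
rng = np.random.default_rng(5)
true = rng.normal(size=(10, 2))
delta = np.linalg.norm(true[:, None] - true[None, :], axis=-1)

n, p = delta.shape[0], 2
X = rng.normal(size=(n, p))                      # random starting configuration

def stress(X):
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    return np.sum(np.triu(delta - d, 1) ** 2)    # sum over pairs i < j

for it in range(200):
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    with np.errstate(divide='ignore', invalid='ignore'):
        ratio = np.where(d > 0, delta / d, 0.0)
    B = -ratio
    np.fill_diagonal(B, 0.0)
    np.fill_diagonal(B, -B.sum(axis=1))          # B(X) of the Guttman transform
    X = B @ X / n                                # majorization update (unit weights)

print(round(stress(X), 4))                        # Stress decreases monotonically
```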
Multi-Label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages Recommending phrases from web pages for advertisers to bid on against search engine queries is an important research problem with direct commercial impact. Most approaches have found it infeasible to determine the relevance of all possible queries to a given ad landing page and have focussed on making recommendations from a small set of phrases extracted (and expanded) from the page using NLP and ranking based techniques. In this paper, we eschew this paradigm, and demonstrate that it is possible to efficiently predict the relevant subset of queries from a large set of monetizable ones by posing the problem as a multi-label learning task with each query being represented by a separate label. We develop Multi-label Random Forests to tackle problems with millions of labels. Our proposed classifier has prediction costs that are logarithmic in the number of labels and can make predictions in a few milliseconds using 10 Gb of RAM. We demonstrate that it is possible to generate training data for our classifier automatically from click logs without any human annotation or intervention. We train our classifier on tens of millions of labels, features and training points in less than two days on a thousand node cluster. We develop a sparse semi-supervised multi-label learning formulation to deal with training set biases and noisy labels harvested automatically from the click logs. This formulation is used to infer a belief in the state of each label for each training ad and the random forest classifier is extended to train on these beliefs rather than the given labels. Experiments reveal significant gains over ranking and NLP based techniques on a large test set of 5 million ads using multiple metrics.
Multimodal Machine Learning: A Survey and Taxonomy Our experience of the world is multimodal – we see objects, hear sounds, feel texture, smell odors, and taste flavors. Modality refers to the way in which something happens or is experienced and a research problem is characterized as multimodal when it includes multiple such modalities. In order for Artificial Intelligence to make progress in understanding the world around us, it needs to be able to interpret such multimodal signals together. Multimodal machine learning aims to build models that can process and relate information from multiple modalities. It is a vibrant multi-disciplinary field of increasing importance and with extraordinary potential. Instead of focusing on specific multimodal applications, this paper surveys the recent advances in multimodal machine learning itself and presents them in a common taxonomy. We go beyond the typical early and late fusion categorization and identify broader challenges that are faced by multimodal machine learning, namely: representation, translation, alignment, fusion, and co-learning. This new taxonomy will enable researchers to better understand the state of the field and identify directions for future research.
Multi-Objective Multi-Agent Decision Making: A Utility-based Analysis and Survey The majority of multi-agent system (MAS) implementations aim to optimise agents’ policies with respect to a single objective, despite the fact that many real-world problem domains are inherently multi-objective in nature. Multi-objective multi-agent systems (MOMAS) explicitly consider the possible trade-offs between conflicting objective functions. We argue that, in MOMAS, such compromises should be analysed on the basis of the utility that these compromises have for the users of a system. As is standard in multi-objective optimisation, we model the user utility using utility functions that map value or return vectors to scalar values. This approach naturally leads to two different optimisation criteria: expected scalarised returns (ESR) and scalarised expected returns (SER). We develop a new taxonomy which classifies multi-objective multi-agent decision making settings, on the basis of the reward structures, and which and how utility functions are applied. This allows us to offer a structured view of the field, to clearly delineate the current state-of-the-art in multi-objective multi-agent decision making approaches and to identify promising directions for future research. Starting from the execution phase, in which the selected policies are applied and the utility for the users is attained, we analyse which solution concepts apply to the different settings in our taxonomy. Furthermore, we define and discuss these solution concepts under both ESR and SER optimisation criteria. We conclude with a summary of our main findings and a discussion of many promising future research directions in multi-objective multi-agent systems.
Multiple Change-point Detection: a Selective Overview Very long and noisy sequence data arise from biological sciences to social science including high throughput data in genomics and stock prices in econometrics. Often such data are collected in order to identify and understand shifts in trend, e.g., from a bull market to a bear market in finance or from a normal number of chromosome copies to an excessive number of chromosome copies in genetics. Thus, identifying multiple change points in a long, possibly very long, sequence is an important problem. In this article, we review both classical and new multiple change-point detection strategies. Considering the long history and the extensive literature on the change-point detection, we provide an in-depth discussion on a normal mean change-point model from aspects of regression analysis, hypothesis testing, consistency and inference. In particular, we present a strategy to gather and aggregate local information for change-point detection that has become the cornerstone of several emerging methods because of its attractiveness in both computational and theoretical properties.
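For the normal mean change-point model mentioned above, a standard baseline is binary segmentation driven by a CUSUM statistic: split the sequence at the most likely change and recurse on the two halves. The sketch below is illustrative only; the threshold and minimum segment length are ad-hoc choices, not the calibrated procedures the review covers.

```python
import numpy as np

def cusum(x):
    """Max CUSUM statistic for a single mean change and the corresponding split index."""
    n = len(x)
    t = np.arange(1, n)
    cs = np.cumsum(x)
    stat = np.sqrt(t * (n - t) / n) * np.abs(cs[:-1] / t - (cs[-1] - cs[:-1]) / (n - t))
    k = int(np.argmax(stat))
    return stat[k], k + 1

def binary_segmentation(x, lo=0, thresh=4.0, min_len=10, found=None):
    """Recursively split wherever the CUSUM statistic exceeds an ad-hoc threshold."""
    found = [] if found is None else found
    if len(x) < 2 * min_len:
        return found
    stat, k = cusum(x)
    if stat > thresh:
        found.append(lo + k)
        binary_segmentation(x[:k], lo, thresh, min_len, found)
        binary_segmentation(x[k:], lo + k, thresh, min_len, found)
    return found

rng = np.random.default_rng(6)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 80), rng.normal(-1, 1, 120)])
print(sorted(binary_segmentation(x)))   # roughly [100, 180]
```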
Multiple Factor Analysis Multiple factor analysis (MFA, see Escofier and Pagès, 1990, 1994) analyzes observations described by several ‘blocks’ or sets of variables. MFA seeks the common structures present in all or some of these sets. MFA is performed in two steps. First a principal component analysis (PCA) is performed on each data set, which is then ‘normalized’ by dividing all its elements by the square root of the first eigenvalue obtained from its PCA. Second, the normalized data sets are merged to form a unique matrix and a global PCA is performed on this matrix. The individual data sets are then projected onto the global analysis to analyze communalities and discrepancies. MFA is used in very different domains such as sensory evaluation, economy, ecology, and chemistry.
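The two-step procedure described above is easy to sketch: run a PCA per block, divide each centered block by the square root of its first eigenvalue, merge the blocks, and run a global PCA. The snippet below is a minimal numpy illustration with invented block sizes; the eigenvalue convention (division by n-1) and the centering choices are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 60
blocks = [rng.normal(size=(n, k)) for k in (5, 8, 3)]     # three sets of variables

# Step 1: per-block PCA, then normalize each block by sqrt(first eigenvalue).
normalized = []
for Xb in blocks:
    Xc = Xb - Xb.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    lambda1 = s[0] ** 2 / (n - 1)            # first eigenvalue of the block's PCA
    normalized.append(Xc / np.sqrt(lambda1))

# Step 2: merge the weighted blocks and run a global PCA via the SVD.
Z = np.hstack(normalized)
U, S, Vt = np.linalg.svd(Z - Z.mean(axis=0), full_matrices=False)
global_scores = U * S                        # observation coordinates in the global PCA
print(global_scores.shape)
```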
Multiple Imputation: A Review of Practical and Theoretical Findings Multiple imputation is a straightforward method for handling missing data in a principled fashion. This paper presents an overview of multiple imputation, including important theoretical results and their practical implications for generating and using multiple imputations. A review of strategies for generating imputations follows, including recent developments in flexible joint modeling and sequential regression/chained equations/fully conditional specification approaches. Finally, we compare and contrast different methods for generating imputations on a range of criteria before identifying promising avenues for future research.
Multiple Instance Learning: A Survey of Problem Characteristics and Applications Multiple instance learning (MIL) is a form of weakly supervised learning where training instances are arranged in sets, called bags, and a label is provided for the entire bag. This formulation is gaining interest because it naturally fits various problems and allows to leverage weakly labeled data. Consequently, it has been used in diverse application fields such as computer vision and document classification. However, learning from bags raises important challenges that are unique to MIL. This paper provides a comprehensive survey of the characteristics which define and differentiate the types of MIL problems. Until now, these problem characteristics have not been formally identified and described. As a result, the variations in performance of MIL algorithms from one data set to another are difficult to explain. In this paper, MIL problem characteristics are grouped into four broad categories: the composition of the bags, the types of data distribution, the ambiguity of instance labels, and the task to be performed. Methods specialized to address each category are reviewed. Then, the extent to which these characteristics manifest themselves in key MIL application areas are described. Finally, experiments are conducted to compare the performance of 16 state-of-the-art MIL methods on selected problem characteristics. This paper provides insight on how the problem characteristics affect MIL algorithms, recommendations for future benchmarking and promising avenues for research.
Multisensor data fusion: A review of the state-of-the-art There has been an ever-increasing interest in multi-disciplinary research on multisensor data fusion technology, driven by its versatility and diverse areas of application. Therefore, there seems to be a real need for an analytical review of recent developments in the data fusion domain. This paper proposes a comprehensive review of the data fusion state of the art, exploring its conceptualizations, benefits, and challenging aspects, as well as existing methodologies. In addition, several future directions of research in the data fusion community are highlighted and described.
Multi-Stakeholder Recommendation: Applications and Challenges Recommender systems have been successfully applied to assist decision making by producing a list of item recommendations tailored to user preferences. Traditional recommender systems only focus on optimizing the utility of the end users who are the receiver of the recommendations. By contrast, multi-stakeholder recommendation attempts to generate recommendations that satisfy the needs of both the end users and other parties or stakeholders. This paper provides an overview and discussion about the multi-stakeholder recommendations from the perspective of practical applications, available data sets, corresponding research challenges and potential solutions.
Multivariate Archimax Copulas A multivariate extension of the bivariate class of Archimax copulas was recently proposed by Mesiar and Jágr (2013), who asked under which conditions it holds. This paper answers their question and provides a stochastic representation of multivariate Archimax copulas. A few basic properties of these copulas are explored, including their minimum and maximum domains of attraction. Several non-trivial examples of multivariate Archimax copulas are also provided.
Multivariate Linear Models in R The multivariate linear model is Y (n × m) = X (n × (k+1)) B ((k+1) × m) + E (n × m), where Y is a matrix of n observations on m response variables; X is a model matrix with columns for k + 1 regressors, typically including an initial column of 1s for the regression constant; B is a matrix of regression coefficients, one column for each response variable; and E is a matrix of errors. This model can be fit with the lm function in R, where the left-hand side of the model comprises a matrix of response variables, and the right-hand side is specified exactly as for a univariate linear model (i.e., with a single response variable). This appendix to Fox and Weisberg (2011) explains how to use the Anova and linearHypothesis functions in the car package to test hypotheses for parameters in multivariate linear models, including models for repeated-measures data.
Multivariate Pricing Price strategy is the key marketing tool for companies to increase their competitive edge but too often, prices are based on costs, not on customers´ perceptions of value. Value-based pricing is a business strategy which sets selling prices based on the perceived value to the customer, rather than the actual cost of the product, the market price, competitors´ prices, or the historical price. Practically speaking, the goal is to align the money spent with the value perceived. For example, the number of users, lifetime spending, number of transactions, value of transaction, return-on-investment, cost saving, revenue; the list can continue. The most common techniques employ straightforward methods such as: ‘Would you pay for this item at this price?´ While the van Westendorp method and conjoint analysis are useful, this article focuses on multivariate pricing techniques that build flexibility and agility into the pricing models, and therefore can be more widely employed by clients and product managers.
Multivariate Spatial Data Visualization: A Survey Multivariate spatial data plays an important role in computational science and engineering simulations. The potential features and hidden relationships in multivariate data can assist scientists to gain an in-depth understanding of a scientific process, verify a hypothesis and further discover a new physical or chemical law. In this paper, we present a comprehensive survey of the state-of-the-art techniques for multivariate spatial data visualization. We first introduce the basic concept and characteristics of multivariate spatial data, and describe three main tasks in multivariate data visualization: feature classification, fusion visualization, and correlation analysis. Finally, we prospect potential research topics for multivariate data visualization according to the current research.

N

Narrative Science Systems: A Review Automatic narration of events and entities is the need of the hour, especially when live reporting is critical and the volume of information to be narrated is huge. This paper discusses the challenges in this context, along with the algorithms used to build such systems. From a systematic study, we can infer that most of the work done in this area is related to statistical data. It was also found that subjective evaluation and the contribution of experts are limited in the narration context.
Natively Interpretable Machine Learning and Artificial Intelligence: Preliminary Results and Future Directions Machine learning models have become more and more complex in order to better approximate complex functions. Although fruitful in many domains, the added complexity has come at the cost of model interpretability. The once popular k-nearest neighbors (kNN) approach, which finds and uses the most similar data for reasoning, has received much less attention in recent decades due to numerous problems when compared to other techniques. We show that many of these historical problems with kNN can be overcome, and our contribution has applications not only in machine learning but also in online learning, data synthesis, anomaly detection, model compression, and reinforcement learning, without sacrificing interpretability. We introduce a synthesis between kNN and information theory that we hope will provide a clear path towards models that are innately interpretable and auditable. Through this work we hope to gather interest in combining kNN with information theory as a promising path to fully auditable machine learning and artificial intelligence.
Natural Disasters Detection in Social Media and Satellite imagery: a survey The analysis of natural disaster-related multimedia content has received great attention in recent years. As one of the most important sources of information, social media have been crawled over the years to collect and analyze disaster-related multimedia content. Satellite imagery has also been widely explored for disaster analysis. In this paper, we survey the existing literature on disaster detection and on the analysis of the information retrieved from social media and satellites. Based on the nature of the content, this literature can be categorized into three groups: (i) disaster detection in text; (ii) analysis of disaster-related visual content from social media; and (iii) disaster detection in satellite imagery. We extensively review the approaches proposed in these three domains. Furthermore, we review benchmark datasets available for the evaluation of disaster detection frameworks. Finally, we provide a detailed discussion of the insights obtained from the literature review and identify future trends and challenges, which provides an important starting point for researchers in the field.
Natural Language Generation at Scale: A Case Study for Open Domain Question Answering Current approaches to Natural Language Generation (NLG) focus on domain-specific, task-oriented dialogs (e.g. restaurant booking) using limited ontologies (up to 20 slot types), usually without considering the previous conversation context. Furthermore, these approaches require large amounts of data for each domain, and do not benefit from examples that may be available for other domains. This work explores the feasibility of statistical NLG for conversational applications with larger ontologies, which may be required by multi-domain dialog systems as well as open-domain knowledge graph based question answering (QA). We focus on modeling NLG through an Encoder-Decoder framework using a large dataset of interactions between real-world users and a conversational agent for open-domain QA. First, we investigate the impact of increasing the number of slot types on the generation quality and experiment with different partitions of the QA data with progressively larger ontologies (up to 369 slot types). Second, we explore multi-task learning for NLG and benchmark our model on a popular NLG dataset and perform experiments with open-domain QA and task-oriented dialog. Finally, we integrate conversation context by using context embeddings as an additional input for generation to improve response quality. Our experiments show the feasibility of learning statistical NLG models for open-domain contextual QA with larger ontologies.
Natural Language Processing: State of The Art, Current Trends and Challenges Natural language processing (NLP) has recently gained much attention for representing and analysing human language computationally. Its applications have spread to various fields such as machine translation, email spam detection, information extraction, summarization, medicine, and question answering. The paper is organised in four parts: it discusses the different levels of NLP and the components of Natural Language Generation (NLG), presents the history and evolution of NLP, surveys the state of the art and the various applications of NLP, and closes with current trends and challenges.
Navigating Diverse Data Science Learning: Critical Reflections Towards Future Practice Data Science is currently a popular field of science attracting expertise from very diverse backgrounds. Current learning practices need to acknowledge this and adapt to it. This paper summarises some experiences with such learning approaches from teaching a postgraduate Data Science module, and draws some lessons learned that are of relevance to others teaching Data Science.
Negative Results in Computer Vision: A Perspective A negative result is when the outcome of an experiment or a model is not what is expected or when a hypothesis does not hold. Despite being often overlooked in the scientific community, negative results are results and they carry value. While this topic has been extensively discussed in other fields such as social sciences and biosciences, less attention has been paid to it in the computer vision community. The unique characteristics of computer vision, in particular its experimental aspect, call for a special treatment of this matter. In this paper, I will address questions such as what makes negative results important, how they should be disseminated, and how they should be incentivized. Further, I will discuss issues such as computer and human vision interaction, experimental design and statistical hypothesis testing, performance evaluation and model comparison, as well as computer vision research culture.
Network Community Detection: A Review and Visual Survey Community structure is an important area of research that has received considerable attention from the scientific community. Despite its importance, one of the key problems in locating information about community detection is that related articles are spread across various disciplines. To the best of our knowledge, there is no comprehensive review of the recent literature that applies a scientometric analysis based on complex network analysis to all relevant articles from the Web of Science (WoS). Here we present a visual survey of the key literature using CiteSpace. The idea is to identify emerging trends and to use network techniques to examine the evolution of the domain. Towards that end, we identify the most influential, central, and active nodes using scientometric analyses. We examine authors, key articles, cited references, core subject categories, key journals, institutions, and countries. The exploration of the scientometric literature of the domain reveals that Yong Wang is a pivot node with the highest centrality. Additionally, we observe that Mark Newman is the most highly cited author in the network, and that the journal ‘Reviews of Modern Physics’ has the strongest citation burst. In terms of cited documents, an article by Andrea Lancichinetti has the highest centrality score. We also find that the key publications in this domain originate from the United States, whereas Scotland has the strongest and longest citation burst. Finally, the categories of ‘Computer Science’ and ‘Engineering’ lead the other categories in frequency and centrality, respectively.
Network Embedding: An Overview Networks are one of the most powerful structures for modeling problems in the real world. Downstream machine learning tasks defined on networks have the potential to solve a variety of problems. With link prediction, for instance, one can predict whether two persons will become friends on a social network. Many machine learning algorithms, however, require that each input example is a real vector. Network embedding encompasses various methods for unsupervised, and sometimes supervised, learning of feature representations of nodes and links in a network. Typically, embedding methods are based on the assumption that the similarity between nodes in the network should be reflected in the learned feature representations. In this paper, we review significant contributions to network embedding in the last decade. In particular, we look at four methods: Spectral Clustering, DeepWalk, Large-scale Information Network Embedding (LINE), and node2vec. We describe each method and list its advantages and shortcomings. In addition, we give examples of real-world machine learning problems on networks in which the embedding is critical in order to maximize the predictive performance of the machine learning task. Finally, we take a look at research trends and state-of-the-art methods in the research on network embedding.
Network reconstruction with local partial correlation: comparative evaluation Over the past decade, various methods have been proposed for the reconstruction of networks modeled as Gaussian Graphical Models. In this work we analyzed three different approaches: the Graphical Lasso (GLasso), Graphical Ridge (GGMridge) and Local Partial Correlation (LPC). For the evaluation of the methods, we used high-dimensional data generated from simulated random graphs (Erdős-Rényi, Barabási-Albert, Watts-Strogatz). Performance was assessed through the Receiver Operating Characteristic (ROC) curve. In addition, the methods were used to reconstruct a co-expression network for differentially expressed genes in human cervical cancer data. LPC outperformed GLasso in most of the simulation cases, even though GGMridge produced better ROC curves than both other methods. LPC obtained outcomes similar to GGMridge in the real data studies.
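To make the comparison protocol above concrete, here is a minimal sketch (not the paper's code, and with illustrative parameters such as a 30-node graph and a regularisation weight of 0.05): recover a Gaussian graphical model with the Graphical Lasso and score edge recovery with ROC-AUC, roughly mirroring the simulation-based evaluation described in the abstract.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso
from sklearn.datasets import make_sparse_spd_matrix
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
p = 30
precision = make_sparse_spd_matrix(p, alpha=0.95, random_state=0)   # true sparse precision matrix
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(precision), size=500)

model = GraphicalLasso(alpha=0.05).fit(X)        # one of the three compared estimators

iu = np.triu_indices(p, k=1)
true_edges = (np.abs(precision[iu]) > 1e-8).astype(int)   # ground-truth adjacency
scores = np.abs(model.precision_[iu])                      # estimated edge strengths
print("edge-recovery ROC-AUC:", roc_auc_score(true_edges, scores))
```

GGMridge and LPC are not sketched here; the same ROC-based scoring applies once their estimated partial correlations are available.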
Network Representation Learning: Consolidation and Renewed Bearing Graphs are a natural abstraction for many problems where nodes represent entities and edges represent a relationship across entities. An important area of research that has emerged over the last decade is the use of graphs as a vehicle for non-linear dimensionality reduction in a manner akin to previous efforts based on manifold learning, with uses for downstream database processing, machine learning and visualization. In this systematic yet comprehensive experimental survey, we benchmark several popular network representation learning methods operating on two key tasks: link prediction and node classification. We examine the performance of 12 unsupervised embedding methods on 15 datasets. To the best of our knowledge, the scale of our study — both in terms of the number of methods and the number of datasets — is the largest to date. Our results reveal several key insights about work to date in this space. First, we find that certain baseline methods (task-specific heuristics, as well as classic manifold methods) that have often been dismissed or not considered by previous efforts can compete on certain types of datasets if they are tuned appropriately. Second, we find that recent methods based on matrix factorization offer a small but relatively consistent advantage over alternative methods (e.g., random-walk based methods) from a qualitative standpoint. Specifically, we find that MNMF, a community-preserving embedding method, is the most competitive method for the link prediction task, while NetMF is the most competitive baseline for node classification. Third, no single method completely outperforms the others on both node classification and link prediction tasks. We also present several drill-down analyses that reveal settings under which certain algorithms perform well (e.g., the role of neighborhood context on performance), guiding the end-user.
Network Structure Inference, A Survey: Motivations, Methods, and Applications Networks are used to represent relationships between entities in many complex systems, spanning from online social networks to biological cell development and brain activity. These networks model relationships which present various challenges. In many cases, relationships between entities are unambiguously known: are two users friends in a social network? Do two researchers collaborate on a published paper? Do two road segments in a transportation system intersect? These relationships are unambiguous and directly observable in the system in question. In most cases, however, relationships between nodes are not directly observable and must be inferred: does one gene regulate the expression of another? Do two animals who physically co-locate have a social bond? Who infected whom in a disease outbreak? Existing approaches use specialized knowledge from their home domains to infer and measure the goodness of the inferred network for a specific task. However, current research lacks a rigorous validation framework based on standard statistical validation. In this survey, we examine how network representations are learned from non-network data, the variety of questions and tasks posed on these data across several domains, and validation strategies for measuring the inferred network’s capability of answering questions about the original system of interest.
Neural Approaches to Conversational AI The present paper surveys neural approaches to conversational AI that have been developed in the last few years. We group conversational systems into three categories: (1) question answering agents, (2) task-oriented dialogue agents, and (3) chatbots. For each category, we present a review of state-of-the-art neural approaches, draw the connection between them and traditional approaches, and discuss the progress that has been made and challenges still being faced, using specific systems and models as case studies.
Neural Architecture Search: A Survey Deep Learning has enabled remarkable progress over the last years on a variety of tasks, such as image recognition, speech recognition, and machine translation. One crucial aspect of this progress is novel neural architectures. Currently employed architectures have mostly been developed manually by human experts, which is a time-consuming and error-prone process. Because of this, there is growing interest in automated neural architecture search methods. We provide an overview of existing work in this field of research and categorize it according to three dimensions: search space, search strategy, and performance estimation strategy.
Neural Distributed Autoassociative Memories: A Survey Introduction. Neural network models of autoassociative, distributed memory allow storage and retrieval of many items (vectors) where the number of stored items can exceed the vector dimension (the number of neurons in the network). This opens the possibility of a sublinear time search (in the number of stored items) for approximate nearest neighbors among vectors of high dimension. The purpose of this paper is to review models of autoassociative, distributed memory that can be naturally implemented by neural networks (mainly with local learning rules and iterative dynamics based on information locally available to neurons). Scope. The survey is focused mainly on the networks of Hopfield, Willshaw and Potts, that have connections between pairs of neurons and operate on sparse binary vectors. We discuss not only autoassociative memory, but also the generalization properties of these networks. We also consider neural networks with higher-order connections and networks with a bipartite graph structure for non-binary data with linear constraints. Conclusions. In conclusion we discuss the relations to similarity search, advantages and drawbacks of these techniques, and topics for further research. An interesting and still not completely resolved question is whether neural autoassociative memories can search for approximate nearest neighbors faster than other index structures for similarity search, in particular for the case of very high dimensional vectors.
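As a concrete reference point for the classical setting the survey starts from, the following sketch implements a plain Hopfield-style memory with the Hebbian outer-product rule and synchronous sign-update retrieval; the pattern count, network size, and noise level are illustrative assumptions, and the richer variants surveyed (Willshaw, Potts, higher-order connections) are not covered.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 100, 5                                   # neurons, stored patterns
patterns = rng.choice([-1, 1], size=(m, n))     # bipolar patterns

W = (patterns.T @ patterns) / n                 # Hebbian (local) learning rule
np.fill_diagonal(W, 0)

def recall(x, steps=10):
    """Iterative retrieval dynamics: repeatedly apply sign(Wx)."""
    for _ in range(steps):
        x = np.sign(W @ x)
        x[x == 0] = 1
    return x

noisy = patterns[0].copy()
flip = rng.choice(n, size=15, replace=False)    # corrupt 15% of the bits
noisy[flip] *= -1
print("overlap with stored pattern after recall:", (recall(noisy) @ patterns[0]) / n)
```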
Neural Graph Machines: Learning Neural Networks Using Graphs Label propagation is a powerful and flexible semi-supervised learning technique on graphs. Neural networks, on the other hand, have proven track records in many supervised learning tasks. In this work, we propose a training framework with a graph-regularised objective, namely ‘Neural Graph Machines’, that can combine the power of neural networks and label propagation. This work generalises previous literature on graph-augmented training of neural networks, enabling it to be applied to multiple neural architectures (Feed-forward NNs, CNNs and LSTM RNNs) and a wide range of graphs. The new objective allows the neural networks to harness both labeled and unlabeled data by: (a) allowing the network to train using labeled data as in the supervised setting, (b) biasing the network to learn similar hidden representations for neighboring nodes on a graph, in the same vein as label propagation. Such architectures with the proposed objective can be trained efficiently using stochastic gradient descent and scaled to large graphs, with a runtime that is linear in the number of edges. The proposed joint training approach convincingly outperforms many existing methods on a wide range of tasks (multi-label classification on social graphs, news categorization, document classification and semantic intent classification), with multiple forms of graph inputs (including graphs with and without node-level features) and using different types of neural networks.
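The following is a hedged sketch of a graph-regularised objective in the spirit described above: a supervised loss on labelled nodes plus a penalty pulling hidden representations of neighbouring nodes together. The tiny feed-forward architecture, the weighting factor alpha, and the random data are illustrative assumptions, not the paper's experimental setup.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_nodes, n_feat, n_class = 200, 16, 3
X = torch.randn(n_nodes, n_feat)
y = torch.randint(0, n_class, (50,))            # labels for the first 50 nodes only
edges = torch.randint(0, n_nodes, (400, 2))     # random (u, v) edge list

hidden = torch.nn.Linear(n_feat, 32)
out = torch.nn.Linear(32, n_class)
opt = torch.optim.SGD(list(hidden.parameters()) + list(out.parameters()), lr=0.1)

alpha = 0.1                                     # weight of the graph term
for epoch in range(100):
    h = torch.relu(hidden(X))
    loss_sup = F.cross_entropy(out(h)[:50], y)                           # (a) supervised term
    loss_graph = ((h[edges[:, 0]] - h[edges[:, 1]]) ** 2).sum(1).mean()  # (b) neighbour agreement
    loss = loss_sup + alpha * loss_graph
    opt.zero_grad(); loss.backward(); opt.step()
print("final training loss:", float(loss))
```

In a full-scale setting the graph term would be computed over mini-batches of edges, which is what keeps the cost of each step linear in the number of edges.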
Neural Machine Reading Comprehension: Methods and Trends Machine Reading Comprehension (MRC), which requires the machine to answer questions based on a given context, has gained increasingly wide attention with the advent of deep learning over the past few years. Although research on MRC based on deep learning is flourishing, there is a lack of a comprehensive survey article to summarize the proposed approaches and recent trends. As a result, we conduct a thorough overview of recent research efforts in this promising field. To be concrete, we compare MRC tasks in different dimensions and introduce the general architecture. We further provide a taxonomy of state-of-the-art approaches utilized in prevalent models. Finally, we discuss some new trends and conclude by describing some open issues in the field.
Neural Machine Translation and Sequence-to-sequence Models: A Tutorial This tutorial introduces a new and powerful set of techniques variously called ‘neural machine translation’ or ‘neural sequence-to-sequence models’. These techniques have been used in a number of tasks regarding the handling of human language, and can be a powerful tool in the toolbox of anyone who wants to model sequential data of some sort. The tutorial assumes that the reader knows the basics of math and programming, but does not assume any particular experience with neural networks or natural language processing. It attempts to explain the intuition behind the various methods covered, then delves into them with enough mathematical detail to understand them concretely, and culminates with a suggestion for an implementation exercise, where readers can test that they understood the content in practice.
Neural networks and rational functions Neural networks and rational functions efficiently approximate each other. In more detail, it is shown here that for any ReLU network, there exists a rational function of degree $O(\text{polylog}(1/\epsilon))$ which is $\epsilon$-close, and similarly for any rational function there exists a ReLU network of size $O(\text{polylog}(1/\epsilon))$ which is $\epsilon$-close. By contrast, polynomials need degree $\Omega(\text{poly}(1/\epsilon))$ to approximate even a single ReLU. When converting a ReLU network to a rational function as above, the hidden constants depend exponentially on the number of layers, which is shown to be tight; in other words, a compositional representation can be beneficial even for rational functions.
Neural Networks for Information Retrieval Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them. The fast pace of modern-day research has given rise to many approaches to many IR problems. The amount of information available can be overwhelming both for junior students and for experienced researchers looking for new research topics and directions. The aim of this full-day tutorial is to give a clear overview of current tried-and-trusted neural methods in IR and how they benefit IR.
Neural Style Transfer: A Review The recent work of Gatys et al. demonstrated the power of Convolutional Neural Networks (CNN) in creating artistic imagery by separating and recombining image content and style. This process of using a CNN to migrate the semantic content of one image to different styles is referred to as Neural Style Transfer. Since then, Neural Style Transfer has become a trending topic both in academic literature and industrial applications. It is receiving increasing attention from computer vision researchers, and several methods have been proposed to either improve or extend the original neural algorithm of Gatys et al. However, there is no comprehensive survey presenting and summarizing the recent Neural Style Transfer literature. This review aims to provide an overview of the current progress in Neural Style Transfer, as well as a discussion of its various applications and open problems for future research.
Neural Text Generation: A Practical Guide Deep learning methods have recently achieved great empirical success on machine translation, dialogue response generation, summarization, and other text generation tasks. At a high level, the technique has been to train end-to-end neural network models consisting of an encoder model to produce a hidden representation of the source text, followed by a decoder model to generate the target. While such models have significantly fewer pieces than earlier systems, significant tuning is still required to achieve good performance. For text generation models in particular, the decoder can behave in undesired ways, such as by generating truncated or repetitive outputs, outputting bland and generic responses, or in some cases producing ungrammatical gibberish. This paper is intended as a practical guide for resolving such undesired behavior in text generation models, with the aim of helping enable real-world applications.
Neural-Symbolic Learning and Reasoning: A Survey and Interpretation The study and understanding of human behaviour is relevant to computer science, artificial intelligence, neural computation, cognitive science, philosophy, psychology, and several other areas. Presupposing cognition as basis of behaviour, among the most prominent tools in the modelling of behaviour are computational-logic systems, connectionist models of cognition, and models of uncertainty. Recent studies in cognitive science, artificial intelligence, and psychology have produced a number of cognitive models of reasoning, learning, and language that are underpinned by computation. In addition, efforts in computer science research have led to the development of cognitive computational systems integrating machine learning and automated reasoning. Such systems have shown promise in a range of applications, including computational biology, fault diagnosis, training and assessment in simulators, and software verification. This joint survey reviews the personal ideas and views of several researchers on neural-symbolic learning and reasoning. The article is organised in three parts: Firstly, we frame the scope and goals of neural-symbolic computation and have a look at the theoretical foundations. We then proceed to describe the realisations of neural-symbolic computation, systems, and applications. Finally we present the challenges facing the area and avenues for further research.
Next Generation Business Intelligence and Analytics: A Survey Business Intelligence and Analytics (BI&A) is the process of extracting and predicting business-critical insights from data. Traditional BI focused on data collection, extraction, and organization to enable efficient query processing for deriving insights from historical data. With the rise of big data and cloud computing, there are many new challenges and opportunities for BI. In particular, with the growing number of data sources, traditional BI&A is evolving to provide intelligence at different scales and perspectives – operational BI, situational BI, self-service BI. In this survey, we review the evolution of business intelligence systems in full scale, from back-end architecture to front-end applications. We focus on the changes in the back-end architecture that deal with the collection and organization of the data. We also review the changes in the front-end applications, where analytic services and visualization are the core components. Using a use case from BI in healthcare, one of the most complex enterprises, we show how BI&A will play an important role beyond its traditional usage. The survey provides a holistic view of Business Intelligence and Analytics for anyone interested in getting a complete picture of the different pieces in the emerging next-generation BI&A solutions.
Next Generation Resilient Cyber-Physical Systems Cyber-Physical Systems (CPS) consist of distributed engineered environments where the monitoring and surveillance tasks are governed by tightly integrated computing, communication and control technologies. CPS are omnipresent in our everyday life. Hacking and failures of such systems have an impact on critical services, with potentially significant and lasting consequences. In this paper, we review the requirements a CPS must meet to address the challenges of tomorrow. Two key challenges are understanding and reinforcing the resilience of CPS.
Nine Quick Tips for Analyzing Network Data These tips provide a quick and concentrated guide for beginners in the analysis of network data.
Non-computability of human intelligence We revisit the question (most famously) initiated by Turing: can human intelligence be completely modelled by a Turing machine? To give away the ending, we show here that the answer is \emph{no}. More specifically, we show that at least some thought processes of the brain cannot be Turing computable. In particular, some physical processes are not Turing computable, which is not entirely expected. The main difference between our argument and the well-known Lucas-Penrose argument is that we do not use Gödel’s incompleteness theorem (although our argument seems related to Gödel’s), and we do not need to assume the fundamental consistency of human reasoning powers (which is controversial); we also side-step some meta-logical issues with their argument, which have also been controversial. The argument proceeds via a thought experiment and is at least partly physical, but no serious physical assumptions are made. Furthermore, the argument can be reformulated as an actual (likely future) experiment.
Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview Substantial progress has been made recently on developing provably accurate and efficient algorithms for low-rank matrix factorization via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms due to their susceptibility to spurious local minima, simple iterative methods such as gradient descent have been remarkably successful in practice. The theoretical footings, however, had been largely lacking until recently. In this tutorial-style overview, we highlight the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees. We review two contrasting approaches: (1) two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and (2) global landscape analysis and initialization-free algorithms. Several canonical matrix factorization problems are discussed, including but not limited to matrix sensing, phase retrieval, matrix completion, blind deconvolution, robust principal component analysis, phase synchronization, and joint alignment. Special care is taken to illustrate the key technical insights underlying their analyses. This article serves as a testament that the integrated thinking of optimization and statistics leads to fruitful research findings.
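As a minimal illustration of the kind of nonconvex method the overview analyses, the sketch below factors a partially observed low-rank matrix as M ≈ UVᵀ and runs plain gradient descent on the squared error over observed entries; the rank, step size, and random initialization are illustrative choices (the theory typically pairs gradient descent with a spectral initialization).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 60, 50, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))   # ground-truth low-rank matrix
mask = rng.random((n, m)) < 0.4                                  # ~40% of entries observed

U = 0.1 * rng.standard_normal((n, r))
V = 0.1 * rng.standard_normal((m, r))
lr = 0.02
for _ in range(2000):
    R = mask * (U @ V.T - M)                   # residual on observed entries only
    U, V = U - lr * R @ V, V - lr * R.T @ U    # simultaneous gradient step on both factors
print("relative recovery error:", np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))
```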
Nonlinear functional regression: a functional RKHS approach This paper deals with functional regression, in which the input attributes as well as the response are functions. To deal with this problem, we develop a functional reproducing kernel Hilbert space approach; here, a kernel is an operator acting on a function and yielding a function. We demonstrate basic properties of these functional RKHS, as well as a representer theorem for this setting; we investigate the construction of kernels; we provide some experimental insight.
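For orientation, the representer theorem mentioned above takes the following standard form in the operator-valued-kernel setting (stated here from general RKHS theory, not quoted from the paper; $\mathcal{Y}$ denotes the output function space and $\lambda$ a regularisation weight):

```latex
\[
  \hat f \;=\; \arg\min_{f \in \mathcal{H}_K}
    \sum_{i=1}^{n} \bigl\| y_i - f(x_i) \bigr\|_{\mathcal{Y}}^{2}
    + \lambda \| f \|_{\mathcal{H}_K}^{2}
  \qquad\Longrightarrow\qquad
  \hat f(\cdot) \;=\; \sum_{i=1}^{n} K(\cdot, x_i)\, c_i ,
  \quad c_i \in \mathcal{Y},
\]
```

i.e. the minimiser is a finite combination of kernel sections, so fitting reduces to solving for the $n$ coefficient functions $c_i$.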
Nonlinear probability. A theory with incompatible stochastic variables In 1991 J.F. Aarnes introduced the concept of quasi-measures in a compact topological space $\Omega$ and established the connection between quasi-states on $C (\Omega)$ and quasi-measures in $\Omega$. This work solved the linearity problem of quasi-states on $C^*$-algebras formulated by R.V. Kadison in 1965. The answer is that a quasi-state need not be linear, so a quasi-state need not be a state. We introduce nonlinear measures in a space $\Omega$ which is a generalization of a measurable space. In this more general setting we are still able to define integration and establish a representation theorem for the corresponding functionals. A probabilistic language is choosen since we feel that the subject should be of some interest to probabilists. In particular we point out that the theory allows for incompatible stochastic variables. The need for incompatible variables is well known in quantum mechanics, but the need seems natural also in other contexts as we try to explain by a questionary example. Keywords and phrases: Epistemic probability, Integration with respect to mea- sures and other set functions, Banach algebras of continuous functions, Set func- tions and measures on topological spaces, States, Logical foundations of quantum mechanics.
Nonnegative Matrix Factorization for Signal and Data Analytics: Identifiability, Algorithms, and Applications Nonnegative matrix factorization (NMF) has become a workhorse for signal and data analytics, triggered by its model parsimony and interpretability. Perhaps a bit surprisingly, the understanding of its model identifiability—the major reason behind the interpretability in many applications such as topic mining and hyperspectral imaging—had been rather limited until recent years. Beginning from the 2010s, the identifiability research of NMF has progressed considerably: many interesting and important results have been discovered by the signal processing (SP) and machine learning (ML) communities. NMF identifiability has a great impact on many aspects of practice, such as avoiding ill-posed formulations and designing performance-guaranteed algorithms. On the other hand, there is no tutorial paper that introduces NMF from an identifiability viewpoint. In this paper, we aim at filling this gap by offering a comprehensive and deep tutorial on the model identifiability of NMF as well as the connections to algorithms and applications. This tutorial will help researchers and graduate students grasp the essence and insights of NMF, thereby avoiding typical ‘pitfalls’ that are oftentimes due to unidentifiable NMF formulations. This paper will also help practitioners pick or design suitable factorization tools for their own problems.
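A small sketch of the topic-mining application mentioned above: factor a TF-IDF matrix as X ≈ WH under nonnegativity constraints and read topics off the rows of H. The toy corpus, the choice of two components, and the nndsvd initialisation are illustrative only.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "stock market prices rise on strong earnings",
    "investors watch market volatility and bond yields",
    "team wins championship after overtime thriller",
    "star player scores twice as the team takes the title",
]
vec = TfidfVectorizer()
X = vec.fit_transform(docs)

nmf = NMF(n_components=2, init="nndsvd", random_state=0)
W = nmf.fit_transform(X)        # document-topic weights
H = nmf.components_             # topic-word weights

terms = np.array(vec.get_feature_names_out())
for k, row in enumerate(H):
    print(f"topic {k}:", terms[row.argsort()[::-1][:4]])   # top words per topic
```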
Notes: A Continuous Model of Neural Networks. Part I: Residual Networks In this series of notes, we try to model neural networks as discretizations of continuous flows on the space of data, which we call the flow model. The idea comes from an observation of their similarity in mathematical structure. This conceptual analogy has not been proven useful yet, but it seems interesting to explore. In this part, we start with a linear transport equation (with a nonlinear transport velocity field) and obtain a class of residual-type neural networks. If the transport velocity field has a special form, the obtained network is similar to the original ResNet. This neural network can be regarded as a discretization of the continuous flow defined by the transport equation. In the end, a summary of the correspondence between neural networks and transport equations is presented, followed by some general discussion.
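The correspondence sketched in these notes can be summarised in one line (a standard observation, stated here under the assumption of an explicit Euler step of size $h$): discretising the flow defined by the transport velocity field gives exactly the residual update.

```latex
\[
  \frac{dx}{dt} = v(x, t)
  \quad\xrightarrow{\ \text{Euler, step } h\ }\quad
  x_{k+1} = x_k + h\, v(x_k, t_k),
\]
```

which has the same form as a ResNet block $x_{k+1} = x_k + \mathcal{F}(x_k; \theta_k)$ once $h\,v$ is parameterised by a learned layer.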
Novel Artificial Human Optimization Field Algorithms – The Beginning New Artificial Human Optimization (AHO) Field Algorithms can be created from scratch or by adding the concept of Artificial Humans into other existing optimization algorithms. Particle Swarm Optimization (PSO) has been very popular for solving complex optimization problems due to its simplicity. In this work, new Artificial Human Optimization Field Algorithms are created by modifying existing PSO algorithms with AHO Field concepts. These hybrid PSO algorithms fall under both the PSO Field and the AHO Field. There are hybrid PSO research articles based on human behavior, human cognition, human thinking, etc., but there are no hybrid PSO articles based on concepts like human disease, human kindness and human relaxation. This paper proposes new AHO Field algorithms based on these research gaps. Some existing hybrid PSO algorithms are given a new name in this work so that it will be easy for future AHO researchers to find these novel Artificial Human Optimization Field Algorithms. A total of 6 Artificial Human Optimization Field algorithms titled ‘Human Safety Particle Swarm Optimization (HuSaPSO)’, ‘Human Kindness Particle Swarm Optimization (HKPSO)’, ‘Human Relaxation Particle Swarm Optimization (HRPSO)’, ‘Multiple Strategy Human Particle Swarm Optimization (MSHPSO)’, ‘Human Thinking Particle Swarm Optimization (HTPSO)’ and ‘Human Disease Particle Swarm Optimization (HDPSO)’ are tested by applying them to the Ackley, Beale, Bohachevsky, Booth and Three-Hump Camel benchmark functions. The results obtained are compared with the standard PSO algorithm.
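For context, the sketch below is a vanilla PSO baseline minimising the 2-D Ackley benchmark named above; the AHO variants in the paper add human-inspired update rules on top of this kind of loop, and the hyperparameters (inertia w, coefficients c1/c2, swarm size) are generic choices rather than the paper's settings.

```python
import numpy as np

def ackley(x):
    x = np.asarray(x)
    return (-20 * np.exp(-0.2 * np.sqrt(np.mean(x ** 2)))
            - np.exp(np.mean(np.cos(2 * np.pi * x))) + 20 + np.e)

rng = np.random.default_rng(0)
n_particles, dim = 30, 2
pos = rng.uniform(-5, 5, (n_particles, dim))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([ackley(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

w, c1, c2 = 0.7, 1.5, 1.5
for _ in range(200):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([ackley(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()
print("best Ackley value found:", pbest_val.min())   # global minimum is 0 at the origin
```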
Novelty Detection in Learning Systems Novelty detection is concerned with recognising inputs that differ in some way from those that are usually seen. It is a useful technique in cases where an important class of data is under-represented in the training set, which means that the performance of the network will be poor for that class. In some circumstances, such as medical data and fault detection, it is often precisely the class that is under-represented in the data, the disease or potential fault, that the network should detect. In novelty detection systems the network is trained only on the negative examples where that class is not present, and then detects inputs that do not fit the model it has acquired, that is, members of the novel class. This paper reviews the literature on novelty detection in neural networks and other machine learning techniques, as well as providing brief overviews of the related topics of statistical outlier detection and novelty detection in biological organisms.
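A compact sketch of the setting described in this review: fit a model on ‘normal’ data only and flag inputs that do not fit it. A one-class SVM is used here as one standard choice; the Gaussian toy data and the nu/gamma values are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal_train = rng.normal(0, 1, size=(500, 2))            # only the well-represented class
detector = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.5).fit(normal_train)

normal_test = rng.normal(0, 1, size=(100, 2))
novel_test = rng.normal(4, 1, size=(100, 2))               # stand-in for the rare "fault" class
print("flagged as novel (normal data):", np.mean(detector.predict(normal_test) == -1))
print("flagged as novel (novel data): ", np.mean(detector.predict(novel_test) == -1))
```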

O

Object Detection in 20 Years: A Survey Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today’s object detection as a technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of the cold-weapon era. This paper extensively reviews 400+ papers of object detection in the light of its technical evolution, spanning over a quarter-century’s time (from the 1990s to 2019). A number of topics are covered in this paper, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed-up techniques, and the recent state-of-the-art detection methods. This paper also reviews some important detection applications, such as pedestrian detection, face detection, and text detection, and makes an in-depth analysis of their challenges as well as technical improvements in recent years.
Object Detection with Deep Learning: A Review Due to object detection’s close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures. Their performance easily stagnates, even when complex ensembles are constructed that combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development of deep learning, more powerful tools, which are able to learn semantic, high-level, deeper features, have been introduced to address the problems of traditional architectures. These models differ in network architecture, training strategy, optimization function, etc. In this paper, we provide a review of deep learning based object detection frameworks. Our review begins with a brief introduction to the history of deep learning and its representative tool, the Convolutional Neural Network (CNN). Then we focus on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further. As distinct specific detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection and pedestrian detection. Experimental analyses are also provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are provided to serve as guidelines for future work in both object detection and relevant neural network based learning systems.
Object Oriented Analysis using Natural Language Processing concepts: A Review The Software Development Life Cycle (SDLC) starts with eliciting customer requirements in the form of a Software Requirement Specification (SRS). The SRS document needed for software development is mostly written in natural language (NL), which is convenient for the client. From the SRS document alone, class names, their attributes and the functions incorporated in the body of each class are traced, based on the prior knowledge of the analyst. This paper presents a review of Object Oriented (OO) analysis using Natural Language Processing (NLP) techniques. The analysis can be manual, where a domain expert helps to generate the required diagram, or automated, where the system generates the required diagram from input in the form of an SRS.
Observational Learning by Reinforcement Learning Observational learning is a type of learning that occurs as a function of observing, retaining and possibly replicating or imitating the behaviour of another agent. It is a core mechanism appearing in various instances of social learning and has been found to be employed in several intelligent species, including humans. In this paper, we investigate to what extent the explicit modelling of other agents is necessary to achieve observational learning through machine learning. In particular, we argue that observational learning can emerge from pure Reinforcement Learning (RL), potentially coupled with memory. Through simple scenarios, we demonstrate that an RL agent can leverage the information provided by observing another agent performing a task in a shared environment. The other agent is only observed through the effect of its actions on the environment and is never explicitly modeled. Two key aspects are borrowed from observational learning: i) the observer’s behaviour needs to change as a result of viewing a ‘teacher’ (another agent), and ii) the observer needs to be motivated somehow to engage in making use of the other agent’s behaviour. The latter is naturally modeled by RL, by correlating the learning agent’s reward with the teacher agent’s behaviour.
On Being a Data Skeptic I´d like to set something straight right out of the gate. I´m not a data cynic, nor am I urging other people to be. Data is here, it´s growing, and it´s powerful. I´m not hiding behind the word ‘skeptic’ the way climate change ‘skeptics’ do, when they should call themselves deniers. Instead, I urge the reader to cultivate their inner skeptic, which I define by the following characteristic behavior. A skeptic is someone who maintains a consistently inquisitive attitude toward facts, opinions, or (especially) beliefs stated as facts. A skeptic asks questions when confronted with a claim that has been taken for granted. That´s not to say a skeptic brow-beats someone for their beliefs, but rather that they set up reasonable experiments to test those beliefs. A really excellent skeptic puts the ‘science’ into the term ‘data science.’ In this paper, I´ll make the case that the community of data practitioners needs more skepticism, or at least would benefit greatly from it, for the following reason: there´s a two-fold problem in this community. On the one hand, many of the people in it are overly enamored with data or data science tools. On the other hand, other people are overly pessimistic about those same tools. I´m charging myself with making a case for data practitioners to engage in active, intelligent, and strategic data skepticism. I´m proposing a middle-of-the-road approach: don´t be blindly optimistic, don´t be blindly pessimistic. Most of all, don´t be awed. Realize there are nuanced considerations and plenty of context and that you don´t necessarily have to be a mathematician to understand the issues. …
On Calibration of Modern Neural Networks Confidence calibration — the problem of predicting probability estimates representative of the true correctness likelihood — is important for classification models in many applications. We discover that modern neural networks, unlike those from a decade ago, are poorly calibrated. Through extensive experiments, we observe that depth, width, weight decay, and Batch Normalization are important factors influencing calibration. We evaluate the performance of various post-processing calibration methods on state-of-the-art architectures with image and document classification datasets. Our analysis and experiments not only offer insights into neural network learning, but also provide a simple and straightforward recipe for practical settings: on most datasets, temperature scaling — a single-parameter variant of Platt Scaling — is surprisingly effective at calibrating predictions.
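A hedged sketch of temperature scaling as described above: learn a single scalar T on held-out logits by minimising the negative log-likelihood, then divide logits by T at prediction time. Synthetic logits stand in for a real validation set here, and the optimiser settings are illustrative.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(1000, 10) * 3            # pretend (over-confident) validation logits
labels = torch.randint(0, 10, (1000,))

log_T = torch.zeros(1, requires_grad=True)    # optimise log T so that T stays positive
opt = torch.optim.LBFGS([log_T], lr=0.1, max_iter=50)

def closure():
    opt.zero_grad()
    loss = F.cross_entropy(logits / log_T.exp(), labels)   # NLL of rescaled logits
    loss.backward()
    return loss

opt.step(closure)
T = log_T.exp().item()
calibrated_probs = F.softmax(logits / T, dim=1)
print("learned temperature:", round(T, 3))
```

Because scaling all logits by the same constant does not change their argmax, accuracy is untouched; only the confidence estimates move.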
On Clustering Validation Techniques Cluster analysis aims at identifying groups of similar objects and therefore helps to discover the distribution of patterns and interesting correlations in large data sets. It has been the subject of wide research since it arises in many application domains in engineering, business and social sciences. Especially in recent years, the availability of huge transactional and experimental data sets and the accompanying requirements for data mining have created a need for clustering algorithms that scale and can be applied in diverse domains. This paper introduces the fundamental concepts of clustering and surveys the widely known clustering algorithms in a comparative way. Moreover, it addresses an important issue of the clustering process regarding the quality assessment of the clustering results, which is also related to the inherent features of the data set under study. A review of clustering validity measures and approaches available in the literature is presented. Furthermore, the paper illustrates the issues that are under-addressed by recent algorithms and gives the trends in the clustering process.
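As a small illustration of the quality-assessment step discussed above, the sketch below runs k-means for several values of k and compares an internal validity index (the silhouette coefficient) to choose among clusterings; the synthetic dataset and the range of k are illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=600, centers=4, cluster_std=1.0, random_state=0)
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}  silhouette={silhouette_score(X, labels):.3f}")   # higher is better
```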
On Cognitive Preferences and the Interpretability of Rule-based Models It is conventional wisdom in machine learning and data mining that logical models such as rule sets are more interpretable than other models, and that among such rule-based models, simpler models are more interpretable than more complex ones. In this position paper, we question this latter assumption, and recapitulate evidence for and against this postulate. We also report the results of an evaluation in a crowd-sourcing study, which does not reveal a strong preference for simple rules, whereas we can observe a weak preference for longer rules in some domains. We then continue to review criteria for interpretability from the psychological literature, evaluate some of them, and briefly discuss their potential use in machine learning.
On Discriminative vs. Generative classifiers: A comparison of logistic regression and naive Bayes We compare discriminative and generative learning as typified by logistic regression and naive Bayes. We show, contrary to a widely held belief that discriminative classifiers are almost always to be preferred, that there can often be two distinct regimes of performance as the training set size is increased, one in which each algorithm does better. This stems from the observation – which is borne out in repeated experiments – that while discriminative learning has lower asymptotic error, a generative classifier may also approach its (higher) asymptotic error much faster.
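The two regimes can be reproduced in miniature (a sketch, not the paper's experiments): compare the test accuracy of Gaussian naive Bayes and logistic regression as the training-set size grows on one synthetic task; the dataset and the chosen sizes are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=6000, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for n in (20, 100, 500, 3000):                  # growing training-set sizes
    nb = GaussianNB().fit(X_tr[:n], y_tr[:n])
    lr = LogisticRegression(max_iter=1000).fit(X_tr[:n], y_tr[:n])
    print(f"n={n:4d}  NB={nb.score(X_te, y_te):.3f}  LR={lr.score(X_te, y_te):.3f}")
```

On many (though not all) datasets the generative model is ahead at the smallest n and the discriminative model catches up or overtakes it as n grows, matching the asymptotic-error argument in the abstract.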
On Ensuring that Intelligent Machines Are Well-Behaved Machine learning algorithms are everywhere, ranging from simple data analysis and pattern recognition tools used across the sciences to complex systems that achieve super-human performance on various tasks. Ensuring that they are well-behaved—that they do not, for example, cause harm to humans or act in a racist or sexist way—is therefore not a hypothetical problem to be dealt with in the future, but a pressing one that we address here. We propose a new framework for designing machine learning algorithms that simplifies the problem of specifying and regulating undesirable behaviors. To show the viability of this new framework, we use it to create new machine learning algorithms that preclude the sexist and harmful behaviors exhibited by standard machine learning algorithms in our experiments. Our framework for designing machine learning algorithms simplifies the safe and responsible application of machine learning.
On Evaluating Commercial Cloud Services: A Systematic Review Background: Cloud Computing is increasingly booming in industry with many competing providers and services. Accordingly, evaluation of commercial Cloud services is necessary. However, the existing evaluation studies are relatively chaotic: there exists tremendous confusion and a gap between practice and theory in Cloud services evaluation. Aim: To help relieve the aforementioned chaos, this work aims to synthesize the existing evaluation implementations to outline the state of the practice and also identify research opportunities in Cloud services evaluation. Method: Based on a conceptual evaluation model comprising six steps, the Systematic Literature Review (SLR) method was employed to collect relevant evidence to investigate Cloud services evaluation step by step. Results: This SLR identified 82 relevant evaluation studies. The overall data collected from these studies essentially represent the current practical landscape of implementing Cloud services evaluation, and in turn can be reused to facilitate future evaluation work. Conclusions: Evaluation of commercial Cloud services has become a world-wide research topic. Some of the findings of this SLR identify several research gaps in the area of Cloud services evaluation (e.g., the Elasticity and Security evaluation of commercial Cloud services could be a long-term challenge), while some other findings suggest trends in applying commercial Cloud services (e.g., compared with PaaS, IaaS seems more suitable for customers and is particularly important in industry). This SLR study itself also confirms some previous experiences and reveals new Evidence-Based Software Engineering (EBSE) lessons.
On Generalization and Regularization in Deep Learning Why do large neural networks generalize so well on complex tasks such as image classification or speech recognition? What exactly is the role of regularization for them? These are arguably among the most important open questions in machine learning today. In a recent and thought-provoking paper [C. Zhang et al.], several authors performed a number of numerical experiments that hint at the need for novel theoretical concepts to account for this phenomenon. The paper stirred quite a lot of excitement among the machine learning community, but at the same time it created some confusion, as discussions on OpenReview.net testify. The aim of this pedagogical paper is to make this debate accessible to a wider audience of data scientists without advanced theoretical knowledge in statistical learning. The focus here is on explicit mathematical definitions and on a discussion of relevant concepts, not on proofs, for which we provide references.
On k-Anonymity and the Curse of Dimensionality In recent years, the wide availability of personal data has made the problem of privacy preserving data mining an important one. A number of methods have recently been proposed for privacy preserving data mining of multidimensional data records. One of the methods for privacy preserving data mining is that of anonymization, in which a record is released only if it is indistinguishable from k other entities in the data. We note that methods such as k-anonymity are highly dependent upon spatial locality in order to effectively implement the technique in a statistically robust way. In high dimensional space the data becomes sparse, and the concept of spatial locality is no longer easy to define from an application point of view. In this paper, we view the k-anonymization problem from the perspective of inference attacks over all possible combinations of attributes. We show that when the data contains a large number of attributes which may be considered quasi-identifiers, it becomes difficult to anonymize the data without an unacceptably high amount of information loss. This is because an exponential number of combinations of dimensions can be used to make precise inference attacks, even when individual attributes are partially specified within a range. We provide an analysis of the effect of dimensionality on k-anonymity methods. We conclude that when a data set contains a large number of attributes which are open to inference attacks, we are faced with a choice of either completely suppressing most of the data or losing the desired level of anonymity. Thus, this paper shows that the curse of high dimensionality also applies to the problem of privacy preserving data mining.
On the computation of counterfactual explanations — A survey Due to the increasing use of machine learning in practice, it becomes more and more important to be able to explain the predictions and behavior of machine learning models. One class of explanations is counterfactual explanations, which provide an intuitive and useful way of explaining machine learning models. In this survey we review model-specific methods for efficiently computing counterfactual explanations of many different machine learning models and propose methods for models that have not been considered in the literature so far.
On the Difficulty of Evaluating Baselines: A Study on Recommender Systems Numerical evaluations with comparisons to baselines play a central role when judging research in recommender systems. In this paper, we show that running baselines properly is difficult. We demonstrate this issue on two extensively studied datasets. First, we show that results for baselines that have been used in numerous publications over the past five years for the Movielens 10M benchmark are suboptimal. With a careful setup of a vanilla matrix factorization baseline, we are not only able to improve upon the reported results for this baseline but even outperform the reported results of any newly proposed method. Secondly, we recap the tremendous effort that was required by the community to obtain high quality results for simple methods on the Netflix Prize. Our results indicate that empirical findings in research papers are questionable unless they were obtained on standardized benchmarks where baselines have been tuned extensively by the research community.
On the Diversity of Memory and Storage Technologies The last decade has seen tremendous developments in memory and storage technologies, starting with Flash Memory and continuing with the upcoming Storage-Class Memories. Combined with an explosion of data processing, data analytics, and machine learning, this led to a segmentation of the memory and storage market. Consequently, the traditional storage hierarchy, as we know it today, might be replaced by a multitude of storage hierarchies, with potentially different depths, each tailored for specific workloads. In this context, we explore in this ‘Kurz erklärt’ the state of memory technologies and reflect on their future use with a focus on data management systems.
On the Implicit Assumptions of GANs Generative adversarial nets (GANs) have generated a lot of excitement. Despite their popularity, they exhibit a number of well-documented issues in practice, which apparently contradict theoretical guarantees. A number of enlightening papers have pointed out that these issues arise from unjustified assumptions that are commonly made, but the message seems to have been lost amid the optimism of recent years. We believe the identified problems deserve more attention, and highlight the implications on both the properties of GANs and the trajectory of research on probabilistic models. We recently proposed an alternative method that sidesteps these problems.
On the Learning Dynamics of Deep Neural Networks While a lot of progress has been made in recent years, the dynamics of learning in deep nonlinear neural networks remain to this day largely misunderstood. In this work, we study the case of binary classification and prove various properties of learning in such networks under strong assumptions such as linear separability of the data. Extending existing results from the linear case, we confirm empirical observations by proving that the classification error also follows a sigmoidal shape in nonlinear architectures. We show that given proper initialization, learning expounds parallel independent modes and that certain regions of parameter space might lead to failed training. We also demonstrate that input norm and features’ frequency in the dataset lead to distinct convergence speeds which might shed some light on the generalization capabilities of deep neural networks. We provide a comparison between the dynamics of learning with cross-entropy and hinge losses, which could prove useful to understand recent progress in the training of generative adversarial networks. Finally, we identify a phenomenon that we baptize gradient starvation where the most frequent features in a dataset prevent the learning of other less frequent but equally informative features.
On the Origin of Deep Learning This paper is a review of the evolutionary history of deep learning models. It covers the period from the genesis of neural networks, when associationist modeling of the brain was studied, to the models that dominated the last decade of research in deep learning, such as convolutional neural networks, deep belief networks, and recurrent neural networks, and extends to popular recent models like the variational autoencoder and generative adversarial nets. In addition to a review of these models, this paper primarily focuses on the precedents of the models above, examining how the initial ideas were assembled to construct the early models and how these preliminary models developed into their current forms. Many of these evolutionary paths last more than half a century and have a diversity of directions. For example, CNNs are built on prior knowledge of the biological vision system; DBNs evolved from a trade-off between the modeling power and the computational complexity of graphical models; and many of today’s models are neural counterparts of ancient linear models. This paper reviews these evolutionary paths and offers a concise thought flow of how these models were developed, and aims to provide a thorough background for deep learning. More importantly, along with the path, this paper summarizes the gist behind these milestones and proposes many directions to guide the future research of deep learning.
On the Robustness of Projection Neural Networks For Efficient Text Representation: An Empirical Study Recently, there has been strong interest in developing natural language applications that live on personal devices such as mobile phones, watches and IoT devices, with the objective of preserving user privacy and keeping memory requirements low. Advances in Locality-Sensitive Hashing (LSH)-based projection networks have demonstrated state-of-the-art performance without any embedding lookup tables, instead computing text representations on the fly. However, previous works have not investigated ‘What makes projection neural networks effective at capturing compact representations for text classification?’ and ‘Are these projection models resistant to perturbations and misspellings in input text?’. In this paper, we analyze and answer these questions through perturbation analyses and by running experiments on multiple dialog act prediction tasks. Our results show that the projections are resistant to perturbations and misspellings compared to widely-used recurrent architectures that use word embeddings. On the ATIS intent prediction task, when evaluated with perturbed input data, we observe that the performance of recurrent models that use word embeddings drops significantly, by more than 30%, compared to just 5% with projection networks, showing that LSH-based projection representations are robust and consistently lead to high quality performance.
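For intuition about why such representations tolerate misspellings, here is a rough sketch of an LSH-style projection for text (the general idea behind projection networks, not the specific model evaluated in the paper): hash character n-grams into a sparse feature vector, then keep only the signs of a few random projections. The vectoriser settings, bit width, and example sentences are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer

def projection_bits(texts, n_bits=64, seed=0):
    """Character n-gram features -> signs of random projections (one bit each)."""
    feats = HashingVectorizer(analyzer="char_wb", ngram_range=(2, 3),
                              n_features=2 ** 12).transform(texts).toarray()
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((feats.shape[1], n_bits))   # random hyperplanes
    return (feats @ planes > 0).astype(np.uint8)

bits = projection_bits(["book a flight to boston", "book a flght to bostn"])
print("fraction of bits agreeing despite misspellings:", (bits[0] == bits[1]).mean())
```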
On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis In this paper we investigate the impact of simple text preprocessing decisions (particularly tokenizing, lemmatizing, lowercasing and multiword grouping) on the performance of a state-of-the-art text classifier based on convolutional neural networks. Despite potentially affecting the final performance of any given model, this aspect has not received substantial interest in the deep learning literature. We perform an extensive evaluation on standard benchmarks from text categorization and sentiment analysis. Our results show that a simple tokenization of the input text is often enough, but also highlight the importance of being consistent in the preprocessing of the evaluation set and the corpus used for training word embeddings.
On the Spectral Bias of Deep Neural Networks It is well known that over-parametrized deep neural networks (DNNs) are an overly expressive class of functions that can memorize even random data with $100\%$ training accuracy. This raises the question of why they do not easily overfit real data. To answer this question, we study deep networks using Fourier analysis. We show that deep networks with finite weights (or trained for a finite number of steps) are inherently biased towards representing smooth functions over the input space. Specifically, the magnitude of a particular frequency component ($k$) of a deep ReLU network function decays at least as fast as $\mathcal{O}(k^{-2})$, with width and depth helping polynomially and exponentially (respectively) in modeling higher frequencies. This shows, for instance, why DNNs cannot perfectly \textit{memorize} peaky delta-like functions. We also show that DNNs can exploit the geometry of low-dimensional data manifolds to approximate complex functions that exist along the manifold with functions that are simple when seen with respect to the input space. As a consequence, we find that all samples (including adversarial samples) classified by a network as belonging to a certain class are connected by a path along which the prediction of the network does not change. Finally, we find that DNN parameters corresponding to functions with higher frequency components occupy a smaller volume in parameter space.
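As a rough, self-contained illustration of the kind of measurement this line of work relies on (not a reproduction of the paper's experiments), the sketch below evaluates a small, randomly initialised one-hidden-layer ReLU network on a 1-D grid and reads off the magnitude of its Fourier coefficients; the architecture, grid, and printed frequencies are arbitrary assumptions.

```python
import numpy as np

# Assumed toy setup: a random one-hidden-layer ReLU network on a 1-D input grid.
rng = np.random.default_rng(0)
n_hidden, n_grid = 512, 4096
W1 = rng.normal(size=(1, n_hidden))                  # input-to-hidden weights
b1 = rng.normal(size=n_hidden)                       # hidden biases
w2 = rng.normal(size=n_hidden) / np.sqrt(n_hidden)   # hidden-to-output weights

x = np.linspace(0.0, 1.0, n_grid)
f = np.maximum(0.0, x[:, None] @ W1 + b1) @ w2       # network output f(x) on the grid
spectrum = np.abs(np.fft.rfft(f - f.mean()))         # magnitude of frequency components

for k in (1, 4, 16, 64, 256):
    print(f"|F_k| at k={k}: {spectrum[k]:.3e}")      # magnitudes fall off with frequency
```

Probing the $\mathcal{O}(k^{-2})$ decay the paper discusses would require repeating the measurement on a trained network; the random network here only shows how the spectrum is read off.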
On the State of the Art of Evaluation in Neural Language Models Ongoing innovations in recurrent neural network architectures have provided a steady influx of apparently state-of-the-art results on language modelling benchmarks. However, these have been evaluated using differing code bases and limited computational resources, which represent uncontrolled sources of experimental variation. We reevaluate several popular architectures and regularisation methods with large-scale automatic black-box hyperparameter tuning and arrive at the somewhat surprising conclusion that standard LSTM architectures, when properly regularised, outperform more recent models. We establish a new state of the art on the Penn Treebank and Wikitext-2 corpora, as well as strong baselines on the Hutter Prize dataset.
On the Transferability of Representations in Neural Networks Between Datasets and Tasks Deep networks, composed of multiple layers of hierarchical distributed representations, tend to learn low-level features in initial layers and transition to high-level features towards final layers. Paradigms such as transfer learning, multi-task learning, and continual learning leverage this notion of generic hierarchical distributed representations to share knowledge across datasets and tasks. Herein, we study the layer-wise transferability of representations in deep networks across a few datasets and tasks and note some interesting empirical observations.
On the Turing Completeness of Modern Neural Network Architectures Alternatives to recurrent neural networks, in particular, architectures based on attention or convolutions, have been gaining momentum for processing input sequences. In spite of their relevance, the computational properties of these alternatives have not yet been fully explored. We study the computational power of two of the most paradigmatic architectures exemplifying these mechanisms: the Transformer (Vaswani et al., 2017) and the Neural GPU (Kaiser and Sutskever, 2016). We show both models to be Turing complete exclusively based on their capacity to compute and access internal dense representations of the data. In particular, neither the Transformer nor the Neural GPU requires access to an external memory to become Turing complete. Our study also reveals some minimal sets of elements needed to obtain these completeness results.
On Unifying Deep Generative Models Deep generative models have achieved impressive success in recent years. Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), as powerful frameworks for deep generative model learning, have largely been considered two distinct paradigms and have received extensive independent study. This paper establishes formal connections between deep generative modeling approaches through a new formulation of GANs and VAEs. We show that GANs and VAEs are essentially minimizing KL divergences with opposite directions and reversed latent/visible treatments, extending the two learning phases of the classic wake-sleep algorithm, respectively. The unified view provides a powerful tool for analyzing a diverse set of existing model variants and makes it possible to exchange ideas across research lines in a principled way. For example, we transfer the importance weighting method from the VAE literature to improve GAN learning, and enhance VAEs with an adversarial mechanism. Quantitative experiments show the generality and effectiveness of the imported extensions.
On-Device Machine Learning: An Algorithms and Learning Theory Perspective The current paradigm for using machine learning models on a device is to train a model in the cloud and perform inference using the trained model on the device. However, with the increasing number of smart devices and improved hardware, there is interest in performing model training on the device. Given this surge in interest, a comprehensive survey of the field from a device-agnostic perspective sets the stage for both understanding the state-of-the-art and for identifying open challenges and future avenues of research. Since on-device learning is an expansive field with connections to a large number of related topics in AI and machine learning (including online learning, model adaptation, one/few-shot learning, etc), covering such a large number of topics in a single survey is impractical. Instead, this survey finds a middle ground by reformulating the problem of on-device learning as resource constrained learning where the resources are compute and memory. This reformulation allows tools, techniques, and algorithms from a wide variety of research areas to be compared equitably. In addition to summarizing the state of the art, the survey also identifies a number of challenges and next steps for both the algorithmic and theoretical aspects of on-device learning.
On-Disk Data Processing: Issues and Future Directions In this paper, we present a survey of ‘on-disk’ data processing (ODDP). ODDP, which is a form of near-data processing, refers to the computing arrangement where the secondary storage drives have the data processing capability. Proposed ODDP schemes vary widely in terms of the data processing capability, target applications, architecture and the kind of storage drive employed. Some ODDP schemes provide only a specific but heavily used operation like sort whereas some provide a full range of operations. Recently, with the advent of Solid State Drives, powerful and extensive ODDP solutions have been proposed. In this paper, we present a thorough review of architectures developed for different on-disk processing approaches along with current and future challenges and also identify the future directions which ODDP can take.
One Big Net For Everything I apply recent work on ‘learning to think’ (2015) and on PowerPlay (2011) to the incremental training of an increasingly general problem solver, continually learning to solve new tasks without forgetting previous skills. The problem solver is a single recurrent neural network (or similar general purpose computer) called ONE. ONE is unusual in the sense that it is trained in various ways, e.g., by black box optimization / reinforcement learning / artificial evolution as well as supervised / unsupervised learning. For example, ONE may learn through neuroevolution to control a robot through environment-changing actions, and learn through unsupervised gradient descent to predict future inputs and vector-valued reward signals as suggested in 1990. User-given tasks can be defined through extra goal-defining input patterns, also proposed in 1990. Suppose ONE has already learned many skills. Now a copy of ONE can be re-trained to learn a new skill, e.g., through neuroevolution without a teacher. Here it may profit from re-using previously learned subroutines, but it may also forget previous skills. Then ONE is retrained in PowerPlay style (2011) on stored input/output traces of (a) ONE’s copy executing the new skill and (b) previous instances of ONE whose skills are still considered worth memorizing. Simultaneously, ONE is retrained on old traces (even those of unsuccessful trials) to become a better predictor, without additional expensive interaction with the environment. More and more control and prediction skills are thus collapsed into ONE, like in the chunker-automatizer system of the neural history compressor (1991). This forces ONE to relate partially analogous skills (with shared algorithmic information) to each other, creating common subroutines in the form of shared subnetworks of ONE, to greatly speed up subsequent learning of additional, novel but algorithmically related skills.
Online Algorithms This book chapter reviews fundamental concepts and results in the area of online algorithms. We first address classical online problems and then study various applications of current interest. Online algorithms represent a theoretical framework for studying problems in interactive computing. They model, in particular, that the input in an interactive system does not arrive as a batch but as a sequence of input portions and that the system must react in response to each incoming portion. Moreover, they take into account that at any point in time future input is unknown. As the name suggests, online algorithms consider the algorithmic aspects of interactive systems: We wish to design strategies that always compute good output and keep a given system in good state. No assumptions are made about the input stream. The input can even be generated by an adversary that creates new input portions based on the system´s reactions to previous ones. We seek algorithms that have a provably good performance.
Online Learning and Online Convex Optimization Online learning is a well established learning paradigm which has both theoretical and practical appeal. The goal of online learning is to make a sequence of accurate predictions given knowledge of the correct answer to previous prediction tasks and possibly additional available information. Online learning has been studied in several research fields including game theory, information theory, and machine learning. It has also become of great interest to practitioners due to the recent emergence of large scale applications such as online advertisement placement and online web ranking. In this survey we provide a modern overview of online learning. Our goal is to give the reader a sense of some of the interesting ideas and, in particular, to underscore the centrality of convexity in deriving efficient online learning algorithms. We do not mean to be comprehensive but rather to give a high-level, rigorous yet easy to follow, survey.
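To make the convexity theme concrete, here is a minimal sketch of projected online gradient descent for squared loss, one of the textbook algorithms such a survey covers; the stream format, step-size schedule, and projection radius are illustrative assumptions rather than anything prescribed by the survey.

```python
import numpy as np

def online_gradient_descent(stream, dim, radius=1.0, eta0=0.5):
    """Projected online gradient descent for squared loss over an L2 ball."""
    w = np.zeros(dim)
    cumulative_loss = 0.0
    for t, (x, y) in enumerate(stream, start=1):
        pred = float(w @ x)
        cumulative_loss += 0.5 * (pred - y) ** 2    # loss is revealed only after predicting
        grad = (pred - y) * x                       # gradient of the round-t loss
        w = w - (eta0 / np.sqrt(t)) * grad          # decaying step size ~ 1/sqrt(t)
        norm = np.linalg.norm(w)
        if norm > radius:                           # project back onto the feasible ball
            w *= radius / norm
    return w, cumulative_loss

# Toy stream: noisy linear targets.
rng = np.random.default_rng(0)
true_w = np.array([0.5, -0.3, 0.2])
data = [(x, float(true_w @ x) + 0.01 * rng.normal()) for x in rng.normal(size=(500, 3))]
w_hat, loss = online_gradient_descent(data, dim=3)
```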
Online Learning: A Comprehensive Survey Online learning represents an important family of machine learning algorithms, in which a learner attempts to resolve an online prediction (or any type of decision-making) task by learning a model/hypothesis from a sequence of data instances one at a time. The goal of online learning is to ensure that the online learner makes a sequence of accurate predictions (or correct decisions) given knowledge of the correct answers to previous prediction or learning tasks and possibly additional information. This is in contrast to many traditional batch learning or offline machine learning algorithms that are often designed to train a model in batch from a given collection of training data instances. This survey aims to provide a comprehensive review of the online machine learning literature through a systematic review of basic ideas and key principles and a proper categorization of different algorithms and techniques. Generally speaking, according to the learning type and the forms of feedback information, the existing online learning works can be classified into three major categories: (i) supervised online learning where full feedback information is always available, (ii) online learning with limited feedback, and (iii) unsupervised online learning where there is no feedback available. Due to space limitations, the survey is mainly focused on the first category, but it also briefly covers some basics of the other two categories. Finally, we discuss some open issues and attempt to shed light on potential future research directions in this field.
Online Machine Learning in Big Data Streams The area of online machine learning in big data streams covers algorithms that are (1) distributed and (2) work from data streams with only a limited possibility to store past data. The first requirement mostly concerns software architectures and efficient algorithms. The second one also imposes nontrivial theoretical restrictions on the modeling methods: In the data stream model, older data is no longer available to revise earlier suboptimal modeling decisions as the fresh data arrives. In this article, we provide an overview of distributed software architectures and libraries as well as machine learning models for online learning. We highlight the most important ideas for classification, regression, recommendation, and unsupervised modeling from streaming data, and we show how they are implemented in various distributed data stream processing systems. This article is a reference material and not a survey. We do not attempt to be comprehensive in describing all existing methods and solutions; rather, we give pointers to the most important resources in the field. All related sub-fields, online algorithms, online learning, and distributed data processing are hugely dominant in current research and development with conceptually new research results and software components emerging at the time of writing. In this article, we refer to several survey results, both for distributed data processing and for online machine learning. Compared to past surveys, our article is different because we discuss recommender systems in extended detail.
Online Portfolio Selection: A Survey Online portfolio selection is a fundamental problem in computational finance, which has been extensively studied across several research communities, including finance, statistics, artificial intelligence, machine learning, and data mining. This article aims to provide a comprehensive survey and a structural understanding of online portfolio selection techniques published in the literature. From an online machine learning perspective, we first formulate online portfolio selection as a sequential decision problem, and then we survey a variety of state-of-the-art approaches, which are grouped into several major categories, including benchmarks, Follow-the-Winner approaches, Follow-the-Loser approaches, Pattern-Matching–based approaches, and Meta-Learning Algorithms. In addition to the problem formulation and related algorithms, we also discuss the relationship of these algorithms with the capital growth theory so as to better understand the similarities and differences of their underlying trading ideas. This article aims to provide a timely and comprehensive survey for both machine learning and data mining researchers in academia and quantitative portfolio managers in the financial industry to help them understand the state of the art and facilitate their research and practical applications. We also discuss some open issues and evaluate some emerging new trends for future research.
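As a small, hedged illustration of one algorithm in this family (an exponentiated-gradient, Follow-the-Winner-style update in the spirit of Helmbold et al.), the sketch below rebalances a portfolio from a matrix of daily price relatives; the learning rate and toy data are made up for illustration.

```python
import numpy as np

def exponentiated_gradient_portfolio(price_relatives, eta=0.05):
    """price_relatives: (T, n) array, entry [t, i] = price_i(t) / price_i(t-1)."""
    T, n = price_relatives.shape
    b = np.full(n, 1.0 / n)                    # start from the uniform portfolio
    wealth = 1.0
    for t in range(T):
        x = price_relatives[t]
        wealth *= float(b @ x)                 # invest with b, observe the period's return
        b = b * np.exp(eta * x / (b @ x))      # multiplicative (exponentiated-gradient) update
        b /= b.sum()                           # renormalise onto the simplex
    return b, wealth

# Toy example: two assets over three periods.
X = np.array([[1.01, 0.99], [0.98, 1.03], [1.02, 1.00]])
weights, final_wealth = exponentiated_gradient_portfolio(X)
```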
Online Principal Component Analysis Principal Component Analysis (PCA) is one of the most well known and widely used procedures in scientific computing. It is used for dimension reduction, signal denoising, regression, correlation analysis, visualization, etc. It can be described in many ways, but one is particularly appealing in the context of online algorithms. In the online setting, the algorithm receives the input vectors $x_t$ one after the other and must always output $y_t$ before receiving $x_{t+1}$.
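The online constraint described here can be illustrated with a classic streaming estimator of the leading principal direction (Oja's rule); this is a generic sketch of the setting, not the algorithm analysed in the paper, and the step size and toy data are assumptions.

```python
import numpy as np

def oja_first_component(stream, dim, eta=0.01, seed=0):
    """Streaming estimate of the top principal direction: the vector w is updated
    after each x_t, so y_t = w . x_t can be output before x_{t+1} arrives."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)
    w /= np.linalg.norm(w)
    for x in stream:
        y = float(w @ x)             # projection output for this round
        w += eta * y * (x - y * w)   # Oja's update
        w /= np.linalg.norm(w)       # keep the direction unit-length
    return w

# Toy stream with most variance along the first axis.
rng = np.random.default_rng(1)
data = rng.normal(size=(2000, 5)) * np.array([3.0, 1.0, 1.0, 1.0, 1.0])
direction = oja_first_component(data, dim=5)
```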
Ontology Learning from Text: A Survey of Methods After the vision of the Semantic Web was broadcasted at the turn of the millennium, ontology became a synonym for the solution to many problems concerning the fact that computers do not understand human language: if there were an ontology and every document were marked up with it and we had agents that would understand the markup, then computers would finally be able to process our queries in a really sophisticated way. Some years later, the success of Google shows us that the vision has not come true, being hampered by the incredible amount of extra work required for the intellectual encoding of semantic mark-up – as compared to simply uploading an HTML page. To alleviate this acquisition bottleneck, the field of ontology learning has since emerged as an important sub-field of ontology engineering. …
Ontology-based Approach for Semantic Data Extraction from Social Big Data: State-of-the-art and Research Directions The challenge of managing and extracting useful knowledge from social media data sources has attracted much attention from academia and industry. To address this challenge, this paper focuses on semantic analysis of textual data. We propose an ontology-based approach to extract the semantics of textual data and define the domain of the data. In other words, we semantically analyse the social data at two levels, i.e. the entity level and the domain level. We have chosen Twitter as the social channel for a proof of concept. Domain knowledge is captured in ontologies which are then used to enrich the semantics of tweets with a specific semantic conceptual representation of the entities that appear in the tweets. Case studies are used to demonstrate this approach. We experiment with and evaluate our proposed approach on a public dataset collected from Twitter in the politics domain. The ontology-based approach leverages entity extraction and concept mappings in terms of quantity and accuracy of concept identification.
OPEB: Open Physical Environment Benchmark for Artificial Intelligence Artificial Intelligence methods to solve continuous-control tasks have made significant progress in recent years. However, these algorithms have important limitations and still need significant improvement to be used in industry and real-world applications. This means that this area is still in an active research phase. To involve a large number of research groups, standard benchmarks are needed to evaluate and compare proposed algorithms. In this paper, we propose a physical environment benchmark framework to facilitate collaborative research in this area by enabling different research groups to integrate their designed benchmarks in a unified cloud-based repository and also share their actual implemented benchmarks via the cloud. We demonstrate the proposed framework using an actual implementation of the classical mountain-car example and present the results obtained using a Reinforcement Learning algorithm.
Open-endedness in AI systems, cellular evolution and intellectual discussions One of the biggest challenges that artificial intelligence (AI) research is currently facing is to develop algorithms and systems that are not only good at performing a specific intelligent task but also good at learning a very diverse set of skills, somewhat as humans do. In other words, the goal is to be able to mimic biological evolution, which has produced all the living species on this planet and which seems to have no end to its creativity. The process of intellectual discussion is also somewhat similar to biological evolution in this regard and is responsible for many of the innovative discoveries and inventions that scientists and engineers have made in the past. In this paper, we present an information theoretic analogy between the process of discussions and the molecular dynamics within a cell, showing that there is a common process of information exchange at the heart of these two seemingly different processes, which can perhaps help us in building AI systems capable of open-ended innovation. We also discuss the role of consciousness in this process and present a framework for the development of open-ended AI systems.
Opening the black box of deep learning The great success of deep learning shows that its technology contains profound truth, and understanding its internal mechanism not only has important implications for the development of the technology and its effective application in various fields, but also provides meaningful insights into the understanding of human brain mechanisms. At present, most of the theoretical research on deep learning is based on mathematics. This dissertation proposes that the neural network of deep learning is a physical system, examines deep learning from three different perspectives (microscopic, macroscopic, and physical world views), and answers multiple theoretical puzzles in deep learning using physics principles. For example, from the perspective of quantum mechanics and statistical physics, this dissertation presents the calculation methods for convolution, pooling, normalization, and the Restricted Boltzmann Machine, as well as the selection of cost functions; explains why deep learning must be deep, what characteristics are learned in deep learning, why Convolutional Neural Networks do not have to be trained layer by layer, and the limitations of deep learning; and proposes the theoretical direction and basis for the further development of deep learning now and in the future. The brilliance of physics flashes through deep learning, and we try to establish deep learning technology on the scientific theory of physics.
Operational Analytics from A to Z
Opportunities in Machine Learning for Healthcare Healthcare is a natural arena for the application of machine learning, especially as modern electronic health records (EHRs) provide increasingly large amounts of data to answer clinically meaningful questions. However, clinical data and practice present unique challenges that complicate the use of common methodologies. This article serves as a primer on addressing these challenges and highlights opportunities for members of the machine learning and data science communities to contribute to this growing domain.
Optimal Machine Intelligence Near the Edge of Chaos It has long been suggested that living systems, in particular the brain, may operate near some critical point. How about machines? Through dynamical stability analysis on various computer vision models, we find direct evidence that optimal deep neural network performance occurs near the transition point separating stable and chaotic attractors. In fact, modern neural network architectures push the model closer to this edge of chaos during the training process. Our dissection of their fully connected layers reveals that they achieve the stability transition by self-adjusting an oscillation-diffusion process embedded in the weights. A further analogy to the logistic map leads us to believe that the optimality near the edge of chaos is a consequence of the maximal diversity of stable states, which maximizes the effective expressivity.
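The logistic-map analogy can be made tangible in a few lines: the average log-derivative (Lyapunov exponent) of the map is negative in the stable regime, positive in the chaotic regime, and close to zero near the transition. This toy sketch only illustrates the notion of an edge of chaos; it is not the paper's stability analysis of trained networks.

```python
import numpy as np

def lyapunov_logistic(r, n_burn=1000, n_iter=5000, x0=0.2):
    """Average log-derivative of the logistic map x -> r*x*(1-x);
    negative = stable, positive = chaotic, roughly zero near the edge of chaos."""
    x = x0
    for _ in range(n_burn):                      # discard the transient
        x = r * x * (1 - x)
    acc = 0.0
    for _ in range(n_iter):
        acc += np.log(abs(r * (1 - 2 * x)))      # log |d/dx of the map|
        x = r * x * (1 - x)
    return acc / n_iter

for r in (2.8, 3.5, 3.56995, 4.0):               # 3.56995... is roughly the onset of chaos
    print(r, round(lyapunov_logistic(r), 3))
```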
Optimization Methods for Supervised Machine Learning: From Linear Models to Deep Learning The goal of this tutorial is to introduce key models, algorithms, and open questions related to the use of optimization methods for solving problems arising in machine learning. It is written with an INFORMS audience in mind, specifically those readers who are familiar with the basics of optimization algorithms, but less familiar with machine learning. We begin by deriving a formulation of a supervised learning problem and show how it leads to various optimization problems, depending on the context and underlying assumptions. We then discuss some of the distinctive features of these optimization problems, focusing on the examples of logistic regression and the training of deep neural networks. The latter half of the tutorial focuses on optimization algorithms, first for convex logistic regression, for which we discuss the use of first-order methods, the stochastic gradient method, variance-reducing stochastic methods, and second-order methods. Finally, we discuss how these approaches can be applied to the training of deep neural networks, emphasizing the difficulties that arise from the complex, nonconvex structure of these models.
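A minimal sketch of one of the tutorial's two running examples, the stochastic gradient method applied to unregularised logistic regression, is given below; the data, step size, and epoch count are illustrative assumptions rather than choices from the tutorial.

```python
import numpy as np

def sgd_logistic_regression(X, y, eta=0.1, n_epochs=20, seed=0):
    """Stochastic gradient descent for binary logistic regression, labels y in {0, 1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_epochs):
        for i in rng.permutation(n):             # one pass over the data in random order
            z = X[i] @ w + b
            p = 1.0 / (1.0 + np.exp(-z))         # predicted probability of class 1
            g = p - y[i]                         # gradient of the log loss w.r.t. z
            w -= eta * g * X[i]
            b -= eta * g
    return w, b

# Toy data with a roughly linear decision boundary.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
w_hat, b_hat = sgd_logistic_regression(X, y)
```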
Optimization Models for Machine Learning: A Survey This paper surveys the machine learning literature and presents machine learning problems as optimization models. Such models can benefit from the advancement of numerical optimization techniques, which have already played a distinctive role in several machine learning settings. In particular, mathematical optimization models are presented for commonly used machine learning approaches for regression, classification, clustering, and deep neural networks, as well as new emerging applications in machine teaching and empirical model learning. The strengths and the shortcomings of these models are discussed and potential research directions are highlighted.
Optimization of Tree Ensembles Tree ensemble models such as random forests and boosted trees are among the most widely used and practically successful predictive models in applied machine learning and business analytics. Although such models have been used to make predictions based on exogenous, uncontrollable independent variables, they are increasingly being used to make predictions where the independent variables are controllable and are also decision variables. In this paper, we study the problem of tree ensemble optimization: given a tree ensemble that predicts some dependent variable using controllable independent variables, how should we set these variables so as to maximize the predicted value? We formulate the problem as a mixed-integer optimization problem. We theoretically examine the strength of our formulation, provide a hierarchy of approximate formulations with bounds on approximation quality and exploit the structure of the problem to develop two large-scale solution methods, one based on Benders decomposition and one based on iteratively generating tree split constraints. We test our methodology on real data sets, including two case studies in drug design and customized pricing, and show that our methodology can efficiently solve large-scale instances to near or full optimality, and outperforms solutions obtained by heuristic approaches. In our drug design case, we show how our approach can identify compounds that efficiently trade off predicted performance and novelty with respect to existing, known compounds. In our customized pricing case, we show how our approach can efficiently determine optimal store-level prices under a random forest model that delivers excellent predictive accuracy.
Optimization theory in Statistics This paper addresses the issues of optimization theory and related numerical issues within the context of Statistics. Focusing on the problem of concave regression, several estimation techniques for nonparametric shape-constrained regression are classified, analyzed and compared qualitatively and quantitatively through numerical simulations. In particular, their main features, strengths and limitations for solving large instances of the problem are examined. Several improvements to enhance numerical stability and bound the computational cost are proposed. For each analyzed algorithm, the pseudo-code and its corresponding code in Scilab are provided. The results from this study demonstrate that the choice of the optimization approach strongly impacts algorithmic performance. Interestingly, it is also shown that no currently available method can efficiently solve large instances of the concave regression problem (more than many thousands of points). We suggest that further research to fill this gap in the literature should focus on finding a way to exploit and adapt classical multi-scale strategies to compute an approximate solution.
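The concave regression problem itself is easy to state as a quadratic program over the fitted values. The sketch below uses cvxpy rather than the Scilab code accompanying the paper, and assumes an equally spaced grid so that concavity reduces to nonpositive second differences; the synthetic data are made up.

```python
import numpy as np
import cvxpy as cp

# Noisy observations of a concave function on an equally spaced grid.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = -4.0 * (x - 0.5) ** 2 + 0.05 * rng.normal(size=x.size)

theta = cp.Variable(x.size)                                    # fitted values at the grid points
fit = cp.Minimize(cp.sum_squares(y - theta))                   # least-squares objective
concavity = [theta[2:] - 2 * theta[1:-1] + theta[:-2] <= 0]    # second differences <= 0
cp.Problem(fit, concavity).solve()
theta_hat = theta.value                                        # the concave least-squares fit
```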
Overcoming the Barriers to Production-Ready Machine Learning Workflows (Slide Deck)
Overview of Annotation Creation: Processes and Tools Creating linguistic annotations requires more than just a reliable annotation scheme. Annotation can be a complex endeavour potentially involving many people, stages, and tools. This chapter outlines the process of creating end-to-end linguistic annotations, identifying specific tasks that researchers often perform. Because tool support is so central to achieving high-quality, reusable annotations at low cost, the focus is on identifying capabilities that are necessary or useful for annotation tools, as well as common problems these tools present that reduce their utility. Although examples of specific tools are provided in many cases, this chapter concentrates more on abstract capabilities and problems because new tools appear continuously, while old tools disappear into disuse or disrepair. The two core capabilities tools must have are support for the chosen annotation scheme and the ability to work on the language under study. Additional capabilities are organized into three categories: those that are widely provided; those that are often useful but found in only a few tools; and those that have as yet little or no available tool support.
Overview of Approximate Bayesian Computation This Chapter, ‘Overview of Approximate Bayesian Computation’, is to appear as the first chapter in the forthcoming Handbook of Approximate Bayesian Computation (2018). It details the main ideas and concepts behind ABC methods with many examples and illustrations.
Overview: A Hierarchical Framework for Plan Generation and Execution in Multi-Robot Systems The authors present an overview of a hierarchical framework for coordinating task- and motion-level operations in multirobot systems. Their framework is based on the idea of using simple temporal networks to simultaneously reason about precedence/causal constraints required for task-level coordination and simple temporal constraints required to take some kinematic constraints of robots into account. In the plan-generation phase, the framework provides a computationally scalable method for generating plans that achieve high-level tasks for groups of robots and take some of their kinematic constraints into account. In the plan-execution phase, the framework provides a method for absorbing an imperfect plan execution to avoid time-consuming re-planning in many cases. The authors use the multirobot path-planning problem as a case study to present the key ideas behind their framework for the long-term autonomy of multirobot systems.

P

Pachinko Allocation: DAG-Structured Mixture Models of Topic Correlations Latent Dirichlet allocation (LDA) and other related topic models are increasingly popular tools for summarization and manifold discovery in discrete data. However, LDA does not capture correlations between topics. In this paper, we introduce the pachinko allocation model (PAM), which captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). The leaves of the DAG represent individual words in the vocabulary, while each interior node represents a correlation among its children, which may be words or other interior nodes (topics). PAM provides a flexible alternative to recent work by Blei and Lafferty (2006), which captures correlations only between pairs of topics. Using text data from newsgroups, historic NIPS proceedings and other research paper corpora, we show improved performance of PAM in document classification, likelihood of held-out data, the ability to support finer-grained topics, and topical keyword coherence.
Parallel Statistical Computing with R: An Illustration on Two Architectures To harness the full benefit of new computing platforms, it is necessary to develop software with parallel computing capabilities. This is no less true for statisticians than for astrophysicists. The R programming language, which is perhaps the most popular software environment for statisticians today, has many packages available for parallel computing. Their diversity in approach can be difficult to navigate. Some have attempted to alleviate this problem by designing common interfaces. However, these approaches offer limited flexibility to the user; additionally, they often serve as poor abstractions to the reality of modern hardware, leading to poor performance. We give a short introduction to two basic parallel computing approaches that closely align with hardware reality, allow the user to understand its performance, and provide sufficient capability to fully utilize multicore and multinode environments. We illustrate both approaches by working through a simple example fitting a random forest model. Beginning with a serial algorithm, we derive two parallel versions. Our objective is to illustrate the use of multiple cores on a single processor and the use of multiple processors in a cluster computer. We discuss the differences between the two versions and how the underlying hardware is used in each case.
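The chapter's examples are in R; purely as an analogous sketch in Python (covering only the single-node, multicore case), growing sub-forests in parallel and averaging their predictions might look like the following, with worker and tree counts chosen arbitrarily.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

def fit_subforest(seed, n_trees):
    return RandomForestClassifier(n_estimators=n_trees, random_state=seed).fit(X, y)

serial = fit_subforest(0, 400)          # serial baseline: one forest of 400 trees

# Parallel version: 4 workers each grow 100 trees; class probabilities are averaged.
forests = Parallel(n_jobs=4)(delayed(fit_subforest)(s, 100) for s in range(1, 5))
proba = np.mean([f.predict_proba(X) for f in forests], axis=0)
```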
Parameter estimation for text analysis Presents parameter estimation methods for discrete probability distributions, which are of particular interest in text modeling. Starting with maximum likelihood, maximum a posteriori and Bayesian estimation, central concepts like conjugate distributions and Bayesian networks are reviewed. As an application, the model of latent Dirichlet allocation (LDA) is explained in detail with a full derivation of an approximate inference algorithm based on Gibbs sampling, including a discussion of Dirichlet hyperparameter estimation.
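A compact sketch of the collapsed Gibbs sampler for LDA that the text derives is shown below, assuming documents are supplied as lists of integer word ids; the hyperparameters and iteration count are illustrative defaults, not values from the text.

```python
import numpy as np

def lda_gibbs(docs, V, K, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA. docs: list of lists of word ids in [0, V)."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))        # topic counts per document
    n_kw = np.zeros((K, V))                # word counts per topic
    n_k = np.zeros(K)                      # total words per topic
    z = []                                 # topic assignment of every token
    for d, doc in enumerate(docs):
        zd = rng.integers(K, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                                  # remove the current assignment
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # full conditional p(z = k | rest), up to a constant
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                k = rng.choice(K, p=p / p.sum())             # resample the topic
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    phi = (n_kw + beta) / (n_k[:, None] + V * beta)                    # topic-word distributions
    theta = (n_dk + alpha) / (n_dk.sum(1, keepdims=True) + K * alpha)  # document-topic mixtures
    return theta, phi
```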
PARC and SAP Co-innovation: High-performance Graph Analytics for Big Data Powered by SAP HANA Graph analytics is a crucial element in extracting insights from Big Data because it helps discover hidden relationships by connecting the dots. A graph, meaning the network of nodes and relationships, treats the linkage between objects as equally important as the objects themselves. Social networks or supply chains are obvious examples, but graphs include any network of objects such as customers, products, purchase orders, customer support calls, product inventory, etc. HiperGraph, PARC´s breakthrough Big Data technology, is a high-performance graph analytics engine. Through a four-month research project with SAP, we added HiperGraph´s analytics to SAP HANA to demonstrate a live, real-time marketing insights use case. Graph reasoning technologies provide the ability to contextualize relational data with the tapestry of information and can go beyond simplistic reporting and dashboards. This creates opportunities to rapidly experiment, gain new insights, and identify root causes. The demonstrated technology match between HANA and HiperGraph has great disruptive potential, especially in the identification of key patterns within datasets (e.g., via clustering). With HANA and HiperGraph we can finally deliver on the promise of a closed feedback loop in the enterprise where transactions are analyzed and reacted to in real-time. The intelligence that is implicit in large volumes of structured and unstructured data from varieties of sources from inside or outside of the enterprise can be delivered to the users in the form of smart business applications. We concluded that the existing commercial or open source algorithms either did not provide the real-time response or were unable to scale to the large volumes of data. The requirements from our customer (an online retailer) required real-time response from their Big Data system. PARC´s graph reasoning, versatile goal-directed clustering, egocentric recommendations, and real-time recommendation algorithms combined with the power of HANA in-memory technologies far exceeded the expectations. Brand managers can use this solution to automatically find clusters of customers with similar purchases, clusters of products that are frequently bought together, clusters of products that tend to be purchased on sale vs. those that are purchased at full price, and so on, and act on these insights during the customer´s shopping experience. There is a great opportunity for businesses to gain value by combining the HANA in-memory technology with HiperGraph reasoning, recommendation, matrix factorization, egocentric collaborative filtering, and versatile goal-directed clustering. With SAP and PARC co-innovation in Big Data analytics we can now reduce and/or eliminate the need for complex extract, transform, and load (ETL) processes; increase speed in clustering; and introduce new accessibility for business users to directly explore data clusters. We are democratizing data science for all business users in the enterprise.
Particle Swarm Optimization: A survey of historical and recent developments with hybridization perspectives Particle Swarm Optimization (PSO) is a metaheuristic global optimization paradigm that has gained prominence in the last two decades due to its ease of application in unsupervised, complex multidimensional problems which cannot be solved using traditional deterministic algorithms. The canonical particle swarm optimizer is based on the flocking behavior and social co-operation of birds and fish schools and draws heavily from the evolutionary behavior of these organisms. This paper serves to provide a thorough survey of the PSO algorithm with special emphasis on the development, deployment and improvements of its most basic as well as some of the state-of-the-art implementations. Concepts and directions on choosing the inertia weight, constriction factor, cognition and social weights and perspectives on convergence, parallelization, elitism, niching and discrete optimization as well as neighborhood topologies are outlined. Hybridization attempts with other evolutionary and swarm paradigms in selected applications are covered and an up-to-date review is put forward for the interested reader.
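To fix the vocabulary the survey uses (inertia weight, cognitive and social coefficients), here is a minimal canonical PSO; the parameter values follow common defaults and the sphere objective is only a placeholder, not an example from the paper.

```python
import numpy as np

def pso(f, dim, n_particles=30, n_iter=200, bounds=(-5.0, 5.0),
        w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, dim))        # positions
    v = np.zeros((n_particles, dim))                        # velocities
    pbest, pbest_val = x.copy(), np.apply_along_axis(f, 1, x)
    g = pbest[pbest_val.argmin()].copy()                    # global best position
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        # inertia term + cognitive (personal best) term + social (global best) term
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        vals = np.apply_along_axis(f, 1, x)
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        g = pbest[pbest_val.argmin()].copy()
    return g, f(g)

# Example: minimize the sphere function.
best_x, best_val = pso(lambda z: float(np.sum(z ** 2)), dim=5)
```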
Patent Retrieval: A Literature Review With the ever increasing number of patent applications filed every year, the need for effective and efficient systems for managing such tremendous amounts of data becomes inevitably important. Patent Retrieval (PR) is considered the pillar of almost all patent analysis tasks. PR is a subfield of Information Retrieval (IR) which is concerned with developing techniques and methods that effectively and efficiently retrieve relevant patent documents in response to a given search request. In this paper we present a comprehensive review of PR methods and approaches. It is clear that recent successes and maturity in IR applications such as Web search cannot be transferred directly to PR without deliberate domain adaptation and customization. Furthermore, state-of-the-art performance in automatic PR is still around average. These observations motivate the need for interactive search tools which provide cognitive assistance to patent professionals with minimal effort. These tools must also be developed hand in hand with patent professionals, taking into account their practices and expectations. We additionally touch on tasks related to PR such as patent valuation, litigation, and licensing, and highlight potential opportunities and open directions for computational scientists in these domains.
Perception of visual numerosity in humans and machines Numerosity perception is foundational to mathematical learning, but its computational bases are strongly debated. Some investigators argue that humans are endowed with a specialized system supporting numerical representation; others argue that visual numerosity is estimated using continuous magnitudes, such as density or area, which usually co-vary with number. Here we reconcile these contrasting perspectives by testing deep networks on the same numerosity comparison task that was administered to humans, using a stimulus space that allows us to measure the contribution of non-numerical features. Our model accurately simulated the psychophysics of numerosity perception and the associated developmental changes: discrimination was driven by numerosity information, but non-numerical features had a significant impact, especially early during development. Representational similarity analysis further highlighted that both numerosity and continuous magnitudes were spontaneously encoded even when no task had to be carried out, demonstrating that numerosity is a major, salient property of our visual environment.
Performance Metrics (Error Measures) in Machine Learning Regression, Forecasting and Prognostics: Properties and Typology Performance metrics (error measures) are vital components of the evaluation frameworks in various fields. The intention of this study was to give an overview of a variety of performance metrics and of approaches to their classification. The main goal of the study was to develop a typology that will help to improve our knowledge and understanding of metrics and facilitate their selection in machine learning regression, forecasting and prognostics. Based on an analysis of the structure of numerous performance metrics, we propose a framework of metrics which includes four (4) categories: primary metrics, extended metrics, composite metrics, and hybrid sets of metrics. The paper identifies three (3) key components (dimensions) that determine the structure and properties of primary metrics: the method of determining point distance, the method of normalization, and the method of aggregation of point distances over a data set.
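The three components can be made concrete with a toy calculation: MAE, RMSE and MAPE all start from the same point distances and differ only in how those distances are normalized and aggregated. The numbers below are invented purely for illustration.

```python
import numpy as np

y_true = np.array([100.0, 250.0, 80.0, 310.0])
y_pred = np.array([110.0, 240.0, 95.0, 300.0])

err = y_true - y_pred                        # component 1: point distance e_i
mae = np.mean(np.abs(err))                   # component 3: aggregate as mean of |e_i|
rmse = np.sqrt(np.mean(err ** 2))            # aggregate as root of the mean squared e_i
mape = np.mean(np.abs(err / y_true)) * 100   # component 2: normalize by y_i, then aggregate

print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  MAPE={mape:.2f}%")
```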
Perspectives of Predictive Modeling (Slide Deck)
Physically optimizing inference Data is scaling exponentially in fields ranging from genomics to neuroscience to economics. A central question is whether modern machine learning methods can be applied to construct predictive models based on large data sets drawn from complex, natural systems like cells and brains. In machine learning, the predictive power or generalizability of a model is determined by the statistics of training data. In this paper, we ask how predictive inference is impacted when training data is generated by the statistical behavior of a physical system. We develop an information-theoretic analysis of a canonical problem, spin network inference. Our analysis reveals the essential role that thermal fluctuations play in determining the efficiency of predictive inference. Thermal noise drives a system to explore a range of configurations providing ‘raw’ information for a learning algorithm to construct a predictive model. Conversely, thermal energy degrades information by blurring energetic differences between network states. In general, spin networks have an intrinsic optimal temperature at which inference becomes maximally efficient. Simple active learning protocols allow optimization of network temperature, without prior knowledge, to dramatically increase the efficiency of inference. Our results reveal a fundamental link between physics and information and show how the physical environment can be tuned to optimize the efficiency of machine learning.
Physicist’s Journeys Through the AI World – A Topical Review. There is no royal road to unsupervised learning Artificial Intelligence (AI), defined in its simplest form, is a technological tool that makes machines intelligent. Since learning is at the core of intelligence, machine learning poses itself as a core sub-field of AI. Within machine learning, the subclass known as deep learning addresses the limitations of its predecessors. AI has generally acquired its prominence over the past few years due to its considerable progress in various fields, and it has vastly invaded the realm of research. This has led physicists to direct their research attention towards implementing AI tools, with the central aim of gaining better understanding and enriching their intuition. This review article is meant to supplement previous efforts to bridge the gap between AI and physics, and to take a serious step towards filtering out the ‘Babelian’ clashes brought about by such gaps. This first requires fundamental knowledge of common AI tools. To this end, the review’s primary focus is on the deep learning models called artificial neural networks, which train themselves through different learning processes. It also discusses the concept of Markov decision processes. Finally, as a shortcut to the main goal, the review thoroughly examines how these neural networks are able to construct a physical theory describing some observations without applying any prior physical knowledge.
Piecewise Linear Neural Network verification: A comparative study The success of Deep Learning and its potential use in many important safety-critical applications has motivated research on formal verification of Neural Network (NN) models. Despite the reputation of learned NN models for behaving as black boxes and the theoretical hardness of proving their properties, researchers have been successful in verifying some classes of models by exploiting their piecewise linear structure. Unfortunately, most of these approaches test their algorithms without comparison with other approaches. As a result, the pros and cons of the different algorithms are not well understood. Motivated by the need to accelerate progress in this very important area, we investigate the trade-offs of a number of different approaches based on Mixed Integer Programming and Satisfiability Modulo Theories, as well as a novel method based on the Branch-and-Bound framework. We also propose a new data set of benchmarks, in addition to a collection of previously released test cases that can be used to compare existing methods. Our analysis not only allows a comparison to be made between different strategies; the comparison of results from different solvers also revealed implementation bugs in published methods. We expect that the availability of our benchmark and the analysis of the different approaches will allow researchers to develop and evaluate promising approaches for making progress on this important topic.
PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. This work is an important first step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.
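A minimal sketch of how such a suite is typically consumed from Python follows, assuming the pmlb package and its fetch_data helper are available; the dataset name and model are arbitrary examples, not choices made in the paper.

```python
from pmlb import fetch_data
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# 'mushroom' is one example benchmark name; other datasets in the suite work the same way.
X, y = fetch_data('mushroom', return_X_y=True)
scores = cross_val_score(RandomForestClassifier(n_estimators=100), X, y, cv=5)
print(scores.mean())
```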
Practical Approaches to Principal Component Analysis in the Presence of Missing Values Principal component analysis (PCA) is a classical data analysis technique that finds linear transformations of data that retain the maximal amount of variance. We study a case where some of the data values are missing, and show that this problem has many features which are usually associated with nonlinear models, such as overfitting and bad locally optimal solutions. A probabilistic formulation of PCA provides a good foundation for handling missing values, and we provide formulas for doing that. In case of high dimensional and very sparse data, overfitting becomes a severe problem and traditional algorithms for PCA are very slow. We introduce a novel fast algorithm and extend it to variational Bayesian learning. Different versions of PCA are compared in artificial experiments, demonstrating the effects of regularization and modeling of posterior variance. The scalability of the proposed algorithm is demonstrated by applying it to the Netflix problem.
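A simple iterative baseline that alternates imputation with a low-rank refit conveys the basic idea; this generic sketch is not the paper's probabilistic or variational Bayesian algorithm, and the rank and iteration count are assumptions.

```python
import numpy as np

def pca_with_missing(X, n_components, n_iter=100):
    """Iterative PCA for data with NaNs: alternate between imputing the missing
    cells from the current low-rank fit and refitting the principal components."""
    X = X.copy()
    mask = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[mask] = np.take(col_means, np.where(mask)[1])    # start from column means
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
        # Rank-k reconstruction from the leading principal components.
        X_hat = mu + (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
        X[mask] = X_hat[mask]                          # re-impute only the missing cells
    scores = (X - mu) @ Vt[:n_components].T
    return scores, Vt[:n_components], mu
```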
Practical Bayesian Optimization of Machine Learning Algorithms Machine learning algorithms frequently require careful tuning of model hyperparameters, regularization terms, and optimization parameters. Unfortunately, this tuning is often a ‘black art’ that requires expert experience, unwritten rules of thumb, or sometimes brute-force search. Much more appealing is the idea of developing automatic approaches which can optimize the performance of a given learning algorithm to the task at hand. In this work, we consider the automatic tuning problem within the framework of Bayesian optimization, in which a learning algorithm’s generalization performance is modeled as a sample from a Gaussian process (GP). The tractable posterior distribution induced by the GP leads to efficient use of the information gathered by previous experiments, enabling optimal choices about what parameters to try next. Here we show how the effects of the Gaussian process prior and the associated inference procedure can have a large impact on the success or failure of Bayesian optimization. We show that thoughtful choices can lead to results that exceed expert-level performance in tuning machine learning algorithms. We also describe new algorithms that take into account the variable cost (duration) of learning experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization on a diverse set of contemporary algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks.
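A bare-bones sketch of the underlying loop, a Gaussian-process surrogate with an expected-improvement acquisition on a 1-D function, is shown below; the kernel, random candidate set, and budget are illustrative assumptions, and the paper's cost-aware and parallel extensions are not included.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def bayes_opt(objective, bounds, n_init=5, n_iter=20, seed=0):
    """Minimise a 1-D objective with a GP surrogate and expected improvement."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(n_init, 1))                 # initial design
    y = np.array([objective(float(x)) for x in X.ravel()])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)
        cand = rng.uniform(lo, hi, size=(2000, 1))            # random candidate points
        mu, sigma = gp.predict(cand, return_std=True)
        best = y.min()
        imp = best - mu
        zscore = imp / np.maximum(sigma, 1e-9)
        ei = imp * norm.cdf(zscore) + sigma * norm.pdf(zscore)  # expected improvement
        x_next = cand[ei.argmax()]
        X = np.vstack([X, x_next])
        y = np.append(y, objective(float(x_next[0])))
    return X[y.argmin()], y.min()

best_x, best_val = bayes_opt(lambda x: (x - 0.3) ** 2 + 0.1 * np.sin(20 * x),
                             bounds=(0.0, 1.0))
```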
Practical Guide to Controlled Experiments on the Web: Listen to Your Customers not to the HiPPO The web provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called randomized experiments (single-factor or factorial designs), A/B tests (and their generalizations), split tests, Control/Treatment tests, and parallel flights. Controlled experiments embody the best scientific design for establishing a causal relationship between changes and their influence on user-observable behavior. We provide a practical guide to conducting online experiments, where end-users can help guide the development of features. Our experience indicates that significant learning and return-on-investment (ROI) are seen when development teams listen to their customers, not to the Highest Paid Person´s Opinion (HiPPO). We provide several examples of controlled experiments with surprising results. We review the important ingredients of running controlled experiments, and discuss their limitations (both technical and organizational). We focus on several areas that are critical to experimentation, including statistical power, sample size, and techniques for variance reduction. We describe common architectures for experimentation systems and analyze their advantages and disadvantages. We evaluate randomization and hashing techniques, which we show are not as simple in practice as is often assumed. Controlled experiments typically generate large amounts of data, which can be analyzed using data mining techniques to gain deeper understanding of the factors influencing the outcome of interest, leading to new hypotheses and creating a virtuous cycle of improvements. Organizations that embrace controlled experiments with clear evaluation criteria can evolve their systems with automated optimizations and real-time analyses. Based on our extensive practical experience with multiple systems and organizations, we share key lessons that will help practitioners in running trustworthy controlled experiments.
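One of the statistical-power calculations such a guide walks through can be sketched in a few lines: the usual normal-approximation sample size for a two-sided, two-proportion test. The baseline conversion rate and lift below are invented for illustration.

```python
from scipy.stats import norm

def sample_size_per_group(p1, p2, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect a change from rate p1 to p2
    with a two-sided two-proportion z-test (normal approximation)."""
    z_a = norm.ppf(1 - alpha / 2)          # critical value for the significance level
    z_b = norm.ppf(power)                  # critical value for the desired power
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return int(round((z_a + z_b) ** 2 * var / (p1 - p2) ** 2))

# e.g. detecting a conversion lift from 5.0% to 5.5%
print(sample_size_per_group(0.05, 0.055))
```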
Practical Machine Learning: A New Look at Anomaly Detection Everyone loves a mystery, and at the heart of it, that´s what anomaly detection is—spotting the unusual, catching the fraud, discovering the strange activity. Anomaly detection has a wide range of useful applications, from banking security to natural sciences to medicine to marketing. Anomaly detection carried out by a machine-learning program is actually a form of artificial intelligence. With the ever-increasing volume of data and the new types of data, such as sensor data from an increasingly large variety of objects that needs to be considered, it´s no surprise that there also is a growing interest in being able to handle more decisions automatically via machine-learning applications. But in the case of anomaly detection, at least some of the appeal is the excitement of the chase itself. …
Practical Machine Learning: Innovations in Recommendation A key to one of most sophisticated and effective approaches in machine learning and recommendation is contained in the observation: ‘I want a pony.’ As it turns out, building a simple but powerful recommender is much easier than most people think, and wanting a pony is part of the key. Machine learning, especially at the scale of huge datasets, can be a daunting task. There is a dizzying array of algorithms from which to choose, and just making the choice between them presupposes that you have sufficiently advanced mathematical background to understand the alternatives and make a rational choice. The options are also changing, evolving constantly as a result of the work of some very bright, very dedicated researchers who are continually refining existing algorithms and coming up with new ones.
Predicting Good Probabilities With Supervised Learning We examine the relationship between the predictions made by different learning algorithms and true posterior probabilities. We show that maximum margin methods such as boosted trees and boosted stumps push probability mass away from 0 and 1 yielding a characteristic sigmoid shaped distortion in the predicted probabilities. Models such as Naive Bayes, which make unrealistic independence assumptions, push probabilities toward 0 and 1. Other models such as neural nets and bagged trees do not have these biases and predict well calibrated probabilities. We experiment with two ways of correcting the biased probabilities predicted by some learning methods: Platt Scaling and Isotonic Regression. We qualitatively examine what kinds of distortions these calibration methods are suitable for and quantitatively examine how much data they need to be effective. The empirical results show that after calibration boosted trees, random forests, and SVMs predict the best probabilities.
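A minimal sketch of applying the two corrections the paper examines, Platt scaling ('sigmoid') and isotonic regression, with scikit-learn follows; the base model, synthetic data, and Brier-score comparison are illustrative choices rather than the paper's experimental setup.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

raw = GradientBoostingClassifier().fit(X_tr, y_tr)
platt = CalibratedClassifierCV(GradientBoostingClassifier(),
                               method='sigmoid', cv=3).fit(X_tr, y_tr)   # Platt scaling
iso = CalibratedClassifierCV(GradientBoostingClassifier(),
                             method='isotonic', cv=3).fit(X_tr, y_tr)    # isotonic regression

# Lower Brier score = better calibrated probabilities on held-out data.
for name, model in [('raw', raw), ('Platt', platt), ('isotonic', iso)]:
    print(name, brier_score_loss(y_te, model.predict_proba(X_te)[:, 1]))
```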
Predicting the future of predictive analytics The proliferation of data and the increasing awareness of the potential to gain valuable insight and a competitive advantage from that information are driving organizations to place data at the heart of their corporate strategy. Consumers regularly benefit from predictive analytics, in the form of anything from weather forecasts to insurance premiums. Organizations are now exploring the possibilities of using historical data to exploit growth opportunities and minimize business risks, a field known as predictive analytics. SAP commissioned Loudhouse to conduct primary research among business decision-makers in UK and US organizations to understand their attitudes to and experiences of predictive analytics, as well as a future view of usage, value and investment. The research reveals that businesses are struggling to take full advantage of the burgeoning and already overwhelming amount of data being collected. Challenges abound as firms seek to make effective use of data. While many businesses are investing in predictive analytics and already seeing benefits in a number of areas, even more see this as a future investment priority for their business. The research points to a data-driven future where advanced predictive analytics sits at the core of the business function rather than being siloed, is embraced by a greater proportion of the workforce and is used to drive decision-making across the whole business. To achieve this future vision, however, it is clear that businesses need to up-skill their workforce and invest in more intuitive technology. While firms in the UK and US recognize the potential of predictive analytics and the need for investment in skills, the US is further along the adoption curve than the UK. US organizations show greater promise for future investment in – and roll-out of – predictive analytics software across the workforce. Furthermore, US organizations perceive fewer challenges in using data to inform corporate strategy, and sense a greater need for training to embed the benefits of the technology into day-to-day business.
Prediction-Based Decisions and Fairness: A Catalogue of Choices, Assumptions, and Definitions A recent flurry of research activity has attempted to quantitatively define ‘fairness’ for decisions based on statistical and machine learning (ML) predictions. The rapid growth of this new field has led to wildly inconsistent terminology and notation, presenting a serious challenge for cataloguing and comparing definitions. This paper attempts to bring much-needed order. First, we explicate the various choices and assumptions made—often implicitly—to justify the use of prediction-based decisions. Next, we show how such choices and assumptions can raise concerns about fairness and we present a notationally consistent catalogue of fairness definitions from the ML literature. In doing so, we offer a concise reference for thinking through the choices, assumptions, and fairness considerations of prediction-based decision systems.
Predictive Analytics – The rise and value of predictive analytics in enterprise decision making In the past few years, predictive analytics has gone from an exotic technique practiced in just a few niches, to a competitive weapon with a rapidly expanding range of uses. The increasing adoption of predictive analytics is fueled by converging trends: the Big Data phenomenon, ever-improving tools for data analysis, and a steady stream of demonstrated successes in new applications. The modern analyst would say, ‘Give me enough data, and I can predict anything.’
Predictive Analytics enters the Mainstream
Predictive Analytics for Business Advantage To compete effectively in an era in which advantages are ephemeral, companies need to move beyond historical, rear-view understandings of business performance and customer behavior and become more proactive. Organizations today want to be predictive; they want to gain information and insight from data that enables them to detect patterns and trends, anticipate events, spot anomalies, forecast using what-if simulations, and learn of changes in customer behavior so that staff can take actions that lead to desired business outcomes. Success in being predictive and proactive can be a game changer for many business functions and operations, including marketing and sales, operations management, finance, and risk management. Although it has been around for decades, predictive analytics is a technology whose time has finally come. A variety of market forces have joined to make this possible, including an increase in computing power, a better understanding of the value of the technology, the rise of certain economic forces, and the advent of big data. Companies are looking to use the technology to predict trends and understand behavior for better business performance. Forward-looking companies are using predictive analytics across a range of disparate data types to achieve greater value. Companies are looking to also deploy predictive analytics against their big data. Predictive analytics is also being operationalized more frequently as part of a business process. Predictive analytics complements business intelligence and data discovery, and can enable organizations to go beyond the analytic complexity limits of many online analytical processing (OLAP) implementations. It is evolving from a specialized activity once utilized only among elite firms and users to one that could become mainstream across industries and market sectors. This TDWI Best Practices Report focuses on how organizations can and are using predictive analytics to derive business value. It provides in-depth survey analysis of current strategies and future trends for predictive analytics across both organizational and technical dimensions including organizational culture, infrastructure, data, and processes. It looks at the features and functionalities companies are using for predictive analytics and the infrastructure trends in this space. The report offers recommendations and best practices for successfully implementing predictive analytics in the organization. TDWI Research finds a shift occurring in the predictive analytics user base. No longer is predictive analytics the realm of statisticians and mathematicians. There is a definite trend toward business analysts and other business users making use of this technology. Marketing and sales are big current users of predictive analytics and market analysts are making use of the technology. Therefore, the report also looks at the skills necessary to perform predictive analytics and how the technology can be utilized and operationalized across the organization. It explores cultural and business issues involved with making predictive analytics possible. A unique feature of this report is its examination of the characteristics of companies that have actually measured either top-line or bottom-line impact with predictive analytics. In other words, it explores how those companies compare against those that haven´t measured value.
Predictive Analytics in Cloud CRM Cloud CRM solutions have long since become mainstream and expanded beyond their initial foothold in small and mid-sized enterprises. Today B2B and B2C companies in many industries are eyeing cloud CRM solutions for their call center, their sales force and more. These CRM solutions offer the classic benefits of a cloud offering—multi-tenancy, usage pricing, location transparency, network access and high availability. What these solutions often do not offer, however, is advanced analytics. Typically limited to reporting and dashboards, many cloud CRM solutions do not allow companies to maximize the value of their data. The analytics that are available in a typical cloud CRM solution assume that users have the necessary decision-making expertise as well as the time required to make these decisions. In a typical high-volume call center environment, neither of these assumptions is reasonable. What companies using these CRM solutions need is predictive analytics, specifically predictive analytic solutions designed to drive better decisions in real-time. Delivering predictive analytic solutions in a cloud CRM environment, however, has its own challenges. Those adopting cloud CRM solutions don’t want (nor have the budget) to hire analytics teams to build predictive analytic models using traditional techniques or have to move their cloud CRM data to an on-premise analytic environment. They also don´t want predictive analytic models ‘in the lab,’ they want business-friendly decision-making solutions powered by sophisticated predictive analytics. To be successful with cloud CRM, these companies need predictive applications for the cloud, in the cloud.
Predictive Analytics in the Cloud Predictive analytics and cloud are hot topics in business today. Predictive analytics are increasingly the focus of many companies´ efforts to improve business performance with analytics while cloud is fast becoming the default option for purchasing and deploying software. Public, private and hybrid clouds are all evolving rapidly and are here to stay. But what´s happening at the intersection of these two technologies? How can predictive analytics in the cloud add value, and what are the critical risks and issues involved? This paper explores the five key opportunities for organizations to use predictive analytics in the cloud: • Using the cloud to deliver predictive analytics-enabled ‘Decisions as a Service’ solutions • Embedding predictive analytics in Software as a Service (SaaS) and other cloud-deployed applications • Using the cloud to deliver predictive analytics to non-cloud applications across the extended enterprise • Building predictive analytics against data in the cloud • Using cloud computing to deliver elastic compute power for building predictive analytic models Before discussing the various options for predictive analytics in the cloud, it is worth clarifying exactly what we mean by the various terms.
Predictive Analytics Whitepaper
Preference-based Online Learning with Dueling Bandits: A Survey In machine learning, the notion of multi-armed bandits refers to a class of online learning problems, in which an agent is supposed to simultaneously explore and exploit a given set of choice alternatives in the course of a sequential decision process. In the standard setting, the agent learns from stochastic feedback in the form of real-valued rewards. In many applications, however, numerical reward signals are not readily available — instead, only weaker information is provided, in particular relative preferences in the form of qualitative comparisons between pairs of alternatives. This observation has motivated the study of variants of the multi-armed bandit problem, in which more general representations are used both for the type of feedback to learn from and the target of prediction. The aim of this paper is to provide a survey of the state of the art in this field, referred to as preference-based multi-armed bandits or dueling bandits. To this end, we provide an overview of problems that have been considered in the literature as well as methods for tackling them. Our taxonomy is mainly based on the assumptions made by these methods about the data-generating process and, related to this, the properties of the preference-based feedback.
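As a toy illustration of the preference-based feedback loop described above (not an algorithm from the survey itself), the sketch below duels pairs of arms chosen by Thompson sampling over per-arm Beta posteriors of the overall win rate; the arm qualities and the duel model are invented for the example:

    # Simplified dueling-bandit sketch: only pairwise comparison outcomes are
    # observed, never numerical rewards, and per-arm win-rate posteriors are
    # updated from each duel.
    import numpy as np

    rng = np.random.default_rng(0)
    true_quality = np.array([0.2, 0.5, 0.8])     # hidden "utilities" of 3 arms
    wins = np.ones(3)                            # Beta(1, 1) priors
    losses = np.ones(3)

    def duel(i, j):
        """Return the index of the winning arm, based on relative quality."""
        p_i_beats_j = true_quality[i] / (true_quality[i] + true_quality[j])
        return i if rng.random() < p_i_beats_j else j

    for t in range(2000):
        samples = rng.beta(wins, losses)         # Thompson sample per arm
        i, j = np.argsort(samples)[-2:]          # duel the two highest samples
        winner = duel(i, j)
        loser = j if winner == i else i
        wins[winner] += 1
        losses[loser] += 1

    print("estimated win rates:", np.round(wins / (wins + losses), 2))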
Principal Component Analysis: A Natural Approach to Data Exploration Principal component analysis (PCA) is often used for analysing data in the most diverse areas. In this work, we report an integrated approach to several theoretical and practical aspects of PCA. We start by providing, in an intuitive and accessible manner, the basic principles underlying PCA and its applications. Next, we present a systematic, though not exhaustive, survey of some representative works illustrating the potential of PCA applications across a wide range of areas. An experimental investigation of PCA’s ability to explain variance and reduce dimensionality is also developed, which confirms the efficacy of PCA and also shows that whether or not the original data are standardized can have important effects on the results obtained. Overall, we believe the issues covered can assist researchers from the most diverse areas in using and interpreting PCA.
Principal Components: Mathematics, Example, Interpretation This paper explains Principal Components Analysis, where ‘respecting structure’ means ‘preserving variance’: it explains how to do PCA, shows an example, and describes some of the issues that come up in interpreting the results. PCA has been rediscovered many times in many fields, so it is also known as the Karhunen-Loeve transformation, the Hotelling transformation, the method of empirical orthogonal functions, and singular value decomposition. We will call it PCA.
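Both entries above describe the same core computation, which can be written in a few lines. The following generic sketch (my own illustration, not code from either paper) performs PCA via the singular value decomposition of the centered data matrix and reports the variance explained by each component:

    # Minimal PCA-by-SVD sketch: center the data, take the SVD, and project
    # onto the top-k right singular vectors; the explained variance comes
    # from the singular values.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated toy data

    Xc = X - X.mean(axis=0)                  # 1. center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = 2
    scores = Xc @ Vt[:k].T                   # 2. project onto first k components
    explained = (s**2) / (s**2).sum()        # 3. variance explained per component

    print("explained variance ratios:", np.round(explained, 3))
    print("shape of projected data:", scores.shape)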
Privacy Preserving Utility Mining: A Survey In the big data era, the collected data usually contains rich information and hidden knowledge. Utility-oriented pattern mining and analytics have shown a powerful ability to explore these ubiquitous data, which may be collected from various fields and applications, such as market basket analysis, retail, click-stream analysis, medical analysis, and bioinformatics. However, analysis of these data with sensitive private information raises privacy concerns. To achieve a better trade-off between utility maximization and privacy preservation, Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent years. In this paper, we provide a comprehensive overview of PPUM. We first present the background of utility mining, privacy-preserving data mining and PPUM, then introduce the related preliminaries and problem formulation of PPUM, as well as some key evaluation criteria for PPUM. In particular, we present and discuss the current state-of-the-art PPUM algorithms, as well as their advantages and deficiencies in detail. Finally, we highlight and discuss some technical challenges and open directions for future research on PPUM.
Probabilistic Forecasting A probabilistic forecast takes the form of a predictive probability distribution over future quantities or events of interest. Probabilistic forecasting aims to maximize the sharpness of the predictive distributions, subject to calibration, on the basis of the available information set. We formalize and study notions of calibration in a prediction space setting. In practice, probabilistic calibration can be checked by examining probability integral transform (PIT) histograms. Proper scoring rules such as the logarithmic score and the continuous ranked probability score serve to assess calibration and sharpness simultaneously. As a special case, consistent scoring functions provide decision-theoretically coherent tools for evaluating point forecasts. We emphasize methodological links to parametric and nonparametric distributional regression techniques, which attempt to model and to estimate conditional distribution functions; we use the context of statistically postprocessed ensemble forecasts in numerical weather prediction as an example. Throughout, we illustrate concepts and methodologies in data examples.
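For Gaussian predictive distributions, both diagnostics mentioned above have simple closed forms: the PIT value is the predictive CDF evaluated at the outcome, and the CRPS has an analytic expression. The sketch below is a generic illustration (not the paper's code) with an invented forecast and outcomes drawn from it, so the PIT values should look uniform:

    # Compute PIT values and the closed-form CRPS for a N(mu, sigma^2) forecast.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    mu, sigma = 1.0, 2.0
    y = rng.normal(mu, sigma, size=10_000)      # outcomes drawn from the forecast

    pit = norm.cdf(y, loc=mu, scale=sigma)      # PIT: uniform under calibration

    def crps_gaussian(y, mu, sigma):
        """Closed-form CRPS for a Gaussian predictive distribution."""
        z = (y - mu) / sigma
        return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z)
                        - 1 / np.sqrt(np.pi))

    print("PIT mean (approx. 0.5 if calibrated):", round(pit.mean(), 3))
    print("average CRPS:", round(crps_gaussian(y, mu, sigma).mean(), 3))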
Probabilistic Program Abstractions Abstraction is a fundamental tool for reasoning about complex systems. Program abstraction has been utilized to great effect for analyzing deterministic programs. At the heart of program abstraction is the relationship between a concrete program, which is difficult to analyze, and an abstraction, which is more tractable. We generalize non-deterministic program abstractions to probabilistic program abstractions by explicitly quantifying the non-deterministic choices made by traditional program abstractions. We upgrade key theoretical program abstraction insights to the probabilistic context. Probabilistic program abstractions provide avenues for utilizing abstraction techniques from the programming languages community to improve the analysis of probabilistic programs.
Probabilistic Programming Probabilistic programs are usual functional or imperative programs with two added constructs: (1) the ability to draw values at random from distributions, and (2) the ability to condition values of variables in a program via observations. Models from diverse application areas such as computer vision, coding theory, cryptographic protocols, biology and reliability analysis can be written as probabilistic programs. Probabilistic inference is the problem of computing an explicit representation of the probability distribution implicitly specified by a probabilistic program. Depending on the application, the desired output from inference may vary – we may want to estimate the expected value of some function f with respect to the distribution, or the mode of the distribution, or simply a set of samples drawn from the distribution. In this paper, we describe connections this research area called ‘Probabilistic Programming’ has with programming languages and software engineering, and this includes language design, and the static and dynamic analysis of programs. We survey current state of the art and speculate on promising directions for future research.
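The two constructs described above can be illustrated without any dedicated probabilistic programming language: random draws are ordinary calls to a random number generator, and conditioning can be mimicked by rejecting program runs whose simulated observations disagree with the data. The toy sketch below (my own example) infers a coin's bias after observing 8 heads in 10 flips:

    # Toy probabilistic program: draw from a prior, simulate data, and
    # condition on the observation via rejection sampling.
    import random

    def model():
        """Return a bias sample consistent with observing 8 heads in 10 flips,
        or None if this execution is rejected."""
        bias = random.random()                   # prior: bias ~ Uniform(0, 1)
        heads = sum(random.random() < bias for _ in range(10))
        if heads != 8:                           # observe(heads == 8)
            return None                          # reject this execution
        return bias

    samples = [b for b in (model() for _ in range(200_000)) if b is not None]
    print("posterior mean of bias:", round(sum(samples) / len(samples), 3))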
Probabilistic Syntax 1. The Tradition of Categoricity and Prospects for Stochasticity 2. The joys and perils of corpus linguistics 3. Probabilistic syntactic models 4. Continuous categories 5. Explaining more: probabilistic models of syntactic usage 6. Conclusion: There are many phenomena in syntax that cry out for non-categorical and probabilistic modeling and explanation. The opportunity to leave behind ill-fitting categorical assumptions, and to better model probabilities of use in syntax is exciting. The existence of ‘soft’ constraints within the variable output of an individual speaker, of exactly the same kind as the typological syntactic constraints found across languages, makes exploration of probabilistic grammar models compelling. We saw that one is not limited to simple surface representations: I have tried to outline how probabilistic models can be applied on top of one’s favorite sophisticated linguistic representations. The frequency evidence needed for parameter estimation in probabilistic models requires a lot more data collection, and a lot more careful evaluation and model building than traditional syntax, where one example can be the basis of a new theory, but the results can enrich linguistic theory by revealing the soft constraints at work in language use. This is an area ripe for exploration by the next generation of syntacticians.
Probabilistic Topic Models As our collective knowledge continues to be digitized and stored – in the form of news, blogs, Web pages, scientific articles, books, images, sound, video, and social networks – it becomes more difficult to find and discover what we are looking for. We need new computational tools to help organize, search, and understand these vast amounts of information. Right now, we work with online information using two main tools – search and links. We type keywords into a search engine and find a set of documents related to them. We look at the documents in that set, possibly navigating to other linked documents. This is a powerful way of interacting with our online archive, but something is missing. Imagine searching and exploring documents based on the themes that run through them. We might ‘zoom in’ and ‘zoom out’ to find specific or broader themes; we might look at how those themes changed through time or how they are connected to each other. Rather than finding documents through keyword search alone, we might first find the theme that we are interested in, and then examine the documents related to that theme.
Probabilistic Topic Models Many chapters in this book illustrate that applying a statistical method such as Latent Semantic Analysis (LSA; Landauer and Dumais, 1997; Landauer, Foltz, and Laham, 1998) to large databases can yield insight into human cognition. The LSA approach makes three claims: that semantic information can be derived from a word-document co-occurrence matrix; that dimensionality reduction is an essential part of this derivation; and that words and documents can be represented as points in Euclidean space. In this chapter, we pursue an approach that is consistent with the first two of these claims, but differs in the third, describing a class of statistical models in which the semantic properties of words and documents are expressed in terms of probabilistic topics.
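A probabilistic topic model of the kind described in these two entries can be fit in a few lines with a generic library implementation. The sketch below uses scikit-learn's latent Dirichlet allocation on a handful of invented toy documents (it is not code from either chapter, and assumes a recent scikit-learn with get_feature_names_out):

    # Fit a two-topic LDA model to toy documents and print the top words per topic.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = [
        "the cat sat on the mat with another cat",
        "dogs and cats are common household pets",
        "stocks fell as markets reacted to interest rates",
        "investors watch interest rates and stock markets",
    ]
    counts = CountVectorizer(stop_words="english").fit(docs)
    X = counts.transform(docs)

    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
    words = counts.get_feature_names_out()
    for k, topic in enumerate(lda.components_):
        top = [words[i] for i in topic.argsort()[-4:][::-1]]
        print(f"topic {k}:", ", ".join(top))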
Probability and Statistics – Cookbook This cookbook integrates a variety of topics in probability theory and statistics. It is based on literature and in-class material from courses of the statistics department at the University of California in Berkeley but also influenced by other sources.
Probability Cheatsheet (Cheat Sheet)
Probability Reversal and the Disjunction Effect in Reasoning Systems Data-based judgments go into artificial intelligence applications, but they undergo paradoxical reversal when seemingly unnecessary additional data is provided. Examples of this are Simpson’s reversal and the disjunction effect, where beliefs about the data change once it is presented or aggregated differently. Sometimes the significance of the difference can be evaluated using statistical tests such as Pearson’s chi-squared or Fisher’s exact test, but this may not be helpful in threshold-based decision systems that operate with incomplete information. To mitigate risks in the use of algorithms in decision-making, we consider the question of modeling of beliefs. We argue that evidence supports that beliefs are not classical statistical variables and they should, in the general case, be considered as superposition states of disjoint or polar outcomes. We analyze the disjunction effect from the perspective of the belief as a quantum vector.
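Simpson's reversal, one of the paradoxes mentioned above, is easy to reproduce numerically. The sketch below uses the classic kidney-stone treatment numbers (a standard textbook example, not data from the paper): treatment A has the higher success rate within each severity group, yet the lower rate once the groups are pooled:

    # Numeric illustration of Simpson's reversal with the classic
    # kidney-stone treatment data.
    groups = {
        # group: (A successes, A trials, B successes, B trials)
        "small stones": (81, 87, 234, 270),
        "large stones": (192, 263, 55, 80),
    }

    tot = {"A": [0, 0], "B": [0, 0]}
    for name, (a, an, b, bn) in groups.items():
        print(f"{name}: A = {a / an:.2f}, B = {b / bn:.2f}")  # A is higher in both
        tot["A"][0] += a; tot["A"][1] += an
        tot["B"][0] += b; tot["B"][1] += bn

    for t in ("A", "B"):
        print(f"overall {t}: {tot[t][0] / tot[t][1]:.2f}")    # but B is higher overall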
Process Mining – seeing the real process (Poster)
Progressive Data Science: Potential and Challenges Data science requires time-consuming iterative manual activities. In particular, activities such as data selection, preprocessing, transformation, and mining highly depend on iterative trial-and-error processes that could be sped up significantly by providing quick feedback on the impact of changes. The idea of progressive data science is to compute the results of changes in a progressive manner, returning a first approximation of results quickly and allowing iterative refinements until converging to a final result. Enabling the user to interact with the intermediate results allows early detection of erroneous or suboptimal choices, the guided definition of modifications to the pipeline, and their quick assessment. In this paper, we discuss the progressiveness challenges arising in different steps of the data science pipeline. We describe how changes in each step of the pipeline impact the subsequent steps and outline why progressive data science will help to make the process more effective. Computing progressive approximations of outcomes resulting from changes creates numerous research challenges, especially if the changes are made in the early steps of the pipeline. We discuss these challenges and outline first steps towards progressiveness, which, we argue, will ultimately help to significantly speed up the overall data science process.
ProM 6: The Process Mining Toolkit Process mining has been around for a decade, and it has proven to be a very fertile and successful research field. Part of this success can be attributed to the ProM tool, which combines most of the existing process mining techniques as plug-ins in a single tool. ProM 6 removes many limitations that existed in the previous versions, in particular with respect to the tight integration between the tool and the GUI. ProM 6 has been developed from scratch and uses a completely redesigned architecture. The changes were driven by many real-life applications and new insights into the design of process analysis software. Furthermore, the introduction of XESame in this toolkit allows for the conversion of logs to the ProM native format without programming.
Provable benefits of representation learning There is general consensus that learning representations is useful for a variety of reasons, e.g. efficient use of labeled data (semi-supervised learning), transfer learning and understanding hidden structure of data. Popular techniques for representation learning include clustering, manifold learning, kernel-learning, autoencoders, Boltzmann machines, etc. To study the relative merits of these techniques, it’s essential to formalize the definition and goals of representation learning, so that they all become instances of the same definition. This paper introduces such a formal framework that also formalizes the utility of learning the representation. It is related to previous Bayesian notions, but with some new twists. We show the usefulness of our framework by exhibiting simple and natural settings — linear mixture models and loglinear models — where the power of representation learning can be formally shown. In these examples, representation learning can be performed provably and efficiently under plausible assumptions (despite being NP-hard), and furthermore: (i) it greatly reduces the need for labeled data (semi-supervised learning); (ii) it allows solving classification tasks when simpler approaches like nearest neighbors require too much data; and (iii) it is more powerful than manifold learning methods.
Psychological State in Text: A Limitation of Sentiment Analysis Starting with the idea that sentiment analysis models should be able to predict not only positive or negative sentiment but also other psychological states of a person, we implement a sentiment analysis model to investigate the relationship between the model and emotional state. We first examine psychological measurements of 64 participants and ask them to write a book report about a story. After that, we train our sentiment analysis model using crawled movie review data. We finally evaluate participants’ writings, using the pretrained model in a transfer learning setting. The result shows that the sentiment analysis model performs well at predicting a score, but the score does not have any correlation with the participants’ self-reported sentiment.
Putting Data Science In Production A critical challenge of data science projects is getting everyone on the same page in terms of project challenges, responsibilities, and methodologies. More often than not, there is a disconnect between the worlds of development and production. Some teams may choose to re-code everything in an entirely different language while others may make changes to core elements, such as testing procedures, backup plans, and programming languages. Transitioning a data product into production could become a nightmare as different opinions and methods vie for supremacy, resulting in projects that needlessly drag on for months beyond promised deadlines. Successfully building a data product and then deploying it into production is not an easy task — it becomes twice as hard when teams are isolated and playing by their own rules.
Putting Hadoop To Work The Right Way Big data has rapidly progressed from an ambitious vision realized by a handful of innovators to a competitive advantage for businesses across dozens of industries. More data is available now – about customers, employees, competitors – than ever before. That data is intelligence that can have an impact on daily business decisions. Industry leaders rely on big data as a foundation to beat their rivals. This big data revolution is also behind the massive adoption of Hadoop. Hadoop has become the platform of choice for companies looking to harness big data´s power. Simply put, most traditional enterprise systems are too limited to keep up with the influx of big data; they are not designed to ingest large quantities of data first and analyze it later. The need to store and analyze big data cost-effectively is the main reason why Hadoop usage has grown exponentially in the last five years.
Putting Predictive Analytics to Work in Operations In a recent study, companies that tightly integrate predictive analytics into operational systems are more than twice as likely to report a transformative impact from predictive analytics as any others. Leaders are creating advantage across multiple core functions such as marketing, customer management, collections, customer service, distribution and more by applying predictive analytics to operational decisions. Small operational decisions, especially those about customers, are made over and over. The value of these decisions rapidly adds up. Making these decisions well is critical to business performance. Traditional approaches to analytics are hard to scale and hard to use in the real-time environment common for operational decisions. Predictive analytics work better, using data to make these decisions more precise, targeting and personalizing them to maximize customer value. Because these decisions about customers are made at the front line of an organization they must be made quickly and embedded in operational systems. This creates a challenge – the insight to action gap – that prevents many companies from taking advantage of predictive analytics in these decisions. To close the insight to action gap and put predictive analytics to work in operations, companies need to adopt Decision Management, a proven approach that leverages predictive analytics to make operational systems analytical.

Q

Quantizing deep convolutional networks for efficient inference: A whitepaper We present an overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations. Per-channel quantization of weights and per-layer quantization of activations to 8-bits of precision post-training produces classification accuracies within 2% of floating point networks for a wide variety of CNN architectures. Model sizes can be reduced by a factor of 4 by quantizing weights to 8-bits, even when 8-bit arithmetic is not supported. This can be achieved with simple, post training quantization of weights. We benchmark latencies of quantized networks on CPUs and DSPs and observe a speedup of 2x-3x for quantized implementations compared to floating point on CPUs. Speedups of up to 10x are observed on specialized processors with fixed point SIMD capabilities, like the Qualcomm QDSPs with HVX. Quantization-aware training can provide further improvements, reducing the gap to floating point to 1% at 8-bit precision. Quantization-aware training also allows for reducing the precision of weights to four bits with accuracy losses ranging from 2% to 10%, with higher accuracy drop for smaller networks. We introduce tools in TensorFlow and TensorFlow Lite for quantizing convolutional networks and review best practices for quantization-aware training to obtain high accuracy with quantized weights and activations. We recommend that per-channel quantization of weights and per-layer quantization of activations be the preferred quantization scheme for hardware acceleration and kernel optimization. We also propose that future processors and hardware accelerators for optimized inference support precisions of 4, 8 and 16 bits.
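The core of post-training weight quantization is a simple affine (here symmetric) mapping from floats to 8-bit integers with one scale per output channel. The sketch below is a generic NumPy illustration of that recipe, not the TensorFlow tooling the whitepaper introduces; the tensor shape and scale convention are assumptions for the example:

    # Symmetric per-channel quantization of weights to signed 8-bit integers.
    import numpy as np

    def quantize_per_channel(w, num_bits=8):
        """Quantize each output channel (axis 0) of w to signed integers."""
        qmax = 2 ** (num_bits - 1) - 1                   # 127 for 8 bits
        scale = np.abs(w).max(axis=tuple(range(1, w.ndim)), keepdims=True) / qmax
        q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
        return q, scale

    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=(16, 3, 3, 3))        # toy conv weights
    q, scale = quantize_per_channel(w)
    w_hat = q.astype(np.float32) * scale                 # dequantize
    print("max abs quantization error:", float(np.abs(w - w_hat).max()))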
Quantum games: a survey for mathematicians Main papers on quantum games are written by physicists for physicists, and the inevitable exploitation of physics jargon may create difficulties for mathematicians or economists. Our goal here is to make clear the physical content and to stress the new features of the games that may be revealed in their quantum versions. We first introduce from scratch the most fundamental facts of finite-dimensional quantum mechanics. In the main sections the foundations of quantum games are built via the basic examples. We sometimes omit the lengthy calculations (referring to the original papers) once the physical part is sorted out and the problem is reformulated as a pure game-theoretic problem of calculating the Nash or dominated equilibria. Finally, we touch upon the general theory of finite quantum static games and provide further links and references.
Query Expansion Techniques for Information Retrieval: a Survey With the ever-increasing size of the web, relevant information extraction on the Internet with a query formed by a few keywords has become a big challenge. To overcome this, query expansion (QE) plays a crucial role in improving Internet searches, where the user’s initial query is reformulated to a new query by adding new meaningful terms with similar significance. QE — as part of information retrieval (IR) — has long attracted researchers’ attention. It has also become very influential in the fields of personalized social documents, Question Answering over Linked Data (QALD), and the Text Retrieval Conference (TREC) and REAL sets. This paper surveys QE techniques in IR from 1960 to 2017 with respect to core techniques, data sources used, weighting and ranking methodologies, user participation and applications (of QE techniques) — bringing out similarities and differences.

R

R Essentials R is a highly extensible, open-source programming language used mainly for statistical analysis and graphics. It is a GNU project very similar to the S language. R´s strengths include its varying data structures, which can be more intuitive than data storage in other languages; its built-in statistical and graphical functions; and its large collection of useful plugins that can enhance the language´s abilities in many different ways. R can be run either as a series of console commands, or as full scripts, depending on the use case. It is heavily object-oriented, and allows you to create your own functions. It also has a common API for interacting with most file structures to access data stored outside of R.
R for Machine Learning It is common for today´s scientific and business industries to collect large amounts of data, and the ability to analyze the data and learn from it is critical to making informed decisions. Familiarity with software such as R allows users to visualize data, run statistical tests, and apply machine learning algorithms. Even if you already know other software, there are still good reasons to learn R: 1. R is free. If your future employer does not already have R installed, you can always download it for free, unlike other proprietary software packages that require expensive licenses. No matter where you travel, you can have access to R on your computer. 2. R gives you access to cutting-edge technology. Top researchers develop statistical learning methods in R, and new algorithms are constantly added to the list of packages you can download. 3. R is a useful skill. Employers that value analytics recognize R as useful and important. If for no other reason, learning R is worthwhile to help boost your resume.
R Is Still Hot-and Getting Hotter For the white paper titled ‘R Is Hot’ about four years ago, the goal was to introduce the R programming language to a larger audience of statistical analysts and data scientists. As it turned out, the timing couldn´t have been better: R has now blossomed into the language of choice for data scientists worldwide. Today, R is widely used by scientists, researchers, and statisticians for modeling data and solving problems quickly and effectively. When people ask me which factors are driving the broader adoption of R among data analysts, I usually offer two key points: 1. R was designed specifically for statistical analysis, which means that analytics written in R typically require fewer lines of code (and hence less work) than analytics written in Java, Python, or C++. 2. R is an open source project, which means it is continually improved, upgraded, enhanced, and expanded by a global community of incredibly passionate developers and users.
R Markdown Cheat Sheet (Cheat Sheet)
R Quo Vadis (Slide Deck)
Radial Basis Function Approximations: Comparison and Applications Approximation of scattered data is often a task in many engineering problems. The Radial Basis Function (RBF) approximation is appropriate for large scattered (unordered) datasets in d-dimensional space. This approach is useful for a higher dimension d>2, because the other methods require the conversion of a scattered dataset to an ordered dataset (i.e. a semi-regular mesh is obtained by using some tessellation techniques), which is computationally expensive. The RBF approximation is non-separable, as it is based on the distance between two points. This method leads to the solution of a linear system of equations (LSE) Ac=h. In this paper several RBF approximation methods are briefly introduced and a comparison of those is made with respect to the stability and accuracy of computation. The proposed RBF approximation offers lower memory requirements and better quality of approximation.
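The linear system Ac = h mentioned above can be assembled directly from pairwise distances. The following minimal sketch (a generic Gaussian-RBF interpolant with centers placed at the data points, not the paper's proposed method) builds the kernel matrix, solves for the coefficients, and evaluates the approximant at a new point:

    # Gaussian RBF interpolation of scattered 2-D data: build A, solve A c = h,
    # then evaluate the resulting approximant.
    import numpy as np

    def gaussian_rbf(r, eps=2.0):
        return np.exp(-(eps * r) ** 2)

    rng = np.random.default_rng(0)
    centers = rng.uniform(0, 1, size=(30, 2))               # scattered 2-D points
    h = np.sin(4 * centers[:, 0]) + centers[:, 1]           # sampled values

    dists = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    A = gaussian_rbf(dists)
    c = np.linalg.solve(A, h)                               # solve A c = h

    def evaluate(x):
        r = np.linalg.norm(x[None, :] - centers, axis=-1)
        return gaussian_rbf(r) @ c

    x_new = np.array([0.3, 0.7])
    print("true:", np.sin(4 * 0.3) + 0.7, "approx:", float(evaluate(x_new)))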
Random Forests Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Freund and Schapire[1996]), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
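A quick, generic way to exercise the ideas above (internal error monitoring via out-of-bag samples and variable importance) is scikit-learn's implementation rather than Breiman's original FORTRAN code; the dataset and hyperparameters below are illustrative choices:

    # Fit a random forest, report its out-of-bag accuracy estimate, and list
    # the most important features by impurity-based importance.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_breast_cancer(return_X_y=True)
    forest = RandomForestClassifier(
        n_estimators=500, max_features="sqrt", oob_score=True, random_state=0
    ).fit(X, y)

    print("OOB accuracy:", round(forest.oob_score_, 3))
    top = forest.feature_importances_.argsort()[-3:][::-1]
    print("three most important feature indices:", top.tolist())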
Random Forests, Decision Trees, and Categorical Predictors: The ‘Absent Levels’ Problem One of the advantages that decision trees have over many other models is their ability to natively handle categorical predictors without having to first transform them (e.g., by using one-hot encoding). However, in this paper, we show how this capability can also lead to an inherent ‘absent levels’ problem for decision tree based algorithms that, to the best of our knowledge, has never been thoroughly discussed, and whose consequences have never been carefully explored. This predicament occurs whenever there is indeterminacy in how to handle an observation that has reached a categorical split which was determined when the observation’s level was absent during training. Although these incidents may appear to be innocuous, by using Leo Breiman and Adele Cutler’s random forests FORTRAN code and the randomForest R package as motivating case studies, we show how overlooking the absent levels problem can systematically bias a model. Afterwards, we discuss some heuristics that can possibly be used to help mitigate the absent levels problem and, using three real data examples taken from public repositories, we demonstrate the superior performance and reliability of these heuristics over some of the existing approaches that are currently being employed in practice due to oversights in the software implementations of decision tree based algorithms. Given how extensively these algorithms have been used, it is conceivable that a sizable number of these models have been unknowingly and seriously affected by this issue—further emphasizing the need for the development of both theory and software that accounts for the absent levels problem.
Random Graph Models and Matchings In this paper we will provide an introductory understanding of random graph models, and matchings in the case of Erdos-Renyi random graphs. We will provide a synthesis of background theory to this end. We will further examine pertinent recent results and provide a basis for further exploration.
Random Projection and Its Applications Random Projection is a foundational research topic that connects a number of machine learning algorithms under a shared mathematical basis. It is used to reduce the dimensionality of a dataset by projecting the data points efficiently onto a lower-dimensional space while preserving the original relative distances between the data points. In this paper, we explain the random projection method by presenting its mathematical background and foundation, the applications that currently adopt it, and an overview of its current research perspective.
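The distance-preservation claim above is easy to check empirically. The sketch below (a generic Gaussian random projection, not the paper's code) projects points from 1000 to 50 dimensions and reports how much pairwise distances change; the dimensions are arbitrary choices for the example:

    # Gaussian random projection and a check of pairwise-distance distortion.
    import numpy as np
    from scipy.spatial.distance import pdist

    rng = np.random.default_rng(0)
    n, d, k = 100, 1000, 50
    X = rng.normal(size=(n, d))

    R = rng.normal(size=(d, k)) / np.sqrt(k)    # random projection matrix
    Z = X @ R                                   # projected data, n x k

    ratio = pdist(Z) / pdist(X)                 # distance ratios after projection
    print("distance ratio: mean %.3f, min %.3f, max %.3f"
          % (ratio.mean(), ratio.min(), ratio.max()))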
Rankcluster: An R Package for Clustering Multivariate Partial Rankings The Rankcluster package is the first R package proposing both modeling and clustering tools for ranking data, potentially multivariate and partial. Ranking data are modeled by the Insertion Sorting Rank (ISR) model, which is a meaningful model parametrized by a central ranking and a dispersion parameter. A conditional independence assumption allows multivariate rankings to be taken into account, and clustering is performed by means of mixtures of multivariate ISR models. The parameters of the cluster (central rankings and dispersion parameters) help the practitioners to interpret the clustering. Moreover, the Rankcluster package provides an estimate of the missing ranking positions when rankings are partial. After an overview of the mixture of multivariate ISR models, the Rankcluster package is described and its use is illustrated through the analysis of two real datasets.
Rapidly Mixing Markov Chains: A Comparison of Techniques (A Survey) We survey existing techniques to bound the mixing time of Markov chains. The mixing time is related to a geometric parameter called conductance which is a measure of edge-expansion. Bounds on conductance are typically obtained by a technique called ‘canonical paths’ where the idea is to find a set of paths, one between every source-destination pair, such that no edge is heavily congested. However, the canonical paths approach cannot always show rapid mixing of a rapidly mixing chain. This drawback disappears if we allow the flow between a pair of states to be spread along multiple paths. We prove that for a large class of Markov chains canonical paths does capture rapid mixing. Allowing multiple paths to route the flow still does help a great deal in proofs, as illustrated by a result of Morris and Sinclair (FOCS’99) on the rapid mixing of a Markov chain for sampling 0/1 knapsack solutions. A different approach to prove rapid mixing is ‘Coupling’. Path Coupling is a variant discovered by Bubley and Dyer (FOCS’97) that often tremendously reduces the complexity of designing good Couplings. We present several applications of Path Coupling in proofs of rapid mixing. These invariably lead to much better bounds on mixing time than known using conductance, and moreover Coupling based proofs are typically simpler. This motivates the question of whether Coupling can be made to work whenever the chain is rapidly mixing. This question was answered in the negative by Kumar and Ramesh (FOCS’99), who showed that no Coupling strategy can prove the rapid mixing of the Jerrum-Sinclair chain for sampling perfect and near-perfect matchings.
Rationality, Optimism and Guarantees in General Reinforcement Learning In this article, we present a top-down theoretical study of general reinforcement learning agents. We begin with rational agents with unlimited resources and then move to a setting where an agent can only maintain a limited number of hypotheses and optimizes plans over a horizon much shorter than what the agent designer actually wants. We axiomatize what is rational in such a setting in a manner that enables optimism, which is important to achieve systematic explorative behavior. Then, within the class of agents deemed rational, we achieve convergence and finite-error bounds. Such results are desirable since they imply that the agent learns well from its experiences, but the bounds do not directly guarantee good performance and can be achieved by agents doing things one should obviously not. Good performance cannot in fact be guaranteed for any agent in fully general settings. Our approach is to design agents that learn well from experience and act rationally. We introduce a framework for general reinforcement learning agents based on rationality axioms for a decision function and an hypothesis-generating function designed so as to achieve guarantees on the number of errors. We will consistently use an optimistic decision function but the hypothesis-generating function needs to change depending on what is known/assumed. We investigate a number of natural situations having either a frequentist or Bayesian flavor, deterministic or stochastic environments and either finite or countable hypothesis classes. Further, to achieve sufficiently good bounds as to hold promise for practical success we introduce a notion of a class of environments being generated by a set of laws. None of the above has previously been done for fully general reinforcement learning environments.
Real numbers, data science and chaos: How to fit any dataset with a single parameter We show how any dataset of any modality (time-series, images, sound…) can be approximated by a well-behaved (continuous, differentiable…) scalar function with a single real-valued parameter. Building upon elementary concepts from chaos theory, we adopt a pedagogical approach demonstrating how to adjust this parameter in order to achieve arbitrary precision fit to all samples of the data. Targeting an audience of data scientists with a taste for the curious and unusual, the results presented here expand on previous similar observations regarding expressiveness power and generalization of machine learning models.
Realization of Ontology Web Search Engine This paper describes the realization of the Ontology Web Search Engine. The Ontology Web Search Engine is realizable as an independent project and as a part of other projects. The main purpose of this paper is to present the Ontology Web Search Engine realization details as part of the Semantic Web Expert System and to present the results of the Ontology Web Search Engine functioning. It is expected that the Semantic Web Expert System will be able to process ontologies from the Web, generate rules from these ontologies and develop its knowledge base.
Reallocating and Resampling: A Comparison for Inference Simulation-based inference plays a major role in modern statistics, and often employs either reallocating (as in a randomization test) or resampling (as in bootstrapping). Reallocating mimics random allocation to treatment groups, while resampling mimics random sampling from a larger population; does it matter whether the simulation method matches the data collection method? Moreover, do the results differ for testing versus estimation? Here we answer these questions in a simple setting by exploring the distribution of a sample difference in means under a basic two-group design and four different scenarios: true random allocation, true random sampling, reallocating, and resampling. For testing a sharp null hypothesis, reallocating is superior in small samples, but reallocating and resampling are asymptotically equivalent. For estimation, resampling is generally superior, unless the effect is truly additive. Moreover, these results hold regardless of whether the data were collected by random sampling or random allocation.
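The contrast between the two simulation methods can be seen in a few lines. The toy sketch below (my own example, not the paper's code) reallocates by permuting the pooled data to test a sharp null, and resamples by bootstrapping within each group to form a confidence interval:

    # Reallocating (permutation test) vs. resampling (bootstrap) for a
    # difference in means between two groups.
    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.normal(0.0, 1.0, size=20)           # group A
    b = rng.normal(0.5, 1.0, size=20)           # group B
    observed = b.mean() - a.mean()
    pooled = np.concatenate([a, b])

    realloc, resample = [], []
    for _ in range(5000):
        # reallocating: randomly re-split the pooled data into two groups
        perm = rng.permutation(pooled)
        realloc.append(perm[20:].mean() - perm[:20].mean())
        # resampling: bootstrap each group separately
        resample.append(rng.choice(b, 20).mean() - rng.choice(a, 20).mean())

    p_value = np.mean(np.abs(realloc) >= abs(observed))   # test of the sharp null
    ci = np.percentile(resample, [2.5, 97.5])              # bootstrap 95% CI
    print("observed diff:", round(observed, 3), "permutation p:", round(p_value, 3))
    print("bootstrap 95% CI:", np.round(ci, 3))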
Real-Time Big Data Analytics: Emerging Architecture Imagine that it´s 2007. You´re a top executive at a major search engine company, and Steve Jobs has just unveiled the iPhone. You immediately ask yourself, ‘Should we shift resources away from some of our current projects so we can create an experience expressly for iPhone users?’ Then you begin wondering, ‘What if it´s all hype? Steve is a great showman … how can we predict if the iPhone is a fad or the next big thing?’ The good news is that you´ve got plenty of data at your disposal. The bad news is that you have no way of querying that data and discovering the answer to a critical question: How many people are accessing my sites from their iPhones? Back in 2007, you couldn´t even ask the question without upgrading the schema in your data warehouse, an expensive process that might have taken two months. Your only choice was to wait and hope that a competitor didn´t eat your lunch in the meantime. Justin Erickson, a senior product manager at Cloudera, told me a version of that story and I wanted to share it with you because it neatly illustrates the difference between traditional analytics and real-time big data analytics. Back then, you had to know the kinds of questions you planned to ask before you stored your data. ‘Fast forward to the present and technologies like Hadoop give you the scale and flexibility to store data before you know how you are going to process it,’ says Erickson. ‘Technologies such as MapReduce, Hive and Impala enable you to run queries without changing the data structures underneath.’ Today, you are much less likely to face a scenario in which you cannot query data and get a response back in a brief period of time. Analytical processes that used to require months, days, or hours have been reduced to minutes, seconds, and fractions of seconds. But shorter processing times have led to higher expectations. Two years ago, many data analysts thought that generating a result from a query in less than 40 minutes was nothing short of miraculous. Today, they expect to see results in under a minute. That´s practically the speed of thought – you think of a query, you get a result, and you begin your experiment. ‘It´s about moving with greater speed toward previously unknown questions, defining new insights, and reducing the time between when an event happens somewhere in the world and someone responds or reacts to that event,’ says Erickson. A rapidly emerging universe of newer technologies has dramatically reduced data processing cycle time, making it possible to explore and experiment with data in ways that would not have been practical or even possible a few years ago. Despite the availability of new tools and systems for handling massive amounts of data at incredible speeds, however, the real promise of advanced data analytics lies beyond the realm of pure technology. ‘Real-time big data isn´t just a process for storing petabytes or exabytes of data in a data warehouse,’ says Michael Minelli, co-author of Big Data, Big Analytics. ‘It´s about the ability to make better decisions and take meaningful actions at the right time. It´s about detecting fraud while someone is swiping a credit card, or triggering an offer while a shopper is standing on a checkout line, or placing an ad on a website while someone is reading a specific article. 
It´s about combining and analyzing data so you can take the right action, at the right time, and at the right place.’ For some, real-time big data analytics (RTBDA) is a ticket to improved sales, higher profits and lower marketing costs. To others, it signals the dawn of a new era in which machines begin to think and respond more like humans.
Real-Time Enterprise Stories More than 20 detailed case studies from Bloomberg Businessweek Research Services and Forbes Insights featuring leading-edge enterprises across industries to explore the real value of the in-memory platform: SAP HANA
Real-Time Machine Learning: The Missing Pieces Machine learning applications are increasingly deployed not only to serve predictions using static models, but also as tightly-integrated components of feedback loops involving dynamic, real-time decision making. These applications pose a new set of requirements, none of which are difficult to achieve in isolation, but the combination of which creates a challenge for existing distributed execution frameworks: computation with millisecond latency at high throughput, adaptive construction of arbitrary task graphs, and execution of heterogeneous kernels over diverse sets of resources. We assert that a new distributed execution framework is needed for such ML applications and propose a candidate approach with a proof-of-concept architecture that achieves a 63x performance improvement over a state-of-the-art execution framework for a representative application.
Recent Advances in Deep Learning for Object Detection Object detection is a fundamental visual recognition problem in computer vision and has been widely studied in the past decades. Visual object detection aims to find objects of certain target classes with precise localization in a given image and assign each object instance a corresponding class label. Due to the tremendous successes of deep learning based image classification, object detection techniques using deep learning have been actively studied in recent years. In this paper, we give a comprehensive survey of recent advances in visual object detection with deep learning. By reviewing a large body of recent related work in literature, we systematically analyze the existing object detection frameworks and organize the survey into three major parts: (i) detection components, (ii) learning strategies, and (iii) applications & benchmarks. In the survey, we cover a variety of factors affecting the detection performance in detail, such as detector architectures, feature learning, proposal generation, sampling strategies, etc. Finally, we discuss several future directions to facilitate and spur future research for visual object detection with deep learning. Keywords: Object Detection, Deep Learning, Deep Convolutional Neural Networks
Recent Advances in Deep Learning: An Overview Deep Learning is one of the newest trends in Machine Learning and Artificial Intelligence research. It is also one of the most popular scientific research trends nowadays. Deep learning methods have brought revolutionary advances in computer vision and machine learning. Every now and then, new deep learning techniques are born that outperform state-of-the-art machine learning and even existing deep learning techniques. In recent years, the world has seen many major breakthroughs in this field. Since deep learning is evolving at a huge speed, it is hard to keep track of the regular advances, especially for new researchers. In this paper, we briefly discuss recent advances in deep learning over the past few years.
Recent Advances in Features Extraction and Description Algorithms: A Comprehensive Survey Computer vision is one of the most active research fields in information technology today. Giving machines and robots the ability to see and comprehend the surrounding world at the speed of sight creates endless potential applications and opportunities. Feature detection and description algorithms can be indeed considered as the retina of the eyes of such machines and robots. However, these algorithms are typically computationally intensive, which prevents them from achieving the speed of sight real-time performance. In addition, they differ in their capabilities and some may favor and work better given a specific type of input compared to others. As such, it is essential to compactly report their pros and cons as well as their performances and recent advances. This paper is dedicated to provide a comprehensive overview on the state-of-the-art and recent advances in feature detection and description algorithms. Specifically, it starts by overviewing fundamental concepts. It then compares, reports and discusses their performance and capabilities. The Maximally Stable Extremal Regions algorithm and the Scale Invariant Feature Transform algorithms, being two of the best of their type, are selected to report their recent algorithmic derivatives.
Recent Advances in Neural Program Synthesis In recent years, deep learning has made tremendous progress in a number of fields that were previously out of reach for artificial intelligence. The successes in these problems have led researchers to consider the possibilities for intelligent systems to tackle a problem that humans themselves have only recently considered: program synthesis. This challenge is unlike others such as object recognition and speech translation, since its abstract nature and demand for rigor make it difficult even for human minds to attempt. While it is still far from being solved or even competitive with most existing methods, neural program synthesis is a rapidly growing discipline which holds great promise if completely realized. In this paper, we start with exploring the problem statement and challenges of program synthesis. Then, we examine the fascinating evolution of program induction models, along with how they have succeeded, failed and been reimagined since. Finally, we conclude with a contrastive look at program synthesis and future research recommendations for the field.
Recent Advances in Object Detection in the Age of Deep Convolutional Neural Networks Object detection, the computer vision task dealing with detecting instances of objects of a certain class (e.g., ‘car’, ‘plane’, etc.) in images, has attracted a lot of attention from the community during the last 5 years. This strong interest can be explained not only by the importance this task has for many applications but also by the phenomenal advances in this area since the arrival of deep convolutional neural networks (DCNN). This article reviews the recent literature on object detection with deep CNN, in a comprehensive way, and provides an in-depth view of these recent advances. The survey covers not only the typical architectures (SSD, YOLO, Faster-RCNN) but also discusses the challenges currently met by the community and goes on to show how the problem of object detection can be extended. This survey also reviews the public datasets and associated state-of-the-art algorithms.
Recent Advances in Open Set Recognition: A Survey In real-world recognition/classification tasks, limited by various objective factors, it is usually difficult to collect training samples to exhaust all classes when training a recognizer or classifier. A more realistic scenario is open set recognition (OSR), where incomplete knowledge of the world exists at training time, and unknown classes can be submitted to an algorithm during testing, requiring the classifiers not only to accurately classify the seen classes, but also to effectively deal with the unseen ones. This paper provides a comprehensive survey of existing open set recognition techniques covering various aspects ranging from related definitions, representations of models, datasets, experiment setup and evaluation metrics. Furthermore, we briefly analyze the relationships between OSR and its related tasks including zero-shot, one-shot (few-shot) recognition/learning techniques, classification with reject option, and so forth. Additionally, we also overview the open world recognition which can be seen as a natural extension of OSR. Importantly, we highlight the limitations of existing approaches and point out some promising subsequent research directions in this field.
Recent Advances in Physical Reservoir Computing: A Review Reservoir computing is a computational framework suited for temporal/sequential data processing. It is derived from several recurrent neural network models, including echo state networks and liquid state machines. A reservoir computing system consists of a reservoir for mapping inputs into a high-dimensional space and a readout for extracting features of the inputs. Further, training is carried out only in the readout. Thus, the major advantage of reservoir computing is fast and simple learning compared to other recurrent neural networks. Another advantage is that the reservoir can be realized using physical systems, substrates, and devices, instead of recurrent neural networks. In fact, such physical reservoir computing has attracted increasing attention in various fields of research. The purpose of this review is to provide an overview of recent advances in physical reservoir computing by classifying them according to the type of the reservoir. We discuss the current issues and perspectives related to physical reservoir computing, in order to further expand its practical applications and develop next-generation machine learning systems.
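To make the reservoir/readout split concrete, below is a minimal echo state network sketch in Python/NumPy; the toy sine-wave task, the reservoir size, and the spectral-radius scaling are illustrative assumptions, not taken from the review.

```python
# Minimal echo state network sketch: a fixed random reservoir maps an input
# sequence into a high-dimensional state, and only a linear readout is trained
# (here by ridge regression). Toy example, not from the reviewed paper.
import numpy as np

rng = np.random.default_rng(0)
n_reservoir, n_input = 200, 1

W_in = rng.uniform(-0.5, 0.5, (n_reservoir, n_input))   # fixed input weights
W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))  # fixed recurrent weights
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))         # scale spectral radius below 1

def run_reservoir(u_seq):
    """Collect reservoir states for a 1-D input sequence."""
    x = np.zeros(n_reservoir)
    states = []
    for u in u_seq:
        x = np.tanh(W_in @ np.array([u]) + W @ x)
        states.append(x.copy())
    return np.array(states)

# Toy task: one-step-ahead prediction of a sine wave.
u = np.sin(np.arange(0, 30, 0.1))
X, y = run_reservoir(u[:-1]), u[1:]
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_reservoir), X.T @ y)  # trained readout
pred = X @ W_out
```

Only W_out is fitted; the random W_in and W stay fixed, which is what makes training fast compared with fully trained recurrent networks.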
Recent Research Advances on Interactive Machine Learning Interactive Machine Learning (IML) is an iterative learning process that tightly couples a human with a machine learner, which is widely used by researchers and practitioners to effectively solve a wide variety of real-world application problems. Although recent years have witnessed the proliferation of IML in the field of visual analytics, most recent surveys either focus on a specific area of IML or aim to summarize a visualization field that is too generic for IML. In this paper, we systematically review the recent literature on IML and classify them into a task-oriented taxonomy built by us. We conclude the survey with a discussion of open challenges and research opportunities that we believe are inspiring for future work in IML.
Recent Trends in Deep Learning Based Natural Language Processing Deep learning methods employ multiple processing layers to learn hierarchical representations of data, and have produced state-of-the-art results in many domains. Recently, a variety of model designs and methods have blossomed in the context of natural language processing (NLP). In this paper, we review significant deep learning related models and methods that have been employed for numerous NLP tasks and provide a walk-through of their evolution. We also summarize, compare and contrast the various models and put forward a detailed understanding of the past, present and future of deep learning in NLP.
Recent Trends in Deep Learning Based Personality Detection In recent times, automatic detection of human personality traits has received a lot of attention. Specifically, multimodal personality trait prediction has emerged as a hot topic within the field of affective computing. In this paper, we give an overview of the advances in machine learning based automated personality detection with an emphasis on deep learning techniques. We compare various popular approaches in this field based on input modality and the computational datasets available, and discuss potential industrial applications. We also discuss the state-of-the-art machine learning models for different modalities of input such as text, audio, visual and multimodal. Personality detection is a very broad topic and this literature survey focuses mainly on machine learning techniques rather than the psychological aspect of personality detection.
Recommendation System based on Semantic Scholar Mining and Topic modeling: A behavioral analysis of researchers from six conferences Recommendation systems play an important role in helping online users in the internet society. Recommendation systems in computer science are of very practical use these days in various Internet portals, such as social networks and library websites. There are several approaches to implementing recommendation systems; Latent Dirichlet Allocation (LDA) is one of the popular techniques in topic modeling. Recently, researchers have proposed many approaches based on recommendation systems and LDA. Given the importance of the subject, in this paper we discover the trends of the topics and find relationships between LDA topics and Scholar-Context documents. In fact, we apply probabilistic topic modeling based on Gibbs sampling algorithms for semantic mining of publications from six computer science conferences in the DBLP dataset. According to our experimental results, our semantic framework can effectively help organizations to better organize these conferences and cover future research topics.
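For readers unfamiliar with the workflow, a minimal topic-modeling sketch using the gensim library (an assumed dependency, with a made-up toy corpus) is shown below; note that gensim's LdaModel is fitted with online variational Bayes rather than the collapsed Gibbs sampling used in the paper, so this only illustrates the general LDA pipeline.

```python
# Toy LDA workflow with gensim (assumed available); illustrative only.
from gensim import corpora
from gensim.models import LdaModel

texts = [
    ["recommendation", "system", "user", "topic"],
    ["gibbs", "sampling", "topic", "model"],
    ["conference", "publication", "researcher", "topic"],
]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]   # bag-of-words vectors

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=20, random_state=0)
for topic_id, words in lda.show_topics(num_topics=2, num_words=4, formatted=False):
    print(topic_id, [w for w, _ in words])            # top words per learned topic
```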
Recommendation Systems for Tourism Based on Social Networks: A Survey Nowadays, recommender systems are present in many daily activities such as online shopping, browsing social networks, etc. Given the rising demand for reinvigoration of the tourist industry through information technology, recommenders have been included in tourism websites such as Expedia, Booking or Tripadvisor, among others. Furthermore, the amount of scientific papers related to recommender systems for tourism has been on solid and continuous growth since 2004. Much of this growth is due to social networks, which, besides offering researchers a great mass of available and constantly updated data, also enable recommendation systems to become more personalised, effective and natural. This paper reviews and analyses many research publications focusing on tourism recommender systems that use social networks in their projects. We detail their main characteristics, such as which social networks are exploited, which data are extracted, the applied recommendation techniques, the methods of evaluation, etc. Through a comprehensive literature review, we aim to contribute to future recommender systems by giving clear classifications and descriptions of the current tourism recommender systems.
Recommender Systems with Random Walks: A Survey Recommender engines have become an integral component in today’s e-commerce systems. From recommending books in Amazon to finding friends in social networks such as Facebook, they have become omnipresent. Generally, recommender systems can be classified into two main categories: content based and collaborative filtering based models. Both these models build relationships between users and items to provide recommendations. Content based systems achieve this task by utilizing features extracted from the context available, whereas collaborative systems use shared interests between user-item subsets. There is another relatively unexplored approach for providing recommendations that utilizes a stochastic process named random walks. This study is a survey exploring use cases of random walks in recommender systems and an attempt at classifying them.
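As a rough illustration of the random-walk idea, the following Python sketch walks over a tiny, hypothetical user-item bipartite graph and ranks unseen items by visit frequency; it is a generic toy, not any specific algorithm from the surveyed literature.

```python
# Random-walk recommendation sketch over a toy user-item bipartite graph:
# items visited most often by short walks started from the target user
# (excluding items already seen) are proposed as recommendations.
import random
from collections import Counter

user_items = {"u1": ["a", "b"], "u2": ["b", "c"], "u3": ["c", "d"]}  # toy interactions
item_users = {}
for u, items in user_items.items():
    for i in items:
        item_users.setdefault(i, []).append(u)

def recommend(user, walks=1000, steps=3, seed=0):
    random.seed(seed)
    visits = Counter()
    for _ in range(walks):
        u = user
        for _ in range(steps):
            item = random.choice(user_items[u])   # hop user -> item
            visits[item] += 1
            u = random.choice(item_users[item])   # hop item -> user
    seen = set(user_items[user])
    return [i for i, _ in visits.most_common() if i not in seen]

print(recommend("u1"))  # items reachable through co-interacting users, e.g. "c"
```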
Reducing the Dimensionality of Data with Neural Networks High-dimensional data can be converted to low-dimensional codes by training a multilayer neural network with a small central layer to reconstruct high-dimensional input vectors. Gradient descent can be used for fine-tuning the weights in such ‘autoencoder´ networks, but this works well only if the initial weights are close to a good solution. We describe an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.
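A minimal PyTorch sketch of such an autoencoder is shown below; the layer sizes and the random input batch are illustrative assumptions, and the paper's original layer-wise pretraining scheme is not reproduced here.

```python
# Minimal deep autoencoder sketch (PyTorch assumed available): a small central
# layer forces a low-dimensional code, and the network is trained to
# reconstruct its own input.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_in=784, n_code=30):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_in, 256), nn.ReLU(),
            nn.Linear(256, n_code),            # small central "code" layer
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_code, 256), nn.ReLU(),
            nn.Linear(256, n_in),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)                         # stand-in for a batch of flattened images
for _ in range(10):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), x)  # reconstruction error
    loss.backward()
    opt.step()
codes = model.encoder(x)                        # 30-dimensional learned codes
```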
Reducing the Sampling Complexity of Topic Models Inference in topic models typically involves a sampling step to associate latent variables with observations. Unfortunately the generative model loses sparsity as the amount of data increases, requiring O(k) operations per word for k topics. In this paper we propose an algorithm which scales linearly with the number of actually instantiated topics kd in the document. For large document collections and in structured hierarchical models kd ≪ k. This yields an order of magnitude speedup. Our method applies to a wide variety of statistical models such as PDP and HDP. At its core is the idea that dense, slowly changing distributions can be approximated efficiently by the combination of a Metropolis-Hastings step, use of sparsity, and amortized constant time sampling via Walker’s alias method.
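For reference, Walker's alias method mentioned above can be sketched as follows: after an O(k) table-building step, each draw from a discrete distribution over k outcomes takes constant time.

```python
# Walker's alias method: O(k) setup, O(1) per draw from a discrete distribution.
import random

def build_alias(probs):
    k = len(probs)
    prob = [p * k for p in probs]            # rescale so the average cell mass is 1
    alias = [0] * k
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                          # top up the light cell with a heavy outcome
        prob[l] -= (1.0 - prob[s])
        (small if prob[l] < 1.0 else large).append(l)
    return prob, alias

def alias_draw(prob, alias):
    i = random.randrange(len(prob))           # pick a cell uniformly
    return i if random.random() < prob[i] else alias[i]

prob, alias = build_alias([0.5, 0.3, 0.15, 0.05])
samples = [alias_draw(prob, alias) for _ in range(10000)]
```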
Reductions for Frequency-Based Data Mining Problems Studying the computational complexity of problems is one of the – if not the – fundamental questions in computer science. Yet, surprisingly little is known about the computational complexity of many central problems in data mining. In this paper we study frequency-based problems and propose a new type of reduction that allows us to compare the complexities of the maximal frequent pattern mining problems in different domains (e.g. graphs or sequences). Our results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader range of data mining problems. Our results show that, by allowing constraints in the pattern space, the complexities of many maximal frequent pattern mining problems collapse. These problems include maximal frequent subgraphs in labelled graphs, maximal frequent itemsets, and maximal frequent subsequences with no repetitions. In addition to theoretical interest, our results might yield more efficient algorithms for the studied problems.
Regularization for Deep Learning: A Taxonomy Regularization is one of the crucial ingredients of deep learning, yet the term regularization has various definitions, and regularization methods are often studied separately from each other. In our work we present a systematic, unifying taxonomy to categorize existing methods. We distinguish methods that affect data, network architectures, error terms, regularization terms, and optimization procedures. We do not provide all details about the listed methods; instead, we present an overview of how the methods can be sorted into meaningful categories and sub-categories. This helps reveal links and fundamental similarities between them. Finally, we include practical recommendations both for users and for developers of new regularization methods.
Regularized Discriminant Analysis Linear and quadratic discriminant analysis are considered in the small sample high-dimensional setting. Alternatives to the usual maximum likelihood (plug-in) estimates for the covariance matrices are proposed. These alternatives are characterized by two parameters, the values of which are customized to individual situations by jointly minimizing a sample based estimate of future misclassification risk. Computationally fast implementations are presented, and the efficacy of the approach is examined through simulation studies and application to data. These studies indicate that in many circumstances dramatic gains in classification accuracy can be achieved.
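The regularization can be sketched with a simple two-parameter shrinkage in NumPy; this is a simplified convex-combination form in the spirit of the paper (class covariance shrunk toward the pooled covariance, then toward a scaled identity), not its exact estimator.

```python
# Hedged sketch of two-parameter shrinkage for regularized discriminant analysis:
# lambda pulls the class covariance toward the pooled covariance, gamma pulls
# the result toward a scaled identity, giving a well-conditioned estimate in
# small-sample, high-dimensional settings. Simplified form, not the paper's exact estimator.
import numpy as np

def rda_covariance(S_class, S_pooled, lam, gamma):
    p = S_class.shape[0]
    S_lam = (1 - lam) * S_class + lam * S_pooled
    return (1 - gamma) * S_lam + gamma * (np.trace(S_lam) / p) * np.eye(p)

rng = np.random.default_rng(0)
X_k = rng.normal(size=(8, 20))            # one class: 8 samples, 20 features
X_all = rng.normal(size=(40, 20))         # all classes pooled
S_k = np.cov(X_k, rowvar=False)           # nearly singular on its own
S_pool = np.cov(X_all, rowvar=False)
S_reg = rda_covariance(S_k, S_pool, lam=0.5, gamma=0.1)
```

In practice the two parameters would be chosen by minimizing an estimate of misclassification risk, as the abstract describes.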
Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization Overfitting is one of the most critical challenges in deep neural networks, and there are various types of regularization methods to improve generalization performance. Injecting noise into hidden units during training, e.g., dropout, is known to be a successful regularizer, but it is still not clear enough why such training techniques work well in practice and how we can maximize their benefit in the presence of two conflicting objectives—optimizing to the true data distribution and preventing overfitting by regularization. This paper addresses the above issues by 1) interpreting that the conventional training methods with regularization by noise injection optimize the lower bound of the true objective and 2) proposing a technique to achieve a tighter lower bound using multiple noise samples per training example in a stochastic gradient descent iteration. We demonstrate the effectiveness of our idea in several computer vision applications.
Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review The framework of reinforcement learning or optimal control provides a mathematical formalization of intelligent decision making that is powerful and broadly applicable. While the general form of the reinforcement learning problem enables effective reasoning about uncertainty, the connection between reinforcement learning and inference in probabilistic models is not immediately obvious. However, such a connection has considerable value when it comes to algorithm design: formalizing a problem as probabilistic inference in principle allows us to bring to bear a wide array of approximate inference tools, extend the model in flexible and powerful ways, and reason about compositionality and partial observability. In this article, we will discuss how a generalization of the reinforcement learning or optimal control problem, which is sometimes termed maximum entropy reinforcement learning, is equivalent to exact probabilistic inference in the case of deterministic dynamics, and variational inference in the case of stochastic dynamics. We will present a detailed derivation of this framework, overview prior work that has drawn on this and related ideas to propose new reinforcement learning and control algorithms, and describe perspectives on future research.
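In a commonly used form (notation assumed here, not quoted from the article), the maximum entropy reinforcement learning objective augments the expected reward with the entropy of the policy's action distribution:

```latex
% Maximum entropy RL objective: reward plus policy entropy, weighted by a
% temperature alpha; rho_pi denotes the state-action distribution induced by pi.
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```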
Reinforcement Learning Applications We start with a brief introduction to reinforcement learning (RL), covering its success stories, basics, an example, issues, the ICML 2019 Workshop on RL for Real Life, how to use it, study material and an outlook. Then we discuss a selection of RL applications, including recommender systems, computer systems, energy, finance, healthcare, robotics, and transportation.
Reinforcement Learning for Learning of Dynamical Systems in Uncertain Environment: a Tutorial In this paper, a review of model-free reinforcement learning for learning of dynamical systems in uncertain environments is presented. For this purpose, the Markov Decision Process (MDP) is reviewed. Furthermore, model-free learning algorithms such as Temporal Difference (TD) learning, Q-learning, and Approximate Q-learning, which constitute the main part of this article, are investigated, and the benefits and drawbacks of each algorithm are discussed. The concepts in each section are explained in detail with examples.
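As a concrete illustration of the model-free algorithms listed above, the following Python sketch runs tabular Q-learning on a hypothetical five-state chain environment; the environment, hyperparameters, and epsilon-greedy exploration are illustrative assumptions, not taken from the tutorial.

```python
# Tabular Q-learning on a toy 5-state chain: the agent updates Q(s, a) toward
# the TD target r + gamma * max_a' Q(s', a').
import random

n_states, n_actions = 5, 2          # actions: 0 = move left, 1 = move right
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(s, a):
    s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    reward = 1.0 if s_next == n_states - 1 else 0.0   # reward only at the right end
    return s_next, reward

for _ in range(500):                 # episodes
    s = 0
    for _ in range(20):              # steps per episode
        if random.random() < eps:    # epsilon-greedy exploration
            a = random.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda x: Q[s][x])
        s_next, r = step(s, a)
        td_target = r + gamma * max(Q[s_next])
        Q[s][a] += alpha * (td_target - Q[s][a])       # Q-learning update
        s = s_next
```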
Reinforcement Learning in Healthcare: A Survey As a subfield of machine learning, \emph{reinforcement learning} (RL) aims at empowering one’s capabilities in behavioural decision making by using interaction experience with the world and evaluative feedback. Unlike traditional supervised learning methods that usually rely on one-shot, exhaustive and supervised reward signals, RL tackles sequential decision making problems with sampled, evaluative and delayed feedback simultaneously. Such distinctive features make RL techniques suitable candidates for developing powerful solutions in a variety of healthcare domains, where diagnosing decisions or treatment regimes are usually characterized by a prolonged and sequential procedure. This survey discusses the broad applications of RL techniques in healthcare domains, in order to provide the research community with a systematic understanding of theoretical foundations, enabling methods and techniques, existing challenges, and new insights of this emerging paradigm. By first briefly examining theoretical foundations and key techniques in RL research from efficient and representational directions, we then provide an overview of RL applications in a variety of healthcare domains, ranging from dynamic treatment regimes in chronic diseases and critical care, to automated medical diagnosis from both unstructured and structured clinical data, to many other control or scheduling domains that have infiltrated many aspects of a healthcare system. Finally, we summarize the challenges and open issues in current research, and point out some potential solutions and directions for future research.
Reinforcement Learning: A Tutorial The purpose of this tutorial is to provide an introduction to reinforcement learning (RL) at a level easily understood by students and researchers in a wide range of disciplines. The intent is not to present a rigorous mathematical discussion that requires a great deal of effort on the part of the reader, but rather to present a conceptual framework that might serve as an introduction to a more rigorous study of RL. The fundamental principles and techniques used to solve RL problems are presented, along with the most popular RL algorithms. Section 1 presents an overview of RL and provides a simple example to develop intuition of the underlying dynamic programming mechanism. In Section 2 the parts of a reinforcement learning problem are discussed. These include the environment, reinforcement function, and value function. Section 3 gives a description of the most widely used reinforcement learning algorithms. These include TD(λ) and both the residual and direct forms of value iteration, Q-learning, and advantage learning. In Section 4 some of the ancillary issues in RL are briefly discussed, such as choosing an exploration strategy and an appropriate discount factor. The conclusion is given in Section 5. Finally, Section 6 is a glossary of commonly used terms followed by references in Section 7 and a bibliography of RL applications in Section 8. The tutorial structure is such that each section builds on the information provided in previous sections. It is assumed that the reader has some knowledge of learning algorithms that rely on gradient descent (such as the backpropagation of errors algorithm).
Relational Marginal Problems: Theory and Estimation In the propositional setting, the marginal problem is to find a (maximum-entropy) distribution that has some given marginals. We study this problem in a relational setting and make the following contributions. First, we compare two different notions of relational marginals. Second, we show a duality between the resulting relational marginal problems and the maximum likelihood estimation of the parameters of relational models, which generalizes a well-known duality from the propositional setting. Third, by exploiting the relational marginal formulation, we present a statistically sound method to learn the parameters of relational models that will be applied in settings where the number of constants differs between the training and test data. Furthermore, based on a relational generalization of marginal polytopes, we characterize cases where the standard estimators based on a feature’s number of true groundings need to be adjusted, and we quantitatively characterize the consequences of these adjustments. Fourth, we prove bounds on expected errors of the estimated parameters, which allows us to lower-bound, among other things, the effective sample size of relational training data.
Relational Representation Learning for Dynamic (Knowledge) Graphs: A Survey Graphs arise naturally in many real-world applications including social networks, recommender systems, ontologies, biology, and computational finance. Traditionally, machine learning models for graphs have been mostly designed for static graphs. However, many applications involve evolving graphs. This introduces important challenges for learning and inference since nodes, attributes, and edges change over time. In this survey, we review the recent advances in representation learning for dynamic graphs, including dynamic knowledge graphs. We describe existing models from an encoder-decoder perspective, categorize these encoders and decoders based on the techniques they employ, and analyze the approaches in each category. We also review several prominent applications and widely used datasets, and highlight directions for future research.
Relative Worst-Order Analysis: A Survey Relative worst-order analysis is a technique for assessing the relative quality of online algorithms. We survey the most important results obtained with this technique and compare it with other quality measures.
Relief-Based Feature Selection: Introduction and Review Feature selection plays a critical role in data mining, driven by increasing feature dimensionality in target problems and growing interest in advanced but computationally expensive methodologies able to model complex associations. Specifically, there is a need for feature selection methods that are computationally efficient, yet sensitive to complex patterns of association, e.g. interactions, so that informative features are not mistakenly eliminated prior to downstream modeling. This paper focuses on Relief-based algorithms (RBAs), a unique family of filter-style feature selection algorithms that strike an effective balance between these objectives while flexibly adapting to various data characteristics, e.g. classification vs. regression. First, this work broadly examines types of feature selection and defines RBAs within that context. Next, we introduce the original Relief algorithm and associated concepts, emphasizing the intuition behind how it works, how feature weights generated by the algorithm can be interpreted, and why it is sensitive to feature interactions without evaluating combinations of features. Lastly, we include an expansive review of RBA methodological research beyond Relief and its popular descendant, ReliefF. In particular, we characterize branches of RBA research, and provide comparative summaries of RBA algorithms including contributions, strategies, functionality, time complexity, adaptation to key data characteristics, and software availability.
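A compact NumPy sketch of the original Relief weight update (binary classes, numeric features scaled to [0, 1]) is given below; the toy data and the use of Manhattan distance for the neighbour search are illustrative assumptions, and the ReliefF extensions discussed in the review are not covered.

```python
# Original Relief sketch: for each sampled instance, find the nearest hit
# (same class) and nearest miss (other class); features that differ more on
# the miss than on the hit gain weight.
import numpy as np

def relief(X, y, n_iter=None, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_iter = n_iter or n
    w = np.zeros(p)
    for _ in range(n_iter):
        i = rng.integers(n)
        dists = np.abs(X - X[i]).sum(axis=1)           # Manhattan distance to all points
        dists[i] = np.inf                               # exclude the instance itself
        same, other = (y == y[i]), (y != y[i])
        hit = np.argmin(np.where(same, dists, np.inf))  # nearest same-class neighbour
        miss = np.argmin(np.where(other, dists, np.inf))# nearest other-class neighbour
        w += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / n_iter
    return w                                            # larger weight = more relevant feature

rng = np.random.default_rng(1)
X = rng.random((100, 4))
y = (X[:, 0] > 0.5).astype(int)   # only feature 0 is informative
print(relief(X, y).round(2))      # feature 0 should receive the largest weight
```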
Representation Learning on Graphs: Methods and Applications Machine learning on graphs is an important and ubiquitous task with applications ranging from drug design to friendship recommendation in social networks. The primary challenge in this domain is finding a way to represent, or encode, graph structure so that it can be easily exploited by machine learning models. Traditionally, machine learning approaches relied on user-defined heuristics to extract features encoding structural information about a graph (e.g., degree statistics or kernel functions). However, recent years have seen a surge in approaches that automatically learn to encode graph structure into low-dimensional embeddings, using techniques based on deep learning and nonlinear dimensionality reduction. Here we provide a conceptual review of key advancements in this area of representation learning on graphs, including matrix factorization-based methods, random-walk based algorithms, and graph convolutional networks. We review methods to embed individual nodes as well as approaches to embed entire (sub)graphs. In doing so, we develop a unified framework to describe these recent approaches, and we highlight a number of important applications and directions for future work.
Representation Learning: A Review and New Perspectives The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation and manifold learning. Index Terms – Deep learning, representation learning, feature learning, unsupervised learning, Boltzmann Machine, autoencoder, neural nets
Reproducing scientists’ mobility: A data-driven model This paper makes two important contributions to understanding the mobility patterns of scientists. First, by combining two large-scale data sets covering the publications of 3.5 million scientists over 60 years, we are able to reveal the geographical ‘career paths’ of scientists. Each path contains, on the individual level, information about the cities (resolved on real geographical space) and the time (in years) spent there. A statistical analysis gives empirical insights into the geographical distance scientists move for a new affiliation and their age when moving. From the individual career paths, we further reconstruct the world network of movements of scientists, where the nodes represent cities and the links the in- and out-flow of scientists between cities. We analyze the topological properties of this network with respect to degree distribution, local clustering coefficients, path lengths and assortativity. The second important contribution is an agent-based model that allows us to reproduce the empirical findings, both on the level of scientists and of the network. The model considers that agents have a fitness and consider potential new locations if these allow them to increase this fitness. Locations, on the other hand, rank agents by their fitness and accept them only if they still have capacity for them. This leads to a matching problem which is solved algorithmically. Using empirical data to calibrate our model and to determine its initial conditions, we are able to validate the model against the measured distributions. This allows us to interpret the model assumptions as micro-based decision rules that explain the observed mobility patterns of scientists.
Resonant features in a forced population of excitatory neurons In recent years, the study of coupled excitable oscillators has largely benefited from a new analytical technique developed by Ott and Antonsen. This technique allows one to express the dynamics of certain macroscopic observables of the ensemble in terms of a reduced set of ordinary differential equations. This makes it possible to build low-dimensional models for the global activity of neural systems from first principles. We investigated the macroscopic response of a large set of excitatory neurons to different forcing strategies. We report resonant behavior that depends on the heterogeneity between the units and their coupling strength. This contrasts with the type of response that an external forcing can elicit in simple and widely used phenomenological models.
Resource Elasticity for Distributed Data Stream Processing: A Survey and Future Directions Under several emerging application scenarios, such as in smart cities, operational monitoring of large infrastructures, and Internet of Things, continuous data streams must be processed under very short delays. Several solutions, including multiple software engines, have been developed for processing unbounded data streams in a scalable and efficient manner. This paper surveys state of the art on stream processing engines and mechanisms for exploiting resource elasticity features of cloud computing in stream processing. Resource elasticity allows for an application or service to scale out/in according to fluctuating demands. Although such features have been extensively investigated for enterprise applications, stream processing poses challenges on achieving elastic systems that can make efficient resource management decisions based on current load. This work examines some of these challenges and discusses solutions proposed in the literature to address them.
Resource Management in Fog/Edge Computing: A Survey Contrary to using distant and centralized cloud data center resources, employing decentralized resources at the edge of a network for processing data closer to user devices, such as smartphones and tablets, is an upcoming computing paradigm, referred to as fog/edge computing. Fog/edge resources are typically resource-constrained, heterogeneous, and dynamic compared to the cloud, thereby making resource management an important challenge that needs to be addressed. This article reviews publications as early as 1991, with 85% of the publications between 2013-2018, to identify and classify the architectures, infrastructure, and underlying algorithms for managing resources in fog/edge computing.
Restricted Boltzmann Machines: Introduction and Review The restricted Boltzmann machine is a network of stochastic units with undirected interactions between pairs of visible and hidden units. This model was popularized as a building block of deep learning architectures and has continued to play an important role in applied and theoretical machine learning. Restricted Boltzmann machines carry a rich structure, with connections to geometry, applied algebra, probability, statistics, machine learning, and other areas. The analysis of these models is attractive in its own right and also as a platform to combine and generalize mathematical tools for graphical models with hidden variables. This article gives an introduction to the mathematical analysis of restricted Boltzmann machines, reviews recent results on the geometry of the sets of probability distributions representable by these models, and suggests a few directions for further investigation.
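For reference, the standard energy function and joint distribution of a restricted Boltzmann machine (with notation assumed here: visible units v, hidden units h, biases a and b, bipartite interaction matrix W, and partition function Z) are:

```latex
% RBM energy and joint distribution; the bipartite structure means there are
% no visible-visible or hidden-hidden interaction terms.
E(\mathbf{v}, \mathbf{h}) = -\mathbf{a}^{\top}\mathbf{v} - \mathbf{b}^{\top}\mathbf{h} - \mathbf{v}^{\top} W \mathbf{h},
\qquad
p(\mathbf{v}, \mathbf{h}) = \frac{1}{Z}\, e^{-E(\mathbf{v}, \mathbf{h})}
```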
Rethinking the Artificial Neural Networks: A Mesh of Subnets with a Central Mechanism for Storing and Predicting the Data Artificial Neural Networks (ANNs) were originally designed to function like a biological neural network, but does an ANN really work in the same way as a biological neural network? As we know, the human brain holds information in its memory cells, so if ANNs use the same model as our brains, they should store datasets in a similar manner. The most popular type of ANN architecture is based on a layered structure of neurons, whereas a human brain has trillions of complex interconnections of neurons continuously establishing new connections, updating existing ones, and removing the irrelevant connections across different parts of the brain. In this paper, we propose a novel approach to building ANNs which are truly inspired by the biological network, containing a mesh of subnets controlled by a central mechanism. A subnet is a network of neurons that hold the dataset values. We attempt to address the following fundamental questions: (1) What is the architecture of the ANN model? Is the layered architecture the most appropriate choice? (2) Is a neuron a process or a memory cell? (3) What is the best way of interconnecting neurons and what weight-assignment mechanism should be used? (4) How can prior knowledge, bias, and generalizations be incorporated for feature extraction and prediction? Our proposed ANN architecture improves accuracy on textual data, and our experimental findings confirm the effectiveness of our model. We also describe the construction of the ANN model for storing and processing images.
Review of Deep Learning In recent years, countries such as China and the United States, as well as high-tech companies such as Google, have increased their investment in artificial intelligence. Deep learning is one of the key areas of current artificial intelligence research. This paper analyzes and summarizes the latest progress and future research directions of deep learning. Firstly, three basic models of deep learning are outlined, including multilayer perceptrons, convolutional neural networks, and recurrent neural networks. On this basis, we further analyze the emerging new models of convolutional neural networks and recurrent neural networks. This paper then summarizes deep learning’s applications in many areas of artificial intelligence, including speech, computer vision, natural language processing and so on. Finally, this paper discusses the existing problems of deep learning and gives the corresponding possible solutions.
Review. Machine learning techniques for traffic sign detection An automatic road sign detection system localizes road signs within images captured by an on-board camera of a vehicle, and supports the driver in properly operating the vehicle. Most existing algorithms include a preprocessing step, a feature extraction step and a detection step. This paper arranges the methods applied to road sign detection into two groups: general machine learning and neural networks. In this review, the issues related to automatic road sign detection are addressed, the popular existing methods developed to tackle the road sign detection problem are reviewed, and the features of these methods are compared.
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era The success of deep learning in vision can be attributed to: (a) models with high capacity; (b) increased computational power; and (c) availability of large-scale labeled data. Since 2012, there have been significant advances in representation capabilities of the models and computational capabilities of GPUs. But the size of the biggest dataset has surprisingly remained constant. What will happen if we increase the dataset size by 10x or 100x? This paper takes a step towards clearing the clouds of mystery surrounding the relationship between `enormous data’ and deep learning. By exploiting the JFT-300M dataset which has more than 375M noisy labels for 300M images, we investigate how the performance of current vision tasks would change if this data was used for representation learning. Our paper delivers some surprising (and some expected) findings. First, we find that the performance on vision tasks still increases linearly with orders of magnitude of training data size. Second, we show that representation learning (or pre-training) still holds a lot of promise. One can improve performance on any vision task by just training a better base model. Finally, as expected, we present new state-of-the-art results for different vision tasks including image classification, object detection, semantic segmentation and human pose estimation. Our sincere hope is that this inspires the vision community to not undervalue the data and to develop collective efforts in building larger datasets.
Robust Principal Component Analysis This paper is about a curious phenomenon. Suppose we have a data matrix, which is the superposition of a low-rank component and a sparse component. Can we recover each component individually? We prove that under some suitable assumptions, it is possible to recover both the low-rank and the sparse components exactly by solving a very convenient convex program called Principal Component Pursuit; among all feasible decompositions, simply minimize a weighted combination of the nuclear norm and of the l1 norm. This suggests the possibility of a principled approach to robust principal component analysis since our methodology and results assert that one can recover the principal components of a data matrix even though a positive fraction of its entries are arbitrarily corrupted. This extends to the situation where a fraction of the entries are missing as well. We discuss an algorithm for solving this optimization problem, and present applications in the area of video surveillance, where our methodology allows for the detection of objects in a cluttered background, and in the area of face recognition, where it offers a principled way of removing shadows and specularities in images of faces.
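The convex program described above, Principal Component Pursuit, can be written as follows, with M the observed data matrix (a commonly cited choice of the weight is λ = 1/√max(n₁, n₂) for an n₁ × n₂ matrix):

```latex
% Principal Component Pursuit: nuclear norm promotes a low-rank L,
% the l1 norm promotes a sparse S, and the two must add up to the data M.
\min_{L, S} \; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{subject to} \quad L + S = M
```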
ROC Curve, Lift Chart and Calibration Plot This paper presents the ROC curve, lift chart and calibration plot, three well known graphical techniques that are useful for evaluating the quality of classification models used in data mining and machine learning. Each technique, normally used and studied separately, defines its own measure of classification quality and its visualization. Here, we give a brief survey of the methods and establish a common mathematical framework which adds some new aspects, explanations and interrelations between these techniques. We conclude with an empirical evaluation and a few examples on how to use the presented techniques to boost classification accuracy.
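As a minimal illustration of one of the three techniques, the following scikit-learn sketch (an assumed dependency, using synthetic data) computes the points of a ROC curve and the corresponding AUC for a toy binary classifier.

```python
# ROC curve sketch with scikit-learn on synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]          # predicted probability of the positive class

fpr, tpr, thresholds = roc_curve(y_te, scores)  # points of the ROC curve
print("AUC:", roc_auc_score(y_te, scores))
```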
Role of Bloom Filter in Big Data Research: A Survey Big Data is one of the most popular emerging trends; it has become a blessing for humankind and a necessity of day-to-day life, with Facebook as just one example. Every person is involved in producing data, either directly or indirectly. Thus, Big Data is a high volume of data with an exponential growth rate that consists of a variety of data. Big Data touches all fields, including the government sector, the IT industry, business, economy, engineering, bioinformatics, and other basic sciences, and thus forms a data silo. Most of the data are duplicated and unstructured. To deal with such a data silo, the Bloom Filter is a precious resource for filtering out duplicate data. The Bloom Filter is also indispensable in a Big Data storage system for optimizing memory consumption. Undoubtedly, the Bloom Filter uses a tiny amount of memory space to filter a very large volume of data, and it stores information about a large set of data. The functionality of the Bloom Filter is limited to membership filtering, but it can be adapted to various applications. Moreover, the Bloom Filter is deployed in diverse fields and is also used in interdisciplinary research areas, bioinformatics for instance. In this article, we expose the usefulness of the Bloom Filter in Big Data research.
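To make the membership-filter idea concrete, here is a minimal Bloom filter sketch in Python; the bit-array size, the number of hash functions, and the SHA-256-based hashing are illustrative choices, not recommendations from the survey.

```python
# Minimal Bloom filter: a small bit array plus k hash functions.
# False positives are possible, false negatives are not.
import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, item):
        # Derive k bit positions by salting the item with the hash index.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("user:42")
print(bf.might_contain("user:42"))   # True
print(bf.might_contain("user:43"))   # almost certainly False
```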
RStorm: Developing and Testing Streaming Algorithms in R Streaming data, consisting of indefinitely evolving sequences, are becoming ubiquitous in many branches of science and in various applications. Computer scientists have developed streaming applications such as Storm and the S4 distributed stream computing platform to deal with data streams. However, in current production packages testing and evaluating streaming algorithms is cumbersome. This paper presents RStorm for the development and evaluation of streaming algorithms analogous to these production packages, but implemented fully in R. RStorm allows developers of streaming algorithms to quickly test, iterate, and evaluate various implementations of streaming algorithms. The paper provides both a canonical computer science example, the streaming word count, and examples of several statistical applications of RStorm.
Rule-Mining based classification: a benchmark study This study proposes an exhaustive, stable/reproducible rule-mining algorithm combined with a classifier to generate both accurate and interpretable models. Our method first extracts rules (i.e., a conjunction of conditions about the values of a small number of input features) with our exhaustive rule-mining algorithm, then constructs a new feature space based on the most relevant rules, called ‘local features’, and finally builds a local predictive model by training a standard classifier on the new local feature space. This local feature space is easily interpretable, providing a human-understandable explanation in the explicit form of rules. Furthermore, our local predictive approach is as powerful as global classical ones like logistic regression (LR) and support vector machines (SVM), and rule-based methods like random forest (RF) and gradient boosted trees (GBT).
Rules of Machine Learning: Best Practices for ML Engineering This document is intended to help those with a basic knowledge of machine learning get the benefit of best practices in machine learning from around Google. It presents a style for machine learning, similar to the Google C++ Style Guide and other popular guides to practical programming. If you have taken a class in machine learning, or built or worked on a machine-learned model, then you have the necessary background to read this document.
Run Time Prediction for Big Data Iterative ML Algorithms: a KMeans case study Data science and machine learning algorithms running on big data infrastructure are increasingly important in activities ranging from business intelligence and analytics to cybersecurity, smart city management, and many fields of science and engineering. As these algorithms are further integrated into daily operations, understanding how long they take to run on a big data infrastructure is paramount to controlling costs and delivery times. In this paper we discuss the issues involved in understanding the run time of iterative machine learning algorithms and provide a case study of such an algorithm – including a statistical characterization and model of the run time of an implementation of K-Means for the Spark big data engine using the Edward probabilistic programming language.

S

Saliency Prediction in the Deep Learning Era: An Empirical Investigation Visual saliency models have enjoyed a big leap in performance in recent years, thanks to advances in deep learning and large scale annotated data. Despite enormous effort and huge breakthroughs, however, models still fall short of reaching human-level accuracy. In this work, I explore the landscape of the field with an emphasis on new deep saliency models, benchmarks, and datasets. A large number of image and video saliency models are reviewed and compared over two image benchmarks and two large scale video datasets. Further, I identify factors that contribute to the gap between models and humans and discuss remaining issues that need to be addressed to build the next generation of more powerful saliency models. Some specific questions that are addressed include: in what ways current models fail, how to remedy them, what can be learned from cognitive studies of attention, how explicit saliency judgments relate to fixations, how to conduct fair model comparison, and what are the emerging applications of saliency models.
Salient Object Detection in the Deep Learning Era: An In-Depth Survey As an important problem in computer vision, salient object detection (SOD) from images has been attracting an increasing amount of research effort over the years. Recent advances in SOD, not surprisingly, are dominantly led by deep learning-based solutions (named deep SOD) and reflected by hundreds of published papers. To facilitate the in-depth understanding of deep SODs, in this paper we provide a comprehensive survey covering various aspects ranging from algorithm taxonomy to unsolved open issues. In particular, we first review deep SOD algorithms from different perspectives including network architecture, level of supervision, learning paradigm and object/instance level detection. Following that, we summarize existing SOD evaluation datasets and metrics. Then, we carefully compile thorough benchmark results of SOD methods based on previous work, and provide detailed analysis of the comparison results. Moreover, we study the performance of SOD algorithms under different attributes, which have been barely explored previously, by constructing a novel SOD dataset with rich attribute annotations. We further analyze, for the first time in the field, the robustness and transferability of deep SOD models w.r.t. adversarial attacks. We also look into the influence of input perturbations, and the generalization and hardness of existing SOD datasets. Finally, we discuss several open issues and challenges of SOD, and point out possible research directions in future. All the saliency prediction maps, our constructed dataset with annotations, and codes for evaluation are made publicly available at https://…/SODsurvey.
SAP HANA for Next-Generation Business: Applications and Real-Time Analytics Explore and Analyze Vast Quantities of Data from Virtually Any Source at the Speed of Thought SAP has introduced a new class of solutions that powers the next generation of business applications. The SAP HANA database is an in-memory database that combines transactional data processing, analytical data processing, and application logic processing functionality in memory. SAP HANA removes the limits of traditional database architecture that have severely constrained how business applications can be developed to support real-time business.
SAP Predictive Analysis – Real Life Use Case Predicting Who Will Buy Additional Insurance This paper provides a step-by-step description, including screenshots, to evaluate how SAP Predictive Analysis and SAP InfiniteInsight, which as of this writing are sold as a bundle, can be used to predict the potential customers who will buy additional products based on their behavior of interest. Using the combined strength of both SAP InfiniteInsight and SAP Predictive Analysis, this article will also demonstrate how these two products can fulfill the challenge and also supplement each other to provide even better prediction models. This article is for educational purposes only and uses actual data from an insurance company. The data comes from the CoIL (Computational Intelligence and Learning) challenge from the year 2000, which had the following goal: ‘Can you predict who would be interested in buying a caravan insurance policy and give an explanation why?’ After reading this article you will be able to understand the differences between classification algorithms. You will learn how to simplify a dataset by determining which variables are important and vice versa, score a model, and also how to use SAP Predictive Analysis and SAP InfiniteInsight to build models on existing data and run the custom models on new data. The authors of this article have closely observed the online Predictive Analysis community, where members are constantly looking for two important things: which statistical algorithm to choose for which business case, and real-life business cases with actual data. The latter is certainly difficult to find but immensely crucial to understanding this subject. Of course, data is vital to each company and, in today´s competitive market, gives a competitive advantage. Finding a real-life case for educational purposes with actual data is a big challenge.
Scalable Strategies for Computing with Massive Data This paper presents two complementary statistical computing frameworks that address challenges in parallel processing and the analysis of massive data. First, the foreach package allows users of the R programming environment to define parallel loops that may be run sequentially on a single machine, in parallel on a symmetric multiprocessing (SMP) machine, or in cluster environments without platform-specific code. Second, the bigmemory package implements memory- and file-mapped data structures that provide (a) access to arbitrarily large data while retaining a look and feel that is familiar to R users and (b) data structures that are shared across processor cores in order to support efficient parallel computing techniques. Although these packages may be used independently, this paper shows how they can be used in combination to address challenges that have effectively been beyond the reach of researchers who lack specialized software development skills or expensive hardware.
Score Aggregation Techniques in Retrieval Experimentation Comparative evaluations of information retrieval systems are based on a number of key premises, including that representative topic sets can be created, that suitable relevance judgements can be generated, and that systems can be sensibly compared based on their aggregate performance over the selected topic set. This paper considers the role of the third of these assumptions – that the performance of a system on a set of topics can be represented by a single overall performance score such as the average, or some other central statistic. In particular, we experiment with score aggregation techniques including the arithmetic mean, the geometric mean, the harmonic mean, and the median. Using past TREC runs we show that an adjusted geometric mean provides more consistent system rankings than the arithmetic mean when a significant fraction of the individual topic scores are close to zero, and that score standardization (Webber et al., SIGIR 2008) achieves the same outcome in a more consistent manner.
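The aggregation statistics being compared are easy to illustrate; in the NumPy sketch below, the small epsilon added before taking logs is one simple way to handle near-zero topic scores and is only a stand-in for the paper's adjusted geometric mean and for score standardization.

```python
# Comparing aggregation statistics over a set of per-topic effectiveness scores.
import numpy as np

scores = np.array([0.62, 0.41, 0.005, 0.0, 0.33])   # toy per-topic scores, some near zero
eps = 1e-5                                           # illustrative adjustment, not the paper's exact one

arithmetic = scores.mean()
geometric = np.exp(np.log(scores + eps).mean()) - eps   # heavily penalizes near-zero topics
harmonic = len(scores) / np.sum(1.0 / (scores + eps))
median = np.median(scores)
print(arithmetic, geometric, harmonic, median)
```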
Searching Toward Pareto-Optimal Device-Aware Neural Architectures Recent breakthroughs in Neural Architecture Search (NAS) have achieved state-of-the-art performance in many tasks such as image classification and language understanding. However, most existing works only optimize for model accuracy and largely ignore other important factors imposed by the underlying hardware and devices, such as latency and energy, when making inference. In this paper, we first introduce the problem of NAS and provide a survey of recent works. Then we deep dive into two recent advancements on extending NAS into multiple-objective frameworks: MONAS and DPP-Net. Both MONAS and DPP-Net are capable of optimizing accuracy and other objectives imposed by devices, searching for neural architectures that can be best deployed on a wide spectrum of devices: from embedded systems and mobile devices to workstations. Experimental results show that architectures found by MONAS and DPP-Net achieve Pareto optimality w.r.t the given objectives for various devices.
Security and Privacy Aspects in MapReduce on Clouds: A Survey MapReduce is a programming system for distributed processing of large-scale data in an efficient and fault tolerant manner on a private, public, or hybrid cloud. MapReduce is extensively used daily around the world as an efficient distributed computation tool for a large class of problems, e.g., search, clustering, log analysis, different types of join operations, matrix multiplication, pattern matching, and analysis of social networks. Security and privacy of data and MapReduce computations are essential concerns when a MapReduce computation is executed in public or hybrid clouds. In order to execute a MapReduce job in public and hybrid clouds, authentication of mappers-reducers, confidentiality of data-computations, integrity of data-computations, and correctness-freshness of the outputs are required. Satisfying these requirements shields the operation from several types of attacks on data and MapReduce computations. In this paper, we investigate and discuss security and privacy challenges and requirements, considering a variety of adversarial capabilities, and characteristics in the scope of MapReduce. We also provide a review of existing security and privacy protocols for MapReduce and discuss their overhead issues.
Security and Privacy of Sensitive Data in Cloud Computing: A Survey of Recent Developments Cloud computing is revolutionizing many ecosystems by providing organizations with computing resources featuring easy deployment, connectivity, configuration, automation and scalability. This paradigm shift raises a broad range of security and privacy issues that must be taken into consideration. Multi-tenancy, loss of control, and trust are key challenges in cloud computing environments. This paper reviews the existing technologies and a wide array of both earlier and state-of-the-art projects on cloud security and privacy. We categorize the existing research according to the cloud reference architecture orchestration, resource control, physical resource, and cloud service management layers, in addition to reviewing the existing developments in privacy-preserving sensitive data approaches in cloud computing such as privacy threat modeling and privacy enhancing protocols and solutions.
Security for Machine Learning-based Systems: Attacks and Challenges during Training and Inference The exponential increase in dependencies between the cyber and physical world leads to an enormous amount of data which must be efficiently processed and stored. Therefore, computing paradigms are evolving towards machine learning (ML)-based systems because of their ability to efficiently and accurately process the enormous amount of data. Although ML-based solutions address the efficient computing requirements of big data, they introduce (new) security vulnerabilities into the systems, which cannot be addressed by traditional monitoring-based security measures. Therefore, this paper first presents a brief overview of various security threats in machine learning, their respective threat models and associated research challenges to develop robust security measures. To illustrate the security vulnerabilities of ML during training, inferencing and hardware implementation, we demonstrate some key security threats on ML using LeNet and VGGNet for MNIST and the German Traffic Sign Recognition Benchmark (GTSRB), respectively. Moreover, based on the security analysis of ML training, we also propose an attack that has very little impact on the inference accuracy. Towards the end, we highlight the associated research challenges in developing security measures and provide a brief overview of the techniques used to mitigate such security threats.
Security Matters: A Survey on Adversarial Machine Learning Adversarial machine learning is a fast growing research area which considers the scenarios in which machine learning systems may face potential adversarial attackers, who intentionally synthesize input data to make a well-trained model produce mistakes. It always involves a defending side, usually a classifier, and an attacking side that aims to cause incorrect output. The earliest studies of adversarial learning started in the information security area, which considers a variety of possible attacks. But the recent research focus popularized by the deep learning community places strong emphasis on how ‘imperceivable’ perturbations of normal inputs may cause dramatic mistakes by deep learning models with supposedly super-human accuracy. This paper serves to give a comprehensive introduction to a wide range of aspects of the adversarial deep learning topic, including its foundations, typical attacking and defending strategies, and some extended studies. We also share our points of view on the root cause of its existence and possible future directions of this research field.
Security-related Research in Ubiquitous Computing — Results of a Systematic Literature Review In an endeavor to reach the vision of ubiquitous computing where users are able to use pervasive services without spatial and temporal constraints, we are witnessing a fast growing number of mobile and sensor-enhanced devices becoming available. However, in order to take full advantage of the numerous benefits offered by novel mobile devices and services, we must address the related security issues. In this paper, we present results of a systematic literature review (SLR) on security-related topics in ubiquitous computing environments. In our study, we found 5165 scientific contributions published between 2003 and 2015. We applied a systematic procedure to identify the threats, vulnerabilities, attacks, as well as corresponding defense mechanisms that are discussed in those publications. While this paper mainly discusses the results of our study, the corresponding SLR protocol which provides all details of the SLR is also publicly available for download.
See, Hear, and Read: Deep Aligned Representations We capitalize on large amounts of readily-available, synchronous data to learn deep discriminative representations shared across three major natural modalities: vision, sound and language. By leveraging over a year of sound from video and millions of sentences paired with images, we jointly train a deep convolutional network for aligned representation learning. Our experiments suggest that this representation is useful for several tasks, such as cross-modal retrieval or transferring classifiers between modalities. Moreover, although our network is only trained with image+text and image+sound pairs, it can transfer between text and sound as well, a transfer the network never observed during training. Visualizations of our representation reveal many hidden units which automatically emerge to detect concepts, independent of the modality.
Seeing the forest for the trees? An investigation of network knowledge This paper assesses the empirical content of one of the most prevalent assumptions in the economics of networks literature, namely the assumption that decision makers have full knowledge about the networks they interact on. Using network data from 75 villages, we ask 4,554 individuals to assess whether five randomly chosen pairs of households in their village are linked through financial, social, and informational relationships. We find that network knowledge is low and highly localized, declining steeply with the pair’s network distance to the respondent. 46% of respondents are not even able to offer a guess about the status of a potential link between a given pair of individuals. Even when willing to offer a guess, respondents can only correctly identify the links 37% of the time. We also find that a one-step increase in the social distance to the pair corresponds to a 10pp increase in the probability of misidentifying the link. We then investigate the theoretical implications of this assumption by showing that the predictions of various models change substantially if agents behave under the more realistic assumption of incomplete knowledge about the network. Taken together, our results suggest that the assumption of full network knowledge (i) may serve as a poor approximation to the real world and (ii) is not innocuous: allowing for incomplete network knowledge may have first-order implications for a range of qualitative and quantitative results in various contexts.
Seeking evidence of absence: Reconsidering tests of model assumptions Statistical tests can only reject the null hypothesis, never prove it. However, when researchers test modeling assumptions, they often interpret the failure to reject a null of ‘no violation’ as evidence that the assumption holds. We discuss the statistical and conceptual problems with this approach. We show that equivalence/non-inferiority tests, while giving correct Type I error, have low power to rule out many violations that are practically significant. We suggest sensitivity analyses that may be more appropriate than hypothesis testing.
Selecting a Visual Analytics Application Not surprisingly, everywhere you look, software companies are adopting the terms ‘visual analytics’ and ‘interactive data visualization.’ Tools that do little more than produce charts and dashboards are now laying claim to the label. How can you tell the cleverly named from the genuine? What should you look for? It´s important to know the defining characteristics of visual analytics before you shop. This paper introduces you to the seven essential elements of true visual analytics applications.
Self-Organization and Artificial Life: A Review Self-organization has been an important concept within a number of disciplines, which Artificial Life (ALife) also has heavily utilized since its inception. The term and its implications, however, are often confusing or misinterpreted. In this work, we provide a mini-review of self-organization and its relationship with ALife, aiming at initiating discussions on this important topic with the interested audience. We first articulate some fundamental aspects of self-organization, outline its usage, and review its applications to ALife within its soft, hard, and wet domains. We also provide perspectives for further research.
Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey Large-scale labeled data are generally required to train deep neural networks in order to obtain better performance in visual feature learning from images or videos for computer vision applications. To avoid the extensive cost of collecting and annotating large-scale datasets, as a subset of unsupervised learning methods, self-supervised learning methods are proposed to learn general image and video features from large-scale unlabeled data without using any human-annotated labels. This paper provides an extensive review of deep learning-based self-supervised general visual feature learning methods from images or videos. First, the motivation, general pipeline, and terminologies of this field are described. Then the common deep neural network architectures that are used for self-supervised learning are summarized. Next, the main components and evaluation metrics of self-supervised learning methods are reviewed, followed by the commonly used image and video datasets and the existing self-supervised visual feature learning methods. Finally, quantitative performance comparisons of the reviewed methods on benchmark datasets are summarized and discussed for both image and video feature learning. Lastly, the paper concludes with a set of promising future directions for self-supervised visual feature learning.
Semantic Composition via Probabilistic Model Theory Semantic composition remains an open problem for vector space models of semantics. In this paper, we explain how the probabilistic graphical model used in the framework of Functional Distributional Semantics can be interpreted as a probabilistic version of model theory. Building on this, we explain how various semantic phenomena can be recast in terms of conditional probabilities in the graphical model. This connection between formal semantics and machine learning is helpful in both directions: it gives us an explicit mechanism for modelling context-dependent meanings (a challenge for formal semantics), and also gives us well-motivated techniques for composing distributed representations (a challenge for distributional semantics). We present results on two datasets that go beyond word similarity, showing how these semantically-motivated techniques improve on the performance of vector models.
Sentiment Analysis of Twitter Data: A Survey of Techniques With the advancement and growth of web technology, there is a huge volume of data present on the web for internet users, and a lot of data is generated too. The internet has become a platform for online learning, exchanging ideas and sharing opinions. Social networking sites like Twitter, Facebook and Google+ are rapidly gaining popularity as they allow people to share and express their views about topics, have discussions with different communities, or post messages across the world. There has been a lot of work in the field of sentiment analysis of Twitter data. This survey focuses mainly on sentiment analysis of Twitter data, which is helpful for analyzing the information in tweets where opinions are highly unstructured, heterogeneous, and either positive or negative, or neutral in some cases. In this paper, we provide a survey and a comparative analysis of existing techniques for opinion mining, such as machine learning and lexicon-based approaches, together with evaluation metrics. Using various machine learning algorithms like Naive Bayes, Max Entropy, and Support Vector Machine, we provide a study on Twitter data streams. We also discuss general challenges and applications of sentiment analysis on Twitter.
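To make the classifier comparison concrete, here is a minimal scikit-learn sketch of one baseline family the survey covers (Naive Bayes over bag-of-words features); the toy tweets and labels are invented for illustration and are not from the paper.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented corpus standing in for labelled tweets (1 = positive, 0 = negative).
tweets = ["love this phone", "worst service ever", "great game tonight", "so disappointed"]
labels = [1, 0, 1, 0]

# Bag-of-words features + multinomial Naive Bayes, one of the approaches mentioned.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(tweets, labels)

print(model.predict(["this is great", "really bad experience"]))
```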
Sentiment/Subjectivity Analysis Survey for Languages other than English Subjectivity and sentiment analysis has gained considerable attention recently. Most of the resources and systems built so far are for English. The need for designing systems for other languages is increasing. This paper surveys the different approaches used for building subjectivity and sentiment analysis systems for languages other than English. Three different types of approaches are used for building these systems. The first (and best) is language-specific systems. The second involves reusing or transferring sentiment resources from English to the target language. The third is based on language-independent methods. The paper presents a separate section devoted to Arabic sentiment analysis.
Sequence-Aware Recommender Systems Recommender systems are one of the most successful applications of data mining and machine learning technology in practice. Academic research in the field is historically often based on the matrix completion problem formulation, where for each user-item-pair only one interaction (e.g., a rating) is considered. In many application domains, however, multiple user-item interactions of different types can be recorded over time. And, a number of recent works have shown that this information can be used to build richer individual user models and to discover additional behavioral patterns that can be leveraged in the recommendation process. In this work we review existing works that consider information from such sequentially-ordered user-item interaction logs in the recommendation process. Based on this review, we propose a categorization of the corresponding recommendation tasks and goals, summarize existing algorithmic solutions, discuss methodological approaches when benchmarking what we call sequence-aware recommender systems, and outline open challenges in the area.
Sequences, yet Functions: The Dual Nature of Data-Stream Processing Data-stream processing has continuously risen in importance as the amount of available data has been steadily increasing over the last decade. Besides traditional domains such as data-center monitoring and click analytics, there is an increasing number of network-enabled production machines that generate continuous streams of data. Due to their continuous nature, queries on data-streams can be more complex, and distinctly harder to understand than database queries. As users have to consider operational details, maintenance and debugging become challenging. Current approaches model data-streams as sequences, because this is the way they are physically received. These models result in an implementation-focused perspective. We explore an alternate way of modeling data-streams by focusing on time-slicing semantics. This focus results in a model based on functions, which is better suited for reasoning about query semantics. By adapting the definitions of relevant concepts in stream processing to our model, we illustrate the practical usefulness of our approach. Thereby, we link data-streams and query primitives to concepts in functional programming and mathematics. Most noteworthy, we prove that data-streams are monads, and show how to derive monad definitions for current data-stream models. We provide an abstract, yet practical perspective on data-stream related subjects based on a sound, consistent query model. Our work can serve as a solid foundation for future data-stream query languages.
Sequential Combining of Expert Information Using Mathematica In every real-world domain where reasoning under uncertainty is required, combining information from different sources (‘experts´) can be a really powerful tool to enhance the accuracy and precision of the ‘final´ estimate of the unknown quantity. The Bayesian paradigm offers a coherent perspective which can be used to address the problem, but an issue strictly related to information combining is how to perform an efficient process of sequential consulting: at each stage, the investigator can select the ‘best´ expert to be consulted and choose whether to stop or continue the consulting. The aim of this paper is to rephrase the Bayesian combining algorithm in a sequential context and use Mathematica to implement suitable selection and stopping rules.
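A minimal sketch (in Python rather than Mathematica, with invented numbers) of the kind of sequential consulting the abstract describes: each expert supplies a Gaussian estimate of the unknown quantity, the posterior is updated after every consultation, and consulting stops once the posterior is precise enough. The specific prior, selection order and stopping threshold are assumptions for illustration, not the paper's rules.

```python
# Sequential combining of Gaussian expert opinions via precision-weighted updates.
experts = [(10.2, 2.0), (9.5, 1.0), (10.8, 0.5)]     # (estimate, std. deviation) per expert

mu, var = 0.0, 100.0                                  # vague Gaussian prior on the quantity
for est, sd in sorted(experts, key=lambda e: e[1]):   # consult the most precise expert first
    prec = 1.0 / var + 1.0 / sd**2                    # posterior precision after this expert
    mu = (mu / var + est / sd**2) / prec
    var = 1.0 / prec
    print(f"posterior mean={mu:.2f}, sd={var**0.5:.2f}")
    if var**0.5 < 0.6:                                # stop once the posterior is precise enough
        break
```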
Sequential Pattern Mining – Approaches and Algorithms Sequences of events, items or tokens occurring in an ordered metric space appear often in data and the requirement to detect and analyse frequent subsequences is a common problem. Sequential Pattern Mining arose as a sub-field of data mining to focus on this field. This paper surveys the approaches and algorithms proposed to date.
Serverless Computing: Current Trends and Open Problems Serverless computing has emerged as a new compelling paradigm for the deployment of applications and services. It represents an evolution of cloud programming models, abstractions, and platforms, and is a testament to the maturity and wide adoption of cloud technologies. In this chapter, we survey existing serverless platforms from industry, academia, and open source projects, identify key characteristics and use cases, and describe technical challenges and open problems.
Set optimization – a rather short introduction Recent developments in set optimization are surveyed and extended including various set relations as well as fundamental constructions of a convex analysis for set- and vector-valued functions, and duality for set optimization problems. Extensive sections with bibliographical comments summarize the state of the art. Applications to vector optimization and financial risk measures are discussed along with algorithmic approaches to set optimization problems.
Seven Myths in Machine Learning Research We present seven myths commonly believed to be true in machine learning research, circa Feb 2019. This is an archival copy of the blog post at https://…/seven-myths-in-machine-learning-research
Shallow and Deep Networks Intrusion Detection System: A Taxonomy and Survey Intrusion detection has attracted considerable interest from researchers and industry. The community, after many years of research, still faces the problem of building reliable and efficient IDS that are capable of handling large quantities of data with changing patterns in real-time situations. The work presented in this manuscript classifies intrusion detection systems (IDS). Moreover, a taxonomy and survey of shallow and deep networks intrusion detection systems is presented based on previous and current works. This taxonomy and survey reviews machine learning techniques and their performance in detecting anomalies. Feature selection, which influences the effectiveness of machine learning (ML) IDS, is discussed to explain the role of feature selection in the classification and training phase of ML IDS. Finally, a discussion of the false and true positive alarm rates is presented to help researchers model reliable and efficient machine learning based intrusion detection systems.
Shannon’s entropy and its Generalizations towards Statistics, Reliability and Information Science during 1948-2018 Starting from the pioneering works of Shannon and Wiener in 1948, a plethora of works have been reported on entropy in different directions. Entropy-related review work in the direction of statistics, reliability and information science has, to the best of our knowledge, not been reported so far. Here we have tried to collect all possible works in this direction during the period 1948-2018, so that people interested in entropy, especially new researchers, can benefit.
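For reference, a minimal sketch of the quantity this review is organized around, the Shannon entropy of a discrete distribution; the example probabilities are arbitrary.

```python
import numpy as np

def shannon_entropy(p, base=2):
    """H(p) = -sum_i p_i log p_i, ignoring zero-probability outcomes."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p)) / np.log(base)

print(shannon_entropy([0.5, 0.25, 0.25]))   # 1.5 bits
print(shannon_entropy([1.0]))               # 0.0 bits: a certain outcome carries no information
```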
Shiny Cheat Sheet (Cheat Sheet)
Short Text Topic Modeling Techniques, Applications, and Performance: A Survey Inferring discriminative and coherent latent topics from short texts is a critical and fundamental task, since many real-world applications require semantic understanding of short texts. Traditional long-text topic modeling algorithms (e.g., PLSA and LDA) based on word co-occurrences cannot solve this problem very well, since only very limited word co-occurrence information is available in short texts. Therefore, short text topic modeling has already attracted much attention from the machine learning research community in recent years, aiming to overcome the problem of sparseness in short texts. In this survey, we conduct a comprehensive review of various short text topic modeling techniques proposed in the literature. We present three categories of methods, based on Dirichlet multinomial mixture, global word co-occurrences, and self-aggregation, with examples of representative approaches in each category and an analysis of their performance on various tasks. We develop the first comprehensive open-source library, called STTM, written in Java, which integrates all surveyed algorithms within a unified interface along with benchmark datasets, to facilitate the expansion of new methods in this research field. Finally, we evaluate these state-of-the-art methods on many real-world datasets and compare their performance against one another and against long-text topic modeling algorithms.
Siamese Learning Visual Tracking: A Survey This survey reviews the machine learning and stochastic techniques that existing work currently uses for the challenging problem of visual tracking. It is not intended to cover the whole tracking literature of recent decades, as this seems impossible given the incredibly vast number of published papers. This first draft version of the article focuses specifically on recent literature that suggests Siamese networks for learning to track. This approach promises a step forward in terms of robustness, accuracy and computational efficiency. For example, the representative tracker SINT currently performs best on the popular OTB-2013 benchmark with AuC/IoU/prec. 65.5/62.5/84.8 % for the one-pass experiment (OPE). The CVPR’17 work CVNet by the Oxford group shows the approach’s large potential for HW/SW co-design, with network memory needs around 600 kB and frame-rates of 75 fps and beyond. Before a detailed description of this approach is given, the article recaps the definition of tracking, the current state-of-the-art view on designing algorithms and the state of the art of trackers by summarising insights from existing literature. In future, the article will be extended by a review of two alternative approaches: one using very general recurrent networks such as Long Short-term Memory (LSTM) networks, and the other, most obvious, approach of applying convolutional networks (CNN) alone, the earliest approach since the idea of deep learning tracking appeared at NIPS’13.
Silence The cost of communication is a substantial factor affecting the scalability of many distributed applications. Every message sent can incur a cost in storage, computation, energy and bandwidth. Consequently, reducing the communication costs of distributed applications is highly desirable. The best way to reduce message costs is by communicating without sending any messages whatsoever. This paper initiates a rigorous investigation into the use of silence in synchronous settings, in which processes can fail. We formalize sufficient conditions for information transfer using silence, as well as necessary conditions for particular cases of interest. This allows us to identify message patterns that enable communication through silence. In particular, a pattern called a ‘silent choir’ is identified, and shown to be central to information transfer via silence in failure-prone systems. The power of the new framework is demonstrated on the ‘atomic commitment’ problem (AC). A complete characterization of the tradeoff between message complexity and round complexity in the synchronous model with crash failures is provided, in terms of lower bounds and matching protocols. In particular, a new message-optimal AC protocol is designed using silence, in which processes decide in 3 rounds in the common case. This significantly improves on the best previously known message-optimal AC protocol, in which decisions were performed in Θ(n) rounds.
Simple Ain´t Easy: Real-World Problems with Basic Summary Statistics In applied statistical work, the use of even the most basic summary statistics, like means, medians and modes, can be seriously problematic. When forced to choose a single summary statistic, many considerations come into play. This repo attempts to describe some of the non-obvious properties possessed by standard statistical methods so that users can make informed choices about methods.
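A small illustration (invented data) of the kind of pitfall the repo documents: a single outlier moves the mean far from the bulk of the data while the median barely changes.

```python
import statistics

incomes = [31, 33, 35, 36, 38, 40, 42]           # invented, roughly symmetric sample
with_outlier = incomes + [900]                    # one extreme observation

print(statistics.mean(incomes), statistics.median(incomes))            # ~36.43, 36
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 144.375, 37.0
```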
Simulation optimization: A review of algorithms and applications Simulation Optimization (SO) refers to the optimization of an objective function subject to constraints, both of which can be evaluated through a stochastic simulation. To address specific features of a particular simulation—discrete or continuous decisions, expensive or cheap simulations, single or multiple outputs, homogeneous or heterogeneous noise—various algorithms have been proposed in the literature. As one can imagine, there exist several competing algorithms for each of these classes of problems. This document emphasizes the difficulties in simulation optimization as compared to mathematical programming, makes reference to state-of-the-art algorithms in the field, examines and contrasts the different approaches used, reviews some of the diverse applications that have been tackled by these methods, and speculates on future directions in the field.
Small Data Challenges in Big Data Era: A Survey of Recent Progress on Unsupervised and Semi-Supervised Methods Small data challenges have emerged in many learning problems, since the success of deep neural networks often relies on the availability of a huge amount of labeled data that is expensive to collect. To address it, many efforts have been made on training complex models with small data in an unsupervised and semi-supervised fashion. In this paper, we will review the recent progress on these two major categories of methods. A wide spectrum of small data models will be categorized in a big picture, where we will show how they interplay with each other to motivate explorations of new ideas. We will review the criteria of learning the transformation equivariant, disentangled, self-supervised and semi-supervised representations, which underpin the foundations of recent developments. Many instantiations of unsupervised and semi-supervised generative models have been developed on the basis of these criteria, greatly expanding the territory of existing autoencoders, generative adversarial nets (GANs) and other deep networks by exploring the distribution of unlabeled data for more powerful representations. While we focus on the unsupervised and semi-supervised methods, we will also provide a broader review of other emerging topics, from unsupervised and semi-supervised domain adaptation to the fundamental roles of transformation equivariance and invariance in training a wide spectrum of deep networks. It is impossible for us to write an exhaustive encyclopedia to include all related works. Instead, we aim at exploring the main ideas, principles and methods in this area to reveal where we are heading on the journey towards addressing the small data challenges in this big data era.
Small Sample Learning in Big Data Era As a promising area in artificial intelligence, a new learning paradigm, called Small Sample Learning (SSL), has been attracting prominent research attention in recent years. In this paper, we aim to present a survey to comprehensively introduce the current techniques proposed on this topic. Specifically, current SSL techniques can be mainly divided into two categories. The first category of SSL approaches can be called ‘concept learning’, which emphasizes learning new concepts from only few related observations. The purpose is mainly to simulate human learning behaviors like recognition, generation, imagination, synthesis and analysis. The second category is called ‘experience learning’, which usually co-exists with the large sample learning manner of conventional machine learning. This category mainly focuses on learning with insufficient samples, and can also be called small data learning in some of the literature. More extensive surveys on both categories of SSL techniques are introduced and some neuroscience evidence is provided to clarify the rationality of the entire SSL regime, and the relationship with the human learning process. Some discussions on the main challenges and possible future research directions along this line are also presented.
Smart Data – Innovationen aus Daten The German Federal Ministry for Economic Affairs and Technology will use the technology competition ‘Smart Data – Innovationen aus Daten’ (Smart Data – Innovations from Data) to fund research and development (R&D) activities that sustainably open up the emerging big data market for businesses located in Germany. Studies forecast a rapid rise in worldwide big data revenues to more than 15 billion euros in 2016 (Germany: 1.6 billion euros). Germany has good prospects of taking an internationally leading role in scalable data management and analysis systems. Established companies in the German IT industry, numerous research institutions and various start-ups are already active in the big data field. ‘Smart Data’ is intended to put a particular emphasis on the development of innovative services in order to drive early and broad adoption. The commercial exploitation of big data technologies is still largely in its infancy and is concentrated in a few specific areas such as online advertising and e-commerce in larger companies and organizations. The resulting solutions are expected to be readily adopted by industry because of their manageability, particularly with respect to data security and data quality. In particular, the R&D activities are meant to produce innovative system solutions for small and medium-sized enterprises (SMEs). ‘Smart Data’ stands for an application-oriented perspective that goes beyond pure technology development and enables SMEs, too, to use and exploit mass data in an attractive and legally sound way. This also includes addressing the fundamental framework conditions, e.g. the legal framework for the use of big data. The programme seeks lighthouse projects that remove technical, structural, organizational and legal obstacles to the use of big data technologies. The projects should be located in the application areas of industry, mobility, energy and health. The technology programme ‘Smart Data’ follows the objectives of the federal government’s ICT strategy ‘Deutschland Digital 2015’ as well as those of the future project ‘Internet-based services for the economy’ within the Hightech-Strategie 2020, and is therefore of substantial federal interest. The programme builds on important base technologies and standards underlying big data that have been, or are still being, developed in other BMWi technology programmes such as THESEUS, Trusted Cloud, Autonomik für Industrie 4.0, Elektromobilität and E-Energy. Synergies with the programme ‘Management und Analyse großer Datenmengen (Big Data)’ funded by the Federal Ministry of Education and Research (BMBF), or with corresponding programmes of the European Commission, are welcome.
Social Internet of Things and New Generation Computing — A Survey Social Internet of Things (SIoT) tries to overcome the challenges of Internet of Things (IoT) such as scalability, trust and discovery of resources, by inspiration from social computing. This survey aims to investigate the research done on SIoT from two perspectives including application domain and the integration to the new computing models. For this, a two-dimensional framework is proposed and the projects are investigated, accordingly. The first dimension considers and classifies available research from the application domain perspective and the second dimension performs the same from the integration to new computing models standpoint. The aim is to technically describe SIoT, to classify related research, to foster the dissemination of state-of-the-art, and to discuss open research directions in this field.
Social Media-based User Embedding: A Literature Review Automated representation learning is behind many recent success stories in machine learning. It is often used to transfer knowledge learned from a large dataset (e.g., raw text) to tasks for which only a small number of training examples are available. In this paper, we review recent advances in learning to represent social media users in low-dimensional embeddings. The technology is critical for creating high-performance social media-based human traits and behavior models, since the ground truth for assessing latent human traits and behavior is often expensive to acquire at a large scale. In this survey, we review typical methods for learning a unified user embedding from heterogeneous user data (e.g., combining social media texts with images to learn a unified user representation). Finally, we point out some current issues and future directions.
Social Network Fusion and Mining: A Survey Looking from a global perspective, the landscape of online social networks is highly fragmented. A large number of online social networks have appeared, which can provide users with various types of services. Generally, the information available in these online social networks is of diverse categories, which can be represented formally as heterogeneous social networks (HSN). Meanwhile, in such an age of online social media, users usually participate in multiple online social networks simultaneously to enjoy more social network services, and can act as bridges connecting different networks together. So multiple HSNs not only represent information in a single network, but also fuse information from multiple networks. Formally, online social networks sharing common users are named aligned social networks, and the shared users who act like anchors aligning the networks are called anchor users. The heterogeneous information generated by users’ social activities in the multiple aligned social networks provides social network practitioners and researchers with the opportunity to study individual users’ social behaviors across multiple social platforms simultaneously. This paper presents a comprehensive survey of the latest research works on multiple aligned HSN studies based on the broad learning setting, covering 5 major research tasks: network alignment, link prediction, community detection, information diffusion and network embedding.
Software Alchemy: Turning Complex Statistical Computations into Embarrassingly-Parallel Ones The growth in the use of computationally intensive statistical procedures, especially with big data, has necessitated the usage of parallel computation on diverse platforms such as multicore, GPUs, clusters and clouds. However, slowdown due to interprocess communication costs typically limits such methods to ’embarrassingly parallel’ (EP) algorithms, especially on non-shared memory platforms. This paper develops a broadly applicable method for converting many non-EP algorithms into statistically equivalent EP ones. The method is shown to yield excellent levels of speedup for a variety of statistical computations. It also overcomes certain problems of memory limitations.
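A minimal sketch of a "chunk and average" conversion in the spirit of this abstract (my paraphrase, not necessarily the paper's exact procedure): split the data into independent chunks, run the original estimator on each chunk in parallel, and average the per-chunk estimates. The estimator and data below are invented placeholders.

```python
import numpy as np
from multiprocessing import Pool

def estimator(chunk):
    # Stand-in for a possibly expensive, non-parallel statistical estimator.
    return np.mean(chunk)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.exponential(scale=2.0, size=100_000)    # invented data
    chunks = np.array_split(data, 8)                    # embarrassingly parallel pieces
    with Pool(4) as pool:
        per_chunk = pool.map(estimator, chunks)         # run the estimator per chunk
    print("chunk-averaged estimate:", np.mean(per_chunk))
    print("full-data estimate:     ", estimator(data))
```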
Software Engineers vs. Machine Learning Algorithms: An Empirical Study Assessing Performance and Reuse Tasks Several recent papers report on applying machine learning (ML) to the automation of software engineering (SE) tasks, such as project management, modeling and development. However, there appear to be no approaches comparing how software engineers fare against machine-learning algorithms as applied to specific software development tasks. Such a comparison is essential to gain insight into which tasks are better performed by humans and which by machine learning, and how cooperative work or human-in-the-loop processes can be implemented more effectively. In this paper, we present an empirical study that compares how software engineers and machine-learning algorithms perform on performance and reuse tasks. The empirical study involves the synthesis of the control structure of an autonomous streetlight application. Our approach consists of four steps. First, we solved the problem using machine learning to determine specific performance and reuse tasks. Second, we asked software engineers with different domain knowledge levels to provide a solution to the same tasks. Third, we compared how software engineers fare against machine-learning algorithms when accomplishing the performance and reuse tasks based on criteria such as energy consumption and safety. Finally, we analyzed the results to understand which tasks are better performed by either humans or algorithms so that they can work together more effectively. Such an understanding and the resulting human-in-the-loop approaches, which take into account the strengths and weaknesses of humans and machine-learning algorithms, are fundamental not only to provide a basis for cooperative work in support of software engineering, but also in other areas.
Software Escalation Prediction with Data Mining One of the most severe manifestations of poor quality of software products occurs when a customer ‘escalates’ a defect: an escalation is triggered when a defect significantly impacts a customer’s operations. Escalated defects are then quickly resolved, at a high cost, outside of the general product release engineering cycle. While the software vendor and its customers often detect and report defects before they are escalated, it is not always possible to quickly and accurately prioritize reported defects for resolution. As a result, even previously known defects, in addition to newly discovered defects, are often escalated by customers. The labor cost of escalations from known defects to a software vendor can amount to millions of dollars per year. The total costs to the vendor are even greater, including loss of reputation, satisfaction, loyalty, and repeat revenue. The objective of Escalation Prediction (EP) is to avoid escalations from known product defects by predicting and proactively resolving those known defects that have the highest escalation risk. This short paper outlines the business case for EP, an analysis of the business problem, the solution architecture, and some preliminary validation results on the effectiveness of EP.
SoK: Applying Machine Learning in Security – A Survey The idea of applying machine learning (ML) to solve problems in security domains is almost 3 decades old. As information and communications grow more ubiquitous and more data become available, many security risks arise, as well as the appetite to manage and mitigate such risks. Consequently, research on applying and designing ML algorithms and systems for security has grown fast, ranging from intrusion detection systems (IDS) and malware classification to security policy management (SPM) and information leak checking. In this paper, we systematically study the methods, algorithms, and system designs in academic publications from 2008-2015 that applied ML in security domains. 98 percent of the surveyed papers appeared in the 6 highest-ranked academic security conferences and 1 conference known for pioneering ML applications in security. We examine the generalized system designs, underlying assumptions, measurements, and use cases in active research. Our examinations lead to 1) a taxonomy on ML paradigms and security domains for future exploration and exploitation, and 2) an agenda detailing open and upcoming challenges. Based on our survey, we also suggest a point of view that treats security as a game theory problem instead of a batch-trained ML problem.
Solutions Big Data IBM (Slide Deck)
Solving Differential Equations in R Although R is still predominantly applied for statistical analysis and graphical representation, it is rapidly becoming more suitable for mathematical computing. One of the fields where considerable progress has been made recently is the solution of differential equations. Here we give a brief overview of differential equations that can now be solved by R.
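The entry above concerns R packages; as a language-neutral illustration of the same task it describes (numerically integrating an ordinary differential equation), here is a SciPy-based sketch. The logistic-growth equation, parameters and initial value are arbitrary choices for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Logistic growth dy/dt = r * y * (1 - y / K); r, K and y0 are arbitrary.
def logistic(t, y, r=0.5, K=10.0):
    return r * y * (1.0 - y / K)

sol = solve_ivp(logistic, t_span=(0.0, 20.0), y0=[0.1], t_eval=np.linspace(0, 20, 5))
print(sol.t)       # evaluation times
print(sol.y[0])    # solution values approaching the carrying capacity K
```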
Some Class-Participation Demonstrations for Decision Theory and Bayesian Statistics
Some models and methods for the analysis of observational data This article provides a short, concise and essentially self-contained exposition of some of the most important models and methods for the analysis of observational data, and a substantial number of illustrations of their application. Although for the most part our presentation follows P. Rosenbaum´s book, ‘Observational Studies’, and naturally draws on related literature, it contains original elements and simplifies and generalizes some basic results. The illustrations, based on simulated data, show the methods at work in some detail, highlighting pitfalls and emphasizing certain subjective aspects of the statistical analyses.
Some techniques in density estimation Density estimation is an interdisciplinary topic at the intersection of statistics, theoretical computer science and machine learning. We review some old and new techniques for bounding sample complexity of estimating densities of continuous distributions, focusing on the class of mixtures of Gaussians and its subclasses.
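A minimal sketch of the estimation problem the review above studies, fitting a two-component Gaussian mixture to samples; the data and component count are invented for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Invented sample drawn from a two-component mixture.
data = np.concatenate([rng.normal(-2.0, 0.5, 500),
                       rng.normal(3.0, 1.0, 500)]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, random_state=0).fit(data)
print("weights:", gm.weights_)
print("means:  ", gm.means_.ravel())
print("estimated density at 0:", np.exp(gm.score_samples([[0.0]]))[0])
```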
Sorting with GPUs: A Survey Sorting is a fundamental operation in computer science and is a bottleneck in many important fields. Sorting is critical to database applications, online search and indexing, biomedical computing, and many other applications. The explosive growth in computational power and availability of GPU coprocessors has allowed sort operations on GPUs to be done much faster than on any equivalently priced CPU. Current trends in GPU computing show that this explosive growth in GPU capabilities is likely to continue for some time. As such, there is a need to develop algorithms to effectively harness the power of GPUs for crucial applications such as sorting.
Sparse Principal Component Analysis Principal component analysis (PCA) is widely used in data processing and dimensionality reduction. However, PCA suffers from the fact that each principal component is a linear combination of all the original variables, thus it is often difficult to interpret the results. We introduce a new method called sparse principal component analysis (SPCA) using the lasso (elastic net) to produce modified principal components with sparse loadings. We first show that PCA can be formulated as a regression-type optimization problem; sparse loadings are then obtained by imposing the lasso (elastic net) constraint on the regression coefficients. Efficient algorithms are proposed to fit our SPCA models for both regular multivariate data and gene expression arrays. We also give a new formula to compute the total variance of modified principal components. As illustrations, SPCA is applied to real and simulated data with encouraging results.
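A small sketch contrasting ordinary loadings with sparse ones, using scikit-learn's SparsePCA (an l1-penalized variant; not necessarily the exact algorithm of the paper above). The random data and penalty are invented for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
X[:, 0] += 3 * X[:, 1]                      # invented correlation between two variables

pca = PCA(n_components=2).fit(X)
spca = SparsePCA(n_components=2, alpha=1.0, random_state=0).fit(X)

print("dense PCA loadings:\n", np.round(pca.components_, 2))
print("sparse PCA loadings:\n", np.round(spca.components_, 2))
```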
Spatial interpolation: Techniques for spatial data analysis (Slide Deck)
Spatio-temporal Action Recognition: A Survey The task of action recognition or action detection involves analyzing videos and determining what action or motion is being performed. The primary subjects of these videos are predominantly humans performing some action. However, this requirement can be relaxed to generalize over other subjects such as animals or robots. The applications can range anywhere from human-computer interaction to automated video editing proposals. When we consider spatio-temporal action recognition, we deal with action localization. This task not only involves determining what action is being performed but also when and where it is being performed in the video. This paper aims to survey the plethora of approaches and algorithms attempted to solve this task, give a comprehensive comparison between them, explore various datasets available for the problem, and determine the most promising approaches.
Spatio-Temporal Clustering: A Survey Spatio-temporal clustering is a process of grouping objects based on their spatial and temporal similarity. It is a relatively new subfield of data mining which gained high popularity especially in geographic information sciences due to the pervasiveness of all kinds of location-based or environmental devices that record position, time and/or environmental properties of an object or set of objects in real time. As a consequence, different types and large amounts of spatio-temporal data became available that introduce new challenges to data analysis and require novel approaches to knowledge discovery. In this chapter we concentrate on spatio-temporal clustering in geographic space. First, we provide a classification of different types of spatio-temporal data. Then, we focus on one type of spatio-temporal clustering – trajectory clustering – provide an overview of the state-of-the-art approaches and methods of spatio-temporal clustering, and finally present several scenarios in different application domains such as movement, cellular networks and environmental studies.
Spatio-Temporal Data Mining: A Survey of Problems and Methods Large volumes of spatio-temporal data are increasingly collected and studied in diverse domains including climate science, social sciences, neuroscience, epidemiology, transportation, mobile health, and Earth sciences. Spatio-temporal data differs from the relational data for which computational approaches have been developed in the data mining community over multiple decades, in that both spatial and temporal attributes are available in addition to the actual measurements/attributes. The presence of these attributes introduces additional challenges that need to be dealt with. Approaches for mining spatio-temporal data have been studied for over a decade in the data mining community. In this article we present a broad survey of this relatively young field of spatio-temporal data mining. We discuss different types of spatio-temporal data and the relevant data mining questions that arise in the context of analyzing each of these datasets. Based on the nature of the data mining problem studied, we classify literature on spatio-temporal data mining into six major categories: clustering, predictive learning, change detection, frequent pattern mining, anomaly detection, and relationship mining. We discuss the various forms of spatio-temporal data mining problems in each of these categories.
Spectral Clustering and Block Models: A Review And A New Algorithm We focus on spectral clustering of unlabeled graphs and review some results on clustering methods which achieve weak or strong consistent identification in data generated by block models. We also present a new algorithm which appears to perform optimally both theoretically (using asymptotic theory) and empirically.
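A minimal sketch of spectral clustering applied to a two-block stochastic block model, in the spirit of the entry above: embed nodes using the leading eigenvectors of the adjacency matrix and cluster the embedding with k-means. The block sizes and edge probabilities are invented.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
n, p_in, p_out = 100, 0.30, 0.05                  # invented block-model parameters
labels = np.repeat([0, 1], n // 2)                # planted community memberships
P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
A = rng.random((n, n)) < P
A = np.triu(A, 1); A = (A + A.T).astype(float)    # symmetric adjacency, no self-loops

vals, vecs = np.linalg.eigh(A)
embedding = vecs[:, -2:]                          # two leading eigenvectors
pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embedding)
print("agreement with planted blocks:",
      max(np.mean(pred == labels), np.mean(pred != labels)))
```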
Spectral Theory of Unsigned and Signed Graphs. Applications to Graph Clustering: a Survey This is a survey of the method of graph cuts and its applications to graph clustering of weighted unsigned and signed graphs. I provide a fairly thorough treatment of the method of normalized graph cuts, a deeply original method due to Shi and Malik, including complete proofs. The main thrust of this paper is the method of normalized cuts. I give a detailed account for K = 2 clusters, and also for K > 2 clusters, based on the work of Yu and Shi. I also show how both graph drawing and normalized cut K-clustering can be easily generalized to handle signed graphs, which are weighted graphs in which the weight matrix W may have negative coefficients. Intuitively, negative coefficients indicate distance or dissimilarity. The solution is to replace the degree matrix by the matrix in which absolute values of the weights are used, and to replace the Laplacian by the Laplacian with the new degree matrix of absolute values. As far as I know, the generalization of K-way normalized clustering to signed graphs is new. Finally, I show how the method of ratio cuts, in which a cut is normalized by the size of the cluster rather than its volume, is just a special case of normalized cuts.
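To make the signed-graph modification described above concrete, the following sketch builds a normalized Laplacian in which the degree matrix is computed from absolute edge weights; the tiny weight matrix is invented, and this is only an illustration of the construction, not of the full clustering pipeline.

```python
import numpy as np

# Invented signed weight matrix; negative entries indicate dissimilarity.
W = np.array([[ 0.0,  1.0, -0.5],
              [ 1.0,  0.0,  0.8],
              [-0.5,  0.8,  0.0]])

d_abs = np.abs(W).sum(axis=1)                    # degrees from absolute weight values
D_inv_sqrt = np.diag(1.0 / np.sqrt(d_abs))
L_signed = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt   # signed normalized Laplacian

vals, vecs = np.linalg.eigh(L_signed)
print("eigenvalues:", np.round(vals, 3))
```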
Spectrum Sharing for Internet of Things: A Survey The Internet of Things (IoT) is a promising paradigm to accommodate massive device connections in 5G and beyond. To pave the way for future IoT, the spectrum should be planned in advance. Spectrum sharing is a preferable solution for IoT due to the scarcity of available spectrum resources. In particular, mobile operators are inclined to exploit the existing standards and infrastructures of current cellular networks and deploy IoT within licensed cellular spectrum. Yet, proprietary companies prefer to deploy IoT within unlicensed spectrum to avoid any licence fee. In this paper, we provide a survey on prevalent IoT technologies deployed within licensed cellular spectrum and unlicensed spectrum. Notably, emphasis will be on the spectrum sharing solutions including the shared spectrum, interference model, and interference management. To this end, we discuss both advantages and disadvantages of different IoT technologies. Finally, we identify challenges for future IoT and suggest potential research directions.
SQL-on-Hadoop Engines Explained Big Data And Hadoop – Hadoop is being regarded as one of the best platforms for storing and managing big data. It owes its success to its high data storage and processing scalability, low price/performance ratio, high performance, high availability, high schema flexibility, and its capability to handle all types of data. Unfortunately, Hadoop APIs, such as HDFS, MapReduce, and HBase, are quite complex. They require expertise in Java programming (or similar languages) and require in-depth knowledge of how to parallelize query processing efficiently. The downsides of these interfaces are a small target audience, low productivity, and limited tool support. The Need For SQL-on-Hadoop Engines – What is needed is a programming interface that retains HDFS´s performance and scalability, offers high productivity and maintainability, is known to non-technical users, and can be used by many reporting and analytical tools. The obvious choice is evidently SQL. SQL is a high-level, declarative, and standardized database language, it´s familiar to countless BI specialists, it´s supported by almost all reporting and analytical tools, and has proven its worth over and over again. To offer SQL on Hadoop, SQL query engines are needed that can query and manipulate data stored in HDFS or HBase. Such products are called SQL-on-Hadoop engines. Lately, the popularity of SQL-on-Hadoop engines has been growing rapidly. Here are just a few of the many SQL-on-Hadoop engines available: Apache Drill, Apache Hive, CitusDB, Cloudera Impala, Concurrent Lingual, Hadapt, HP Vertica, InfiniDB, JethroData, MemSQL, Pivotal HAWQ, Progress DataDirect, ScleraDB, Shark, and SpliceMachine. On the outside most of the SQL-on-Hadoop engines look alike. They all support some SQL dialect that can be invoked through ODBC or JDBC. Internally, they can be very different. The differences stem from the purpose for which they have been designed. Here are some potential use cases for which they may have been designed: • batch-oriented query environment (data mining) • interactive query environment (OLAP, self-service BI, data visualization) • point-queries (retrieving and manipulating individual objects) • investigative analytics (data science) • operational intelligence (real-time analytics) • transactional (production systems) Undesired Big Data Silos – Most Hadoop-based systems have been designed and developed by organizations for one or two use cases. The workload characteristics of these use cases are usually massive data load and execution of non-interactive, complex forms of analytics. However, Hadoop implementations can support other use cases, including interactive reporting, data stream processing, transactional processing, and text search. The growing availability of SQL-on-Hadoop engines has widened the range of use cases of Hadoop even more. Unfortunately, when deployed for a different use case, a specific Hadoop implementation may be unsuitable with regard to functionality or performance. Development of another use case may force an organization to develop a second solution in which data is stored again. In the long run, this results in many data management platforms, each one designed and optimized to support a limited number of use cases. Finally, this leads to undesirable big data silos. The disadvantages of having big data silos are: high costs because of data duplication, high data latency, complex data replication solutions, and data quality problems.
Silos may work well temporarily, but history has shown that eventually the users of these silos will want to combine data from multiple data sources. When this happens, each application is extended to access multiple data sources. This leads to a dedicated integration solution for each one of them. The result is another undesired solution: an integration labyrinth. For an organization it´s almost impossible to guarantee that all these integration solutions are correct, efficient, and lead to consistent results. The Need For One Data Management Platform – The ROI on all big data stored in Hadoop is increased when it´s made available for as wide a range of use cases as possible, including all the new use cases offered by the SQL‐on‐Hadoop engines. What is needed is one Hadoop data management platform that has been designed to support all the current and future use cases, so that the need for duplication of all that big data is minimized and that the development of big data silos and an integration labyrinth is avoided. The Whitepaper – This whitepaper explains what SQL‐on‐Hadoop engines are, what the technological challenges are, and what potential use cases of SQL‐on‐Hadoop are. Besides a high‐level comparison of several of these engines, it also contains a detailed description of Apache Drill that brings to light some of the pertinent issues in providing SQL capabilities on big data. In addition, the MapR Technologies data management platform M7 is also described as an example of a big data platform that can support many different use cases.
SQLScript: Efficiently Analyzing Big Enterprise Data in SAP HANA Today, not only Internet companies such as Google, Facebook or Twitter have Big Data; Enterprise Information Systems also store an ever-growing amount of data (called Big Enterprise Data in this paper). In a classical SAP system landscape a central data warehouse (SAP BW) is used to integrate and analyze all enterprise data. In SAP BW most of the business logic required for complex analytical tasks (e.g., a complex currency conversion) is implemented in the application layer on top of a standard relational database. While such an architecture is independent of the underlying database, it has two major drawbacks when analyzing Big Enterprise Data: (1) algorithms in ABAP do not scale with the amount of data and (2) data shipping is required. To this end, we present a novel programming language called SQLScript to efficiently support complex and scalable analytical tasks inside SAP´s new main-memory database HANA. SQLScript provides two major extensions to the SQL dialect of SAP HANA: a functional and a procedural extension. While the functional extension allows the definition of scalable analytical tasks on Big Enterprise Data, the procedural extension provides imperative constructs to orchestrate the analytical tasks. The major contributions of this paper are two novel functional extensions: first, an extended version of the MapReduce programming model for supporting parallelizable user-defined functions (UDFs); second, compared to recursion in the SQL standard, a generalized version of recursion to support graph analytics as well as machine learning tasks.
Stacked Graphs – Geometry and Aesthetics In February 2008, the New York Times published an unusual chart of box office revenues for 7500 movies over 21 years. The chart was based on a similar visualization, developed by the first author, that displayed trends in music listening. This paper describes the design decisions and algorithms behind these graphics, and discusses the reaction on the Web. We suggest that this type of complex layered graph is effective for displaying large data sets to a mass audience. We provide a mathematical analysis of how this layered graph relates to traditional stacked graphs and to techniques such as ThemeRiver, showing how each method is optimizing a different ‘energy function’. Finally, we discuss techniques for coloring and ordering the layers of such graphs. Throughout the paper, we emphasize the interplay between considerations of aesthetics and legibility.
Stan: A Probabilistic Programming Language Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.2.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectation propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can be called from the command line, through R using the RStan package, or through Python using the PyStan package. All three interfaces support sampling or optimization-based inference and analysis, and RStan and PyStan also provide access to log probabilities, gradients, Hessians, and I/O transforms.
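A minimal usage sketch through the PyStan interface mentioned above, shown here in the PyStan 2-style API with a standard toy coin-flip model; interface details vary across PyStan versions, and the model and data are illustrative assumptions only.

```python
import pystan  # PyStan 2.x-style interface; newer releases expose a different API

# A tiny Stan model: Bernoulli observations with a uniform Beta(1,1) prior on theta.
model_code = """
data { int<lower=0> N; int<lower=0,upper=1> y[N]; }
parameters { real<lower=0,upper=1> theta; }
model { theta ~ beta(1, 1); y ~ bernoulli(theta); }
"""

data = {"N": 10, "y": [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]}
model = pystan.StanModel(model_code=model_code)        # compile the model
fit = model.sampling(data=data, iter=2000, chains=4)   # NUTS sampling by default
print(fit)                                             # posterior summary for theta
```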
Standards in Predictive Analytics: The role of R, Hadoop and PMML in the mainstreaming of predictive analytics. Just a few years ago it was common to develop a predictive analytic model using a single proprietary tool against a sample of structured data. This would then be applied in batch, storing scores for future use in a database or data warehouse. Recently this model has been disrupted. There is a move to real-time scoring, calculating the value of predictive analytic models when they are needed rather than looking for them in a database. At the same time the variety of model execution platforms has expanded with in-database execution, columnar and in-memory databases as well as MapReduce-based execution becoming increasingly common. Modeling too has changed: the open source analytic modeling language R has become extremely popular, with up to 70% of analytic professionals using it at least occasionally. The range of data types being used in models has expanded along with the approaches used for storage. Modelers increasingly want to analyze all their data, not just a sample, to build a model. This increasingly complex and multi-vendor environment has increased the value of standards, both published standards and open source standards. In this paper we will explore the growing role of standards for predictive analytics in expanding the analytic ecosystem, handling Big Data and supporting the move to real-time scoring.
State Representation Learning for Control: An Overview Representation learning algorithms are designed to learn abstract features that characterize data. State representation learning (SRL) focuses on a particular kind of representation learning where learned features are in low dimension, evolve through time, and are influenced by actions of an agent. As the representation learned captures the variation in the environment generated by agents, this kind of representation is particularly suitable for robotics and control scenarios. In particular, the low dimension helps to overcome the curse of dimensionality, provides easier interpretation and utilization by humans and can help improve performance and speed in policy learning algorithms such as reinforcement learning. This survey aims at covering the state-of-the-art on state representation learning in the most recent years. It reviews different SRL methods that involve interaction with the environment, their implementations and their applications in robotics control tasks (simulated or real). In particular, it highlights how generic learning objectives are differently exploited in the reviewed algorithms. Finally, it discusses evaluation methods to assess the representation learned and summarizes current and future lines of research.
Static and Dynamic Robust PCA via Low-Rank + Sparse Matrix Decomposition: A Review Principal Components Analysis (PCA) is one of the most widely used dimension reduction techniques. Robust PCA (RPCA) refers to the problem of PCA when the data may be corrupted by outliers. Recent work by Candes, Wright, Li, and Ma defined RPCA as a problem of decomposing a given data matrix into the sum of a low-rank matrix (true data) and a sparse matrix (outliers). The column space of the low-rank matrix then gives the PCA solution. This simple definition has led to a large amount of interesting new work on provably correct, fast, and practically useful solutions to the RPCA problem. More recently, the dynamic (time-varying) version of the RPCA problem has been studied and a series of provably correct, fast, and memory efficient tracking solutions have been proposed. Dynamic RPCA (or robust subspace tracking) is the problem of tracking data lying in a (slowly) changing subspace while being robust to sparse outliers. This article provides an exhaustive review of the last decade of literature on RPCA and its dynamic counterpart (robust subspace tracking), along with describing their theoretical guarantees, discussing the pros and cons of various approaches, and providing empirical comparisons of performance and speed.
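A very simple alternating heuristic for the low-rank + sparse decomposition described above: alternate singular-value thresholding of the low-rank part with entrywise soft-thresholding of the sparse part. This is an illustrative sketch only; the provably correct algorithms the review covers (e.g. augmented Lagrangian methods) are more involved, and the data, penalty and iteration count here are invented.

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def rpca_sketch(M, lam=None, n_iter=50):
    """Alternating heuristic for M ~ L (low rank) + S (sparse); illustrative only."""
    lam = lam or 1.0 / np.sqrt(max(M.shape))
    mu = 0.25 * np.prod(M.shape) / np.abs(M).sum()
    L, S = np.zeros_like(M), np.zeros_like(M)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = U @ np.diag(soft(s, 1.0 / mu)) @ Vt        # singular-value thresholding
        S = soft(M - L, lam / mu)                      # entrywise soft-thresholding
    return L, S

rng = np.random.default_rng(5)
L_true = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 50))            # rank-2 signal
S_true = (rng.random((50, 50)) < 0.05) * rng.normal(0, 10, (50, 50))    # sparse outliers
L_hat, S_hat = rpca_sketch(L_true + S_true)
print("relative low-rank recovery error:",
      np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true))
```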
Statistical Inference for SPDEs: an overview The aim of this work is to give an overview of the recent developments in the area of statistical inference for parabolic stochastic partial differential equations. A significant part of the paper is devoted to the spectral approach, which is the most studied sampling scheme, under which the observations are made in the Fourier space over some finite time interval. We also discuss in detail the practically important case of discrete sampling of the solution. Other relevant methodologies and some open problems are briefly discussed over the course of the manuscript.
Statistical inference on random dot product graphs: a survey The random dot product graph (RDPG) is an independent-edge random graph that is analytically tractable and, simultaneously, either encompasses or can successfully approximate a wide range of random graphs, from relatively simple stochastic block models to complex latent position graphs. In this survey paper, we describe a comprehensive paradigm for statistical inference on random dot product graphs, a paradigm centered on spectral embeddings of adjacency and Laplacian matrices. We examine the analogues, in graph inference, of several canonical tenets of classical Euclidean inference: in particular, we summarize a body of existing results on the consistency and asymptotic normality of the adjacency and Laplacian spectral embeddings, and the role these spectral embeddings can play in the construction of single- and multi-sample hypothesis tests for graph data. We investigate several real-world applications, including community detection and classification in large social networks and the determination of functional and biologically relevant network properties from an exploratory data analysis of the Drosophila connectome. We outline requisite background and current open problems in spectral graph inference.
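A minimal sketch of the adjacency spectral embedding that this inference paradigm is centered on: represent each node by the top-d eigenvectors of the adjacency matrix, scaled by the square roots of the corresponding eigenvalue magnitudes. The helper name, the choice d = 2, and the two-block toy graph are assumptions for illustration only.

```python
import numpy as np

def adjacency_spectral_embedding(A, d):
    """Embed the nodes of a symmetric adjacency matrix A into R^d using
    the d largest-magnitude eigenpairs (scaled eigenvectors)."""
    vals, vecs = np.linalg.eigh(A)                 # eigenvalues in ascending order
    idx = np.argsort(np.abs(vals))[::-1][:d]       # keep the d largest in magnitude
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

# Toy usage: a two-block stochastic block model; clustering the embedded
# points (e.g., with k-means) should recover the two communities.
rng = np.random.default_rng(1)
n = 100
z = np.repeat([0, 1], n // 2)
P = np.where(z[:, None] == z[None, :], 0.5, 0.1)
A = np.triu(rng.binomial(1, P), 1)
A = A + A.T
X_hat = adjacency_spectral_embedding(A, d=2)
```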
Statistical Inference: The Big Picture Statistics has moved beyond the frequentist-Bayesian controversies of the past. Where does this leave our ability to interpret results? I suggest that a philosophy compatible with statistical practice, labeled here statistical pragmatism, serves as a foundation for inference. Statistical pragmatism is inclusive and emphasizes the assumptions that connect statistical models with observed data. I argue that introductory courses often mischaracterize the process of statistical inference and I propose an alternative ‘big picture’ depiction.
Statistical Learning and Kernel Methods (Slide Deck)
Statistical Learning Theory: Models, Concepts, and Results Statistical learning theory provides the theoretical basis for many of today’s machine learning algorithms and is arguably one of the most beautifully developed branches of artificial intelligence in general. It originated in Russia in the 1960s and gained wide popularity in the 1990s following the development of the so-called Support Vector Machine (SVM), which has become a standard tool for pattern recognition in a variety of domains ranging from computer vision to computational biology. Providing the basis of new learning algorithms, however, was not the only motivation for developing statistical learning theory. It was just as much a philosophical one, attempting to answer the question of what it is that allows us to draw valid conclusions from empirical data. In this article we attempt to give a gentle, non-technical overview of the key ideas and insights of statistical learning theory. We do not assume that the reader has a deep background in mathematics, statistics, or computer science. Given the nature of the subject matter, however, some familiarity with mathematical concepts and notations and some intuitive understanding of basic probability is required. There exist many excellent references to more technical surveys of the mathematics of statistical learning theory: the monographs by one of the founders of statistical learning theory (Vapnik, 1995, Vapnik, 1998), a brief overview of statistical learning theory in Section 5 of Scholkopf and Smola (2002), more technical overview papers such as Bousquet et al. (2003), Mendelson (2003), Boucheron et al. (2005), Herbrich and Williamson (2002), and the monograph Devroye et al. (1996).
Statistical methods research done as science rather than mathematics This paper is about how we study statistical methods. As an example, it uses the random regressions model, in which the intercept and slope of cluster-specific regression lines are modeled as a bivariate random effect. Maximizing this model’s restricted likelihood often gives a boundary value for the random effect correlation or variances. We argue that this is a problem; that it is a problem because our discipline has little understanding of how contemporary models and methods map data to inferential summaries; that we lack such understanding, even for models as simple as this, because of a near-exclusive reliance on mathematics as a means of understanding; and that math alone is no longer sufficient. We then argue that as a discipline, we can and should break open our black-box methods by mimicking the five steps that molecular biologists commonly use to break open Nature’s black boxes: design a simple model system, formulate hypotheses using that system, test them in experiments on that system, iterate as needed to reformulate and test hypotheses, and finally test the results in an ‘in vivo’ system. We demonstrate this by identifying conditions under which the random-regressions restricted likelihood is likely to be maximized at a boundary value. Resistance to this approach seems to arise from a view that it lacks the certainty or intellectual heft of mathematics, perhaps because simulation experiments in our literature rarely do more than measure a new method’s operating characteristics in a small range of situations. We argue that such work can make useful contributions including, as in molecular biology, the findings themselves and sometimes the designs used in the five steps; that these contributions have as much practical value as mathematical results; and that therefore they merit publication as much as the mathematical results our discipline esteems so highly.
Statistical Model Selection with ‘Big Data’ Big Data offer potential benefits for statistical modelling, but confront problems like an excess of false positives, mistaking correlations for causes, ignoring sampling biases, and selecting by inappropriate methods. We consider the many important requirements when searching for a data-based relationship using Big Data, and the possible role of Autometrics in that context. Paramount considerations include embedding relationships in general initial models, possibly restricting the number of variables to be selected over by non-statistical criteria (the formulation problem), using good quality data on all variables, analyzed with tight significance levels by a powerful selection procedure, retaining available theory insights (the selection problem) while testing for relationships being well specified and invariant to shifts in explanatory variables (the evaluation problem), using a viable approach that resolves the computational problem of immense numbers of possible models.
Statistical Modeling: The Two Cultures There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.
Statistical Software (R, SAS, SPSS, and Minitab) for Blind Students and Practitioners Access to information is crucial for the blind person’s success in education, but transferring knowledge about the existence of techniques into actually being able to complete those tasks is what will ultimately improve the blind person’s employment prospects. This paper is based on the experiences of the two authors; as blind academics in statistics, we are dependent on the usefulness of statistical software for blind users more than most blind people. The use of ‘we’ throughout this article is intentionally meant to be personal in terms of our own experiences but more importantly, also reflects the needs of the blind community as a whole. Blind students often benefit from one-to-one teaching resources which can aid in their uptake of statistical thinking and practice, but this additional service is only a temporary solution. Once the student has completed their first course in statistics, they may embark on research at a university, or head out into industry to apply their knowledge. Irrespective of the direction they choose, they will need certainty in being able to independently create graphs for the sighted readers of their work. At the 2009 Workshop on E-Inclusion in Mathematics and Sciences, the first author was able to meet other researchers who are concerned about the low rate of blind people entering the sciences in a broad sense and the mathematical sciences in particular. Godfrey (2009) presents what we believe is the first formalized presentation (written by a blind person) of the current state of affairs for blind people taking statistics courses. Much of the material covered in that work still holds true today, although there have been some technological changes that have altered the landscape a little. The four main considerations of Godfrey (2009) were graphics, software, statistical tables, and mathematical formulae. Although software was just one element discussed, graphics and mathematical formulae are playing an increasing role in the usefulness of statistical software, especially with respect to the accessibility of support documentation. We have reviewed four statistical software packages that blind people might want to use in their university education. Our review is restricted to the Windows operating system because this is the predominant environment in which blind people are working. Before we review R, SAS, SPSS, and Minitab, we outline our expectations of statistical software, describe a simple task used to evaluate some practical experiences, and describe some issues with certain file formats and graphics. Following the software-specific sections there is a general discussion of pertinent issues for software developers, including the relevant details of the legislative environment in the United States of America. The article closes with a simplified set of criteria and our overall assessment of the current state of the usefulness of statistical software for blind users.
Statistical Thinking: An Approach to Management (Slide Deck)
Statistical Validity and Consistency of Big Data Analytics: A General Framework Informatics and technological advancements have triggered generation of huge volume of data with varied complexity in its management and analysis. Big Data analytics is the practice of revealing hidden aspects of such data and making inferences from it. Although storage, retrieval and management of Big Data seem possible through efficient algorithm and system development, concern about statistical consistency remains to be addressed in view of its specific characteristics. Since Big Data does not conform to standard analytics, we need proper modification of the existing statistical theory and tools. Here we propose, with illustrations, a general statistical framework and an algorithmic principle for Big Data analytics that ensure statistical accuracy of the conclusions. The proposed framework has the potential to push forward advancement of Big Data analytics in the right direction. The partition-repetition approach proposed here is broad enough to encompass all practical data analytic problems.
Stein´s Paradox in Statistics
STL: A Seasonal-Trend Decomposition Procedure Based on LOESS
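For readers who want to try the procedure, here is a hedged usage sketch with the statsmodels implementation of STL; the synthetic monthly series and the seasonal period of 12 are assumptions chosen purely for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

# Synthetic monthly series: linear trend + yearly seasonal cycle + noise.
rng = np.random.default_rng(2)
t = np.arange(120)
idx = pd.date_range("2010-01", periods=120, freq="MS")
y = pd.Series(0.05 * t + 2.0 * np.sin(2 * np.pi * t / 12)
              + rng.normal(scale=0.3, size=120), index=idx)

# LOESS-based split into seasonal, trend, and remainder components.
result = STL(y, period=12).fit()
trend, seasonal, resid = result.trend, result.seasonal, result.resid
```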
Stochastic Global Optimization Algorithms: A Systematic Formal Approach As we know, some global optimization problems cannot be solved using analytic methods, so numeric/algorithmic approaches are used to find near-optimal solutions for them. A stochastic global optimization algorithm (SGoal) is an iterative algorithm that generates a new population (a set of candidate solutions) from a previous population using stochastic operations. Although some research works have formalized SGoals using Markov kernels, such formalization is not general and sometimes is blurred. In this paper, we propose a comprehensive and systematic formal approach for studying SGoals. First, we present the required probability theory (σ-algebras, measurable functions, kernels, Markov chains, products, convergence, and so on) and prove that some algorithmic functions like swapping and projection can be represented by kernels. Then, we introduce the notion of join-kernel as a way of characterizing the combination of stochastic methods. Next, we define the optimization space, a formal structure (a set with a σ-algebra that contains strict ε-optimal states) for studying SGoals, and we develop kernels, like sort and permutation, on such a structure. Finally, we present some popular SGoals in terms of the developed theory, we introduce sufficient conditions for convergence of a SGoal, and we prove convergence of some popular SGoals.
Stochastic Gradient Descent Tricks Chapter 1 strongly advocates the stochastic back-propagation method to train neural networks. This is in fact an instance of a more general technique called stochastic gradient descent (SGD). This chapter provides background material, explains why SGD is a good learning algorithm when the training set is large, and provides useful recommendations.
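A minimal sketch of the plain SGD update the chapter builds on, applied to least squares with one randomly ordered example per step; the 1/t-style step-size decay, the function name, and the toy data are illustrative assumptions rather than the chapter's exact recommendations.

```python
import numpy as np

def sgd_least_squares(X, y, lr0=0.1, epochs=20, seed=0):
    """Stochastic gradient descent on the squared error 0.5 * (w.x - y)^2,
    visiting one randomly ordered example per update with a decaying step size."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            lr = lr0 / (1.0 + lr0 * t)          # simple 1/t-style decay
            grad = (X[i] @ w - y[i]) * X[i]     # gradient on a single example
            w -= lr * grad
    return w

# Toy usage: recover a known weight vector from noisy linear data.
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.01 * rng.normal(size=500)
w_hat = sgd_least_squares(X, y)
```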
Stochastic Reformulations of Linear Systems: Algorithms and Convergence Theory We develop a family of reformulations of an arbitrary consistent linear system into a stochastic problem. The reformulations are governed by two user-defined parameters: a positive definite matrix defining a norm, and an arbitrary discrete or continuous distribution over random matrices. Our reformulation has several equivalent interpretations, allowing for researchers from various communities to leverage their domain specific insights. In particular, our reformulation can be equivalently seen as a stochastic optimization problem, stochastic linear system, stochastic fixed point problem and a probabilistic intersection problem. We prove sufficient, and necessary and sufficient conditions for the reformulation to be exact. Further, we propose and analyze three stochastic algorithms for solving the reformulated problem—basic, parallel and accelerated methods—with global linear convergence rates. The rates can be interpreted as condition numbers of a matrix which depends on the system matrix and on the reformulation parameters. This gives rise to a new phenomenon which we call stochastic preconditioning, and which refers to the problem of finding parameters (matrix and distribution) leading to a sufficiently small condition number. Our basic method can be equivalently interpreted as stochastic gradient descent, stochastic Newton method, stochastic proximal point method, stochastic fixed point method, and stochastic projection method, with fixed stepsize (relaxation parameter), applied to the reformulations.
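Since the paper's basic method admits several equivalent interpretations (stochastic gradient, Newton, proximal point, fixed point, projection), the following hedged sketch shows just one of them, a randomized-Kaczmarz-style stochastic projection step for a consistent linear system; the row-norm sampling probabilities and iteration budget are assumptions for illustration, not the paper's general formulation.

```python
import numpy as np

def randomized_kaczmarz(A, b, n_iter=5000, seed=0):
    """Solve a consistent system A x = b by repeatedly projecting the current
    iterate onto the solution set of one randomly chosen equation."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms = np.sum(A * A, axis=1)
    probs = row_norms / row_norms.sum()        # sample rows proportional to ||a_i||^2
    x = np.zeros(n)
    for _ in range(n_iter):
        i = rng.choice(m, p=probs)
        x += (b[i] - A[i] @ x) / row_norms[i] * A[i]
    return x

# Toy usage: an overdetermined but consistent system.
rng = np.random.default_rng(4)
A = rng.normal(size=(200, 20))
x_true = rng.normal(size=20)
x_hat = randomized_kaczmarz(A, A @ x_true)
```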
Stopping stepwise: Why stepwise and similar selection methods are bad, and what you should use A common problem in regression analysis is that of variable selection. Often, you have a large number of potential independent variables, and wish to select among them, perhaps to create a ‘best´ model. One common method of dealing with this problem is some form of automated procedure, such as forward, backward, or stepwise selection. We show that these methods are not to be recommended, and present better alternatives using PROC GLMSELECT and other methods.
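The alternatives in the paper are presented in SAS (PROC GLMSELECT); as a rough, hedged Python analogue, the sketch below uses cross-validated lasso as one penalized-selection alternative to stepwise procedures. The synthetic data and settings are assumptions for illustration and do not reproduce the paper's examples.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Cross-validated lasso: the penalty shrinks uninformative coefficients to
# exactly zero, giving variable selection without a stepwise search.
rng = np.random.default_rng(5)
X = rng.normal(size=(300, 30))
beta = np.zeros(30)
beta[:5] = [3.0, -2.0, 1.5, 1.0, 2.0]          # only the first five are true signals
y = X @ beta + rng.normal(size=300)

model = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(model.coef_)          # indices of variables kept by the penalty
```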
Strategyproof Linear Regression in High Dimensions This paper is part of an emerging line of work at the intersection of machine learning and mechanism design, which aims to avoid noise in training data by correctly aligning the incentives of data sources. Specifically, we focus on the ubiquitous problem of linear regression, where strategyproof mechanisms have previously been identified in two dimensions. In our setting, agents have single-peaked preferences and can manipulate only their response variables. Our main contribution is the discovery of a family of group strategyproof linear regression mechanisms in any number of dimensions, which we call generalized resistant hyperplane mechanisms. The game-theoretic properties of these mechanisms — and, in fact, their very existence — are established through a connection to a discrete version of the Ham Sandwich Theorem.
Strengths and Weaknesses of Weak-Strong Cluster Problems: A Detailed Overview of State-of-the-art Classical Heuristics vs Quantum Approaches To date, a conclusive detection of quantum speedup remains elusive. Recently, a team by Google Inc. [arXiv:1512.02206] proposed a weak-strong cluster model tailored to have tall and narrow energy barriers separating local minima, with the aim to highlight the value of finite-range tunneling. More precisely, results from quantum Monte Carlo simulations, as well as the D-Wave 2X quantum annealer scale considerably better than state-of-the-art simulated annealing simulations. Moreover, the D-Wave 2X quantum annealer is $\sim 10^8$ times faster than simulated annealing on conventional computer hardware for problems with approximately $10^3$ variables. Here, an overview of different sequential, nontailored, as well as specialized tailored algorithms on the Google instances is given. We show that the quantum speedup is limited to sequential approaches and study the typical complexity of the benchmark problems using insights from the study of spin glasses.
Structural Equation Models Structural equation models (SEMs), also called simultaneous equation models, are multivariate (i.e., multiequation) regression models. Unlike the more traditional multivariate linear model, however, the response variable in one regression equation in an SEM may appear as a predictor in another equation; indeed, variables in an SEM may influence one another reciprocally, either directly or through other variables as intermediaries. These structural equations are meant to represent causal relationships among the variables in the model. …
Structural Intervention Distance (SID) for Evaluating Causal Graphs Causal inference relies on the structure of a graph, often a directed acyclic graph (DAG). Different graphs may result in different causal inference statements and different intervention distributions. To quantify such differences, we propose a (pre-) distance between DAGs, the structural intervention distance (SID). The SID is based on a graphical criterion only and quantifies the closeness between two DAGs in terms of their corresponding causal inference statements. It is therefore well-suited for evaluating graphs that are used for computing interventions. Instead of DAGs it is also possible to compare CPDAGs, completed partially directed acyclic graphs that represent Markov equivalence classes. Since it differs significantly from the popular Structural Hamming Distance (SHD), the SID constitutes a valuable additional measure.
Structure Learning of Probabilistic Graphical Models: A Comprehensive Survey Probabilistic graphical models combine graph theory and probability theory to give a multivariate statistical modeling framework. They provide a unified description of uncertainty using probability and of complexity using the graphical model. In particular, graphical models provide the following useful properties:
• Graphical models provide a simple and intuitive interpretation of the structures of probabilistic models. On the other hand, they can be used to design and motivate new models.
• Graphical models provide additional insights into the properties of the model, including the conditional independence properties.
• Complex computations which are required to perform inference and learning in sophisticated models can be expressed in terms of graphical manipulations, in which the underlying mathematical expressions are carried along implicitly.
Graphical models have been applied to a large number of fields, including bioinformatics, social science, control theory, image processing, marketing analysis, among others. However, structure learning for graphical models remains an open challenge, since one must cope with a combinatorial search over the space of all possible structures. In this paper, we present a comprehensive survey of the existing structure learning algorithms.
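As one concrete instance of the structure-learning problem (restricted to the Gaussian special case), the hedged sketch below estimates a sparse precision matrix with scikit-learn's graphical lasso and reads off edges from its nonzero off-diagonal entries; the toy covariance and threshold are assumptions for illustration, and this is only one of the many algorithm families such a survey covers.

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

# Two independent pairs of correlated variables; the estimated precision
# matrix should be (approximately) sparse, with nonzeros only within pairs.
rng = np.random.default_rng(6)
cov = np.array([[1.0, 0.6, 0.0, 0.0],
                [0.6, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.4],
                [0.0, 0.0, 0.4, 1.0]])
X = rng.multivariate_normal(mean=np.zeros(4), cov=cov, size=500)

model = GraphicalLassoCV().fit(X)
edges = np.abs(model.precision_) > 1e-3         # graph structure (diagonal entries are self terms)
```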
Structured Low-Rank Algorithms: Theory, MR Applications, and Links to Machine Learning In this survey, we provide a detailed review of recent advances in the recovery of continuous domain multidimensional signals from their few non-uniform (multichannel) measurements using structured low-rank matrix completion formulation. This framework is centered on the fundamental duality between the compactness (e.g., sparsity) of the continuous signal and the rank of a structured matrix, whose entries are functions of the signal. This property enables the reformulation of the signal recovery as a low-rank structured matrix completion, which comes with performance guarantees. We will also review fast algorithms that are comparable in complexity to current compressed sensing methods, which enables the application of the framework to large-scale magnetic resonance (MR) recovery problems. The remarkable flexibility of the formulation can be used to exploit signal properties that are difficult to capture by current sparse and low-rank optimization strategies. We demonstrate the utility of the framework in a wide range of MR imaging (MRI) applications, including highly accelerated imaging, calibration-free acquisition, MR artifact correction, and ungated dynamic MRI.
Structured Low-Rank Matrix Factorization: Global Optimality, Algorithms, and Applications Recently, convex formulations of low-rank matrix factorization problems have received considerable attention in machine learning. However, such formulations often require solving for a matrix of the size of the data matrix, making it challenging to apply them to large scale datasets. Moreover, in many applications the data can display structures beyond simply being low-rank, e.g., images and videos present complex spatio-temporal structures that are largely ignored by standard low-rank methods. In this paper we study a matrix factorization technique that is suitable for large datasets and captures additional structure in the factors by using a particular form of regularization that includes well-known regularizers such as total variation and the nuclear norm as particular cases. Although the resulting optimization problem is non-convex, we show that if the size of the factors is large enough, under certain conditions, any local minimizer for the factors yields a global minimizer. A few practical algorithms are also provided to solve the matrix factorization problem, and bounds on the distance from a given approximate solution of the optimization problem to the global optimum are derived. Examples in neural calcium imaging video segmentation and hyperspectral compressed recovery show the advantages of our approach on high-dimensional datasets.
Stupid Data Miner Tricks: Overfitting the S and P 500 It wasn´t too long ago that calling someone a data miner was a very bad thing. You could start a fistfight at a convention of statisticians with this kind of talk. It meant that you were finding the analytical equivalent of the bunnies in the clouds, poring over data until you found something. Everyone knew that if you did enough poring, you were bound to find that bunny sooner or later, but it was no more real than the one that blows over the horizon. Now, data mining is a small industry, with entire companies devoted to it. There are academic conferences devoted solely to data mining. The phrase no longer elicits as many invitations to step into the parking lot as it used to. What´s going on? These new data mining people are not fools. Sometimes data mining makes sense, and sometimes it doesn´t. …
Subjectivity Learning Theory towards Artificial General Intelligence The construction of artificial general intelligence (AGI) has been a long-term goal of AI research, aiming to deal with the complex data of the real world and make reasonable judgments in various cases as a human would. However, current AI creations, referred to as ‘Narrow AI’, are limited to specific problems. The constraints come from two basic assumptions about data: independently and identically distributed samples, and a single-valued mapping between inputs and outputs. We completely break these constraints and develop the subjectivity learning theory for general intelligence. We assign a mathematical meaning to the philosophical concept of subjectivity and build the data representation of general intelligence. Under the subjectivity representation, the global risk is then constructed as the new learning goal. We prove that subjectivity learning holds a lower risk bound than traditional machine learning. Moreover, we propose the principle of empirical global risk minimization (EGRM) as the subjectivity learning process in practice, establish the condition of consistency, and present triple variables for controlling the total risk bound. Subjectivity learning is a novel learning theory for unconstrained real data and provides a path to develop AGI.
Summarizing large text collection using topic modeling and clustering based on MapReduce framework. Document summarization provides an instrument for faster understanding of a collection of text documents and has a number of real-life applications. Semantic similarity and clustering can be utilized efficiently for generating an effective summary of large text collections. Summarizing a large volume of text is a challenging and time-consuming problem, particularly when the semantic similarity computation is considered in the summarization process. Summarization of a text collection involves intensive text processing and computation to generate the summary. MapReduce is a proven state-of-the-art technology for handling Big Data. In this paper, a novel framework based on MapReduce technology is proposed for summarizing large text collections. The proposed technique is designed using semantic-similarity-based clustering and topic modeling using Latent Dirichlet Allocation (LDA) for summarizing the large text collection over the MapReduce framework. The summarization task is performed in four stages and provides a modular implementation of multiple-document summarization. The presented technique is evaluated in terms of scalability, and various text summarization parameters, namely compression ratio, retention ratio, ROUGE and Pyramid score, are also measured. The advantages of the MapReduce framework are clearly visible from the experiments, and it is also demonstrated that MapReduce provides a faster implementation of summarizing large text collections and is a powerful tool in Big Text Data analysis.
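The MapReduce pipeline itself is not reproduced here; the following is a hedged, single-machine sketch of the LDA topic-modeling component using scikit-learn, with the toy documents, two topics, and five top words per topic chosen purely for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "stock markets rally as investors cheer strong earnings",
    "new vaccine trial shows promising immune response",
    "central bank signals interest rate cut next quarter",
    "researchers report breakthrough in cancer immunotherapy",
]

# Bag-of-words counts feed the LDA topic model.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Top words per topic, which a downstream clustering/summarization step could use.
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_words = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {top_words}")
```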
Supervised Classification: Quite a Brief Overview The original problem of supervised classification considers the task of automatically assigning objects to their respective classes on the basis of numerical measurements derived from these objects. Classifiers are the tools that implement the actual functional mapping from these measurements—also called features or inputs—to the so-called class label—or output. The fields of pattern recognition and machine learning study ways of constructing such classifiers. The main idea behind supervised methods is that of learning from examples: given a number of example input-output relations, to what extent can the general mapping be learned that takes any new and unseen feature vector to its correct class? This chapter provides a basic introduction to the underlying ideas of how to come to a supervised classification problem. In addition, it provides an overview of some specific classification techniques, delves into the issues of object representation and classifier evaluation, and (very) briefly covers some variations on the basic supervised classification task that may also be of interest to the practitioner.
Supervised Speech Separation Based on Deep Learning: An Overview Speech separation is the task of separating target speech from background interference. Traditionally, speech separation is studied as a signal processing problem. A more recent approach formulates speech separation as a supervised learning problem, where the discriminative patterns of speech, speakers, and background noise are learned from training data. Over the past decade, many supervised separation algorithms have been put forward. In particular, the recent introduction of deep learning to supervised speech separation has dramatically accelerated progress and boosted separation performance. This article provides a comprehensive overview of the research on deep learning based supervised speech separation in the last several years. We first introduce the background of speech separation and the formulation of supervised separation. Then we discuss three main components of supervised separation: learning machines, training targets, and acoustic features. Much of the overview is on separation algorithms where we review monaural methods, including speech enhancement (speech-nonspeech separation), speaker separation (multi-talker separation), and speech dereverberation, as well as multi-microphone techniques. The important issue of generalization, unique to supervised learning, is discussed. This overview provides a historical perspective on how advances are made. In addition, we discuss a number of conceptual issues, including what constitutes the target source.
Support vs Confidence in Association Rule Algorithms The discovery of interesting association relationships among large amounts of business transactions is currently vital for making appropriate business decisions. There are currently a variety of algorithms to discover association rules. Some of these algorithms depend on the use of minimum support to weed out the uninteresting rules. Other algorithms look for highly correlated items, that is, rules with high confidence. In this paper we present a description of these types of association rule algorithms and a comparison of two algorithms representative of these approaches, with the aim of understanding the pros and cons of the support- and confidence-based approaches.
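A minimal illustration of the two interestingness measures being compared, computed directly on a toy transaction list; the helper names and data are assumptions for illustration only.

```python
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) over the transactions."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

# The rule {diapers} -> {beer}: support 0.4, confidence about 0.67.
print(support({"diapers", "beer"}, transactions))
print(confidence({"diapers"}, {"beer"}, transactions))
```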
Survey of Clustering Data Mining Techniques Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical perspective clustering plays an outstanding role in data mining applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others. Clustering is the subject of active research in several fields such as statistics, pattern recognition, and machine learning. This survey focuses on clustering in data mining. Data mining adds to clustering the complications of very large datasets with very many attributes of different types. This imposes unique computational requirements on relevant clustering algorithms. A variety of algorithms have recently emerged that meet these requirements and were successfully applied to real-life data mining problems. They are the subject of this survey.
Survey of Consistent Network Updates Computer networks have become a critical infrastructure. Designing dependable computer networks however is challenging, as such networks should not only meet strict requirements in terms of correctness, availability, and performance, but they should also be flexible enough to support fast updates, e.g., due to a change in the security policy, an increasing traffic demand, or a failure. The advent of Software-Defined Networks (SDNs) promises to provide such flexibility, allowing networks to be updated in a fine-grained manner and enabling more online traffic engineering. In this paper, we present a structured survey of mechanisms and protocols to update computer networks in a fast and consistent manner. In particular, we identify and discuss the different desirable update consistency properties a network should provide, the algorithmic techniques which are needed to meet these consistency properties, and their implications for the speed and costs at which updates can be performed. We also discuss the relationship of consistent network update problems to classic algorithmic optimization problems. While our survey is mainly motivated by the advent of Software-Defined Networks (SDNs), the fundamental underlying problems are not new, and we also provide a historical perspective of the subject.
Survey of Distributed Decision We survey the recent distributed computing literature on checking whether a given distributed system configuration satisfies a given boolean predicate, i.e., whether the configuration is legal or illegal w.r.t. that predicate. We consider classical distributed computing environments, including mostly synchronous fault-free network computing (LOCAL and CONGEST models), but also asynchronous crash-prone shared-memory computing (WAIT-FREE model), and mobile computing (FSYNC model).
Survey of Expressivity in Deep Neural Networks We survey results on neural network expressivity described in ‘On the Expressive Power of Deep Neural Networks’. The paper motivates and develops three natural measures of expressiveness, which all display an exponential dependence on the depth of the network. In fact, all of these measures are related to a fourth quantity, trajectory length. This quantity grows exponentially in the depth of the network, and is responsible for the depth sensitivity observed. These results translate to consequences for networks during and after training. They suggest that parameters earlier in a network have greater influence on its expressive power — in particular, given a layer, its influence on expressivity is determined by the remaining depth of the network after that layer. This is verified with experiments on MNIST and CIFAR-10. We also explore the effect of training on the input-output map, and find that it trades off between the stability and expressivity.
Survey of Graph Analysis Applications Recently, many systems for graph analysis have been developed to address the growing needs of both industry and academia to study complex graphs. Insight into the practical uses of graph analysis will allow future developments of such systems to optimize for real-world usage, instead of targeting single use cases or hypothetical workloads. This insight may be derived from surveys on the applications of graph analysis. However, existing surveys are limited in the variety of application domains, datasets, and/or graph analysis techniques they study. In this work we present and apply a systematic method for identifying practical use cases of graph analysis. We identify commonly used graph features and analysis methods and use our findings to construct a taxonomy of graph analysis applications. We conclude that practical use cases of graph analysis cover a diverse set of graph features and analysis methods. Furthermore, most applications combine multiple features and methods. Our findings motivate further development of graph analysis systems to support a broader set of applications and to facilitate the combination of multiple analysis methods in an (interactive) workflow.
Survey of Keyword Extraction Techniques Keywords are commonly used by search engines and document databases to locate information and determine whether two pieces of text are related to each other. Reading and summarizing the contents of large bodies of text into a small set of topics is difficult and time consuming for a human, so much so that it becomes nearly impossible to accomplish with limited manpower as the size of the information grows. As a result, automated systems are being more commonly used to do this task. This problem is challenging due to the intricate complexities of natural language, as well as the inherent difficulty in determining whether a word or set of words accurately represents the topics present within the text. With the advent of the internet, there is now both a massive amount of information available and a demand to be able to search through all of this information. Keyword extraction from text data is a common tool used by search engines and indexes alike to quickly categorize and locate specific data based on explicitly or implicitly supplied keywords.
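One simple technique of the kind surveyed is TF-IDF ranking; the hedged sketch below keeps the highest-scoring terms of each document as its keywords, with the toy documents and the cutoff of three terms as assumptions for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "Keyword extraction helps search engines index documents quickly.",
    "Deep learning models require large amounts of labelled training data.",
    "Search engines rank documents by relevance to the query keywords.",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)
terms = vectorizer.get_feature_names_out()

# For each document, keep the three terms with the highest TF-IDF weight.
for row in tfidf.toarray():
    keywords = [terms[i] for i in row.argsort()[-3:][::-1]]
    print(keywords)
```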
Survey of Recent Advances in Visual Question Answering Visual Question Answering (VQA) presents a unique challenge as it requires the ability to understand and encode multi-modal inputs – in terms of image processing and natural language processing. The algorithm further needs to learn how to perform reasoning over this multi-modal representation so it can answer the questions correctly. This paper presents a survey of different approaches proposed to solve the problem of Visual Question Answering. We also describe the current state-of-the-art model in a later part of the paper. In particular, the paper describes the approaches taken by various algorithms to extract image features and text features, and the way these are employed to predict answers. We also briefly discuss the experiments performed to evaluate the VQA models and report their performances on diverse datasets, including the newly released VQA 2.0 [8].
Survey of Visual Question Answering: Datasets and Techniques Visual question answering (or VQA) is a new and exciting problem that combines natural language processing and computer vision techniques. We present a survey of the various datasets and models that have been used to tackle this task. The first part of the survey details the various datasets for VQA and compares them along some common factors. The second part of this survey details the different approaches for VQA, classified into four types: non-deep learning models, deep learning models without attention, deep learning models with attention, and other models which do not fit into the first three. Finally, we compare the performances of these approaches and provide some directions for future work.
Survey on Evaluation Methods for Dialogue Systems In this paper we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process. Often, dialogue systems are evaluated by means of human evaluations and questionnaires. However, this tends to be very cost- and time-intensive. Thus, much work has been put into finding methods that reduce the involvement of human labour. In this survey, we present the main concepts and methods. For this, we differentiate between the various classes of dialogue systems (task-oriented dialogue systems, conversational dialogue systems, and question-answering dialogue systems). We cover each class by introducing the main technologies developed for the dialogue systems and then by presenting the evaluation methods regarding this class.
Survey on Feature Selection Feature selection plays an important role in the data mining process. It is needed to deal with the excessive number of features, which can become a computational burden on the learning algorithms. It is also necessary, even when computational resources are not scarce, since it improves the accuracy of the machine learning tasks, as we will see in the upcoming sections. In this review, we discuss the different feature selection approaches, and the relation between them and the various machine learning algorithms.
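As a small illustration of filter-style feature selection, the sketch below ranks features by mutual information with the class label using scikit-learn; the synthetic data, the scoring function, and k = 5 are assumptions chosen for illustration, not a recommendation from the review.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# 20 features, of which only 5 carry information about the class label.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=0, random_state=0)

selector = SelectKBest(score_func=mutual_info_classif, k=5).fit(X, y)
kept = selector.get_support(indices=True)       # indices of the retained features
X_reduced = selector.transform(X)               # reduced design matrix for the learner
```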
Survey on Models and Techniques for Root-Cause Analysis Automation and computer intelligence to support complex human decisions become essential to manage large and distributed systems in the Cloud and IoT era. Understanding the root cause of an observed symptom in a complex system has been a major problem for decades. As industry dives into the IoT world and the amount of data generated per year grows at an amazing speed, an important question is how to find appropriate mechanisms to determine root causes that can handle huge amounts of data or may provide valuable feedback in real-time. While many survey papers aim at summarizing the landscape of techniques for modelling system behavior and inferring the root cause of a problem based on the resulting models, none of those focuses on analyzing how the different techniques in the literature fit growing requirements in terms of performance and scalability. In this survey, we provide a review of root-cause analysis, focusing on these particular aspects. We also provide guidance to choose the best root-cause analysis strategy depending on the requirements of a particular system and application.
Survival Analysis in R This document is intended to assist an individual who has familiarity with R and who is taking a survival analysis course. Specifically, this was constructed for a biostatistics course at UCLA. Many theoretical details have been intentionally omitted for brevity; it is assumed the reader is familiar with the theory of the topics presented. Likewise, it is assumed the reader has a basic understanding of R, including working with data frames, vectors, matrices, plotting, and linear model fitting and interpretation. Functions that are introduced will only have the key arguments mentioned and discussed. Most functions have several other (optional) arguments; however, many of these will not be useful for an introductory course. The functions, with the exception of those I wrote, have well-written descriptions that specify each of the potential arguments and their use. The functions I have written include documentation on the following web site:
Sustainability in the Age of Big Data Big data and climate change share one important characteristic: Both are changing the course of history. Carbon dioxide levels have not been this high in 800,000 years, and the amount of data being generated today is unprecedented. The question at the recent Wharton conference on ‘Sustainability in the Age of Big Data’ was how rapidly advancing information technologies can be brought together to forestall the worst ravages of global climate change. As Gary Survis, CMO of Big Data company Syncsort, IGEL senior fellow and conference moderator, noted, ‘It is rare that there is a confluence of two seismic events as transformative as climate change and big data. It presents amazing opportunities, as well as responsibilities.’ Coming to terms with the scope of big data is a challenge, but the promise is enormous. Big data has the potential to revolutionize the two industries that generate the most carbon dioxide – energy and agriculture. Machine-to-machine communication can help reduce energy demands and increase the viability of renewable power sources. On farms, data from the molecular level may help give rise to a new green revolution, and sensors in satellites, farmland, trucks and grocery stores promise to reduce waste industry-wide. Important questions remain. Can big data be used to influence people´s behavior without manipulating them? Can private enterprise capitalize on big data´s possibilities without riding roughshod over the rights of those who generate the data? And can the high-tech innovations already underway in the developed world help solve the problems of those most in need? How well we answer these questions will determine whether we can realize the historic potential of ‘Sustainability in the Age of Big Data.’
Swarm Intelligence in Semi-supervised Classification This paper presents a literature review of swarm intelligence (SI) algorithms in the area of semi-supervised classification. There are many research papers on applying swarm intelligence algorithms in the area of machine learning. Some SI algorithms are applied in ML either on their own or hybridized with other ML algorithms. SI algorithms are also used for tuning parameters of ML algorithms, or as a backbone for ML algorithms. This paper provides a brief literature review of applying swarm intelligence algorithms in the field of semi-supervised learning.
Swarm Intelligence: Past, Present and Future Many optimization problems in science and engineering are challenging to solve, and the current trend is to use swarm intelligence (SI) and SI-based algorithms to tackle such challenging problems. Some significant developments have been made in recent years, though there are still many open problems in this area. This paper provides a short but timely analysis about SI-based algorithms and their links with self-organization. Different characteristics and properties are analyzed here from both mathematical and qualitative perspectives. Future research directions are outlined and open questions are also highlighted.
Symbolic Calculus in Mathematical Statistics: A Review In the last ten years, the employment of symbolic methods has substantially extended both the theory and the applications of statistics and probability. This survey reviews the development of a symbolic technique arising from classical umbral calculus, as introduced by Rota and Taylor in 1994. The usefulness of this symbolic technique is twofold. The first is to show how new algebraic identities drive the discovery of insights among topics apparently very far from each other and related to probability and statistics. One of the main tools is a formal generalization of the convolution of identical probability distributions, which allows us to employ compound Poisson random variables in various topics that are only somewhat interrelated. Having gained a different and deeper viewpoint, the second goal is to show how to set up algorithmic processes for performing algebraic calculations efficiently. In particular, the challenge of finding these symbolic procedures should lead to a new method, and it poses new problems involving both computational and conceptual issues. Evidence of efficiency in applying this symbolic method will be shown within statistical inference, parameter estimation, Lévy processes, and, more generally, problems involving multivariate functions. The symbolic representation of Sheffer polynomial sequences allows us to carry out a unifying theory of classical, Boolean and free cumulants. Recent connections within random matrices have extended the applications of the symbolic method.
Symbolic Data Analysis: A Paradigm for Complex Data Mining Standard data mining techniques no longer adequately represent the complexity of the world. So, a new paradigm is necessary. Symbolic Data Analysis is a new type of data analysis that allows us to represent the complexity of reality, maintaining the internal variation and structure developed by Diday (2003). This new paradigm is based on the concept of symbolic object, which is a mathematical model of a concept. In this article the authors are going to present the fundamentals of the symbolic data analysis paradigm and the symbolic object concept. Theoretical aspects and examples allow the authors to understand the SDA paradigm as a tool for mining complex data.
Symbolic Data Analysis: Definitions and Examples With the advent of computers, large, very large datasets have become routine. What is not so routine is how to analyse these data and/or how to glean useful information from within their massive confines. One approach is to summarize large data sets in such a way that the resulting summary dataset is of a manageable size. One consequence of this is that the data may no longer be formatted as single values such as is the case for classical data, but may be represented by lists, intervals, distributions and the like. These summarized data are examples of symbolic data. This paper looks at the concept of symbolic data in general, and then attempts to review the methods currently available to analyse such data. It quickly becomes clear that the range of methodologies available draws analogies with developments prior to 1900 which formed a foundation for the inferential statistics of the 1900’s, methods that are largely limited to small (by comparison) data sets and limited to classical data formats. The scarcity of available methodologies for symbolic data also becomes clear and so draws attention to an enormous need for the development of a vast catalogue (so to speak) of new symbolic methodologies along with rigorous mathematical foundational work for these methods.
Symbolic, Distributed and Distributional Representations for Natural Language Processing in the Era of Deep Learning: a Survey Natural language and symbols are intimately correlated. Recent advances in machine learning (ML) and in natural language processing (NLP) seem to contradict the above intuition: symbols are fading away, erased by vectors or tensors called distributed and distributional representations. However, there is a strict link between distributed/distributional representations and symbols, the former being an approximation of the latter. A clearer understanding of the strict link between distributed/distributional representations and symbols will certainly lead to radically new deep learning networks. In this paper we present a survey that aims to draw the link between symbolic representations and distributed/distributional representations. This is the right time to revitalize the area of interpreting how symbols are represented inside neural networks.
SynopSys: Large Graph Analytics in the SAP HANA Database Through Summarization Graph-structured data is ubiquitous and with the advent of social networking platforms has recently seen a significant increase in popularity amongst researchers. However, also many business applications deal with this kind of data and can therefore benefit greatly from graph processing functionality offered directly by the underlying database. This paper summarizes the current state of graph data processing capabilities in the SAP HANA database and describes our efforts to enable large graph analytics in the context of our research project SynopSys. With powerful graph pattern matching support at the core, we envision OLAP-like evaluation functionality exposed to the user in the form of easy-to-apply graph summarization templates. By combining them, the user is able to produce concise summaries of large graph-structured datasets. We also point out open questions and challenges that we plan to tackle in the future developments on our way towards large graph analytics.
Synthetic Knowing: The Politics of the Internet of Things All knowing is material. The challenge for Information Systems (IS) research is to specify how knowing is material by drawing on theoretical characterizations of the digital. Synthetic knowing is knowing informed by theorizing digital materiality. We focus on two defining qualities: liquefaction (unhinging digital representations from physical objects, qualities, or processes) and open-endedness (extendable and generative). The Internet of Things (IoT) is crucial because sensors are vehicles of liquefaction. Their expanding scope for real-time seeing, hearing, tasting, smelling, and touching increasingly mimics phenomenologically perceived reality. Empirically, we present a longitudinal case study of IoT-rendered marine environmental monitoring by an oil and gas company operating in the politically contested Arctic. We characterize synthetic knowing into four concepts, the former three tied to liquefaction and the latter to open-endedness: (i) the objects of knowing are algorithmic phenomena; (ii) the sensors increasingly conjure up phenomenological reality; (iii) knowing is scoped (configurable); and (iv) open knowing/data is politically charged.

T

Taking Human out of Learning Applications: A Survey on Automated Machine Learning Machine learning techniques have become deeply rooted in our everyday life. However, since it is knowledge- and labor-intensive to pursue good learning performance, human experts are heavily engaged in every aspect of machine learning. In order to make machine learning techniques easier to apply and reduce the demand for experienced human experts, automated machine learning (AutoML) has emerged as a hot topic in both industry and academia. In this paper, we provide a survey of existing AutoML works. First, we introduce and define the AutoML problem, with inspiration from both the realms of automation and machine learning. Then, we propose a general AutoML framework that not only covers almost all existing approaches but also guides the design of new methods. Afterward, we categorize and review the existing works from two aspects, i.e., the problem setup and the employed techniques. Finally, we provide a detailed analysis of AutoML approaches and explain the reasons underneath their successful applications. We hope this survey can serve not only as an insightful guideline for AutoML beginners but also as an inspiration for future research.
Talking with Robots: Opportunities and Challenges Notwithstanding the tremendous progress that is taking place in spoken language technology, effective speech-based human-robot interaction still raises a number of important challenges. Not only do the fields of robotics and spoken language technology present their own special problems, but their combination raises an additional set of issues. In particular, there is a large gap between the formulaic speech that typifies contemporary spoken dialogue systems and the flexible nature of human-human conversation. It is pointed out that grounded and situated speech-based human-robot interaction may lead to deeper insights into the pragmatics of language usage, thereby overcoming the current ‘habitability gap’.
Taxonomy of Big Data: A Survey Big Data is the most popular paradigm nowadays, and it has left almost no area untouched, for instance science, engineering, economics, business, social science, and government. Big Data is used to boost organizational performance using massive datasets. Data are assets of the organization, and these data generate revenue for it. Therefore, Big Data is spreading everywhere to enhance organizations’ revenue, and many new technologies are emerging based on it. In this paper, we present a taxonomy of Big Data. In addition, we present in-depth insight into the Big Data paradigm.
Teaching Machines to Read and Comprehend Teaching machines to read natural language documents remains an elusive challenge. Machine reading systems can be tested on their ability to answer questions posed on the contents of documents that they have seen, but until now large scale training and test datasets have been missing for this type of evaluation. In this work we define a new methodology that resolves this bottleneck and provides large scale supervised reading comprehension data. This allows us to develop a class of attention based deep neural networks that learn to read real documents and answer complex questions with minimal prior knowledge of language structure.
Technical Report: On the Usability of Hadoop MapReduce, Apache Spark and Apache Flink for Data Science Distributed data processing platforms for cloud computing are important tools for large-scale data analytics. Apache Hadoop MapReduce has become the de facto standard in this space, though its programming interface is relatively low-level, requiring many implementation steps even for simple analysis tasks. This has led to the development of advanced dataflow oriented platforms, most prominently Apache Spark and Apache Flink. Those platforms not only aim to improve performance through improved in-memory processing, but in particular provide built-in high-level data processing functionality, such as filtering and join operators, which should make data analysis tasks easier to develop than with plain Hadoop MapReduce. But is this indeed the case? This paper compares three prominent distributed data processing platforms: Apache Hadoop MapReduce; Apache Spark; and Apache Flink, from a usability perspective. We report on the design, execution and results of a usability study with a cohort of masters students, who were learning and working with all three platforms in order to solve different use cases set in a data science context. Our findings show that Spark and Flink are preferred platforms over MapReduce. Among participants, there was no significant difference in perceived preference or development time between both Spark and Flink as platforms for batch-oriented big data analysis. This study starts an exploration of the factors that make big data platforms more – or less – effective for users in data science.
Techniques for Interpretable Machine Learning Interpretable machine learning tackles the important problem that humans cannot understand the behaviors of complex machine learning models and how these classifiers arrive at a particular decision. Although many approaches have been proposed, a comprehensive understanding of the achievements and challenges is still lacking. This paper provides a survey covering existing techniques and methods to increase the interpretability of machine learning models and also discusses the crucial issues to consider in future work such as interpretation design principles and evaluation metrics in order to push forward the area of interpretable machine learning.
Temporal anomaly detection: calibrating the surprise We propose a hybrid approach to temporal anomaly detection in user-database access data — or more generally, any kind of subject-object co-occurrence data. Our methodology allows identifying anomalies based on a single stationary model, instead of requiring a full temporal one, which would be prohibitive in our setting. We learn our low-rank stationary model from the high-dimensional training data, and then fit a regression model for predicting the expected likelihood score of normal access patterns in the future. The disparity between the predicted and the observed likelihood scores is used to assess the ‘surprise’. This approach enables calibration of the anomaly score so that time-varying normal behavior patterns are not considered anomalous. We provide a detailed description of the algorithm, including a convergence analysis, and report encouraging empirical results. One of the datasets we tested is new for the public domain. It consists of two months’ worth of database access records from a live system. This dataset will be made publicly available, and is provided in the supplementary material.
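The calibrated-surprise idea described above can be sketched in a few lines: a regression model predicts the expected likelihood score of normal behavior over time, and the anomaly score is the gap between predicted and observed scores. The scores, the time-only feature and the threshold below are simplified stand-ins, not a reimplementation of the authors' low-rank stationary model.

```python
# A minimal sketch of calibrating 'surprise': a regression predicts the
# expected likelihood score of normal behavior over time, and the anomaly
# score is the gap between the predicted and observed scores.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Pretend log-likelihood scores from a stationary model, drifting slowly over time.
t = np.arange(200)
observed_ll = -5.0 - 0.01 * t + rng.normal(0, 0.3, size=t.size)
observed_ll[180:] -= 2.0                      # planted anomalous window

# Fit the expected score as a function of time on a training window.
train = t < 150
reg = LinearRegression().fit(t[train].reshape(-1, 1), observed_ll[train])

# Surprise = how much lower the observed score is than expected.
expected_ll = reg.predict(t.reshape(-1, 1))
surprise = expected_ll - observed_ll

threshold = surprise[train].mean() + 3 * surprise[train].std()
print("anomalous time steps:", np.flatnonzero(surprise > threshold))
```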
Temporal Data Mining: An Overview To classify data mining problems and algorithms we use two dimensions: data type and type of mining operations. One of the main issues that arises during the data mining process is treating data that contains temporal information. The area of temporal data mining has received much attention in the last decade because, from the time-related features of the data, one can extract significant information that cannot be extracted by general data mining methods. Many interesting techniques of temporal data mining have been proposed and shown to be useful in many applications. Since temporal data mining brings together techniques from different fields such as databases, statistics and machine learning, the literature is scattered among many different sources. In this paper, we present a survey of temporal data mining techniques.
Ten Simple Rules for Reproducible Computational Research Replication is the cornerstone of a cumulative science. However, new tools and technologies, massive amounts of data, interdisciplinary approaches, and the complexity of the questions being asked are complicating replication efforts, as are increased pressures on scientists to advance their research. As full replication of studies on independently collected data is often not feasible, there has recently been a call for reproducible research as an attainable minimum standard for assessing the value of scientific claims. This requires that papers in experimental science describe the results and provide a sufficiently clear protocol to allow successful repetition and extension of analyses based on original data. The importance of replication and reproducibility has recently been exemplified through studies showing that scientific papers commonly leave out experimental details essential for reproduction, studies showing difficulties with replicating published experimental results, an increase in retracted papers, and through a high number of failing clinical trials. This has led to discussions on how individual researchers, institutions, funding bodies, and journals can establish routines that increase transparency and reproducibility. In order to foster such aspects, it has been suggested that the scientific community needs to develop a ‘culture of reproducibility’ for computational science, and to require it for published claims. We want to emphasize that reproducibility is not only a moral responsibility with respect to the scientific field, but that a lack of reproducibility can also be a burden for you as an individual researcher. As an example, a good practice of reproducibility is necessary in order to allow previously developed methodology to be effectively applied on new data, or to allow reuse of code and results for new projects. In other words, good habits of reproducibility may actually turn out to be a time-saver in the longer run. We further note that reproducibility is just as much about the habits that ensure reproducible research as the technologies that can make these processes efficient and realistic. Each of the following ten rules captures a specific aspect of reproducibility, and discusses what is needed in terms of information handling and tracking of procedures. If you are taking a bare-bones approach to bioinformatics analysis, i.e., running various custom scripts from the command line, you will probably need to handle each rule explicitly. If you are instead performing your analyses through an integrated framework (such as GenePattern, Galaxy, LONI pipeline, or Taverna), the system may already provide full or partial support for most of the rules. What is needed on your part is then merely the knowledge of how to exploit these existing possibilities. In a pragmatic setting, with publication pressure and deadlines, one may face the need to make a trade-off between the ideals of reproducibility and the need to get the research out while it is still relevant. This trade-off becomes more important when considering that a large part of the analyses being tried out never end up yielding any results. However, frequently one will, with the wisdom of hindsight, contemplate the missed opportunity to ensure reproducibility, as it may already be too late to take the necessary notes from memory (or at least much more difficult than to do it while underway).
We believe that the rewards of reproducibility will compensate for the risk of having spent valuable time developing an annotated catalog of analyses that turned out to be blind alleys. As a minimal requirement, you should at least be able to reproduce the results yourself. This would satisfy the most basic requirements of sound research, allowing any substantial future questioning of the research to be met with a precise explanation. Although it may sound like a very weak requirement, even this level of reproducibility will often require a certain level of care in order to be met. For a given analysis, there will be an exponential number of possible combinations of software versions, parameter values, preprocessing steps, and so on, meaning that a failure to take notes may make exact reproduction essentially impossible. With this basic level of reproducibility in place, there is much more that can be wished for. An obvious extension is to go from a level where you can reproduce results in case of a critical situation to a level where you can practically and routinely reuse your previous work and increase your productivity. A second extension is to ensure that peers have a practical possibility of reproducing your results, which can lead to increased trust in, interest in, and citations of your work. We here present ten simple rules for reproducibility of computational research. These rules are at your disposal whenever you want to make your research more accessible – be it for peers or for your future self.
Tensor Completion Algorithms in Big Data Analytics Tensor completion is the problem of filling in the missing or unobserved entries of partially observed tensors. Due to the multidimensional character of tensors in describing complex datasets, tensor completion algorithms and their applications have received wide attention and found success in data mining, computer vision, signal processing, neuroscience, and other fields. In this survey, we provide a modern overview of recent advances in tensor completion algorithms from the perspective of big data analytics characterized by diverse variety, large volume, and high velocity. Towards a better comprehension and comparison of the vast existing advances, we summarize and categorize them into four groups: general tensor completion algorithms, tensor completion with auxiliary information (variety), scalable tensor completion algorithms (volume), and dynamic tensor completion algorithms (velocity). In addition, we introduce their applications to real-world data-driven problems and present an open-source package covering several widely used tensor decomposition and completion algorithms. Our goal is to summarize these popular methods and introduce them to researchers, promoting the research process in this field and giving practitioners an available repository. In the end, we also discuss some challenges and promising research directions in this community for future explorations.
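For orientation, the following is a minimal sketch of tensor completion with a rank-r CP factorization fitted by gradient descent on the observed entries only; it is a generic baseline for illustration, not one of the specific algorithms or the package surveyed in the paper.

```python
# A minimal numpy sketch of tensor completion: fit a rank-r CP factorization
# by gradient descent on the observed entries, then read off the unobserved
# entries from the reconstructed tensor.
import numpy as np

rng = np.random.default_rng(0)
I, J, K, r = 20, 15, 10, 3

# Ground-truth low-rank tensor and a mask of observed entries.
A0, B0, C0 = rng.normal(size=(I, r)), rng.normal(size=(J, r)), rng.normal(size=(K, r))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
M = rng.random((I, J, K)) < 0.3               # roughly 30% of entries observed

A, B, C = (rng.normal(size=s, scale=0.1) for s in [(I, r), (J, r), (K, r)])
lr = 0.01
for step in range(2000):
    R = M * (np.einsum('ir,jr,kr->ijk', A, B, C) - T)   # masked residual
    A -= lr * np.einsum('ijk,jr,kr->ir', R, B, C)
    B -= lr * np.einsum('ijk,ir,kr->jr', R, A, C)
    C -= lr * np.einsum('ijk,ir,jr->kr', R, A, B)

T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
err = np.linalg.norm((1 - M) * (T_hat - T)) / np.linalg.norm((1 - M) * T)
print("relative error on unobserved entries:", err)
```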
Tensor Networks in a Nutshell Tensor network methods are taking a central role in modern quantum physics and beyond. They can provide an efficient approximation to certain classes of quantum states, and the associated graphical language makes it easy to describe and pictorially reason about quantum circuits, channels, protocols, open systems and more. Our goal is to explain tensor networks and some associated methods as quickly and as painlessly as possible. Beginning with the key definitions, the graphical tensor network language is presented through examples. We then provide an introduction to matrix product states. We conclude the tutorial with tensor contractions evaluating combinatorial counting problems. The first one counts the number of solutions for Boolean formulae, whereas the second is Penrose’s tensor contraction algorithm, returning the number of $3$-edge-colorings of $3$-regular planar graphs.
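A tensor network contraction of the kind the tutorial introduces can be illustrated with plain einsum calls; the sketch below builds a small random matrix product state and contracts it along its bond indices to obtain one amplitude. The sizes and bond dimension are arbitrary choices for demonstration.

```python
# A small numpy illustration of a tensor network: an n-site matrix product
# state (MPS) with random tensors, contracted with einsum to recover one
# amplitude of the underlying state.
import numpy as np

rng = np.random.default_rng(1)
n, d, chi = 4, 2, 3          # sites, physical dimension, bond dimension

# MPS tensors: first and last are matrices, the rest are order-3 tensors.
first = rng.normal(size=(d, chi))
middle = [rng.normal(size=(chi, d, chi)) for _ in range(n - 2)]
last = rng.normal(size=(chi, d))

def amplitude(bits):
    """Contract the MPS along its bond indices for one basis state."""
    vec = first[bits[0], :]                    # shape (chi,)
    for tensor, b in zip(middle, bits[1:-1]):
        vec = np.einsum('a,ab->b', vec, tensor[:, b, :])
    return float(np.einsum('a,a->', vec, last[:, bits[-1]]))

print(amplitude([0, 1, 1, 0]))
```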
Tensors Come of Age: Why the AI Revolution will help HPC This article discusses how the automation of tensor algorithms, based on A Mathematics of Arrays and Psi Calculus, and a new way to represent numbers, Unum Arithmetic, enables mechanically provable, scalable, portable, and more numerically accurate software.
Tests based on characterizations, and their efficiencies: a survey A survey of goodness-of-fit and symmetry tests based on the characterization properties of distributions is presented. This approach became popular in recent years. In most cases the test statistics are functionals of $U$-empirical processes. The limiting distributions and large deviations of new statistics under the null hypothesis are described. Their local Bahadur efficiency for various parametric alternatives is calculated and compared with each other as well as with diverse previously known tests. We also describe new directions of possible research in this domain.
Tests for Comparing Weighted Histograms. Review and Improvements Histograms with weighted entries are used to estimate probability density functions. Computer simulation is the main application of this type of histograms. A review of chi-square tests for comparing weighted histograms is presented in this paper. Improvements to these tests, with a size closer to the nominal value, are proposed. Numerical examples are presented for evaluation and demonstration of various applications of the tests.
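A baseline version of such a comparison can be sketched as follows: each weighted histogram keeps per-bin sums of weights and of squared weights, and a chi-square-like statistic normalizes the bin differences by the estimated variances. This is only the naive statistic on invented data; the improved tests discussed in the paper refine its size properties.

```python
# A minimal sketch of a naive chi-square comparison of two weighted
# histograms, using per-bin sums of weights and squared weights.
import numpy as np
from scipy.stats import chi2

def weighted_hist(x, w, edges):
    sums, _ = np.histogram(x, bins=edges, weights=w)
    var, _ = np.histogram(x, bins=edges, weights=w ** 2)
    return sums, var

rng = np.random.default_rng(2)
edges = np.linspace(-3, 3, 11)
x1, w1 = rng.normal(size=5000), rng.uniform(0.5, 1.5, 5000)
x2, w2 = rng.normal(size=5000), rng.uniform(0.5, 1.5, 5000)

s1, v1 = weighted_hist(x1, w1, edges)
s2, v2 = weighted_hist(x2, w2, edges)

stat = np.sum((s1 - s2) ** 2 / (v1 + v2))     # per-bin normalized difference
dof = len(edges) - 1
print("chi2 =", stat, "p-value =", chi2.sf(stat, dof))
```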
texreg: Conversion of Statistical Model Output in R to LaTeX and HTML Tables A recurrent task in applied statistics is the (mostly manual) preparation of model output for inclusion in LaTeX, Microsoft Word, or HTML documents – usually with more than one model presented in a single table along with several goodness-of-fit statistics. However, statistical models in R have diverse object structures and summary methods, which makes this process cumbersome. This article first develops a set of guidelines for converting statistical model output to LaTeX and HTML tables, then assesses to what extent existing packages meet these requirements, and finally presents the texreg package as a solution that meets all of the criteria set out in the beginning. After providing various usage examples, a blueprint for writing custom model extensions is proposed.
Text Classification Algorithms: A Survey In recent years, there has been an exponential growth in the number of complex documents and texts, which requires a deeper understanding of machine learning methods to classify text accurately in many applications. Many machine learning approaches have achieved impressive results in natural language processing. The success of these learning algorithms relies on their capacity to understand complex models and non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification is a challenge for researchers. In this paper, a brief overview of text classification algorithms is discussed. This overview covers different text feature extractions, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods. Finally, the limitations of each technique and its application to real-world problems are discussed.
Text Detection and Recognition in images: A survey Text detection and recognition is one of the important aspects of image processing. This paper analyzes and compares the methods to handle this task. It summarizes the fundamental problems and enumerates factors that need consideration when addressing these problems. Existing techniques are categorized as either stepwise or integrated and sub-problems are highlighted including digit localization, verification, segmentation and recognition. Special issues associated with the enhancement of degraded text and the processing of video text and multi-oriented text are also addressed. The categories and sub-categories of text are illustrated, benchmark datasets are enumerated, and the performance of the most representative approaches is compared. This review also provides a fundamental comparison and analysis of the remaining problems in the field.
Text Similarity in Vector Space Models: A Comparative Study Automatic measurement of semantic text similarity is an important task in natural language processing. In this paper, we evaluate the performance of different vector space models to perform this task. We address the real-world problem of modeling patent-to-patent similarity and compare TFIDF (and related extensions), topic models (e.g., latent semantic indexing), and neural models (e.g., paragraph vectors). Contrary to expectations, the added computational cost of text embedding methods is justified only when: 1) the target text is condensed; and 2) the similarity comparison is trivial. In other cases, TFIDF performs surprisingly well: in particular for longer and more technical texts, or for making finer-grained distinctions between nearest neighbors. Unexpectedly, extensions to the TFIDF method, such as adding noun phrases or calculating term weights incrementally, were not helpful in our context.
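The TFIDF vector-space baseline at the center of this comparison is easy to reproduce in outline; the sketch below scores pairwise cosine similarity between a few invented patent-like texts and is not drawn from the authors' patent data.

```python
# A small sketch of the TFIDF vector-space baseline: toy documents are
# vectorized and scored by pairwise cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "A rotor blade assembly with a composite spar for wind turbines.",
    "Wind turbine blade comprising a reinforced composite spar cap.",
    "A method for brewing coffee using a pressurised water chamber.",
]

tfidf = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
X = tfidf.fit_transform(docs)

sim = cosine_similarity(X)
print(sim.round(2))   # the two turbine texts should score far above the third
```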
Text Summarization Techniques: A Brief Survey In recent years, there has been an explosion in the amount of text data from a variety of sources. This volume of text is an invaluable source of information and knowledge which needs to be effectively summarized to be useful. In this review, the main approaches to automatic text summarization are described. We review the different processes for summarization and describe the effectiveness and shortcomings of the different methods.
Text Understanding from Scratch This article demonstrates that we can apply deep learning to text understanding from character-level inputs all the way up to abstract text concepts, using temporal convolutional networks (ConvNets; LeCun et al., 1998). We apply ConvNets to various large-scale datasets, including ontology classification, sentiment analysis, and text categorization. We show that temporal ConvNets can achieve astonishing performance without knowledge of words, phrases, sentences or any other syntactic or semantic structure of a human language. Evidence shows that our models can work for both English and Chinese.
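In outline, a character-level temporal ConvNet of this kind takes one-hot encoded characters, applies a stack of 1D convolutions and pooling, and classifies from the pooled features. The PyTorch sketch below is far smaller than the models in the article, and its alphabet, depths and sizes are illustrative choices only.

```python
# A minimal PyTorch sketch of a character-level temporal ConvNet classifier:
# one-hot characters in, a 1D convolution/pooling stack, a linear classifier on top.
import torch
import torch.nn as nn

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "
MAX_LEN = 128

def encode(text):
    """One-hot encode a string into a (len(ALPHABET), MAX_LEN) tensor."""
    x = torch.zeros(len(ALPHABET), MAX_LEN)
    for i, ch in enumerate(text.lower()[:MAX_LEN]):
        j = ALPHABET.find(ch)
        if j >= 0:
            x[j, i] = 1.0
    return x

class CharConvNet(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(len(ALPHABET), 64, kernel_size=7), nn.ReLU(), nn.MaxPool1d(3),
            nn.Conv1d(64, 64, kernel_size=3), nn.ReLU(), nn.AdaptiveMaxPool1d(8),
        )
        self.classifier = nn.Linear(64 * 8, n_classes)

    def forward(self, x):                      # x: (batch, alphabet, length)
        h = self.features(x)
        return self.classifier(h.flatten(1))

model = CharConvNet()
batch = torch.stack([encode("deep learning from raw characters"),
                     encode("stock markets fell sharply today")])
print(model(batch).shape)   # (2, n_classes) logits, ready for cross-entropy
```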
The 30-Year Cycle In The AI Debate In the last couple of years, the rise of Artificial Intelligence and the successes of academic breakthroughs in the field have been inescapable. Vast sums of money have been thrown at AI start-ups. Many existing tech companies — including the giants like Google, Amazon, Facebook, and Microsoft — have opened new research labs. The rapid changes in these everyday work and entertainment tools have fueled a rising interest in the underlying technology itself; journalists write about AI tirelessly, and companies — of tech nature or not — brand themselves with AI, Machine Learning or Deep Learning whenever they get a chance. Confronting this media coverage squarely, several analysts are starting to voice concerns about over-interpretation of AI’s blazing successes and the sometimes poor public reporting on the topic. This paper briefly reviews the track record in AI and Machine Learning and finds this pattern of early dramatic successes, followed by philosophical critique and unexpected difficulties, if not downright stagnation, returning almost like clockwork in 30-year cycles since 1958.
The ALAMO approach to machine learning ALAMO is a computational methodology for learning algebraic functions from data. Given a data set, the approach begins by building a low-complexity, linear model composed of explicit non-linear transformations of the independent variables. Linear combinations of these non-linear transformations allow a linear model to better approximate complex behavior observed in real processes. The model is refined as additional data are obtained in an adaptive fashion through error maximization sampling using derivative-free optimization. Models built using ALAMO can enforce constraints on the response variables to incorporate first-principles knowledge. The ability of ALAMO to generate simple and accurate models for a number of reaction problems is demonstrated. The error maximization sampling is compared with Latin hypercube designs to demonstrate its sampling efficiency. ALAMO’s constrained regression methodology is used to further refine concentration models, resulting in models that perform better on validation data and satisfy upper and lower bounds placed on model outputs.
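The core idea of a low-complexity linear model over explicit nonlinear transformations can be sketched generically with a sparse linear fit over a dictionary of candidate basis functions; the example below is a stand-in using a Lasso, not the ALAMO software or its adaptive error-maximization sampling loop.

```python
# A sketch of a linear model built from explicit nonlinear transformations
# of the inputs, with a Lasso used to select among the candidate basis
# functions. Data and basis dictionary are invented for illustration.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
x = rng.uniform(0.5, 2.0, size=(200, 2))
y = 3.0 * x[:, 0] ** 2 + 0.5 * np.log(x[:, 1]) + rng.normal(0, 0.05, 200)

# Candidate nonlinear transformations of the independent variables.
basis = {
    "x1": x[:, 0], "x2": x[:, 1],
    "x1^2": x[:, 0] ** 2, "x2^2": x[:, 1] ** 2,
    "log x1": np.log(x[:, 0]), "log x2": np.log(x[:, 1]),
    "1/x1": 1.0 / x[:, 0], "1/x2": 1.0 / x[:, 1],
}
Phi = np.column_stack(list(basis.values()))

model = Lasso(alpha=0.01).fit(Phi, y)
for name, coef in zip(basis, model.coef_):
    if abs(coef) > 1e-3:
        print(f"{name:7s} {coef:+.3f}")      # basis terms retained by the fit
```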
The Algorithm Selection Competition Series 2015-17 The algorithm selection problem is to choose the most suitable algorithm for solving a given problem instance and thus, it leverages the complementarity between different approaches that is present in many areas of AI. We report on the state of the art in algorithm selection, as defined by the Algorithm Selection Competition series 2015 to 2017. The results of these competitions show how the state of the art improved over the years. Although performance in some cases is very promising, there is still room for improvement in other cases. Finally, we provide insights into why some scenarios are hard, and pose challenges to the community on how to advance the current state of the art.
The Analytics Big Bang (infographic)
The Anatomy of Big Data Computing Advances in information technology and its widespread growth in several areas of business, engineering, medical and scientific studies are resulting in an information/data explosion. Knowledge discovery and decision making from such rapidly growing voluminous data is a challenging task in terms of data organization and processing, an emerging trend known as Big Data Computing: a new paradigm that combines large-scale compute, new data-intensive techniques and mathematical models to build data analytics. Big Data computing demands huge storage and compute resources for data curation and processing, which could be delivered from on-premise or cloud infrastructures. This paper discusses the evolution of Big Data computing, differences between traditional data warehousing and Big Data, a taxonomy of Big Data computing and underpinning technologies, the integrated platform of Big Data and Clouds known as Big Data Clouds, the layered architecture and components of the Big Data Cloud, and finally open technical challenges and future directions.
The Art of Data Augmentation The term data augmentation refers to methods for constructing iterative optimization or sampling algorithms via the introduction of unobserved data or latent variables. For deterministic algorithms, the method was popularized in the general statistical community by the seminal article by Dempster, Laird, and Rubin on the EM algorithm for maximizing a likelihood function or, more generally, a posterior density. For stochastic algorithms, the method was popularized in the statistical literature by Tanner and Wong’s Data Augmentation algorithm for posterior sampling and in the physics literature by Swendsen and Wang’s algorithm for sampling from the Ising and Potts models and their generalizations; in the physics literature, the method of data augmentation is referred to as the method of auxiliary variables. Data augmentation schemes were used by Tanner and Wong to make simulation feasible and simple, while auxiliary variables were adopted by Swendsen and Wang to improve the speed of iterative simulation. In general, however, constructing data augmentation schemes that result in both simple and fast algorithms is a matter of art in that successful strategies vary greatly with the (observed-data) models being considered. After an overview of data augmentation/auxiliary variables and some recent developments in methods for constructing such efficient data augmentation schemes, we introduce an effective search strategy that combines the ideas of marginal augmentation and conditional augmentation, together with a deterministic approximation method for selecting good augmentation schemes. We then apply this strategy to three common classes of models (specifically, multivariate t, probit regression, and mixed-effects models) to obtain efficient Markov chain Monte Carlo algorithms for posterior sampling. We provide theoretical and empirical evidence that the resulting algorithms, while requiring similar programming effort, can show dramatic improvement over the Gibbs samplers commonly used for these models in practice. A key feature of all these new algorithms is that they are positive recurrent subchains of nonpositive recurrent Markov chains constructed in larger spaces.
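For the probit regression example mentioned above, the data augmentation idea can be sketched concretely: latent Gaussian utilities are introduced so that both conditional distributions in the Gibbs sampler become standard (truncated normal and normal). The sketch below assumes a flat prior on the coefficients and omits the marginal/conditional augmentation refinements the article develops.

```python
# A minimal data augmentation Gibbs sampler for probit regression on
# synthetic data: z_i | beta is a truncated normal, beta | z is Gaussian.
# A flat prior on beta is assumed here, which is a simplification.
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(4)
n, p = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([0.5, -1.0, 2.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

XtX_inv = np.linalg.inv(X.T @ X)
beta = np.zeros(p)
draws = []
for it in range(2000):
    # 1) Augmentation step: z_i | beta, y_i is normal truncated at zero.
    mu = X @ beta
    lower = np.where(y == 1, 0.0, -np.inf)
    upper = np.where(y == 1, np.inf, 0.0)
    z = truncnorm.rvs(lower - mu, upper - mu, loc=mu, scale=1.0, random_state=rng)
    # 2) Posterior step: beta | z is Gaussian under the flat prior.
    beta_hat = XtX_inv @ (X.T @ z)
    beta = rng.multivariate_normal(beta_hat, XtX_inv)
    if it >= 500:
        draws.append(beta)

print("posterior mean:", np.mean(draws, axis=0), "truth:", beta_true)
```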
The Art of Turning Data Into Product Having worked in academia, government and industry, I´ve had a unique opportunity to build products in each sector. Much of this product development has been around building data products. Just as methods for general product development have steadily improved, so have the ideas for developing data products. Thanks to large investments in the general area of data science, many major innovations (e.g., Hadoop, Voldemort, Cassandra, HBase, Pig, Hive, etc.) have made data products easier to build. Nonetheless, data products are unique in that they are often extremely difficult, and seemingly intractable for small teams with limited funds. Yet, they get solved every day. How? Are the people who solve them superhuman data scientists who can come up with better ideas in five minutes than most people can in a lifetime? Are they magicians of applied math who can cobble together millions of lines of code for high-performance machine learning in a few hours? No. Many of them are incredibly smart, but meeting big problems head-on usually isn´t the winning approach. There´s a method to solving data problems that avoids the big, heavyweight solution, and instead, concentrates on building something quickly and iterating. Smart data scientists don´t just solve big, hard problems; they also have an instinct for making big problems small. We call this Data Jujitsu: the art of using multiple data elements in clever ways to solve iterative problems that, when combined, solve a data problem that might otherwise be intractable. It´s related to Wikipedia´s definition of the ancient martial art of jujitsu: ‘the art or technique of manipulating the opponent´s force against himself rather than confronting it with one´s own force.’
The Basic AI Drives One might imagine that AI systems with harmless goals will be harmless. This paper instead shows that intelligent systems will need to be carefully designed to prevent them from behaving in harmful ways. We identify a number of ‘drives’ that will appear in sufficiently advanced AI systems of any design. We call them drives because they are tendencies which will be present unless explicitly counteracted. We start by showing that goal-seeking systems will have drives to model their own operation and to improve themselves. We then show that self-improving systems will be driven to clarify their goals and represent them as economic utility functions. They will also strive for their actions to approximate rational economic behavior. This will lead almost all systems to protect their utility functions from modification and their utility measurement systems from corruption. We also discuss some exceptional systems which will want to modify their utility functions. We next discuss the drive toward self-protection, which causes systems to try to prevent themselves from being harmed. Finally we examine drives toward the acquisition of resources and toward their efficient utilization. We end with a discussion of how to incorporate these insights in designing intelligent technology which will lead to a positive future for humanity.
The Bayesian New Statistics: Two historical trends converge There have been two historical shifts in the practice of data analysis. One shift is from hypothesis testing to estimation with uncertainty and meta-analysis, which among frequentists in psychology has recently been dubbed ‘the New Statistics’ (Cumming, 2014). A second shift is from frequentist methods to Bayesian methods. We explain and applaud both of these shifts. Our main goal in this article is to explain how Bayesian methods achieve the goals of the New Statistics better than frequentist methods. The two historical trends converge in Bayesian methods for estimation with uncertainty and meta-analysis.
The beginner´s guide to app analytics The web analytics industry has grown in size and sophistication over the past decade, enabling marketers to create targeted and revenue-driven online presences. This represents a huge shift in how brands operate and communicate with their consumers. And it´s awesome. But the web isn´t where the majority of your audience is going to be anymore, and it isn´t where the learning curve is. Today, there are over 1.2 billion mobile web users worldwide, and mobile traffic is slated to reach over 25% of total Internet traffic by the year-end. And now, US daily smartphone screen time has even exceeded television time. For brands designing, launching, promoting and investing in mobile apps, tracking the right engagement metrics is critical to long-term success in terms of ROI and growth. Using realtime app insights allows you to get to know and adapt to your users right from the start, so that they keep coming back, even if you´re just getting started. In this eBook, we outline how to define your brand´s mobile goals, switch from a web-only mindset, and get started identifying, measuring, and learning from your key app analytics.
The Big Data Economy: Why and how our future with data is cleaner, leaner, and smarter (Slide Deck)
The Big Potential of Big Data Big data works. Adopters have reaped benefits in ROI, customer interactions and insights into customer behavior. Of the organizations that used big data at least 50% of the time, three in five (60%) said that they had exceeded their goals. At the same time, of the companies that used big data less than 50% of the time, just 33% said that they had exceeded their goals. The more frequently that companies felt that they were making sufficient use of data, the more likely they exceeded their goals. More than nine in 10 companies (92%) who had always or frequently made sufficient use of data said that they had met or exceeded their goals, while just 5% who said that they were making sufficient use of data said that they were falling short of their goals. At the same time, marketers seem to be suffering from a personality split. The overwhelming majority of executives say they are satisfied with their marketing. When pressed for more detail, however, the participants´ rosy view contradicts other, more detailed findings. Executives believe that they are using big data enough when they aren´t. A majority of agencies and non-agencies said that they were frequently or always making sufficient use of data in marketing decisions. However, only about one in 10 non-agencies managed more than half their advertising/marketing with big data, and a third of agencies used big data in more than half their initiatives. Many executives may be struggling to define big data and its potential benefits. Just over half of senior executives (both at agencies and other companies) said that they agreed or strongly agreed that they had a good understanding of big data and its benefits. Systems that generate data quickly and can account for changing consumer behavior – those that utilize machine learning – will be increasingly important. Roughly a quarter of respondents called them critical to the success of their marketing, while another 43% of agency executives and 44% of senior executives at non-agency organizations said they would be increasingly important for most initiatives.
The Challenge of Non-Technical Loss Detection using Artificial Intelligence: A Survey Detection of non-technical losses (NTL) which include electricity theft, faulty meters or billing errors has attracted increasing attention from researchers in electrical engineering and computer science. NTLs cause significant harm to the economy, as in some countries they may range up to 40% of the total electricity distributed. The predominant research direction is employing artificial intelligence (AI) to solve this problem. Promising approaches have been reported falling into two categories: expert systems incorporating hand-crafted expert knowledge or machine learning, also called pattern recognition or data mining, which learns fraudulent consumption patterns from examples without being explicitly programmed. This paper first provides an overview about how NTLs are defined and their impact on economies. Next, it covers the fundamental pillars of AI relevant to this domain. It then surveys these research efforts in a comprehensive review of algorithms, features and data sets used. It finally identifies the key scientific and engineering challenges in NTL detection and suggests how they could be solved. We believe that those challenges have not sufficiently been addressed in past contributions and that covering those is necessary in order to advance NTL detection.
The Coming Age of Pervasive Data Processing Emerging Big Data analytics and machine learning applications require a significant amount of computational power. While there exists a plethora of large-scale data processing frameworks which thrive in handling the various complexities of data-intensive workloads, the ever-increasing demand of applications have made us reconsider the traditional ways of scaling (e.g., scale-out) and seek new opportunities for improving the performance. In order to prepare for an era where data collection and processing occur on a wide range of devices, from powerful HPC machines to small embedded devices, it is crucial to investigate and eliminate the potential sources of inefficiency in the current state of the art platforms. In this paper, we address the current and upcoming challenges of pervasive data processing and present directions for designing the next generation of large-scale data processing systems.
The Convergence of Machine Learning and Communications The areas of machine learning and communication technology are converging. Today’s communications systems generate a huge amount of traffic data, which can help to significantly enhance the design and management of networks and communication components when combined with advanced machine learning methods. Furthermore, recently developed end-to-end training procedures offer new ways to jointly optimize the components of a communication system. Also in many emerging application fields of communication technology, e.g., smart cities or the internet of things, machine learning methods are of central importance. This paper gives an overview of the use of machine learning in different areas of communications and discusses two exemplary applications in wireless networking. Furthermore, it identifies promising future research topics and discusses their potential impact.
The Convergence of Markov chain Monte Carlo Methods: From the Metropolis method to Hamiltonian Monte Carlo From its inception in the 1950s to the modern frontiers of applied statistics, Markov chain Monte Carlo has been one of the most ubiquitous and successful methods in statistical computing. In that time its development has been fueled by increasingly difficult problems and novel techniques from physics. In this article I will review the history of Markov chain Monte Carlo from its inception with the Metropolis method to today’s state-of-the-art in Hamiltonian Monte Carlo. Along the way I will focus on the evolving interplay between the statistical and physical perspectives of the method.
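As a concrete reminder of where that history starts, the random-walk Metropolis method fits in a dozen lines: propose a local perturbation and accept it with probability min(1, p(x')/p(x)). The target density and step size below are toy choices for illustration.

```python
# A compact random-walk Metropolis sketch on a toy 1D target density.
import numpy as np

def log_target(x):
    # Unnormalised log-density of a two-component Gaussian mixture.
    return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

rng = np.random.default_rng(5)
x, step, samples = 0.0, 1.0, []
for _ in range(20000):
    proposal = x + step * rng.normal()
    # Accept with probability min(1, p(proposal) / p(x)).
    if np.log(rng.random()) < log_target(proposal) - log_target(x):
        x = proposal
    samples.append(x)

samples = np.array(samples[5000:])            # discard burn-in
print("mean:", samples.mean(), "std:", samples.std())
```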
The Dimensionality of Customer Satisfaction Survey Responses and Implications for Driver Analysis The canonical design of customer satisfaction surveys asks for global satisfaction with a product or service and for evaluations of its distinct attributes. Users of these surveys are often interested in the relationship between global satisfaction and the attributes, with regression analysis used to measure the conditional associations. Regression analysis is only appropriate when the global satisfaction measure results from the attribute evaluations, and is not appropriate when the covariance of the items lies in a low-dimensional subspace, such as in a factor model. Potential reasons for low-dimensional responses are responses that are haloed from overall satisfaction and an unintended lack of specificity of items. In this paper we develop a Bayesian mixture model that facilitates the empirical distinction between regression models and relatively much lower-dimensional factor models. The model uses the dimensionality of the covariance among items in a survey as the primary classification criterion while accounting for heterogeneous usage of rating scales. We apply the model to four different customer satisfaction surveys evaluating hospitals, an academic program, smart-phones, and theme parks respectively. We show that correctly assessing the heterogeneous dimensionality of responses is critical for meaningful inferences by comparing our results to those from regression models.
The Dynamics of Learning: A Random Matrix Approach Understanding the learning dynamics of neural networks is one of the key issues for the improvement of optimization algorithms as well as for the theoretical comprehension of why deep neural nets work so well today. In this paper, we introduce a random matrix-based framework to analyze the learning dynamics of a single-layer linear network on a binary classification problem, for data of simultaneously large dimension and size, trained by gradient descent. Our results provide rich insights into common questions in neural nets, such as overfitting, early stopping and the initialization of training, thereby opening the door for future studies of more elaborate structures and models appearing in today’s neural networks.
The Enron Corpus: A New Dataset for Email Classification Research Automated classification of email messages into user-specific folders and information extraction from chronologically ordered email streams have become interesting areas in text learning research. However, the lack of large benchmark collections has been an obstacle for studying the problems and evaluating the solutions. In this paper, we introduce the Enron corpus as a new test bed. We analyze its suitability with respect to email folder prediction, and provide the baseline results of a state-of-the-art classifier (Support Vector Machines) under various conditions, including the cases of using individual sections (From, To, Subject and body) alone as the input to the classifier, and using all the sections in combination with regression weights.
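A baseline of the kind described can be sketched with a linear SVM on bag-of-words features; the tiny in-line messages and folder names below are placeholders, and the actual Enron corpus would be loaded from its published archive instead.

```python
# A minimal sketch of folder prediction with a linear SVM on TFIDF features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

emails = [
    "please review the attached gas trading contract",
    "draft contract terms for the pipeline deal attached",
    "team lunch on friday, let me know if you can come",
    "reminder: holiday party rsvp by wednesday",
]
folders = ["deals", "deals", "social", "social"]

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(emails, folders)
print(clf.predict(["are the revised contract terms ready for review"]))
```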
The Evolution of Sentiment Analysis – A Review of Research Topics, Venues, and Top Cited Papers Research in sentiment analysis is increasing at a fast pace making it challenging to keep track of all the activities in the area. We present a computer-assisted literature review and analyze 5,163 papers from Scopus. We find that the roots of sentiment analysis are in studies on public opinion analysis at the start of 20th century, but the outbreak of computer-based sentiment analysis only occurred with the availability of subjective texts in the Web. Consequently, 99% of the papers have been published after 2005. Sentiment analysis papers are scattered to multiple publication venues and the combined number of papers in the top-15 venues only represent 29% of the papers in total. In recent years, sentiment analysis has shifted from analyzing online product reviews to social media texts from Twitter and Facebook. We created a taxonomy of research topics with text mining and qualitative coding. A meaningful future for sentiment analysis could be in ensuring the authenticity of public opinions, and detecting fake news.
The Expressive Power of Neural Networks: A View from the Width The expressive power of neural networks is important for understanding deep learning. Most existing works consider this problem from the view of the depth of a network. In this paper, we study how width affects the expressiveness of neural networks. Classical results state that \emph{depth-bounded} (e.g. depth-$2$) networks with suitable activation functions are universal approximators. We show a universal approximation theorem for \emph{width-bounded} ReLU networks: width-$(n+4)$ ReLU networks, where $n$ is the input dimension, are universal approximators. Moreover, except for a measure zero set, all functions cannot be approximated by width-$n$ ReLU networks, which exhibits a phase transition. Several recent works demonstrate the benefits of depth by proving the depth-efficiency of neural networks. That is, there are classes of deep networks which cannot be realized by any shallow network whose size is no more than an \emph{exponential} bound. Here we pose the dual question on the width-efficiency of ReLU networks: Are there wide networks that cannot be realized by narrow networks whose size is not substantially larger We show that there exist classes of wide networks which cannot be realized by any narrow network whose depth is no more than a \emph{polynomial} bound. On the other hand, we demonstrate by extensive experiments that narrow networks whose size exceed the polynomial bound by a constant factor can approximate wide and shallow network with high accuracy. Our results provide more comprehensive evidence that depth is more effective than width for the expressiveness of ReLU networks.
The flare Package for High Dimensional Linear Regression and Precision Matrix Estimation in R This paper describes an R package named flare, which implements a family of new high dimensional regression methods (LAD Lasso, SQRT Lasso, lq Lasso, and Dantzig selector) and their extensions to sparse precision matrix estimation (TIGER and CLIME). These methods exploit different nonsmooth loss functions to gain modeling flexibility, estimation robustness, and tuning insensitivity. The developed solver is based on the alternating direction method of multipliers (ADMM). The package flare is coded in double precision C, and called from R by a user-friendly interface. The memory usage is optimized by using the sparse matrix output. The experiments show that flare is efficient and can scale up to large problems.
The Forrester Wave: Big Data Streaming Analytics Platforms, Q3 2014 Streaming analytics is anything but a sleepy, rearview mirror analysis of data. No, it is about knowing and acting on what’s happening in your business at this very moment – now. Forrester calls these perishable insights because they occur at a moment’s notice and you must act on them fast within a narrow window of opportunity before they quickly lose their value. The high velocity, white-water flow of data from innumerable real-time data sources such as market data, Internet of Things, mobile, sensors, clickstream, and even transactions remain largely unnavigated by most firms. The opportunity to leverage streaming analytics has never been greater. In Forrester’s 50-criteria evaluation of big data streaming analytics platforms, we evaluated seven platforms from IBM, Informatica, SAP, Software AG, SQLstream, Tibco Software, and Vitria.
The Forward Search and Data Visualisation The forward search is a powerful robust statistical method for exploring the relationship between data and fitted models, which produces an appreciable number of graphs that illuminate the structure of the data. Atkinson and Riani (2000) describe its use in linear and nonlinear regression, response transformation and in generalized linear models, where the emphasis is on the detection of unidentified subsets of the data and of multiple masked outliers and of their effect on inferences. In this talk we extend the method to the analysis of multivariate data, where the emphasis is rather more on the data and less on the multivariate normal model. The forward search orders the observations by closeness to the assumed model, starting from a small subset of the data and increasing the number of observations m used for fitting the model. Outliers and small unidentified subsets of observations enter at the end of the search. Even if there are a number of groups, as in cluster analysis, we start by fitting one multivariate normal distribution to the data. An important graphical tool is a variety of plots of the Mahalanobis distances of the individual observations during the search. Each unit, originally a point in v-dimensional space, is then represented by a curve in two dimensions connecting the almost n values of the distance for each unit calculated during the search. Our task is now to classify these curves. Forward plots of Mahalanobis distances give a good initial indication of clusters, if any, which can be refined by, for example, plots of distances for unclassified units compared to those for each established group. We can also start the forward search at different points, for example inside each cluster in turn, in which case we obtain very different curves for each unit. If our aim is cluster analysis, we finish by fitting as many multivariate normal distributions as there are clusters, visually monitoring the behaviour of our forward clustering algorithm. Because we use Mahalanobis distances, it is important that the data are approximately normal. We therefore combine cluster analysis with a multivariate form of the Box-Cox family of transformations. We again use graphical methods, particularly a series of ‘fan plots’, to establish appropriate transformations.
The forward search: Theory and data analysis The Forward Search is a powerful general method, incorporating flexible data-driven trimming, for the detection of outliers and unsuspected structure in data and so for building robust models. Starting from small subsets of data, observations that are close to the fitted model are added to the observations used in parameter estimation. As this subset grows we monitor parameter estimates, test statistics and measures of fit such as residuals. The paper surveys theoretical development in work on the Forward Search over the last decade. The main illustration is a regression example with 330 observations and 9 potential explanatory variables. Mention is also made of procedures for multivariate data, including clustering, time series analysis and fraud detection.
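The forward search procedure described in the two entries above can be sketched for multivariate data as follows: fit a mean and covariance to a small, robustly chosen subset, then repeatedly enlarge the subset with the observations closest in Mahalanobis distance, monitoring the distances as the subset grows. The initial-subset choice and the monitoring in the sketch are deliberately simplified relative to the published methodology.

```python
# A minimal forward search sketch for multivariate data with planted outliers.
import numpy as np

rng = np.random.default_rng(6)
# 100 'clean' bivariate observations plus 5 planted outliers.
X = np.vstack([rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], 100),
               rng.multivariate_normal([6, -6], np.eye(2), 5)])
n, v = X.shape

def mahalanobis_sq(X, mean, cov):
    diff = X - mean
    return np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)

# Crude initial subset: the units closest to the coordinate-wise median.
m0 = 10
d0 = np.linalg.norm(X - np.median(X, axis=0), axis=1)
subset = np.argsort(d0)[:m0]

for m in range(m0, n):
    mean = X[subset].mean(axis=0)
    cov = np.cov(X[subset], rowvar=False)
    d2 = mahalanobis_sq(X, mean, cov)
    subset = np.argsort(d2)[:m + 1]       # enlarge the subset by one unit

# Outliers should be the last units ordered by distance at the end of the search.
ranking = np.argsort(d2)
print("last units to enter the subset:", ranking[-5:])
```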
The Foundations of Deep Learning with a Path Towards General Intelligence Like any field of empirical science, AI may be approached axiomatically. We formulate requirements for a general-purpose, human-level AI system in terms of postulates. We review the methodology of deep learning, examining the explicit and tacit assumptions in deep learning research. Deep Learning methodology seeks to overcome limitations in traditional machine learning research as it combines facets of model richness, generality, and practical applicability. The methodology so far has produced outstanding results due to a productive synergy of function approximation, under plausible assumptions of irreducibility and the efficiency of back-propagation family of algorithms. We examine these winning traits of deep learning, and also observe the various known failure modes of deep learning. We conclude by giving recommendations on how to extend deep learning methodology to cover the postulates of general-purpose AI including modularity, and cognitive architecture. We also relate deep learning to advances in theoretical neuroscience research.
The Future of Data Analysis
The Future of Misinformation Detection: New Perspectives and Trends The massive spread of misinformation in social networks has become a global risk, implicitly influencing public opinion and threatening social/political development. Misinformation detection (MID) has thus become a surging research topic in recent years. As MID is a promising and rapidly developing research field, much effort has been devoted to its new research problems and approaches. Therefore, it is necessary to give a comprehensive review of the new research trends of MID. We first give a brief review of the literature history of MID, based on which we present several new research challenges and techniques for it, including early detection, detection by multimodal data fusion, and explanatory detection. We further investigate the extraction and usage of various crowd intelligence in MID, which paves a promising way to tackle MID challenges. Finally, we give our own views on the open issues and future research directions of MID, such as model adaptivity/generality to new events, embracing of novel machine learning models, explanatory detection models, and so on.
The Future of Retail Analytics Retail has always been a data-intensive industry. As the tools available to store, manage and analyze this data evolved, so did the role the analysis of data played in retail decision-making. From visibility and control, to transparency, to efficiency, to customer engagement. It is cheaper, faster and easier today to store and process more data than ever before. Retailers have gotten better at data management. The question is: how well are they able to leverage insights from this analysis to drive strategic decisions? EKN conducted an industry survey to benchmark the state of the retail industry in terms of analytics maturity. Findings from the primary research covering 65+ respondents, interview based qualitative inputs from retail executives, and EKN´s secondary research from public and proprietary sources are presented in this report. In a retail environment where consumer spending is stunted and competition from newer, digital channels is eroding store sales, the route for brick and mortar retailers to earn a larger share of wallet of the customer is through deeper, Omni-channel customer engagement. Customer engagement is only as effective as how well you know the customer and how well you are equipped to act on that insight across your channels. Perhaps this is why customer insight emerges as retailers´ highest-priority goal from analytics initiatives in 2013. In addition, findings from EKN´s survey include: • Retailers´ analytics maturity is low: 2 in 5 retailers state they lag behind their competitors in terms of their analytics maturity and a further 2 in 5 suggest they are at par. The ‘analytical retailer’ is thus the exception rather than the rule. • Data management and integration will be a key area of investment in an effort to increase analytical maturity. Retailers are looking to integrate a variety of data sources over the next 2 years, however public and open data remains a relatively under-explored opportunity. • Retailers find their current analytics organizational setup sub-optimal. Only 18% currently have a shared services model for analytics in place whereas approximately 60% would like to move towards such a model. • Retailers will invest in contextual, visual and mobile-friendly delivery of insights to combat the biggest challenge that prevents them from leveraging analytics strategically – delivery of insights to the right resource at the right time. • Retailers´ eCommerce or Omni-channel function emerges as the business function with the highest potential opportunity for analytics impact, the highest rate of data growth and the highest planned technology investment. However, it is also currently the function with the lowest analytics maturity. • Usability is the most important feature retailers will look for when choosing analytics solutions in 2013. Even with the delivery of insights being their biggest challenge, mobile or tablet access ranks relatively low. The traditional view of data management and analysis in retail has been tool-driven – be it relational databases of decades past or Business Intelligence tools more recently. In EKN´s view, ‘business analytics’ is a concept that focuses on decisions and outcomes, and is a far better indicator of the future of retail analytics.
The GAN Landscape: Losses, Architectures, Regularization, and Normalization Generative Adversarial Networks (GANs) are a class of deep generative models which aim to learn a target distribution in an unsupervised fashion. While they were successfully applied to many problems, training a GAN is a notoriously challenging task and requires a significant amount of hyperparameter tuning, neural architecture engineering, and a non-trivial amount of ‘tricks’. The success in many practical applications coupled with the lack of a measure to quantify the failure modes of GANs resulted in a plethora of proposed losses, regularization and normalization schemes, and neural architectures. In this work we take a sober view of the current state of GANs from a practical perspective. We reproduce the current state of the art and go beyond it, fairly exploring the GAN landscape. We discuss common pitfalls and reproducibility issues, open-source our code on Github, and provide pre-trained models on TensorFlow Hub.
The Gap of Semantic Parsing: A Survey on Automatic Math Word Problem Solvers Automatically solving mathematical word problems (MWPs) is challenging, primarily due to the semantic gap between human-readable words and machine-understandable logic. Despite a long history dating back to the 1960s, MWPs have regained intensive attention in the past few years with the advancement of Artificial Intelligence (AI). Solving MWPs successfully is considered a milestone towards general AI. Many systems have claimed promising results on self-crafted and small-scale datasets. However, when applied to large and diverse datasets, none of the methods proposed in the literature achieves high precision, revealing that current MWP solvers are still far from intelligent. This motivated us to present a comprehensive survey to deliver a clear and complete picture of automatic math problem solvers. In this survey, we emphasize algebraic word problems, summarize their extracted features and proposed techniques to bridge the semantic gap, and compare their performance on the publicly accessible datasets. We also cover automatic solvers for other types of math problems, such as geometric problems that require the understanding of diagrams. Finally, we identify several emerging research directions for readers with interests in MWPs.
The Global Impact of Open Data Open data has spurred economic innovation, social transformation, and fresh forms of political and government accountability in recent years, but few people understand how open data works. This comprehensive report, developed with support from Omidyar Network, presents detailed case studies of open data projects throughout the world, along with in-depth analysis of what works and what doesn´t. Authors Andrew Young and Stefaan Verhulst, both with The GovLab at New York University, explain how these projects have made governments more accountable and efficient, helped policymakers find solutions to previously intractable public problems, created new economic opportunities, and empowered citizens through new forms of social mobilization.
The Google File System We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points. The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients. In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.
The Graph Story of the SAP HANA Database Many traditional and new business applications work with inherently graph-structured data and therefore benefit from graph abstractions and operations provided in the data management layer. The property graph data model not only offers schema flexibility but also permits managing and processing data and metadata jointly. By having typical graph operations implemented directly in the database engine and exposing them both in the form of an intuitive programming interface and a declarative language, complex business application logic can be expressed more easily and executed very efficiently. In this paper we describe our ongoing work to extend the SAP HANA database with built-in graph data support. We see this as a next step on the way to provide an efficient and intuitive data management platform for modern business applications with SAP HANA.
The hidden costs of open source Clusters based on open-source software and the Linux operating system have come to dominate high performance computing (HPC). This is due in part to their superior performance, cost-effectiveness and flexibility. The same factors that make open-source software the choice of HPC professionals have also made it less accessible to smaller centers. The complexity and associated cost of deploying and managing open-source clusters threatens to erode the very cost benefits that have made them compelling in the first place. As customers choose between open-source and commercial alternatives, there are many different costs related to administration and productivity that should be considered. These are explored in this paper in order to give a true cost perspective. We also examine how a commercial management product, such as IBM Platform HPC, enables HPC customers to side-step many overhead cost and support issues that often plague open-source environments and enable them to deploy powerful, easy to use clusters.
The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches Deep learning has demonstrated tremendous success in a variety of application domains in the past few years. This new field of machine learning has been growing rapidly and has been applied in most application domains, with new modalities of application that open up new opportunities. Different methods have been proposed for different categories of learning approaches, including supervised, semi-supervised and unsupervised learning. Experimental results show state-of-the-art performance of deep learning over traditional machine learning approaches in the fields of Image Processing, Computer Vision, Speech Recognition, Machine Translation, Art, Medical imaging, Medical information processing, Robotics and control, Bio-informatics, Natural Language Processing (NLP), Cyber security, and many more. This report presents a brief survey of the development of DL approaches, including Deep Neural Network (DNN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) including Long Short Term Memory (LSTM) and Gated Recurrent Units (GRU), Auto-Encoder (AE), Deep Belief Network (DBN), Generative Adversarial Network (GAN), and Deep Reinforcement Learning (DRL). In addition, we include recent developments of advanced variants of these DL techniques. Furthermore, DL approaches that have been explored and evaluated in different application domains are also included in this survey. We have also included recently developed frameworks, SDKs, and benchmark datasets that are used for implementing and evaluating deep learning approaches. Some surveys have been published on deep learning in neural networks [1, 38], along with a survey on RL [234]. However, those papers have not discussed the individual advanced techniques for training large-scale deep learning models or the recently developed methods for generative models [1].
The History of Digital Spam Spam!: that’s what Lorrie Faith Cranor and Brian LaMacchia exclaimed in the title of a popular call-to-action article that appeared twenty years ago in Communications of the ACM. And yet, despite the tremendous efforts of the research community over the last two decades to mitigate this problem, the sense of urgency remains unchanged, as emerging technologies have brought new dangerous forms of digital spam under the spotlight. Furthermore, when spam is carried out with the intent to deceive or influence at scale, it can alter the very fabric of society and our behavior. In this article, I will briefly review the history of digital spam: starting from its quintessential incarnation, spam emails, to modern-day forms of spam affecting the Web and social media, the survey will close by depicting future risks associated with spam and abuse of new technologies, including Artificial Intelligence (e.g., Digital Humans). After providing a taxonomy of spam and of its most popular applications that emerged throughout the last two decades, I will review technological and regulatory approaches proposed in the literature, and suggest some possible solutions to tackle this ubiquitous digital epidemic moving forward.
The impact of social segregation on human mobility in developing and industrialized regions This study leverages mobile phone data to analyze human mobility patterns in a developing nation, especially in comparison to those of a more industrialized nation. Developing regions, such as the Ivory Coast, are marked by a number of factors that may influence mobility, such as less infrastructural coverage and maturity, less economic resources and stability, and in some cases, more cultural and language-based diversity. By comparing mobile phone data collected from the Ivory Coast to similar data collected in Portugal, we are able to highlight both qualitative and quantitative differences in mobility patterns – such as differences in likelihood to travel, as well as in the time required to travel – that are relevant to considerations of policy, infrastructure, and economic development. Our study illustrates how cultural and linguistic diversity in developing regions (such as the Ivory Coast) can present challenges for mobility models that were conceptualized in, and perform well in, less culturally diverse regions. Finally, we address these challenges by proposing novel techniques to assess the strength of borders in a regional partitioning scheme and to quantify the impact of border strength on mobility model accuracy.
The importance of being dissimilar in Recommendation Similarity measures play a fundamental role in memory-based nearest neighbors approaches. They recommend items to a user based on the similarity of either items or users in a neighborhood. In this paper we argue that, although similarity retains a leading role in computing recommendations, it should be paired with a measure of dissimilarity between users or items (computed not merely as the complement of the similarity value). We formally modeled and injected this notion into some of the most widely used similarity measures and evaluated our approach, showing its effectiveness in terms of accuracy.
The Internet of Things: a Survey and Outlook The recent history has witnessed disruptive advances in disciplines related to information and communication technologies that have laid a rich technological ecosystem for the growth and maturity of latent paradigms in this domain. Among them, sensor networks have evolved from the originally conceived set-up where hundreds of nodes with sensing and actuating functionalities were deployed to capture information from their environment and act accordingly (coining the so-called wireless sensor network concept) to the provision of such functionalities embedded in quotidian objects that communicate and work together to collaboratively accomplish complex tasks based on the information they acquire by sensing the environment. This is nowadays a reality, embracing the original idea of an Internet of things (IoT) forged in the late twentieth century, yet featuring unprecedented scales, capabilities and applications ignited by new radio interfaces, communication protocols and intelligent data-based models. This chapter examines the latest findings reported in the literature around these topics, with a clear focus on IoT communications, protocols and platforms, towards ultimately identifying opportunities and trends that will be at the forefront of IoT-related research in the near future.
The Internet of Things: Making sense of the next mega-trend The third wave of the Internet may be the biggest one yet
The Internet of Things: Secure Distributed Inference The growth in the number of devices connected to the Internet of Things (IoT) poses major challenges in security. The integrity and trustworthiness of data and data analytics are increasingly important concerns in IoT applications. These are compounded by the highly distributed nature of IoT devices, making it infeasible to prevent attacks and intrusions on all data sources. Adversaries may hijack devices and compromise their data. As a result, reactive countermeasures, such as intrusion detection and resilient analytics, become vital components of security. This paper overviews algorithms for secure distributed inference in IoT.
The Landscape of Deep Learning Algorithms This paper studies the landscape of empirical risk of deep neural networks by theoretically analyzing its convergence behavior to the population risk as well as its stationary points and properties. For an $l$-layer linear neural network, we prove its empirical risk uniformly converges to its population risk at the rate of $\mathcal{O}(r^{2l}\sqrt{d\log(l)}/\sqrt{n})$ with training sample size $n$, total weight dimension $d$ and magnitude bound $r$ on the weights of each layer. We then derive the stability and generalization bounds for the empirical risk based on this result. Besides, we establish the uniform convergence of the gradient of the empirical risk to its population counterpart. We prove the one-to-one correspondence of the non-degenerate stationary points between the empirical and population risks with convergence guarantees, which describes the landscape of deep neural networks. In addition, we analyze these properties for deep nonlinear neural networks with sigmoid activation functions. We prove similar results for the convergence behavior of their empirical risks as well as the gradients and analyze properties of their non-degenerate stationary points. To the best of our knowledge, this work is the first to theoretically characterize the landscapes of deep learning algorithms. Besides, our results provide the sample complexity of training a good deep neural network. We also provide theoretical understanding of how the neural network depth $l$, the layer width, the network size $d$ and parameter magnitude determine the neural network landscapes.
The many faces of deep learning Deep learning has sparked a network of mutual interactions between different disciplines and AI. Naturally, each discipline focuses and interprets the workings of deep learning in different ways. This diversity of perspectives on deep learning, from neuroscience to statistical physics, is a rich source of inspiration that fuels novel developments in the theory and applications of machine learning. In this perspective, we collect and synthesize different intuitions scattered across several communities as for how deep learning works. In particular, we will briefly discuss the different perspectives that disciplines across mathematics, physics, computation, and neuroscience take on how deep learning does its tricks. Our discussion on each perspective is necessarily shallow due to the multiple views that had to be covered. The deepness in this case should come from putting all these faces of deep learning together in the reader’s mind, so that one can look at the same problem from different angles.
The Marginal Value of Adaptive Gradient Methods in Machine Learning Adaptive optimization methods, which perform local optimization with a metric constructed from the history of iterates, are becoming increasingly popular for training deep neural networks. Examples include AdaGrad, RMSProp, and Adam. We show that for simple overparameterized problems, adaptive methods often find drastically different solutions than gradient descent (GD) or stochastic gradient descent (SGD). We construct an illustrative binary classification problem where the data is linearly separable, GD and SGD achieve zero test error, and AdaGrad, Adam, and RMSProp attain test errors arbitrarily close to half. We additionally study the empirical generalization capability of adaptive methods on several state-of-the-art deep learning models. We observe that the solutions found by adaptive methods generalize worse (often significantly worse) than SGD, even when these solutions have better training performance. These results suggest that practitioners should reconsider the use of adaptive methods to train neural networks.
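As a hedged, toy illustration of the kind of phenomenon the abstract describes, the sketch below (assuming numpy is available) runs full-batch gradient descent and a hand-rolled Adam update on the same overparameterized least-squares problem and compares the solutions they reach; the synthetic data, step sizes and iteration counts are assumptions chosen for demonstration, not the constructed classification problem analysed in the paper.

```python
# Hedged toy comparison (assuming numpy): plain gradient descent vs. Adam on one
# overparameterized least-squares problem. The two optimizers can land on different
# solutions even when both drive the training error close to zero.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 100))   # more parameters than samples
y = rng.standard_normal(20)

def grad(w):
    return X.T @ (X @ w - y) / len(y)

# plain full-batch gradient descent
w_gd = np.zeros(100)
for _ in range(50000):
    w_gd -= 0.01 * grad(w_gd)

# Adam with standard hyperparameters; its fixed effective step size means it may only
# hover near the solution manifold rather than converge exactly.
w_adam, m, v = np.zeros(100), np.zeros(100), np.zeros(100)
b1, b2, eps, lr = 0.9, 0.999, 1e-8, 0.01
for t in range(1, 50001):
    g = grad(w_adam)
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)
    w_adam -= lr * m_hat / (np.sqrt(v_hat) + eps)

for name, w in [("GD", w_gd), ("Adam", w_adam)]:
    print(name, "train MSE:", np.mean((X @ w - y) ** 2), "||w||:", np.linalg.norm(w))
```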
The Matrix Cookbook These pages are a collection of facts (identities, approximations, inequalities, relations, …) about matrices and matters relating to them. It is collected in this form for the convenience of anyone who wants a quick desktop reference.
The Measure of Intelligence To make deliberate progress towards more intelligent and more human-like artificial systems, we need to be following an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans. Over the past hundred years, there has been an abundance of attempts to define and measure intelligence, across both the fields of psychology and AI. We summarize and critically assess these definitions and evaluation approaches, while making apparent the two historical conceptions of intelligence that have implicitly guided them. We note that in practice, the contemporary AI community still gravitates towards benchmarking intelligence by comparing the skill exhibited by AIs and humans at specific tasks such as board games and video games. We argue that solely measuring skill at any given task falls short of measuring intelligence, because skill is heavily modulated by prior knowledge and experience: unlimited priors or unlimited training data allow experimenters to ‘buy’ arbitrary levels of skills for a system, in a way that masks the system’s own generalization power. We then articulate a new formal definition of intelligence based on Algorithmic Information Theory, describing intelligence as skill-acquisition efficiency and highlighting the concepts of scope, generalization difficulty, priors, and experience. Using this definition, we propose a set of guidelines for what a general AI benchmark should look like. Finally, we present a benchmark closely following these guidelines, the Abstraction and Reasoning Corpus (ARC), built upon an explicit set of priors designed to be as close as possible to innate human priors. We argue that ARC can be used to measure a human-like form of general fluid intelligence and that it enables fair general intelligence comparisons between AI systems and humans.
The Measurement of Statistical Evidence as the Basis for Statistical Reasoning There are various approaches to the problem of how one is supposed to conduct a statistical analysis. Different analyses can lead to contradictory conclusions in some problems so this is not a satisfactory state of affairs. It seems that all approaches make reference to the evidence in the data concerning questions of interest as a justification for the methodology employed. It is fair to say, however, that none of the most commonly used methodologies is absolutely explicit about how statistical evidence is to be characterized and measured. We will discuss the general problem of statistical reasoning and the development of a theory for this that is based on being precise about statistical evidence. This will be shown to lead to the resolution of a number of problems.
The Metropolis-Hastings Algorithm This article is a self-contained introduction to the Metropolis-Hastings algorithm, a ubiquitous tool for producing dependent simulations from an arbitrary distribution. The document illustrates the principles of the methodology on simple examples with R code and provides entries to the recent extensions of the method.
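The article's own examples are in R; purely as a hedged illustration, here is a minimal random-walk Metropolis-Hastings sketch in Python (assuming numpy), with an illustrative standard-normal target and an assumed proposal scale.

```python
# Minimal random-walk Metropolis-Hastings sketch (illustrative only; the paper uses R).
# The target log-density and proposal scale below are assumptions for demonstration.
import numpy as np

def metropolis_hastings(log_target, x0, n_samples=10000, proposal_scale=1.0, seed=0):
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    x, log_p = x0, log_target(x0)
    for i in range(n_samples):
        # symmetric random-walk proposal
        x_new = x + proposal_scale * rng.standard_normal()
        log_p_new = log_target(x_new)
        # accept with probability min(1, target(x_new) / target(x))
        if np.log(rng.uniform()) < log_p_new - log_p:
            x, log_p = x_new, log_p_new
        samples[i] = x
    return samples

# Example: sample from a standard normal via its unnormalized log-density -x^2/2.
draws = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0)
print(draws.mean(), draws.std())  # should be near 0 and 1
```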
The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox To enable complex data-intensive applications such as personalized recommendations, targeted advertising, and intelligent services, the data management community has focused heavily on the design of systems to train complex models on large datasets. Unfortunately, the design of these systems largely ignores a critical component of the overall analytics process: the serving and management of models at scale. In this work, we present Velox, a new component of the Berkeley Data Analytics Stack. Velox is a data management system for facilitating the next steps in real-world, large-scale analytics pipelines: online model management, maintenance, and serving. Velox provides end-user applications and services with a low-latency, intuitive interface to models, transforming the raw statistical models currently trained using existing offline large-scale compute frameworks into full-blown, end-to-end data products capable of targeting advertisements, recommending products, and personalizing web content. To provide up-to-date results for these complex models, Velox also facilitates lightweight online model maintenance and selection (i.e., dynamic weighting). In this paper, we describe the challenges and architectural considerations required to achieve this functionality, including the abilities to span online and offline systems, to adaptively adjust model materialization strategies, and to exploit inherent statistical properties such as model error tolerance, all while operating at ‘Big Data’ scale.
The modal age of Statistics Recently, a number of statistical problems have found an unexpected solution by inspecting them through a ‘modal point of view’. These include classical tasks such as clustering or regression. This has led to a renewed interest in estimation and inference for the mode. This paper offers an extensive survey of the traditional approaches to mode estimation and explores the consequences of applying this modern modal methodology to other, seemingly unrelated, fields.
The Mode of Computing The Turing Machine is the paradigmatic case of computing machines, but there are others, such as Artificial Neural Networks, Table Computing, Relational-Indeterminate computing and diverse forms of analogical computing, each of which is based on a particular underlying intuition of the phenomenon of computing. This variety can be captured in terms of system levels, re-interpreting and generalizing Newell’s hierarchy, which includes the knowledge level at the top and the symbol level immediately below it. In this re-interpretation the knowledge level consists of human knowledge and the symbol level is generalized into a new level that here is called The Mode of Computing. Each computing paradigm uses a particular mode, and a central question for Cognition is what the mode of natural computing is. The mode of computing provides a novel perspective on the phenomena of computing, the representational and non-representational views of cognition, and consciousness.
The Mythos of Model Interpretability Supervised machine learning models boast remarkable predictive capabilities. But can you trust your model? Will it work in deployment? What else can it tell you about the world? We want models to be not only good, but interpretable. And yet the task of interpretation appears underspecified. Papers provide diverse and sometimes non-overlapping motivations for interpretability, and offer myriad notions of what attributes render models interpretable. Despite this ambiguity, many papers proclaim interpretability axiomatically, absent further explanation. In this paper, we seek to refine the discourse on interpretability. First, we examine the motivations underlying interest in interpretability, finding them to be diverse and occasionally discordant. Then, we address model properties and techniques thought to confer interpretability, identifying transparency to humans and post-hoc explanations as competing notions. Throughout, we discuss the feasibility and desirability of different notions, and question the oft-made assertions that linear models are interpretable and that deep neural networks are not.
The Origins of Computational Mechanics: A Brief Intellectual History and Several Clarifications The principal goal of computational mechanics is to define pattern and structure so that the organization of complex systems can be detected and quantified. Computational mechanics developed from efforts in the 1970s and early 1980s to identify strange attractors as the mechanism driving weak fluid turbulence via the method of reconstructing attractor geometry from measurement time series and in the mid-1980s to estimate equations of motion directly from complex time series. In providing a mathematical and operational definition of structure it addressed weaknesses of these early approaches to discovering patterns in natural systems. Since then, computational mechanics has led to a range of results from theoretical physics and nonlinear mathematics to diverse applications—from closed-form analysis of Markov and non-Markov stochastic processes that are ergodic or nonergodic and their measures of information and intrinsic computation to complex materials and deterministic chaos and intelligence in Maxwellian demons to quantum compression of classical processes and the evolution of computation and language. This brief review clarifies several misunderstandings and addresses concerns recently raised regarding early works in the field (1980s). We show that misguided evaluations of the contributions of computational mechanics are groundless and stem from a lack of familiarity with its basic goals and from a failure to consider its historical context. For all practical purposes, its modern methods and results largely supersede the early works. This not only renders recent criticism moot and shows the solid ground on which computational mechanics stands but, most importantly, shows the significant progress achieved over three decades and points to the many intriguing and outstanding challenges in understanding the computational nature of complex dynamic systems.
The placement of the head that maximizes predictability. An information theoretic approach The minimization of the length of syntactic dependencies is a well-established principle of word order and the basis of a mathematical theory of word order. Here we complete that theory from the perspective of information theory, adding a competing word order principle: the maximization of predictability of a target element. These two principles are in conflict: to maximize the predictability of the head, the head should appear last, which maximizes the costs with respect to dependency length minimization. The implications of such a broad theoretical framework for understanding the optimality, diversity and evolution of the six possible orderings of subject, object and verb are reviewed.
The Prediction Advantage: A Universally Meaningful Performance Measure for Classification and Regression We introduce the Prediction Advantage (PA), a novel performance measure for prediction functions under any loss function (e.g., classification or regression). The PA is defined as the performance advantage relative to the Bayesian risk restricted to knowing only the distribution of the labels. We derive the PA for well-known loss functions, including 0/1 loss, cross-entropy loss, absolute loss, and squared loss. In the latter case, the PA is identical to the well-known R-squared measure, widely used in statistics. The use of the PA ensures meaningful quantification of prediction performance, which is not guaranteed, for example, when dealing with noisy imbalanced classification problems. We argue that among several known alternative performance measures, PA is the best (and only) quantity ensuring meaningfulness for all noise and imbalance levels.
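To make the squared-loss case concrete, the following hedged sketch (assuming numpy) computes the PA of a predictor relative to the best label-only predictor, the label mean; as the abstract notes, under squared loss this coincides with R-squared. The toy labels and predictions are invented for illustration.

```python
# Hedged sketch: Prediction Advantage under squared loss, which the abstract identifies
# with the R-squared measure. The toy labels and predictions below are invented.
import numpy as np

def prediction_advantage_squared_loss(y_true, y_pred):
    # Risk of the best predictor that knows only the label distribution: the label mean.
    baseline_risk = np.mean((y_true - y_true.mean()) ** 2)
    model_risk = np.mean((y_true - y_pred) ** 2)
    return 1.0 - model_risk / baseline_risk  # identical to R^2 in this case

y = np.array([1.0, 2.0, 3.0, 4.0])
model_pred = np.array([1.1, 1.9, 3.2, 3.8])
print(prediction_advantage_squared_loss(y, model_pred))
```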
The principles of adaptation in organisms and machines I: machine learning, information theory, and thermodynamics How do organisms recognize their environment by acquiring knowledge about the world, and what actions do they take based on this knowledge? This article examines hypotheses about organisms’ adaptation to the environment from machine learning, information-theoretic, and thermodynamic perspectives. We start with constructing a hierarchical model of the world as an internal model in the brain, and review standard machine learning methods to infer causes by approximately learning the model under the maximum likelihood principle. This in turn provides an overview of the free energy principle for an organism, a hypothesis to explain perception and action from the principle of least surprise. Treating this statistical learning as communication between the world and brain, learning is interpreted as a process to maximize information about the world. We investigate how the classical theories of perception such as the infomax principle relate to learning the hierarchical model. We then present an approach to recognition and learning based on thermodynamics, showing that adaptation by causal learning results in the second law of thermodynamics whereas inference dynamics that fuses observation with prior knowledge forms a thermodynamic process. These provide a unified view on the adaptation of organisms to the environment.
The Probability of Causation Many legal cases require decisions about causality, responsibility or blame, and these may be based on statistical data. However, causal inferences from such data are beset by subtle conceptual and practical difficulties, and in general it is, at best, possible to identify the ‘probability of causation’ as lying between certain empirically informed limits. These limits can be refined and improved if we can obtain additional information, from statistical or scientific data, relating to the internal workings of the causal processes. In this paper we review and extend recent work in this area, where additional information may be available on covariate and/or mediating variables.
The ProM framework: A new era in process mining tool support Under the umbrella of buzzwords such as ‘Business Activity Monitoring’ (BAM) and ‘Business Process Intelligence’ (BPI) both academic (e.g., EMiT, Little Thumb, InWoLvE, Process Miner, and MinSoN) and commercial tools (e.g., ARIS PPM, HP BPI, and ILOG JViews) have been developed. The goal of these tools is to extract knowledge from event logs (e.g., transaction logs in an ERP system or audit trails in a WFM system), i.e., to do process mining. Unfortunately, tools use different formats for reading/storing log files and present their results in different ways. This makes it difficult to use different tools on the same data sets and to compare the mining results. Furthermore, some of these tools implement concepts that can be very useful in other tools, but it is often difficult to combine tools. As a result, researchers working on new process mining techniques are forced to build a mining infrastructure from scratch or test their techniques in an isolated way, disconnected from any practical applications. To overcome these kinds of problems, we have developed the ProM framework, i.e., a ‘pluggable’ environment for process mining. The framework is flexible with respect to the input and output format, and is also open enough to allow for the easy reuse of code during the implementation of new process mining ideas. This paper introduces the ProM framework and gives an overview of the plug-ins that have been developed.
The Promise and Peril of Big Data
The Promise of Data Science for the Technosignatures Field This paper outlines some of the possible advancements for technosignature searches using new methods that are currently developing rapidly in computer science, such as machine learning and deep learning. It also showcases a couple of case studies of large research programs where such methods have already been successfully implemented, with notable results. We consider that the availability of data from all-sky, all-the-time observations, paired with the latest developments in computational capabilities and the algorithms currently used in artificial intelligence, including automation, will spur an unprecedented development of technosignature search efforts.
The relationship between Biological and Artificial Intelligence Intelligence can be defined as a predominantly human ability to accomplish tasks that are generally hard for computers and animals. Artificial Intelligence [AI] is a field attempting to accomplish such tasks with computers. AI is becoming increasingly widespread, as are claims of its relationship with Biological Intelligence. Often these claims are made to imply higher chances of a given technology succeeding, working on the assumption that AI systems which mimic the mechanisms of Biological Intelligence should be more successful. In this article I will discuss the similarities and differences between AI and the extent of our knowledge about the mechanisms of intelligence in biology, especially within humans. I will also explore the validity of the assumption that biomimicry in AI systems aids their advancement, and I will argue that existing similarity to biological systems in the way Artificial Neural Networks [ANNs] tackle tasks is due to design decisions, rather than inherent similarity of underlying mechanisms. This article is aimed at people who understand the basics of AI (especially ANNs), and would like to be better able to evaluate the often wild claims about the value of biomimicry in AI.
The retail market as a complex system The aim of this paper is to introduce the complex system perspective into retail market analysis. Currently, to understand the retail market means to search for local patterns at the micro level, involving the segmentation, separation and profiling of diverse groups of consumers. In other contexts, however, markets are modelled as complex systems. Such a strategy is able to uncover emerging regularities and patterns that make markets more predictable, e.g. enabling us to predict how much a country´s GDP will grow. Rather than isolating actors in homogeneous groups, this strategy requires considering the system as a whole, as the emerging pattern can be detected only as a result of the interaction between its self-organizing parts. This assumption holds also in the retail market: each customer can be seen as an independent unit maximizing its own utility function. As a consequence, the global behaviour of the retail market naturally emerges, enabling a novel description of its properties, complementary to the local pattern approach. Such a task demands a data-driven empirical framework. In this paper, we analyse a unique transaction database, recording the micro-purchases of a million customers observed for several years in the stores of a national supermarket chain. We show the emergence of the fundamental pattern of this complex system, connecting the products´ volumes of sales with the customers´ volumes of purchases. This pattern has a number of applications. We provide three of them. By enabling us to evaluate the sophistication of the needs that a customer has and a product satisfies, this pattern has been applied to the task of uncovering the hierarchy of needs of the customers, providing a hint about the next product a customer could be interested in buying, and predicting in which shop she is likely to go to buy it.
The Rise of Big Data Analytics for Marketing (Slide Deck)
The Risk of Machine Learning Many applied settings in empirical economics involve simultaneous estimation of a large number of parameters. In particular, applied economists are often interested in estimating the effects of many-valued treatments (like teacher effects or location effects), treatment effects for many groups, and prediction models with many regressors. In these settings, machine learning methods that combine regularized estimation and data-driven choices of regularization parameters are useful to avoid over-fitting. In this article, we analyze the performance of a class of machine learning estimators that includes ridge, lasso and pretest in contexts that require simultaneous estimation of many parameters. Our analysis aims to provide guidance to applied researchers on (i) the choice between regularized estimators in practice and (ii) data-driven selection of regularization parameters. To address (i), we characterize the risk (mean squared error) of regularized estimators and derive their relative performance as a function of simple features of the data generating process. To address (ii), we show that data-driven choices of regularization parameters, based on Stein’s unbiased risk estimate or on cross-validation, yield estimators with risk uniformly close to the risk attained under the optimal (unfeasible) choice of regularization parameters. We use data from recent examples in the empirical economics literature to illustrate the practical applicability of our results.
The Role of Big Data Analytics in Industrial Internet of Things Big data production in industrial Internet of Things (IIoT) is evident due to the massive deployment of sensors and Internet of Things (IoT) devices. However, big data processing is challenging due to limited computational, networking and storage resources at the IoT device end. Big data analytics (BDA) is expected to provide operational- and customer-level intelligence in IIoT systems. Although numerous studies on IIoT and BDA exist, only a few studies have explored the convergence of the two paradigms. In this study, we investigate the recent BDA technologies, algorithms and techniques that can lead to the development of intelligent IIoT systems. We devise a taxonomy by classifying and categorising the literature on the basis of important parameters (e.g. data sources, analytics tools, analytics techniques, requirements, industrial analytics applications and analytics types). We present the frameworks and case studies of the various enterprises that have benefited from BDA. We also enumerate the considerable opportunities introduced by BDA in IIoT. We identify and discuss the indispensable challenges that remain to be addressed as future research directions as well.
The SAP HANA Database – An Architecture Overview Requirements of enterprise applications have become much more demanding because they execute complex reports on transactional data while thousands of users may read or update records of the same data. The goal of the SAP HANA database is the integration of transactional and analytical workloads within the same database management system. To achieve this, a columnar engine exploits modern hardware (multiple CPU cores, large main memory, and caches), compression of database content, maximum parallelization in the database kernel, and database extensions required by enterprise applications, e.g., specialized data structures for hierarchies or support for domain specific languages. In this paper we highlight the architectural concepts employed in the SAP HANA database. We also report on insights gathered with the SAP HANA database in real-world enterprise application scenarios.
The Science of Data Science Although there is considerable talk of the need for data scientists in the United States and world economies, and although a number of universities throughout the United States are now offering degrees in the data science area, there is surprisingly little consensus as to what comprises the key scientific and engineering challenges of data science, which has recently been raised as a matter of national concern. Many current programs emphasize statistical sampling techniques, approaches to visualization, and the programming of analysis packages. However, we contend that these are not the only areas that are required for the big data needs of emergent ‘fourth paradigm’ data-driven science, where the scientific method is enhanced by the integration of significant data sources into the practice of scientific research.
The Singular Value Decomposition, Applications and Beyond The singular value decomposition (SVD) is not only a classical theory in matrix computation and analysis, but also a powerful tool in machine learning and modern data analysis. In this tutorial we first study the basic notion of SVD and then show the central role SVD plays in matrix analysis. Using majorization theory, we consider variational principles of singular values and eigenvalues. Built on SVD and a theory of symmetric gauge functions, we discuss unitarily invariant norms, which are then used to formulate general results for matrix low rank approximation. We study the subdifferentials of unitarily invariant norms. These results would be potentially useful in many machine learning problems such as matrix completion and matrix data classification. Finally, we discuss matrix low rank approximation and its recent developments such as randomized SVD, approximate matrix multiplication, CUR decomposition, and Nyström approximation. Randomized algorithms are important approaches to large scale SVD as well as fast matrix computations.
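A minimal sketch of the low rank approximation use case discussed in the tutorial, assuming numpy: the best rank-q approximation of a matrix in the Frobenius norm is obtained by truncating its SVD (the Eckart-Young result). The random test matrix and the choice of q are illustrative assumptions.

```python
# Minimal sketch (assuming numpy): best rank-q approximation via truncated SVD.
import numpy as np

def truncated_svd_approx(A, q):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :q] @ np.diag(s[:q]) @ Vt[:q, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
A_q = truncated_svd_approx(A, q=5)
# The Frobenius error equals the root sum of squares of the discarded singular values.
print(np.linalg.norm(A - A_q, 'fro'))
```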
The Split-Apply-Combine Strategy for Data Analysis Many data analysis problems involve the application of a split-apply-combine strategy, where you break up a big problem into manageable pieces, operate on each piece independently and then put all the pieces back together. This insight gives rise to a new R package that allows you to smoothly apply this strategy, without having to worry about the type of structure in which your data is stored. The paper includes two case studies showing how these insights make it easier to work with batting records for veteran baseball players and a large 3d array of spatio-temporal ozone measurements.
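The paper presents an R package; as a hedged illustration of the same split-apply-combine strategy in Python with pandas, the toy batting frame and its column names below are invented for the example.

```python
# Illustrative split-apply-combine in pandas (the paper itself presents an R package).
# The 'player', 'year' and 'hits' columns are invented for this example.
import pandas as pd

batting = pd.DataFrame({
    "player": ["a", "a", "b", "b", "b"],
    "year":   [2001, 2002, 2001, 2002, 2003],
    "hits":   [120, 130, 80, 95, 99],
})

# Split by player, apply a per-group summary, combine the pieces into one frame.
career = batting.groupby("player")["hits"].agg(total="sum", seasons="count").reset_index()
print(career)
```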
The State of Graph Databases If your organization is like many, you may be taking a ‘store everything’ approach to data. After all, storage has become more affordable than ever in recent years, and due to the accessibility of cloud-based technology, capacity has become almost limitless. The clear challenge is tapping into those massive volumes of data to derive actionable business value. That requires analytics. However, using a traditional relational database may not be the best approach, because adapting relational databases to answer deeply complex questions can create performance bottlenecks and added maintenance burden for your business. To gain a real-world perspective on how IT professionals are addressing these challenges, IBM, in partnership with TechValidate, conducted a global survey of 1,365 entrepreneurs and developers about the potential they see for graph databases as well as their current and planned use for this technology. We also queried them about how they are using graph to address problems, the benefits they are realizing, and examined how adoption of this technology differs by company size and industry. Survey respondents spanned small, medium and large companies in diverse industries across 74 countries. Specifically, large enterprises comprised 38 percent of the responses, with small businesses representing 36 percent, and mid-sized businesses representing 19 percent. The survey population included a wide array of professional roles, including developers, architects, IT managers and business leaders, with the largest percentages being attributed to developers/programmers (44 percent), application/software architects (13 percent) and IT directors and managers (11 percent). Respondents represented a range of industries, with the majority in technical industries: computer services (42 percent) and computer software (22 percent). This paper provides an overview of graph technology, details the results of the survey—and highlights findings that debunk some of the most popularly held views about graph technology.
The State of the Art in Developing Fuzzy Ontologies: A Survey Conceptual formalism supported by typical ontologies may not be sufficient to represent the uncertainty that arises from the lack of clear-cut boundaries between the concepts of a domain. Fuzzy ontologies are proposed to offer a way to deal with this uncertainty. This paper describes the state of the art in developing fuzzy ontologies. The survey is produced by studying about 35 works on developing fuzzy ontologies from a batch of 100 articles in the field of fuzzy ontologies.
The stringdist Package for Approximate String Matching Comparing text strings in terms of distance functions is a common and fundamental task in many statistical text-processing applications. Thus far, string distance functionality has been somewhat scattered around R and its extension packages, leaving users with inconsistent interfaces and encoding handling. The stringdist package was designed to offer a low-level interface to several popular string distance algorithms which have been re-implemented in C for this purpose. The package offers distances based on counting q-grams, edit-based distances, and some lesser known heuristic distance functions. Based on this functionality, the package also offers inexact matching equivalents of R´s native exact matching functions match and %in%.
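stringdist itself is an R package with C implementations; as a hedged Python sketch of two of the distance families the abstract mentions, the functions below compute an edit (Levenshtein) distance and a simple q-gram count distance.

```python
# Hedged Python sketch of two distance families mentioned in the abstract:
# edit (Levenshtein) distance and a q-gram count distance. The R package
# implements these in C with many more options and consistent encoding handling.
from collections import Counter

def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                   # deletion
                           cur[j - 1] + 1,                # insertion
                           prev[j - 1] + (ca != cb)))     # substitution
        prev = cur
    return prev[-1]

def qgram_distance(a, b, q=2):
    qa = Counter(a[i:i + q] for i in range(len(a) - q + 1))
    qb = Counter(b[i:i + q] for i in range(len(b) - q + 1))
    return sum(abs(qa[g] - qb[g]) for g in set(qa) | set(qb))

print(levenshtein("kitten", "sitting"))    # 3
print(qgram_distance("kitten", "sitting"))
```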
The structure of evolved representations across different substrates for artificial intelligence Artificial neural networks (ANNs), while exceptionally useful for classification, are vulnerable to misdirection. Small amounts of noise can significantly affect their ability to correctly complete a task. Instead of generalizing concepts, ANNs seem to focus on surface statistical regularities in a given task. Here we compare how recurrent artificial neural networks, long short-term memory units, and Markov Brains sense and remember their environments. We show that information in Markov Brains is localized and sparsely distributed, while the other neural network substrates ‘smear’ information about the environment across all nodes, which makes them vulnerable to noise.
The Theory Behind Overfitting, Cross Validation, Regularization, Bagging, and Boosting: Tutorial In this tutorial paper, we first define mean squared error, variance, covariance, and bias of both random variables and classification/predictor models. Then, we formulate the true and generalization errors of the model for both training and validation/test instances, where we make use of Stein’s Unbiased Risk Estimator (SURE). We define overfitting, underfitting, and generalization using the obtained true and generalization errors. We introduce cross validation and two well-known examples, which are $K$-fold and leave-one-out cross validations. We briefly introduce generalized cross validation and then move on to regularization, where we use SURE again. We work on both $\ell_2$ and $\ell_1$ norm regularizations. Then, we show that bootstrap aggregating (bagging) reduces the variance of estimation. Boosting, specifically AdaBoost, is introduced and explained as both an additive model and a maximum margin model, i.e., Support Vector Machine (SVM). The upper bound on the generalization error of boosting is also provided to show why boosting prevents overfitting. As examples of regularization, the theory of ridge and lasso regressions, weight decay, noise injection to input/weights, and early stopping are explained. Random forest, dropout, histogram of oriented gradients, and single shot multi-box detector are explained as examples of bagging in machine learning and computer vision. Finally, boosting tree and SVM models are mentioned as examples of boosting.
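As a hedged illustration of one of the tutorial's topics, the sketch below runs $K$-fold cross validation to pick an $\ell_2$ (ridge) regularization strength, assuming numpy and scikit-learn are available; the synthetic data and the candidate penalties are assumptions chosen for demonstration.

```python
# Hedged sketch: K-fold cross validation for selecting a ridge penalty
# (assumes numpy and scikit-learn; data and candidate alphas are synthetic).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = X[:, 0] - 2 * X[:, 1] + 0.5 * rng.standard_normal(100)

def cv_error(alpha, n_splits=5):
    errs = []
    for train, test in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X):
        model = Ridge(alpha=alpha).fit(X[train], y[train])
        errs.append(np.mean((y[test] - model.predict(X[test])) ** 2))
    return np.mean(errs)  # average held-out mean squared error

alphas = [0.01, 0.1, 1.0, 10.0]
best = min(alphas, key=cv_error)
print({a: round(cv_error(a), 3) for a in alphas}, "best alpha:", best)
```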
The Theory is Predictive, but is it Complete? An Application to Human Perception of Randomness When we test a theory using data, it is common to focus on correctness: do the predictions of the theory match what we see in the data? But we also care about completeness: how much of the predictable variation in the data is captured by the theory? This question is difficult to answer, because in general we do not know how much ‘predictable variation’ there is in the problem. In this paper, we consider approaches motivated by machine learning algorithms as a means of constructing a benchmark for the best attainable level of prediction. We illustrate our methods on the task of predicting human-generated random sequences. Relative to an atheoretical machine learning algorithm benchmark, we find that existing behavioral models explain roughly 15 percent of the predictable variation in this problem. This fraction is robust across several variations on the problem. We also consider a version of this approach for analyzing field data from domains in which human perception and generation of randomness has been used as a conceptual framework; these include sequential decision-making and repeated zero-sum games. In these domains, our framework for testing the completeness of theories provides a way of assessing their effectiveness over different contexts; we find that despite some differences, the existing theories are fairly stable across our field domains in their performance relative to the benchmark. Overall, our results indicate that (i) there is a significant amount of structure in this problem that existing models have yet to capture and (ii) there are rich domains in which machine learning may provide a viable approach to testing completeness.
The Three Pillars of Machine-Based Programming In this position paper, we describe our vision of the future of machine-based programming through a categorical examination of three pillars of research. Those pillars are: (i) intention, (ii) invention, and (iii) adaptation. Intention emphasizes advancements in the human-to-computer and computer-to-machine-learning interfaces. Invention emphasizes the creation or refinement of algorithms or core hardware and software building blocks through machine learning (ML). Adaptation emphasizes advances in the use of ML-based constructs to autonomously evolve software.
The Top Challenges in Big Data and Analytics Over the past few years, there has been a tremendous amount of hype around Big Data – data that doesn´t work well in traditional BI systems and warehouses because of its volume, its variety, and the velocity at which it is acquired and changed. Is the hype justified? Lavastorm Analytics believes it is. Often Big Data has been talked about as a ‘problem’ because it couldn´t be easily processed with traditional systems based on relational databases, but it really is a tremendous opportunity to enhance and even transform how you run your business. The value of Big Data can be significant. It can lead to innovations, such as new pricing models, new ways to engage with your customers and partners, new product ideas, or new market opportunities. For example, at a recent conference, one large financial institution estimated that Big Data could help them reduce the time to market and cost of their strategic innovation projects by 30-65%. …
The top five ways to get started with big data Remember what life was like before big data? The term has become so prevalent in the business lexicon that sometimes it´s hard to remember that big data is a relatively recent phenomenon. Some may have viewed it as a fad, but data generated by people, processes and machines is only continuing to grow. Big data is here to stay. Make no mistake, data is an asset—but not when you´re drowning in it. In the information age, one of your greatest resources can also be your biggest downfall if your organization doesn´t know how to leverage it properly. So what can you do with your data?
The Truth about Principal Components and Factor Analysis Principal components tries to re-express the data as a sum of uncorrelated components. There are lots of other techniques which try to do similar things, like Fourier analysis, or wavelet decomposition. Things like Fourier analysis decompose the data into a sum of a fixed set of basis functions or basis vectors. This has the advantage of making results comparable across data sets, and of making the meaning of the components clear. So why ever do PCA rather than a Fourier transform? First, in some situations the idea of doing a Fourier transform is just embarrassingly weird. For the states or cars data sets, we could number the features and take cosines of the feature numbers, etc., but it just seems crazy. No such embarrassment attends PCA. Second, when using a fixed set of components, there is no guarantee that a small number of components will give a good reconstruction of the original data. PCA guarantees that the first q components will do a better (mean-square) job of reconstructing the original data than any other linear method using only q components. Third, it is good at preserving distances between the points – the component scores give the optimal linear multidimensional scaling. PCA gives us uncorrelated components, which are generally not independent components; for that you need independent component analysis. PCA looks for linear combinations of the original features; one could well do better by finding nonlinear combinations. Rather than directions in feature space, these would be curves or surfaces. PCA is purely a descriptive technique; in itself it makes no prediction about what future data will look like.
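A minimal numerical sketch of the points above, assuming numpy: PCA of centered data via the SVD yields uncorrelated component scores, and keeping the first q components gives the mean-square-optimal linear reconstruction. The random data and the choice of q are illustrative assumptions.

```python
# Minimal PCA sketch (assuming numpy): uncorrelated component scores via SVD of the
# centered data, plus the rank-q reconstruction described as mean-square optimal above.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T            # component scores (uncorrelated, not independent)

q = 3
X_hat = scores[:, :q] @ Vt[:q, :] + X.mean(axis=0)   # best rank-q linear reconstruction
print(np.mean((X - X_hat) ** 2))
```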
The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing: A User Survey Graph processing is becoming increasingly prevalent across many application domains. In spite of this prevalence, there is little research about how graphs are actually used in practice. We conducted an online survey aimed at understanding: (i) the types of graphs users have; (ii) the graph computations users run; (iii) the types of graph software users use; and (iv) the major challenges users face when processing their graphs. We describe the responses of the participants to our questions, highlighting common patterns and challenges. The participants’ responses revealed surprising facts about graph processing in practice, which we hope can guide future research.
The Unreasonable Effectiveness of Data Eugene Wigner´s article ‘The Unreasonable Effectiveness of Mathematics in the Natural Sciences’ examines why so much of physics can be neatly explained with simple mathematical formulas such as f = ma or e = mc^2. Meanwhile, sciences that involve human beings rather than elementary particles have proven more resistant to elegant mathematics. Economists suffer from physics envy over their inability to neatly model human behavior. An informal, incomplete grammar of the English language runs over 1,700 pages. Perhaps when it comes to natural language processing and related fields, we´re doomed to complex theories that will never have the elegance of physics equations. But if that´s so, we should stop acting as if our goal is to author extremely elegant theories, and instead embrace complexity and make use of the best ally we have: the unreasonable effectiveness of data. One of us, as an undergraduate at Brown University, remembers the excitement of having access to the Brown Corpus, containing one million English words. Since then, our field has seen several notable corpora that are about 100 times larger, and in 2006, Google released a trillion-word corpus with frequency counts for all sequences up to five words long. In some ways this corpus is a step backwards from the Brown Corpus: it´s taken from unfiltered Web pages and thus contains incomplete sentences, spelling errors, grammatical errors, and all sorts of other errors. It´s not annotated with carefully hand-corrected part-of-speech tags. But the fact that it´s a million times larger than the Brown Corpus outweighs these drawbacks. A trillion-word corpus—along with other Web-derived corpora of millions, billions, or trillions of links, videos, images, tables, and user interactions – captures even very rare aspects of human behavior. So, this corpus could serve as the basis of a complete model for certain tasks – if only we knew how to extract the model from the data.
The Use of Machine Learning Algorithms in Recommender Systems: A Systematic Review Recommender systems use algorithms to provide users with product recommendations. Recently, these systems started using machine learning algorithms because of the progress and popularity of the artificial intelligence research field. However, choosing a suitable machine learning algorithm is difficult because of the sheer number of algorithms available in the literature. Researchers and practitioners are left with little information about the best approaches or the trends in algorithms usage. Moreover, the development of a recommender system featuring a machine learning algorithm has problems and open questions that must be evaluated, so software engineers know where to focus research efforts. This work presents a systematic review of the literature that analyzes the use of machine learning algorithms in recommender systems and identifies research opportunities for the software engineering research field. The study concluded that Bayesian and decision tree algorithms are widely used in recommender systems because of their low complexity, and that the requirements and design phases of recommender system development must be investigated for research opportunities.
The Value of Big Data: Using big data to examine and discover the value in data for accurate analytics Data warehousing is a success, judging by its 25 year history of use across all industries. Business intelligence met the needs it was designed for: to give non-technical people within the organization access to important, shared data. The resulting improvements in all aspects of business operations are hard to dispute when compared to the prior era of static batch reporting. During the same period that data warehousing and BI matured, the automation and instrumenting of almost all processes and activities changed the data landscape in most companies. Where there were only a few applications and minimal monitoring 25 years ago, there is ubiquitous computing and data available about every activity today. Data warehouses have not been able to keep up with business demands for new sources of information, new types of data, more complex analysis and greater speed. Companies can put this data to use in countless ways, but for most it remains uncollected or unused, locked away in silos within IT. There has been a gradual maturing of data use in organizations. In the early days of BI it was enough to provide access to core financial and customer transactions. Better access enabled process changes, and these led to the need for more data and more varied uses of information. These changes put increasing strain on information processing and delivery capabilities that were designed under assumptions of stability and common use. Most companies now have a backlog of new data and analysis requests that BI groups are struggling to meet. Enter big data. Big data is not simply about growing data volumes – it´s also about the fact that the data being collected today is different in ways that make it unwieldy for conventional databases and BI tools. Big data is also about new technologies that were developed to support the storage, retrieval and processing of this new data. The technologies originated in the world of web applications and internet-based companies, but they are now spreading into enterprise applications of all sorts. New technology coupled with new data enables new practices like real-time monitoring of operations across retail channels, supply chain practices at finer grain and faster speed, and analysis of customers at the level of individual activities and behaviors. Until recently, large scale data collection and analysis capabilities like these would have required a Wal-Mart sized investment, limiting them to large organizations. These capabilities are now available to all, regardless of company size or budget. This is creating a rush to adopt big data technologies. As the use of big data grows, the need for data management will grow. Many organizations already struggle to manage existing data. Big data adds complexity, which will only increase the challenge. The combination of new data and new technology requires new data management capabilities and processes to capture the promised long-term value.
The Value of Signal (and the Cost of Noise): The New Economics of Meaning-Making Nearly every aspect of our daily lives generates a digital footprint. From mobile phones and social media to inventory look-ups and online purchases, we collect more data about processes, people and things than ever before. Winning companies are able to create business value by building a richer understanding of customers, products, employees and partners – extracting business meaning from this torrent of data. The business stakes of ‘meaning-making’ simply could not be higher. To level-set this concept, we conducted primary research, including direct interviews with executives, to understand how business analytics is being applied to uncover new revenue sources and to reduce costs. The 300 companies we surveyed told us they achieved a total economic benefit of roughly $766 billion over the past year based on their use of business analytics. Among those that participated in our research, investment in business analytics yielded an average 8.4% increase in revenues and an average 8.1% improvement in cost reduction in the previous fiscal year. Companies that generated the most value from business analytics expect to grow revenue faster and reduce costs more aggressively. Leading companies also clearly recognize that success means winning the battle for talent with business analytics skills. Many companies – perhaps most – are missing the opportunity for significant economic benefit. If the companies we surveyed were to begin deploying best practices in analytics, we estimate they could create $853 billion of value within the next 12 months. It´s a new era in business, one in which growth will be driven as much by insight and foresight as by physical products and assets. Importantly, as our research demonstrates, a roadmap for success is beginning to emerge.
The Visual Display of Quantitative Information (Slide Deck)
Theano and LSTM for Sentiment Analysis (Slide Deck)
Theoretical Analysis of Stochastic Search Algorithms Theoretical analyses of stochastic search algorithms, albeit few, have always existed since these algorithms became popular. Starting in the nineties a systematic approach to analyse the performance of stochastic search heuristics has been put in place. This quickly growing body of results nowadays allows the analysis of sophisticated algorithms such as population-based evolutionary algorithms, ant colony optimisation and artificial immune systems. Results are available concerning problems from various domains including classical combinatorial and continuous optimisation, single and multi-objective optimisation, and noisy and dynamic optimisation. This chapter introduces the mathematical techniques that are most commonly used in the runtime analysis of stochastic search heuristics. Careful attention is given to the very popular artificial fitness level and drift analysis techniques, for which several variants are presented. To aid the reader’s comprehension of the presented mathematical methods, these are applied to the analysis of simple evolutionary algorithms for artificial example functions. The chapter concludes with references to more complex applications and further extensions of the techniques for obtaining advanced results.
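As a hedged toy illustration of the kind of algorithm-function pair these runtime analyses study, the sketch below (assuming numpy) runs a (1+1) evolutionary algorithm with standard bit mutation on the OneMax example function; the classical fitness-level argument gives an expected runtime of order n log n for this pair. The problem size and seed are assumptions for demonstration.

```python
# Hedged sketch: a (1+1) evolutionary algorithm on the OneMax example function,
# the kind of simple setting analysed with fitness-level and drift arguments.
import numpy as np

def one_plus_one_ea(n=50, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, n)               # random initial bit string
    steps = 0
    while x.sum() < n:                      # OneMax: maximise the number of ones
        flip = rng.random(n) < 1.0 / n      # standard bit mutation with rate 1/n
        y = np.where(flip, 1 - x, x)
        if y.sum() >= x.sum():              # elitist selection: keep the better string
            x = y
        steps += 1
    return steps

print(one_plus_one_ea())                    # expected to be on the order of n log n
```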
Theoretical Analysis of the $k$-Means Algorithm – A Survey The $k$-means algorithm is one of the most widely used clustering heuristics. Despite its simplicity, analyzing its running time and quality of approximation is surprisingly difficult and can lead to deep insights that can be used to improve the algorithm. In this paper we survey the recent results in this direction as well as several extensions of the basic $k$-means method.
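A minimal sketch of the basic method under discussion, assuming numpy: Lloyd's iteration for $k$-means with uniform random seeding; smarter initializations such as $k$-means++ are part of what the surveyed analyses address. The two-cluster test data is invented for illustration.

```python
# Minimal Lloyd's-algorithm sketch of k-means (assuming numpy); initialisation here is
# uniform random, whereas much of the surveyed work concerns smarter seeding.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # assignment step: nearest center for every point
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        # update step: mean of each cluster (keep the old center if a cluster empties)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=m, size=(50, 2)) for m in (0.0, 5.0)])
centers, labels = kmeans(X, k=2)
print(centers)
```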
Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution Current machine learning systems operate, almost exclusively, in a statistical, or model-free mode, which entails severe theoretical limits on their power and performance. Such systems cannot reason about interventions and retrospection and, therefore, cannot serve as the basis for strong AI. To achieve human level intelligence, learning machines need the guidance of a model of reality, similar to the ones used in causal inference tasks. To demonstrate the essential role of such models, I will present a summary of seven tasks which are beyond reach of current machine learning systems and which have been accomplished using the tools of causal modeling.
Theory of Deep Learning III: explaining the non-overfitting puzzle A main puzzle of deep networks revolves around the absence of overfitting despite overparametrization and despite the large capacity demonstrated by zero training error on randomly labeled data. In this note, we show that the dynamical systems associated with gradient descent minimization of nonlinear networks behave, near stable zero minima of the empirical error, as a gradient system in a quadratic potential with a degenerate Hessian. The proposition is supported by theoretical and numerical results, under the assumption of stable minima of the gradient. Our proposition provides the extension to deep networks of key properties of gradient descent methods for linear networks, which, as suggested in (1), can be the key to understanding generalization. Gradient descent enforces a form of implicit regularization controlled by the number of iterations, asymptotically converging to the minimum norm solution. This implies that there is usually an optimum early stopping point that avoids overfitting of the loss (this is relevant mainly for regression). For classification, the asymptotic convergence to the minimum norm solution implies convergence to the maximum margin solution, which guarantees good classification error for ‘low noise’ datasets. The implied robustness to overparametrization has suggestive implications for the robustness of deep hierarchically local networks to variations of the architecture with respect to the curse of dimensionality.
Theory of Machine Networks: A Case Study We propose a simplification of the Theory-of-Mind Network architecture, which focuses on modeling complex, deterministic machines as a proxy for modeling nondeterministic, conscious entities. We then validate this architecture in the context of understanding engines, which, we argue, meet the required internal and external complexity to yield meaningful abstractions.
There is no general AI: Why Turing machines cannot pass the Turing test Since 1950, when Alan Turing proposed what has since come to be called the Turing test, the ability of a machine to pass this test has established itself as the primary hallmark of general AI. To pass the test, a machine would have to be able to engage in dialogue in such a way that a human interrogator could not distinguish its behaviour from that of a human being. AI researchers have attempted to build machines that could meet this requirement, but they have so far failed. To pass the test, a machine would have to meet two conditions: (i) react appropriately to the variance in human dialogue and (ii) display a human-like personality and intentions. We argue, first, that it is for mathematical reasons impossible to program a machine which can master the enormously complex and constantly evolving pattern of variance which human dialogues contain. And second, that we do not know how to make machines that possess personality and intentions of the sort we find in humans. Since a Turing machine cannot master human dialogue behaviour, we conclude that a Turing machine also cannot possess what is called “general” Artificial Intelligence. We do, however, acknowledge the potential of Turing machines to master dialogue behaviour in highly restricted contexts, where what is called “narrow” AI can still be of considerable utility.
TherML: Thermodynamics of Machine Learning In this work we offer a framework for reasoning about a wide class of existing objectives in machine learning. We develop a formal correspondence between this work and thermodynamics and discuss its implications.
Throwing Stones and Collecting Bones: Looking for Poisson-like Random Measures We show that in a broad class of probabilistic random measures one may identify only three that are rescaled versions of themselves when restricted to a subspace. These are Poisson, binomial and negative binomial random measures. We provide some simple examples of possible applications of such measures.
Time Series Databases Time series databases enable a fundamental step in the central storage and analysis of many types of machine data. As such, they lie at the heart of the Internet of Things (IoT). There´s a revolution in sensor- to-insight data flow that is rapidly changing the way we perceive and understand the world around us. Much of the data generated by sensors, as well as a variety of other sources, benefits from being collected as time series. Although the idea of collecting and analyzing time series data is not new, the astounding scale of modern datasets, the velocity of data accumulation in many cases, and the variety of new data sources together contribute to making the current task of building scalable time series databases a huge challenge. A new world of time series data calls for new approaches and new tools.
Time Series Management Systems: A Survey The collection of time series data increases as more monitoring and automation are being deployed. These deployments range in scale from an Internet of things (IoT) device located in a household to enormous distributed Cyber-Physical Systems (CPSs) producing large volumes of data at high velocity. To store and analyze these vast amounts of data, specialized Time Series Management Systems (TSMSs) have been developed to overcome the limitations of general purpose Database Management Systems (DBMSs) for time series management. In this paper, we present a thorough analysis and classification of TSMSs developed through academic or industrial research and documented through publications. Our classification is organized into categories based on the architectures observed during our analysis. In addition, we provide an overview of each system with a focus on the motivational use case that drove the development of the system, the functionality the system implements for storage and querying of time series, the components the system is composed of, and the capabilities of each system with regard to Stream Processing and Approximate Query Processing (AQP). Last, we provide a summary of research directions proposed by other researchers in the field and present our vision for a next generation TSMS.
Time Series Prediction with the Self-Organizing Map: A Review We provide a comprehensive and updated survey on applications of Kohonen´s self-organizing map (SOM) to time series prediction (TSP). The main goal of the paper is to show that, despite being originally designed as an unsupervised learning algorithm, the SOM is flexible enough to give rise to a number of efficient supervised neural architectures devoted to TSP tasks. For each SOM-based architecture to be presented, we report its algorithm implementation in detail. Similarities and differences of such SOM-based TSP models with respect to standard linear and nonlinear TSP techniques are also highlighted. We conclude the paper with indications of possible directions for further research in this field.
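As background for the architectures surveyed there, the classical Kohonen update of a SOM weight vector $w_i$ toward an input $x(t)$ is usually written (standard textbook form, not a formula quoted from this particular review) as $w_i(t+1) = w_i(t) + \alpha(t)\, h_{c(x),i}(t)\,[x(t) - w_i(t)]$, where $c(x)$ indexes the best-matching unit, $\alpha(t)$ is a decaying learning rate and $h_{c(x),i}(t)$ is a shrinking neighborhood function.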
Times Series: Cointegration An overview of results for the cointegrated VAR model for nonstationary I(1) variables is given. The emphasis is on the analysis of the model and the tools for asymptotic inference. These include: the formulation of criteria on the parameters for the process to be nonstationary and I(1); the formulation of hypotheses of interest on the rank, the cointegrating relations and the adjustment coefficients; and a discussion of the asymptotic distribution results that are used for inference. The results are illustrated by a few examples. A number of extensions of the theory are pointed out.
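For orientation, the cointegrated VAR discussed there is conventionally written in its vector error-correction form (standard notation, not necessarily the paper's own; under cointegration the impact matrix has reduced rank, $\Pi = \alpha\beta'$): $\Delta x_t = \alpha\beta' x_{t-1} + \sum_{i=1}^{k-1} \Gamma_i \Delta x_{t-i} + \varepsilon_t$, where $\beta' x_{t-1}$ collects the stationary cointegrating relations and $\alpha$ the adjustment coefficients.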
To Bayes or Not To Bayes That’s no longer the question! This paper seeks to provide a thorough account of the ubiquitous nature of the Bayesian paradigm in modern statistics, data science and artificial intelligence. Once maligned, on the one hand by those who philosophically hated the very idea of subjective probability used in prior specification, and on the other hand because of the intractability of the computations needed for Bayesian estimation and inference, the Bayesian school of thought now permeates and pervades virtually all areas of science, applied science, engineering, social science and even liberal arts, often in unsuspected ways. Thanks in part to the availability of powerful computing resources, but also to the literally unavoidable inherent presence of the quintessential building blocks of the Bayesian paradigm in all walks of life, the Bayesian way of handling statistical learning, estimation and inference is not only mainstream but also becoming the most central approach to learning from the data. This paper explores some of the most relevant elements to help the reader appreciate the pervading power and presence of the Bayesian paradigm in statistics, artificial intelligence and data science, with an emphasis on how the Gospel according to Reverend Thomas Bayes has turned out to be the truly good news, and in some cases the amazing saving grace, for all who seek to learn statistically from the data. To further help the reader gain deeper and tangible practical insights into the Bayesian machinery, we point to some computational tools designed for the R Statistical Software Environment to help explore Bayesian statistical learning.
To Cluster, or Not to Cluster: An Analysis of Clusterability Methods Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. For most applications, applying clustering is only appropriate when cluster structure is present. As such, the study of clusterability, which evaluates whether data possesses such structure, is an integral part of cluster analysis. However, methods for evaluating clusterability vary radically, making it challenging to select a suitable measure. In this paper, we perform an extensive comparison of measures of clusterability and provide guidelines that clustering users can reference to select suitable measures for their applications.
To Explain or To Predict Statistical modeling is a powerful tool for developing and testing theories by way of causal explanation, prediction, and description. In many disciplines there is near-exclusive use of statistical modeling for causal explanation and the assumption that models with high explanatory power are inherently of high predictive power. Conflation between explanation and prediction is common, yet the distinction must be understood for progressing scientific knowledge. While this distinction has been recognized in the philosophy of science, the statistical literature lacks a thorough discussion of the many differences that arise in the process of modeling for an explanatory versus a predictive goal. The purpose of this paper is to clarify the distinction between explanatory and predictive modeling, to discuss its sources, and to reveal the practical implications of the distinction to each step in the modeling process.
Too Big to Fail: Large Samples and the p-Value Problem The Internet has provided IS researchers with the opportunity to conduct studies with extremely large samples, frequently well over 10,000 observations. There are many advantages to large samples, but researchers using statistical inference must be aware of the p-value problem associated with them. In very large samples, p-values go quickly to zero, and solely relying on p-values can lead the researcher to claim support for results of no practical significance. In a survey of large sample IS research, we found that a significant number of papers rely on a low p-value and the sign of a regression coefficient alone to support their hypotheses. This research commentary recommends a series of actions the researcher can take to mitigate the p-value problem in large samples and illustrates them with an example of over 300,000 camera sales on eBay. We believe that addressing the p-value problem will increase the credibility of large sample IS research as well as provide more insights for readers.
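To make the effect concrete, the following hypothetical Python snippet (illustrative toy data, not the eBay sample from the paper) shows a practically negligible mean difference turning ‘highly significant’ purely because the sample grows:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    for n in (100, 10_000, 1_000_000):
        a = rng.normal(0.00, 1.0, n)   # control group
        b = rng.normal(0.02, 1.0, n)   # tiny, practically irrelevant shift
        t_stat, p_val = stats.ttest_ind(a, b)
        print(f"n={n:>9,}  p-value={p_val:.3g}")
    # the p-value shrinks toward zero as n grows, although the effect size stays trivial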
Top 10 Algorithms in Data Mining This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we provide a description of the algorithm, discuss the impact of the algorithm, and review current and further research on the algorithm. These 10 algorithms cover classification, clustering, statistical learning, association analysis, and link mining, which are all among the most important topics in data mining research and development. http://…0-data-mining-algorithms-in-plain-english
Top 10 Trends in Business Intelligence for 2014 (Slide Deck)
Top 7 Trends in Big Data for 2015 • Big Data gets cloudy. • ETL gets personal. • SQL or NoSQL, that is the question. • Hadoop: Part of the new normal in data storage. • You will start trying to fish in the data lake. • The big data ecosystem will start to change form. • IOT (Internet of Things) will continue to grow, driving new data solutions.
Top Five High-Impact Use Cases for Big Data Analytics Today´s data-driven companies have a competitive edge over their peers. How? They are generating breakthrough insights by bringing together all of their structured and unstructured data and analyzing it together – all at once.
Toward a System Building Agenda for Data Integration In this paper we argue that the data management community should devote far more effort to building data integration (DI) systems, in order to truly advance the field. Toward this goal, we make three contributions. First, we draw on our recent industrial experience to discuss the limitations of current DI systems. Second, we propose an agenda to build a new kind of DI systems to address these limitations. These systems guide users through the DI workflow, step by step. They provide tools to address the ‘pain points’ of the steps, and tools are built on top of the Python data science and Big Data ecosystem (PyData). We discuss how to foster an ecosystem of such tools within PyData, then use it to build DI systems for collaborative/cloud/crowd/lay user settings. Finally, we discuss ongoing work at Wisconsin, which suggests that these DI systems are highly promising and building them raises many interesting research challenges.
Toward Scalable Machine Learning and Data Mining: the Bioinformatics Case In an effort to overcome the data deluge in computational biology and bioinformatics and to facilitate bioinformatics research in the era of big data, we identify some of the most influential algorithms that have been widely used in the bioinformatics community. These top data mining and machine learning algorithms cover classification, clustering, regression, graphical model-based learning, and dimensionality reduction. The goal of this study is to guide the focus of scalable computing experts in the endeavor of applying new storage and scalable computation designs to bioinformatics algorithms that merit their attention most, following the engineering maxim of ‘optimize the common case’.
Toward Scalable Systems for Big Data Analytics: A Technology Tutorial Recent technological advancements have led to a deluge of data from distinctive domains (e.g., health care and scientific sensors, user-generated data, Internet and financial companies, and supply chain systems) over the past two decades. The term big data was coined to capture the meaning of this emerging trend. In addition to its sheer volume, big data also exhibits other unique characteristics as compared with traditional data. For instance, big data is commonly unstructured and requires more real-time analysis. This development calls for new system architectures for data acquisition, transmission, storage, and large-scale data processing mechanisms. In this paper, we present a literature survey and system tutorial for big data analytics platforms, aiming to provide an overall picture for nonexpert readers and instill a do-it-yourself spirit for advanced audiences to customize their own big-data solutions. First, we present the definition of big data and discuss big data challenges. Next, we present a systematic framework to decompose big data systems into four sequential modules, namely data generation, data acquisition, data storage, and data analytics. These four modules form a big data value chain. Following that, we present a detailed survey of numerous approaches and mechanisms from research and industry communities. In addition, we present the prevalent Hadoop framework for addressing big data challenges. Finally, we outline several evaluation benchmarks and potential research directions for big data systems.
Toward the Starting Line: A Systems Engineering Approach to Strong AI Artificial General Intelligence (AGI) or Strong AI aims to create machines with human-like or human-level intelligence, which is still a very ambitious goal when compared to the existing computing and AI systems. After many hype cycles and lessons from AI history, it is clear that a big conceptual leap is needed for crossing the starting line to kick-start mainstream AGI research. This position paper aims to make a small conceptual contribution toward reaching that starting line. After a broad analysis of the AGI problem from different perspectives, a system-theoretic and engineering-based research approach is introduced, which builds upon the existing mainstream AI and systems foundations. Several promising cross-fertilization opportunities between systems disciplines and AI research are identified. Specific potential research directions are discussed.
Towards a framework for the evolution of artificial general intelligence In this work, a novel framework for the emergence of general intelligence is proposed, where agents evolve through environmental rewards and learn throughout their lifetime without supervision, i.e., self-supervised learning through embodiment. The chosen control mechanism for agents is a biologically plausible neuron model based on spiking neural networks. Network topologies become more complex through evolution, i.e., the topology is not fixed, while the synaptic weights of the networks cannot be inherited, i.e., newborn brains are not trained and have no innate knowledge of the environment. What is subject to the evolutionary process is the network topology, the type of neurons, and the type of learning. This process ensures that controllers that are passed through the generations have the intrinsic ability to learn and adapt during their lifetime in mutable environments. We envision that the described approach may lead to the emergence of the simplest form of artificial general intelligence.
Towards a Quantum World Wide Web We elaborate a quantum model for corpora of written documents, like the pages forming the World Wide Web. To that end, we are guided by how physicists constructed quantum theory for microscopic entities, which unlike classical objects cannot be fully represented in our spatial theater. We suggest that a similar construction needs to be carried out by linguists and computational scientists, to capture the full meaning content of collections of documental entities. More precisely, we show how to associate a quantum-like ‘entity of meaning’ to a ‘language entity formed by printed documents’, considering the latter as the collection of traces that are left by the former, in specific results of search actions that we describe as measurements. In other words, we offer a perspective where a collection of documents, like the Web, is described as the space of manifestation of a more complex entity – the QWeb – which is the object of our modeling, drawing its inspiration from previous studies on operational-realistic approaches to quantum physics and quantum modeling of human cognition and decision-making. We emphasize that a consistent QWeb model needs to account for the observed correlations between words appearing in printed documents, e.g., co-occurrences, as the latter would depend on the ‘meaning connections’ existing between the concepts that are associated with these words. In that respect, we show that both ‘context and interference (quantum) effects’ are required to explain the probabilities calculated by counting the relative number of documents containing certain words and co-occurrences of words.
Towards a Science of Mind The ancient mind/body problem continues to be one of the deepest mysteries of science and of the human spirit. Despite major advances in many fields, there is still no plausible link between subjective experience (qualia) and its realization in the body. This paper outlines some of the elements of a rigorous science of mind (SoM) – key ideas include scientific realism of mind, agnostic mysterianism, careful attention to language, and a focus on concrete (touchstone) questions and results.
Towards a Theoretical Understanding of Batch Normalization Normalization techniques such as Batch Normalization have been applied very successfully for training deep neural networks. Yet, despite its apparent empirical benefits, the reasons behind the success of Batch Normalization are mostly hypothetical. We thus aim to provide a more thorough theoretical understanding from an optimization perspective. Our main contribution towards this goal is the identification of various problem instances in the realm of machine learning where, under certain assumptions, Batch Normalization can provably accelerate optimization with gradient-based methods. We thereby turn Batch Normalization from an effective practical heuristic into a provably converging algorithm for these settings. Furthermore, we substantiate our analysis with empirical evidence that suggests the validity of our theoretical results in a broader context.
Towards balanced clustering – part 1 (preliminaries) The article contains a preliminary glance at balanced clustering problems. Basic balanced structures and combinatorial balanced problems are briefly described. Special attention is given to various balance/unbalance indices (including some new versions of the indices): by cluster cardinality, by cluster weights, by inter-cluster edge/arc weights, by cluster element structure (for element multi-type clustering). Further, versions of optimization clustering problems are suggested (including multicriteria problem formulations). Illustrative numerical examples describe the calculation of balance indices and element multi-type balance clustering problems (including an example for the design of student teams).
Towards Bayesian Deep Learning: A Survey While perception tasks such as visual object recognition and text understanding play an important role in human intelligence, the subsequent tasks that involve inference, reasoning and planning require an even higher level of intelligence. The past few years have seen major advances in many perception tasks using deep learning models. For higher-level inference, however, probabilistic graphical models with their Bayesian nature are still more powerful and flexible. To achieve integrated intelligence that involves both perception and inference, it is naturally desirable to tightly integrate deep learning and Bayesian models within a principled probabilistic framework, which we call Bayesian deep learning. In this unified framework, the perception of text or images using deep learning can boost the performance of higher-level inference and in return, the feedback from the inference process is able to enhance the perception of text or images. This survey provides a general introduction to Bayesian deep learning and reviews its recent applications on recommender systems, topic models, and control. In this survey, we also discuss the relationship and differences between Bayesian deep learning and other related topics like Bayesian treatment of neural networks.
Towards Better Analysis of Machine Learning Models: A Visual Analytics Perspective Interactive model analysis, the process of understanding, diagnosing, and refining a machine learning model with the help of interactive visualization, is very important for users to efficiently solve real-world artificial intelligence and data mining problems. Dramatic advances in big data analytics have led to a wide variety of interactive model analysis tasks. In this paper, we present a comprehensive analysis and interpretation of this rapidly developing area. Specifically, we classify the relevant work into three categories: understanding, diagnosis, and refinement. Each category is exemplified by recent influential work. Possible future research opportunities are also explored and discussed.
Towards Compositional Distributional Discourse Analysis Categorical compositional distributional semantics provide a method to derive the meaning of a sentence from the meaning of its individual words: the grammatical reduction of a sentence automatically induces a linear map for composing the word vectors obtained from distributional semantics. In this paper, we extend this passage from word-to-sentence to sentence-to-discourse composition. To achieve this we introduce a notion of basic anaphoric discourses as a mid-level representation between natural language discourse formalised in terms of basic discourse representation structures (DRS); and knowledge base queries over the Semantic Web as described by basic graph patterns in the Resource Description Framework (RDF). This provides a high-level specification for compositional algorithms for question answering and anaphora resolution, and allows us to give a picture of natural language understanding as a process involving both statistical and logical resources.
Towards Deep Learning Models Resistant to Adversarial Attacks Recent work has demonstrated that neural networks are vulnerable to adversarial examples, i.e., inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete, general guarantee to provide. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. This suggests that adversarially resistant deep learning models might be within our reach after all.
Towards Identifying and Managing Sources of Uncertainty in AI and Machine Learning Models – An Overview Quantifying and managing uncertainties that occur when data-driven models such as those provided by AI and machine learning methods are applied is crucial. This whitepaper provides a brief motivation and first overview of the state of the art in identifying and quantifying sources of uncertainty for data-driven components as well as means for analyzing their impact.
Towards Intelligent Vehicular Networks: A Machine Learning Framework As wireless networks evolve towards high mobility and providing better support for connected vehicles, a number of new challenges arise due to the resulting high dynamics in vehicular environments and thus motivate a rethinking of traditional wireless design methodologies. Future intelligent vehicles, which are at the heart of high mobility networks, are increasingly equipped with multiple advanced onboard sensors and keep generating large volumes of data. Machine learning, as an effective approach to artificial intelligence, can provide a rich set of tools to exploit such data for the benefit of the networks. In this article, we first identify the distinctive characteristics of high mobility vehicular networks and motivate the use of machine learning to address the resulting challenges. After a brief introduction of the major concepts of machine learning, we discuss its applications to learn the dynamics of vehicular networks and make informed decisions to optimize network performance. In particular, we discuss in greater detail the application of reinforcement learning in managing network resources as an alternative to the prevalent optimization approach. Finally, some open issues worth further investigation are highlighted.
Towards learning-to-learn In good old-fashioned artificial intelligence (GOFAI), humans specified systems that solved problems. Much of the recent progress in AI has come from replacing human insights by learning. However, learning itself is still usually built by humans — specifically the choice that parameter updates should follow the gradient of a cost function. Yet, in analogy with GOFAI, there is no reason to believe that humans are particularly good at defining such learning systems: we may expect learning itself to be better if we learn it. Recent research in machine learning has started to realize the benefits of that strategy. We should thus expect this to be relevant for neuroscience: how could the correct learning rules be acquired? Indeed, behavioral science has long shown that humans learn-to-learn, which is potentially responsible for their impressive learning abilities. Here we discuss ideas across machine learning, neuroscience, and behavioral science that matter for the principle of learning-to-learn.
Towards Statistical Reasoning in Description Logics over Finite Domains (Full Version) We present a probabilistic extension of the description logic $\mathcal{ALC}$ for reasoning about statistical knowledge. We consider conditional statements over proportions of the domain and are interested in the probabilistic-logical consequences of these proportions. After introducing some general reasoning problems and analyzing their properties, we present first algorithms and complexity results for reasoning in some fragments of Statistical $\mathcal{ALC}$.
Towards the Internet of Underground Things: A Systematic Survey This paper provides recent advances in the area of Internet of Underground Things (IoUT) with emphasis on enabling communication technologies, networking issues, and localization techniques. IoUT is enabled by underground things (sensors), communication technology, and networking protocols. This new paradigm of IoUT facilitates the integration of sensing and communication in the underground environment for various industries such as oil and gas, agriculture, seismic mapping, and border monitoring. These applications require gathering relevant information from the deployed underground things. However, the harsh underground propagation environment, including sand, rock, and watersheds, does not allow the use of a single communication technology for information transfer between the surface and the underground things. Therefore, various wireless and wired communication technologies are used for underground communication. The wireless technologies are based on acoustic waves, electromagnetic waves, magnetic induction and visible light communication while the wired technologies use coaxial cable and optical fibers. In this paper, state-of-the-art communication technologies are surveyed, and the respective networking and localization techniques for IoUT are presented. Moreover, the advances and applications of IoUT are also reported. Also, new research challenges for the design and implementation of IoUT are identified.
Towards Understanding Adversarial Learning for Joint Distribution Matching We investigate the non-identifiability issues associated with bidirectional adversarial training for joint distribution matching. Within a framework of conditional entropy, we propose both adversarial and non-adversarial approaches to learn desirable matched joint distributions for unsupervised and supervised tasks. We unify a broad family of adversarial models as joint distribution matching problems. Our approach stabilizes learning of unsupervised bidirectional adversarial learning methods. Further, we introduce an extension for semi-supervised learning tasks. Theoretical results are validated in synthetic data and real-world applications.
Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes It is widely observed that deep learning models with learned parameters generalize well, even with many more model parameters than the number of training samples. We systematically investigate the underlying reasons why deep neural networks often generalize well, and reveal the difference between the minima (with the same training error) that generalize well and those that don’t. We show that it is the characteristics of the loss landscape that explain the good generalization capability. For the loss landscape of deep networks, the volume of the basin of attraction of good minima dominates over that of poor minima, which guarantees optimization methods with random initialization to converge to good minima. We theoretically justify our findings by analyzing 2-layer neural networks, and show that the low-complexity solutions have a small norm of the Hessian matrix with respect to the model parameters. For deeper networks, extensive numerical evidence helps to support our arguments.
Towards Understanding the Invertibility of Convolutional Neural Networks Several recent works have empirically observed that Convolutional Neural Nets (CNNs) are (approximately) invertible. To understand this approximate invertibility phenomenon and how to leverage it more effectively, we focus on a theoretical explanation and develop a mathematical model of sparse signal recovery that is consistent with CNNs with random weights. We give an exact connection between a particular model of model-based compressive sensing (and its recovery algorithms) and random-weight CNNs. We show empirically that several learned networks are consistent with our mathematical analysis and then demonstrate that with such a simple theoretical framework, we can obtain reasonable reconstruction results on real images. We also discuss gaps between our model assumptions and the CNN trained for classification in practical scenarios.
ToyArchitecture: Unsupervised Learning of Interpretable Models of the World Research in Artificial Intelligence (AI) has focused mostly on two extremes: either on small improvements in narrow AI domains, or on universal theoretical frameworks which are usually uncomputable, incompatible with theories of biological intelligence, or lack practical implementations. The goal of this work is to combine the main advantages of the two: to follow a big picture view, while providing a particular theory and its implementation. In contrast with purely theoretical approaches, the resulting architecture should be usable in realistic settings, but also form the core of a framework containing all the basic mechanisms, into which it should be easier to integrate additional required functionality. In this paper, we present a novel, purposely simple, and interpretable hierarchical architecture which combines multiple different mechanisms into one system: unsupervised learning of a model of the world, learning the influence of one’s own actions on the world, model-based reinforcement learning, hierarchical planning and plan execution, and symbolic/sub-symbolic integration in general. The learned model is stored in the form of hierarchical representations with the following properties: 1) they are increasingly more abstract, but can retain details when needed, and 2) they are easy to manipulate in their local and symbolic-like form, thus also allowing one to observe the learning process at each level of abstraction. On all levels of the system, the representation of the data can be interpreted in both a symbolic and a sub-symbolic manner. This enables the architecture to learn efficiently using sub-symbolic methods and to employ symbolic inference.
Tracking Network Dynamics: a review of distances and similarity metrics From longitudinal biomedical studies to social networks, graphs have emerged as a powerful framework for describing evolving interactions between agents in complex systems. In such studies, the data typically consists of a set of graphs representing a system’s state at different points in time or space. The analysis of the system’s dynamics depends on the selection of the appropriate tools. In particular, after specifying properties characterizing similarities between states, a critical step lies in the choice of a distance capable of reflecting such similarities. While the literature offers a number of distances that one could a priori choose from, their properties have been little investigated and no guidelines regarding the choice of such a distance have yet been provided. However, these distances’ sensitivity to perturbations in the network’s structure and their ability to identify important changes are crucial to the analysis, making the selection of an adequate metric a decisive — yet delicate — practical matter. In the spirit of Goldenberg, Zheng and Fienberg’s seminal 2009 review, the purpose of this article is to provide an overview of commonly-used graph distances and an explicit characterization of the structural changes that they are best able to capture. To see how this translates in real-life situations, we use as a guiding thread to our discussion the application of these distances to the analysis of a longitudinal microbiome study — as well as to synthetic examples. Having unveiled some of the traditional distances’ shortcomings, we also suggest alternative similarity metrics and highlight their relative advantages in specific analysis scenarios. Above all, we provide some guidance for choosing one distance over another in certain types of applications. Finally, we show an application of these different distances to a network created from worldwide recipes.
Training Quantized Nets: A Deeper Understanding Currently, deep neural networks are deployed on low-power embedded devices by first training a full-precision model using powerful computing hardware, and then deriving a corresponding low-precision model for efficient inference on such systems. However, training models directly with coarsely quantized weights is a key step towards learning on embedded platforms that have limited computing resources, memory capacity, and power consumption. Numerous recent publications have studied methods for training quantized networks, but these studies have mostly been empirical. In this work, we investigate training methods for quantized neural networks from a theoretical viewpoint. We first explore accuracy guarantees for training methods under convexity assumptions. We then look at the behavior of algorithms for non-convex problems, and we show that training algorithms that exploit high-precision representations have an important annealing property that purely quantized training methods lack, which explains many of the observed empirical differences between these types of algorithms.
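As a rough, assumed sketch of the kind of procedure such studies analyze (one common scheme that quantizes weights in the forward pass while accumulating updates in a full-precision copy; this is not the authors' specific algorithm), consider:

    import numpy as np

    def quantize(w, bits=2):
        # uniform quantizer onto 2**bits - 1 levels over [-1, 1]
        levels = 2 ** bits - 1
        return np.round(np.clip(w, -1.0, 1.0) * levels) / levels

    def quantized_sgd_step(w_full, grad_fn, lr=0.01):
        # gradients are evaluated at the quantized weights,
        # but the update is accumulated in the high-precision copy
        g = grad_fn(quantize(w_full))
        return w_full - lr * g

Keeping the high-precision copy is exactly the kind of design choice whose annealing-like behavior the abstract contrasts with purely quantized training.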
Training Very Deep Networks Theoretical and empirical evidence indicates that the depth of neural networks is crucial for their success. However, training becomes more difficult as depth increases, and training of very deep networks remains an open problem. Here we introduce a new architecture designed to overcome this. Our so-called highway networks allow unimpeded information flow across many layers on information highways. They are inspired by Long Short-Term Memory recurrent networks and use adaptive gating units to regulate the information flow. Even with hundreds of layers, highway networks can be trained directly through simple gradient descent. This enables the study of extremely deep and efficient architectures.
Transcription Methods for Trajectory Optimization: a beginners tutorial This report is an introduction to transcription methods for trajectory optimization techniques. The first few sections describe the two classes of transcription methods (shooting and simultaneous) that are used to convert the trajectory optimization problem into a general constrained optimization form. The middle of the report discusses a few extensions to the basic methods, including how to deal with hybrid systems (such as walking robots). The final section goes over a variety of implementation details.
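As a concrete example of what a simultaneous (collocation) transcription produces, the trapezoidal defect constraint linking two adjacent knot points of a dynamics $\dot{x} = f(x,u)$ with step $h_k$ is commonly written (standard textbook form, not a formula quoted from the report) as $x_{k+1} - x_k = \tfrac{h_k}{2}\big(f(x_k, u_k) + f(x_{k+1}, u_{k+1})\big)$, with one such algebraic constraint per interval handed to a general nonlinear programming solver together with the path and boundary constraints.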
Transfer Adaptation Learning: A Decade Survey The world we see is ever-changing and it always changes with people, things, and the environment. Domain is referred to as the state of the world at a certain moment. A research problem is characterized as domain transfer adaptation when it needs knowledge correspondence between different moments. Conventional machine learning aims to find a model with the minimum expected risk on test data by minimizing the regularized empirical risk on the training data, which, however, supposes that the training and test data share a similar joint probability distribution. Transfer adaptation learning aims to build models that can perform tasks in the target domain by learning knowledge from a semantically related but distributionally different source domain. It is an energetic research field of increasing influence and importance. This paper surveys the recent advances in transfer adaptation learning methodology and potential benchmarks. Broader challenges being faced by transfer adaptation learning researchers are identified, i.e., instance re-weighting adaptation, feature adaptation, classifier adaptation, deep network adaptation, and adversarial adaptation, which are beyond the early semi-supervised and unsupervised split. The survey provides researchers a framework for better understanding and identifying the research status, challenges and future directions of the field.
Transfer Learning for Performance Modeling of Configurable Systems: An Exploratory Analysis Modern software systems provide many configuration options which significantly influence their non-functional properties. To understand and predict the effect of configuration options, several sampling and learning strategies have been proposed, albeit often with significant cost to cover the highly dimensional configuration space. Recently, transfer learning has been applied to reduce the effort of constructing performance models by transferring knowledge about performance behavior across environments. While this line of research is promising to learn more accurate models at a lower cost, it is unclear why and when transfer learning works for performance modeling. To shed light on when it is beneficial to apply transfer learning, we conducted an empirical study on four popular software systems, varying software configurations and environmental conditions, such as hardware, workload, and software versions, to identify the key knowledge pieces that can be exploited for transfer learning. Our results show that in small environmental changes (e.g., homogeneous workload change), by applying a linear transformation to the performance model, we can understand the performance behavior of the target environment, while for severe environmental changes (e.g., drastic workload change) we can transfer only knowledge that makes sampling more efficient, e.g., by reducing the dimensionality of the configuration space.
Transfer Metric Learning: Algorithms, Applications and Outlooks Distance metric learning (DML) aims to find an appropriate way to reveal the underlying data relationship. It is critical in many machine learning, pattern recognition and data mining algorithms, and usually requires a large amount of label information (class labels or pair/triplet constraints) to achieve satisfactory performance. However, the label information may be insufficient in real-world applications due to the high labeling cost, and DML may fail in this case. Transfer metric learning (TML) is able to mitigate this issue for DML in the domain of interest (target domain) by leveraging knowledge/information from other related domains (source domains). Although it has achieved a certain level of development, TML has had limited success in various aspects such as selective transfer, theoretical understanding, handling complex data, big data and extreme cases. In this survey, we present a systematic review of the TML literature. In particular, we group TML into different categories according to different settings and metric transfer strategies, such as direct metric approximation, subspace approximation, distance approximation, and distribution approximation. A summarization and insightful discussion of the various TML approaches and their applications will be presented. Finally, we provide some challenges and possible future directions.
Transferrable Plausibility Model – A Probabilistic Interpretation of Mathematical Theory of Evidence This paper suggests a new interpretation of the Dempster-Shafer theory in terms of probabilistic interpretation of plausibility. A new rule of combination of independent evidence is shown and its preservation of interpretation is demonstrated.
Transform Your Organization With Strong Data Management – Executive Overview: The Data Management Playbook Business stakeholders are putting more pressure on IT to keep up with and meet market demands. Yet existing data management platforms and strategies were not designed for an agile business or the digital age. Data management is the backbone needed to define and orchestrate the digital experience and to ensure confidence in the data used in your key processes. Forrester´s data management playbook shows enterprise architecture (EA) professionals how to build a more elastic and flexible data management practice to meet the new data demands brought on by digital disruption.
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods Integration of vision and language tasks has seen significant growth in recent times due to a surge of interest from multi-disciplinary communities such as deep learning, computer vision, and natural language processing. In this survey, we focus on ten different vision and language integration tasks in terms of their problem formulation, methods, existing datasets, evaluation measures, and comparison of results achieved with the corresponding state-of-the-art methods. This goes beyond earlier surveys which are either task-specific or concentrate only on one type of visual content, i.e., image or video. We then conclude the survey by discussing some possible future directions for integration of vision and language research.
TSclust: An R Package for Time Series Clustering Time series clustering is an active research area with applications in a wide range of fields. One key component in cluster analysis is determining a proper dissimilarity measure between two data objects, and many criteria have been proposed in the literature to assess dissimilarity between two time series. The R package TSclust aims to implement a large set of well-established peer-reviewed time series dissimilarity measures, including measures based on raw data, extracted features, underlying parametric models, complexity levels, and forecast behaviors. Computation of these measures allows the user to perform clustering by using conventional clustering algorithms. TSclust also includes a clustering procedure based on p values from checking the equality of generating models, and some utilities to evaluate cluster solutions. The implemented dissimilarity functions are accessible individually for an easier extension and possible use out of the clustering context. The main features of TSclust are described and examples of its use are presented.
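The overall workflow the package supports (compute a pairwise dissimilarity matrix, then feed it to a conventional clustering algorithm) can be sketched as follows; this toy example uses Python and the simplest raw-data Euclidean measure purely for illustration and does not reproduce TSclust's own R interface:

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.cluster.hierarchy import linkage, fcluster

    series = np.random.default_rng(1).normal(size=(10, 200))  # 10 toy series of length 200
    d = pdist(series, metric="euclidean")        # pairwise dissimilarities on the raw data
    tree = linkage(d, method="complete")         # conventional hierarchical clustering
    labels = fcluster(tree, t=3, criterion="maxclust")
    print(labels)                                # cluster assignment for each series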
Turing Test Revisited: A Framework for an Alternative This paper aims to question the suitability of the Turing Test, for testing machine intelligence, in the light of advances made in the last 60 years in science, medicine, and philosophy of mind. While the main concept of the test may seem sound and valid, a detailed analysis of what is required to pass the test highlights a significant flaw. Once the analysis of the test is presented, a systematic approach is followed in analysing what is needed to devise a test or tests for intelligent machines. The paper presents a plausible generic framework based on categories of factors implied by subjective perception of intelligence. An evaluative discussion concludes the paper highlighting some of the unaddressed issues within this generic framework.
Turning Big Data Into Useful Information Many Fortune 500 companies are recognizing enterprise data as a strategic business asset. Leading companies are using troves of operational data to optimize their processes, create intelligent products and delight their customers. In addition, increased demands for regulatory transparency are forcing companies to capture and maintain an audit trail of the information they use in their business decisions. Despite this, large companies struggle to access, manage and leverage the information that they create in their day-to-day processes. The rapid growth in the number of IT systems has resulted in a complex and fragmented landscape, where potentially valuable data lays trapped in fragmented inconsistent silos of applications, databases and organizations.
Tutorial: Deriving The Efficient Influence Curve for Large Models This paper aims to provide a tutorial for upper level undergraduate and graduate students in statistics and biostatistics on deriving influence functions for non-parametric and semi-parametric models. The author will build on previously known efficiency theory and provide a useful identity and formulaic technique relying only on the basics of integration, which are self-contained in this tutorial and can be used in most any setting one might encounter in practice. The paper provides many examples of such derivations for well-known influence functions as well as for new parameters of interest. The influence function remains a central object for constructing efficient estimators for large models, such as the one-step estimator and the targeted maximum likelihood estimator. We will not touch upon these estimators at all but readers familiar with these estimators might find this tutorial of particular use.
Twitter client for R Twitter is a popular service that allows users to broadcast short messages (‘tweets’) for others to read. Over the years this has become a valuable tool not just for standard social media purposes but also for data mining experiments such as sentiment analysis. The twitteR package is intended to provide access to the Twitter API within R, allowing users to grab interesting subsets of Twitter data for their analyses. This document is not intended to be exhaustive nor comprehensive but rather a brief introduction to some of the more common bits of functionality and some basic examples of how they can be used. In the last section I’ve included a variety of links to people using twitteR to solve real world problems.

U

Uncertainty and Sensitivity Analyses Methods for Agent-Based Mathematical Models: An Introductory Review Multiscale, agent-based mathematical models of biological systems are often associated with model uncertainty and sensitivity to parameter perturbations. Here, three uncertainty and sensitivity analyses methods, that are suitable to use when working with agent-based models, are discussed. These methods are namely Consistency Analysis, Robustness Analysis and Latin Hypercube Analysis. This introductory review discusses origins, conventions, implementation and result interpretation of the aforementioned methods. Information on how to implement the discussed methods in MATLAB is included.
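As a small, assumed illustration of the last of these methods (a basic Latin hypercube sample over the unit hypercube; this is generic sampling code, not the MATLAB implementation discussed in the review):

    import numpy as np

    def latin_hypercube(n_samples, n_params, seed=0):
        rng = np.random.default_rng(seed)
        # one stratum per sample in every dimension, with a random offset inside each stratum
        u = (rng.random((n_samples, n_params)) + np.arange(n_samples)[:, None]) / n_samples
        # independently shuffle the strata in each dimension
        for j in range(n_params):
            u[:, j] = rng.permutation(u[:, j])
        return u  # points in the unit hypercube; rescale to the parameter ranges as needed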
Uncovering Social Network Structures through Penetration Data We propose a method for uncovering the structure of the adopters’ network underlying the diffusion process, based on penetration data alone. By uncovering the traces that this network leaves on the dissemination process, the degree distribution of the network can be estimated. We show that the network’s degree distribution has a significant effect on the contagion properties. Ignoring the network structure introduces significant errors to estimated diffusion parameters and may lead to flawed assessments of the magnitude of the contagion process. In three studies we validate the proposed method using data for known mapped networks and the adoption process propagating on them.
Understanding Big Data Quality for Maximum Information Usability The barriers to entry for high-performance scalable data management and computing continue to fall, and ‘big data’ is rapidly moving into the mainstream. So it´s easy to become so focused on the anticipated business benefits of large-scale data analytics that we lose sight of the intricacy associated with data acquisition, preparation and quality assurance. In some ways, the clamoring demand for large-scale analysis only heightens the need for data governance and data quality assurance. And while there are some emerging challenges associated with managing big data quality, reviewing good data management practices will help to maximize data usability. In this paper we examine some of the challenges presented by managing the quality and governance of big data, and how those can be balanced with the need to deliver usable analytical results. We explore the dimensions of data quality for big data, and examine the reasons for practical approaches to proactive monitoring, managing reference data and metadata, and sharing knowledge about interpreting and using data sets. By examining some examples, we can identify ways to balance governance with usability and come up with a strategic plan for data quality, including tactical steps for taking advantage of the power of the cluster to drive more meaning and value out of the data. Finally, we consider a checklist of characteristics to look for when evaluating information management tools for big data.
Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data
Understanding Convolutional Neural Network Training with Information Theory Using information theoretic concepts to understand and explore the inner organization of deep neural networks (DNNs) remains a big challenge. Recently, the concept of an information plane began to shed light on the analysis of multilayer perceptrons (MLPs). We provided an in-depth insight into stacked autoencoders (SAEs) using a novel matrix-based Renyi’s {\alpha}-entropy functional, enabling for the first time the analysis of the dynamics of learning using information flow in a real-world scenario involving complex network architectures and large data. Despite the great potential of these past works, there are several open questions when it comes to applying information theoretic concepts to understand convolutional neural networks (CNNs). These include, for instance, the accurate estimation of information quantities among multiple variables, and the many different training methodologies. By extending the novel matrix-based Renyi’s {\alpha}-entropy functional to a multivariate scenario, this paper presents a systematic method to analyze CNN training using information theory. Our results validate two fundamental data processing inequalities in CNNs, and also have direct impacts on previous work concerning the training and design of CNNs.
Understanding Deep Learning Generalization by Maximum Entropy Deep learning achieves remarkable generalization capability with an overwhelming number of model parameters. Theoretical understanding of deep learning generalization has received recent attention yet remains not fully explored. This paper attempts to provide an alternative understanding from the perspective of maximum entropy. We first derive two feature conditions under which softmax regression strictly applies the maximum entropy principle. A DNN is then regarded as approximating these feature conditions with multilayer feature learning, and is proved to be a recursive solution towards the maximum entropy principle. The connection between DNNs and maximum entropy well explains why typical designs such as shortcut connections and regularization improve model generalization, and provides instructions for future model development.
Understanding Generalization and Stochastic Gradient Descent This paper tackles two related questions at the heart of machine learning: how can we predict if a minimum will generalize to the test set, and why does stochastic gradient descent find minima that generalize well? Our work is inspired by Zhang et al. (2017), who showed deep networks can easily memorize randomly labeled training data, despite generalizing well when shown real labels of the same inputs. We show here that the same phenomenon occurs in small linear models. These observations are explained by evaluating the Bayesian evidence in favor of each model, which penalizes sharp minima. Next, we explore the ‘generalization gap’ between small and large batch training, identifying an optimum batch size which maximizes the test set accuracy. Noise in the gradient updates is beneficial, driving the dynamics towards robust minima for which the evidence is large. Interpreting stochastic gradient descent as a stochastic differential equation, we predict the optimum batch size is proportional to both the learning rate and the size of the training set, and verify these predictions empirically.
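The scaling claim at the end of the abstract can be stated compactly: with learning rate $\epsilon$ and training set size $N$, the predicted optimum batch size satisfies $B_{opt} \propto \epsilon N$ (a restatement of the abstract's claim in symbols, not an additional result).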
Understanding Neural Architecture Search Techniques Automatic methods for generating state-of-the-art neural network architectures without human experts have generated significant attention recently. This is because of the potential to remove human experts from the design loop, which can reduce costs and decrease time to model deployment. Neural architecture search (NAS) techniques have improved significantly in their computational efficiency since the original NAS was proposed. This reduction in computation is enabled via weight sharing such as in Efficient Neural Architecture Search (ENAS). However, recently a body of work confirms our discovery that ENAS does not do significantly better than random search with weight sharing, contradicting the initial claims of the authors. We provide an explanation for this phenomenon by investigating the interpretability of the ENAS controller’s hidden state. We are interested in seeing if the controller embeddings are predictive of any properties of the final architecture – for example, graph properties like the number of connections, or validation performance. We find models sampled from identical controller hidden states have no correlation in various graph similarity metrics. This failure mode implies the RNN controller does not condition on past architecture choices. Importantly, we may need to condition on past choices if certain connection patterns prevent vanishing or exploding gradients. Lastly, we propose a solution to this failure mode by forcing the controller’s hidden state to encode past decisions by training it with a memory buffer of previously sampled architectures. Doing this improves hidden state interpretability by increasing the correlation between controller hidden states and graph similarity metrics.
Understanding Neural Networks via Feature Visualization: A survey A neuroscience approach to understanding the brain is to find and study the preferred stimuli that highly activate an individual cell or group of cells. Recent advances in machine learning enable a family of methods to synthesize preferred stimuli that cause a neuron in an artificial or biological brain to fire strongly. Those methods are known as Activation Maximization (AM) or Feature Visualization via Optimization. In this chapter, we (1) review existing AM techniques in the literature; (2) discuss a probabilistic interpretation for AM; and (3) review the applications of AM in debugging and explaining networks.
Understanding predictive information criteria for Bayesian models We review the Akaike, deviance, and Watanabe-Akaike information criteria from a Bayesian perspective, where the goal is to estimate expected out-of-sample prediction error using a bias-corrected adjustment of within-sample error. We focus on the choices involved in setting up these measures, and we compare them in three simple examples, one theoretical and two applied. The contribution of this review is to put all these information criteria into a Bayesian predictive context and to better understand, through small examples, how these methods can apply in practice.
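For reference to the entry above, the standard forms of these criteria (as commonly written; notation assumed here):

\[
\mathrm{AIC} = -2\log p(y\mid\hat\theta_{\mathrm{MLE}}) + 2k,
\qquad
\mathrm{DIC} = -2\log p(y\mid\hat\theta_{\mathrm{Bayes}}) + 2p_{\mathrm{DIC}},
\]
\[
\mathrm{WAIC} = -2\Big(\sum_i \log \mathbb{E}_{\mathrm{post}}\big[p(y_i\mid\theta)\big] \;-\; \sum_i \mathrm{Var}_{\mathrm{post}}\big[\log p(y_i\mid\theta)\big]\Big),
\]

where k is the number of parameters and the second terms act as bias corrections (effective numbers of parameters) applied to the within-sample fit.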
Understanding Probabilistic Classifiers Probabilistic classifiers are developed by assuming generative models which are product distributions over the original attribute space (as in naive Bayes) or more involved spaces (as in general Bayesian networks). While this paradigm has been shown to be experimentally successful on real-world applications, despite vastly simplified probabilistic assumptions, the question of why these approaches work is still open. This paper resolves this question. We show that almost all joint distributions with a given set of marginals (i.e., all distributions that could have given rise to the classifier learned) or, equivalently, almost all data sets that yield this set of marginals, are very close (in terms of distributional distance) to the product distribution on the marginals; the number of these distributions goes down exponentially with their distance from the product distribution. Consequently, as we show, for almost all joint distributions with this set of marginals, the penalty incurred in using the marginal distribution rather than the true one is small. In addition to resolving the puzzle surrounding the success of probabilistic classifiers, our results contribute to understanding the tradeoffs in developing probabilistic classifiers and will help in developing better classifiers.
Understanding Random Forests. From Theory to Practice. Data analysis and machine learning have become an integral part of the modern scientific methodology, offering automated procedures for the prediction of a phenomenon based on past observations, unraveling underlying patterns in data and providing insights about the problem. Yet, care should be taken to avoid using machine learning as a black-box tool, and instead to consider it as a methodology, with a rational thought process that is entirely dependent on the problem under study. In particular, the use of algorithms should ideally require a reasonable understanding of their mechanisms, properties and limitations, in order to better apprehend and interpret their results. Accordingly, the goal of this thesis is to provide an in-depth analysis of random forests, consistently calling into question each and every part of the algorithm, in order to shed new light on its learning capabilities, inner workings and interpretability. The first part of this work studies the induction of decision trees and the construction of ensembles of randomized trees, motivating their design and purpose whenever possible. Our contributions follow with an original complexity analysis of random forests, showing their good computational performance and scalability, along with an in-depth discussion of their implementation details, as contributed within Scikit-Learn. In the second part of this work, we analyze and discuss the interpretability of random forests through the lens of variable importance measures. The core of our contributions rests in the theoretical characterization of the Mean Decrease of Impurity variable importance measure, from which we prove and derive some of its properties in the case of multiway totally randomized trees and in asymptotic conditions. In consequence of this work, our analysis demonstrates that variable importances as computed from non-totally randomized trees (e.g., standard Random Forest) suffer from a combination of defects, due to masking effects, misestimations of node impurity or due to the binary structure of decision trees. Finally, the last part of this dissertation addresses limitations of random forests in the context of large datasets. Through extensive experiments, we show that subsampling both samples and features simultaneously provides on-par performance while at the same time lowering the memory requirements. Overall, this paradigm highlights an intriguing practical fact: there is often no need to build single models over immensely large datasets. Good performance can often be achieved by building models on (very) small random parts of the data and then combining them all in an ensemble, thereby avoiding all practical burdens of making large data fit into memory.
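As a minimal illustration of the Mean Decrease of Impurity (MDI) importances discussed in the entry above, a scikit-learn sketch (the dataset and hyperparameters are arbitrary choices for the example, not taken from the thesis):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Fit a small forest on a toy dataset (illustrative choices only)
data = load_iris()
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(data.data, data.target)

# feature_importances_ exposes the impurity-based (MDI) importances
for name, importance in zip(data.feature_names, forest.feature_importances_):
    print(f"{name}: {importance:.3f}")

As the thesis cautions, impurity-based importances from non-totally randomized trees can be biased (masking effects, impurity misestimation), so they should be interpreted with care.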
Understanding Regularization in Batch Normalization Batch Normalization (BN) makes the output of a hidden neuron have zero mean and unit variance, improving convergence and generalization when training neural networks. This work explains these phenomena theoretically. We analyze BN by using a building block of neural networks, which consists of a weight layer, a BN layer, and a nonlinear activation function. This simple network helps us understand the characteristics of BN, where the results are generalized to deep models in numerical studies. We explore BN in three aspects. First, by viewing BN as a stochastic process, an analytical form of the regularization inherited in BN is derived. Second, the optimization dynamics with this regularization show that BN enables training to converge with large maximum and effective learning rates. Third, BN’s generalization with regularization is explored by using random matrix theory and statistical mechanics. Both simulations and experiments support our analyses.
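For reference to the entry above, the standard batch normalization transform for a mini-batch (standard notation, assumed here rather than taken from the paper):

\[
\hat{x}_i = \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^2 + \epsilon}},
\qquad
y_i = \gamma\,\hat{x}_i + \beta,
\]

where \mu_{\mathcal{B}} and \sigma_{\mathcal{B}}^2 are the mini-batch mean and variance, \epsilon is a small constant for numerical stability, and \gamma, \beta are learned scale and shift parameters; because \mu_{\mathcal{B}} and \sigma_{\mathcal{B}}^2 depend on the randomly sampled batch, the transform injects the stochasticity that the paper analyzes as a source of regularization.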
Understanding the Basis of the Kalman Filter via a Simple and Intuitive Derivation This article provides a simple and intuitive derivation of the Kalman filter, with the aim of teaching this useful tool to students from disciplines that do not require a strong mathematical background. The most complicated level of mathematics required to understand this derivation is the ability to multiply two Gaussian functions together and reduce the result to a compact form. The Kalman filter is over 50 years old but is still one of the most important and common data fusion algorithms in use today. Named after Rudolf E. Kálmán, the Kalman filter owes its great success to its small computational requirement, elegant recursive properties, and its status as the optimal estimator for one-dimensional linear systems with Gaussian error statistics [1]. Typical uses of the Kalman filter include smoothing noisy data and providing estimates of parameters of interest. Applications include global positioning system receivers, phase-locked loops in radio equipment, smoothing the output from laptop trackpads, and many more. From a theoretical standpoint, the Kalman filter is an algorithm permitting exact inference in a linear dynamical system, which is a Bayesian model similar to a hidden Markov model but where the state space of the latent variables is continuous and where all latent and observed variables have a Gaussian distribution (often a multivariate Gaussian distribution). The aim of this lecture note is to permit people who find this description confusing or terrifying to understand the basis of the Kalman filter via a simple and intuitive derivation.
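The key step the derivation relies on, multiplying two univariate Gaussians and reducing the result to a compact form, can be sketched as follows (a standard result; notation assumed here):

\[
\mathcal{N}(x;\mu_1,\sigma_1^2)\,\mathcal{N}(x;\mu_2,\sigma_2^2)
\;\propto\;
\mathcal{N}\!\big(x;\;\mu_1 + K(\mu_2-\mu_1),\;(1-K)\,\sigma_1^2\big),
\qquad
K = \frac{\sigma_1^2}{\sigma_1^2+\sigma_2^2},
\]

where the blending factor K plays the role of the Kalman gain: the fused estimate lies between the two means, weighted by their relative uncertainties, and its variance is never larger than that of either input.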
Understanding the Meaning of Understanding Can we train a machine to detect if another machine has understood a concept? In principle, this is possible by conducting tests on the subject of that concept. However, we want this procedure to be done while avoiding direct questions. In other words, we would like to isolate the absolute meaning of an abstract idea by putting it into a class of equivalence, hence without adopting straight definitions or showing how this idea ‘works’ in practice. We discuss the metaphysical implications hidden in the above question, with the aim of providing a plausible reference framework.
Understanding the Motivations, Challenges and Needs of Blockchain Software Developers: A Survey The blockchain technology has potential applications in various areas such as smart-contracts, Internet of Things (IoT), land registry, supply chain management, storing medical data, and identity management. Although GitHub currently hosts more than six thousand active Blockchain software (BCS) projects, little software engineering research has investigated these projects and their contributors. Although the number of BCS projects is growing rapidly, the motivations, challenges, and needs of BCS developers remain a puzzle. Therefore, the primary objective of this study is to understand the motivations, challenges, and needs of BCS developers and analyze the differences between BCS and non-BCS development. To this end, we sent an online survey to 1,604 active BCS developers identified via mining the GitHub repositories of 145 popular BCS projects. The survey received 156 responses that met our criteria for analysis. The results suggest that the majority of the BCS developers are experienced in non-BCS development and are primarily motivated by the ideology of creating a decentralized financial system. Although most of the BCS projects are Open Source Software (OSS) projects by nature, more than 93% of our respondents found BCS development somewhat different from non-BCS development. The aspects of BCS development that differ from non-BCS development are also the primary sources of challenges for them. The current BCS development ecosystem is immature and needs an array of tools to be developed or improved.
Understanding the Probabilistic Latent Component Analysis Framework Probabilistic Latent Component Analysis (PLCA) is a statistical modeling method for feature extraction from non-negative data. It has been fruitfully applied to various research fields of information retrieval. However, the EM-solved optimization problem that comes with the parameter estimation of PLCA-based models has never been properly posed and justified. In this short paper we therefore propose to re-define the theoretical framework of this problem, with the motivation of making it clearer to understand and more admissible for further developments of PLCA-based computational systems.
Unifying Decision-Making: a Review on Evolutionary Theories on Rationality and Cognitive Biases In this paper, we review the concepts of rationality across several different fields, namely economics, psychology, evolutionary biology and behavioural ecology. We review how processes like natural selection can help us understand the evolution of cognition and how cognitive biases might be a consequence of this natural selection. In the end, we argue that humans are not irrational, but rather rationally bounded, and we complement the discussion with how quantum cognitive models can contribute to the modelling and prediction of paradoxical human decisions.
Unit Level Modeling of Survey Data for Small Area Estimation Under Informative Sampling: A Comprehensive Overview with Extensions Model-based small area estimation is frequently used in conjunction with survey data in order to establish estimates for under-sampled or unsampled geographies. These models can be specified at either the area level or the unit level, but unit-level models often offer potential advantages such as more precise estimates and easy spatial aggregation. Nevertheless, relative to area-level models, the literature on unit-level models is less prevalent. In modeling small areas at the unit level, challenges often arise as a consequence of the informative sampling mechanism used to collect the survey data. This paper provides a comprehensive methodological review for unit-level models under informative sampling, with an emphasis on Bayesian approaches. To provide insight into the differences between methods, we conduct a simulation study that compares several of the described approaches. In addition, the methods used for simulation are further illustrated through an application to the American Community Survey. Finally, we present several extensions and areas for future research.
Universal gradient descent In this small book we collect many different and useful facts about the gradient descent method. First of all, we consider gradient descent with an inexact oracle. We build a general model of the optimized function that includes composite optimization approaches, level methods, proximal methods, etc. Then we investigate primal-dual properties of gradient descent in this general model set-up. At the end we generalize the method to a universal one.
Universal Reasoning, Rational Argumentation and Human-Machine Interaction Classical higher-order logic, when utilized as a meta-logic in which various other (classical and non-classical) logics can be shallowly embedded, is well suited for realising a universal logic reasoning approach. Universal logic reasoning in turn, as envisioned already by Leibniz, may support the rigorous formalisation and deep logical analysis of rational arguments within machines. A respective universal logic reasoning framework is described and a range of exemplary applications are discussed. In the future, universal logic reasoning in combination with appropriate, controlled forms of rational argumentation may serve as a communication layer between humans and intelligent machines.
Universal Reinforcement Learning Algorithms: Survey and Experiments Many state-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of some experiments that qualitatively illustrate some properties of the resulting policies, and their relative performance on partially-observable gridworld environments. We also present an open-source reference implementation of the algorithms which we hope will facilitate further understanding of, and experimentation with, these ideas.
Universality of Deep Convolutional Neural Networks Deep learning has been widely applied and brought breakthroughs in speech recognition, computer vision, and many other domains. The involved deep neural network architectures and computational issues have been well studied in machine learning. But a theoretical foundation is lacking for understanding the approximation or generalization ability of deep learning methods generated by network architectures such as deep convolutional neural networks having convolutional structures. Here we show that a deep convolutional neural network (CNN) is universal, meaning that it can be used to approximate any continuous function to an arbitrary accuracy when the depth of the neural network is large enough. This answers an open question in learning theory. Our quantitative estimate, given tightly in terms of the number of free parameters to be computed, verifies the efficiency of deep CNNs in dealing with large dimensional data. Our study also demonstrates the role of convolutions in deep CNNs.
Unsupervised and Supervised Principal Component Analysis: Tutorial This is a detailed tutorial paper which explains the Principal Component Analysis (PCA), Supervised PCA (SPCA), kernel PCA, and kernel SPCA. We start with projection, PCA with eigen-decomposition, PCA with one and multiple projection directions, properties of the projection matrix, and reconstruction error minimization, and we connect PCA to the auto-encoder. Then, PCA with singular value decomposition, dual PCA, and kernel PCA are covered. SPCA using both scoring and the Hilbert-Schmidt independence criterion is explained. Kernel SPCA using both direct and dual approaches is then introduced. We cover all cases of projection and reconstruction of training and out-of-sample data. Finally, some simulations are provided on the Frey and AT&T face datasets for verifying the theory in practice.
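A minimal sketch of the projection and reconstruction steps described in the entry above, using PCA via eigen-decomposition of the covariance matrix (the toy data and number of components are arbitrary choices for the example):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # toy data: 100 samples, 5 features
mean = X.mean(axis=0)
Xc = X - mean                            # center the data
C = np.cov(Xc, rowvar=False)             # 5x5 sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # eigen-decomposition (ascending eigenvalues)
order = np.argsort(eigvals)[::-1]        # sort eigenvalues in descending order
U = eigvecs[:, order[:2]]                # top-2 principal directions (projection matrix)
Z = Xc @ U                               # projection: low-dimensional scores
X_rec = Z @ U.T + mean                   # reconstruction from the 2 components
print("reconstruction error:", np.linalg.norm(X - X_rec))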
Unsupervised learning of phase transitions: from principal component analysis to variational autoencoders We employ unsupervised machine learning techniques to learn latent parameters which best describe states of the two-dimensional Ising model and the three-dimensional XY model. These methods range from principal component analysis to artificial neural network based variational autoencoders. The states are sampled using a Monte-Carlo simulation above and below the critical temperature. We find that the predicted latent parameters correspond to the known order parameters. The latent representations of the states of the models in question are clustered, which makes it possible to identify phases without prior knowledge of their existence or the underlying Hamiltonian. Furthermore, we find that the reconstruction loss function can be used as a universal identifier for phase transitions.
Unsupervised Machine Learning for Networking: Techniques, Applications and Research Challenges While machine learning and artificial intelligence have long been applied in networking research, the bulk of such work has focused on supervised learning. Recently there has been a rising trend of employing unsupervised machine learning using unstructured raw network data to improve network performance and provide services such as traffic engineering, anomaly detection, Internet traffic classification, and quality of service optimization. The interest in applying unsupervised learning techniques in networking emerges from their great success in other fields such as computer vision, natural language processing, speech recognition, and optimal control (e.g., for developing autonomous self-driving cars). Unsupervised learning is interesting since it can free us from the need for labeled data and manual handcrafted feature engineering, thereby facilitating flexible, general, and automated methods of machine learning. The focus of this survey paper is to provide an overview of the applications of unsupervised learning in the domain of networking. We provide a comprehensive survey highlighting the recent advancements in unsupervised learning techniques and describe their applications for various learning tasks in the context of networking. We also provide a discussion on future directions and open research issues, while also identifying potential pitfalls. While a few survey papers focusing on the applications of machine learning in networking have previously been published, a survey of similar scope and breadth is missing in the literature. Through this paper, we advance the state of knowledge by carefully synthesizing the insights from these survey papers while also providing contemporary coverage of recent advances.
Unsupervised Pre-training for Natural Language Generation: A Literature Review Recently, unsupervised pre-training is gaining increasing popularity in the realm of computational linguistics, thanks to its surprising success in advancing natural language understanding (NLU) and the potential to effectively exploit large-scale unlabelled corpora. However, regardless of the success in NLU, the power of unsupervised pre-training is only partially exploited when it comes to natural language generation (NLG). The major obstacle stems from an idiosyncratic nature of NLG: Texts are usually generated based on a certain context, which may vary with the target applications. As a result, it is intractable to design a universal architecture for pre-training as in NLU scenarios. Moreover, retaining the knowledge learned from pre-training when learning on the target task is also a non-trivial problem. This review summarizes the recent efforts to enhance NLG systems with unsupervised pre-training, with a special focus on the methods to catalyse the integration of pre-trained models into downstream tasks. They are classified into architecture-based methods and strategy-based methods, based on their way of handling the above obstacle. Discussions are also provided to give further insights into the relationship between these two lines of work, some informative empirical phenomena, as well as some possible directions where future work can be devoted.
Upping Your Analytic IQ – Driving Real Time Insights through Increased Employee Engagement (Slide Deck)
Use of Deep Learning in Modern Recommendation System: A Summary of Recent Works With the exponential increase in the amount of digital information over the internet, online shops, online music, video and image libraries, search engines and recommendation systems have become the most convenient ways to find relevant information within a short time. In recent times, advances in deep learning have gained significant attention in the fields of speech recognition, image processing and natural language processing. Meanwhile, several recent studies have shown the utility of deep learning in the area of recommendation systems and information retrieval as well. In this short review, we cover the recent advances made in the field of recommendation using various variants of deep learning technology. We organize the review in three parts: collaborative systems, content-based systems and hybrid systems. The review also discusses the contribution of deep-learning-integrated recommendation systems in several application domains. The review concludes with a discussion of the impact of deep learning on recommendation systems in various domains and whether deep learning has shown any significant improvement over conventional recommendation systems. Finally, we also provide future directions of research which are possible based on the current state of use of deep learning in recommendation systems.
User Reviews and Language: How Language Influences Ratings The number of user reviews of tourist attractions, restaurants, mobile apps, etc. is increasing for all languages; yet, research is lacking on how reviews in multiple languages should be aggregated and displayed. Speakers of different languages may have consistently different experiences, e.g., different information available in different languages at tourist attractions or different user experiences with software due to internationalization/localization choices. This paper assesses the similarity in the ratings given by speakers of different languages to London tourist attractions on TripAdvisor. The correlations between different languages are generally high, but some language pairs are more correlated than others. The results question the common practice of computing average ratings from reviews in many languages.
Using DeployR to Solve the R Integration Problem Organizations use analytics to empower decision making, often in real time. That means the ability to easily embed and share the results of R analytics within existing web, desktop, and mobile applications, plus backend systems, is vital. As important as real-time delivery of analytics is to this process, users also need to consider security in terms of identity and data integrity, and scale in terms of workload and throughput. DeployR from Revolution Analytics delivers the power and flexibility of R securely and at scale to all of your decision-making systems. DeployR is an integration technology for deploying R analytics inside web, desktop, mobile, and dashboard applications, as well as backend systems. DeployR turns your R scripts into analytics web services, so R code can be easily executed by applications running on a secure server. Using analytics web services, DeployR also solves key integration problems faced by those adopting R-based analytics alongside existing IT infrastructure. These services make it easy for application developers to collaborate with R programmers to integrate R analytics into their applications without any R programming knowledge. DeployR is available in two editions: DeployR Open and DeployR Enterprise. DeployR Open is a free, open source solution that is ideal for prototyping, building, and deploying non-critical business applications. DeployR Enterprise scales for business-critical applications and offers support for production-grade workloads, as well as seamless integration with popular enterprise security solutions such as single sign-on (SSO), Lightweight Directory Access Protocol (LDAP), Active Directory, or Pluggable Authentication Modules (PAM).
Using intervention time series analyses to assess the effects of imperfectly identifiable natural events: a general method and example Intervention time series analysis (ITSA) is an important method for analysing the effect of sudden events on time series data. ITSA methods are quasi-experimental in nature and the validity of modelling with these methods depends upon assumptions about the timing of the intervention and the response of the process to it. This paper describes how to apply ITSA to analyse the impact of unplanned events on time series when the timing of the event is not accurately known, and so the problems of ITSA methods are magnified by uncertainty in the point of onset of the unplanned intervention. The methods are illustrated using the example of the Australian Heroin Shortage of 2001, which provided an opportunity to study the health and social consequences of an abrupt change in heroin availability in an environment of widespread harm reduction measures. Application of these methods enables valuable insights about the consequences of unplanned and poorly identified interventions while minimising the risk of spurious results.
Using JAGS for Bayesian Cognitive Diagnosis Models: A Tutorial In this article, the JAGS software is systematically introduced for fitting common Bayesian cognitive diagnosis models (CDMs), such as the deterministic inputs, noisy ‘and’ gate model, the deterministic inputs, noisy ‘or’ gate model, the linear logistic model, and the log-linear CDM. The unstructured structural model and the higher-order structural model are both employed. We also show how to extend those models to account for testlet effects. Finally, an empirical example is given as a tutorial to illustrate how to use our JAGS code in R.
Using R and Tableau R functions and models can now be used in Tableau by creating new calculated fields that dynamically invoke the R engine and pass values to R. The results are then returned back to Tableau for use by the Tableau visualization engine.
Using Storm for Real-Time First Story Detection Twitter has been an excellent tool for extracting information in real-time, such as news and events. This dissertation uses Twitter to address the problem of real-time new event detection on the Storm distributed platform, such that the system benefits from the scalability, efficiency and robustness this framework can offer. Towards this direction, three different implementations have been deployed, each with a different configuration. The first and simplest distributed implementation was the baseline approach. Two implementations followed, in an attempt to achieve faster data processing without loss in accuracy. The latter two implementations demonstrated significant improvements in both performance and scalability. Specifically, they achieved a 1357.16 % and 1213.15 % speed-up over the single-threaded baseline version, respectively. Moreover, the accuracy and robustness of the scalable approaches compared to the baseline version were retained.
Using visualization to understand big data Studies have shown that the human short-term memory is capable of holding 3 – 7 items in place simultaneously, which means that people can only juggle a few items in their heads before they start to lose track of them. Visualization creates encodings of data into visual channels that people can view and understand. This process externalizes the data and enables people to think about and manipulate the data at a higher level. This externalization enables humans to think more complex thoughts about larger amounts of information than would otherwise be possible.1 Visualization exploits the human visual system to provide an intuitive, immediate and language-independent way to view and show your data. It is an essential tool for understanding information. The human visual system is by far the richest, most immediate, highest bandwidth pipeline into the human mind. The amount of brain capacity that is devoted to processing visual input far exceeds that of the other human senses. Some scientific estimates suggest that the human visual system is capable of processing about 9 megabits of information per second, which corresponds to close to 1 million letters of text per second. Visualization research over the past decades has discovered a wide range of effective visualization techniques that go far beyond the basic pie, bar and line charts used so pervasively in spreadsheets and dashboards. These techniques are especially useful now that most organizations are being confronted with big data. The majority of organizations are struggling to make sense of output from data sources that include RFID communications, social media text, customer surveys, streaming video and more, along with data captured over very long periods of time. For the IBM Institute for Business Value report on big data, IBM surveyed more than 1100 business and IT professionals and found that less than 26 percent of respondents who had active big data efforts could analyze extremely unstructured data such as voice and video and just 35 percent could analyze streaming data.2 Visualization plays a key role in enabling the understanding of these complex data analytics, and it can convey the key analytical nuggets of information to other people in the organization who have less expertise in analytics. When companies can analyze big data, they benefit. In that same IBM survey, 63 percent of respondents reported that they believe that understanding and exploiting big data effectively can create a competitive advantage for their organizations.3 Big data analysis can help them improve decision making, create a 360-degree view of their customers, improve security and surveillance, analyze operations and augment data warehousing. Visualization can play a vital role in using big data to get a complete view of your customer. This paper covers how.

V

Variable Selection and Estimation in High-Dimensional Models Models with high-dimensional covariates arise frequently in economics and other fields. Often, only a few covariates have important effects on the dependent variable. When this happens, the model is said to be sparse. In applications, however, it is not known which covariates are important and which are not. This paper reviews methods for discriminating between important and unimportant covariates with particular attention given to methods that discriminate correctly with probability approaching 1 as the sample size increases. Methods are available for a wide variety of linear, nonlinear, semiparametric, and nonparametric models. The performance of some of these methods in finite samples is illustrated through Monte Carlo simulations and an empirical example.
Variable Selection Methods for Model-based Clustering Model-based clustering is a popular approach for clustering multivariate data which has seen application in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to deal with the increasing dimensionality. In particular, the development of variable selection techniques has received a lot of attention and research effort in recent years. Even for small-size problems, variable selection has been advocated to facilitate the interpretation of the clustering results. This review provides a summary of the methods developed for selecting relevant clustering variables in model-based clustering. The methods are illustrated by application to real-world data, and existing software to implement the methods is indicated.
Variational Inference: A Review for Statisticians One of the core problems of modern statistics is to approximate difficult-to-compute probability distributions. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation about the posterior. In this paper, we review variational inference (VI), a method from machine learning that approximates probability distributions through optimization. VI has been used in myriad applications and tends to be faster than classical methods, such as Markov chain Monte Carlo sampling. The idea behind VI is to first posit a family of distributions and then to find the member of that family which is close to the target. Closeness is measured by Kullback-Leibler divergence. We review the ideas behind mean-field variational inference, discuss the special case of VI applied to exponential family models, present a full example with a Bayesian mixture of Gaussians, and derive a variant that uses stochastic optimization to scale up to massive data. We discuss modern research in VI and highlight important open problems. VI is powerful, but it is not yet well understood. Our hope in writing this paper is to catalyze statistical research on this widely-used class of algorithms.
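For reference to the entry above, the optimization at the heart of VI can be sketched as follows (standard notation, assumed here): for data x and latent variables z,

\[
\log p(x) \;=\; \underbrace{\mathbb{E}_{q(z)}\big[\log p(x,z)\big] - \mathbb{E}_{q(z)}\big[\log q(z)\big]}_{\mathrm{ELBO}(q)} \;+\; \mathrm{KL}\big(q(z)\,\|\,p(z\mid x)\big),
\]

so maximizing the evidence lower bound (ELBO) over the chosen family of distributions q is equivalent to minimizing the Kullback-Leibler divergence between q and the exact posterior.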
Vector and Matrix Optimal Mass Transport: Theory, Algorithm, and Applications In many applications such as color image processing, data has more than one piece of information associated with each spatial coordinate, and in such cases the classical optimal mass transport (OMT) must be generalized to handle vector-valued or matrix-valued densities. In this paper, we discuss the vector and matrix optimal mass transport and present three contributions. We first present a rigorous mathematical formulation for these setups and provide analytical results including existence of solutions and strong duality. Next, we present a simple, scalable, and parallelizable methods to solve the vector and matrix-OMT problems. Finally, we implement the proposed methods on a CUDA GPU and present experiments and applications.
Vector Autoregressive Models
Verification for Machine Learning, Autonomy, and Neural Networks Survey This survey presents an overview of verification techniques for autonomous systems, with a focus on safety-critical autonomous cyber-physical systems (CPS) and subcomponents thereof. Autonomy in CPS is enabled by recent advances in artificial intelligence (AI) and machine learning (ML) through approaches such as deep neural networks (DNNs), embedded in so-called learning-enabled components (LECs) that accomplish tasks from classification to control. Recently, the formal methods and formal verification community has developed methods to characterize behaviors in these LECs with the eventual goal of formally verifying specifications for LECs, and this article presents a survey of many of these recent approaches.
Video Description: A Survey of Methods, Datasets and Evaluation Metrics Automatic video description is useful for assisting the visually impaired, human computer interaction, robotics and video indexing. The past few years have seen a surge of research interest in this area due to the unprecedented success of deep learning in computer vision and natural language processing. Numerous methods, datasets and evaluation measures have been proposed in the literature, calling for a comprehensive survey to better focus research efforts in this flourishing direction. This paper answers exactly this need by surveying state-of-the-art approaches including deep learning models; comparing benchmark datasets in terms of their domain, number of classes, and repository size; and identifying the pros and cons of various evaluation metrics such as BLEU, ROUGE, METEOR, CIDEr, SPICE and WMD. Our survey shows that video description research has a long way to go before it can match human performance and that the main reasons for this shortfall are twofold. Firstly, existing datasets do not adequately represent the diversity in open domain videos and complex linguistic structures. Secondly, current measures of evaluation are not aligned with human judgement. For example, the same video can have very different, yet correct descriptions. We conclude that there is a need for improvement in evaluation measures as well as datasets in terms of size, diversity and annotation accuracy because they directly influence the development of better video description models. From an algorithmic point of view, diagnosing the description quality is challenging because of the difficulty of assessing the level of contribution from visual features compared to the bias that comes naturally from the language model adopted.
Video Object Segmentation and Tracking: A Survey Object segmentation and object tracking are fundamental research areas in the computer vision community. These two topics face common challenges that are difficult to handle, such as occlusion, deformation, motion blur, and scale variation. The former must also contend with heterogeneous objects, interacting objects, edge ambiguity, and shape complexity, while the latter suffers from difficulties in handling fast motion, out-of-view targets, and real-time processing. Combining the two problems of video object segmentation and tracking (VOST) can overcome their respective difficulties and improve their performance. VOST can be widely applied to many practical applications such as video summarization, high definition video compression, human computer interaction, and autonomous vehicles. This article aims to provide a comprehensive review of the state-of-the-art tracking methods, classify these methods into different categories, and identify new trends. First, we provide a hierarchical categorization of existing approaches, including unsupervised VOS, semi-supervised VOS, interactive VOS, weakly supervised VOS, and segmentation-based tracking methods. Second, we provide a detailed discussion and overview of the technical characteristics of the different methods. Third, we summarize the characteristics of the related video datasets and provide a variety of evaluation metrics. Finally, we point out a set of interesting future works and draw our own conclusions.
Video Skimming: Taxonomy and Comprehensive Survey Video skimming, also known as dynamic video summarization, generates a temporally abridged version of a given video. Skimming can be achieved by identifying significant components either in uni-modal or multi-modal features extracted from the video. Being dynamic in nature, video skimming, through temporal connectivity, allows better understanding of the video from its summary. Having this obvious advantage, recently, video skimming has drawn the focus of many researchers benefiting from the easy availability of the required computing resources. In this paper, we provide a comprehensive survey on video skimming focusing on the substantial amount of literature from the past decade. We present a taxonomy of video skimming approaches, and discuss their evolution highlighting key advances. We also provide a study on the components required for the evaluation of a video skimming performance.
Viewpoint: Artificial Intelligence and Labour The welfare of modern societies has been intrinsically linked to wage labour. With some exceptions, the modern human has to sell her labour-power to be able to reproduce biologically and socially. Thus, a lingering fear of technological unemployment features predominantly as a theme among Artificial Intelligence researchers. In this short paper we show that, if past trends are anything to go by, this fear is irrational. On the contrary, we argue that the main problem humanity will be facing is the normalisation of extremely long working hours.
vim Graphical Cheat Sheet (Cheat Sheet)
Visions of a generalized probability theory In this Book we argue that the fruitful interaction of computer vision and belief calculus is capable of stimulating significant advances in both fields. From a methodological point of view, novel theoretical results concerning the geometric and algebraic properties of belief functions as mathematical objects are illustrated and discussed in Part II, with a focus on both a perspective ‘geometric approach’ to uncertainty and an algebraic solution to the issue of conflicting evidence. In Part III we show how these theoretical developments arise from important computer vision problems (such as articulated object tracking, data association and object pose estimation) to which, in turn, the evidential formalism is able to provide interesting new solutions. Finally, some initial steps towards a generalization of the notion of total probability to belief functions are taken, in the perspective of endowing the theory of evidence with a complete battery of estimation and inference tools to the benefit of all scientists and practitioners.
Visual Analysis Best Practices: Simple Techniques for Making Every Data Visualization Useful and Beautiful You made a visualization! Congratulations: you are part of a small but growing group that´s taking advantage of the power of visualization. However, bringing your visualizations from ‘good’ to ‘great’ takes time, patience and attention to detail. Luckily, we have compiled a short but important list of techniques to get you started. Happy visualizing!
Visual Analysis of Large Graphs: State-of-the-Art and Future Research Challenges The analysis of large graphs plays a prominent role in various fields of research and is relevant in many important application areas. Effective visual analysis of graphs requires appropriate visual presentations in combination with respective user interaction facilities and algorithmic graph analysis methods. How to design appropriate graph analysis systems depends on many factors, including the type of graph describing the data, the analytical task at hand, and the applicability of graph analysis methods. The most recent surveys of graph visualization and navigation techniques cover techniques that had been introduced until 2000 or concentrate only on graph layouts published until 2002. Recently, new techniques have been developed covering a broader range of graph types, such as time-varying graphs. Also, in accordance with ever growing amounts of graph-structured data becoming available, the inclusion of algorithmic graph analysis and interaction techniques becomes increasingly important. In this State-of-the-Art Report, we survey available techniques for the visual analysis of large graphs. Our review firstly considers graph visualization techniques according to the type of graphs supported. The visualization techniques form the basis for the presentation of interaction approaches suitable for visual graph exploration. As an important component of visual graph analysis, we discuss various graph algorithmic aspects useful for the different stages of the visual graph analysis process. We also present main open research challenges in this field.
Visual Analytics in Deep Learning: An Interrogative Survey for the Next Frontiers Deep learning has recently seen rapid development and significant attention due to its state-of-the-art performance on previously-thought hard problems. However, because of the innate complexity and nonlinear structure of deep neural networks, the underlying decision making processes for why these models are achieving such high performance are challenging and sometimes mystifying to interpret. As deep learning spreads across domains, it is of paramount importance that we equip users of deep learning with tools for understanding when a model works correctly, when it fails, and ultimately how to improve its performance. Standardized toolkits for building neural networks have helped democratize deep learning; visual analytics systems have now been developed to support model explanation, interpretation, debugging, and improvement. We present a survey of the role of visual analytics in deep learning research, noting its short yet impactful history and summarize the state-of-the-art using a human-centered interrogative framework, focusing on the Five W’s and How (Why, Who, What, How, When, and Where), to thoroughly summarize deep learning visual analytics research. We conclude by highlighting research directions and open research problems. This survey helps new researchers and practitioners in both visual analytics and deep learning to quickly learn key aspects of this young and rapidly growing body of research, whose impact spans a diverse range of domains.
Visual Analytics of Anomalous User Behaviors: A Survey The increasing accessibility of data provides substantial opportunities for understanding user behaviors. Unearthing anomalies in user behaviors is of particular importance as it helps signal harmful incidents such as network intrusions, terrorist activities, and financial frauds. Many visual analytics methods have been proposed to help understand user behavior-related data in various application domains. In this work, we survey the state of the art in visual analytics of anomalous user behaviors and classify the existing approaches into four categories: social interaction, travel, network communication, and transaction. We further examine the research works in each category in terms of data types, anomaly detection techniques, visualization techniques, and interaction methods. Finally, we discuss the findings and potential research directions.
Visual Data Mining Techniques Never before in history has data been generated at such high volumes as it is today. Exploring and analyzing the vast volumes of data has become increasingly difficult. Information visualization and visual data mining can help to deal with the flood of information. The advantage of visual data exploration is that the user is directly involved in the data mining process. There are a large number of information visualization techniques that have been developed over the last two decades to support the exploration of large data sets. In this paper, we propose a classification of information visualization and visual data mining techniques based on the data type to be visualized, the visualization technique, and the interaction technique. We illustrate the classification using a few examples, and indicate some directions for future work.
Visual Interpretability for Deep Learning: a Survey This paper reviews recent studies in emerging directions of understanding neural-network representations and learning neural networks with interpretable/disentangled middle-layer representations. Although deep neural networks have exhibited superior performance in various tasks, the interpretability is always an Achilles’ heel of deep neural networks. At present, deep neural networks obtain a high discrimination power at the cost of low interpretability of their black-box representations. We believe that the high model interpretability may help people to break several bottlenecks of deep learning, e.g., learning from very few annotations, learning via human-computer communications at the semantic level, and semantically debugging network representations. In this paper, we focus on convolutional neural networks (CNNs), and we revisit the visualization of CNN representations, methods of diagnosing representations of pre-trained CNNs, approaches for disentangling pre-trained CNN representations, learning of CNNs with disentangled representations, and middle-to-end learning based on model interpretability. Finally, we discuss prospective trends of explainable artificial intelligence.
Visualize This In TechTarget´s 2013 Analytics and Data Warehousing Reader Survey, 36% of 664 respondents said their organizations were using data visualization and discovery tools, while another 41% said deployments were planned in the next 12 months. In response to another question, 44% said they expected their organizations to increase spending on data visualization initiatives over the next 12 months. That was the third-highest percentage among 10 technology categories, narrowly topped only by the figures for data warehousing and predictive analytics. It isn´t surprising that data visualization would be gaining in popularity. Business intelligence and analytics are becoming more central to business strategies – and business success. As a result, many organizations are looking to broaden the use of BI data in decision making. Visualizing that data can make it easier for users to grasp. But it´s easy to go awry on data visualization. In an August 2013 blog post, Forrester Research analyst Ryan Morrill cited ‘a cascade of bad examples’ of infographics and other types of visualizations with data errors and overly complicated designs. He recommended focusing on two things in creating visualizations: engaging graphics, yes, but also a ‘data-driven design’ in which the visual elements help to accurately depict the information being presented. This special edition of Business Information examines the dos and don´ts of managing successful data visualization processes. First we offer advice from experienced users on finding and deploying the right BI and data visualization tools. Next we provide tips on building effective visualizations for use in BI dashboards. We finish with a look at the use of geographic information systems to help improve the quality of health care.
Visualizing and Understanding Convolutional Networks Large Convolutional Network models have recently demonstrated impressive classification performance on the ImageNet benchmark (Krizhevsky et al., 2012). However there is no clear understanding of why they perform so well, or how they might be improved. In this paper we address both issues. We introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier. Used in a diagnostic role, these visualizations allow us to find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark. We also perform an ablation study to discover the performance contribution from different model layers. We show our ImageNet model generalizes well to other datasets: when the softmax classifier is retrained, it convincingly beats the current state-of-the-art results on Caltech-101 and Caltech-256 datasets.
Visualizing Data The HBR Insight Center highlights emerging thinking around today´s most important business ideas. In this Insight Center, we´ll explore the power of using data visualization to drive business strategy. We´ll talk about when (and when not) to use visualization, how to get started, how to know if you´re getting a good return on your data visualization investment, and more.
Visualizing Natural Language Descriptions: A Survey A natural language interface exploits the conceptual simplicity and naturalness of the language to create a high-level user-friendly communication channel between humans and machines. One of the promising applications of such interfaces is generating visual interpretations of semantic content of a given natural language that can be then visualized either as a static scene or a dynamic animation. This survey discusses requirements and challenges of developing such systems and reports 26 graphical systems that exploit natural language interfaces and addresses both artificial intelligence and visualization aspects. This work serves as a frame of reference to researchers and to enable further advances in the field.

W

Ward’s Hierarchical Clustering Method: Clustering Criterion and Agglomerative Algorithm The Ward error sum of squares hierarchical clustering method has been very widely used since its first description by Ward in a 1963 publication. It has also been generalized in various ways. However there are different interpretations in the literature and there are different implementations of the Ward agglomerative algorithm in commonly used software systems, including differing expressions of the agglomerative criterion. Our survey work and case studies will be useful for all those involved in developing software for data analysis using Ward’s hierarchical clustering method.
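For reference to the entry above, the agglomerative criterion at the heart of Ward’s method can be written as follows (a standard formulation; notation assumed here): merging clusters A and B increases the total within-cluster error sum of squares by

\[
\Delta(A,B) \;=\; \mathrm{ESS}(A\cup B) - \mathrm{ESS}(A) - \mathrm{ESS}(B) \;=\; \frac{|A|\,|B|}{|A|+|B|}\,\lVert \mu_A - \mu_B \rVert^2,
\]

where \mu_A and \mu_B are the cluster centroids; at each step the pair of clusters with the smallest \Delta is merged. Differing software implementations can correspond to differing expressions of this criterion, for example using squared versus unsquared Euclidean distances as input.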
Warshall’s algorithm — survey and applications The survey presents the well-known Warshall’s algorithm, a generalization of it, and some of its interesting applications.
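A minimal sketch of the basic algorithm discussed in the entry above: computing the transitive closure of a directed graph given as a boolean adjacency matrix (an illustrative implementation, not code from the survey):

def warshall(adj):
    """Return the transitive closure of a boolean adjacency matrix."""
    n = len(adj)
    reach = [row[:] for row in adj]      # copy so the input is not modified
    for k in range(n):                   # allow vertex k as an intermediate step
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return reach

# Example: edges 0->1 and 1->2 imply that 0 can reach 2 in the closure
adj = [[False, True, False],
       [False, False, True],
       [False, False, False]]
print(warshall(adj)[0][2])  # True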
Wasserstein Distributionally Robust Optimization: Theory and Applications in Machine Learning Many decision problems in science, engineering and economics are affected by uncertain parameters whose distribution is only indirectly observable through samples. The goal of data-driven decision-making is to learn a decision from finitely many training samples that will perform well on unseen test samples. This learning task is difficult even if all training and test samples are drawn from the same distribution—especially if the dimension of the uncertainty is large relative to the training sample size. Wasserstein distributionally robust optimization seeks data-driven decisions that perform well under the most adverse distribution within a certain Wasserstein distance from a nominal distribution constructed from the training samples. In this tutorial we will argue that this approach has many conceptual and computational benefits. Most prominently, the optimal decisions can often be computed by solving tractable convex optimization problems, and they enjoy rigorous out-of-sample and asymptotic consistency guarantees. We will also show that Wasserstein distributionally robust optimization has interesting ramifications for statistical learning and motivates new approaches for fundamental learning tasks such as classification, regression, maximum likelihood estimation or minimum mean square error estimation, among others.
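In symbols, the central problem studied in the entry above can be sketched as follows (a standard formulation; notation assumed here):

\[
\min_{\theta}\;\sup_{\,Q:\;W(Q,\hat{P}_N)\le\rho}\;\mathbb{E}_{\xi\sim Q}\big[\ell(\theta,\xi)\big],
\]

where \hat{P}_N is the nominal (empirical) distribution built from the N training samples, W is a Wasserstein distance, \rho is the radius of the ambiguity set, and \ell is the loss; the decision \theta is chosen to perform well under the most adverse distribution within the Wasserstein ball.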
Weighted Abstract Dialectical Frameworks: Extended and Revised Report Abstract Dialectical Frameworks (ADFs) generalize Dung’s argumentation frameworks allowing various relationships among arguments to be expressed in a systematic way. We further generalize ADFs so as to accommodate arbitrary acceptance degrees for the arguments. This makes ADFs applicable in domains where both the initial status of arguments and their relationship are only insufficiently specified by Boolean functions. We define all standard ADF semantics for the weighted case, including grounded, preferred and stable semantics. We illustrate our approach using acceptance degrees from the unit interval and show how other valuation structures can be integrated. In each case it is sufficient to specify how the generalized acceptance conditions are represented by formulas, and to specify the information ordering underlying the characteristic ADF operator. We also present complexity results for problems related to weighted ADFs.
Weighted Clustering Ensemble: A Review Clustering ensemble has emerged as a powerful tool for improving both the robustness and the stability of results from individual clustering methods. Weighted clustering ensemble arises naturally from clustering ensemble. One of the arguments for weighted clustering ensemble is that elements (clusterings or clusters) in a clustering ensemble are of different quality, or that objects or features are of varying significance. However, it is not possible to directly apply weighting mechanisms from the classification (supervised) domain to the clustering (unsupervised) domain, in part because clustering is inherently an ill-posed problem. This paper provides an overview of weighted clustering ensemble by discussing different types of weights, major approaches to determining weight values, and applications of weighted clustering ensemble to complex data. The unifying framework presented in this paper will help clustering practitioners select the most appropriate weighting mechanisms for their own problems.
What am I searching for? Can we infer intentions and goals from a person’s actions? As an example of this family of problems, we consider here whether it is possible to decipher what a person is searching for by decoding their eye movement behavior. We conducted two human psychophysics experiments on object arrays and natural images where we monitored subjects’ eye movements while they were looking for a target object. Using as input the pattern of ‘error’ fixations on non-target objects before the target was found, we developed a model (InferNet) whose goal was to infer what the target was. ‘Error’ fixations share similar features with the sought target. The InferNet model uses a pre-trained 2D convolutional architecture to extract features from the error fixations and computes a 2D similarity map between the error fixation and all locations across the search image by modulating the search image via convolution across layers. InferNet consolidates the modulated response maps across layers via max pooling to keep track of the sub-patterns highly similar to features at error fixations and integrates these maps across all error fixations. InferNet successfully identifies the subject’s goal and outperforms all the competing null models, even without any object-specific training on the inference task.
What can computational models learn from human selective attention? A review from an audiovisual crossmodal perspective Selective attention plays an essential role in information acquisition and utilization from the environment. In the past 50 years, research on selective attention has been a central topic in cognitive science. Compared with unimodal studies, crossmodal studies are more complex but necessary to solve real-world challenges in both human experiments and computational modeling. Although an increasing number of findings on crossmodal selective attention have shed light on humans’ behavioral patterns and neural underpinnings, a much better understanding is still necessary to yield the same benefit for computational intelligent agents. This article reviews studies of selective attention in unimodal visual and auditory and crossmodal audiovisual setups from the multidisciplinary perspectives of psychology and cognitive neuroscience, and evaluates different ways to simulate analogous mechanisms in computational models and robotics. We discuss the gaps between these fields in this interdisciplinary review and provide insights about how to use psychological findings and theories in artificial intelligence from different perspectives.
What Can Neural Networks Reason About? Neural networks have successfully been applied to solving reasoning tasks, ranging from learning simple concepts like ‘close to’, to intricate questions whose reasoning procedures resemble algorithms. Empirically, not all network structures work equally well for reasoning. For example, Graph Neural Networks have achieved impressive empirical results, while less structured neural networks may fail to learn to reason. Theoretically, there is currently limited understanding of the interplay between reasoning tasks and network learning. In this paper, we develop a framework to characterize which tasks a neural network can learn well, by studying how well its structure aligns with the algorithmic structure of the relevant reasoning procedure. This suggests that Graph Neural Networks can learn dynamic programming, a powerful algorithmic strategy that solves a broad class of reasoning problems, such as relational question answering, sorting, intuitive physics, and shortest paths. Our perspective also implies strategies to design neural architectures for complex reasoning. On several abstract reasoning tasks, we see empirically that our theory aligns well with practice.
What can the brain teach us about building artificial intelligence? This paper is the preprint of an invited commentary on Lake et al’s Behavioral and Brain Sciences article titled ‘Building machines that learn and think like people’. Lake et al’s paper offers a timely critique on the recent accomplishments in artificial intelligence from the vantage point of human intelligence, and provides insightful suggestions about research directions for building more human-like intelligence. Since we agree with most of the points raised in that paper, we will offer a few points that are complementary.
What do AI algorithms actually learn? – On false structures in deep learning There are two big unsolved mathematical questions in artificial intelligence (AI): (1) why is deep learning so successful in classification problems, and (2) why are neural nets based on deep learning at the same time universally unstable, in the sense that the instabilities make the networks vulnerable to adversarial attacks? We present a solution to these questions that can be summed up in two words: false structures. Indeed, deep learning does not learn the original structures that humans use when recognising images (cats have whiskers, paws, fur, pointy ears, etc.), but rather different false structures that correlate with the original structure and hence yield the success. However, the false structure, unlike the original structure, is unstable. The false structure is simpler than the original structure, hence easier to learn with less data, and the numerical algorithm used in the training will more easily converge to the neural network that captures the false structure. We formally define the concept of false structures and formulate the solution as a conjecture. Given that trained neural networks are always computed with approximations, this conjecture can only be established through a combination of theoretical and computational results similar to how one establishes a postulate in theoretical physics (e.g. the speed of light is constant). Establishing the conjecture fully will require a vast research program characterising the false structures. We provide the foundations for such a program establishing the existence of the false structures in practice. Finally, we discuss the far-reaching consequences the existence of the false structures has on state-of-the-art AI and Smale’s 18th problem.
What Do We Understand About Convolutional Networks This document will review the most prominent proposals using multilayer convolutional architectures. Importantly, the various components of a typical convolutional network will be discussed through a review of different approaches that base their design decisions on biological findings and/or sound theoretical bases. In addition, the different attempts at understanding ConvNets via visualizations and empirical studies will be reviewed. The ultimate goal is to shed light on the role of each layer of processing involved in a ConvNet architecture, distill what we currently understand about ConvNets and highlight critical open problems.
What Does a Belief Function Believe In The conditioning in the Dempster-Shafer Theory of Evidence has been defined (by Shafer \cite{Shafer:90}) as the combination of a belief function and of an ‘event’ via Dempster’s rule. On the other hand, Shafer \cite{Shafer:90} gives a ‘probabilistic’ interpretation of a belief function (hence indirectly its derivation from a sample). Given the fact that the conditional probability distribution of a sample-derived probability distribution is a probability distribution derived from a subsample (selected on the grounds of the conditioning event), the paper investigates the empirical nature of the Dempster rule of combination. It is demonstrated that the so-called ‘conditional’ belief function is not a belief function given an event but rather a belief function given a manipulation of the original empirical data. Given this, an interpretation of belief functions different from that of Shafer is proposed. Algorithms for construction of belief networks from data are derived for this interpretation.
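For reference, Dempster’s rule of combination discussed above combines two basic mass assignments m_1 and m_2 (standard textbook form, not quoted from the paper) as

(m_1 \oplus m_2)(A) \;=\; \frac{1}{1-K}\sum_{B \cap C = A} m_1(B)\, m_2(C) \quad \text{for } A \neq \emptyset, \qquad K \;=\; \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C),

and conditioning on an event E then corresponds to combining a belief function with the mass assignment that places all of its mass on E.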
What does ‘Big Data’ mean for official statistics In our modern world more and more data are generated on the web and produced by sensors in the ever growing number of electronic devices surrounding us. The amount of data and the frequency at which they are produced have led to the concept of ‘Big data’. Big data is characterized as data sets of increasing volume, velocity and variety; the 3 V’s. Big data is often largely unstructured, meaning that it has no pre-defined data model and/or does not fit well into conventional relational databases. Apart from generating new commercial opportunities in the private sector, big data is also potentially very interesting as an input for official statistics; either for use on its own, or in combination with more traditional data sources such as sample surveys and administrative registers. However, harvesting the information from big data and incorporating it into a statistical production process is not easy. As such, this paper will seek to address two fundamental questions, i.e. the What and the How.
What Does Explainable AI Really Mean? A New Conceptualization of Perspectives We characterize three notions of explainable AI that cut across research fields: opaque systems that offer no insight into their algorithmic mechanisms; interpretable systems whose algorithmic mechanisms users can mathematically analyze; and comprehensible systems that emit symbols enabling user-driven explanations of how a conclusion is reached. The paper is motivated by a corpus analysis of NIPS, ACL, COGSCI, and ICCV/ECCV paper titles showing differences in how work on explainable AI is positioned in various fields. We close by introducing a fourth notion: truly explainable systems, where automated reasoning is central to output crafted explanations without requiring human post-processing as a final step of the generative process.
What Is Data Science We´ve all heard it: according to Hal Varian, statistics is the next sexy job. Five years ago, in What is Web 2.0, Tim O´Reilly said that ‘data is the next Intel Inside.’ But what does that statement mean? Why do we suddenly care about statistics and about data? In this post, I examine the many sides of data science — the technologies, the companies and the unique skill sets.
What is Decision Support This paper attempts to describe and clarify the meaning of the term Decision Support (DS). Based on a survey of DS related WWW documents, and taking a broad view of DS, a classification of DS and related disciplines is presented. DS is put in the context of Decision Making, and some of the most important disciplines of DS are overviewed: Operations Research, Decision Analysis, Decision Support Systems, Data Warehousing and OLAP, and Group Decision Support.
What Is Statistics One might think that there is a simple answer to the question posed in the title, of the form ‘Statistics is …’. Sadly, there is not, although many contemporary statistical authors have attempted to answer the question. This article captures the essence of some of these efforts, setting them in their historical contexts. In the process, we focus on the cross-disciplinary nature of much modern statistical research. This discussion serves as a backdrop to the aims of the Annual Review of Statistics and Its Application, which begins publication with the present volume.
What is the expectation maximization algorithm Probabilistic models, such as hidden Markov models or Bayesian networks, are commonly used to model biological data. Much of their popularity can be attributed to the existence of efficient and robust procedures for learning parameters from observations. Often, however, the only data available for training a probabilistic model are incomplete. Missing values can occur, for example, in medical diagnosis, where patient histories generally include results from a limited battery of tests. Alternatively, in gene expression clustering, incomplete data arise from the intentional omission of gene-to-cluster assignments in the probabilistic model. The expectation maximization algorithm enables parameter estimation in probabilistic models with incomplete data.
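As a concrete instance of the incomplete-data setting described above (the unobserved component assignments are the missing values), here is a minimal sketch of EM for a two-component 1D Gaussian mixture; the initialization and iteration count are illustrative choices.

```python
import numpy as np

def em_two_gaussians(x, n_iter=100):
    """EM for a mixture of two 1D Gaussians; returns (weights, means, variances)."""
    mu = np.array([x.min(), x.max()], dtype=float)     # crude initialization
    var = np.array([x.var(), x.var()], dtype=float)
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters from the responsibility-weighted data.
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return pi, mu, var
```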
What is the Machine Learning Applications of machine learning tools to problems of physical interest are often criticized for producing sensitivity at the expense of transparency. To address this concern, we explore a data planing procedure for identifying combinations of variables — aided by physical intuition — that can discriminate signal from background. Weights are introduced to smooth away the features in a given variable(s). New networks are then trained on this modified data. Observed decreases in sensitivity diagnose the variable’s discriminating power. Planing also allows the investigation of the linear versus non-linear nature of the boundaries between signal and background. We demonstrate the efficacy of this approach using a toy example, followed by an application to an idealized heavy resonance scenario at the Large Hadron Collider. By unpacking the information being utilized by these algorithms, this method puts in context what it means for a machine to learn.
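A minimal sketch of the planing idea described above: each event receives a weight inversely proportional to the estimated density of the chosen variable, so that the weighted distribution of that variable becomes approximately flat. The binning and normalization below are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np

def planing_weights(values, n_bins=50):
    """Weights that flatten the 1D distribution of `values` (e.g. a kinematic variable)."""
    counts, edges = np.histogram(values, bins=n_bins)
    idx = np.clip(np.digitize(values, edges) - 1, 0, n_bins - 1)
    w = 1.0 / np.maximum(counts[idx], 1)      # inverse of the local bin population
    return w * len(values) / w.sum()          # normalize to an average weight of one

# Typically computed separately for signal and background samples; a new network is
# then trained on the reweighted data and the drop in sensitivity is recorded.
```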
What Makes a Visualization Memorable An ongoing debate in the Visualization community concerns the role that visualization types play in data understanding. In human cognition, understanding and memorability are intertwined. As a first step towards being able to ask questions about impact and effectiveness, here we ask: ‘What makes a visualization memorable?’ We ran the largest scale visualization study to date using 2,070 single-panel visualizations, categorized by visualization type (e.g., bar chart, line graph, etc.), collected from news media sites, government reports, scientific journals, and infographic sources. Each visualization was annotated with additional attributes, including ratings for data-ink ratios and visual densities. Using Amazon´s Mechanical Turk, we collected memorability scores for hundreds of these visualizations, and discovered that observers are consistent in which visualizations they find memorable and forgettable. We find intuitive results (e.g., attributes like color and the inclusion of a human recognizable object enhance memorability) and less intuitive results (e.g., common graphs are less memorable than unique visualization types). Altogether our findings suggest that quantifying memorability is a general metric of the utility of information, an essential step towards determining how to design effective visualizations.
What’s Wrong With Deep Learning (Slide Deck)
When deep learning meets security Deep learning is an emerging research field that has proven its effectiveness towards deploying more efficient intelligent systems. Security, on the other hand, is one of the most essential issues in modern communication systems. Recently many papers have shown that using deep learning models can achieve promising results when applied to the security domain. In this work, we provide an overview of the recent studies that apply deep learning techniques to the field of security.
When Does Stochastic Gradient Algorithm Work Well In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a fixed, large step size and propose a novel assumption on the objective function, under which this method achieves improved convergence rates (to a neighborhood of the optimal solutions). We then empirically demonstrate that these assumptions hold for logistic regression and standard deep neural networks on classical data sets. Thus our analysis helps to explain when efficient behavior can be expected from the SGD method in training classification models and deep neural networks.
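A minimal sketch of the setting analysed above, plain SGD with a fixed step size on the logistic loss; the step size, epoch count, and label convention are illustrative choices rather than the paper's values.

```python
import numpy as np

def sgd_logistic(X, y, step=0.5, epochs=20, seed=0):
    """Fixed-step SGD on the logistic loss; labels y are assumed to be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            margin = y[i] * (X[i] @ w)
            # Gradient of log(1 + exp(-margin)) with respect to w.
            grad = -y[i] * X[i] / (1.0 + np.exp(margin))
            w -= step * grad
    return w
```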
When Gaussian Process Meets Big Data: A Review of Scalable GPs The vast quantity of information brought by big data as well as the evolving computer hardware encourages success stories in the machine learning community. In the meanwhile, it poses challenges for the Gaussian process (GP), a well-known non-parametric and interpretable Bayesian model, which suffers from cubic complexity in the training size. To improve the scalability while retaining the desirable prediction quality, a variety of scalable GPs have been presented. But they have not yet been comprehensively reviewed and discussed in a unifying way in order to be well understood by both academia and industry. To this end, this paper is devoted to reviewing state-of-the-art scalable GPs involving two main categories: global approximations which distill the entire data set and local approximations which divide the data for subspace learning. Particularly, for global approximations, we mainly focus on sparse approximations comprising prior approximations which modify the prior but perform exact inference, and posterior approximations which retain the exact prior but perform approximate inference; for local approximations, we highlight the mixture/product of experts that conducts model averaging from multiple local experts to boost predictions. To present a complete review, recent advances for improving the scalability and model capability of scalable GPs are reviewed. Finally, the extensions and open issues regarding the implementation of scalable GPs in various scenarios are reviewed and discussed to inspire novel ideas for future research avenues.
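As one concrete example of the global sparse approximations surveyed above, here is a minimal subset-of-regressors (Nystrom-style) sketch in NumPy; the kernel, the random inducing-point selection, and the hyperparameters are illustrative assumptions, not the review's recommended recipe.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """Squared-exponential kernel matrix between the rows of a and the rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def sparse_gp_mean(X, y, Xstar, m=100, ls=1.0, noise=1e-2, seed=0):
    """Subset-of-regressors posterior mean with m randomly chosen inducing points."""
    rng = np.random.default_rng(seed)
    Z = X[rng.choice(len(X), size=min(m, len(X)), replace=False)]
    Kmm = rbf(Z, Z, ls) + 1e-8 * np.eye(len(Z))   # jitter for numerical stability
    Knm = rbf(X, Z, ls)
    Ksm = rbf(Xstar, Z, ls)
    # Posterior mean: mu(x*) = K*m (Kmn Knm + noise * Kmm)^{-1} Kmn y
    A = Knm.T @ Knm + noise * Kmm
    return Ksm @ np.linalg.solve(A, Knm.T @ y)
```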
When is a Prediction Knowledge? Within Reinforcement Learning, there is a growing collection of research which aims to express all of an agent’s knowledge of the world through predictions about sensation, behaviour, and time. This work can be seen not only as a collection of architectural proposals, but also as the beginnings of a theory of machine knowledge in reinforcement learning. Recent work has expanded what can be expressed using predictions, and developed applications which use predictions to inform decision-making on a variety of synthetic and real-world problems. While promising, we suggest here that the notion of predictions as knowledge in reinforcement learning is as yet underdeveloped: although some work explicitly refers to predictions as knowledge, the requirements for considering a prediction to be knowledge have yet to be well explored. This specification of the necessary and sufficient conditions of knowledge is important; even if claims about the nature of knowledge are left implicit in technical proposals, the underlying assumptions of such claims have consequences for the systems we design. These consequences manifest in both the way we choose to structure predictive knowledge architectures, and how we evaluate them. In this paper, we take a first step to formalizing predictive knowledge by discussing the relationship of predictive knowledge learning methods to existing theories of knowledge in epistemology. Specifically, we explore the relationships between Generalized Value Functions and epistemic notions of Justification and Truth.
When Neurons Fail We view a neural network as a distributed system whose neurons can fail independently, and we evaluate its robustness in the absence of any (recovery) learning phase. We give tight bounds on the number of neurons that can fail without harming the result of a computation. To determine our bounds, we leverage the fact that neural activation functions are Lipschitz-continuous. Our bound is on a quantity we call the Forward Error Propagation, which captures how much error is propagated by a neural network when a given number of components is failing. Computing this quantity only requires looking at the topology of the network, whereas experimentally assessing the robustness of a network requires the costly experiment of looking at all the possible inputs and testing all the possible configurations of the network corresponding to different failure situations, facing a discouraging combinatorial explosion. We distinguish the case of neurons that can fail and stop their activity (crashed neurons) from the case of neurons that can fail by transmitting arbitrary values (Byzantine neurons). Interestingly, as we show in the paper, our bound can easily be extended to the case where synapses can fail. We show how our bound can be leveraged to quantify the effect of memory cost reduction on the accuracy of a neural network, and to estimate the amount of information any neuron needs from its preceding layer, enabling thereby a boosting scheme that prevents neurons from waiting for unnecessary signals. We finally discuss the trade-off between neural network robustness and learning cost.
When Semi-Supervised Learning Meets Transfer Learning: Training Strategies, Models and Datasets Semi-Supervised Learning (SSL) has been proved to be an effective way to leverage both labeled and unlabeled data at the same time. Recent semi-supervised approaches focus on deep neural networks and have achieved promising results on several benchmarks: CIFAR10, CIFAR100 and SVHN. However, most of their experiments are based on models trained from scratch instead of pre-trained models. On the other hand, transfer learning has demonstrated its value when the target domain has limited labeled data. Here comes the intuitive question: is it possible to incorporate SSL when fine-tuning a pre-trained model? We comprehensively study how SSL methods starting from pretrained models perform under varying conditions, including training strategies, architecture choice and datasets. From this study, we obtain several interesting and useful observations. While practitioners have had an intuitive understanding of these observations, we do a comprehensive empirical analysis and demonstrate that: (1) the gains from SSL techniques over a fully-supervised baseline are smaller when trained from a pre-trained model than when trained from random initialization, (2) when the domain of the source data used to train the pre-trained model differs significantly from the domain of the target task, the gains from SSL are significantly higher and (3) some SSL methods are able to advance fully-supervised baselines (like Pseudo-Label). We hope our studies can deepen the understanding of SSL research and facilitate the process of developing more effective SSL methods to utilize pre-trained models. Code is now available at github.
Which chart or graph is right for you? You´ve got data and you´ve got questions. Creating a chart or graph links the two, but sometimes you´re not sure which types of charts and graphs will help you find the answers you need. This paper answers questions about how to select the best charts for the type of data you´re analyzing and the questions you want to answer. But it won´t stop there. Stranding your data in isolated, static graphs limits the number of questions you can answer. Let your data become the centerpiece of decision making by using it to tell a story. Combine related charts. Add a map. Provide filters to dig deeper. The impact? Business insight and answers to questions at the speed of thought.
Which Knowledge Graph Is Best for Me? In recent years, DBpedia, Freebase, OpenCyc, Wikidata, and YAGO have been published as noteworthy large, cross-domain, and freely available knowledge graphs. Although extensively in use, these knowledge graphs are hard to compare against each other in a given setting. Thus, it is a challenge for researchers and developers to pick the best knowledge graph for their individual needs. In our recent survey, we devised and applied data quality criteria to the above-mentioned knowledge graphs. Furthermore, we proposed a framework for finding the most suitable knowledge graph for a given setting. With this paper we intend to ease the access to our in-depth survey by presenting simplified rules that map individual data quality requirements to specific knowledge graphs. However, this paper does not intend to replace our previously introduced decision-support framework. For an informed decision on which KG is best for you we still refer to our in-depth survey.
Whitening Black-Box Neural Networks Many deployed learned models are black boxes: given an input, they return an output. Internal information about the model, such as the architecture, optimisation procedure, or training data, is not disclosed explicitly as it might contain proprietary information or make the system more vulnerable. This work shows that such attributes of neural networks can be exposed from a sequence of queries. This has multiple implications. On the one hand, our work exposes the vulnerability of black-box neural networks to different types of attacks — we show that the revealed internal information helps generate more effective adversarial examples against the black box model. On the other hand, this technique can be used for better protection of private content from automatic recognition models using adversarial examples. Our paper suggests that it is actually hard to draw a line between white box and black box models.
Who’s to say what’s funny? A computer using Language Models and Deep Learning, That’s Who! Humor is a defining characteristic of human beings. Our goal is to develop methods that automatically detect humorous statements and rank them on a continuous scale. In this paper we report on results using a Language Model approach, and outline our plans for using methods from Deep Learning.
Why and When Deep Learning Works: Looking Inside Deep Learnings The Intel Collaborative Research Institute for Computational Intelligence (ICRI-CI) has been heavily supporting Machine Learning and Deep Learning research from its foundation in 2012. We have asked six leading ICRI-CI Deep Learning researchers to address the challenge of ‘Why and When Deep Learning works’, with the goal of looking inside Deep Learning, providing insights on how deep networks function, and uncovering key observations on their expressiveness, limitations, and potential. The output of this challenge resulted in five papers that address different facets of deep learning. These different facets include a high-level understanding of why and when deep networks work (and do not work), the impact of geometry on the expressiveness of deep networks, and making deep networks interpretable.
Why Big Data Analytics The primary focus right now is on using big data analytics to gain insights across all areas of operations, customers, and product innovation. By applying the right analytical solutions to your data, you can: • Better understand customer behavior. • Gain better insights into operations. • Identify risks such as fraud. • Comply with government regulations. • Answer complex questions. • See important patterns in data that you couldn´t see before.
Why do deep convolutional networks generalize so poorly to small image transformations Deep convolutional network architectures are often assumed to guarantee generalization for small image translations and deformations. In this paper we show that modern CNNs (VGG16, ResNet50, and InceptionResNetV2) can drastically change their output when an image is translated in the image plane by a few pixels, and that this failure of generalization also happens with other realistic small image transformations. Furthermore, the deeper the network the more we see these failures to generalize. We show that these failures are related to the fact that the architecture of modern CNNs ignores the classical sampling theorem, so that generalization is not guaranteed. We also show that biases in the statistics of commonly used image datasets make it unlikely that CNNs will learn to be invariant to these transformations. Taken together our results suggest that the performance of CNNs in object recognition falls far short of the generalization capabilities of humans.
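A minimal sketch of how one might probe this effect on a single image, assuming PyTorch and torchvision are available; the model choice, the image file, the crop geometry, and the 4-pixel shift are illustrative assumptions rather than the authors' exact protocol.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.resnet50(weights="IMAGENET1K_V1").eval()
prep = T.Compose([T.Resize(256), T.ToTensor(),
                  T.Normalize(mean=[0.485, 0.456, 0.406],
                              std=[0.229, 0.224, 0.225])])

x = prep(Image.open("example.jpg").convert("RGB"))   # C x H x W tensor

def window(t, dx):
    """A 224x224 window shifted horizontally by dx pixels."""
    return t[:, 16:16 + 224, 16 + dx:16 + dx + 224].unsqueeze(0)

with torch.no_grad():
    p0 = torch.softmax(model(window(x, 0)), dim=1)
    p1 = torch.softmax(model(window(x, 4)), dim=1)   # 4-pixel translation

# A large change in the top-class probability indicates translation sensitivity.
print(p0.max().item(), p1.max().item(), (p0 - p1).abs().max().item())
```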
Why Machines Cannot Learn Mathematics, Yet Nowadays, Machine Learning (ML) is seen as the universal solution to improve the effectiveness of information retrieval (IR) methods. However, while mathematics is a precise and accurate science, it is usually expressed by less accurate and imprecise descriptions, contributing to the relative dearth of machine learning applications for IR in this domain. Generally, mathematical documents communicate their knowledge with an ambiguous, context-dependent, and non-formal language. Given recent advances in ML, it seems canonical to apply ML techniques to represent and retrieve mathematics semantically. In this work, we apply popular text embedding techniques to the arXiv collection of STEM documents and explore how these are unable to properly understand mathematics from that corpus. In addition, we also investigate the missing aspects that would allow mathematics to be learned by computers.
Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning Learning-based pattern classifiers, including deep networks, have demonstrated impressive performance in several application domains, ranging from computer vision to computer security. However, it has also been shown that adversarial input perturbations carefully crafted either at training or at test time can easily subvert their predictions. The vulnerability of machine learning to adversarial inputs (also known as adversarial examples), along with the design of suitable countermeasures, have been investigated in the research field of adversarial machine learning. In this work, we provide a thorough overview of the evolution of this interdisciplinary research area over the last ten years, starting from pioneering, earlier work up to more recent work aimed at understanding the security properties of deep learning algorithms, in the context of different applications. We report interesting connections between these apparently-different lines of work, highlighting common misconceptions related to the evaluation of the security of machine-learning algorithms. We finally discuss the main limitations of current work, along with the corresponding future research challenges towards the design of more secure learning algorithms.
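As a small concrete example of the test-time (evasion) attacks this survey covers, here is a minimal fast gradient sign method (FGSM) sketch assuming PyTorch; the model is any differentiable classifier and the perturbation budget eps is an illustrative choice.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    """One-step FGSM: perturb x along the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0, 1).detach()   # assumes inputs scaled to [0, 1]
```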
Will Big Data Make IT Infrastructure Sexy Again Love it or hate it, big data seems to be driving a renaissance in IT infrastructure spending. IDC, for example, estimates that worldwide spending for infrastructure hardware alone (servers, storage, PCs, tablets, and peripherals) will rise from $461 billion in 2013 to $468 billion in 2014. Gartner predicts that total IT spending will grow 3.1% in 2014, reaching $3.8 trillion, and forecasts ‘consistent four to five percent annual growth through 2017.’ For a lot of people, the mere thought of all that additional cash makes IT infrastructure seem sexy again. Big data impacts IT spending directly and indirectly. The direct effects are less dramatic, largely because adding terabytes to a Hadoop cluster is much less costly than adding terabytes to an enterprise data warehouse (EDW). That said, IDG Enterprise´s 2014 Big Data survey indicates that more than half of the IT leaders polled believe ‘they will have to re-architect the data center network to some extent’ to accommodate big data services. The indirect effects are more dramatic, thanks in part to Rubin´s Law (derived by Dr. Howard Rubin, the unofficial dean of IT economics), which holds that demand for technology rises as the cost of technology drops (which it invariably will, according to Moore´s Law). Since big data essentially ‘liberates’ data that had been ‘trapped’ in mainframes and EDWs, the demand for big data services will increase as organizations perceive the untapped value at their fingertips. In other words, as utilization of big data goes up, spending on big data services and related infrastructure will also rise. ‘Big data has the same sort of disruptive potential as the client-server revolution of 30 years ago, which changed the whole way that IT infrastructure evolved,’ says Marshall Presser, field chief technology officer at Pivotal, a provider of application and data infrastructure software. ‘For some people, the disruption will be exciting and for others, it will be threatening.’ For IT vendors offering products and services related to big data, the future looks particularly rosy. IDC predicts stagnant (0.7%) growth of legacy IT products and high-volume (15%) growth of cloud, mobile, social, and big data products—the so-called ‘3rd Platform’ of IT. According to IDC, ‘3rd Platform technologies and solutions will drive 29 percent of 2014 IT spending and 89 percent of all IT spending growth.’ Much of that growth will come from the ‘cannibalization’ of traditional IT markets. Viewed from that harrowing perspective, maybe ‘scary’ is a better word than ‘sexy’ to describe the looming transformation of IT infrastructure.
Wireless Network Design for Control Systems: A Survey Wireless networked control systems (WNCS) are composed of spatially distributed sensors, actuators, and controllers communicating through wireless networks instead of conventional point-to-point wired connections. Due to their main benefits in the reduction of deployment and maintenance costs, large flexibility and possible enhancement of safety, WNCS are becoming a fundamental infrastructure technology for critical control systems in automotive electrical systems, avionics control systems, building management systems, and industrial automation systems. The main challenge in WNCS is to jointly design the communication and control systems considering their tight interaction to improve the control performance and the network lifetime. In this survey, we make an exhaustive review of the literature on wireless network design and optimization for WNCS. First, we discuss what we call the critical interactive variables including sampling period, message delay, message dropout, and network energy consumption. The mutual effects of these communication and control variables motivate their joint tuning. We discuss the effect of controllable wireless network parameters at all layers of the communication protocols on the probability distribution of these interactive variables. We also review the current wireless network standardization for WNCS and their corresponding methodology for adapting the network parameters. Moreover, we discuss the analysis and design of control systems taking into account the effect of the interactive variables on the control system performance. Finally, we present the state-of-the-art wireless network design and optimization for WNCS, while highlighting the tradeoff between the achievable performance and complexity of various approaches. We conclude the survey by highlighting major research issues and identifying future research directions.
Word Embeddings for Sentiment Analysis: A Comprehensive Empirical Survey This work investigates the role of factors like training method, training corpus size and thematic relevance of texts in the performance of word embedding features on sentiment analysis of tweets, song lyrics, movie reviews and item reviews. We also explore specific training or post-processing methods that can be used to enhance the performance of word embeddings in certain tasks or domains. Our empirical observations indicate that models trained with multithematic texts that are large and rich in vocabulary are the best in answering syntactic and semantic word analogy questions. We further observe that the influence of thematic relevance is stronger on movie and phone reviews, but weaker on tweets and lyrics. These two latter domains are more sensitive to corpus size and training method, with Glove outperforming Word2vec. ‘Injecting’ extra intelligence from lexicons or generating sentiment-specific word embeddings are two prominent alternatives for increasing performance of word embedding features.
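A minimal sketch of the kind of pipeline evaluated above, assuming gensim: train word vectors on an in-domain corpus and average them into document-level features for a downstream sentiment classifier. The toy corpus, the vector size, and the skip-gram setting are illustrative assumptions.

```python
import numpy as np
from gensim.models import Word2Vec

corpus = [["this", "movie", "was", "great"],
          ["terrible", "plot", "and", "acting"]]        # tokenized documents

w2v = Word2Vec(sentences=corpus, vector_size=100, window=5, sg=1, min_count=1)

def doc_vector(tokens):
    """Average the word vectors of the tokens present in the vocabulary."""
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

features = np.vstack([doc_vector(doc) for doc in corpus])  # input to any classifier
```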
Word Embeddings via Tensor Factorization Most popular word embedding techniques involve implicit or explicit factorization of a word co-occurrence based matrix into low rank factors. In this paper, we aim to generalize this trend by using numerical methods to factor higher-order word co-occurrence based arrays, or tensors. We present four word embeddings using tensor factorization and analyze their advantages and disadvantages. One of our main contributions is a novel joint symmetric tensor factorization technique related to the idea of coupled tensor factorization. We show that embeddings based on tensor factorization can be used to discern the various meanings of polysemous words without being explicitly trained to do so, and motivate the intuition behind why this works in a way that existing methods do not. We also modify an existing word embedding evaluation metric known as Outlier Detection [Camacho-Collados and Navigli, 2016] to evaluate the quality of the order-N relations that a word embedding captures, and show that tensor-based methods outperform existing matrix-based methods at this task. Experimentally, we show that all of our word embeddings either outperform or are competitive with state-of-the-art baselines commonly used today on a variety of recent datasets. Suggested applications of tensor factorization-based word embeddings are given, and all source code and pre-trained vectors are publicly available online.
Word Embeddings: A Survey This work lists and describes the main recent strategies for building fixed-length, dense and distributed representations for words, based on the distributional hypothesis. These representations are now commonly called word embeddings and, in addition to encoding surprisingly good syntactic and semantic information, have been proven useful as extra features in many downstream NLP tasks.
Word Sense Disambiguation with LSTM: Do We Really Need 100 Billion Words Recently, Yuan et al. (2016) have shown the effectiveness of using Long Short-Term Memory (LSTM) for performing Word Sense Disambiguation (WSD). Their proposed technique outperformed the previous state-of-the-art with several benchmarks, but neither the training data nor the source code was released. This paper presents the results of a reproduction study of this technique using only openly available datasets (GigaWord, SemCor, OMSTI) and software (TensorFlow). From them, it emerged that state-of-the-art results can be obtained with much less data than hinted by Yuan et al. All code and trained models are made freely available.
word2vec Parameter Learning Explained The word2vec model and application by Mikolov et al. have attracted a great amount of attention in the past two years. The vector representations of words learned by word2vec models have been proven to carry semantic meanings and are useful in various NLP tasks. As an increasing number of researchers would like to experiment with word2vec, I notice that there is a lack of material that comprehensively explains the parameter learning process of word2vec in detail, thus preventing many people with less neural network experience from understanding how exactly word2vec works. This note provides detailed derivations and explanations of the parameter update equations for the word2vec models, including the original continuous bag-of-words (CBOW) and skip-gram models, as well as advanced tricks, hierarchical softmax and negative sampling. In the appendix a review is given on the basics of neural network models and backpropagation.
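As a pointer to the kind of derivation the note works through, the skip-gram negative-sampling loss for a centre word w with true context word c and k sampled negative words c_1, ..., c_k can be written (in the usual notation, with input vector v_w and output vectors v', not quoted verbatim from the note) as

E \;=\; -\log \sigma\!\big({v'_c}^{\top} v_w\big) \;-\; \sum_{i=1}^{k} \log \sigma\!\big(-{v'_{c_i}}^{\top} v_w\big),

and the corresponding stochastic-gradient updates with learning rate \eta are

v'_j \;\leftarrow\; v'_j - \eta\,\big(\sigma({v'_j}^{\top} v_w) - t_j\big)\, v_w, \qquad v_w \;\leftarrow\; v_w - \eta \sum_j \big(\sigma({v'_j}^{\top} v_w) - t_j\big)\, v'_j,

where j ranges over the true context word and the negative samples, and t_j equals 1 for the true context word and 0 otherwise.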

Y

Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup This paper presents Yinyang K-means, a new algorithm for K-means clustering. By clustering the centers in the initial stage, and leveraging efficiently maintained lower and upper bounds between a point and centers, it more effectively avoids unnecessary distance calculations than prior algorithms. It significantly outperforms prior K-means algorithms consistently across all experimented data sets, cluster numbers, and machine configurations. The consistent, superior performance—plus its simplicity, user-control of overheads, and guarantee in producing the same clustering results as the standard K-means—makes Yinyang K-means a drop-in replacement of the classic K-means with an order of magnitude higher performance.
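A minimal sketch of the bound-based filtering idea in a simplified form (a single global lower bound per point rather than the per-group bounds of the full Yinyang algorithm); the initialization and the handling of empty clusters are illustrative choices.

```python
import numpy as np

def filtered_kmeans(X, k, n_iter=50, seed=0):
    """K-means with a Yinyang-style global filter that skips distance computations."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()

    # Exact initial assignment plus bounds: ub = distance to the assigned center,
    # lb = distance to the closest other center.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    assign = d.argmin(axis=1)
    ub = d[np.arange(len(X)), assign]
    d[np.arange(len(X)), assign] = np.inf
    lb = d.min(axis=1)

    for _ in range(n_iter):
        new_centers = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                                else centers[j] for j in range(k)])
        drift = np.linalg.norm(new_centers - centers, axis=1)
        centers = new_centers
        ub += drift[assign]      # still an upper bound after the centers moved
        lb -= drift.max()        # still a lower bound on all other centers

        todo = ub >= lb          # only these points can possibly change cluster
        if np.any(todo):
            d = np.linalg.norm(X[todo, None, :] - centers[None, :, :], axis=2)
            assign[todo] = d.argmin(axis=1)
            ub[todo] = d[np.arange(d.shape[0]), assign[todo]]
            d[np.arange(d.shape[0]), assign[todo]] = np.inf
            lb[todo] = d.min(axis=1)
    return assign, centers
```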

Z

ZAP BI Chart Type Cheat Sheet (Cheat Sheet)
Zero-Shot Action Recognition in Videos: A Survey Zero-Shot Action Recognition has attracted attention in the last years, and many approaches have been proposed for recognition of objects, events, and actions in images and videos. There is a demand for methods that can classify instances from classes that are not present in the training of models, especially in the complex task of automatic video understanding, since collecting, annotating, and labeling videos are difficult and laborious tasks. We identify that there are many methods available in the literature; however, it is difficult to categorize which techniques can be considered state of the art. Despite the existence of some surveys about zero-shot action recognition in still images and experimental protocols, there is no work focusing on videos. Hence, in this paper, we present a survey of the methods comprising techniques to perform visual feature extraction and semantic feature extraction, as well as to learn the mapping between these features, considering specifically zero-shot action recognition in videos. We also provide a complete description of datasets, experiments, and protocols, presenting open issues and directions for future work essential for the development of the computer vision research field.
