Document worth reading: “Internet of NanoThings: Concepts and Applications”

This chapter focuses on the Internet of Things from the nanoscale point of view. It starts with Section 1, which provides an introduction to nanothings and nanotechnologies. The nanoscale communication paradigms and the different approaches to nanodevice development are discussed. Nanodevice characteristics are discussed and the architecture of wireless nanodevices is outlined. Section 2 describes the Internet of NanoThings (IoNT), its network architecture, and the challenges of nanoscale communication, which is essential for enabling IoNT. Section 3 gives some practical applications of IoNT. The Internet of Bio-NanoThings (IoBNT) and relevant biomedical applications are discussed. Other applications, such as military, industrial, and environmental applications, are also outlined.


Book Memo: “Developing Enterprise Chatbots”

Learning Linguistic Structures
A chatbot is expected to be capable of supporting a cohesive and coherent conversation and be knowledgeable, which makes it one of the most complex intelligent systems being designed nowadays. Designers have to learn to combine intuitive, explainable language understanding and reasoning approaches with high-performance statistical and deep learning technologies. Today, there are two popular paradigms for chatbot construction:
1. Build a bot platform with universal NLP and ML capabilities so that a bot developer for a particular enterprise, not being an expert, can populate it with training data;
2. Accumulate a huge set of training dialogue data, feed it to a deep learning network and expect the trained chatbot to automatically learn ‘how to chat’.
Although these two approaches are reported to imitate some intelligent dialogues, both are unsuitable for enterprise chatbots, being unreliable and too brittle. The latter approach rests on a belief that some learning miracle will happen and a chatbot will start functioning without thorough feature and domain engineering by an expert and without interpretable dialogue management algorithms. High-performance enterprise chatbots with extensive domain knowledge require a mix of statistical, inductive, and deep machine learning; learning from the web; syntactic, semantic, and discourse NLP; ontology-based reasoning; and a state machine to control the dialogue. This book will provide a comprehensive source of algorithms and architectures for building chatbots for various domains, based on recent trends in computational linguistics and machine learning. The foci of this book are applications of discourse analysis in text relevance assessment, dialogue management, and content generation, which help to overcome the limitations of platform-based and data-driven approaches.

Distilled News

How to visualize convolutional features in 40 lines of code

Recently, while reading Jeremy Rifkin’s book ‘The End of Work’, I came across an interesting definition of AI. Rifkin writes: ‘today when scientists talk of artificial intelligence, they generally mean ‘the art of creating machines that perform functions which require intelligence when performed by people’ (taken from Kurzweil, Raymond, The Age of Intelligent Machines (Cambridge, MA: MIT Press, 1990), p. 14).’ I like this definition because it avoids the hyped discussion of whether AI is truly intelligent in the sense of our intelligence. As a scientist, the thought of unveiling the functional principles of our brain and creating a truly intelligent machine excites me, but I think it is important to realize that a lot of current AI research is aimed rather at automating processes that until now were not automatable. While that may sound less exciting, it is still a great thing. Just one example: the emergence of deep convolutional neural networks revolutionized computer vision and pattern recognition and will allow us to introduce a vast amount of automation into fields such as medical diagnosis. This could allow humanity to quickly bring top medical diagnosis to people in poor countries that are not able to educate the many doctors and experts they would otherwise require.

Key Steps for Building an Effective AI Organization

Recently, I got fascinated by the impact of Artificial Intelligence on businesses from any sector (tech, banking, manufacturing, etc.). This led me to explore the subject further while trying to understand what a corporation should do to transform its processes using AI. In this article, I would love to summarize my observations into a set of actionable steps which can help any organization kickstart their AI transformation. My thoughts are heavily influenced by the amazing 12-page paper written by Andrew Ng, founder of Landing AI, called ‘AI Transformation Playbook’. In addition, I have taken advice from numerous McKinsey & Company reports like ‘McKinsey on Payments: Special Edition on Advanced Analytics in Banking’.

Visualizing Principal Component Analysis with Matrix Transforms

Principal Component Analysis (PCA) is a method of decomposing data into uncorrelated components by identifying eigenvalues and eigenvectors. The following is meant to help visualize what these different values represent and how they’re calculated. First I’ll show how matrices can be used to transform data, then how those matrices are used in PCA.
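As a minimal sketch of the mechanics described above (using NumPy and synthetic data of my own choosing, not the article's example), the whole pipeline is: center the data, form the covariance matrix, eigendecompose it, and project.

```python
import numpy as np

# Toy 2-D data: the second column is roughly 2*x plus noise,
# so one direction in the plane carries almost all the variance.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([x, 2 * x + 0.1 * rng.normal(size=200)])

# 1. Center the data so the covariance matrix describes spread around the mean.
centered = data - data.mean(axis=0)

# 2. Covariance matrix (features x features).
cov = np.cov(centered, rowvar=False)

# 3. Eigendecomposition: eigenvectors are the principal axes,
#    eigenvalues the variance captured along each axis.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]               # sort by explained variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Project onto the principal axes; the resulting components are uncorrelated.
components = centered @ eigvecs
print("explained variance:", eigvals)
```

Note that the first eigenvalue dominates here, which is exactly the "decorrelate and rank by variance" picture the article visualizes with matrix transforms.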

AI brings speed to security

Organizations that use security tools with artificial intelligence (AI) and machine learning (ML) see a significant decrease in incident response time, according to a survey of 457 security practitioners conducted by O’Reilly Media in conjunction with Oracle. Twenty percent of IT professionals who rely on traditional security measures said their teams can detect a malware infection or other attack within minutes, according to the survey. But among IT pros who reported using AI and ML security services, that number more than doubled to 45%. The long tail shows a similar trend: only 16% of IT professionals need days or longer to find an infection when AI or ML is involved, versus a whopping 35% for those who don’t use these technologies.

Neural Network Models in R

A neural network (or artificial neural network, ANN) has the ability to learn by example. An ANN is an information processing model inspired by the biological neuron system. It is composed of a large number of highly interconnected processing elements, known as neurons, that work together to solve problems. It follows a non-linear path and processes information in parallel across its nodes. A neural network is a complex adaptive system: adaptive means it has the ability to change its internal structure by adjusting the weights of its inputs.

Introduction to Video Classification

Many Deep Learning articles and tutorials primarily focus on three data domains: images, speech, and text. These data domains are popular for their applications in image classification, speech recognition, and text sentiment classification. Another very interesting data modality is video. From a dimensionality and size perspective, videos are one of the most interesting data types alongside datasets such as social networks or genetic codes. Video uploading platforms such as YouTube are collecting enormous datasets, empowering Deep Learning research. A video is really just a stack of images. This article will review a paper [1] on video classification research led by Andrej Karpathy, currently the Director of AI at Tesla. This paper models videos with Convolutional Networks in a very similar way to how CNNs model images, and it is a great testament to the representational power of Convolutional Networks. Prior to this work, video classification research was dominated by a pipeline of visual bag-of-words features quantized into a k-means dictionary and classified with a machine learning model such as an SVM. This work highlights the power of CNNs to abstract away all of these previous feature engineering algorithms. The paper also serves as a good foundation of ideas for integrating the temporal component of videos into CNN models. It explores three components of video classification: designing CNNs which account for temporal connectivity in videos, multi-resolution CNNs which can speed up computation, and the effectiveness of transfer learning for video classification.

Data Science and the Paradox of Predictions

Many data science projects are a hunt for knowledge. As history has taught us through the years, the mere act of knowing can change what it is we believe to know. Professor Harari explores this topic in Homo Deus with the skill we’ve become accustomed to in his work. Using the example of Marx’s ‘Das Kapital’, Harari clarifies an idea that translates into a very valuable lesson.

Introducing Manifold

Machine learning programs differ from traditional software applications in the sense that their structure is constantly changing and evolving as the model builds more knowledge. As a result, debugging and interpreting machine learning models is one of the most challenging aspects of real-world artificial intelligence (AI) solutions. Debugging, interpretation, and diagnosis are active areas of focus for organizations building machine learning solutions at scale. Recently, Uber unveiled Manifold, a framework that utilizes visual analysis techniques to support interpretation, debugging, and comparison of machine learning models. Manifold brings together some very advanced innovations in the area of machine learning interpretability to address some of the fundamental challenges of visually debugging machine learning models. The challenge of debugging and interpreting machine learning models is nothing new, and the industry has produced several tools and frameworks in this area. However, most of the existing stacks focus on evaluating a candidate model using performance metrics such as log loss, area under the curve (AUC), and mean absolute error (MAE) which, although useful, offer little insight into the underlying reasons for the model’s performance. Another common challenge is that most machine learning debugging tools are constrained to specific types of models (e.g., regression or classification) and are very difficult to generalize across broader machine learning architectures. Consequently, data scientists spend tremendous amounts of time trying different model configurations until they can achieve specific performance levels.

Why Feature Correlation Matters …. A Lot!

Machine learning models are as good or as bad as the data you have. That’s why data scientists can spend hours on pre-processing and cleansing the data. They select only the features that would contribute most to the quality of the resulting model. This process is called ‘feature selection’: selecting the attributes that make the predicted variable more accurate, or eliminating attributes that are irrelevant and can decrease model accuracy and quality. Examining data and feature correlation is an important step in the feature selection phase of data pre-processing, especially if the data type of the features is continuous. So what is data correlation?
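To make the idea concrete (with hypothetical features of my own invention, not the article's data), the Pearson correlation coefficient is just covariance scaled by the two standard deviations, giving a value in [-1, 1] that feature selection can threshold on:

```python
import numpy as np

rng = np.random.default_rng(1)
size_sqft = rng.uniform(50, 250, 100)                     # hypothetical feature
price = 3_000 * size_sqft + rng.normal(0, 20_000, 100)    # target driven by size
noise_feature = rng.normal(size=100)                      # unrelated feature

def pearson(a, b):
    # Covariance of a and b divided by the product of their standard deviations.
    a, b = a - a.mean(), b - b.mean()
    return (a @ b) / np.sqrt((a @ a) * (b @ b))

print(pearson(size_sqft, price))      # close to 1: keep this feature
print(pearson(noise_feature, price))  # close to 0: candidate for removal
```

A feature that barely correlates with the target (and with nothing else) contributes little, which is precisely the pruning intuition behind correlation-based feature selection.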

Demystifying Logistic Regression

Logistic regression is one of the most popular classification techniques. In most tutorials and articles, people usually explain the probabilistic interpretation of logistic regression. So in this article I will try to give the geometric intuition of logistic regression. The topics that I will cover in this article –
• Geometric Intuition of Logistic Regression
• Optimisation Function
• Sigmoid Function
• Overfitting and Underfitting
• Regularisation – L2 and L1
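To make the sigmoid and L2 regularisation from the list above concrete, here is a minimal NumPy sketch (toy data and hyperparameters of my own choosing, not the article's code) of gradient descent on the L2-regularised logistic loss:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real score into (0, 1), interpretable as P(y=1 | x).
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lam=0.1, lr=0.1, steps=2000):
    """Gradient descent on L2-regularised logistic loss (a minimal sketch)."""
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / n + lam * w   # log-loss gradient + L2 penalty
        w -= lr * grad                        # shrinking w combats overfitting
    return w

# Toy separable data: label is 1 whenever the feature sum is positive.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = (X.sum(axis=1) > 0).astype(float)

w = fit_logistic(X, y)
preds = (sigmoid(X @ w) > 0.5).astype(float)
print("accuracy:", (preds == y).mean())
```

Geometrically, w defines a separating hyperplane and the sigmoid converts signed distance from that plane into a probability; the L2 term keeps ||w|| small so the model does not become overconfident on the training points.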

Chinese Interests Take a Big Seat at the AI Governance Table

Last summer the Chinese government released its ambitious New Generation Artificial Intelligence Development Plan (AIDP), which set the eye-catching target of national leadership in a variety of AI fields by 2030. The plan matters not only because of what it says about China’s technological ambitions, but also for its plans to shape AI governance and policy. Part of the plan’s approach is to devote considerable effort to standards-setting processes in AI-driven sectors. This means writing guidelines not only for key technologies and interoperability, but also for the ethical and security issues that arise across an AI-enabled ecosystem, from algorithmic transparency to liability, bias, and privacy. This year Chinese organizations took a major step toward putting these aspirations into action by releasing an in-depth white paper on AI standards in January and hosting a major international AI standards meeting in Beijing in April. These developments mark Beijing’s first stake in the ground as a leader in developing AI policy and in working with international bodies, even as many governments and companies around the world grapple with uncharted territory in writing the rules on AI. China is eager to participate in international standards-setting bodies on the question of whether and how to set standards around controversial aspects of AI, such as algorithmic bias and transparency in algorithmic decision making.

Chronological Representation

It’s crucial to know the chronological order of events to learn causality, to plan, to synchronize activities in societies and for many other reasons. However, it’s still a huge challenge both for neuroscientists to understand how time is represented in brains and for AI researchers to make agents able to operate in constantly changing environments. Usually cognitive scientists, unlike physicists, treat time completely differently from space. Neuroscientists have already discovered a lot of mechanisms responsible for circadian rhythms, heartbeat, brainwaves and other periodic biological ‘clocks’, as well as timers operating on the millisecond-to-second scale. However, generation and storage of event memories and the representation of time for AI agents are still open questions.

Neural Networks with Numpy for Absolute Beginners: Introduction

In this tutorial, you will get a brief understanding of what Neural Networks are and how they have been developed. In the end, you will gain a brief intuition as to how the network learns. Artificial Intelligence has become one of the hottest fields today, and most of us willing to dive into it start off with neural networks! But on confronting the math-intensive concepts of neural networks, we often just end up learning a few frameworks like TensorFlow, PyTorch, etc. for implementing deep learning models. Moreover, just learning these frameworks without understanding the underlying concepts is like playing with a black box. Whether you want to work in industry or academia, you will be working with, tweaking, and playing with models, for which you need a clear understanding. Both industry and academia expect you to have full clarity on these concepts, including the math. In this series of tutorials, I’ll make neural networks extremely simple to understand by providing step-by-step explanations. Also, the math you’ll need will be at a high school level. Let us start with the inception of artificial neural networks and gain some inspiration as to how they evolved.

Interpolation with Generative Models

In this post I am going to write about generative models. It’s gonna cover the dichotomy between generative and discriminative models, and how generative models can really learn the essence of objects of interest by being able to perform interpolations.

If you did not already know

Beetle Antennae Search (BAS) google
Meta-heuristic algorithms have become very popular because of their powerful performance on optimization problems. A new algorithm called the beetle antennae search algorithm (BAS) is proposed in the paper, inspired by the searching behavior of longhorn beetles. The BAS algorithm imitates the function of antennae and the random walking mechanism of beetles in nature, implementing two main steps of detecting and searching. Finally, the algorithm is benchmarked on two well-known test functions, and the numerical results validate the efficacy of the proposed BAS algorithm.
BSAS: Beetle Swarm Antennae Search Algorithm for Optimization Problems
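A minimal sketch of the detect-and-search idea (my own simplified reading of the algorithm, not the authors' reference code): sense the objective at two 'antennae' along a random direction and step toward the better side, shrinking the sensing range and step size over time.

```python
import numpy as np

def beetle_antennae_search(f, x0, steps=200, d=1.0, step=1.0, decay=0.95):
    """Simplified BAS sketch: probe f at two antennae along a random
    direction and move toward the side with the smaller value."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        b = np.random.randn(x.size)
        b /= np.linalg.norm(b)                       # random antennae axis
        left, right = f(x + d * b), f(x - d * b)     # detect on both sides
        x = x - step * b * np.sign(left - right)     # step toward lower antenna
        d, step = d * decay, step * decay            # shrink sensing range & step
    return x

# Minimise the sphere function (minimum 0 at the origin).
np.random.seed(3)
best = beetle_antennae_search(lambda v: np.sum(v ** 2), x0=[3.0, -2.0])
print(best)
```

The decay schedule and step sizes here are arbitrary choices for illustration; the paper's variants tune these and, in BSAS, run a swarm of such beetles.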

Prior-Aware Dual Decomposition (PADD) google
Spectral topic modeling algorithms operate on matrices/tensors of word co-occurrence statistics to learn topic-specific word distributions. This approach removes the dependence on the original documents and produces substantial gains in efficiency and provable topic inference, but at a cost: the model can no longer provide information about the topic composition of individual documents. Recently, Thresholded Linear Inverse (TLI) was proposed to map the observed words of each document back to its topic composition. However, its linear characteristics limit the inference quality, as it does not consider important prior information over topics. In this paper, we evaluate the Simple Probabilistic Inverse (SPI) method and the novel Prior-Aware Dual Decomposition (PADD), which is capable of learning document-specific topic compositions in parallel. Experiments show that PADD successfully leverages topic correlations as a prior, notably outperforming TLI and learning topic compositions comparable in quality to Gibbs sampling on various data. …

Anytime Stochastic Gradient Descent google
In this paper, we focus on approaches to parallelizing stochastic gradient descent (SGD) wherein data is farmed out to a set of workers, the results of which, after a number of updates, are then combined at a central master node. Although such synchronized SGD approaches parallelize well in idealized computing environments, they often fail to realize their promised computational acceleration in practical settings. One cause is slow workers, termed stragglers, who can cause the fusion step at the master node to stall, greatly slowing convergence. In many straggler mitigation approaches, work completed by these nodes, while only partial, is discarded completely. In this paper, we propose an approach to parallelizing synchronous SGD that exploits the work completed by all workers. The central idea is to fix the computation time of each worker and then to combine the distinct contributions of all workers. We provide a convergence analysis and optimize the combination function. Our numerical results demonstrate an improvement of several orders of magnitude in comparison to existing methods. …

R Packages worth a look

purrr’-Like Apply Functions Over Input Elements (dapr)
An easy-to-use, dependency-free set of functions for iterating over elements of various input objects. Functions are wrappers around base apply()/lappl …

Evaluation of Failure Time Surrogate Endpoints in Individual Patient Data Meta-Analyses (surrosurv)
Provides functions for the evaluation of surrogate endpoints when both the surrogate and the true endpoint are failure time variables. The approaches i …

Density Surface Modelling of Distance Sampling Data (dsm)
Density surface modelling of line transect data. A Generalized Additive Model-based approach is used to calculate spatially-explicit estimates of anima …

Distilled News

Feature Selection using Genetic Algorithms in R

This is a post about feature selection using genetic algorithms in R, in which we will do a quick review about:
• What are genetic algorithms?
• GA in ML?
• What does a solution look like?
• GA process and its operators
• The fitness function
• Genetics Algorithms in R!
• Try it yourself
• Relating concepts
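The post works in R, but the GA loop itself is language-agnostic. As a hedged illustration (toy data, fitness function, and operator settings all of my own choosing, in Python rather than the post's R), here is the selection → crossover → mutation cycle from the list above applied to feature selection:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy data: only the first 3 of 10 features actually drive the target.
X = rng.normal(size=(300, 10))
y = X[:, 0] + X[:, 1] - X[:, 2] + 0.1 * rng.normal(size=300)

def fitness(mask):
    """Fitness = R^2 of a least-squares fit on the selected features,
    minus a small penalty per feature to favour lean solutions."""
    if mask.sum() == 0:
        return -1.0
    Xs = X[:, mask.astype(bool)]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    r2 = 1 - (y - Xs @ coef).var() / y.var()
    return r2 - 0.01 * mask.sum()

# A solution is a 0/1 mask over the features; evolve a population of them.
pop = rng.integers(0, 2, size=(30, 10))
for generation in range(40):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]             # selection: keep the fittest
    children = []
    for _ in range(len(pop)):
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        cut = rng.integers(1, 10)
        child = np.concatenate([a[:cut], b[cut:]])      # one-point crossover
        flip = rng.random(10) < 0.05                    # mutation
        child[flip] = 1 - child[flip]
        children.append(child)
    pop = np.array(children)

best = pop[np.argmax([fitness(ind) for ind in pop])]
print(best)   # ideally 1s on the first three features, 0s elsewhere
```

The per-feature penalty plays the role of the parsimony pressure the post discusses: without it, the GA has no reason to drop irrelevant features.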

When Automation Bites Back

‘The pilots fought continuously until the end of the flight’, said Capt. Nurcahyo Utomo, the head of the investigation of Lion Air Flight 610 that crashed on October 29, 2018, killing the 189 people aboard. The analysis of the black boxes had revealed that the Boeing 737’s nose was repeatedly forced down, apparently by an automatic system receiving incorrect sensor readings. During the 10 minutes preceding the tragedy, the pilots tried 24 times to manually pull up the nose of the plane. They struggled against a malfunctioning anti-stall system that they did not know how to disengage for that specific version of the plane. That type of dramatic scene of humans struggling with a stubborn automated system belongs to pop culture. In the famous scene of the 1968 science-fiction film ‘2001: A Space Odyssey’, the astronaut Dave asks HAL (Heuristically programmed ALgorithmic computer) to open a pod bay door on the spacecraft, to which HAL responds repeatedly, ‘I’m sorry, Dave, I’m afraid I can’t do that’.

Factor Analysis in R with Psych Package: Measuring Consumer Involvement

The first step for anyone who wants to promote or sell something is to understand the psychology of potential customers. Getting into the minds of consumers is often problematic because measuring psychological traits is a complex task. Researchers have developed many parameters that describe our feelings, attitudes, personality and so on. One of these measures is consumer involvement, which is a measure of the attitude people have towards a product or service. The most common method to measure psychological traits is to ask people a battery of questions. Analysing these answers is complicated because it is difficult to relate the responses to a survey to the software of the mind. While the answers given by survey respondents are the directly measured variables, what we would like to know are the hidden (latent) states in the mind of the consumer. Factor analysis is a technique that helps to discover latent variables within a set of response data, such as a customer survey. The basic principle of measuring consumer attitudes is that the consumer’s state of mind causes them to respond to questions in a certain way. Factor analysis seeks to reverse this causality by looking for patterns in the responses that are indicative of the consumer’s state of mind. Using a computing analogy, factor analysis is a technique to reverse-engineer the source code by analysing the input and output. This article introduces the concept of consumer involvement and how it can be predictive of other important marketing metrics such as service quality. An example using data from tap water consumers illustrates the theory. The data collected from these consumers is analysed using factor analysis in R, using the psych package.

Responsible AI Practices

The development of AI is creating new opportunities to improve the lives of people around the world, from business to healthcare to education. It is also raising new questions about the best way to build fairness, interpretability, privacy, and security into these systems. These questions are far from solved, and in fact are active areas of research and development. Google is committed to making progress in the responsible development of AI and to sharing knowledge, research, tools, datasets, and other resources with the larger community. Below we share some of our current work and recommended practices. As with all of our research, we will take our latest findings into account, work to incorporate them as appropriate, and adapt as we learn more over time.

Data Scientist’s Dilemma: The Cold Start Problem – Ten Machine Learning Examples

The ancient philosopher Confucius has been credited with saying ‘study your past to know your future.’ This wisdom applies not only to life but also to machine learning. Specifically, the availability and application of labeled data (things past) for the labeling of previously unseen data (things future) is fundamental to supervised machine learning. Without labels (diagnoses, classes, known outcomes) in past data, how do we make progress in labeling (explaining) future data? This would be a problem.

Must-Read Tutorial to Learn Sequence Modeling

The ability to predict what comes next in a sequence is fascinating. It’s one of the reasons I became interested in data science! Interestingly, the human mind is really good at it, but that is not the case with machines. Given a mysterious plot in a book, the human brain will start creating outcomes. But how do we teach machines to do something similar? Thanks to deep learning, we can do a lot more today than was possible a few years back. The ability to work with sequence data, like music lyrics, sentence translation, understanding reviews or building chatbots – all this is now possible thanks to sequence modeling.

An Intro to High-Level Keras API in Tensorflow

TensorFlow is the most famous library used in production for deep learning models. It has a very large and awesome community and gives lots of flexibility in operations. However, TensorFlow is not that user-friendly and has a steep learning curve. To address that, the high-level Keras API of TensorFlow provides building blocks to create and train deep learning models more easily. Keras models are made by connecting configurable building blocks together with few restrictions, which makes them modular and composable. You can explore more on the official site.

Starting with Tensorflow: the basics

TensorFlow is an open source software library for numerical computation using data flow graphs that enables machine learning practitioners to do more data-intensive computing. It provides a robust implementation of some widely used deep learning algorithms and has a flexible architecture. The main features of TensorFlow are fast computing, flexibility, portability, easy debugging, a unified API, and extensibility.

A Glimpse of TensorFlow

TensorFlow is a popular open-source software library from Google. Originally it was developed by the Google Brain team for internal Google use. As the AI research community got more and more collaborative, TensorFlow was released under the Apache 2.0 open source license. Detailed study of TensorFlow can take months. But a glimpse of its power provides a good motivation to dive into it. With this in mind, this blog looks at the implementation of a classification model. Classification is one of the most frequent problems we work on in AI. Typically we have a set of inputs that have to be classified into different categories. We can use TensorFlow to train a model for this task. Below, we will walk through one such implementation step by step.

Stocks, Significance Testing & p-Hacking: How volatile is volatile?

Over the past 32 years, October has been the most volatile month on average for the S&P 500 and December the least. In this article we will use simulation to assess the statistical significance of this observation and the extent to which it could have occurred by chance. All code is included! Our goals:
• Demonstrate how to use Pandas to analyze Time Series
• Understand how to construct a hypothesis test
• Use simulation to perform hypothesis testing
• Show the importance of accounting for multiple comparison bias
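A hedged sketch of the simulation idea from the list above (with a made-up observed ratio standing in for the real S&P 500 numbers): under the null hypothesis that every month is equally volatile, simulate many histories and ask how often the *most* volatile of the twelve months looks as extreme as the observed one. Comparing against the maximum, rather than October alone, is what accounts for the multiple-comparison bias.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical observation: October's volatility is 1.35x the monthly average.
observed_ratio = 1.35

# Null hypothesis: all months share the same true volatility, so any month's
# edge is luck. Simulate histories of i.i.d. daily returns and record how often
# the single most volatile month matches or beats the observed ratio.
n_sims, hits = 2000, 0
for _ in range(n_sims):
    # 12 months x (32 years x ~21 trading days) of daily returns
    returns = rng.normal(0, 0.01, size=(12, 32 * 21))
    vols = returns.std(axis=1)
    if vols.max() / vols.mean() >= observed_ratio:
        hits += 1

p_value = hits / n_sims
print(p_value)   # chance of seeing such an extreme best month under the null
```

The same loop with `vols[9]` (October only) in place of `vols.max()` would give the naive, uncorrected p-value; the gap between the two is the p-hacking trap the article warns about.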

SOD – An Embedded Computer Vision & Machine Learning Library

SOD is an embedded, modern cross-platform computer vision and machine learning software library that exposes a set of APIs for deep learning and advanced media analysis & processing, including real-time, multi-class object detection and model training on embedded systems with limited computational resources and IoT devices. SOD was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in open source as well as commercial products.

SHAP (SHapley Additive exPlanations)

SHAP (SHapley Additive exPlanations) is a unified approach to explain the output of any machine learning model. SHAP connects game theory with local explanations, uniting several previous methods [1-7] and representing the only possible consistent and locally accurate additive feature attribution method based on expectations (see the SHAP NIPS paper for details).
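The Shapley value underlying SHAP can be computed exactly for tiny models, which makes the "additive feature attribution" idea tangible. Here is an illustrative pure-Python sketch (toy model and numbers of my own choosing, not the SHAP library's API): a feature's attribution is its marginal contribution averaged over every order in which features can be added.

```python
from itertools import permutations

# Toy model: additive contributions plus an interaction between 'a' and 'b'
# (hypothetical numbers, just to make the attribution non-trivial).
def model(present):
    score = 0.0
    if "a" in present:
        score += 2.0
    if "b" in present:
        score += 1.0
    if "a" in present and "b" in present:
        score += 4.0      # interaction term
    return score

features = ["a", "b", "c"]

def shapley(feature):
    """Average the feature's marginal contribution over all feature orderings."""
    total = 0.0
    perms = list(permutations(features))
    for order in perms:
        before = set()
        for f in order:
            if f == feature:
                total += model(before | {f}) - model(before)
                break
            before.add(f)
    return total / len(perms)

values = {f: shapley(f) for f in features}
print(values)   # the interaction credit is split evenly between 'a' and 'b'
```

Note the local accuracy (additivity) property the SHAP paper emphasizes: the attributions sum exactly to the model's output with all features present. SHAP's contribution is approximating this average efficiently for real models, where enumerating orderings is infeasible.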

RStudio Connect 1.7.0

RStudio Connect is the publishing platform for everything you create in R. In conversations with our customers, R users were excited to have a central place to share all their data products, but were facing a tough problem. Their colleagues working in Python didn’t have the same option, leaving their work stranded on their desktops. Today, we are excited to introduce the ability for data science teams to publish Jupyter Notebooks and mixed Python and R content to RStudio Connect. Connect 1.7.0 is a major release that also includes many other significant improvements, such as programmatic deployment and historical event data.

Whats new on arXiv

Effectiveness Assessment of Cyber-Physical Systems

By achieving their purposes through interactions with the physical world, Cyber-Physical Systems (CPS) pose new challenges in terms of dependability. Indeed, the evolution of the physical systems they control with transducers can be affected by surrounding physical processes over which they have no control and which may potentially hamper the achievement of their purposes. While it is illusory to hope for a comprehensive model of the physical environment at design time to anticipate and remove faults that may occur once these systems are deployed, it becomes necessary to evaluate their degree of effectiveness in vivo. In this paper, the degree of effectiveness is formally defined and generalized in the context of measure theory, and the mathematical properties it has to comply with are detailed. The measure is developed in the context of the Transferable Belief Model (TBM), an elaboration on the Dempster-Shafer Theory (DST) of evidence, so as to handle epistemic and aleatory uncertainties pertaining respectively to the user's expectations and the natural variability of the physical environment. This theoretical framework has several advantages over the probability and possibility theories: (1) it is built on the Open World Assumption (OWA), and (2) it allows one to cope with dependent and possibly unreliable sources of information. The TBM is used in conjunction with the Input-Output Hidden Markov Modeling framework (IOHMM) to specify the expected evolution of the physical system controlled by the CPS and the tolerances towards uncertainties. The measure of effectiveness is obtained from the forward algorithm, leveraging the conflict entailed by the successive combinations of the beliefs obtained from observations of the physical system and the beliefs corresponding to its expected evolution. The conflict, inherent to the OWA, is meant to quantify the inability of the model to explain observations.

NeuNetS: An Automated Synthesis Engine for Neural Network Design

The application of neural networks to a vast variety of practical applications is transforming the way AI is applied in practice. Pre-trained neural network models available through APIs, and the capability to custom-train pre-built neural network architectures with customer data, have made the consumption of AI by developers much simpler and resulted in broad adoption of these complex AI models. While pre-built network models exist for certain scenarios, to meet the constraints that are unique to each application, AI teams need to think about developing custom neural network architectures that can meet the tradeoff between accuracy and memory footprint imposed by their unique use cases. However, only a small proportion of data science teams have the skills and experience needed to create a neural network from scratch, and the demand far exceeds the supply. In this paper, we present NeuNetS: an automated neural network synthesis engine for custom neural network design that is available as part of IBM’s AI OpenScale product. NeuNetS is available for both text and image domains and can build neural networks for specific tasks in a fraction of the time it takes today with human effort, and with accuracy similar to that of human-designed AI models.

Ranking Online Consumer Reviews

Product reviews are posted online in the hundreds, and even in the thousands for some popular products. Handling such a large volume of continuously generated online content is a challenging task for buyers, sellers, and even researchers. The purpose of this study is to rank the overwhelming number of reviews by their predicted helpfulness score. The helpfulness score is predicted using features extracted from a product's review text, product description, and customer question-answer data, using a random-forest classifier and a gradient boosting regressor. The system first classifies reviews into low or high quality with the random-forest classifier. The helpfulness score is then predicted, by the gradient boosting regressor, for the high-quality reviews only. The helpfulness score of the low-quality reviews is not calculated because they will never appear among the top k reviews; they are simply appended at the end of the review list on the review-listing website. The proposed system provides fair review placement on review-listing pages and makes all high-quality reviews visible to customers at the top. The experimental results on data from two popular Indian e-commerce websites validate our claim, as 3-4 new high-quality reviews are placed in the top ten reviews along with 5-6 old reviews based on review helpfulness. Our findings indicate that including features from product description data and customer question-answer data improves the prediction accuracy of the helpfulness score.
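The two-stage ranking logic is easy to sketch independently of the actual models. Here the classifier and regressor are passed in as plain callables (stand-ins for the trained random-forest classifier and gradient boosting regressor; all names are illustrative):

```python
def rank_reviews(reviews, is_high_quality, predict_score):
    """Two-stage review ranking sketch: classify each review as high or
    low quality; predict a helpfulness score only for high-quality ones;
    append low-quality reviews, unscored, at the end of the listing."""
    high = [r for r in reviews if is_high_quality(r)]
    low = [r for r in reviews if not is_high_quality(r)]
    high.sort(key=predict_score, reverse=True)  # most helpful first
    return high + low


# Usage with trivial stand-in models over toy review records:
reviews = [{'id': 1, 'q': 0.9}, {'id': 2, 'q': 0.2}, {'id': 3, 'q': 0.7}]
ranked = rank_reviews(reviews, lambda r: r['q'] > 0.5, lambda r: r['q'])
```

Skipping the regressor for low-quality reviews is the point of the design: their exact score is irrelevant because they can never enter the top-k slots anyway.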

Optimizing Deep Neural Networks with Multiple Search Neuroevolution

This paper presents an evolutionary metaheuristic called Multiple Search Neuroevolution (MSN) to optimize deep neural networks. The algorithm searches multiple promising regions of the search space simultaneously, maintaining sufficient distance between them. It is tested by training neural networks for two tasks and compared with other optimization algorithms. The first task is to solve global optimization functions with challenging topographies. We found MSN to outperform classic optimization algorithms such as Evolution Strategies, reducing the number of optimization steps performed by at least 2X. The second task is to train a convolutional neural network (CNN) on the popular MNIST dataset. Using 3.33% of the training set, MSN reaches a validation accuracy of 90%. Stochastic Gradient Descent (SGD) was able to match the same accuracy figure while taking 7X fewer optimization steps. Despite this lag, the fact that the MSN metaheuristic trains a 4.7M-parameter CNN suggests promise for future development. This is by far the largest network ever evolved using a pool of only 50 samples.
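To make the "multiple regions kept apart" idea concrete, here is a toy sketch on a plain objective function, not the paper's algorithm: several local searches run in parallel, and whenever two region centers drift too close, the worse one is re-seeded elsewhere (all parameters here are hypothetical):

```python
import math
import random


def msn_minimize(f, dim, n_regions=3, pop=10, iters=300, min_dist=1.0, seed=0):
    """Toy multi-region evolutionary search: each region improves its center
    with Gaussian perturbations; centers closer than min_dist trigger a
    re-seed of the worse region, preserving search-space diversity."""
    rng = random.Random(seed)
    centers = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_regions)]
    for _ in range(iters):
        for i, c in enumerate(centers):
            # best of `pop` Gaussian perturbations around the region center
            cand = min(
                ([g + rng.gauss(0, 0.3) for g in c] for _ in range(pop)),
                key=f,
            )
            if f(cand) < f(c):
                centers[i] = cand
        # enforce minimum distance between regions
        for i in range(n_regions):
            for j in range(i + 1, n_regions):
                if math.dist(centers[i], centers[j]) < min_dist:
                    worse = i if f(centers[i]) > f(centers[j]) else j
                    centers[worse] = [rng.uniform(-5, 5) for _ in range(dim)]
    return min(centers, key=f)


# Usage on the sphere function, a standard global-optimization benchmark:
best = msn_minimize(lambda v: sum(x * x for x in v), dim=2, seed=1)
```

The re-seeding rule is the simplest way to keep regions from collapsing onto the same basin; the paper's distance-maintenance mechanism is more elaborate.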

Activation Functions for Generalized Learning Vector Quantization – A Performance Comparison

An appropriate choice of the activation function (like ReLU, sigmoid or swish) plays an important role in the performance of (deep) multilayer perceptrons (MLP) for classification and regression learning. Prototype-based classification learning methods like (generalized) learning vector quantization (GLVQ) are powerful alternatives. These models also use activation functions, but applied to the so-called classifier function instead. In this paper we investigate activation functions that have proven successful for MLPs as candidates for application in GLVQ, and study their influence on performance.
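For readers unfamiliar with GLVQ, the classifier function mentioned above is the relative distance difference between the closest correct and closest incorrect prototype, and the activation is applied on top of it. A minimal sketch (the standard GLVQ formulation; sigmoid shown as one candidate activation):

```python
import math


def glvq_mu(d_plus, d_minus):
    """GLVQ classifier function: d_plus is the distance to the closest
    prototype of the correct class, d_minus to the closest prototype of
    any other class. Negative values mean a correct classification."""
    return (d_plus - d_minus) / (d_plus + d_minus)


def sigmoid(x):
    """One candidate activation applied to the classifier function; the
    per-sample cost contribution is then sigmoid(mu)."""
    return 1.0 / (1.0 + math.exp(-x))


# A correctly classified sample (closer to its own class prototype)
# yields mu < 0 and hence a small cost contribution:
cost = sigmoid(glvq_mu(1.0, 3.0))
```

Swapping `sigmoid` for ReLU, swish, etc. is precisely the comparison the paper carries out.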

A robust functional time series forecasting method

Univariate time series often take the form of a collection of curves observed sequentially over time. Examples include hourly ground-level ozone concentration curves. These curves can be viewed as a time series of functions observed at equally spaced intervals over a dense grid. Since functional time series may contain various types of outliers, we introduce a robust functional time series forecasting method to down-weight the influence of outliers in forecasting. Through a robust principal component analysis based on projection pursuit, a time series of functions can be decomposed into a set of robust dynamic functional principal components and their associated scores. Conditioning on the estimated functional principal components, the crux of the curve-forecasting problem lies in modeling and forecasting the principal component scores, here through a robust vector autoregressive forecasting method. Via a simulation study and an empirical study on forecasting ground-level ozone concentration, the robust method demonstrates the superior forecast accuracy that dynamic functional principal component regression entails. The robust method also shows superior estimation accuracy of the parameters in the vector autoregressive models for modeling and forecasting principal component scores, and thus improves curve forecast accuracy.
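The final reconstruction step is the easy part of this pipeline and worth seeing in isolation: once the VAR model has produced one-step-ahead score forecasts, the curve forecast is just the mean curve plus the score-weighted principal components. A sketch (curves as plain lists over a common grid; the robust PCA and VAR fitting are assumed done elsewhere):

```python
def forecast_curve(mean_curve, components, score_forecasts):
    """Reconstruct a one-step-ahead curve forecast from forecast principal
    component scores: x_hat(t) = mu(t) + sum_k beta_k * phi_k(t)."""
    return [
        m + sum(b * phi[i] for b, phi in zip(score_forecasts, components))
        for i, m in enumerate(mean_curve)
    ]


# Toy example on a 3-point grid with two components:
curve_hat = forecast_curve(
    mean_curve=[1.0, 1.0, 1.0],
    components=[[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]],
    score_forecasts=[2.0, 3.0],
)
```

Robustness lives entirely upstream, in how the components and scores are estimated; the reconstruction itself is the same as in classical functional principal component regression.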

A Survey of the Recent Architectures of Deep Convolutional Neural Networks

Deep Convolutional Neural Networks (CNNs) are a special type of Neural Networks, which have shown state-of-the-art results on various competitive benchmarks. The powerful learning ability of deep CNNs is largely achieved through multiple non-linear feature extraction stages that can automatically learn hierarchical representations from the data. The availability of large amounts of data and improvements in hardware processing units have accelerated research in CNNs, and many interesting deep CNN architectures have recently been reported. The recent race in deep CNN architectures for achieving high performance on challenging benchmarks has shown that innovative architectural ideas, as well as parameter optimization, can improve CNN performance on various vision-related tasks. In this regard, different ideas in CNN design have been explored, such as the use of different activation and loss functions, parameter optimization, regularization, and restructuring of processing units. However, the major improvement in representational capacity has been achieved by restructuring the processing units. Especially, the idea of using a block as a structural unit instead of a layer is gaining substantial appreciation. This survey thus focuses on the intrinsic taxonomy present in the recently reported CNN architectures and, consequently, classifies the recent innovations in CNN architectures into seven categories, based on spatial exploitation, depth, multi-path, width, feature map exploitation, channel boosting and attention. Additionally, it covers the elementary understanding of CNN components and sheds light on the current challenges and applications of CNNs.

Probabilistic symmetry and invariant neural networks

In an effort to improve the performance of deep neural networks in data-scarce, non-i.i.d., or unsupervised settings, much recent research has been devoted to encoding invariance under symmetry transformations into neural network architectures. We treat the neural network input and output as random variables, and consider group invariance from the perspective of probabilistic symmetry. Drawing on tools from probability and statistics, we establish a link between functional and probabilistic symmetry, and obtain generative functional representations of joint and conditional probability distributions that are invariant or equivariant under the action of a compact group. Those representations completely characterize the structure of neural networks that can be used to model such distributions and yield a general program for constructing invariant stochastic or deterministic neural networks. We develop the details of the general program for exchangeable sequences and arrays, recovering a number of recent examples as special cases.

Transfer Learning and Meta Classification Based Deep Churn Prediction System for Telecom Industry

A churn prediction system guides telecom service providers in reducing revenue loss. Developing a churn prediction system for the telecom industry is a challenging task, mainly due to the size of the data, high-dimensional features, and the imbalanced distribution of the data. In this paper, we focus on a novel solution to the inherent problems of churn prediction, using the concepts of Transfer Learning (TL) and ensemble-based meta-classification. The proposed method, TL-DeepE, is applied in two stages. The first stage employs TL by fine-tuning multiple pre-trained deep convolutional neural networks (CNNs). Telecom datasets are in vector form, which is converted into 2D images because deep CNNs have high learning capacity on images. In the second stage, predictions from these deep CNNs are appended to the original feature vector to build the final feature vector for a high-level Genetic Programming and AdaBoost based ensemble classifier. The experiments are thus conducted using various CNNs as base classifiers together with the high-level GP-AdaBoost ensemble classifier, and the reported results are averages of the outcomes. Using 10-fold cross-validation, the performance of the proposed TL-DeepE system is compared with existing techniques on two standard telecommunication datasets: Orange and Cell2cell. In the experiments, the prediction accuracies for the Orange and Cell2cell datasets were 75.4% and 68.2%, with area-under-the-curve scores of 0.83 and 0.74, respectively.
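The vector-to-image conversion is the step most readers will wonder about. The paper does not spell out its exact mapping here, but a common minimal scheme is to pad the feature vector to the next perfect square and reshape it into a grid, sketched below (the padding-and-reshape choice is an assumption for illustration):

```python
import math


def vector_to_image(vec, fill=0.0):
    """Pad a flat feature vector to the next perfect square and reshape
    it into a 2D grid, so tabular features can be fed to an image CNN."""
    vec = list(vec)
    side = math.ceil(math.sqrt(len(vec)))
    padded = vec + [fill] * (side * side - len(vec))
    return [padded[r * side:(r + 1) * side] for r in range(side)]


# A 5-feature record becomes a 3x3 "image" with four padded cells:
img = vector_to_image([1, 2, 3, 4, 5])
```

Whatever mapping is used, it must be applied identically at training and inference time so the CNN always sees features at the same pixel positions.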

Cold-start Playlist Recommendation with Multitask Learning

Playlist recommendation involves producing a set of songs that a user might enjoy. We investigate this problem in three cold-start scenarios: (i) cold playlists, where we recommend songs to form new personalised playlists for an existing user; (ii) cold users, where we recommend songs to form new playlists for a new user; and (iii) cold songs, where we recommend newly released songs to extend users’ existing playlists. We propose a flexible multitask learning method to deal with all three settings. The method learns from user-curated playlists, and encourages songs in a playlist to be ranked higher than those that are not by minimising a bipartite ranking loss. Inspired by an equivalence between bipartite ranking and binary classification, we show how one can efficiently approximate an optimal solution of the multitask learning objective by minimising a classification loss. Empirical results on two real playlist datasets show the proposed approach has good performance for cold-start playlist recommendation.
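The bipartite ranking loss at the heart of the method penalizes every in-playlist ("positive") song that is scored below an out-of-playlist ("negative") song. A generic logistic pairwise version, not the paper's exact surrogate, looks like this:

```python
import math


def bipartite_rank_loss(pos_scores, neg_scores):
    """Mean logistic pairwise loss over all (positive, negative) score
    pairs: log(1 + exp(s_neg - s_pos)). It is small when every in-playlist
    song outranks every out-of-playlist song."""
    pairs = [(p, n) for p in pos_scores for n in neg_scores]
    return sum(math.log1p(math.exp(n - p)) for p, n in pairs) / len(pairs)


# Well-ordered scores give a much smaller loss than inverted ones:
good = bipartite_rank_loss([2.0, 1.5], [0.0, -1.0])
bad = bipartite_rank_loss([0.0, -1.0], [2.0, 1.5])
```

Evaluating all positive-negative pairs is quadratic, which is exactly why the equivalence to a binary classification loss mentioned in the abstract matters for efficiency.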

Feature Pyramid and Hierarchical Boosting Network for Pavement Crack Detection

Pavement crack detection is a critical task for ensuring road safety. Manual crack detection is extremely time-consuming, so an automatic road crack detection method is required to speed up this process. However, it remains a challenging task due to the intensity inhomogeneity of cracks and the complexity of the background, e.g., the low contrast with surrounding pavement and possible shadows with similar intensity. Inspired by recent advances of deep learning in computer vision, we propose a novel network architecture, named Feature Pyramid and Hierarchical Boosting Network (FPHBN), for pavement crack detection. The proposed network integrates semantic information into low-level features for crack detection in a feature pyramid way. It also balances the contributions of easy and hard samples to the loss by nested sample reweighting in a hierarchical way. To demonstrate the superiority and generality of the proposed method, we evaluate it on five crack datasets and compare it with state-of-the-art crack detection, edge detection, and semantic segmentation methods. Extensive experiments show that the proposed method outperforms these state-of-the-art methods in terms of accuracy and generality.

V-monotone independence

We introduce and study a new notion of non-commutative independence, called V-monotone independence, which can be viewed as an extension of the monotone independence of Muraki. We investigate the combinatorics of mixed moments of V-monotone random variables and prove the central limit theorem. We obtain a combinatorial formula for the limit moments and we find the solution of the differential equation for the moment generating function in the implicit form.

Adapting Convolutional Neural Networks for Geographical Domain Shift

We present the winning solution for the Inclusive Images Competition organized as part of the Conference on Neural Information Processing Systems (NeurIPS 2018) Competition Track. The competition was organized to study ways to cope with domain shift in image processing, specifically geographical shift: the training and two test sets in the competition had different geographical distributions. Our solution has proven to be relatively straightforward and simple: it is an ensemble of several CNNs where only the last layer is fine-tuned with the help of a small labeled set of tuning labels made available by the organizers. We believe that while domain shift remains a formidable problem, our approach opens up new possibilities for alleviating this problem in practice, where small labeled datasets from the target domain are usually either available or can be obtained and labeled cheaply.

Robust Anomaly Detection in Images using Adversarial Autoencoders

Reliably detecting anomalies in a given set of images is a task of high practical relevance for visual quality inspection, surveillance, or medical image analysis. Autoencoder neural networks learn to reconstruct normal images, so images whose reconstruction error exceeds some threshold can be classified as anomalies. Here we analyze a fundamental problem with this approach when the training set is contaminated with a small fraction of outliers. We find that continued training of autoencoders inevitably reduces the reconstruction error of outliers, and hence degrades anomaly detection performance. To counteract this effect, an adversarial autoencoder architecture is adapted, which imposes a prior distribution on the latent representation, typically placing anomalies into low-likelihood regions. Utilizing the likelihood model, potential anomalies can be identified and rejected already during training, which results in an anomaly detector that is significantly more robust to the presence of outliers during training.
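The rejection step can be sketched on its own: score each training sample by the log-likelihood of its latent code under the prior (a standard normal here) and drop the least likely ones before the next training round. All names and the `keep_fraction` knob are illustrative, not the paper's API:

```python
def reject_low_likelihood(latents, keep_fraction=0.9):
    """Rank training samples by log-likelihood of their latent codes under
    a standard normal prior and keep only the most likely fraction, so
    suspected outliers stop influencing the reconstruction loss."""
    def loglik(z):
        # log N(z; 0, I) up to an additive constant
        return -0.5 * sum(v * v for v in z)

    ranked = sorted(latents, key=loglik, reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]


# A latent code far from the prior's mass is filtered out:
codes = [[0.1, 0.0], [0.0, -0.2], [8.0, 8.0], [0.3, 0.1]]
clean = reject_low_likelihood(codes, keep_fraction=0.75)
```

Because the adversarial objective pushes normal data toward the prior, outliers end up in its tails, which is what makes this likelihood filter effective.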

Document worth reading: “Deep learning for time series classification: a review”

Time Series Classification (TSC) is an important and challenging problem in data mining. With the increase of time series data availability, hundreds of TSC algorithms have been proposed. Among these methods, only a few have considered Deep Neural Networks (DNNs) to perform this task. This is surprising as deep learning has seen very successful applications in the last years. DNNs have indeed revolutionized the field of computer vision especially with the advent of novel deeper architectures such as Residual and Convolutional Neural Networks. Apart from images, sequential data such as text and audio can also be processed with DNNs to reach state of the art performance for document classification and speech recognition. In this article, we study the current state of the art performance of deep learning algorithms for TSC by presenting an empirical study of the most recent DNN architectures for TSC. We give an overview of the most successful deep learning applications in various time series domains under a unified taxonomy of DNNs for TSC. We also provide an open source deep learning framework to the TSC community where we implemented each of the compared approaches and evaluated them on a univariate TSC benchmark (the UCR archive) and 12 multivariate time series datasets. By training 8,730 deep learning models on 97 time series datasets, we propose the most exhaustive study of DNNs for TSC to date. Deep learning for time series classification: a review

If you did not already know

Co-Arg google
This paper presents Co-Arg, a new type of cognitive assistant to an intelligence analyst that enables the synergistic integration of analyst imagination and expertise, computer knowledge and critical reasoning, and crowd wisdom, to draw defensible and persuasive conclusions from masses of evidence of all types, in a world that is changing all the time. Co-Arg's goal is to improve the quality of the analytic results and enhance their understandability for both experts and novices. The performed analysis is based on a sound and transparent argumentation that links evidence to conclusions in a way that shows very clearly how the conclusions have been reached, what evidence was used and how, what is not known, and what assumptions have been made. The analytic results are presented in a report that describes the analytic conclusion and its probability, the main favoring and disfavoring arguments, the justification of the key judgments and assumptions, and the missing information that might increase the accuracy of the solution. …

Locally Smoothed Neural Network (LSNN) google
Convolutional Neural Networks (CNN) and the locally connected layer are limited in capturing the importance and relations of different local receptive fields, which are often crucial for tasks such as face verification, visual question answering, and word sequence prediction. To tackle the issue, we propose a novel locally smoothed neural network (LSNN) in this paper. The main idea is to represent the weight matrix of the locally connected layer as the product of the kernel and the smoother, where the kernel is shared over different local receptive fields, and the smoother is for determining the importance and relations of different local receptive fields. Specifically, a multi-variate Gaussian function is utilized to generate the smoother, for modeling the location relations among different local receptive fields. Furthermore, the content information can also be leveraged by setting the mean and precision of the Gaussian function according to the content. Experiments on some variant of MNIST clearly show our advantages over CNN and locally connected layer. …
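The core construction of LSNN, a shared kernel scaled per receptive field by a Gaussian smoother, can be sketched in a few lines. Shown in 1D for readability (the paper uses a multivariate Gaussian over 2D field locations; all names are illustrative):

```python
import math


def gaussian_smoother(positions, mean, precision):
    """Smoother value for each receptive-field position, from a Gaussian
    parameterized by mean and precision; fields near the mean get weight
    close to 1, distant fields are attenuated."""
    return [math.exp(-0.5 * precision * (p - mean) ** 2) for p in positions]


def smoothed_weights(kernel, smoother):
    """Locally smoothed weights: the weight matrix of the locally connected
    layer is the shared kernel scaled by the per-location smoother value."""
    return [[w * s for w in kernel] for s in smoother]


# Three receptive fields at positions 0, 1, 2; the field at the mean keeps
# the full kernel, the others get a down-scaled copy:
s = gaussian_smoother([0, 1, 2], mean=1, precision=2.0)
W = smoothed_weights([0.5, -0.5], s)
```

Making the mean and precision functions of the input content, as the paper suggests, would turn this fixed smoother into a content-dependent one without changing the factorization.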

Exponential Random Graph Models (ERGM) google
Exponential random graph models (ERGMs) are a family of statistical models for analyzing data about social and other networks. Many metrics exist to describe the structural features of an observed network, such as density, centrality, or assortativity. However, these metrics describe the observed network, which is only one instance of a large number of possible alternative networks. This set of alternative networks may have similar or dissimilar structural features. To support statistical inference on the processes influencing the formation of network structure, a statistical model should consider the set of all possible alternative networks, weighted by their similarity to the observed network. However, because network data is inherently relational, it violates the assumptions of independence and identical distribution of standard statistical models like linear regression. Alternative statistical models should reflect the uncertainty associated with a given observation, permit inference about the relative frequency of network substructures of theoretical interest, disambiguate the influence of confounding processes, efficiently represent complex structures, and link local-level processes to global-level properties. Degree Preserving Randomization, for example, is a specific way in which an observed network can be considered in terms of multiple alternative networks. …
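Degree Preserving Randomization, mentioned above, is usually implemented with double-edge swaps: pick two edges (a,b) and (c,d) and rewire them to (a,d) and (c,b), which leaves every node's degree unchanged. A minimal sketch for a simple undirected graph (function names are illustrative):

```python
import random
from collections import Counter


def degree_preserving_randomization(edges, n_swaps=200, seed=0):
    """Randomize a simple undirected graph via double-edge swaps
    (a,b),(c,d) -> (a,d),(c,b), rejecting swaps that would create
    self-loops or multi-edges. The degree sequence is preserved."""
    def norm(u, v):  # canonical order for an undirected edge
        return (u, v) if u <= v else (v, u)

    rng = random.Random(seed)
    edges = [norm(u, v) for u, v in edges]
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: swap would create a self-loop
        e1, e2 = norm(a, d), norm(c, b)
        if e1 in edge_set or e2 in edge_set:
            continue  # swap would create a multi-edge
        edge_set -= {edges[i], edges[j]}
        edge_set |= {e1, e2}
        edges[i], edges[j] = e1, e2
    return edges


def degrees(edges):
    """Degree sequence as a node -> degree counter."""
    return Counter(u for e in edges for u in e)
```

Generating many such randomized graphs yields the reference set of alternative networks against which an observed network's substructure frequencies can be compared.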