Distilled News

Comprehensive Introduction to Neural Network Architecture

This article is the second in a series of articles aimed at demystifying the theory behind neural networks and how to design and implement them for solving practical problems. In this article, I will cover the design and optimization aspects of neural networks in detail.
The topics in this article are:
• Anatomy of a neural network
• Activation functions
• Loss functions
• Output units
• Architecture
These tutorials are largely based on the notes and examples from multiple classes taught at Harvard and Stanford in the computer science and data science departments.

The Kalman Filter and External Control Inputs

In this article, you will
• Use the statsmodels Python module to implement a Kalman Filter model with external control inputs,
• Use Maximum Likelihood to estimate unknown parameters in the Kalman Filter model matrices,
• See how cumulative impact can be modeled via the Kalman Filter. (This article uses the fitness-fatigue model of athletic performance as an example and doubles as Modeling Cumulative Impact Part IV.)
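As a rough illustration of what "external control inputs" means here, the sketch below runs a scalar Kalman filter whose prediction step includes a known input u. It is not the statsmodels implementation the article uses, and the names `kalman_step` and `run_filter`, the coefficients a, b, h, and the noise variances q and r are all illustrative assumptions rather than values from the article.

```python
def kalman_step(x, p, u, y, a=0.9, b=1.0, h=1.0, q=0.01, r=0.25):
    """One predict/update cycle of a scalar Kalman filter with control input u.

    Assumed model (illustrative, not taken from the article):
        state:       x_t = a * x_{t-1} + b * u_t + w_t,  w_t ~ N(0, q)
        measurement: y_t = h * x_t + v_t,                v_t ~ N(0, r)
    """
    # Predict: propagate the state estimate, adding the control term b * u.
    x_pred = a * x + b * u
    p_pred = a * p * a + q

    # Update: blend the prediction with the measurement y.
    gain = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + gain * (y - h * x_pred)
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new


def run_filter(controls, measurements, x0=0.0, p0=1.0):
    """Filter a whole sequence and return the state estimate at each step."""
    x, p = x0, p0
    estimates = []
    for u, y in zip(controls, measurements):
        x, p = kalman_step(x, p, u, y)
        estimates.append(x)
    return estimates


# Made-up inputs (for example, training loads) and noisy observations.
controls = [1.0, 1.0, 0.0, 0.0, 2.0]
measurements = [1.1, 1.8, 1.7, 1.4, 3.4]
estimates = run_filter(controls, measurements)
```

In the article's setting the unknown coefficients would be estimated by Maximum Likelihood instead of being fixed by hand as they are in this sketch.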

Mastering the features of Google Colaboratory !!!

Google Colaboratory is a research tool for data science and machine learning. It’s a Jupyter notebook environment that requires no setup to use. It is one of the top tools for data scientists because you don’t have to manually install all the packages and libraries; you just import them directly, whereas in a normal IDE you have to install the libraries yourself. Moreover, notebooks are meant for code and explanation, so a notebook should often look like a blog post. I have been using Google Colab for the past two months and it has been the best tool for me. In this blog, I will give you some tips and tricks for mastering Google Colab. Stay tuned and read all the points; these were the features that even I struggled to implement at first, and now I have mastered them. Let’s see the best features of the Google Colab notebook.

Optimizing Source-Based-Language-Learning using Genetic Algorithm

What is Source-Based-Language-Learning in this context? Very simple – it is my way of describing the process of learning a language to literally understand a source (i.e. book, speech, etc.). In the specific case of what I will be sharing, it translates to learning Classical Arabic to be able to read/comprehend the Quran (in its native language, without translation). So why all this drama of using a Genetic Algorithm (GA) to learn a language? To understand this, it will require a better sense of the problem statement.

Hyperdimensional Computing: An Introduction to Computing in Distributed Representation with High-Dimensional Random Vectors

The 1990s saw the emergence of cognitive models that depend on very high dimensionality and randomness. They include Holographic Reduced Representations, Spatter Code, Semantic Vectors, Latent Semantic Analysis, Context-Dependent Thinning, and Vector-Symbolic Architecture. They represent things in high-dimensional vectors that are manipulated by operations that produce new high-dimensional vectors in the style of traditional computing, in what is called here hyperdimensional computing on account of the very high dimensionality. The paper presents the main ideas behind these models, written as a tutorial essay in hopes of making the ideas accessible and even provocative. A sketch of how we have arrived at these models, with references and pointers to further reading, is given at the end. The thesis of the paper is that hyperdimensional representation has much to offer to students of cognitive science, theoretical neuroscience, computer science and engineering, and mathematics.
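To make "operations that produce new high-dimensional vectors" concrete, here is a minimal sketch using one common set of choices: random bipolar (+1/-1) vectors, elementwise multiplication to bind a role to a value, and an elementwise majority vote to bundle several bound pairs into one record. These encodings, the dimensionality `DIM`, and all function and variable names are assumptions made for illustration; the models listed above differ in their details.

```python
import random

DIM = 10_000  # illustrative dimensionality, chosen only for this sketch


def random_hv(rng):
    """Draw a random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Bind two hypervectors by elementwise multiplication (self-inverse)."""
    return [x * y for x, y in zip(a, b)]


def bundle(*vectors):
    """Superpose hypervectors by elementwise majority vote (ties go to +1)."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*vectors)]


def similarity(a, b):
    """Normalized dot product: 1.0 for identical vectors, near 0 for unrelated ones."""
    return sum(x * y for x, y in zip(a, b)) / DIM


rng = random.Random(0)
color, shape, size = (random_hv(rng) for _ in range(3))   # role vectors
red, circle, small = (random_hv(rng) for _ in range(3))   # value vectors

# One vector holding three role/value pairs at once.
record = bundle(bind(color, red), bind(shape, circle), bind(size, small))

# Unbinding with a role yields a noisy copy of the value stored under it.
recovered = bind(record, color)
sim_to_red = similarity(recovered, red)        # clearly above zero
sim_to_circle = similarity(recovered, circle)  # close to zero
```

The recovered vector is only approximately equal to `red`, but in ten thousand dimensions it is still far closer to `red` than to any unrelated vector, which is what lets a lookup against known vectors clean it up.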

TensorFlow World – October 28 – 31, 2019 – Santa Clara, CA

Since being open sourced in 2015, TensorFlow has had a significant impact on many industries. With TensorFlow 2.0’s eager execution, intuitive high-level APIs, and flexible model building on any platform, it’s cementing its place as the production-ready, end-to-end platform driving the machine learning revolution. At TensorFlow World you’ll see TensorFlow 2.0 in action, discover new ways to use it, and learn how to successfully implement it in your enterprise.

Towards explainable AI for healthcare: Predicting and visualizing age in Chest Radiographs

I recently published a paper in SPIE 2019 that is related to a system that estimates the age of a person using Chest X-Rays (CXR) and deep learning. Such a system can be utilized in scenarios where the age information of the patient is missing. Forensics is an example of an area that could benefit. More interestingly though, by using deep network activation maps we can visualize which anatomical areas of CXRs age affects most, offering insight on what the network ‘sees’ to estimate age. It might be too early to tell how age estimation and visualization on CXRs can have clinical implications. Nevertheless, age discrepancy between the network’s prediction and the real patient age can be useful for preventative counseling of patient health status. Excerpts from the paper as well as new experiments are provided in this post.

Recall, Precision, F1, ROC, AUC, and everything

The output of your fraud detection model is the probability [0.0-1.0] that a transaction is fraudulent. If this probability is below 0.5, you classify the transaction as non-fraudulent; otherwise, you classify the transaction as fraudulent.
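The thresholding rule above, and the precision, recall, and F1 metrics named in the title, can be sketched as follows. The scores and labels are made-up toy data, and the function names are illustrative.

```python
def classify(probabilities, threshold=0.5):
    """Label a transaction fraudulent (1) unless its probability is below the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive (fraudulent) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy model outputs and true labels (1 = fraudulent), invented for this example.
scores = [0.05, 0.20, 0.45, 0.55, 0.70, 0.95]
labels = [0, 0, 1, 0, 1, 1]

at_half = precision_recall_f1(labels, classify(scores, threshold=0.5))
at_lower = precision_recall_f1(labels, classify(scores, threshold=0.4))
```

On this toy data, lowering the threshold from 0.5 to 0.4 catches the fraudulent transaction scored 0.45, so recall rises from 2/3 to 1 while precision moves from 2/3 to 3/4. Sweeping the threshold over its whole range is what traces out an ROC curve.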

How To Use Active Learning To Iteratively Improve Your Machine Learning Models

In this article, I will explain how to use active learning to iteratively improve the performance of a machine learning model. This technique is applicable to any model but for the purpose of this article, I will illustrate how it’s done to improve a binary text classifier. All the materials covered in this article are based on the 2018 Strata Data Conference Tutorial titled ‘Using R and Python for scalable data science, machine learning and AI’ from Microsoft. I assume the reader is familiar with the concept of active learning in the context of machine learning. If not, then the lead section of this Wikipedia article serves as a good introduction.
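One common way to choose which examples to label next is uncertainty sampling: pick the unlabeled examples the current classifier is least sure about. The sketch below assumes a binary classifier that outputs a positive-class probability; the excerpt does not say which query strategy the tutorial uses, so this strategy and the function name are assumptions.

```python
def select_most_uncertain(probabilities, k):
    """Return indices of the k examples whose predicted probability is closest to 0.5."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:k]


# Made-up positive-class probabilities for a pool of unlabeled texts.
pool_probs = [0.97, 0.52, 0.10, 0.45, 0.80, 0.30]

# These examples would be sent to a human annotator, added to the
# training set, and the model retrained before the next round.
to_label = select_most_uncertain(pool_probs, k=2)
```

Repeating the select, label, retrain cycle is what makes the improvement iterative: each round spends the labeling budget where the model is currently weakest.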

Unwrapping the Secrets of SEO: How Does Google’s Knowledge Graph Work?

The Knowledge Graph is Google’s semantic database. This is where entities are placed in relation to one another, assigned attributes and set in a thematic context or an ontology. But what is an entity? And how does the Knowledge Graph actually work? Find the answers to these questions in our latest Unwrapping the Secrets of SEO, the third and last part of Olaf Kopp’s series looking at Google’s semantics and machine learning.


PyRobot is a framework and ecosystem that enables AI researchers and students to get up and running with a robot in just a few hours, without specialized knowledge of the hardware or of details such as device drivers, control, and planning.

Self-Supervised Learning

122 slides, very readable, about learning from images, from video, and from video with sound.

End-User Probabilistic Programming (DRAFT)

Probabilistic programming aims to help users make decisions under uncertainty. The user writes code representing a probabilistic model, and receives outcomes as distributions or summary statistics. We consider probabilistic programming for end-users, in particular spreadsheet users, estimated to number in tens to hundreds of millions. We examine the sources of uncertainty actually encountered by spreadsheet users, and their coping mechanisms, via an interview study. We examine spreadsheet-based interfaces and technology to help reason under uncertainty, via probabilistic and other means. We show how uncertain values can propagate uncertainty through spreadsheets, and how sheet-defined functions can be applied to handle uncertainty. Hence, we draw conclusions about the promise and limitations of probabilistic programming for end-users.

R Packages worth a look

Automatic Estimation of Number of Principal Components in PCA (pesel)
Automatic estimation of number of principal components in PCA with PEnalized SEmi-integrated Likelihood (PESEL). See Piotr Sobczyk, Malgorzata Bogdan, …

Fast Implementation of Dijkstra Algorithm (cppRouting)
Calculation of distances, shortest paths and isochrones on weighted graphs using several variants of Dijkstra algorithm. Proposed algorithms are unidir …

Robust P-Value Combination Methods (metapro)
The meta-analysis is performed to increase the statistical power by integrating the results from several experiments. The p-values are often combined i …

Estimation in Nonprobability Sampling (NonProbEst)
Different inference procedures are proposed in the literature to correct for selection bias that might be introduced with non-random selection mechanis …

Magister Dixit

“Our conversations with chat bots and digital agents will be highly personalized. And they’ll already know about our problems, so every conversation won’t have to start from scratch. The bots will know that you’ve called five times and that you’re frustrated. They’ll have access to in-depth knowledge about everyone else who’s called with a similar issue, so they’ll know which answer is most likely to resolve your problem. And all of this will seem to happen instantaneously. For years, we’ve been talking about big data. This is where big data finally becomes useful to large numbers of people. The robot will know the answer to your question before you even ask.” Kanishk Priyadarshi ( 2017 )

What’s new on arXiv – Complete List

HGC: Hierarchical Group Convolution for Highly Efficient Neural Network
Graph Independence Testing
Attention-based Conditioning Methods for External Knowledge Integration
Redundancy-Free Computation Graphs for Graph Neural Networks
Degrees of Freedom Analysis of Unrolled Neural Networks
Generative Continual Concept Learning
Attacking Graph Convolutional Networks via Rewiring
Identifying Data And Information Streams In Cyberspace: A Multi-Dimensional Perspective
Incorporating Open Data into Introductory Courses in Statistics
DataLearner: A Data Mining and Knowledge Discovery Tool for Android Smartphones and Tablets
FairLedger: A Fair Blockchain Protocol for Financial Institutions
Open-Domain Targeted Sentiment Analysis via Span-Based Extraction and Classification
Time-Series Anomaly Detection Service at Microsoft
Making Classical Machine Learning Pipelines Differentiable: A Neural Translation Approach
A Survey on Neural Machine Reading Comprehension
Network Implosion: Effective Model Compression for ResNets via Static Layer Pruning and Retraining
Learning Fair Naive Bayes Classifiers by Discovering and Eliminating Discrimination Patterns
Topic-Aware Neural Keyphrase Generation for Social Media Language
Automatically Identifying Complaints in Social Media
A Survey of Reinforcement Learning Informed by Natural Language
Evaluating the Robustness of Nearest Neighbor Classifiers: A Primal-Dual Perspective
Accelerated methods for composite non-bilinear saddle point problem
Accelerated Alternating Minimization
A Distributed Event-Triggered Control Strategy for DC Microgrids Based on Publish-Subscribe Model Over Industrial Wireless Sensor Networks
Soft-ranking Label Encoding for Robust Facial Age Estimation
Deep Music Analogy Via Latent Representation Disentanglement
Accuracy Requirements for Early Estimation of Crop Production in Senegal
Distributed sub-optimal resource allocation via a projected form of singular perturbation
Movable-Object-Aware Visual SLAM via Weakly Supervised Semantic Segmentation
Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction
Learning to Predict Novel Noun-Noun Compounds
Consensus Neural Network for Medical Imaging Denoising with Only Noisy Training Samples
The Implicit Metropolis-Hastings Algorithm
The A.B.C.Ds of Schubert calculus
A Variant of Gaussian Process Dynamical Systems
LSTM Networks Can Perform Dynamic Counting
Unsupervised Primitive Discovery for Improved 3D Generative Modeling
Low-complexity Noncoherent Maximum Likelihood Sequence Detection Scheme for CPM in Aeronautical Telemetry
Crypto art: A decentralized view
Encouraging Paragraph Embeddings to Remember Sentence Identity Improves Classification
Prokhorov-like conditions for weak compactness of sets of bounded Radon measures on different topological spaces
The key to the weak-ties phenomenon
Physics-Informed Probabilistic Learning of Linear Embeddings of Non-linear Dynamics With Guaranteed Stability
Understanding overfitting peaks in generalization error: Analytical risk curves for $l_2$ and $l_1$ penalized interpolation
Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds
Question Answering as Global Reasoning over Semantic Abstractions
The Packed Interval Covering Problem is NP-complete
Happy Together: Learning and Understanding Appraisal From Natural Language
Active glasses
An Attention-based Recurrent Convolutional Network for Vehicle Taillight Recognition
Novelty Detection via Network Saliency in Visual-based Deep Learning
Borders, Palindrome Prefixes, and Square Prefixes
Interpreting Age Effects of Human Fetal Brain from Spontaneous fMRI using Deep 3D Convolutional Neural Networks
UBC-NLP at SemEval-2019 Task 6: Ensemble Learning of Offensive Content With Enhanced Training Data
Balanced Off-Policy Evaluation General Action Spaces
Gendered Pronoun Resolution using BERT and an extractive question answering formulation
A general solver to the elliptical mixture model through an approximate Wasserstein manifold
SVRG for Policy Evaluation with Fewer Gradient Evaluations
Note on the bias and variance of variational inference
Finitary Boolean functions
Curiosity-Driven Multi-Criteria Hindsight Experience Replay
Aggregation of pairwise comparisons with reduction of biases
Modeling Excess Deaths After a Natural Disaster with Application to Hurricane Maria
A note on norms of signed sums of vectors
Argument Generation with Retrieval, Planning, and Realization
Norms of weighted sums of log-concave random vectors
Association of genomic subtypes of lower-grade gliomas with shape features automatically extracted by a deep learning algorithm
Integrative Factorization of Bidimensionally Linked Matrices
Neural Heterogeneous Scheduler
The Generalization-Stability Tradeoff in Neural Network Pruning
Is Attention Interpretable?
Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning
Reconstructing $d$-manifold subcomplexes of cubes from their $(\lfloor d/2 \rfloor + 1)$-skeletons
Factorization Bandits for Online Influence Maximization
BIGPATENT: A Large-Scale Dataset for Abstractive and Coherent Summarization
Coin Theorems and the Fourier Expansion
A note on Hedetniemi’s conjecture, Stahl’s conjecture and the Poljak-Rödl function
Improved Adversarial Robustness via Logit Regularization Methods
RobustTrend: A Huber Loss with a Combined First and Second Order Difference Regularization for Time Series Trend Filtering
Symmetry Properties of Nested Canalyzing Functions
Embedding Imputation with Grounded Language Information
A Shuffling Theorem for Centrally Symmetric Tilings
The Impact of Regularization on High-dimensional Logistic Regression
Embodied View-Contrastive 3D Feature Learning
Variance Reduction in Gradient Exploration for Online Learning to Rank
BAGS: An automatic homework grading system using the pictures taken by smart phones
A cost-reducing partial labeling estimator in text classification problem
Multimodal Data Fusion of Non-Gaussian Spatial Fields in Sensor Networks
Deep Spatio-Temporal Neural Networks for Click-Through Rate Prediction
A Regression Approach to Certain Information Transmission Problems
Sequence-to-Nuggets: Nested Entity Mention Detection via Anchor-Region Networks
Generalized Data Augmentation for Low-Resource Translation
BDNet: Bengali handwritten numeral digit recognition based on densely connected convolutional neural networks
Intriguing properties of adversarial training
Randomization and reweighted $\ell_1$-minimization for A-optimal design of linear inverse problems
The Broad Optimality of Profile Maximum Likelihood
Fast Spatially-Varying Indoor Lighting Estimation
On the Optimality of Sparse Model-Based Planning for Markov Decision Processes
Improving Neural Language Modeling via Adversarial Training
Multiway clustering via tensor block models
A Closed-Form Learned Pooling for Deep Classification Networks
Sampling Humans for Optimizing Preferences in Coloring Artwork
Learned Conjugate Gradient Descent Network for Massive MIMO Detection
Learning to Segment Skin Lesions from Noisy Annotations
Random Access for Massive Machine-Type Communications
Noninvasive super-resolution imaging through scattering media
Efficient Bayesian estimation for GARCH-type models via Sequential Monte Carlo
Transfer Learning for Hate Speech Detection in Social Media
Stochastic Mirror Descent on Overparameterized Nonlinear Models: Convergence, Implicit Regularization, and Generalization
SAR: Learning Cross-Language API Mappings with Little Knowledge
Synthesizing 3D Shapes from Silhouette Image Collections using Multi-projection Generative Adversarial Networks
Analyzing the Role of Model Uncertainty for Electronic Health Records
A Comprehensive Hidden Markov Model for Hourly Rainfall Time Series
Progressive Cluster Purification for Transductive Few-shot Learning
Robustness Verification of Tree-based Models
On the Structure of Ordered Latent Trait Models
DensePhysNet: Learning Dense Physical Object Representations via Multi-step Dynamic Interactions
Bayesian Automatic Relevance Determination for Utility Function Specification in Discrete Choice Models
UniDual: A Unified Model for Image and Video Understanding
Loop-erased partitioning of a graph
Few-Shot Learning with Per-Sample Rich Supervision
Scale Steerable Filters for Locally Scale-Invariant Convolutional Neural Networks
Non-Coherent Rate Splitting for the MISO BC with Magnitude CSIT
Nonparametric Independence Testing for Right-Censored Data using Optimal Transport
A Lyapunov Approach to Robust Regulation of Distributed Port-Hamiltonian Systems
Deep Learning-Based Automatic Downbeat Tracking: A Brief Review
Propagation of chaos for a General Balls into Bins dynamics
A Closed-Form GN-Model Non-Linear Interference Coherence Term
LASSi: Metric based I/O analytics for HPC
Goodness-of-fit Test for Latent Block Models
Big Ramsey degrees of 3-uniform hypergraphs
Analysis of parallel I/O use on the UK national supercomputing service, ARCHER using Cray LASSi and EPCC SAFE
Learning to combine Grammatical Error Corrections
Time-Optimal Control Problem With State Constraints In A Time-Periodic Flow Field
An Image Clustering Auto-Encoder Based on Predefined Evenly-Distributed Class Centroids and MMD Distance
Automatic Segmentation of Vestibular Schwannoma from T2-Weighted MRI by Deep Spatial Attention with Hardness-Weighted Loss
Selection of Waveform Parameters Using Machine Learning for 5G and Beyond
User mode selection of NOMA based D2D communication for maximum sum-revenue
The role of ego vision in view-invariant action recognition
Generation of Multimodal Justification Using Visual Word Constraint Model for Explainable Computer-Aided Diagnosis
Multi-objects Generation with Amortized Structural Regularization
Tuning-Free, Low Memory Robust Estimator to Mitigate GPS Spoofing Attacks
On the Secrecy Performance of NOMA Systems with both External and Internal Eavesdroppers
A generalization of Heffter arrays
Coalescence for a Galton-Watson process with immigration
Intelligent Reflecting Surface vs. Decode-and-Forward: How Large Surfaces Are Needed to Beat Relaying?
Multimodal Logical Inference System for Visual-Textual Entailment
Best-First Width Search for Multi Agent Privacy-preserving Planning
Exploration and Exploitation in Symbolic Regression using Quality-Diversity and Evolutionary Strategies Algorithms
FaRM: Fair Reward Mechanism for Information Aggregation in Spontaneous Localized Settings (Extended Version)
Autonomous Goal Exploration using Learned Goal Spaces for Visuomotor Skill Acquisition in Robots
E-LPIPS: Robust Perceptual Image Similarity via Random Transformation Ensembles
Unit Impulse Response as an Explainer of Redundancy in a Deep Convolutional Neural Network
Errors-in-variables Modeling of Personalized Treatment-Response Trajectories
2nd Place and 2nd Place Solution to Kaggle Landmark Recognition and Retrieval Competition 2019
Tropical Representations of Plactic Monoids
Automatic Algorithm Selection In Multi-agent Pathfinding
The Riddle of Togelby
Detecting Clues for Skill Levels and Machine Operation Difficulty from Egocentric Vision
Weighted Quasi Interpolant Spline Approximation of 3D point clouds via local refinement
Safe Reinforcement Learning Using Robust MPC
MPC-Based Precision Cooling Strategy (PCS) for Efficient Thermal Management of Automotive Air Conditioning System
Boosting Soft Actor-Critic: Emphasizing Recent Experience without Forgetting the Past
An active-set algorithm for norm constrained quadratic problems
Project Thyia: A Forever Gameplayer
On the Odd Cycle Game and Connected Rules
Pitfalls and Protocols in Practice of Manufacturing Data Science
CRCEN: A Generalized Cost-sensitive Neural Network Approach for Imbalanced Classification
‘Did You Hear That?’ Learning to Play Video Games from Audio Cues
Neural Spline Flows
Towards Safe Reinforcement Learning Using NMPC and Policy Gradients: Part II – Deterministic Case
The University of Helsinki submissions to the WMT19 news translation task
CAiRE_HKUST at SemEval-2019 Task 3: Hierarchical Attention for Dialogue Emotion Classification
GLTR: Statistical Detection and Visualization of Generated Text
Multiparametric Deep Learning and Radiomics for Tumor Grading and Treatment Response Assessment of Brain Cancer: Preliminary Results
On the performance of various parallel GMRES implementations on CPU and GPU clusters
Joint Semantic Domain Alignment and Target Classifier Learning for Unsupervised Domain Adaptation
Enabling Robust State Estimation through Measurement Error Covariance Adaptation
Towards Safe Reinforcement Learning Using NMPC and Policy Gradients: Part I – Stochastic case
Data-driven Reconstruction of Nonlinear Dynamics from Sparse Observation
A Dijkstra-Based Efficient Algorithm for Finding a Shortest Non-zero Path in Group-Labeled Graphs

What’s new on arXiv

HGC: Hierarchical Group Convolution for Highly Efficient Neural Network

Group convolution works well with many deep convolutional neural networks (CNNs) and can effectively compress the model by reducing the number of parameters and the computational cost. With this operation, however, feature maps of different groups cannot communicate, which restricts their representation capability. To address this issue, in this work, we propose a novel operation named Hierarchical Group Convolution (HGC) for creating computationally efficient neural networks. Unlike standard group convolution, which blocks inter-group information exchange and induces severe performance degradation, HGC can hierarchically fuse the feature maps from each group and leverage the inter-group information effectively. Taking advantage of the proposed method, we introduce a family of compact networks called HGCNets. Compared to networks using standard group convolution, HGCNets have a huge improvement in accuracy at the same model size and complexity level. Extensive experimental results on the CIFAR dataset demonstrate that HGCNets obtain significant reduction of parameters and computational cost to achieve comparable performance over the prior CNN architectures designed for mobile devices such as MobileNet and ShuffleNet.
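The parameter saving from standard group convolution, which the abstract takes as its starting point, follows from simple counting: each output channel sees only the input channels in its own group. The sketch below counts weights for a 2D convolution layer without bias. It illustrates plain group convolution only, not the proposed HGC operation, and the layer sizes and the function name are made up.

```python
def conv_params(c_in, c_out, kernel, groups=1):
    """Weight count of a 2D convolution (bias excluded) with the given group count."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel connects to c_in / groups input channels.
    return (c_in // groups) * c_out * kernel * kernel


# Illustrative layer: 64 input channels, 128 output channels, 3x3 kernel.
standard = conv_params(64, 128, 3)            # 73,728 weights
grouped = conv_params(64, 128, 3, groups=4)   # 18,432 weights
reduction = standard / grouped                # equals the number of groups
```

The price of that fourfold saving is the one the abstract describes: with four groups, no output channel ever sees input channels outside its own quarter.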

Graph Independence Testing

Identifying statistically significant dependency between variables is a key step in scientific discoveries. Many recent methods, such as distance and kernel tests, have been proposed for valid and consistent independence testing and can be applied to data in Euclidean and non-Euclidean spaces. However, in those works, $n$ pairs of points in $\mathcal{X} \times \mathcal{Y}$ are observed. Here, we consider the setting where a pair of $n \times n$ graphs are observed, and the corresponding adjacency matrices are treated as kernel matrices. Under a $\rho$-correlated stochastic block model, we demonstrate that a naïve test (permutation and Pearson’s) for a conditional dependency graph model is invalid. Instead, we propose a block-permutation procedure. We prove that our procedure is valid and consistent (even when the two graphs have different marginal distributions, are weighted or unweighted, and the latent vertex assignments are unknown) and provide sufficient conditions for the tests to estimate $\rho$. Simulations corroborate these results on both binary and weighted graphs. Applying these tests to the whole-organism, single-cell-resolution structural connectomes of C. elegans, we identify strong statistical dependency between the chemical synapse connectome and the gap junction connectome.

Attention-based Conditioning Methods for External Knowledge Integration

In this paper, we present a novel approach for incorporating external knowledge in Recurrent Neural Networks (RNNs). We propose the integration of lexicon features into the self-attention mechanism of RNN-based architectures. This form of conditioning on the attention distribution enforces the contribution of the most salient words for the task at hand. We introduce three methods, namely attentional concatenation, feature-based gating and affine transformation. Experiments on six benchmark datasets show the effectiveness of our methods. Attentional feature-based gating yields consistent performance improvement across tasks. Our approach is implemented as a simple add-on module for RNN-based models with minimal computational overhead and can be adapted to any deep neural architecture.

Redundancy-Free Computation Graphs for Graph Neural Networks

Graph Neural Networks (GNNs) are based on repeated aggregations of information across nodes’ neighbors in a graph. However, because common neighbors are shared between different nodes, this leads to repeated and inefficient computations. We propose Hierarchically Aggregated computation Graphs (HAGs), a new GNN graph representation that explicitly avoids redundancy by managing intermediate aggregation results hierarchically, eliminating repeated computations and unnecessary data transfers in GNN training and inference. We introduce an accurate cost function to quantitatively evaluate the runtime performance of different HAGs and use a novel HAG search algorithm to find optimized HAGs. Experiments show that the HAG representation significantly outperforms the standard GNN graph representation by increasing the end-to-end training throughput by up to 2.8x and reducing the aggregations and data transfers in GNN training by up to 6.3x and 5.6x, while maintaining the original model accuracy.

Degrees of Freedom Analysis of Unrolled Neural Networks

Unrolled neural networks emerged recently as an effective model for learning inverse maps appearing in image restoration tasks. However, their generalization risk (i.e., test mean-squared-error) and its link to network design and train sample size remain mysterious. Leveraging Stein’s Unbiased Risk Estimator (SURE), this paper analyzes the generalization risk with its bias and variance components for recurrent unrolled networks. We particularly investigate the degrees-of-freedom (DOF) component of SURE, the trace of the end-to-end network Jacobian, to quantify the prediction variance. We prove that DOF is well-approximated by the weighted path sparsity of the network under incoherence conditions on the trained weights. Empirically, we examine the SURE components as a function of train sample size for both recurrent and non-recurrent (with many more parameters) unrolled networks. Our key observations indicate that: 1) DOF increases with train sample size and converges to the generalization risk for both recurrent and non-recurrent schemes; 2) the recurrent network converges significantly faster (with fewer train samples) compared with the non-recurrent scheme, hence recurrence serves as a regularization for low sample size regimes.

Generative Continual Concept Learning

After learning a concept, humans are also able to continually generalize their learned concepts to new domains by observing only a few labeled instances without any interference with the past learned knowledge. In contrast, learning concepts efficiently in a continual learning setting remains an open challenge for current Artificial Intelligence algorithms as persistent model retraining is necessary. Inspired by the Parallel Distributed Processing learning and the Complementary Learning Systems theories, we develop a computational model that is able to expand its previously learned concepts efficiently to new domains using a few labeled samples. We couple the new form of a concept to its past learned forms in an embedding space for effective continual learning. Doing so, a generative distribution is learned such that it is shared across the tasks in the embedding space and models the abstract concepts. This procedure enables the model to generate pseudo-data points to replay the past experience to tackle catastrophic forgetting.

Attacking Graph Convolutional Networks via Rewiring

Graph Neural Networks (GNNs) have boosted the performance of many graph related tasks such as node classification and graph classification. Recent research shows that graph neural networks are vulnerable to adversarial attacks, which deliberately add carefully created unnoticeable perturbations to the graph structure. The perturbation is usually created by adding/deleting a few edges, which might be noticeable even when the number of edges modified is small. In this paper, we propose a graph rewiring operation which affects the graph in a less noticeable way compared to adding/deleting edges. We then use reinforcement learning to learn the attack strategy based on the proposed rewiring operation. Experiments on real world graphs demonstrate the effectiveness of the proposed framework. To understand the proposed framework, we further analyze how its generated perturbation to the graph structure affects the output of the target model.

Identifying Data And Information Streams In Cyberspace: A Multi-Dimensional Perspective

Cyberspace has gradually replaced the physical reality, its role evolving from a simple enabler of daily life processes to a necessity for modern existence. As a result of this convergence of physical and virtual realities, for all processes being critically dependent on networked communications, information representative of our physical, logical and social thoughts is constantly being generated in cyberspace. The interconnection and integration of links between our physical and virtual realities creates a new hyperspace as a source of data and information. Additionally, significant studies in cyber analysis have predominantly revolved around a single linear analysis of information from a single source of evidence (The Network). These studies are limited in their ability to understand the dynamics of relationships across the multiple dimensions of cyberspace. This paper introduces a multi-dimensional perspective for data identification in cyberspace. It provides critical discussions for identifying entangled relationships amongst entities across cyberspace.

Incorporating Open Data into Introductory Courses in Statistics

The 2016 Guidelines for Assessment and Instruction in Statistics Education (GAISE) College Report emphasized six recommendations to teach introductory courses in statistics. Among them: use of real data with context and purpose. Many educators have created databases consisting of multiple data sets for use in class; sometimes making hundreds of data sets available. Yet ‘the context and purpose’ component of the data may remain elusive if just a generic database is made available. We describe the use of open data in introductory courses. Countries and cities continue to share data through open data portals. Hence, educators can find regional data that engages their students more effectively. We present excerpts from case studies that show the application of statistical methods to data on: crime, housing, rainfall, tourist travel, and others. Data wrangling and discussion of results are recognized as important case study components. Thus the open data based case studies attend most GAISE College Report recommendations. Reproducible R code is made available for each case study. Example uses of open data in more advanced courses in statistics are also described.

DataLearner: A Data Mining and Knowledge Discovery Tool for Android Smartphones and Tablets

Smartphones have become the ultimate ‘personal’ computer, yet despite this, general-purpose data-mining and knowledge discovery tools for mobile devices are surprisingly rare. DataLearner is a new data-mining application designed specifically for Android devices that imports the Weka data-mining engine and augments it with algorithms developed by Charles Sturt University. Moreover, DataLearner can be expanded with additional algorithms. Combined, DataLearner delivers 40 classification, clustering and association rule mining algorithms for model training and evaluation without need for cloud computing resources or network connectivity. It provides the same classification accuracy as PCs and laptops, while doing so with acceptable processing speed and consuming negligible battery life. With its ability to provide easy-to-use data-mining on a phone-size screen, DataLearner is a new portable, self-contained data-mining tool for remote, personalised and learning applications alike. DataLearner features four elements – this paper, the app available on Google Play, the GPL3-licensed source code on GitHub and a short video on YouTube.

FairLedger: A Fair Blockchain Protocol for Financial Institutions

Financial institutions are currently looking into technologies for permissioned blockchains. A major effort in this direction is Hyperledger, an open source project hosted by the Linux Foundation and backed by a consortium of over a hundred companies. A key component in permissioned blockchain protocols is a byzantine fault tolerant (BFT) consensus engine that orders transactions. However, currently available BFT solutions in Hyperledger (as well as in the literature at large) are inadequate for financial settings; they are not designed to ensure fairness or to tolerate selfish behavior that arises when financial institutions strive to maximize their own profit. We present FairLedger, a permissioned blockchain BFT protocol, which is fair, designed to deal with rational behavior, and, no less important, easy to understand and implement. The secret sauce of our protocol is a new communication abstraction, called detectable all-to-all (DA2A), which allows us to detect participants (byzantine or rational) that deviate from the protocol, and punish them. We implement FairLedger in the Hyperledger open source project, using the Iroha framework, one of the biggest projects therein. To evaluate FairLedger’s performance, we also implement it in the PBFT framework and compare the two protocols. Our results show that in failure-free scenarios FairLedger achieves better throughput than both Iroha’s implementation and PBFT in wide-area settings.

Open-Domain Targeted Sentiment Analysis via Span-Based Extraction and Classification

Open-domain targeted sentiment analysis aims to detect opinion targets along with their sentiment polarities from a sentence. Prior work typically formulates this task as a sequence tagging problem. However, such formulation suffers from problems such as huge search space and sentiment inconsistency. To address these problems, we propose a span-based extract-then-classify framework, where multiple opinion targets are directly extracted from the sentence under the supervision of target span boundaries, and corresponding polarities are then classified using their span representations. We further investigate three approaches under this framework, namely the pipeline, joint, and collapsed models. Experiments on three benchmark datasets show that our approach consistently outperforms the sequence tagging baseline. Moreover, we find that the pipeline model achieves the best performance compared with the other two models.

Time-Series Anomaly Detection Service at Microsoft

Large companies need to monitor various metrics (for example, Page Views and Revenue) of their applications and services in real time. At Microsoft, we develop a time-series anomaly detection service which helps customers to monitor the time-series continuously and alert for potential incidents on time. In this paper, we introduce the pipeline and algorithm of our anomaly detection service, which is designed to be accurate, efficient and general. The pipeline consists of three major modules: data ingestion, experimentation platform and online compute. To tackle the problem of time-series anomaly detection, we propose a novel algorithm based on Spectral Residual (SR) and Convolutional Neural Network (CNN). Our work is the first attempt to borrow the SR model from the visual saliency detection domain for time-series anomaly detection. Moreover, we innovatively combine SR and CNN together to improve the performance of the SR model. Our approach achieves superior experimental results compared with state-of-the-art baselines on both public datasets and Microsoft production data.
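To make the Spectral Residual idea more concrete, here is a minimal sketch of the SR transform as it is commonly described in the visual saliency literature, applied to a 1-D series. It is an illustration under stated assumptions, not the service's implementation: the function name spectral_residual_saliency, the window size q, and the toy series are all hypothetical, and the CNN stage mentioned in the abstract is not covered.

```python
import numpy as np

def spectral_residual_saliency(x, q=3):
    """Return a saliency map for a 1-D series via the spectral residual.

    Assumed steps (standard SR formulation, not taken from the abstract):
    log-amplitude spectrum minus its local average, recombined with the
    original phase and transformed back to the time domain.
    """
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.fft(x)
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    log_amp = np.log(amplitude + 1e-8)           # small constant avoids log(0)
    kernel = np.ones(q) / q                      # moving-average filter of width q
    avg_log_amp = np.convolve(log_amp, kernel, mode="same")
    residual = log_amp - avg_log_amp             # the "spectral residual"
    return np.abs(np.fft.ifft(np.exp(residual + 1j * phase)))

# Toy series (hypothetical): a clean sine wave with one injected spike.
n = np.arange(128)
series = np.sin(2 * np.pi * 4 * n / 128)
series[50] += 5.0
saliency = spectral_residual_saliency(series)
spike_index = int(np.argmax(saliency))
```

In this toy example the largest saliency value lands on the injected spike. A real detector would still need a rule for turning saliency into alerts, for instance a threshold relative to a local average, which is left out here.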

Making Classical Machine Learning Pipelines Differentiable: A Neural Translation Approach

Classical Machine Learning (ML) pipelines often comprise multiple ML models, where the models within a pipeline are trained in isolation. Conversely, when training neural network models, layers composing the neural models are simultaneously trained using backpropagation. We argue that the isolated training scheme of ML pipelines is sub-optimal, since it cannot jointly optimize multiple components. To this end, we propose a framework that translates a pre-trained ML pipeline into a neural network and fine-tunes the ML models within the pipeline jointly using backpropagation. Our experiments show that fine-tuning of the translated pipelines is a promising technique able to increase the final accuracy.

A Survey on Neural Machine Reading Comprehension

Enabling a machine to read and comprehend natural language documents so that it can answer questions remains an elusive challenge. In recent years, the popularity of deep learning and the establishment of large-scale datasets have both promoted the prosperity of Machine Reading Comprehension. This paper aims to present how to utilize the Neural Network to build a Reader, introduce some classic models, and analyze what improvements they make. Further, we also point out the defects of existing models and future research directions.

Network Implosion: Effective Model Compression for ResNets via Static Layer Pruning and Retraining

Residual Networks with convolutional layers are widely used in the field of machine learning. Since they effectively extract features from input data by stacking multiple layers, they can achieve high accuracy in many applications. However, the stacking of many layers raises their computation costs. To address this problem, we propose Network Implosion, which erases multiple layers from Residual Networks without degrading accuracy. Our key idea is to introduce a priority term that identifies the importance of a layer; we can select unimportant layers according to the priority and erase them after the training. In addition, we retrain the networks to avoid critical drops in accuracy after layer erasure. A theoretical assessment reveals that our erasure and retraining scheme can erase layers without accuracy drop, and achieve higher accuracy than is possible with training from scratch. Our experiments show that Network Implosion can, for classification on Cifar-10/100 and ImageNet, reduce the number of layers by 24.00 to 42.86 percent without any drop in accuracy.

Learning Fair Naive Bayes Classifiers by Discovering and Eliminating Discrimination Patterns

As machine learning is increasingly used to make real-world decisions, recent research efforts aim to define and ensure fairness in algorithmic decision making. Existing methods often assume a fixed set of observable features to define individuals, but lack a discussion of certain features not being observed at test time. In this paper, we study fairness of naive Bayes classifiers, which allow partial observations. In particular, we introduce the notion of a discrimination pattern, which refers to an individual receiving different classifications depending on whether some sensitive attributes were observed. Then a model is considered fair if it has no such pattern. We propose an algorithm to discover and mine for discrimination patterns in a naive Bayes classifier, and show how to learn maximum-likelihood parameters subject to these fairness constraints. Our approach iteratively discovers and eliminates discrimination patterns until a fair model is learned. An empirical evaluation on three real-world datasets demonstrates that we can remove exponentially many discrimination patterns by only adding a small fraction of them as constraints.

Topic-Aware Neural Keyphrase Generation for Social Media Language

A huge volume of user-generated content is produced daily on social media. To facilitate automatic language understanding, we study keyphrase prediction, distilling salient information from massive posts. While most existing methods extract words from source posts to form keyphrases, we propose a sequence-to-sequence (seq2seq) based neural keyphrase generation framework, enabling absent keyphrases to be created. Moreover, our model, being topic-aware, allows joint modeling of corpus-level latent topic representations, which helps alleviate the data sparsity that is widely exhibited in social media language. Experiments on three datasets collected from English and Chinese social media platforms show that our model significantly outperforms both extraction and generation models that do not exploit latent topics. Further discussions show that our model learns meaningful topics, which explains its superiority in social media keyphrase generation.

Automatically Identifying Complaints in Social Media

Complaining is a basic speech act regularly used in human and computer mediated communication to express a negative mismatch between reality and expectations in a particular situation. Automatically identifying complaints in social media is of utmost importance for organizations or brands to improve the customer experience or in developing dialogue systems for handling and responding to complaints. In this paper, we introduce the first systematic analysis of complaints in computational linguistics. We collect a new annotated data set of written complaints expressed in English on Twitter (data and code are available here: https://…/complaints-social-media). We present an extensive linguistic analysis of complaining as a speech act in social media and train strong feature-based and neural models of complaints across nine domains achieving a predictive performance of up to 79 F1 using distant supervision.

A Survey of Reinforcement Learning Informed by Natural Language

To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand. Recent advances in representation learning for language make it possible to build models that acquire world knowledge from text corpora and integrate this knowledge into downstream decision making problems. We thus argue that the time is right to investigate a tight integration of natural language understanding into RL in particular. We survey the state of the field, including work on instruction following, text games, and learning from textual domain knowledge. Finally, we call for the development of new environments as well as further investigation into the potential uses of recent Natural Language Processing (NLP) techniques for such tasks.

Evaluating the Robustness of Nearest Neighbor Classifiers: A Primal-Dual Perspective

We study the problem of computing the minimum adversarial perturbation of the Nearest Neighbor (NN) classifiers. Previous attempts either conduct attacks on continuous approximations of NN models or search for the perturbation by some heuristic methods. In this paper, we propose the first algorithm that is able to compute the minimum adversarial perturbation. The main idea is to formulate the problem as a list of convex quadratic programming (QP) problems that can be efficiently solved by the proposed algorithms for 1-NN models. Furthermore, we show that dual solutions for these QP problems could give us a valid lower bound of the adversarial perturbation that can be used for formal robustness verification, giving us a nice view of attack/verification for NN models. For K-NN models with larger K, we show that the same formulation can help us efficiently compute the upper and lower bounds of the minimum adversarial perturbation, which can be used for attack and verification.

If you did not already know

Ensemble Actor-Critic (EAC)
We propose a new policy iteration theory as an important extension of soft policy iteration and Soft Actor-Critic (SAC), one of the most efficient model free algorithms for deep reinforcement learning. Supported by the new theory, arbitrary entropy measures that generalize Shannon entropy, such as Tsallis entropy and Renyi entropy, can be utilized to properly randomize action selection while fulfilling the goal of maximizing expected long-term rewards. Our theory gives birth to two new algorithms, i.e., Tsallis entropy Actor-Critic (TAC) and Renyi entropy Actor-Critic (RAC). Theoretical analysis shows that these algorithms can be more effective than SAC. Moreover, they pave the way for us to develop a new Ensemble Actor-Critic (EAC) algorithm in this paper that features the use of a bootstrap mechanism for deep environment exploration as well as a new value-function based mechanism for high-level action selection. Empirically we show that TAC, RAC and EAC can achieve state-of-the-art performance on a range of benchmark control tasks, outperforming SAC and several cutting-edge learning algorithms in terms of both sample efficiency and effectiveness. …
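As a small illustration of the entropy measures named above, the sketch below computes Shannon, Tsallis and Renyi entropy for a discrete action distribution, using their standard textbook definitions. The function names and the example distribution are hypothetical, and the exact parameterization used inside TAC and RAC may differ from this one.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy (natural log) of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                                  # treat 0 * log(0) as 0
    return float(-np.sum(p * np.log(p)))

def tsallis_entropy(p, q):
    """Tsallis entropy with index q; recovers Shannon entropy as q -> 1."""
    if np.isclose(q, 1.0):
        return shannon_entropy(p)
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha; recovers Shannon entropy as alpha -> 1."""
    if np.isclose(alpha, 1.0):
        return shannon_entropy(p)
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

# Hypothetical policy over three actions.
policy = [0.5, 0.25, 0.25]
h_shannon = shannon_entropy(policy)
h_tsallis = tsallis_entropy(policy, q=2.0)
h_renyi = renyi_entropy(policy, alpha=2.0)
```

Both generalizations reduce to Shannon entropy in the limit where their index approaches 1, which is the sense in which they generalize it.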

Robust Regression Extended with Ensemble Loss Function (RELF)
Ensemble techniques are powerful approaches that combine several weak learners to build a stronger one. As a meta-learning framework, ensemble techniques can easily be applied to many machine learning methods. Inspired by ensemble techniques, in this paper we propose an ensemble loss function applied to a simple regressor. We then propose a half-quadratic learning algorithm in order to find the parameter of the regressor and the optimal weights associated with each loss function. Moreover, we show that our proposed loss function is robust in noisy environments. For a particular class of loss functions, we show that our proposed ensemble loss function is Bayes consistent and robust. Experimental evaluations on several datasets demonstrate that our proposed ensemble loss function significantly improves the performance of a simple regressor in comparison with state-of-the-art methods. …

StartNet
We propose StartNet to address Online Detection of Action Start (ODAS) where action starts and their associated categories are detected in untrimmed, streaming videos. Previous methods aim to localize action starts by learning feature representations that can directly separate the start point from its preceding background. It is challenging due to the subtle appearance difference near the action starts and the lack of training data. Instead, StartNet decomposes ODAS into two stages: action classification (using ClsNet) and start point localization (using LocNet). ClsNet focuses on per-frame labeling and predicts action score distributions online. Based on the predicted action scores of the past and current frames, LocNet conducts class-agnostic start detection by optimizing long-term localization rewards using policy gradient methods. The proposed framework is validated on two large-scale datasets, THUMOS’14 and ActivityNet. The experimental results show that StartNet significantly outperforms the state-of-the-art by 15%-30% p-mAP under the offset tolerance of 1-10 seconds on THUMOS’14, and achieves comparable performance on ActivityNet with 10 times smaller time offset. …

Sub-LInear Deep Learning Engine (SLIDE)
Deep Learning (DL) algorithms are the central focus of modern machine learning systems. As data volumes keep growing, it has become customary to train large neural networks with hundreds of millions of parameters with enough capacity to memorize these volumes and obtain state-of-the-art accuracy. To get around the costly computations associated with large models and data, the community is increasingly investing in specialized hardware for model training. However, with the end of Moore’s law, there is a limit to such scaling. The progress on the algorithmic front has failed to demonstrate a direct advantage over powerful hardware such as NVIDIA-V100 GPUs. This paper provides an exception. We propose SLIDE (Sub-LInear Deep learning Engine) that uniquely blends smart randomized algorithms, which drastically reduce the computation during both training and inference, with simple multi-core parallelism on a modest CPU. SLIDE is an auspicious illustration of the power of smart randomized algorithms over CPUs in outperforming the best available GPU with an optimized implementation. Our evaluations on large industry-scale datasets, with some large fully connected architectures, show that training with SLIDE on a 44 core CPU is more than 2.7 times (2 hours vs. 5.5 hours) faster than the same network trained using Tensorflow on Tesla V100 at any given accuracy level. We provide codes and benchmark scripts for reproducibility. …

Document worth reading: “Shannon’s entropy and its Generalizations towards Statistics, Reliability and Information Science during 1948-2018”

Starting from the pioneering works of Shannon and Wiener in 1948, a plethora of works have been reported on entropy in different directions. Entropy-related review work in the direction of statistics, reliability and information science, to the best of our knowledge, has not been reported so far. Here we have tried to collect all possible works in this direction during the period 1948-2018 so that people interested in entropy, especially new researchers, can benefit. Shannon’s entropy and its Generalizations towards Statistics, Reliability and Information Science during 1948-2018

Whats new on arXiv – Complete List

How Different Is It Between Machine-Generated and Developer-Provided Patches? An Empirical Study on The Correct Patches Generated by Automated Program Repair Techniques
Recovering Variable Names for Minified Code with Usage Contexts
TickTalk — Timing API for Dynamically Federated Cyber-Physical Systems
Optimal Exploitation of Clustering and History Information in Multi-Armed Bandit
Intrinsic Stability: Global Stability of Dynamical Networks and Switched Systems Resilient to any Type of Time-Delays
Collage Inference: Achieving low tail latency during distributed image classification using coded redundancy models
Understanding Generalization through Visualizations
Sparse Variational Inference: Bayesian Coresets from Scratch
Detecting the Starting Frame of Actions in Video
Real or Fake? Learning to Discriminate Machine from Human Generated Text
Unsupervised Feature Learning with K-means and An Ensemble of Deep Convolutional Neural Networks for Medical Image Classification
Global Semantic Description of Objects based on Prototype Theory
Strategies to architect AI Safety: Defense to guard AI from Adversaries
Learning Individual Treatment Effects from Networked Observational Data
This Email Could Save Your Life: Introducing the Task of Email Subject Line Generation
Forward and Backward Knowledge Transfer for Sentiment Classification
Simultaneous Classification and Novelty Detection Using Deep Neural Networks
Guidelines for Responsible and Human-Centered Use of Explainable Machine Learning
Four Things Everyone Should Know to Improve Batch Normalization
A Survey on Neural Network Language Models
Learned Sectors: A fundamentals-driven sector reclassification project
Secrets of the Brain: An Introduction to the Brain Anatomical Structure and Biological Function
Predicting Global Variations in Outdoor PM2.5 Concentrations using Satellite Images and Deep Convolutional Neural Networks
Specialness and the Bose representation
Conics in Baer subplanes
Characterising elliptic solids of $Q(4,q)$, $q$ even
Reliable training and estimation of variance networks
Token-Curated Registry with Citation Graph
A differential game analysis of R&D in oligopoly with differentiated goods under general demand and cost functions: Bertrand vs. Cournot
Solving Electrical Impedance Tomography with Deep Learning
Visual Backpropagation
PDE Traffic Observer Validated on Freeway Data
Ultra-Wideband Air-to-Ground Propagation Channel Characterization in an Open Area
Smart IoT Cameras for Crowd Analysis based on augmentation for automatic pedestrian detection, simulation and annotation
Learning Temporal Pose Estimation from Sparsely-Labeled Videos
Cormorant: Covariant Molecular Neural Networks
Acceleration of Radiation Transport Solves Using Artificial Neural Networks
Hidden Convexity in the l0 Pseudonorm
A Class of Analytic Solutions for Verification and Convergence Analysis of Linear and Nonlinear Fluid-Structure Interaction Algorithms
Synchronization of complex human networks
PHiSeg: Capturing Uncertainty in Medical Image Segmentation
Enhanced Optimization with Composite Objectives and Novelty Pulsation
Theory of a Planckian metal
Deep Robust Single Image Depth Estimation Neural Network Using Scene Understanding
When and Why Metaheuristics Researchers Can Ignore ‘No Free Lunch’ Theorems
Latent feature disentanglement for 3D meshes
Effectiveness of Equalized Odds for Fair Classification under Imperfect Group Information
Dynamic First Price Auctions Robust to Heterogeneous Buyers
Adaptive Nonparametric Variational Autoencoder
Peter-Weyl, Howe and Schur-Weyl theorems for current groups
On the Transfer of Inductive Bias from Simulation to the Real World: a New Disentanglement Dataset
Assessing incrementality in sequence-to-sequence models
PyramNet: Point Cloud Pyramid Attention Network and Graph Embedding Module for Classification and Segmentation
Clustering Degree-Corrected Stochastic Block Model with Outliers
An optimal transport problem with backward martingale constraints motivated by insider trading
Adversarial Examples for Non-Parametric Methods: Attacks, Defenses and Large Sample Limits
Vandermondes in superspace
Optimal Transport Relaxations with Application to Wasserstein GANs
Efficient non-conjugate Gaussian process factor models for spike count data using polynomial approximations
An Approximate Restricted Likelihood Ratio Test for Variance Components in Generalized Linear Mixed Models
Empirical Likelihood for Contextual Bandits
Approximately Strategyproof Tournament Rules: On Large Manipulating Sets and Cover-Consistence
Nonlinear Pose Filters on the Special Euclidean Group SE(3) with Guaranteed Transient and Steady-state Performance
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Manifold Matching Complexes
Efficient Project Gradient Descent for Ensemble Adversarial Attack
Benchmarking Minimax Linkage
Extension of Rough Set Based on Positive Transitive Relation
Dissecting Content and Context in Argumentative Relation Analysis
A Naive Bayes Approach for NFL Passing Evaluation using Tracking Data Extracted from Images
A splitting theorem for ordered hypergraphs
Increasing Transparent and Accountable Use of Data by Quantifying the Actual Privacy Risk in Interactive Record Linkage
When Unseen Domain Generalization is Unnecessary? Rethinking Data Augmentation
Classifying the reported ability in clinical mobility descriptions
Video Modeling with Correlation Networks
Watch, Try, Learn: Meta-Learning from Demonstrations and Reward
Structural Decompositions for End-to-End Relighting
Deep Contextualized Biomedical Abbreviation Expansion
Robust Bi-Tempered Logistic Loss Based on Bregman Divergences
Partially Linear Additive Gaussian Graphical Models
TransNet: A deep network for fast detection of common shot transitions
Online Forecasting of Total-Variation-bounded Sequences
Using learned optimizers to make models robust to input noise
Estimation Rates for Sparse Linear Cyclic Causal Models
Lift Up and Act! Classifier Performance in Resource-Constrained Applications
Clinical Concept Extraction for Document-Level Coding
S-ConvNet: A Shallow Convolutional Neural Network Architecture for Neuromuscular Activity Recognition Using Instantaneous High-Density Surface EMG Images
On the Leaders’ Graphical Characterization for Controllability of Path Related Graphs
A Characterization of $q$-binomials and its Application to Coding Theory
A Novel Modeling Approach for All-Dielectric Metasurfaces Using Deep Neural Networks
Physical implementation of quantum nonparametric learning with trapped ions
Detection and Prediction of Users Attitude Based on Real-Time and Batch Sentiment Analysis of Facebook Comments
Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling
A Ride-Matching Strategy For Large Scale Dynamic Ridesharing Services Based on Polar Coordinates
Bayesian parametric analytic continuation of Green’s functions
Making targeted black-box evasion attacks effective and efficient
The regulator problem for the one-dimensional Schrodinger equation via the backstepping approach
Optimal Convergence for Stochastic Optimization with Multiple Expectation Constraints
Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis
A Coarse-to-Fine Framework for Learned Color Enhancement with Non-Local Attention
Secure Beamforming in MISO NOMA Backscatter Device Aided Symbiotic Radio Networks
A Two-Step Graph Convolutional Decoder for Molecule Generation
Asymptotically Optimal Change Point Detection for Composite Hypothesis in State Space Models
Class-specific Differential Detection in Diffractive Optical Neural Networks Improves Inference Accuracy
A Component-Based Approach to Traffic Data Wrangling
Lifschitz tail for alloy-type models driven by the fractional Laplacian
GSI: GPU-friendly Subgraph Isomorphism
Asymptotic Formulas for Empirical Measures of (Reflecting) Diffusion Processes on Riemannian Manifolds
News Labeling as Early as Possible: Real or Fake?
Convergence in Density of Splitting AVF Scheme for Stochastic Langevin Equation
Modified symmetry technique for mitigation of flow leak near corners for compressible inviscid fluid flow
Impact of temporal connectivity patterns on epidemic process
A Note on the Mean Residual Life Function of the Cantor Distribution
Defending against Adversarial Attacks through Resilient Feature Regeneration
Adversarial Mahalanobis Distance-based Attentive Song Recommender for Automatic Playlist Continuation
Algebra of Concurrent Games
Sensitivity of Deep Convolutional Networks to Gabor Noise
Control-guided Communication: Efficient Resource Arbitration and Allocation in Multi-hop Wireless Control Systems
Resource Management optimally in Non-Orthogonal Multiple Access Networks for fifth-generation by using game-theoretic
3DFPN-HS$^2$: 3D Feature Pyramid Network Based High Sensitivity and Specificity Pulmonary Nodule Detection
A gradual, semi-discrete approach to generative network training via explicit wasserstein minimization
Linear Optimization of Polynomials and Rational Functions over Boxes
5 Parallel Prism: A topology for pipelined implementations of convolutional neural networks using computational memory
Learning Radiative Transfer Models for Climate Change Applications in Imaging Spectroscopy
On statistical Calderón problems
Finding a Generator Matrix of a Multidimensional Cyclic Code
Improving Low-Resource Cross-lingual Document Retrieval by Reranking with Deep Bilingual Representations
Neurogeometry of perception: isotropic and anisotropic aspects
Making Asynchronous Stochastic Gradient Descent Work for Transformers
ML-LOO: Detecting Adversarial Examples with Feature Attribution
Positroid varieties and cluster algebras
Attending to Discriminative Certainty for Domain Adaptation
Logarithmic entanglement growth in two-dimensional disordered fermionic systems
Convolutional Bipartite Attractor Networks
Sentence Centrality Revisited for Unsupervised Summarization
Mastery Learning-Like Teaching with Achievements
Adaptive Two-stage Stochastic Programming with an Application to Capacity Expansion Planning
DiCENet: Dimension-wise Convolutions for Efficient Networks
Maximum Weighted Loss Discrepancy
Domain Adaptive Dialog Generation via Meta Learning
Inductive Logic Programming via Differentiable Deep Neural Logic Networks
Pattern-Affinitive Propagation across Depth, Surface Normal and Semantic Segmentation
Provably Robust Boosted Decision Stumps and Trees against Adversarial Attacks
Reducing the variance in online optimization by transporting past gradients
Linear Dimension Reduction Approximately Preserving a Function of the 1-Norm
Seeing Things from a Different Angle: Discovering Diverse Perspectives about Claims
Structure from Motion for Panorama-Style Videos
apricot: Submodular selection for data summarization in Python
Dynamic Mode Decomposition and Sparse Measurements for Characterization and Monitoring of Power System Disturbances
In Situ Cane Toad Recognition
Concentration inequalities in spaces of random configurations with positive Ricci curvatures
Toward Solving 2-TBSG Efficiently
The Implicit Bias of AdaGrad on Separable Data
Cross-view Semantic Segmentation for Sensing Surroundings
Referring Expression Grounding by Marginalizing Scene Graph Likelihood
Beyond Adversarial Training: Min-Max Optimization in Adversarial Attack and Defense
A Low Rank Gaussian Process Prediction Model for Very Large Datasets
Optimal Task Offloading and Resource Allocation for Fog Computing
Learning Deep Multi-Level Similarity for Thermal Infrared Object Tracking
Transfer Learning by Modeling a Distribution over Policies
Robust conditional GANs under missing or uncertain labels
Stochastic In-Face Frank-Wolfe Methods for Non-Convex Optimization and Sparse Neural Network Training
A Hierarchical Network for Diverse Trajectory Proposals
A State-of-the-Art Survey on Multidimensional Scaling Based Localization Techniques
Dynamic Network Embedding via Incremental Skip-gram with Negative Sampling
rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method
Region of Attraction for Power Systems using Gaussian Process and Converse Lyapunov Function — Part I: Theoretical Framework and Off-line Study
Verifying fundamental solution groups for lossless wave equations via stationary action and optimal control
Quadratic Suffices for Over-parametrization via Matrix Chernoff Bound
Federated AI lets a team imagine together: Federated Learning of GANs
What and Where to Translate: Local Mask-based Image-to-Image Translation
High-dimensional limit theorems for random vectors in $\ell_p^n$-balls. II
Optimal Control for Controllable Stochastic Linear Systems
On Copula-based Collective Risk Models
Semi-supervised Complex-valued GAN for Polarimetric SAR Image Classification
Pixel DAG-Recurrent Neural Network for Spectral-Spatial Hyperspectral Image Classification
Probing for Semantic Classes: Diagnosing the Meaning Content of Word Embeddings
Distilling Object Detectors with Fine-grained Feature Imitation
On the Vulnerability of Capsule Networks to Adversarial Attacks
Analysis of a Poisson-picking symmetric winners-take-all game with randomized payoffs

Whats new on arXiv

TickTalk — Timing API for Dynamically Federated Cyber-Physical Systems

Although timing and synchronization of a dynamically-changing set of elements and their related power considerations are essential to many cyber-physical systems (CPS), they are absent from today’s programming languages, forcing programmers to handle these matters outside of the language and on a case-by-case basis. This paper proposes a framework for adding time-related concepts to languages. Complementing prior work in this area, this paper develops the notion of dynamically federated islands of variable-precision synchronization and coordinated entities through synergistic activities at the language, system, network, and device levels. At the language level, we explore constructs that capture key timing and synchronization concepts and, at the system level, we propose a flexible intermediate language that represents both program logic and timing constraints together with run-time mechanisms. At the network level, we argue for architectural extensions that permit the network to act as a combined computing, communication, storage, and synchronization platform and at the device level, we explore architectural concepts that can lead to greater interoperability, easy establishment of timing constraints, and more power-efficient designs.

Optimal Exploitation of Clustering and History Information in Multi-Armed Bandit

We consider the stochastic multi-armed bandit problem and the contextual bandit problem with historical observations and pre-clustered arms. The historical observations can contain any number of instances for each arm, and the pre-clustering information is a fixed clustering of arms provided as part of the input. We develop a variety of algorithms which incorporate this offline information effectively during the online exploration phase and derive their regret bounds. In particular, we develop the META algorithm which effectively hedges between two other algorithms: one which uses both historical observations and clustering, and another which uses only the historical observations. The former outperforms the latter when the clustering quality is good, and vice-versa. Extensive experiments on synthetic and real-world datasets on Warfarin drug dosage and web server selection for latency minimization validate our theoretical insights and demonstrate that META is a robust strategy for optimally exploiting the pre-clustering information.
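
As a rough illustration of how offline observations can feed into online exploration, here is a minimal sketch that seeds a standard UCB1 index with historical rewards. It is not the META algorithm from the paper and ignores clustering entirely; the function name `ucb1_with_history`, the `pull` callback, and the `history` mapping are hypothetical names chosen for this example, and rewards are assumed to lie in [0, 1].

```python
import math


def ucb1_with_history(pull, n_arms, horizon, history=None):
    """UCB1 whose per-arm statistics are seeded with offline observations.

    pull(arm) returns a reward in [0, 1] (assumption of this sketch).
    history maps arm index -> list of previously observed rewards.
    Returns (online_pulls, counts), where counts include the history.
    """
    history = history or {}
    # Historical observations are treated exactly like earlier online pulls.
    counts = [len(history.get(a, [])) for a in range(n_arms)]
    sums = [float(sum(history.get(a, []))) for a in range(n_arms)]
    online_pulls = [0] * n_arms

    for _ in range(horizon):
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            # An arm with no data at all (offline or online) is tried first.
            arm = untried[0]
        else:
            total = sum(counts)
            # Empirical mean plus the usual UCB1 exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(total) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
        online_pulls[arm] += 1
    return online_pulls, counts
```

With enough history on a poor arm, the sketch spends fewer online pulls re-exploring it, which is the intuition behind using offline data to reduce regret.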

Intrinsic Stability: Global Stability of Dynamical Networks and Switched Systems Resilient to any Type of Time-Delays

In real-world networks the interactions between network elements are inherently time-delayed. These time-delays can not only slow the network but can have a destabilizing effect on the network’s dynamics leading to poor performance. The same is true in computational networks used for machine learning etc. where time-delays increase the network’s memory but can degrade the network’s ability to be trained. However, not all networks can be destabilized by time-delays. Previously, it has been shown that if a network or high-dimensional dynamical system is intrinsically stable, which is a stronger form of the standard notion of global stability, then it maintains its stability when constant time-delays are introduced into the system. Here we show that intrinsically stable systems, including intrinsically stable networks and a broad class of switched systems, i.e. systems whose mapping is time-dependent, remain stable in the presence of any type of time-varying time-delays whether these delays are periodic, stochastic, or otherwise. We apply these results to a number of well-studied systems to demonstrate that the notion of intrinsic stability is both computationally inexpensive, relative to other methods, and can be used to improve on some of the best known stability results. We also show that the asymptotic state of an intrinsically stable switched system is exponentially independent of the system’s initial conditions.

Collage Inference: Achieving low tail latency during distributed image classification using coded redundancy models

Reducing the latency variance in machine learning inference is a key requirement in many applications. Variance is harder to control in a cloud deployment in the presence of stragglers. In spite of this challenge, inference is increasingly being done in the cloud, due to the advent of affordable machine learning as a service (MLaaS) platforms. Existing approaches to reduce variance rely on replication which is expensive and partially negates the affordability of MLaaS. In this work, we argue that MLaaS platforms also provide unique opportunities to cut the cost of redundancy. In MLaaS platforms, multiple inference requests are concurrently received by a load balancer which can then create a more cost-efficient redundancy coding across a larger collection of images. We propose a novel convolutional neural network model, Collage-CNN, to provide a low-cost redundancy framework. A Collage-CNN model takes a collage formed by combining multiple images and performs multi-image classification in one shot, albeit at slightly lower accuracy. We then augment a collection of traditional single image classifiers with a single Collage-CNN classifier which acts as a low-cost redundant backup. Collage-CNN then provides backup classification results if a single image classification straggles. Deploying the Collage-CNN models in the cloud, we demonstrate that the 99th percentile tail latency of inference can be reduced by 1.47X compared to replication based approaches while providing high accuracy. Also, variation in inference latency can be reduced by 9X with a slight increase in average inference latency.

Understanding Generalization through Visualizations

The power of neural networks lies in their ability to generalize to unseen data, yet the underlying reasons for this phenomenon remain elusive. Numerous rigorous attempts have been made to explain generalization, but available bounds are still quite loose, and analysis does not always lead to true understanding. The goal of this work is to make generalization more intuitive. Using visualization methods, we discuss the mystery of generalization, the geometry of loss landscapes, and how the curse (or, rather, the blessing) of dimensionality causes optimizers to settle into minima that generalize well.

Sparse Variational Inference: Bayesian Coresets from Scratch

The proliferation of automated inference algorithms in Bayesian statistics has provided practitioners newfound access to fast, reproducible data analysis and powerful statistical models. Designing automated methods that are also both computationally scalable and theoretically sound, however, remains a significant challenge. Recent work on Bayesian coresets takes the approach of compressing the dataset before running a standard inference algorithm, providing both scalability and guarantees on posterior approximation error. But the automation of past coreset methods is limited because they depend on the availability of a reasonable coarse posterior approximation, which is difficult to specify in practice. In the present work we remove this requirement by formulating coreset construction as sparsity-constrained variational inference within an exponential family. This perspective leads to a novel construction via greedy optimization, and also provides a unifying information-geometric view of present and past methods. The proposed Riemannian coreset construction algorithm is fully automated, requiring no inputs aside from the dataset, probabilistic model, desired coreset size, and sample size used for Monte Carlo estimates. In addition to being easier to use than past methods, experiments demonstrate that the proposed algorithm achieves state-of-the-art Bayesian dataset summarization.

Detecting the Starting Frame of Actions in Video

To understand causal relationships between events in the world, it is useful to pinpoint when actions occur in videos and to examine the state of the world at and around that time point. For example, one must accurately detect the start of an audience response — laughter in a movie, cheering at a sporting event — to understand the cause of the reaction. In this work, we focus on the problem of accurately detecting action starts rather than isolated events or action ends. We introduce a novel structured loss function based on matching predictions to true action starts that is tailored to this problem; it more heavily penalizes extra and missed action start detections over small misalignments. Recurrent neural networks are used to minimize a differentiable approximation of this loss. To evaluate these methods, we introduce the Mouse Reach Dataset, a large, annotated video dataset of mice performing a sequence of actions. The dataset was labeled by experts for the purpose of neuroscience research on causally relating neural activity to behavior. On this dataset, we demonstrate that the structured loss leads to significantly higher accuracy than a baseline of mean-squared error loss.

Real or Fake? Learning to Discriminate Machine from Human Generated Text

Recent advances in generative modeling of text have demonstrated remarkable improvements in terms of fluency and coherency. In this work we investigate to what extent a machine can discriminate real from machine-generated text. This is important in itself for automatic detection of computer-generated stories, but can also serve as a tool for further improving text generation. We show that learning a dedicated scoring function to discriminate between real and fake text achieves higher precision than employing the likelihood of a generative model. The scoring functions generalize to generators other than those used for training as long as these generators have comparable model complexity and are trained on similar datasets.

Unsupervised Feature Learning with K-means and An Ensemble of Deep Convolutional Neural Networks for Medical Image Classification

Medical image analysis using supervised deep learning methods remains problematic because of the reliance of deep learning methods on large amounts of labelled training data. Although medical imaging data repositories continue to expand there has not been a commensurate increase in the amount of annotated data. Hence, we propose a new unsupervised feature learning method that learns feature representations to then differentiate dissimilar medical images using an ensemble of different convolutional neural networks (CNNs) and K-means clustering. It jointly learns feature representations and clustering assignments in an end-to-end fashion. We tested our approach on a public medical dataset and show its accuracy was better than state-of-the-art unsupervised feature learning methods and comparable to state-of-the-art supervised CNNs. Our findings suggest that our method could be used to tackle the issue of the large volume of unlabelled data in medical imaging repositories.

Global Semantic Description of Objects based on Prototype Theory

In this paper, we introduce a novel semantic description approach inspired by the foundations of Prototype Theory. We propose a Computational Prototype Model (CPM) that encodes and stores the central semantic meaning of an object category: the semantic prototype. Also, we introduce a Prototype-based Description Model that encodes the semantic meaning of an object while describing its features using our CPM model. Our description method uses semantic prototypes computed by CNN classification models to create discriminative signatures that describe an object, highlighting its most distinctive features within the category. Our experiments show that: i) our CPM model (semantic prototype + distance metric) is able to describe the internal semantic structure of object categories; ii) our semantic distance metric can be understood as the object's visual typicality score within a category; iii) our descriptor encoding is semantically interpretable and significantly outperforms other global image encodings in clustering and classification tasks.

Strategies to architect AI Safety: Defense to guard AI from Adversaries

The impact of designing for security of AI is critical for humanity in the AI era. With humans increasingly becoming dependent upon AI, there is a need for neural networks that work reliably in spite of adversarial attacks. The vision of safe and secure AI for popular use is achievable. To achieve safety of AI, this paper explores strategies and a novel deep learning architecture. To guard AI from adversaries, the paper explores a combination of 3 strategies: 1. Introduce randomness at inference time to hide the representation learning from adversaries. 2. Detect the presence of adversaries by analyzing the sequence of inferences. 3. Exploit visual similarity. To realize these strategies, this paper designs a novel architecture, Dynamic Neural Defense (DND). This defense has 3 deep learning architectural features: 1. By hiding the way a neural network learns from exploratory attacks using a random computation graph, DND evades attack. 2. By analyzing the input sequence to a cloud AI inference engine with an LSTM, DND detects attack sequences. 3. By inferring with visually similar inputs generated by a VAE, any AI defended by the DND approach does not succumb to hackers. Thus, a roadmap to develop reliable, safe and secure AI is presented.

Learning Individual Treatment Effects from Networked Observational Data

With convenient access to observational data, learning individual causal effects from such data draws more attention in many influential research areas such as economics, healthcare, and education. For example, we aim to study how a medicine (treatment) would affect the health condition (outcome) of a certain patient. To validate causal inference from observational data, we need to control the influence of confounders – the variables which causally influence both the treatment and the outcome. Along this line, existing work for learning individual treatment effect overwhelmingly relies on the assumption that there are no hidden confounders. However, in real-world observational data, this assumption is often untenable. An important fact ignored by such work is that observational data can come with network information that can be utilized to infer hidden confounders. For example, in an observational study of the individual treatment effect of a medicine, instead of randomized experiments, the medicine is assigned to individuals based on a series of factors. Some factors (e.g., socioeconomic status) are hard to measure directly and therefore become hidden confounders of observational datasets. Fortunately, the socioeconomic status of an individual can be reflected by whom she is connected to in social networks. With this fact in mind, we aim to exploit the network structure to recognize patterns of hidden confounders in the task of learning individual treatment effects from observational data. In this work, we propose a novel causal inference framework, the network deconfounder, which learns representations of confounders by unraveling patterns of hidden confounders from the network structure between instances of observational data. Empirically, we perform extensive experiments to validate the effectiveness of the network deconfounder on various datasets.

This Email Could Save Your Life: Introducing the Task of Email Subject Line Generation

Given the overwhelming number of emails, an effective subject line becomes essential to better inform the recipient of the email’s content. In this paper, we propose and study the task of email subject line generation: automatically generating an email subject line from the email body. We create the first dataset for this task and find that email subject line generation favors extremely abstractive summaries, which differentiates it from news headline generation or news single-document summarization. We then develop a novel deep learning method and compare it to several baselines as well as recent state-of-the-art text summarization systems. We also investigate the efficacy of several automatic metrics based on correlations with human judgments and propose a new automatic evaluation metric. Our system outperforms competitive baselines given both automatic and human evaluations. To our knowledge, this is the first work to tackle the problem of effective email subject line generation.

Forward and Backward Knowledge Transfer for Sentiment Classification

This paper studies the problem of learning a sequence of sentiment classification tasks. The learned knowledge from each task is retained and used to help future or subsequent task learning. This learning paradigm is called Lifelong Learning (LL). However, existing LL methods either only transfer knowledge forward to help future learning and do not go back to improve the model of a previous task or require the training data of the previous task to retrain its model to exploit backward/reverse knowledge transfer. This paper studies reverse knowledge transfer of LL in the context of naive Bayesian (NB) classification. It aims to improve the model of a previous task by leveraging future knowledge without retraining using its training data. This is done by exploiting a key characteristic of the generative model of NB. That is, it is possible to improve the NB classifier for a task by improving its model parameters directly by using the retained knowledge from other tasks. Experimental results show that the proposed method markedly outperforms existing LL baselines.

Simultaneous Classification and Novelty Detection Using Deep Neural Networks

Deep neural networks have achieved great success in classification tasks in recent years. However, one major obstacle on the path towards artificial intelligence is the inability of neural networks to accurately detect novel class distributions; as a result, most of the classification algorithms proposed make the assumption that all classes are known prior to the training stage. In this work, we propose a methodology for training a neural network that allows it to efficiently detect novel class distributions without compromising much of its classification accuracy on the test examples of known classes. Experimental results on the CIFAR 100 and MiniImagenet data sets demonstrate the effectiveness of the proposed algorithm. The way this method was constructed also makes it suitable for training any classification algorithm that is based on Maximum Likelihood methods.

Guidelines for Responsible and Human-Centered Use of Explainable Machine Learning

Explainable machine learning (ML) has been implemented in numerous open source and proprietary software packages and explainable ML is an important aspect of commercial predictive modeling. However, explainable ML can be misused, particularly as a faulty safeguard for harmful black-boxes, e.g. fairwashing, and for other malevolent purposes like model stealing. This text discusses definitions, examples, and guidelines that promote a holistic and human-centered approach to ML which includes interpretable (i.e. white-box) models and explanatory, debugging, and disparate impact analysis techniques.

Four Things Everyone Should Know to Improve Batch Normalization

A key component of most neural network architectures is the use of normalization layers, such as Batch Normalization. Despite its common use and large utility in optimizing deep architectures that are otherwise intractable, it has been challenging both to generically improve upon Batch Normalization and to understand specific circumstances that lend themselves to other enhancements. In this paper, we identify four improvements to the generic form of Batch Normalization and the circumstances under which they work, yielding performance gains across all batch sizes while requiring no additional computation during training. These contributions include proposing a method for reasoning about the current example in inference normalization statistics which fixes a training vs. inference discrepancy; recognizing and validating the powerful regularization effect of Ghost Batch Normalization for small and medium batch sizes; examining the effect of weight decay regularization on the scaling and shifting parameters; and identifying a new normalization algorithm for very small batch sizes by combining the strengths of Batch and Group Normalization. We validate our results empirically on four datasets: CIFAR-100, SVHN, Caltech-256, and ImageNet.
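
To make the Ghost Batch Normalization idea mentioned above concrete, here is a minimal sketch, assuming the usual description of the technique: normalization statistics are computed over small virtual "ghost" batches rather than over the full batch. It handles a single feature in training mode only, with no learnable scale and shift and no running statistics; the function names are hypothetical and this is not the paper's implementation.

```python
import math


def batch_norm_1d(values, eps=1e-5):
    """Normalize one feature across a batch to zero mean, unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def ghost_batch_norm_1d(values, ghost_size, eps=1e-5):
    """Apply batch normalization independently to each ghost batch.

    The full batch is split into consecutive chunks of ghost_size
    examples, and each chunk uses only its own mean and variance.
    """
    if len(values) % ghost_size != 0:
        raise ValueError("batch size must be a multiple of ghost_size")
    out = []
    for start in range(0, len(values), ghost_size):
        out.extend(batch_norm_1d(values[start:start + ghost_size], eps))
    return out
```

Because each chunk sees noisier statistics than the full batch would give, the normalization itself injects noise during training, which is the regularization effect the abstract refers to.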

A Survey on Neural Network Language Models

As a core component of Natural Language Processing (NLP) systems, a Language Model (LM) can provide word representations and probability estimates for word sequences. Neural Network Language Models (NNLMs) overcome the curse of dimensionality and improve on the performance of traditional LMs. A survey on NNLMs is performed in this paper. The structure of classic NNLMs is described first, and then some major improvements are introduced and analyzed. We summarize and compare corpora and toolkits of NNLMs. Further, some research directions of NNLMs are discussed.
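
For readers unfamiliar with what "probability of a word sequence" means in practice, the following is a minimal sketch of a count-based bigram model with add-one smoothing. This is the kind of traditional LM that the survey contrasts NNLMs with, not an NNLM itself; the function names, the whitespace tokenization, and the `<s>`/`</s>` boundary markers are assumptions of this example.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect unigram and bigram counts from whitespace-tokenized text."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])  # contexts only
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return unigrams, bigrams, len(vocab)


def log_prob(sentence, model):
    """Log-probability of a sentence under the chain rule with bigrams.

    Each factor P(w_i | w_{i-1}) uses add-one (Laplace) smoothing so that
    unseen bigrams get a small non-zero probability.
    """
    unigrams, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
        for prev, word in zip(tokens[:-1], tokens[1:])
    )
```

A sentence whose word order matches the training data scores higher than a scrambled one. Count tables like these grow quickly with vocabulary size and context length, which is the curse of dimensionality that NNLMs are said to overcome.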

If you did not already know

Two-Step Importance Weighting IL (2IWIL)
Imitation learning (IL) aims to learn an optimal policy from demonstrations. However, such demonstrations are often imperfect since collecting optimal ones is costly. To effectively learn from imperfect demonstrations, we propose a novel approach that utilizes confidence scores, which describe the quality of demonstrations. More specifically, we propose two confidence-based IL methods, namely two-step importance weighting IL (2IWIL) and generative adversarial IL with imperfect demonstration and confidence (IC-GAIL). We show that confidence scores given only to a small portion of sub-optimal demonstrations significantly improve the performance of IL both theoretically and empirically. …

Cumulative Spectral Gradient
In this paper, we propose a new measure to gauge the complexity of image classification problems. Given an annotated image dataset, our method computes a complexity measure called the cumulative spectral gradient (CSG) which strongly correlates with the test accuracy of convolutional neural networks (CNN). The CSG measure is derived from the probabilistic divergence between classes in a spectral clustering framework. We show that this metric correlates with the overall separability of the dataset and thus its inherent complexity. As will be shown, our metric can be used for dataset reduction, to assess which classes are more difficult to disentangle, and approximate the accuracy one could expect to get with a CNN. Results obtained on 11 datasets and three CNN models reveal that our method is more accurate and faster than previous complexity measures. …

DeepMutation
Deep learning (DL) defines a new data-driven programming paradigm where the internal system logic is largely shaped by the training data. The standard way of evaluating DL models is to examine their performance on a test dataset. The quality of the test dataset is of great importance to gain confidence of the trained models. Using an inadequate test dataset, DL models that have achieved high test accuracy may still lack generality and robustness. In traditional software testing, mutation testing is a well-established technique for quality evaluation of test suites, which analyzes to what extent a test suite detects the injected faults. However, due to the fundamental difference between traditional software and deep learning-based software, traditional mutation testing techniques cannot be directly applied to DL systems. In this paper, we propose a mutation testing framework specialized for DL systems to measure the quality of test data. To do this, by sharing the same spirit of mutation testing in traditional software, we first define a set of source-level mutation operators to inject faults to the source of DL (i.e., training data and training programs). Then we design a set of model-level mutation operators that directly inject faults into DL models without a training process. Eventually, the quality of test data could be evaluated from the analysis on to what extent the injected faults could be detected. The usefulness of the proposed mutation testing techniques is demonstrated on two public datasets, namely MNIST and CIFAR-10, with three DL models. …

Signuology
Signuology is defined as the study of sets of characteristic predictive signals contained within data in the form of combined features of the data that are characteristic of an observation of interest within the data. The terms data mining and data structure imply rigid and discrete characteristics. A signal has more flexibility, borrowing from ideas contained in the superposition principle in physics. One can take the same data and ask a different question, with a different dependent variable, and find a different signal; the data structure will be the same. Data structure as a high-level concept appears to limit one’s thinking. Feature engineering is an activity within signuology. These signals allow for a flexibility not afforded in the thinking implied by the terms data structure and data mining. …