Magister Dixit

“As organizations adopt machine-learning techniques, they will see immediate competitive advantages in automation, workflow efficiency and human augmentation. Now is the golden time to consider how machine learning could help your organization and implement this science to improve overall efficiency.” Shawn Masters (April 2, 2015)


Book Memo: “The World of Open Data”

Concepts, Methods, Tools and Experiences
This book discusses the latest developments in the field of open data. The opening of data by public organizations has the potential to improve the public sector, inspire business innovation, and establish transparency. With this potential come unique challenges; these developments impact the operation of governments as well as their relationship with private sector enterprises and society. Changes at the technical, organizational, managerial, and political level are taking place, which, in turn, impact policy-making and traditional institutional structures. This book contributes to the systematic analysis and publication of cutting-edge methods, tools, and approaches for more efficient data sharing policies, practices, and further research. Topics discussed include an introduction to open data, the open data landscape, the open data life cycle, open data policies, organizational issues, interoperability, infrastructure, business models, open data portal evaluation, and research directions, best practices, and guidelines. Written to address different perspectives, this book will be of equal interest to students and researchers, ICT industry staff, practitioners, policy makers and public servants.

What's new on arXiv

Forecasting Transportation Network Speed Using Deep Capsule Networks with Nested LSTM Models

Accurate and reliable traffic forecasting for complicated transportation networks is of vital importance to modern transportation management. The complicated spatial dependencies of roadway links and the dynamic temporal patterns of traffic states make it particularly challenging. To address these challenges, we propose a new capsule network (CapsNet) to extract the spatial features of traffic networks and utilize a nested LSTM (NLSTM) structure to capture the hierarchical temporal dependencies in traffic sequence data. A framework for network-level traffic forecasting is also proposed by sequentially connecting CapsNet and NLSTM. On the basis of literature review, our study is the first to adopt CapsNet and NLSTM in the field of traffic forecasting. An experiment on a Beijing transportation network with 278 links shows that the proposed framework with the capability of capturing complicated spatiotemporal traffic patterns outperforms multiple state-of-the-art traffic forecasting baseline models. The superiority and feasibility of CapsNet and NLSTM are also demonstrated, respectively, by visualizing and quantitatively evaluating the experimental results.

Modeling Conceptual Characteristics of Virtual Machines for CPU Utilization Prediction

Cloud services, which give users the flexibility to meet their computing requirements on demand, have grown rapidly in recent years. To allocate computing resources wisely, cloud service providers need to anticipate the future utilization of various resources. This paper focuses on predicting CPU utilization of virtual machines (VMs) in the cloud. We conduct empirical analysis on Microsoft Azure’s VM workloads and identify important conceptual characteristics of CPU utilization among VMs, including locality, periodicity and tendency. We propose a neural network method, named Time-aware Residual Networks (T-ResNet), to model the observed conceptual characteristics with expanded network depth for CPU utilization prediction. We conduct extensive experiments to evaluate the effectiveness of our proposed method and the results show that T-ResNet consistently outperforms baseline approaches in various metrics including RMSE, MAE and MAPE.
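The three error metrics named above have standard definitions, which can be sketched as follows. This is a minimal illustration only: the function names and the sample utilization values are made up for the example and do not come from the paper.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (undefined if any actual value is 0)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical CPU utilization readings (percent) and forecasts, for illustration only.
actual = [50.0, 40.0, 80.0, 20.0]
predicted = [45.0, 44.0, 80.0, 25.0]

print(rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

RMSE penalizes large misses more heavily than MAE, while MAPE expresses the error relative to the true value, which is why the three are often reported together.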

Data Pallets: Containerizing Storage For Reproducibility and Traceability

Trusting simulation output is crucial for Sandia’s mission objectives. We rely on these simulations to perform our high-consequence mission tasks given national treaty obligations. Other science and modeling applications, while they may have high-consequence results, still require the strongest levels of trust to enable using the result as the foundation for both practical applications and future research. To this end, the computing community has developed workflow and provenance systems to aid in both automating simulation and modeling execution as well as determining exactly how some output was created so that conclusions can be drawn from the data. Current approaches for workflows and provenance systems are all at the user level and have little to no system-level support, making them fragile, difficult to use, and incomplete solutions. The introduction of container technology is a first step towards encapsulating and tracking artifacts used in creating data and resulting insights, but their current implementation is focused solely on making it easy to deploy an application in an isolated ‘sandbox’ and maintaining a strictly read-only mode to avoid any potential changes to the application. All storage activities are still using the system-level shared storage. This project explores extending the container concept to include storage as a new container type we call ‘data pallets’. Data Pallets are potentially writeable, auto generated by the system based on IO activities, and usable as a way to link the contained data back to the application and input deck used to create it.

Reasoning From Data in the Mathematical Theory of Evidence

The Mathematical Theory of Evidence (MTE) is known as a foundation for reasoning when knowledge is expressed at various levels of detail. Though much research effort has been committed to this theory since its foundation, many questions remain open. One of the most important open questions seems to be the relationship between frequencies and the Mathematical Theory of Evidence. The theory is criticized for leaving frequencies outside (or aside from) its framework. The seriousness of this accusation is obvious: no experiment may be run to compare the performance of MTE-based models of real world processes against real world data. In this paper we develop a frequentist model of the MTE that refutes the above argument against MTE. We describe how to interpret data in terms of MTE belief functions, how to reason from data about conditional belief functions, how to generate a random sample out of an MTE model, how to derive an MTE model from data and how to compare results of reasoning in an MTE model and reasoning from data. It is claimed in this paper that MTE is suitable to model some types of destructive processes.

A Bayesian Perspective of Statistical Machine Learning for Big Data

Statistical Machine Learning (SML) refers to a body of algorithms and methods by which computers are allowed to discover important features of input data sets which are often very large in size. The very task of feature discovery from data is essentially the meaning of the keyword ‘learning’ in SML. Theoretical justifications for the effectiveness of the SML algorithms are underpinned by sound principles from different disciplines, such as Computer Science and Statistics. The theoretical underpinnings particularly justified by statistical inference methods are together termed statistical learning theory. This paper provides a review of SML from a Bayesian decision-theoretic point of view, in which we argue that many SML techniques are closely connected to making inference by using the so-called Bayesian paradigm. We discuss many important SML techniques such as supervised and unsupervised learning, deep learning, online learning and Gaussian processes especially in the context of very large data sets where these are often employed. We present a dictionary which maps the key concepts of SML from Computer Science and Statistics. We illustrate the SML techniques with three moderately large data sets where we also discuss many practical implementation issues. Thus the review is especially targeted at statisticians and computer scientists who are aspiring to understand and apply SML for moderately large to big data sets.

Reasoning over RDF Knowledge Bases using Deep Learning

Semantic Web knowledge representation standards, and in particular RDF and OWL, often come endowed with a formal semantics which is considered to be of fundamental importance for the field. Reasoning, i.e., the drawing of logical inferences from knowledge expressed in such standards, is traditionally based on logical deductive methods and algorithms which can be proven to be sound and complete and terminating, i.e. correct in a very strong sense. For various reasons, though, in particular, the scalability issues arising from the ever-increasing amounts of Semantic Web data available and the inability of deductive algorithms to deal with noise in the data, it has been argued that alternative means of reasoning should be investigated which bear high promise for high scalability and better robustness. From this perspective, deductive algorithms can be considered the gold standard regarding correctness against which alternative methods need to be tested. In this paper, we show that it is possible to train a Deep Learning system on RDF knowledge graphs, such that it is able to perform reasoning over new RDF knowledge graphs, with high precision and recall compared to the deductive gold standard.

Meet Cyrus – The Query by Voice Mobile Assistant for the Tutoring and Formative Assessment of SQL Learners

Being declarative, SQL stands a better chance at being the programming language for conceptual computing next to natural language programming. We examine the possibility of using SQL as a back-end for natural language database programming. Distinctly from keyword-based SQL querying, keyword dependence and SQL’s table structure constraints are significantly less pronounced in our approach. We present a mobile device voice query interface, called Cyrus, to arbitrary relational databases. Cyrus supports a wide range of query classes, sufficient for an entry-level database class. Cyrus is also application independent, allows test database adaptation, and is not limited to specific sets of keywords or natural language sentence structures. Its cooperative error reporting is more intuitive, and its iOS-based mobile platform is also more accessible compared to most contemporary mobile and voice-enabled systems.

What does it mean for data to be ‘observed’ or ‘missing’?

In statistical modelling of incomplete data, missingness is encoded as a relation between datasets Y and response patterns R. We identify two different meanings of ‘observed’ and ‘missing’ implicit in this framework, only one of which is consistent with the definition formally encoded in (Y, R). Notation that has been used in the literature for more than three decades fails to distinguish between these two concepts, rendering the notations f(\mathbf{y}_{obs}, \mathbf{y}_{mis}) and f(\mathbf{y}_{mis} \mid \mathbf{y}_{obs}) conceptually contradictory. Additionally, the same notation f(\mathbf{y}_{mis} \mid \mathbf{y}_{obs}) is used to refer to two densities with different domains. These densities can be considered to be equivalent mathematically, but conceptually they are not interchangeable as distributions because of their differing relationships to (Y, R). Only one of these distributions is consistent with (Y, R), and standard conventions for interpretation of mathematical notation lead to the wrong choice conceptually for ignorable multiple imputation. We introduce formal definitions and notational improvements to treat these and other ambiguities, and we demonstrate their use through several example derivations.

The Global Convergence of the Alternating Minimization Algorithm for Deep Neural Network Problems

In recent years, stochastic gradient descent (SGD) has been a dominant optimization method for training deep neural networks. However, SGD suffers from several limitations, including a lack of theoretical guarantees, vanishing gradients, poor conditioning and difficulty in solving highly non-smooth constraints and functions, which motivates the development of alternating minimization-based methods for deep neural network optimization. As this is an emerging domain, there are still several challenges to overcome, the major ones being: 1) no guarantee on the global convergence under mild conditions, and 2) low efficiency of computation for the subproblem optimization in each iteration. In this paper, we propose a novel deep learning alternating minimization (DLAM) algorithm to deal with those two challenges. Furthermore, global convergence of our DLAM algorithm is analyzed and guaranteed under mild conditions which are satisfied by commonly-used models. Experiments on real-world datasets demonstrate the effectiveness of our DLAM algorithm.

Adversarially-Trained Normalized Noisy-Feature Auto-Encoder for Text Generation

This article proposes the Adversarially-Trained Normalized Noisy-Feature Auto-Encoder (ATNNFAE) for byte-level text generation. An ATNNFAE consists of an auto-encoder where the internal code is normalized on the unit sphere and corrupted by additive noise. Simultaneously, a replica of the decoder (sharing the same parameters as the AE decoder) is used as the generator and fed with random latent vectors. An adversarial discriminator is trained to distinguish training samples reconstructed from the AE from samples produced through the random-input generator, making the entire generator-discriminator path differentiable for discrete data like text. The combined effect of noise injection in the code and shared weights between the decoder and the generator can prevent the mode collapsing phenomenon commonly observed in GANs. Since perplexity cannot be applied to non-sequential text generation, we propose a new evaluation method using the total variation distance between frequencies of hash-coded byte-level n-grams (NGTVD). NGTVD is a single benchmark that can characterize both the quality and the diversity of the generated texts. Experiments are conducted on 6 large-scale datasets in Arabic, Chinese and English, with comparisons against n-gram baselines and recurrent neural networks (RNNs). An ablation study on both the noise level and the discriminator is performed. We find that RNNs have trouble competing with the n-gram baselines, and the ATNNFAE results are generally competitive.
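The evaluation idea described above, a distance between the frequencies of hash-coded byte-level n-grams, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the hash function, the n-gram order, or the number of hash buckets, so `zlib.crc32`, `n=2`, and `buckets=65536` are placeholder choices, and the function names are hypothetical.

```python
import zlib
from collections import Counter

def hashed_ngram_freqs(texts, n, buckets):
    """Relative frequencies of byte-level n-grams, hashed into a fixed number of buckets."""
    counts = Counter()
    for text in texts:
        data = text.encode("utf-8")
        for i in range(len(data) - n + 1):
            # crc32 is an assumed stand-in hash; it is deterministic across runs.
            counts[zlib.crc32(data[i:i + n]) % buckets] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}

def total_variation(p, q):
    """Total variation distance between two discrete distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def ngtvd(reference_texts, generated_texts, n=2, buckets=1 << 16):
    """Distance between hashed n-gram frequencies of reference and generated text."""
    return total_variation(
        hashed_ngram_freqs(reference_texts, n, buckets),
        hashed_ngram_freqs(generated_texts, n, buckets),
    )

print(ngtvd(["the cat sat on the mat"], ["the dog sat on the log"]))
```

A value near 0 means the generated text reproduces the reference n-gram statistics closely; a value near 1 means the two share almost no n-grams.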

Efficient Spiking Neural Networks with Logarithmic Temporal Coding

A Spiking Neural Network (SNN) can be trained indirectly by first training an Artificial Neural Network (ANN) with the conventional backpropagation algorithm, then converting it into an SNN. The conventional rate-coding method for SNNs uses the number of spikes to encode the magnitude of an activation value, and may be computationally inefficient due to the large number of spikes. Temporal coding is typically more efficient, leveraging the timing of spikes to encode information. In this paper, we present Logarithmic Temporal Coding (LTC), where the number of spikes used to encode an activation value grows logarithmically with the activation value; and the accompanying Exponentiate-and-Fire (EF) spiking neuron model, which only involves efficient bit-shift and addition operations. Moreover, we improve the training process of the ANN to compensate for approximation errors due to LTC. Experimental results indicate that the resulting SNN achieves competitive performance at significantly lower computational cost than related work.

Towards Formula Translation using Recursive Neural Networks

While it has become common to perform automated translations on natural language, performing translations between different representations of mathematical formulae has thus far not been possible. We implemented the first translator for mathematical formulae based on recursive neural networks. We chose recursive neural networks because mathematical formulae inherently include a structural encoding. In our implementation, we developed new techniques and topologies for recursive tree-to-tree neural networks based on multi-variate multi-valued Long Short-Term Memory cells. We propose a novel approach for mini-batch training that utilizes clustering and tree traversal. We evaluate our translator and analyze the behavior of our proposed topologies and techniques based on a translation from generic LaTeX to the semantic LaTeX notation. We use the semantic LaTeX notation from the Digital Library for Mathematical Formulae and the Digital Repository for Mathematical Formulae at the National Institute of Standards and Technology. We find that a simple heuristics-based clustering algorithm outperforms the conventional clustering algorithms on the task of clustering binary trees of mathematical formulae with respect to their topology. Furthermore, we find a mask for the loss function, which can prevent the neural network from finding a local minimum of the loss function. Given our preliminary results, a complete translation from formula to formula is not yet possible. However, we achieved a prediction accuracy of 47.05% for predicting symbols at the correct position and an accuracy of 92.3% when ignoring the predicted position. In conclusion, our work advances the field of recursive neural networks by improving the training speed and quality of training. In the future, we will work towards a complete translation allowing a machine-interpretation of LaTeX formulae.

Learning Shaping Strategies in Human-in-the-loop Interactive Reinforcement Learning

Providing reinforcement learning agents with informationally rich human knowledge can dramatically improve various aspects of learning. Prior work has developed different kinds of shaping methods that enable agents to learn efficiently in complex environments. All these methods, however, tailor human guidance to agents in specialized shaping procedures, thus embodying various characteristics and advantages in different domains. In this paper, we investigate the interplay between different shaping methods for more robust learning performance. We propose an adaptive shaping algorithm which is capable of learning the most suitable shaping method in an on-line manner. Results in two classic domains verify its effectiveness from both simulated and real human studies, shedding some light on the role and impact of human factors in human-robot collaborative learning.

Anomaly Detection via Graphical Lasso

Anomalies and outliers are common in real-world data, and they can arise from many sources, such as sensor faults. Accordingly, anomaly detection is important both for analyzing the anomalies themselves and for cleaning the data for further analysis of its ambient structure. Nonetheless, a precise definition of anomalies is important for automated detection and herein we approach such problems from the perspective of detecting sparse latent effects embedded in large collections of noisy data. Standard Graphical Lasso-based techniques can identify the conditional dependency structure of a collection of random variables based on their sample covariance matrix. However, classic Graphical Lasso is sensitive to outliers in the sample covariance matrix. In particular, several outliers in a sample covariance matrix can destroy the sparsity of its inverse. Accordingly, we propose a novel optimization problem that is similar in spirit to Robust Principal Component Analysis (RPCA) and splits the sample covariance matrix M into two parts, M=F+S, where F is the cleaned sample covariance whose inverse is sparse and computable by Graphical Lasso, and S contains the outliers in M. We accomplish this decomposition by adding an additional \ell_1 penalty to classic Graphical Lasso, and name it ‘Robust Graphical Lasso (Rglasso)’. Moreover, we propose an Alternating Direction Method of Multipliers (ADMM) solution to the optimization problem which scales to large numbers of unknowns. We evaluate our algorithm on both real and synthetic datasets, obtaining interpretable results and outperforming the standard robust Minimum Covariance Determinant (MCD) method and Robust Principal Component Analysis (RPCA) regarding both accuracy and speed.

Efficiently Approximating Edit Distance Between Pseudorandom Strings

We present an algorithm for approximating the edit distance \operatorname{ed}(x, y) between two strings x and y in time parameterized by the degree to which one of the strings x satisfies a natural pseudorandomness property. The pseudorandomness model is asymmetric in that no requirements are placed on the second string y, which may be constructed by an adversary with full knowledge of x. We say that x is (p, B)-pseudorandom if all pairs a and b of disjoint B-letter substrings of x satisfy \operatorname{ed}(a, b) \ge pB. Given parameters p and B, our algorithm computes the edit distance between a (p, B)-pseudorandom string x and an arbitrary string y within a factor of O(1/p) in time \tilde{O}(nB), with high probability. Our algorithm is robust in the sense that it can handle a small portion of x being adversarial (i.e., not satisfying the pseudorandomness property). In this case, the algorithm incurs an additive approximation error proportional to the fraction of x which behaves maliciously. The asymmetry of our pseudorandomness model has particular appeal for the case where x is a source string, meaning that \operatorname{ed}(x, y) will be computed for many strings y. Suppose that one wishes to achieve an O(\alpha)-approximation for each \operatorname{ed}(x, y) computation, and that B is the smallest block-size for which the string x is (1/\alpha, B)-pseudorandom. We show that without knowing B beforehand, x may be preprocessed in time \tilde{O}(n^{1.5}\sqrt{B}), so that all future computations of the form \operatorname{ed}(x, y) may be O(\alpha)-approximated in time \tilde{O}(nB). Furthermore, for the special case where only a single \operatorname{ed}(x, y) computation will be performed, we show how to achieve an O(\alpha)-approximation in time \tilde{O}(n^{4/3}B^{2/3}).
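The (p, B)-pseudorandomness definition above can be made concrete with a brute-force check. This sketch only tests the definition directly using the textbook dynamic-programming edit distance; it is not the paper's approximation algorithm and is far slower than the running times quoted in the abstract. The function names are chosen for illustration.

```python
def edit_distance(a, b):
    """Exact edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]

def is_pseudorandom(x, p, B):
    """True if every pair of disjoint B-letter substrings of x has edit distance >= p * B."""
    last_start = len(x) - B
    for i in range(last_start + 1):
        # Start the second block at i + B or later so the two blocks do not overlap.
        for j in range(i + B, last_start + 1):
            if edit_distance(x[i:i + B], x[j:j + B]) < p * B:
                return False
    return True

print(is_pseudorandom("abab", 0.5, 2))  # the blocks "ab" and "ab" are identical
print(is_pseudorandom("abcd", 1.0, 2))  # the only disjoint pair is "ab" and "cd"
```

Repetitive strings fail the check because two disjoint blocks can coincide, while strings whose blocks all differ substantially pass it.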

Multi-label Object Attribute Classification using a Convolutional Neural Network

Objects of different classes can be described using a limited number of attributes such as color, shape, pattern, and texture. Learning to detect object attributes instead of only detecting objects can be helpful in dealing with a priori unknown objects. With this inspiration, a deep convolutional neural network for low-level object attribute classification, called the Deep Attribute Network (DAN), is proposed. Since object features are implicitly learned by object recognition networks, one such existing network is modified and fine-tuned for developing DAN. The performance of DAN is evaluated on the ImageNet Attribute and a-Pascal datasets. Experiments show that in comparison with state-of-the-art methods, the proposed model achieves better results.

Optimizing Taxi Carpool Policies via Reinforcement Learning and Spatio-Temporal Mining

In this paper, we develop a reinforcement learning (RL) based system to learn an effective policy for carpooling that maximizes transportation efficiency so that fewer cars are required to fulfill the given amount of trip demand. For this purpose, first, we develop a deep neural network model, called ST-NN (Spatio-Temporal Neural Network), to predict taxi trip time from the raw GPS trip data. Secondly, we develop a carpooling simulation environment for RL training, with the output of ST-NN and using the NYC taxi trip dataset. In order to maximize transportation efficiency and minimize traffic congestion, we choose the effective distance covered by the driver on a carpool trip as the reward. Therefore, the more effective distance a driver achieves over a trip (i.e. to satisfy more trip demand), the higher the efficiency and the lower the traffic congestion. We compared the performance of the RL-learned policy to a fixed policy (which always accepts carpool) as a baseline and obtained promising results that are interpretable and demonstrate the advantage of our RL approach. We also compare the performance of ST-NN to that of state-of-the-art travel time estimation methods and observe that ST-NN significantly improves the prediction performance and is more robust to outliers.

A Self-Learning Information Diffusion Model for Smart Social Networks

In this big data era, more and more social activities are digitized and thereby become traceable, and thus the study of social networks attracts increasing attention from academia. It is widely believed that social networks play an important role in the process of information diffusion. However, the opposite question, i.e., how the information diffusion process rebuilds social networks, has been largely ignored. In this paper, we propose a new framework for understanding this reverse effect. Specifically, we first introduce a novel information diffusion model on social networks, by considering two types of individuals, i.e., smart and normal individuals, and two kinds of messages, true and false messages. Since social networks consist of human individuals who can learn, we endow individuals with a self-learning ability: the trust an individual places in a neighbor increases (or decreases) when the individual receives a true (or false) message from that neighbor. Based on such a simple self-learning mechanism, we prove that a social network can indeed become smarter, in terms of better distinguishing the true message from the false one. Moreover, we observe the emergence of social stratification based on the new model, i.e., the true messages initially posted by an individual closer to the smart one can be forwarded by more others, which is enhanced by the self-learning mechanism. We also find the crossover advantage, i.e., interconnection between two chain networks can give the related individuals greater social influence, meaning that their messages can be forwarded by relatively more others. We obtained these results theoretically and validated them by simulations, which help better understand the reciprocity between social networks and information diffusion.

A Survey of Mixed Data Clustering Algorithms

Most datasets contain either numeric or categorical features. Mixed data comprise both numeric and categorical features, and they frequently occur in various domains, such as health, finance, and marketing. Clustering is often sought on mixed data to find structures and to group similar objects. However, clustering mixed data is challenging because it is difficult to directly apply mathematical operations, such as summation and averaging, to the feature values of these datasets. In this paper, we review various types of mixed data clustering techniques in detail. We present a taxonomy to identify ten types of different mixed data clustering techniques. We also compare the performance of several mixed data clustering methods on publicly available datasets. The paper further identifies challenges in developing different mixed data clustering algorithms and provides guidelines for future directions in this area.

Explaining Deep Learning Models using Causal Inference

Although deep learning models have been successfully applied to a variety of tasks, due to the millions of parameters, they are becoming increasingly opaque and complex. In order to establish trust for their widespread commercial use, it is important to formalize a principled framework to reason over these models. In this work, we use ideas from causal inference to describe a general framework to reason over CNN models. Specifically, we build a Structural Causal Model (SCM) as an abstraction over a specific aspect of the CNN. We also formulate a method to quantitatively rank the filters of a convolution layer according to their counterfactual importance. We illustrate our approach with popular CNN architectures such as LeNet5, VGG19, and ResNet32.

Quantum Reasoning using Lie Algebra for Everyday Life (and AI perhaps…)
Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization
A Feature Complete SPIKE Banded Algorithm and Solver
Scene Parsing via Dense Recurrent Neural Networks with Attentional Selection
AttS2S-VC: Sequence-to-Sequence Voice Conversion with Attention and Context Preservation Mechanisms
Fast Beam Alignment for Millimeter Wave Communications: A Sparse Encoding and Phaseless Decoding Approach
ExcitNet vocoder: A neural excitation model for parametric speech synthesis systems
Multilingual and Unsupervised Subword Modeling for Zero-Resource Languages
Mathematical Theory of Evidence Versus Evidence
Gaining insight from large data volumes with ease
Observability Properties of Colored Graphs
Surrogate Modeling of Stochastic Functions – Application to computational Electromagnetic Dosimetry
On Weisfeiler-Leman Invariance: Subgraph Counts and Related Graph Properties
Multiple People Tracking Using Hierarchical Deep Tracklet Re-identification
Deep Learning Super-Diffusion in Multiplex Networks
Reducing Network Agnostophobia
An Agent-Based Approach for Optimizing Modular Vehicle Fleet Operation
Rethinking network reciprocity over social ties: local interactions make direct reciprocity possible and pave the rational way to cooperation
Second order Stein: SURE for SURE and other applications in high-dimensional inference
Bootstrapping Structural Change Tests
Many-Body Localization in Two Dimensions from Projected Entangled-Pair States
Policy Regret in Repeated Games
STA: Spatial-Temporal Attention for Large-Scale Video-based Person Re-Identification
Integrating Recurrence Dynamics for Speech Emotion Recognition
Relative Error RKHS Embeddings for Gaussian Kernels
SURE-fuse WFF: A Multi-resolution Windowed Fourier Analysis for Interferometric Phase Denoising
Feedback-Aware Precoding for Millimeter Wave Massive MIMO Systems
Median Confidence Regions in a Nonparametric Model
Complex Unitary Recurrent Neural Networks using Scaled Cayley Transform
Simulation of the energy efficiency auction prices in Brazil
LoRa Digital Receiver Analysis and Implementation
Optimal Distribution System Restoration with Microgrids and Distributed Generators
Count-Min: Optimal Estimation and Tight Error Bounds using Empirical Error Distributions
Design Rule Violation Hotspot Prediction Based on Neural Network Ensembles
Zero-shot Neural Transfer for Cross-lingual Entity Linking
Adversarial Sampling and Training for Semi-Supervised Information Retrieval
Computational Thinking with the Web Crowd using CodeMapper
Dual Latent Variable Model for Low-Resource Natural Language Generation in Dialogue Systems
Power Normalizing Second-order Similarity Network for Few-shot Learning
Symmetry Type Graphs on 4-Orbit maps
The Augmented Synthetic Control Method
Designing plateaued Boolean functions in spectral domain and their classification
Use of Neural Signals to Evaluate the Quality of Generative Adversarial Network Performance in Facial Image Generation
New Movement and Transformation Principle of Fuzzy Reasoning and Its Application to Fuzzy Neural Network
CED: Credible Early Detection of Social Media Rumors
An efficient branch-and-bound algorithm for submodular function maximization
Mapping Navigation Instructions to Continuous Control Actions with Position-Visitation Prediction
CAPTAIN: Comprehensive Composition Assistance for Photo Taking
R-SPIDER: A Fast Riemannian Stochastic Optimization Algorithm with Curvature Independent Rate
Fast On-the-fly Retraining-free Sparsification of Convolutional Neural Networks
A Note on Local Mode-in-State Participation Factors for Nonlinear Systems
Image Cartoon-Texture Decomposition Using Isotropic Patch Recurrence
Densely Connected Attention Propagation for Reading Comprehension
User-Centric Multiobjective Approach to Privacy Preservation and Energy Cost Minimization in Smart Home
Innovative 3D Depth Map Generation From A Holoscopic 3D Image Based on Graph Cut Technique
Properties of Noncommutative Renyi and Augustin Information
StationPlot: A New Non-stationarity Quantification Tool for Detection of Epileptic Seizures
Speech Intention Understanding in a Head-final Language: A Disambiguation Utilizing Intonation-dependency
Skeleton-Based Action Recognition with Synchronous Local and Non-local Spatio-temporal Learning and Frequency Attention
Near Real-Time Data Labeling Using a Depth Sensor for EMG Based Prosthetic Arms
Breast Cancer Classification from Histopathological Images with Inception Recurrent Residual Convolutional Neural Network
A Bayesian Approach to Income Inference in a Communication Network
Deep Learning Approach for Building Detection in Satellite Multispectral Imagery
Bayesian variational inference for exponential random graph models
Detecting Work Zones in SHRP 2 NDS Videos Using Deep Learning Based Computer Vision
Formal Limitations on the Measurement of Mutual Information
A new resource measure with respect to resource destroying maps
Scene Text Detection and Recognition: The Deep Learning Era
Input Perturbations for Adaptive Regulation and Learning
A stochastically perturbed mean curvature flow by colored noise
Besov class via heat semigroup on Dirichlet spaces I: Sobolev type inequalities
More robust estimation of sample average treatment effects using Kernel Optimal Matching in an observational study of spine surgical interventions
Scalability Evaluation of Iterative Algorithms Used for Supercomputer Simulation of Physical processes
The method of multimodal MRI brain image segmentation based on differential geometric features
The Queue-Hawkes Process: Ephemeral Self-Excitement
Improving End-to-end Speech Recognition with Pronunciation-assisted Sub-word Modeling
Minimax Optimal Sequential Hypothesis Tests for Markov Processes
Many $H$-copies in graphs with a forbidden tree
IP Geolocation through Reverse DNS
Coronary Calcium Detection using 3D Attention Identical Dual Deep Network Based on Weakly Supervised Learning
Prediction and forecasting models based on patient’s history and biomarkers with application to Scleroderma disease
Averaging principle for stochastic real Ginzburg-Landau equation driven by $α$-stable process
Using NonBacktracking Expansion to Analyze k-core Pruning Process
On Word and Gómez Graphs and Their Automorphism Groups in the Degree Diameter Problem
Centralized adaptive traffic control strategy design across multiple intersections based on vehicle path flows: An approximated Lagrangian decomposition approach
PolyNeuron: Automatic Neuron Discovery via Learned Polyharmonic Spline Activations
Automatic Brain Structures Segmentation Using Deep Residual Dilated U-Net
Uniform, Integral and Feasible Proofs for the Determinant Identities
Playing by the Book: Towards Agent-based Narrative Understanding through Role-playing and Simulation
Channel Coding at Low Capacity
Fully Convolutional Network with Multi-Step Reinforcement Learning for Image Processing
Diversity-Driven Extensible Hierarchical Reinforcement Learning
Coverage Centrality Maximization in Undirected Networks
Reactive Task and Motion Planning for Robust Whole-Body Dynamic Locomotion in Constrained Environments
Convolution Algebras for Finite Reductive Monoids
Traversal with Enumeration of Geometric Graphs in Bounded Space
Constructing Geometric Graphs of Cop Number Three
Langevin-gradient parallel tempering for Bayesian neural learning
Discovering heterogeneous subpopulations for fine-grained analysis of opioid use and opioid use disorders
Deep Face Quality Assessment
Compressive Sensing and Morphology Singular Entropy-Based Real-time Secondary Voltage Control of Multi-area Power Systems
Model predictive trajectory optimization and tracking for on-road autonomous vehicles
Towards Governing Agent’s Efficacy: Action-Conditional $β$-VAE for Deep Transparent Reinforcement Learning
Generalization Bounds for Vicinal Risk Minimization Principle
Neural-based Pinyin-to-Character Conversion with Adaptive Vocabulary
Multi-labeled Relation Extraction with Attentive Capsule Network
Bayesian Convolutional Neural Networks for Compressed Sensing Restoration
Neural Generative Models for 3D Faces with Application in 3D Texture Free Face Recognition
Anticipated mean-field backward stochastic differential equations with jumps
Universal Randomized Guessing with Application to Asynchronous Decentralized Brute-Force Attacks
User Modeling for Task Oriented Dialogues
Improved Visual Relocalization by Discovering Anchor Points
A globally and linearly convergent PGM for zero-norm regularized quadratic optimization with sphere constraint
Fashion and Apparel Classification using Convolutional Neural Networks
Attentive Aspect Modeling for Review-aware Recommendation
ReSet: Learning Recurrent Dynamic Routing in ResNet-like Neural Networks
Adapting multi-armed bandits policies to contextual bandits scenarios
Integrating Multiple Receptive Fields through Grouped Active Convolution

R Packages worth a look

The Matrix Normal Distribution (matrixNormal)
Computes densities, probabilities, and random deviates of the Matrix Normal (Iranmanesh (2010) <doi:10.7508/ijmsi.2010.02.004>). Also incl …

Learning Bayesian Networks with Mixed Variables (deal)
Bayesian networks with continuous and/or discrete variables can be learned and compared from data. The method is described in Boettcher and Dethlefsen …

A Common API to Modeling and Analysis Functions (parsnip)
A common interface is provided to allow users to specify a model without having to remember the different argument names across different functions or …

Magister Dixit

“Business culture needs to go through a fundamental change to become data-driven. It’s not the new tools, more data, or a PhD-holder staff that will make the change happen. A system, a program, or a robot, designed to be most rational, capable, and failure-proof is put to action only if the environment it operates in allows it. Analytics have been available for years, but they continue being misused because the business models they aid are inherently incompatible. Centrally managed organisations by design impede change; it’s time that decision making becomes more democratic and distributed, as is data.” Eve the Analyst ( 7th November 2017 )

If you did not already know

Generalized Value Iteration Network (GVIN)
In this paper, we introduce a generalized value iteration network (GVIN), which is an end-to-end neural network planning module. GVIN emulates the value iteration algorithm by using a novel graph convolution operator, which enables GVIN to learn and plan on irregular spatial graphs. We propose three novel differentiable kernels as graph convolution operators and show that the embedding based kernel achieves the best performance. We further propose episodic Q-learning, an improvement upon traditional n-step Q-learning that stabilizes training for networks that contain a planning module. Lastly, we evaluate GVIN on planning problems in 2D mazes, irregular graphs, and real-world street networks, showing that GVIN generalizes well for both arbitrary graphs and unseen graphs of larger scale and outperforms a naive generalization of VIN (discretizing a spatial graph into a 2D image). …
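For readers unfamiliar with the algorithm GVIN is said to emulate, here is a minimal sketch of classic value iteration on a small irregular graph. It is not the GVIN network itself: there is no learned graph convolution, and the graph, the reward placement, and the function names are all hypothetical and chosen for illustration only.

```python
def value_iteration(neighbors, rewards, gamma=0.9, tol=1e-9, max_iter=1000):
    """Classic value iteration on a graph with deterministic moves.

    neighbors: dict mapping node -> list of nodes reachable in one step
    rewards:   dict mapping node -> reward collected on entering that node
    """
    values = {node: 0.0 for node in neighbors}
    for _ in range(max_iter):
        new_values = {}
        for node, nbrs in neighbors.items():
            # Bellman backup: best one-step reward plus discounted value.
            # Nodes without outgoing edges (the goal) keep value 0.
            new_values[node] = max(
                (rewards[n] + gamma * values[n] for n in nbrs), default=0.0
            )
        delta = max(abs(new_values[n] - values[n]) for n in neighbors)
        values = new_values
        if delta < tol:
            break
    return values


def greedy_path(neighbors, rewards, values, start, goal, gamma=0.9):
    """Follow the greedy policy induced by the values from start to goal."""
    path = [start]
    while path[-1] != goal and len(path) <= len(neighbors):
        node = path[-1]
        path.append(
            max(neighbors[node], key=lambda n: rewards[n] + gamma * values[n])
        )
    return path


# Hypothetical irregular graph: A reaches the goal E in two steps via C,
# or in three steps via B and D.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E"],
    "D": ["B", "E"],
    "E": [],  # absorbing goal node
}
rewards = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 0.0, "E": 1.0}

values = value_iteration(graph, rewards)
path = greedy_path(graph, rewards, values, start="A", goal="E")
print(values)
print(path)
```

The planner prefers the shorter route because the discount factor shrinks rewards that arrive later. GVIN, as described above, replaces this hand-written backup with learned, differentiable graph convolution kernels.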

Transfer Automatic Machine Learning
Building effective neural networks requires many design choices. These include the network topology, optimization procedure, regularization, stability methods, and choice of pre-trained parameters. This design is time consuming and requires expert input. Automatic Machine Learning aims to automate this process using hyperparameter optimization. However, automatic model building frameworks optimize performance on each task independently, whereas human experts leverage prior knowledge when designing a new network. We propose Transfer Automatic Machine Learning, a method to accelerate network design using knowledge of prior tasks. For this, we build upon reinforcement learning architecture design methods to support parallel training on multiple tasks and transfer the search strategy to new tasks. Tested on NLP and Image classification tasks, Transfer Automatic Machine Learning reduces convergence time over single-task methods by almost an order of magnitude on 13 out of 14 tasks. It achieves better test set accuracy on 10 out of 13 NLP tasks and improves performance on CIFAR-10 image recognition from 95.3% to 97.1%. …

Cavs
Recent deep learning (DL) models have moved beyond static network architectures to dynamic ones, handling data where the network structure changes every example, such as sequences of variable lengths, trees, and graphs. Existing dataflow-based programming models for DL—both static and dynamic declaration—either cannot readily express these dynamic models, or are inefficient due to repeated dataflow graph construction and processing, and difficulties in batched execution. We present Cavs, a vertex-centric programming interface and optimized system implementation for dynamic DL models. Cavs represents dynamic network structure as a static vertex function $\mathcal{F}$ and a dynamic instance-specific graph $\mathcal{G}$, and performs backpropagation by scheduling the execution of $\mathcal{F}$ following the dependencies in $\mathcal{G}$. Cavs bypasses expensive graph construction and preprocessing overhead, allows for the use of static graph optimization techniques on pre-defined operations in $\mathcal{F}$, and naturally exposes batched execution opportunities over different graphs. Experiments comparing Cavs to two state-of-the-art frameworks for dynamic NNs (TensorFlow Fold and DyNet) demonstrate the efficacy of this approach: Cavs achieves a near one order of magnitude speedup on training of various dynamic NN architectures, and ablations demonstrate the contribution of our proposed batching and memory management strategies. …

Whats new on arXiv

Looking Deeper into Deep Learning Model: Attribution-based Explanations of TextCNN

Layer-wise Relevance Propagation (LRP) and saliency maps have been recently used to explain the predictions of Deep Learning models, specifically in the domain of text classification. Given different attribution-based explanations to highlight relevant words for a predicted class label, experiments based on word-deleting perturbation are a common evaluation method. This word removal approach, however, disregards any linguistic dependencies that may exist between words or phrases in a sentence, which could semantically guide a classifier to a particular prediction. In this paper, we present a feature-based evaluation framework for comparing the two attribution methods on customer reviews (public data sets) and Customer Due Diligence (CDD) extracted reports (corporate data set). Instead of removing words based on the relevance score, we investigate perturbations based on embedded features removal from intermediate layers of Convolutional Neural Networks. Our experimental study is carried out on embedded-word, embedded-document, and embedded-ngrams explanations. Using the proposed framework, we provide a visualization tool to assist analysts in reasoning toward the model’s final prediction.

Stovepiping and Malicious Software: A Critical Review of AGI Containment

Awareness of the possible impacts associated with artificial intelligence has risen in proportion to progress in the field. While there are tremendous benefits to society, many argue that there are just as many, if not more, concerns related to advanced forms of artificial intelligence. Accordingly, research into methods to develop artificial intelligence safely is increasingly important. In this paper, we provide an overview of one such safety paradigm: containment with a critical lens aimed toward generative adversarial networks and potentially malicious artificial intelligence. Additionally, we illuminate the potential for a developmental blindspot in the stovepiping of containment mechanisms.

How Do Fairness Definitions Fare? Examining Public Attitudes Towards Algorithmic Definitions of Fairness

What is the best way to define algorithmic fairness? There has been much recent debate on algorithmic fairness. While many definitions of fairness have been proposed in the computer science literature, there is no clear agreement over a particular definition. In this work, we investigate ordinary people’s perceptions of three of these fairness definitions. Across two online experiments, we test which definitions people perceive to be the fairest in the context of loan decisions, and whether those fairness perceptions change with the addition of sensitive information (i.e., race of the loan applicants). We find a clear preference for one definition, and the general results seem to align with the principle of affirmative action.

A Primal Decomposition Method with Suboptimality Bounds for Distributed Mixed-Integer Linear Programming

In this paper we deal with a network of agents seeking to solve in a distributed way Mixed-Integer Linear Programs (MILPs) with a coupling constraint (modeling a limited shared resource) and local constraints. MILPs are NP-hard problems and several challenges arise in a distributed framework, so that looking for suboptimal solutions is of interest. To achieve this goal, the presence of a linear coupling calls for tailored decomposition approaches. We propose a fully distributed algorithm based on a primal decomposition approach and a suitable tightening of the coupling constraints. Agents repeatedly update local allocation vectors, which converge to an optimal resource allocation of an approximate version of the original problem. Based on such allocation vectors, agents are able to (locally) compute a mixed-integer solution, which is guaranteed to be feasible after a sufficiently large time. Asymptotic and finite-time suboptimality bounds are established for the computed solution. Numerical simulations highlight the efficacy of the proposed methodology.

On the Statistical and Information-theoretic Characteristics of Deep Network Representations

It has been common to argue or imply that a regularizer can be used to alter a statistical property of a hidden layer’s representation and thus improve generalization or performance of deep networks. For instance, dropout has been known to improve performance by reducing co-adaptation, and representational sparsity has been argued as a good characteristic because many data-generation processes have a small number of factors that are independent. In this work, we analytically and empirically investigate the popular characteristics of learned representations, including correlation, sparsity, dead unit, rank, and mutual information, and disprove much of the conventional wisdom. We first show that infinitely many Identical Output Networks (IONs) can be constructed for any deep network with a linear layer, where any invertible affine transformation can be applied to alter the layer’s representation characteristics. The existence of IONs proves that the correlation characteristics of a representation are irrelevant to the performance. Extensions to ReLU layers are provided, too. Then, we consider sparsity, dead unit, and rank to show that only loose relationships exist among the three characteristics. It is shown that a higher sparsity or additional dead units do not imply a better or worse performance when the rank of representation is fixed. We also develop a rank regularizer and show that neither representation sparsity nor lower rank is helpful for improving performance even when the data-generation process has a small number of independent factors. Mutual information $I(\mathbf{z}_l;\mathbf{x})$ and $I(\mathbf{z}_l;\mathbf{y})$ are investigated, and we show that regularizers can affect $I(\mathbf{z}_l;\mathbf{x})$ and thus indirectly influence the performance. Finally, we explain how a rich set of regularizers can be used as a powerful tool for performance tuning.
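The Identical Output Network idea can be checked by hand on a toy case. The sketch below assumes a two-layer network with a purely linear hidden layer, z = W1 x + b1 and y = W2 z + b2. It applies an invertible affine map z' = A z + c to the hidden representation and compensates in the next layer. The weights, the matrix A, the shift c, and all helper names are hypothetical, and the 2x2 inverse is hard-coded to keep the example dependency-free.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def matvec(a, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in a]


def vec_add(u, v):
    return [x + y for x, y in zip(u, v)]


def vec_sub(u, v):
    return [x - y for x, y in zip(u, v)]


def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def forward(params, x):
    """Return (hidden representation, output) of the two-layer linear net."""
    w1, b1, w2, b2 = params
    z = vec_add(matvec(w1, x), b1)
    y = vec_add(matvec(w2, z), b2)
    return z, y


def identical_output_network(params, a, c):
    """Re-parameterize so the hidden layer becomes A z + c, output unchanged."""
    w1, b1, w2, b2 = params
    w2_a_inv = matmul(w2, inverse_2x2(a))
    return (
        matmul(a, w1),                        # W1' = A W1
        vec_add(matvec(a, b1), c),            # b1' = A b1 + c
        w2_a_inv,                             # W2' = W2 A^{-1}
        vec_sub(b2, matvec(w2_a_inv, c)),     # b2' = b2 - W2 A^{-1} c
    )


# Hypothetical weights and transformation.
params = ([[1.0, 2.0], [0.0, 1.0]], [0.5, -1.0], [[2.0, -1.0]], [0.25])
a = [[2.0, 1.0], [1.0, 1.0]]
c = [3.0, -2.0]
x = [1.0, 2.0]

z, y = forward(params, x)
z_new, y_new = forward(identical_output_network(params, a, c), x)
print(z, y)
print(z_new, y_new)
```

The two networks produce the same output for every input while their hidden representations differ. Since A can mix or decorrelate the hidden units arbitrarily, this illustrates why the abstract argues that correlation statistics of a linear layer cannot by themselves determine performance.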

Practical Bayesian Learning of Neural Networks via Adaptive Subgradient Methods

We introduce a novel framework for the estimation of the posterior distribution of the weights of a neural network, based on a new probabilistic interpretation of adaptive subgradient algorithms such as AdaGrad and Adam. Having a confidence measure of the weights allows several shortcomings of neural networks to be addressed. In particular, the robustness of the network can be improved by performing weight pruning based on signal-to-noise ratios from the weight posterior distribution. Using the MNIST dataset, we demonstrate that the empirical performance of Badam, a particular instance of our framework based on Adam, is competitive in comparison to related Bayesian approaches such as Bayes By Backprop.
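To make the pruning step concrete, here is a minimal sketch of signal-to-noise pruning. It assumes each weight has an independent Gaussian posterior with mean mu and standard deviation sigma, and it defines the ratio as |mu| / sigma. That definition, the threshold, and the numbers are assumptions for illustration, not values taken from the paper.

```python
def prune_by_snr(means, stds, threshold):
    """Zero out weights whose posterior signal-to-noise ratio is low.

    means, stds: per-weight posterior mean and standard deviation
    threshold:   hypothetical cut-off on |mean| / std
    """
    pruned = []
    for mu, sigma in zip(means, stds):
        snr = abs(mu) / sigma
        # Keep confident weights, drop the ones the posterior is unsure about.
        pruned.append(mu if snr >= threshold else 0.0)
    return pruned


# Hypothetical posterior summaries for four weights.
posterior_means = [0.8, -0.05, 0.3, -1.2]
posterior_stds = [0.1, 0.2, 0.6, 0.3]

pruned_weights = prune_by_snr(posterior_means, posterior_stds, threshold=1.0)
print(pruned_weights)
```

Note that the third weight is pruned even though its mean is larger in magnitude than the second: what matters under this criterion is the mean relative to the posterior uncertainty, not the mean alone.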

Benchmarking Deep Sequential Models on Volatility Predictions for Financial Time Series

Volatility is a quantity of measurement for the price movements of stocks or options which indicates the uncertainty within financial markets. As an indicator of the level of risk or the degree of variation, volatility is important to analyse the financial market, and it is taken into consideration in various decision-making processes in financial activities. On the other hand, recent advancement in deep learning techniques has shown strong capabilities in modelling sequential data, such as speech and natural language. In this paper, we empirically study the applicability of the latest deep structures with respect to the volatility modelling problem, through which we aim to provide empirical guidance for the theoretical analysis of the marriage between deep learning techniques and financial applications in the future. We examine both the traditional approaches and the deep sequential models on the task of volatility prediction, including the most recent variants of convolutional and recurrent networks, such as the dilated architecture. Accordingly, experiments with real-world stock price datasets are performed on a set of 1314 daily stock series for 2018 days of transaction. The evaluation and comparison are based on the negative log likelihood (NLL) of real-world stock price time series. The result shows that the dilated neural models, including dilated CNN and dilated RNN, produce the most accurate estimation and prediction, outperforming various widely-used deterministic models in the GARCH family and several recently proposed stochastic models. In addition, the high flexibility and rich expressive power are validated in this study.

Fast determinantal point processes via distortion-free intermediate sampling

Given a fixed $n\times d$ matrix $\mathbf{X}$, where $n\gg d$, we study the complexity of sampling from a distribution over all subsets of rows where the probability of a subset is proportional to the squared volume of the parallelepiped spanned by the rows (a.k.a. a determinantal point process). In this task, it is important to minimize the preprocessing cost of the procedure (performed once) as well as the sampling cost (performed repeatedly). To that end, we propose a new determinantal point process algorithm which has the following two properties, both of which are novel: (1) a preprocessing step which runs in time $O(\text{number-of-non-zeros}(\mathbf{X})\cdot\log n)+\text{poly}(d)$, and (2) a sampling step which runs in $\text{poly}(d)$ time, independent of the number of rows $n$. We achieve this by introducing a new regularized determinantal point process (R-DPP), which serves as an intermediate distribution in the sampling procedure by reducing the number of rows from $n$ to $\text{poly}(d)$. Crucially, this intermediate distribution does not distort the probabilities of the target sample. Our key novelty in defining the R-DPP is the use of a Poisson random variable for controlling the probabilities of different subset sizes, leading to new determinantal formulas such as the normalization constant for this distribution. Our algorithm has applications in many diverse areas where determinantal point processes have been used, such as machine learning, stochastic optimization, data summarization and low-rank matrix reconstruction.
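As a reading aid, the target distribution described above can be written out explicitly. The block below assumes the standard L-ensemble form of a determinantal point process with kernel $\mathbf{X}\mathbf{X}^\top$; the notation $\mathbf{X}_S$ for the rows of $\mathbf{X}$ indexed by $S$ is introduced here for illustration and is not taken from the paper.

```latex
% Probability of a subset S of rows: squared volume over the normalizer.
\Pr(S) \;=\; \frac{\det\!\big(\mathbf{X}_S \mathbf{X}_S^\top\big)}
                  {\det\!\big(\mathbf{I}_n + \mathbf{X}\mathbf{X}^\top\big)},
\qquad S \subseteq \{1,\dots,n\}.

% The normalizer sums the squared volumes over all subsets and can be
% evaluated with a d x d determinant instead of an n x n one:
\sum_{S \subseteq \{1,\dots,n\}} \det\!\big(\mathbf{X}_S \mathbf{X}_S^\top\big)
  \;=\; \det\!\big(\mathbf{I}_n + \mathbf{X}\mathbf{X}^\top\big)
  \;=\; \det\!\big(\mathbf{I}_d + \mathbf{X}^\top\mathbf{X}\big).
```

The second identity hints at why costs depending only on $d$ are plausible once $\mathbf{X}^\top\mathbf{X}$ is available, although the algorithm in the paper relies on its own intermediate R-DPP construction rather than on this formula alone.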

Estimation of Structural Break Point in Linear Regression Models

This paper proposes a point estimator of the break location for a one-time structural break in linear regression models. If the break magnitude is small, the least-squares estimator of the break date has two modes at the ends of the finite sample period, regardless of the true break location. I suggest a modification of the least-squares objective function to solve this problem. The modified objective function incorporates estimation uncertainty that varies across potential break dates. The new break point estimator is consistent and has a unimodal finite sample distribution under a small break magnitude. A limit distribution is provided under an in-fill asymptotic framework which verifies that the new estimator outperforms the least-squares estimator.
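For context, the baseline that the paper modifies can be sketched in a few lines. The code below implements the plain least-squares break-date estimator for the simplest case, a one-time shift in the mean (an intercept-only regression). It is not the modified objective proposed in the paper, and the series, the `trim` parameter, and the function names are hypothetical.

```python
def sum_sq_residuals(values):
    """Sum of squared deviations from the segment mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def least_squares_break(y, trim=1):
    """Return (k, ssr) where k minimizes the total SSR of y[:k] and y[k:].

    trim: minimum number of observations kept in each segment.
    """
    best_k, best_ssr = None, float("inf")
    for k in range(trim, len(y) - trim + 1):
        # Fit a separate mean before and after the candidate break date.
        ssr = sum_sq_residuals(y[:k]) + sum_sq_residuals(y[k:])
        if ssr < best_ssr:
            best_k, best_ssr = k, ssr
    return best_k, best_ssr


# Hypothetical series with a large mean shift after the fourth observation.
series = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
break_index, break_ssr = least_squares_break(series)
print(break_index, break_ssr)
```

With a large shift like this one the minimizer is clear-cut. The abstract's point is that when the shift is small relative to the noise, this same objective tends to pick candidates near the two ends of the sample, which motivates the modified objective.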

Incorporating Relevant Knowledge in Context Modeling and Response Generation

To sustain engaging conversation, it is critical for chatbots to make good use of relevant knowledge. Equipped with a knowledge base, chatbots are able to extract conversation-related attributes and entities to facilitate context modeling and response generation. In this work, we distinguish the uses of attribute and entity and incorporate them into the encoder-decoder architecture in different manners. Based on the augmented architecture, our chatbot, namely Mike, is able to generate responses by referring to proper entities from the collected knowledge. To validate the proposed approach, we build a movie conversation corpus on which the proposed approach significantly outperforms four other knowledge-grounded models.

Ball: An R package for detecting distribution difference and association in metric spaces

The rapid development of modern technology facilitates the appearance of numerous unprecedented complex data which do not satisfy the axioms of Euclidean geometry, while most of the statistical hypothesis tests are available in Euclidean or Hilbert spaces. To properly analyze the data of more complicated structures, efforts have been made to solve the fundamental test problems in more general spaces. In this paper, a publicly available R package Ball is provided to implement Ball statistical test procedures for K-sample distribution comparison and test of mutual independence in metric spaces, which extend the test procedures for two sample distribution comparison and test of independence. Tailor-made algorithms as well as engineering techniques are employed in the Ball package to speed up computation to the best of our ability. Two real data analyses and several numerical studies have been performed and the results certify the power of the Ball package in analyzing complex data, e.g., spherical data and symmetric positive matrix data.

DeepSaucer: Unified Environment for Verifying Deep Neural Networks

In recent years, a number of methods for verifying DNNs have been developed. Because the approaches of the methods differ and have their own limitations, we think that a number of verification methods should be applied to a developed DNN. To apply a number of methods to the DNN, it is necessary to translate either the implementation of the DNN or the verification method so that one runs in the same environment as the other. Since those translations are time-consuming, a utility tool, named DeepSaucer, which helps to retain and reuse implementations of DNNs, verification methods, and their environments, is proposed. In DeepSaucer, code snippets for loading DNNs, running verification methods, and creating their environments are retained and reused as software assets in order to reduce the cost of verifying DNNs. The feasibility of DeepSaucer is confirmed by implementing it on the basis of Anaconda, which provides a virtual environment for loading a DNN and running a verification method. In addition, the effectiveness of DeepSaucer is demonstrated by use-case examples.

EA-LSTM: Evolutionary Attention-based LSTM for Time Series Prediction

Time series prediction with deep learning methods, especially long short-term memory neural networks (LSTMs), has scored significant achievements in recent years. Despite the fact that LSTMs can help to capture long-term dependencies, their ability to pay different degrees of attention to sub-window features within multiple time-steps is insufficient. To address this issue, an evolutionary attention-based LSTM trained with competitive random search is proposed for multivariate time series prediction. By transferring shared parameters, an evolutionary attention learning approach is introduced to the LSTM model. Thus, as in biological evolution, the pattern for importance-based attention sampling can be confirmed during temporal relationship mining. To avoid being trapped in partial optimization like traditional gradient-based methods, an evolutionary computation inspired competitive random search method is proposed, which can well configure the parameters in the attention layer. Experimental results have illustrated that the proposed model can achieve competitive prediction performance compared with other baseline methods.

Encoding Implicit Relation Requirements for Relation Extraction: A Joint Inference Approach

Relation extraction is the task of identifying predefined relationships between entities, and plays an essential role in information extraction, knowledge base construction, question answering and so on. Most existing relation extractors make predictions for each entity pair locally and individually, while ignoring implicit global clues available across different entity pairs and in the knowledge base, which often leads to conflicts among local predictions from different entity pairs. This paper proposes a joint inference framework that employs such global clues to resolve disagreements among local predictions. We exploit two kinds of clues to generate constraints which can capture the implicit type and cardinality requirements of a relation. Those constraints can be examined in either hard style or soft style, both of which can be effectively explored in an integer linear program formulation. Experimental results on both English and Chinese datasets show that our proposed framework can effectively utilize those two categories of global clues and resolve the disagreements among local predictions, thus improving various relation extractors when such clues are applicable to the datasets. Our experiments also indicate that the clues learnt automatically from existing knowledge bases perform comparably to or better than those refined by humans.

Skeptical Deep Learning with Distribution Correction

Recently deep neural networks have been successfully used for various classification tasks, especially for problems with massive perfectly labeled training data. However, it is often costly to have large-scale credible labels in real-world applications. One solution is to make supervised learning robust with imperfectly labeled input. In this paper, we develop a distribution correction approach that allows deep neural networks to avoid overfitting imperfect training data. Specifically, we treat the noisy input as samples from an incorrect distribution, which will be automatically corrected during our training process. We test our approach on several classification datasets with elaborately generated noisy labels. The results show significantly higher prediction and recovery accuracy with our approach compared to alternative methods.

A Very Brief and Critical Discussion on AutoML

This contribution presents a very brief and critical discussion on automated machine learning (AutoML), which is categorized here into two classes, referred to as narrow AutoML and generalized AutoML, respectively. The conclusions yielded from this discussion can be summarized as follows: (1) most existing research on AutoML belongs to the class of narrow AutoML; (2) advances in narrow AutoML are mainly motivated by commercial needs, while any possible benefit obtained is definitely at the cost of increased computing burdens; (3) the concept of generalized AutoML has a strong tie in spirit with artificial general intelligence (AGI), also called ‘strong AI’, for which obstacles abound for obtaining pivotal progress.

Long Short-Term Memory with Dynamic Skip Connections

In recent years, long short-term memory (LSTM) has been successfully used to model sequential data of variable length. However, LSTM can still experience difficulty in capturing long-term dependencies. In this work, we tried to alleviate this problem by introducing a dynamic skip connection, which can learn to directly connect two dependent words. Since there is no dependency information in the training data, we propose a novel reinforcement learning-based method to model the dependency relationship and connect dependent words. The proposed model computes the recurrent transition functions based on the skip connections, which provides a dynamic skipping advantage over RNNs that always tackle entire sentences sequentially. Our experimental results on three natural language processing tasks demonstrate that the proposed method can achieve better performance than existing methods. In the number prediction experiment, the proposed model outperformed LSTM with respect to accuracy by nearly 20%.

Deep Ensemble Bayesian Active Learning: Addressing the Mode Collapse issue in Monte Carlo dropout via Ensembles

In image classification tasks, the ability of deep CNNs to deal with complex image data has proven to be unrivalled. However, they require large amounts of labeled training data to reach their full potential. In specialised domains such as healthcare, labeled data can be difficult and expensive to obtain. Active Learning aims to alleviate this problem, by reducing the amount of labelled data needed for a specific task while delivering satisfactory performance. We propose DEBAL, a new active learning strategy designed for deep neural networks. This method improves upon the current state-of-the-art deep Bayesian active learning method, which suffers from the mode collapse problem. We correct for this deficiency by making use of the expressive power and statistical properties of model ensembles. Our proposed method manages to capture superior data uncertainty, which translates into improved classification performance. We demonstrate empirically that our ensemble method yields faster convergence of CNNs trained on the MNIST and CIFAR-10 datasets.

Stratified Constructive Disjunction and Negation in Constraint Programming

Constraint Programming (CP) is a powerful declarative programming paradigm combining inference and search in order to find solutions to various types of constraint systems. Dealing with highly disjunctive constraint systems is notoriously difficult in CP. Apart from trying to solve each disjunct independently from each other, there is little hope and effort to succeed in constructing intermediate results combining the knowledge originating from several disjuncts. In this paper, we propose If Then Else (ITE), a lightweight approach for implementing stratified constructive disjunction and negation on top of an existing CP solver, namely SICStus Prolog clp(FD). Although constructive disjunction has been known for more than three decades, it does not have straightforward implementations in most CP solvers. ITE is a freely available library proposing stratified and constructive reasoning for various operators, including disjunction and negation, implication and conditional. Our preliminary experimental results show that ITE is competitive with existing approaches that handle disjunctive constraint systems.

Evidence Transfer for Improving Clustering Tasks Using External Categorical Evidence

In this paper we introduce evidence transfer for clustering, a deep learning method that can incrementally manipulate the latent representations of an autoencoder, according to external categorical evidence, in order to improve a clustering outcome. It is deployed on a baseline solution to reduce the cross entropy between the external evidence and an extension of the latent space. By evidence transfer we mean the process by which the categorical outcome of an external, auxiliary task is exploited to improve a primary task, in this case representation learning for clustering. Our proposed method makes no assumptions regarding the categorical evidence presented, nor the structure of the latent space. We compare our method against the baseline solution by performing k-means clustering before and after its deployment. Experiments with three different kinds of evidence show that our method effectively manipulates the latent representations when presented with real corresponding evidence, while remaining robust when presented with low quality evidence.

A Hierarchical Framework for Relation Extraction with Reinforcement Learning

Most existing methods determine relation types only after all the entities have been recognized, so the interaction between relation types and entity mentions is not fully modeled. This paper presents a novel paradigm to deal with relation extraction by regarding the related entities as the arguments of a relation. We apply a hierarchical reinforcement learning (HRL) framework in this paradigm to enhance the interaction between entity mentions and relation types. The whole extraction process is decomposed into a hierarchy of two-level RL policies for relation detection and entity extraction respectively, so that it is more feasible and natural to deal with overlapping relations. Our model was evaluated on public datasets collected via distant supervision, and results show that it achieves better performance than existing methods and is more powerful for extracting overlapping relations.

Sequential Subspace Changepoint Detection

We consider the sequential changepoint detection problem of detecting changes that are characterized by a subspace structure which is manifested in the covariance matrix. In particular, the covariance structure changes from an identity matrix to an unknown spiked covariance model. We consider three sequential changepoint detection procedures: the exact cumulative sum (CUSUM), which assumes knowledge of all parameters; the largest eigenvalue procedure; and a novel Subspace-CUSUM algorithm, with the last two being used when unknown parameters are present. By leveraging the extreme eigenvalue distribution from random matrix theory and modeling the non-negligible temporal correlation in the sequence of detection statistics due to the sliding window approach, we provide theoretical approximations to the average run length (ARL) and the expected detection delay (EDD) for the largest eigenvalue procedure. The three methods are compared to each other using simulations.
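As a rough illustration of a sliding-window largest eigenvalue detector, the sketch below raises an alarm the first time the top eigenvalue of the windowed sample covariance exceeds a threshold. It assumes zero-mean data; the function names, the window size, the threshold, and the use of power iteration are choices made for this example, not details taken from the paper.

```python
import math
import random


def largest_eigenvalue(matrix, iterations=200):
    """Approximate the top eigenvalue of a symmetric PSD matrix.

    Uses power iteration from the all-ones direction, which is assumed
    not to be orthogonal to the leading eigenvector.
    """
    k = len(matrix)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
    return sum(v[i] * mv[i] for i in range(k))  # Rayleigh quotient


def window_covariance(window):
    """Sample covariance of zero-mean vectors: (1/w) * sum of x x^T."""
    w, k = len(window), len(window[0])
    return [
        [sum(x[i] * x[j] for x in window) / w for j in range(k)]
        for i in range(k)
    ]


def detect_change(samples, window_size, threshold):
    """Return the first index whose window statistic exceeds threshold.

    Returns None if no alarm is raised.
    """
    for t in range(window_size - 1, len(samples)):
        window = samples[t - window_size + 1 : t + 1]
        if largest_eigenvalue(window_covariance(window)) > threshold:
            return t
    return None


# Hypothetical stream: identity covariance, then a spike along axis 0.
rng = random.Random(0)
stream = []
for t in range(120):
    x = [rng.gauss(0.0, 1.0) for _ in range(4)]
    if t >= 60:
        x[0] += 3.0 * rng.gauss(0.0, 1.0)  # post-change variance 1 + 9
    stream.append(x)
print(detect_change(stream, window_size=20, threshold=4.0))
```

Because consecutive windows share most of their samples, successive statistics are strongly correlated, which is the temporal correlation the abstract says must be modeled when approximating the ARL and EDD.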

A Convergence Theory for Deep Learning via Over-Parameterization

Deep neural networks (DNNs) have demonstrated dominating performance in many fields, e.g., computer vision, natural language processing, and robotics. Since AlexNet, the neural networks used in practice have grown wider and deeper. On the theoretical side, a long line of work has focused on why we can train neural networks when there is only one hidden layer. The theory of multi-layer neural networks remains somewhat unsettled. We present a new theory to understand the convergence of training DNNs. We only make two assumptions: the inputs do not degenerate and the network is over-parameterized. The latter means the number of hidden neurons is sufficiently large: polynomial in $n$, the number of training samples, and in $L$, the number of layers. We show that on the training dataset, starting from randomly initialized weights, simple algorithms such as stochastic gradient descent attain 100% accuracy in classification tasks, or minimize the $\ell_2$ regression loss at a linear convergence rate, with a number of iterations that scales only polynomially in $n$ and $L$. Our theory applies to the widely-used but non-smooth ReLU activation, and to any smooth and possibly non-convex loss function. In terms of network architectures, our theory at least applies to fully-connected neural networks, convolutional neural networks (CNN), and residual neural networks (ResNet).

Deep Compression of Sum-Product Networks on Tensor Networks

Sum-product networks (SPNs) represent an emerging class of neural networks with clear probabilistic semantics and superior inference speed over graphical models. This work reveals a strikingly intimate connection between SPNs and tensor networks, thus leading to a highly efficient representation that we call tensor SPNs (tSPNs). For the first time, through mapping an SPN onto a tSPN and employing novel optimization techniques, we demonstrate remarkable parameter compression with negligible loss in accuracy.

A generic framework for privacy preserving deep learning

We detail a new framework for privacy preserving deep learning and discuss its assets. The framework puts a premium on ownership and secure processing of data and introduces a valuable representation based on chains of commands and tensors. This abstraction allows one to implement complex privacy preserving constructs such as Federated Learning, Secure Multiparty Computation, and Differential Privacy while still exposing a familiar deep learning API to the end-user. We report early results on the Boston Housing and Pima Indian Diabetes datasets. While the privacy features apart from Differential Privacy do not impact the prediction accuracy, the current implementation of the framework introduces a significant overhead in performance, which will be addressed at a later stage of the development. We believe this work is an important milestone introducing the first reliable, general framework for privacy preserving deep learning.

An Overview of Computational Approaches for Analyzing Interpretation

It is said that beauty is in the eye of the beholder. But how exactly can we characterize such discrepancies in interpretation? For example, are there any specific features of an image that make person A regard an image as beautiful while person B finds the same image displeasing? Such questions ultimately aim at explaining our individual ways of interpretation, an intention that has been of fundamental importance to the social sciences from the beginning. More recently, advances in computer science brought up two related questions: First, can computational tools be adopted for analyzing ways of interpretation? Second, what if the ‘beholder’ is a computer model, i.e., how can we explain a computer model’s point of view? Numerous efforts have been made regarding both of these points, but many existing approaches focus on particular aspects and are still rather separate. With this paper, in order to connect these approaches, we introduce a theoretical framework for analyzing interpretation, which is applicable to interpretation by both human beings and computer models. We give an overview of relevant computational approaches from various fields, and discuss the most common and promising application areas. The focus of this paper lies on interpretation of text and image data, while many of the presented approaches are applicable to other types of data as well.

Understanding and Predicting Links in Graphs: A Persistent Homology Perspective

Persistent Homology is a powerful tool in Topological Data Analysis (TDA) to capture topological properties of data succinctly at different spatial resolutions. For graphical data, the shape and structure of the neighborhood of individual data items (nodes) are an essential means of characterizing their properties. In this paper, we propose the use of persistent homology methods to capture structural and topological properties of graphs and use them to address the problem of link prediction. We evaluate our approach on seven different real-world datasets and offer directions for future work.
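For readers new to persistent homology on graphs, the following minimal sketch computes the 0-dimensional persistence pairs (connected components) of an edge-weighted graph with a union-find structure. It assumes every vertex enters the filtration at value 0 and every edge enters at its weight; this filtration and the function name are illustrative assumptions, and the abstract does not say which filtration or homology dimensions the paper uses.

```python
def zero_dim_persistence(vertices, weighted_edges):
    """Return (birth, death) pairs for connected components.

    weighted_edges is a list of (u, v, weight) tuples. All components
    are born at 0.0; a component dies at the weight of the edge that
    merges it into another one. Components that never merge get an
    infinite death value.
    """
    parent = {v: v for v in vertices}

    def find(v):
        # Walk to the root, halving the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    pairs = []
    for u, v, weight in sorted(weighted_edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merging kills one component
            pairs.append((0.0, weight))
    survivors = len({find(v) for v in vertices})
    pairs.extend((0.0, float("inf")) for _ in range(survivors))
    return pairs


# Hypothetical graph: a weighted triangle plus one isolated node.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 3.0)]
print(zero_dim_persistence(nodes, edges))
```

The edge of weight 3.0 closes a cycle instead of merging components, so it produces no 0-dimensional pair; cycles of that kind are what 1-dimensional persistence would record.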

Automated Multi-Label Classification based on ML-Plan

Automated machine learning (AutoML) has received increasing attention in the recent past. While the main tools for AutoML, such as Auto-WEKA, TPOT, and auto-sklearn, mainly deal with single-label classification and regression, there is very little work on other types of machine learning tasks. In particular, there is almost no work on automating the engineering of machine learning applications for multi-label classification. This paper makes two contributions. First, it discusses the usefulness and feasibility of an AutoML approach for multi-label classification. Second, we show how the scope of ML-Plan, an AutoML-tool for multi-class classification, can be extended towards multi-label classification using MEKA, which is a multi-label extension of the well-known Java library WEKA. The resulting approach recursively refines MEKA’s multi-label classifiers, which sometimes nest another multi-label classifier, up to the selection of a single-label base learner provided by WEKA. In our evaluation, we find that the proposed approach yields superb results and performs significantly better than a set of baselines.

An estimation of the greedy algorithm’s accuracy for a set cover problem instance
New Tribonacci Recurrence Relations and Addition Formulas
Attitude and Angular Velocity Tracking for a Rigid Body using Geometric Methods on the Two-Sphere (Stability Proof)
Broadcasting on Random Directed Acyclic Graphs
On contraction analysis for hybrid systems
Satyam: Democratizing Groundtruth for Machine Vision
Collaboratively Learning the Best Option on Graphs, Using Bounded Local Memory
The Evolution of Gene Dominance through the Baldwin Effect
Federated Byzantine Quorum Systems (Extended Version)
Voronoi Partition-based Scenario Reduction for Fast Sampling-based Stochastic Reachability Computation of LTI Systems
Spiral Fermi Surfaces in Quasicrystals and Twisted Bilayer Graphene: Signatures in Quantum Oscillations
Plug-In Stochastic Gradient Method
SpeedReader: Reader Mode Made Fast and Private
Gender Effect on Face Recognition for a Large Longitudinal Database
Comparison of partition functions in a space-time random environment
New CleverHans Feature: Better Adversarial Robustness Evaluations with Attack Bundling
Variational Bayesian hierarchical regression for data analysis
Can Deep Learning Outperform Modern Commercial CT Image Reconstruction Methods?
NEMGAN: Noise Engineered Mode-matching GAN
Deep Learning Predicts Hip Fracture using Confounding Patient and Healthcare Variables
A Comparison of Lattice-free Discriminative Training Criteria for Purely Sequence-Trained Neural Network Acoustic Models
Maximizing Diversity of Opinion in Social Networks
Validating Hyperspectral Image Segmentation
Learning Energy Based Inpainting for Optical Flow
Symmetries of the Quaternionic Ginibre Ensemble
Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering
New bounds on the maximum size of Sperner partition systems
Universal Hard-label Black-Box Perturbations: Breaking Security-Through-Obscurity Defenses
Close and ordinary social contacts: how important are they in promoting large-scale contagion?
A Note on the comparison of Nearest Neighbor Gaussian Process (NNGP) based models
Semantic and Contrast-Aware Saliency
Securing Behavior-based Opinion Spam Detection
Analysis of Fleet Modularity in an Artificial Intelligence-Based Attacker-Defender Game
Density estimation for shift-invariant multidimensional distributions
A Fundamental Measure of Treatment Effect Heterogeneity
Inducibility of directed paths
Adaptive Task Allocation for Mobile Edge Learning
Imagining an Engineer: On GAN-Based Data Augmentation Perpetuating Biases
Neural sequence labeling for Vietnamese POS Tagging and NER
A new insight into the secondary path modeling problem in active noise control
RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets
Typeface Completion with Generative Adversarial Networks
Towards Instance-Optimal Private Query Release
Energy-Efficient Offloading in Mobile Edge Computing with Edge-Cloud Collaboration
M2M-GAN: Many-to-Many Generative Adversarial Transfer Learning for Person Re-Identification
A Fully Automated System for Sizing Nasal PAP Masks Using Facial Photographs
Artificial neural networks for density-functional optimizations in fermionic systems
Nonlinear Modal Decoupling Based Power System Transient Stability Analysis
Codeword Position Index based Sparse Code Multiple Access System
Addition-deletion theorem for free hyperplane arrangements and combinatorics
A Theoretically Guaranteed Deep Optimization Framework for Robust Compressive Sensing MRI
Can We Use Speaker Recognition Technology to Attack Itself? Enhancing Mimicry Attacks Using Automatic Target Speaker Selection
Feature Analysis for Classification of Physical Actions using surface EMG Data
On complexity of cyclic coverings of graphs
On rationality of generating function for the number of spanning trees in circulant graphs
Gradient Descent Finds Global Minima of Deep Neural Networks
A Sufficient Condition for Small-Signal Stability and Construction of Robust Stability Region
Homomorphism bounds of signed bipartite $K_4$-minor-free graphs and edge-colorings of $2k$-regular $K_4$-minor-free multigraphs
Football and Beer – a Social Media Analysis on Twitter in Context of the FIFA Football World Cup 2018
The trouble with tensor ring decompositions
Neural Stain Normalization and Unsupervised Classification of Cell Nuclei in Histopathological Breast Cancer Images
Invariant projections for operators that are free over the diagonal
RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement
Increased dose rate precision in combined $α$ and $β$ counting in the $μ$Dose system – a probabilistic approach to data analysis
The Price of Governance: A Middle Ground Solution to Coordination in Organizational Control
How does stock market volatility react to oil shocks?
On the Inducibility of Stackelberg Equilibrium for Security Games
Changing the Image Memorability: From Basic Photo Editing to GANs
Image-Level Attentional Context Modeling Using Nested-Graph Neural Networks
Deterministic and stochastic inexact regularization algorithms for nonconvex optimization with optimal complexity
Information Theoretic Bounds Based Channel Quantization Design for Emerging Memories
Minimizing and Computing the Inverse Geodesic Length on Trees
Multilevel Schwarz preconditioners for singularly perturbed symmetric reaction-diffusion systems
Unique End of Potential Line
Computation Load Balancing Real-Time Model Predictive Control in Urban Traffic Networks
An Average of the Human Ear Canal: Recovering Acoustical Properties via Shape Analysis
MD-GAN: Multi-Discriminator Generative Adversarial Networks for Distributed Datasets
Precision of the ENDGame: Mixed-precision arithmetic in the iterative solver of the Unified Model
Sample-Efficient Policy Learning based on Completely Behavior Cloning
The invariance principle and the large deviation for the biased random walk on $\mathbb{Z}^d$
An external validation of Thais’ cardiovascular 10-year risk assessment in the southern Thailand
Targeting Solutions in Bayesian Multi-Objective Optimization: Sequential and Parallel Versions
Non-convex Lasso-kind approach to compressed sensing for finite-valued signals
Multimodal Grounding for Sequence-to-Sequence Speech Recognition
Learning Semantic Representations for Novel Words: Leveraging Both Form and Context
Suggesting Cooking Recipes Through Simulation and Bayesian Optimization
Quasi-Perfect Stackelberg Equilibrium
A first sketch: Construction of model defect priors inspired by dynamic time warping
Multimodal One-Shot Learning of Speech and Images
Cross and Learn: Cross-Modal Self-Supervision
Parallel processing area extraction and data transfer number reduction for automatic GPU offloading of IoT applications
Exploiting Capacity of Sewer System Using Unsupervised Learning Algorithms Combined with Dimensionality Reduction
On arithmetic index in the generalized Thue-Morse word
Graded Betti numbers of balanced simplicial complexes
Breaking Landauer’s Limit Using Quantum-dot Cellular Automata
Performance Guarantees for Homomorphisms Beyond Markov Decision Processes
On Conditional Correlations
Toward Autonomous Rotation-Aware Unmanned Aerial Grasping
Central limit theorems for patterns in multiset permutations and set partitions
Post-randomization Biomarker Effect Modification in an HIV Vaccine Clinical Trial
Vector Gaussian CEO Problem Under Logarithmic Loss and Applications
Resolving a Feedback Bottleneck of Multi-Antenna Coded Caching
Modeling Rape Reporting Delays Using Spatial, Temporal and Social Features
Matrix Recovery with Implicitly Low-Rank Data
A Complexity Dichotomy for Critical Values of the b-Chromatic Number of Graphs
Robust Asynchronous Stochastic Gradient-Push: Asymptotically Optimal and Network-Independent Performance for Strongly Convex Functions
On a relation between packing and covering densities of convex bodies
Strongly unimodal systems
Role of initial conditions in the dynamics of quantum glassy systems
Insights into Bootstrap Percolation: Its Equivalence with k-core Percolation and the Giant Component
Uncertainty relations and sparse signal recovery
Identify, locate and separate: Audio-visual object extraction in large video collections using weak supervision
Reachability-based safe learning for optimal control problem
The layer complexity of Arthur-Merlin-like communication
An output-sensitive polynomial Time Algorithm to partition a Sequence of Integers into Subsets with equal Sums
Convolutional neural networks in phase space and inverse problems
Adversarial Uncertainty Quantification in Physics-Informed Neural Networks
The control set of a linear control system on the two dimensional solvable Lie group
Representation-Oblivious Error Correction by Natural Redundancy
The discrete cosine transform on triangles
Counting the Number of Quasiplatonic Topological Actions of the Cyclic Group on Surfaces
Splenomegaly Segmentation on Multi-modal MRI using Deep Convolutional Networks
A Microprocessor implemented in 65nm CMOS with Configurable and Bit-scalable Accelerator for Programmable In-memory Computing
Polynomial-time Approximation Scheme for Minimum k-cut in Planar and Minor-free Graphs
Cusp Universality for Random Matrices II: The Real Symmetric Case
Bernstein-von Mises theorems and uncertainty quantification for linear inverse problems
Pure $\mathcal{O}$-sequences arising from $2$-dimensional PS ear-decomposable simplicial complexes
A note on simultaneous representation problem for interval and circular-arc graphs
On convexity and solution concepts in cooperative interval games
Block Belief Propagation for Parameter Learning in Markov Random Fields
Two Party Distribution Testing: Communication and Security
Benefits of Coded Placement for Networks with Heterogeneous Cache Sizes

Book Memo: “Handbook of Signal Processing Systems”

In this new edition of the Handbook of Signal Processing Systems, many of the chapters from the previous editions have been updated, and several new chapters have been added. The new contributions include chapters on signal processing methods for light field displays, throughput analysis of dataflow graphs, modeling for reconfigurable signal processing systems, fast Fourier transform architectures, deep neural networks, programmable architectures for histogram of oriented gradients processing, high dynamic range video coding, system-on-chip architectures for data analytics, analysis of finite word-length effects in fixed-point systems, and models of architecture. There are more than 700 tables and illustrations; in this edition over 300 are in color. This new edition of the handbook is organized in three parts. Part I motivates representative applications that drive and apply state-of-the-art methods for design and implementation of signal processing systems; Part II discusses architectures for implementing these applications; and Part III focuses on compilers, as well as models of computation and their associated design tools and methodologies.

R Packages worth a look

Integral of B-Spline Functions (ibs)
Calculate B-spline basis functions with a given set of knots and order, or a B-spline function with a given set of knots and order and set of de Boor p …

Cox Proportional Hazards Regression for Right Truncated Data (coxrt)
Fits Cox regression based on retrospectively ascertained times-to-event. The method uses Inverse-Probability-Weighting estimating equations.

Display Information About Nested Subsets of a Data Frame (vtree)
A tool for drawing ‘variable trees’. Variable trees display information about hierarchical subsets of a data frame defined by values of categorical var …