If you did not already know

Non-convex Conditional Gradient Sliding (NCGS) google
We investigate a projection-free method, namely conditional gradient sliding, on batched, stochastic and finite-sum non-convex problems. CGS is a smart combination of Nesterov’s accelerated gradient method and the Frank-Wolfe (FW) method, and outperforms FW in the convex setting by saving gradient computations. However, the study of CGS in the non-convex setting is limited. In this paper, we propose non-convex conditional gradient sliding (NCGS), which surpasses the non-convex Frank-Wolfe method in the batched, stochastic and finite-sum settings. …
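
A minimal sketch of the classical Frank-Wolfe step that conditional gradient sliding builds on (this is not the paper's NCGS; the quadratic objective and simplex constraint are illustrative assumptions). The projection-free trick is that the linear minimization oracle over the simplex just picks the vertex with the smallest gradient coordinate.

```python
# Illustrative Frank-Wolfe run: minimize f(x) = ||x - b||^2 over the
# probability simplex. No projection is ever computed; each step moves
# toward the simplex vertex chosen by the linear minimization oracle.

def frank_wolfe(b, steps=2000):
    n = len(b)
    x = [1.0 / n] * n                                # start at simplex center
    for t in range(steps):
        grad = [2.0 * (x[i] - b[i]) for i in range(n)]
        i_min = min(range(n), key=lambda i: grad[i])  # LMO: vertex e_{i_min}
        gamma = 2.0 / (t + 2)                        # standard step size
        x = [(1 - gamma) * xi for xi in x]
        x[i_min] += gamma                            # convex combination stays feasible
    return x

x_star = frank_wolfe([0.7, 0.2, 0.1])
```

Since the target `b` lies inside the simplex, the iterate approaches it at the method's usual O(1/t) rate.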

Innovation Management google
Innovation management is the management of innovation processes. It refers both to product and organizational innovation. Innovation management includes a set of tools that allow managers and engineers to cooperate with a common understanding of processes and goals. Innovation management allows the organization to respond to external or internal opportunities, and use its creativity to introduce new ideas, processes or products. It is not relegated to R&D; it involves workers at every level in contributing creatively to a company’s product development, manufacturing and marketing. …

Central Network (CentralNet) google
This paper proposes a novel multimodal fusion approach, aiming to produce the best possible decisions by integrating information coming from multiple media. While most past multimodal approaches either project the features of different modalities into the same space, or coordinate the representations of each modality through the use of constraints, our approach borrows from both visions. More specifically, assuming each modality can be processed by a separate deep convolutional network, allowing decisions to be taken independently for each modality, we introduce a central network linking the modality-specific networks. This central network not only provides a common feature embedding but also regularizes the modality-specific networks through the use of multi-task learning. The proposed approach is validated on 4 different computer vision tasks on which it consistently improves the accuracy of existing multimodal fusion approaches. …


R Packages worth a look

Derive Polygenic Risk Score Based on Empirical Bayes Theory (EBPRS)
EB-PRS is a novel method that leverages information for effect sizes across all the markers to improve the prediction accuracy. No parameter tuning is …
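
For context, a hedged sketch of the plain polygenic risk score that EB-PRS refines: a weighted sum of per-marker allele counts (0/1/2) with GWAS effect sizes as weights. The numbers are made up; EB-PRS replaces the raw effect sizes with empirical-Bayes shrunken ones, which is not shown here.

```python
# Plain PRS: sum over markers of (allele count) x (effect size).
# EB-PRS would first shrink the effect sizes via empirical Bayes.

def polygenic_risk_score(genotypes, effect_sizes):
    return sum(g * b for g, b in zip(genotypes, effect_sizes))

# hypothetical genotypes for 4 markers and hypothetical effect sizes
score = polygenic_risk_score([0, 1, 2, 1], [0.2, -0.1, 0.4, 0.05])
```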

A ‘sparklyr’ Extension for Nested Data (sparklyr.nested)
A ‘sparklyr’ extension adding the capability to work easily with nested data.

A Two-Sample Test for the Equality of Distributions for High-Dimensional Data (TwoSampleTest.HD)
For high-dimensional data whose main feature is a large number, p, of variables but a small sample size, the null hypothesis that the marginal distribu …

What’s new on arXiv

A causal inference framework for cancer cluster investigations using publicly available data

Often, a community becomes alarmed when high rates of cancer are noticed, and residents suspect that the cancer cases could be caused by a known source of hazard. In response, the CDC recommends that departments of health perform a standardized incidence ratio (SIR) analysis to determine whether the observed cancer incidence is higher than expected. This approach has several limitations that are well documented in the literature. In this paper we propose a novel causal inference approach to cancer cluster investigations, rooted in the potential outcomes framework. Assuming that a source of hazard representing a potential cause of increased cancer rates in the community is identified a priori, we introduce a new estimand called the causal SIR (cSIR). The cSIR is a ratio defined as the expected cancer incidence in the exposed population divided by the expected cancer incidence under the (counterfactual) scenario of no exposure. To estimate the cSIR we need to overcome two main challenges: 1) identify unexposed populations that are as similar as possible to the exposed one to inform estimation under the counterfactual scenario of no exposure, and 2) make inference on cancer incidence in these unexposed populations using publicly available data that are often available at a much higher level of spatial aggregation than what is desired. We overcome the first challenge by relying on matching. We overcome the second challenge by developing a Bayesian hierarchical model that borrows information from other sources to impute cancer incidence at the desired finer level of spatial aggregation. We apply our proposed approach to determine whether trichloroethylene vapor exposure has caused increased cancer incidence in Endicott, NY.
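
A toy computation of the estimand just defined (all numbers hypothetical): the causal SIR is observed exposed incidence divided by the expected incidence under no exposure, here approximated by a matched unexposed population's rate.

```python
# cSIR = expected cancer incidence in the exposed population
#        / expected incidence under the counterfactual of no exposure.
# The counterfactual rate is stood in for by a matched unexposed rate.

exposed_cases = 30
exposed_person_years = 10_000
matched_unexposed_rate = 0.002       # cases per person-year (hypothetical)

expected_cases = matched_unexposed_rate * exposed_person_years
cSIR = exposed_cases / expected_cases  # > 1 suggests excess incidence
```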

Human-like machine learning: limitations and suggestions

This paper attempts to address the issues of machine learning in its current implementation. It is known that machine learning algorithms require a significant amount of data for training purposes, and recent developments in deep learning have increased this requirement dramatically. The performance of an algorithm depends on the quality of data, and hence algorithms are only as good as the data they are trained on. Supervised learning is developed based on human learning processes, by analysing named (i.e. annotated) objects, scenes and actions. Whether training on large quantities of data (i.e. big data) is the right or the wrong approach is debatable. The fact is that training algorithms the same way we learn ourselves comes with limitations. This paper discusses the issues around applying a human-like approach to training algorithms and the implications of this approach when using limited data. Several current studies involving non-data-driven algorithms and natural examples are also discussed, and certain alternative approaches are suggested.

On the Robustness of Information-Theoretic Privacy Measures and Mechanisms

Consider a data publishing setting for a dataset composed of non-private features correlated with a set of private features not necessarily present in the dataset. The objective of the publisher is to maximize the amount of information about the non-private features in a revealed dataset (utility), while keeping the information leaked about the private attributes bounded (privacy). Here, both privacy and utility are measured using information leakage measures that arise in adversarial settings. We consider a practical setting wherein the publisher uses an estimate of the joint distribution of the features to design the privacy mechanism. For any privacy mechanism, we provide probabilistic upper bounds for the discrepancy between the privacy guarantees for the empirical and true distributions, and similarly for utility. These bounds follow from our main technical results regarding the Lipschitz continuity of the considered information leakage measures. We also establish the statistical consistency of the notion of optimal privacy mechanism. Finally, we introduce and study the family of uniform privacy mechanisms, mechanisms designed upon an estimate of the joint distribution which are capable of providing privacy to a whole neighborhood of the estimated distribution, and thereby, guaranteeing privacy for the true distribution with high probability.

Constraint-based Sequential Pattern Mining with Decision Diagrams

Constrained sequential pattern mining aims at identifying frequent patterns in a sequential database of items while observing constraints defined over the item attributes. We introduce novel techniques for constraint-based sequential pattern mining that rely on a multi-valued decision diagram (MDD) representation of the database. Specifically, our representation can accommodate multiple item attributes and various constraint types, including a number of non-monotone constraints. To evaluate the applicability of our approach, we develop an MDD-based prefix-projection algorithm and compare its performance against a typical generate-and-check variant, as well as a state-of-the-art constraint-based sequential pattern mining algorithm. Results show that our approach is competitive with or superior to these other methods in terms of scalability and efficiency.
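
A minimal sketch of the underlying counting task (not the paper's MDD representation): the support of a candidate pattern is the number of database sequences that contain it as a subsequence.

```python
# Support counting for sequential pattern mining, with a tiny
# made-up database of item sequences.

def is_subsequence(pattern, sequence):
    it = iter(sequence)
    # "item in it" advances the iterator, so order must be respected
    return all(item in it for item in pattern)

database = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"], ["c", "b"]]
support = sum(is_subsequence(["a", "c"], seq) for seq in database)
```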

Unsupervised learning with contrastive latent variable models

In unsupervised learning, dimensionality reduction is an important tool for data exploration and visualization. Because these aims are typically open-ended, it can be useful to frame the problem as looking for patterns that are enriched in one dataset relative to another. Such pairs of datasets occur commonly, for instance a population of interest vs. a control, or signal vs. signal-free recordings. However, there are few methods that work on sets of data as opposed to data points or sequences. Here, we present a probabilistic model for dimensionality reduction that discovers signal enriched in the target dataset relative to the background dataset. The data in these sets do not need to be paired or grouped beyond set membership. By using a probabilistic model where some structure is shared between the two datasets and some is unique to the target dataset, we are able to recover interesting structure in the latent space of the target dataset. The method also has the advantages of a probabilistic model, namely that it allows for the incorporation of prior information, handles missing data, and can be generalized to different distributional assumptions. We describe several possible variations of the model and demonstrate the application of the technique to de-noising, feature selection, and subgroup discovery settings.
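
The contrastive idea can be sketched with contrastive PCA, a related linear method (not the paper's probabilistic model): find directions whose variance is enriched in the target data relative to the background. The synthetic data and the contrast weight `alpha` are assumptions for illustration.

```python
import numpy as np

# Target data has extra variance along axis 1 that the background lacks;
# the top eigenvector of (target cov - alpha * background cov) finds it.
rng = np.random.default_rng(0)
background = rng.normal(size=(500, 2)) * np.array([1.0, 0.1])
target = rng.normal(size=(500, 2)) * np.array([1.0, 2.0])

alpha = 1.0
contrast = np.cov(target.T) - alpha * np.cov(background.T)
vals, vecs = np.linalg.eigh(contrast)          # symmetric eigendecomposition
top_direction = vecs[:, np.argmax(vals)]       # enriched direction (axis 1)
```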

Newton Methods for Convolutional Neural Networks

Deep learning involves a difficult non-convex optimization problem, which is often solved by stochastic gradient (SG) methods. While SG is usually effective, it may not be robust in some situations. Recently, Newton methods have been investigated as an alternative optimization technique, but nearly all existing studies consider only fully-connected feedforward neural networks. They do not investigate other types of networks, such as Convolutional Neural Networks (CNNs), which are more commonly used in deep-learning applications. One reason is that Newton methods for CNNs involve complicated operations, and so far no work has conducted a thorough investigation. In this work, we give details of all building blocks, including function, gradient, and Jacobian evaluation, and Gauss-Newton matrix-vector products. These basic components are very important because with them further developments of Newton methods for CNNs become possible. We show that an efficient MATLAB implementation can be done in just several hundred lines of code and demonstrate that the Newton method gives competitive test accuracy.
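
One of those building blocks, sketched for a generic least-squares-style loss (not the paper's CNN code): the Gauss-Newton matrix is J^T B J (here B = I), and its product with a vector is computed as two matrix-vector products without ever forming the matrix.

```python
import numpy as np

# Gauss-Newton matrix-vector product: compute J^T (J v) rather than
# materializing J^T J. The Jacobian here is random, for illustration.
rng = np.random.default_rng(0)
J = rng.normal(size=(5, 3))     # Jacobian of model outputs w.r.t. weights
v = rng.normal(size=3)

gn_matvec = J.T @ (J @ v)       # matrix-free: two matvecs
explicit = (J.T @ J) @ v        # same result, but forms J^T J explicitly
```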

Multivariate Time-series Similarity Assessment via Unsupervised Representation Learning and Stratified Locality Sensitive Hashing: Application to Early Acute Hypotensive Episode Detection

Timely prediction of clinically critical events in the Intensive Care Unit (ICU) is important for improving care and survival rates. Most existing approaches are based on applying various classification methods to statistical features explicitly extracted from vital signals. In this work, we propose to eliminate the high cost of engineering hand-crafted features from multivariate time-series of physiologic signals by learning their representation with a sequence-to-sequence auto-encoder. We then propose to hash the learned representations to enable signal similarity assessment for the prediction of critical events. We apply this methodological framework to predict Acute Hypotensive Episodes (AHE) on a large and diverse dataset of vital signal recordings. Experiments demonstrate the ability of the presented framework to accurately predict an upcoming AHE.
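
A hedged sketch of the hashing half of such a pipeline, using random-hyperplane LSH (the paper uses a stratified variant; the vectors below are stand-ins for learned auto-encoder embeddings): similar representations map to the same short binary code, enabling fast similarity lookup.

```python
import random

# Random-hyperplane LSH: each bit records which side of a random
# hyperplane the vector falls on; nearby vectors share most bits.

def lsh_code(vec, planes):
    return tuple(int(sum(v * p for v, p in zip(vec, plane)) >= 0)
                 for plane in planes)

random.seed(0)
dim, n_bits = 8, 6
planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

a = [1.0] * dim          # stand-in embedding
b = [0.5] * dim          # same direction as a -> identical code
c = [-1.0] * dim         # opposite direction -> complementary code

code_a, code_b, code_c = (lsh_code(v, planes) for v in (a, b, c))
```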

Predictive Modeling with Delayed Information: a Case Study in E-commerce Transaction Fraud Control

In Business Intelligence, accurate predictive modeling is the key to providing adaptive decisions. We studied predictive modeling problems in this research, motivated by real-world cases that Microsoft data scientists encountered while dealing with e-commerce transaction fraud control decisions using transaction streaming data in an uncertain probabilistic decision environment. The values of most online-transaction-related features can be returned instantly, while the true fraud labels only return after a stochastic delay. Using partially mature data directly for predictive modeling in an uncertain probabilistic decision environment would lead to significant inaccuracy in risk decision-making. To improve estimation of the probabilistic prediction environment, which leads to more accurate predictive modeling, two frameworks, Current Environment Inference (CEI) and Future Environment Inference (FEI), are proposed. These frameworks generated decision-environment-related features using long-term fully mature and short-term partially mature data, and the values of those features were estimated using a variety of learning methods, including linear regression, random forest, gradient boosted trees, artificial neural networks, and recurrent neural networks. Performance tests were conducted using e-commerce transaction data from Microsoft. Testing results suggested that the proposed frameworks significantly improved the accuracy of decision environment estimation.

Deep Learning in the Wavelet Domain

This paper examines the possibility of, and the possible advantages of, learning the filters of convolutional neural networks (CNNs) for image analysis in the wavelet domain. We are stimulated by both Mallat’s scattering transform and the idea of filtering in the Fourier domain. It is important to explore new spaces in which to learn, as these may provide inherent advantages that are not available in the pixel space. However, the scattering transform is limited by its inability to learn between scattering orders, and any Fourier-domain filtering is limited by the large number of filter parameters needed to get localized filters. Instead we consider filtering in the wavelet domain with learnable filters. The wavelet space allows us to have local, smooth filters with far fewer parameters, and learnability gives us flexibility. We present a novel layer which takes CNN activations into the wavelet space, learns parameters there, and returns to the pixel space. This allows it to be easily dropped into any neural network without affecting the structure. As part of this work, we show how to pass gradients through a multirate system and give preliminary results.
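
To see the domain change being exploited, here is a single-level Haar transform: it splits a signal into smooth (lowpass) and detail (highpass) halves, and "filtering in the wavelet domain" means operating on these coefficients. The filters below are the fixed Haar pair, not the learnable filters of the paper.

```python
import math

# One level of the Haar wavelet transform: pairwise sums give the
# approximation (smooth) coefficients, pairwise differences the details.

def haar_level(signal):
    s = 1 / math.sqrt(2)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

# a flat region produces a zero detail coefficient
approx, detail = haar_level([4.0, 4.0, 2.0, 0.0])
```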

Concept Learning through Deep Reinforcement Learning with Memory-Augmented Neural Networks

Deep neural networks have shown superior performance in many regimes at remembering familiar patterns from large amounts of data. However, the standard supervised deep learning paradigm is still limited when facing the need to learn new concepts efficiently from scarce data. In this paper, we present a memory-augmented neural network motivated by the process of human concept learning. The training procedure, imitating the human concept-formation process, learns how to distinguish samples from different classes and aggregate samples of the same kind. In order to better utilize the advantages originating from human behavior, we propose a sequential process during which the network must decide how to remember each sample at every step. In this sequential process, a stable and interactive memory serves as an important module. We validate our model on some typical one-shot learning tasks and also an exploratory outlier detection problem. In all the experiments, our model is highly competitive, matching or outperforming strong baselines.

Theoretical Perspective of Deep Domain Adaptation

Deep domain adaptation has recently seen great success. Compared with shallow domain adaptation, it has shown higher predictive performance and a stronger capacity to tackle structured data (e.g., image and sequential data). The underlying idea of deep domain adaptation is to bridge the gap between the source and target domains in a joint feature space so that a supervised classifier trained on labeled source data can be nicely transferred to the target domain. This idea is certainly appealing and motivating, but from a theoretical perspective little has been developed to support it. In this paper, we develop a rigorous theory to explain why we can bridge the relevant gap in an intermediate joint space. In light of our proposed theory, it turns out that there is a strong connection between deep domain adaptation and the Wasserstein (WS) distance. More specifically, our theory revolves around the following points: i) first, we propose a context in which we can perform transfer learning perfectly, and ii) second, we further prove that by bridging the relevant gap and minimizing some reconstruction errors we are minimizing a WS distance between the push-forward source distribution and the target distribution via a transport that maps from the source to the target domain.

Short-Term Wind-Speed Forecasting Using Kernel Spectral Hidden Markov Models

In machine learning, a nonparametric forecasting algorithm for time series data has been proposed, called the kernel spectral hidden Markov model (KSHMM). In this paper, we propose a technique for short-term wind-speed prediction based on KSHMM. We numerically compared the performance of our KSHMM-based forecasting technique to other techniques with machine learning, using wind-speed data offered by the National Renewable Energy Laboratory. Our results demonstrate that, compared to these methods, the proposed technique offers comparable or better performance.

Model-based Approximate Query Processing

Interactive visualizations are arguably the most important tool to explore, understand and convey facts about data. In past years, the database community has been working on different techniques for Approximate Query Processing (AQP) that aim to deliver an approximate query result within a fixed time bound to better support interactive visualizations. However, classical AQP approaches suffer from various problems that limit their applicability for ad-hoc exploration of a new data set: (1) Classical AQP approaches that perform online sampling can support ad-hoc exploration queries but yield low quality if executed over rare subpopulations. (2) Classical AQP approaches that rely on offline sampling can use some form of biased sampling to mitigate these problems but require a priori knowledge of the workload, which is often not realistic if users want to explore a new database. In this paper, we present a new approach to AQP called Model-based Approximate Query Processing, which leverages generative models learned over the complete database to answer SQL queries at interactive speeds. Different from classical AQP approaches, generative models allow us to compute responses to ad-hoc queries while also delivering high-quality estimates over rare subpopulations. In our experiments with real and synthetic data sets, we show that Model-based AQP can in many scenarios return more accurate results in a shorter runtime. Furthermore, we think that the techniques of using generative models presented in this paper can not only be used for AQP in databases but also have applications for other database problems, including Query Optimization as well as Data Cleaning.
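
The shape of the interaction can be sketched with a trivially simple per-group "model" standing in for a learned generative model (the table, columns, and query are all made up): queries are answered from fitted model parameters instead of scanning the data.

```python
import random

# Fit a per-group summary (count, running total) over a synthetic table,
# then answer an aggregate query from the model alone. Real Model-based
# AQP fits learned generative models; this only mimics the query path.
random.seed(1)
table = [("de", random.gauss(50, 5)) for _ in range(1000)] + \
        [("fr", random.gauss(70, 5)) for _ in range(10)]

model = {}
for country, price in table:
    stats = model.setdefault(country, [0, 0.0])
    stats[0] += 1
    stats[1] += price

def approx_query(country):         # ~ SELECT AVG(price) WHERE country = ?
    count, total = model[country]
    return total / count

avg_fr = approx_query("fr")        # rare subpopulation, no table scan
```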

Selective Feature Connection Mechanism: Concatenating Multi-layer CNN Features with a Feature Selector

Different layers of deep convolutional neural networks (CNNs) can encode different levels of information. High-layer features always contain more semantic information, while low-layer features contain more detail. However, low-layer features suffer from background clutter and semantic ambiguity. During visual recognition, the combination of low-layer and high-layer features plays an important role in context modulation. Directly combining high-layer and low-layer features may introduce background clutter and semantic ambiguity along with the detailed information. In this paper, we propose a general network architecture to concatenate CNN features of different layers in a simple and effective way, called the Selective Feature Connection Mechanism (SFCM). Low-layer features are selectively linked to high-layer features with a feature selector which is generated from the high-layer features. The proposed connection mechanism can effectively overcome the above-mentioned drawbacks. We demonstrate the effectiveness, superiority, and universal applicability of this method on many challenging computer vision tasks, such as image classification, scene text detection, and image-to-image translation.
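
A hedged sketch of the connection mechanism's data flow: a selector computed from the high-level features gates the low-level features element-wise before concatenation. The weights and feature values below are made up; in the paper the selector is learned end-to-end.

```python
import math

# Selector = sigmoid(W * high); gated low-level features are then
# concatenated with the high-level features.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sfcm_concat(low, high, selector_weights):
    selector = [sigmoid(sum(w * h for w, h in zip(row, high)))
                for row in selector_weights]      # one gate per low channel
    gated_low = [s * l for s, l in zip(selector, low)]
    return gated_low + high                       # concatenated features

low = [1.0, 2.0, 3.0]                             # hypothetical low-level features
high = [0.5, -0.5]                                # hypothetical high-level features
W = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]         # hypothetical selector weights
features = sfcm_concat(low, high, W)
```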

Contextual Care Protocol using Neural Networks and Decision Trees

A contextual care protocol is used by a medical practitioner for patient healthcare, given the context or situation that the specified patient is in. This paper proposes a method to build an automated self-adapting protocol which can help make relevant, early decisions for effective healthcare delivery. The hybrid model leverages neural networks and decision trees. The neural network estimates the chances of each disease, and each tree in the decision trees represents the care protocol for a disease. These trees are subject to change in case of aberrations found by the diagnosticians. These corrections or prediction errors are clustered into similar groups for scalability and review by the experts. The corrections as suggested by the experts are incorporated into the model.
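
A toy sketch of the hybrid's control flow (diseases, questions, and actions are invented placeholders): a classifier scores the diseases, and the highest-scoring disease selects which care-protocol decision tree to walk.

```python
# The "network" is a stub returning fixed probabilities; each protocol
# tree here is a single question with two action branches.

protocols = {
    "flu": {"fever_over_39": {True: "antipyretic", False: "rest"}},
    "strep": {"throat_culture_positive": {True: "antibiotic", False: "recheck"}},
}

def recommend(disease_probs, findings):
    disease = max(disease_probs, key=disease_probs.get)   # pick likeliest disease
    question, branches = next(iter(protocols[disease].items()))
    return branches[findings[question]]                   # walk its tree

action = recommend({"flu": 0.7, "strep": 0.3}, {"fever_over_39": True})
```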

Minimizing the Age of Information from Sensors with Correlated Observations

The Age of Information (AoI) metric has recently received much attention as a measure of the freshness of information in a network. In this paper, we study the average AoI in a generic setting with correlated sensor information, which can be related to multiple Internet of Things (IoT) scenarios. The system consists of physical sources with discrete-time updates that are observed by a set of sensors. Each source update may be observed by multiple sensors, and hence the sensor observations are correlated. We devise a model that is simple, but still capable of capturing the main tradeoffs. We propose two sensor scheduling policies that minimize the AoI of the sources: one that requires the system parameters to be known a priori, and one based on contextual bandits in which the parameters are unknown and need to be learned. We show that both policies are able to exploit the sensor correlation to reduce the AoI, resulting in a large reduction in AoI compared to schedules that are random or round-robin.
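
The AoI bookkeeping behind any such policy can be sketched in a few lines (this is the metric only, not the paper's policies or its correlated-observation model): each source's age grows by one per slot and resets to one when an update about it is delivered.

```python
# Average AoI of a schedule over two sources. Round-robin over two
# sources keeps the ages alternating between 1 and 2.

def average_aoi(schedule, n_sources):
    age = [1] * n_sources
    total = 0
    for chosen in schedule:
        for s in range(n_sources):
            age[s] = 1 if s == chosen else age[s] + 1   # reset or grow
        total += sum(age)
    return total / (len(schedule) * n_sources)

rr = [t % 2 for t in range(1000)]       # round-robin schedule
aoi_rr = average_aoi(rr, 2)
```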

Towards Explainable Deep Learning for Credit Lending: A Case Study

Deep learning adoption in the financial services industry has been limited due to a lack of model interpretability. However, several techniques have been proposed to explain predictions made by a neural network. We provide an initial investigation into these techniques for the assessment of credit risk with neural networks.

Multi-cell LSTM Based Neural Language Model

Language models, being at the heart of many NLP problems, are always of great interest to researchers. Neural language models come with the advantage of distributed representations and long-range contexts. With its particular dynamics that allow the cycling of information within the network, the recurrent neural network (RNN) is an ideal paradigm for neural language modeling. The Long Short-Term Memory (LSTM) architecture solves the inadequacies of the standard RNN in modeling long-range contexts. In spite of a plethora of RNN variants, the possibility of adding multiple memory cells to LSTM nodes has seldom been explored. Here we propose a multi-cell node architecture for LSTMs and study its applicability to neural language modeling. The proposed multi-cell LSTM language models outperform the state-of-the-art results on the well-known Penn Treebank (PTB) setup.

Exploiting Class Learnability in Noisy Data

In many domains, collecting sufficient labeled training data for supervised machine learning requires easily accessible but noisy sources, such as crowdsourcing services or tagged Web data. Noisy labels occur frequently in data sets harvested via these means, sometimes resulting in entire classes of data on which learned classifiers generalize poorly. For real-world applications, we argue that it can be beneficial to avoid training on such classes entirely. In this work, we aim to explore the classes in a given data set and guide supervised training to spend time on a class in proportion to its learnability. By focusing the training process, we aim to improve model generalization on classes with a strong signal. To that end, we develop an online algorithm that works in conjunction with the classifier and training algorithm, iteratively selecting training data for the classifier based on how well it appears to generalize on each class. Testing our approach on a variety of data sets, we show that our algorithm learns to focus on classes for which the model has low generalization error relative to strong baselines, yielding a classifier with good performance on learnable classes.
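
The shape of that selection loop can be sketched as follows (this is not the paper's exact algorithm; the class names and accuracies are invented): track a per-class generalization estimate and allocate the next round's training budget in proportion to it.

```python
# Allocate a training budget across classes in proportion to an
# estimated per-class accuracy, so apparently unlearnable (noisy)
# classes receive little training time.

def allocate_budget(class_accuracy, budget):
    total = sum(class_accuracy.values())
    return {c: round(budget * acc / total)
            for c, acc in class_accuracy.items()}

# a noisy class with near-chance accuracy gets a small share
alloc = allocate_budget({"cat": 0.9, "dog": 0.8, "noise": 0.1}, budget=180)
```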

DeepCSO: Forecasting of Combined Sewer Overflow at a Citywide Level using Multi-task Deep Learning
Enhancing Operation of a Sewage Pumping Station for Inter Catchment Wastewater Transfer by Using Deep Learning and Hydraulic Model
Modelling student online behaviour in a virtual learning environment
Cluster analysis of homicide rates in the Brazilian state of Goias from 2002 to 2014
Baire categorical aspects of first passage percolation II
Modeling car-following behavior on urban expressways in Shanghai: A naturalistic driving study
Prediction of Preliminary Maximum Wing Bending Moments under Discrete Gust
Transportation inequalities for stochastic wave equation
The Amortized Analysis of a Non-blocking Chromatic Tree
On the Further Structure of the Finite Free Convolutions
Fault-Tolerant Metric Dimension of $P(n,2)$ with Prism Graph
Randomisation Algorithms for Large Sparse Matrices
How to get meaningful and correct results from your finite element model
Machine Learning Analysis of Heterogeneity in the Effect of Student Mindset Interventions
On the use of FHT, its modification for practical applications and the structure of Hough image
Describing many-body localized systems in thermal environments
Catch and Prolong: recurrent neural network for seeking track-candidates
Estimation of Multivariate Wrapped Models for Data in Torus
Faster manipulation of large quantum circuits using wire label reference diagrams
Performance Estimation of Synthesis Flows cross Technologies using LSTMs and Transfer Learning
Distinguishing number of Urysohn metric spaces
Incentivizing Exploration with Unbiased Histories
Verification of Recurrent Neural Networks Through Rule Extraction
A Fast Method for Array Response Adjustment with Phase-Only Constraint
A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks
Natural Environment Benchmarks for Reinforcement Learning
Limit distributions of the upper order statistics for the Lévy-frailty Marshall-Olkin distribution
Dynamic Behavior Control of Induction Motor with STATCOM
Focus Quality Assessment of High-Throughput Whole Slide Imaging in Digital Pathology
Arterial Blood Pressure Feature Estimation Using Photoplethysmography
Unsupervised domain adaptation for medical imaging segmentation with self-ensembling
A Performance Vocabulary for Affine Loop Transformations
Looking at the Driver/Rider in Autonomous Vehicles to Predict Take-Over Readiness
A multi-level convolutional LSTM model for the segmentation of left ventricle myocardium in infarcted porcine cine MR images
Minimax Rates in Network Analysis: Graphon Estimation, Community Detection and Hypothesis Testing
Lattice bijections for string modules, snake graphs and the weak Bruhat order
Spatial Logics and Model Checking for Medical Imaging (Extended Version)
Interpretable deep learning for guided structure-property explorations in photovoltaics
Derivation of an aggregated band pseudo phasor for single phase pulse width modulation voltage waveforms
Communication-Optimal Distributed Dynamic Graph Clustering
Learning Optimal Personalized Treatment Rules Using Robust Regression Informed K-NN
ReSIFT: Reliability-Weighted SIFT-based Image Quality Assessment
Automatic Grammar Augmentation for Robust Voice Command Recognition
On the specification and verification of atomic swap smart contracts
Deep Neural Networks based Modrec: Some Results with Inter-Symbol Interference and Adversarial Examples
Operator-Theoretical Treatment of Ergodic Theorem and Its Application to Dynamic Models in Economics
Layered Belief Propagation for Low-complexity Large MIMO Detection Based on Statistical Beams
Learning from Past Bids to Participate Strategically in Day-Ahead Electricity Markets
Prophet Inequalities for Independent Random Variables from an Unknown Distribution
The case for shifting the Renyi Entropy
Some Moderate Deviations for Ewens-Pitman Sampling Model
Cooperation Enforcement and Collusion Resistance in Repeated Public Goods Games
Vectorized Character Counting for Faster Pattern Matching
Machine Learning for Combinatorial Optimization: a Methodological Tour d’Horizon
Gallai-Ramsey numbers for a class of graphs with five vertices
Forbidden rainbow subgraphs that force large monochromatic or multicolored connected subgraphs
It Does Not Follow. Response to ‘Yes They Can! …’
Empirical Effects of Dynamic Human-Body Blockage in 60 GHz Communications
Adaptive Full-Duplex Jamming Receiver for Secure D2D Links in Random Networks
Real-time Power System State Estimation and Forecasting via Deep Neural Networks
Boosting Search Performance Using Query Variations
Pure-Exploration for Infinite-Armed Bandits with General Arm Reservoirs
Effect Handling for Composable Program Transformations in Edward2
Orthogonal Policy Gradient and Autonomous Driving Application
Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos
Cops and robbers on oriented graphs
Exploiting Sentence Embedding for Medical Question Answering
Intersecting Families of Perfect Matchings
Recurrent approach to effective material properties with application to anisotropic binarized random fields
Improving Skin Condition Classification with a Question Answering Model
The autoregression bootstrap for kernel estimates of smooth nonlinear functional time series
Leveraging Financial News for Stock Trend Prediction with Attention-Based Recurrent Neural Network
Monochromatic Schur triples in randomly perturbed dense sets of integers
Implementing a Portable Clinical NLP System with a Common Data Model – a Lisp Perspective
A q-analogue and a symmetric function analogue of a result by Carlitz, Scoville and Vaughan
Phase retrieval for Bragg coherent diffraction imaging at high X-ray energies
Characterizing Design Patterns of EHR-Driven Phenotype Extraction Algorithms
Electric Vehicle Valet
GaitSet: Regarding Gait as a Set for Cross-View Gait Recognition
Intervention Aided Reinforcement Learning for Safe and Practical Policy Optimization in Navigation
Equivariant Perturbation in Gomory and Johnson’s Infinite Group Problem. VII. Inverse semigroup theory, closures, decomposition of perturbations
From Videos to URLs: A Multi-Browser Guide To Extract User’s Behavior with Optical Character Recognition
Face Verification and Forgery Detection for Ophthalmic Surgery Images
Distributed Obstacle and Multi-Robot Collision Avoidance in Uncertain Environments
Minimax Posterior Convergence Rates and Model Selection Consistency in High-dimensional DAG Models based on Sparse Cholesky Factors
Exotic Steiner Chains in Miquelian Möbius Planes
Combining Axiom Injection and Knowledge Base Completion for Efficient Natural Language Inference
Time-Varying Formation Control of a Collaborative Multi-Agent System Using Negative-Imaginary Systems Theory
Fano generalized Bott manifolds
Quantile Regression Modeling of Recurrent Event Risk
Predicting thermoelectric properties from crystal graphs and material descriptors – first application for functional materials
A Schur transform for spatial stochastic processes
Reward-estimation variance elimination in sequential decision processes
Graph Convolutional Neural Networks for Polymers Property Prediction
On Training Targets and Objective Functions for Deep-Learning-Based Audio-Visual Speech Enhancement
SGR: Self-Supervised Spectral Graph Representation Learning
Computing Quartet Distance is Equivalent to Counting 4-Cycles
Fundamental Limits of Caching in Heterogeneous Networks with Uncoded Prefetching
Effects of Lombard Reflex on the Performance of Deep-Learning-Based Audio-Visual Speech Enhancement Systems
Network capacity enhancement with various link rewiring strategies
In-silico Feedback Control of a MIMO Synthetic Toggle Switch via Pulse-Width Modulation
On Graphs with equal Dominating and C-dominating energy
Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search
On Infinite Prefix Normal Words
Effect of correlations on routing and modeling of Time Varying Communication Networks
Image declipping with deep networks
Survey of Computational Approaches to Diachronic Conceptual Change
Decentralized Data Storages: Technologies of Construction
Guiding the One-to-one Mapping in CycleGAN via Optimal Transport
Sketch based Reduced Memory Hough Transform
On a new class of score functions to estimate tail probabilities of some stochastic processes with Adaptive Multilevel Splitting
Unique ergodicity for stochastic hyperbolic equations with additive space-time white noise
Local theorems for arithmetic compound renewal processes, when Cramer’s condition holds
End-to-End Learning for Answering Structured Queries Directly over Text
A Neurodynamical model of Saliency prediction in V1
Redundancy scheduling with scaled Bernoulli service requirements
Sum rules and large deviations for spectral matrix measures in the Jacobi ensemble
Effect of data reduction on sequence-to-sequence neural TTS
ShuffleDet: Real-Time Vehicle Detection Network in On-board Embedded UAV Imagery
Multivariate Spatiotemporal Hawkes Processes and Network Reconstruction
Impedance Network of Interconnected Power Electronics Systems: Impedance Operator and Stability Criterion
State Complexity Characterizations of Parameterized Degree-Bounded Graph Connectivity, Sub-Linear Space Computation, and the Linear Space Hypothesis
Hörmander’s theorem for semilinear SPDEs
Sparsely Observed Functional Time Series: Estimation and Prediction
Spatio-temporal Stacked LSTM for Temperature Prediction in Weather Forecasting
Deep Template Matching for Offline Handwritten Chinese Character Recognition
Physical Signal Classification Via Deep Neural Networks
Temporal viability regulation for control affine systems with applications to mobile vehicle coordination under time-varying motion constraints
State-dependent jump activity estimation for Markovian semimartingales
Cost of selfishness in the allocation of cities in the Multiple Travelling Salesmen Problem
On approximations of Value at Risk and Expected Shortfall involving kurtosis
Semi-perfect 1-Factorizations of the Hypercube
Leaf-induced subtrees of leaf-Fibonacci trees
Asynchronous Stochastic Composition Optimization with Variance Reduction
Offline Biases in Online Platforms: a Study of Diversity and Homophily in Airbnb
Pairwise Relational Networks using Local Appearance Features for Face Recognition
Neural Predictive Belief Representations
Chordal circulant graphs and induced matching number
LinkNet: Relational Embedding for Scene Graph
Characterising $k$-connected sets in infinite graphs
Effects of Beamforming and Antenna Configurations on Mobility in 5G NR
The Sliding Frank-Wolfe Algorithm and its Application to Super-Resolution Microscopy
Adversarial Examples from Cryptographic Pseudo-Random Generators
Learning to Bound the Multi-class Bayes Error
Latecomers Help to Meet: Deterministic Anonymous Gathering in the Plane
Introducing Multiobjective Complex Systems
On a minimum distance procedure for threshold selection in tail analysis
HCU400: An Annotated Dataset for Exploring Aural Phenomenology Through Causal Uncertainty
Energy Efficient Precoder in Multi-User MIMO Systems with Imperfect Channel State Information
Secretary Ranking with Minimal Inversions
Preliminary Studies on a Large Face Database
Adversarial Resilience Learning – Towards Systemic Vulnerability Analysis for Large and Complex Systems
A regularised Dean-Kawasaki model for weakly interacting particles
MIMO Beampattern and Waveform Design with Low Resolution DACs
Review of isolation enhancement with the help of Theory of characteristic modes
Psychophysical evaluation of individual low-level feature influences on visual attention
Oversampled Adaptive Sensing with Random Projections: Analysis and Algorithmic Approaches
The q-Hahn PushTASEP
A Note On Universal Point Sets for Planar Graphs
Fourier decay, Renewal theorem and Spectral gaps for random walks on split semisimple Lie groups
Exploring the Deep Feature Space of a Cell Classification Neural Network
Mathematical Analysis of Adversarial Attacks
Large-Scale Distributed Algorithms for Facility Location with Outliers
Development and Validation of a Deep Learning Algorithm for Improving Gleason Scoring of Prostate Cancer
Adjusting for Confounding in Unsupervised Latent Representations of Images
A Bayesian optimization approach to compute the Nash equilibria of potential games using bandit feedback
Anti-concentration in most directions
Tight Bayesian Ambiguity Sets for Robust MDPs
Kernel Smoothing of the Treatment Effect CDF
Reward learning from human preferences and demonstrations in Atari
On transfer learning using a MAC model variant
Learning to Predict the Cosmological Structure Formation
Predicting enterprise cyber incidents using social network analysis on the darkweb hacker forums

If you did not already know

Halide google
Halide is a computer programming language designed for writing digital image processing code that takes advantage of memory locality, vectorized computation, and multi-core CPUs and GPUs. Halide is implemented as an internal domain-specific language (DSL) in C++. The main innovation Halide brings is the separation of the algorithm being implemented from its execution schedule, i.e. the code specifying the loop nesting, parallelization, loop unrolling and vector instructions. The two are usually interleaved, and experimenting with a different schedule requires the programmer to rewrite large portions of the algorithm with every change. With Halide, changing the schedule requires no changes to the algorithm, which lets the programmer experiment with schedules and find the most efficient one.
DNN Dataflow Choice Is Overrated

Learning Active Learning (LAL) google
In this paper, we suggest a novel data-driven approach to active learning: Learning Active Learning (LAL). The key idea behind LAL is to train a regressor that predicts the expected error reduction for a potential sample in a particular learning state. By treating the query selection procedure as a regression problem we are not restricted to dealing with existing AL heuristics; instead, we learn strategies based on experience from previous active learning experiments. We show that LAL can be learnt from a simple artificial 2D dataset and yields strategies that work well on real data from a wide range of domains. Moreover, if some domain-specific samples are available to bootstrap active learning, the LAL strategy can be tailored for a particular problem. …
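The core loop of the LAL idea (treat query selection as regression onto observed error reduction) can be sketched in a few lines of numpy. This is an illustrative sketch only, with synthetic data and a plain ridge regressor standing in for whatever model the authors actually use:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "experience" from past active-learning runs: each row encodes a
# (learning state, candidate sample) feature vector; the target is the error
# reduction that was observed after labeling that candidate.
X = rng.normal(size=(200, 5))
true_w = np.array([0.5, -0.2, 0.0, 0.3, 0.1])    # made-up ground truth
y = X @ true_w + 0.01 * rng.normal(size=200)

# Fit a ridge regressor in closed form: w = (X'X + lam*I)^{-1} X'y.
lam = 1e-3
w = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)

# Query selection as regression: score every unlabeled candidate by its
# predicted error reduction and query the best one, replacing a
# hand-crafted active-learning heuristic.
candidates = rng.normal(size=(50, 5))
scores = candidates @ w
query_idx = int(np.argmax(scores))
```

The point of the design is that the scoring function is learned from previous active-learning experiments rather than fixed in advance, so it can be retrained on domain-specific samples to tailor the strategy to a new problem.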

Dimensional Clustering google
This paper introduces a new clustering technique, called {\em dimensional clustering}, which clusters each data point by its latent {\em pointwise dimension}, which is a measure of the dimensionality of the data set local to that point. Pointwise dimension is invariant under a broad class of transformations. As a result, dimensional clustering can be usefully applied to a wide range of datasets. Concretely, we present a statistical model which estimates the pointwise dimension of a dataset around the points in that dataset using the distance of each point from its $n^{\text{th}}$ nearest neighbor. We demonstrate the applicability of our technique to the analysis of dynamical systems, images, and complex human movements. …
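The abstract's recipe (estimate a pointwise dimension from nearest-neighbor distances, then cluster points by that dimension) can be sketched with a Levina-Bickel-style maximum-likelihood estimator. This is one plausible choice of estimator, not necessarily the paper's own statistical model, and the data below is synthetic:

```python
import numpy as np

def pointwise_dimension(data, x, n=10):
    """MLE of the local intrinsic dimension at x from its n nearest
    neighbors (Levina-Bickel-style; shown as an illustrative stand-in for
    the paper's model)."""
    d = np.sort(np.linalg.norm(data - x, axis=1))
    T = d[1:n + 1]                 # skip the zero distance to x itself
    return (n - 1) / np.sum(np.log(T[-1] / T[:-1]))

rng = np.random.default_rng(0)
# Mixture: 300 points on a 1-D curve, 300 points filling a 2-D square
# placed well away from the curve.
t = rng.uniform(0, 1, 300)
curve = np.c_[t, 0.1 * np.sin(2 * np.pi * t)]
square = rng.uniform(2, 3, (300, 2))
data = np.vstack([curve, square])

dims = np.array([pointwise_dimension(data, p) for p in data])
# "Dimensional clustering": assign each point to a cluster by its latent
# pointwise dimension (0 ~ curve-like, 1 ~ area-like).
labels = (dims > 1.5).astype(int)
```

With enough neighbors the curve points cluster near dimension 1 and the square points near dimension 2, which is the invariance-to-embedding property the abstract highlights.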

Document worth reading: “Multi-Agent Reinforcement Learning: A Report on Challenges and Approaches”

Reinforcement Learning (RL) is a learning paradigm concerned with learning to control a system so as to maximize an objective over the long term. This approach to learning has received immense interest in recent times and success manifests itself in the form of human-level performance on games like \textit{Go}. While RL is emerging as a practical component in real-life systems, most successes have been in Single Agent domains. This report will instead specifically focus on challenges that are unique to Multi-Agent Systems interacting in mixed cooperative and competitive environments. The report concludes with advances in the paradigm of training Multi-Agent Systems called \textit{Decentralized Actor, Centralized Critic}, based on an extension of MDPs called \textit{Decentralized Partially Observable MDP}s, which has seen a renewed interest lately. Multi-Agent Reinforcement Learning: A Report on Challenges and Approaches

Book Memo: “Reinforcement Learning for Optimal Feedback Control”

A Lyapunov-Based Approach
Reinforcement Learning for Optimal Feedback Control develops model-based and data-driven reinforcement learning methods for solving optimal control problems in nonlinear deterministic dynamical systems. In order to achieve learning under uncertainty, data-driven methods for identifying system models in real time are also developed. Through simulations and experiments, the book illustrates the advantages gained from the use of a model and of previous experience in the form of recorded data. Its focus on deterministic systems allows for an in-depth Lyapunov-based analysis of the performance of the described methods during both the learning phase and execution. To yield an approximate optimal controller, the authors focus on theories and methods that fall under the umbrella of actor-critic methods for machine learning. They concentrate on establishing stability during the learning and execution phases, and on adaptive model-based and data-driven reinforcement learning, to assist readers in the learning process, which typically relies on instantaneous input-output measurements.

Distilled News

Prescriptive Maintenance for Manufacturing Industry

Today’s trend of Artificial Intelligence (AI) and the increased level of automation in manufacturing allow firms to flexibly connect assets and improve productivity through data-driven insights that have not been possible before. As more automation is used in manufacturing, the speed of response required in dealing with maintenance issues is going to get faster, and automated decisions about the best option from an economic standpoint are getting more complex.

Best books on Artificial Intelligence and Deep Learning for October 2018

Machine Learning with C++ – Faster R-CNN with MXNet C++ Frontend

I published an implementation of Faster R-CNN with the MXNet C++ Frontend. You can use it as a comprehensive example of working with the MXNet C++ Frontend: it has a custom data loader for the MS COCO dataset, implements a custom target-proposal layer within the project without modifying the MXNet library, contains error-checking code (missing from the current C++ API), and includes Eigen and NDArray integration samples. Feel free to leave comments and proposals.

A deep dive into glmnet: standardize

I’m writing a series of posts on the various function options of the glmnet function (from the package of the same name), hoping to give more detail and insight beyond R’s documentation.

Make Beautiful Tables with the Formattable Package

I love the formattable package, but I always struggle to remember its syntax. A quick Google search reveals that I’m not alone in this struggle. This post is intended as a reminder for myself of how the package works – and hopefully you’ll find it useful too!

What is RQDA and what are its features?

RQDA is an R package for Qualitative Data Analysis, a free (free as in freedom) qualitative analysis software application (BSD license). It works on Windows, Linux/FreeBSD and Mac OS X platforms. RQDA is an easy-to-use tool to assist in the analysis of textual data. At the moment it only supports plain-text formatted data. All the information is stored in a SQLite database via the R package RSQLite. The GUI is based on RGtk2, with the aid of gWidgetsRGtk2. It includes a number of standard Computer-Aided Qualitative Data Analysis features. In addition, it integrates seamlessly with R, which means that (a) statistical analysis of the coding is possible, and (b) functions for data manipulation and analysis can easily be extended by writing R functions. To some extent, RQDA and R make an integrated platform for both quantitative and qualitative data analysis.

Discourse Network Analysis: Undertaking Literature Reviews in R

Literature reviews are the cornerstone of science. Keeping abreast of developments within any given field of enquiry has become increasingly difficult given the enormous amounts of new research. Databases and search technology have made finding relevant literature easy, but keeping a coherent overview of the discourse within a field of enquiry is an ever more encompassing task. Scholars have proposed many approaches to analysing literature, which can be placed along a continuum from traditional narrative methods to systematic analytic syntheses of text using machine learning. Traditional reviews are biased because they rely entirely on the interpretation of the researcher. Analytical approaches follow a process that is more like scientific experimentation. These systematic methods are reproducible in the way literature is searched and collated, but still rely on subjective interpretation. Machine learning provides new methods to analyse large swaths of text; exciting as they sound, these methods are incapable of providing insight on their own. Machine learning cannot interpret a text; it can only summarise and structure a corpus, and human interpretation is still required to make sense of the information. This article introduces a mixed-method technique for reviewing literature, combining qualitative and quantitative methods. I used this method to analyse literature published by the International Water Association as part of my dissertation into water utility marketing. You can read the code below, or download it from GitHub. Detailed information about the methodology is available through FigShare.

Is the Food Here Yet?

Delivery time prediction has long been a part of city logistics, but refining accuracy has recently become very important for services such as Deliveroo, Foodpanda and Uber Eats which deliver food on-demand. These services and similar ones must receive an order and have it delivered within ~30 minutes to appease their users. In these situations +/- 5 minutes can make a big difference so it’s very important for customer satisfaction that the initial prediction is highly accurate and that any delays are communicated effectively.

How does Scaling change Principal Components? – Part 1

Assume that we have some data that looks like the plot above; the red arrow shows the first principal component, while the blue arrow shows the second. How would normalization as well as standardization change those two? Also, can we perform some kind of normalization with respect to the variance we have? (I will make more posts as I go along.)
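The question the post poses can be checked empirically before doing any math: standardizing the columns changes the covariance structure and therefore rotates the principal components. A small numpy sketch with made-up data (two correlated features on very different scales):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two correlated features on very different scales.
x = rng.normal(0, 1, 500)
data = np.c_[100 * x + rng.normal(0, 10, 500),
             x + rng.normal(0, 0.5, 500)]

def first_pc(X):
    """First principal component via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

pc_raw = first_pc(data)
# Standardization (zero mean, unit variance per column) removes the scale
# difference, so both features contribute to the first component.
pc_std = first_pc((data - data.mean(axis=0)) / data.std(axis=0))
```

On the raw data the first component is dominated by the large-scale feature (close to [1, 0]); after standardization it points along the shared direction (close to [0.71, 0.71], up to sign).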

Algorithmic Complexity [101]

Computers are fast: they store and manipulate data using electronic signals that travel across their silicon internals at hundreds of thousands of miles per hour. For comparison, the fastest signals in the human nervous system travel at about 250mph, which is about 3 million times slower, and those speeds are only possible for unconscious signals – signal speeds observed for conscious thought and calculation are typically orders of magnitude slower still. Basically, we’re never going to be able to out-calculate a computer.

Speed up your deep learning language model up to 1000% with the adaptive softmax, Part 1

How would you like to speed up your language modeling (LM) tasks by 1000%, with nearly no drop in accuracy? A recent paper from Grave et al. (2017), called ‘Efficient softmax approximation for GPUs’, shows how you can gain a massive speedup in one of the most time-consuming aspects of language-modeling, the computation-heavy softmax step, through their ‘adaptive softmax’. The giant speedup from using the adaptive softmax comes with only minimal costs in accuracy, so anyone who is doing language modeling should definitely consider using it. Here in Part 1 of this blog post, I’ll fully explain the adaptive softmax, then in Part 2 I’ll walk you step by step through a Pytorch implementation (with an accompanying Jupyter notebook), which uses Pytorch’s built-in AdaptiveLogSoftmaxWithLoss function.
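The cost structure behind that speedup can be sketched in plain numpy: frequent words live in a small "head" softmax, rare words in a tail cluster reached through one extra head logit, so the common case never touches the full vocabulary. This is an illustrative sketch with made-up sizes and a single tail cluster, not the Grave et al. implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
V_head, V_tail, d = 2000, 8000, 64   # frequent words, rare words, hidden size

# Head matrix has one extra column: the logit of the single tail cluster.
W_head = rng.normal(0.0, 0.02, (d, V_head + 1))
W_tail = rng.normal(0.0, 0.02, (d, V_tail))

def log_softmax(z):
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def adaptive_log_prob(h, word):
    """Log-probability of `word` under a two-level softmax.

    Frequent words (id < V_head) touch only the small head matrix; rare
    words additionally pay for their cluster. Because frequent words
    dominate training batches, most steps skip the expensive tail -- that
    asymmetry is the source of the speedup.
    """
    head = log_softmax(h @ W_head)
    if word < V_head:
        return head[word]
    # Rare word: P(word) = P(tail cluster) * P(word | tail cluster).
    tail = log_softmax(h @ W_tail)
    return head[V_head] + tail[word - V_head]

h = rng.normal(size=d)               # a hidden state from the language model
lp_frequent = adaptive_log_prob(h, 3)
lp_rare = adaptive_log_prob(h, V_head + 7)
```

The two-level factorization still defines a proper distribution over the full vocabulary (the head probabilities plus the cluster probability spread over the tail sum to one); the real adaptive softmax adds several clusters with progressively smaller projection dimensions.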

Speed up your deep learning language model up to 1000% with the adaptive softmax, Part 2: Pytorch implementation

In Part 1 of this blog post, I explained how the adaptive softmax works, and how it can speed up your language model by up to 1000%. Here in Part 2, I’ll walk you step by step through a Pytorch implementation (with an accompanying Jupyter notebook), which uses Pytorch’s built-in AdaptiveLogSoftmaxWithLoss function. For preprocessing you will need fastai, a deep learning library that runs on top of Pytorch and simplifies training neural networks. [For those who want to learn state-of-the-art deep learning techniques, I highly recommend Jeremy Howard’s course, which is available online for free: https://…/]. I decided to use the Wikitext-2 dataset, which is a relatively small dataset that contains ~2 million tokens and ~33 thousand words in the vocabulary. Once the data has been downloaded and properly formatted in csv files, fastai makes it easy to quickly tokenize, numericalize, and create a data-loader that will be used for training. I also downloaded GloVe pre-trained word vectors that are used for the models’ word embeddings. Finally, I created a modeler class that handles training.

Magister Dixit

“1. Think carefully about which projects you take on.
2. Use as much data as you can from as many places as possible.
3. Don’t just use internal customer data.
4. Have a clear sampling strategy.
5. Always use a holdout sample.
6. Spend time on ‘throwaway’ modelling.
7. Refresh your model regularly.
8. Make sure your insights are meaningful to other people.
9. Use your model in the real world.”
Rachel Clinton (January 7, 2015)

R Packages worth a look

Header-Only C++ Mathematical Optimization Library for ‘Armadillo’ (RcppEnsmallen)
‘Ensmallen’ is a templated C++ mathematical optimization library (by the ‘MLPACK’ team) that provides a simple set of abstractions for writing an object …

Network Tools for Memory Research (memnet)
Efficient implementations of network science tools to facilitate research into human (semantic) memory. In its current version, the package contains se …

Sequential Quadratic Programming for Fast Maximum-Likelihood Estimation of Mixture Proportions (mixsqp)
Provides optimization algorithms based on sequential quadratic programming (SQP) for maximum likelihood estimation of the mixture proportions in a fini …

Document worth reading: “Saliency Prediction in the Deep Learning Era: An Empirical Investigation”

Visual saliency models have enjoyed a big leap in performance in recent years, thanks to advances in deep learning and large scale annotated data. Despite enormous effort and huge breakthroughs, however, models still fall short of human-level accuracy. In this work, I explore the landscape of the field, emphasizing new deep saliency models, benchmarks, and datasets. A large number of image and video saliency models are reviewed and compared over two image benchmarks and two large scale video datasets. Further, I identify factors that contribute to the gap between models and humans and discuss remaining issues that need to be addressed to build the next generation of more powerful saliency models. Some specific questions that are addressed include: in what ways current models fail, how to remedy them, what can be learned from cognitive studies of attention, how explicit saliency judgments relate to fixations, how to conduct fair model comparison, and what are the emerging applications of saliency models. Saliency Prediction in the Deep Learning Era: An Empirical Investigation