Magister Dixit

“Big Data is not about volume, size or velocity of data – neither of which are easily translated into financial results for most businesses. It is about the integration of external sources of information and unstructured data into a company’s IT infrastructure and business processes.” Gregory Yankelovich (October 20, 2014)

Finding out why

Paper: Unifying Causal Models with Trek Rules

In many scientific contexts, different investigators experiment with or observe different variables with data from a domain in which the distinct variable sets might well be related. This sort of fragmentation sometimes occurs in molecular biology, whether in studies of RNA expression or studies of protein interaction, and it is common in the social sciences. Models are built on the diverse data sets, but combining them can provide a more unified account of the causal processes in the domain. On the other hand, this problem is made challenging by the fact that a variable in one data set may influence variables in another although neither data set contains all of the variables involved. Several authors have proposed using conditional independence properties of fragmentary (marginal) data collections to form unified causal explanations when it is assumed that the data have a common causal explanation but cannot be merged to form a unified dataset. These methods typically return a large number of alternative causal models. The first part of the thesis shows that marginal datasets contain extra information that can be used to reduce the number of possible models, in some cases yielding a unique model.


Paper: It’s All in the Name: Mitigating Gender Bias with Name-Based Counterfactual Data Substitution

This paper treats gender bias latent in word embeddings. Previous mitigation attempts rely on the operationalisation of gender bias as a projection over a linear subspace. An alternative approach is Counterfactual Data Augmentation (CDA), in which a corpus is duplicated and augmented to remove bias, e.g. by swapping all inherently-gendered words in the copy. We perform an empirical comparison of these approaches on the English Gigaword and Wikipedia, and find that whilst both successfully reduce direct bias and perform well in tasks which quantify embedding quality, CDA variants outperform projection-based methods at the task of drawing non-biased gender analogies by an average of 19% across both corpora. We propose two improvements to CDA: Counterfactual Data Substitution (CDS), a variant of CDA in which potentially biased text is randomly substituted to avoid duplication, and the Names Intervention, a novel name-pairing technique that vastly increases the number of words being treated. CDA/S with the Names Intervention is the only approach which is able to mitigate indirect gender bias: following debiasing, previously biased words are significantly less clustered according to gender (cluster purity is reduced by 49%), thus improving on the state-of-the-art for bias mitigation.
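The difference between CDA and CDS described above can be sketched in a few lines. This is only an illustration, not the paper's implementation: the word-pair list, the function names, and the substitution probability of 0.5 are assumptions made for the example, and the Names Intervention is not modelled.

```python
import random

# Hypothetical, deliberately tiny list of gendered word pairs (an assumption).
PAIRS = [("he", "she"), ("man", "woman"), ("king", "queen"), ("father", "mother")]

# Build a symmetric lookup so each word maps to its partner.
SWAP = {}
for a, b in PAIRS:
    SWAP[a] = b
    SWAP[b] = a


def swap_gendered(tokens):
    """Return the tokens with every listed gendered word replaced by its partner."""
    return [SWAP.get(t, t) for t in tokens]


def cda(corpus):
    """CDA: keep every document and append its swapped copy (corpus size doubles)."""
    return corpus + [swap_gendered(doc) for doc in corpus]


def cds(corpus, p=0.5, seed=0):
    """CDS: swap each document in place with probability p (corpus size unchanged)."""
    rng = random.Random(seed)
    return [swap_gendered(doc) if rng.random() < p else doc for doc in corpus]


corpus = [["he", "is", "a", "king"], ["the", "woman", "spoke"]]
augmented = cda(corpus)      # original documents plus swapped duplicates
substituted = cds(corpus)    # same number of documents, some of them swapped
```

The design point is that `cds` never duplicates text: each document appears once, either in its original or its swapped form.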


Paper: Counterfactual Depth from a Single RGB Image

We describe a method that predicts, from a single RGB image, a depth map that describes the scene when a masked object is removed – we call this ‘counterfactual depth’ that models hidden scene geometry together with the observations. Our method works for the same reason that scene completion works: the spatial structure of objects is simple. But we offer a much higher resolution representation of space than current scene completion methods, as we operate at pixel-level precision and do not rely on a voxel representation. Furthermore, we do not require RGBD inputs. Our method uses a standard encoder-decoder architecture, with a decoder modified to accept an object mask. We describe a small evaluation dataset that we have collected, which allows inference about what factors affect reconstruction most strongly. Using this dataset, we show that our depth predictions for masked objects are better than other baselines.


Paper: A Turvey-Shapley Value Method for Distribution Network Cost Allocation

This paper proposes a novel cost-reflective and computationally efficient method for allocating distribution network costs to residential customers. First, the method estimates the growth in peak demand with a 50% probability of exceedance (50POE) and the associated network augmentation costs using a probabilistic long-run marginal cost computation based on the Turvey perturbation method. Second, it allocates these costs to customers on a cost-causal basis using the Shapley value solution concept. To overcome the intractability of the exact Shapley value computation for real-world applications, we implement a fast, scalable and efficient clustering technique based on customers’ peak demand contribution, which drastically reduces the Shapley value computation time. Using customer load traces from an Australian smart grid trial (Solar Home Electricity Data), we demonstrate the efficacy of our method by comparing it with established energy- and peak demand-based cost allocation approaches.
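To make the Shapley value step concrete, here is a minimal sketch of an exact Shapley computation on a toy cost game. The load profiles, the cost function (peak of the aggregate load) and all names are assumptions made for illustration; the paper's Turvey-based cost estimate and its clustering speed-up are not reproduced here. The exact computation enumerates all orderings of the players, which is what becomes intractable for real customer numbers.

```python
from itertools import permutations

# Hypothetical load profiles (kW) over three time steps for three customers.
LOADS = {"A": [1.0, 3.0, 1.0], "B": [2.0, 2.0, 0.0], "C": [0.0, 1.0, 4.0]}


def coalition_cost(members):
    """Toy cost function (an assumption): peak of the coalition's aggregate load."""
    if not members:
        return 0.0
    totals = [sum(LOADS[m][t] for m in members) for t in range(3)]
    return max(totals)


def shapley_values(players, cost):
    """Exact Shapley values: average marginal cost over all join orders."""
    values = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        joined = []
        for p in order:
            before = cost(joined)
            joined.append(p)
            values[p] += cost(joined) - before  # marginal cost of adding p
    return {p: v / len(orderings) for p, v in values.items()}


shares = shapley_values(list(LOADS), coalition_cost)
for customer, share in shares.items():
    print(customer, round(share, 3))
```

By construction the shares add up to the cost of serving all customers together, which is the property that makes the allocation cost-reflective.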


Paper: Multilevel latent class (MLC) modelling of healthcare provider causal effects on patient outcomes: Evaluation via simulation

Where performance comparison of healthcare providers is of interest, characteristics of both patients and the health condition of interest must be balanced across providers for a fair comparison. This is unlikely to be feasible within observational data, as patient population characteristics may vary geographically and patient care may vary by characteristics of the health condition. We simulated data for patients and providers, based on a previously utilized real-world dataset, and separately considered both binary and continuous covariate-effects at the upper level. Multilevel latent class (MLC) modelling is proposed to partition a prediction focus at the patient level (accommodating casemix) and a causal inference focus at the provider level. The MLC model recovered a range of simulated Trust-level effects. Median recovered values were almost identical to simulated values for the binary Trust-level covariate, and we observed successful recovery of the continuous Trust-level covariate with at least 3 latent Trust classes. Credible intervals widen as the error variance increases. The MLC approach successfully partitioned modelling for prediction and for causal inference, addressing the potential conflict between these two distinct analytical strategies. This improves upon strategies which only adjust for differential selection. Patient-level variation and measurement uncertainty are accommodated within the latent classes.


Paper: Optimal Causal Rate-Constrained Sampling of the Wiener Process

We consider the following communication scenario. An encoder causally observes the Wiener process and decides when and what to transmit about it. A decoder makes real-time estimation of the process using causally received codewords. We determine the causal encoding and decoding policies that jointly minimize the mean-square estimation error, under the long-term communication rate constraint of $R$ bits per second. We show that an optimal encoding policy can be implemented as a causal sampling policy followed by a causal compressing policy. We prove that the optimal encoding policy samples the Wiener process once the innovation passes either $\sqrt{\frac{1}{R}}$ or $-\sqrt{\frac{1}{R}}$, and compresses the sign of the innovation (SOI) using a 1-bit codeword. The SOI coding scheme achieves the operational distortion-rate function, which is equal to $D^{\mathrm{op}}(R)=\frac{1}{6R}$. Surprisingly, this is significantly better than the distortion-rate tradeoff achieved in the limit of infinite delay by the best non-causal code. This is because the SOI coding scheme leverages the free timing information supplied by the zero-delay channel between the encoder and the decoder. The key to unlock that gain is the event-triggered nature of the SOI sampling policy. In contrast, the distortion-rate tradeoffs achieved with deterministic sampling policies are much worse: we prove that the causal informational distortion-rate function in that scenario is as high as $D_{\mathrm{DET}}(R) = \frac{5}{6R}$. It is achieved by the uniform sampling policy with the sampling interval $\frac{1}{R}$. In either case, the optimal strategy is to sample the process as fast as possible and to transmit 1-bit codewords to the decoder without delay.
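The value $\frac{1}{6R}$ can be checked with a short back-of-the-envelope calculation. This is a sketch under stated assumptions, not the paper's proof: it assumes the decoder learns the process value exactly at each sampling instant (threshold plus sign), so that between samples the estimation error is a fresh Wiener process $W_t$ started at zero, and it uses a renewal-reward argument to turn per-cycle quantities into long-run averages. The symbols $a$, $\tau$ and $u$ are introduced here for the illustration.

```latex
\begin{align*}
a &= \sqrt{1/R}, \qquad \tau = \inf\{t \ge 0 : |W_t| = a\}, \\
\mathbb{E}[\tau] &= a^2 = \frac{1}{R}
  && \text{(on average $R$ samples, hence $R$ bits, per second)}, \\
u(x) &= \mathbb{E}_x\!\left[\int_0^{\tau} W_t^2 \, dt\right]
  \ \text{solves}\ \tfrac{1}{2} u''(x) = -x^2,\ u(\pm a) = 0
  \ \Longrightarrow\ u(x) = \frac{a^4 - x^4}{6}, \\
D &= \frac{\mathbb{E}_0\!\left[\int_0^{\tau} W_t^2 \, dt\right]}{\mathbb{E}[\tau]}
   = \frac{a^4/6}{a^2} = \frac{a^2}{6} = \frac{1}{6R}.
\end{align*}
```

The first line fixes the threshold, the second confirms the rate constraint is met with 1-bit codewords, and the last line is the time-averaged squared error per sampling cycle.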


Paper: Learning Physics from Data: a Thermodynamic Interpretation

Experimental databases are typically very large and high dimensional. Learning from them requires recognizing important features (a pattern), often present at scales different from that of the recorded data. Following the experience collected in statistical mechanics and thermodynamics, the process of recognizing the pattern (the learning process) can be seen as a dissipative time evolution driven by entropy. This is the way thermodynamics enters machine learning. Learning to handle free surface liquids serves as an illustration.


Python Library: causal-tree-learn

Python implementation of causal trees with validation

Document worth reading: “The History of Digital Spam”

Spam!: that’s what Lorrie Faith Cranor and Brian LaMacchia exclaimed in the title of a popular call-to-action article that appeared twenty years ago in Communications of the ACM. And yet, despite the tremendous efforts of the research community over the last two decades to mitigate this problem, the sense of urgency remains unchanged, as emerging technologies have brought new dangerous forms of digital spam under the spotlight. Furthermore, when spam is carried out with the intent to deceive or influence at scale, it can alter the very fabric of society and our behavior. In this article, I will briefly review the history of digital spam: starting from its quintessential incarnation, spam emails, to modern-day forms of spam affecting the Web and social media, the survey will close by depicting future risks associated with spam and abuse of new technologies, including Artificial Intelligence (e.g., Digital Humans). After providing a taxonomy of spam and its most popular applications that emerged throughout the last two decades, I will review technological and regulatory approaches proposed in the literature, and suggest some possible solutions to tackle this ubiquitous digital epidemic moving forward.

Distilled News

Machine learning requires a fundamentally different deployment approach

The biggest issue facing machine learning (ML) isn’t whether we will discover better algorithms (we probably will), whether we’ll create a general AI (we probably won’t), or whether we’ll be able to deal with a flood of smart fakes (that’s a long-term, escalating battle). The biggest issue is how we’ll put ML systems into production. Getting an experiment to work on a laptop, even an experiment that runs ‘in the cloud,’ is one thing. Putting that experiment into production is another matter. Production has to deal with reality, and reality doesn’t live on our laptops. Most of our understanding of ‘production’ has come from the web world and learning how to run ecommerce and social media applications at scale. The latest advances in web operations – containerization and container orchestration – make it easier to package applications that can be deployed reliably and maintained consistently. It’s still not easy, but the tools are there. That’s a good start. ML applications differ from traditional software in two important ways. First, they’re not deterministic. Second, the application’s behavior isn’t determined by the code, but by the data used for training. These two differences are closely related.


Visualizing Association Rules for Text Mining

Association is a powerful data analysis technique that appears frequently in data mining literature. An association rule is an implication of the form X → Y where X is a set of antecedent items and Y is the consequent item. An example association rule of a supermarket database is that 80% of the people who buy diapers and baby powder also buy baby oil. The analysis of association rules is used in a variety of ways, including merchandise stocking, insurance fraud investigation, and climate prediction. For years scientists and engineers have developed many visualization techniques to support the analyses of association rules. Many of the visualizations, however, have come up short in dealing with large numbers of rules or rules with multiple antecedents. This limitation results in serious challenges for analysts who need to understand the association information of large databases.
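The "80%" in the supermarket example is what is usually called the confidence of a rule. A minimal sketch of how support and confidence could be computed is shown below; the six transactions are made up for the illustration, and the function names are not taken from the article.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical supermarket baskets (assumed data, for illustration only).
transactions = [
    {"diapers", "baby powder", "baby oil"},
    {"diapers", "baby powder", "baby oil"},
    {"diapers", "baby powder", "baby oil"},
    {"diapers", "baby powder", "baby oil", "milk"},
    {"diapers", "baby powder"},
    {"milk", "bread"},
]

rule_confidence = confidence(transactions, {"diapers", "baby powder"}, {"baby oil"})
rule_support = support(transactions, {"diapers", "baby powder", "baby oil"})
print(round(rule_confidence, 2), round(rule_support, 2))
```

In this toy data, five baskets contain the antecedent and four of those also contain baby oil, so the rule holds for four out of five matching baskets.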


Data Engineering – How to Set Dependencies Between Data Pipelines in Apache Airflow

Use Sensors to Set Effective Dependencies between Data Pipelines to Build a Solid Foundation for the Data Team. Greetings my fellow readers, it’s me again. That guy who writes about his life experiences and a little tad bit about data. Just a little bit. Article after article, I always start with how important data is in a strong organisation. How large companies are using data to impact their business, impact our society and, in turn, make profits. Data can be used to save lives, just take a look at this article on how data is used to predict cancer cells. So please, let’s skip the small talk about how important data is and how it’s always right, just like your wife.


Torchvision & Transfer Learning

This article will probably be of most interest to individuals just starting out in deep learning or people who are relatively new to leveraging PyTorch. It is a summary of my experience with recent attempts to modify the torchvision package’s CNNs that have been pre-trained on data from Imagenet. The aim is to make a multiple-architecture classifier a little easier to program.


Productionizing NLP Models

Lately, I have been consolidating my experiences of working in different ML projects. I will tell this story through the lens of my recent project. Our task was to classify certain phrases into categories – a multiclass, single-label problem.


Unsupervised Learning to Market Behavior Forecasting

This article describes the technique that forecasts the market behavior. The second part demonstrates the application of the approach in a trading strategy. The market data is a sequence called time series. Usually, researchers use only price data (or asset returns) to create a model that forecasts the next price value, movement direction, or other output. I think the better way is to use more data for that. The idea is to try to combine versatile market conditions (volatility, volumes, price changes, etc.).


The proper way to use Machine Learning metrics

It’s hard to select the right measure of accuracy for a given problem. Having a standardised approach is what every data scientist should have. Plan of this article:
• Motivation
• First consideration
• Must-know measures of accuracy for ML models
• An approach to use these measures to select the right ML model for your problem
Note: I focus on binary classification problems in this article, but the approach would be similar for multi-class classification and regression problems.
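For binary classification, several commonly used measures can be derived from the four confusion-matrix counts. The sketch below is an illustration only: the teaser does not say which measures the article covers, so the choice of accuracy, precision, recall and F1, along with the function names and the example labels, is an assumption.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced example: 3 positives, 7 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced data like this, accuracy is higher than precision and recall, which is one reason a single measure is rarely enough to pick a model.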


Why do we use word embeddings in NLP?

Natural language processing (NLP) is a sub-field of machine learning (ML) that deals with natural language, often in the form of text, which is itself composed of smaller units like words and characters. Dealing with text data is problematic, since our computers, scripts and machine learning models can’t read and understand text in any human sense. When I read the word ‘cat’, many different associations are invoked – it’s a small furry animal that’s cute, eats fish, my landlord doesn’t allow, etc. But these linguistic associations are a result of quite complex neurological computations honed over millions of years of evolution, whereas our ML models must start from scratch with no pre-built understanding of word meaning.
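One way to see what embeddings provide is to compare word vectors with cosine similarity. The three-dimensional vectors below are invented for the illustration (real embeddings are learned from data and have far more dimensions), and the function name is hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up toy embeddings (an assumption, not learned vectors).
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])
print(round(cat_dog, 3), round(cat_car, 3))
```

With these toy vectors, ‘cat’ ends up closer to ‘dog’ than to ‘car’, which is the kind of association a model cannot get from raw character strings.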


Risks and Caution on applying PCA for Supervised Learning Problems

The curse of dimensionality is a very crucial problem while dealing with real-life datasets, which are generally higher dimensional data. As the dimensionality of the feature space increases, the number of configurations can grow exponentially and thus the number of configurations covered by an observation decreases. In such a scenario, Principal Component Analysis plays a major part in efficiently reducing the dimensionality of the data yet retaining as much as possible of the variation present in the data set. Let us give a very brief introduction to Principal component analysis before delving into the actual problem.


Recursive Sketches for Modular Deep Learning

Much of classical machine learning (ML) focuses on utilizing available data to make more accurate predictions. More recently, researchers have considered other important objectives, such as how to design algorithms to be small, efficient, and robust. With these goals in mind, a natural research objective is the design of a system on top of neural networks that efficiently stores information encoded within – in other words, a mechanism to compute a succinct summary (a ‘sketch’) of how a complex deep network processes its inputs. Sketching is a rich field of study that dates back to the foundational work of Alon, Matias, and Szegedy, which can enable neural networks to efficiently summarize information about their inputs. For example: Imagine stepping into a room and briefly viewing the objects within. Modern machine learning is excellent at answering immediate questions, known at training time, about this scene: ‘Is there a cat? How big is said cat?’ Now, suppose we view this room every day over the course of a year. People can reminisce about the times they saw the room: ‘How often did the room contain a cat? Was it usually morning or night when we saw the room?’. However, can one design systems that are also capable of efficiently answering such memory-based questions even if they are unknown at training time? In ‘Recursive Sketches for Modular Deep Learning’, recently presented at ICML 2019, we explore how to succinctly summarize how a machine learning model understands its input. We do this by augmenting an existing (already trained) machine learning model with ‘sketches’ of its computation, using them to efficiently answer memory-based questions – for example, image-to-image-similarity and summary statistics – despite the fact that they take up much less memory than storing the entire original computation.


Scikit-Learn vs mlr for Machine Learning

Scikit-Learn is known for its easily understandable API for Python users, and MLR became an alternative to the popular Caret package with a larger suite of available algorithms and an easy way of tuning hyperparameters. These two packages are somewhat in competition due to the debate where many people involved in analytics turn to Python for machine learning and R for statistical analysis. One of the reasons for a preference in using Python could be that current R packages for machine learning are provided via other packages that contain the algorithm. The packages are called through MLR but still require extra installation. Even external feature selection libraries are needed and they will have other external dependencies that need to be satisfied as well. Scikit-Learn is dubbed as a unified API to a number of machine learning algorithms that do not require the user to call any more libraries. This by no means discredits R. R is still a major component in the data science world regardless of what an online poll would say. Anyone with a background in Statistics and/or Mathematics will know why you should use R (regardless of whether they use it themselves they recognize the appeal). Now we will take a look at how a user would go through a typical machine learning workflow. We will proceed with Logistic Regression in Scikit-Learn, and Decision Tree in MLR.


A Primer to Recommendation Engines

Recommendation engines are everywhere now. Almost any app you use incorporates some sort of recommendation system to either push new content or drive sales. Recommendation engines are Netflix telling you what you should watch next, the ads on Facebook pushing products that you just happened to look at once, or even Slack suggesting which organization channels you should join. The advent of big data and machine learning has made recommendation engines one of the most directly applicable aspects of Data Science.


A Robustly Optimized BERT Pretraining Approach

BERT (Devlin et al., 2018) is a method of pre-training language representations, meaning that we train a general-purpose ‘language understanding’ model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised, deeply bidirectional system for pre-training NLP.

If you did not already know

I-Optimality
The generalized linear model plays an important role in statistical analysis and the related design issues are undoubtedly challenging. The state-of-the-art works mostly apply to design criteria on the estimates of regression coefficients. It is of importance to study optimal designs for generalized linear models, especially on the prediction aspects. In this work, we propose a prediction-oriented design criterion, I-optimality, and develop an efficient sequential algorithm of constructing I-optimal designs for generalized linear models. Through establishing the General Equivalence Theorem of the I-optimality for generalized linear models, we obtain an insightful understanding for the proposed algorithm on how to sequentially choose the support points and update the weights of support points of the design. The proposed algorithm is computationally efficient with guaranteed convergence property. Numerical examples are conducted to evaluate the feasibility and computational efficiency of the proposed algorithm. …

Regularized Determinantal Point Process (R-DPP)
Given a fixed $n\times d$ matrix $\mathbf{X}$, where $n\gg d$, we study the complexity of sampling from a distribution over all subsets of rows where the probability of a subset is proportional to the squared volume of the parallelepiped spanned by the rows (a.k.a. a determinantal point process). In this task, it is important to minimize the preprocessing cost of the procedure (performed once) as well as the sampling cost (performed repeatedly). To that end, we propose a new determinantal point process algorithm which has the following two properties, both of which are novel: (1) a preprocessing step which runs in time $O(\text{number-of-non-zeros}(\mathbf{X})\cdot\log n)+\text{poly}(d)$, and (2) a sampling step which runs in $\text{poly}(d)$ time, independent of the number of rows $n$. We achieve this by introducing a new regularized determinantal point process (R-DPP), which serves as an intermediate distribution in the sampling procedure by reducing the number of rows from $n$ to $\text{poly}(d)$. Crucially, this intermediate distribution does not distort the probabilities of the target sample. Our key novelty in defining the R-DPP is the use of a Poisson random variable for controlling the probabilities of different subset sizes, leading to new determinantal formulas such as the normalization constant for this distribution. Our algorithm has applications in many diverse areas where determinantal point processes have been used, such as machine learning, stochastic optimization, data summarization and low-rank matrix reconstruction. …

Feature-Label Memory Network
Deep learning typically requires training a very capable architecture using large datasets. However, many important learning problems demand an ability to draw valid inferences from small size datasets, and such problems pose a particular challenge for deep learning. In this regard, various researches on ‘meta-learning’ are being actively conducted. Recent work has suggested a Memory Augmented Neural Network (MANN) for meta-learning. MANN is an implementation of a Neural Turing Machine (NTM) with the ability to rapidly assimilate new data in its memory, and use this data to make accurate predictions. In models such as MANN, the input data samples and their appropriate labels from previous step are bound together in the same memory locations. This often leads to memory interference when performing a task as these models have to retrieve a feature of an input from a certain memory location and read only the label information bound to that location. In this paper, we tried to address this issue by presenting a more robust MANN. We revisited the idea of meta-learning and proposed a new memory augmented neural network by explicitly splitting the external memory into feature and label memories. The feature memory is used to store the features of input data samples and the label memory stores their labels. Hence, when predicting the label of a given input, our model uses its feature memory unit as a reference to extract the stored feature of the input, and based on that feature, it retrieves the label information of the input from the label memory unit. In order for the network to function in this framework, a new memory-writing module to encode label information into the label memory in accordance with the meta-learning task structure is designed. Here, we demonstrate that our model outperforms MANN by a large margin in supervised one-shot classification tasks using Omniglot and MNIST datasets. …

Decision Tree Based Missing Value Imputation Technique (DMI)
The ‘Decision tree based Missing value Imputation technique’ (DMI) makes use of an EM algorithm and a decision tree (DT) algorithm. …

What’s new on arXiv – Complete List

Do Cross Modal Systems Leverage Semantic Relationships?
On perfectness in Gaussian graphical models
Meta Relational Learning for Few-Shot Link Prediction in Knowledge Graphs
Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis
Towards Realistic Practices In Low-Resource Natural Language Processing: The Development Set
Deep Morphological Neural Networks
What can the brain teach us about building artificial intelligence?
Model Asset eXchange: Path to Ubiquitous Deep Learning Deployment
Answers Unite! Unsupervised Metrics for Reinforced Summarization Models
Metric Learning from Imbalanced Data
Empirical Analysis of Knowledge Distillation Technique for Optimization of Quantized Deep Neural Networks
Subset Multivariate Collective And Point Anomaly Detection
Deep Convolutional Networks in System Identification
Augmented Memory Networks for Streaming-Based Active One-Shot Learning
Mogrifier LSTM
Matching Component Analysis for Transfer Learning
Quasi-Newton Optimization Methods For Deep Learning Applications
Differentially Private SQL with Bounded User Contribution
Modelling transport provision in a polycentric mega city region
Artificial Neural Networks and Adaptive Neuro-fuzzy Models for Prediction of Remaining Useful Life
ApproxNet: Content and Contention Aware Video Analytics System for the Edge
Towards a reliable approach on scaling in data acquisition
Reduced-order modeling for nonlinear Bayesian statistical inverse problems
Evaluating Conformance Measures in Process Mining using Conformance Propositions (Extended version)
High-Fidelity State-of-Charge Estimation of Li-Ion Batteries Using Machine Learning
Online Dissolved Gas Analysis (DGA) Monitoring System
On Distribution Patterns of Power Flow Solutions
Data-driven simulation for general purpose multibody dynamics using deep neural networks
High-order partitioned spectral deferred correction solvers for multiphysics problems
Riemannian batch normalization for SPD neural networks
A Communication-Efficient Algorithm for Exponentially Fast Non-Bayesian Learning in Networks
Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control
Holistic++ Scene Understanding: Single-view 3D Holistic Scene Parsing and Human Pose Estimation with Human-Object Interaction and Physical Commonsense
Generalized Neyman-Pearson lemma for convex expectations on $L^{\infty}(\mu)$-spaces
Parameter Estimation with the Ordered $\ell_{2}$ Regularization via an Alternating Direction Method of Multipliers
Fundamental Tradeoffs in Uplink Grant-Free Multiple Access with Protected CSI
Accurate Esophageal Gross Tumor Volume Segmentation in PET/CT using Two-Stream Chained 3D Deep Network Fusion
Likelihood-Free Overcomplete ICA and Applications in Causal Discovery
Deep Esophageal Clinical Target Volume Delineation using Encoded 3D Spatial Context of Tumors, Lymph Nodes, and Organs At Risk
Referring Expression Generation Using Entity Profiles
Spectral Norm and Nuclear Norm of a Third Order Tensor
What Happens on the Edge, Stays on the Edge: Toward Compressive Deep Learning
Network Transfer Learning via Adversarial Domain Adaptation with Graph Convolution
Snowball: Iterative Model Evolution and Confident Sample Discovery for Semi-Supervised Learning on Very Small Labeled Datasets
Towards Automatic Detection of Misinformation in Online Medical Videos
Q-DATA: Enhanced Traffic Flow Monitoring in Software-Defined Networks applying Q-learning
Using Weaker Consistency Models with Monitoring and Recovery for Improving Performance of Key-Value Stores
Aerial multi-object tracking by detection using deep association networks
Counting acyclic and strong digraphs by descents
Engineering Boolean Matrix Multiplication for Multiple-Accelerator Shared-Memory Architectures
Rigidity and non-rigidity for uniform perturbed lattice
Simpler and Faster Learning of Adaptive Policies for Simultaneous Translation
Towards Better Modeling Hierarchical Structure for Self-Attention with Ordered Neurons
Classes of graphs with low complexity: the case of classes with bounded linear rankwidth
A Non-commutative Bilinear Model for Answering Path Queries in Knowledge Graphs
AMR Normalization for Fairer Evaluation
Simulated Annealing In $\mathbf{R}^d$ With Slowly Growing Potentials
Reservoir Computing based on Quenched Chaos
Learning sparse representations in reinforcement learning
Empirical Hypothesis Space Reduction
Rate of convergence of uniform transport processes to Brownian sheet
Stability phenomena for Martin boundaries of relatively hyperbolic groups
High-order gas-kinetic scheme with three-dimensional WENO reconstruction for the Euler and Navier-Stokes solutions
Gerrymandering: A Briber’s Perspective
Discovering Hypernymy in Text-Rich Heterogeneous Information Network by Exploiting Context Granularity
Functional Asplund’s metrics for pattern matching robust to variable lighting conditions
Topological Coding and Topological Matrices Toward Network Overall Security
HinDom: A Robust Malicious Domain Detection System based on Heterogeneous Information Network with Transductive Classification
Bidirectional One-Shot Unsupervised Domain Mapping
Shortest Paths in a Hybrid Network Model
A Cost-Scaling Algorithm for Minimum-Cost Node-Capacitated Multiflow Problem
A Compositional Model of Multi-faceted Trust for Personalized Item Recommendation
Online Optimization of Wireless Powered Mobile-Edge Computing for Heterogeneous Industrial Internet of Things
SQuAP-Ont: an Ontology of Software Quality Relational Factors from Financial Systems
Multi-DoF Time Domain Passivity Approach based Drift Compensation for Telemanipulation
Multifractal Description of Streamflow and Suspended Sediment Concentration Data from Indian River Basins
Latent Gaussian process with composite likelihoods for data-driven disease stratification
SSAP: Single-Shot Instance Segmentation With Affinity Pyramid
Exponential and Laplace approximation for occupation statistics of branching random walk
Scaling agile on large enterprise level — systematic bundling and application of state of the art approaches for lasting agile transitions
On the k-synchronizability for mailbox systems
The polarization process of ferroelectric materials analyzed in the framework of variational inequalities
Learning Sensor Placement from Demonstration for UAV networks
Competing risks joint models using R-INLA
Do We Really Need Fully Unsupervised Cross-Lingual Embeddings?
Defeating Opaque Predicates Statically through Machine Learning and Binary Analysis
ParaQG: A System for Generating Questions and Answers from Paragraphs
PASS3D: Precise and Accelerated Semantic Segmentation for 3D Point Cloud
Reachable states and holomorphic function spaces for the 1-D heat equation
Heterogeneous Proxytypes Extended: Integrating Theory-like Representations and Mechanisms with Prototypes and Exemplars
LeDeepChef: Deep Reinforcement Learning Agent for Families of Text-Based Games
3D landmark detection for augmented reality based otologic procedures
Regression-based sparse polynomial chaos for uncertainty quantification of subsurface flow models
Control issues and linear projection constraints on the control and on the controlled trajectory
Zeta functions of graphs, their symmetries and extended Catalan numbers
Minimising the Levelised Cost of Electricity for Bifacial Solar Panel Arrays using Bayesian Optimisation
Stochastic perturbations and fisheries management
Complexity of controlled bad sequences over finite powersets of $\mathbb{N}^k$
Hierarchical Model Reduction Techniques for Flow Modeling in a Parametrized Setting
Distance transform regression for spatially-aware deep semantic segmentation
Testing nonparametric shape restrictions
Parameterized Intractability of Even Set and Shortest Vector Problem
Modelling the Behavior Classification of Social News Aggregations Users
Social Networks as a Tool for a Higher Education Institution Image Creation
Predicting Software Tests Traces
Under the Conditions of Non-Agenda Ownership: Social Media Users in the 2019 Ukrainian Presidential Elections Campaign
Deep Learning-Aided Tabu Search Detection for Large MIMO Systems
A Specialized Evolutionary Strategy Using Mean Absolute Error Random Sampling to Design Recurrent Neural Networks
DurIAN: Duration Informed Attention Network For Multimodal Synthesis
SAO WMT19 Test Suite: Machine Translation of Audit Reports
A note on the optimal rubbling in ladders and prisms
Simulation and computational analysis of multiscale graph agent-based tumor model
A scalable algorithm for identifying multiple sensor faults using disentangled RNNs
ScisummNet: A Large Annotated Dataset and Content-Impact Models for Scientific Paper Summarization with Citation Networks
Different Absorption from the Same Sharing: Sifted Multi-task Learning for Fake News Detection
On a Conjecture of Lovász on Circle-Representations of Simple 4-Regular Planar Graphs
Chase-escape with death on trees
Transforming Gaussian correlations. Applications to generating long-range power-law correlated time series with arbitrary distribution
Joint Radar-Communications Strategies for Autonomous Vehicles
Outage Analysis of Cooperative NOMA Using Maximum Ratio Combining at Intersections
Tensor decompositions on simplicial complexes with invariance
Minimax Isometry Method
Büchi automata for distributed temporal logic
Proof-Based Synthesis of Sorting Algorithms Using Multisets in Theorema
An Efficient and Layout-Independent Automatic License Plate Recognition System Based on the YOLO detector
Optimal uniform continuity bound for conditional entropy of classical–quantum states
Rate-Memory Trade-off for Multi-access Coded Caching with Uncoded Placement
Value Iteration Algorithm for Mean-field Games
Extending the Scope of Robust Quadratic Optimization
Complexity of Computing the Shapley Value in Games with Externalities
Spiking Neural Networks for Inference and Learning: A Memristor-based Design Perspective
Affect Enriched Word Embeddings for News Information Retrieval
Sequential Convex Restriction and its Applications in Robust Optimization
Inference in Differences-in-Differences: How Much Should We Trust in Independent Clusters?
GPU-based parallelism for ASP-solving
Fractals2019: Combinatorial Optimisation with Dynamic Constraint Annealing
Extracting Aspects Hierarchies using Rhetorical Structure Theory
Epistemological Issues in Educational Data Mining
ICDM 2019 Knowledge Graph Contest: Team UWA
ALIME: Autoencoder Based Approach for Local Interpretability
Learning Distributions Generated by One-Layer ReLU Networks
Linear robust adaptive model predictive control: Computational complexity and conservatism
PISEP^2: Pseudo Image Sequence Evolution based 3D Pose Prediction
Status in flux: Unequal alliances can create power vacuums
New insights for setting up contractual options for demand side flexibility
Using Mw dependence of surface dynamics of glassy polymers to probe the length scale of free surface mobility
Multifidelity Computer Model Emulation with High-Dimensional Output
‘Skimming-Perusal’ Tracking: A Framework for Real-Time and Robust Long-term Tracking
Are Adversarial Robustness and Common Perturbation Robustness Independant Attributes ?
Mapping Spiking Neural Networks to Neuromorphic Hardware
Deep kernel learning for integral measurements
Semiparametric Inference for Non-monotone Missing-Not-at-Random Data: the No Self-Censoring Model
An equilibrated a posteriori error estimator for arbitrary-order Nédélec elements for magnetostatic problems
Dynamic Boundary Guarding Against Radially Incoming Targets
Rethinking the Number of Channels for the Convolutional Neural Network
Empirical Study of Diachronic Word Embeddings for Scarce Data
Optimal Wireless Resource Allocation with Random Edge Graph Neural Networks
Deep learning networks for selection of persistent scatterer pixels in multi-temporal SAR interferometric processing
Generalized Integrated Gradients: A practical method for explaining diverse ensembles
Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning
Projectively self-concordant barriers on convex sets
Does the ratio of Laplace transforms of powers of a function identify the function?
Agent-based model for tumour-analysis using Python+Mesa
Optimal translational-rotational invariant dictionaries for images
Let’s agree to disagree: learning highly debatable multirater labelling
About Fibonacci trees III: multiple Fibonacci trees
A Microscopic Theory of Intrinsic Timescales in Spiking Neural Networks
ICSrange: A Simulation-based Cyber Range Platform for Industrial Control Systems
Cultural diversity and the measurement of functional impairment: A cross-cultural validation of the Amsterdam IADL Questionnaire
On Orthogonal Vector Edge Coloring
Mape_Maker: A Scenario Creator
Ramsey numbers of path-matchings, covering designs and 1-cores
Moderate deviations for the range of a transient random walk. II
The spectral properties of Vandermonde matrices with clustered nodes
Simultaneous Estimation of Number of Clusters and Feature Sparsity in Clustering High-Dimensional Data
Efron-Stein PAC-Bayesian Inequalities
Gaps of Summands of the Zeckendorf Lattice
The Fibonacci Quilt Game
Two-Way Coding and Attack Decoupling in Control Systems Under Injection Attacks
Regularized Linear Inversion with Randomized Singular Value Decomposition
Mixture Content Selection for Diverse Sequence Generation
Tensor Analysis with n-Mode Generalized Difference Subspace
Dense Extreme Inception Network: Towards a Robust CNN Model for Edge Detection
Dispersion of Mobile Robots in the Global Communication Model
From ‘F’ to ‘A’ on the N.Y. Regents Science Exams: An Overview of the Aristo Project
A Note on Data-Driven Control for SISO Feedback Linearizable Systems Without Persistency of Excitation
Beyond Photo Realism for Domain Adaptation from Synthetic Data
A Constructive Approach for Data-Driven Randomized Learning of Feedforward Neural Networks
Self-Attentive Adversarial Stain Normalization
A greedoid and a matroid inspired by Bhargava’s $p$-orderings
The ML-EM algorithm in continuum: sparse measure solutions
ACES — Automatic Configuration of Energy Harvesting Sensors with Reinforcement Learning
Level-set percolation of the Gaussian free field on regular graphs II: Finite expanders
Level-set percolation of the Gaussian free field on regular graphs I: Regular trees
Mining for Dark Matter Substructure: Inferring subhalo population properties from strong lenses with machine learning
On Arithmetical Structures on Complete Graphs
Deep Transfer Learning for Star Cluster Classification: I. Application to the PHANGS-HST Survey
An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction
Online Regularization by Denoising with Applications to Phase Retrieval

What’s new on arXiv

Do Cross Modal Systems Leverage Semantic Relationships?

Current cross-modal retrieval systems are evaluated using the R@K measure, which does not leverage semantic relationships but rather strictly follows the manually marked image-text query pairs. Therefore, current systems do not generalize well to unseen data in the wild. To handle this, we propose a new measure, SemanticMap, to evaluate the performance of cross-modal systems. Our proposed measure evaluates the semantic similarity between the image and text representations in the latent embedding space. We also propose a novel cross-modal retrieval system using a single stream network for bidirectional retrieval. The proposed system is based on a deep neural network trained using extended center loss, minimizing the distance of image and text descriptions in the latent space from the class centers. In our system, the text descriptions are also encoded as images, which enables us to use a single stream network for both text and images. To the best of our knowledge, our work is the first of its kind in terms of employing a single stream network for cross-modal retrieval systems. The proposed system is evaluated on two publicly available datasets, MSCOCO and Flickr30K, and has shown comparable results to the current state-of-the-art methods.
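
For readers unfamiliar with R@K, the following is a minimal sketch of how such a measure is commonly computed: a query counts as a hit only if its single annotated match appears among the top K retrieved items. The function name `recall_at_k` and the toy ids are illustrative assumptions, not taken from the paper, and SemanticMap itself is not reproduced here.

```python
def recall_at_k(rankings, ground_truth, k):
    """Fraction of queries whose annotated match is in the top-k results.

    rankings: one ranked list of item ids per query (best first).
    ground_truth: the single annotated item id for each query.
    """
    hits = 0
    for ranking, target in zip(rankings, ground_truth):
        # Only the manually marked pair counts; a semantically similar
        # item that is not the annotated one is scored as a miss.
        if target in ranking[:k]:
            hits += 1
    return hits / len(rankings)


# Toy example with three queries (ids are hypothetical).
rankings = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ground_truth = [2, 6, 0]
score_at_2 = recall_at_k(rankings, ground_truth, 2)
score_at_3 = recall_at_k(rankings, ground_truth, 3)
```

The third query illustrates the criticism in the abstract: whatever it retrieves, it scores zero unless the exact annotated id is returned.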


On perfectness in Gaussian graphical models

Knowing when a graphical model is perfect to a distribution is essential in order to relate separation in the graph to conditional independence in the distribution, and this is particularly important when performing inference from data. When the model is perfect, there is a one-to-one correspondence between conditional independence statements in the distribution and separation statements in the graph. Previous work has shown that almost all models based on linear directed acyclic graphs as well as Gaussian chain graphs are perfect, the latter of which subsumes Gaussian graphical models (i.e., the undirected Gaussian models) as a special case. However, the complexity of chain graph models leads to a proof of this result which is indirect and mired by the complications of parameterizing this general class. In this paper, we directly approach the problem of perfectness for the Gaussian graphical models, and provide a new proof, via a more transparent parametrization, that almost all such models are perfect. Our approach is based on, and substantially extends, a construction of Lněnička and Matúš showing the existence of a perfect Gaussian distribution for any graph.


Meta Relational Learning for Few-Shot Link Prediction in Knowledge Graphs

Link prediction is an important way to complete knowledge graphs (KGs), while embedding-based methods, effective for link prediction in KGs, perform poorly on relations that only have a few associative triples. In this work, we propose a Meta Relational Learning (MetaR) framework to do the common but challenging few-shot link prediction in KGs, namely predicting new triples about a relation by only observing a few associative triples. We solve few-shot link prediction by focusing on transferring relation-specific meta information so that the model learns the most important knowledge and learns faster, corresponding to relation meta and gradient meta respectively in MetaR. Empirically, our model achieves state-of-the-art results on few-shot link prediction KG benchmarks.


Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis

When a robot acquires new information, ideally it would immediately be capable of using that information to understand its environment. While deep neural networks are now widely used by robots for inferring semantic information, conventional neural networks suffer from catastrophic forgetting when they are incrementally updated, with new knowledge overwriting established representations. While a variety of approaches have been developed that attempt to mitigate catastrophic forgetting in the incremental batch learning scenario, in which an agent learns a large collection of labeled samples at once, streaming learning has been much less studied in the robotics and deep learning communities. In streaming learning, an agent learns instances one-by-one and can be tested at any time. Here, we revisit streaming linear discriminant analysis, which has been widely used in the data mining research community. By combining streaming linear discriminant analysis with deep learning, we are able to outperform both incremental batch learning and streaming learning algorithms on both ImageNet-1K and CORe50.
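
To make the one-instance-at-a-time setting concrete, here is a rough sketch of a streaming discriminant classifier. It is a simplification and not the paper's method: it keeps running class means and a shared covariance restricted to its diagonal (an assumption made so the example needs no matrix inversion), and it ignores class priors and the deep feature extractor. The class name `StreamingDiagonalLDA` is hypothetical.

```python
class StreamingDiagonalLDA:
    """Streaming classifier with per-class means and a shared diagonal variance.

    Simplified stand-in for streaming LDA: the shared covariance is
    assumed diagonal, which full LDA does not require.
    """

    def __init__(self, num_features, eps=1e-6):
        self.num_features = num_features
        self.eps = eps                      # keeps variances strictly positive
        self.counts = {}                    # samples seen per class
        self.means = {}                     # running mean per class
        self.sum_sq = [0.0] * num_features  # pooled within-class squared deviations
        self.total = 0

    def fit_one(self, x, label):
        """Update the model with a single labeled instance."""
        if label not in self.means:
            self.counts[label] = 0
            self.means[label] = [0.0] * self.num_features
        self.counts[label] += 1
        self.total += 1
        mean = self.means[label]
        for j in range(self.num_features):
            delta = x[j] - mean[j]
            mean[j] += delta / self.counts[label]
            # Welford-style update of the within-class sum of squares.
            self.sum_sq[j] += delta * (x[j] - mean[j])

    def predict(self, x):
        """Return the class whose mean is closest under the shared variance."""
        var = [s / max(self.total, 1) + self.eps for s in self.sum_sq]

        def score(label):
            mean = self.means[label]
            return -sum((x[j] - mean[j]) ** 2 / var[j]
                        for j in range(self.num_features))

        return max(self.means, key=score)


model = StreamingDiagonalLDA(num_features=2)
stream = [((0, 0), 0), ((5, 5), 1), ((1, 0), 0), ((6, 5), 1),
          ((0, 1), 0), ((5, 6), 1), ((1, 1), 0), ((6, 6), 1)]
for features, label in stream:
    model.fit_one(features, label)  # the model can be queried after any step
```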


Towards Realistic Practices In Low-Resource Natural Language Processing: The Development Set

Development sets are impractical to obtain for real low-resource languages, since using all available data for training is often more effective. However, development sets are widely used in research papers that purport to deal with low-resource natural language processing (NLP). Here, we aim to answer the following questions: Does using a development set for early stopping in the low-resource setting influence results as compared to a more realistic alternative, where the number of training epochs is tuned on development languages? And does it lead to overestimation or underestimation of performance? We repeat multiple experiments from recent work on neural models for low-resource NLP and compare results for models obtained by training with and without development sets. On average over languages, absolute accuracy differs by up to 1.4%. However, for some languages and tasks, differences are as big as 18.0% accuracy. Our results highlight the importance of realistic experimental setups in the publication of low-resource NLP research results.


Deep Morphological Neural Networks

Mathematical morphology is a theory and technique to collect features like geometric and topological structures in digital images. Given a target image, determining suitable morphological operations and structuring elements is a cumbersome and time-consuming task. In this paper, a morphological neural network is proposed to address this problem. The proposed morphological layer serves as a nonlinear feature-extracting layer in deep learning frameworks, and its efficiency is confirmed analytically and empirically. With a known target, a single-filter morphological layer learns the structuring element correctly, and an adaptive layer can automatically select appropriate morphological operations. For practical applications, the proposed morphological neural networks are tested on several classification datasets related to shape or geometric image features, and the experimental results have confirmed the high computational efficiency and high accuracy.
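
As background, the two basic morphological operations, dilation and erosion, can be sketched on a 1-D grayscale signal as below. This only illustrates the fixed operations with a hand-picked structuring element; how the paper's layer learns that element is not shown. The function names and the convention of ignoring out-of-range positions are assumptions of this sketch.

```python
def dilate(signal, element):
    """Grayscale dilation: max of (signal + element) over the window."""
    radius = len(element) // 2
    out = []
    for i in range(len(signal)):
        candidates = [signal[i + j - radius] + element[j]
                      for j in range(len(element))
                      if 0 <= i + j - radius < len(signal)]
        out.append(max(candidates))
    return out


def erode(signal, element):
    """Grayscale erosion: min of (signal - element) over the window."""
    radius = len(element) // 2
    out = []
    for i in range(len(signal)):
        candidates = [signal[i + j - radius] - element[j]
                      for j in range(len(element))
                      if 0 <= i + j - radius < len(signal)]
        out.append(min(candidates))
    return out


spike = [0, 0, 5, 0, 0]
flat_element = [0, 0, 0]  # flat structuring element of width 3
dilated = dilate(spike, flat_element)
eroded = erode(spike, flat_element)
```

With a flat element, dilation spreads the spike to its neighbours and erosion removes it, which is why the max and min make these layers nonlinear.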


What can the brain teach us about building artificial intelligence?

This paper is the preprint of an invited commentary on Lake et al.’s Behavioral and Brain Sciences article titled ‘Building machines that learn and think like people’. Lake et al.’s paper offers a timely critique of the recent accomplishments in artificial intelligence from the vantage point of human intelligence, and provides insightful suggestions about research directions for building more human-like intelligence. Since we agree with most of the points raised in that paper, we will offer a few points that are complementary.


Model Asset eXchange: Path to Ubiquitous Deep Learning Deployment

A recent trend observed in traditionally challenging fields such as computer vision and natural language processing has been the significant performance gains shown by deep learning (DL). In many different research fields, DL models have been evolving rapidly and become ubiquitous. Despite researchers’ excitement, unfortunately, most software developers are not DL experts and oftentimes have a difficult time following the booming DL research outputs. As a result, it usually takes a significant amount of time for the latest superior DL models to prevail in industry. This issue is further exacerbated by the common use of sundry incompatible DL programming frameworks, such as Tensorflow, PyTorch, Theano, etc. To address this issue, we propose a system, called Model Asset Exchange (MAX), that avails developers of easy access to state-of-the-art DL models. Regardless of the underlying DL programming frameworks, it provides an open source Python library (called the MAX framework) that wraps DL models and unifies programming interfaces with our standardized RESTful APIs. These RESTful APIs enable developers to exploit the wrapped DL models for inference tasks without the need to fully understand different DL programming frameworks. Using MAX, we have wrapped and open-sourced more than 30 state-of-the-art DL models from various research fields, including computer vision, natural language processing and signal processing, etc. In the end, we selectively demonstrate two web applications that are built on top of MAX, as well as the process of adding a DL model to MAX.


Answers Unite! Unsupervised Metrics for Reinforced Summarization Models

Abstractive summarization approaches based on Reinforcement Learning (RL) have recently been proposed to overcome classical likelihood maximization. RL makes it possible to consider complex, possibly non-differentiable, metrics that globally assess the quality and relevance of the generated outputs. ROUGE, the most used summarization metric, is known to suffer from bias towards lexical similarity as well as from suboptimal accounting for fluency and readability of the generated abstracts. We thus explore and propose alternative evaluation measures: the reported human-evaluation analysis shows that the proposed metrics, based on Question Answering, compare favorably to ROUGE, with the additional property of not requiring reference summaries. Training an RL-based model on these metrics leads to improvements (in terms of both human and automated metrics) over current approaches that use ROUGE as a reward.


Metric Learning from Imbalanced Data

A key element of any machine learning algorithm is the use of a function that measures the dis/similarity between data points. Given a task, such a function can be optimized with a metric learning algorithm. Although this research field has received a lot of attention during the past decade, very few approaches have focused on learning a metric in an imbalanced scenario where the number of positive examples is much smaller than the negatives. Here, we address this challenging task by designing a new Mahalanobis metric learning algorithm (IML) which deals with class imbalance. The empirical study performed shows the efficiency of IML.
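
For reference, a Mahalanobis metric measures distance as sqrt((x - y)^T M (x - y)) for a positive semi-definite matrix M, and metric learning amounts to choosing M. The sketch below only evaluates the distance for a given M; how IML learns M under class imbalance is not reproduced, and the function name is an assumption.

```python
import math


def mahalanobis(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for a PSD matrix M (list of rows)."""
    d = [a - b for a, b in zip(x, y)]
    Md = [sum(m * di for m, di in zip(row, d)) for row in M]
    return math.sqrt(sum(di * mi for di, mi in zip(d, Md)))


identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[4.0, 0.0], [0.0, 1.0]]  # weights the first feature more heavily

euclidean = mahalanobis([0.0, 0.0], [3.0, 4.0], identity)
weighted = mahalanobis([0.0, 0.0], [1.0, 0.0], stretched)
```

With the identity matrix the metric reduces to the Euclidean distance; a learned M reshapes the space so that similar points end up closer together.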


Empirical Analysis of Knowledge Distillation Technique for Optimization of Quantized Deep Neural Networks

Knowledge distillation (KD) is a very popular method for model size reduction. Recently, the technique has been exploited for training quantized deep neural networks (QDNNs) as a way to restore the performance sacrificed by word-length reduction. KD, however, employs additional hyper-parameters, such as the temperature, the coefficient, and the size of the teacher network, for QDNN training. We analyze the effect of these hyper-parameters on QDNN optimization with KD. We find that these hyper-parameters are inter-related, and also introduce a simple and effective technique that reduces the coefficient during training. With KD employing the proposed hyper-parameters, we achieve test accuracies of 92.7% and 67.0% on Resnet20 with 2-bit ternary weights for the CIFAR-10 and CIFAR-100 data sets, respectively.
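
To show where the temperature and the coefficient enter, here is a sketch of the commonly used distillation loss: a weighted sum of the usual cross-entropy on the true label and a temperature-softened KL term between teacher and student. This is the textbook formulation, assumed here for illustration; the paper's exact loss and its schedule for reducing the coefficient are not reproduced.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, coefficient=0.5):
    """(1 - c) * cross-entropy + c * T^2 * KL(teacher || student)."""
    hard = -math.log(softmax(student_logits)[label])
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(pt * math.log(pt / ps)
               for pt, ps in zip(p_teacher, p_student))
    return (1.0 - coefficient) * hard + coefficient * temperature ** 2 * soft


# A student that matches the teacher exactly has zero soft loss.
matched = distillation_loss([1.0, 2.0], [1.0, 2.0], 1, coefficient=1.0)
# With coefficient 0 the loss is plain cross-entropy on the true label.
hard_only = distillation_loss([0.0, 0.0], [3.0, -1.0], 0, coefficient=0.0)
```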


Subset Multivariate Collective And Point Anomaly Detection

In recent years, there has been a growing interest in identifying anomalous structure within multivariate data streams. We consider the problem of detecting collective anomalies, corresponding to intervals where one or more of the data streams behave anomalously. We first develop a test for a single collective anomaly that has power to simultaneously detect anomalies that are either rare, that is, affecting few data streams, or common. We then show how to detect multiple anomalies in a way that is computationally efficient but avoids the approximations inherent in binary segmentation-like approaches. This approach, which we call MVCAPA, is shown to consistently estimate the number and location of the collective anomalies, a property that has not previously been shown for competing methods. MVCAPA can be made robust to point anomalies and can allow for the anomalies to be imperfectly aligned. We show the practical usefulness of allowing for imperfect alignments through a resulting increase in power to detect regions of copy number variation.


Deep Convolutional Networks in System Identification

Recent developments within deep learning are relevant for nonlinear system identification problems. In this paper, we establish connections between the deep learning and the system identification communities. It has recently been shown that convolutional architectures are at least as capable as recurrent architectures when it comes to sequence modeling tasks. Inspired by these results we explore the explicit relationships between the recently proposed temporal convolutional network (TCN) and two classic system identification model structures; Volterra series and block-oriented models. We end the paper with an experimental study where we provide results on two real-world problems, the well-known Silverbox dataset and a newer dataset originating from ground vibration experiments on an F-16 fighter aircraft.


Augmented Memory Networks for Streaming-Based Active One-Shot Learning

One of the major challenges in training deep architectures for predictive tasks is the scarcity and cost of labeled training data. Active Learning (AL) is one way of addressing this challenge. In stream-based AL, observations are continuously made available to a learner that has to decide whether to request a label or to make a prediction. The goal is to reduce the request rate while at the same time maximizing prediction performance. In previous research, reinforcement learning has been used for learning the AL request/prediction strategy. In our work, we propose to equip a reinforcement learning process with memory augmented neural networks, to enhance the one-shot capabilities. Moreover, we introduce Class Margin Sampling (CMS) as an extension of the standard margin sampling to the reinforcement learning setting. This strategy aims to reduce training time and improve sample efficiency in the training process. We evaluate the proposed method on a classification task using empirical accuracy of label predictions and percentage of label requests. The results indicate that the proposed method, by making use of the memory augmented networks and CMS in the training process, outperforms existing baselines.
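
The standard margin sampling that CMS extends can be sketched as follows: the learner requests a label when the gap between its two most probable classes is small. The threshold value and the function names are assumptions for illustration; the CMS extension itself is not shown.

```python
def margin(probabilities):
    """Gap between the highest and second-highest class probabilities."""
    top = sorted(probabilities, reverse=True)
    return top[0] - top[1]


def should_request_label(probabilities, threshold=0.2):
    """Request a label when the prediction is too uncertain (small margin)."""
    return margin(probabilities) < threshold


uncertain = should_request_label([0.5, 0.4, 0.1])    # close call between two classes
confident = should_request_label([0.9, 0.05, 0.05])  # one class clearly dominates
```

In the stream-based setting this decision is made once per incoming observation, so the threshold directly trades label requests against prediction errors.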


Mogrifier LSTM

Many advances in Natural Language Processing have been based upon more expressive models for how inputs interact with the context in which they occur. Recurrent networks, which have enjoyed a modicum of success, still lack the generalization and systematicity ultimately required for modelling language. In this work, we propose an extension to the venerable Long Short-Term Memory in the form of mutual gating of the current input and the previous output. This mechanism affords the modelling of a richer space of interactions between inputs and their context. Equivalently, our model can be viewed as making the transition function given by the LSTM context-dependent. Experiments demonstrate markedly improved generalization on language modelling in the range of 3-4 perplexity points on Penn Treebank and Wikitext-2, and 0.01-0.05 bpc on four character-based datasets. We establish a new state of the art on all datasets with the exception of Enwik8, where we close a large gap between the LSTM and Transformer models.
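
One way to picture "mutual gating of the current input and the previous output" is the sketch below, in which the input and the previous output take turns rescaling each other before being passed to an ordinary LSTM step. The alternating order, the factor of 2 in front of the sigmoid, and all names are assumptions of this sketch rather than details stated in the abstract.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def mutual_gating(x, h, gate_weights):
    """Alternately gate x by h and h by x, one weight matrix per round.

    Even rounds use a len(x)-by-len(h) matrix to rescale x;
    odd rounds use a len(h)-by-len(x) matrix to rescale h.
    """
    x, h = list(x), list(h)
    for i, weights in enumerate(gate_weights):
        if i % 2 == 0:
            gate = matvec(weights, h)
            x = [2.0 * sigmoid(g) * xi for g, xi in zip(gate, x)]
        else:
            gate = matvec(weights, x)
            h = [2.0 * sigmoid(g) * hi for g, hi in zip(gate, h)]
    return x, h


# With all-zero weights every gate equals 2 * sigmoid(0) = 1,
# so the inputs pass through unchanged.
zero_q = [[0.0], [0.0]]
zero_r = [[0.0, 0.0]]
x_same, h_same = mutual_gating([1.0, 2.0], [3.0], [zero_q, zero_r])

# Strongly positive or negative gate activations scale towards 2 or 0.
x_gated, _ = mutual_gating([1.0, 1.0], [1.0], [[[100.0], [-100.0]]])
```

The gated `x` and `h` would then feed a standard LSTM update, which is what makes the transition function depend on context.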


Matching Component Analysis for Transfer Learning

We introduce a new Procrustes-type method called matching component analysis to isolate components in data for transfer learning. Our theoretical results describe the sample complexity of this method, and we demonstrate through numerical experiments that our approach is indeed well suited for transfer learning.


Quasi-Newton Optimization Methods For Deep Learning Applications

Deep learning algorithms often require solving a highly non-linear and nonconvex unconstrained optimization problem. Methods for solving optimization problems in large-scale machine learning, such as deep learning and deep reinforcement learning (RL), are generally restricted to the class of first-order algorithms, like stochastic gradient descent (SGD). While SGD iterates are inexpensive to compute, they have slow theoretical convergence rates. Furthermore, they require exhaustive trial-and-error to fine-tune many learning parameters. Using second-order curvature information to find search directions can help with more robust convergence for non-convex optimization problems. However, computing Hessian matrices for large-scale problems is not computationally practical. Alternatively, quasi-Newton methods construct an approximation of the Hessian matrix to build a quadratic model of the objective function. Quasi-Newton methods, like SGD, require only first-order gradient information, but they can result in superlinear convergence, which makes them attractive alternatives to SGD. The limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) approach is one of the most popular quasi-Newton methods that construct positive definite Hessian approximations. In this chapter, we propose efficient optimization methods based on L-BFGS quasi-Newton methods using line search and trust-region strategies. Our methods bridge the disparity between first- and second-order methods by using gradient information to calculate low-rank updates to Hessian approximations. We provide formal convergence analysis of these methods as well as empirical results on deep learning applications, such as image classification tasks and deep reinforcement learning on a set of ATARI 2600 video games. Our results show a robust convergence with preferred generalization characteristics as well as fast training time.
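
The way L-BFGS turns gradient information into an approximate Newton direction is usually implemented with the classic two-loop recursion, sketched below. This is the textbook procedure, shown for orientation only; the line-search and trust-region strategies proposed in the chapter are not reproduced, and the function names are assumptions.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lbfgs_direction(grad, s_history, y_history):
    """Two-loop recursion: returns -H * grad for the implicit L-BFGS inverse Hessian H.

    s_history: recent parameter differences x_{k+1} - x_k (oldest first).
    y_history: matching gradient differences g_{k+1} - g_k (oldest first).
    """
    q = list(grad)
    alphas = []
    # First loop: newest pair to oldest.
    for s, y in zip(reversed(s_history), reversed(y_history)):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        alphas.append(alpha)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
    # Initial scaling from the most recent pair (identity if no history).
    if s_history:
        s, y = s_history[-1], y_history[-1]
        gamma = dot(s, y) / dot(y, y)
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]
    # Second loop: oldest pair to newest.
    for s, y, alpha in zip(s_history, y_history, reversed(alphas)):
        rho = 1.0 / dot(y, s)
        beta = rho * dot(y, r)
        r = [ri + si * (alpha - beta) for ri, si in zip(r, s)]
    return [-ri for ri in r]


# Quadratic f(x) = x^T x has Hessian 2I, so a step s gives y = 2s.
gradient = [4.0, 6.0]
newton_like = lbfgs_direction(gradient, [[1.0, 0.0]], [[2.0, 0.0]])
steepest = lbfgs_direction(gradient, [], [])
```

Only the stored vector pairs are touched, so no Hessian matrix is ever formed; with an empty history the direction falls back to plain steepest descent.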


Differentially Private SQL with Bounded User Contribution

Differential privacy (DP) provides formal guarantees that the output of a database query does not reveal too much information about any individual present in the database. While many differentially private algorithms have been proposed in the scientific literature, there are only a few end-to-end implementations of differentially private query engines. Crucially, existing systems assume that each individual is associated with at most one database record, which is unrealistic in practice. We propose a generic and scalable method to perform differentially private aggregations on databases, even when individuals can each be associated with arbitrarily many rows. We express this method as an operator in relational algebra, and implement it in an SQL engine. To validate this system, we test the utility of typical queries on industry benchmarks, and verify its correctness with a stochastic test framework we developed. We highlight the promises and pitfalls learned when deploying such a system in practice, and we publish its core components as open-source software.
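
The idea of bounding each user's contribution before adding noise can be sketched for a simple SUM query as below. This is an illustrative toy, not the paper's relational operator or its SQL engine: the parameter names, the random choice of which rows to keep, and the use of the Laplace mechanism are assumptions of this sketch, and `random` is not a cryptographically secure noise source.

```python
import random
from collections import defaultdict


def bounded_sum(rows, max_rows_per_user, lower, upper, rng=random):
    """Sum of values after limiting every user's influence.

    rows: iterable of (user_id, value) pairs; a user may own many rows.
    Each user keeps at most max_rows_per_user rows, and each kept value
    is clamped to [lower, upper].
    """
    per_user = defaultdict(list)
    for user_id, value in rows:
        per_user[user_id].append(value)
    total = 0.0
    for values in per_user.values():
        if len(values) > max_rows_per_user:
            values = rng.sample(values, max_rows_per_user)
        total += sum(min(max(v, lower), upper) for v in values)
    return total


def private_sum(rows, max_rows_per_user, lower, upper, epsilon, rng=random):
    """Bounded sum plus Laplace noise calibrated to one user's maximum effect."""
    # Removing one user changes the bounded sum by at most this much
    # (assumed to be greater than zero).
    sensitivity = max_rows_per_user * max(abs(lower), abs(upper))
    scale = sensitivity / epsilon
    # The difference of two exponential draws is Laplace-distributed.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return bounded_sum(rows, max_rows_per_user, lower, upper, rng) + noise


rows = [("alice", 100), ("alice", 200), ("alice", 300), ("bob", 3)]
exact_bounded = bounded_sum(rows, 2, 0, 10, random.Random(0))
noisy = private_sum(rows, 2, 0, 10, epsilon=1.0, rng=random.Random(0))
```

Without the per-user bound, a single person with arbitrarily many rows could shift the sum arbitrarily far, and no finite amount of noise would hide them.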

Document worth reading: “High-Performance Support Vector Machines and Its Applications”

The support vector machines (SVM) algorithm is a popular classification technique in data mining and machine learning. In this paper, we propose a distributed SVM algorithm and demonstrate its use in a number of applications. The algorithm is named high-performance support vector machines (HPSVM). The major contribution of HPSVM is two-fold. First, HPSVM provides a new way to distribute computations to the machines in the cloud without shuffling the data. Second, HPSVM minimizes the inter-machine communications in order to maximize the performance. We apply HPSVM to some real-world classification problems and compare it with the state-of-the-art SVM technique implemented in R on several public data sets. HPSVM achieves similar or better results.

If you did not already know

NestDNN
Mobile vision systems such as smartphones, drones, and augmented-reality headsets are revolutionizing our lives. These systems usually run multiple applications concurrently and their available resources at runtime are dynamic due to events such as starting new applications, closing existing applications, and application priority changes. In this paper, we present NestDNN, a framework that takes the dynamics of runtime resources into account to enable resource-aware multi-tenant on-device deep learning for mobile vision systems. NestDNN enables each deep learning model to offer flexible resource-accuracy trade-offs. At runtime, it dynamically selects the optimal resource-accuracy trade-off for each deep learning model to fit the model’s resource demand to the system’s available runtime resources. In doing so, NestDNN efficiently utilizes the limited resources in mobile vision systems to jointly maximize the performance of all the concurrently running applications. Our experiments show that compared to the resource-agnostic status quo approach, NestDNN achieves as much as a 4.2% increase in inference accuracy, a 2.0x increase in video frame processing rate and a 1.7x reduction in energy consumption. …

Attention Branch Network (ABN)
Visual explanation enables humans to understand the decision making of a Deep Convolutional Neural Network (CNN), but on its own it does not contribute to performance improvement. In this paper, we focus on the attention map for visual explanation, which represents high response values as the important region in image recognition. This region can significantly improve the performance of a CNN through an attention mechanism that focuses on a specific region in an image. In this work, we propose Attention Branch Network (ABN), which extends the top-down visual explanation model by introducing a branch structure with an attention mechanism. ABN is applicable to several image recognition tasks by introducing a branch for the attention mechanism and is trainable for visual explanation and image recognition in an end-to-end manner. We evaluate ABN on several image recognition tasks such as image classification, fine-grained recognition, and multiple facial attributes recognition. Experimental results show that ABN can outperform the accuracy of baseline models on these image recognition tasks while generating an attention map for visual explanation.
Embedding Human Knowledge in Deep Neural Network via Attention Map


Payoff Dynamical Model (PDM)
We consider that at every instant each member of a population, which we refer to as an agent, selects one strategy out of a finite set. The agents are nondescript, and their strategy choices are described by the so-called population state vector, whose entries are the portions of the population selecting each strategy. Likewise, each entry constituting the so-called payoff vector is the reward attributed to a strategy. We consider that a general finite-dimensional nonlinear dynamical system, denoted as payoff dynamical model (PDM), describes a mechanism that determines the payoff as a causal map of the population state. A bounded-rationality protocol, inspired primarily by evolutionary biology principles, governs how each agent revises its strategy repeatedly based on complete or partial knowledge of the population state and payoff. The population is protocol-homogeneous but is otherwise strategy-heterogeneous considering that the agents are allowed to select distinct strategies concurrently. A stochastic mechanism determines the instants when agents revise their strategies, but we consider that the population is large enough that, with high probability, the population state can be approximated with arbitrary accuracy uniformly over any finite horizon by a so-called (deterministic) mean population state. We propose an approach that takes advantage of passivity principles to obtain sufficient conditions determining, for a given protocol and PDM, when the mean population state is guaranteed to converge to a meaningful set of equilibria, which could be either an appropriately defined extension of Nash’s for the PDM or a perturbed version of it. By generalizing and unifying previous work, our framework also provides a foundation for future work. …

GP-DRF
Deep Gaussian processes (DGP) have appealing Bayesian properties, can handle variable-sized data, and learn deep features. Their limitation is that they do not scale well with the size of the data. Existing approaches address this using a deep random feature (DRF) expansion model, which makes inference tractable by approximating DGPs. However, DRF is not suitable for variable-sized input data such as trees, graphs, and sequences. We introduce the GP-DRF, a novel Bayesian model with an input layer of GPs, followed by DRF layers. The key advantage is that the combination of GP and DRF leads to a tractable model that can both handle a variable-sized input as well as learn deep long-range dependency structures of the data. We provide a novel efficient method to simultaneously infer the posterior of GP’s latent vectors and infer the posterior of DRF’s internal weights and random frequencies. Our experiments show that GP-DRF outperforms the standard GP model and DRF model across many datasets. Furthermore, they demonstrate that GP-DRF enables improved uncertainty quantification compared to GP and DRF alone, with respect to a Bhattacharyya distance assessment. Source code is available at https://…/GP_DRF.