Document worth reading: “Algorithms and Statistical Models for Scientific Discovery in the Petabyte Era”

The field of astronomy has arrived at a turning point in terms of size and complexity of both datasets and scientific collaboration. Commensurately, algorithms and statistical models have begun to adapt (e.g., via the onset of artificial intelligence), which itself presents new challenges and opportunities for growth. This white paper aims to offer guidance and ideas for how we can evolve our technical and collaborative frameworks to promote efficient algorithmic development and take advantage of opportunities for scientific discovery in the petabyte era. We discuss challenges for discovery in large and complex data sets; challenges and requirements for the next stage of development of statistical methodologies and algorithmic tool sets; how we might change our paradigms of collaboration and education; and the ethical implications of scientists’ contributions to widely applicable algorithms and computational modeling. We start with six distinct recommendations that are supported by the commentary following them. This white paper is related to a larger corpus of effort that has taken place within and around the Petabytes to Science Workshops. Algorithms and Statistical Models for Scientific Discovery in the Petabyte Era


What’s going on on PyPI

Scanning all newly published packages on PyPI, I know that the quality is often quite bad. I try to filter out the worst ones and list here those that might be worth a look, worth following, or that might inspire you in some way.

Evaluation of essays using NLP

Gender Classifier ML Package for classifying gender using firstname

Multi-threaded implementation of the Tsetlin Machine, Convolutional Tsetlin Machine, Regression Tsetlin Machine, and Weighted Tsetlin Machine, with support for continuous features and multigranularity.

a lib with some recommendation algorithms

Embedder tool for ‘Modelling the Language of Life – Deep Learning Protein Sequences’. One common task in Computational Biology is the prediction of aspects of protein function and structure from their amino acid sequence. For 26 years, most state-of-the-art approaches toward this end have been marrying machine learning and evolutionary information. The retrieval of related proteins from ever growing sequence databases is becoming so time-consuming that the analysis of entire proteomes becomes challenging. On top of that, evolutionary information is less powerful for small families, e.g. for proteins from the Dark Proteome.

A spiking deep neural network simulator and neuromorphic hardware emulator. sinabs (a PyTorch-based library) is developed to design and implement Spiking Convolutional Neural Networks (SCNNs). The library implements several layers that are spiking equivalents of CNN layers. In addition, it provides support for conveniently importing CNN models implemented in Keras to test their spiking equivalent implementation.

DeepMind Memory Tasks, a set of Unity-based machine-learning research tasks.

Grammar Guided Genetic Programming Library

A tool kit for MS data post-processing


N-Beats: Neural basis expansion analysis for interpretable time series forecasting
• Implementation in PyTorch
• Implementation in Keras

Finding out why

Paper: Confounding and Regression Adjustment in Difference-in-Differences

Difference-in-differences (diff-in-diff) is a study design that compares outcomes of two groups (treated and comparison) at two time points (pre- and post-treatment) and is widely used in evaluating new policy implementations. For instance, diff-in-diff has been used to estimate the effect that increasing minimum wage has on employment rates and to assess the Affordable Care Act’s effect on health outcomes. Although diff-in-diff appears simple, potential pitfalls lurk. In this paper, we discuss one such complication: time-varying confounding. We provide rigorous definitions for confounders in diff-in-diff studies and explore regression strategies to adjust for confounding. In simulations, we show how and when regression adjustment can ameliorate confounding for both time-invariant and time-varying covariates. We compare our regression approach to those models commonly fit in applied literature, which often fail to address the time-varying nature of confounding in diff-in-diff.
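The basic two-group, two-period comparison described above can be sketched in a few lines. This is only the unadjusted estimator, which is interpretable as a treatment effect under the usual parallel-trends assumption; it does not implement the regression adjustment for confounders that the paper studies. The function name and the sample numbers are hypothetical.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Unadjusted difference-in-differences estimate from four outcome lists.

    Computes (change in treated mean) minus (change in comparison mean).
    No covariate adjustment is performed.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outcomes for two units per group, before and after treatment.
estimate = did_estimate(
    treated_pre=[10, 12], treated_post=[15, 17],
    control_pre=[8, 10], control_post=[9, 11],
)
print(estimate)
```

The comparison group's change (here 1) stands in for what would have happened to the treated group without treatment, so only the remaining change is attributed to the treatment.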

Paper: Actionable Interpretability through Optimizable Counterfactual Explanations for Tree Ensembles

Counterfactual explanations help users understand why machine learned models make certain decisions, and more specifically, how these decisions can be changed. In this work, we frame the problem of finding counterfactual explanations — the minimal perturbation to an input such that the prediction changes — as an optimization task. Previously, optimization techniques for generating counterfactual examples could only be applied to differentiable models, or alternatively via query access to the model by estimating gradients from randomly sampled perturbations. In order to accommodate non-differentiable models such as tree ensembles, we propose using probabilistic model approximations in the optimization framework. We introduce a novel approximation technique that is effective for finding counterfactual explanations while also closely approximating the original model. Our results show that our method is able to produce counterfactual examples that are closer to the original instance in terms of Euclidean, Cosine, and Manhattan distance compared to other methods specifically designed for tree ensembles.

Article: An introduction to Causal inference

Causal inference goes beyond prediction by modeling the outcome of interventions and formalizing counterfactual reasoning. In this blog post, I provide an introduction to the graphical approach to causal inference in the tradition of Sewall Wright, Judea Pearl, and others. We first rehash the common adage that correlation is not causation. We then move on to climb what Pearl calls the ‘ladder of causal inference’, from association (seeing) to intervention (doing) to counterfactuals (imagining). We will discover how directed acyclic graphs describe conditional (in)dependencies; how the do-calculus describes interventions; and how Structural Causal Models allow us to imagine what could have been. This blog post is by no means exhaustive, but should give you a first appreciation of the concepts that surround causal inference; references to further readings are provided below. Let’s dive in!
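The gap between association (seeing) and intervention (doing) can be illustrated with a small simulation. The sketch below is not taken from the blog post: it assumes a hypothetical structural causal model in which a binary confounder Z influences both X and Y, while X has no effect on Y at all. All probabilities and function names are made up for illustration.

```python
import random


def simulate(n, intervene_x=None, seed=0):
    """Draw n samples (x, y) from a toy SCM: Z -> X, Z -> Y, and no X -> Y edge.

    If intervene_x is given, X is set by fiat (the do-operator), which cuts
    the Z -> X edge; otherwise X follows its natural mechanism.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5
        if intervene_x is None:
            x = rng.random() < (0.8 if z else 0.2)
        else:
            x = intervene_x
        y = rng.random() < (0.7 if z else 0.3)  # Y depends on Z only
        rows.append((x, y))
    return rows


def p_y_given_x(rows, x_value):
    """Empirical P(Y = 1 | X = x_value)."""
    ys = [y for x, y in rows if x == x_value]
    return sum(ys) / len(ys)


# Seeing: X and Y are associated because both depend on Z.
# In expectation the difference is 0.62 - 0.38 = 0.24.
observed = simulate(20000)
association = p_y_given_x(observed, True) - p_y_given_x(observed, False)

# Doing: setting X by intervention leaves Y unchanged (expected difference 0).
do_x1 = simulate(20000, intervene_x=True, seed=1)
do_x0 = simulate(20000, intervene_x=False, seed=2)
effect = p_y_given_x(do_x1, True) - p_y_given_x(do_x0, False)

print(f"association: {association:.3f}, interventional effect: {effect:.3f}")
```

The observational contrast is clearly non-zero even though intervening on X changes nothing, which is the "correlation is not causation" point in miniature.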

Python Library: causalnex

Paper: Causal inference of hazard ratio based on propensity score matching

Propensity score matching is commonly used to draw causal inference from observational survival data. However, there is no gold standard approach to analyze survival data after propensity score matching, and variance estimation after matching is open to debate. We derive the statistical properties of the propensity score matching estimator of the marginal causal hazard ratio based on matching with replacement and a fixed number of matches. We also propose a double-resampling technique for variance estimation that takes into account the uncertainty due to propensity score estimation prior to matching.
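As a rough illustration of "matching with replacement and a fixed number of matches", the sketch below pairs each treated unit with its nearest controls on an already-estimated propensity score. It covers only the matching step; the hazard-ratio estimator and the double-resampling variance procedure from the paper are not shown. The function name and the scores are hypothetical.

```python
def match_with_replacement(treated_scores, control_scores, n_matches=1):
    """For each treated unit, return indices of the n_matches nearest controls.

    Matching is on absolute propensity-score distance and is done with
    replacement, so the same control may be matched to several treated units.
    """
    matches = []
    for score in treated_scores:
        ranked = sorted(
            range(len(control_scores)),
            key=lambda j: abs(control_scores[j] - score),
        )
        matches.append(ranked[:n_matches])
    return matches


# Hypothetical propensity scores (assumed to be estimated beforehand).
controls = [0.10, 0.35, 0.65, 0.72]
print(match_with_replacement([0.30, 0.70], controls))
print(match_with_replacement([0.33, 0.34], controls))  # same control reused
```

Because the scores themselves are estimated, the resulting matched set carries extra uncertainty, which is the motivation the abstract gives for a dedicated variance estimator.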

Paper: Improving Model Robustness Using Causal Knowledge

For decades, researchers in fields, such as the natural and social sciences, have been verifying causal relationships and investigating hypotheses that are now well-established or understood as truth. These causal mechanisms are properties of the natural world, and thus are invariant conditions regardless of the collection domain or environment. We show in this paper how prior knowledge in the form of a causal graph can be utilized to guide model selection, i.e., to identify from a set of trained networks the models that are the most robust and invariant to unseen domains. Our method incorporates prior knowledge (which can be incomplete) as a Structural Causal Model (SCM) and calculates a score based on the likelihood of the SCM given the target predictions of a candidate model and the provided input variables. We show on both publicly available and synthetic datasets that our method is able to identify more robust models in terms of generalizability to unseen out-of-distribution test examples and domains where covariates have shifted.

Paper: A review and evaluation of standard methods to handle missing data on time-varying confounders in marginal structural models

Marginal structural models (MSMs) are commonly used to estimate causal intervention effects in longitudinal non-randomised studies. A common issue when analysing data from observational studies is the presence of incomplete confounder data, which might lead to bias in the intervention effect estimates if they are not handled properly in the statistical analysis. However, there is currently no recommendation on how to address missing data on covariates in MSMs under a variety of missingness mechanisms encountered in practice. We reviewed existing methods for handling missing data in MSMs and performed a simulation study to compare the performance of complete case (CC) analysis, the last observation carried forward (LOCF), the missingness pattern approach (MPA), multiple imputation (MI) and inverse-probability-of-missingness weighting (IPMW). We considered three mechanisms for non-monotone missing data which are common in observational studies using electronic health record data. Whereas CC analysis led to biased estimates of the intervention effect in almost all scenarios, the performance of the other approaches varied across scenarios. The LOCF approach led to unbiased estimates only under a specific non-random mechanism in which confounder values were missing when their values remained unchanged since the previous measurement. In this scenario, MI, the MPA and IPMW were biased. MI and IPMW led to the estimation of unbiased effects when data were missing at random, given the covariates or the treatment, but only MI was unbiased when the outcome was a predictor of missingness. Furthermore, IPMW generally led to very large standard errors. Lastly, regardless of the missingness mechanism, the MPA led to unbiased estimates only when the failure to record a confounder at a given time-point modified the subsequent relationships between the partially observed covariate and the outcome.
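Two of the simpler strategies compared above, complete case analysis and LOCF, can be sketched for a single patient's time-varying confounder as follows. Missing values are represented as None; the function names and measurements are hypothetical, and the MSM fitting itself is not shown.

```python
def locf(values):
    """Last observation carried forward: replace each missing value (None)
    with the most recent observed value. Leading gaps stay missing."""
    filled = []
    last = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def complete_cases(records):
    """Complete case analysis: keep only records with no missing field."""
    return [r for r in records if all(v is not None for v in r)]


# Hypothetical confounder measurements at six visits.
visits = [None, 1.2, None, None, 3.4, None]
print(locf(visits))

# Hypothetical (confounder, treatment, outcome) records.
records = [(1.2, 1, 0), (None, 0, 1), (3.4, 1, 1)]
print(complete_cases(records))
```

LOCF implicitly assumes the value did not change while it went unrecorded, which matches the one mechanism under which the abstract reports it to be unbiased.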

Paper: Entropy, mutual information, and systematic measures of structured spiking neural networks

The aim of this paper is to investigate various information-theoretic measures, including entropy, mutual information, and some systematic measures that are based on mutual information, for a class of structured spiking neuronal networks. In order to analyze and compute these information-theoretic measures for large networks, we coarse-grain the data by ignoring the order of spikes that fall into the same small time bin. The resultant coarse-grained entropy mainly captures the information contained in the rhythm produced by a local population of the network. We first prove that these information-theoretic measures are well-defined and computable by establishing stochastic stability and the law of large numbers. Then we use three neuronal network examples, from simple to complex, to investigate these information-theoretic measures. Several analytical and computational results about properties of these information-theoretic measures are given.
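The coarse-graining idea, ignoring the order of spikes within a time bin, can be illustrated with a simplified sketch: count the spikes of a local population per bin and take the Shannon entropy of the empirical distribution of those counts. This is an assumption made for illustration and not the paper's exact definition; the function name and spike times are hypothetical.

```python
import math
from collections import Counter


def coarse_grained_entropy(spike_times, bin_width, t_end):
    """Shannon entropy (in bits) of per-bin spike counts on [0, t_end].

    Spikes are reduced to a count per time bin, so the order of spikes
    inside a bin is discarded.
    """
    n_bins = int(math.ceil(t_end / bin_width))
    counts = [0] * n_bins
    for t in spike_times:
        idx = min(int(t // bin_width), n_bins - 1)  # clamp t == t_end
        counts[idx] += 1
    freq = Counter(counts)
    return -sum((c / n_bins) * math.log2(c / n_bins) for c in freq.values())


# Hypothetical pooled spike times from a local population.
spikes = [0.1, 0.2, 1.5, 3.1, 3.2]
print(coarse_grained_entropy(spikes, bin_width=1, t_end=4))
```

With these inputs the per-bin counts are 2, 1, 0 and 2, so the count distribution is (1/2, 1/4, 1/4) and the entropy is 1.5 bits.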

If you did not already know

Minimax Entropy (MME)
Contemporary domain adaptation methods are very effective at aligning feature distributions of source and target domains without any target supervision. However, we show that these techniques perform poorly when even a few labeled examples are available in the target. To address this semi-supervised domain adaptation (SSDA) setting, we propose a novel Minimax Entropy (MME) approach that adversarially optimizes an adaptive few-shot model. Our base model consists of a feature encoding network, followed by a classification layer that computes the features’ similarity to estimated prototypes (representatives of each class). Adaptation is achieved by alternately maximizing the conditional entropy of unlabeled target data with respect to the classifier and minimizing it with respect to the feature encoder. We empirically demonstrate the superiority of our method over many baselines, including conventional feature alignment and few-shot methods, setting a new state of the art for SSDA. …
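The quantity at the centre of this minimax game is the entropy of the classifier's predictions on unlabeled target examples. The sketch below only computes that quantity from raw class scores; the adversarial training loop, in which the classifier maximizes it and the feature encoder minimizes it, is not shown. The function names and logits are hypothetical, and a plain softmax is assumed in place of the prototype-similarity classifier.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_prediction_entropy(batch_logits):
    """Average entropy (in nats) of softmax predictions over a batch."""
    total = 0.0
    for logits in batch_logits:
        probs = softmax(logits)
        total -= sum(p * math.log(p) for p in probs if p > 0)
    return total / len(batch_logits)


# Hypothetical logits for unlabeled target examples.
uncertain = [[0.0, 0.0]]     # uniform prediction, entropy log(2)
confident = [[20.0, -20.0]]  # near one-hot prediction, entropy near 0
print(mean_prediction_entropy(uncertain), mean_prediction_entropy(confident))
```

High entropy means the target features sit between class prototypes; low entropy means they cluster near one prototype, which is the state the feature encoder is pushed toward.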

Integrated Rational Prediction and Motionless ANalysis (IRON-MAN)
Analyzing video for traffic categorization is an important pillar of Intelligent Transport Systems. However, it is difficult to analyze and predict traffic based on image frames because the representation of each frame may vary significantly within a short time period. Such frame-level analysis would also inaccurately represent the traffic over a longer period of time, as in the case of video. We propose a novel bio-inspired methodology that integrates analysis of the previous image frames of the video to represent the analysis of the current image frame, the same way a human being analyzes the current situation based on past experience. In our proposed methodology, called IRON-MAN (Integrated Rational prediction and Motionless ANalysis), we utilize Bayesian update on top of the individual image frame analysis in the videos, and this has resulted in highly accurate prediction of Temporal Motionless Analysis of the Videos (TMAV) for most of the chosen test cases. The proposed approach could be used for TMAV using Convolutional Neural Network (CNN) for applications where the number of objects in an image is the deciding factor for prediction, and results also show that our proposed approach outperforms the state-of-the-art for the chosen test case. We also introduce a new metric named Energy Consumption per Training Image (ECTI). Since different CNN-based models have different training capability and computing resource utilization, some of the models are more suitable for embedded device implementation than others, and the ECTI metric is useful to assess the suitability of using a CNN model in multi-processor systems-on-chips (MPSoCs) with a focus on energy consumption and reliability in terms of lifespan of the embedded device using these MPSoCs. …

Functional Aggregate Queries (FAQ)
Motivated by fundamental applications in databases and relational machine learning, we formulate and study the problem of answering Functional Aggregate Queries (FAQ) in which some of the input factors are defined by a collection of Additive Inequalities between variables. We refer to these queries as FAQ-AI for short. To answer FAQ-AI in the Boolean semiring, we define ‘relaxed’ tree decompositions and ‘relaxed’ submodular and fractional hypertree width parameters. We show that an extension of the InsideOut algorithm using Chazelle’s geometric data structure for solving the semigroup range search problem can answer Boolean FAQ-AI in time given by these new width parameters. This new algorithm achieves lower complexity than known solutions for FAQ-AI. It also recovers some known results in database query answering. Our second contribution is a relaxation of the set of polymatroids that gives rise to the counting version of the submodular width, denoted by ‘#subw’. This new width is sandwiched between the submodular and the fractional hypertree widths. Any FAQ and FAQ-AI over one semiring can be answered in time proportional to #subw and respectively to the relaxed version of #subw. We present three applications of our FAQ-AI framework to relational machine learning: k-means clustering, training linear support vector machines, and training models using non-polynomial loss. These optimization problems can be solved over a database asymptotically faster than computing the join of the database relations. …

Approximate Common Variance (A-ComVar)
We consider nonregular fractions of factorial experiments for a class of linear models. These models have a common general mean and main effects; however, they may have different 2-factor interactions. Here we assume for simplicity that 3-factor and higher order interactions are negligible. In the absence of a priori knowledge about which interactions are important, it is reasonable to prefer a design that results in equal variance for the estimates of all interaction effects to aid in model discrimination. Such designs are called common variance designs and can be quite challenging to identify without performing an exhaustive search of possible designs. In this work, we introduce an extension of common variance designs called approximate common variance, or A-ComVar designs. We develop a numerical approach to finding A-ComVar designs that is much more efficient than an exhaustive search. We present the types of A-ComVar designs that can be found for different numbers of factors, runs, and interactions. We further demonstrate the competitive performance of both common variance and A-ComVar designs with Plackett-Burman designs for model selection using simulation. …

Document worth reading: “Machine Learning for Fluid Mechanics”

The field of fluid mechanics is rapidly advancing, driven by unprecedented volumes of data from experiments, field measurements, and large-scale simulations at multiple spatiotemporal scales. Machine learning presents us with a wealth of techniques to extract information from data that can be translated into knowledge about the underlying fluid mechanics. Moreover, machine learning algorithms can augment domain knowledge and automate tasks related to flow control and optimization. This article presents an overview of past history, current developments, and emerging opportunities of machine learning for fluid mechanics. We outline fundamental machine learning methodologies and discuss their uses for understanding, modeling, optimizing, and controlling fluid flows. The strengths and limitations of these methods are addressed from the perspective of scientific inquiry that links data with modeling, experiments, and simulations. Machine learning provides a powerful information processing framework that can augment, and possibly even transform, current lines of fluid mechanics research and industrial applications. Machine Learning for Fluid Mechanics

What’s new on arXiv – Complete List

On the Legal Compatibility of Fairness Definitions
Domain-Aware Dynamic Networks
ModelHub.AI: Dissemination Platform for Deep Learning Models
Entropy, mutual information, and systematic measures of structured spiking neural networks
To Trust, or Not to Trust? A Study of Human Bias in Automated Video Interview Assessments
Defining and Unpacking Transformative AI
FT-SWRL: A Fuzzy-Temporal Extension of Semantic Web Rule Language
The Distance Matching Problem
AR-Net: A simple Auto-Regressive Neural Network for time-series
Android Botnet Detection using Convolutional Neural Networks
Product Knowledge Graph Embedding for E-commerce
Dual-Attention Graph Convolutional Network
QKD: Quantization-aware Knowledge Distillation
Data Augmentation for Deep Transfer Learning
Unbiased Evaluation of Deep Metric Learning Algorithms
How Can We Know What Language Models Know?
Towards Privacy and Security of Deep Learning Systems: A Survey
Multi-Agent Deep Reinforcement Learning with Adaptive Policies
Understand Dynamic Regret with Switching Cost for Online Decision Making
QPDAS: Dual Active Set Solver for Mixed Constraint Quadratic Programming
D-SPIDER-SFO: A Decentralized Optimization Algorithm with Faster Convergence Rate for Nonconvex Problems
RETRO: Relation Retrofitting For In-Database Machine Learning on Textual Data
Continuous Dropout
Neural networks with redundant representation: detecting the undetectable
Detection of Derivative Discontinuities in Observational Data
Inducing Relational Knowledge from BERT
Constraints in Gaussian Graphical Models
Modelling Load-Changing Attacks in Cyber-Physical Systems
Finite impulse response models: A non-asymptotic analysis of the least squares estimator
Greed is Good: Exploration and Exploitation Trade-offs in Bayesian Optimisation
Application of Time Series Analysis to Traffic Accidents in Los Angeles
DeStress: Deep Learning for Unsupervised Identification of Mental Stress in Firefighters from Heart-rate Variability (HRV) Data
Towards Lingua Franca Named Entity Recognition with BERT
Mean-field backward stochastic differential equations with mean reflection and nonlinear resistance
Multi-PCA based Fault Detection Model Combined with Prior knowledge of HVAC
Non-Intrusive Electrical Appliances Monitoring and Classification using K-Nearest Neighbors
Use of Artificial Intelligence to Analyse Risk in Legal Documents for a Better Decision Support
How much physics is in a current-voltage curve? Inferring defect properties from photovoltaic device measurements
Corpus-Level End-to-End Exploration for Interactive Systems
Tabulated MLP for Fast Point Feature Embedding
Kernelized Multiview Subspace Analysis by Self-weighted Learning
Using the Web as an Implicit Training Set: Application to Noun Compound Syntax and Semantics
DeepMimic: Mentor-Student Unlabeled Data Based Training
Extending the dynamic strain sensing rang of phase-OTDR with frequency modulation pulse and frequency interrogation
Cycle and bit accurate chaos-based communication system
Chaos-based spread-spectrum communication system
Machine Learning-based Signal Detection for PMH Signals in Load-modulated MIMO System
Euclidean random matching in 2D for non constant densities
Key Modes for Time-Space Evolutions of ENSO and PDO by ESMD Method
Importance-Aware Learning for Neural Headline Editing
Conclusion-Supplement Answer Generation for Non-Factoid Questions
Independent language modeling architecture for end-to-end ASR
ICD Coding from Clinical Text Using Multi-Filter Residual Convolutional Neural Network
A Synergistic Approach for Internet of Things and Cloud Integration: Current Research and Future Direction
Tree search algorithms for the Sequential Ordering Problem
Pattern and Anomaly Detection in Urban Temporal Networks
Polyhedral study of the Convex Recoloring problem
Mis-classified Vector Guided Softmax Loss for Face Recognition
AuthorGAN: Improving GAN Reproducibility using a Modular GAN Framework
Consider ethical and social challenges in smart grid research
An Optimized and Energy-Efficient Parallel Implementation of Non-Iteratively Trained Recurrent Neural Networks
Experiments with a PCCoder extension
Low Rank Factorization for Compact Multi-Head Self-Attention
QoS-Aware Machine Learning-based Multiple Resources Scheduling for Microservices in Cloud Environment
CONAN: Complementary Pattern Augmentation for Rare Disease Detection
Coronary Artery Classification and Weakly Supervised Abnormality Localization on Coronary CT Angiography with 3-Dimensional Convolutional Neural Networks
Restoring Chaos Using Deep Reinforcement Learning
Study of Distributed Robust Beamforming with Low-Rank and Cross-Correlation Techniques
Degenerate zero-truncated Poisson random variables
Deep Image Harmonization via Domain Verification
An integrated heterogeneous Poisson model for neuron functions in hand movement during reaching and grasp
Measurement and Characterization of the Stationary Noise in Narrowband Power Line Communication
Optimal checkpointing for heterogeneous chains: how to train deep neural networks with limited memory
Music Source Separation in the Waveform Domain
Towards improving the e-learning experience for deaf students: e-LUX
A concrete example of inclusive design: deaf-oriented accessibility
The Scenario Culture
Solving Inverse Wave Scattering with Deep Learning
Example-Guided Scene Image Synthesis using Masked Spatial-Channel Attention and Patch-Based Self-Supervision
Learning a faceted customer segmentation for discovering new business opportunities at Intel
Class-Conditional VAE-GAN for Local-Ancestry Simulation
Flatsomatic: A Method for Compression of Somatic Mutation Profiles in Cancer
Fock-space correlations and the origins of many-body localisation
Perceive, Transform, and Act: Multi-Modal Attention Networks for Vision-and-Language Navigation
DeFINE: DEep Factorized INput Word Embeddings for Neural Sequence Modeling
Deep Density: circumventing the Kohn-Sham equations via symmetry preserving neural networks
Detecting total hip replacement prosthesis design on preoperative radiographs using deep convolutional neural network
Stablizer-Free Weak Galerkin Methods for Monotone Quasilinear Elliptic PDEs
SimpleBooks: Long-term dependency book dataset with simplified English vocabulary for word-level language modeling
Properties of nowhere dense graph classes related to independent set problem
Alternative Metrics
Variational Physics-Informed Neural Networks For Solving Partial Differential Equations
Goodness-of-fit test for the bivariate Hermite distribution
Dynamical fitness models: evidence of universality classes for preferential attachment graphs
Roman and Vatican Crossover Designs
Modelling dependence within and across run-off triangles for claims reserving
PointPWC-Net: A Coarse-to-Fine Network for Supervised and Self-Supervised Scene Flow Estimation on 3D Point Clouds
PREDICT & CLUSTER: Unsupervised Skeleton Based Action Recognition
Automatic Generation of Headlines for Online Math Questions
Model-Aware Deep Architectures for One-Bit Compressive Variational Autoencoding
Roundtrip Spanners with $(2k-1)$ Stretch
Some properties of $k$-bonacci words on infinite alphabet
A Framework for Weighted-Sum Energy Efficiency Maximization in Wireless Networks
AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning
Learning with less data via Weakly Labeled Patch Classification in Digital Pathology
Conditional Hierarchical Bayesian Tucker Decomposition
Causal inference of hazard ratio based on propensity score matching
On the choice of initial guesses for the Newton-Raphson algorithm
DC Optimal Power Flow with Joint Chance Constraints
Improving Model Robustness Using Causal Knowledge
A New Inventory Control Approach For Considering Customer Classes In An Integrated Supply Chain Management
Calibrationless Parallel MRI using Model based Deep Learning (C-MODL)
Integral equalities and inequalities: a proxy-measure for multivariate sensitivity analysis
Modelling publication bias and p-hacking
QubitHD: A Stochastic Acceleration Method for HD Computing-Based Machine Learning
Serverless seismic imaging in the cloud
Soft Anchor-Point Object Detection
Empirical Upper-bound in Object Detection and More
Counting stationary points of the loss function in the simplest constrained least-square optimization
Collecting Charges for Ad Impact on User Experience for Different Price Types
Some Algebraic Properties of Lecture Hall Polytopes
Two-Stage Learning for Uplink Channel Estimation in One-Bit Massive MIMO
Information-Geometric Set Embeddings (IGSE): From Sets to Probability Distributions
Words With Few Palindromes, Revisited
3D Shape Completion with Multi-view Consistent Inference
Analysis of Hydrological and Suspended Sediment Events from Mad River Wastershed using Multivariate Time Series Clustering
Towards Reliable Evaluation of Road Network Reconstructions
Spectrum Cartography via Coupled Block-Term Tensor Decomposition
Large independent sets in triangle-free cubic graphs: beyond planarity
Manipulating Elections by Selecting Issues
Bayesian Optimization for Categorical and Category-Specific Continuous Inputs
Qini-based Uplift Regression
All you need is a good representation: A multi-level and classifier-centric representation for few-shot learning
Metre as a stylometric feature in Latin hexameter poetry
On the $q$-Dyson orthogonality problem
Designing the Next Generation of Intelligent Personal Robotic Assistants for the Physically Impaired
Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition
Stable chimeras of non-locally coupled Kuramoto-Sakaguchi oscillators in a finite array
An End-to-end Framework for Unconstrained Monocular 3D Hand Pose Estimation
Deep Learning for Optimal Deployment of UAVs with Visible Light Communications
System Integration and Control Design of a Maglev Platform for Space Vibration Isolation
Stigmergic Independent Reinforcement Learning for Multi-Agent Collaboration
Augmentation Methods on Monophonic Audio for Instrument Classification in Polyphonic Music
Time-Guided High-Order Attention Model of Longitudinal Heterogeneous Healthcare Data
Error Resilient Deep Compressive Sensing
Action Recognition via Pose-Based Graph Convolutional Networks with Intermediate Dense Supervision
Generalized Constructions of Complementary Sets of Sequences of Lengths Non-Power-of-Two
Algorithmic Improvements for Deep Reinforcement Learning applied to Interactive Fiction
Rethinking Temporal Fusion for Video-based Person Re-identification on Semantic and Time Aspect
Product Sequencing and Pricing under Cascade Browse Model
Palmprint Recognition in Uncontrolled and Uncooperative Environment
Optimal and Adaptive Estimation of Extreme Values in the Permuted Monotone Matrix Model
A Discriminative Learned CNN Embedding For Remote Senseing Image Scene Classification
Analysis of Asymptotic Escape of Strict Saddle Sets in Manifold Optimization
Schur Polynomials do not have small formulas if the Determinant doesn’t!
New constructions of cooperative MSR codes: Reducing node size to $\exp(O(n))$
Sparse-GAN: Sparsity-constrained Generative Adversarial Network for Anomaly Detection in Retinal OCT Image
One-Shot Object Detection with Co-Attention and Co-Excitation
Diffusion Maps for Embedded Manifolds with Boundary with Applications to PDEs
Abnormal source identification for parabolic distributed parameter systems
A Simple Variational Bayes Detector for Orthogonal Time Frequency Space (OTFS) Modulation
U-CNNpred: A Universal CNN-based Predictor for Stock Markets
Least $Q$-eigenvalues of nonbipartite 2-connected graphs
A note on algebraic connectivity of 2-connected graphs
Language-Independent Sentiment Analysis Using Subjectivity and Positional Information
An accelerated first-order method with complexity analysis for solving cubic regularization subproblems
Cycle-Consistent Adversarial Networks for Realistic Pervasive Change Generation in Remote Sensing Imagery
DiscoTK: Using Discourse Structure for Machine Translation Evaluation
A Data Driven Approach to Learning The Hamiltonian Matrix in Quantum Mechanics
An Efficient Multi-Domain Framework for Image-to-Image Translation
Augmented Random Search for Quadcopter Control: An alternative to Reinforcement Learning
Improving Neural Relation Extraction with Positive and Unlabeled Learning
Addressing Time Bias in Bipartite Graph Ranking for Important Node Identification
KPTimes: A Large-Scale Dataset for Keyphrase Generation on News Documents
Sketching for Motzkin’s Iterative Method for Linear Systems
Free-riders in Federated Learning: Attacks and Defenses
Hydrodynamics for the partial exclusion process in random environment
Weapon-Target Assignment Problem with Interference Constraints using Mixed-Integer Linear Programming
Optimal Estimation of Change in a Population of Parameters
Emotion helps Sentiment: A Multi-task Model for Sentiment and Emotion Analysis
Deep Model-Based Reinforcement Learning via Estimated Uncertainty and Conservative Policy Optimization
A New Corpus for Low-Resourced Sindhi Language with Word Embeddings
Stable Learning via Sample Reweighting
Reaction Asymmetries to Social Responsibility Index Recomposition: A Matching Portfolio Approach
FairPrep: Promoting Data to a First-Class Citizen in Studies on Fairness-Enhancing Interventions
AutoRemover: Automatic Object Removal for Autonomous Driving Videos
Challenges of Scaled Agile for Safety-Critical Systems
Computer Systems Have 99 Problems, Let’s Not Make Machine Learning Another One
Factorization graphs of finite groups
An Integrated Early Warning System for Stock Market Turbulence
Lidar-Camera Co-Training for Semi-Supervised Road Detection
Online Pricing with Reserve Price Constraint for Personal Data Markets
Homogenization with fractional random fields
A Generalization Theory based on Independent and Task-Identically Distributed Assumption
The Weighted Tsetlin Machine: Compressed Representations with Weighted Clauses
HyperTraPS: Inferring probabilistic patterns of trait acquisition in evolutionary and disease progression pathways
A note on the Lomax distribution
Data Transmission based on Exact Inverse Periodic Nonlinear Fourier Transform, Part I: Theory
Data Transmission based on Exact Inverse Periodic Nonlinear Fourier Transform, Part II: Waveform Design and Experiment
A Case for the Score: Identifying Image Anomalies using Variational Autoencoder Gradients
Unsupervised Neural Mask Estimator For Generalized Eigen-Value Beamforming Based ASR
Machine learning for music genre: multifaceted review and experimentation with audioset
Estimation of Blood Glucose Level of Type-2 Diabetes Patients using Smartphone Video
A review and evaluation of standard methods to handle missing data on time-varying confounders in marginal structural models
Learning restrictions on admissible switching signals for switched systems
Legal document retrieval across languages: topic hierarchies based on synsets
Quantum Lower Bounds for 2D-Grid and Dyck Language
Self-Supervised Unconstrained Illumination Invariant Representation
Predicting Performance of Software Configurations: There is no Silver Bullet
Individual-based models under various time-scales
Multiple quadrotors carrying a flexible hose: dynamics, differential flatness and control
Maestro: A Python library for multi-carrier energy district optimal control design
System Identification for Hybrid Systems using Neural Networks
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Improved cross-validation for classifiers that make algorithmic choices to minimise runtime without compromising output correctness
xMUDA: Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation
Optimized Runge-Kutta (LDDRK) timestepping schemes for non-constant-amplitude oscillations
Patch Reordering: a Novel Way to Achieve Rotation and Translation Invariance in Convolutional Neural Networks
Moment Propagation of Discrete-Time Stochastic Polynomial Systems using Truncated Carleman Linearization
A novel classification-selection approach for the self updating of template-based face recognition systems
How to Efficiently Handle Complex Values? Implementing Decision Diagrams for Quantum Computing
Transform-Invariant Convolutional Neural Networks for Image Classification and Search
On a characterisation theorem for $a$-adic solenoids
Phase transition for the volume of high-dimensional random polytopes
Joint User Association and Resource Allocation in the Uplink of Heterogeneous Networks
Conley-Morse-Forman theory for generalized combinatorial multivector fields on finite topological spaces
Comparative Study of Differentially Private Synthetic Data Algorithms and Evaluation Standards
Cameras Viewing Cameras Geometry
Generalized Guerra’s interpolation schemes for dense associative neural networks
Continuous Adaptation for Interactive Object Segmentation by Learning from Corrections
Lockless Transaction Isolation in Hyperledger Fabric
HS-CAI: A Hybrid DCOP Algorithm via Combining Search with Context-based Inference
Testing for high frequency features in a noisy signal
Mixture-Model-based Bounding Box Density Estimation for Object Detection
A Fine-grained Sentiment Dataset for Norwegian
Distributed estimation of principal support vector machines for sufficient dimension reduction
Diversity-Aware Vehicle Motion Prediction via Latent Semantic Sampling
LL(1) Parsing with Derivatives and Zippers
Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow
Data-Driven Compression of Convolutional Neural Networks
Asymptotics and approximation of large systems of ordinary differential equations
ASR is all you need: cross-modal distillation for lip reading
Hexagon tilings of the plane that are not edge-to-edge
Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech
A reduction formula for Waring’s numbers through generalized Paley graphs
Dividing and Conquering Cross-Modal Recipe Retrieval: from Nearest Neighbours Baselines to SoTA
A Chevalley formula for semi-infinite flag manifolds and quantum K-theory (Extended abstract)
Effective Sub-clonal Cancer Representation to Predict Tumor Evolution
Distributed payoff allocation in coalitional games via time varying paracontractions
Interpreting Epsilon of Differential Privacy in Terms of Advantage in Guessing or Approximating Sensitive Attributes
PERMUTATION Strikes Back: The Power of Recourse in Online Metric Matching
Inference under random limit bootstrap measures
Detection and Mitigation of Rare Subclasses in Neural Network Classifiers
Elliptic and $q$-analogs of the Fibonomial numbers
Comparing Offline and Online Testing of Deep Neural Networks: An Autonomous Car Case Study
On the Penalty term for the Mixed Discontinuous Galerkin Finite Element Method for the Biharmonic Equation
Integro-differential equations linked to compound birth processes with infinitely divisible addends
Partition and Cohen-Macaulay Extenders
Discontinuous Galerkin Finite Element Methods for 1D Rosenau Equation
Light-weight Calibrator: a Separable Component for Unsupervised Domain Adaptation
Rate Analysis of Cell-Free Massive MIMO-NOMA With Three Linear Precoders
Multimodal Machine Translation through Visuals and Speech
Marked Gibbs point processes with unbounded interaction: an existence result
Motion Equivariance of Event-based Camera Data with the Temporal Normalization Transform
Data-Driven Spectrum Cartography via Deep Completion Autoencoders
Neural Network-Inspired Analog-to-Digital Conversion to Achieve Super-Resolution with Low-Precision RRAM Devices
Equivalence Relations for Computing Permutation Polynomials
Option-critic in cooperative multi-agent systems

What’s new on arXiv

On the Legal Compatibility of Fairness Definitions

Past literature has been effective in demonstrating ideological gaps in machine learning (ML) fairness definitions when considering their use in complex socio-technical systems. However, we go further to demonstrate that these definitions often misunderstand the legal concepts from which they purport to be inspired, and consequently inappropriately co-opt legal language. In this paper, we demonstrate examples of this misalignment and discuss the differences in ML terminology and their legal counterparts, as well as what both the legal and ML fairness communities can learn from these tensions. We focus this paper on U.S. anti-discrimination law since the ML fairness research community regularly references terms from this body of law.

Domain-Aware Dynamic Networks

Deep neural networks with more parameters and FLOPs have higher capacity and generalize better to diverse domains. But to be deployed on edge devices, the model’s complexity has to be constrained due to limited compute resources. In this work, we propose a method to improve the model capacity without increasing inference-time complexity. Our method is based on an assumption of data locality: for an edge device, within a short period of time, the input data to the device are sampled from a single domain with relatively low diversity. Therefore, it is possible to utilize a specialized, low-complexity model to achieve good performance in that input domain. To leverage this, we propose Domain-aware Dynamic Network (DDN), which is a high-capacity dynamic network in which each layer contains multiple weights. During inference, based on the input domain, DDN dynamically combines those weights into one single weight that specializes in the given domain. This way, DDN can keep the inference-time complexity low but still maintain a high capacity. Experiments show that without increasing the parameters, FLOPs, and actual latency, DDN achieves up to 2.6% higher AP50 than a static network on the BDD100K object-detection benchmark.
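
The weight-combination step described in this abstract can be pictured with a minimal sketch in plain Python. The softmax mixing over per-domain scores, and the names `combine_weights` and `domain_scores`, are assumptions made for illustration; the abstract does not say how the combination coefficients are obtained.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive coefficients that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def combine_weights(candidate_weights, domain_scores):
    """Blend several same-shaped weight matrices into a single matrix.

    candidate_weights: list of K matrices (lists of rows), all the same shape.
    domain_scores: K hypothetical scores describing the current input domain.
    """
    coeffs = softmax(domain_scores)
    rows, cols = len(candidate_weights[0]), len(candidate_weights[0][0])
    combined = [[0.0] * cols for _ in range(rows)]
    for coeff, weight in zip(coeffs, candidate_weights):
        for i in range(rows):
            for j in range(cols):
                combined[i][j] += coeff * weight[i][j]
    return combined

def linear(weight, x):
    """Apply one (already combined) weight matrix to an input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

# Two candidate weights for one layer; the scores strongly favour the first.
w_a = [[1.0, 0.0], [0.0, 1.0]]
w_b = [[0.0, 1.0], [1.0, 0.0]]
layer_weight = combine_weights([w_a, w_b], domain_scores=[10.0, -10.0])
output = linear(layer_weight, [2.0, 3.0])
```

Once the weights are combined, the layer is applied exactly like a single static layer, which is one way to read the claim that capacity grows while inference-time cost stays flat.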

ModelHub.AI: Dissemination Platform for Deep Learning Models

Recent advances in artificial intelligence research have led to a profusion of studies that apply deep learning to problems in image analysis and natural language processing among others. Additionally, the availability of open-source computational frameworks has lowered the barriers to implementing state-of-the-art methods across multiple domains. Albeit leading to major performance breakthroughs in some tasks, effective dissemination of deep learning algorithms remains challenging, inhibiting reproducibility and benchmarking studies, impeding further validation, and ultimately hindering their effectiveness in cumulative scientific progress. In developing a platform for sharing research outputs, we present ModelHub.AI, a community-driven container-based software engine and platform for the structured dissemination of deep learning models. For contributors, the engine controls data flow throughout the inference cycle, while the contributor-facing standard template exposes model-specific functions including inference, as well as pre- and post-processing. Python and RESTful application programming interfaces (APIs) enable users to interact with models hosted on ModelHub.AI and allow both researchers and developers to utilize models out-of-the-box. ModelHub.AI is domain-, data-, and framework-agnostic, catering to different workflows and contributors’ preferences.

Entropy, mutual information, and systematic measures of structured spiking neural networks

The aim of this paper is to investigate various information-theoretic measures, including entropy, mutual information, and some systematic measures that are based on mutual information, for a class of structured spiking neuronal networks. In order to analyze and compute these information-theoretic measures for large networks, we coarse-grain the data by ignoring the order of spikes that fall into the same small time bin. The resultant coarse-grained entropy mainly captures the information contained in the rhythm produced by a local population of the network. We first prove that these information-theoretic measures are well-defined and computable by proving stochastic stability and the law of large numbers. Then we use three neuronal network examples, from simple to complex, to investigate these information-theoretic measures. Several analytical and computational results about properties of these information-theoretic measures are given.

To Trust, or Not to Trust? A Study of Human Bias in Automated Video Interview Assessments

Supervised systems require human labels for training. But, are humans themselves always impartial during the annotation process? We examine this question in the context of automated assessment of human behavioral tasks. Specifically, we investigate whether human ratings themselves can be trusted at their face value when scoring video-based structured interviews, and whether such ratings can impact machine learning models that use them as training data. We present preliminary empirical evidence that indicates there might be biases in such annotations, most of which are visual in nature.

Defining and Unpacking Transformative AI

Recently the concept of transformative AI (TAI) has begun to receive attention in the AI policy space. TAI is often framed as an alternative formulation to notions of strong AI (e.g. artificial general intelligence or superintelligence) and reflects increasing consensus that advanced AI which does not fit these definitions may nonetheless have extreme and long-lasting impacts on society. However, the term TAI is poorly defined and often used ambiguously. Some use the notion of TAI to describe levels of societal transformation associated with previous ‘general purpose technologies’ (GPTs) such as electricity or the internal combustion engine. Others use the term to refer to more drastic levels of transformation comparable to the agricultural or industrial revolutions. The notion has also been used much more loosely, with some implying that current AI systems are already having a transformative impact on society. This paper unpacks and analyses the notion of TAI, proposing a distinction between TAI and radically transformative AI (RTAI), roughly corresponding to societal change on the level of the agricultural or industrial revolutions. We describe some relevant dimensions associated with each and discuss what kinds of advances in capabilities they might require. We further consider the relationship between TAI and RTAI and whether we should necessarily expect a period of TAI to precede the emergence of RTAI. This analysis is important as it can help guide discussions among AI policy researchers about how to allocate resources towards mitigating the most extreme impacts of AI and it can bring attention to negative TAI scenarios that are currently neglected.

FT-SWRL: A Fuzzy-Temporal Extension of Semantic Web Rule Language

We present FT-SWRL, a fuzzy temporal extension to the Semantic Web Rule Language (SWRL), which combines fuzzy theories based on the valid-time temporal model to provide a standard approach for modeling imprecise temporal domain knowledge in OWL ontologies. The proposal introduces a fuzzy temporal model for the semantic web, which is syntactically defined as a fuzzy temporal SWRL ontology (SWRL-FTO) with a new set of fuzzy temporal SWRL built-ins for defining their semantics. The SWRL-FTO hierarchically defines the necessary linguistic terminologies and variables for the fuzzy temporal model. An example model demonstrating the usefulness of the fuzzy temporal SWRL built-ins for modeling imprecise temporal information is also presented. The fuzzification process of interval-based temporal logic is further discussed as a reasoning paradigm for our FT-SWRL rules, with the aim of achieving complete OWL-based fuzzy temporal reasoning. A literature review of fuzzy temporal representation approaches, both within and without the use of ontologies, led to the conclusion that the FT-SWRL model can authoritatively serve as a formal specification for handling imprecise temporal expressions on the semantic web.

The Distance Matching Problem

This paper introduces the $d$-distance matching problem, in which we are given a bipartite graph $G=(S,T;E)$ with $S=\{s_1,\dots,s_n\}$, a weight function on the edges and an integer $d\in\mathbb{N}$. The goal is to find a maximum weight subset $M\subseteq E$ of the edges satisfying the following two conditions: i) the degree of every node of $S$ is at most one in $M$, ii) if $ts_i,ts_j\in M$ for some $i<j$, then $j-i\geq d$. The question finds applications, for example, in various scheduling problems. We show that the problem is NP-complete in general and admits a simple 3-approximation. We give an FPT algorithm parameterized by $d$ and also settle the case when the size of $T$ is constant. From an approximability point of view, we consider several greedy approaches. In particular, a local search algorithm is presented that achieves an approximation ratio of $3/2+\epsilon$ for any constant $\epsilon>0$ in the unweighted case. We show that the integrality gap of the natural integer programming model is at most $2-\frac{1}{2d-1}$, and give an LP-based approximation algorithm for the weighted case with the same guarantee. The novel approaches used in the analysis of the integrality gap and the approximation ratio of locally optimal solutions might be of independent combinatorial interest.
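
To make the two feasibility conditions in this abstract concrete, here is a small Python sketch: a checker for conditions i) and ii), plus an exhaustive search that is only practical for toy instances. The edge encoding `(t, i)` for an edge between `t` and `s_i`, the function names, and the example weights are all assumptions for illustration; none of the paper's approximation or FPT algorithms are reproduced here.

```python
from itertools import combinations

def is_d_distance_matching(matching, d):
    """Check the two conditions from the problem statement.

    matching: iterable of edges (t, i), meaning t is matched to s_i.
    """
    seen_s = set()
    by_t = {}
    for t, i in matching:
        if i in seen_s:              # condition i): degree of s_i is at most one
            return False
        seen_s.add(i)
        by_t.setdefault(t, []).append(i)
    for indices in by_t.values():    # condition ii): index gaps of at least d per t
        indices.sort()
        for a, b in zip(indices, indices[1:]):
            if b - a < d:
                return False
    return True

def brute_force_max_weight(weights, d):
    """Exhaustive search, exponential in the number of edges; toy sizes only.

    weights: dict mapping each edge (t, i) to its weight.
    """
    edges = list(weights)
    best, best_weight = (), 0
    for k in range(1, len(edges) + 1):
        for subset in combinations(edges, k):
            if is_d_distance_matching(subset, d):
                total = sum(weights[e] for e in subset)
                if total > best_weight:
                    best, best_weight = subset, total
    return set(best), best_weight

# Hypothetical instance: S = {s_1, ..., s_4}, T = {"a", "b"}.
example_weights = {("a", 1): 3, ("a", 2): 4, ("a", 3): 2, ("b", 2): 1, ("b", 4): 5}
best_edges, best_total = brute_force_max_weight(example_weights, d=2)
```

Checking only consecutive indices per node of `T` is enough for condition ii), because if every consecutive gap is at least `d`, every larger gap is too.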

AR-Net: A simple Auto-Regressive Neural Network for time-series

In this paper we present a new framework for time-series modeling that combines the best of traditional statistical models and neural networks. We focus on time-series with long-range dependencies, needed for monitoring fine granularity data (e.g. minutes, seconds, milliseconds), prevalent in operational use-cases. Traditional models, such as auto-regression fitted with least squares (Classic-AR), can model time-series with a concise and interpretable model. When dealing with long-range dependencies, Classic-AR models can become intractably slow to fit for large data. Recently, sequence-to-sequence models, such as Recurrent Neural Networks, which were originally intended for natural language processing, have become popular for time-series. However, they can be overly complex for typical time-series data and lack interpretability. A scalable and interpretable model is needed to bridge the statistical and deep learning-based approaches. As a first step towards this goal, we propose modelling AR-process dynamics using a feed-forward neural network approach, termed AR-Net. We show that AR-Net is as interpretable as Classic-AR but also scales to long-range dependencies. Our results lead to three major conclusions: First, AR-Net learns the same AR-coefficients as Classic-AR, thus being equally interpretable. Second, the computational complexity with respect to the order of the AR process is linear for AR-Net, as compared to quadratic for Classic-AR. This makes it possible to model long-range dependencies within fine granularity data. Third, by introducing regularization, AR-Net automatically selects and learns sparse AR-coefficients. This eliminates the need to know the exact order of the AR-process and allows learning sparse weights for a model with long-range dependencies.
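
As a minimal sketch of the idea in this abstract (not the authors' implementation), the snippet below treats an AR(p) model as a single linear layer without bias and fits its weights by gradient descent in plain Python. The simulated series, the coefficients 0.6 and -0.2, the learning rate, the step count, and the function names are all assumptions chosen for illustration.

```python
import random

def simulate_ar(coeffs, n, seed=0):
    """Generate x_t = sum_k coeffs[k] * x_{t-1-k} + standard Gaussian noise."""
    rng = random.Random(seed)
    p = len(coeffs)
    x = [0.0] * p                      # zero start-up values, dropped at the end
    for _ in range(n):
        past = x[-1:-p - 1:-1]         # most recent value first
        x.append(sum(c * v for c, v in zip(coeffs, past)) + rng.gauss(0.0, 1.0))
    return x[p:]

def fit_ar_gradient_descent(x, p, lr=0.1, steps=300):
    """Fit AR(p) weights by full-batch gradient descent on the mean squared error."""
    # Each row pairs the p previous values (most recent first) with the target.
    rows = [(x[t - p:t][::-1], x[t]) for t in range(p, len(x))]
    w = [0.0] * p
    for _ in range(steps):
        grad = [0.0] * p
        for lags, target in rows:
            err = sum(wk * v for wk, v in zip(w, lags)) - target
            for k in range(p):
                grad[k] += 2.0 * err * lags[k] / len(rows)
        w = [wk - lr * g for wk, g in zip(w, grad)]
    return w

series = simulate_ar([0.6, -0.2], n=2000)
weights = fit_ar_gradient_descent(series, p=2)
print(weights)  # expected to land near the generating coefficients
```

Because the layer is linear and has no hidden units, each learned weight can be read directly as an AR coefficient, which is the sense in which such a model stays as interpretable as a least-squares fit.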

Android Botnet Detection using Convolutional Neural Networks

Today, Android devices are able to provide various services. They support applications for different purposes such as entertainment, business, health, education, and banking services. Because of the functionality and popularity of Android devices as well as the open-source policy of Android OS, they have become a suitable target for attackers. Android Botnets are among the most dangerous kinds of malware because an attacker, called the Botmaster, can control them remotely to perform destructive attacks. A number of researchers have used different well-known Machine Learning (ML) methods to distinguish Android Botnets from benign applications. However, these conventional methods are not able to detect new sophisticated Android Botnets. In this paper, we propose a novel method based on Android permissions and Convolutional Neural Networks (CNNs) to classify Botnets and benign Android applications. As this is the first method that uses CNNs for this purpose, we also propose a novel way to represent each application as an image, which is constructed based on the co-occurrence of the permissions used in that application. The proposed CNN is a binary classifier that is trained using these images. Evaluated on 5450 Android applications consisting of Botnet and benign samples, the proposed method achieves an accuracy of 97.2% and a recall of 96%, which is a promising result given that it uses only Android permissions.
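
One plausible reading of the permission co-occurrence image mentioned in this abstract is sketched below in Python. The binary encoding, the tiny four-permission vocabulary, and the function name are assumptions for illustration; the paper's actual construction may differ in size, ordering, or pixel values.

```python
def cooccurrence_image(app_permissions, vocabulary):
    """Return a square 0/1 matrix: pixel (i, j) is 1 when the app requests
    both vocabulary[i] and vocabulary[j]."""
    used = set(app_permissions)
    flags = [1 if perm in used else 0 for perm in vocabulary]
    return [[a * b for b in flags] for a in flags]

# Hypothetical permission vocabulary and app, for illustration only.
VOCAB = ["INTERNET", "SEND_SMS", "READ_CONTACTS", "CAMERA"]
image = cooccurrence_image(["INTERNET", "SEND_SMS"], VOCAB)
for row in image:
    print(row)
```

A matrix like this has a fixed size regardless of the app, so it can be fed to a CNN-based binary classifier in the same way as a single-channel image.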

Product Knowledge Graph Embedding for E-commerce

In this paper, we propose a new product knowledge graph (PKG) embedding approach for learning the intrinsic product relations as product knowledge for e-commerce. We define the key entities and summarize the pivotal product relations that are critical for general e-commerce applications including marketing, advertisement, search ranking and recommendation. We first provide a comprehensive comparison between PKG and ordinary knowledge graph (KG) and then illustrate why KG embedding methods are not suitable for PKG learning. We construct a self-attention-enhanced distributed representation learning model for learning PKG embeddings from raw customer activity data in an end-to-end fashion. We design an effective multi-task learning schema to fully leverage the multi-modal e-commerce data. The Poincare embedding is also employed to handle complex entity structures. We use a real-world dataset to evaluate the performance on knowledge completion, search ranking and recommendation. The proposed approach compares favourably to baselines in knowledge completion and downstream tasks.

Dual-Attention Graph Convolutional Network

Graph convolutional networks (GCNs) have shown a powerful ability in text structure representation and effectively facilitate the task of text classification. However, challenges still exist in adapting GCNs to learn discriminative features from texts, mainly because of the graph variants incurred by textual complexity and diversity. In this paper, we propose a dual-attention GCN to model the structural information of various texts as well as tackle the graph-invariant problem by embedding two types of attention mechanisms, i.e. the connection-attention and hop-attention, into the classic GCN. To encode various connection patterns between neighbour words, connection-attention adaptively imposes different weights specific to the neighbourhoods of each word, which captures the short-term dependencies. On the other hand, the hop-attention applies scaled coefficients to different scopes during the graph diffusion process to make the model learn more about the distribution of context, which captures long-term semantics in an adaptive way. Extensive experiments are conducted on five widely used datasets to evaluate our dual-attention GCN, and the achieved state-of-the-art performance verifies the effectiveness of dual-attention mechanisms.

QKD: Quantization-aware Knowledge Distillation

Quantization and Knowledge distillation (KD) methods are widely used to reduce memory and power consumption of deep neural networks (DNNs), especially for resource-constrained edge devices. Although their combination is quite promising to meet these requirements, it may not work as desired. This is mainly because the regularization effect of KD further diminishes the already reduced representation power of a quantized model. To address this shortcoming, we propose Quantization-aware Knowledge Distillation (QKD), wherein quantization and KD are carefully coordinated in three phases. First, the Self-studying (SS) phase fine-tunes a quantized low-precision student network without KD to obtain a good initialization. Second, the Co-studying (CS) phase tries to train a teacher to make it more quantization-friendly and powerful than a fixed teacher. Finally, the Tutoring (TU) phase transfers knowledge from the trained teacher to the student. We extensively evaluate our method on ImageNet and CIFAR-10/100 datasets and show an ablation study on networks with both standard and depthwise-separable convolutions. The proposed QKD outperformed existing state-of-the-art methods (e.g., 1.3% improvement on ResNet-18 with W4A4, 2.6% on MobileNetV2 with W4A4). Additionally, QKD could recover the full-precision accuracy at as low as W3A3 quantization on ResNet and W6A6 quantization on MobileNetV2.

Data Augmentation for Deep Transfer Learning

Current approaches to deep learning are beginning to rely heavily on transfer learning as an effective method for reducing overfitting, improving model performance, and quickly learning new tasks. Similarly, such pre-trained models are often used to create embedding representations for various types of data, such as text and images, which can then be fed as input into separate, downstream models. However, in cases where such transfer learning models perform poorly (i.e., for data outside of the training distribution), one must resort to fine-tuning such models, or even retraining them completely. Currently, no form of data augmentation has been proposed that can be applied directly to embedding inputs to improve downstream model performance. In this work, we introduce four new types of data augmentation that are generally applicable to embedding inputs, thus making them useful in both Natural Language Processing (NLP) and Computer Vision (CV) applications. For models trained on downstream tasks with such embedding inputs, these augmentation methods are shown to improve the AUC score of the models from a score of 0.9582 to 0.9812 and significantly increase the model’s ability to identify classes of data that are not seen during training.

Unbiased Evaluation of Deep Metric Learning Algorithms

Deep metric learning (DML) is a popular approach for image retrieval, solving verification (same or not) problems and addressing open set classification. Arguably, the most common DML approach is with triplet loss, despite significant advances in the area of DML. Triplet loss suffers from several issues such as collapse of the embeddings, high sensitivity to sampling schemes and, more importantly, a lack of performance when compared to more modern methods. We attribute this adoption to a lack of fair comparisons between various methods and the difficulty in adopting them for novel problem statements. In this paper, we perform an unbiased comparison of the most popular DML baseline methods under the same conditions and, more importantly, without obfuscating any hyperparameter tuning or adjustment needed to favor a particular method. We find that under equal conditions several older methods perform significantly better than previously believed. In fact, our unified implementation of 12 recently introduced DML algorithms achieves state-of-the-art performance on the CUB200, CAR196, and Stanford Online Products datasets, which establishes a new set of baselines for future DML research. The codebase and all tuned hyperparameters will be open-sourced for reproducibility and to serve as a benchmark.

How Can We Know What Language Models Know?

Recent work has presented intriguing results examining the knowledge contained in language models (LM) by having the LM fill in the blanks of prompts such as ‘Obama is a _ by profession’. These prompts are usually manually created, and quite possibly sub-optimal; another prompt such as ‘Obama worked as a _’ may result in more accurately predicting the correct profession. Because of this, given an inappropriate prompt, we might fail to retrieve facts that the LM does know, and thus any given prompt only provides a lower bound estimate of the knowledge contained in an LM. In this paper, we attempt to more accurately estimate the knowledge contained in LMs by automatically discovering better prompts to use in this querying process. Specifically, we propose mining-based and paraphrasing-based methods to automatically generate high-quality and diverse prompts and ensemble methods to combine answers from different prompts. Extensive experiments on the LAMA benchmark for extracting relational knowledge from LMs demonstrate that our methods can improve accuracy from 31.1% to 38.1%, providing a tighter lower bound on what LMs know. We have released the code and the resulting LM Prompt And Query Archive (LPAQA) at https://…/LPAQA.

Towards Privacy and Security of Deep Learning Systems: A Survey

Deep learning has gained tremendous success and great popularity in the past few years. However, recent research found that it suffers from several inherent weaknesses, which can threaten the security and privacy of the stakeholders. Deep learning’s wide use further magnifies the consequences. To this end, much research has been conducted with the purpose of exhaustively identifying intrinsic weaknesses and subsequently proposing feasible mitigations. Yet little is clear about how these weaknesses arise and how effective these attack approaches are in assaulting deep learning. In order to unveil the security weaknesses and aid in the development of a robust deep learning system, we undertake a comprehensive investigation of attacks on deep learning and extensively evaluate these attacks from multiple views. In particular, we focus on four types of attacks associated with security and privacy of deep learning: model extraction attack, model inversion attack, poisoning attack and adversarial attack. For each type of attack, we construct its essential workflow as well as adversary capabilities and attack goals. Many pivot metrics are devised for evaluating the attack approaches, by which we perform a quantitative and qualitative analysis. From the analysis, we have identified significant and indispensable factors in an attack vector, e.g., how to reduce queries to target models and what distance to use for measuring perturbation. We spotlight 17 findings covering these approaches’ merits and demerits, success probability, deployment complexity and prospects. Moreover, we discuss other potential security weaknesses and possible mitigations, which can inspire relevant researchers in this area.

Multi-Agent Deep Reinforcement Learning with Adaptive Policies

We propose a novel approach to address one aspect of the non-stationarity problem in multi-agent reinforcement learning (RL), where the other agents may alter their policies due to environment changes during execution. This violates the Markov assumption that governs most single-agent RL methods and is one of the key challenges in multi-agent RL. To tackle this, we propose to train multiple policies for each agent and postpone the selection of the best policy until execution time. Specifically, we model the environment non-stationarity with a finite set of scenarios and train policies fitting each scenario. In addition to multiple policies, each agent also learns a policy predictor to determine which policy is the best given its local information. By doing so, each agent is able to adapt its policy when the environment changes and, consequently, the other agents alter their policies during execution. We empirically evaluated our method on a variety of common benchmark problems proposed for multi-agent deep RL in the literature. Our experimental results show that the agents trained by our algorithm have better adaptiveness in changing environments and outperform the state-of-the-art methods in all the tested environments.

Understand Dynamic Regret with Switching Cost for Online Decision Making

As a metric to measure the performance of an online method, dynamic regret with switching cost has drawn much attention for online decision making problems. Although sublinear regret has been established in much previous research, we still have little knowledge about the relation between the dynamic regret and the switching cost. In this paper, we investigate the relation for two classic online settings: Online Algorithms (OA) and Online Convex Optimization (OCO). We provide a new theoretical analysis framework, which yields an interesting observation: the relation between the switching cost and the dynamic regret differs between the OA and OCO settings. Specifically, the switching cost has a significant impact on the dynamic regret in the setting of OA. However, it does not have an impact on the dynamic regret in the setting of OCO. Furthermore, we provide a lower bound of regret for the setting of OCO, which is the same as the lower bound in the case of no switching cost. It shows that the switching cost does not change the difficulty of online decision making problems in the setting of OCO.

QPDAS: Dual Active Set Solver for Mixed Constraint Quadratic Programming

We present a method for solving the general mixed constrained convex quadratic programming problem using an active set method on the dual problem. The approach is similar to existing active set methods, but we present a new way of solving the linear systems arising in the algorithm. There are two main contributions: we present a new way of factorizing the linear systems, and we show how iterative refinement can be used to achieve good accuracy and to solve both types of sub-problems that arise from semi-definite problems.

D-SPIDER-SFO: A Decentralized Optimization Algorithm with Faster Convergence Rate for Nonconvex Problems

Decentralized optimization algorithms have attracted intense interest recently, as they have a balanced communication pattern, especially when solving large-scale machine learning problems. The Stochastic Path Integrated Differential Estimator Stochastic First-Order method (SPIDER-SFO) nearly achieves the algorithmic lower bound in certain regimes for nonconvex problems. However, whether we can find a decentralized algorithm which achieves a similar convergence rate to SPIDER-SFO is still unclear. To tackle this problem, we propose a decentralized variant of SPIDER-SFO, called decentralized SPIDER-SFO (D-SPIDER-SFO). We show that D-SPIDER-SFO achieves a gradient computation cost similar to that of its centralized counterpart, namely \mathcal{O}(\epsilon^{-3}) for finding an \epsilon-approximate first-order stationary point. To the best of our knowledge, D-SPIDER-SFO achieves the state-of-the-art performance for solving nonconvex optimization problems on decentralized networks in terms of the computational cost. Experiments on different network configurations demonstrate the efficiency of the proposed method.
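
To make the idea behind the centralized SPIDER estimator concrete, here is a minimal single-machine sketch. It is not the decentralized algorithm from the paper, and it uses a plain constant step rather than the normalized step of the original analysis. The function name `spider_minimize`, its parameters, and the toy least-squares problem are all assumptions made for illustration.

```python
import random

def spider_minimize(grad_i, n, x0, step=0.1, iters=200, q=20, batch=5, seed=0):
    """SPIDER-style recursive gradient estimator for a scalar parameter.

    grad_i(x, i) returns the gradient of the i-th component function at x.
    Every q iterations the estimate is refreshed with a full gradient; in
    between it is corrected with mini-batch gradient differences.
    """
    rng = random.Random(seed)
    x_prev = x0
    x = x0
    v = 0.0
    for k in range(iters):
        if k % q == 0:
            # Periodic refresh: full gradient over all n components.
            v = sum(grad_i(x, i) for i in range(n)) / n
        else:
            # Recursive update: add the mini-batch estimate of
            # grad f(x) - grad f(x_prev) to the previous estimate.
            idx = [rng.randrange(n) for _ in range(batch)]
            v = v + sum(grad_i(x, i) - grad_i(x_prev, i) for i in idx) / batch
        x_prev, x = x, x - step * v
    return x

# Toy finite sum: f(x) = (1/n) * sum_i 0.5 * (a_i * x - b_i)^2, minimised at x = 2.
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]

def grad_i(x, i):
    return a[i] * (a[i] * x - b[i])

x_hat = spider_minimize(grad_i, n=len(a), x0=0.0)
```

The point of the recursion is that most iterations touch only a small batch of components, while the occasional full-gradient refresh keeps the accumulated estimation error in check.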

RETRO: Relation Retrofitting For In-Database Machine Learning on Textual Data

There are massive amounts of textual data residing in databases, valuable for many machine learning (ML) tasks. Since ML techniques depend on numerical input representations, word embeddings are increasingly utilized to convert symbolic representations such as text into meaningful numbers. However, a naive one-to-one mapping of each word in a database to a word embedding vector is not sufficient and would lead to poor accuracy in ML tasks. Thus, we argue for additionally incorporating the information given by the database schema into the embedding, e.g. which words appear in the same column or are related to each other. In this paper, we propose RETRO (RElational reTROfitting), a novel approach to learn numerical representations of text values in databases, capturing the best of both worlds: the rich information encoded by word embeddings and the relational information encoded by database tables. We formulate relation retrofitting as a learning problem and present an efficient algorithm solving it. We investigate the impact of various hyperparameters on the learning problem and derive good settings for all of them. Our evaluation shows that the proposed embeddings are ready-to-use for many ML tasks such as classification and regression and even outperform state-of-the-art techniques in integration tasks such as null value imputation and link prediction.

Continuous Dropout

Dropout has been proven to be an effective algorithm for training robust deep networks because of its ability to prevent overfitting by avoiding the co-adaptation of feature detectors. Current explanations of dropout include bagging, naive Bayes, regularization, and sex in evolution. According to the activation patterns of neurons in the human brain, when faced with different situations, the firing rates of neurons are random and continuous, not binary as in current dropout. Inspired by this phenomenon, we extend the traditional binary dropout to continuous dropout. On the one hand, continuous dropout is considerably closer to the activation characteristics of neurons in the human brain than traditional binary dropout. On the other hand, we demonstrate that continuous dropout has the property of avoiding the co-adaptation of feature detectors, which suggests that we can extract more independent feature detectors for model averaging in the test stage. We introduce the proposed continuous dropout to a feedforward neural network and comprehensively compare it with binary dropout, adaptive dropout, and DropConnect on MNIST, CIFAR-10, SVHN, NORB, and ILSVRC-12. Thorough experiments demonstrate that our method performs better in preventing the co-adaptation of feature detectors and improves test performance. The code is available at: https://…/dropout.
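
The contrast between a binary and a continuous mask can be sketched in a few lines. This is only an illustration of the idea, assuming a mask drawn uniformly from [0, 1); the abstract does not specify the mask distribution, and the function names below are hypothetical rather than taken from the authors' code.

```python
import random

def binary_dropout(activations, keep_prob=0.5, rng=random):
    """Classic (inverted) dropout: each unit is kept or zeroed by a coin flip."""
    return [
        a / keep_prob if rng.random() < keep_prob else 0.0
        for a in activations
    ]

def continuous_dropout(activations, rng=random):
    """Continuous variant (assumed here): scale each unit by a mask in [0, 1)."""
    return [a * rng.random() for a in activations]

rng = random.Random(0)
hidden = [1.0] * 1000
binary_out = binary_dropout(hidden, keep_prob=0.5, rng=rng)
continuous_out = continuous_dropout(hidden, rng=rng)
```

With a binary mask every unit is either fully on or fully off, whereas the continuous mask attenuates each unit by a random amount. At test time the activations would typically be scaled by the mean of the mask instead of being sampled.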

Neural networks with redundant representation: detecting the undetectable

We consider a three-layer Sejnowski machine and show that features learnt via contrastive divergence have a dual representation as patterns in a dense associative memory of order P=4. The latter is known to be able to Hebbian-store a number of patterns scaling as N^{P-1}, where N denotes the number of constituting binary neurons interacting P-wisely. We also prove that, by keeping the dense associative network far from the saturation regime (namely, allowing for a number of patterns scaling only linearly with N, while P>2) such a system is able to perform pattern recognition far below the standard signal-to-noise threshold. In particular, a network with P=4 is able to retrieve information whose intensity is O(1) even in the presence of a noise O(\sqrt{N}) in the large N limit. This striking skill stems from a redundant representation of patterns, which is afforded by the (relatively) low-load information storage, and it helps explain the impressive abilities in pattern recognition exhibited by new-generation neural networks. The whole theory is developed rigorously, at the replica symmetric level of approximation, and corroborated by signal-to-noise analysis and Monte Carlo simulations.

Detection of Derivative Discontinuities in Observational Data

This paper presents a new approach to the detection of discontinuities in the n-th derivative of observational data. This is achieved by performing two polynomial approximations at each interstitial point. The polynomials are coupled by constraining their coefficients to ensure continuity of the model up to the (n-1)-th derivative; while yielding an estimate for the discontinuity of the n-th derivative. The coefficients of the polynomials correspond directly to the derivatives of the approximations at the interstitial points through the prudent selection of a common coordinate system. The approximation residual and extrapolation errors are investigated as measures for detecting discontinuity. This is necessary since discrete observations of continuous systems are discontinuous at every point. It is proven, using matrix algebra, that positive extrema in the combined approximation-extrapolation error correspond exactly to extrema in the difference of the Taylor coefficients. This provides a relative measure for the severity of the discontinuity in the observational data. The matrix algebraic derivations are provided for all aspects of the methods presented here; this includes a solution for the covariance propagation through the computation. The performance of the method is verified with a Monte Carlo simulation using synthetic piecewise polynomial data with known discontinuities. It is also demonstrated that the discontinuities are suitable as knots for B-spline modelling of data. For completeness, the results of applying the method to sensor data acquired during the monitoring of heavy machinery are presented.
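
A much-simplified sketch of the underlying idea, for the first derivative only: fit one straight line to the samples left of each point and another to the samples right of it, then compare the slopes. Unlike the method in the abstract, the two fits here are independent (no coupling constraints and no covariance propagation), and the helper names and window size are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def slope_jumps(xs, ys, half_window=5):
    """For each interior point, return (x, right-hand slope minus left-hand slope)."""
    jumps = []
    for i in range(half_window, len(xs) - half_window):
        left_slope, _ = fit_line(xs[i - half_window:i + 1], ys[i - half_window:i + 1])
        right_slope, _ = fit_line(xs[i:i + half_window + 1], ys[i:i + half_window + 1])
        jumps.append((xs[i], right_slope - left_slope))
    return jumps

# Synthetic data with a kink (first-derivative discontinuity) at x = 2.
xs = [i * 0.1 for i in range(41)]
ys = [abs(x - 2.0) for x in xs]

jumps = slope_jumps(xs, ys)
kink_x, kink_jump = max(jumps, key=lambda p: abs(p[1]))
```

The location with the largest absolute slope difference is the candidate discontinuity, and the size of the difference gives a rough measure of its severity.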

Inducing Relational Knowledge from BERT

One of the most remarkable properties of word embeddings is the fact that they capture certain types of semantic and syntactic relationships. Recently, pre-trained language models such as BERT have achieved groundbreaking results across a wide range of Natural Language Processing tasks. However, it is unclear to what extent such models capture relational knowledge beyond what is already captured by standard word embeddings. To explore this question, we propose a methodology for distilling relational knowledge from a pre-trained language model. Starting from a few seed instances of a given relation, we first use a large text corpus to find sentences that are likely to express this relation. We then use a subset of these extracted sentences as templates. Finally, we fine-tune a language model to predict whether a given word pair is likely to be an instance of some relation, when given an instantiated template for that relation as input.

Constraints in Gaussian Graphical Models

In this paper, we consider the problem of finding the constraints in bow-free acyclic directed mixed graphs (ADMGs). ADMGs are a generalisation of directed acyclic graphs (DAGs) that allow for certain latent variables. We first show that minimal generators for the ideal \I(G) containing all the constraints of a Gaussian ADMG G correspond precisely to the pairs of non-adjacent vertices in G. The proof of this theorem naturally leads to an efficient algorithm that fits a bow-free Gaussian ADMG by maximum likelihood. In particular, we can test for the goodness of fit of a given data set to a bow-free ADMG.

Modelling Load-Changing Attacks in Cyber-Physical Systems

Cyber-Physical Systems (CPS) are present in many settings and serve a myriad of purposes. Examples are Internet-of-Things (IoT) devices, sensing software embedded in appliances, or even specialised meters that measure and respond to electricity demands in smart grids. Due to their pervasive nature, they are often chosen as targets for larger-scope cyber-security attacks. These attacks promote system-wide disruptions and are directed towards one key aspect such as confidentiality, integrity, availability or a combination of those characteristics. Our paper focuses on a particular and distressing attack where coordinated, malware-infected IoT units are maliciously employed to synchronously turn high-wattage appliances on or off, affecting the grid’s primary control management. Our model could be extended to larger (smart) grids, Active Buildings as well as similar infrastructures. Our approach models Coordinated Load-Changing Attacks (CLCA), also referred to as GridLock or BlackIoT, against a theoretical power grid containing various types of power plants. It employs Continuous-Time Markov Chains where elements such as Power Plants and Botnets are modelled under normal or attack situations to evaluate the effect of CLCA in power-reliant infrastructures. We showcase our modelling approach in the scenario of a power supplier (e.g. power plant) being targeted by a botnet. We demonstrate how our modelling approach can quantify the impact of a botnet attack and be abstracted for any CPS involving power load management in a smart grid. Our results show that by prioritising the type of power plants, the impact of the attack may change: in particular, we find the most impactful attack times and show how different strategies affect their success. We also find the best power generator to use depending on the current demand and strength of attack.

Finite impulse response models: A non-asymptotic analysis of the least squares estimator

We consider a finite impulse response system with centered independent sub-Gaussian design covariates and noise components that are not necessarily identically distributed. We derive non-asymptotic near-optimal estimation and prediction bounds for the least-squares estimator of the parameters. Our results are based on two concentration inequalities, one on the norm of sums of dependent covariate vectors and one on the singular values of their covariance operator, which are of independent interest; the dependence arises from the time-shift structure of the time series. These results generalize the known bounds for the independent case.

Greed is Good: Exploration and Exploitation Trade-offs in Bayesian Optimisation

The performance of acquisition functions for Bayesian optimisation is investigated in terms of the Pareto front between exploration and exploitation. We show that Expected Improvement and the Upper Confidence Bound always select solutions to be expensively evaluated on the Pareto front, but Probability of Improvement is never guaranteed to do so and Weighted Expected Improvement does only for a restricted range of weights. We introduce two novel \epsilon-greedy acquisition functions. Extensive empirical evaluation of these together with random search, purely exploratory and purely exploitative search on 10 benchmark problems in 1 to 10 dimensions shows that \epsilon-greedy algorithms are generally at least as effective as conventional acquisition functions, particularly with a limited budget. In higher dimensions \epsilon-greedy approaches are shown to have improved performance over conventional approaches. These results are borne out on a real world computational fluid dynamics optimisation problem and a robotics active learning problem.
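
As a rough illustration of what an \epsilon-greedy acquisition step looks like, the sketch below picks a random candidate with probability \epsilon and otherwise the candidate with the best predicted value. It is a generic version for minimisation over a finite candidate set, not either of the two specific functions proposed in the paper; `epsilon_greedy_select` and `surrogate_mean` are hypothetical names, and `surrogate_mean` stands in for the posterior mean of a fitted surrogate model.

```python
import random

def epsilon_greedy_select(candidates, predicted_mean, epsilon=0.1, rng=random):
    """Choose the next point to evaluate expensively (minimisation).

    With probability epsilon, explore by picking a candidate at random;
    otherwise exploit by picking the candidate with the lowest predicted mean.
    """
    if rng.random() < epsilon:
        return rng.choice(candidates)
    return min(candidates, key=predicted_mean)

# Stand-in for a surrogate model's posterior mean (assumed for illustration).
def surrogate_mean(x):
    return (x - 1.0) ** 2

candidates = [0.0, 0.5, 1.0, 1.5, 2.0]
rng = random.Random(0)
next_x = epsilon_greedy_select(candidates, surrogate_mean, epsilon=0.1, rng=rng)
```

Setting `epsilon=0` gives purely exploitative search and `epsilon=1` gives random search, the two extremes the abstract compares against.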

Application of Time Series Analysis to Traffic Accidents in Los Angeles

With Los Angeles improving in many respects, growing numbers of people live in or travel to the city. The primary objective of this paper is to apply a set of time series analysis methods to traffic accidents in Los Angeles over the past few years. The monthly number of traffic accidents, collected from 2010 to 2019, reveals that traffic accidents are seasonal and increase over time with fluctuations. This paper uses ensemble methods to combine several different models that capture the data from various perspectives, which can lead to better forecasting accuracy. The IMA(1, 1) model, the ETS(A, N, A) model, and two models with Fourier terms fail the independence assumption check. However, the Online Gradient Descent (OGD) model generated by the ensemble method fits the data well and is the best among our candidate models. Therefore, our model makes it easier to accurately forecast future traffic accidents from previous data, which can help designers make better plans.

What’s going on on PyPI

Scanning all newly published packages on PyPI, I know that the quality is often quite bad. I try to filter out the worst ones and list here the ones that might be worth a look, worth following, or might inspire you in some way.

Statistics tool for git repositories

Self Organizing Maps Package

Deep learning-based Night-to-Day image-translation software

PySpark Project Building Tool. Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

A package for visualisation of serial data in a grid.

Implementation of TOPSIS decision making

Sequential Monte Carlo modeling for linear Gaussian systems

Optimal binning algorithm and function to apply on a pandas DataFrame

Gradient Boosting powered by GPU(NVIDIA CUDA)

modeling the topic of arguments

A wrapper for the DeepL Pro API.

Pipelines and utility classes for Kaggle and data science!

What’s going on on PyPI

Scanning all newly published packages on PyPI, I know that the quality is often quite bad. I try to filter out the worst ones and list here the ones that might be worth a look, worth following, or might inspire you in some way.

Tree crown prediction using deep learning retinanets

Distillation of KoBERT

Historical OpenStreetMap Objects to Machine Learning Training Samples

High-dimensional function approximation and estimation

A python-sc2 wrapper for Reinforcement Learning

Implementation of machine learning algorithm.

Extensible Efficient Quantum Algorithm Design in Python

Standardized data structures for Python.

A High Level Python Deep Reinforcement Learning library. Great for beginners, for prototyping and quickly comparing algorithms

ExKaldi Automatic Speech Recognition Toolkit

This project exports a set of methods that enable graph-based database management.

A system of Human-Machine Dialogue