Overarching Computation Model (OCM)

Existing models of computation, such as a Turing machine (hereafter, TM), do not consider the agent involved in interpreting the outcome of the computation. We argue that a TM, or any other computation model, has no significance if its output is not interpreted by some agent. Furthermore, we argue that including the interpreter in the model definition sheds light on some of the difficult problems faced in computation and mathematics. We provide an analytic process framework to address this limitation. The framework can be overlaid on existing concepts of computation to address many practical and philosophical concerns such as the P vs NP problem. In addition, we provide constructive proof for the P vs NP problem under the assumption that the class NP comprises of problems solvable by non-deterministic algorithms. We utilize the observation that deterministic computational procedures lack fundamental capacity to fully simulate their non-deterministic variant.

Maximum Weight Online Matching with Deadlines

We study the problem of matching agents who arrive at a marketplace over time and leave after d time periods. Agents can only be matched while they are present in the marketplace. Each pair of agents can yield a different match value, and the planner’s goal is to maximize the total value over a finite time horizon. First we study the case in which vertices arrive in an adversarial order. We provide a randomized 0.25-competitive algorithm building on a result by Feldman et al. (2009) and Lehman et al. (2006). We extend the model to the case in which departure times are drawn independently from a distribution with non-decreasing hazard rate, for which we establish a 1/8-competitive algorithm. When the arrival order is chosen uniformly at random, we show that a batching algorithm, which computes a maximum-weighted matching every (d+1) periods, is 0.279-competitive.

NL4Py: Agent-Based Modeling in Python with Parallelizable NetLogo Workspaces

NL4Py is a NetLogo controller software for Python, for the rapid, parallel execution of NetLogo models. NL4Py provides both headless (no graphical user interface) and GUI NetLogo workspace control through Python. Spurred on by the increasing availability of open-source computation and machine learning libraries on the Python package index, there is an increasing demand for such rapid, parallel execution of agent-based models through Python. NetLogo, being the language of choice for a majority of agent-based modeling driven research projects, requires an integration to Python for researchers looking to perform statistical analyses of agent-based model output using these libraries. Unfortunately, until the recent introduction of PyNetLogo, and now NL4Py, such a controller was unavailable. This article provides a detailed introduction into the usage of NL4Py and explains its client-server software architecture, highlighting architectural differences to PyNetLogo. A step-by-step demonstration of global sensitivity analysis and parameter calibration of the Wolf Sheep Predation model is then performed through NL4Py. Finally, NL4Py’s performance is benchmarked against PyNetLogo and its combination with IPyParallel, and shown to provide significant savings in execution time over both configurations.

Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network

Because of their effectiveness in broad practical applications, LSTM networks have received a wealth of coverage in scientific journals, technical blogs, and implementation guides. However, in most articles, the inference formulas for the LSTM network and its parent, RNN, are stated axiomatically, while the training formulas are omitted altogether. In addition, the technique of ‘unrolling’ an RNN is routinely presented without justification throughout the literature. The goal of this paper is to explain the essential RNN and LSTM fundamentals in a single document. Drawing from concepts in signal processing, we formally derive the canonical RNN formulation from differential equations. We then propose and prove a precise statement, which yields the RNN unrolling technique. We also review the difficulties with training the standard RNN and address them by transforming the RNN into the ‘Vanilla LSTM’ network through a series of logical arguments. We provide all equations pertaining to the LSTM system together with detailed descriptions of its constituent entities. Albeit unconventional, our choice of notation and the method for presenting the LSTM system emphasizes ease of understanding. As part of the analysis, we identify new opportunities to enrich the LSTM system and incorporate these extensions into the Vanilla LSTM network, producing the most general LSTM variant to date. The target reader has already been exposed to RNNs and LSTM networks through numerous available resources and is open to an alternative pedagogical approach. A Machine Learning practitioner seeking guidance for implementing our new augmented LSTM model in software for experimentation and research will find the insights and derivations in this tutorial valuable as well.

Fuzzy Clustering to Identify Clusters at Different Levels of Fuzziness: An Evolutionary Multi-Objective Optimization Approach

Fuzzy clustering methods identify naturally occurring clusters in a dataset, where the extent to which different clusters are overlapped can differ. Most methods have a parameter to fix the level of fuzziness. However, the appropriate level of fuzziness depends on the application at hand. This paper presents Entropy c-Means (ECM), a method of fuzzy clustering that simultaneously optimizes two contradictory objective functions, resulting in the creation of fuzzy clusters with different levels of fuzziness. This allows ECM to identify clusters with different degrees of overlap. ECM optimizes the two objective functions using two multi-objective optimization methods, Non-dominated Sorting Genetic Algorithm II (NSGA-II), and Multiobjective Evolutionary Algorithm based on Decomposition (MOEA/D). We also propose a method to select a suitable trade-off clustering from the Pareto front. Experiments on challenging synthetic datasets as well as real-world datasets show that ECM leads to better cluster detection compared to the conventional fuzzy clustering methods as well as previously used multi-objective methods for fuzzy clustering.

Linked Causal Variational Autoencoder for Inferring Paired Spillover Effects

Modeling spillover effects from observational data is an important problem in economics, business, and other fields of research. % It helps us infer the causality between two seemingly unrelated set of events. For example, if consumer spending in the United States declines, it has spillover effects on economies that depend on the U.S. as their largest export market. In this paper, we aim to infer the causation that results in spillover effects between pairs of entities (or units), we call this effect as \textit{paired spillover}. To achieve this, we leverage the recent developments in variational inference and deep learning techniques to propose a generative model called Linked Causal Variational Autoencoder (LCVA). Similar to variational autoencoders (VAE), LCVA incorporates an encoder neural network to learn the latent attributes and a decoder network to reconstruct the inputs. However, unlike VAE, LCVA treats the \textit{latent attributes as confounders that are assumed to affect both the treatment and the outcome of units}. Specifically, given a pair of units u and \bar{u}, their individual treatment and outcomes, the encoder network of LCVA samples the confounders by conditioning on the observed covariates of u, the treatments of both u and \bar{u} and the outcome of u. Once inferred, the latent attributes (or confounders) of u captures the spillover effect of \bar{u} on u. Using a network of users from job training dataset (LaLonde (1986)) and co-purchase dataset from Amazon e-commerce domain, we show that LCVA is significantly more robust than existing methods in capturing spillover effects.

Exploiting Structure for Fast Kernel Learning

We propose two methods for exact Gaussian process (GP) inference and learning on massive image, video, spatial-temporal, or multi-output datasets with missing values (or ‘gaps’) in the observed responses. The first method ignores the gaps using sparse selection matrices and a highly effective low-rank preconditioner is introduced to accelerate computations. The second method introduces a novel approach to GP training whereby response values are inferred on the gaps before explicitly training the model. We find this second approach to be greatly advantageous for the class of problems considered. Both of these novel approaches make extensive use of Kronecker matrix algebra to design massively scalable algorithms which have low memory requirements. We demonstrate exact GP inference for a spatial-temporal climate modelling problem with 3.7 million training points as well as a video reconstruction problem with 1 billion points.

Fast Flexible Function Dispatch in Julia

Technical computing is a challenging application area for programming languages to address. This is evinced by the unusually large number of specialized languages in the area (e.g. MATLAB, R), and the complexity of common software stacks, often involving multiple languages and custom code generators. We believe this is ultimately due to key characteristics of the domain: highly complex operators, a need for extensive code specialization for performance, and a desire for permissive high-level programming styles allowing productive experimentation. The Julia language attempts to provide a more effective structure for this kind of programming by allowing programmers to express complex polymorphic behaviors using dynamic multiple dispatch over parametric types. The forms of extension and reuse permitted by this paradigm have proven valuable for technical computing. We report on how this approach has allowed domain experts to express useful abstractions while simultaneously providing a natural path to better performance for high-level technical code.

Fast computation of the principal components of genotype matrices in Julia

Finding the largest few principal components of a matrix of genetic data is a common task in genome-wide association studies (GWASs), both for dimensionality reduction and for identifying unwanted factors of variation. We describe a simple random matrix model for matrices that arise in GWASs, showing that the singular values have a bulk behavior that obeys a Marchenko-Pastur distributed with a handful of large outliers. We also implement Golub-Kahan-Lanczos (GKL) bidiagonalization in the Julia programming language, providing thick restarting and a choice between full and partial reorthogonalization strategies to control numerical roundoff. Our implementation of GKL bidiagonalization is up to 36 times faster than software tools used commonly in genomics data analysis for computing principal components, such as EIGENSOFT and FlashPCA, which use dense LAPACK routines and randomized subspace iteration respectively.

Hierarchical Block Sparse Neural Networks

Sparse deep neural networks(DNNs) are efficient in both memory and compute when compared to dense DNNs. But due to irregularity in computation of sparse DNNs, their efficiencies are much lower than that of dense DNNs on general purpose hardwares. This leads to poor/no performance benefits for sparse DNNs. Performance issue for sparse DNNs can be alleviated by bringing structure to the sparsity and leveraging it for improving runtime efficiency. But such structural constraints often lead to sparse models with suboptimal accuracies. In this work, we jointly address both accuracy and performance of sparse DNNs using our proposed class of neural networks called HBsNN ( Hierarchical Block Sparse Neural Networks).

Lingke: A Fine-grained Multi-turn Chatbot for Customer Service

Traditional chatbots usually need a mass of human dialogue data, especially when using supervised machine learning method. Though they can easily deal with single-turn question answering, for multi-turn the performance is usually unsatisfactory. In this paper, we present Lingke, an information retrieval augmented chatbot which is able to answer questions based on given product introduction document and deal with multi-turn conversations. We will introduce a fine-grained pipeline processing to distill responses based on unstructured documents, and attentive sequential context-response matching for multi-turn conversations.

Bagging of Density Estimators

In this work we give new density estimators by averaging classical density estimators such as the histogram, the frequency polygon and the kernel density estimators obtained over different bootstrap samples of the original data. We prove the L 2-consistency of these new estimators and compare them to several similar approaches by extensive simulations. Based on them, we give also a way to construct non parametric pointwise confidence intervals for the target density.

AIQ: Measuring Intelligence of Business AI Software

Focusing on Business AI, this article introduces the AIQ quadrant that enables us to measure AI for business applications in a relative comparative manner, i.e. to judge that software A has more or less intelligence than software B. Recognizing that the goal of Business software is to maximize value in terms of business results, the dimensions of the quadrant are the key factors that determine the business value of AI software: Level of Output Quality (Smartness) and Level of Automation. The use of the quadrant is illustrated by several software solutions to support the real life business challenge of field service scheduling. The role of machine learning and conversational digital assistants in increasing the business value are also discussed and illustrated with a recent integration of existing intelligent digital assistants for factory floor decision making with the new version of Google Glass. Such hands free AI solutions elevate the AIQ level to its ultimate position.

TwoWingOS: A Two-Wing Optimization Strategy for Evidential Claim Verification

Determining whether a given claim is supported by evidence is a fundamental NLP problem that is best modeled as Textual Entailment. However, given a large collection of text, finding evidence that could support or refute a given claim is a challenge in itself, amplified by the fact that different evidence might be needed to support or refute a claim. Nevertheless, most prior work decouples evidence identification from determining the truth value of the claim given the evidence. We propose to consider these two aspects jointly. We develop TwoWingOS (two-wing optimization strategy), a system that, while identifying appropriate evidence for a claim, also determines whether or not the claim is supported by the evidence. Given the claim, TwoWingOS attempts to identify a subset of the evidence candidates; given the predicted evidence, it then attempts to determine the truth value of the corresponding claim. We treat this challenge as coupled optimization problems, training a joint model for it. TwoWingOS offers two advantages: (i) Unlike pipeline systems, it facilitates flexible-size evidence set, and (ii) Joint training improves both the claim entailment and the evidence identification. Experiments on a benchmark dataset show state-of-the-art performance. Code: https://…/FEVER

Greedy Algorithms for Approximating the Diameter of Machine Learning Datasets in Multidimensional Euclidean Space

Finding the diameter of a dataset in multidimensional Euclidean space is a well-established problem, with well-known algorithms. However, most of the algorithms found in the literature do not scale well with large values of data dimension, so the time complexity grows exponentially in most cases, which makes these algorithms impractical. Therefore, we implemented 4 simple greedy algorithms to be used for approximating the diameter of a multidimensional dataset; these are based on minimum/maximum l2 norms, hill climbing search, Tabu search and Beam search approaches, respectively. The time complexity of the implemented algorithms is near-linear, as they scale near-linearly with data size and its dimensions. The results of the experiments (conducted on different machine learning data sets) prove the efficiency of the implemented algorithms and can therefore be recommended for finding the diameter to be used by different machine learning applications when needed.

Dropout is a special case of the stochastic delta rule: faster and more accurate deep learning

Multi-layer neural networks have lead to remarkable performance on many kinds of benchmark tasks in text, speech and image processing. Nonlinear parameter estimation in hierarchical models is known to be subject to overfitting. One approach to this overfitting and related problems (local minima, colinearity, feature discovery etc.) is called dropout (Srivastava, et al 2014, Baldi et al 2016). This method removes hidden units with a Bernoulli random variable with probability p over updates. In this paper we will show that Dropout is a special case of a more general model published originally in 1990 called the stochastic delta rule ( SDR, Hanson, 1990). SDR parameterizes each weight in the network as a random variable with mean \mu_{w_{ij}} and standard deviation \sigma_{w_{ij}}. These random variables are sampled on each forward activation, consequently creating an exponential number of potential networks with shared weights. Both parameters are updated according to prediction error, thus implementing weight noise injections that reflect a local history of prediction error and efficient model averaging. SDR therefore implements a local gradient-dependent simulated annealing per weight converging to a bayes optimal network. Tests on standard benchmarks (CIFAR) using a modified version of DenseNet shows the SDR outperforms standard dropout in error by over 50% and in loss by over 50%. Furthermore, the SDR implementation converges on a solution much faster, reaching a training error of 5 in just 15 epochs with DenseNet-40 compared to standard DenseNet-40’s 94 epochs.

How Complex is your classification problem? A survey on measuring classification complexity

Extracting characteristics from the training datasets of classification problems has proven effective in a number of meta-analyses. Among them, measures of classification complexity can estimate the difficulty in separating the data points into their expected classes. Descriptors of the spatial distribution of the data and estimates of the shape and size of the decision boundary are among the existent measures for this characterization. This information can support the formulation of new data-driven pre-processing and pattern recognition techniques, which can in turn be focused on challenging characteristics of the problems. This paper surveys and analyzes measures which can be extracted from the training datasets in order to characterize the complexity of the respective classification problems. Their use in recent literature is also reviewed and discussed, allowing to prospect opportunities for future work in the area. Finally, descriptions are given on an R package named Extended Complexity Library (ECoL) that implements a set of complexity measures and is made publicly available.

Ensemble Kalman Inversion: A Derivative-Free Technique For Machine Learning Tasks

The standard probabilistic perspective on machine learning gives rise to empirical risk-minimization tasks that are frequently solved by stochastic gradient descent (SGD) and variants thereof. We present a formulation of these tasks as classical inverse or filtering problems and, furthermore, we propose an efficient, gradient-free algorithm for finding a solution to these problems using ensemble Kalman inversion (EKI). Applications of our approach include offline and online supervised learning with deep neural networks, as well as graph-based semi-supervised learning. The essence of the EKI procedure is an ensemble based approximate gradient descent in which derivatives are replaced by differences from within the ensemble. We suggest several modifications to the basic method, derived from empirically successful heuristics developed in the context of SGD. Numerical results demonstrate wide applicability and robustness of the proposed algorithm.

‘Quantum Equilibrium-Disequilibrium’: Asset Price Dynamics, Symmetry Breaking, and Defaults as Dissipative Instantons
Effective Caching for the Secure Content Distribution in Information-Centric Networking
Self-Adaptive Systems in Organic Computing: Strategies for Self-Improvement
A Hybrid Recommender System for Patient-Doctor Matchmaking in Primary Care
Evolutionary optimisation of neural network models for fish collective behaviours in mixed groups of robots and zebrafish
Random forest prediction of Alzheimer’s disease using pairwise selection from time series data
VerIDeep: Verifying Integrity of Deep Neural Networks through Sensitive-Sample Fingerprinting
Temporal starvation in multi-channel CSMA networks: an analytical framework
Who Falls for Online Political Manipulation?
The frog model on trees with drift
Code-Mixed Sentiment Analysis Using Machine Learning and Neural Network Approaches
$α$-Approximation Density-based Clustering of Multi-valued Objects
On-Chip Optical Convolutional Neural Networks
The Elephant in the Room
Longest Increasing Subsequence under Persistent Comparison Errors
The Effectiveness of Multitask Learning for Phenotyping with Electronic Health Records Data
DeepMag: Source Specific Motion Magnification Using Gradient Ascent
Transport coefficients in multi-component fluids from equilibrium molecular dynamics
On Physical Layer Security over Fox’s $H$-Function Wiretap Fading Channels
Deep Learning for Single Image Super-Resolution: A Brief Review
Uncovering the Spread of Chagas Disease in Argentina and Mexico
Efficient human-like semantic representations via the Information Bottleneck principle
Sequence-Based OOK for Orthogonal Multiplexing of Wake-up Radio Signals and OFDM Waveforms
Error Forward-Propagation: Reusing Feedforward Connections to Propagate Errors in Deep Learning
Collective irregular dynamics in balanced networks of leaky integrate-and-fire neurons
A Panel Quantile Approach to Attrition Bias in Big Data: Evidence from a Randomized Experiment
A note on partial rejection sampling for the hard disks model in the plane
Blue Phase: Optimal Network Traffic Control for Legacy and Autonomous Vehicles
A survey of data transfer and storage techniques in prevalent cryptocurrencies and suggested improvements
Code-division multiplexed resistive pulse sensor networks for spatio-temporal detection of particles in microfluidic devices
Classes of graphs with e-positive chromatic symmetric function
Distinctiveness, complexity, and repeatability of online signature templates
End-to-end Active Object Tracking and Its Real-world Deployment via Reinforcement Learning
A note on the critical barrier for the survival of $α-$stable branching random walk with absorption
On the Convergence of AdaGrad with Momentum for Training Deep Neural Networks
Linear time algorithm to check the singularity of block graphs
Efficient Measurement on Programmable SwitchesUsing Probabilistic Recirculation
DeepWrinkles: Accurate and Realistic Clothing Modeling
(De)localization of Fermions in Correlated-Spin Background
Learning and Inference on Generative Adversarial Quantum Circuits
WonDerM: Skin Lesion Classification with Fine-tuned Neural Networks
Power Minimization Based Joint Task Scheduling and Resource Allocation in Downlink C-RAN
Stochastic $R_0$ Tensors to Stochastic Tensor Complementarity Problems
Hybrid approach for transliteration of Algerian arabizi: a primary study
Spin systems on Bethe lattices
On the optimal designs for the prediction of complex Ornstein-Uhlenbeck processes
The Moment-SOS hierarchy
Stability for Intersecting Families of Perfect Matchings
Weakly-Supervised Attention and Relation Learning for Facial Action Unit Detection
The Evolution of Sex Chromosomes through the Baldwin Effect
Cross-location wind speed forecasting for wind energy applications using machine learning based models
Deep Learning Based Speed Estimation for Constraining Strapdown Inertial Navigation on Smartphones
Pulse-laser Based Long-range Non-line-of-sight Ultraviolet Communication with Pulse Response Position Estimation
Construction of cospectral graphs
On the Complexity of Solving Subtraction Games
The Power of Cut-Based Parameters for Computing Edge Disjoint Paths
Extremal process of the zero-average Gaussian Free Field for $d\ge 3$
Model Approximation Using Cascade of Tree Decompositions
ChipNet: Real-Time LiDAR Processing for Drivable Region Segmentation on an FPGA
Making effective use of healthcare data using data-to-text technology
Band selection with Higher Order Multivariate Cumulants for small target detection in hyperspectral images
Optimizing error of high-dimensional statistical queries under differential privacy
On testing for high-dimensional white noise
Evaluation of the Spatial Consistency Feature in the 3GPP GSCM Channel Model
Atmospheric turbulence mitigation for sequences with moving objects using recursive image fusion
Dynamic all scores matrices for LCS score
Ektelo: A Framework for Defining Differentially-Private Computations
Existence of symmetric maximal noncrossing collections of $k$-element sets
Finding a Small Number of Colourful Components
Choosing the optimal multi-point iterative method for the Colebrook flow friction equation — Numerical validation
Densely Connected Convolutional Networks for Speech Recognition
Physics-based Learned Design: Optimized Coded-Illumination for Quantitative Phase Imaging
Enumerating Anchored Permutations with Bounded Gaps
Weakly- and Semi-Supervised Panoptic Segmentation
Johnson type bounds for mixed dimension subspace codes
Shape differentiability of Lagrangians and application to Stokes problem
One-dimensional quasicrystals with power-law hopping
On the moments of the (2+1)-dimensional directed polymer and stochastic heat equation in the critical window
A simplified convolutional sparse filter for impulsive signature enhancement and its application to the prognostic of rotating machinery
New global optimality conditions for nonsmooth DC optimization problems
On the structure of the set of active sets in constrained linear quadratic regulation
Overlapping community detection using superior seed set selection in social networks
Rigidity of proper colorings of $\mathbb{Z}^d$
Using Randomness to Improve Robustness of Machine-Learning Models Against Evasion Attacks
Disease Progression Timeline Estimation for Alzheimer’s Disease using Discriminative Event Based Modeling
Weakly supervised learning of indoor geometry by dual warping
An Iterative Path-Breaking Approach with Mutation and Restart Strategies for the MAX-SAT Problem
Reliability of Relay Networks under Random Linear Network Coding
Automorphism groups of Steiner triple systems
Improved Bit-Flipping Algorithm for Successive Cancellation Decoding of Polar Codes
A Novel Method for Epileptic Seizure Detection Using Coupled Hidden Markov Models
A New Algorithm for the Robust Semi-random Independent Set Problem