An Online Algorithm for Nonparametric Correlations

Nonparametric correlations such as Spearman’s rank correlation and Kendall’s tau correlation are widely applied in scientific and engineering fields. This paper investigates the problem of computing nonparametric correlations on the fly for streaming data. Standard batch algorithms are generally too slow to handle real-world big data applications. They also require too much memory because all the data need to be stored in memory before processing. This paper proposes a novel online algorithm for computing nonparametric correlations. The algorithm has O(1) time complexity and O(1) memory cost and is well suited to edge devices, where only limited memory and processing power are available. Users can balance speed against accuracy by changing the number of cutpoints specified in the algorithm. The online algorithm can compute the nonparametric correlations 10 to 1,000 times faster than the corresponding batch algorithm, and it can compute them based either on all past observations or on fixed-size sliding windows.
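
To make the role of the cutpoints concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes the cutpoints are fixed in advance, keeps only a grid of cell counts, and estimates Spearman's correlation from the mid-ranks of the bins. The class name BinnedSpearman and its methods are hypothetical.

```python
import bisect
import math


class BinnedSpearman:
    """Streaming Spearman estimate from a fixed grid of cutpoints (illustrative only)."""

    def __init__(self, x_cuts, y_cuts):
        # Assumption: cutpoints are supplied up front and never change.
        self.x_cuts = sorted(x_cuts)
        self.y_cuts = sorted(y_cuts)
        rows, cols = len(self.x_cuts) + 1, len(self.y_cuts) + 1
        self.counts = [[0] * cols for _ in range(rows)]
        self.n = 0

    def update(self, x, y):
        # Locate the grid cell of the new observation and increment its count.
        # Memory stays fixed at (m_x + 1) * (m_y + 1) counters.
        i = bisect.bisect_right(self.x_cuts, x)
        j = bisect.bisect_right(self.y_cuts, y)
        self.counts[i][j] += 1
        self.n += 1

    @staticmethod
    def _midranks(marginal):
        # Every observation in a bin is treated as tied and gets the bin's mid-rank.
        ranks, seen = [], 0
        for c in marginal:
            ranks.append(seen + (c + 1) / 2.0)
            seen += c
        return ranks

    def correlation(self):
        if self.n == 0:
            return float("nan")
        row = [sum(r) for r in self.counts]
        col = [sum(c) for c in zip(*self.counts)]
        rx, ry = self._midranks(row), self._midranks(col)
        mean = (self.n + 1) / 2.0  # mean rank, ties included
        sxy = sum(
            c * (rx[i] - mean) * (ry[j] - mean)
            for i, r in enumerate(self.counts)
            for j, c in enumerate(r)
        )
        sxx = sum(c * (r - mean) ** 2 for c, r in zip(row, rx))
        syy = sum(c * (r - mean) ** 2 for c, r in zip(col, ry))
        if sxx == 0 or syy == 0:
            return float("nan")
        # Pearson correlation of the mid-ranks, weighted by cell counts.
        return sxy / math.sqrt(sxx * syy)


est = BinnedSpearman(range(10, 100, 10), range(20, 200, 20))
for x in range(100):
    est.update(x, 2 * x)  # perfectly monotone stream
print(est.correlation())
```

In this sketch the cost of an update and of a query depends only on the number of cutpoints, not on the number of observations seen, which is the sense in which more cutpoints trade speed for accuracy. A sliding-window variant could decrement the cell of the observation that leaves the window; that is not shown here.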

Sequence Mining and Pattern Analysis in Drilling Reports with Deep Natural Language Processing

Drilling activities in the oil and gas industry have been reported over decades for thousands of wells on a daily basis, yet the analysis of this text at large scale for information retrieval, sequence mining, and pattern analysis is very challenging. Drilling reports contain interpretations written by drillers based on measurements from downhole sensors and surface equipment, and can be used for operation optimization and accident mitigation. In this initial work, a methodology is proposed for automatic classification of sentences written in drilling reports into three relevant labels (EVENT, SYMPTOM and ACTION) for hundreds of wells in an actual field. Some of the main challenges in the text corpus were overcome, including the high frequency of technical symbols, mistyping and abbreviation of technical terms, and the presence of incomplete sentences in the drilling reports. We obtain state-of-the-art classification accuracy within this technical language and illustrate advanced queries enabled by the tool.

Characterization of Fundamental Networks

In the framework of coupled cell systems, a coupled cell network describes graphically the dynamical dependencies between individual dynamical systems, the cells. The fundamental network of a network reveals the hidden symmetries of that network. Subspaces defined by equalities of coordinates which are flow-invariant for any coupled cell system consistent with a network structure are called the network synchrony subspaces. Moreover, for every synchrony subspace, each network admissible system restricted to that subspace is a dynamical system consistent with a smaller network. The original network is then said to be a lift of the smaller network. We characterize the networks for which: the fundamental network is a lift of the network; the network is a subnetwork of its fundamental network; and the network is itself a fundamental network. The size of cycles in a network and the distance of a cell to a cycle are two important properties concerning the description of the network architecture. In this paper, we relate these two architectural properties in a network and its fundamental network.

Autonomous development and learning in artificial intelligence and robotics: Scaling up deep learning to human-like learning

Autonomous lifelong development and learning is a fundamental capability of humans, differentiating them from current deep learning systems. However, other branches of artificial intelligence have designed crucial ingredients towards autonomous learning: curiosity and intrinsic motivation, social learning and natural interaction with peers, and embodiment. These mechanisms guide exploration and autonomous choice of goals, and integrating them with deep learning opens stimulating perspectives. Deep learning (DL) approaches have made great advances in artificial intelligence, but are still far from human learning. As argued convincingly by Lake et al., differences include human capabilities to learn causal models of the world from very little data, leveraging compositional representations and priors like intuitive physics and psychology. However, there are other fundamental differences between current DL systems and human learning, as well as technical ingredients to fill this gap, that are discussed either superficially or inadequately by Lake et al. These fundamental mechanisms relate to autonomous development and learning. They are bound to play a central role in artificial intelligence in the future. Current DL systems require engineers to manually specify a task-specific objective function for every new task, and learn through off-line processing of large training databases. In contrast, humans autonomously learn open-ended repertoires of skills, deciding for themselves which goals to pursue or value, and which skills to explore, driven by intrinsic motivation/curiosity and social learning through natural interaction with peers. Such learning processes are incremental, online, and progressive. Human child development involves a progressive increase of complexity in a curriculum of learning where skills are explored, acquired, and built on each other, through particular ordering and timing.
Finally, human learning happens in the physical world, and through bodily and physical experimentation, under severe constraints on energy, time, and computational resources. In the last two decades, the field of Developmental and Cognitive Robotics (Cangelosi and Schlesinger, 2015; Asada et al., 2009), in strong interaction with developmental psychology and neuroscience, has achieved significant advances in computational

Tensor Approximation of Advanced Metrics for Sensitivity Analysis

Following up on the success of the analysis of variance (ANOVA) decomposition and the Sobol indices (SI) for global sensitivity analysis, various related quantities of interest have been defined in the literature, including the effective and mean dimensions, the dimension distribution, and the Shapley values. Such metrics combine up to exponential numbers of SI in different ways and can be of great aid in uncertainty quantification and model interpretation tasks, but are computationally challenging. We focus on surrogate-based sensitivity analysis for independently distributed variables, namely via the tensor train (TT) decomposition. This format permits flexible and scalable surrogate modeling and can efficiently extract all SI at once in a compressed TT representation of their own. Based on this, we contribute a range of novel algorithms that compute more advanced sensitivity metrics by selecting and aggregating certain subsets of SI in the tensor-compressed domain. Drawing on an interpretation of the TT model in terms of deterministic finite automata, we are able to construct explicit auxiliary TT tensors that encode exactly all necessary index selection masks. Having both the SI and the masks in the TT format allows efficient computation of all aforementioned metrics, as we demonstrate in a number of example models.
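
As a rough illustration of how such metrics aggregate Sobol indices, the sketch below computes the mean dimension, the dimension distribution, and the Shapley values by brute force from an explicit table of indices. It uses the standard definitions for independent inputs and is not the paper's TT-based method; explicit enumeration of every subset is exactly the cost that a compressed representation is meant to avoid. The function names and the example indices are hypothetical.

```python
from collections import defaultdict


def mean_dimension(sobol):
    # Sum over subsets of |subset| * S_subset (superposition sense).
    return sum(len(subset) * s for subset, s in sobol.items())


def dimension_distribution(sobol):
    # Total index mass carried by each interaction order k = |subset|.
    dist = defaultdict(float)
    for subset, s in sobol.items():
        dist[len(subset)] += s
    return dict(dist)


def shapley_values(sobol):
    # Each index S_subset is shared equally among the variables in the subset.
    phi = defaultdict(float)
    for subset, s in sobol.items():
        for var in subset:
            phi[var] += s / len(subset)
    return dict(phi)


# Hypothetical indices for a two-variable model; they sum to 1.
sobol = {
    frozenset({"x1"}): 0.5,
    frozenset({"x2"}): 0.2,
    frozenset({"x1", "x2"}): 0.3,
}
print(mean_dimension(sobol))
print(dimension_distribution(sobol))
print(shapley_values(sobol))
```

With these made-up indices the interaction term is split evenly, so x1 receives 0.5 + 0.15 and x2 receives 0.2 + 0.15, and the Shapley values sum to the total normalized variance of 1.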

Neural Cross-Lingual Entity Linking

A major challenge in Entity Linking (EL) is making effective use of contextual information to disambiguate mentions to Wikipedia that might refer to different entities in different contexts. The problem is exacerbated in cross-lingual EL, which involves linking mentions written in non-English documents to entries in the English Wikipedia: to compare textual clues across languages we need to compute similarity between textual fragments across languages. In this paper, we propose a neural EL model that trains fine-grained similarities and dissimilarities between the query and candidate document from multiple perspectives, combined with convolution and tensor networks. Further, we show that this English-trained system can be applied, in zero-shot learning, to other languages by making surprisingly effective use of multi-lingual embeddings. Strong empirical evidence shows that the proposed system yields state-of-the-art results on English as well as on the cross-lingual Spanish and Chinese TAC 2015 datasets.

Analyzing Large-Scale, Distributed and Uncertain Data

The exponential growth of data in current times and the demand to gain information and knowledge from the data present new challenges for database researchers. Known database systems and algorithms are no longer capable of effectively handling such large data sets. MapReduce is a novel programming paradigm for processing distributable problems over large-scale data using a computer cluster. In this work we explore the MapReduce paradigm from three different angles. We begin by examining a well-known problem in the field of data mining: mining closed frequent itemsets over a large dataset. By harnessing the power of MapReduce, we present a novel algorithm for mining closed frequent itemsets that outperforms existing algorithms. Next, we explore one of the fundamental implications of ‘Big Data’: the data is not known with complete certainty. A probabilistic database is a relational database with the addendum that each tuple is associated with a probability of its existence. A natural development of MapReduce is a distributed relational database management system, where relational calculus has been reduced to a combination of map and reduce functions. We take this development a step further by proposing a query optimizer over a distributed probabilistic database. Finally, we analyze the best-known implementation of MapReduce, called Hadoop, aiming to overcome one of its major drawbacks: it does not directly support the explicit specification of the data repeatedly processed throughout different jobs. Many data-mining algorithms, such as clustering and association rules, require iterative computation: the same data are processed again and again until the computation converges or a stopping condition is satisfied. We propose a modification to Hadoop such that it will support efficient access to the same data in different jobs.
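
For readers unfamiliar with the terms, the toy sketch below simulates a map step and a reduce step in a single Python process and then filters the closed frequent itemsets (frequent itemsets with no proper superset of equal support). It is a naive illustration only: it is neither the algorithm proposed in this work nor Hadoop code, it enumerates every subset of each transaction, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction):
    # Emit (itemset, 1) for every non-empty subset of the transaction.
    # Exponential in the transaction length, so suitable for toy data only.
    items = sorted(set(transaction))
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo), 1


def reduce_phase(pairs):
    # Sum the emitted counts per itemset to obtain its support.
    support = defaultdict(int)
    for itemset, count in pairs:
        support[itemset] += count
    return dict(support)


def closed_frequent_itemsets(transactions, min_support):
    pairs = (kv for t in transactions for kv in map_phase(t))
    support = reduce_phase(pairs)
    frequent = {s: c for s, c in support.items() if c >= min_support}
    # Keep an itemset only if no proper superset has the same support.
    return {
        s: c
        for s, c in frequent.items()
        if not any(s < t and c == ct for t, ct in frequent.items())
    }


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"]]
print(closed_frequent_itemsets(transactions, min_support=2))
```

In this example {b} is frequent but not closed, because {a, b} occurs in exactly the same two transactions; {a}, {a, b}, and {a, c} are the closed frequent itemsets.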

Probabilistic treatment of the uncertainty from the finite size of weighted Monte Carlo data
Asymptotic properties of random unlabelled block-weighted graphs
Optimal Stabilization of Boolean Networks through Collective Influence
Learning Sparse Neural Networks through $L_0$ Regularization
Learning User Intent from Action Sequences on Interactive Systems
Examining Cooperation in Visual Dialog Models
Time-Space Tradeoffs for the Memory Game
Modelling collective motion based on the principle of agency
Extending linear and quadratic functions from high rank varieties
Self-supervised Learning of Motion Capture
Multiscale systems, homogenization, and rough paths
Long-Term Visual Object Tracking Benchmark
3D Semantic Trajectory Reconstruction from 3D Pixel Continuum
A+D-Net: Shadow Detection with Adversarial Shadow Attenuation
Revisiting Fast Practical Byzantine Fault Tolerance
Noncommutative Davis type decompositions and applications
Linearly-Recurrent Autoencoder Networks for Learning Dynamics
Imagine it for me: Generative Adversarial Approach for Zero-Shot Learning from Noisy Texts
Visual to Sound: Generating Natural Sound for Videos in the Wild
Beyond Grand Theft Auto V for Training, Testing and Enhancing Deep Learning in Self Driving Cars
Approximating the Sum of Independent Non-Identical Binomial Random Variables
#anorexia, #anarexia, #anarexyia: Characterizing Online Community Practices with Orthographic Variation
Three-dimensional maps and subgroup growth
Human activity recognition from mobile inertial sensors using recurrence plots
Geometrically stopped Markovian random growth processes and Pareto tails
AI Oriented Large-Scale Video Management for Smart City: Technologies, Standards and Beyond
Zone-based Keyword Spotting in Bangla and Devanagari Documents
Measuring Cluster Stability for Bayesian Nonparametrics Using the Linear Bootstrap
Evolution of amorphous carbon across densities: an inferential study
4DFAB: A Large Scale 4D Facial Expression Database for Biometric Applications
Harnessing NLOS Components for Position and Orientation Estimation in 5G mmWave MIMO
Gaussian Process bandits with adaptive discretization
Open problems in geometry of continued fractions
Wave analysis in one dimensional structures with a wavelet finite element model and precise integration method
Multimodal Storytelling via Generative Adversarial Imitation Learning
Learning to Fuse Music Genres with Generative Adversarial Dual Learning
Asymptotics of convolution with the semi-regular-variation tail and its application to risk
AWE-CM Vectors: Augmenting Word Embeddings with a Clinical Metathesaurus
Broadcast Caching Networks with Two Receivers and Multiple Correlated Sources
Deep linear neural networks with arbitrary loss: All local minima are global
Estimation for high-frequency data under parametric market microstructure noise
Counter Simulations via Higher Order Quantifier Elimination: a preliminary report
Determinism in the Certification of UNSAT Proofs
Learning a Semantically Discriminative Joint Space for Attribute Based Person Re-identification
Learning Pain from Action Unit Combinations: A Weakly Supervised Approach via Multiple Instance Learning
Gridless Two-dimensional DOA Estimation With L-shaped Array Based on the Cross-covariance Matrix
Spectral conditions of complement for some graphical properties
Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks
Joint Base Station Clustering and Beamforming for Non-Orthogonal Multicast and Unicast Transmission with Backhaul Constraints
Fully Automatic Segmentation of Lumbar Vertebrae from CT Images using Cascaded 3D Fully Convolutional Networks
Joint Embedding and Classification for SAR Target Recognition
Hohmann Transfer via Constrained Optimization
Optimal control in ink-jet printing via instantaneous control
Amplitude bounds for biochemical oscillators
Collecting Telemetry Data Privately
Energy-Efficient Sensor Censoring for Compressive Distributed Sparse Signal Recovery
Robust Optimization Approaches for the Design of an Electric Machine used for E-Mobility
O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis
Properties of uniform tangent sets and Lagrange multiplier rule
A Short Note on Undirected Fitch Graphs
G-CORE: A Core for Future Graph Query Languages
Manifold-valued Image Generation with Wasserstein Adversarial Networks
Second-order and local characteristics of network intensity functions
Shot-noise excursions and non-stabilizing Poisson functionals
EmTaggeR: A Word Embedding Based Novel Method for Hashtag Recommendation on Twitter
Robust toll pricing: A novel approach
Liouville quantum gravity as a metric space and a scaling limit
Eigendecompositions of Transfer Operators in Reproducing Kernel Hilbert Spaces
Queues on a dynamically evolving graph
Approximate robust output regulation of boundary control systems
Deep Semantic Role Labeling with Self-Attention
Global existence results and duality for non-linear models of plates and shells
Deep learning for semantic segmentation of remote sensing images with rich spectral content
Rooted tree maps and the derivation relation for multiple zeta values
Supersolvable simplicial arrangements
Deep Learning for automatic sale receipt understanding
Cubical-like geometry of quasi-median graphs and applications to geometric group theory
Empirically Analyzing the Effect of Dataset Biases on Deep Face Recognition Systems
Efficient sequential experimental design for surrogate modeling of nested codes
A Finite-Time Cutting Plane Algorithm for Distributed Mixed Integer Linear Programming
Groups acting on quasi-median graphs. An introduction
On Deterministic Sampling Patterns for Robust Low-Rank Matrix Completion
A review on anisotropy analysis of spatial point patterns
The Role of Compliance in Heterogeneous Interacting Agents: Data from Observations
Recognizing Gender from Human Facial Regions using Genetic Algorithm
Learning a Generative Model for Validity in Complex Discrete Structures
Particle based gPC methods for mean-field models of swarming with uncertainty
Experimental analysis of lattice walks
IEOPF: An Active Contour Model for Image Segmentation with Inhomogeneities Estimated by Orthogonal Primary Functions
Linear Convergence of An Iterative Phase Retrieval Algorithm with Data Reuse
Results on stochastic reaction networks with non-mass action kinetics
Phylogenetics of Indo-European Language families via an Algebro-Geometric Analysis of their Syntactic Structures
Automated Pruning for Deep Neural Network Compression
Lattice homomorphisms between weak orders
Approximating the Spectrum of a Graph
OLÉ: Orthogonal Low-rank Embedding, A Plug and Play Geometric Loss for Deep Learning
Phase transition for continuum Widom-Rowlinson model with random radii
Capturing Reliable Fine-Grained Sentiment Associations by Crowdsourcing and Best-Worst Scaling
An estimator for the tail-index of graphex processes
Spectral characterization of mixed extensions of small graphs
Convolutional Recurrent Neural Networks for Dynamic MR Image Reconstruction
Modeling the formation of R&D alliances: An agent-based model with empirical validation
Intersection patterns of linear subspaces with the hypercube
Best-Worst Scaling More Reliable than Rating Scales: A Case Study on Sentiment Intensity Annotation
State-of-the-art Speech Recognition With Sequence-to-Sequence Models
Tech Report: A Fast Multiscale Spatial Regularization for Sparse Hyperspectral Unmixing
Optimal Fast Johnson-Lindenstrauss Embeddings for Large Data Sets
Estimating linear functionals of a sparse family of Poisson means
Phase transition in the spiked random tensor with Rademacher prior
Pay for a Sliding Bloom Filter and Get Counting, Distinct Elements, and Entropy for Free
Arithmetic Progression Hypergraphs: Examining the Second Moment Method
Towards Practical Verification of Machine Learning: The Case of Computer Vision Systems
Novel insights into animal sociality from multilayer networks
Stochastic Localization + Stieltjes Barrier = Tight Bound for Log-Sobolev
Sum-of-squares optimization without semidefinite programming
Posterior Integration on an Embedded Riemannian Manifold
The Effect of Negators, Modals, and Degree Adverbs on Sentiment Composition
One for All: Towards Language Independent Named Entity Linking
A Neighborhood-Assisted Hotelling’s $T^2$ Test for High-Dimensional Means
R-FCN-3000 at 30fps: Decoupling Detection and Classification
Improving the Performance of Online Neural Transducer Models
Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene
Machine learning as an instrument for data unfolding
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
Minimum Word Error Rate Training for Attention-based Sequence-to-Sequence Models
Nonlocal Games and Quantum Permutation Groups
Neural Machine Translation by Generating Multiple Linguistic Factors
A new extended Cardioid model: an application to wind data