RedisGraph is a Redis module developed by Redis Labs to add graph database functionality to the Redis database. RedisGraph represents connected data as adjacency matrices. By representing the data as sparse matrices and employing the power of GraphBLAS (a highly optimized library for sparse matrix operations), RedisGraph delivers a fast and efficient way to store, manage and process graphs. Initial benchmarks indicate that RedisGraph is significantly faster than comparable graph databases.
Artificial Intelligence (AI), defined in its most simple form, is a technological tool that makes machines intelligent. Since learning is at the core of intelligence, machine learning poses itself as a core sub-field of AI. Then there comes a subclass of machine learning, known as deep learning, to address the limitations of their predecessors. AI has generally acquired its prominence over the past few years due to its considerable progress in various fields. AI has vastly invaded the realm of research. This has led physicists to attentively direct their research towards implementing AI tools. Their central aim has been to gain better understanding and enrich their intuition. This review article is meant to supplement the previously presented efforts to bridge the gap between AI and physics, and take a serious step forward to filter out the ‘Babelian’ clashes brought about from such gabs. This necessitates first to have fundamental knowledge about common AI tools. To this end, the review’s primary focus shall be on deep learning models called artificial neural networks. They are deep learning models which train themselves through different learning processes. It discusses also the concept of Markov decision processes. Finally, shortcut to the main goal, the review thoroughly examines how these neural networks are capable to construct a physical theory describing some observations without applying any previous physical knowledge.
We introduce SinGAN, an unconditional generative model that can be learned from a single natural image. Our model is trained to capture the internal distribution of patches within the image, and is then able to generate high quality, diverse samples that carry the same visual content as the image. SinGAN contains a pyramid of fully convolutional GANs, each responsible for learning the patch distribution at a different scale of the image. This allows generating new samples of arbitrary size and aspect ratio, that have significant variability, yet maintain both the global structure and the fine textures of the training image. In contrast to previous single image GAN schemes, our approach is not limited to texture images, and is not conditional (i.e. it generates samples from noise). User studies confirm that the generated samples are commonly confused to be real images. We illustrate the utility of SinGAN in a wide range of image manipulation tasks.
Understanding and interpreting the decisions made by deep learning models is valuable in many domains. In computer vision, computing heatmaps from a deep network is a popular approach for visualizing and understanding deep networks. However, heatmaps that do not correlate with the network may mislead human, hence the performance of heatmaps in providing a faithful explanation to the underlying deep network is crucial. In this paper, we propose I-GOS, which optimizes for a heatmap so that the classification scores on the masked image would maximally decrease. The main novelty of the approach is to compute descent directions based on the integrated gradients instead of the normal gradient, which avoids local optima and speeds up convergence. Compared with previous approaches, our method can flexibly compute heatmaps at any resolution for different user needs. Extensive experiments on several benchmark datasets show that the heatmaps produced by our approach are more correlated with the decision of the underlying deep network, in comparison with other state-of-the-art approaches.
We propose a vector auto-regressive (VAR) model with a low-rank constraint on the transition matrix. This new model is well suited to predict high-dimensional series that are highly correlated, or that are driven by a small number of hidden factors. We study estimation, prediction, and rank selection for this model in a very general setting. Our method shows excellent performances on a wide variety of simulated datasets. On macro-economic data from Giannone et al. (2015), our method is competitive with state-of-the-art methods in small dimension, and even improves on them in high dimension.
Deep reinforcement learning algorithms have been successfully applied to a range of challenging control tasks. However, these methods typically struggle with achieving effective exploration and are extremely sensitive to the choice of hyperparameters. One reason is that most approaches use a noisy version of their operating policy to explore – thereby limiting the range of exploration. In this paper, we introduce Collaborative Evolutionary Reinforcement Learning (CERL), a scalable framework that comprises a portfolio of policies that simultaneously explore and exploit diverse regions of the solution space. A collection of learners – typically proven algorithms like TD3 – optimize over varying time-horizons leading to this diverse portfolio. All learners contribute to and use a shared replay buffer to achieve greater sample efficiency. Computational resources are dynamically distributed to favor the best learners as a form of online algorithm selection. Neuroevolution binds this entire process to generate a single emergent learner that exceeds the capabilities of any individual learner. Experiments in a range of continuous control benchmarks demonstrate that the emergent learner significantly outperforms its composite learners while remaining overall more sample-efficient – notably solving the Mujoco Humanoid benchmark where all of its composite learners (TD3) fail entirely in isolation.
Identifying the number $K$ of clusters in a dataset is one of the most difficult problems in clustering analysis. A choice of $K$ that correctly characterizes the features of the data is essential for building meaningful clusters. In this paper we tackle the problem of estimating the number of clusters in functional data analysis by introducing a new measure that can be used with different procedures in selecting the optimal $K$. The main idea is to use a combination of two test statistics, which measure the lack of parallelism and the mean distance between curves, to compute criteria such as the within and between cluster sum of squares. Simulations in challenging scenarios suggest that procedures using this measure can detect the correct number of clusters more frequently than existing methods in the literature. The application of the proposed method is illustrated on several real datasets.
Motivation: Biomedical event detection is fundamental for information extraction in molecular biology and biomedical research. The detected events form the central basis for comprehensive biomedical knowledge fusion, facilitating the digestion of massive information influx from literature. Limited by the feature context, the existing event detection models are mostly applicable for a single task. A general and scalable computational model is desiderated for biomedical knowledge management. Results: We consider and propose a bottom-up detection framework to identify the events from recognized arguments. To capture the relations between the arguments, we trained a bi-directional Long Short-Term Memory (LSTM) network to model their context embedding. Leveraging the compositional attributes, we further derived the candidate samples for training event classifiers. We built our models on the datasets from BioNLP Shared Task for evaluations. Our method achieved the average F-scores of 0.81 and 0.92 on BioNLPST-BGI and BioNLPST-BB datasets respectively. Comparing with 7 state-of-the-art methods, our method nearly doubled the existing F-score performance (0.92 vs 0.56) on the BioNLPST-BB dataset. Case studies were conducted to reveal the underlying reasons. Availability: https://…/evntextrc
Domains such as scientific workflows and business processes exhibit data models with complex relationships between objects. This relationship is typically represented as sequences, where each data item is annotated with multi-dimensional attributes. There is a need to analyze this data for operational insights. For example, in business processes, users are interested in clustering process traces into smaller subsets to discover less complex process models. This requires expensive computation of similarity metrics between sequence-based data. Related work on dimension reduction and embedding methods do not take into account the multi-dimensional attributes of data, and do not address the interpretability of data in the embedding space (i.e., by favoring vector-based representation). In this work, we introduce Summarized, a framework for efficient analysis on sequence-based multi-dimensional data using intuitive and user-controlled summarizations. We introduce summarization schemes that provide tunable trade-offs between the quality and efficiency of analysis tasks and derive an error model for summary-based similarity under an edit-distance constraint. Evaluations using real-world datasets show the effectives of our framework.
Graphs are a natural abstraction for many problems where nodes represent entities and edges represent a relationship across entities. An important area of research that has emerged over the last decade is the use of graphs as a vehicle for non-linear dimensionality reduction in a manner akin to previous efforts based on manifold learning with uses for downstream database processing, machine learning and visualization. In this systematic yet comprehensive experimental survey, we benchmark several popular network representation learning methods operating on two key tasks: link prediction and node classification. We examine the performance of 12 unsupervised embedding methods on 15 datasets. To the best of our knowledge, the scale of our study — both in terms of the number of methods and number of datasets — is the largest to date. Our results reveal several key insights about work-to-date in this space. First, we find that certain baseline methods (task-specific heuristics, as well as classic manifold methods) that have often been dismissed or are not considered by previous efforts can compete on certain types of datasets if they are tuned appropriately. Second, we find that recent methods based on matrix factorization offer a small but relatively consistent advantage over alternative methods (e.g., random-walk based methods) from a qualitative standpoint. Specifically, we find that MNMF, a community preserving embedding method, is the most competitive method for the link prediction task. While NetMF is the most competitive baseline for node classification. Third, no single method completely outperforms other embedding methods on both node classification and link prediction tasks. We also present several drill-down analysis that reveals settings under which certain algorithms perform well (e.g., the role of neighborhood context on performance) — guiding the end-user.
Inspired by convolutional neural networks on 1D and 2D data, graph convolutional neural networks (GCNNs) have been developed for various learning tasks on graph data, and have shown superior performance on real-world datasets. Despite their success, there is a dearth of theoretical explorations of GCNN models such as their generalization properties. In this paper, we take a first step towards developing a deeper theoretical understanding of GCNN models by analyzing the stability of single-layer GCNN models and deriving their generalization guarantees in a semi-supervised graph learning setting. In particular, we show that the algorithmic stability of a GCNN model depends upon the largest absolute eigenvalue of its graph convolution filter. Moreover, to ensure the uniform stability needed to provide strong generalization guarantees, the largest absolute eigenvalue must be independent of the graph size. Our results shed new insights on the design of new & improved graph convolution filters with guaranteed algorithmic stability. We evaluate the generalization gap and stability on various real-world graph datasets and show that the empirical results indeed support our theoretical findings. To the best of our knowledge, we are the first to study stability bounds on graph learning in a semi-supervised setting and derive generalization bounds for GCNN models.
This paper presents an experimentation platform for human robot collaboration as a system of systems as well as proposes a conceptual framework describing the aspects of Human Robot Collaboration. These aspects are Awareness, Intelligence and Compliance of the system. Based on this framework case studies describing experiment setups performed using this platform are discussed. Each experiment highlights the use of the subsystems such as the digital twin, motion capture system, human-physiological monitoring system, data collection system and robot control and interface systems. A highlight of this paper showcases a subsystem with the ability to monitor human physiological feedback during a human robot collaboration task.
In machine learning, classification models need to be trained in order to predict class labels. When the training data contains personal information about individuals, collecting training data becomes difficult due to privacy concerns. Local differential privacy is a definition to measure the individual privacy when there is no trusted data curator. Individuals interact with an untrusted data aggregator who obtains statistical information about the population without learning personal data. In order to train a Naive Bayes classifier in an untrusted setting, we propose to use methods satisfying local differential privacy. Individuals send their perturbed inputs that keep the relationship between the feature values and class labels. The data aggregator estimates all probabilities needed by the Naive Bayes classifier. Then, new instances can be classified based on the estimated probabilities. We propose solutions for both discrete and continuous data. In order to eliminate high amount of noise and decrease communication cost in multi-dimensional data, we propose utilizing dimensionality reduction techniques which can be applied by individuals before perturbing their inputs. Our experimental results show that the accuracy of the Naive Bayes classifier is maintained even when the individual privacy is guaranteed under local differential privacy, and that using dimensionality reduction enhances the accuracy.
Modern biomedical applications often involve time-series data, from high-throughput phenotyping of model organisms, through to individual disease diagnosis and treatment using biomedical data streams. Data and tools for time-series analysis are developed and applied across the sciences and in industry, but meaningful cross-disciplinary interactions are limited by the challenge of identifying fruitful connections. Here we introduce the web platform, CompEngine, a self-organizing, living library of time-series data that lowers the barrier to forming meaningful interdisciplinary connections between time series. Using a canonical feature-based representation, CompEngine places all time series in a common space, regardless of their origin, allowing users to upload their data and immediately explore interdisciplinary connections to other data with similar properties, and be alerted when similar data is uploaded in the future. In contrast to conventional databases, which are organized by assigned metadata, CompEngine incentivizes data sharing by automatically connecting experimental and theoretical scientists across disciplines based on the empirical structure of their data. CompEngine’s growing library of interdisciplinary time-series data also facilitates comprehensively characterization of algorithm performance across diverse types of data, and can be used to empirically motivate the development of new time-series analysis algorithms.
In this paper we apply a compressibility loss that enables learning highly compressible neural network weights. The loss was previously proposed as a measure of negated sparsity of a signal, yet in this paper we show that minimizing this loss also enforces the non-zero parts of the signal to have very low entropy, thus making the entire signal more compressible. For an optimization problem where the goal is to minimize the compressibility loss (the objective), we prove that at any critical point of the objective, the weight vector is a ternary signal and the corresponding value of the objective is the squared root of the number of non-zero elements in the signal, thus directly related to sparsity. In the experiments, we train neural networks with the compressibility loss and we show that the proposed method achieves weight sparsity and compression ratios comparable with the state-of-the-art.
The false coverage rate (FCR) is the expected ratio of number of constructed confidence intervals (CIs) that fail to cover their respective parameters to the total number of constructed CIs. Procedures for FCR control exist in the offline setting, but none so far have been designed with the online setting in mind. In the online setting, there is an infinite sequence of fixed unknown parameters $\theta_t$ ordered by time. At each step, we see independent data that is informative about $\theta_t$, and must immediately make a decision whether to report a CI for $\theta_t$ or not. If $\theta_t$ is selected for coverage, the task is to determine how to construct a CI for $\theta_t$ such that $\text{FCR} \leq \alpha$ for any $T\in \mathbb{N}$. A straightforward solution is to construct at each step a $(1-\alpha)$ level conditional CI. In this paper, we present a novel solution to the problem inspired by online false discovery rate (FDR) algorithms, which only requires the statistician to be able to construct a marginal CI at any given level. Apart from the fact that marginal CIs are usually simpler to construct than conditional ones, the marginal procedure has an important qualitative advantage over the conditional solution, namely, it allows selection to be determined by the candidate CI itself. We take advantage of this to offer solutions to some online problems which have not been addressed before. For example, we show that our general CI procedure can be used to devise online sign-classification procedures that control the false sign rate (FSR). In terms of power and length of the constructed CIs, we demonstrate that the two approaches have complementary strengths and weaknesses using simulations. Last, all of our methodology applies equally well to online FCR control for prediction intervals, having particular implications for assumption-free selective conformal inference.
Existing domain adaptation methods generally assume different domains have the identical label space, which is quite restrict for real-world applications. In this paper, we focus on a more realistic and challenging case of open set domain adaptation. Particularly, in open set domain adaptation, we allow the classes from the source and target domains to be partially overlapped. In this case, the assumption of conventional distribution alignment does not hold anymore, due to the different label spaces in two domains. To tackle this challenge, we propose a new approach coined as Known-class Aware Self-Ensemble (KASE), which is built upon the recently developed self-ensemble model. In KASE, we first introduce a Known-class Aware Recognition (KAR) module to identify the known and unknown classes from the target domain, which is achieved by encouraging a low cross-entropy for known classes and a high entropy based on the source data from the unknown class. Then, we develop a Known-class Aware Adaptation (KAA) module to better adapt from the source domain to the target by reweighing the adaptation loss based on the likeliness to belong to known classes of unlabeled target samples as predicted by KAR. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness of our approach.
We revisit residual algorithms in both model-free and model-based reinforcement learning settings. We propose the bidirectional target network technique to stabilize residual algorithms, yielding a residual version of DDPG that significantly outperforms vanilla DDPG in the DeepMind Control Suite benchmark. Moreover, we find the residual algorithm an effective approach to the distribution mismatch problem in model-based planning. Compared with the existing TD($k$) method, our residual-based method makes weaker assumptions about the model and yields a greater performance boost.
Principal component analysis (PCA) is a powerful data reductionmethod for Structural Health Monitoring. However, its computational cost and data memory footprint pose a significant challenge when PCA has to run on limited capability embedded platformsin low-cost IoT gateways. This paper presents a memory-efficient parallel implementation of the streaming History PCA algorithm. On our dataset, it achieves 10x compression factor and 59x memory reduction with less than 0.15 dB degradation in the reconstructed signal-to-noise ratio (RSNR) compared to standard PCA. More-over, the algorithm benefits from parallelization on multiple cores, achieving a maximum speedup of 4.8x on Samsung ARTIK 710.
We present a technique to perform dimensionality reduction on data that is subject to uncertainty. Our method is a generalization of traditional principal component analysis (PCA) to multivariate probability distributions. In comparison to non-linear methods, linear dimensionality reduction techniques have the advantage that the characteristics of such probability distributions remain intact after projection. We derive a representation of the covariance matrix that respects potential uncertainty in each of the observations, building the mathematical foundation of our new method uncertainty-aware PCA. In addition to the accuracy and performance gained by our approach over sampling-based strategies, our formulation allows us to perform sensitivity analysis with regard to the uncertainty in the data. For this, we propose factor traces as a novel visualization that enables us to better understand the influence of uncertainty on the chosen principal components. We provide multiple examples of our technique using real-world datasets and show how to propagate multivariate normal distributions through PCA in closed-form. Furthermore, we discuss extensions and limitations of our approach.
Any generic deep machine learning algorithm is essentially a function fitting exercise, where the network tunes its weights and parameters to learn discriminatory features by minimizing some cost function. Though the network tries to learn the optimal feature space, it seldom tries to learn an optimal distance metric in the cost function, and hence misses out on an additional layer of abstraction. We present a simple effective way of achieving this by learning a generic Mahalanabis distance in a collaborative loss function in an end-to-end fashion with any standard convolutional network as the feature learner. The proposed method DML-CRC gives state-of-the-art performance on benchmark fine-grained classification datasets CUB Birds, Oxford Flowers and Oxford-IIIT Pets using the VGG-19 deep network. The method is network agnostic and can be used for any similar classification tasks.
Several approaches to causal inference from observational studies have been proposed. Since the proposal of Rubin (1974) many works have developed a counterfactual approach to causality, statistically formalized by potential outcomes. Pearl (2000) has put forward a theory of structural causal models which gives an important role to graphical models and do not necessarily use potential outcomes. On the other hand, several authors have developed a dynamical approach in line with Granger (1969). We analyze prospective and retrospective causal questions and their different modalities. Following Dawid (2000) we develop criticisms about the potential outcome approach and we show that causal effects can be estimated without potential outcomes: in particular direct computation of the marginal effect can be done by a change of probability measure. Finally, we highlight the need to adopt a dynamic approach to causality through two examples, ‘truncation by death’ and the ‘obesity paradox’.
Scientific Computing relies on executing computer algorithms coded in some programming languages. Given a particular available hardware, algorithms speed is a crucial factor. There are many scientific computing environments used to code such algorithms. Matlab is one of the most tremendously successful and widespread scientific computing environments that is rich of toolboxes, libraries, and data visualization tools. OpenCV is a (C++)-based library written primarily for Computer Vision and its related areas. This paper presents a comparative study using 20 different real datasets to compare the speed of Matlab and OpenCV for some Machine Learning algorithms. Although Matlab is more convenient in developing and data presentation, OpenCV is much faster in execution, where the speed ratio reaches more than 80 in some cases. The best of two worlds can be achieved by exploring using Matlab or similar environments to select the most successful algorithm; then, implementing the selected algorithm using OpenCV or similar environments to gain a speed factor.
Urban dispersal events are processes where an unusually large number of people leave the same area in a short period. Early prediction of dispersal events is important in mitigating congestion and safety risks and making better dispatching decisions for taxi and ride-sharing fleets. Existing work mostly focuses on predicting taxi demand in the near future by learning patterns from historical data. However, they fail in case of abnormality because dispersal events with abnormally high demand are non-repetitive and violate common assumptions such as smoothness in demand change over time. Instead, in this paper we argue that dispersal events follow a complex pattern of trips and other related features in the past, which can be used to predict such events. Therefore, we formulate the dispersal event prediction problem as a survival analysis problem. We propose a two-stage framework (DILSA), where a deep learning model combined with survival analysis is developed to predict the probability of a dispersal event and its demand volume. We conduct extensive case studies and experiments on the NYC Yellow taxi dataset from 2014-2016. Results show that DILSA can predict events in the next 5 hours with F1-score of 0.7 and with average time error of 18 minutes. It is orders of magnitude better than the state-ofthe-art deep learning approaches for taxi demand prediction.
The inner product operation between tensors is the corner stone of deep neural network architectures, directly inherited from linear algebra. There is a striking contrast between the unicity of this basic construct and the extreme diversity of high level constructs which have been invented to address various application domains. This paper is interested in an intermediate construct, convolution, and its corollary, attention, which have become ubiquitous in many applications, but are still presented in an ad-hoc fashion depending on the application context. We first identify the common problem addressed by most existing forms of convolution, and show how the solution to that problem naturally involves another very generic operation of linear algebra, the outer product between tensors. We then proceed to show that attention is a form of convolution, called ‘content based’ convolution, hence amenable to the generic formulation based on the outer product. The reader looking for yet another architecture yielding better performance results on a specific task is in for some disappointment. The reader aiming at a better, more grounded understanding of familiar concepts may find food for thought.