Whittemore: An embedded domain specific language for causal programming

This paper introduces Whittemore, a language for causal programming. Causal programming is based on the theory of structural causal models and consists of two primary operations: identification, which finds formulas that compute causal queries, and estimation, which applies formulas to transform probability distributions to other probability distribution. Causal programming provides abstractions to declare models, queries, and distributions with syntax similar to standard mathematical notation, and conducts rigorous causal inference, without requiring detailed knowledge of the underlying algorithms. Examples of causal inference with real data are provided, along with discussion of the implementation and possibilities for future extension.

State representation learning with recurrent capsule networks

Unsupervised learning of compact and relevant state representations has been proved very useful at solving complex reinforcement learning tasks. In this paper, we propose a recurrent capsule network that learns such representations by trying to predict the future observations in an agent’s trajectory.

Dynamic Planning Networks

We introduce Dynamic Planning Networks (DPN), a novel architecture for deep reinforcement learning, that combines model-based and model-free aspects for online planning. Our architecture learns to dynamically construct plans using a learned state-transition model by selecting and traversing between simulated states and actions to maximize valuable information before acting. In contrast to model-free methods, model-based planning lets the agent efficiently test action hypotheses without performing costly trial-and-error in the environment. DPN learns to efficiently form plans by expanding a single action-conditional state transition at a time instead of exhaustively evaluating each action, reducing the required number of state-transitions during planning by up to 96%. We observe various emergent planning patterns used to solve environments, including classical search methods such as breadth-first and depth-first search. Learning To Plan shows improved data efficiency, performance, and generalization to new and unseen domains in comparison to several baselines.

Dynamic Models with Robust Decision Makers: Identification and Estimation

This paper studies identification and estimation of a class of dynamic models in which the decision maker (DM) is uncertain about the data-generating process. The DM maximizes his or her continuation value under a worst-case model which lies within a nonparametric neighborhood of a benchmark model. The DM’s benchmark model and preference parameters are jointly underidentified. With the DM’s benchmark model fixed, primitive conditions are established for nonparametric identification of the worst-case model and local identification of the DM’s preference parameters. The key step in the identification analysis is to establish existence and uniqueness of the DM’s continuation value function allowing for unbounded statespace and unbounded utilities, both of which are important in applications. To do so, we derive new fixed-point results which use monotonicity and convexity of the value function recursion and which are embedded within a Banach space of ‘thin-tailed’ functions that arises naturally from the structure of recursion. The fixed-point results are quite general and are also applied to models where the DM learns about a hidden state and Rust-type dynamic discrete choice models. A perturbation result is derived which provides a necessary and sufficient condition for consistent estimation of continuation values and the worst-case model. A robust consumption-investment problem is studied as an empirical application and some connections are drawn with the literature on macroeconomic uncertainty.

Towards Finding Non-obvious Papers: An Analysis of Citation Recommender Systems

As science advances, the academic community has published millions of research papers. Researchers devote time and effort to search relevant manuscripts when writing a paper or simply to keep up with current research. In this paper, we consider the problem of citation recommendation by extending a set of known-to-be-relevant references. Our analysis shows the degrees of cited papers in the subgraph induced by the citations of a paper, called projection graph, follow a power law distribution. Existing popular methods are only good at finding the long tail papers, the ones that are highly connected to others. In other words, the majority of cited papers are loosely connected in the projection graph but they are not going to be found by existing methods. To address this problem, we propose to combine author, venue and keyword information to interpret the citation behavior behind those loosely connected papers. Results show that different methods are finding cited papers with widely different properties. We suggest multiple recommended lists by different algorithms could satisfy various users for a real citation recommendation system. Moreover, we also explore the fast local approximation for combined methods in order to improve the efficiency.

A Two-Phase Dynamic Throughput Optimization Model for Big Data Transfers

The amount of data moved over dedicated and non-dedicated network links increases much faster than the increase in the network capacity, but the current solutions fail to guarantee even the promised achievable transfer throughputs. In this paper, we propose a novel dynamic throughput optimization model based on mathematical modeling with offline knowledge discovery/analysis and adaptive online decision making. In offline analysis, we mine historical transfer logs to perform knowledge discovery about the transfer characteristics. Online phase uses the discovered knowledge from the offline analysis along with real-time investigation of the network condition to optimize the protocol parameters. As real-time investigation is expensive and provides partial knowledge about the current network status, our model uses historical knowledge about the network and data to reduce the real-time investigation overhead while ensuring near optimal throughput for each transfer. Our novel approach is tested over different networks with different datasets and outperformed its closest competitor by 1.7x and the default case by 5x. It also achieved up to 93% accuracy compared with the optimal achievable throughput possible on those networks.

A Real-time Robust Low-Frequency Oscillation Detection and Analysis (LFODA) System with Innovative Ensemble Filtering

Low-frequency oscillations are hazardous to power system operation, which can lead to cascading failures if not detected and mitigated in a timely manner. This paper presents a robust and automated real-time monitoring system for detecting grid oscillations and analyzing their mode shapes using PMU measurements. A novel Extended Kalman Filtering (EKF) based approach is introduced to detect and analyze oscillations. To further improve the accuracy and efficiency of the presented software system, it takes advantages of three effective signal processing methods (including Prony’s Method, Hankel Total Least Square (HTLS) Method, EKF) and adopts a novel voting schema to significantly reduce the computation cost. Results from these methods are processed through a time-series filter to ensure the consistency of detected oscillations and reduce the number of false alarms. The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) method is used to accurately classify oscillation modes and the PMU measurement channels. The LFODA system has been functioning well in the State Grid Jiangsu Electric Power Company with 176 PMUs and 1000+ channels since Feb. 2018, demonstrating outstanding performance in reducing false alarms with much less computational cost.

Weakly-Supervised Hierarchical Text Classification

Hierarchical text classification, which aims to classify text documents into a given hierarchy, is an important task in many real-world applications. Recently, deep neural models are gaining increasing popularity for text classification due to their expressive power and minimum requirement for feature engineering. However, applying deep neural networks for hierarchical text classification remains challenging, because they heavily rely on a large amount of training data and meanwhile cannot easily determine appropriate levels of documents in the hierarchical setting. In this paper, we propose a weakly-supervised neural method for hierarchical text classification. Our method does not require a large amount of training data but requires only easy-to-provide weak supervision signals such as a few class-related documents or keywords. Our method effectively leverages such weak supervision signals to generate pseudo documents for model pre-training, and then performs self-training on real unlabeled data to iteratively refine the model. During the training process, our model features a hierarchical neural structure, which mimics the given hierarchy and is capable of determining the proper levels for documents with a blocking mechanism. Experiments on three datasets from different domains demonstrate the efficacy of our method compared with a comprehensive set of baselines.

End-to-end neural relation extraction using deep biaffine attention

We propose a neural network model for joint extraction of named entities and relations between them, without any hand-crafted features. The key contribution of our model is to extend a BiLSTM-CRF-based entity recognition model with a deep biaffine attention layer to model second-order interactions between latent features for relation classification, specifically attending to the role of an entity in a directional relationship. On the benchmark ‘relation and entity recognition’ dataset CoNLL04, experimental results show that our model outperforms previous models, producing new state-of-the-art performances.

Meta Reinforcement Learning with Distribution of Exploration Parameters Learned by Evolution Strategies

In this paper, we propose a novel meta-learning method in a reinforcement learning setting, based on evolution strategies (ES), exploration in parameter space and deterministic policy gradients. ES methods are easy to parallelize, which is desirable for modern training architectures; however, such methods typically require a huge number of samples for effective training. We use deterministic policy gradients during adaptation and other techniques to compensate for the sample-efficiency problem while maintaining the inherent scalability of ES methods. We demonstrate that our method achieves good results compared to gradient-based meta-learning in high-dimensional control tasks in the MuJoCo simulator. In addition, because of gradient-free methods in the meta-training phase, which do not need information about gradients and policies in adaptation training, we predict and confirm our algorithm performs better in tasks that need multi-step adaptation.

Attention-Based Capsule Networks with Dynamic Routing for Relation Extraction

A capsule is a group of neurons, whose activity vector represents the instantiation parameters of a specific type of entity. In this paper, we explore the capsule networks used for relation extraction in a multi-instance multi-label learning framework and propose a novel neural approach based on capsule networks with attention mechanisms. We evaluate our method with different benchmarks, and it is demonstrated that our method improves the precision of the predicted relations. Particularly, we show that capsule networks improve multiple entity pairs relation extraction.

Quantized Guided Pruning for Efficient Hardware Implementations of Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are state-of-the-art in numerous computer vision tasks such as object classification and detection. However, the large amount of parameters they contain leads to a high computational complexity and strongly limits their usability in budget-constrained devices such as embedded devices. In this paper, we propose a combination of a new pruning technique and a quantization scheme that effectively reduce the complexity and memory usage of convolutional layers of CNNs, and replace the complex convolutional operation by a low-cost multiplexer. We perform experiments on the CIFAR10, CIFAR100 and SVHN and show that the proposed method achieves almost state-of-the-art accuracy, while drastically reducing the computational and memory footprints. We also propose an efficient hardware architecture to accelerate CNN operations. The proposed hardware architecture is a pipeline and accommodates multiple layers working at the same time to speed up the inference process.

Escaping local minima with derivative-free methods: a numerical investigation

We apply a state-of-the-art, local derivative-free solver, Py-BOBYQA, to global optimization problems, and propose an algorithmic improvement that is beneficial in this context. Our numerical findings are illustrated on a commonly-used test set of global optimization problems and associated noisy variants, and on hyperparameter tuning for the machine learning test set MNIST. As Py-BOBYQA is a model-based trust-region method, we compare mostly (but not exclusively) with other global optimization methods for which (global) models are important, such as Bayesian optimization and response surface methods; we also consider state-of-the-art representative deterministic and stochastic codes, such as DIRECT and CMA-ES. We find Py-BOBYQA to be competitive with global solvers that are provably designed for finding global optima, for all accuracy/budget regimes, in both smooth and noisy settings. In particular, Py-BOBYQA variants are best performing for smooth and multiplicative noise problems in high-accuracy regimes. As a by-product, some preliminary conclusions can be drawn on the relative performance of the global solvers we have tested with default settings.

Explaining Aggregates for Exploratory Analytics

Analysts wishing to explore multivariate data spaces, typically pose queries involving selection operators, i.e., range or radius queries, which define data subspaces of possible interest and then use aggregation functions, the results of which determine their exploratory analytics interests. However, such aggregate query (AQ) results are simple scalars and as such, convey limited information about the queried subspaces for exploratory analysis. We address this shortcoming aiding analysts to explore and understand data subspaces by contributing a novel explanation mechanism coined XAXA: eXplaining Aggregates for eXploratory Analytics. XAXA’s novel AQ explanations are represented using functions obtained by a three-fold joint optimization problem. Explanations assume the form of a set of parametric piecewise-linear functions acquired through a statistical learning model. A key feature of the proposed solution is that model training is performed by only monitoring AQs and their answers on-line. In XAXA, explanations for future AQs can be computed without any database (DB) access and can be used to further explore the queried data subspaces, without issuing any more queries to the DB. We evaluate the explanation accuracy and efficiency of XAXA through theoretically grounded metrics over real-world and synthetic datasets and query workloads.


Long Short-Term Memory (LSTM) Recurrent Neural networks (RNNs) rely on gating signals, each driven by a function of a weighted sum of at least 3 components: (i) one of an adaptive weight matrix multiplied by the incoming external input vector sequence, (ii) one adaptive weight matrix multiplied by the previous memory/state vector, and (iii) one adaptive bias vector. In effect, they augment the simple Recurrent Neural Networks (sRNNs) structure with the addition of a ‘memory cell’ and the incorporation of at most 3 gating signals. The standard LSTM structure and components encompass redundancy and overly increased parameterization. In this paper, we systemically introduce variants of the LSTM RNNs, referred to as SLIM LSTMs. These variants express aggressively reduced parameterizations to achieve computational saving and/or speedup in (training) performance—while necessarily retaining (validation accuracy) performance comparable to the standard LSTM RNN.

Loss Aversion in Recommender Systems: Utilizing Negative User Preference to Improve Recommendation Quality

Negative user preference is an important context that is not sufficiently utilized by many existing recommender systems. This context is especially useful in scenarios where the cost of negative items is high for the users. In this work, we describe a new recommender algorithm that explicitly models negative user preferences in order to recommend more positive items at the top of recommendation-lists. We build upon existing machine-learning model to incorporate the contextual information provided by negative user preference. With experimental evaluations on two openly available datasets, we show that our method is able to improve recommendation quality: by improving accuracy and at the same time reducing the number of negative items at the top of recommendation-lists. Our work demonstrates the value of the contextual information provided by negative feedback, and can also be extended to signed social networks and link prediction in other networks.

Multivariate Arrival Times with Recurrent Neural Networks for Personalized Demand Forecasting

Access to a large variety of data across a massive population has made it possible to predict customer purchase patterns and responses to marketing campaigns. In particular, accurate demand forecasts for popular products with frequent repeat purchases are essential since these products are one of the main drivers of profits. However, buyer purchase patterns are extremely diverse and sparse on a per-product level due to population heterogeneity as well as dependence in purchase patterns across product categories. Traditional methods in survival analysis have proven effective in dealing with censored data by assuming parametric distributions on inter-arrival times. Distributional parameters are then fitted, typically in a regression framework. On the other hand, neural-network based models take a non-parametric approach to learn relations from a larger functional class. However, the lack of distributional assumptions make it difficult to model partially observed data. In this paper, we model directly the inter-arrival times as well as the partially observed information at each time step in a survival-based approach using Recurrent Neural Networks (RNN) to model purchase times jointly over several products. Instead of predicting a point estimate for inter-arrival times, the RNN outputs parameters that define a distributional estimate. The loss function is the negative log-likelihood of these parameters given partially observed data. This approach allows one to leverage both fully observed data as well as partial information. By externalizing the censoring problem through a log-likelihood loss function, we show that substantial improvements over state-of-the-art machine learning methods can be achieved. We present experimental results based on two open datasets as well as a study on a real dataset from a large retailer.

A General Deep Learning Framework for Structure and Dynamics Reconstruction from Time Series Data

In this work, we present Gumbel Graph Network, a model-free deep learning framework for dynamics learning and network reconstruction from the observed time series data. Our method requires no prior knowledge about underlying dynamics and has shown the state-of-the-art performance in three typical dynamical systems on complex networks.

Partially Non-Recurrent Controllers for Memory-Augmented Neural Networks

Memory-Augmented Neural Networks (MANNs) are a class of neural networks equipped with an external memory, and are reported to be effective for tasks requiring a large long-term memory and its selective use. The core module of a MANN is called a controller, which is usually implemented as a recurrent neural network (RNN) (e.g., LSTM) to enable the use of contextual information in controlling the other modules. However, such an RNN-based controller often allows a MANN to directly solve the given task by using the (small) internal memory of the controller, and prevents the MANN from making the best use of the external memory, thereby resulting in a suboptimally trained model. To address this problem, we present a novel type of RNN-based controller that is partially non-recurrent and avoids the direct use of its internal memory for solving the task, while keeping the ability of using contextual information in controlling the other modules. Our empirical experiments using Neural Turing Machines and Differentiable Neural Computers on the Toy and bAbI tasks demonstrate that the proposed controllers give substantially better results than standard RNN-based controllers.

Optimal User Pairing for Achieving Rate Fairness in Downlink NOMA Networks

In this paper, a downlink non-orthogonal multiple access (NOMA) network is studied. We investigate the problem of jointly optimizing user pairing and beamforming design to maximize the minimum rate among all users. The considered problem belongs to a difficult class of mixed-integer nonconvex optimization programming. We first relax the binary constraints and adopt sequential convex approximation method to solve the relaxed problem, which is guaranteed to converge at least to a locally optimal solution. Numerical results show that the proposed method attains higher rate fairness among users, compared with traditional beamforming solutions, i.e., random pairing NOMA and beamforming systems.

Low-Latency Broadband Analog Aggregation for Federated Edge Learning

The popularity of mobile devices results in the availability of enormous data and computational resources at the network edge. To leverage the data and resources, a new machine learning paradigm, called edge learning, has emerged where learning algorithms are deployed at the edge for providing fast and intelligent services to mobile users. While computing speeds are advancing rapidly, the communication latency is becoming the bottleneck of fast edge learning. To address this issue, this work is focused on designing a low latency multi-access scheme for edge learning. We consider a popular framework, federated edge learning (FEEL), where edge-server and on-device learning are synchronized to train a model without violating user-data privacy. It is proposed that model updates simultaneously transmitted by devices over broadband channels should be analog aggregated ‘over-the-air’ by exploiting the superposition property of a multi-access channel. Thereby, ‘interference’ is harnessed to provide fast implementation of the model aggregation. This results in dramatical latency reduction compared with the traditional orthogonal access (i.e., OFDMA). In this work, the performance of FEEL is characterized targeting a single-cell random network. First, due to power alignment between devices as required for aggregation, a fundamental tradeoff is shown to exist between the update-reliability and the expected update-truncation ratio. This motivates the design of an opportunistic scheduling scheme for FEEL that selects devices within a distance threshold. This scheme is shown using real datasets to yield satisfactory learning performance in the presence of high mobility. Second, both the multi-access latency of the proposed analog aggregation and the OFDMA scheme are analyzed. Their ratio, which quantifies the latency reduction of the former, is proved to scale almost linearly with device population.

CoSpace: Common Subspace Learning from Hyperspectral-Multispectral Correspondences

With a large amount of open satellite multispectral imagery (e.g., Sentinel-2 and Landsat-8), considerable attention has been paid to global multispectral land cover classification. However, its limited spectral information hinders further improving the classification performance. Hyperspectral imaging enables discrimination between spectrally similar classes but its swath width from space is narrow compared to multispectral ones. To achieve accurate land cover classification over a large coverage, we propose a cross-modality feature learning framework, called common subspace learning (CoSpace), by jointly considering subspace learning and supervised classification. By locally aligning the manifold structure of the two modalities, CoSpace linearly learns a shared latent subspace from hyperspectral-multispectral(HS-MS) correspondences. The multispectral out-of-samples can be then projected into the subspace, which are expected to take advantages of rich spectral information of the corresponding hyperspectral data used for learning, and thus leads to a better classification. Extensive experiments on two simulated HSMS datasets (University of Houston and Chikusei), where HS-MS data sets have trade-offs between coverage and spectral resolution, are performed to demonstrate the superiority and effectiveness of the proposed method in comparison with previous state-of-the-art methods.

AIR5: Five Pillars of Artificial Intelligence Research

In this article, we provide and overview of what we consider to be some of the most pressing research questions facing the field of artificial intelligence (AI); as well as its sub-field of computational intelligence (CI). We demarcate these questions using five unique Rs – namely,
(i) Rationalizability,
(ii) Resilience,
(iii) Reproducibility,
(iv) Realism, and
(v) Responsibility.
Just as air serves as the basic element of biological life, the term AIR5 – cumulatively referring to the five aforementioned Rs – is introduced herein to mark some of the basic elements of artificial life (supporting the sustained growth of AI and CI). A brief summary of each of the Rs is presented, highlighting their relevance as pillars of future research in this arena.

Improving forecasting accuracy of time series data using a new ARIMA-ANN hybrid method and empirical mode decomposition

Many applications in different domains produce large amount of time series data. Making accurate forecasting is critical for many decision makers. Various time series forecasting methods exist which use linear and nonlinear models separately or combination of both. Studies show that combining of linear and nonlinear models can be effective to improve forecasting performance. However, some assumptions that those existing methods make, might restrict their performance in certain situations. We provide a new Autoregressive Integrated Moving Average (ARIMA)-Artificial Neural Network(ANN) hybrid method that work in a more general framework. Experimental results show that strategies for decomposing the original data and for combining linear and nonlinear models throughout the hybridization process are key factors in the forecasting performance of the methods. By using appropriate strategies, our hybrid method can be an effective way to improve forecasting accuracy obtained by traditional hybrid methods and also either of the individual methods used separately.

Comparison between DeepESNs and gated RNNs on multivariate time-series prediction

We propose an experimental comparison between Deep Echo State Networks (DeepESNs) and gated Recurrent Neural Networks (RNNs) on multivariate time-series prediction tasks. In particular, we compare reservoir and fully-trained RNNs able to represent signals featured by multiple time-scales dynamics. The analysis is performed in terms of efficiency and prediction accuracy on 4 polyphonic music tasks. Our results show that DeepESN is able to outperform ESN in terms of prediction accuracy and efficiency. Whereas, between fully-trained approaches, Gated Recurrent Units (GRU) outperforms Long Short-Term Memory (LSTM) and simple RNN models in most cases. Overall, DeepESN turned out to be extremely more efficient than others RNN approaches and the best solution in terms of prediction accuracy on 3 out of 4 tasks.