# If you did not already know

Modular, Optimal Learning Testing Environment (MOLTE)
We address the relative paucity of empirical testing of learning algorithms (of any type) by introducing a new public-domain, Modular, Optimal Learning Testing Environment (MOLTE) for Bayesian ranking and selection problem, stochastic bandits or sequential experimental design problems. The Matlab-based simulator allows the comparison of a number of learning policies (represented as a series of .m modules) in the context of a wide range of problems (each represented in its own .m module) which makes it easy to add new algorithms and new test problems. State-of-the-art policies and various problem classes are provided in the package. The choice of problems and policies is guided through a spreadsheet-based interface. Different graphical metrics are included. MOLTE is designed to be compatible with parallel computing to scale up from local desktop to clusters and clouds. We offer MOLTE as an easy-to-use tool for the research community that will make it possible to perform much more comprehensive testing, spanning a broader selection of algorithms and test problems. We demonstrate the capabilities of MOLTE through a series of comparisons of policies on a starter library of test problems. We also address the problem of tuning and constructing priors that have been largely overlooked in optimal learning literature. We envision MOLTE as a modest spur to provide researchers an easy environment to study interesting questions involved in optimal learning. …

Beholder-GAN
Beauty is in the eye of the beholder. This maxim, emphasizing the subjectivity of the perception of beauty, has enjoyed a wide consensus since ancient times. In the digitalera, data-driven methods have been shown to be able to predict human-assigned beauty scores for facial images. In this work, we augment this ability and train a generative model that generates faces conditioned on a requested beauty score. In addition, we show how this trained generator can be used to beautify an input face image. By doing so, we achieve an unsupervised beautification model, in the sense that it relies on no ground truth target images. …

Multiple Graph Optimized Convolutional Network (M-GOCN)
Graph Convolutional Networks (GCNs) have been widely studied for graph data representation and learning tasks. Existing GCNs generally use a fixed single graph which may lead to weak suboptimal for data representation/learning and are also hard to deal with multiple graphs. To address these issues, we propose a novel Graph Optimized Convolutional Network (GOCN) for graph data representation and learning. Our GOCN is motivated based on our re-interpretation of graph convolution from a regularization/optimization framework. The core idea of GOCN is to formulate graph optimization and graph convolutional representation into a unified framework and thus conducts both of them cooperatively to boost their respective performance in GCN learning scheme. Moreover, based on the proposed unified graph optimization-convolution framework, we propose a novel Multiple Graph Optimized Convolutional Network (M-GOCN) to naturally address the data with multiple graphs. Experimental results demonstrate the effectiveness and benefit of the proposed GOCN and M-GOCN. …

# If you did not already know

Hierarchical Planning and Reinforcement Learning (HIP-RL)
Long-term planning poses a major difficulty to many reinforcement learning algorithms. This problem becomes even more pronounced in dynamic visual environments. In this work we propose Hierarchical Planning and Reinforcement Learning (HIP-RL), a method for merging the benefits and capabilities of Symbolic Planning with the learning abilities of Deep Reinforcement Learning. We apply HIPRL to the complex visual tasks of interactive question answering and visual semantic planning and achieve state-of-the-art results on three challenging datasets all while taking fewer steps at test time and training in fewer iterations. Sample results can be found at youtu.be/0TtWJ_0mPfI …

Definition Extraction Tool (DefExt)
We present DefExt, an easy to use semi supervised Definition Extraction Tool. DefExt is designed to extract from a target corpus those textual fragments where a term is explicitly mentioned together with its core features, i.e. its definition. It works on the back of a Conditional Random Fields based sequential labeling algorithm and a bootstrapping approach. Bootstrapping enables the model to gradually become more aware of the idiosyncrasies of the target corpus. In this paper we describe the main components of the toolkit as well as experimental results stemming from both automatic and manual evaluation. We release DefExt as open source along with the necessary files to run it in any Unix machine. We also provide access to training and test data for immediate use. …

Acumos
Applying Machine Learning (ML) to business applications for automation usually faces difficulties when integrating diverse ML dependencies and services, mainly because of the lack of a common ML framework. In most cases, the ML models are developed for applications which are targeted for specific business domain use cases, leading to duplicated effort, and making reuse impossible. This paper presents Acumos, an open platform capable of packaging ML models into portable containerized microservices which can be easily shared via the platform’s catalog, and can be integrated into various business applications. We present a case study of packaging sentiment analysis and classification ML models via the Acumos platform, permitting easy sharing with others. We demonstrate that the Acumos platform reduces the technical burden on application developers when applying machine learning models to their business applications. Furthermore, the platform allows the reuse of readily available ML microservices in various business domains. …

DeceptionNet
We present a novel approach to tackle domain adaptation between synthetic and real data. Instead of employing ‘blind’ domain randomization, i.e. augmenting synthetic renderings with random backgrounds or changing illumination and colorization, we leverage the task network as its own adversarial guide towards useful augmentations that maximize the uncertainty of the output. To this end, we design a min-max optimization scheme where a given task competes against a special deception network, with the goal of minimizing the task error subject to specific constraints enforced by the deceiver. The deception network samples from a family of differentiable pixel-level perturbations and exploits the task architecture to find the most destructive augmentations. Unlike GAN-based approaches that require unlabeled data from the target domain, our method achieves robust mappings that scale well to multiple target distributions from source data alone. We apply our framework to the tasks of digit recognition on enhanced MNIST variants as well as classification and object pose estimation on the Cropped LineMOD dataset and compare to a number of domain adaptation approaches, demonstrating similar results with superior generalization capabilities. …

# If you did not already know

Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm which has the high-performance rate for dataset where clusters have the constant density of data points. One of the significant attributes of this algorithm is noise cancellation. However, DBSCAN demonstrates reduced performances for clusters with different densities. Therefore, in this paper, an adaptive DBSCAN is proposed which can work significantly well for identifying clusters with varying densities. …

Imitation Learning
Learning from Demonstration’: Imitation learning, a.k.a behavioral cloning, is learning from demonstration. In other words, in imitation learning, a machine learns how to behave by looking at what a teacher (or expert) does and then mimics that behavior. An example can be when we collect driving data from human and then use that data for a self driving car.
Imitation Learning in Tensorflow

LP-3DCNN
Traditional 3D Convolutional Neural Networks (CNNs) are computationally expensive, memory intensive, prone to overfit, and most importantly, there is a need to improve their feature learning capabilities. To address these issues, we propose Rectified Local Phase Volume (ReLPV) block, an efficient alternative to the standard 3D convolutional layer. The ReLPV block extracts the phase in a 3D local neighborhood (e.g., 3x3x3) of each position of the input map to obtain the feature maps. The phase is extracted by computing 3D Short Term Fourier Transform (STFT) at multiple fixed low frequency points in the 3D local neighborhood of each position. These feature maps at different frequency points are then linearly combined after passing them through an activation function. The ReLPV block provides significant parameter savings of at least, 3^3 to 13^3 times compared to the standard 3D convolutional layer with the filter sizes 3x3x3 to 13x13x13, respectively. We show that the feature learning capabilities of the ReLPV block are significantly better than the standard 3D convolutional layer. Furthermore, it produces consistently better results across different 3D data representations. We achieve state-of-the-art accuracy on the volumetric ModelNet10 and ModelNet40 datasets while utilizing only 11% parameters of the current state-of-the-art. We also improve the state-of-the-art on the UCF-101 split-1 action recognition dataset by 5.68% (when trained from scratch) while using only 15% of the parameters of the state-of-the-art. The project webpage is available at https://…/home.

DAG Variational Autoencoder (D-VAE)
Graph structured data are abundant in the real world. Among different graph types, directed acyclic graphs (DAGs) are of particular interests to machine learning researchers, as many machine learning models are realized as computations on DAGs, including neural networks and Bayesian networks. In this paper, we study deep generative models for DAGs, and propose a novel DAG variational autoencoder (D-VAE). To encode DAGs into the latent space, we leverage graph neural networks. We propose a DAG-style asynchronous message passing scheme that allows encoding the computations defined by DAGs, rather than using existing simultaneous message passing schemes to encode the graph structures. We demonstrate the effectiveness of our proposed D-VAE through two tasks: neural architecture search and Bayesian network structure learning. Experiments show that our model not only generates novel and valid DAGs, but also produces a smooth latent space that facilitates searching for DAGs with better performance through Bayesian optimization. …

# If you did not already know

SGD (Stochastic Gradient Descent) is a popular algorithm for large scale optimization problems due to its low iterative cost. However, SGD can not achieve linear convergence rate as FGD (Full Gradient Descent) because of the inherent gradient variance. To attack the problem, mini-batch SGD was proposed to get a trade-off in terms of convergence rate and iteration cost. In this paper, a general CVI (Convergence-Variance Inequality) equation is presented to state formally the interaction of convergence rate and gradient variance. Then a novel algorithm named SSAG (Stochastic Stratified Average Gradient) is introduced to reduce gradient variance based on two techniques, stratified sampling and averaging over iterations that is a key idea in SAG (Stochastic Average Gradient). Furthermore, SSAG can achieve linear convergence rate of $\mathcal {O}((1-\frac{\mu}{8CL})^k)$ at smaller storage and iterative costs, where $C\geq 2$ is the category number of training data. This convergence rate depends mainly on the variance between classes, but not on the variance within the classes. In the case of $C\ll N$ ($N$ is the training data size), SSAG’s convergence rate is much better than SAG’s convergence rate of $\mathcal {O}((1-\frac{\mu}{8NL})^k)$. Our experimental results show SSAG outperforms SAG and many other algorithms. …

BitSplit-Net
Significant computational cost and memory requirements for deep neural networks (DNNs) make it difficult to utilize DNNs in resource-constrained environments. Binary neural network (BNN), which uses binary weights and binary activations, has been gaining interests for its hardware-friendly characteristics and minimal resource requirement. However, BNN usually suffers from accuracy degradation. In this paper, we introduce ‘BitSplit-Net’, a neural network which maintains the hardware-friendly characteristics of BNN while improving accuracy by using multi-bit precision. In BitSplit-Net, each bit of multi-bit activations propagates independently throughout the network before being merged at the end of the network. Thus, each bit path of the BitSplit-Net resembles BNN and hardware friendly features of BNN, such as bitwise binary activation function, are preserved in our scheme. We demonstrate that the BitSplit version of LeNet-5, VGG-9, AlexNet, and ResNet-18 can be trained to have similar classification accuracy at a lower computational cost compared to conventional multi-bit networks with low bit precision (<= 4-bit). We further evaluate BitSplit-Net on GPU with custom CUDA kernel, showing that BitSplit-Net can achieve better hardware performance in comparison to conventional multi-bit networks. …

Vehicle Transfer Generative Adversarial Network (VTGAN)
Vehicle re-identification (reID) is to identify a target vehicle in different cameras with non-overlapping views. When deploy the well-trained model to a new dataset directly, there is a severe performance drop because of differences among datasets named domain bias. To address this problem, this paper proposes an domain adaptation framework which contains an image-to-image translation network named vehicle transfer generative adversarial network (VTGAN) and an attention-based feature learning network (ATTNet). VTGAN could make images from the source domain (well-labeled) have the style of target domain (unlabeled) and preserve identity information of source domain. To further improve the domain adaptation ability for various backgrounds, ATTNet is proposed to train generated images with the attention structure for vehicle reID. Comprehensive experimental results clearly demonstrate that our method achieves excellent performance on VehicleID dataset. …

Condition Monitoring (CM)
Condition monitoring (or, colloquially, CM) is the process of monitoring a parameter of condition in machinery (vibration, temperature etc.), in order to identify a significant change which is indicative of a developing fault. It is a major component of “Predictive Maintenance”. The use of condition monitoring allows maintenance to be scheduled, or other actions to be taken to prevent failure and avoid its consequences. Condition monitoring has a unique benefit in that conditions that would shorten normal lifespan can be addressed before they develop into a major failure. Condition monitoring techniques are normally used on rotating equipment and other machinery (pumps, electric motors, internal combustion engines, presses), while periodic inspection using non-destructive testing techniques and fit for service (FFS) evaluation are used for stationary plant equipment such as steam boilers, piping and heat exchangers.
http://…/9781466584051

# If you did not already know

Skip-GANomaly
Despite inherent ill-definition, anomaly detection is a research endeavor of great interest within machine learning and visual scene understanding alike. Most commonly, anomaly detection is considered as the detection of outliers within a given data distribution based on some measure of normality. The most significant challenge in real-world anomaly detection problems is that available data is highly imbalanced towards normality (i.e. non-anomalous) and contains a most a subset of all possible anomalous samples – hence limiting the use of well-established supervised learning methods. By contrast, we introduce an unsupervised anomaly detection model, trained only on the normal (non-anomalous, plentiful) samples in order to learn the normality distribution of the domain and hence detect abnormality based on deviation from this model. Our proposed approach employs an encoder-decoder convolutional neural network with skip connections to thoroughly capture the multi-scale distribution of the normal data distribution in high-dimensional image space. Furthermore, utilizing an adversarial training scheme for this chosen architecture provides superior reconstruction both within high-dimensional image space and a lower-dimensional latent vector space encoding. Minimizing the reconstruction error metric within both the image and hidden vector spaces during training aids the model to learn the distribution of normality as required. Higher reconstruction metrics during subsequent test and deployment are thus indicative of a deviation from this normal distribution, hence indicative of an anomaly. Experimentation over established anomaly detection benchmarks and challenging real-world datasets, within the context of X-ray security screening, shows the unique promise of such a proposed approach. …

What-If Tool
What If… you could inspect a machine learning model, with no coding required? Building effective machine learning systems means asking a lot of questions. It’s not enough to train a model and walk away. Instead, good practitioners act as detectives, probing to understand their model better. But answering these kinds of questions isn’t easy. Probing ‘what if’ scenarios often means writing custom, one-off code to analyze a specific model. Not only is this process inefficient, it makes it hard for non-programmers to participate in the process of shaping and improving machine learning models. For us, making it easier for a broad set of people to examine, evaluate, and debug machine learning systems is a key concern. That’s why we built the What-If Tool. Built into the open-source TensorBoard web application – a standard part of the TensorFlow platform – the tool allows users to analyze an machine learning model without the need for writing any further code. Given pointers to a TensorFlow model and a dataset, the What-If Tool offers an interactive visual interface for exploring model results. …

Merged-Averaged Classifiers via Hashing (MACH)
We present Merged-Averaged Classifiers via Hashing (MACH) for K-classification with ultra-large values of K. Compared to traditional one-vs-all classifiers that require O(Kd) memory and inference cost, MACH only need O(d log K) (d is dimensionality )memory while only requiring O(K log K + d log K) operation for inference. MACH is a generic K-classification algorithm, with provably theoretical guarantees, which requires O(log K) memory without any assumption on the relationship between classes. MACH uses universal hashing to reduce classification with a large number of classes to few independent classification tasks with small (constant) number of classes. We provide theoretical quantification of discriminability-memory tradeoff. With MACH we can train ODP dataset with 100,000 classes and 400,000 features on a single Titan X GPU, with the classification accuracy of 19.28%, which is the best-reported accuracy on this dataset. Before this work, the best performing baseline is a one-vs-all classifier that requires 40 billion parameters (160 GB model size) and achieves 9% accuracy. In contrast, MACH can achieve 9% accuracy with 480x reduction in the model size (of mere 0.3GB). With MACH, we also demonstrate complete training of fine-grained imagenet dataset (compressed size 104GB), with 21,000 classes, on a single GPU. To the best of our knowledge, this is the first work to demonstrate complete training of these extreme-class datasets on a single Titan X. …

Spatially-Preserved Doubly-Injected Object Detection CNN (S-DOD-CNN)
We present a novel event recognition approach called Spatially-preserved Doubly-injected Object Detection CNN (S-DOD-CNN), which incorporates the spatially preserved object detection information in both a direct and an indirect way. Indirect injection is carried out by simply sharing the weights between the object detection modules and the event recognition module. Meanwhile, our novelty lies in the fact that we have preserved the spatial information for the direct injection. Once multiple regions-of-intereset (RoIs) are acquired, their feature maps are computed and then projected onto a spatially-preserving combined feature map using one of the four RoI Projection approaches we present. In our architecture, combined feature maps are generated for object detection which are directly injected to the event recognition module. Our method provides the state-of-the-art accuracy for malicious event recognition. …

# If you did not already know

Tarjan’s Strongly Connected Components Algorithm
Tarjan’s Algorithm (named for its discoverer, Robert Tarjan) is a graph theory algorithm for finding the strongly connected components of a graph. Although it precedes it chronologically, it can be seen as an improved version of Kosaraju’s algorithm, and is comparable in efficiency to the path-based strong component algorithm. …

OpenBLAS
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version. …

When observing the actions of others, humans carry out inferences about why the others acted as they did, and what this implies about their view of the world. Humans also use the fact that their actions will be interpreted in this manner when observed by others, allowing them to act informatively and thereby communicate efficiently with others. Although learning algorithms have recently achieved superhuman performance in a number of two-player, zero-sum games, scalable multi-agent reinforcement learning algorithms that can discover effective strategies and conventions in complex, partially observable settings have proven elusive. We present the Bayesian action decoder (BAD), a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief that conditions on the actions taken by all agents in the environment. Together with the public belief, this Bayesian update effectively defines a new Markov decision process, the public belief MDP, in which the action space consists of deterministic partial policies, parameterised by deep neural networks, that can be sampled for a given public state. It exploits the fact that an agent acting only on this public belief state can still learn to use its private information if the action space is augmented to be over partial policies mapping private information into environment actions. The Bayesian update is also closely related to the theory of mind reasoning that humans carry out when observing others’ actions. We first validate BAD on a proof-of-principle two-step matrix game, where it outperforms traditional policy gradient methods. We then evaluate BAD on the challenging, cooperative partial-information card game Hanabi, where in the two-player setting the method surpasses all previously published learning and hand-coded approaches. …

RuSentRel
In this paper we present the RuSentRel corpus including analytical texts in the sphere of international relations. For each document we annotated sentiments from the author to mentioned named entities, and sentiments of relations between mentioned entities. In the current experiments, we considered the problem of extracting sentiment relations between entities for the whole documents as a three-class machine learning task. We experimented with conventional machine-learning methods (Naive Bayes, SVM, Random Forest). …

# If you did not already know

Multi-Directional Recurrent Neural Network (M-RNN)
Missing data is a ubiquitous problem. It is especially challenging in medical settings because many streams of measurements are collected at different – and often irregular – times. Accurate estimation of those missing measurements is critical for many reasons, including diagnosis, prognosis and treatment. Existing methods address this estimation problem by interpolating within data streams or imputing across data streams (both of which ignore important information) or ignoring the temporal aspect of the data and imposing strong assumptions about the nature of the data-generating process and/or the pattern of missing data (both of which are especially problematic for medical data). We propose a new approach, based on a novel deep learning architecture that we call a Multi-directional Recurrent Neural Network (M-RNN) that interpolates within data streams and imputes across data streams. We demonstrate the power of our approach by applying it to five real-world medical datasets. We show that it provides dramatically improved estimation of missing measurements in comparison to 11 state-of-the-art benchmarks (including Spline and Cubic Interpolations, MICE, MissForest, matrix completion and several RNN methods); typical improvements in Root Mean Square Error are between 35% – 50%. Additional experiments based on the same five datasets demonstrate that the improvements provided by our method are extremely robust. …

StackExchange
Existing keyphrase generation studies suffer from the problems of generating duplicate phrases and deficient evaluation based on a fixed number of predicted phrases. We propose a recurrent generative model that generates multiple keyphrases sequentially from a text, with specific modules that promote generation diversity. We further propose two new metrics that consider a variable number of phrases. With both existing and proposed evaluation setups, our model demonstrates superior performance to baselines on three types of keyphrase generation datasets, including two newly introduced in this work: StackExchange and TextWorld ACG. In contrast to previous keyphrase generation approaches, our model generates sets of diverse keyphrases of a variable number. …

CODED
A powerful approach to detecting erroneous data is to check which potentially dirty data records are incompatible with a user’s domain knowledge. Previous approaches allow the user to specify domain knowledge in the form of logical constraints (e.g., functional dependency and denial constraints). We extend the constraint-based approach by introducing a novel class of statistical constraints (SCs). An SC treats each column as a random variable, and enforces an independence or dependence relationship between two (or a few) random variables. Statistical constraints are expressive, allowing the user to specify a wide range of domain knowledge, beyond traditional integrity constraints. Furthermore, they work harmoniously with downstream statistical modeling. We develop CODED, an SC-Oriented Data Error Detection system that supports three key tasks: (1) Checking whether an SC is violated or not on a given dataset, (2) Identify the top-k records that contribute the most to the violation of an SC, and (3) Checking whether a set of input SCs have conflicts or not. We present effective solutions for each task. Experiments on synthetic and real-world data illustrate how SCs apply to error detection, and provide evidence that CODED performs better than state-of-the-art approaches. …

Maler
In this paper, we study adaptive online convex optimization, and aim to design a universal algorithm that achieves optimal regret bounds for multiple common types of loss functions. Existing universal methods are limited in the sense that they are optimal for only a subclass of loss functions. To address this limitation, we propose a novel online method, namely Maler, which enjoys the optimal $O(\sqrt{T})$, $O(d\log T)$ and $O(\log T)$ regret bounds for general convex, exponentially concave, and strongly convex functions respectively. The essential idea is to run multiple types of learning algorithms with different learning rates in parallel, and utilize a meta algorithm to track the best one on the fly. Empirical results demonstrate the effectiveness of our method. …

# If you did not already know

Binary Stochastic Filtering (BSF)
Binary Stochastic Filtering (BSF), the algorithm for feature selection and neuron pruning is proposed in this work. Filtering layer stochastically passes or filters out features based on individual weights, which are tuned during neural network training process. By placing BSF after the neural network input, the filtering of input features is performed, i.e. feature selection. More then 5-fold dimensionality decrease was achieved in the experiments. Placing BSF layer in between hidden layers allows filtering of neuron outputs and could be used for neuron pruning. Up to 34-fold decrease in the number of weights in the network was reached, which corresponds to the significant increase of performance, that is especially important for mobile and embedded applications. …

k-PDTM
Analyzing the sub-level sets of the distance to a compact sub-manifold of R d is a common method in TDA to understand its topology. The distance to measure (DTM) was introduced by Chazal, Cohen-Steiner and M{\’e}rigot in [7] to face the non-robustness of the distance to a compact set to noise and outliers. This function makes possible the inference of the topology of a compact subset of R d from a noisy cloud of n points lying nearby in the Wasserstein sense. In practice, these sub-level sets may be computed using approximations of the DTM such as the q-witnessed distance [10] or other power distance [6]. These approaches lead eventually to compute the homology of unions of n growing balls, that might become intractable whenever n is large. To simultaneously face the two problems of large number of points and noise, we introduce the k-power distance to measure (k-PDTM). This new approximation of the distance to measure may be thought of as a k-coreset based approximation of the DTM. Its sublevel sets consist in union of k-balls, k << n, and this distance is also proved robust to noise. We assess the quality of this approximation for k possibly dramatically smaller than n, for instance k = n 1 3 is proved to be optimal for 2-dimensional shapes. We also provide an algorithm to compute this k-PDTM. …

Iris
Today’s conversational agents are restricted to simple standalone commands. In this paper, we present Iris, an agent that draws on human conversational strategies to combine commands, allowing it to perform more complex tasks that it has not been explicitly designed to support: for example, composing one command to ‘plot a histogram’ with another to first ‘log-transform the data’. To enable this complexity, we introduce a domain specific language that transforms commands into automata that Iris can compose, sequence, and execute dynamically by interacting with a user through natural language, as well as a conversational type system that manages what kinds of commands can be combined. We have designed Iris to help users with data science tasks, a domain that requires support for command combination. In evaluation, we find that data scientists complete a predictive modeling task significantly faster (2.6 times speedup) with Iris than a modern non-conversational programming environment. Iris supports the same kinds of commands as today’s agents, but empowers users to weave together these commands to accomplish complex goals. …

Evolving Graph Convolutional Network (EvolveGCN)
Graph representation learning resurges as a trending research subject owing to the widespread use of deep learning for Euclidean data, which inspire various creative designs of neural networks in the non-Euclidean domain, particularly graphs. With the success of these graph neural networks (GNN) in the static setting, we approach further practical scenarios where the graph dynamically evolves. For this case, combining the GNN with a recurrent neural network (RNN, broadly speaking) is a natural idea. Existing approaches typically learn one single graph model for all the graphs, by using the RNN to capture the dynamism of the output node embeddings and to implicitly regulate the graph model. In this work, we propose a different approach, coined EvolveGCN, that uses the RNN to evolve the graph model itself over time. This model adaptation approach is model oriented rather than node oriented, and hence is advantageous in the flexibility on the input. For example, in the extreme case, the model can handle at a new time step, a completely new set of nodes whose historical information is unknown, because the dynamism has been carried over to the GNN parameters. We evaluate the proposed approach on tasks including node classification, edge classification, and link prediction. The experimental results indicate a generally higher performance of EvolveGCN compared with related approaches. …

# If you did not already know

Distilled-Exposition Enhanced Matching Network (DEMN)
This paper proposes a Distilled-Exposition Enhanced Matching Network (DEMN) for story-cloze test, which is still a challenging task in story comprehension. We divide a complete story into three narrative segments: an \textit{exposition}, a \textit{climax}, and an \textit{ending}. The model consists of three modules: input module, matching module, and distillation module. The input module provides semantic representations for the three segments and then feeds them into the other two modules. The matching module collects interaction features between the ending and the climax. The distillation module distills the crucial semantic information in the exposition and infuses it into the matching module in two different ways. We evaluate our single and ensemble model on ROCStories Corpus \cite{Mostafazadeh2016ACA}, achieving an accuracy of 80.1\% and 81.2\% on the test set respectively. The experimental results demonstrate that our DEMN model achieves a state-of-the-art performance. …

Truncated Variance Reduction (TruVaR)
We present a new algorithm, truncated variance reduction (TruVaR), that treats Bayesian optimization (BO) and level-set estimation (LSE) with Gaussian processes in a unified fashion. The algorithm greedily shrinks a sum of truncated variances within a set of potential maximizers (BO) or unclassified points (LSE), which is updated based on confidence bounds. TruVaR is effective in several important settings that are typically non-trivial to incorporate into myopic algorithms, including pointwise costs and heteroscedastic noise. We provide a general theoretical guarantee for TruVaR covering these aspects, and use it to recover and strengthen existing results on BO and LSE. Moreover, we provide a new result for a setting where one can select from a number of noise levels having associated costs. We demonstrate the effectiveness of the algorithm on both synthetic and real-world data sets. …

Proportional Degree
Several algorithms have been proposed to filter information on a complete graph of correlations across stocks to build a stock-correlation network. Among them the planar maximally filtered graph (PMFG) algorithm uses $3n-6$ edges to build a graph whose features include a high frequency of small cliques and a good clustering of stocks. We propose a new algorithm which we call proportional degree (PD) to filter information on the complete graph of normalised mutual information (NMI) across stocks. Our results show that the PD algorithm produces a network showing better homogeneity with respect to cliques, as compared to economic sectoral classification than its PMFG counterpart. We also show that the partition of the PD network obtained through normalised spectral clustering (NSC) agrees better with the NSC of the complete graph than the corresponding one obtained from PMFG. Finally, we show that the clusters in the PD network are more robust with respect to the removal of random sets of edges than those in the PMFG network. …

Non-Stationary Streaming PCA
We consider the problem of streaming principal component analysis (PCA) when the observations are noisy and generated in a non-stationary environment. Given $T$, $p$-dimensional noisy observations sampled from a non-stationary variant of the spiked covariance model, our goal is to construct the best linear $k$-dimensional subspace of the terminal observations. We study the effect of non-stationarity by establishing a lower bound on the number of samples and the corresponding recovery error obtained by any algorithm. We establish the convergence behaviour of the noisy power method using a novel proof technique which maybe of independent interest. We conclude that the recovery guarantee of the noisy power method matches the fundamental limit, thereby generalizing existing results on streaming PCA to a non-stationary setting. …

# If you did not already know

Residual Sum of Squares (RSS, SSR, SSE)
In statistics, the residual sum of squares (RSS) is the sum of squares of residuals. It is also known as the sum of squared residuals (SSR) or the sum of squared errors of prediction (SSE). It is a measure of the discrepancy between the data and an estimation model. A small RSS indicates a tight fit of the model to the data.
In general, total sum of squares = explained sum of squares + residual sum of squares. For a proof of this in the multivariate ordinary least squares (OLS) case, see partitioning in the general OLS model. …

Inferential Model (IM)
Probability is a useful tool for describing uncertainty, so it is natural to strive for a system of statistical inference based on probabilities for or against various hypotheses. But existing probabilistic inference methods struggle to provide a meaningful interpretation of the probabilities across experiments in sufficient generality. In this paper we further develop a promising new approach based on what are called inferential models (IMs). The fundamental idea behind IMs is that there is an unobservable auxiliary variable that itself describes the inherent uncertainty about the parameter of interest, and that posterior probabilistic inference can be accomplished by predicting this unobserved quantity. We describe a simple and intuitive threestep construction of a random set of candidate parameter values, each being consistent with the model, the observed data, and a auxiliary variable prediction. Then prior-free posterior summaries of the available statistical evidence for and against a hypothesis of interest are obtained by calculating the probability that this random set falls completely in and completely out of the hypothesis, respectively. We prove that these IM-based measures of evidence are calibrated in a frequentist sense, showing that IMs give easily-interpretable results both within and across experiments. …

Golomb Ruler Problem
The Golomb ruler problem is defined as follows: Given a positive integer n, locate n marks on a ruler such that the distance between any two distinct pair of marks are different from each other and the total length of the ruler is minimized. The Golomb ruler problem has applications in information theory, astronomy and communications, and it can be seen as a challenge for combinatorial optimization algorithms. Although constructing high quality rulers is well-studied, proving optimality is a far more challenging task. …

Origraph
Data wrangling is widely acknowledged to be a critical part of the data analysis pipeline. Nevertheless, there are currently no techniques to efficiently wrangle network datasets. Here we introduce a set of interaction techniques that enable analysts to carry out complex network wrangling operations. These operations include deriving attributes across connected classes, converting nodes to edges and vice-versa, and faceting nodes and edges based on attributes. We implement these operations in a web-based, open-source system, Origraph, which provides interfaces to execute the operations and investigate the results. Designed for wrangling, rather than analysis, Origraph can be used to load data in many forms, wrangle and transform the network, and export it in formats compatible with common network visualization tools. We demonstrate Origraph’s usefulness in a series of examples with different datasets from a variety of sources. …