# If you did not already know

Aspect-Based Rating Prediction Model (AspeRa)
We propose a novel end-to-end Aspect-based Rating Prediction model (AspeRa) that estimates user rating based on review texts for the items and at the same time discovers coherent aspects of reviews that can be used to explain predictions or profile users. The AspeRa model uses max-margin losses for joint item and user embedding learning and a dual-headed architecture; it significantly outperforms recently proposed state-of-the-art models such as DeepCoNN, HFT, NARRE, and TransRev on two real world data sets of user reviews. With qualitative examination of the aspects and quantitative evaluation of rating prediction models based on these aspects, we show how aspect embeddings can be used in a recommender system. …

GEP-PG
In continuous action domains, standard deep reinforcement learning algorithms like DDPG suffer from inefficient exploration when facing sparse or deceptive reward problems. Conversely, evolutionary and developmental methods focusing on exploration like novelty search, quality-diversity or goal exploration processes are less sample efficient during exploitation. In this paper, we present the GEP-PG approach, taking the best of both worlds by sequentially combining two variants of a goal exploration process and two variants of DDPG. We study the learning performance of these components and their combination on a low dimensional deceptive reward problem and on the larger Half-Cheetah benchmark. Among other things, we show that DDPG fails on the former and that GEP-PG obtains performance above the state-of-the-art on the latter. …

Backdrop
We introduce backdrop, a flexible and simple-to-implement method, intuitively described as dropout acting only along the backpropagation pipeline. Backdrop is implemented via one or more masking layers which are inserted at specific points along the network. Each backdrop masking layer acts as the identity in the forward pass, but randomly masks parts of the backward gradient propagation. Intuitively, inserting a backdrop layer after any convolutional layer leads to stochastic gradients corresponding to features of that scale. Therefore, backdrop is well suited for problems in which the data have a multi-scale, hierarchical structure. Backdrop can also be applied to problems with non-decomposable loss functions where standard SGD methods are not well suited. We perform a number of experiments and demonstrate that backdrop leads to significant improvements in generalization. …

Sparsified Capsule Network
Capsule network has shown various advantages over convolutional neural network (CNN). It keeps more precise spatial information than CNN and uses equivariance instead of invariance during inference and highly potential to be a new effective tool for visual tasks. However, the current capsule networks have incompatible performance with CNN when facing datasets with background and complex target objects and are lacking in universal and efficient regularization method. We analyze the main reason of the incompatible performance as the conflict between information sensitiveness of capsule network and unreasonably higher activation value distribution of capsules in primary capsule layer. Correspondingly, we propose sparsified capsule network by sparsifying and restraining the activation value of capsules in primary capsule layer to suppress non-informative capsules and highlight discriminative capsules. In the experiments, the sparsified capsule network has achieved better performances on various mainstream datasets. In addition, the proposed sparsifying methods can be seen as a suitable, simple and efficient regularization method that can be generally used in capsule network. …

# If you did not already know

Neuroevolution
Neuroevolution, or neuro-evolution, is a form of artificial intelligence that uses evolutionary algorithms to generate artificial neural networks (ANN), parameters, topology and rules. It is most commonly applied in artificial life, general game playing and evolutionary robotics. The main benefit is that neuroevolution can be applied more widely than supervised learning algorithms, which require a syllabus of correct input-output pairs. In contrast, neuroevolution requires only a measure of a network’s performance at a task. For example, the outcome of a game (i.e. whether one player won or lost) can be easily measured without providing labeled examples of desired strategies. Neuroevolution can be contrasted with conventional deep learning techniques that use gradient descent on a neural network with a fixed topology.
Adaptive Genomic Evolution of Neural Network Topologies (AGENT) for State-to-Action Mapping in Autonomous Agents

Artificial Intelligence principles define social and ethical considerations to develop future AI. They come from research institutes, government organizations and industries. All versions of AI principles are with different considerations covering different perspectives and making different emphasis. None of them can be considered as complete and can cover the rest AI principle proposals. Here we introduce LAIP, an effort and platform for linking and analyzing different Artificial Intelligence Principles. We want to explicitly establish the common topics and links among AI Principles proposed by different organizations and investigate on their uniqueness. Based on these efforts, for the long-term future of AI, instead of directly adopting any of the AI principles, we argue for the necessity of incorporating various AI Principles into a comprehensive framework and focusing on how they can interact and complete each other. …

K-Quantiles Clustering
A new cluster analysis method, $K$-quantiles clustering, is introduced. $K$-quantiles clustering can be computed by a simple greedy algorithm in the style of the classical Lloyd’s algorithm for $K$-means. It can be applied to large and high-dimensional datasets. It allows for within-cluster skewness and internal variable scaling based on within-cluster variation. Different versions allow for different levels of parsimony and computational efficiency. Although $K$-quantiles clustering is conceived as nonparametric, it can be connected to a fixed partition model of generalized asymmetric Laplace-distributions. The consistency of $K$-quantiles clustering is proved, and it is shown that $K$-quantiles clusters correspond to well separated mixture components in a nonparametric mixture. In a simulation, $K$-quantiles clustering is compared with a number of popular clustering methods with good results. A high-dimensional microarray dataset is clustered by $K$-quantiles. …

The prevalence of networked sensors and actuators in many real-world systems such as smart buildings, factories, power plants, and data centers generate substantial amounts of multivariate time series data for these systems. The rich sensor data can be continuously monitored for intrusion events through anomaly detection. However, conventional threshold-based anomaly detection methods are inadequate due to the dynamic complexities of these systems, while supervised machine learning methods are unable to exploit the large amounts of data due to the lack of labeled data. On the other hand, current unsupervised machine learning approaches have not fully exploited the spatial-temporal correlation and other dependencies amongst the multiple variables (sensors/actuators) in the system for detecting anomalies. In this work, we propose an unsupervised multivariate anomaly detection method based on Generative Adversarial Networks (GANs). Instead of treating each data stream independently, our proposed MAD-GAN framework considers the entire variable set concurrently to capture the latent interactions amongst the variables. We also fully exploit both the generator and discriminator produced by the GAN, using a novel anomaly score called DR-score to detect anomalies by discrimination and reconstruction. We have tested our proposed MAD-GAN using two recent datasets collected from real-world CPS: the Secure Water Treatment (SWaT) and the Water Distribution (WADI) datasets. Our experimental results showed that the proposed MAD-GAN is effective in reporting anomalies caused by various cyber-intrusions compared in these complex real-world systems. …

# If you did not already know

Multi-Objective Reinforced Evolution in Mobile Neural Architecture Search (MoreMNAS)
Fabricating neural models for a wide range of mobile devices demands for specific design of networks due to highly constrained resources. Both evolution algorithms (EA) and reinforced learning methods (RL) have been introduced to address Neural Architecture Search, distinct efforts to integrate both categories have also been proposed. However, these combinations usually concentrate on a single objective such as error rate of image classification. They also fail to harness the very benefits from both sides. In this paper, we present a new multi-objective oriented algorithm called MoreMNAS (Multi-Objective Reinforced Evolution in Mobile Neural Architecture Search) by leveraging good virtues from both EA and RL. In particular, we incorporate a variant of multi-objective genetic algorithm NSGA-II, in which the search space is composed of various cells so that crossovers and mutations can be performed at the cell level. Moreover, reinforced control is mixed with random process to regulate arbitrary mutation, maintaining a delicate balance between exploration and exploitation. Therefore, not only does our method prevent the searched models from degrading during the evolution process, but it also makes better use of learned knowledge. Our preliminary experiments conducted in Super Resolution domain (SR) deliver rivalling models compared to some state-of-the-art methods with much less FLOPS. More results will be disclosed very soon …

Semantic Label Indexing, Neural Matching, and Efficient Ranking (SLINMER)
Extreme multi-label classification (XMC) aims to assign to an instance the most relevant subset of labels from a colossal label set. Due to modern applications that lead to massive label sets, the scalability of XMC has attracted much recent attention from both academia and industry. In this paper, we establish a three-stage framework to solve XMC efficiently, which includes 1) indexing the labels, 2) matching the instance to the relevant indices, and 3) ranking the labels from the relevant indices. This framework unifies many existing XMC approaches. Based on this framework, we propose a modular deep learning approach SLINMER: Semantic Label Indexing, Neural Matching, and Efficient Ranking. The label indexing stage of SLINMER can adopt different semantic label representations leading to different configurations of SLINMER. Empirically, we demonstrate that several individual configurations of SLINMER achieve superior performance than the state-of-the-art XMC approaches on several benchmark datasets. Moreover, by ensembling those configurations, SLINMER can achieve even better results. In particular, on a Wiki dataset with around 0.5 millions of labels, the precision@1 is increased from 61% to 67%. …

MASSES
We introduce MASSES, a simple evaluation metric for the task of Visual Question Answering (VQA). In its standard form, the VQA task is operationalized as follows: Given an image and an open-ended question in natural language, systems are required to provide a suitable answer. Currently, model performance is evaluated by means of a somehow simplistic metric: If the predicted answer is chosen by at least 3 human annotators out of 10, then it is 100% correct. Though intuitively valuable, this metric has some important limitations. First, it ignores whether the predicted answer is the one selected by the Majority (MA) of annotators. Second, it does not account for the quantitative Subjectivity (S) of the answers in the sample (and dataset). Third, information about the Semantic Similarity (SES) of the responses is completely neglected. Based on such limitations, we propose a multi-component metric that accounts for all these issues. We show that our metric is effective in providing a more fine-grained evaluation both on the quantitative and qualitative level. …

Interactive Growing Hierarchical SOM (interactive GHSOM)
Self Organizing Map is trained using unsupervised learning to produce a two-dimensional discretized representation of input space of the training cases. Growing Hierarchical SOM is an architecture which grows both in a hierarchical way representing the structure of data distribution and in a horizontal way representation the size of each individual maps. The control method of the growing degree of GHSOM by pruning off the redundant branch of hierarchy in SOM is proposed in this paper. Moreover, the interface tool for the proposed method called interactive GHSOM is developed. We discuss the computation results of Iris data by using the developed tool. …

# If you did not already know

Instancewise Feature Selection
We introduce instancewise feature selection as a methodology for model interpretation. Our method is based on learning a function to extract a subset of features that are most informative for each given example. This feature selector is trained to maximize the mutual information between selected features and the response variable, where the conditional distribution of the response variable given the input is the model to be explained. We develop an efficient variational approximation to the mutual information, and show that the resulting method compares favorably to other model explanation methods on a variety of synthetic and real data sets using both quantitative metrics and human evaluation. …

Marginalized Average Aggregation (MAA)
In weakly-supervised temporal action localization, previous works have failed to locate dense and integral regions for each entire action due to the overestimation of the most salient regions. To alleviate this issue, we propose a marginalized average attentional network (MAAN) to suppress the dominant response of the most salient regions in a principled manner. The MAAN employs a novel marginalized average aggregation (MAA) module and learns a set of latent discriminative probabilities in an end-to-end fashion. MAA samples multiple subsets from the video snippet features according to a set of latent discriminative probabilities and takes the expectation over all the averaged subset features. Theoretically, we prove that the MAA module with learned latent discriminative probabilities successfully reduces the difference in responses between the most salient regions and the others. Therefore, MAAN is able to generate better class activation sequences and identify dense and integral action regions in the videos. Moreover, we propose a fast algorithm to reduce the complexity of constructing MAA from O($2^T$) to O($T^2$). Extensive experiments on two large-scale video datasets show that our MAAN achieves superior performance on weakly-supervised temporal action localization …

Similarity-based Random Survival Forest
Predicting the time to a clinical outcome for patients in intensive care units (ICUs) helps to support critical medical treatment decisions. The time to an event of interest could be, for example, survival time or time to recovery from a disease/ailment observed within the ICU. The massive health datasets generated from the uptake of Electronic Health Records (EHRs) are diverse in variety as patients can be quite dissimilar in their relationship between the feature vector and the outcome, adding more noise than information to prediction. We propose a modified random forest method for survival data that identifies similar cases and improves prediction accuracy. We also introduce an adaptation of our methodology in the case of dependent censoring. Our proposed method is demonstrated in the Medical Information Mart for Intensive Care (MIMIC-III) database, and we also present properties of our methodology through a comprehensive simulation study. Introducing similarity to the random survival forest method indeed provides additional predictive accuracy compared to random survival forest alone in the various analyses we undertook. …

Evolutionary Graph Recurrent Network (EGRN)
Time series modeling aims to capture the intrinsic factors underpinning observed data and its evolution. However, most existing studies ignore the evolutionary relations among these factors, which are what cause the combinatorial evolution of a given time series. In this paper, we propose to represent time-varying relations among intrinsic factors of time series data by means of an evolutionary state graph structure. Accordingly, we propose the Evolutionary Graph Recurrent Networks (EGRN) to learn representations of these factors, along with the given time series, using a graph neural network framework. The learned representations can then be applied to time series classification tasks. From our experiment results, based on six real-world datasets, it can be seen that our approach clearly outperforms ten state-of-the-art baseline methods (e.g. +5% in terms of accuracy, and +15% in terms of F1 on average). In addition, we demonstrate that due to the graph structure’s improved interpretability, our method is also able to explain the logical causes of the predicted events. …

# If you did not already know

Learning to Recommend with Missing Modalities (LRMM)
Multimodal learning has shown promising performance in content-based recommendation due to the auxiliary user and item information of multiple modalities such as text and images. However, the problem of incomplete and missing modality is rarely explored and most existing methods fail in learning a recommendation model with missing or corrupted modalities. In this paper, we propose LRMM, a novel framework that mitigates not only the problem of missing modalities but also more generally the cold-start problem of recommender systems. We propose modality dropout (m-drop) and a multimodal sequential autoencoder (m-auto) to learn multimodal representations for complementing and imputing missing modalities. Extensive experiments on real-world Amazon data show that LRMM achieves state-of-the-art performance on rating prediction tasks. More importantly, LRMM is more robust to previous methods in alleviating data-sparsity and the cold-start problem. …

Motion Transformation Variational Auto-Encoder (MT-VAE)
Long-term human motion can be represented as a series of motion modes—motion sequences that capture short-term temporal dynamics—with transitions between them. We leverage this structure and present a novel Motion Transformation Variational Auto-Encoders (MT-VAE) for learning motion sequence generation. Our model jointly learns a feature embedding for motion modes (that the motion sequence can be reconstructed from) and a feature transformation that represents the transition of one motion mode to the next motion mode. Our model is able to generate multiple diverse and plausible motion sequences in the future from the same input. We apply our approach to both facial and full body motion, and demonstrate applications like analogy-based motion transfer and video synthesis. …

Neural Basis Expansion Analysis for Interpretable Time Series Forecasting (N-BEATS)
We focus on solving the univariate times series point forecasting problem using deep learning. We propose a deep neural architecture based on backward and forward residual links and a very deep stack of fully-connected layers. The architecture has a number of desirable properties, being interpretable, applicable without modification to a wide array of target domains, and fast to train. We test the proposed architecture on the well-known M4 competition dataset containing 100k time series from diverse domains. We demonstrate state-of-the-art performance for two configurations of N-BEATS, improving forecast accuracy by 11% over a statistical benchmark and by 3% over last year’s winner of the M4 competition, a domain-adjusted hand-crafted hybrid between neural network and statistical time series models. The first configuration of our model does not employ any time-series-specific components and its performance on the M4 dataset strongly suggests that, contrarily to received wisdom, deep learning primitives such as residual blocks are by themselves sufficient to solve a wide range of forecasting problems. Finally, we demonstrate how the proposed architecture can be augmented to provide outputs that are interpretable without loss in accuracy. …

CoordConv
Uber uses convolutional neural networks in many domains that could potentially involve coordinate transforms, from designing self-driving vehicles to automating street sign detection to build maps and maximizing the efficiency of spatial movements in the Uber Marketplace. In deep learning, few ideas have experienced as much impact as convolution. Almost all state-of-the-art results in machine vision make use of stacks of convolutional layers as basic building blocks. Since such architectures are widespread, we should expect that they excel at simple tasks like painting a single pixel in a tiny image, right Surprisingly, it turns out that convolution often has difficulty completing seemingly trivial tasks. In our paper, An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution, we expose and analyze a generic inability of convolutional neural networks (CNNs) to transform spatial representations between two different types: coordinates in (i, j) Cartesian space and coordinates in one-hot pixel space. It´s surprising because the task appears so simple, and it may be important because such coordinate transforms seem to be required to solve many common tasks, like detecting objects in images, training generative models of images, and training reinforcement learning (RL) agents from pixels. It turns out that these tasks may have subtly suffered from this failing of convolution all along, as suggested by performance improvements we demonstrate across several domains when using the solution we propose, a layer called CoordConv. …

# If you did not already know

COBRAS
Constraint-based clustering algorithms exploit background knowledge to construct clusterings that are aligned with the interests of a particular user. This background knowledge is often obtained by allowing the clustering system to pose pairwise queries to the user: should these two elements be in the same cluster or not? Active clustering methods aim to minimize the number of queries needed to obtain a good clustering by querying the most informative pairs first. Ideally, a user should be able to answer a couple of these queries, inspect the resulting clustering, and repeat these two steps until a satisfactory result is obtained. We present COBRAS, an approach to active clustering with pairwise constraints that is suited for such an interactive clustering process. A core concept in COBRAS is that of a super-instance: a local region in the data in which all instances are assumed to belong to the same cluster. COBRAS constructs such super-instances in a top-down manner to produce high-quality results early on in the clustering process, and keeps refining these super-instances as more pairwise queries are given to get more detailed clusterings later on. We experimentally demonstrate that COBRAS produces good clusterings at fast run times, making it an excellent candidate for the iterative clustering scenario outlined above. …

Random Warping Series (RWS)
Time series data analytics has been a problem of substantial interests for decades, and Dynamic Time Warping (DTW) has been the most widely adopted technique to measure dissimilarity between time series. A number of global-alignment kernels have since been proposed in the spirit of DTW to extend its use to kernel-based estimation method such as support vector machine. However, those kernels suffer from diagonal dominance of the Gram matrix and a quadratic complexity w.r.t. the sample size. In this work, we study a family of alignment-aware positive definite (p.d.) kernels, with its feature embedding given by a distribution of \emph{Random Warping Series (RWS)}. The proposed kernel does not suffer from the issue of diagonal dominance while naturally enjoys a \emph{Random Features} (RF) approximation, which reduces the computational complexity of existing DTW-based techniques from quadratic to linear in terms of both the number and the length of time-series. We also study the convergence of the RF approximation for the domain of time series of unbounded length. Our extensive experiments on 16 benchmark datasets demonstrate that RWS outperforms or matches state-of-the-art classification and clustering methods in both accuracy and computational time. Our code and data is available at { \url{https://…/RandomWarpingSeries}}.

Anaconda
We investigate distribution testing with access to non-adaptive conditional samples. In the conditional sampling model, the algorithm is given the following access to a distribution: it submits a query set $S$ to an oracle, which returns a sample from the distribution conditioned on being from $S$. In the non-adaptive setting, all query sets must be specified in advance of viewing the outcomes. Our main result is the first polylogarithmic-query algorithm for equivalence testing, deciding whether two unknown distributions are equal to or far from each other. This is an exponential improvement over the previous best upper bound, and demonstrates that the complexity of the problem in this model is intermediate to the the complexity of the problem in the standard sampling model and the adaptive conditional sampling model. We also significantly improve the sample complexity for the easier problems of uniformity and identity testing. For the former, our algorithm requires only $\tilde O(\log n)$ queries, matching the information-theoretic lower bound up to a $O(\log \log n)$-factor. Our algorithm works by reducing the problem from $\ell_1$-testing to $\ell_\infty$-testing, which enjoys a much cheaper sample complexity. Necessitated by the limited power of the non-adaptive model, our algorithm is very simple to state. However, there are significant challenges in the analysis, due to the complex structure of how two arbitrary distributions may differ. …

Variational Contrastive Divergence (VCD)
We develop a method to combine Markov chain Monte Carlo (MCMC) and variational inference (VI), leveraging the advantages of both inference approaches. Specifically, we improve the variational distribution by running a few MCMC steps. To make inference tractable, we introduce the variational contrastive divergence (VCD), a new divergence that replaces the standard Kullback-Leibler (KL) divergence used in VI. The VCD captures a notion of discrepancy between the initial variational distribution and its improved version (obtained after running the MCMC steps), and it converges asymptotically to the symmetrized KL divergence between the variational distribution and the posterior of interest. The VCD objective can be optimized efficiently with respect to the variational parameters via stochastic optimization. We show experimentally that optimizing the VCD leads to better predictive performance on two latent variable models: logistic matrix factorization and variational autoencoders (VAEs). …

# If you did not already know

Hierarchical Configuration Model
We introduce a class of random graphs with a hierarchical community structure, which we call the hierarchical configuration model. On the inter-community level, the graph is a configuration model, and on the intra-community level, every vertex in the configuration model is replaced by a community: a small graph. These communities may have any shape, as long as they are connected. For these hierarchical graphs, we find the size of the largest component, the degree distribution and the clustering coefficient. Furthermore, we determine the conditions under which a giant percolation cluster exists, and find its size. …

KT-Speech-Crawler
In this paper, we describe KT-Speech-Crawler: an approach for automatic dataset construction for speech recognition by crawling YouTube videos. We outline several filtering and post-processing steps, which extract samples that can be used for training end-to-end neural speech recognition systems. In our experiments, we demonstrate that a single-core version of the crawler can obtain around 150 hours of transcribed speech within a day, containing an estimated 3.5% word error rate in the transcriptions. Automatically collected samples contain reading and spontaneous speech recorded in various conditions including background noise and music, distant microphone recordings, and a variety of accents and reverberation. When training a deep neural network on speech recognition, we observed around 40\% word error rate reduction on the Wall Street Journal dataset by integrating 200 hours of the collected samples into the training set. The demo (http://emnlp-demo.lakomkin.me ) and the crawler code (https://…/KTSpeechCrawler ) are publicly available. …

Iterative Normalization (IterNorm)
Batch Normalization (BN) is ubiquitously employed for accelerating neural network training and improving the generalization capability by performing standardization within mini-batches. Decorrelated Batch Normalization (DBN) further boosts the above effectiveness by whitening. However, DBN relies heavily on either a large batch size, or eigen-decomposition that suffers from poor efficiency on GPUs. We propose Iterative Normalization (IterNorm), which employs Newton’s iterations for much more efficient whitening, while simultaneously avoiding the eigen-decomposition. Furthermore, we develop a comprehensive study to show IterNorm has better trade-off between optimization and generalization, with theoretical and experimental support. To this end, we exclusively introduce Stochastic Normalization Disturbance (SND), which measures the inherent stochastic uncertainty of samples when applied to normalization operations. With the support of SND, we provide natural explanations to several phenomena from the perspective of optimization, e.g., why group-wise whitening of DBN generally outperforms full-whitening and why the accuracy of BN degenerates with reduced batch sizes. We demonstrate the consistently improved performance of IterNorm with extensive experiments on CIFAR-10 and ImageNet over BN and DBN. …

Transfer Entropy
Transfer entropy is a non-parametric statistic measuring the amount of directed (time-asymmetric) transfer of information between two random processes. Transfer entropy from a process X to another process Y is the amount of uncertainty reduced in future values of Y by knowing the past values of X given past values of Y. Transfer entropy reduces to Granger causality for vector auto-regressive processes. Hence, it is advantageous when the model assumption of Granger causality doesn’t hold, for example, analysis of non-linear signals. However, it usually requires more samples for accurate estimation. The probabilities in the entropy formula can be estimated using different approaches (binning, nearest neighbors) or, in order to reduce complexity, using a non-uniform embedding. While it was originally defined for bivariate analysis, transfer entropy has been extended to multivariate forms, either conditioning on other potential source variables or considering transfer from a collection of sources, although these forms require more samples again. …

# If you did not already know

DeepDream is a computer vision program created by Google engineer Alexander Mordvintsev which uses a convolutional neural network to find and enhance patterns in images via algorithmic pareidolia, thus creating a dream-like hallucinogenic appearance in the deliberately over-processed images. Google’s program popularized the term (deep) ‘dreaming’ to refer to the generation of images that produce desired activations in a trained deep network, and the term now refers to a collection of related approaches. …

Attention-Based Convolutional Neural Network-Bidirectional Long-Short Term Memory (CNN-BLSTM)
In this paper, we present an end-to-end language identification framework, the attention-based Convolutional Neural Network-Bidirectional Long-short Term Memory (CNN-BLSTM). The model is performed on the utterance level, which means the utterance-level decision can be directly obtained from the output of the neural network. To handle speech utterances with entire arbitrary and potentially long duration, we combine CNN-BLSTM model with a self-attentive pooling layer together. The front-end CNN-BLSTM module plays a role as local pattern extractor for the variable-length inputs, and the following self-attentive pooling layer is built on top to get the fixed-dimensional utterance-level representation. We conducted experiments on NIST LRE07 closed-set task, and the results reveal that the proposed attention-based CNN-BLSTM model achieves comparable error reduction with other state-of-the-art utterance-level neural network approaches for all 3 seconds, 10 seconds, 30 seconds duration tasks. …

Bidirectional Inference Network (BIN)
We consider the problem of inferring the values of an arbitrary set of variables (e.g., risk of diseases) given other observed variables (e.g., symptoms and diagnosed diseases) and high-dimensional signals (e.g., MRI images or EEG). This is a common problem in healthcare since variables of interest often differ for different patients. Existing methods including Bayesian networks and structured prediction either do not incorporate high-dimensional signals or fail to model conditional dependencies among variables. To address these issues, we propose bidirectional inference networks (BIN), which stich together multiple probabilistic neural networks, each modeling a conditional dependency. Predictions are then made via iteratively updating variables using backpropagation (BP) to maximize corresponding posterior probability. Furthermore, we extend BIN to composite BIN (CBIN), which involves the iterative prediction process in the training stage and improves both accuracy and computational efficiency by adaptively smoothing the optimization landscape. Experiments on synthetic and real-world datasets (a sleep study and a dermatology dataset) show that CBIN is a single model that can achieve state-of-the-art performance and obtain better accuracy in most inference tasks than multiple models each specifically trained for a different task. …

Mean Average Precision (mAP)
Mean average precision for a set of queries is the mean of the average precision scores for each query.
Breaking down Mean Average Precision (mAP)

# If you did not already know

Apache Thrift
The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml and Delphi and other languages. …

IBN-Net
Convolutional neural networks (CNNs) have achieved great successes in many computer vision problems. Unlike existing works that designed CNN architectures to improve performance on a single task of a single domain and not generalizable, we present IBN-Net, a novel convolutional architecture, which remarkably enhances a CNN’s modeling ability on one domain (e.g. Cityscapes) as well as its generalization capacity on another domain (e.g. GTA5) without finetuning. IBN-Net carefully integrates Instance Normalization (IN) and Batch Normalization (BN) as building blocks, and can be wrapped into many advanced deep networks to improve their performances. This work has three key contributions. (1) By delving into IN and BN, we disclose that IN learns features that are invariant to appearance changes, such as colors, styles, and virtuality/reality, while BN is essential for preserving content related information. (2) IBN-Net can be applied to many advanced deep architectures, such as DenseNet, ResNet, ResNeXt, and SENet, and consistently improve their performance without increasing computational cost. (3) When applying the trained networks to new domains, e.g. from GTA5 to Cityscapes, IBN-Net achieves comparable improvements as domain adaptation methods, even without using data from the target domain. With IBN-Net, we won the 1st place on the WAD 2018 Challenge Drivable Area track, with an mIoU of 86.18%. …

UNet++
In this paper, we present UNet++, a new, more powerful architecture for medical image segmentation. Our architecture is essentially a deeply-supervised encoder-decoder network where the encoder and decoder sub-networks are connected through a series of nested, dense skip pathways. The re-designed skip pathways aim at reducing the semantic gap between the feature maps of the encoder and decoder sub-networks. We argue that the optimizer would deal with an easier learning task when the feature maps from the decoder and encoder networks are semantically similar. We have evaluated UNet++ in comparison with U-Net and wide U-Net architectures across multiple medical image segmentation tasks: nodule segmentation in the low-dose CT scans of chest, nuclei segmentation in the microscopy images, liver segmentation in abdominal CT scans, and polyp segmentation in colonoscopy videos. Our experiments demonstrate that UNet++ with deep supervision achieves an average IoU gain of 3.9 and 3.4 points over U-Net and wide U-Net, respectively. …

Alpha Model-Agnostic Meta-Learning (Alpha MAML)
Model-agnostic meta-learning (MAML) is a meta-learning technique to train a model on a multitude of learning tasks in a way that primes the model for few-shot learning of new tasks. The MAML algorithm performs well on few-shot learning problems in classification, regression, and fine-tuning of policy gradients in reinforcement learning, but comes with the need for costly hyperparameter tuning for training stability. We address this shortcoming by introducing an extension to MAML, called Alpha Model-agnostic meta-learning, to incorporate an online hyperparameter adaptation scheme that eliminates the need to tune meta-learning and learning rates. Our results with the Omniglot database demonstrate a substantial reduction in the need to tune MAML training hyperparameters and improvement to training stability with less sensitivity to hyperparameter choice. …

# If you did not already know

Datalog
Datalog is a declarative logic programming language that syntactically is a subset of Prolog. It is often used as a query language for deductive databases. In recent years, Datalog has found new application in data integration, information extraction, networking, program analysis, security, and cloud computing. Its origins date back to the beginning of logic programming, but it became prominent as a separate area around 1977 when Hervé Gallaire and Jack Minker organized a workshop on logic and databases. David Maier is credited with coining the term Datalog. …

Random Geometric Graph (RGG)
We propose an interdependent random geometric graph (RGG) model for interdependent networks. Based on this model, we study the robustness of two interdependent spatially embedded networks where interdependence exists between geographically nearby nodes in the two networks. We study the emergence of the giant mutual component in two interdependent RGGs as node densities increase, and define the percolation threshold as a pair of node densities above which the giant mutual component first appears. In contrast to the case for a single RGG, where the percolation threshold is a unique scalar for a given connection distance, for two interdependent RGGs, multiple pairs of percolation thresholds may exist, given that a smaller node density in one RGG may increase the minimum node density in the other RGG in order for a giant mutual component to exist. We derive analytical upper bounds on the percolation thresholds of two interdependent RGGs by discretization, and obtain $99\%$ confidence intervals for the percolation thresholds by simulation. Based on these results, we derive conditions for the interdependent RGGs to be robust under random failures and geographical attacks. …

Data Distillery
The paper tackles the unsupervised estimation of the effective dimension of a sample of dependent random vectors. The proposed method uses the principal components (PC) decomposition of sample covariance to establish a low-rank approximation that helps uncover the hidden structure. The number of PCs to be included in the decomposition is determined via a Probabilistic Principal Components Analysis (PPCA) embedded in a penalized profile likelihood criterion. The choice of penalty parameter is guided by a data-driven procedure that is justified via analytical derivations and extensive finite sample simulations. Application of the proposed penalized PPCA is illustrated with three gene expression datasets in which the number of cancer subtypes is estimated from all expression measurements. The analyses point towards hidden structures in the data, e.g. additional subgroups, that could be of scientific interest. …

DeepInf
Social and information networking activities such as on Facebook, Twitter, WeChat, and Weibo have become an indispensable part of our everyday life, where we can easily access friends’ behaviors and are in turn influenced by them. Consequently, an effective social influence prediction for each user is critical for a variety of applications such as online recommendation and advertising. Conventional social influence prediction approaches typically design various hand-crafted rules to extract user- and network-specific features. However, their effectiveness heavily relies on the knowledge of domain experts. As a result, it is usually difficult to generalize them into different domains. Inspired by the recent success of deep neural networks in a wide range of computing applications, we design an end-to-end framework, DeepInf, to learn users’ latent feature representation for predicting social influence. In general, DeepInf takes a user’s local network as the input to a graph neural network for learning her latent social representation. We design strategies to incorporate both network structures and user-specific features into convolutional neural and attention networks. Extensive experiments on Open Academic Graph, Twitter, Weibo, and Digg, representing different types of social and information networks, demonstrate that the proposed end-to-end model, DeepInf, significantly outperforms traditional feature engineering-based approaches, suggesting the effectiveness of representation learning for social applications. …