# If you did not already know

Apache SINGA
SINGA is a general distributed deep learning platform for training big deep learning models over large datasets. It is designed with an intuitive programming model based on the layer abstraction. A variety of popular deep learning models are supported, namely feed-forward models including convolutional neural networks (CNN), energy models like restricted Boltzmann machine (RBM), and recurrent neural networks (RNN). Many built-in layers are provided for users. SINGA architecture is sufficiently flexible to run synchronous, asynchronous and hybrid training frameworks. SINGA also supports different neural net partitioning schemes to parallelize the training of large models, namely partitioning on batch dimension, feature dimension or hybrid partitioning. …

The recent success of Generative Adversarial Networks (GAN) is a result of their ability to generate high quality images from a latent vector space. An important application is the generation of images from a text description, where the text description is encoded and further used in the conditioning of the generated image. Thus the generative network has to additionally learn a mapping from the text latent vector space to a highly complex and multi-modal image data distribution, which makes the training of such models challenging. To handle the complexities of fashion image and meta data, we propose Ontology Generative Adversarial Networks (O-GANs) for fashion image synthesis that is conditioned on an hierarchical fashion ontology in order to improve the image generation fidelity. We show that the incorporation of the ontology leads to better image quality as measured by Fr\'{e}chet Inception Distance and Inception Score. Additionally, we show that the O-GAN achieves better conditioning results evaluated by implicit similarity between the text and the generated image. …

TrIK-SVM
The proposed work aims at proposing a alternative kernel decomposition in the context of kernel machines with indefinite kernels. The original paper of KSVM (SVM in Kre\v{i}n spaces) uses the eigen-decomposition, our proposition avoids this decompostion. We explain how it can help in designing an algorithm that won’t require to compute the full kernel matrix. Finally we illustrate the good behavior of the proposed method compared to KSVM. …

Multi-Layer Iterative Soft Thresholding (ML-ISTA)
Parsimonious representations in data modeling are ubiquitous and central for processing information. Motivated by the recent Multi-Layer Convolutional Sparse Coding (ML-CSC) model, we herein generalize the traditional Basis Pursuit regression problem to a multi-layer setting, introducing similar sparse enforcing penalties at different representation layers in a symbiotic relation between synthesis and analysis sparse priors. We propose and analyze different iterative algorithms to solve this new problem in practice. We prove that the presented multi-layer Iterative Soft Thresholding (ML-ISTA) and multi-layer Fast ISTA (ML-FISTA) converge to the global optimum of our multi-layer formulation at a rate of $\mathcal{O}(1/k)$ and $\mathcal{O}(1/k^2)$, respectively. We further show how these algorithms effectively implement particular recurrent neural networks that generalize feed-forward architectures without any increase in the number of parameters. We demonstrate the different architectures resulting from unfolding the iterations of the proposed multi-layer pursuit algorithms, providing a principled way to construct deep recurrent CNNs from feed-forward ones. We demonstrate the emerging constructions by training them in an end-to-end manner, consistently improving the performance of classical networks without introducing extra filters or parameters. …

# If you did not already know

Multimodal Interest-Related Item Similarity Model (Multimodal IRIS)
Nowadays, the recommendation systems are applied in the fields of e-commerce, video websites, social networking sites, etc., which bring great convenience to people’s daily lives. The types of the information are diversified and abundant in recommendation systems, therefore the proportion of unstructured multimodal data like text, image and video is increasing. However, due to the representation gap between different modalities, it is intractable to effectively use unstructured multimodal data to improve the efficiency of recommendation systems. In this paper, we propose an end-to-end Multimodal Interest-Related Item Similarity model (Multimodal IRIS) to provide recommendations based on multimodal data source. Specifically, the Multimodal IRIS model consists of three modules, i.e., multimodal feature learning module, the Interest-Related Network (IRN) module and item similarity recommendation module. The multimodal feature learning module adds knowledge sharing unit among different modalities. Then IRN learn the interest relevance between target item and different historical items respectively. At last, the multimodal data feature learning, IRN and item similarity recommendation modules are unified into an integrated system to achieve performance enhancements and to accommodate the addition or absence of different modal data. Extensive experiments on real-world datasets show that, by dealing with the multimodal data which people may pay more attention to when selecting items, the proposed Multimodal IRIS significantly improves accuracy and interpretability on top-N recommendation task over the state-of-the-art methods. …

MatchZoo
In recent years, deep neural models have been widely adopted for text matching tasks, such as question answering and information retrieval, showing improved performance as compared with previous methods. In this paper, we introduce the MatchZoo toolkit that aims to facilitate the designing, comparing and sharing of deep text matching models. Specifically, the toolkit provides a unified data preparation module for different text matching problems, a flexible layer-based model construction process, and a variety of training objectives and evaluation metrics. In addition, the toolkit has implemented two schools of representative deep text matching models, namely representation-focused models and interaction-focused models. Finally, users can easily modify existing models, create and share their own models for text matching in MatchZoo.
MatchZoo

Neural Recommendation with Personalized Attention (NRPA)
Existing review-based recommendation methods usually use the same model to learn the representations of all users/items from reviews posted by users towards items. However, different users have different preference and different items have different characteristics. Thus, the same word or similar reviews may have different informativeness for different users and items. In this paper we propose a neural recommendation approach with personalized attention to learn personalized representations of users and items from reviews. We use a review encoder to learn representations of reviews from words, and a user/item encoder to learn representations of users or items from reviews. We propose a personalized attention model, and apply it to both review and user/item encoders to select different important words and reviews for different users/items. Experiments on five datasets validate our approach can effectively improve the performance of neural recommendation. …

Graph Fourier Transform (GFT)
In this paper, we propose a new regression-based algorithm to compute Graph Fourier Transform (GFT). Our algorithm allows different regularizations to be included when computing the GFT analysis components, so that the resulting components can be tuned for a specific task. We propose using the lasso penalty in our proposed framework to obtain analysis components with sparse loadings. We show that the components from this proposed {\em sparse GFT} can identify and select correlated signal sources into sub-graphs, and perform frequency analysis {\em locally} within these sub-graphs of correlated sources. Using real network traffic datasets, we demonstrate that sparse GFT can achieve outstanding performance in an anomaly detection task. …

# If you did not already know

ML Health
Deployment of machine learning (ML) algorithms in production for extended periods of time has uncovered new challenges such as monitoring and management of real-time prediction quality of a model in the absence of labels. However, such tracking is imperative to prevent catastrophic business outcomes resulting from incorrect predictions. The scale of these deployments makes manual monitoring prohibitive, making automated techniques to track and raise alerts imperative. We present a framework, ML Health, for tracking potential drops in the predictive performance of ML models in the absence of labels. The framework employs diagnostic methods to generate alerts for further investigation. We develop one such method to monitor potential problems when production data patterns do not match training data distributions. We demonstrate that our method performs better than standard ‘distance metrics’, such as RMSE, KL-Divergence, and Wasserstein at detecting issues with mismatched data sets. Finally, we present a working system that incorporates the ML Health approach to monitor and manage ML deployments within a realistic full production ML lifecycle. …

Guided Zoom
We propose Guided Zoom, an approach that utilizes spatial grounding to make more informed predictions. It does so by making sure the model has ‘the right reasons’ for a prediction, being defined as reasons that are coherent with those used to make similar correct decisions at training time. The reason/evidence upon which a deep neural network makes a prediction is defined to be the spatial grounding, in the pixel space, for a specific class conditional probability in the model output. Guided Zoom questions how reasonable the evidence used to make a prediction is. In state-of-the-art deep single-label classification models, the top-k (k = 2, 3, 4, …) accuracy is usually significantly higher than the top-1 accuracy. This is more evident in fine-grained datasets, where differences between classes are quite subtle. We show that Guided Zoom results in the refinement of a model’s classification accuracy on three finegrained classification datasets. We also explore the complementarity of different grounding techniques, by comparing their ensemble to an adversarial erasing approach that iteratively reveals the next most discriminative evidence. …

UniParse
This paper describes the design and use of the graph-based parsing framework and toolkit UniParse, released as an open-source python software package. UniParse as a framework novelly streamlines research prototyping, development and evaluation of graph-based dependency parsing architectures. UniParse does this by enabling highly efficient, sufficiently independent, easily readable, and easily extensible implementations for all dependency parser components. We distribute the toolkit with ready-made configurations as re-implementations of all current state-of-the-art first-order graph-based parsers, including even more efficient Cython implementations of both encoders and decoders, as well as the required specialised loss functions. …

Sparse Constraint Preserving Matching (SPM)
Many problems of interest in computer vision can be formulated as a problem of finding consistent correspondences between two feature sets. Feature correspondence (matching) problem with one-to-one mapping constraint is usually formulated as an Integral Quadratic Programming (IQP) problem with permutation (or orthogonal) constraint. Since it is NP-hard, relaxation models are required. One main challenge for optimizing IQP matching problem is how to incorporate the discrete one-to-one mapping (permutation) constraint in its quadratic objective optimization. In this paper, we present a new relaxation model, called Sparse Constraint Preserving Matching (SPM), for IQP matching problem. SPM is motivated by our observation that the discrete permutation constraint can be well encoded via a sparse constraint. Comparing with traditional relaxation models, SPM can incorporate the discrete one-to-one mapping constraint straightly via a sparse constraint and thus provides a tighter relaxation for original IQP matching problem. A simple yet effective update algorithm has been derived to solve the proposed SPM model. Experimental results on several feature matching tasks demonstrate the effectiveness and efficiency of SPM method. …

# If you did not already know

Data Archaeology
Data archaeology refers to the art and science of recovering computer data encoded and/or encrypted in now obsolete media or formats. Data archaeology can also refer to recovering information from damaged electronic formats after natural or man made disasters. …

UR-FUNNY
Humor is a unique and creative communicative behavior displayed during social interactions. It is produced in a multimodal manner, through the usage of words (text), gestures (vision) and prosodic cues (acoustic). Understanding humor from these three modalities falls within boundaries of multimodal language; a recent research trend in natural language processing that models natural language as it happens in face-to-face communication. Although humor detection is an established research area in NLP, in a multimodal context it is an understudied area. This paper presents a diverse multimodal dataset, called UR-FUNNY, to open the door to understanding multimodal language used in expressing humor. The dataset and accompanying studies, present a framework in multimodal humor detection for the natural language processing community. UR-FUNNY is publicly available for research. …

Firebreak Decision Problem
Suppose we have a network that is represented by a graph $G$. Potentially a fire (or other type of contagion) might erupt at some vertex of $G$. We are able to respond to this outbreak by establishing a firebreak at $k$ other vertices of $G$, so that the fire cannot pass through these fortified vertices. The question that now arises is which $k$ vertices will result in the greatest number of vertices being saved from the fire, assuming that the fire will spread to every vertex that is not fully behind the $k$ vertices of the firebreak. This is the essence of the Firebreak decision problem. …

Stochastic Substitute Training

# If you did not already know

Equilibrium-Independent Passivity-Short System (EIPS)
Maximal equilibrium-independent passivity (MEIP) is a recently introduced system property which has acquired special attention in the study of networked dynamical systems. MEIP requires a system to be passive with respect to any forced equilibrium configuration and the associated steady-state input-output map must be maximally monotone. In practice, however, most of the systems are not well behaved and possess shortage of passivity or non-passiveness in their operation. In this paper, we consider a class of passivity-short systems, namely equilibrium-independent passivity-short (EIPS) systems, and presents an input-output transformation based generalized passivation approach to ensure their MEIP properties. We characterize the steady-state input-output relations of the EIPS systems and establish their connection with that of the transformed MEIP systems. We further study the diffusively-coupled networked interactions of such EIPS systems and explore their connection to a pair of dual network optimization problems, under the proposed matrix transformation. A simulation example is given to illustrate the theoretical results. …

Balanced Linear Contextual Bandits
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems along the path of learning. We develop algorithms for contextual bandits with linear payoffs that integrate balancing methods from the causal inference literature in their estimation to make it less prone to problems of estimation bias. We provide the first regret bound analyses for linear contextual bandits with balancing and show that our algorithms match the state of the art theoretical guarantees. We demonstrate the strong practical advantage of balanced contextual bandits on a large number of supervised learning datasets and on a synthetic example that simulates model misspecification and prejudice in the initial training data. …

Curiosity-Driven Prioritization
In Reinforcement Learning (RL), an agent explores the environment and collects trajectories into the memory buffer for later learning. However, the collected trajectories can easily be imbalanced with respect to the achieved goal states. The problem of learning from imbalanced data is a well-known problem in supervised learning, but has not yet been thoroughly researched in RL. To address this problem, we propose a novel Curiosity-Driven Prioritization (CDP) framework to encourage the agent to over-sample those trajectories that have rare achieved goal states. The CDP framework mimics the human learning process and focuses more on relatively uncommon events. We evaluate our methods using the robotic environment provided by OpenAI Gym. The environment contains six robot manipulation tasks. In our experiments, we combined CDP with Deep Deterministic Policy Gradient (DDPG) with or without Hindsight Experience Replay (HER). The experimental results show that CDP improves both performance and sample-efficiency of reinforcement learning agents, compared to state-of-the-art methods. …

Inducibility
The quantity that captures the asymptotic value of the maximum number of appearances of a given topological tree (a rooted tree with no vertices of outdegree $1$) $S$ with $k$ leaves in an arbitrary tree with sufficiently large number of leaves is called the inducibility of $S$. …

# If you did not already know

AutoFD
We study the problem of discovering functional dependencies (FD) from a noisy dataset. We focus on FDs that correspond to statistical dependencies in a dataset and draw connections between FD discovery and structure learning in probabilistic graphical models. We show that discovering FDs from a noisy dataset is equivalent to learning the structure of a graphical model over binary random variables, where each random variable corresponds to a functional of the dataset attributes. We build upon this observation to introduce AutoFD a conceptually simple framework in which learning functional dependencies corresponds to solving a sparse regression problem. We show that our methods can recover true functional dependencies across a diverse array of real-world and synthetic datasets, even in the presence of noisy or missing data. We find that AutoFD scales to large data instances with millions of tuples and hundreds of attributes while it yields an average F1 improvement of 2 times against state-of-the-art FD discovery methods. …

Sequential Attention Relational Network (SARN)
This paper proposes an attention module augmented relational network called SARN (Sequential Attention Relational Network) that can carry out relational reasoning by extracting reference objects and making efficient pairing between objects. SARN greatly reduces the computational and memory requirements of the relational network, which computes all object pairs. It also shows high accuracy on the Sort-of-CLEVR dataset compared to other models, especially on relational questions. …

Debagging
It is easy to convert a sentence into a bag of words, but it is much harder to convert a bag of words into a meaningful sentence. We name the latter the debagging problem. …

Actional-Structural Graph Convolution Network (AS-GCN)
Action recognition with skeleton data has recently attracted much attention in computer vision. Previous studies are mostly based on fixed skeleton graphs, only capturing local physical dependencies among joints, which may miss implicit joint correlations. To capture richer dependencies, we introduce an encoder-decoder structure, called A-link inference module, to capture action-specific latent dependencies, i.e. actional links, directly from actions. We also extend the existing skeleton graphs to represent higher-order dependencies, i.e. structural links. Combing the two types of links into a generalized skeleton graph, we further propose the actional-structural graph convolution network (AS-GCN), which stacks actional-structural graph convolution and temporal convolution as a basic building block, to learn both spatial and temporal features for action recognition. A future pose prediction head is added in parallel to the recognition head to help capture more detailed action patterns through self-supervision. We validate AS-GCN in action recognition using two skeleton data sets, NTU-RGB+D and Kinetics. The proposed AS-GCN achieves consistently large improvement compared to the state-of-the-art methods. As a side product, AS-GCN also shows promising results for future pose prediction. …

# If you did not already know

Maximal Mean Variance (MMV)
We propose a new sufficient dimension reduction approach designed deliberately for high-dimensional classification. This novel method is named maximal mean variance (MMV), inspired by the mean variance index first proposed by Cui, Li and Zhong (2015), which measures the dependence between a categorical random variable with multiple classes and a continuous random variable. Our method requires reasonably mild restrictions on the predicting variables and keeps the model-free advantage without the need to estimate the link function. The consistency of the MMV estimator is established under regularity conditions for both fixed and diverging dimension (p) cases and the number of the response classes can also be allowed to diverge with the sample size n. We also construct the asymptotic normality for the estimator when the dimension of the predicting vector is fixed. Furthermore, our method works pretty well when n < p. The surprising classification efficiency gain of the proposed method is demonstrated by simulation studies and real data analysis. …

Unsupervised Correlation Analysis (UCA)
Linking between two data sources is a basic building block in numerous computer vision problems. In this paper, we set to answer a fundamental cognitive question: are prior correspondences necessary for linking between different domains? One of the most popular methods for linking between domains is Canonical Correlation Analysis (CCA). All current CCA algorithms require correspondences between the views. We introduce a new method Unsupervised Correlation Analysis (UCA), which requires no prior correspondences between the two domains. The correlation maximization term in CCA is replaced by a combination of a reconstruction term (similar to autoencoders), full cycle loss, orthogonality and multiple domain confusion terms. Due to lack of supervision, the optimization leads to multiple alternative solutions with similar scores and we therefore introduce a consensus-based mechanism that is often able to recover the desired solution. Remarkably, this suffices in order to link remote domains such as text and images. We also present results on well accepted CCA benchmarks, showing that performance far exceeds other unsupervised baselines, and approaches supervised performance in some cases. …

AntisymmetricRNN
Recurrent neural networks have gained widespread use in modeling sequential data. Learning long-term dependencies using these models remains difficult though, due to exploding or vanishing gradients. In this paper, we draw connections between recurrent networks and ordinary differential equations. A special form of recurrent networks called the AntisymmetricRNN is proposed under this theoretical framework, which is able to capture long-term dependencies thanks to the stability property of its underlying differential equation. Existing approaches to improving RNN trainability often incur significant computation overhead. In comparison, AntisymmetricRNN achieves the same goal by design. We showcase the advantage of this new architecture through extensive simulations and experiments. AntisymmetricRNN exhibits much more predictable dynamics. It outperforms regular LSTM models on tasks requiring long-term memory and matches the performance on tasks where short-term dependencies dominate despite being much simpler. …

Cuttlefish
Modern data processing applications execute increasingly sophisticated analysis that requires operations beyond traditional relational algebra. As a result, operators in query plans grow in diversity and complexity. Designing query optimizer rules and cost models to choose physical operators for all of these novel logical operators is impractical. To address this challenge, we develop Cuttlefish, a new primitive for adaptively processing online query plans that explores candidate physical operator instances during query execution and exploits the fastest ones using multi-armed bandit reinforcement learning techniques. We prototype Cuttlefish in Apache Spark and adaptively choose operators for image convolution, regular expression matching, and relational joins. Our experiments show Cuttlefish-based adaptive convolution and regular expression operators can reach 72-99% of the throughput of an all-knowing oracle that always selects the optimal algorithm, even when individual physical operators are up to 105x slower than the optimal. Additionally, Cuttlefish achieves join throughput improvements of up to 7.5x compared with Spark SQL’s query optimizer. …

# If you did not already know

LINSPECTOR

Block Tree (BT)
The Block Tree (BT) is a novel compact data structure designed to compress sequence collections. It obtains compression ratios close to Lempel-Ziv and supports efficient direct access to any substring. The BT divides the text recursively into fixed-size blocks and those appearing earlier are represented with pointers. On repetitive collections, a few blocks can represent all the others, and thus the BT reduces the size by orders of magnitude. …

Parametrized Deep Q-Network (P-DQN)
Most existing deep reinforcement learning (DRL) frameworks consider either discrete action space or continuous action space solely. Motivated by applications in computer games, we consider the scenario with discrete-continuous hybrid action space. To handle hybrid action space, previous works either approximate the hybrid space by discretization, or relax it into a continuous set. In this paper, we propose a parametrized deep Q-network (P- DQN) framework for the hybrid action space without approximation or relaxation. Our algorithm combines the spirits of both DQN (dealing with discrete action space) and DDPG (dealing with continuous action space) by seamlessly integrating them. Empirical results on a simulation example, scoring a goal in simulated RoboCup soccer and the solo mode in game King of Glory (KOG) validate the efficiency and effectiveness of our method. …

Barista
Pre-trained deep learning models are increasingly being used to offer a variety of compute-intensive predictive analytics services such as fitness tracking, speech and image recognition. The stateless and highly parallelizable nature of deep learning models makes them well-suited for serverless computing paradigm. However, making effective resource management decisions for these services is a hard problem due to the dynamic workloads and diverse set of available resource configurations that have their deployment and management costs. To address these challenges, we present a distributed and scalable deep-learning prediction serving system called Barista and make the following contributions. First, we present a fast and effective methodology for forecasting workloads by identifying various trends. Second, we formulate an optimization problem to minimize the total cost incurred while ensuring bounded prediction latency with reasonable accuracy. Third, we propose an efficient heuristic to identify suitable compute resource configurations. Fourth, we propose an intelligent agent to allocate and manage the compute resources by horizontal and vertical scaling to maintain the required prediction latency. Finally, using representative real-world workloads for urban transportation service, we demonstrate and validate the capabilities of Barista. …

# If you did not already know

Robotic Process Automation (RPA)
Robotic process automation (or RPA) is an emerging form of clerical process automation technology based on the notion of software robots or artificial intelligence (AI) workers. A software ‘robot’ is a software application that replicates the actions of a human being interacting with the user interface of a computer system. For example, the execution of data entry into an ERP system – or indeed a full end-to-end business process – would be a typical activity for a software robot. The software robot operates on the user interface (UI) in the same way that a human would; this is a significant departure from traditional forms of IT integration which have historically been based on Application Programming Interfaces (or APIs) – that is to say, machine-to-machine forms of communication based on data layers which operate at an architectural layer beneath the UI. …

Covariate Gaussian Process Latent Variable Model (c-GPLVM)
Gaussian Process Regression (GPR) and Gaussian Process Latent Variable Models (GPLVM) offer a principled way of performing probabilistic non-linear regression and dimensionality reduction. In this paper we propose a hybrid between the two, the covariate-GPLVM (c-GPLVM), to perform dimensionality reduction in the presence of covariate information (e.g. continuous covariates, class labels, or censored survival times). This construction lets us adjust for covariate effects and reveals meaningful latent structure which is not revealed when using GPLVM. Furthermore, we introduce structured decomposable kernels which will let us interpret how the fixed and latent inputs contribute to feature-level variation, e.g. identify the presence of a non-linear interaction. We demonstrate the utility of this model on applications in disease progression modelling from high-dimensional gene expression data in the presence of additional phenotypes. …

Memory-Limited Online Subspace Estimation Scheme (MOSES)
This paper introduces Memory-limited Online Subspace Estimation Scheme (MOSES) for both estimating the principal components of data and reducing its dimension. More specifically, consider a scenario where the data vectors are presented sequentially to a user who has limited storage and processing time available, for example in the context of sensor networks. In this scenario, MOSES maintains an estimate of leading principal components of the data that has arrived so far and also reduces its dimension. In terms of its origins, MOSES slightly generalises the popular incremental Singular Vale Decomposition (SVD) to handle thin blocks of data. This simple generalisation is in part what allows us to complement MOSES with a comprehensive statistical analysis that is not available for incremental SVD, despite its empirical success. This generalisation also enables us to concretely interpret MOSES as an approximate solver for the underlying non-convex optimisation program. We also find that MOSES shows state-of-the-art performance in our numerical experiments with both synthetic and real-world datasets. …

Stein’s example (or phenomenon or paradox), in decision theory and estimation theory, is the phenomenon that when three or more parameters are estimated simultaneously, there exist combined estimators more accurate on average (that is, having lower expected mean-squared error) than any method that handles the parameters separately.
An intuitive explanation is that optimizing for the mean-squared error of a combined estimator is not the same as optimizing for the errors of separate estimators of the individual parameters. In practical terms, if the combined error is in fact of interest, then a combined estimator should be used, even if the underlying parameters are independent; this occurs in channel estimation in telecommunications, for instance (different factors affect overall channel performance). On the other hand, if one is instead interested in estimating an individual parameter, then using a combined estimator does not help and is in fact worse.

# If you did not already know

Contour-Aware Informative Aggregation Network (CIA-Net)
Accurate segmenting nuclei instances is a crucial step in computer-aided image analysis to extract rich features for cellular estimation and following diagnosis as well as treatment. While it still remains challenging because the wide existence of nuclei clusters, along with the large morphological variances among different organs make nuclei instance segmentation susceptible to over-/under-segmentation. Additionally, the inevitably subjective annotating and mislabeling prevent the network learning from reliable samples and eventually reduce the generalization capability for robustly segmenting unseen organ nuclei. To address these issues, we propose a novel deep neural network, namely Contour-aware Informative Aggregation Network (CIA-Net) with multi-level information aggregation module between two task-specific decoders. Rather than independent decoders, it leverages the merit of spatial and texture dependencies between nuclei and contour by bi-directionally aggregating task-specific features. Furthermore, we proposed a novel smooth truncated loss that modulates losses to reduce the perturbation from outliers. Consequently, the network can focus on learning from reliable and informative samples, which inherently improves the generalization capability. Experiments on the 2018 MICCAI challenge of Multi-Organ-Nuclei-Segmentation validated the effectiveness of our proposed method, surpassing all the other 35 competitive teams by a significant margin. …

Parenting
Autonomous agents trained via reinforcement learning present numerous safety concerns: reward hacking, negative side effects, and unsafe exploration, among others. In the context of near-future autonomous agents, operating in environments where humans understand the existing dangers, human involvement in the learning process has proved a promising approach to AI Safety. Here we demonstrate that a precise framework for learning from human input, loosely inspired by the way humans parent children, solves a broad class of safety problems in this context. We show that our Parenting algorithm solves these problems in the relevant AI Safety gridworlds of Leike et al. (2017), that an agent can learn to outperform its parent as it ‘matures’, and that policies learnt through Parenting are generalisable to new environments. …

DUal view Point deep Learning architecture for time series classificatiOn (DuPLO)
Nowadays, modern Earth Observation systems continuously generate huge amounts of data. A notable example is represented by the Sentinel-2 mission, which provides images at high spatial resolution (up to 10m) with high temporal revisit period (every 5 days), which can be organized in Satellite Image Time Series (SITS). While the use of SITS has been proved to be beneficial in the context of Land Use/Land Cover (LULC) map generation, unfortunately, machine learning approaches commonly leveraged in remote sensing field fail to take advantage of spatio-temporal dependencies present in such data. Recently, new generation deep learning methods allowed to significantly advance research in this field. These approaches have generally focused on a single type of neural network, i.e., Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs), which model different but complementary information: spatial autocorrelation (CNNs) and temporal dependencies (RNNs). In this work, we propose the first deep learning architecture for the analysis of SITS data, namely \method{} (DUal view Point deep Learning architecture for time series classificatiOn), that combines Convolutional and Recurrent neural networks to exploit their complementarity. Our hypothesis is that, since CNNs and RNNs capture different aspects of the data, a combination of both models would produce a more diverse and complete representation of the information for the underlying land cover classification task. Experiments carried out on two study sites characterized by different land cover characteristics (i.e., the \textit{Gard} site in France and the \textit{Reunion Island} in the Indian Ocean), demonstrate the significance of our proposal. …

Knockoff Net
Machine Learning (ML) models are increasingly deployed in the wild to perform a wide range of tasks. In this work, we ask to what extent can an adversary steal functionality of such ‘victim’ models based solely on blackbox interactions: image in, predictions out. In contrast to prior work, we present an adversary lacking knowledge of train/test data used by the model, its internals, and semantics over model outputs. We formulate model functionality stealing as a two-step approach: (i) querying a set of input images to the blackbox model to obtain predictions; and (ii) training a ‘knockoff’ with queried image-prediction pairs. We make multiple remarkable observations: (a) querying random images from a different distribution than that of the blackbox training data results in a well-performing knockoff; (b) this is possible even when the knockoff is represented using a different architecture; and (c) our reinforcement learning approach additionally improves query sample efficiency in certain settings and provides performance gains. We validate model functionality stealing on a range of datasets and tasks, as well as on a popular image analysis API where we create a reasonable knockoff for as little as \$30. …