MALTS: Matching After Learning to Stretch

We introduce a flexible framework for matching in causal inference that produces high quality almost-exact matches. Most prior work in matching uses ad hoc distance metrics, often leading to poor quality matches, particularly when there are irrelevant covariates that degrade the distance metric. In this work, we learn an interpretable distance metric used for matching, which leads to substantially higher quality matches. The distance metric can stretch continuous covariates and matches exactly on categorical covariates. The framework is flexible in that the user can choose the form of distance metric, the type of optimization algorithm, and the type of relaxation for matching. Our ability to learn flexible distance metrics leads to matches that are interpretable and useful for estimation of conditional average treatment effects.
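
The matching idea above can be illustrated with a minimal sketch. It assumes the simplest possible "stretch": one non-negative weight per continuous covariate, combined with exact matching on categorical covariates. The weights are fixed by hand here, whereas the abstract describes learning them; the function names and the toy data are hypothetical and not taken from the paper.

```python
import math

def stretched_distance(x, y, stretch):
    """Weighted Euclidean distance: each continuous covariate is scaled by its stretch."""
    return math.sqrt(sum((s * (a - b)) ** 2 for s, a, b in zip(stretch, x, y)))

def match(unit, pool, stretch, k=1):
    """Return the k nearest pool units that agree exactly on the categorical covariates.

    Each unit is a (continuous, categorical) pair of tuples.
    """
    cont, cat = unit
    candidates = [p for p in pool if p[1] == cat]
    candidates.sort(key=lambda p: stretched_distance(cont, p[0], stretch))
    return candidates[:k]

treated = ((1.0, 5.0), ("A",))
controls = [
    ((1.1, 50.0), ("A",)),  # close on covariate 0, far on covariate 1
    ((3.0, 5.0), ("A",)),   # far on covariate 0, identical on covariate 1
    ((1.0, 5.0), ("B",)),   # identical continuous values, wrong category
]

# Hand-picked stretch (an assumption): weight 0 treats covariate 1 as irrelevant.
nearest = match(treated, controls, stretch=(1.0, 0.0))
```

With the irrelevant covariate stretched to zero, the first control becomes the match; with equal weights the second control would be chosen instead, which is the degradation the abstract attributes to ad hoc metrics.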

Stochastic Deep Networks

Machine learning is increasingly targeting areas where input data cannot be accurately described by a single vector, but can be modeled instead using the more flexible concept of random vectors, namely probability measures or more simply point clouds of varying cardinality. Using deep architectures on measures poses, however, many challenging issues. Indeed, deep architectures are originally designed to handle fixed-length vectors, or, using recursive mechanisms, ordered sequences thereof. In sharp contrast, measures describe a varying number of weighted observations with no particular order. We propose in this work a deep framework designed to handle crucial aspects of measures, namely permutation invariances, variations in weights and cardinality. Architectures derived from this pipeline can (i) map measures to measures – using the concept of push-forward operators; (ii) bridge the gap between measures and Euclidean spaces – through integration steps. This allows the design of discriminative networks (to classify or reduce the dimensionality of input measures), generative architectures (to synthesize measures) and recurrent pipelines (to predict measure dynamics). We provide a theoretical analysis of these building blocks, review our architectures’ approximation abilities and robustness w.r.t. perturbation, and try them on various discriminative and generative tasks.

The PyTorch-Kaldi Speech Recognition Toolkit

The availability of open-source software is playing a remarkable role in the popularization of speech recognition and deep learning. Kaldi, for instance, is nowadays an established framework used to develop state-of-the-art speech recognizers. PyTorch is used to build neural networks with the Python language and has recently spawned tremendous interest within the machine learning community thanks to its simplicity and flexibility. The PyTorch-Kaldi project aims to bridge the gap between these popular toolkits, trying to inherit the efficiency of Kaldi and the flexibility of PyTorch. PyTorch-Kaldi is not only a simple interface between these toolkits, but also embeds several useful features for developing modern speech recognizers. For instance, the code is specifically designed to naturally plug in user-defined acoustic models. As an alternative, users can exploit several pre-implemented neural networks that can be customized using intuitive configuration files. PyTorch-Kaldi supports multiple feature and label streams as well as combinations of neural networks, enabling the use of complex neural architectures. The toolkit is publicly released along with rich documentation and is designed to work properly locally or on HPC clusters. Experiments conducted on several datasets and tasks show that PyTorch-Kaldi can effectively be used to develop modern state-of-the-art speech recognizers.

Unsupervised Domain Adaptation: An Adaptive Feature Norm Approach

Unsupervised domain adaptation aims to mitigate the domain shift when transferring knowledge from a supervised source domain to an unsupervised target domain. Adversarial feature alignment has been successfully explored to minimize the domain discrepancy. However, existing methods usually struggle to optimize mixed learning objectives and are vulnerable to negative transfer when the two domains do not share an identical label space. In this paper, we empirically reveal that the erratic discrimination of the target domain is mainly reflected in its much lower feature norm relative to that of the source domain. We present a non-parametric Adaptive Feature Norm (AFN) approach, which is independent of the association between the label spaces of the two domains. We demonstrate that adapting feature norms of source and target domains to achieve equilibrium over a large range of values can result in significant domain transfer gains. Without bells and whistles but a few lines of code, our method largely lifts the discrimination of the target domain (23.7% from the Source Only in VisDA2017) and achieves the new state of the art under the vanilla setting. Furthermore, as our approach does not require deliberately aligning the feature distributions, it is robust to negative transfer and can outperform the existing approaches under the partial setting by an extremely large margin (9.8% on Office-Home and 14.1% on VisDA2017). Code is available at https://…/AFN. We are responsible for the reproducibility of our method.

An Efficient Transfer Learning Technique by Using Final Fully-Connected Layer Output Features of Deep Networks

In this paper, we propose a computationally efficient transfer learning approach using the output vector of the final fully-connected layer of deep convolutional neural networks for classification. Our proposed technique uses a single-layer perceptron classifier designed with hyper-parameters to focus on improving computational efficiency without adversely affecting classification performance compared to the baseline technique. Our investigations show that our technique converges much faster than the baseline while yielding very competitive classification results. We execute thorough experiments to understand the impact of similarity between pre-trained and new classes, similarity among new classes, and the number of training samples on the performance of classification using transfer learning of the final fully-connected layer’s output features.

Memory In Memory: A Predictive Neural Network for Learning Higher-Order Non-Stationarity from Spatiotemporal Dynamics

Natural spatiotemporal processes can be highly non-stationary in many ways, e.g. the low-level non-stationarity such as spatial correlations or temporal dependencies of local pixel values; and the high-level variations such as the accumulation, deformation or dissipation of radar echoes in precipitation forecasting. From Cramer’s Decomposition, any non-stationary process can be decomposed into deterministic, time-variant polynomials, plus a zero-mean stochastic term. By applying differencing operations appropriately, we may turn time-variant polynomials into a constant, making the deterministic component predictable. However, most previous recurrent neural networks for spatiotemporal prediction do not use the differential signals effectively, and their relatively simple state transition functions prevent them from learning complicated variations in spacetime. We propose the Memory In Memory (MIM) networks and corresponding recurrent blocks for this purpose. The MIM blocks exploit the differential signals between adjacent recurrent states to model the non-stationary and approximately stationary properties in spatiotemporal dynamics with two cascaded, self-renewed memory modules. By stacking multiple MIM blocks, we could potentially handle higher-order non-stationarity. The MIM networks achieve state-of-the-art results on three spatiotemporal prediction tasks across both synthetic and real-world datasets. We believe that the general idea of this work can potentially be applied to other time-series forecasting tasks.
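
The differencing argument above can be checked numerically. The sketch below only illustrates the claim that repeated differencing turns a time-variant polynomial into a constant; the quadratic trend and the helper name are made up for illustration, and nothing here reproduces the MIM architecture.

```python
def difference(series):
    """First-order differencing: d[t] = x[t + 1] - x[t]."""
    return [b - a for a, b in zip(series, series[1:])]

# Hypothetical deterministic component: a quadratic trend 2t^2 + 3t + 1.
trend = [2 * t * t + 3 * t + 1 for t in range(8)]

first = difference(trend)    # still time-variant (linear in t)
second = difference(first)   # constant: twice the leading coefficient
```

A polynomial of degree d needs d rounds of differencing to become constant, which is one way to read the abstract's remark that stacking blocks could handle higher-order non-stationarity.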

DEXON: A Highly Scalable, Decentralized DAG-Based Consensus Algorithm

A blockchain system is a replicated state machine that must be fault tolerant. When designing a blockchain system, there is usually a trade-off between decentralization, scalability, and security. In this paper, we propose a novel blockchain system, DEXON, which achieves high scalability while remaining decentralized and robust in the real-world environment. We have two main contributions. First, we present a highly scalable sharding framework for blockchain. This framework takes an arbitrary number of single chains and transforms them into the blocklattice data structure, enabling high scalability and low transaction confirmation latency with asymptotically optimal communication overhead. Second, we propose a single-chain protocol based on our novel verifiable random function and a new Byzantine agreement that achieves high decentralization and low latency.

Variational Bayesian Dropout

Variational dropout (VD) is a generalization of Gaussian dropout, which aims at inferring the posterior of network weights based on a log-uniform prior on them in order to learn these weights and the dropout rate simultaneously. The log-uniform prior not only interprets the regularization capacity of Gaussian dropout in network training, but also underpins the inference of such a posterior. However, the log-uniform prior is an improper prior (i.e., its integral is infinite), which causes the inference of the posterior to be ill-posed, thus restricting the regularization performance of VD. To address this problem, we present a new generalization of Gaussian dropout, termed variational Bayesian dropout (VBD), which exploits a hierarchical prior on the network weights and infers a new joint posterior. Specifically, we implement the hierarchical prior as a zero-mean Gaussian distribution with variance sampled from a uniform hyper-prior. Then, we incorporate such a prior into inferring the joint posterior over network weights and the variance in the hierarchical prior, with which both the network training and the dropout rate estimation can be cast into a joint optimization problem. More importantly, the hierarchical prior is a proper prior, which enables the inference of the posterior to be well-posed. In addition, we further show that the proposed VBD can be seamlessly applied to network compression. Experiments on both classification and network compression tasks demonstrate the superior performance of the proposed VBD in terms of regularizing network training.

Contributors profile modelization in crowdsourcing platforms

Crowdsourcing consists of outsourcing tasks to a crowd of people who are remunerated to execute them. The crowd, usually diverse, can include users without the qualification and/or motivation for the tasks. In this paper we introduce a new method for modeling user expertise in crowdsourcing platforms, based on the theory of belief functions, in order to identify serious and qualified users.

Fine-grained Classification using Heterogeneous Web Data and Auxiliary Categories

Fine-grained classification remains a very challenging problem, because of the absence of well-labeled training data caused by the high cost of annotating a large number of fine-grained categories. In the extreme case, given a set of test categories without any well-labeled training data, the majority of existing works can be grouped into the following two research directions: 1) crawl noisy labeled web data for the test categories as training data, which is dubbed webly supervised learning; 2) transfer the knowledge from auxiliary categories with well-labeled training data to the test categories, which corresponds to the zero-shot learning setting. Nevertheless, the above two research directions still have critical issues to be addressed. For the first direction, web data have noisy labels and a considerably different data distribution from test data. For the second direction, zero-shot learning struggles to achieve compelling results compared with conventional supervised learning. The issues of the above two directions motivate us to develop a novel approach which can jointly exploit both noisy web training data from test categories and well-labeled training data from auxiliary categories. In particular, on one hand, we crawl web data for test categories as noisy training data. On the other hand, we transfer the knowledge from auxiliary categories with well-labeled training data to test categories by virtue of free semantic information (e.g., word vector) of all categories. Moreover, given the fact that web data are generally associated with additional textual information (e.g., title and tag), we extend our method by using the surrounding textual information of web data as privileged information. Extensive experiments show the effectiveness of our proposed methods.

Deep Active Learning with a Neural Architecture Search

We consider active learning of deep neural networks. Most active learning works in this context have focused on studying effective querying mechanisms and assumed that an appropriate network architecture is a priori known for the problem at hand. We challenge this assumption and propose a novel active strategy whereby the learning algorithm searches for effective architectures on the fly, while actively learning. We apply our strategy using three known querying techniques (softmax response, MC-dropout, and coresets) and show that the proposed approach overwhelmingly outperforms active learning using fixed architectures.

Self-Referenced Deep Learning

Knowledge distillation is an effective approach to transferring knowledge from a teacher neural network to a student target network for satisfying the low-memory and fast running requirements in practical use. Whilst being able to create stronger target networks compared to the vanilla non-teacher based learning strategy, this scheme additionally needs to train a large teacher model at expensive computational cost. In this work, we present a Self-Referenced Deep Learning (SRDL) strategy. Unlike both vanilla optimisation and existing knowledge distillation, SRDL distils the knowledge discovered by the in-training target model back to itself to regularise the subsequent learning procedure, therefore eliminating the need for training a large teacher model. SRDL improves the model generalisation performance compared to vanilla learning and conventional knowledge distillation approaches with negligible extra computational cost. Extensive evaluations show that a variety of deep networks benefit from SRDL, resulting in enhanced deployment performance on both coarse-grained object categorisation tasks (CIFAR10, CIFAR100, Tiny ImageNet, and ImageNet) and fine-grained person instance identification tasks (Market-1501).

A Trustworthy, Responsible and Interpretable System to Handle Chit-Chat in Conversational Bots

Most often, chat-bots are built to serve the purpose of a search engine or a human assistant: their primary goal is to provide information to the user or help them complete a task. However, these chat-bots are incapable of responding to unscripted queries like ‘Hi, what’s up’ or ‘What’s your favourite food’. Human evaluation judgments show that four humans reach a consensus on the intent of a given chat-domain query only 77% of the time, making it evident how non-trivial this task is. In our work, we show why it is difficult to break the chitchat space into clearly defined intents. We propose a system to handle this task in chat-bots, keeping in mind scalability, interpretability, appropriateness, trustworthiness, relevance and coverage. Our work introduces a pipeline for query understanding in chitchat using hierarchical intents as well as a way to use seq-seq auto-generation models in professional bots. We explore an interpretable model for chat domain detection and also show how various components such as adult/offensive classification, grammars/regex patterns, curated personality based responses, generic guided evasive responses and response generation models can be combined in a scalable way to solve this problem.

Outlier Aware Network Embedding for Attributed Networks

Attributed network embedding has received much interest from the research community as most of the networks come with some content in each node, which is also known as node attributes. Existing attributed network approaches work well when the network is consistent in structure and attributes, and nodes behave as expected. But real world networks often have anomalous nodes. Typically these outliers, being relatively unexplainable, affect the embeddings of other nodes in the network. Thus all the downstream network mining tasks fail miserably in the presence of such outliers. Hence an integrated approach to detect anomalies and reduce their overall effect on the network embedding is required. Towards this end, we propose an unsupervised outlier aware network embedding algorithm (ONE) for attributed networks, which minimizes the effect of the outlier nodes, and hence generates robust network embeddings. We align and jointly optimize the loss functions coming from structure and attributes of the network. To the best of our knowledge, this is the first generic network embedding approach which incorporates the effect of outliers for an attributed network without any supervision. We experimented on publicly available real networks and manually planted different types of outliers to check the performance of the proposed algorithm. Results demonstrate the superiority of our approach to detect the network outliers compared to the state-of-the-art approaches. We also consider different downstream machine learning applications on networks to show the efficiency of ONE as a generic network embedding technique. The source code is made available at https://…/ONE.

An efficient density-based clustering algorithm using reverse nearest neighbour

Density-based clustering is the task of discovering high-density regions of entities (clusters) that are separated from each other by contiguous regions of low-density. DBSCAN is, arguably, the most popular density-based clustering algorithm. However, its cluster recovery capabilities depend on the combination of the two parameters. In this paper we present a new density-based clustering algorithm which uses reverse nearest neighbour (RNN) and has a single parameter. We also show that it is possible to estimate a good value for this parameter using a clustering validity index. The RNN queries enable our algorithm to estimate densities taking more than a single entity into account, and to recover clusters that are not well-separated or have different densities. Our experiments on synthetic and real-world data sets show our proposed algorithm outperforms DBSCAN and its recent variant ISDBSCAN.
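
To make the reverse nearest neighbour (RNN) notion concrete, here is a brute-force sketch that counts, for each point, how many other points list it among their k nearest neighbours. This illustrates only the RNN quantity, not the clustering algorithm the abstract proposes; the function names, the toy points, and the choice k=2 are assumptions.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

def reverse_nn_counts(points, k):
    """counts[j] = number of points that have j among their k nearest neighbours."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in knn(points, i, k):
            counts[j] += 1
    return counts

# Four points on a unit square plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
counts = reverse_nn_counts(points, k=2)
```

The isolated point still has k nearest neighbours of its own, but no other point selects it, so its reverse count is zero. That asymmetry is what lets a density estimate take more than a single entity into account.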

Chat More If You Like: Dynamic Cue Words Planning to Flow Longer Conversations

Building an open-domain multi-turn conversation system is one of the most interesting and challenging tasks in Artificial Intelligence. Many research efforts have been dedicated to building such dialogue systems, yet few shed light on modeling the conversation flow in an ongoing dialogue. Moreover, it is common for people to talk about highly relevant aspects during a conversation, with topics that are coherent and drift naturally, which demonstrates the necessity of dialogue flow modeling. To this end, we present a multi-turn cue-word-driven conversation system with a reinforcement learning method (RLCw), which strives to select an adaptive cue word with the greatest future credit, and therefore improves the quality of generated responses. We introduce a new reward to measure the quality of cue words in terms of effectiveness and relevance. To further optimize the model for long-term conversations, a reinforcement approach is adopted in this paper. Experiments on a real-life dataset demonstrate that our model consistently outperforms a set of competitive baselines in terms of simulated turns, diversity and human evaluation.

An Influence-based Clustering Model on Twitter

This paper introduces a temporal framework for detecting and clustering emergent and viral topics on social networks. Endogenous and exogenous influence on developing viral content is explored using a clustering method based on a user’s behavior on the social network and a dataset from the Twitter API. Results are discussed by introducing metrics such as popularity, burstiness, and relevance score. The results show a clear distinction in the characteristics of content developed by the two classes of users.

When Conventional machine learning meets neuromorphic engineering: Deep Temporal Networks (DTNets) a machine learning framework allowing to operate on Events and Frames and implantable on Tensor Flow Like Hardware

We introduce in this paper the principle of Deep Temporal Networks, which add time to convolutional networks through deep integration principles that use not only spatial information but also increasingly large temporal windows. The concept can be used for conventional image inputs but also for event-based data. Although inspired by the architecture of the brain, which integrates information over increasingly larger spatial but also temporal scales, it can operate on conventional hardware using existing architectures. We introduce preliminary results to show the efficiency of the method. More in-depth results and analysis will be reported soon!

Complexity Analysis of a Sampling-Based Interior Point Method for Convex Optimization

We develop a short-step interior point method to optimize a linear function over a convex body assuming that one only knows a membership oracle for this body. The approach is based on Abernethy and Hazan’s sketch of a universal interior point method using the so-called entropic barrier [arXiv 1507.02528v2, 2015]. It is well-known that the gradient and Hessian of the entropic barrier can be approximated by sampling from Boltzmann-Gibbs distributions, and the entropic barrier was shown to be self-concordant by Bubeck and Eldan [arXiv 1412.1587v3, 2015]. The analysis of our algorithm uses properties of the entropic barrier, mixing times for hit-and-run random walks by Lovász and Vempala [Foundations of Computer Science, 2006], approximation quality guarantees for the mean and covariance of a log-concave distribution, and results from De Klerk, Glineur and Taylor on inexact Newton-type methods [arXiv 1709.0519, 2017].

Efficient keyword spotting using dilated convolutions and gating

We explore the application of end-to-end stateless temporal modeling to small-footprint keyword spotting as opposed to recurrent networks that model long-term temporal dependencies using internal states. We propose a model inspired by the recent success of dilated convolutions in sequence modeling applications, which allows deeper architectures to be trained in resource-constrained configurations. Gated activations and residual connections are also added, following a similar configuration to WaveNet. In addition, we apply a custom target labeling that back-propagates loss from specific frames of interest, therefore yielding higher accuracy and requiring only detection of the end of the keyword. Our experimental results show that our model outperforms a max-pooling loss trained recurrent neural network using LSTM cells, with a significant decrease in false rejection rate. The underlying dataset – ‘Hey Snips’ utterances recorded by over 2.2K different speakers – has been made publicly available to establish an open reference for wake-word detection.
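
One reason dilated convolutions suit a stateless model is that their temporal context grows quickly with depth. The sketch below computes the receptive field of a stack of stride-1 one-dimensional convolutions using the standard formula 1 + sum((kernel_size - 1) * dilation). The kernel size and dilation schedules are illustrative assumptions, not the configuration used in the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in frames) of stacked stride-1 dilated 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Four layers with kernel size 3: doubling dilations versus no dilation.
doubling = receptive_field(3, [1, 2, 4, 8])
plain = receptive_field(3, [1, 1, 1, 1])
```

With the same number of layers and parameters, the doubling schedule covers 31 frames against 9 for the undilated stack.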

Do Normalization Layers in a Deep ConvNet Really Need to Be Distinct?

Yes, they do. This work investigates a perspective for deep learning: whether different normalization layers in a ConvNet require different normalizers. This is the first step towards understanding this phenomenon. We allow each convolutional layer to be stacked before a switchable normalization (SN) that learns to choose a normalizer from a pool of normalization methods. Through systematic experiments in ImageNet, COCO, Cityscapes, and ADE20K, we answer three questions: (a) Is it useful to allow each normalization layer to select its own normalizer? (b) What impacts the choices of normalizers? (c) Do different tasks and datasets prefer different normalizers? Our results suggest that (1) using distinct normalizers improves both learning and generalization of a ConvNet; (2) the choices of normalizers are more related to depth and batch size, but less relevant to parameter initialization, learning rate decay, and solver; (3) different tasks and datasets have different behaviors when learning to select normalizers.

Reinforcement Learning with A* and a Deep Heuristic

A* is a popular path-finding algorithm, but it can only be applied to those domains where a good heuristic function is known. Inspired by recent methods combining Deep Neural Networks (DNNs) and trees, this study demonstrates how to train a heuristic represented by a DNN and combine it with A*. This new algorithm, which we call aleph-star, can be used efficiently in domains where the input to the heuristic could be processed by a neural network. We compare aleph-star to N-Step Deep Q-Learning (DQN; Mnih et al. 2013) in a driving simulation with pixel-based input, and demonstrate significantly better performance in this scenario.
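
For readers unfamiliar with how a heuristic plugs into A*, the following is a minimal textbook sketch in which the heuristic is an arbitrary callable. It is not the aleph-star algorithm: the hand-written Manhattan distance stands in for the trained network, and the 4x4 grid with one blocked cell is a made-up toy domain.

```python
import heapq

def a_star(start, goal, neighbors, heuristic):
    """Generic A*; `heuristic(node)` may be any callable, e.g. a learned model."""
    frontier = [(heuristic(start), 0, start)]  # entries are (f = g + h, g, node)
    best_cost = {start: 0}
    parent = {start: None}
    while frontier:
        _, cost, node = heapq.heappop(frontier)
        if node == goal:
            path = []
            while node is not None:  # walk parents back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        if cost > best_cost[node]:
            continue  # stale queue entry, a cheaper route was found later
        for nxt, step in neighbors(node):
            new_cost = cost + step
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt))
    return None

def grid_neighbors(node):
    """4-connected moves on a 4x4 grid with a single blocked cell at (1, 1)."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < 4 and 0 <= ny < 4 and (nx, ny) != (1, 1):
            yield (nx, ny), 1

goal = (3, 3)

def manhattan(node):
    """Hand-written stand-in for a learned heuristic."""
    return abs(node[0] - goal[0]) + abs(node[1] - goal[1])

path = a_star((0, 0), goal, grid_neighbors, manhattan)
```

Swapping `manhattan` for a function that evaluates a trained model is the only change the search loop would need. Note that the optimality guarantee of A* depends on the heuristic being admissible, which a learned heuristic does not ensure by default.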

How far from automatically interpreting deep learning

In recent years, deep learning researchers have focused on how to interpret deep learning models. However, human cognitive competence does not yet fully cover deep learning models. In other words, there is a gap between the deep learning model and the human cognitive mode. How to evaluate and shrink this cognitive gap is a very important issue. This paper is concerned with interpretability evaluation, the relationship between the generalization performance and the interpretability of a model, and methods for improving interpretability. A universal learning framework is put forward to solve the equilibrium problem between the two performances. The uniqueness of the solution to this problem is proved and the condition for a unique solution is obtained. A probabilistic upper bound on the sum of the two performances is analyzed.

Building Efficient Deep Neural Networks with Unitary Group Convolutions

We propose unitary group convolutions (UGConvs), a building block for CNNs which composes a group convolution with unitary transforms in feature space to learn a richer set of representations than group convolution alone. UGConvs generalize two disparate ideas in CNN architecture, channel shuffling (i.e. ShuffleNet) and block-circulant networks (i.e. CirCNN), and provide unifying insights that lead to a deeper understanding of each technique. We experimentally demonstrate that dense unitary transforms can outperform channel shuffling in DNN accuracy. On the other hand, different dense transforms exhibit comparable accuracy. Based on these observations we propose HadaNet, a UGConv network using Hadamard transforms. HadaNets achieve similar accuracy to circulant networks with lower computational complexity, and better accuracy than ShuffleNets with the same number of parameters and floating-point multiplies.
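
The Hadamard transform mentioned above can be applied with additions and subtractions only. Below is a sketch of the standard fast Walsh-Hadamard transform on a small vector of channel values; how HadaNet places the transform inside the network is not described in the abstract, so the example only demonstrates the transform itself on made-up data.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform; len(values) must be a power of two."""
    out = list(values)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b  # butterfly: no multiplications
        h *= 2
    return out

channels = [1, 0, 1, 0]          # toy feature values across four channels
mixed = fwht(channels)           # every output mixes all four inputs
restored = [v // len(channels) for v in fwht(mixed)]
```

Applying the unnormalized transform twice returns the input scaled by its length, so dividing each application by the square root of the length yields a unitary transform.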

Weighted Ensemble of Statistical Models

We present a detailed description of our submission for the M4 forecasting competition, in which it ranked 3rd overall. Our solution utilizes several commonly used statistical models, which are weighted according to their performance on historical data. We cluster series within each type of frequency with respect to the existence of trend and seasonality. Every class of series is assigned a different set of algorithms to combine. We conduct experiments with a holdout set to manually pick pools of models that perform best for a given series type, as well as to choose the combination approaches.
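
One simple way to weight models by historical performance is inverse-error weighting, sketched below. The abstract does not state which weighting scheme the submission used, so this scheme, the function names, and the numbers are all assumptions chosen for illustration.

```python
def inverse_error_weights(errors):
    """Weights proportional to 1 / holdout error, normalized to sum to one."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [w / total for w in inverse]

def combine(forecasts, weights):
    """Weighted average of the individual model forecasts."""
    return sum(f * w for f, w in zip(forecasts, weights))

holdout_errors = [1.0, 2.0, 4.0]          # hypothetical errors of three models
weights = inverse_error_weights(holdout_errors)
forecast = combine([100.0, 110.0, 130.0], weights)
```

Here the most accurate model receives weight 4/7 and the least accurate 1/7, so the combined forecast is pulled toward the best model without discarding the others.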

How to Use Heuristics for Differential Privacy

We develop theory for using heuristics to solve computationally hard problems in differential privacy. Heuristic approaches have enjoyed tremendous success in machine learning, for which performance can be empirically evaluated. However, privacy guarantees cannot be evaluated empirically, and must be proven without making heuristic assumptions. We show that learning problems over broad classes of functions can be solved privately and efficiently, assuming the existence of a non-private oracle for solving the same problem. Our first algorithm yields a privacy guarantee that is contingent on the correctness of the oracle. We then give a reduction which applies to a class of heuristics which we call certifiable, which allows us to convert oracle-dependent privacy guarantees to worst-case privacy guarantees that hold even when the heuristic standing in for the oracle might fail in adversarial ways. Finally, we consider a broad class of functions that includes most classes of simple boolean functions studied in the PAC learning literature, including conjunctions, disjunctions, parities, and discrete halfspaces. We show that there is an efficient algorithm for privately constructing synthetic data for any such class, given a non-private learning oracle. This in particular gives the first oracle-efficient algorithm for privately generating synthetic data for contingency tables. The most intriguing question left open by our work is whether or not every problem that can be solved differentially privately can be privately solved with an oracle-efficient algorithm. While we do not resolve this, we give a barrier result that suggests that any generic oracle-efficient reduction must fall outside of a natural class of algorithms (which includes the algorithms given in this paper).

Explicit Bias Discovery in Visual Question Answering Models

Researchers have observed that Visual Question Answering (VQA) models tend to answer questions by learning statistical biases in the data. For example, their answer to the question ‘What is the color of the grass?’ is usually ‘Green’, whereas a question like ‘What is the title of the book?’ cannot be answered by inferring statistical biases. It is of interest to the community to explicitly discover such biases, both for understanding the behavior of such models, and towards debugging them. Our work addresses this problem. In a database, we store the words of the question, answer and visual words corresponding to regions of interest in attention maps. By running simple rule mining algorithms on this database, we discover human-interpretable rules which give us unique insight into the behavior of such models. Our results also show examples of unusual behaviors learned by models in attempting VQA tasks.

Deeper Interpretability of Deep Networks

Deep Convolutional Neural Networks (CNNs) have been one of the most influential recent developments in computer vision, particularly for categorization. There is an increasing demand for explainable AI as these systems are deployed in the real world. However, understanding the information represented and processed in CNNs remains in most cases challenging. Within this paper, we explore the use of new information theoretic techniques developed in the field of neuroscience to enable novel understanding of how a CNN represents information. We trained a 10-layer ResNet architecture to identify 2,000 face identities from 26M images generated using a rigorously controlled 3D face rendering model that produced variations of intrinsic (i.e. face morphology, gender, age, expression and ethnicity) and extrinsic factors (i.e. 3D pose, illumination, scale and 2D translation). With our methodology, we demonstrate that, unlike humans, the network overgeneralizes face identities even with extreme changes of face shape, but is more sensitive to changes of texture. To understand the processing of information underlying these counterintuitive properties, we visualize the features of shape and texture that the network processes to identify faces. Then, we shed light on the inner workings of the black box and reveal how hidden layers represent these features and whether the representations are invariant to pose. We hope that our methodology will provide an additional valuable tool for interpretability of CNNs.

Sampling on Social Networks from a Decision Theory Perspective

Some of the most widely used sampling mechanisms that propagate through a social network are defined in terms of tuning parameters; for instance, Respondent-Driven Sampling (RDS) is specified by the number of seeds and the maximum number of referrals. We are interested in the problem of optimising these tuning parameters with the purpose of improving the inference of a population quantity, where such quantity is a function of the network and measurements taken at the nodes. This is done by formulating the problem in terms of Decision Theory. The optimisation procedure for different sampling mechanisms is illustrated via simulations in the fashion of the ones used for Bayesian clinical trials.
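As an illustration of the two tuning parameters named above, the following sketch simulates a simplified referral process on a known graph. It only shows the sampling mechanism, not the decision-theoretic optimisation; the function `respondent_driven_sample`, the breadth-first recruitment order, and the ring network are assumptions made for this example.

```python
import random


def respondent_driven_sample(adjacency, n_seeds, max_referrals, sample_size, rng):
    """Simulate a simplified referral chain on a graph given as {node: [neighbours]}.

    n_seeds and max_referrals are the tuning parameters; recruitment proceeds
    in first-in, first-out order and never samples a node twice.
    """
    seeds = rng.sample(sorted(adjacency), n_seeds)
    sampled = list(seeds)
    seen = set(seeds)
    queue = list(seeds)
    while queue and len(sampled) < sample_size:
        recruiter = queue.pop(0)
        candidates = [v for v in adjacency[recruiter] if v not in seen]
        rng.shuffle(candidates)
        for recruit in candidates[:max_referrals]:
            if len(sampled) >= sample_size:
                break
            seen.add(recruit)
            sampled.append(recruit)
            queue.append(recruit)
    return sampled


# Hypothetical network: a ring of 20 nodes, each linked to its two neighbours.
ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
sample = respondent_driven_sample(ring, n_seeds=2, max_referrals=1,
                                  sample_size=8, rng=random.Random(0))
```

Repeating such simulations over a grid of `n_seeds` and `max_referrals` values, and scoring each setting by the quality of the resulting estimate, is one way the optimisation described above could be explored numerically.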

On the Network Visibility Problem

Social media is an attention economy where users are constantly competing for attention in their followers’ feeds. Users are likely to elicit greater attention from their followers, their audience, if their posts remain visible at the top of their followers’ feeds for a longer period of time. However, this depends on the rate at which their followers receive information in their feeds, which in turn depends on the users their followers follow. Then, who should follow whom to maximize the visibility each user achieves? In this paper, we represent users’ posts and feeds using the framework of temporal point processes. Under this representation, the problem reduces to optimizing a non-submodular nondecreasing set function under matroid constraints. Then, we show that the set function satisfies a novel property, \xi-submodularity, which allows a simple and efficient greedy algorithm to enjoy theoretical guarantees. In particular, we prove that the greedy algorithm offers a (1/\xi + 1) approximation factor, where \xi is the strong submodularity ratio, a new measure of approximate submodularity that we are able to bound in our problem. Experiments on both synthetic and real data gathered from Twitter show that our greedy algorithm is able to consistently outperform several baselines.
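The greedy scheme mentioned above can be sketched generically for a partition matroid, where each group has a capacity. This is a sketch under stated assumptions, not the paper's algorithm: the function `greedy_partition_matroid` is hypothetical, and the objective is a simple coverage function standing in for the visibility function, which the abstract describes as non-submodular.

```python
def greedy_partition_matroid(candidates, group_of, capacity, objective):
    """Greedily build a set, respecting a per-group capacity (partition matroid).

    At each step, add the feasible item with the largest positive marginal gain.
    """
    chosen = []
    used = {}
    remaining = set(candidates)
    while remaining:
        base = objective(chosen)
        best, best_gain = None, 0.0
        for item in sorted(remaining):
            group = group_of[item]
            if used.get(group, 0) >= capacity[group]:
                continue  # adding this item would violate the constraint
            gain = objective(chosen + [item]) - base
            if gain > best_gain:
                best, best_gain = item, gain
        if best is None:
            break  # no feasible item improves the objective
        chosen.append(best)
        used[group_of[best]] = used.get(group_of[best], 0) + 1
        remaining.remove(best)
    return chosen


# Stand-in objective (an assumption): number of distinct elements covered.
coverage = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}, "d": {1}}
group_of = {"a": "g1", "b": "g1", "c": "g2", "d": "g2"}
capacity = {"g1": 1, "g2": 1}


def covered(selection):
    return len(set().union(*(coverage[s] for s in selection))) if selection else 0


chosen = greedy_partition_matroid(coverage, group_of, capacity, covered)
```

Here the greedy pass picks "a" first (gain 3), then "c" (gain 2), after which both groups are at capacity.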

Prediction-Based Decisions and Fairness: A Catalogue of Choices, Assumptions, and Definitions

A recent flurry of research activity has attempted to quantitatively define ‘fairness’ for decisions based on statistical and machine learning (ML) predictions. The rapid growth of this new field has led to wildly inconsistent terminology and notation, presenting a serious challenge for cataloguing and comparing definitions. This paper attempts to bring much-needed order. First, we explicate the various choices and assumptions made—often implicitly—to justify the use of prediction-based decisions. Next, we show how such choices and assumptions can raise concerns about fairness and we present a notationally consistent catalogue of fairness definitions from the ML literature. In doing so, we offer a concise reference for thinking through the choices, assumptions, and fairness considerations of prediction-based decision systems.

Scalable agent alignment via reward modeling: a research direction

One obstacle to applying reinforcement learning algorithms to real-world problems is the lack of suitable reward functions. Designing such reward functions is difficult in part because the user only has an implicit understanding of the task objective. This gives rise to the agent alignment problem: how do we create agents that behave in accordance with the user’s intentions? We outline a high-level research direction to solve the agent alignment problem centered around reward modeling: learning a reward function from interaction with the user and optimizing the learned reward function with reinforcement learning. We discuss the key challenges we expect to face when scaling reward modeling to complex and general domains, concrete approaches to mitigate these challenges, and ways to establish trust in the resulting agents.

Guiding Policies with Language via Meta-Learning

Behavioral skills or policies for autonomous agents are conventionally learned from reward functions, via reinforcement learning, or from demonstrations, via imitation learning. However, both modes of task specification have their disadvantages: reward functions require manual engineering, while demonstrations require a human expert to be able to actually perform the task in order to generate the demonstration. Instruction following from natural language instructions provides an appealing alternative: in the same way that we can specify goals to other humans simply by speaking or writing, we would like to be able to specify tasks for our machines. However, a single instruction may be insufficient to fully communicate our intent or, even if it is, may be insufficient for an autonomous agent to actually understand how to perform the desired task. In this work, we propose an interactive formulation of the task specification problem, where iterative language corrections are provided to an autonomous agent, guiding it in acquiring the desired skill. Our proposed language-guided policy learning algorithm can integrate an instruction and a sequence of corrections to acquire new skills very quickly. In our experiments, we show that this method can enable a policy to follow instructions and corrections for simulated navigation and manipulation tasks, substantially outperforming direct, non-interactive instruction following.

Patterns in Random Permutations

Every $k$ entries in a permutation can have one of $k!$ different relative orders, called patterns. How many times does each pattern occur in a large random permutation of size $n$? The distribution of this $k!$-dimensional vector of pattern densities was studied by Janson, Nakamura, and Zeilberger (2015). Their analysis showed that some component of this vector is asymptotically multinormal of order $1/\sqrt{n}$, while the orthogonal component is smaller. Using representations of the symmetric group, and the theory of U-statistics, we refine the analysis of this distribution. We show that it decomposes into $k$ asymptotically uncorrelated components of different orders in $n$, that correspond to representations of $S_k$. Some combinations of pattern densities that arise in this decomposition have interpretations as practical nonparametric statistical tests.
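The vector of pattern densities defined above can be computed by brute force for small cases. The sketch below is only an illustration of the definition; the helper names `pattern_of` and `pattern_densities` are hypothetical, the permutation is assumed to have at least k entries, and the enumeration over all k-subsets is practical only for small n and k.

```python
from collections import Counter
from itertools import combinations, permutations


def pattern_of(values):
    """Relative order of distinct values, as a tuple of ranks 0..k-1."""
    ranks = {v: r for r, v in enumerate(sorted(values))}
    return tuple(ranks[v] for v in values)


def pattern_densities(perm, k):
    """Fraction of k-subsets of positions whose entries form each pattern.

    Returns a dict with one entry per pattern, i.e. k! entries in total.
    """
    counts = Counter(
        pattern_of([perm[i] for i in positions])
        for positions in combinations(range(len(perm)), k)
    )
    total = sum(counts.values())
    return {p: counts.get(p, 0) / total for p in permutations(range(k))}


increasing = pattern_densities([0, 1, 2, 3], 2)
mixed = pattern_densities([2, 0, 1], 2)
```

For the permutation 2, 0, 1 the pairs (2, 0) and (2, 1) are inversions and (0, 1) is not, so the descending pattern has density 2/3.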

The problematic nature of potentially polynomial-time algorithms solving the subset-sum problem
Compact localized states of open scattering media
Analyticity results in Bernoulli Percolation
Multimodal Densenet
Optimal H2 moment matching-based model reduction for linear systems by (non)convex optimization
Domain expansion and transient scaling regimes in population networks with in-domain cyclic selection
The Preemptive Resource Allocation Problem
Realtime Scheduling and Power Allocation Using Deep Neural Networks
PerSIM: Multi-resolution Image Quality Assessment in the Perceptually Uniform Color Domain
Periodic switching strategies for an isoperimetric control problem with application to nonlinear chemical reactions
Harmonic Recomposition using Conditional Autoregressive Modeling
The core consistency of a compressed tensor
Understanding and Measuring Psychological Stress using Social Media
Pixel-Anchor: A Fast Oriented Scene Text Detector with Combined Networks
Non-Hermitian Quasi-Localization and Ring Attractor Neural Networks
Limitations of Source-Filter Coupling In Phonation
On buildings that compute. A proposal
Learning to Generate the ‘Unseen’ via Part Synthesis and Composition
Predictive and Semantic Layout Estimation for Robotic Applications in Manhattan Worlds
Sorting permutations with a transposition tree
High-precision timing and frequency synchronization method for MIMO-OFDM systems in double-selective channels
Testing local properties of arrays
Regular and biregular planar cages
An Investigation on Partitions with Equal Products
Product of sumsets over arbitrary finite fields
On Geometric Alignment in Low Doubling Dimension
Generalizable Adversarial Training via Spectral Normalization
Segregated Temporal Assembly Recurrent Networks for Weakly Supervised Multiple Action Detection
Indoor GeoNet: Weakly Supervised Hybrid Learning for Depth and Pose Estimation
Towards Nearly-linear Time Algorithms for Submodular Maximization with a Matroid Constraint
Bayesian CycleGAN via Marginalizing Latent Sampling
Multi-scale 3D Convolution Network for Video Based Person Re-Identification
Exploring Small-World Network with an Elite-Clique: Bringing Embeddedness Theory into the Dynamic Evolution of a Venture Capital Network
Denoising and Completion of Structured Low-Rank Matrices via Iteratively Reweighted Least Squares
On the Sweep Map for $\vec{k}$-Dyck Paths
Best-arm identification with cascading bandits
Global and Local Sensitivity Guided Key Salient Object Re-augmentation for Video Saliency Detection
Intersection theorems for families of matchings of complete $k$-partite $k$-graphs
Minimum degree condition for a graph to be knitted
Show, Attend and Translate: Unpaired Multi-Domain Image-to-Image Translation with Visual Attention
Reducing Visual Confusion with Discriminative Attention
Visual-Texual Emotion Analysis with Deep Coupled Video and Danmu Neural Networks
Re-Identification with Consistent Attentive Siamese Networks
Quantifying Human Behavior on the Block Design Test Through Automated Multi-Level Analysis of Overhead Video
A Self-Adaptive Network For Multiple Sclerosis Lesion Segmentation From Multi-Contrast MRI With Various Imaging Protocols
DeepSeeNet: A deep learning model for automated classification of patient-based age-related macular degeneration severity from color fundus photographs
FotonNet: A HW-Efficient Object Detection System Using 3D-Depth Segmentation and 2D-DNN Classifier
A Comparative Analysis of Content-based Geolocation in Blogs and Tweets
Robust Visual Tracking using Multi-Frame Multi-Feature Joint Modeling
Optimal Iterative Threshold-Kernel Estimation of Jump Diffusion Processes
Fast Efficient Object Detection Using Selective Attention
Compressing Recurrent Neural Networks with Tensor Ring for Action Recognition
Low Complexity Iterative Detection for a Large-scale Distributed MIMO Prototyping System
Modularity in biological evolution and evolutionary computation
NSEEN: Neural Semantic Embedding for Entity Normalization
Classical Algorithms from Quantum and Arthur-Merlin Communication Protocols
Unsupervised Learning in Reservoir Computing for EEG-based Emotion Recognition
Multiuser Computation Offloading and Downloading for Edge Computing with Virtualization
Corrected pair correlation functions for environments with obstacles
High Order Neural Networks for Video Classification
A Note on Two Constructions of Zero-Difference Balanced Functions
Practical Deep Reinforcement Learning Approach for Stock Trading
Feature selection as Monte-Carlo Search in Growing Single Rooted Directed Acyclic Graph by Best Leaf Identification
Note on the exact delay stability margin computation of hybrid dynamical systems
MIMO Channel Information Feedback Using Deep Recurrent Network
A Pretrained DenseNet Encoder for Brain Tumor Segmentation
CA3Net: Contextual-Attentional Attribute-Appearance Network for Person Re-Identification
Representation based and Attention augmented Meta learning
Multilevel Monte Carlo estimation of expected information gains
iQIYI-VID: A Large Dataset for Multi-modal Person Identification
Switch-based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning
Upper Tails for Edge Eigenvalues of Random Graphs
Three Dimensional Convolutional Neural Network Pruning with Regularization-Based Method
NECST: Neural Joint Source-Channel Coding
Random walk in a stratified independent random environment
Understanding the combined effect of $k$-space undersampling and transient states excitation in MR Fingerprinting reconstructions
Nash equilibrium seeking in potential games with double-integrator agents
Algebraic structures on typed decorated rooted trees
Restricting Schubert classes to symplectic Grassmannians using self-dual puzzles
Localisation via Deep Imagination: learn the features not the map
Deep Frank-Wolfe For Neural Network Optimization
Measurement-based adaptation protocol with quantum reinforcement learning in a Rigetti quantum computer
Quantum Inspired High Dimensional Conceptual Space as KID Model for Elderly Assistance
Adversarial Autoencoders for Generating 3D Point Clouds
On graceful and harmonious labelings of trees
Weakly Supervised Soft-detection-based Aggregation Method for Image Retrieval
Approximate Eigenvalue Decompositions of Linear Transformations with a Few Householder Reflectors
Reconstruction and prediction of random dynamical systems under borrowing of strength
Beyond Attributes: Adversarial Erasing Embedding Network for Zero-shot Learning
Mixed Likelihood Gaussian Process Latent Variable Model
ATOM: Accurate Tracking by Overlap Maximization
SEIGAN: Towards Compositional Image Generation by Simultaneously Learning to Segment, Enhance, and Inpaint
Collaborative Dense SLAM
Mismatch error correction for time interleaved analog-to-digital converter over a wide frequency range
Watermark Retrieval from 3D Printed Objects via Convolutional Neural Networks
What’s in a GitHub Star? Understanding Repository Starring Practices in a Social Coding Platform
Visibility Extension via Reflective Edges to an Exact Quantity
External branch lengths of $Λ$-coalescents without a dust component
Synthesis of Spatial Charging/Discharging Patterns of In-Vehicle Batteries for Provision of Ancillary Service and Mitigation of Voltage Impact
Intention Oriented Image Captions with Guiding Objects
FD-GAN: Face-demorphing generative adversarial network for restoring accomplice’s facial image
Fast submodular maximization subject to k-extendible system constraints
An Adaptive Oversampling Learning Method for Class-Imbalanced Fault Diagnostics and Prognostics
Distributions of mesh patterns of short lengths
Representations of mock theta functions
Towards Global Explanations for Credit Risk Scoring
Asymptotic enumeration of Cayley digraphs
Ehrhart polynomials of polytopes and spectrum at infinity of Laurent polynomials
Cyclic bent functions and their applications in codes, codebooks, designs, MUBs and sequences
M2U-Net: Effective and Efficient Retinal Vessel Segmentation for Resource-Constrained Environments
Social interaction networks and depressive symptoms
The infinite dimensional manifold of Hölder equilibrium probabilities has non-negative curvature
Past, Present, and Future Approaches Using Computer Vision for Animal Re-Identification from Camera Trap Data
Contextual Face Recognition with a Nested-Hierarchical Nonparametric Identity Model
Decentralized Exploration in Multi-Armed Bandits
Injecting and removing malignant features in mammography with CycleGAN: Investigation of an automated adversarial attack using neural networks
Multi-dimensional BSDEs with diagonal generators driven by $G$-Brownian motion
Experimental Evaluation of Parameterized Algorithms for Graph Separation Problems: Half-Integral Relaxations and Matroid-based Kernelization
A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling
Modeling Local Geometric Structure of 3D Point Clouds using Geo-CNN
Optimal medication for tumors modeled by a Cahn-Hilliard-Brinkman equation
Lifted and geometric differentiability of the squared quadratic Wasserstein distance
Deep Shape-from-Template: Wide-Baseline, Dense and Fast Registration and Deformable Reconstruction from a Single Image
DeepIR: A Deep Semantics Driven Framework for Image Retargeting
On mean field limit for Brownian particles with Coulomb interaction in 3D
Semantic Security and the Second-Largest Eigenvalue of Biregular Graphs
Distributed Learning of Average Belief Over Networks Using Sequential Observations
Event-based Gesture Recognition with Dynamic Background Suppression using Smartphone Computational Capabilities
Explicitly Sample-Equivalent Dynamic Models for Gaussian Markov, Reciprocal, and Conditionally Markov Sequences
Paracontrolled approach to the three-dimensional stochastic nonlinear wave equation with quadratic nonlinearity
Learning Actionable Representations with Goal-Conditioned Policies
Efficient random graph matching via degree profiles
Forman-Ricci Curvature for Hypergraphs
A priori positivity of solutions to a non-conservative stochastic thin-film equation
Domain of Inverse Double Arcsine Transformation
Experimental evaluation of kernelization algorithms to Dominating Set
Edgeworth expansion for Euler approximation of continuous diffusion processes
Safe and Complete Real-Time Planning and Exploration in Unknown Environments
Event-Based Features Selection and Tracking from Intertwined Estimation of Velocity and Generative Contours
Behavioral Malware Classification using Convolutional Recurrent Neural Networks
Slit-slide-sew bijections for bipartite and quasibipartite plane maps
The Mafiascum Dataset: A Large Text Corpus for Deception Detection
Discrete-time port-Hamiltonian systems: A definition based on symplectic integration
Characterizing the spread of exaggerated news content over social media
On Well-posedness of Stochastic Anisotropic $p$-Laplace Equation Driven by Lévy noise
Equitable Partitions into Matchings and Coverings in Mixed Graphs
OrthoSeg: A Deep Multimodal Convolutional Neural Network for Semantic Segmentation of Orthoimagery
A Faster DiSH: Hardware Implementation of a Discrete Cell Signaling Network Simulator
Polynomial partitioning over varieties
Simulated Autonomous Driving in a Realistic Driving Environment using Deep Reinforcement Learning and a Deterministic Finite State Machine
The orientation morphism: from graph cocycles to deformations of Poisson structures