On Discrimination Discovery and Removal in Ranked Data using Causal Graph

Predictive models learned from historical data are widely used to help companies and organizations make decisions. However, they may treat certain groups unfairly, raising concerns about fairness and discrimination. In this paper, we study the fairness-aware ranking problem, which aims to discover discrimination in ranked datasets and reconstruct a fair ranking. Existing methods in fairness-aware ranking are mainly based on statistical parity, which cannot measure the true discriminatory effect since discrimination is causal. On the other hand, existing methods in causal-based anti-discrimination learning focus on classification problems and cannot be directly applied to ranked data. To address these limitations, we propose to map the rank position to a continuous score variable that represents the qualification of the candidates. Then, we build a causal graph that consists of both the discrete profile attributes and the continuous score. The path-specific effect technique is extended to the mixed-variable causal graph to identify both direct and indirect discrimination. The relationship between the path-specific effects for the ranked data and those for the binary decision is theoretically analyzed. Finally, algorithms for discovering and removing discrimination from a ranked dataset are developed. Experiments using a real dataset show the effectiveness of our approaches.

Multi-class Active Learning: A Hybrid Informative and Representative Criterion Inspired Approach

Labeling each instance in a large dataset is extremely labor- and time-consuming. One way to alleviate this problem is active learning, which aims to discover the most valuable instances for labeling in order to construct a powerful classifier. Considering both informativeness and representativeness provides a promising way to design a practical active learning method. However, most existing active learning methods select instances favoring either informativeness or representativeness. Meanwhile, many are designed for the binary case, so they may yield suboptimal solutions on datasets with multiple classes. In this paper, a multi-class active learning approach based on a hybrid informative and representative criterion is proposed. We combine informativeness and representativeness into one formula, which can be solved under a unified framework. The informativeness is measured by the minimum margin, while the representativeness is measured by the maximum mean discrepancy. By minimizing the upper bound for the true risk, we generalize the empirical risk minimization principle to the active learning setting. Simultaneously, our proposed method makes full use of the label information, and the proposed active learning is designed based on multiple classes. The proposed method is therefore suitable not only for the binary case but also for multiple classes. We conduct our experiments on twelve benchmark UCI data sets, and the experimental results demonstrate that the proposed method performs better than some state-of-the-art methods.
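
The margin-based notion of informativeness mentioned in this abstract can be illustrated with a minimal sketch. It assumes the margin is the gap between the two largest predicted class probabilities, which is a common multi-class convention and not necessarily the paper's exact formulation; the names `margin`, `most_informative`, and `pool` are hypothetical, and the representativeness (maximum mean discrepancy) term is not shown.

```python
def margin(probs):
    """Gap between the two largest class probabilities for one instance."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def most_informative(prob_rows):
    """Index of the unlabeled instance with the smallest margin."""
    return min(range(len(prob_rows)), key=lambda i: margin(prob_rows[i]))


# Hypothetical predicted class probabilities for three unlabeled instances.
pool = [
    [0.90, 0.05, 0.05],  # confident prediction, large margin
    [0.40, 0.35, 0.25],  # ambiguous prediction, small margin
    [0.60, 0.30, 0.10],
]
query_index = most_informative(pool)  # the instance to send for labeling
```

A small margin means the classifier can barely separate its two top candidate classes, so the label of that instance is expected to be the most useful.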

Randomization inference with general interference and censoring

Interference occurs between individuals when the treatment (or exposure) of one individual affects the outcome of another individual. Previous work on causal inference methods in the presence of interference has focused on the setting where a priori it is assumed there is ‘partial interference,’ in the sense that individuals can be partitioned into groups wherein there is no interference between individuals in different groups. Bowers, Fredrickson, and Panagopoulos (2012) and Bowers, Fredrickson, and Aronow (2016) consider randomization-based inferential methods that allow for more general interference structures in the context of randomized experiments. In this paper, extensions of Bowers et al. are considered, including allowing for failure time outcomes subject to right censoring. Permitting right censored outcomes is challenging because standard randomization-based tests of the null hypothesis of no treatment effect assume that whether an individual is censored does not depend on treatment. The proposed extension of Bowers et al. to allow for censoring entails adapting the method of Wang, Lagakos, and Gray (2010) for two sample survival comparisons in the presence of unequal censoring. The methods are examined via simulation studies and utilized to assess the effects of cholera vaccination in an individually-randomized trial of 73,000 children and women in Matlab, Bangladesh.
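
For orientation, the basic randomization test that these methods build on can be sketched as follows. This is only the textbook version under the assumption of no interference and no censoring, using a difference in means as the test statistic; it is not the procedure of Bowers et al. or the censoring adjustment described in the abstract. The function names and the toy data are hypothetical.

```python
import random


def mean_difference(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    t = [y for y, z in zip(outcomes, treated) if z]
    c = [y for y, z in zip(outcomes, treated) if not z]
    return sum(t) / len(t) - sum(c) / len(c)


def randomization_p_value(outcomes, treated, draws=2000, seed=0):
    """Monte Carlo p-value for the sharp null of no treatment effect."""
    rng = random.Random(seed)
    observed = abs(mean_difference(outcomes, treated))
    assignment = list(treated)
    extreme = 0
    for _ in range(draws):
        rng.shuffle(assignment)  # re-randomize, keeping the number treated fixed
        if abs(mean_difference(outcomes, assignment)) >= observed:
            extreme += 1
    return (extreme + 1) / (draws + 1)


# Toy data: the four treated units have clearly larger outcomes.
outcomes = [10, 11, 12, 13, 1, 2, 3, 4]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
p_value = randomization_p_value(outcomes, treated)
```

Under the sharp null, outcomes do not change when the assignment is re-drawn, which is what justifies reusing the observed outcomes in every draw. With interference or treatment-dependent censoring that reasoning no longer holds directly, which is the difficulty the abstract addresses.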

Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization

Stochastic optimization naturally arises in machine learning. Efficient algorithms with provable guarantees, however, are still largely missing when the objective function is nonconvex and the data points are dependent. This paper studies this fundamental challenge through a streaming PCA problem for stationary time series data. Specifically, our goal is to estimate the principal component of time series data with respect to the covariance matrix of the stationary distribution. Computationally, we propose a variant of Oja’s algorithm combined with downsampling to control the bias of the stochastic gradient caused by the data dependency. Theoretically, we quantify the uncertainty of our proposed stochastic algorithm based on diffusion approximations. This allows us to prove the global convergence in terms of the continuous time limiting solution trajectory and further implies near optimal sample complexity. Numerical experiments are provided to support our analysis.
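
The idea of combining Oja's update with downsampling can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's algorithm: the step size, the downsampling gap, the toy AR(1) stream, and the names `oja_downsampled` and `ar1_stream` are all hypothetical choices made for the example.

```python
import math
import random


def oja_downsampled(stream, dim, step=0.002, gap=5):
    """Oja's update applied to every `gap`-th sample of a dependent stream."""
    w = [1.0 / math.sqrt(dim)] * dim  # deterministic unit-norm start
    for t, x in enumerate(stream):
        if t % gap:
            continue  # skip samples to weaken dependence between updates
        proj = sum(wi * xi for wi, xi in zip(w, x))
        w = [wi + step * proj * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        w = [wi / norm for wi in w]  # project back onto the unit sphere
    return w


def ar1_stream(n, seed=0):
    """Toy AR(1) series whose leading principal direction is the first axis."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x = [0.5 * x[0] + rng.gauss(0.0, 2.0),
             0.5 * x[1] + rng.gauss(0.0, 1.0)]
        yield x


w = oja_downsampled(ar1_stream(100_000), dim=2)
```

Consecutive samples of the stream are correlated, so using every sample would bias the stochastic gradient; keeping one sample in every `gap` trades data for weaker dependence. In this toy stream the first coordinate has the larger stationary variance, so `w` is expected to align with the first axis up to sign.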

Style Memory: Making a Classifier Network Generative

Deep networks have shown great performance in classification tasks. However, the parameters learned by the classifier networks usually discard stylistic information of the input, in favour of information strictly relevant to classification. We introduce a network that has the capacity to do both classification and reconstruction by adding a ‘style memory’ to the output layer of the network. We also show how to train such a neural network as a deep multi-layer autoencoder, jointly minimizing both classification and reconstruction losses. The generative capacity of our network demonstrates that the combination of style-memory neurons with the classifier neurons yields good reconstructions of the inputs when the classification is correct. We further investigate the nature of the style memory, and how it relates to composing digits and letters. Finally, we propose that this architecture enables the bidirectional flow of information used in predictive coding, and that such bidirectional networks can help mitigate against being fooled by ambiguous or adversarial input.

An Online Algorithm for Learning Buyer Behavior under Realistic Pricing Restrictions

We propose a new efficient online algorithm to learn the parameters governing the purchasing behavior of a utility maximizing buyer, who responds to prices, in a repeated interaction setting. The key feature of our algorithm is that it can learn even non-linear buyer utility while working with arbitrary price constraints that the seller may impose. This overcomes a major shortcoming of previous approaches, which use unrealistic prices to learn these parameters, making them unsuitable in practice.

Intent-aware Multi-agent Reinforcement Learning

This paper proposes an intent-aware multi-agent planning framework as well as a learning algorithm. Under this framework, an agent plans in the goal space to maximize the expected utility. The planning process takes the belief of other agents’ intents into consideration. Instead of formulating the learning problem as a partially observable Markov decision process (POMDP), we propose a simple but effective linear function approximation of the utility function. It is based on the observation that, for humans, other people’s intents influence our utility for a goal. The proposed framework has several major advantages: i) it is computationally feasible and guaranteed to converge; ii) it can easily integrate existing intent prediction and low-level planning algorithms; iii) it does not suffer from sparse feedback in the action space. We test our algorithm on a real-world problem that is non-episodic and in which the number of agents and goals can vary over time. Our algorithm is trained in a scene in which aerial robots and humans interact, and tested in a novel scene with a different environment. Experimental results show that our algorithm achieves the best performance and that human-like behaviors emerge during the dynamic process.

Understanding Short-Horizon Bias in Stochastic Meta-Optimization

Careful tuning of the learning rate, or even schedules thereof, can be crucial to effective neural net training. There has been much recent interest in gradient-based meta-optimization, where one tunes hyperparameters, or even learns an optimizer, in order to minimize the expected loss when the training procedure is unrolled. But because the training procedure must be unrolled thousands of times, the meta-objective must be defined with an orders-of-magnitude shorter time horizon than is typical for neural net training. We show that such short-horizon meta-objectives cause a serious bias towards small step sizes, an effect we term short-horizon bias. We introduce a toy problem, a noisy quadratic cost function, on which we analyze short-horizon bias by deriving and comparing the optimal schedules for short and long time horizons. We then run meta-optimization experiments (both offline and online) on standard benchmark datasets, showing that meta-optimization chooses too small a learning rate by multiple orders of magnitude, even when run with a moderately long time horizon (100 steps) typical of work in the area. We believe short-horizon bias is a fundamental problem that needs to be addressed if meta-optimization is to scale to practical neural net training regimes.

Accelerated Gradient Boosting

Gradient tree boosting is a prediction algorithm that sequentially produces a model in the form of linear combinations of decision trees, by solving an infinite-dimensional optimization problem. We combine gradient boosting and Nesterov’s accelerated descent to design a new algorithm, which we call AGB (for Accelerated Gradient Boosting). Substantial numerical evidence is provided on both synthetic and real-life data sets to assess the excellent performance of the method in a large variety of prediction problems. It is empirically shown that AGB is much less sensitive to the shrinkage parameter and outputs predictors that are considerably more sparse in the number of trees, while retaining the exceptional performance of gradient boosting.

Online Deep Learning: Growing RBM on the fly

We propose a novel online learning algorithm for Restricted Boltzmann Machines (RBM), namely, the Online Generative Discriminative Restricted Boltzmann Machine (OGD-RBM), that provides the ability to build and adapt the network architecture of RBM according to the statistics of streaming data. The OGD-RBM is trained in two phases: (1) an online generative phase for unsupervised feature representation at the hidden layer and (2) a discriminative phase for classification. The online generative training begins with zero neurons in the hidden layer, then adds and updates neurons to adapt to the statistics of streaming data in a single-pass unsupervised manner, resulting in a feature representation best suited to the data. The discriminative phase is based on stochastic gradient descent and associates the represented features with the class labels. We demonstrate the OGD-RBM on a set of multi-category and binary classification problems for data sets having varying degrees of class-imbalance. We first apply the OGD-RBM algorithm on the multi-class MNIST dataset to characterize the network evolution. We demonstrate that the online generative phase converges to a stable, concise network architecture, wherein individual neurons are inherently discriminative to the class labels despite unsupervised training. We then benchmark OGD-RBM performance against other machine learning, neural network and ClassRBM techniques for credit scoring applications using 3 public non-stationary two-class credit datasets with varying degrees of class-imbalance. We report that OGD-RBM improves accuracy by 2.5-3% over batch learning techniques while requiring at least 24%-70% fewer neurons and fewer training samples. This online generative training approach can be extended greedily to multiple layers for training Deep Belief Networks in non-stationary data mining applications without the need for a priori fixed architectures.

Model Selection as a Multiple Testing Procedure: Improving Akaike’s Information Criterion

By interpreting the model selection problem as a multiple hypothesis testing task, we propose a modification of Akaike’s Information Criterion that avoids overfitting, even when the sample size is small. We call this correction an over-penalization procedure. As a proof of concept, we show nonasymptotic optimality of our procedure for histogram selection in density estimation, by establishing sharp oracle inequalities for the Kullback-Leibler divergence. A strong feature of our theoretical results is that they include the estimation of unbounded log-densities. To do so, we prove several analytical and probabilistic lemmas that are of independent interest. In an experimental study, we also demonstrate state-of-the-art performance of our over-penalization procedure for bin size selection.

VIPE: A new interactive classification framework for large sets of short texts – application to opinion mining

This paper presents a new interactive opinion mining tool that helps users classify large sets of short texts originating from Web opinion polls, technical forums or Twitter. From a manual multi-label pre-classification of a very limited text subset, a learning algorithm predicts the labels of the remaining texts of the corpus and the texts most likely associated with a selected label. Using a fast matrix factorization, the algorithm is able to handle large corpora and is well-adapted to interactivity by integrating the corrections proposed by the users on the fly. Experimental results on classical datasets of various sizes and feedback from users in the marketing services of the telecommunication company Orange confirm the quality of the obtained results.

MIRIAM: A Multimodal Chat-Based Interface for Autonomous Systems

We present MIRIAM (Multimodal Intelligent inteRactIon for Autonomous systeMs), a multimodal interface to support situation awareness of autonomous vehicles through chat-based interaction. The user is able to chat about the vehicle’s plan, objectives, previous activities and mission progress. The system is mixed initiative in that it pro-actively sends messages about key events, such as fault warnings. We will demonstrate MIRIAM using SeeByte’s SeeTrack command and control interface and Neptune autonomy simulator.

CF4CF: Recommending Collaborative Filtering algorithms using Collaborative Filtering

Automatic solutions which enable the selection of the best algorithms for a new problem are commonly found in the literature. One research area which has recently received considerable attention is Collaborative Filtering. Existing work includes several approaches using Metalearning, which relate the characteristics of datasets to the performance of the algorithms. This work explores an alternative approach to tackle this problem. Since, in essence, both are recommendation problems, this work uses Collaborative Filtering algorithms to select Collaborative Filtering algorithms. Our approach integrates subsampling landmarkers, which are a data characterization approach commonly used in Metalearning, with a standard Collaborative Filtering method. The experimental results show that CF4CF competes with standard Metalearning strategies in the problem of Collaborative Filtering algorithm selection.

Deep Information Networks

We describe a novel classifier with a tree structure, designed using information theory concepts. This Information Network is made of information nodes, that compress the input data, and multiplexers, that connect two or more input nodes to an output node. Each information node is trained, independently of the others, to minimize a local cost function that minimizes the mutual information between its input and output with the constraint of keeping a given mutual information between its output and the target (information bottleneck). We show that the system is able to provide good results in terms of accuracy, while it shows many advantages in terms of modularity and reduced complexity.
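
The information bottleneck trade-off that each node optimizes can be made concrete with a small sketch. It assumes discrete variables given as joint probability tables and uses the common Lagrangian form I(X;T) - beta * I(T;Y) in place of the constrained form described in the abstract; `mutual_information`, `bottleneck_cost`, and `beta` are hypothetical names, and no training procedure is shown.

```python
import math


def mutual_information(joint):
    """Mutual information in bits of a joint probability table joint[a][b]."""
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    return sum(
        p * math.log2(p / (row[i] * col[j]))
        for i, r in enumerate(joint)
        for j, p in enumerate(r)
        if p > 0  # zero-probability cells contribute nothing
    )


def bottleneck_cost(joint_input_output, joint_output_target, beta):
    """Lagrangian cost: compress the input while staying informative."""
    compression = mutual_information(joint_input_output)
    relevance = mutual_information(joint_output_target)
    return compression - beta * relevance


independent = [[0.25, 0.25], [0.25, 0.25]]  # output ignores the other variable
identical = [[0.5, 0.0], [0.0, 0.5]]        # output copies the other variable
```

A node that copies its input pays one bit of compression cost in this binary example, so it only lowers the overall cost when that bit is also informative about the target and `beta` is large enough.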

An End-to-End Goal-Oriented Dialog System with a Generative Natural Language Response Generation

Recent advancements in deep learning have allowed the development of end-to-end trained goal-oriented dialog systems. Although these systems already achieve good performance, some simplifications limit their usage in real-life scenarios. In this work, we address two of these limitations: ignoring positional information and a fixed number of possible response candidates. We propose to use positional encodings in the input to model the word order of the user utterances. Furthermore, by using a feedforward neural network, we are able to generate the output word by word and are no longer restricted to a fixed number of possible response candidates. Using the positional encoding, we were able to achieve better accuracies, and using the feedforward neural network for generating the response, we were able to save computation time and space consumption.

Synthesizing Neural Network Controllers with Probabilistic Model based Reinforcement Learning

We present an algorithm for rapidly learning controllers for robotics systems. The algorithm follows the model-based reinforcement learning paradigm and improves upon existing algorithms, namely Probabilistic learning in Control (PILCO) and a sample-based version of PILCO with neural network dynamics (Deep-PILCO). We propose training a neural network dynamics model using variational dropout with truncated Log-Normal noise. This allows us to obtain a dynamics model with calibrated uncertainty, which can be used to simulate controller executions via rollouts. We also describe a set of techniques, inspired by viewing PILCO as a recurrent neural network model, that are crucial to improving the convergence of the method. We test our method on a variety of benchmark tasks, demonstrating data-efficiency that is competitive with PILCO, while being able to optimize complex neural network controllers. Finally, we assess the performance of the algorithm for learning motor controllers for a six-legged autonomous underwater vehicle. This demonstrates the potential of the algorithm for scaling up the dimensionality and dataset sizes in more complex control tasks.

Deep Super Learner: A Deep Ensemble for Classification Problems

Deep learning has become very popular for tasks such as predictive modeling and pattern recognition in handling big data. Deep learning is a powerful machine learning method that extracts lower level features and feeds them forward for the next layer to identify higher level features that improve performance. However, deep neural networks have drawbacks, which include many hyper-parameters and infinite architectures, opaqueness into results, and relatively slower convergence on smaller datasets. While traditional machine learning algorithms can address these drawbacks, they are not typically capable of the performance levels achieved by deep neural networks. To improve performance, ensemble methods are used to combine multiple base learners. Super learning is an ensemble that finds the optimal combination of diverse learning algorithms. This paper proposes deep super learning as an approach which achieves log loss and accuracy results competitive to deep neural networks while employing traditional machine learning algorithms in a hierarchical structure. The deep super learner is flexible, adaptable, and easy to train with good performance across different tasks using identical hyper-parameter values. Using traditional machine learning requires fewer hyper-parameters, allows transparency into results, and has relatively fast convergence on smaller datasets. Experimental results show that the deep super learner has superior performance compared to the individual base learners, single-layer ensembles, and in some cases deep neural networks. Performance of the deep super learner may further be improved with task-specific tuning.

Universal Quantum Control through Deep Reinforcement Learning
On Algebraic Proofs of Stability for Homogeneous Vector Fields
Implicit representation conjecture for semi-algebraic graphs
Controllability implies mixing I. Convergence in the total variation metric
Controllability implies mixing II. Convergence in the dual-Lipschitz metric
Convergence of Gradient Descent on Separable Data
Abnormality Detection in Mammography using Deep Convolutional Neural Networks
Energy-entropy competition and the effectiveness of stochastic gradient descent in machine learning
A Hybrid Heuristic for a Broad Class of Vehicle Routing Problems with Heterogeneous Fleet
Structure and generation of crossing-critical graphs
Lower Bounds for the Exponential Domination Number of $C_m \times C_n$
The morphospace of language networks
ROUGE 2.0: Updated and Improved Measures for Evaluation of Summarization Tasks
M3Fusion: A Deep Learning Architecture for Multi-{Scale/Modal/Temporal} satellite data fusion
Segmentation of Drosophila Heart in Optical Coherence Microscopy Images Using Convolutional Neural Networks
Yang-Mills for probabilists
Uniformity thresholds for the asymptotic size of extremal Berge-$F$-free hypergraphs
Thermodynamics of Restricted Boltzmann Machines and related learning dynamics
Learning Scene Gist with Convolutional Neural Networks to Improve Object Recognition
Moment-Based Quantile Sketches for Efficient High Cardinality Aggregation Queries
Learning Filter Bank Sparsifying Transforms
Flip-Flop Spectrum-Revealing QR Factorization and Its Applications on Singular Value Decomposition
Bayesian Predictive Synthesis: Forecast Calibration and Combination
ABC and Indirect Inference
Occupancy Map Prediction Using Generative and Fully Convolutional Networks for Vehicle Navigation
MIS-SLAM: Real-time Large Scale Dense Deformable SLAM System in Minimal Invasive Surgery Based on Heterogeneous Computing
Depth and regularity of monomial ideals via polarization and combinatorial optimization
Accumulate Then Transmit: Multi-user Scheduling in Full-Duplex Wireless-Powered IoT Systems
Using Survival Information in Truncation by Death Problems Without the Monotonicity Assumption
Exact partial information decompositions for Gaussian systems based on dependency constraints
John’s Walk
The Earth ain’t Flat: Monocular Reconstruction of Vehicles on Steep and Graded Roads from a Moving Camera
SOS-Convex Lyapunov Functions and Stability of Difference Inclusions
Formal Intercept of Sturmian words
The Contextual Loss for Image Transformation with Non-Aligned Data
Improved lower and upper bounds for the critical value of the high dimensional two-stage contact process
Explain Yourself: A Natural Language Interface for Scrutable Autonomous Robots
Cooperative Tracking of Cyclists Based on Smart Devices and Infrastructure
Where is my Device? – Detecting the Smart Device’s Wearing Location in the Context of Active Safety for Vulnerable Road Users
A Hybrid Method for Traffic Flow Forecasting Using Multimodal Deep Learning
The ORCA Hub: Explainable Offshore Robotics through Intelligent Interfaces
Optimal Grassmann Manifold Eavesdropping: A Huge Security Disaster for M-1-2 Wiretap Channels
A global stochastic maximum principle for fully coupled forward-backward stochastic systems
Algorithmic bias amplifies opinion polarization: A bounded confidence model
Nonlocality-Reinforced Convolutional Neural Networks for Image Denoising
Quasi-invariance of completely random measure
Equality case in van der Corput’s inequality and collisions in multiple lattice tilings
Towards Mission-Critical Control at the Edge and Over 5G
A Non-Technical Survey on Deep Convolutional Neural Network Architectures
Trees within trees: Simple nested coalescents
Limit distribution of the quartet balance index for Aldous’s $\beta \ge 0$-model
Transition from homogeneous to inhomogeneous limit cycles: Effect of local filtering in coupled oscillators
Conceptualization of Object Compositions Using Persistent Homology
A note on joint functional convergence of partial sum and maxima for linear processes
A comparison of semi-Lagrangian discontinuous Galerkin and spline based Vlasov solvers in four dimensions
A functional CLT for partial traces of random matrices
Induced and Weak Induced Arboricities
Self-Attention with Relative Position Representations
Exogenous Approach to Grid Cost Allocation in Peer-to-Peer Electricity Markets
Flux driven and geometry controlled spin filtering for arbitrary spins in aperiodic quantum networks
Kinetic models for optimal control of wealth inequalities
Radio Imaging With Information Field Theory
Quantum Walks via Quantum Cellular Automata
A Joint Central Limit Theorem for the Sum-of-Digits Function, and Asymptotic Divisibility of Catalan-like Sequences
The Impact of Semantic Context Cues on the User Acceptance of Tag Recommendations: An Online Study
Input-Output Performance of Linear-Quadratic Saddle-Point Algorithms with Application to Distributed Resource Allocation Problems
Some Relations on Paratopisms and An Intuitive Interpretation on the Adjugates of a Latin Square
Practical sample-and-hold stabilization of nonlinear systems under approximate optimizers
Code Review Comments: Language Matters
Experiments and Shaping Tradeoffs for Long-Haul Transmissions
Fully Convolutional Grasp Detection Network with Oriented Anchor Box
On Simple Back-Off in Complicated Radio Networks
Subspace Tracking and Least Squares Approaches to Channel Estimation in Millimeter Wave Multiuser MIMO
A lower bound for the Bogomolny-Schmit constant for random monochromatic plane waves
Non-fringe subtrees in conditioned Galton–Watson trees
Predictability of sequences and subsequences with spectrum degeneracy at periodically located points
Generalized Designs on Graphs: Sampling, Spectra, Symmetries
Precise but Natural Specification for Robot Tasks
On the weak-hash metric for boundedly finite integer-valued measures
Early Start Intention Detection of Cyclists Using Motion History Images and a Deep Residual Network
Achieving Low Latency Two-Way Communication by Downlink and Uplink Decoupled Access
CliNER 2.0: Accessible and Accurate Clinical Concept Extraction
Rest-Katyusha: Exploiting the Solution’s Structure via Scheduled Restart Schemes
MIMO Graph Filters for Convolutional Neural Networks
Methodology to analyze the accuracy of 3D objects reconstructed with collaborative robot based monocular LSD-SLAM
User-Centric 5G Cellular Networks: Resource Allocation and Comparison with the Cell-Free Massive MIMO Approach
Quantum theory within the probability calculus: a there-you-go theorem and partially exchangeable models
Theory meets experiment for the aging rate of spin glasses
On stochastic imitation dynamics in large-scale networks
ExpandNet: A Deep Convolutional Neural Network for High Dynamic Range Expansion from Low Dynamic Range Content
Personalized Attention-Aware Exposure Control Using Reinforcement Learning
Revisiting Frequency Moment Estimation in Random Order Streams
A geometric view of Biodiversity: scaling to metagenomics
GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose
Zero-Shot Sketch-Image Hashing
Hybrid Multi-camera Visual Servoing to Moving Target
Learning monocular visual odometry with dense 3D mapping from dense 3D flow
Improved Scaling Law for Activity Detection in Massive MIMO Systems
Testing the complexity of a valued CSP language
Bouligand-Landweber iteration for a non-smooth ill-posed problem
A Relation between Disorder Chaos and Incongruent States in Spin Glasses on ${\mathbb Z}^d$
Deep Thermal Imaging: Proximate Material Type Recognition in the Wild through Deep Learning of Spatial Surface Temperature Patterns
Comparison of Deep Learning Approaches for Multi-Label Chest X-Ray Classification
Upper Covers of Chains in Sets of Indecomposable Subsets
Annotation Artifacts in Natural Language Inference Data
Comparison of various image fusion methods for impervious surface classification from VNREDSat-1
Hidden Symmetries in Real and Theoretical Networks
Learning Memory Access Patterns
Applications of Graded Methods to Cluster Variables in Arbitrary Types