We report on experiences with implementing conversational agents in the recruitment domain based on a machine learning (ML) system. Recruitment chatbots mediate communication between job-seekers and recruiters by exposing ML data to recruiter teams. Errors are difficult to understand, communicate, and resolve because they may span and combine UX, ML, and software issues. In an effort to improve organizational and technical transparency, we came to rely on a key contact role. Though effective for design and development, the centralization of this role poses challenges for transparency in sustained maintenance of this kind of ML-based mediating system.
Federated Learning (FL) is capable of leveraging massively distributed private data, e.g., on mobile phones and IoT devices, to collaboratively train a shared machine learning model with the help of a cloud server. However, its iterative training process results in intolerable communication latency, and causes huge burdens on the backbone network. Thus, reducing the communication overhead is critical to implement FL in practice. Meanwhile, the model performance degradation due to the unique non-IID data distribution at different devices is another big issue for FL. In this paper, by introducing the mobile edge computing platform as an intermediary structure, we propose a hierarchical FL architecture to reduce the communication rounds between users and the cloud. In particular, a Hierarchical Federated Averaging (HierFAVG) algorithm is proposed, which allows multiple local aggregations at each edge server before one global aggregation at the cloud. We establish the convergence of HierFAVG for both convex and non-convex objective functions with non-IID user data. It is demonstrated that HierFAVG can reach a desired model performance with less communication, and outperform the traditional Federated Averaging algorithm.
Compressed sensing (CS) provides an elegant framework for recovering sparse signals from compressed measurements. For example, CS can exploit the structure of natural images and recover an image from only a few random measurements. CS is flexible and data efficient, but its application has been restricted by the strong assumption of sparsity and costly reconstruction process. A recent approach that combines CS with neural network generators has removed the constraint of sparsity, but reconstruction remains slow. Here we propose a novel framework that significantly improves both the performance and speed of signal recovery by jointly training a generator and the optimisation process for reconstruction via meta-learning. We explore training the measurements with different objectives, and derive a family of models based on minimising measurement errors. We show that Generative Adversarial Nets (GANs) can be viewed as a special case in this family of models. Borrowing insights from the CS perspective, we develop a novel way of improving GANs using gradient information from the discriminator.
Access to sufficient annotated data is a common challenge in training deep neural networks on medical images. As annotating data is expensive and time-consuming, it is difficult for an individual medical center to reach large enough sample sizes to build their own, personalized models. As an alternative, data from all centers could be pooled to train a centralized model that everyone can use. However, such a strategy is often infeasible due to the privacy-sensitive nature of medical data. Recently, federated learning (FL) has been introduced to collaboratively learn a shared prediction model across centers without the need for sharing data. In FL, clients are locally training models on site-specific datasets for a few epochs and then sharing their model weights with a central server, which orchestrates the overall training process. Importantly, the sharing of models does not compromise patient privacy. A disadvantage of FL is the dependence on a central server, which requires all clients to agree on one trusted central body, and whose failure would disrupt the training process of all clients. In this paper, we introduce BrainTorrent, a new FL framework without a central server, particularly targeted towards medical applications. BrainTorrent presents a highly dynamic peer-to-peer environment, where all centers directly interact with each other without depending on a central body. We demonstrate the overall effectiveness of FL for the challenging task of whole brain segmentation and observe that the proposed server-less BrainTorrent approach does not only outperform the traditional server-based one but reaches a similar performance to a model trained on pooled data.
We consider the numerical stability of the parameter recovery problem in Linear Structural Equation Model ($\LSEM$) of causal inference. A long line of work starting from Wright (1920) has focused on understanding which sub-classes of $\LSEM$ allow for efficient parameter recovery. Despite decades of study, this question is not yet fully resolved. The goal of this paper is complementary to this line of work; we want to understand the stability of the recovery problem in the cases when efficient recovery is possible. Numerical stability of Pearl’s notion of causality was first studied in Schulman and Srivastava (2016) using the concept of condition number where they provide ill-conditioned examples. In this work, we provide a condition number analysis for the $\LSEM$. First we prove that under a sufficient condition, for a certain sub-class of $\LSEM$ that are \emph{bow-free} (Brito and Pearl (2002)), the parameter recovery is stable. We further prove that \emph{randomly} chosen input parameters for this family satisfy the condition with a substantial probability. Hence for this family, on a large subset of parameter space, recovery is numerically stable. Next we construct an example of $\LSEM$ on four vertices with \emph{unbounded} condition number. We then corroborate our theoretical findings via simulations as well as real-world experiments for a sociology application. Finally, we provide a general heuristic for estimating the condition number of any $\LSEM$ instance.
Artificial neural networks revolutionized many areas of computer science in recent years since they provide solutions to a number of previously unsolved problems. On the other hand, for many problems, classic algorithms exist, which typically exceed the accuracy and stability of neural networks. To combine these two concepts, we present a new kind of neural networks—algorithmic neural networks (AlgoNets). These networks integrate smooth versions of classic algorithms and data structures into the topology of neural networks. A forward AlgoNet includes algorithmic layers into existing architectures while a backward AlgoNet can solve inverse problems without or with only weak supervision. In addition, we present the \texttt{algonet} package, a PyTorch based library that includes, inter alia, a smooth evaluated programming language, a smooth 3D mesh renderer, and smooth sorting algorithms.
Many real-world time series, such as in health, have changepoints where the system’s structure or parameters change. Since changepoints can indicate critical events such as onset of illness, it is highly important to detect them. However, existing methods for changepoint detection (CPD) often require user-specified models and cannot recognize changes that occur gradually or at multiple time-scales. To address both, we show how CPD can be treated as a supervised learning problem, and propose a new deep neural network architecture to efficiently identify both abrupt and gradual changes at multiple timescales from multivariate data. Our proposed pyramid recurrent neural network (PRN) provides scale-invariance using wavelets and pyramid analysis techniques from multi-scale signal processing. Through experiments on synthetic and real-world datasets, we show that PRN can detect abrupt and gradual changes with higher accuracy than the state of the art and can extrapolate to detect changepoints at novel scales not seen in training.
This paper surveys an approach to the XAI problem, using post-hoc explanation by example, that hinges on twinning Artificial Neural Networks (ANNs) with Case-Based Reasoning (CBR) systems, so-called ANN-CBR twins. A systematic survey of 1100+ papers was carried out to identify the fragmented literature on this topic and to trace it influence through to more recent work involving Deep Neural Networks (DNNs). The paper argues that this twin-system approach, especially using ANN-CBR twins, presents one possible coherent, generic solution to the XAI problem (and, indeed, XCBR problem). The paper concludes by road-mapping some future directions for this XAI solution involving (i) further tests of feature-weighting techniques, (iii) explorations of how explanatory cases might best be deployed (e.g., in counterfactuals, near-miss cases, a fortori cases), and (iii) the raising of the unwelcome and, much ignored, issue of human user evaluation.
We present contrastive fairness, a new direction in causal inference applied to algorithmic fairness. Earlier methods dealt with the ‘what if?’ question (counterfactual fairness, NeurIPS’17). We establish the theoretical and mathematical implications of the contrastive question ‘why this and not that?’ in context of algorithmic fairness in machine learning. This is essential to defend the fairness of algorithmic decisions in tasks where a person or sub-group of people is chosen over another (job recruitment, university admission, company layovers, etc). This development is also helpful to institutions to ensure or defend the fairness of their automated decision making processes. A test case of employee job location allocation is provided as an illustrative example.