Spectral clustering refers to a family of unsupervised learning algorithms that compute a spectral embedding of the original data based on the eigenvectors of a similarity graph. This non-linear transformation of the data is both the key to these algorithms’ success and their Achilles’ heel: forming a graph and computing its dominant eigenvectors can indeed be computationally prohibitive when dealing with more than a few tens of thousands of points. In this paper, we review the principal research efforts aiming to reduce this computational cost. We focus on methods that come with a theoretical control on the clustering performance and incorporate some form of sampling in their operation. Such methods abound in the machine learning, numerical linear algebra, and graph signal processing literature and, amongst others, include Nyström approximation, landmarks, coarsening, coresets, and compressive spectral clustering. We present the approximation guarantees available for each and discuss practical merits and limitations. Surprisingly, despite the breadth of the literature explored, we conclude that there is still a gap between theory and practice: the most scalable methods are only intuitively motivated or loosely controlled, whereas those that come with end-to-end guarantees rely on strong assumptions or enable only a limited gain in computation time.
We propose a new layer in Convolutional Neural Networks (CNNs) to increase their robustness to several types of noise perturbations of the input images. We call this a push-pull layer and compute its response as the combination of two half-wave rectified convolutions, with kernels of opposite polarity. It is based on a biologically-motivated non-linear model of certain neurons in the visual system that exhibit a response suppression phenomenon, known as push-pull inhibition. We validate our method by substituting the first convolutional layer of the LeNet-5 and WideResNet architectures with our push-pull layer. We train the networks on nonperturbed training images from the MNIST, CIFAR-10 and CIFAR-100 data sets, and test on images perturbed by noise that is unseen by the training process. We demonstrate that our push-pull layers contribute to a considerable improvement in robustness of classification of images perturbed by noise, while maintaining state-of-the-art performance on the original image classification task.
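The push-pull response described above can be illustrated with a minimal one-dimensional sketch. The abstract only states that two half-wave rectified convolutions with kernels of opposite polarity are combined; the sketch assumes that the combination is a subtraction weighted by an inhibition strength `alpha`, and that the pull kernel is simply the negated push kernel. The function names and the parameter `alpha` are hypothetical and chosen for illustration.

```python
def relu(values):
    """Half-wave rectification: negative responses are set to zero."""
    return [max(0.0, v) for v in values]


def correlate1d(signal, kernel):
    """Valid-mode 1D cross-correlation (what CNNs call a convolution)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def push_pull_response(signal, kernel, alpha=1.0):
    """Push response minus alpha times the pull response.

    Assumption: the pull kernel is the negated push kernel, and the two
    rectified responses are combined by subtraction.
    """
    push = relu(correlate1d(signal, kernel))
    pull_kernel = [-w for w in kernel]  # opposite polarity
    pull = relu(correlate1d(signal, pull_kernel))
    return [p - alpha * q for p, q in zip(push, pull)]


# A rising edge excites the push kernel; a falling edge excites the pull kernel.
step = [0, 0, 1, 1, 0, 0]
edge_kernel = [-1, 1]
response = push_pull_response(step, edge_kernel, alpha=0.5)
```

In this simplified form the pull term subtracts a scaled response wherever the input matches the kernel with inverted contrast, which is the suppression effect the abstract attributes to push-pull inhibition.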
We present a novel family of deep neural architectures, named partially exchangeable networks (PENs), that leverage probabilistic symmetries. By design, PENs are invariant to block-switch transformations, which characterize the partial exchangeability properties of conditionally Markovian processes. Moreover, we show that any block-switch invariant function has a PEN-like representation. The DeepSets architecture is a special case of PEN and we can therefore also target fully exchangeable data. We employ PENs to learn summary statistics in approximate Bayesian computation (ABC). When comparing PENs to previous deep learning methods for learning summary statistics, our results are highly competitive, for both time series and static models. Indeed, PENs provide more reliable posterior samples even when using less training data.
The design of activation functions is a growing research area in the field of neural networks. In particular, instead of using fixed point-wise functions (e.g., the rectified linear unit), several authors have proposed ways of learning these functions directly from the data in a non-parametric fashion. In this paper we focus on the kernel activation function (KAF), a recently proposed framework wherein each function is modeled as a one-dimensional kernel model, whose weights are adapted through standard backpropagation-based optimization. One drawback of KAFs is the need to select a single kernel function and its associated hyper-parameters. To partially overcome this problem, we motivate an extension of the KAF model, in which multiple kernels are linearly combined at every neuron, inspired by the literature on multiple kernel learning. We provide an application of the resulting multi-KAF on a realistic use case, specifically handwritten Latin OCR, on a large dataset collected in the context of the ‘In Codice Ratio’ project. Results show that multi-KAFs can improve the accuracy of the convolutional networks previously developed for the task, with faster convergence, even with a smaller number of overall parameters.
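A minimal sketch of the multi-kernel idea, assuming that each activation is a kernel expansion over a fixed dictionary of points and that the kernels are combined with per-neuron mixing coefficients. The Gaussian and Laplacian kernels, the dictionary, and all names below (`multi_kaf`, `alphas`, `mixing`) are illustrative assumptions rather than details given in the abstract.

```python
import math


def gaussian_kernel(s, d, gamma=1.0):
    """Gaussian kernel between an activation value s and a dictionary point d."""
    return math.exp(-gamma * (s - d) ** 2)


def laplacian_kernel(s, d, gamma=1.0):
    """Laplacian kernel, used here as a second (assumed) base kernel."""
    return math.exp(-gamma * abs(s - d))


def multi_kaf(s, dictionary, alphas, kernels, mixing):
    """Evaluate sum_i alphas[i] * sum_m mixing[m] * kernels[m](s, dictionary[i]).

    In a network, `alphas` and `mixing` would be trainable parameters of a
    neuron; here they are plain lists of floats.
    """
    total = 0.0
    for d, a in zip(dictionary, alphas):
        combined = sum(mu * kern(s, d) for mu, kern in zip(mixing, kernels))
        total += a * combined
    return total


dictionary = [-1.0, 0.0, 1.0]          # fixed grid of dictionary points
alphas = [0.0, 1.0, 0.0]               # expansion weights
kernels = [gaussian_kernel, laplacian_kernel]
value = multi_kaf(0.0, dictionary, alphas, kernels, mixing=[0.5, 0.5])
```

Setting the mixing coefficients to `[1.0, 0.0]` recovers a single-kernel activation in this sketch, which is why the multi-kernel form can be read as an extension of the original model.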
Network representation learning (NRL) has been widely used to help analyze large-scale networks through mapping original networks into a low-dimensional vector space. However, existing NRL methods ignore the impact of properties of relations on the object relevance in heterogeneous information networks (HINs). To tackle this issue, this paper proposes a new NRL framework, called Event2vec, for HINs to consider both quantities and properties of relations during the representation learning process. Specifically, an event (i.e., a complete semantic unit) is used to represent the relation among multiple objects, and both event-driven first-order and second-order proximities are defined to measure the object relevance according to the quantities and properties of relations. We theoretically prove how event-driven proximities can be preserved in the embedding space by Event2vec, which utilizes event embeddings to facilitate learning the object embeddings. Experimental studies demonstrate the advantages of Event2vec over state-of-the-art algorithms on four real-world datasets and three network analysis tasks (including network reconstruction, link prediction, and node classification).
The dynamics between agents and the environment are an important component of multi-agent Reinforcement Learning (RL), and learning them provides a basis for decision making. However, a major challenge in optimizing a learned dynamics model is the accumulation of error when predicting multiple steps into the future. Recent advances in variational inference provide model-based solutions that predict complete trajectory segments and optimize over a latent representation of trajectories. For single-agent scenarios, several recent studies have explored this idea and shown its benefits over conventional methods. In this work, we extend this approach to the multi-agent case, and effectively optimize over a latent space that encodes multi-agent strategies. We discuss the challenges in optimizing over a latent variable model for multiple agents, both in the optimization algorithm and in the model representation, and propose a method for both cooperative and competitive settings based on risk-sensitive optimization. We evaluate our method on tasks in the multi-agent particle environment and on a simulated RoboCup domain.
In this paper, we introduce a time-continuous production model that allows for random machine failures, where the failure probability depends on the history of the production itself. This bidirectional relationship between historical failure probabilities and production is modeled mathematically using the theory of piecewise deterministic Markov processes (PDMPs). In this way, the system is rewritten as a Markovian system such that classical results can be applied. In addition, we present a suitable solution, taken from machine reliability theory, to connect past production and the failure rate. Finally, we investigate the behavior of the presented model numerically in examples by considering sample means of relevant quantities and relative frequencies of the number of repairs.
Taxonomies are important building blocks of structured knowledge bases, and their construction from text sources and Wikipedia has received much attention. In this paper we focus on the construction of taxonomies for fictional domains, using noisy category systems from fan wikis or text extraction as input. Such fictional domains are archetypes of entity universes that are poorly covered by Wikipedia, as are enterprise-specific knowledge bases and highly specialized verticals. Our fiction-targeted approach, called TiFi, consists of three phases: (i) category cleaning, by identifying candidate categories that truly represent classes in the domain of interest, (ii) edge cleaning, by selecting subcategory relationships that correspond to class subsumption, and (iii) top-level construction, by mapping classes onto a subset of high-level WordNet categories. A comprehensive evaluation shows that TiFi is able to construct taxonomies for a diverse range of fictional domains such as Lord of the Rings, The Simpsons or Greek Mythology with very high precision and that it outperforms state-of-the-art baselines for taxonomy induction by a substantial margin.
Advances in robotics, artificial intelligence, and machine learning are ushering in a new age of automation, as machines match or outperform human performance. Machine intelligence can enable businesses to improve performance by reducing errors, improving sensitivity, quality and speed, and in some cases achieving outcomes that go beyond current resource capabilities. Relevant applications include new product architecture design, rapid material characterization, and life-cycle management tied with a digital strategy that will enable efficient development of products from cradle to grave. There are also challenges that must be addressed through a major, sustained research effort that is based solidly on both inferential and computational principles applied to design tailoring of functionally optimized structures. Current applications of structural materials in the aerospace industry demand the highest quality control of material microstructure, especially for advanced rotational turbomachinery in aircraft engines, in order to obtain the best tailored material properties. In this paper, deep convolutional neural networks were developed to accurately predict processing-structure-property relations from material microstructure images, surpassing current best practices and modeling efforts. The models automatically learn critical features, without the need for manual specification and/or subjective and expensive image analysis. Further, in combination with generative deep learning models, a framework is proposed to enable rapid material design space exploration and property identification and optimization. The implementation must take account of real-time decision cycles and the trade-offs between speed and accuracy.
StocHy is a software tool for the quantitative analysis of discrete-time stochastic hybrid systems (SHS). StocHy accepts a high-level description of stochastic models and constructs an equivalent SHS model. The tool allows users to (i) simulate the SHS evolution over a given time horizon, and to automatically construct formal abstractions of the SHS. Abstractions are then employed for (ii) formal verification or (iii) control (policy, strategy) synthesis. StocHy allows for modular modelling, and has separate simulation, verification and synthesis engines, which are implemented as independent libraries. This allows for libraries to be easily used and for extensions to be easily built. The tool is implemented in C++ and employs manipulations based on vector calculus, the use of sparse matrices, the symbolic construction of probabilistic kernels, and multi-threading. Experiments show StocHy’s markedly improved performance when compared to existing abstraction-based approaches: in particular, StocHy beats state-of-the-art tools in terms of precision (abstraction error) and computational effort, and finally attains scalability to large-sized models (12 continuous dimensions). StocHy is available at http://www.gitlab.com/natchi92/StocHy.
The increasing interest in user privacy is leading to new privacy-preserving machine learning paradigms. In the Federated Learning paradigm, a master machine learning model is distributed to user clients, and the clients use their locally stored data and model both for inference and for calculating model updates. The model updates are sent back and aggregated on the server to update the master model, which is then redistributed to the clients. In this paradigm, the user data never leaves the client, greatly enhancing the user’s privacy, in contrast to the traditional paradigm of collecting, storing and processing user data on a backend server beyond the user’s control. In this paper we introduce, as far as we are aware, the first federated implementation of a Collaborative Filter. The federated updates to the model are based on a stochastic gradient approach. As a classical case study in machine learning, we explore a personalized recommendation system based on users’ implicit feedback and demonstrate the method’s applicability to both the MovieLens and an in-house dataset. Empirical validation confirms a collaborative filter can be federated without a loss of accuracy compared to a standard implementation, hence enhancing the user’s privacy in a widely used recommender application while maintaining recommender performance.
Sparse regression models are increasingly prevalent due to their ease of interpretability and superior out-of-sample performance. However, the exact model of sparse regression with an $\ell_0$ constraint restricting the support of the estimators is a challenging non-convex optimization problem. In this paper, we derive new strong convex relaxations for sparse regression. These relaxations are based on the ideal (convex-hull) formulations for rank-one quadratic terms with indicator variables. The new relaxations can be formulated as semidefinite optimization problems in an extended space and are stronger and more general than the state-of-the-art formulations, including the perspective reformulation and formulations with the reverse Huber penalty and the minimax concave penalty functions. Furthermore, the proposed rank-one strengthening can be interpreted as a non-separable, non-convex sparsity-inducing regularizer, which dynamically adjusts its penalty according to the shape of the error function. In our computational experiments with benchmark datasets, the proposed conic formulations are solved within seconds and result in near-optimal solutions (with 0.4\% optimality gap) for non-convex $\ell_0$-problems. Moreover, the resulting estimators also outperform alternative convex approaches from a statistical viewpoint, achieving high prediction accuracy and good interpretability.
Classification algorithms aim to predict an unknown label (e.g., a quality class) for a new instance (e.g., a product). To this end, training samples (instances and labels) are used to deduce classification hypotheses. Often, it is relatively easy to capture instances but the acquisition of the corresponding labels remains difficult or expensive. Active learning algorithms select the most beneficial instances to be labeled to reduce cost. In research, this labeling procedure is simulated and therefore a ground truth is available. But during deployment, active learning is a one-shot problem and an evaluation set is not available. Hence, it is not possible to reliably estimate the performance of the classification system during learning and it is difficult to decide when the system fulfills the quality requirements (stopping criteria). In this article, we formalize the task and review existing strategies to assess the performance of an actively trained classifier during training. Furthermore, we identify three major challenges: 1)~to derive a performance distribution, 2)~to preserve representativeness of the labeled subset, and 3)~to correct against sampling bias induced by an intelligent selection strategy. In a qualitative analysis, we evaluate different existing approaches and show that none of them reliably estimates active learning performance, which poses a major challenge for future research on such systems. All plots and experiments are provided in a Jupyter notebook that is available for download.
A Gaussian process (GP) covariance function is proposed as a matching tool in GPMatch, within a full Bayesian framework under relatively weaker causal assumptions. The matching is accomplished by utilizing the GP prior covariance function to define the matching distance. We show that GPMatch provides a doubly robust estimate of the averaged treatment effect (ATE), much like G-estimation. The ATE is correctly estimated when either of two conditions is satisfied: 1) the GP mean function correctly specifies the potential outcome $Y^{(0)}$; or 2) the GP covariance function correctly specifies the matching structure. Simulation studies were carried out without assuming any known matching structure or functional form of the outcomes. The results demonstrate that GPMatch enjoys well-calibrated frequentist properties, and outperforms many widely used methods including Bayesian Additive Regression Trees. The case study compares the effectiveness of early aggressive use of biological medication in treating children with newly diagnosed Juvenile Idiopathic Arthritis, using data extracted from electronic medical records.
Training of Generative Adversarial Networks (GANs) is notoriously fragile, which is partially attributed to the discriminator performing well very quickly; its loss converges to zero, providing no reliable backpropagation signal to the generator. In this work we introduce a new technique, progressive augmentation of GANs (PA-GAN), that helps to mitigate this issue and thus improve GAN training. The key idea is to gradually increase the task difficulty of the discriminator by progressively augmenting its input or feature space, enabling continuous learning of the generator. We show that the proposed progressive augmentation preserves the original GAN objective, does not bias the optimality of the discriminator and encourages healthy competition between the generator and discriminator, leading to a better-performing generator. We experimentally demonstrate the effectiveness of PA-GAN across different architectures and on multiple benchmarks for the image generation task.
Motivated by concerns that machine learning algorithms may introduce significant bias in classification models, developing fair classifiers has become an important problem in machine learning research. One important paradigm towards this has been providing algorithms for adversarially learning fair classifiers (Zhang et al., 2018; Madras et al., 2018). We formulate the adversarial learning problem as a multi-objective optimization problem and find the fair model using a gradient descent-ascent algorithm with a modified gradient update step, inspired by the approach of Zhang et al. (2018). We provide theoretical insight and guarantees that formalize the heuristic arguments presented previously towards taking such an approach. We test our approach empirically on the Adult dataset and synthetic datasets and compare against state-of-the-art algorithms (Celis et al., 2018; Zhang et al., 2018; Zafar et al., 2017). The results show that our models and algorithms have comparable or better accuracy than other algorithms while performing better in terms of fairness, as measured using statistical rate or false discovery rate.
This paper studies the problem of robustly learning the correlation function for a univariate time series in the presence of noise, outliers and missing entries. The outliers or anomalies considered here are sparse and rare events that deviate from normality, which is described by a correlation function and an uncertainty condition. This general formulation is applied to univariate time series of event counts (or non-negative time series) where the correlation is a log-linear function with the uncertainty condition following the Poisson distribution. Approximations to the sparsity constraint, such as $\ell^r, 0< r\le 1$, are used to obtain robustness in the presence of outliers. The $\ell^r$ constraint is also applied to the correlation function to reduce the number of active coefficients. This also helps to bypass the model selection procedure. Simulated results are presented to validate the model.
Decentralized Online Learning (online learning in decentralized networks) is attracting more and more attention, since it is believed that it can help data providers cooperatively solve their online problems better without sharing their private data with a third party or other providers. Typically, the cooperation is achieved by letting the data providers exchange their models (e.g., a recommendation model) between neighbors. However, the best regret bound for a decentralized online learning algorithm is $\mathcal{O}(n\sqrt{T})$, where $n$ is the number of nodes (or users) and $T$ is the number of iterations. This bound is not meaningful, since it can be achieved \emph{without} any communication in the network. This prompts a fundamental question: \emph{Can people really benefit from decentralized online learning by exchanging information?} In this paper, we study when and why communication can help decentralized online learning to reduce the regret. Specifically, each loss function is characterized by two components: the adversarial component and the stochastic component. Under this characterization, we show that decentralized online gradient (DOG) enjoys a regret bound of $\mathcal{O}(n\sqrt{T}G + \sqrt{nT}\sigma)$, where $G$ measures the magnitude of the adversarial component in the private data (or equivalently the local loss function) and $\sigma$ measures the randomness within the private data. This regret suggests that people can benefit from the randomness in the private data by exchanging private information. Another important contribution of this paper is to consider the dynamic regret, a more practical regret for tracking users’ interest dynamics. Empirical studies are also conducted to validate our analysis.
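A minimal sketch of one round of a decentralized online gradient update, assuming the common form in which each node first averages its neighbors' models through a mixing matrix and then takes a gradient step on its own local loss. The abstract does not spell out the update rule, so the function name `dog_round`, the scalar models, the mixing matrix, and the learning rate are all illustrative assumptions.

```python
def dog_round(models, local_grads, mixing, lr):
    """One assumed round of decentralized online gradient on scalar models.

    models[i]       current model held by node i
    local_grads[i]  gradient of node i's current loss at models[i]
    mixing[i][j]    weight node i gives to node j's model (rows sum to 1;
                    zero when i and j are not neighbors)
    lr              learning rate
    """
    n = len(models)
    # Communication step: only models are exchanged, never the private data.
    averaged = [
        sum(mixing[i][j] * models[j] for j in range(n)) for i in range(n)
    ]
    # Local step: each node uses only the gradient of its own loss.
    return [averaged[i] - lr * local_grads[i] for i in range(n)]


# Two fully connected nodes with equal mixing weights.
models = [0.0, 2.0]
local_grads = [1.0, -1.0]
mixing = [[0.5, 0.5], [0.5, 0.5]]
updated = dog_round(models, local_grads, mixing, lr=0.1)
```

With an identity mixing matrix the nodes never communicate and each runs plain online gradient descent, which corresponds to the no-communication baseline that the abstract compares against.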