A new system-wide diversity measure for recommendations with efficient algorithms

Recommender systems often operate on item catalogs clustered by genres, and user bases that have natural clusterings into user types by demographic or psychographic attributes. Prior work on system-wide diversity has mainly focused on defining intent-aware metrics among such categories and maximizing relevance of the resulting recommendations, but has not combined the notions of diversity from the two point of views of items and users. In this work, (1) we introduce two new system-wide diversity metrics to simultaneously address the problems of diversifying the categories of items that each user sees, diversifying the types of users that each item is shown, and maintaining high recommendation quality. We model this as a subgraph selection problem on the bipartite graph of candidate recommendations between users and items. (2) In the case of disjoint item categories and user types, we show that the resulting problems can be solved exactly in polynomial time, by a reduction to a minimum cost flow problem. (3) In the case of non-disjoint categories and user types, we prove NP-completeness of the objective and present efficient approximation algorithms using the submodularity of the objective. (4) Finally, we validate the effectiveness of our algorithms on the MovieLens-1m and Netflix datasets, and show that algorithms designed for our objective also perform well on sales diversity metrics, and even some intent-aware diversity metrics. Our experimental results justify the validity of our new composite diversity metrics.

Improving Reconstruction Autoencoder Out-of-distribution Detection with Mahalanobis Distance

There is an increasingly apparent need for validating the classifications made by deep learning systems in safety-critical applications like autonomous vehicle systems. A number of recent papers have proposed methods for detecting anomalous image data that appear different from known inlier data samples, including reconstruction-based autoencoders. Autoencoders optimize the compression of input data to a latent space of a dimensionality smaller than the original input and attempt to accurately reconstruct the input using that compressed representation. Since the latent vector is optimized to capture the salient features from the inlier class only, it is commonly assumed that images of objects from outside of the training class cannot effectively be compressed and reconstructed. Some thus consider reconstruction error as a kind of novelty measure. Here we suggest that reconstruction-based approaches fail to capture particular anomalies that lie far from known inlier samples in latent space but near the latent dimension manifold defined by the parameters of the model. We propose incorporating the Mahalanobis distance in latent space to better capture these out-of-distribution samples and our results show that this method often improves performance over the baseline approach.

SqueezeFit: Label-aware dimensionality reduction by semidefinite programming

Given labeled points in a high-dimensional vector space, we seek a low-dimensional subspace such that projecting onto this subspace maintains some prescribed distance between points of differing labels. Intended applications include compressive classification. Taking inspiration from large margin nearest neighbor classification, this paper introduces a semidefinite relaxation of this problem. Unlike its predecessors, this relaxation is amenable to theoretical analysis, allowing us to provably recover a planted projection operator from the data.

Verification of deep probabilistic models

Probabilistic models are a critical part of the modern deep learning toolbox – ranging from generative models (VAEs, GANs), sequence to sequence models used in machine translation and speech processing to models over functional spaces (conditional neural processes, neural processes). Given the size and complexity of these models, safely deploying them in applications requires the development of tools to analyze their behavior rigorously and provide some guarantees that these models are consistent with a list of desirable properties or specifications. For example, a machine translation model should produce semantically equivalent outputs for innocuous changes in the input to the model. A functional regression model that is learning a distribution over monotonic functions should predict a larger value at a larger input. Verification of these properties requires a new framework that goes beyond notions of verification studied in deterministic feedforward networks, since requiring worst-case guarantees in probabilistic models is likely to produce conservative or vacuous results. We propose a novel formulation of verification for deep probabilistic models that take in conditioning inputs and sample latent variables in the course of producing an output: We require that the output of the model satisfies a linear constraint with high probability over the sampling of latent variables and for every choice of conditioning input to the model. We show that rigorous lower bounds on the probability that the constraint is satisfied can be obtained efficiently. Experiments with neural processes show that several properties of interest while modeling functional spaces can be modeled within this framework (monotonicity, convexity) and verified efficiently using our algorithms

End-to-End Streaming Keyword Spotting

We present a system for keyword spotting that, except for a frontend component for feature generation, it is entirely contained in a deep neural network (DNN) model trained ‘end-to-end’ to predict the presence of the keyword in a stream of audio. The main contributions of this work are, first, an efficient memoized neural network topology that aims at making better use of the parameters and associated computations in the DNN by holding a memory of previous activations distributed over the depth of the DNN. The second contribution is a method to train the DNN, end-to-end, to produce the keyword spotting score. This system significantly outperforms previous approaches both in terms of quality of detection as well as size and computation.

Automatically Explaining Machine Learning Prediction Results: A Demonstration on Type 2 Diabetes Risk Prediction

Background: Predictive modeling is a key component of solutions to many healthcare problems. Among all predictive modeling approaches, machine learning methods often achieve the highest prediction accuracy, but suffer from a long-standing open problem precluding their widespread use in healthcare. Most machine learning models give no explanation for their prediction results, whereas interpretability is essential for a predictive model to be adopted in typical healthcare settings. Methods: This paper presents the first complete method for automatically explaining results for any machine learning predictive model without degrading accuracy. We did a computer coding implementation of the method. Using the electronic medical record data set from the Practice Fusion diabetes classification competition containing patient records from all 50 states in the United States, we demonstrated the method on predicting type 2 diabetes diagnosis within the next year. Results: For the champion machine learning model of the competition, our method explained prediction results for 87.4% of patients who were correctly predicted by the model to have type 2 diabetes diagnosis within the next year. Conclusions: Our demonstration showed the feasibility of automatically explaining results for any machine learning predictive model without degrading accuracy.

Progressive Sampling-Based Bayesian Optimization for Efficient and Automatic Machine Learning Model Selection

Purpose: Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters termed hyper-parameters must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to machine learning, miscellaneous automatic selection methods for algorithms and/or hyper-parameter values have been proposed. Existing automatic selection methods are inefficient on large data sets. This poses a challenge for using machine learning in the clinical big data era. Methods: To address the challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values. Results: We report an implementation of the method. We show that compared to a state of the art automatic selection method, our method can significantly reduce search time, classification error rate, and standard deviation of error rate due to randomization. Conclusions: This is major progress towards enabling fast turnaround in identifying high-quality solutions required by many machine learning-based clinical data analysis tasks.

Three Tools for Practical Differential Privacy

Differentially private learning on real-world data poses challenges for standard machine learning practice: privacy guarantees are difficult to interpret, hyperparameter tuning on private data reduces the privacy budget, and ad-hoc privacy attacks are often required to test model privacy. We introduce three tools to make differentially private machine learning more practical: (1) simple sanity checks which can be carried out in a centralized manner before training, (2) an adaptive clipping bound to reduce the effective number of tuneable privacy parameters, and (3) we show that large-batch training improves model performance.

Mode Shape Estimation using Complex Principal Component Analysis and k-Means Clustering

We propose an empirical method for identifying low damped modes and corresponding mode shapes using frequency measurements from a Wide Area Monitoring System. The method consists of two main steps: Firstly, Complex Principal Component Analysis is used in combination with the Hilbert Transform and Empirical Mode Decomposition to provide estimates of modes and mode shapes. The estimates are stored as multidimensional points. Secondly, the points are grouped using a clustering algorithm, and new averaged estimates of modes and mode shapes are computed as the centroids of the clusters. Applying the method on data resulting from a non-linear power system simulator yields estimates of dominant modes and corresponding mode shapes that are similar to those resulting from modal analysis of the linearized system model. Encouraged by the results, the method is further tested with real PMU data at transmission grid level. Initial results indicate that the performance of the proposed method is promising.

ShuffleNASNets: Efficient CNN models through modified Efficient Neural Architecture Search

Neural network architectures found by sophistic search algorithms achieve strikingly good test performance, surpassing most human-crafted network models by significant margins. Although computationally efficient, their design is often very complex, impairing execution speed. Additionally, finding models outside of the search space is not possible by design. While our space is still limited, we implement undiscoverable expert knowledge into the economic search algorithm Efficient Neural Architecture Search (ENAS), guided by the design principles and architecture of ShuffleNet V2. While maintaining baseline-like 2.85% test error on CIFAR-10, our ShuffleNASNets are significantly less complex, require fewer parameters, and are two times faster than the ENAS baseline in a classification task. These models also scale well to a low parameter space, achieving less than 5% test error with little regularization and only 236K parameters.

Time Series Featurization via Topological Data Analysis: an Application to Cryptocurrency Trend Forecasting

We propose a novel methodology for feature extraction from time series data based on topological data analysis. The proposed procedure applies a dimensionality reduction technique via principal component analysis to the point cloud of the Takens’ embedding from the observed time series and then evaluates the persistence landscape and silhouettes based on the corresponding Rips complex. We define a new notion of Rips distance function that is especially suited for persistence homologies built on Rips complexes and prove stability theorems for it. We use these results to demonstrate in turn some stability properties of the topological features extracted using our procedure with respect to additive noise and sampling. We further apply our method to the problem of trend forecasting for cryptocurrency prices, where we manage to achieve significantly lower error rates than more standard, non TDA-based methodologies in complex pattern classification tasks. We expect our method to provide a new insight on feature engineering for granular, noisy time series data.

A method to align time series segments based on envelope features as anchor points

In the time series analysis field, there is not a unique recipe for studying signal similarities. On the other hand, averaging signals of the same nature is an essential tool in the analysis of different kinds of data. Here we propose a method to align and average segments of time series with similar patterns. A simple implementation based on \textit{python} code is provided for the procedure. The analysis was inspired by the study of canary sound syllables, but it is possible to apply it in semi periodic signals of different nature and not necessarily related to sounds.

Deep Variational Transfer: Transfer Learning through Semi-supervised Deep Generative Models

In real-world applications, it is often expensive and time-consuming to obtain labeled examples. In such cases, knowledge transfer from related domains, where labels are abundant, could greatly reduce the need for extensive labeling efforts. In this scenario, transfer learning comes in hand. In this paper, we propose Deep Variational Transfer (DVT), a variational autoencoder that transfers knowledge across domains using a shared latent Gaussian mixture model. Thanks to the combination of a semi-supervised ELBO and parameters sharing across domains, we are able to simultaneously: (i) align all supervised examples of the same class into the same latent Gaussian Mixture component, independently from their domain; (ii) predict the class of unsupervised examples from different domains and use them to better model the occurring shifts. We perform tests on MNIST and USPS digits datasets, showing DVT’s ability to perform transfer learning across heterogeneous datasets. Additionally, we present DVT’s top classification performances on the MNIST semi-supervised learning challenge. We further validate DVT on a astronomical datasets. DVT achieves states-of-the-art classification performances, transferring knowledge across real stars surveys datasets, EROS, MACHO and HiTS, . In the worst performance, we double the achieved F1-score for rare classes. These experiments show DVT’s ability to tackle all major challenges posed by transfer learning: different covariate distributions, different and highly imbalanced class distributions and different feature spaces.