**Adaptive Matrix Completion for the Users and the Items in Tail**

Recommender systems are widely used to recommend the most appealing items to users. These recommendations can be generated by applying collaborative filtering methods. The low-rank matrix completion method is the state-of-the-art collaborative filtering method. In this work, we show that the skewed distribution of ratings in the user-item rating matrix of real-world datasets affects the accuracy of matrix-completion-based approaches. Also, we show that the number of ratings that an item or a user has positively correlates with the ability of low-rank matrix-completion-based approaches to predict the ratings for the item or the user accurately. Furthermore, we use these insights to develop four matrix completion-based approaches, i.e., Frequency Adaptive Rating Prediction (FARP), Truncated Matrix Factorization (TMF), Truncated Matrix Factorization with Dropout (TMF + Dropout) and Inverse Frequency Weighted Matrix Factorization (IFWMF), that outperforms traditional matrix-completion-based approaches for the users and the items with few ratings in the user-item rating matrix.

**Feature-based factorized Bilinear Similarity Model for Cold-Start Top-n Item Recommendation**

Recommending new items to existing users has remained a challenging problem due to absence of user’s past preferences for these items. The user personalized non-collaborative methods based on item features can be used to address this item cold-start problem. These methods rely on similarities between the target item and user’s previous preferred items. While computing similarities based on item features, these methods overlook the interactions among the features of the items and consider them independently. Modeling interactions among features can be helpful as some features, when considered together, provide a stronger signal on the relevance of an item when compared to case where features are considered independently. To address this important issue, in this work we introduce the Feature-based factorized Bilinear Similarity Model (FBSM), which learns factorized bilinear similarity model for TOP-n recommendation of new items, given the information about items preferred by users in past as well as the features of these items. We carry out extensive empirical evaluations on benchmark datasets, and we find that the proposed FBSM approach improves upon traditional non-collaborative methods in terms of recommendation performance. Moreover, the proposed approach also learns insightful interactions among item features from data, which lead to deep understanding on how these interactions contribute to personalized recommendation.

**Sparse Neural Attentive Knowledge-based Models for Grade Prediction**

Grade prediction for future courses not yet taken by students is important as it can help them and their advisers during the process of course selection as well as for designing personalized degree plans and modifying them based on their performance. One of the successful approaches for accurately predicting a student’s grades in future courses is Cumulative Knowledge-based Regression Models (CKRM). CKRM learns shallow linear models that predict a student’s grades as the similarity between his/her knowledge state and the target course. A student’s knowledge state is built by linearly accumulating the learned provided knowledge components of the courses he/she has taken in the past, weighted by his/her grades in them. However, not all the prior courses contribute equally to the target course. In this paper, we propose a novel Neural Attentive Knowledge-based model (NAK) that learns the importance of each historical course in predicting the grade of a target course. Compared to CKRM and other competing approaches, our experiments on a large real-world dataset consisting of

1.5 grades show the effectiveness of the proposed NAK model in accurately predicting the students’ grades. Moreover, the attention weights learned by the model can be helpful in better designing their degree plans.

**Appliance Event Detection — A Multivariate, Supervised Classification Approach**

Non-intrusive load monitoring (NILM) is a modern and still expanding technique, helping to understand fundamental energy consumption patterns and appliance characteristics. Appliance event detection is an elementary step in the NILM pipeline. Unfortunately, several types of appliances (e.g., switching mode power supply (SMPS) or multi-state) are known to challenge state-of-the-art event detection systems due to their noisy consumption profiles. Classical rule-based event detection system become infeasible and complex for these appliances. By stepping away from distinct event definitions, we can learn from a consumer-configured event model to differentiate between relevant and irrelevant event transients. We introduce a boosting oriented adaptive training, that uses false positives from the initial training area to reduce the number of false positives on the test area substantially. The results show a false positive decrease by more than a factor of eight on a dataset that has a strong focus on SMPS-driven appliances. To obtain a stable event detection system, we applied several experiments on different parameters to measure its performance. These experiments include the evaluation of six event features from the spectral and time domain, different types of feature space normalization to eliminate undesired feature weighting, the conventional and adaptive training, and two common classifiers with its optimal parameter settings. The evaluations are performed on two publicly available energy datasets with high sampling rates: BLUED and BLOND-50.

**Storage Solutions for Big Data Systems: A Qualitative Study and Comparison**

Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, fundamental design decisions involved in big data systems design include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies for optimized solution to a specific real world problem, big data system are not an exception to any such rule. As far as the storage aspect of any big data system is concerned, the primary facet in this regard is a storage infrastructure and NoSQL seems to be the right technology that fulfills its requirements. However, every big data application has variable data characteristics and thus, the corresponding data fits into a different data model. This paper presents feature and use case analysis and comparison of the four main data models namely document oriented, key value, graph and wide column. Moreover, a feature analysis of 80 NoSQL solutions has been provided, elaborating on the criteria and points that a developer must consider while making a possible choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings forth second facet of big data storage, big data file formats, into picture. The second half of the research paper compares the advantages, shortcomings and possible use cases of available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage and its challenges and future prospects have also been discussed.

**Multivariate Functional Data Modeling with Time-varying Clustering**

We consider the situation where multivariate functional data has been collected over time at each of a set of sites. Our illustrative setting is bivariate, monitoring ozone and PM

levels as a function of time over the course of a year at a set of monitoring sites. The data we work with is from 24 monitoring sites in Mexico City which record hourly ozone and PM

levels. We use the data for the year 2017. Hence, we have 48 functions to work with. Our objective is to implement model-based clustering of the functions across the sites. Using our example, such clustering can be considered for ozone and PM

individually or jointly. It may occur differentially for the two pollutants. More importantly for us, we allow that such clustering can vary with time. We model the multivariate functions across sites using a multivariate Gaussian process. With many sites and several functions at each site, we use dimension reduction to provide a stochastic process specification for the distribution of the collection of multivariate functions over the say

sites. Furthermore, to cluster the functions, either individually by component or jointly with all components, we use the Dirichlet process which enables shared labeling of the functions across the sites. Specifically, we cluster functions based on their response to exogenous variables. Though the functions arise in continuous time, clustering in continuous time is extremely computationally demanding and not of practical interest. Therefore, we employ a partitioning of the time scale to capture time-varying clustering.

**Event Driven Fusion**

This paper presents a technique that combines the occurrence of certain events, as observed by different sensors, in order to detect and classify objects. This technique explores the extent of dependence between features being observed by the sensors, and generates more informed probability distributions over the events. Provided some additional information about the features of the object, this fusion technique can outperform other existing decision level fusion approaches that may not take into account the relationship between different features. Furthermore, this paper also addresses the issue of dealing with damaged sensors during implementation of the model, by learning a hidden space between sensor modalities which can be exploited to safeguard detection performance.

**Decision Forest: A Nonparametric Approach to Modeling Irrational Choice**

Customer behavior is often assumed to follow weak rationality, which implies that adding a product to an assortment will not increase the choice probability of another product in that assortment. However, an increasing amount of research has revealed that customers are not necessarily rational when making decisions. In this paper, we study a new nonparametric choice model that relaxes this assumption and can model a wider range of customer behavior, such as decoy effects between products. In this model, each customer type is associated with a binary decision tree, which represents a decision process for making a purchase based on checking for the existence of specific products in the assortment. Together with a probability distribution over customer types, we show that the resulting model — a decision forest — is able to represent any customer choice model, including models that are inconsistent with weak rationality. We theoretically characterize the depth of the forest needed to fit a data set of historical assortments and prove that asymptotically, a forest whose depth scales logarithmically in the number of assortments is sufficient to fit most data sets. We also propose an efficient algorithm for estimating such models from data, based on combining randomization and optimization. Using synthetic data and real transaction data exhibiting non-rational behavior, we show that the model outperforms the multinomial logit and ranking-based models in out-of-sample predictive ability.

**Zap~Q-Learning for Optimal Stopping Time Problems**

We propose a novel reinforcement learning algorithm that approximates solutions to the problem of discounted optimal stopping in an irreducible, uniformly ergodic Markov chain evolving on a compact subset of

. A dynamic programming approach has been taken by Tsitsikilis and Van Roy to solve this problem, wherein they propose a Q-learning algorithm to estimate the value function, in a linear function approximation setting. The Zap-Q learning algorithm proposed in this work is the first algorithm that is designed to achieve {optimal asymptotic variance}. We prove convergence of the algorithm using ODE analysis, and the optimal asymptotic variance property is reflected via fast convergence in a finance example.

**Probing What Different NLP Tasks Teach Machines about Function Word Comprehension**

We introduce a set of nine challenge tasks that test for the understanding of function words. These tasks are created by structurally mutating sentences from existing datasets to target the comprehension of specific types of function words (e.g., prepositions, wh-words). Using these probing tasks, we explore the effects of various pretraining objectives for sentence encoders (e.g., language modeling, CCG supertagging and natural language inference (NLI)) on the learned representations. Our results show that pretraining on CCG—our most syntactic objective—performs the best on average across our probing tasks, suggesting that syntactic knowledge helps function word comprehension. Language modeling also shows strong performance, supporting its widespread use for pretraining state-of-the-art NLP models. Overall, no pretraining objective dominates across the board, and our function word probing tasks highlight several intuitive differences between pretraining objectives, e.g., that NLI helps the comprehension of negation.

**Neural Text Generation from Rich Semantic Representations**

We propose neural models to generate high-quality text from structured representations based on Minimal Recursion Semantics (MRS). MRS is a rich semantic representation that encodes more precise semantic detail than other representations such as Abstract Meaning Representation (AMR). We show that a sequence-to-sequence model that maps a linearization of Dependency MRS, a graph-based representation of MRS, to English text can achieve a BLEU score of 66.11 when trained on gold data. The performance can be improved further using a high-precision, broad coverage grammar-based parser to generate a large silver training corpus, achieving a final BLEU score of 77.17 on the full test set, and 83.37 on the subset of test data most closely matching the silver data domain. Our results suggest that MRS-based representations are a good choice for applications that need both structured semantics and the ability to produce natural language text as output.

**Derivative-free optimization methods**

In many optimization problems arising from scientific, engineering and artificial intelligence applications, objective and constraint functions are available only as the output of a black-box or simulation oracle that does not provide derivative information. Such settings necessitate the use of methods for derivative-free, or zeroth-order, optimization. We provide a review and perspectives on developments in these methods, with an emphasis on highlighting recent developments and on unifying treatment of such problems in the non-linear optimization and machine learning literature. We categorize methods based on assumed properties of the black-box functions, as well as features of the methods. We first overview the primary setting of deterministic methods applied to unconstrained, non-convex optimization problems where the objective function is defined by a deterministic black-box oracle. We then discuss developments in randomized methods, methods that assume some additional structure about the objective (including convexity, separability and general non-smooth compositions), methods for problems where the output of the black-box oracle is stochastic, and methods for handling different types of constraints.

**Bayesian Factor Analysis for Inference on Interactions**

This article is motivated by the problem of inference on interactions among chemical exposures impacting human health outcomes. Chemicals often co-occur in the environment or in synthetic mixtures and as a result exposure levels can be highly correlated. We propose a latent factor joint model, which includes shared factors in both the predictor and response components while assuming conditional independence. By including a quadratic regression in the latent variables in the response component, we induce flexible dimension reduction in characterizing main effects and interactions. We propose a Bayesian approach to inference under this Factor analysis for INteractions (FIN) framework. Through appropriate modifications of the factor modeling structure, FIN can accommodate higher order interactions and multivariate outcomes. We provide theory on posterior consistency and the impact of misspecifying the number of factors. We evaluate the performance using a simulation study and data from the National Health and Nutrition Examination Survey (NHANES). Code is available on GitHub.

**Meta-Sim: Learning to Generate Synthetic Datasets**

Training models to high-end performance requires availability of large labeled datasets, which are expensive to get. The goal of our work is to automatically synthesize labeled datasets that are relevant for a downstream task. We propose Meta-Sim, which learns a generative model of synthetic scenes, and obtain images as well as its corresponding ground-truth via a graphics engine. We parametrize our dataset generator with a neural network, which learns to modify attributes of scene graphs obtained from probabilistic scene grammars, so as to minimize the distribution gap between its rendered outputs and target data. If the real dataset comes with a small labeled validation set, we additionally aim to optimize a meta-objective, i.e. downstream task performance. Experiments show that the proposed method can greatly improve content generation quality over a human-engineered probabilistic scene grammar, both qualitatively and quantitatively as measured by performance on a downstream task.

**Testing statistical laws in complex systems**

The availability of large datasets requires an improved view on statistical laws in complex systems, such as Zipf’s law of word frequencies, the Gutenberg-Richter law of earthquake magnitudes, or scale-free degree distribution in networks. In this paper we discuss how the statistical analysis of these laws are affected by correlations present in the observations, the typical scenario for data from complex systems. We first show how standard maximum-likelihood recipes lead to false rejections of statistical laws in the presence of correlations. We then propose a conservative method (based on shuffling and under-sampling the data) to test statistical laws and find that accounting for correlations leads to smaller rejection rates and larger confidence intervals on estimated parameters.

**From Predictions to Prescriptions in Multistage Optimization Problems**

In this paper, we introduce a framework for solving finite-horizon multistage optimization problems under uncertainty in the presence of auxiliary data. We assume the joint distribution of the uncertain quantities is unknown, but noisy observations, along with observations of auxiliary covariates, are available. We utilize effective predictive methods from machine learning (ML), including

-nearest neighbors regression (

NN), classification and regression trees (CART), and random forests (RF), to develop specific methods that are applicable to a wide variety of problems. We demonstrate that our solution methods are asymptotically optimal under mild conditions. Additionally, we establish finite sample guarantees for the optimality of our method with

NN weight functions. Finally, we demonstrate the practicality of our approach with computational examples. We see a significant decrease in cost by taking into account the auxiliary data in the multistage setting.

**Bayesian Generative Active Deep Learning**

Deep learning models have demonstrated outstanding performance in several problems, but their training process tends to require immense amounts of computational and human resources for training and labeling, constraining the types of problems that can be tackled. Therefore, the design of effective training methods that require small labeled training sets is an important research direction that will allow a more effective use of resources. Among current approaches designed to address this issue, two are particularly interesting: data augmentation and active learning. Data augmentation achieves this goal by artificially generating new training points, while active learning relies on the selection of the ‘most informative’ subset of unlabeled training samples to be labelled by an oracle. Although successful in practice, data augmentation can waste computational resources because it indiscriminately generates samples that are not guaranteed to be informative, and active learning selects a small subset of informative samples (from a large un-annotated set) that may be insufficient for the training process. In this paper, we propose a Bayesian generative active deep learning approach that combines active learning with data augmentation — we provide theoretical and empirical evidence (MNIST, CIFAR-

, and SVHN) that our approach has more efficient training and better classification results than data augmentation and active learning.

**Regular Expression Matching on billion-nodes Graphs**

In many applications, it is necessary to retrieve pairs of vertices with the path between them satisfying certain constraints, since regular expression is a powerful tool to describe patterns of a sequence. To meet such requirements, in this paper, we define regular expression (RE) query on graphs to use regular expression to represent the constraints between vertices. To process RE queries on large graphs such as social networks, we propose the RE query processing method with the index size sublinear to the graph size. Considering that large graphs may be randomly distributed in multiple machines, the parallel RE processing algorithms are presented without the assumption of graph distribution. To achieve high efficiency for complex RE query processing, we develop cost-based query optimization strategies with only a small size statistical information which is suitable for querying large graphs. Comprehensive experimental results show that this approach works scale well for large graphs.

**AutoKGE: Searching Scoring Functions for Knowledge Graph Embedding**

Knowledge graph embedding (KGE) aims to find low dimensional vector representations of entities and relations so that their similarities can be quantized. Scoring functions (SFs), which are used to build a model to measure the similarity between entities based on a given relation, have developed as the crux of KGE. Humans have designed lots of SFs in the literature, and the evolving of SF has become the primary power source of boosting KGE’s performance. However, such improvements gradually get marginal. Besides, with so many SFs, how to make a proper choice among existing SFs already becomes a non-trivial problem. Inspired by the recent success of automated machine learning (AutoML), in this paper, we propose automated KGE (AutoKGE), to design and discover distinct SFs for KGE automatically. We first identify a unified representation over popularly used SFs, which helps to set up a search space for AutoKGE. Then, we propose a greedy algorithm, which is enhanced by a predictor to estimate the final performance without model training, to search through the space. Extensive experiments on benchmark datasets demonstrate the effectiveness and efficiency of our AutoKGE. Finally, the SFs, searched by our method, are KG dependent, new to the literature, and outperform existing state-of-the-arts SFs designed by humans.

**Neural Logic Machines**

We propose the Neural Logic Machine (NLM), a neural-symbolic architecture for both inductive learning and logic reasoning. NLMs exploit the power of both neural networks—as function approximators, and logic programming—as a symbolic processor for objects with properties, relations, logic connectives, and quantifiers. After being trained on small-scale tasks (such as sorting short arrays), NLMs can recover lifted rules, and generalize to large-scale tasks (such as sorting longer arrays). In our experiments, NLMs achieve perfect generalization in a number of tasks, from relational reasoning tasks on the family tree and general graphs, to decision making tasks including sorting arrays, finding shortest paths, and playing the blocks world. Most of these tasks are hard to accomplish for neural networks or inductive logic programming alone.

**General risk measures for robust machine learning**

A wide array of machine learning problems are formulated as the minimization of the expectation of a convex loss function on some parameter space. Since the probability distribution of the data of interest is usually unknown, it is is often estimated from training sets, which may lead to poor out-of-sample performance. In this work, we bring new insights in this problem by using the framework which has been developed in quantitative finance for risk measures. We show that the original min-max problem can be recast as a convex minimization problem under suitable assumptions. We discuss several important examples of robust formulations, in particular by defining ambiguity sets based on

-divergences and the Wasserstein metric.We also propose an efficient algorithm for solving the corresponding convex optimization problems involving complex convex constraints. Through simulation examples, we demonstrate that this algorithm scales well on real data sets.

**Deep-IRT: Make Deep Learning Based Knowledge Tracing Explainable Using Item Response Theory**

Deep learning based knowledge tracing model has been shown to outperform traditional knowledge tracing model without the need for human-engineered features, yet its parameters and representations have long been criticized for not being explainable. In this paper, we propose Deep-IRT (deep item response theory) which is a synthesis of the item response theory (IRT) model and a knowledge tracing model that is based on the deep neural network architecture called dynamic key-value memory network (DKVMN) to make deep learning based knowledge tracing explainable. Specifically, we use the DKVMN model to process the student’s learning trajectory and estimate the student ability level and the item difficulty level over time. Then, we use the IRT model to estimate the probability that a student will answer an item correctly using the estimated student ability and the item difficulty. Experiments show that the Deep-IRT model retains the performance of the DKVMN model, while it provides a direct psychological interpretation of both students and items.

**Representation Similarity Analysis for Efficient Task taxonomy & Transfer Learning**

Transfer learning is widely used in deep neural network models when there are few labeled examples available. The common approach is to take a pre-trained network in a similar task and finetune the model parameters. This is usually done blindly without a pre-selection from a set of pre-trained models, or by finetuning a set of models trained on different tasks and selecting the best performing one by cross-validation. We address this problem by proposing an approach to assess the relationship between visual tasks and their task-specific models. Our method uses Representation Similarity Analysis (RSA), which is commonly used to find a correlation between neuronal responses from brain data and models. With RSA we obtain a similarity score among tasks by computing correlations between models trained on different tasks. Our method is efficient as it requires only pre-trained models, and a few images with no further training. We demonstrate the effectiveness and efficiency of our method for generating task taxonomy on Taskonomy dataset. We next evaluate the relationship of RSA with the transfer learning performance on Taskonomy tasks and a new task: Pascal VOC semantic segmentation. Our results reveal that models trained on tasks with higher similarity score show higher transfer learning performance. Surprisingly, the best transfer learning result for Pascal VOC semantic segmentation is not obtained from the pre-trained model on semantic segmentation, probably due to the domain differences, and our method successfully selects the high performing models.

**Poisson PCA: Poisson Measurement Error corrected PCA, with Application to Microbiome Data**

In this paper, we study the problem of computing a Principal Component Analysis of data affected by Poisson noise. We assume samples are drawn from independent Poisson distributions. We want to estimate principle components of a fixed transformation of the latent Poisson means. Our motivating example is microbiome data, though the methods apply to many other situations. We develop a semiparametric approach to correct the bias of variance estimators, both for untransformed and transformed (with particular attention to log-transformation) Poisson means. Furthermore, we incorporate methods for correcting different exposure or sequencing depth in the data. In addition to identifying the principal components, we also address the non-trivial problem of computing the principal scores in this semiparametric framework. Most previous approaches tend to take a more parametric line. For example the Poisson-log-normal (PLN) model, approach. We compare our method with the PLN approach and find that our method is better at identifying the main principal components of the latent log-transformed Poisson means, and as a further major advantage, takes far less time to compute. Comparing methods on real data, we see that our method also appears to be more robust to outliers than the parametric method.

**Parameter clustering in Bayesian functional PCA of fMRI data**

The extraordinary advancements in neuroscientific technology for brain recordings over the last decades have led to increasingly complex spatio-temporal datasets. To reduce oversimplifications, new models have been developed to be able to identify meaningful patterns and new insights within a highly demanding data environment. To this extent, we propose a new model that merges ideas from Functional Data Analysis and Bayesian nonparametrics to obtain a flexible and computationally feasible exploration of spatio-temporal neuroscientific data. In particular, we make use of a Dirichlet process Gaussian mixture model to cluster functional Principal Component (fPC) scores within the standard Bayesian functional Principal Component Analysis (fPCA) framework; this allows us to capture the structure of spatial dependence among smoothed time series (curves) and its interaction with the time domain. Moreover, by moving the mixture from data to fPC scores, we obtain a more general clustering procedure, thus allowing much finer curve classification and higher level of intricate insight and understanding of the data. We present results from a Monte Carlo simulation study showing improvements in curves and correlation reconstruction compared with the standard Bayesian fPCA model before applying our method to a resting-state fMRI data analysis providing a rich exploration of the spatio-temporal dependence in brain time series that offer further insights into the underlying neurophysiological processes.

**Robustness Verification of Support Vector Machines**

We study the problem of formally verifying the robustness to adversarial examples of support vector machines (SVMs), a major machine learning model for classification and regression tasks. Following a recent stream of works on formal robustness verification of (deep) neural networks, our approach relies on a sound abstract version of a given SVM classifier to be used for checking its robustness. This methodology is parametric on a given numerical abstraction of real values and, analogously to the case of neural networks, needs neither abstract least upper bounds nor widening operators on this abstraction. The standard interval domain provides a simple instantiation of our abstraction technique, which is enhanced with the domain of reduced affine forms, which is an efficient abstraction of the zonotope abstract domain. This robustness verification technique has been fully implemented and experimentally evaluated on SVMs based on linear and nonlinear (polynomial and radial basis function) kernels, which have been trained on the popular MNIST dataset of images and on the recent and more challenging Fashion-MNIST dataset. The experimental results of our prototype SVM robustness verifier appear to be encouraging: this automated verification is fast, scalable and shows significantly high percentages of provable robustness on the test set of MNIST, in particular compared to the analogous provable robustness of neural networks.

**Think Again Networks, the Delta Loss, and an Application in Language Modeling**

This short paper introduces an abstraction called Think Again Networks (ThinkNet) which can be applied to any state-dependent function (such as a recurrent neural network). Here we show a simple application in Language Modeling which achieves state of the art perplexity on the Penn Treebank.

**AlphaClean: Automatic Generation of Data Cleaning Pipelines**

The analyst effort in data cleaning is gradually shifting away from the design of hand-written scripts to building and tuning complex pipelines of automated data cleaning libraries. Hyper-parameter tuning for data cleaning is very different than hyper-parameter tuning for machine learning since the pipeline components and objective functions have structure that tuning algorithms can exploit. This paper proposes a framework, called AlphaClean, that rethinks parameter tuning for data cleaning pipelines. AlphaClean provides users with a rich library to define data quality measures with weighted sums of SQL aggregate queries. AlphaClean applies generate-then-search framework where each pipelined cleaning operator contributes candidate transformations to a shared pool. Asynchronously, in separate threads, a search algorithm sequences them into cleaning pipelines that maximize the user-defined quality measures. This architecture allows AlphaClean to apply a number of optimizations including incremental evaluation of the quality measures and learning dynamic pruning rules to reduce the search space. Our experiments on real and synthetic benchmarks suggest that AlphaClean finds solutions of up-to 9x higher quality than naively applying state-of-the-art parameter tuning methods, is significantly more robust to straggling data cleaning methods and redundancy in the data cleaning library, and can incorporate state-of-the-art cleaning systems such as HoloClean as cleaning operators.

**Graph Optimized Convolutional Networks**

Graph Convolutional Networks (GCNs) have been widely studied for graph data representation and learning tasks. Existing GCNs generally use a fixed single graph which may lead to weak suboptimal for data representation/learning and are also hard to deal with multiple graphs. To address these issues, we propose a novel Graph Optimized Convolutional Network (GOCN) for graph data representation and learning. Our GOCN is motivated based on our re-interpretation of graph convolution from a regularization/optimization framework. The core idea of GOCN is to formulate graph optimization and graph convolutional representation into a unified framework and thus conducts both of them cooperatively to boost their respective performance in GCN learning scheme. Moreover, based on the proposed unified graph optimization-convolution framework, we propose a novel Multiple Graph Optimized Convolutional Network (M-GOCN) to naturally address the data with multiple graphs. Experimental results demonstrate the effectiveness and benefit of the proposed GOCN and M-GOCN.

**Multi-group Binary Choice with Social Interaction and a Random Communication Structure – a Random Graph Approach**

We construct and analyze a random graph model for discrete choice with social interaction and several groups of equal size. We concentrate on the case of two groups of equal sizes and we allow the interaction strength within a group to differ from the interaction strength between the two groups. Given that the resulting graph is sufficiently dense we show that, with probability one, the average decision in each of the two groups is the same as in the fully connected model. In particular, we show that there is a phase transition: If the interaction among a group and between the groups is strong enough the average decision per group will either be positive or negative and the decision of the two groups will be correlated. We also compute the free energy per particle in our model.

**Perceptual Attention-based Predictive Control**

In this paper, we present a novel information processing architecture for end-to-end visual navigation of autonomous systems. The proposed information processing architecture is used to support a perceptual attention-based predictive control algorithm that leverages model predictive control, convolutional neural networks and uncertainty quantification methods. The key idea relies on using model predictive control to train convolutional neural networks to predict regions of interest in the input visual information. These regions of interest are then used as input to the Macula-Network, a 3D convolutional neural network that is trained to produce control actions as well as estimates of epistemic and aleatoric uncertainty in the incoming stream of data. The proposed architecture is tested on simulated examples and a 1:5 scale terrestrial vehicle. Experimental results show that the proposed architecture outperforms previous approaches on early detection of novel object/data which are outside of the initial training set. The proposed architecture is a first step towards using end-to-end perceptual control policies in safety-critical domains.

**Evaluating the Success of a Data Analysis**

A fundamental problem in the practice and teaching of data science is how to evaluate the quality of a given data analysis, which is different than the evaluation of the science or question underlying the data analysis. Previously, we defined a set of principles for describing data analyses that can be used to create a data analysis and to characterize the variation between data analyses. Here, we introduce a metric of quality evaluation that we call the success of a data analysis, which is different than other potential metrics such as completeness, validity, or honesty. We define a successful data analysis as the matching of principles between the analyst and the audience on which the analysis is developed. In this paper, we propose a statistical model and general framework for evaluating the success of a data analysis. We argue that this framework can be used as a guide for practicing data scientists and students in data science courses for how to build a successful data analysis.

**New visualizations for Monte Carlo simulations**

In Monte Carlo simulations, samples are obtained from a target distribution in order to estimate various features. We present a flexible class of visualizations for assessing the quality of estimation, which are principled, practical, and easy to implement. To this end, we establish joint asymptotic normality for any collection of means and quantiles. Using the limit distribution, we construct

level simultaneous confidence intervals, which we integrate within visualization plots. We demonstrate the utility of our visualizations in various Monte Carlo simulation settings including Monte Carlo estimation of expectations and quantiles, Monte Carlo simulation studies, and Bayesian analyses using Markov chain Monte Carlo sampling. The marginal-friendly interpretation enables practitioners to visualize simultaneous uncertainty, a substantial improvement from current visualizations.

**SWALP : Stochastic Weight Averaging in Low-Precision Training**

Low precision operations can provide scalability, memory savings, portability, and energy efficiency. This paper proposes SWALP, an approach to low precision training that averages low-precision SGD iterates with a modified learning rate schedule. SWALP is easy to implement and can match the performance of full-precision SGD even with all numbers quantized down to 8 bits, including the gradient accumulators. Additionally, we show that SWALP converges arbitrarily close to the optimal solution for quadratic objectives, and to a noise ball asymptotically smaller than low precision SGD in strongly convex settings.

**On Exact Computation with an Infinitely Wide Neural Net**

How well does a classic deep net architecture like AlexNet or VGG19 classify on a standard dataset such as CIFAR-10 when its ‘width’ — namely, number of channels in convolutional layers, and number of nodes in fully-connected internal layers — is allowed to increase to infinity? Such questions have come to the forefront in the quest to theoretically understand deep learning and its mysteries about optimization and generalization. They also connect deep learning to notions such as Gaussian processes and kernels. A recent paper [Jacot et al., 2018] introduced the Neural Tangent Kernel (NTK) which captures the behavior of fully-connected deep nets in the infinite width limit trained by gradient descent; this object was implicit in some other recent papers. A subsequent paper [Lee et al., 2019] gave heuristic Monte Carlo methods to estimate the NTK and its extension, Convolutional Neural Tangent Kernel (CNTK) and used this to try to understand the limiting behavior on datasets like CIFAR-10. The current paper gives the first efficient exact algorithm (based upon dynamic programming) for computing CNTK as well as an efficient GPU implementation of this algorithm. This results in a significant new benchmark for performance of a pure kernel-based method on CIFAR-10, being 10% higher than the methods reported in [Novak et al., 2019], and only 5% lower than the performance of the corresponding finite deep net architecture (once batch normalization etc. are turned off). We give the first non-asymptotic proof showing that a fully-trained sufficiently wide net is indeed equivalent to the kernel regression predictor using NTK. Our experiments also demonstrate that earlier Monte Carlo approximation can degrade the performance significantly, thus highlighting the power of our exact kernel computation, which we have applied even to the full CIFAR-10 dataset and 20-layer nets.

**Lifting AutoEncoders: Unsupervised Learning of a Fully-Disentangled 3D Morphable Model using Deep Non-Rigid Structure from Motion**

In this work we introduce Lifting Autoencoders, a generative 3D surface-based model of object categories. We bring together ideas from non-rigid structure from motion, image formation, and morphable models to learn a controllable, geometric model of 3D categories in an entirely unsupervised manner from an unstructured set of images. We exploit the 3D geometric nature of our model and use normal information to disentangle appearance into illumination, shading and albedo. We further use weak supervision to disentangle the non-rigid shape variability of human faces into identity and expression. We combine the 3D representation with a differentiable renderer to generate RGB images and append an adversarially trained refinement network to obtain sharp, photorealistic image reconstruction results. The learned generative model can be controlled in terms of interpretable geometry and appearance factors, allowing us to perform photorealistic image manipulation of identity, expression, 3D pose, and illumination properties.

**CoachAI: A Conversational Agent Assisted Health Coaching Platform**

Poor lifestyle represents a health risk factor and is the leading cause of morbidity and chronic conditions. The impact of poor lifestyle can be significantly altered by individual behavior change. Although the current shift in healthcare towards a long lasting modifiable behavior, however, with increasing caregiver workload and individuals’ continuous needs of care, there is a need to ease caregiver’s work while ensuring continuous interaction with users. This paper describes the design and validation of CoachAI, a conversational agent assisted health coaching system to support health intervention delivery to individuals and groups. CoachAI instantiates a text based healthcare chatbot system that bridges the remote human coach and the users. This research provides three main contributions to the preventive healthcare and healthy lifestyle promotion: (1) it presents the conversational agent to aid the caregiver; (2) it aims to decrease caregiver’s workload and enhance care given to users, by handling (automating) repetitive caregiver tasks; and (3) it presents a domain independent mobile health conversational agent for health intervention delivery. We will discuss our approach and analyze the results of a one month validation study on physical activity, healthy diet and stress management.