Large Scale Product Categorization using Structured and Unstructured Attributes

Product categorization using text data for eCommerce is a very challenging extreme classification problem with several thousands of classes and several millions of products to classify. Even though multi-class text classification is a well studied problem both in academia and industry, most approaches either deal with treating product content as a single pile of text, or only consider a few product attributes for modelling purposes. Given the variety of products sold on popular eCommerce platforms, it is hard to consider all available product attributes as part of the modeling exercise, considering that products possess their own unique set of attributes based on category. In this paper, we compare hierarchical models to flat models and show that in specific cases, flat models perform better. We explore two Deep Learning based models that extract features from individual pieces of unstructured data from each product and then combine them to create a product signature. We also propose a novel idea of using structured attributes and their values together in an unstructured fashion along with convolutional filters such that the ordering of the attributes and the differing attributes by product categories no longer becomes a modelling challenge. This approach is also more robust to the presence of faulty product attribute names and values and can elegantly generalize to use both closed list and open list attributes.

Automatic Ontology Learning from Domain-Specific Short Unstructured Text Data

Ontology learning is a critical task in industry: it identifies and extracts concepts captured in text data so that these concepts can be used in different tasks, e.g. information retrieval. Ontology learning is non-trivial for several reasons, and there is only a limited amount of prior research on automatically learning a domain-specific ontology from data. In our work, we propose a two-stage classification system to automatically learn an ontology from unstructured text data. We first collect candidate concepts, which are classified into concepts and irrelevant collocates by our first classifier. The concepts from the first classifier are further classified by the second classifier into different concept types. The proposed system is deployed as a prototype at a company and its performance is validated by using complaint and repair verbatim data collected in the automotive industry from different data sources.

Auto-Vectorizing TensorFlow Graphs: Jacobians, Auto-Batching And Beyond

We propose a static loop vectorization optimization on top of the high-level dataflow IR used by frameworks like TensorFlow. A new statically vectorized parallel-for abstraction is provided on top of TensorFlow, and used for applications ranging from auto-batching and per-example gradients, to Jacobian computation, optimized map functions and input pipeline optimization. We report huge speedups compared to both loop-based implementations and the run-time batching adopted by the DyNet framework.

A cooperative game for automated learning of elasto-plasticity knowledge graphs and models with AI-guided experimentation

We introduce a multi-agent meta-modeling game to generate data, knowledge, and models that make predictions on constitutive responses of elasto-plastic materials. We introduce a new concept from graph theory where a modeler agent is tasked with evaluating all the modeling options recast as a directed multigraph and finding the optimal path that links the source of the directed graph (e.g. strain history) to the target (e.g. stress) measured by an objective function. Meanwhile, the data agent, which is tasked with generating data from real or virtual experiments (e.g. molecular dynamics, discrete element simulations), interacts with the modeler agent sequentially and uses reinforcement learning to design new experiments to optimize the prediction capacity. Consequently, this treatment enables us to emulate an idealized scientific collaboration as selections of the optimal choices in a decision tree search done automatically via deep reinforcement learning.

RESTORE: Automated Regression Testing for Datasets

In data mining, the data in various business cases (e.g., sales, marketing, and demography) gets refreshed periodically. During the refresh, the old dataset is replaced by a new one. Confirming the quality of the new dataset can be challenging because changes are inevitable. How do analysts distinguish reasonable real-world changes from errors related to data capture or data transformation? While some of the errors are easy to spot, others may be more subtle. In order to detect such errors, an analyst will typically have to examine the data manually and assess whether the data produced are ‘believable’. Due to the scale of data, such examination is tedious and laborious. Thus, to save the analyst’s time, it is important to detect these errors automatically. However, both the literature and the industry are still lacking methods to assess the difference between old and new versions of a dataset during the refresh process. In this paper, we present a comprehensive set of tests for the detection of abnormalities in a refreshed dataset, based on the information obtained from a previous vintage of the dataset. We implement these tests in an automated test harness made available as an open-source package, called RESTORE, for the R language. The harness accepts flat or hierarchical numeric datasets. We also present a validation case study, where we apply our test harness to hierarchical demographic datasets. The results of the study and feedback from data scientists using the package suggest that RESTORE enables fast and efficient detection of errors in the data and decreases the cost of testing.

Transparency, Fairness, Data Protection, Neutrality: Data Management Challenges in the Face of New Regulation

The data revolution continues to transform every sector of science, industry and government. Due to the incredible impact of data-driven technology on society, we are becoming increasingly aware of the imperative to use data and algorithms responsibly — in accordance with laws and ethical norms. In this article we discuss three recent regulatory frameworks: the European Union’s General Data Protection Regulation (GDPR), the New York City Automated Decisions Systems (ADS) Law, and the Net Neutrality principle, that aim to protect the rights of individuals who are impacted by data collection and analysis. These frameworks are prominent examples of a global trend: Governments are starting to recognize the need to regulate data-driven algorithmic technology. Our goal in this paper is to bring these regulatory frameworks to the attention of the data management community, and to underscore the technical challenges they raise and which we, as a community, are well-equipped to address. The main take-away of this article is that legal and ethical norms cannot be incorporated into data-driven systems as an afterthought. Rather, we must think in terms of responsibility by design, viewing it as a systems requirement.

RoPAD: Robust Presentation Attack Detection through Unsupervised Adversarial Invariance

For enterprise, personal and societal applications, there is now an increasing demand for automated authentication of identity from images using computer vision. However, current authentication technologies are still vulnerable to presentation attacks. We present RoPAD, an end-to-end deep learning model for presentation attack detection that employs unsupervised adversarial invariance to ignore visual distractors in images for increased robustness and reduced overfitting. Experiments show that the proposed framework exhibits state-of-the-art performance on presentation attack detection on several benchmark datasets.

Everything old is new again: A multi-view learning approach to learning using privileged information and distillation

We adopt a multi-view approach for analyzing two knowledge transfer settings, learning using privileged information (LUPI) and distillation, in a common framework. Under reasonable assumptions about the complexities of hypothesis spaces, and being optimistic about the expected loss achievable by the student (in distillation) and a transformed teacher predictor (in LUPI), we show that encouraging agreement between the teacher and the student leads to a reduced search space. As a result, an improved convergence rate can be obtained with regularized empirical risk minimization.

Image Privacy Prediction Using Deep Neural Networks

Images today are increasingly shared online on social networking sites such as Facebook, Flickr, Foursquare, and Instagram. Although current social networking sites allow users to change their privacy preferences, this is often a cumbersome task for the vast majority of users on the Web, who face difficulties in assigning and managing privacy settings. Thus, automatically predicting images’ privacy to warn users about private or sensitive content before uploading these images on social networking sites has become a necessity in our current interconnected world. In this paper, we explore learning models to automatically predict appropriate images’ privacy as private or public using carefully identified image-specific features. We study deep visual semantic features that are derived from various layers of Convolutional Neural Networks (CNNs) as well as textual features such as user tags and deep tags generated from deep CNNs. In particular, we extract deep (visual and tag) features from four pre-trained CNN architectures for object recognition, i.e., AlexNet, GoogLeNet, VGG-16, and ResNet, and compare their performance for image privacy prediction. Results of our experiments on a Flickr dataset of over thirty thousand images show that the learning models trained on features extracted from ResNet outperform the state-of-the-art models for image privacy prediction. We further investigate the combination of user tags and deep tags derived from CNN architectures using two settings: (1) SVM on the bag-of-tags features; and (2) text-based CNN. Our results show that even though the models trained on the visual features perform better than those trained on the tag features, the combination of deep visual features with image tags shows improvements in performance over the individual feature sets.

Jointly Learning Explainable Rules for Recommendation with Knowledge Graph

Explainability and effectiveness are two key aspects for building recommender systems. Prior efforts mostly focus on incorporating side information to achieve better recommendation performance. However, these methods have some weaknesses: (1) predictions of neural network-based embedding methods are hard to explain and debug; (2) symbolic, graph-based approaches (e.g., meta path-based models) require manual effort and domain knowledge to define patterns and rules, and ignore the item association types (e.g. substitutable and complementary). In this paper, we propose a novel joint learning framework to integrate induction of explainable rules from a knowledge graph with construction of a rule-guided neural recommendation model. The framework encourages two modules to complement each other in generating effective and explainable recommendations: 1) inductive rules, mined from item-centric knowledge graphs, summarize common multi-hop relational patterns for inferring different item associations and provide human-readable explanations for model predictions; 2) the recommendation module can be augmented by induced rules and thus generalizes better when dealing with the cold-start issue. Extensive experiments (code and data can be found at https://…/RuleRec) show that our proposed method achieves significant improvements in item recommendation over baselines on real-world datasets. Our model demonstrates robust performance over ‘noisy’ item knowledge graphs, generated by linking item names to related entities.

Machine Learning Based Prediction and Classification of Computational Jobs in Cloud Computing Centers

With the rapid growth of data volume and the fast-increasing complexity of computational models in cloud computing, handling users’ requests by scheduling computational jobs and assigning resources in the data center has become an important topic. To better understand computing jobs and their resource requests, we analyze their characteristics and focus on the prediction and classification of computing jobs using machine learning approaches. Specifically, we apply an LSTM neural network to predict the arrival of jobs and the aggregated requests for computing resources. We evaluate it on the Google Cluster dataset and show that accuracy improves compared to existing methods. Additionally, to better understand the computing jobs, we use an unsupervised hierarchical clustering algorithm, BIRCH, to classify them and gain some interpretability of our results in the computing centers.

Mutual Clustering on Comparative Texts via Heterogeneous Information Networks

Currently, many intelligence systems contain texts from multiple sources, e.g., bulletin board system (BBS) posts, tweets and news. These texts can be “comparative” since they may be semantically correlated and thus provide us with different perspectives toward the same topics or events. To better organize the multi-sourced texts and obtain more comprehensive knowledge, we propose to study the novel problem of Mutual Clustering on Comparative Texts (MCCT), which aims to cluster the comparative texts simultaneously and collaboratively. The MCCT problem is difficult to address because 1) comparative texts usually present different data formats and structures and thus they are hard to organize, and 2) there is no effective method to connect the semantically correlated comparative texts to facilitate clustering them in a unified way. To this end, in this paper we propose a Heterogeneous Information Network-based Text clustering framework, HINT. HINT first models multi-sourced texts (e.g. news and tweets) as heterogeneous information networks by introducing shared “anchor texts” to connect the comparative texts. Next, two similarity matrices based on HINT as well as a transition matrix for cross-text-source knowledge transfer are constructed. Clustering of the comparative texts is then conducted using the constructed matrices. Finally, a mutual clustering algorithm is also proposed to further unify the separate clustering results of the comparative texts by introducing a clustering consistency constraint. We conduct extensive experiments on three tweets-news datasets, and the results demonstrate the effectiveness and robustness of the proposed method in addressing the MCCT problem.

Interpreting and Understanding Graph Convolutional Neural Network using Gradient-based Attribution Methods

To address the difficulty that convolutional neural networks (CNNs) have in processing non-image relational data, Kipf et al. proposed the graph convolutional neural network (GCN). The core idea is to perform two-fold information fusion for each node in a given graph during each iteration: the fusion of graph structure information and the fusion of node feature dimensions. GCN has been widely used in scene semantic relationship analysis, natural language processing, and few-shot learning because of its ability to combine generalization. However, because its two-fold information fusion involves mathematically irreversible calculations, it is hard for GCN to explain the reason behind each node classification prediction (i.e. attribution analysis). Existing attribution analysis methods cannot be directly applied to GCN because, whereas CNN inputs are independent of one another, GCN inputs are correlated. As a result, existing attribution methods obtain only the partial contribution of the target node's features to the final decision of the GCN; the complete contribution and the contribution of neighbor node features cannot be obtained. To this end, we propose a gradient attribution analysis method for GCN, NAM (Node Attribution Method), which obtains the contribution of the target node and its neighbor nodes to the GCN output. We also propose the NIV (Node Importance Visualization) method to visualize the target node of the GCN and its neighbor nodes based on their contribution values. We use perturbation analysis to verify the effect of NAM on a citation network dataset. The experimental results show that NAM can effectively learn the contribution of each node to the node classification prediction.
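
The two-fold fusion described above can be illustrated with a single GCN layer. The following is a minimal pure-Python sketch, assuming the commonly cited propagation rule of Kipf and Welling, ReLU(D^{-1/2} (A + I) D^{-1/2} H W); the helper names `matmul` and `gcn_layer` and the toy graph are hypothetical, and the sketch does not implement NAM or NIV.

```python
import math

def matmul(a, b):
    """Plain nested-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(adj, feats, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 @ H @ W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Fusion 1: graph structure (aggregate features over neighbors).
    aggregated = matmul(norm, feats)
    # Fusion 2: feature dimensions (linear map), followed by ReLU.
    projected = matmul(aggregated, weight)
    return [[max(0.0, v) for v in row] for row in projected]

# Hypothetical toy graph: two nodes joined by one edge.
adj = [[0.0, 1.0], [1.0, 0.0]]
feats = [[1.0, 0.0], [0.0, 1.0]]
weight = [[2.0], [4.0]]
out = gcn_layer(adj, feats, weight)
```

Because each output row mixes the features of a node with those of its neighbors, a gradient taken with respect to the target node's own features captures only part of the contribution, which is the gap the abstract points to.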

Logic Rules Powered Knowledge Graph Embedding

Large scale knowledge graph embedding has attracted much attention from both academia and industry in the field of Artificial Intelligence. However, most existing methods concentrate solely on fact triples contained in the given knowledge graph. Inspired by the fact that logic rules can provide a flexible and declarative language for expressing rich background knowledge, it is natural to integrate logic rules into knowledge graph embedding, to transfer human knowledge to entity and relation embedding, and strengthen the learning process. In this paper, we propose a novel logic rule-enhanced method which can be easily integrated with any translation-based knowledge graph embedding model, such as TransE. We first introduce a method to automatically mine the logic rules and corresponding confidences from the triples. Then, to put both triples and mined logic rules within the same semantic space, all triples in the knowledge graph are represented as first-order logic. Finally, we define several operations on the first-order logic and minimize a global loss over both the mined logic rules and the transformed first-order logics. We conduct extensive experiments for link prediction and triple classification on three datasets: WN18, FB166, and FB15K. Experiments show that the rule-enhanced method can significantly improve the performance of several baselines. The highlight of our model is that the filtered Hits@1, which is a pivotal evaluation in the knowledge inference task, has a significant improvement (up to 700% improvement).
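
For readers unfamiliar with translation-based embedding, the following minimal sketch shows the standard TransE idea that a true triple (h, r, t) should satisfy h + r ≈ t, together with a margin ranking loss on a corrupted triple. The embeddings, entity names, and function names are hypothetical and hand-picked for illustration; the rule mining and first-order logic operations proposed in the paper are not shown.

```python
import math

def transe_score(head, relation, tail):
    """L2 distance ||h + r - t||; a lower score means a more plausible triple."""
    return math.sqrt(sum((h + r - t) ** 2
                         for h, r, t in zip(head, relation, tail)))

def margin_loss(pos_score, neg_score, margin=1.0):
    """Margin ranking loss: zero once the corrupted triple is at least
    `margin` farther away than the true triple."""
    return max(0.0, margin + pos_score - neg_score)

# Hypothetical 2-d embeddings, chosen by hand (not learned).
paris = [1.0, 0.0]
tokyo = [3.0, 2.0]
capital_of = [0.0, 1.0]
france = [1.0, 1.0]

pos = transe_score(paris, capital_of, france)  # true triple
neg = transe_score(tokyo, capital_of, france)  # corrupted head entity
loss = margin_loss(pos, neg)
```

In this toy setting the true triple scores exactly zero and the corrupted one is far enough away that the margin loss vanishes; during training, such a loss would be summed over many sampled pairs.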

Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search

Achieving a good speed and accuracy trade-off on the target platform is very important in deploying deep neural networks. Most existing automatic architecture search approaches only pursue high performance but ignore such an important factor. In this work, we propose an algorithm, ‘Partial Order Pruning’, to prune the architecture search space with a partial order assumption, quickly lift the boundary of the speed/accuracy trade-off on the target platform, and automatically search for the architecture with the best speed and accuracy trade-off. Our algorithm explicitly takes profile information about the inference speed on the target platform into consideration. With the proposed algorithm, we present several ‘Dongfeng’ networks that provide high accuracy and fast inference speed on various application GPU platforms. By further searching the decoder architecture, our DF-Seg real-time segmentation models yield state-of-the-art speed/accuracy trade-offs on both an embedded device and a high-end GPU.

Program Classification Using Gated Graph Attention Neural Network for Online Programming Service

Online programming services, such as GitHub, TopCoder, and EduCoder, have promoted a lot of social interaction among their users. However, the existing social interactions are rather limited and inefficient due to the rapid increase in source-code repositories, which are difficult to explore manually. The emergence of source-code mining provides a promising way to analyze source code, so that it becomes relatively easy to understand and share among service users. Among all the source-code mining attempts, program classification lays a foundation for various tasks related to source-code understanding, because it is impossible for a machine to understand a computer program if it cannot classify the program correctly. Although numerous machine learning models, such as Natural Language Processing (NLP) based models and Abstract Syntax Tree (AST) based models, have been proposed to classify computer programs based on their source code, existing works cannot fully characterize the source code from the perspective of both syntactic and semantic information. To address this problem, we propose a Graph Neural Network (GNN) based model, which integrates data flow and function call information into the AST, and applies an improved GNN model to the integrated graph, so as to achieve state-of-the-art program classification accuracy. The experimental results show that the proposed work can classify programs with accuracy over 97%.

Successive Over Relaxation Q-Learning

In a discounted reward Markov Decision Process (MDP), the objective is to find the optimal value function, i.e., the value function corresponding to an optimal policy. This problem reduces to solving a functional equation known as the Bellman equation, and a fixed point iteration scheme known as value iteration is utilized to obtain the solution. In [1], a successive over relaxation value iteration scheme is proposed to speed up the computation of the optimal value function. They propose a modified Bellman equation and prove faster convergence to the optimal value function. However, in many practical applications, the model information is not known and we resort to Reinforcement Learning (RL) algorithms to obtain an optimal policy and value function. One such popular algorithm is Q-Learning. In this paper, we propose Successive Over Relaxation (SOR) Q-Learning. We first derive a fixed point iteration for optimal Q-values based on [1] and utilize the stochastic approximation scheme to derive a learning algorithm to compute the optimal value function and an optimal policy. We then prove the convergence of SOR Q-Learning to the optimal Q-values. Finally, through numerical experiments, we show that SOR Q-Learning is faster than the Q-Learning algorithm.
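
As background for the fixed point iteration mentioned above, here is a minimal sketch of plain value iteration on a small MDP. The transition table layout, the function name `value_iteration`, and the two-state example are hypothetical; the successive over relaxation modification and the Q-Learning variant proposed in the paper are not shown.

```python
def value_iteration(transitions, gamma, tol=1e-10, max_iters=10_000):
    """Iterate V <- max_a sum_s' p * (r + gamma * V[s']) until it stabilizes.

    transitions[s][a] is a list of (probability, next_state, reward) tuples.
    """
    values = [0.0] * len(transitions)
    for _ in range(max_iters):
        new_values = [
            max(sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions)
            for actions in transitions
        ]
        if max(abs(a - b) for a, b in zip(new_values, values)) < tol:
            return new_values
        values = new_values
    return values

# Hypothetical deterministic MDP with two states and two actions each.
transitions = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],  # state 0: stay (reward 0) or move to 1 (reward 1)
    [[(1.0, 1, 2.0)], [(1.0, 0, 0.0)]],  # state 1: stay (reward 2) or move to 0 (reward 0)
]
optimal_values = value_iteration(transitions, gamma=0.5)
```

For this example the Bellman equation can be solved by hand: staying in state 1 forever is worth 2 / (1 - 0.5) = 4, and moving from state 0 to state 1 is worth 1 + 0.5 * 4 = 3, which the iteration approaches.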

Interpolation Consistency Training for Semi-Supervised Learning

We introduce Interpolation Consistency Training (ICT), a simple and computationally efficient algorithm for training Deep Neural Networks in the semi-supervised learning paradigm. ICT encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of the predictions at those points. In classification problems, ICT moves the decision boundary to low-density regions of the data distribution. Our experiments show that ICT achieves state-of-the-art performance when applied to standard neural network architectures on the CIFAR-10 and SVHN benchmark datasets.
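
The consistency term described above can be sketched in a few lines. This is a simplified illustration, assuming a mean squared error penalty and using the same model on both sides of the comparison; the function names and toy models are hypothetical, and details of the actual training procedure (how the mixing coefficient is sampled, how targets are produced) are left out.

```python
def mix(a, b, lam):
    """Element-wise interpolation lam * a + (1 - lam) * b."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]

def ict_consistency_loss(model, u1, u2, lam):
    """Mean squared gap between the prediction at the interpolated input
    and the interpolation of the two individual predictions."""
    pred_at_mix = model(mix(u1, u2, lam))
    mix_of_preds = mix(model(u1), model(u2), lam)
    return sum((p - q) ** 2
               for p, q in zip(pred_at_mix, mix_of_preds)) / len(pred_at_mix)

# Two hypothetical unlabeled points and two toy models.
u1, u2 = [1.0, 2.0], [3.0, -1.0]
linear_model = lambda x: [2.0 * x[0] + 3.0 * x[1]]
curved_model = lambda x: [x[0] ** 2]

linear_loss = ict_consistency_loss(linear_model, u1, u2, lam=0.25)
curved_loss = ict_consistency_loss(curved_model, u1, u2, lam=0.25)
```

A model that is linear between the two points incurs no penalty, while a model that bends between them is penalized, which is the sense in which the term pushes predictions toward linear behavior between unlabeled samples.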

Functional Principal Component Analysis for Extrapolating Multi-stream Longitudinal Data

The advance of modern sensor technologies enables collection of multi-stream longitudinal data where multiple signals from different units are collected in real-time. In this article, we present a non-parametric approach to predict the evolution of multi-stream longitudinal data for an in-service unit through borrowing strength from other historical units. Our approach first decomposes each stream into a linear combination of eigenfunctions and their corresponding functional principal component (FPC) scores. A Gaussian process prior for the FPC scores is then established based on a functional semi-metric that measures similarities between streams of historical units and the in-service unit. Finally, an empirical Bayesian updating strategy is derived to update the established prior using real-time stream data obtained from the in-service unit. Experiments on synthetic and real world data show that the proposed framework outperforms state-of-the-art approaches and can effectively account for heterogeneity as well as achieve high predictive accuracy.

DeepPool: Distributed Model-free Algorithm for Ride-sharing using Deep Reinforcement Learning

The success of modern ride-sharing platforms crucially depends on the profit of the ride-sharing fleet operating companies, and how efficiently the resources are managed. Further, ride-sharing allows sharing costs and, hence, reduces congestion and emissions by making better use of vehicle capacities. In this work, we develop a distributed model-free algorithm, DeepPool, that uses deep Q-network (DQN) techniques to learn optimal dispatch policies by interacting with the environment. Further, DeepPool efficiently incorporates travel demand statistics and deep learning models to manage dispatching vehicles for improved ride-sharing services. Using a real-world dataset of taxi trip records in New York City, we show that DeepPool performs better than other strategies proposed in the literature that do not consider ride sharing or do not dispatch the vehicles to regions where future demand is anticipated. Finally, DeepPool can adapt rapidly to dynamic environments since it is implemented in a distributed manner in which each vehicle solves its own DQN individually without coordination.

GNN Explainer: A Tool for Post-hoc Explanation of Graph Neural Networks

Graph Neural Networks (GNNs) are a powerful tool for machine learning on graphs. GNNs combine node feature information with the graph structure by using neural networks to pass messages through edges in the graph. However, incorporating both graph structure and feature information leads to complex non-linear models, and explaining predictions made by GNNs remains a challenging task. Here we propose GnnExplainer, a general model-agnostic approach for providing interpretable explanations for predictions of any GNN-based model on any graph-based machine learning task (node and graph classification, link prediction). In order to explain a given node’s predicted label, GnnExplainer provides a local interpretation by highlighting relevant features as well as an important subgraph structure by identifying the edges that are most relevant to the prediction. Additionally, the model provides single-instance explanations when given a single prediction as well as multi-instance explanations that aim to explain predictions for an entire class of instances/nodes. We formalize GnnExplainer as an optimization task that maximizes the mutual information between the prediction of the full model and the prediction of a simplified explainer model. We experiment on synthetic as well as real-world data. On synthetic data we demonstrate that our approach is able to highlight relevant topological structures from noisy graphs. We also demonstrate that GnnExplainer provides a better understanding of pre-trained models on real-world tasks. GnnExplainer provides a variety of benefits, from the identification of semantically relevant structures to explain predictions to providing guidance when debugging faulty graph neural network models.

Rectangular Bounding Process

Stochastic partition models divide a multi-dimensional space into a number of rectangular regions, such that the data within each region exhibit certain types of homogeneity. Due to the nature of their partition strategy, existing partition models may create many unnecessary divisions in sparse regions when trying to describe data in dense regions. To avoid this problem we introduce a new parsimonious partition model, the Rectangular Bounding Process (RBP), to efficiently partition multi-dimensional spaces, by employing a bounding strategy to enclose data points within rectangular bounding boxes. Unlike existing approaches, the RBP possesses several attractive theoretical properties that make it a powerful nonparametric partition prior on a hypercube. In particular, the RBP is self-consistent and as such can be directly extended from a finite hypercube to infinite (unbounded) space. We apply the RBP to regression trees and relational models as a flexible partition prior. The experimental results validate the merit of the RBP in rich yet parsimonious expressiveness compared to the state-of-the-art methods.

Fair Logistic Regression: An Adversarial Perspective

Fair prediction methods have primarily been built around existing classification techniques using pre-processing methods, post-hoc adjustments, reduction-based constructions, or deep learning procedures. We investigate a new approach to fair data-driven decision making by designing predictors with fairness requirements integrated into their core formulations. We augment a game-theoretic construction of the logistic regression model with fairness constraints, producing a novel prediction model that robustly and fairly minimizes the logarithmic loss. We demonstrate the advantages of our approach on a range of benchmark datasets for fairness.

Machine Learning Meets Quantitative Planning: Enabling Self-Adaptation in Autonomous Robots

Modern cyber-physical systems (e.g., robotics systems) are typically composed of physical and software components, the characteristics of which are likely to change over time. Assumptions about parts of the system made at design time may not hold at run time, especially when a system is deployed for long periods (e.g., over decades). Self-adaptation is designed to find reconfigurations of systems to handle such run-time inconsistencies. Planners can be used to find and enact optimal reconfigurations in such an evolving context. However, for systems that are highly configurable, such planning becomes intractable due to the size of the adaptation space. To overcome this challenge, in this paper we explore an approach that (a) uses machine learning to find Pareto-optimal configurations without needing to explore every configuration and (b) restricts the search space to such configurations to make planning tractable. We explore this in the context of robot missions that need to consider task timeliness and energy consumption. An independent evaluation shows that our approach results in high-quality adaptation plans in uncertain and adversarial environments.
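
To make the notion of Pareto-optimal configurations concrete, the following sketch filters a set of configurations scored on two objectives to be minimized, task time and energy. The configuration names and numbers are hypothetical, and the sketch enumerates every configuration, which is exactly what the approach in the paper uses machine learning to avoid; it only illustrates the dominance relation that defines the restricted search space.

```python
def dominates(a, b):
    """a dominates b when a is no worse on every objective and strictly
    better on at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(configs):
    """Keep only configurations that no other configuration dominates."""
    return {
        name: cost
        for name, cost in configs.items()
        if not any(dominates(other, cost) for other in configs.values())
    }

# Hypothetical robot configurations: (mission time in s, energy in Wh).
configs = {
    "fast": (100.0, 50.0),
    "eco": (160.0, 30.0),
    "balanced": (120.0, 40.0),
    "wasteful": (150.0, 55.0),
}
front = pareto_front(configs)
```

Here "wasteful" is dropped because "fast" is both quicker and cheaper in energy, while the other three remain because each trades time against energy; a planner would then search only among the retained configurations.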

Asynchronous Federated Optimization

Federated learning enables training on a massive number of edge devices. To improve flexibility and scalability, we propose a new asynchronous federated optimization algorithm. We prove that the proposed approach has near-linear convergence to a global optimum, for both strongly and non-strongly convex problems, as well as a restricted family of non-convex problems. Empirical results show that the proposed algorithm converges fast and tolerates staleness.

DeepTagRec: A Content-cum-User based Tag Recommendation Framework for Stack Overflow

In this paper, we develop a content-cum-user based deep learning framework DeepTagRec to recommend appropriate question tags on Stack Overflow. The proposed system learns the content representation from question title and body. Subsequently, the learnt representation from heterogeneous relationship between user and tags is fused with the content representation for the final tag prediction. On a very large-scale dataset comprising half a million question posts, DeepTagRec beats all the baselines; in particular, it significantly outperforms the best performing baseline TagCombine achieving an overall gain of 60.8% and 36.8% in precision@3 and recall@10 respectively. DeepTagRec also achieves 63% and 33.14% maximum improvement in exact-k accuracy and top-k accuracy respectively over TagCombine.

From Low-Level Events to Activities — A Session-Based Approach (Extended Version)

Process-Mining techniques aim to use event data about past executions to gain insight into how processes are executed. While these techniques have proven to be very valuable, they are less successful in reaching their goal if the process is flexible and, hence, events can potentially occur in any order. Furthermore, information systems can record events at a very low level, which do not match the high-level concepts known at the business level. Without abstracting sequences of events to high-level concepts, the results of applying process mining (e.g., discovered models) easily become very complex and difficult to interpret, which ultimately means that they are of little use. A large body of research exists on event abstraction, but typically a large amount of domain knowledge is required to be fed in, which is often not readily available. Other abstraction techniques are unsupervised, which gives lower accuracy. This paper puts forward a technique that requires limited domain knowledge that can be easily provided. Traces are divided into sessions, and each session is abstracted as one single high-level activity execution. The abstraction is based on a combination of automatic clustering and visualization methods. The technique was assessed on two case studies that evidently exhibit a large amount of behavior. The results clearly illustrate the benefits of the abstraction to convey knowledge to stakeholders.