**GENO — GENeric Optimization for Classical Machine Learning**

Although optimization is the longstanding algorithmic backbone of machine learning, new models still require the time-consuming implementation of new solvers. As a result, there are thousands of implementations of optimization algorithms for machine learning problems. A natural question is, if it is always necessary to implement a new solver, or if there is one algorithm that is sufficient for most models. Common belief suggests that such a one-algorithm-fits-all approach cannot work, because this algorithm cannot exploit model specific structure and thus cannot be efficient and robust on a wide variety of problems. Here, we challenge this common belief. We have designed and implemented the optimization framework GENO (GENeric Optimization) that combines a modeling language with a generic solver. GENO generates a solver from the declarative specification of an optimization problem class. The framework is flexible enough to encompass most of the classical machine learning problems. We show on a wide variety of classical but also some recently suggested problems that the automatically generated solvers are (1) as efficient as well-engineered specialized solvers, (2) more efficient by a decent margin than recent state-of-the-art solvers, and (3) orders of magnitude more efficient than classical modeling language plus solver approaches.

**A multi-series framework for demand forecasts in E-commerce**

Sales forecasts are crucial for the E-commerce business. State-of-the-art techniques typically apply only univariate methods to make prediction for each series independently. However, due to the short nature of sales times series in E-commerce, univariate methods don’t apply well. In this article, we propose a global model which outperforms state-of-the-art models on real dataset. It is achieved by using Tree Boosting Methods that exploit non-linearity and cross-series information. We also proposed a preprocessing framework to overcome the inherent difficulties in the E-commerce data. In particular, we use different schemes to limit the impact of the volatility of the data.

**Time Series Anomaly Detection Using Convolutional Neural Networks and Transfer Learning**

Time series anomaly detection plays a critical role in automated monitoring systems. Most previous deep learning efforts related to time series anomaly detection were based on recurrent neural networks (RNN). In this paper, we propose a time series segmentation approach based on convolutional neural networks (CNN) for anomaly detection. Moreover, we propose a transfer learning framework that pretrains a model on a large-scale synthetic univariate time series data set and then fine-tunes its weights on small-scale, univariate or multivariate data sets with previously unseen classes of anomalies. For the multivariate case, we introduce a novel network architecture. The approach was tested on multiple synthetic and real data sets successfully.

**GSN: A Graph-Structured Network for Multi-Party Dialogues**

Existing neural models for dialogue response generation assume that utterances are sequentially organized. However, many real-world dialogues involve multiple interlocutors (i.e., multi-party dialogues), where the assumption does not hold as utterances from different interlocutors can occur ‘in parallel.’ This paper generalizes existing sequence-based models to a Graph-Structured neural Network (GSN) for dialogue modeling. The core of GSN is a graph-based encoder that can model the information flow along the graph-structured dialogues (two-party sequential dialogues are a special case). Experimental results show that GSN significantly outperforms existing sequence-based models.

**Augmenting Transfer Learning with Semantic Reasoning**

Transfer learning aims at building robust prediction models by transferring knowledge gained from one problem to another. In the semantic Web, learning tasks are enhanced with semantic representations. We exploit their semantics to augment transfer learning by dealing with when to transfer with semantic measurements and what to transfer with semantic embeddings. We further present a general framework that integrates the above measurements and embeddings with existing transfer learning algorithms for higher performance. It has demonstrated to be robust in two real-world applications: bus delay forecasting and air quality forecasting.

**Explainability Techniques for Graph Convolutional Networks**

Graph Networks are used to make decisions in potentially complex scenarios but it is usually not obvious how or why they made them. In this work, we study the explainability of Graph Network decisions using two main classes of techniques, gradient-based and decomposition-based, on a toy dataset and a chemistry task. Our study sets the ground for future development as well as application to real-world problems.

**Do Human Rationales Improve Machine Explanations?**

Work on ‘learning with rationales’ shows that humans providing explanations to a machine learning system can improve the system’s predictive accuracy. However, this work has not been connected to work in ‘explainable AI’ which concerns machines explaining their reasoning to humans. In this work, we show that learning with rationales can also improve the quality of the machine’s explanations as evaluated by human judges. Specifically, we present experiments showing that, for CNN- based text classification, explanations generated using ‘supervised attention’ are judged superior to explanations generated using normal unsupervised attention.

**Pre-Training Graph Neural Networks for Generic Structural Feature Extraction**

Graph neural networks (GNNs) are shown to be successful in modeling applications with graph structures. However, training an accurate GNN model requires a large collection of labeled data and expressive features, which might be inaccessible for some applications. To tackle this problem, we propose a pre-training framework that captures generic graph structural information that is transferable across tasks. Our framework can leverage the following three tasks: 1) denoising link reconstruction, 2) centrality score ranking, and 3) cluster preserving. The pre-training procedure can be conducted purely on the synthetic graphs, and the pre-trained GNN is then adapted for downstream applications. With the proposed pre-training procedure, the generic structural information is learned and preserved, thus the pre-trained GNN requires less amount of labeled data and fewer domain-specific features to achieve high performance on different downstream tasks. Comprehensive experiments demonstrate that our proposed framework can significantly enhance the performance of various tasks at the level of node, link, and graph.

**End to end learning and optimization on graphs**

Real-world applications often combine learning and optimization problems on graphs. For instance, our objective may be to cluster the graph in order to detect meaningful communities (or solve other common graph optimization problems such as facility location, maxcut, and so on). However, graphs or related attributes are often only partially observed, introducing learning problems such as link prediction which must be solved prior to optimization. We propose an approach to integrate a differentiable proxy for common graph optimization problems into training of machine learning models for tasks such as link prediction. This allows the model to focus specifically on the downstream task that its predictions will be used for. Experimental results show that our end-to-end system obtains better performance on example optimization tasks than can be obtained by combining state of the art link prediction methods with expert-designed graph optimization algorithms.

**High Dimensional Classification via Empirical Risk Minimization: Improvements and Optimality**

In this article, we investigate a family of classification algorithms defined by the principle of empirical risk minimization, in the high dimensional regime where the feature dimension and data number are both large and comparable. Based on recent advances in high dimensional statistics and random matrix theory, we provide under mixture data model a unified stochastic characterization of classifiers learned with different loss functions. Our results are instrumental to an in-depth understanding as well as practical improvements on this fundamental classification approach. As the main outcome, we demonstrate the existence of a universally optimal loss function which yields the best high dimensional performance at any given ratio.