We overview results on the topic of Poisson approximation that are missed in existing surveys. The topic of Poisson approximation to the distribution of a sum of integer-valued random variables is presented as well. We do not restrict ourselves to a particular method, and overview the whole range of issues including the general limit theorem, estimates of the accuracy of approximation, asymptotic expansions, etc. Related results on the accuracy of compound Poisson approximation are presented as well. We indicate a number of open problems and discuss directions of further research.

**A Fast Distributed Asynchronous Newton-Based Optimization Algorithm**

One of the most important problems in the field of distributed optimization is the problem of minimizing a sum of local convex objective functions over a networked system. Most of the existing work in this area focus on developing distributed algorithms in a synchronous setting under the presence of a central clock, where the agents need to wait for the slowest one to finish the update, before proceeding to the next iterate. Asynchronous distributed algorithms remove the need for a central coordinator, reduce the synchronization wait, and allow some agents to compute faster and execute more iterations. In the asynchronous setting, the only known algorithms for solving this problem could achieve either linear or sublinear rate of convergence. In this work, we built upon the existing literature to develop and analyze an asynchronous Newton-based method to solve a penalized version of the problem. We show that this algorithm guarantees almost sure convergence with global linear and local quadratic rate in expectation. Numerical studies confirm superior performance of our algorithm against other asynchronous methods.

**An Adaptive Weighted Deep Forest Classifier**

A modification of the confidence screening mechanism based on adaptive weighing of every training instance at each cascade level of the Deep Forest is proposed. The idea underlying the modification is very simple and stems from the confidence screening mechanism idea proposed by Pang et al. to simplify the Deep Forest classifier by means of updating the training set at each level in accordance with the classification accuracy of every training instance. However, if the confidence screening mechanism just removes instances from training and testing processes, then the proposed modification is more flexible and assigns weights by taking into account the classification accuracy. The modification is similar to the AdaBoost to some extent. Numerical experiments illustrate good performance of the proposed modification in comparison with the original Deep Forest proposed by Zhou and Feng.

**Sheaves: A Topological Approach to Big Data**

This document develops general concepts useful for extracting knowledge embedded in large graphs or datasets that have pair-wise relationships, such as cause-effect-type relations. Almost no underlying assumptions are made, other than that the data can be presented in terms of pair-wise relationships between objects/events. This assumption is used to mine for patterns in the dataset, defining a reduced graph or dataset that boils-down or concentrates information into a more compact form. The resulting extracted structure or set of patterns are manifestly symbolic in nature, as they capture and encode the graph structure of the dataset in terms of a (generative) grammar. This structure is identified as having the formal mathematical structure of a sheaf. In essence, this paper introduces the basic concepts of sheaf theory into the domain of graphical datasets.

**Graph Neural Networks with distributed ARMA filters**

Recent graph neural networks implement convolutional layers based on polynomial filters operating in the spectral domain. In this paper, we propose a novel graph convolutional layer based on auto-regressive moving average (ARMA) filters that, compared to the polynomial ones, provides a more flexible response thanks to a rich transfer function that accounts for the concept of state. We implement the ARMA filter with a recursive and distributed formulation, obtaining a convolutional layer that is efficient to train, it is localized in the node space and can be applied to graphs with different topologies. In order to learn more abstract and compressed representations in deeper layers of the network, we alternate pooling operations based on node decimation with convolutions on coarsened versions of the original graph. We consider three major graph inference problems: semi-supervised node classification, graph classification, and graph signal classification. Results show that the proposed network with ARMA filters outperform those based on polynomial filters and defines the new state-of-the-art in several tasks.

**Deep Reinforcement Learning for Imbalanced Classification**

Data in real-world application often exhibit skewed class distribution which poses an intense challenge for machine learning. Conventional classification algorithms are not effective in the case of imbalanced data distribution, and may fail when the data distribution is highly imbalanced. To address this issue, we propose a general imbalanced classification model based on deep reinforcement learning. We formulate the classification problem as a sequential decision-making process and solve it by deep Q-learning network. The agent performs a classification action on one sample at each time step, and the environment evaluates the classification action and returns a reward to the agent. The reward from minority class sample is larger so the agent is more sensitive to the minority class. The agent finally finds an optimal classification policy in imbalanced data under the guidance of specific reward function and beneficial learning environment. Experiments show that our proposed model outperforms the other imbalanced classification algorithms, and it can identify more minority samples and has great classification performance.

**PoincarĂ© Wasserstein Autoencoder**

This work presents a reformulation of the recently proposed Wasserstein autoencoder framework on a non-Euclidean manifold, the Poincar\’e ball model of the hyperbolic space. By assuming the latent space to be hyperbolic, we can use its intrinsic hierarchy to impose structure on the learned latent space representations. We demonstrate the model in the visual domain to analyze some of its properties and show competitive results on a graph link prediction task.

The Artificial Neural Networks (ANNs) have been originally designed to function like a biological neural network, but does an ANN really work in the same way as a biological neural network? As we know, the human brain holds information in its memory cells, so if the ANNs use the same model as our brains, they should store datasets in a similar manner. The most popular type of ANN architecture is based on a layered structure of neurons, whereas a human brain has trillions of complex interconnections of neurons continuously establishing new connections, updating existing ones, and removing the irrelevant connections across different parts of the brain. In this paper, we propose a novel approach to building ANNs which are truly inspired by the biological network containing a mesh of subnets controlled by a central mechanism. A subnet is a network of neurons that hold the dataset values. We attempt to address the following fundamental questions: (1) What is the architecture of the ANN model? Whether the layered architecture is the most appropriate choice? (2) Whether a neuron is a process or a memory cell? (3) What is the best way of interconnecting neurons and what weight-assignment mechanism should be used? (4) How to incorporate prior knowledge, bias, and generalizations for features extraction and prediction? Our proposed ANN architecture leverages the accuracy on textual data and our experimental findings confirm the effectiveness of our model. We also collaborate with the construction of the ANN model for storing and processing the images.

**LanczosNet: Multi-Scale Deep Graph Convolutional Networks**

We propose the Lanczos network (LanczosNet), which uses the Lanczos algorithm to construct low rank approximations of the graph Laplacian for graph convolution. Relying on the tridiagonal decomposition of the Lanczos algorithm, we not only efficiently exploit multi-scale information via fast approximated computation of matrix power but also design learnable spectral filters. Being fully differentiable, LanczosNet facilitates both graph kernel learning as well as learning node embeddings. We show the connection between our LanczosNet and graph based manifold learning methods, especially the diffusion maps. We benchmark our model against several recent deep graph networks on citation networks and QM8 quantum chemistry dataset. Experimental results show that our model achieves the state-of-the-art performance in most tasks.

**What Should I Do Now? Marrying Reinforcement Learning and Symbolic Planning**

Long-term planning poses a major difficulty to many reinforcement learning algorithms. This problem becomes even more pronounced in dynamic visual environments. In this work we propose Hierarchical Planning and Reinforcement Learning (HIP-RL), a method for merging the benefits and capabilities of Symbolic Planning with the learning abilities of Deep Reinforcement Learning. We apply HIPRL to the complex visual tasks of interactive question answering and visual semantic planning and achieve state-of-the-art results on three challenging datasets all while taking fewer steps at test time and training in fewer iterations. Sample results can be found at youtu.be/0TtWJ_0mPfI

**Understanding the (un)interpretability of natural image distributions using generative models**

Probability density estimation is a classical and well studied problem, but standard density estimation methods have historically lacked the power to model complex and high-dimensional image distributions. More recent generative models leverage the power of neural networks to implicitly learn and represent probability models over complex images. We describe methods to extract explicit probability density estimates from GANs, and explore the properties of these image density functions. We perform sanity check experiments to provide evidence that these probabilities are reasonable. However, we also show that density functions of natural images are difficult to interpret and thus limited in use. We study reasons for this lack of interpretability, and show that we can get interpretability back by doing density estimation on latent representations of images.

**Sparse estimation for case-control studies with multiple subtypes of cases**

The analysis of case-control studies with several subtypes of cases is increasingly common, e.g. in cancer epidemiology. For matched designs, we show that a natural strategy is based on a stratified conditional logistic regression model. Then, to account for the potential homogeneity among the subtypes of cases, we adapt the ideas of data shared lasso, which has been recently proposed for the estimation of regression models in a stratified setting. For unmatched designs, we compare two standard methods based on L1-norm penalized multinomial logistic regression. We describe formal connections between these two approaches, from which practical guidance can be derived. We show that one of these approaches, which is based on a symmetric formulation of the multinomial logistic regression model, actually reduces to a data shared lasso version of the other. Consequently, the relative performance of the two approaches critically depends on the level of homogeneity that exists among the subtypes of cases: more precisely, when homogeneity is moderate to high, the non-symmetric formulation with controls as the reference is not recommended. Empirical results obtained from synthetic data are presented, which confirm the benefit of properly accounting for potential homogeneity under both matched and unmatched designs. We also present preliminary results from the analysis a case-control study nested within the EPIC cohort, where the objective is to identify metabolites associated with the occurrence of subtypes of breast cancer.

The linear Support Vector Machine (SVM) is one of the most popular binary classification techniques in machine learning. Motivated by applications in modern high dimensional statistics, we consider penalized SVM problems involving the minimization of a hinge-loss function with a convex sparsity-inducing regularizer such as: the L1-norm on the coefficients, its grouped generalization and the sorted L1-penalty (aka Slope). Each problem can be expressed as a Linear Program (LP) and is computationally challenging when the number of features and/or samples is large — the current state of algorithms for these problems is rather nascent when compared to the usual L2-regularized linear SVM. To this end, we propose new computational algorithms for these LPs by bringing together techniques from (a) classical column (and constraint) generation methods and (b) first order methods for non-smooth convex optimization – techniques that are rarely used together for solving large scale LPs. These components have their respective strengths; and while they are found to be useful as separate entities, they have not been used together in the context of solving large scale LPs such as the ones studied herein. Our approach complements the strengths of (a) and (b) — leading to a scheme that seems to outperform commercial solvers as well as specialized implementations for these problems by orders of magnitude. We present numerical results on a series of real and synthetic datasets demonstrating the surprising effectiveness of classic column/constraint generation methods in the context of challenging LP-based machine learning tasks.