# If you did not already know

Stacked Bidirectional LSTM/GRU Network
Videos have become ubiquitous on the Internet. And video analysis can provide lots of information for detecting and recognizing objects as well as help people understand human actions and interactions with the real world. However, facing data as huge as TB level, effective methods should be applied. Recurrent neural network (RNN) architecture has wildly been used on many sequential learning problems such as Language Model, Time-Series Analysis, etc. In this paper, we propose some variations of RNN such as stacked bidirectional LSTM/GRU network with attention mechanism to categorize large-scale video data. We also explore different multimodal fusion methods. Our model combines both visual and audio information on both video and frame level and received great result. Ensemble methods are also applied. Because of its multimodal characteristics, we decide to call this method Deep Multimodal Learning(DML). Our DML-based model was trained on Google Cloud and our own server and was tested in a well-known video classification competition on Kaggle held by Google. …

RecLab
Different software tools have been developed with the purpose of performing offline evaluations of recommender systems. However, the results obtained with these tools may be not directly comparable because of subtle differences in the experimental protocols and metrics. Furthermore, it is difficult to analyze in the same experimental conditions several algorithms without disclosing their implementation details. For these reasons, we introduce RecLab, an open source software for evaluating recommender systems in a distributed fashion. By relying on consolidated web protocols, we created RESTful APIs for training and querying recommenders remotely. In this way, it is possible to easily integrate into the same toolkit algorithms realized with different technologies. In details, the experimenter can perform an evaluation by simply visiting a web interface provided by RecLab. The framework will then interact with all the selected recommenders and it will compute and display a comprehensive set of measures, each representing a different metric. The results of all experiments are permanently stored and publicly available in order to support accountability and comparative analyses. …

Metropolized Knockoff Sampling
Model-X knockoffs is a wrapper that transforms essentially any feature importance measure into a variable selection algorithm, which discovers true effects while rigorously controlling the expected fraction of false positives. A frequently discussed challenge to apply this method is to construct knockoff variables, which are synthetic variables obeying a crucial exchangeability property with the explanatory variables under study. This paper introduces techniques for knockoff generation in great generality: we provide a sequential characterization of all possible knockoff distributions, which leads to a Metropolis-Hastings formulation of an exact knockoff sampler. We further show how to use conditional independence structure to speed up computations. Combining these two threads, we introduce an explicit set of sequential algorithms and empirically demonstrate their effectiveness. Our theoretical analysis proves that our algorithms achieve near-optimal computational complexity in certain cases. The techniques we develop are sufficiently rich to enable knockoff sampling in challenging models including cases where the covariates are continuous and heavy-tailed, and follow a graphical model such as the Ising model. …

Model-Based Value Expansion
Recent model-free reinforcement learning algorithms have proposed incorporating learned dynamics models as a source of additional data with the intention of reducing sample complexity. Such methods hold the promise of incorporating imagined data coupled with a notion of model uncertainty to accelerate the learning of continuous control tasks. Unfortunately, they rely on heuristics that limit usage of the dynamics model. We present model-based value expansion, which controls for uncertainty in the model by only allowing imagination to fixed depth. By enabling wider use of learned dynamics models within a model-free reinforcement learning algorithm, we improve value estimation, which, in turn, reduces the sample complexity of learning. …

# Document worth reading: “Lecture Notes: Temporal Point Processes and the Conditional Intensity Function”

These short lecture notes contain a not too technical introduction to point processes on the time line. The focus lies on defining these processes using the conditional intensity function. Furthermore, likelihood inference, methods of simulation and residual analysis for temporal point processes specified by a conditional intensity function are considered. Lecture Notes: Temporal Point Processes and the Conditional Intensity Function

# Document worth reading: “Reinforcement Learning in Healthcare: A Survey”

As a subfield of machine learning, \emph{reinforcement learning} (RL) aims at empowering one’s capabilities in behavioural decision making by using interaction experience with the world and an evaluative feedback. Unlike traditional supervised learning methods that usually rely on one-shot, exhaustive and supervised reward signals, RL tackles with sequential decision making problems with sampled, evaluative and delayed feedback simultaneously. Such distinctive features make RL technique a suitable candidate for developing powerful solutions in a variety of healthcare domains, where diagnosing decisions or treatment regimes are usually characterized by a prolonged and sequential procedure. This survey will discuss the broad applications of RL techniques in healthcare domains, in order to provide the research community with systematic understanding of theoretical foundations, enabling methods and techniques, existing challenges, and new insights of this emerging paradigm. By first briefly examining theoretical foundations and key techniques in RL research from efficient and representational directions, we then provide an overview of RL applications in a variety of healthcare domains, ranging from dynamic treatment regimes in chronic diseases and critical care, automated medical diagnosis from both unstructured and structured clinical data, as well as many other control or scheduling domains that have infiltrated many aspects of a healthcare system. Finally, we summarize the challenges and open issues in current research, and point out some potential solutions and directions for future research. Reinforcement Learning in Healthcare: A Survey

# If you did not already know

Fair Forest
The potential lack of fairness in the outputs of machine learning algorithms has recently gained attention both within the research community as well as in society more broadly. Surprisingly, there is no prior work developing tree-induction algorithms for building fair decision trees or fair random forests. These methods have widespread popularity as they are one of the few to be simultaneously interpretable, non-linear, and easy-to-use. In this paper we develop, to our knowledge, the first technique for the induction of fair decision trees. We show that our ‘Fair Forest’ retains the benefits of the tree-based approach, while providing both greater accuracy and fairness than other alternatives, for both ‘group fairness’ and ‘individual fairness.” We also introduce new measures for fairness which are able to handle multinomial and continues attributes as well as regression problems, as opposed to binary attributes and labels only. Finally, we demonstrate a new, more robust evaluation procedure for algorithms that considers the dataset in its entirety rather than only a specific protected attribute. …

Extreme View Synthesis
We present Extreme View Synthesis, a solution for novel view extrapolation when the number of input images is small. Occlusions and depth uncertainty, in this context, are two of the most pressing issues, and worsen as the degree of extrapolation increases. State-of-the-art methods approach this problem by leveraging explicit geometric constraints, or learned priors. Our key insight is that only by modeling both depth uncertainty and image priors can the extreme cases be solved. We first generate a depth probability volume for the novel view and synthesize an estimate of the sought image. Then, we use learned priors combined with depth uncertainty, to refine it. Our method is the first to show visually pleasing results for baseline magnifications of up to 30X. …

Heuristics Allied with Distant Supervision (HAnDS)
Fine-grained Entity Recognition (FgER) is the task of detecting and classifying entity mentions to a large set of types spanning diverse domains such as biomedical, finance and sports. We observe that when the type set spans several domains, detection of entity mention becomes a limitation for supervised learning models. The primary reason being lack of dataset where entity boundaries are properly annotated while covering a large spectrum of entity types. Our work directly addresses this issue. We propose Heuristics Allied with Distant Supervision (HAnDS) framework to automatically construct a quality dataset suitable for the FgER task. HAnDS framework exploits the high interlink among Wikipedia and Freebase in a pipelined manner, reducing annotation errors introduced by naively using distant supervision approach. Using HAnDS framework, we create two datasets, one suitable for building FgER systems recognizing up to 118 entity types based on the FIGER type hierarchy and another for up to 1115 entity types based on the TypeNet hierarchy. Our extensive empirical experimentation warrants the quality of the generated datasets. Along with this, we also provide a manually annotated dataset for benchmarking FgER systems. …

Generalized Strucutral Causal Model (GSCM)
Structural causal models are a popular tool to describe causal relations in systems in many fields such as economy, the social sciences, and biology. In this work, we show that these models are not flexible enough in general to give a complete causal representation of equilibrium states in dynamical systems that do not have a unique stable equilibrium independent of initial conditions. We prove that our proposed generalized structural causal models do capture the essential causal semantics that characterize these systems. We illustrate the power and flexibility of this extension on a dynamical system corresponding to a basic enzymatic reaction. We motivate our approach further by showing that it also efficiently describes the effects of interventions on functional laws such as the ideal gas law. …

# If you did not already know

Stochastic Conjugate Gradient Algorithm with Variance Reduction (CGVR)
Conjugate gradient methods are a class of important methods for solving linear equations and nonlinear optimization. In our work, we propose a new stochastic conjugate gradient algorithm with variance reduction (CGVR) and prove its linear convergence with the Fletcher and Revves method for strongly convex and smooth functions. We experimentally demonstrate that the CGVR algorithm converges faster than its counterparts for six large-scale optimization problems that may be convex, non-convex or non-smooth, and its AUC (Area Under Curve) performance with $L2$-regularized $L2$-loss is comparable to that of LIBLINEAR but with significant improvement in computational efficiency. …

Continuation Multiple Instance Learning (C-MIL)
Weakly supervised object detection (WSOD) is a challenging task when provided with image category supervision but required to simultaneously learn object locations and object detectors. Many WSOD approaches adopt multiple instance learning (MIL) and have non-convex loss functions which are prone to get stuck into local minima (falsely localize object parts) while missing full object extent during training. In this paper, we introduce a continuation optimization method into MIL and thereby creating continuation multiple instance learning (C-MIL), with the intention of alleviating the non-convexity problem in a systematic way. We partition instances into spatially related and class related subsets, and approximate the original loss function with a series of smoothed loss functions defined within the subsets. Optimizing smoothed loss functions prevents the training procedure falling prematurely into local minima and facilitates the discovery of Stable Semantic Extremal Regions (SSERs) which indicate full object extent. On the PASCAL VOC 2007 and 2012 datasets, C-MIL improves the state-of-the-art of weakly supervised object detection and weakly supervised object localization with large margins. …

Feedback Particle Filter (FPF)
A new formulation of the particle filter for nonlinear filtering is presented, based on concepts from optimal control, and from the mean-field game theory. The optimal control is chosen so that the posterior distribution of a particle matches as closely as possible the posterior distribution of the true state given the observations. This is achieved by introducing a cost function, defined by the Kullback-Leibler (K-L) divergence between the actual posterior, and the posterior of any particle. The optimal control input is characterized by a certain Euler-Lagrange (E-L) equation, and is shown to admit an innovation error-based feedback structure. For diffusions with continuous observations, the value of the optimal control solution is ideal. The two posteriors match exactly, provided they are initialized with identical priors. The feedback particle filter is defined by a family of stochastic systems, each evolving under this optimal control law. A numerical algorithm is introduced and implemented in two general examples, and a neuroscience application involving coupled oscillators. Some preliminary numerical comparisons between the feed- back particle filter and the bootstrap particle filter are described.
Error Analysis of the Stochastic Linear Feedback Particle Filter

Batch Sampling
Deep Neural Networks (DNNs) thrive in recent years in which Batch Normalization (BN) plays an indispensable role. However, it has been observed that BN is costly due to the reduction operations. In this paper, we propose alleviating this problem through sampling only a small fraction of data for normalization at each iteration. Specifically, we model it as a statistical sampling problem and identify that by sampling less correlated data, we can largely reduce the requirement of the number of data for statistics estimation in BN, which directly simplifies the reduction operations. Based on this conclusion, we propose two sampling strategies, ‘Batch Sampling’ (randomly select several samples from each batch) and ‘Feature Sampling’ (randomly select a small patch from each feature map of all samples), that take both computational efficiency and sample correlation into consideration. Furthermore, we introduce an extremely simple variant of BN, termed as Virtual Dataset Normalization (VDN), that can normalize the activations well with few synthetical random samples. All the proposed methods are evaluated on various datasets and networks, where an overall training speedup by up to 20% on GPU is practically achieved without the support of any specialized libraries, and the loss on accuracy and convergence rate are negligible. Finally, we extend our work to the ‘micro-batch normalization’ problem and yield comparable performance with existing approaches at the case of tiny batch size. …

# Document worth reading: “Deep Learning”

Deep learning (DL) is a high dimensional data reduction technique for constructing high-dimensional predictors in input-output models. DL is a form of machine learning that uses hierarchical layers of latent features. In this article, we review the state-of-the-art of deep learning from a modeling and algorithmic perspective. We provide a list of successful areas of applications in Artificial Intelligence (AI), Image Processing, Robotics and Automation. Deep learning is predictive in its nature rather then inferential and can be viewed as a black-box methodology for high-dimensional function estimation. Deep Learning

# Document worth reading: “Above the Clouds: A Brief Survey”

Cloud Computing is a versatile technology that can support a broad-spectrum of applications. The low cost of cloud computing and its dynamic scaling renders it an innovation driver for small companies, particularly in the developing world. Cloud deployed enterprise resource planning (ERP), supply chain management applications (SCM), customer relationship management (CRM) applications, medical applications, business applications and mobile applications have potential to reach millions of users. In this paper, we explore the different concepts involved in cloud computing and we also examine clouds from technical aspects. We highlight some of the opportunities in cloud computing underlining the importance of clouds showing why that technology must succeed and we have provided additional cloud computing problems that businesses may need to address. Finally, we discuss some of the issues that this area should deal with. Above the Clouds: A Brief Survey

# If you did not already know

Session-based Recommendation with Graph Neural Network (SR-GNN)
The problem of session-based recommendation aims to predict users’ actions based on anonymous sessions. Previous methods on the session-based recommendation most model a session as a sequence and capture users’ preference to make recommendations. Though achieved promising results, they fail to consider the complex items transitions among all session sequences, and are insufficient to obtain accurate users’ preference in the session. To better capture the structure of the user-click sessions and take complex transitions of items into account, we propose a novel method, i.e. Session-based Recommendation with Graph Neural Networks, SR-GNN for brevity. In the proposed method, session sequences are aggregated together and modeled as graph-structure data. Based on this graph, GNN can capture complex transitions of items, which are difficult to be revealed by the conventional sequential methods. Each session is then represented as the composition of the global preference and current interests of the session using an attention network. Extensive experiments conducted on two real datasets show that SR-GNN evidently outperforms the state-of-the-art session-based recommendation methods and always obtain stable performance with different connection schemes, session representations, and session lengths. …

GENESYS
Modern deep learning systems rely on (a) a hand-tuned neural network topology, (b) massive amounts of labeled training data, and (c) extensive training over large-scale compute resources to build a system that can perform efficient image classification or speech recognition. Unfortunately, we are still far away from implementing adaptive general purpose intelligent systems which would need to learn autonomously in unknown environments and may not have access to some or any of these three components. Reinforcement learning and evolutionary algorithm (EA) based methods circumvent this problem by continuously interacting with the environment and updating the models based on obtained rewards. However, deploying these algorithms on ubiquitous autonomous agents at the edge (robots/drones) demands extremely high energy-efficiency due to (i) tight power and energy budgets, (ii) continuous/lifelong interaction with the environment, (iii) intermittent or no connectivity to the cloud to run heavy-weight processing. To address this need, we present GENESYS, an HW-SW prototype of an EA-based learning system, that comprises a closed loop learning engine called EvE and an inference engine called ADAM. EvE can evolve the topology and weights of neural networks completely in hardware for the task at hand, without requiring hand-optimization or backpropagation training. ADAM continuously interacts with the environment and is optimized for efficiently running the irregular neural networks generated by EvE. GENESYS identifies and leverages multiple unique avenues of parallelism unique to EAs that we term ‘gene’- level parallelism, and ‘population’-level parallelism. We ran GENESYS with a suite of environments from OpenAI gym and observed 2-5 orders of magnitude higher energy-efficiency over state-of-the-art embedded and desktop CPU and GPU systems. …

Pedometrics
Pedometrics is a branch of soil science that applies mathematical and statistical methods for the study of the distribution and genesis of soils. The goal of pedometrics is to achieve a better understanding of the soil as a phenomenon that varies over different scales in space and time. This understanding is important, both for improved soil management and for our scientific appreciation of the soil and the systems (agronomic, ecological and hydrological) of which it is a part. For this reason much of pedometrics is concerned with predicting the properties of the soil in space and time, with sampling and monitoring the soil and with modelling the soil’s behaviour. Pedometricians are typically engaged in developing and applying quantitative methods to apply to these problems. These include geostatistical methods for spatial prediction, sampling designs and strategies, linear modelling methods and novel mathematical and computational techniques such as wavelet transforms, data mining and fuzzy logic. …

Hereditary Independence Gap
The independence gap of a graph was introduced by Ekim et al. (2018) as a measure of how far a graph is from being well-covered. It is defined as the difference between the maximum and minimum size of a maximal independent set. We investigate the independence gap of a graph from structural and algorithmic points of view, with a focus on classes of perfect graphs. Generalizing results on well-covered graphs due to Dean and Zito (1994) and Hujdurovi\’c et al. (2018), we express the independence gap of a perfect graph in terms of clique partitions and use this characterization to develop a polynomial-time algorithm for recognizing graphs of constant independence gap in any class of perfect graphs of bounded clique number. Next, we introduce a hereditary variant of the parameter, which we call hereditary independence gap and which measures the maximum independence gap over all induced subgraphs of the graph. We show that determining whether a given graph has hereditary independence gap at most $k$ is polynomial-time solvable if $k$ is fixed and co-NP-complete if $k$ is part of input. We also investigate the complexity of the independent set problem in graph classes related to independence gap, showing that the problem is NP-complete in the class of graphs of independence gap at most one and polynomial-time solvable in any class of graphs with bounded hereditary independence gap. Combined with some known results on claw-free graphs, our results imply that the independent domination problem is solvable in polynomial time. …

# Document worth reading: “Prediction-Based Decisions and Fairness: A Catalogue of Choices, Assumptions, and Definitions”

A recent flurry of research activity has attempted to quantitatively define ‘fairness’ for decisions based on statistical and machine learning (ML) predictions. The rapid growth of this new field has led to wildly inconsistent terminology and notation, presenting a serious challenge for cataloguing and comparing definitions. This paper attempts to bring much-needed order. First, we explicate the various choices and assumptions made—often implicitly—to justify the use of prediction-based decisions. Next, we show how such choices and assumptions can raise concerns about fairness and we present a notationally consistent catalogue of fairness definitions from the ML literature. In doing so, we offer a concise reference for thinking through the choices, assumptions, and fairness considerations of prediction-based decision systems. Prediction-Based Decisions and Fairness: A Catalogue of Choices, Assumptions, and Definitions

# If you did not already know

Tree Tensor Network (TTN)
Matrix product states (MPS), a tensor network designed for one-dimensional quantum systems, has been recently proposed for generative modeling of natural data (such as images) in terms of `Born machine’. However, the exponential decay of correlation in MPS restricts its representation power heavily for modeling complex data such as natural images. In this work, we push forward the effort of applying tensor networks to machine learning by employing the Tree Tensor Network (TTN) which exhibits balanced performance in expressibility and efficient training and sampling. We design the tree tensor network to utilize the 2-dimensional prior of the natural images and develop sweeping learning and sampling algorithms which can be efficiently implemented utilizing Graphical Processing Units (GPU). We apply our model to random binary patterns and the binary MNIST datasets of handwritten digits. We show that TTN is superior to MPS for generative modeling in keeping correlation of pixels in natural images, as well as giving better log-likelihood scores in standard datasets of handwritten digits. We also compare its performance with state-of-the-art generative models such as the Variational AutoEncoders, Restricted Boltzmann machines, and PixelCNN. Finally, we discuss the future development of Tensor Network States in machine learning problems. …

PyTorch-Kaldi Speech Recognition Toolkit
The availability of open-source software is playing a remarkable role in the popularization of speech recognition and deep learning. Kaldi, for instance, is nowadays an established framework used to develop state-of-the-art speech recognizers. PyTorch is used to build neural networks with the Python language and has recently spawn tremendous interest within the machine learning community thanks to its simplicity and flexibility. The PyTorch-Kaldi project aims to bridge the gap between these popular toolkits, trying to inherit the efficiency of Kaldi and the flexibility of PyTorch. PyTorch-Kaldi is not only a simple interface between these software, but it embeds several useful features for developing modern speech recognizers. For instance, the code is specifically designed to naturally plug-in user-defined acoustic models. As an alternative, users can exploit several pre-implemented neural networks that can be customized using intuitive configuration files. PyTorch-Kaldi supports multiple feature and label streams as well as combinations of neural networks, enabling the use of complex neural architectures. The toolkit is publicly-released along with a rich documentation and is designed to properly work locally or on HPC clusters. Experiments, that are conducted on several datasets and tasks, show that PyTorch-Kaldi can effectively be used to develop modern state-of-the-art speech recognizers. …

In a graph convolutional network, we assume that the graph $G$ is generated with respect to some observation noise. We make small random perturbations $\Delta{}G$ of the graph and try to improve generalization. Based on quantum information geometry, we can have quantitative measurements on the scale of $\Delta{}G$. We try to maximize the intrinsic scale of the permutation with a small budget while minimizing the loss based on the perturbed $G+\Delta{G}$. Our proposed model can consistently improve graph convolutional networks on semi-supervised node classification tasks with reasonable computational overhead. We present two different types of geometry on the manifold of graphs: one is for measuring the intrinsic change of a graph; the other is for measuring how such changes can affect externally a graph neural network. These new analytical tools will be useful in developing a good understanding of graph neural networks and fostering new techniques. …