# If you did not already know

Shallow Triple Stream Three-Dimensional CNN (STSTNet)
In the recent year, the state-of-the-arts of facial micro-expression recognition task have been significantly advanced by the emergence of data-driven approaches based on deep learning. Due to the superb learning capacity of deep learning, it generates promising performance beyond the traditional handcrafted approaches. Recently, many researchers have focused on developing better networks by increasing its depth, as deep networks can effectively approximate certain function classes more efficiently than shallow ones. In this paper, we aim to design a shallow network to extract the high level features of the micro-expression details. Specifically, a two-layer neural network, namely Shallow Triple Stream Three-dimensional CNN (STSTNet) is proposed. The network is capable to learn the features from three optical flow features (i.e., optical strain, horizontal and vertical optical flow images) computed from the onset and apex frames from each video. Our experimental results demonstrate the viability of the proposed STSTNet, which exhibits the UAR recognition results of 76.05%, 70.13%, 86.86% and 68.10% in composite, SMIC, CASME II and SAMM databases, respectively. …

Hubness
The tendency of high-dimensional data to contain points (hubs) that frequently occur in k-nearest-neighbor lists of other points. Semantically Aligned Bias Reducing Zero Shot Learning
Semantically Aligned Bias Reducing Zero Shot Learning

Generating high-quality and interpretable adversarial examples in the text domain is a much more daunting task than it is in the image domain. This is due partly to the discrete nature of text, partly to the problem of ensuring that the adversarial examples are still probable and interpretable, and partly to the problem of maintaining label invariance under input perturbations. In order to address some of these challenges, we introduce sparse projected gradient descent (SPGD), a new approach to crafting interpretable adversarial examples for text. SPGD imposes a directional regularization constraint on input perturbations by projecting them onto the directions to nearby word embeddings with highest cosine similarities. This constraint ensures that perturbations move each word embedding in an interpretable direction (i.e., towards another nearby word embedding). Moreover, SPGD imposes a sparsity constraint on perturbations at the sentence level by ignoring word-embedding perturbations whose norms are below a certain threshold. This constraint ensures that our method changes only a few words per sequence, leading to higher quality adversarial examples. Our experiments with the IMDB movie review dataset show that the proposed SPGD method improves adversarial example interpretability and likelihood (evaluated by average per-word perplexity) compared to state-of-the-art methods, while suffering little to no loss in training performance. …

NeuronBlocks
NeuronBlocks is a NLP deep learning modeling toolkit that helps engineers/researchers to build end-to-end pipelines for neural network model training for NLP tasks. The main goal of this toolkit is to minimize developing cost for NLP deep neural network model building, including both training and inference stages. For more details, please check our paper: NeuronBlocks — Building Your NLP DNN Models Like Playing Lego at https://…/1904.09535. NeuronBlocks consists of two major components: Block Zoo and Model Zoo.
• In Block Zoo, we provide commonly used neural network components as building blocks for model architecture design.
• In Model Zoo, we provide a suite of NLP models for common NLP tasks, in the form of JSON configuration files. …

# If you did not already know

Granger Causality Network
We present a new framework for learning Granger causality networks for multivariate categorical time series, based on the mixture transition distribution (MTD) model. Traditionally, MTD is plagued by a nonconvex objective, non-identifiability, and presence of many local optima. To circumvent these problems, we recast inference in the MTD as a convex problem. The new formulation facilitates the application of MTD to high-dimensional multivariate time series. As a baseline, we also formulate a multi-output logistic autoregressive model (mLTD), which while a straightforward extension of autoregressive Bernoulli generalized linear models, has not been previously applied to the analysis of multivariate categorial time series. We develop novel identifiability conditions of the MTD model and compare them to those for mLTD. We further devise novel and efficient optimization algorithm for the MTD based on the new convex formulation, and compare the MTD and mLTD in both simulated and real data experiments. Our approach simultaneously provides a comparison of methods for network inference in categorical time series and opens the door to modern, regularized inference with the MTD model. …

Personalized Evolving Model for Social Network and Opinion (PENO)
Network dynamics has always been a meaningful topic deserving exploration in the realm of academy. previous network models contain two parts: (1) generating structure as per user property; (2) changing property as per network structure. Properties in these models, however, cannot be interpreted to concept in prevalent social theories or empirical truth. Also, they usually treat everyone in an uniform fashion. While such assumption is quite misguiding, and thus saliently limits their performance. To overcome these flaws, we devise a personalized evolving model for social network and opinion (PENO), where citizens’ ideology is revealed by variable opinions and four dimensions of personality are considered for each entity – leadership, openness, agreeableness, and neuroticism. Opinion propagates via social tie, tie generates from opinion affinity, and personalities integrally work with opinion and tie across evolution. To our best knowledge, PENO is the first attempt to introduce personality impact in network dynamics and verify social science with reasonable visualization during simulation. We also present its probabilistic graph and conceive iterative learning algorithm. Experiments show PENO outperforms several state-of-the-art baselines over two typical prediction tasks – congress voting prediction for legislative bills and friendship prediction on a book-commenting website. Finally, we discuss its scalability to do multi-task learning and transfer learning in daily scenarios. …

Eligibility Traces
Eligibility traces are one of the basic mechanisms of reinforcement learning. For example, in the popular TD(lambda) algorithm, the lambda refers to the use of an eligibility trace. Almost any temporal-difference (TD) method, such as Q-learning or Sarsa, can be combined with eligibility traces to obtain a more general method that may learn more efficiently.
There are two ways to view eligibility traces. The more theoretical view, which we emphasize here, is that they are a bridge from TD to Monte Carlo methods. When TD methods are augmented with eligibility traces, they produce a family of methods spanning a spectrum that has Monte Carlo methods at one end and one-step TD methods at the other. In between are intermediate methods that are often better than either extreme method. In this sense eligibility traces unify TD and Monte Carlo methods in a valuable and revealing way.
The other way to view eligibility traces is more mechanistic. From this perspective, an eligibility trace is a temporary record of the occurrence of an event, such as the visiting of a state or the taking of an action. The trace marks the memory parameters associated with the event as eligible for undergoing learning changes. When a TD error occurs, only the eligible states or actions are assigned credit or blame for the error. Thus, eligibility traces help bridge the gap between events and training information. Like TD methods themselves, eligibility traces are a basic mechanism for temporal credit assignment. …

Wasserstein CNN (WCNN)
Heterogeneous face recognition (HFR) aims to match facial images acquired from different sensing modalities with mission-critical applications in forensics, security and commercial sectors. However, HFR is a much more challenging problem than traditional face recognition because of large intra-class variations of heterogeneous face images and limited training samples of cross-modality face image pairs. This paper proposes a novel approach namely Wasserstein CNN (convolutional neural networks, or WCNN for short) to learn invariant features between near-infrared and visual face images (i.e. NIR-VIS face recognition). The low-level layers of WCNN are trained with widely available face images in visual spectrum. The high-level layer is divided into three parts, i.e., NIR layer, VIS layer and NIR-VIS shared layer. The first two layers aims to learn modality-specific features and NIR-VIS shared layer is designed to learn modality-invariant feature subspace. Wasserstein distance is introduced into NIR-VIS shared layer to measure the dissimilarity between heterogeneous feature distributions. So W-CNN learning aims to achieve the minimization of Wasserstein distance between NIR distribution and VIS distribution for invariant deep feature representation of heterogeneous face images. To avoid the over-fitting problem on small-scale heterogeneous face data, a correlation prior is introduced on the fully-connected layers of WCNN network to reduce parameter space. This prior is implemented by a low-rank constraint in an end-to-end network. The joint formulation leads to an alternating minimization for deep feature representation at training stage and an efficient computation for heterogeneous data at testing stage. Extensive experiments on three challenging NIR-VIS face recognition databases demonstrate the significant superiority of Wasserstein CNN over state-of-the-art methods. …

# If you did not already know

MetricGAN
Adversarial loss in a conditional generative adversarial network (GAN) is not designed to directly optimize evaluation metrics of a target task, and thus, may not always guide the generator in a GAN to generate data with improved metric scores. To overcome this issue, we propose a novel MetricGAN approach with an aim to optimize the generator with respect to one or multiple evaluation metrics. Moreover, based on MetricGAN, the metric scores of the generated data can also be arbitrarily specified by users. We tested the proposed MetricGAN on a speech enhancement task, which is particularly suitable to verify the proposed approach because there are multiple metrics measuring different aspects of speech signals. Moreover, these metrics are generally complex and could not be fully optimized by Lp or conventional adversarial losses. …

DI-VST
The encoder-decoder dialog model is one of the most prominent methods used to build dialog systems in complex domains. Yet it is limited because it cannot output interpretable actions as in traditional systems, which hinders humans from understanding its generation process. We present an unsupervised discrete sentence representation learning method that can integrate with any existing encoder-decoder dialog models for interpretable response generation. Building upon variational autoencoders (VAEs), we present two novel models, DI-VAE and DI-VST that improve VAEs and can discover interpretable semantics via either auto encoding or context predicting. Our methods have been validated on real-world dialog datasets to discover semantic representations and enhance encoder-decoder models with interpretable generation. …

MF-MI-Greedy
How can we efficiently gather information to optimize an unknown function, when presented with multiple, mutually dependent information sources with different costs? For example, when optimizing a robotic system, intelligently trading off computer simulations and real robot testings can lead to significant savings. Existing methods, such as multi-fidelity GP-UCB or Entropy Search-based approaches, either make simplistic assumptions on the interaction among different fidelities or use simple heuristics that lack theoretical guarantees. In this paper, we study multi-fidelity Bayesian optimization with complex structural dependencies among multiple outputs, and propose MF-MI-Greedy, a principled algorithmic framework for addressing this problem. In particular, we model different fidelities using additive Gaussian processes based on shared latent structures with the target function. Then we use cost-sensitive mutual information gain for efficient Bayesian global optimization. We propose a simple notion of regret which incorporates the cost of different fidelities, and prove that MF-MI-Greedy achieves low regret. We demonstrate the strong empirical performance of our algorithm on both synthetic and real-world datasets. …

Suggestion Mining
We propose a formal definition for the task of suggestion mining in the context of a wide range of open domain applications. Human perception of the term suggestion is subjective and this effects the preparation of hand labeled datasets for the task of suggestion mining. Existing work either lacks a formal problem definition and annotation procedure, or provides domain and application specific definitions. Moreover, many previously used manually labeled datasets remain proprietary. We first present an annotation study, and based on our observations propose a formal task definition and annotation procedure for creating benchmark datasets for suggestion mining. With this study, we also provide publicly available labeled datasets for suggestion mining in multiple domains. …

# If you did not already know

Sentient Enterprise
The continued explosion of data and the continued evolution of analytics capabilities might usher in the next analytics revolution beyond the Intelligent Enterprise. The evolution of analytics capabilities towards an ideal state that is called ‘The Sentient Enterprise’. The Sentient Enterprise is an enterprise that can listen to data, conduct analysis and make autonomous decisions at massive scale in real-time. The Sentient Enterprise can listen to data to sense micro-trends. It can act as one organism without being impeded by information silos. It can make autonomous decisions with little or no human intervention. It is always evolving, with emergent intelligence that becomes progressively more sophisticated.
http://…/1317004

Quantized Epoch Stochastic Gradient Descent (QESGD)
Due to its efficiency and ease to implement, stochastic gradient descent (SGD) has been widely used in machine learning. In particular, SGD is one of the most popular optimization methods for distributed learning. Recently, quantized SGD (QSGD), which adopts quantization to reduce the communication cost in SGD-based distributed learning, has attracted much attention. Although several QSGD methods have been proposed, some of them are heuristic without theoretical guarantee, and others have high quantization variance which makes the convergence become slow. In this paper, we propose a new method, called Quantized Epoch-SGD (QESGD), for communication-efficient distributed learning. QESGD compresses (quantizes) the parameter with variance reduction, so that it can get almost the same performance as that of SGD with less communication cost. QESGD is implemented on the Parameter Server framework, and empirical results on distributed deep learning show that QESGD can outperform other state-of-the-art quantization methods to achieve the best performance. …

Wasserstein Identity Testing Problem
Uniformity testing and the more general identity testing are well studied problems in distributional property testing. Most previous work focuses on testing under $L_1$-distance. However, when the support is very large or even continuous, testing under $L_1$-distance may require a huge (even infinite) number of samples. Motivated by such issues, we consider the identity testing in Wasserstein distance (a.k.a. transportation distance and earthmover distance) on a metric space (discrete or continuous). In this paper, we propose the Wasserstein identity testing problem (Identity Testing in Wasserstein distance). We obtain nearly optimal worst-case sample complexity for the problem. Moreover, for a large class of probability distributions satisfying the so-called ‘Doubling Condition’, we provide nearly instance-optimal sample complexity. …

3D BAT
In this paper, we focus on obtaining 2D and 3D labels, as well as track IDs for objects on the road with the help of a novel 3D Bounding Box Annotation Toolbox (3D BAT). Our open source, web-based 3D BAT incorporates several smart features to improve usability and efficiency. For instance, this annotation toolbox supports semi-automatic labeling of tracks using interpolation, which is vital for downstream tasks like tracking, motion planning and motion prediction. Moreover, annotations for all camera images are automatically obtained by projecting annotations from 3D space into the image domain. In addition to the raw image and point cloud feeds, a Masterview consisting of the top view (bird’s-eye-view), side view and front views is made available to observe objects of interest from different perspectives. Comparisons of our method with other publicly available annotation tools reveal that 3D annotations can be obtained faster and more efficiently by using our toolbox. …

# If you did not already know

Learning directed acyclic graphs (DAGs) from data is a challenging task both in theory and in practice, because the number of possible DAGs scales superexponentially with the number of nodes. In this paper, we study the problem of learning an optimal DAG from continuous observational data. We cast this problem in the form of a mathematical programming model which can naturally incorporate a super-structure in order to reduce the set of possible candidate DAGs. We use the penalized negative log-likelihood score function with both $\ell_0$ and $\ell_1$ regularizations and propose a new mixed-integer quadratic optimization (MIQO) model, referred to as a layered network (LN) formulation. The LN formulation is a compact model, which enjoys as tight an optimal continuous relaxation value as the stronger but larger formulations under a mild condition. Computational results indicate that the proposed formulation outperforms existing mathematical formulations and scales better than available algorithms that can solve the same problem with only $\ell_1$ regularization. In particular, the LN formulation clearly outperforms existing methods in terms of computational time needed to find an optimal DAG in the presence of a sparse super-structure. …

Deep Image Retargeting (DeepIR)
We present \emph{Deep Image Retargeting} (\emph{DeepIR}), a coarse-to-fine framework for content-aware image retargeting. Our framework first constructs the semantic structure of input image with a deep convolutional neural network. Then a uniform re-sampling that suits for semantic structure preserving is devised to resize feature maps to target aspect ratio at each feature layer. The final retargeting result is generated by coarse-to-fine nearest neighbor field search and step-by-step nearest neighbor field fusion. We empirically demonstrate the effectiveness of our model with both qualitative and quantitative results on widely used RetargetMe dataset. …

Neural Differential Equations
DiffEqFlux.jl is a library for fusing neural networks and differential equations. In this work we describe differential equations from the viewpoint of data science and discuss the complementary nature between machine learning models and differential equations. We demonstrate the ability to incorporate DifferentialEquations.jl-defined differential equation problems into a Flux-defined neural network, and vice versa. The advantages of being able to use the entire DifferentialEquations.jl suite for this purpose is demonstrated by counter examples where simple integration strategies fail, but the sophisticated integration strategies provided by the DifferentialEquations.jl library succeed. This is followed by a demonstration of delay differential equations and stochastic differential equations inside of neural networks. We show high-level functionality for defining neural ordinary differential equations (neural networks embedded into the differential equation) and describe the extra models in the Flux model zoo which includes neural stochastic differential equations. We conclude by discussing the various adjoint methods used for backpropogation of the differential equation solvers. DiffEqFlux.jl is an important contribution to the area, as it allows the full weight of the differential equation solvers developed from decades of research in the scientific computing field to be readily applied to the challenges posed by machine learning and data science. …

Block Neural Autoregressive Flow (B-NAF)
Normalising flows (NFS) map two density functions via a differentiable bijection whose Jacobian determinant can be computed efficiently. Recently, as an alternative to hand-crafted bijections, Huang et al. (2018) proposed neural autoregressive flow (NAF) which is a universal approximator for density functions. Their flow is a neural network (NN) whose parameters are predicted by another NN. The latter grows quadratically with the size of the former and thus an efficient technique for parametrization is needed. We propose block neural autoregressive flow (B-NAF), a much more compact universal approximator of density functions, where we model a bijection directly using a single feed-forward network. Invertibility is ensured by carefully designing each affine transformation with block matrices that make the flow autoregressive and (strictly) monotone. We compare B-NAF to NAF and other established flows on density estimation and approximate inference for latent variable models. Our proposed flow is competitive across datasets while using orders of magnitude fewer parameters. …

# If you did not already know

BINet
In this paper, we introduce BINet, a neural network architecture for real-time multi-perspective anomaly detection in business process event logs. BINet is designed to handle both the control flow and the data perspective of a business process. Additionally, we propose a set of heuristics for setting the threshold of an anomaly detection algorithm automatically. We demonstrate that BINet can be used to detect anomalies in event logs not only on a case level but also on event attribute level. Finally, we demonstrate that a simple set of rules can be used to utilize the output of BINet for anomaly classification. We compare BINet to eight other state-of-the-art anomaly detection algorithms and evaluate their performance on an elaborate data corpus of 29 synthetic and 15 real-life event logs. BINet outperforms all other methods both on the synthetic as well as on the real-life datasets. …

Exponential-Generalized Truncated Logarithmic (EGTL)
In this paper, we introduce a new two-parameter lifetime distribution, called the exponential-generalized truncated logarithmic (EGTL) distribution, by compounding the exponential and generalized truncated logarithmic distributions. Our procedure generalizes the exponential-logarithmic (EL) distribution modelling the reliability of systems by the use of first-order concepts, where the minimum lifetime is considered (Tahmasbi 2008). In our approach, we assume that a system fails if a given number k of the components fails and then, we consider the kth-smallest value of lifetime instead of the minimum lifetime. The reliability and failure rate functions as well as their properties are presented for some special cases. The estimation of the parameters is attained by the maximum likelihood, the expectation maximization algorithm, the method of moments and the Bayesian approach, with a simulation study performed to illustrate the different methods of estimation. The application study is illustrated based on two real data sets used in many applications of reliability. …

Predictability, Computability, and Stability (PCS)
We propose the predictability, computability, and stability (PCS) framework to extract reproducible knowledge from data that can guide scientific hypothesis generation and experimental design. The PCS framework builds on key ideas in machine learning, using predictability as a reality check and evaluating computational considerations in data collection, data storage, and algorithm design. It augments PC with an overarching stability principle, which largely expands traditional statistical uncertainty considerations. In particular, stability assesses how results vary with respect to choices (or perturbations) made across the data science life cycle, including problem formulation, pre-processing, modeling (data and algorithm perturbations), and exploratory data analysis (EDA) before and after modeling. Furthermore, we develop PCS inference to investigate the stability of data results and identify when models are consistent with relatively simple phenomena. We compare PCS inference with existing methods, such as selective inference, in high-dimensional sparse linear model simulations to demonstrate that our methods consistently outperform others in terms of ROC curves over a wide range of simulation settings. Finally, we propose a PCS documentation based on Rmarkdown, iPython, or Jupyter Notebook, with publicly available, reproducible codes and narratives to back up human choices made throughout an analysis. The PCS workflow and documentation are demonstrated in a genomics case study available on Zenodo. …

ProjectionNet
Deep neural networks have become ubiquitous for applications related to visual recognition and language understanding tasks. However, it is often prohibitive to use typical neural networks on devices like mobile phones or smart watches since the model sizes are huge and cannot fit in the limited memory available on such devices. While these devices could make use of machine learning models running on high-performance data centers with CPUs or GPUs, this is not feasible for many applications because data can be privacy sensitive and inference needs to be performed directly ‘on’ device. We introduce a new architecture for training compact neural networks using a joint optimization framework. At its core lies a novel objective that jointly trains using two different types of networks–a full trainer neural network (using existing architectures like Feed-forward NNs or LSTM RNNs) combined with a simpler ‘projection’ network that leverages random projections to transform inputs or intermediate representations into bits. The simpler network encodes lightweight and efficient-to-compute operations in bit space with a low memory footprint. The two networks are trained jointly using backpropagation, where the projection network learns from the full network similar to apprenticeship learning. Once trained, the smaller network can be used directly for inference at low memory and computation cost. We demonstrate the effectiveness of the new approach at significantly shrinking the memory requirements of different types of neural networks while preserving good accuracy on visual recognition and text classification tasks. We also study the question ‘how many neural bits are required to solve a given task?’ using the new framework and show empirical results contrasting model predictive capacity (in bits) versus accuracy on several datasets. …

# If you did not already know

Microsoft Machine Learning for Apache Spark (MMLSpark)
We introduce Microsoft Machine Learning for Apache Spark (MMLSpark), an ecosystem of enhancements that expand the Apache Spark distributed computing library to tackle problems in Deep Learning, Micro-Service Orchestration, Gradient Boosting, Model Interpretability, and other areas of modern computation. Furthermore, we present a novel system called Spark Serving that allows users to run any Apache Spark program as a distributed, sub-millisecond latency web service backed by their existing Spark Cluster. All MMLSpark contributions have the same API to enable simple composition across frameworks and usage across batch, streaming, and RESTful web serving scenarios on static, elastic, or serverless clusters. We showcase MMLSpark by creating a method for deep object detection capable of learning without human labeled data and demonstrate its effectiveness for Snow Leopard conservation. …

ExperTwin
Even though the advent of the Web coupled with powerful search engines has empowered the knowledge workers to quickly find the needed information, it still is a time-consuming operation. Presently there are no readily available tools that can create and maintain an up-to-date personal knowledge base that can be readily consulted when needed. While organizing the entire Web as a semantic network is a long-term goal, creation of a semantic network of personal knowledge sources that are continuously updated by crawlers and other devices is an attainable task. We created an app titled ExperTwin, that collects personally relevant knowledge units (known as JANs) from the Web, Email correspondence, and locally stored files, organize them as a semantic network that can be easily queried and visualized in many formats – just in time – when performing a knowledge-based task. The architecture of ExperTwin is based on the model of a ‘Society of Intelligent Agents’, where each agent is responsible for a specific task. Collection of JANs from multiple sources, establishing the relevancy, and creation of the personal semantic network are some of the many tasks performed by the individual agents. Tensorflow and Natural Language Processing (NLP) tools have been implemented to let ExperTwin learn from users. Document the design and deployment of ExperTwin as a ‘Knowledge Advantage Machine’ able to search for relevant information while performing a knowledge-based task, is the main goal of the research presented in this post. …

Continuous Lagrangian Reachability (CLRT)
We introduce continuous Lagrangian reachability (CLRT), a new algorithm for the computation of a tight and continuous-time reachtube for the solution flows of a nonlinear, time-variant dynamical system. CLRT employs finite strain theory to determine the deformation of the solution set from time $t_i$ to time $t_{i+1}$. We have developed simple explicit analytic formulas for the optimal metric for this deformation; this is superior to prior work, which used semi-definite programming. CLRT also uses infinitesimal strain theory to derive an optimal time increment $h_i$ between $t_i$ and $t_{i+1}$, nonlinear optimization to minimally bloat (i.e., using a minimal radius) the state set at time $t_i$ such that it includes all the states of the solution flow in the interval $[t_i,t_{i+1}]$. We use $\delta$-satisfiability to ensure the correctness of the bloating. Our results on a series of benchmarks show that CLRT performs favorably compared to state-of-the-art tools such as CAPD in terms of the continuous reachtube volumes they compute. …

Fully Convolutional two-Stream Fusion Network (FCTSFN)
In this paper, we propose a novel fully convolutional two-stream fusion network (FCTSFN) for interactive image segmentation. The proposed network includes two sub-networks: a two-stream late fusion network (TSLFN) that predicts the foreground at a reduced resolution, and a multi-scale refining network (MSRN) that refines the foreground at full resolution. The TSLFN includes two distinct deep streams followed by a fusion network. The intuition is that, since user interactions are more direction information on foreground/background than the image itself, the two-stream structure of the TSLFN reduces the number of layers between the pure user interaction features and the network output, allowing the user interactions to have a more direct impact on the segmentation result. The MSRN fuses the features from different layers of TSLFN with different scales, in order to seek the local to global information on the foreground to refine the segmentation result at full resolution. We conduct comprehensive experiments on four benchmark datasets. The results show that the proposed network achieves competitive performance compared to current state-of-the-art interactive image segmentation methods. …

# If you did not already know

Surrogate Objective Function
Influence maximization has found applications in a wide range of real-world problems, for instance, viral marketing of products in an online social network, and information propagation of valuable information such as job vacancy advertisements and health-related information. While existing algorithmic techniques usually aim at maximizing the total number of people influenced, the population often comprises several socially salient groups, e.g., based on gender or race. As a result, these techniques could lead to disparity across different groups in receiving important information. Furthermore, in many of these applications, the spread of influence is time-critical, i.e., it is only beneficial to be influenced before a time deadline. As we show in this paper, the time-criticality of the information could further exacerbate the disparity of influence across groups. This disparity, introduced by algorithms aimed at maximizing total influence, could have far-reaching consequences, impacting people’s prosperity and putting minority groups at a big disadvantage. In this work, we propose a notion of group fairness in time-critical influence maximization. We introduce surrogate objective functions to solve the influence maximization problem under fairness considerations. By exploiting the submodularity structure of our objectives, we provide computationally efficient algorithms with guarantees that are effective in enforcing fairness during the propagation process. We demonstrate the effectiveness of our approach through synthetic and real-world experiments. …

Max of Weighed Distance (MWD)
Adversarial attacks add perturbations to the input features with the intent of changing the classification produced by a machine learning system. Small perturbations can yield adversarial examples which are misclassified despite being virtually indistinguishable from the unperturbed input. Classifiers trained with standard neural network techniques are highly susceptible to adversarial examples, allowing an adversary to create misclassifications of their choice. We introduce a new type of network unit, called MWD (max of weighed distance) units that have a built-in resistant to adversarial attacks. These units are highly non-linear, and we develop the techniques needed to effectively train them. We show that simple interval techniques for propagating perturbation effects through the network enables the efficient computation of robustness (i.e., accuracy guarantees) for MWD networks under any perturbations, including adversarial attacks. MWD networks are significantly more robust to input perturbations than ReLU networks. On permutation invariant MNIST, when test examples can be perturbed by 20% of the input range, MWD networks provably retain accuracy above 83%, while the accuracy of ReLU networks drops below 5%. The provable accuracy of MWD networks is superior even to the observed accuracy of ReLU networks trained with the help of adversarial examples. In the absence of adversarial attacks, MWD networks match the performance of sigmoid networks, and have accuracy only slightly below that of ReLU networks. …

Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs) are useful for many practical tasks in machine learning. Synaptic weights, as well as neuron activation functions within the deep network are typically stored with high-precision formats, e.g. 32 bit floating point. However, since storage capacity is limited and each memory access consumes power, both storage capacity and memory access are two crucial factors in these networks. Here we present a method and present the ADaPTION toolbox to extend the popular deep learning library Caffe to support training of deep CNNs with reduced numerical precision of weights and activations using fixed point notation. ADaPTION includes tools to measure the dynamic range of weights and activations. Using the ADaPTION tools, we quantized several CNNs including VGG16 down to 16-bit weights and activations with only 0.8% drop in Top-1 accuracy. The quantization, especially of the activations, leads to increase of up to 50% of sparsity especially in early and intermediate layers, which we exploit to skip multiplications with zero, thus performing faster and computationally cheaper inference. …

Automatic BAyesian Changepoints Under Sparsity (ABACUS)
Change detection involves segmenting sequential data such that observations in the same segment share some desired properties. Multivariate change detection continues to be a challenging problem due to the variety of ways change points can be correlated across channels and the potentially poor signal-to-noise ratio on individual channels. In this paper, we are interested in locating additive outliers (AO) and level shifts (LS) in the unsupervised setting. We propose ABACUS, Automatic BAyesian Changepoints Under Sparsity, a Bayesian source separation technique to recover latent signals while also detecting changes in model parameters. Multi-level sparsity achieves both dimension reduction and modeling of signal changes. We show ABACUS has competitive or superior performance in simulation studies against state-of-the-art change detection methods and established latent variable models. We also illustrate ABACUS on two real application, modeling genomic profiles and analyzing household electricity consumption. …

# If you did not already know

MMALFM
Although the latent factor model achieves good accuracy in rating prediction, it suffers from many problems including cold-start, non-transparency, and suboptimal results for individual user-item pairs. In this paper, we exploit textual reviews and item images together with ratings to tackle these limitations. Specifically, we first apply a proposed multi-modal aspect-aware topic model (MATM) on text reviews and item images to model users’ preferences and items’ features from different aspects, and also estimate the aspect importance of a user towards an item. Then the aspect importance is integrated into a novel aspect-aware latent factor model (ALFM), which learns user’s and item’s latent factors based on ratings. In particular, ALFM introduces a weight matrix to associate those latent factors with the same set of aspects in MATM, such that the latent factors could be used to estimate aspect ratings. Finally, the overall rating is computed via a linear combination of the aspect ratings, which are weighted by the corresponding aspect importance. To this end, our model could alleviate the data sparsity problem and gain good interpretability for recommendation. Besides, every aspect rating is weighted by its aspect importance, which is dependent on the targeted user’s preferences and the targeted item’s features. Therefore, it is expected that the proposed method can model a user’s preferences on an item more accurately for each user-item pair. Comprehensive experimental studies have been conducted on the Yelp 2017 Challenge dataset and Amazon product datasets to demonstrate the effectiveness of our method. …

Wikipedia articles contain multiple links connecting a subject to other pages of the encyclopedia. In Wikipedia parlance, these links are called internal links or wikilinks. We present a complete dataset of the network of internal Wikipedia links for the $9$ largest language editions. The dataset contains yearly snapshots of the network and spans $17$ years, from the creation of Wikipedia in 2001 to March 1st, 2018. While previous work has mostly focused on the complete hyperlink graph which includes also links automatically generated by templates, we parsed each revision of each article to track links appearing in the main text. In this way we obtained a cleaner network, discarding more than half of the links and representing all and only the links intentionally added by editors. We describe in detail how the Wikipedia dumps have been processed and the challenges we have encountered, including the need to handle special pages such as redirects, i.e., alternative article titles. We present descriptive statistics of several snapshots of this network. Finally, we propose several research opportunities that can be explored using this new dataset. …

Apache Oozie
Oozie is a workflow scheduler system to manage Apache Hadoop jobs. Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts). …

Semi-Implicit Generator (SIG)
To combine explicit and implicit generative models, we introduce semi-implicit generator (SIG) as a flexible hierarchical model that can be trained in the maximum likelihood framework. Both theoretically and experimentally, we demonstrate that SIG can generate high quality samples especially when dealing with multi-modality. By introducing SIG as an unbiased regularizer to the generative adversarial network (GAN), we show the interplay between maximum likelihood and adversarial learning can stabilize the adversarial training, resist the notorious mode collapsing problem of GANs, and improve the diversity of generated random samples. …

# If you did not already know

Landmark Retracing Network (LRN)
Since convolutional neural network (CNN) lacks an inherent mechanism to handle large scale variations, we always need to compute feature maps multiple times for multi-scale object detection, which has the bottleneck of computational cost in practice. To address this, we devise a recurrent scale approximation (RSA) to compute feature map once only, and only through this map can we approximate the rest maps on other levels. At the core of RSA is the recursive rolling out mechanism: given an initial map on a particular scale, it generates the prediction on a smaller scale that is half the size of input. To further increase efficiency and accuracy, we (a): design a scale-forecast network to globally predict potential scales in the image since there is no need to compute maps on all levels of the pyramid. (b): propose a landmark retracing network (LRN) to retrace back locations of the regressed landmarks and generate a confidence score for each landmark; LRN can effectively alleviate false positives due to the accumulated error in RSA. The whole system could be trained end-to-end in a unified CNN framework. Experiments demonstrate that our proposed algorithm is superior against state-of-the-arts on face detection benchmarks and achieves comparable results for generic proposal generation. The source code of RSA is available at github.com/sciencefans/RSA-for-object-detection. …

Wavelet Convolutional Neural Network
Spatial and spectral approaches are two major approaches for image processing tasks such as image classification and object recognition. Among many such algorithms, convolutional neural networks (CNNs) have recently achieved significant performance improvement in many challenging tasks. Since CNNs process images directly in the spatial domain, they are essentially spatial approaches. Given that spatial and spectral approaches are known to have different characteristics, it will be interesting to incorporate a spectral approach into CNNs. We propose a novel CNN architecture, wavelet CNNs, which combines a multiresolution analysis and CNNs into one model. Our insight is that a CNN can be viewed as a limited form of a multiresolution analysis. Based on this insight, we supplement missing parts of the multiresolution analysis via wavelet transform and integrate them as additional components in the entire architecture. Wavelet CNNs allow us to utilize spectral information which is mostly lost in conventional CNNs but useful in most image processing tasks. We evaluate the practical performance of wavelet CNNs on texture classification and image annotation. The experiments show that wavelet CNNs can achieve better accuracy in both tasks than existing models while having significantly fewer parameters than conventional CNNs. …

SEquential Subspace OPtimization (SESOP)
A merger of two optimization frameworks is introduced: SEquential Subspace OPtimization (SESOP) with the MultiGrid (MG) optimization. At each iteration of the combined algorithm, search directions implied by the coarse-grid correction process of MG are added to the low dimensional search-spaces of SESOP, which include the (preconditioned) gradient and search directions involving the previous iterates (so-called history). The resulting accelerated technique is called SESOP-MG. The {\color{black} asymptotic convergence rate} of the two-level version of SESOP-MG (dubbed SESOP-TG) is studied via Fourier mode analysis for linear problems (i.e., optimization of quadratic functionals). Numerical tests on linear and nonlinear {\color{black} problems} demonstrate the effectiveness of the approach. …

Fused Gromov-Wasserstein Distance
Optimal transport has recently gained a lot of interest in the machine learning community thanks to its ability to compare probability distributions while respecting the underlying space’s geometry. Wasserstein distance deals with feature information through its metric or cost function, but fails in exploiting the structural information, i.e the specific relations existing among the components of the distribution. Recently adapted to a machine learning context, the Gromov-Wasserstein distance defines a metric well suited for comparing distributions that live in different metric spaces by exploiting their inner structural information. In this paper we propose a new optimal transport distance, called the Fused Gromov-Wasserstein distance, capable of leveraging both structural and feature information by combining both views and prove its metric properties over very general manifolds. We also define the barycenter of structured objects as their Fr\’echet mean, leveraging both feature and structural information. We illustrate the versatility of the method for problems where structured objects are involved, computing barycenters in graph and time series contexts. We also use this new distance for graph classification where we obtain comparable or superior results than state-of-the-art graph kernel methods and end-to-end graph CNN approach. …