If you did not already know

Neural Entropic Estimation google
We point out a limitation of the mutual information neural estimation (MINE) where the network fails to learn at the initial training phase, leading to slow convergence in the number of training iterations. To solve this problem, we propose a faster method called the mutual information neural entropic estimation (MI-NEE). Our solution first generalizes MINE to estimate the entropy using a custom reference distribution. The entropy estimate can then be used to estimate the mutual information. We argue that the seemingly redundant intermediate step of entropy estimation allows one to improve the convergence by an appropriate reference distribution. In particular, we show that MI-NEE reduces to MINE in the special case when the reference distribution is the product of marginal distributions, but faster convergence is possible by choosing the uniform distribution as the reference distribution instead. Compared to the product of marginals, the uniform distribution introduces more samples in low-density regions and fewer samples in high-density regions, which appear to lead to an overall larger gradient for faster convergence. …

Word Encoded Sequence Transducer (WEST) google
Most of the parameters in large vocabulary models are used in embedding layer to map categorical features to vectors and in softmax layer for classification weights. This is a bottle-neck in memory constraint on-device training applications like federated learning and on-device inference applications like automatic speech recognition (ASR). One way of compressing the embedding and softmax layers is to substitute larger units such as words with smaller sub-units such as characters. However, often the sub-unit models perform poorly compared to the larger unit models. We propose WEST, an algorithm for encoding categorical features and output classes with a sequence of random or domain dependent sub-units and demonstrate that this transduction can lead to significant compression without compromising performance. WEST bridges the gap between larger unit and sub-unit models and can be interpreted as a MaxEnt model over sub-unit features, which can be of independent interest. …

Graph-Structured Contrastive Loss google
We present a fully-supervized method for learning to segment data structured by an adjacency graph. We introduce the graph-structured contrastive loss, a loss function structured by a ground truth segmentation. It promotes learning vertex embeddings which are homogeneous within desired segments, and have high contrast at their interface. Thus, computing a piecewise-constant approximation of such embeddings produces a graph-partition close to the objective segmentation. This loss is fully backpropagable, which allows us to learn vertex embeddings with deep learning algorithms. We evaluate our methods on a 3D point cloud oversegmentation task, defining a new state-of-the-art by a large margin. These results are based on the published work of Landrieu and Boussaha 2019. …

LSH Count google
An important question that arises in the study of high dimensional vector representations learned from data is: given a set $\mathcal{D}$ of vectors and a query $q$, estimate the number of points within a specified distance threshold of $q$. We develop two estimators, LSH Count and Multi-Probe Count that use locality sensitive hashing to preprocess the data to accurately and efficiently estimate the answers to such questions via importance sampling. A key innovation is the ability to maintain a small number of hash tables via preprocessing data structures and algorithms that sample from multiple buckets in each hash table. We give bounds on the space requirements and sample complexity of our schemes, and demonstrate their effectiveness in experiments on a standard word embedding dataset. …

If you did not already know

Automatic Derivation Machine google
This paper presents an artificial intelligence algorithm that can be used to derive formulas from various scientific disciplines called automatic derivation machine. First, the formula is abstractly expressed as a multiway tree model, and then each step of the formula derivation transformation is abstracted as a mapping of multiway trees. Derivation steps similar can be expressed as a reusable formula template by a multiway tree map. After that, the formula multiway tree is eigen-encoded to feature vectors construct the feature space of formulas, the Q-learning model using in this feature space can achieve the derivation by making training data from derivation process. Finally, an automatic formula derivation machine is made to choose the next derivation step based on the current state and object. We also make an example about the nuclear reactor physics problem to show how the automatic derivation machine works. …

Sorites Paradox google
The sorites paradox (sometimes known as the paradox of the heap) is a paradox that arises from vague predicates. A typical formulation involves a heap of sand, from which grains are individually removed. Under the assumption that removing a single grain does not turn a heap into a non-heap, the paradox is to consider what happens when the process is repeated enough times: is a single remaining grain still a heap? If not, when did it change from a heap to a non-heap? …

NLP Transformer google
The Transformer in NLP is a novel architecture that aims to solve sequence-to-sequence tasks while handling long-range dependencies with ease. The Transformer was proposed in the paper Attention Is All You Need. It is recommended reading for anyone interested in NLP. …

Stochastic Neural Architecture Search (SNAS) google
We propose Stochastic Neural Architecture Search (SNAS), an economical end-to-end solution to Neural Architecture Search (NAS) that trains neural operation parameters and architecture distribution parameters in same round of back-propagation, while maintaining the completeness and differentiability of the NAS pipeline. In this work, NAS is reformulated as an optimization problem on parameters of a joint distribution for the search space in a cell. To leverage the gradient information in generic differentiable loss for architecture search, a novel search gradient is proposed. We prove that this search gradient optimizes the same objective as reinforcement-learning-based NAS, but assigns credits to structural decisions more efficiently. This credit assignment is further augmented with locally decomposable reward to enforce a resource-efficient constraint. In experiments on CIFAR-10, SNAS takes less epochs to find a cell architecture with state-of-the-art accuracy than non-differentiable evolution-based and reinforcement-learning-based NAS, which is also transferable to ImageNet. It is also shown that child networks of SNAS can maintain the validation accuracy in searching, with which attention-based NAS requires parameter retraining to compete, exhibiting potentials to stride towards efficient NAS on big datasets. …

If you did not already know

Relational Network google
Relational reasoning is a central component of generally intelligent behavior, but has proven difficult for neural networks to learn. In this paper we describe how to use Relation Networks (RNs) as a simple plug-and-play module to solve problems that fundamentally hinge on relational reasoning. We tested RN-augmented networks on three tasks: visual question answering using a challenging dataset called CLEVR, on which we achieve state-of-the-art, super-human performance; text-based question answering using the bAbI suite of tasks; and complex reasoning about dynamic physical systems. Then, using a curated dataset called Sort-of-CLEVR we show that powerful convolutional networks do not have a general capacity to solve relational questions, but can gain this capacity when augmented with RNs. Our work shows how a deep learning architecture equipped with an RN module can implicitly discover and learn to reason about entities and their relations.
“Recurrent Relational Network”
Recurrent Relational Networks for Complex Relational Reasoning


Attention-Guided Generative Adversarial Network (AGGAN) google
The state-of-the-art approaches in Generative Adversarial Networks (GANs) are able to learn a mapping function from one image domain to another with unpaired image data. However, these methods often produce artifacts and can only be able to convert low-level information, but fail to transfer high-level semantic part of images. The reason is mainly that generators do not have the ability to detect the most discriminative semantic part of images, which thus makes the generated images with low-quality. To handle the limitation, in this paper we propose a novel Attention-Guided Generative Adversarial Network (AGGAN), which can detect the most discriminative semantic object and minimize changes of unwanted part for semantic manipulation problems without using extra data and models. The attention-guided generators in AGGAN are able to produce attention masks via a built-in attention mechanism, and then fuse the input image with the attention mask to obtain a target image with high-quality. Moreover, we propose a novel attention-guided discriminator which only considers attended regions. The proposed AGGAN is trained by an end-to-end fashion with an adversarial loss, cycle-consistency loss, pixel loss and attention loss. Both qualitative and quantitative results demonstrate that our approach is effective to generate sharper and more accurate images than existing models. …

Answer Set Programming – Reinforcement Learning (ASP(RL)) google
Non-stationary domains, where unforeseen changes happen, present a challenge for agents to find an optimal policy for a sequential decision making problem. This work investigates a solution to this problem that combines Markov Decision Processes (MDP) and Reinforcement Learning (RL) with Answer Set Programming (ASP) in a method we call ASP(RL). In this method, Answer Set Programming is used to find the possible trajectories of an MDP, from where Reinforcement Learning is applied to learn the optimal policy of the problem. Results show that ASP(RL) is capable of efficiently finding the optimal solution of an MDP representing non-stationary domains. …

MyCaffe google
Over the past few years Caffe, from Berkeley AI Research, has gained a strong following in the deep learning community with over 15K forks on the github.com/BLVC/Caffe site. With its well organized, very modular C++ design it is easy to work with and very fast. However, in the world of Windows development, C# has helped accelerate development with many of the enhancements that it offers over C++, such as garbage collection, a very rich .NET programming framework and easy database access via Entity Frameworks. So how can a C# developer use the advances of C# to take full advantage of the benefits offered by the Berkeley Caffe deep learning system? The answer is the fully open source, ‘MyCaffe’ for Windows .NET programmers. MyCaffe is an open source, complete C# language re-write of Berkeley’s Caffe. This article describes the general architecture of MyCaffe including the newly added MyCaffeTrainerRL for Reinforcement Learning. In addition, this article discusses how MyCaffe closely follows the C++ Caffe, while talking efficiently to the low level NVIDIA CUDA hardware to offer a high performance, highly programmable deep learning system for Windows .NET programmers. …

If you did not already know

Online Meta-Learning google
A central capability of intelligent systems is the ability to continuously build upon previous experiences to speed up and enhance learning of new tasks. Two distinct research paradigms have studied this question. Meta-learning views this problem as learning a prior over model parameters that is amenable for fast adaptation on a new task, but typically assumes the set of tasks are available together as a batch. In contrast, online (regret based) learning considers a sequential setting in which problems are revealed one after the other, but conventionally train only a single model without any task-specific adaptation. This work introduces an online meta-learning setting, which merges ideas from both the aforementioned paradigms to better capture the spirit and practice of continual lifelong learning. We propose the follow the meta leader algorithm which extends the MAML algorithm to this setting. Theoretically, this work provides an $\mathcal{O}(\log T)$ regret guarantee with only one additional higher order smoothness assumption in comparison to the standard online setting. Our experimental evaluation on three different large-scale tasks suggest that the proposed algorithm significantly outperforms alternatives based on traditional online learning approaches. …

Rank-one Multi-Reference Factor Analysis google
In recent years, there is a growing need for processing methods aimed at extracting useful information from large datasets. In many cases the challenge is to discover a low-dimensional structure in the data, often concealed by the existence of nuisance parameters and noise. Motivated by such challenges, we consider the problem of estimating a signal from its scaled, cyclically-shifted and noisy observations. We focus on the particularly challenging regime of low signal-to-noise ratio (SNR), where different observations cannot be shift-aligned. We show that an accurate estimation of the signal from its noisy observations is possible, and derive a procedure which is proved to consistently estimate the signal. The asymptotic sample complexity (the number of observations required to recover the signal) of the procedure is $1/\operatorname{SNR}^4$. Additionally, we propose a procedure which is experimentally shown to improve the sample complexity by a factor equal to the signal’s length. Finally, we present numerical experiments which demonstrate the performance of our algorithms, and corroborate our theoretical findings. …

Attack Graph Obfuscation google
Before executing an attack, adversaries usually explore the victim’s network in an attempt to infer the network topology and identify vulnerabilities in the victim’s servers and personal computers. Falsifying the information collected by the adversary post penetration may significantly slower lateral movement and increase the amount of noise generated within the victim’s network. We investigate the effect of fake vulnerabilities within a real enterprise network on the attacker performance. We use the attack graphs to model the path of an attacker making its way towards a target in a given network. We use combinatorial optimization in order to find the optimal assignments of fake vulnerabilities. We demonstrate the feasibility of our deception-based defense by presenting results of experiments with a large scale real network. We show that adding fake vulnerabilities forces the adversary to invest a significant amount of effort, in terms of time and exploitability cost. …

Adaptive Sznajd Model google
Understanding the way in which human opinion changes along time and space constitutes one of the great challenges in complex systems research. Among the several approaches that have been attempted at studying this problem, the Sznajd model provides some particularly interesting features, such as its simplicity and ability to represent some of the mechanisms believed to be involved in opinion dynamics. The standard Sznajd model at zero temperature is characterized by converging to one stable state, implying null diversity of opinions. In the present work, we develop an approach — namely the adaptive Sznajd model — in which changes of opinion by an individual (i.e. a network node) implies in possible alterations in the network topology. This is accomplished by allowing agents to change their connections preferentially to other neighbors with the same state. The diversity of opinions along time is quantified in terms of the exponential of the entropy of the opinions density. Several interesting results are reported, including the possible formation of echo chambers or social bubbles. Also, depending on the parameters configuration, the dynamics may converge to different equilibrium states for the same parameter setting. We also investigate the dynamics of the proposed adaptive model at non-null temperatures. …

If you did not already know

Octree Generating Networks google
We present a deep convolutional decoder architecture that can generate volumetric 3D outputs in a compute- and memory-efficient manner by using an octree representation. The network learns to predict both the structure of the octree, and the occupancy values of individual cells. This makes it a particularly valuable technique for generating 3D shapes. In contrast to standard decoders acting on regular voxel grids, the architecture does not have cubic complexity. This allows representing much higher resolution outputs with a limited memory budget. We demonstrate this in several application domains, including 3D convolutional autoencoders, generation of objects and whole scenes from high-level representations, and shape from a single image. …

Mean Shift Clustering google
The mean shift algorithm is a nonparametric clustering technique which does not require prior knowledge of the number of clusters, and does not constrain the shape of the clusters.
http://…/mean_shift.pdf
http://…/mean-shift


NullHop google
Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many state-of-the-art (SOA) visual processing tasks. Even though Graphical Processing Units (GPUs) are most often used in training and deploying CNNs, their power consumption becomes a problem for real time mobile applications. We propose a flexible and efficient CNN accelerator architecture which can support the implementation of SOA CNNs in low-power and low-latency application scenarios. This architecture exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. The flexible architecture allows high utilization of available computing resources across a wide range of convolutional network kernel sizes; and numbers of input and output feature maps. We implemented the proposed architecture on an FPGA platform and present results showing how our implementation reduces external memory transfers and compute time in five different CNNs ranging from small ones up to the widely known large VGG16 and VGG19 CNNs. We show how in RTL simulations in a 28nm process with a clock frequency of 500MHz, the NullHop core is able to reach over 450 GOp/s and efficiency of 368%, maintaining over 98% utilization of the MAC units and achieving a power efficiency of over 3TOp/s/W in a core area of 5.8mm2 …

Fast AutoAugment google
Data augmentation is an indispensable technique to improve generalization and also to deal with imbalanced datasets. Recently, AutoAugment has been proposed to automatically search augmentation policies from a dataset and has significantly improved performances on many image recognition tasks. However, its search method requires thousands of GPU hours to train even in a reduced setting. In this paper, we propose Fast AutoAugment algorithm that learns augmentation policies using a more efficient search strategy based on density matching. In comparison to AutoAugment, the proposed algorithm speeds up the search time by orders of magnitude while maintaining the comparable performances on the image recognition tasks with various models and datasets including CIFAR-10, CIFAR-100, and ImageNet. …

If you did not already know

RankCGAN google
In this paper, we investigate the use of generative adversarial networks in the task of image generation according to subjective measures of semantic attributes. Unlike the standard (CGAN) that generates images from discrete categorical labels, our architecture handles both continuous and discrete scales. Given pairwise comparisons of images, our model, called RankCGAN, performs two tasks: it learns to rank images using a subjective measure; and it learns a generative model that can be controlled by that measure. RankCGAN associates each subjective measure of interest to a distinct dimension of some latent space. We perform experiments on UT-Zap50K, PubFig and OSR datasets and demonstrate that the model is expressive and diverse enough to conduct two-attribute exploration and image editing. …

Missing Value PC (MVPC) google
Missing data are ubiquitous in many domains such as healthcare. Depending on how they are missing, the (conditional) independence relations in the observed data may be different from those for the complete data generated by the underlying causal process and, as a consequence, simply applying existing causal discovery methods to the observed data may lead to wrong conclusions. It is then essential to extend existing causal discovery approaches to find true underlying causal structure from such incomplete data. In this paper, we aim at solving this problem for data that are missing with different mechanisms, including missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). With missingness mechanisms represented by missingness Graph (m-Graph), we analyze conditions under which addition correction is needed to derive conditional independence/dependence relations in the complete data. Based on our analysis, we propose missing value PC (MVPC), which combines additional corrections with traditional causal discovery algorithm, in particular, PC. Our proposed MVPC is shown in theory to give asymptotically correct results even using data that are MAR and MNAR. Experiment results illustrate that the proposed algorithm can correct the conditional independence for values MCAR, MAR and rather general cases of values MNAR both with synthetic data as well as real-life healthcare application. …

Diversified, Early-termination-Enabled, and Parallel Bayesian Optimization (DEEP-BO) google
The performance of deep neural networks (DNN) is very sensitive to the particular choice of hyper-parameters. To make it worse, the shape of the learning curve can be significantly affected when a technique like batchnorm is used. As a result, hyperparameter optimization of deep networks can be much more challenging than traditional machine learning models. In this work, we start from well known Bayesian Optimization solutions and provide enhancement strategies specifically designed for hyperparameter optimization of deep networks. The resulting algorithm is named as DEEP-BO (Diversified, Early-termination-Enabled, and Parallel Bayesian Optimization). When evaluated over six DNN benchmarks, DEEP-BO easily outperforms or shows comparable performance with some of the well-known solutions including GP-Hedge, Hyperband, BOHB, Median Stopping Rule, and Learning Curve Extrapolation. The code used is made publicly available at https://…/DEEP-BO.

Three-Stage Subspace Clustering Framework (3S-SC) google
Subspace clustering (SC) refers to the problem of clustering high-dimensional data into a union of low-dimensional subspaces. Based on spectral clustering, state-of-the-art approaches solve SC problem within a two-stage framework. In the first stage, data representation techniques are applied to draw an affinity matrix from the original data. In the second stage, spectral clustering is directly applied to the affinity matrix so that data can be grouped into different subspaces. However, the affinity matrix obtained in the first stage usually fails to reveal the authentic relationship between data points, which leads to inaccurate clustering results. In this paper, we propose a universal Three-Stage Subspace Clustering framework (3S-SC). Graph-Based Transformation and Optimization (GBTO) is added between data representation and spectral clustering. The affinity matrix is obtained in the first stage, then it goes through the second stage, where the proposed GBTO is applied to generate a reconstructed affinity matrix with more authentic similarity between data points. Spectral clustering is applied after GBTO, which is the third stage. We verify our 3S-SC framework with GBTO through theoretical analysis. Experiments on both synthetic data and the real-world data sets of handwritten digits and human faces demonstrate the universality of the proposed 3S-SC framework in improving the connectivity and accuracy of SC methods based on $\ell_0$, $\ell_1$, $\ell_2$ or nuclear norm regularization. …

If you did not already know

Trainable Multiplication Layer google
In this paper, we propose a trainable multiplication layer (TML) for a neural network that can be used to calculate the multiplication between the input features. Taking an image as an input, the TML raises each pixel value to the power of a weight and then multiplies them, thereby extracting the higher-order local auto-correlation from the input image. The TML can also be used to extract co-occurrence from the feature map of a convolutional network. The training of the TML is formulated based on backpropagation with constraints to the weights, enabling us to learn discriminative multiplication patterns in an end-to-end manner. In the experiments, the characteristics of the TML are investigated by visualizing learned kernels and the corresponding output features. The applicability of the TML for classification and neural network interpretation is also evaluated using public datasets. …

Total Operating Characteristic (TOC) google
The relative operating characteristic (ROC) is a popular statistical method to measure the association between observed and diagnosed presence of a characteristic. The diagnosis of presence or absence depends on whether the value of an index variable is above a threshold. ROC considers multiple possible thresholds. Each threshold generates a two-by-two contingency table, which contains four central entries: hits, misses, false alarms, and correct rejections. ROC reveals for each threshold only two ratios, hits/(hits + misses) and false alarms/(false alarms + correct rejections). This article introduces the total operating characteristic (TOC), which shows the total information in the contingency table for each threshold. TOC maintains desirable properties of ROC, while TOC reveals strictly more information than ROC in a manner that makes TOC more useful than ROC. …

Lanczos Network (LanczosNet) google
We propose the Lanczos network (LanczosNet), which uses the Lanczos algorithm to construct low rank approximations of the graph Laplacian for graph convolution. Relying on the tridiagonal decomposition of the Lanczos algorithm, we not only efficiently exploit multi-scale information via fast approximated computation of matrix power but also design learnable spectral filters. Being fully differentiable, LanczosNet facilitates both graph kernel learning as well as learning node embeddings. We show the connection between our LanczosNet and graph based manifold learning methods, especially the diffusion maps. We benchmark our model against several recent deep graph networks on citation networks and QM8 quantum chemistry dataset. Experimental results show that our model achieves the state-of-the-art performance in most tasks. …

CoachAI google
Poor lifestyle represents a health risk factor and is the leading cause of morbidity and chronic conditions. The impact of poor lifestyle can be significantly altered by individual behavior change. Although the current shift in healthcare towards a long lasting modifiable behavior, however, with increasing caregiver workload and individuals’ continuous needs of care, there is a need to ease caregiver’s work while ensuring continuous interaction with users. This paper describes the design and validation of CoachAI, a conversational agent assisted health coaching system to support health intervention delivery to individuals and groups. CoachAI instantiates a text based healthcare chatbot system that bridges the remote human coach and the users. This research provides three main contributions to the preventive healthcare and healthy lifestyle promotion: (1) it presents the conversational agent to aid the caregiver; (2) it aims to decrease caregiver’s workload and enhance care given to users, by handling (automating) repetitive caregiver tasks; and (3) it presents a domain independent mobile health conversational agent for health intervention delivery. We will discuss our approach and analyze the results of a one month validation study on physical activity, healthy diet and stress management. …

If you did not already know

Regularized Online Balanced Decent google
We study online convex optimization in a setting where the learner seeks to minimize the sum of a per-round hitting cost and a movement cost which is incurred when changing decisions between rounds. We prove a new lower bound on the competitive ratio of any online algorithm in the setting where the costs are $m$-strongly convex and the movement costs are the squared $\ell_2$ norm. This lower bound shows that no algorithm can achieve a competitive ratio that is $o(m^{-1/2})$ as $m$ tends to zero. No existing algorithms have competitive ratios matching this bound, and we show that the state-of-the-art algorithm, Online Balanced Decent (OBD), has a competitive ratio that is $\Omega(m^{-2/3})$. We additionally propose two new algorithms, Greedy Online Balanced Decent (G-OBD) and Regularized Online Balanced Decent (R-OBD) and prove that both algorithms have an $O(m^{-1/2})$ competitive ratio. The result for G-OBD holds when the hitting costs are quasiconvex and the movement costs are the squared $\ell_2$ norm, while the result for R-OBD holds when the hitting costs are $m$-strongly convex and the movement costs are Bregman Divergences. Further, we show that R-OBD simultaneously achieves constant, dimension-free competitive ratio and sublinear regret when hitting costs are strongly convex. …

DeLTA google
Training convolutional neural networks (CNNs) requires intense compute throughput and high memory bandwidth. Especially, convolution layers account for the majority of the execution time of CNN training, and GPUs are commonly used to accelerate these layer workloads. GPU design optimization for efficient CNN training acceleration requires the accurate modeling of how their performance improves when computing and memory resources are increased. We present DeLTA, the first analytical model that accurately estimates the traffic at each GPU memory hierarchy level, while accounting for the complex reuse patterns of a parallel convolution algorithm. We demonstrate that our model is both accurate and robust for different CNNs and GPU architectures. We then show how this model can be used to carefully balance the scaling of different GPU resources for efficient CNN performance improvement. …

Adaptive Collaborative Similarity Learning (ACSL) google
In this paper, we investigate the research problem of unsupervised multi-view feature selection. Conventional solutions first simply combine multiple pre-constructed view-specific similarity structures into a collaborative similarity structure, and then perform the subsequent feature selection. These two processes are separate and independent. The collaborative similarity structure remains fixed during feature selection. Further, the simple undirected view combination may adversely reduce the reliability of the ultimate similarity structure for feature selection, as the view-specific similarity structures generally involve noises and outlying entries. To alleviate these problems, we propose an adaptive collaborative similarity learning (ACSL) for multi-view feature selection. We propose to dynamically learn the collaborative similarity structure, and further integrate it with the ultimate feature selection into a unified framework. Moreover, a reasonable rank constraint is devised to adaptively learn an ideal collaborative similarity structure with proper similarity combination weights and desirable neighbor assignment, both of which could positively facilitate the feature selection. An effective solution guaranteed with the proved convergence is derived to iteratively tackle the formulated optimization problem. Experiments demonstrate the superiority of the proposed approach. …

EXplicit interAction Model (EXAM) google
Text classification is one of the fundamental tasks in natural language processing. Recently, deep neural networks have achieved promising performance in the text classification task compared to shallow models. Despite of the significance of deep models, they ignore the fine-grained (matching signals between words and classes) classification clues since their classifications mainly rely on the text-level representations. To address this problem, we introduce the interaction mechanism to incorporate word-level matching signals into the text classification task. In particular, we design a novel framework, EXplicit interAction Model (dubbed as EXAM), equipped with the interaction mechanism. We justified the proposed approach on several benchmark datasets including both multi-label and multi-class text classification tasks. Extensive experimental results demonstrate the superiority of the proposed method. As a byproduct, we have released the codes and parameter settings to facilitate other researches. …

If you did not already know

Affective Computing google
Affective computing (sometimes called artificial emotional intelligence, or emotion AI) is the study and development of systems and devices that can recognize, interpret, process, and simulate human affects. It is an interdisciplinary field spanning computer science, psychology, and cognitive science. While the origins of the field may be traced as far back as to early philosophical inquiries into emotion, the more modern branch of computer science originated with Rosalind Picard’s 1995 paper on affective computing. A motivation for the research is the ability to simulate empathy. The machine should interpret the emotional state of humans and adapt its behavior to them, giving an appropriate response to those emotions. The difference between sentiment analysis and affective analysis is that the latter detects the different emotions instead of identifying only the polarity of the phrase. …

VWPPR google
Growing popularity of social networks demands a highly efficient Personalized PageRank (PPR) updating due to the fast-evolving web graphs of enormous size. While current researches are focusing on PPR updating under link structure modification, efficiently updating PPR when node insertion/ deletion involved remains a challenge. In the previous work called Virtual Web (VW), a few VW architectures are designed, which results in some highly effective initializations to significantly accelerate PageRank updating under both link modification and page insertion/deletion. In the paper, under the general scenario of link modification and node insertion/deletion we tackle the fast PPR updating problem. Specifically, we combine VW with the TrackingPPR method to generate initials, which are then used by the Gauss-Southwell method for fast PPR updating. The algorithm is named VWPPR method. In extensive experiments, three real-world datasets are used that contain 1~5.6M nodes and 6.7M~129M links, while a node perturbation of 40k and link perturbation of 1% are applied. Comparing to the more recent LazyForwardUpdate method, which handles the general PPR updating problem, the VWPPR method is 3~6 times faster in terms of running time, or 4.4~10 times faster in terms of iteration numbers. …

Fully Learnable Group Convolution Module (FLGC) google
Benefitted from its great success on many tasks, deep learning is increasingly used on low-computational-cost devices, e.g. smartphone, embedded devices, etc. To reduce the high computational and memory cost, in this work, we propose a fully learnable group convolution module (FLGC for short) which is quite efficient and can be embedded into any deep neural networks for acceleration. Specifically, our proposed method automatically learns the group structure in the training stage in a fully end-to-end manner, leading to a better structure than the existing pre-defined, two-steps, or iterative strategies. Moreover, our method can be further combined with depthwise separable convolution, resulting in 5 times acceleration than the vanilla Resnet50 on single CPU. An additional advantage is that in our FLGC the number of groups can be set as any value, but not necessarily 2^k as in most existing methods, meaning better tradeoff between accuracy and speed. As evaluated in our experiments, our method achieves better performance than existing learnable group convolution and standard group convolution when using the same number of groups. …

DeepFlow google
The calibration of a reservoir model with observed transient data of fluid pressures and rates is a key task in obtaining a predictive model of the flow and transport behaviour of the earth’s subsurface. The model calibration task, commonly referred to as ‘history matching’, can be formalised as an ill-posed inverse problem where we aim to find the underlying spatial distribution of petrophysical properties that explain the observed dynamic data. We use a generative adversarial network pretrained on geostatistical object-based models to represent the distribution of rock properties for a synthetic model of a hydrocarbon reservoir. The dynamic behaviour of the reservoir fluids is modelled using a transient two-phase incompressible Darcy formulation. We invert for the underlying reservoir properties by first modeling property distributions using the pre-trained generative model then using the adjoint equations of the forward problem to perform gradient descent on the latent variables that control the output of the generative model. In addition to the dynamic observation data, we include well rock-type constraints by introducing an additional objective function. Our contribution shows that for a synthetic test case, we are able to obtain solutions to the inverse problem by optimising in the latent variable space of a deep generative model, given a set of transient observations of a non-linear forward problem. …

If you did not already know

JavaScript Object Notation (JSON) google
JSON, or JavaScript Object Notation, is an open standard format that uses human-readable text to transmit data objects consisting of attribute-value pairs. It is used primarily to transmit data between a server and web application, as an alternative to XML. …

Causal Complementation Algorithm google
The theory of linear Diophantine equations in two unknowns over polynomial rings is used to construct causal lifting factorizations for causal two-channel FIR perfect reconstruction multirate filter banks and wavelet transforms. The Diophantine approach generates causal lifting factorizations satisfying certain polynomial degree-reducing inequalities, enabling a new lifting factorization strategy called the \emph{Causal Complementation Algorithm}. This provides an alternative to the noncausal lifting scheme based on the Extended Euclidean Algorithm for Laurent polynomials that was developed by Daubechies and Sweldens. The new approach, which can be regarded as Gaussian elimination in polynomial matrices, utilizes a generalization of polynomial division that ensures existence and uniqueness of quotients whose remainders satisfy user-specified divisibility constraints. The Causal Complementation Algorithm is shown to be more general than the Extended Euclidean Algorithm approach by generating causal lifting factorizations not obtainable using the polynomial Euclidean Algorithm. …

Best-Scored Random Forest Classification google
We propose an algorithm named best-scored random forest for binary classification problems. The terminology ‘best-scored’ means to select the one with the best empirical performance out of a certain number of purely random tree candidates as each single tree in the forest. In this way, the resulting forest can be more accurate than the original purely random forest. From the theoretical perspective, within the framework of regularized empirical risk minimization penalized on the number of splits, we establish almost optimal convergence rates for the proposed best-scored random trees under certain conditions which can be extended to the best-scored random forest. In addition, we present a counterexample to illustrate that in order to ensure the consistency of the forest, every dimension must have the chance to be split. In the numerical experiments, for the sake of efficiency, we employ an adaptive random splitting criterion. Comparative experiments with other state-of-art classification methods demonstrate the accuracy of our best-scored random forest. …

Berkeley Container Library (BCL) google
One-sided communication is a useful paradigm for irregular parallel applications, but most one-sided programming environments, including MPI’s one-sided interface and PGAS programming languages, lack application level libraries to support these applications. We present the Berkeley Container Library, a set of generic, cross-platform, high-performance data structures for irregular applications, including queues, hash tables, Bloom filters and more. BCL is written in C++ using an internal DSL called the BCL Core that provides one-sided communication primitives such as remote get and remote put operations. The BCL Core has backends for MPI, OpenSHMEM, GASNet-EX, and UPC++, allowing BCL data structures to be used natively in programs written using any of these programming environments. Along with our internal DSL, we present the BCL ObjectContainer abstraction, which allows BCL data structures to transparently serialize complex data types while maintaining efficiency for primitive types. We also introduce the set of BCL data structures and evaluate their performance across a number of high-performance computing systems, demonstrating that BCL programs are competitive with hand-optimized code, even while hiding many of the underlying details of message aggregation, serialization, and synchronization. …