# If you did not already know

Highly Efficient Network (HENet)
In order to enhance the real-time performance of convolutional neural networks(CNNs), more and more researchers are focusing on improving the efficiency of CNN. Based on the analysis of some CNN architectures, such as ResNet, DenseNet, ShuffleNet and so on, we combined their advantages and proposed a very efficient model called Highly Efficient Networks(HENet). The new architecture uses an unusual way to combine group convolution and channel shuffle which was mentioned in ShuffleNet. Inspired by ResNet and DenseNet, we also proposed a new way to use element-wise addition and concatenation connection with each block. In order to make greater use of feature maps, pooling operations are removed from HENet. The experiments show that our model’s efficiency is more than 1 times higher than ShuffleNet on many open source datasets, such as CIFAR-10/100 and SVHN. …

t-Exponential Memory Network
Recent advances in deep learning have brought to the fore models that can make multiple computational steps in the service of completing a task; these are capable of describ- ing long-term dependencies in sequential data. Novel recurrent attention models over possibly large external memory modules constitute the core mechanisms that enable these capabilities. Our work addresses learning subtler and more complex underlying temporal dynamics in language modeling tasks that deal with sparse sequential data. To this end, we improve upon these recent advances, by adopting concepts from the field of Bayesian statistics, namely variational inference. Our proposed approach consists in treating the network parameters as latent variables with a prior distribution imposed over them. Our statistical assumptions go beyond the standard practice of postulating Gaussian priors. Indeed, to allow for handling outliers, which are prevalent in long observed sequences of multivariate data, multivariate t-exponential distributions are imposed. On this basis, we proceed to infer corresponding posteriors; these can be used for inference and prediction at test time, in a way that accounts for the uncertainty in the available sparse training data. Specifically, to allow for our approach to best exploit the merits of the t-exponential family, our method considers a new t-divergence measure, which generalizes the concept of the Kullback-Leibler divergence. We perform an extensive experimental evaluation of our approach, using challenging language modeling benchmarks, and illustrate its superiority over existing state-of-the-art techniques. …

PaToPaEM
Grid topology and line parameters are essential for grid operation and planning, which may be missing or inaccurate in distribution grids. Existing data-driven approaches for recovering such information usually suffer from ignoring 1) input measurement errors and 2) possible state changes among historical measurements. While using the errors-in-variables (EIV) model and letting the parameter and topology estimation interact with each other (PaToPa) can address input and output measurement error modeling, it only works when all measurements are from a single system state. To solve the two challenges simultaneously, we propose the PaToPaEM framework for joint line parameter and topology estimation with historical measurements from different unknown states. We improve the static framework that only works when measurements are from one single state, by further treating state changes in historical measurements as an unobserved latent variable. We then systematically analyze the new mathematical modeling, decouple the optimization problem, and incorporate the expectation-maximization (EM) algorithm to recover different hidden states in measurements. Combining these, PaToPaEM framework enables joint topology and line parameter estimation using noisy measurements from multiple system states. It lays a solid foundation for data-driven system identification in distribution grids. Superior numerical results validate the practicability of the PaToPaEM framework. …

Universal Approximation Theorem
In the mathematical theory of artificial neural networks, the universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite number of neurons (i.e., a multilayer perceptron), can approximate continuous functions on compact subsets of Rn, under mild assumptions on the activation function. The theorem thus states that simple neural networks can represent a wide variety of interesting functions when given appropriate parameters; it does not touch upon the algorithmic learnability of those parameters. One of the first versions of the theorem was proved by George Cybenko in 1989 for sigmoid activation functions. Kurt Hornik showed in 1991 that it is not the specific choice of the activation function, but rather the multilayer feedforward architecture itself which gives neural networks the potential of being universal approximators. The output units are always assumed to be linear. For notational convenience, only the single output case will be shown. The general case can easily be deduced from the single output case. …

# If you did not already know

Linear Memory Network
Recurrent neural networks can learn complex transduction problems that require maintaining and actively exploiting a memory of their inputs. Such models traditionally consider memory and input-output functionalities indissolubly entangled. We introduce a novel recurrent architecture based on the conceptual separation between the functional input-output transformation and the memory mechanism, showing how they can be implemented through different neural components. By building on such conceptualization, we introduce the Linear Memory Network, a recurrent model comprising a feedforward neural network, realizing the non-linear functional transformation, and a linear autoencoder for sequences, implementing the memory component. The resulting architecture can be efficiently trained by building on closed-form solutions to linear optimization problems. Further, by exploiting equivalence results between feedforward and recurrent neural networks we devise a pretraining schema for the proposed architecture. Experiments on polyphonic music datasets show competitive results against gated recurrent networks and other state of the art models. …

Partial Decision-DNNF
Model counting is the problem of computing the number of satisfying assignments of a given propositional formula. Although exact model counters can be naturally furnished by most of the knowledge compilation (KC) methods, in practice, they fail to generate the compiled results for the exact counting of models for certain formulas due to the explosion in sizes. Decision-DNNF is an important KC language that captures most of the practical compilers. We propose a generalized Decision-DNNF (referred to as partial Decision-DNNF) via introducing a class of new leaf vertices (called unknown vertices), and then propose an algorithm called PartialKC to generate randomly partial Decision-DNNF formulas from the given formulas. An unbiased estimate of the model number can be computed via a randomly partial Decision-DNNF formula. Each calling of PartialKC consists of multiple callings of MicroKC, while each of the latter callings is a process of importance sampling equipped with KC technologies. The experimental results show that PartialKC is more accurate than both SampleSearch and SearchTreeSampler, PartialKC scales better than SearchTreeSampler, and the KC technologies can obviously accelerate sampling. …

Generative Adversarial Networks (GANs) have achieved great success in generating realistic images. Most of these are conditional models, although acquisition of class labels is expensive and time-consuming in practice. To reduce the dependence on labeled data, we propose an un-conditional generative adversarial model, called K-Means-generative adversarial model (KM-GAN), which incorporates the idea of updating centers in K-Means into GANs. Specifically, we redesign the framework of GANs by applying K-Means on the features extracted from the discriminator. With obtained labels from K-Means, we propose new objective functions from the perspective of deep metric learning (DML). Distinct from previous works, the discriminator is treated as a feature extractor rather than a classifier in KM-GAN, meanwhile utilization of K-Means makes features of the discriminator more representative. Experiments are conducted on various datasets, such as MNIST, Fashion-10, CIFAR-10 and CelebA, and show that the quality of samples generated by KM-GAN is comparable to some conditional generative adversarial models. …

jackknife+
This paper introduces the jackknife+, which is a novel method for constructing predictive confidence intervals. Whereas the jackknife outputs an interval centered at the predicted response of a test point, with the width of the interval determined by the quantiles of leave-one-out residuals, the jackknife+ also uses the leave-one-out predictions at the test point to account for the variability in the fitted regression function. Assuming exchangeable training samples, we prove that this crucial modification permits rigorous coverage guarantees regardless of the distribution of the data points, for any algorithm that treats the training points symmetrically. Such guarantees are not possible for the original jackknife and we demonstrate examples where the coverage rate may actually vanish. Our theoretical and empirical analysis reveals that the jackknife and the jackknife+ intervals achieve nearly exact coverage and have similar lengths whenever the fitting algorithm obeys some form of stability. Further, we extend the jackknife+ to K-fold cross validation and similarly establish rigorous coverage properties. Our methods are related to cross-conformal prediction proposed by Vovk [2015] and we discuss connections. …

# If you did not already know

Randomized Block Cubic Newton (RBCN)
We study the problem of minimizing the sum of three convex functions: a differentiable, twice-differentiable and a non-smooth term in a high dimensional setting. To this effect we propose and analyze a randomized block cubic Newton (RBCN) method, which in each iteration builds a model of the objective function formed as the sum of the natural models of its three components: a linear model with a quadratic regularizer for the differentiable term, a quadratic model with a cubic regularizer for the twice differentiable term, and perfect (proximal) model for the nonsmooth term. Our method in each iteration minimizes the model over a random subset of blocks of the search variable. RBCN is the first algorithm with these properties, generalizing several existing methods, matching the best known bounds in all special cases. We establish ${\cal O}(1/\epsilon)$, ${\cal O}(1/\sqrt{\epsilon})$ and ${\cal O}(\log (1/\epsilon))$ rates under different assumptions on the component functions. Lastly, we show numerically that our method outperforms the state-of-the-art on a variety of machine learning problems, including cubically regularized least-squares, logistic regression with constraints, and Poisson regression. …

Gaussian Process Latent Variable Alignment Learning
We present a model that can automatically learn alignments between high-dimensional data in an unsupervised manner. Learning alignments is an ill-constrained problem as there are many different ways of defining a good alignment. Our proposed method casts alignment learning in a framework where both alignment and data are modelled simultaneously. We derive a probabilistic model built on non-parametric priors that allows for flexible warps while at the same time providing means to specify interpretable constraints. We show results on several datasets, including different motion capture sequences and show that the suggested model outperform the classical algorithmic approaches to the alignment task. …

Multimodal Deep Network Embedding (MDNE)
Network embedding is the process of learning low-dimensional representations for nodes in a network, while preserving node features. Existing studies only leverage network structure information and focus on preserving structural features. However, nodes in real-world networks often have a rich set of attributes providing extra semantic information. It has been demonstrated that both structural and attribute features are important for network analysis tasks. To preserve both features, we investigate the problem of integrating structure and attribute information to perform network embedding and propose a Multimodal Deep Network Embedding (MDNE) method. MDNE captures the non-linear network structures and the complex interactions among structures and attributes, using a deep model consisting of multiple layers of non-linear functions. Since structures and attributes are two different types of information, a multimodal learning method is adopted to pre-process them and help the model to better capture the correlations between node structure and attribute information. We employ both structural proximity and attribute proximity in the loss function to preserve the respective features and the representations are obtained by minimizing the loss function. Results of extensive experiments on four real-world datasets show that the proposed method performs significantly better than baselines on a variety of tasks, which demonstrate the effectiveness and generality of our method. …

Scale-Adaptive Neural Dense Features (SAND Features)
How do computers and intelligent agents view the world around them? Feature extraction and representation constitutes one the basic building blocks towards answering this question. Traditionally, this has been done with carefully engineered hand-crafted techniques such as HOG, SIFT or ORB. However, there is no “one size fits all” approach that satisfies all requirements. In recent years, the rising popularity of deep learning has resulted in a myriad of end-to-end solutions to many computer vision problems. These approaches, while successful, tend to lack scalability and can’t easily exploit information learned by other systems. Instead, we propose SAND features, a dedicated deep learning solution to feature extraction capable of providing hierarchical context information. This is achieved by employing sparse relative labels indicating relationships of similarity/dissimilarity between image locations. The nature of these labels results in an almost infinite set of dissimilar examples to choose from. We demonstrate how the selection of negative examples during training can be used to modify the feature space and vary it’s properties. To demonstrate the generality of this approach, we apply the proposed features to a multitude of tasks, each requiring different properties. This includes disparity estimation, semantic segmentation, self-localisation and SLAM. In all cases, we show how incorporating SAND features results in better or comparable results to the baseline, whilst requiring little to no additional training. Code can be found at: https://…/SAND_features

# If you did not already know

Expectation-Biasing
State-of-the-art forecasting methods using Recurrent Neural Net- works (RNN) based on Long-Short Term Memory (LSTM) cells have shown exceptional performance targeting short-horizon forecasts, e.g given a set of predictor features, forecast a target value for the next few time steps in the future. However, in many applications, the performance of these methods decays as the forecasting horizon extends beyond these few time steps. This paper aims to explore the challenges of long-horizon forecasting using LSTM networks. Here, we illustrate the long-horizon forecasting problem in datasets from neuroscience and energy supply management. We then propose expectation-biasing, an approach motivated by the literature of Dynamic Belief Networks, as a solution to improve long-horizon forecasting using LSTMs. We propose two LSTM architectures along with two methods for expectation biasing that significantly outperforms standard practice. …

Binary Weight and Hadamard-transformed Image Network (BWHIN)
Deep learning has made significant improvements at many image processing tasks in recent years, such as image classification, object recognition and object detection. Convolutional neural networks (CNN), which is a popular deep learning architecture designed to process data in multiple array form, show great success to almost all detection \& recognition problems and computer vision tasks. However, the number of parameters in a CNN is too high such that the computers require more energy and larger memory size. In order to solve this problem, we propose a novel energy efficient model Binary Weight and Hadamard-transformed Image Network (BWHIN), which is a combination of Binary Weight Network (BWN) and Hadamard-transformed Image Network (HIN). It is observed that energy efficiency is achieved with a slight sacrifice at classification accuracy. Among all energy efficient networks, our novel ensemble model outperforms other energy efficient models. …

Apache Pulsar
Pulsar is a distributed pub-sub messaging platform with a very flexible messaging model and an intuitive client API. …

TuckER
Knowledge graphs are structured representations of real world facts. However, they typically contain only a small subset of all possible facts. Link prediction is a task of inferring missing facts based on existing ones. We propose TuckER, a relatively simple but powerful linear model based on Tucker decomposition of the binary tensor representation of knowledge graph triples. TuckER outperforms all previous state-of-the-art models across standard link prediction datasets. We prove that TuckER is a fully expressive model, deriving the bound on its entity and relation embedding dimensionality for full expressiveness which is several orders of magnitude smaller than the bound of previous state-of-the-art models ComplEx and SimplE. We further show that several previously introduced linear models can be viewed as special cases of TuckER. …

# If you did not already know

Independent and Periodically Identically Distributed Processes (i.p.i.d.)
A new class of stochastic processes called independent and periodically identically distributed (i.p.i.d.) processes is defined to capture periodically varying statistical behavior. Algorithms are proposed to detect changes in such i.p.i.d. processes. It is shown that the algorithms can be computed recursively and are asymptotically optimal. This problem has applications in anomaly detection in traffic data, social network data, and neural data, where periodic statistical behavior has been observed. …

C Math Library (CML)

Multiple Search Neuroevolution (MSN)
This paper presents an evolutionary metaheuristic called Multiple Search Neuroevolution (MSN) to optimize deep neural networks. The algorithm attempts to search multiple promising regions in the search space simultaneously, maintaining sufficient distance between them. It is tested by training neural networks for two tasks, and compared with other optimization algorithms. The first task is to solve Global Optimization functions with challenging topographies. We found to MSN to outperform classic optimization algorithms such as Evolution Strategies, reducing the number of optimization steps performed by at least 2X. The second task is to train a convolutional neural network (CNN) on the popular MNIST dataset. Using 3.33% of the training set, MSN reaches a validation accuracy of 90%. Stochastic Gradient Descent (SGD) was able to match the same accuracy figure, while taking 7X less optimization steps. Despite lagging, the fact that the MSN metaheurisitc trains a 4.7M-parameter CNN suggests promise for future development. This is by far the largest network ever evolved using a pool of only 50 samples. …

Nested Averaged Stochastic Approximation (NASA)
We study constrained nested stochastic optimization problems in which the objective function is a composition of two smooth functions whose exact values and derivatives are not available. We propose a single time-scale stochastic approximation algorithm, which we call the Nested Averaged Stochastic Approximation (NASA), to find an approximate stationary point of the problem. The algorithm has two auxiliary averaged sequences (filters) which estimate the gradient of the composite objective function and the inner function value. By using a special Lyapunov function, we show that NASA achieves the sample complexity of ${\cal O}(1/\epsilon^{2})$ for finding an $\epsilon$-approximate stationary point, thus outperforming all extant methods for nested stochastic approximation. Our method and its analysis are the same for both unconstrained and constrained problems, without any need of batch samples for constrained nonconvex stochastic optimization. We also present a simplified variant of the NASA method for solving constrained single level stochastic optimization problems, and we prove the same complexity result for both unconstrained and constrained problems. …

# If you did not already know

HDTCat
HDT (Header, Dictionary, Triples) is a serialization for RDF. HDT has become very popular in the last years because it allows to store RDF data with a small disk footprint, while remaining at the same time queriable. For this reason HDT is often used when scalability becomes an issue. Once RDF data is serialized into HDT, the disk footprint to store it and the memory footprint to query it are very low. However, generating HDT files from raw text RDF serializations (like N-Triples) is a time-consuming and (especially) memory-consuming task. In this publication we present HDTCat, an algorithm and command line tool to join two HDT files with low memory footprint. HDTCat can be used in a divide-and-conquer strategy to generate HDT files from huge datasets using a low-memory footprint. …

Syntax-Aware Word Representation (SAWR)
Syntax has been demonstrated highly effective in neural machine translation (NMT). Previous NMT models integrate syntax by representing 1-best tree outputs from a well-trained parsing system, e.g., the representative Tree-RNN and Tree-Linearization methods, which may suffer from error propagation. In this work, we propose a novel method to integrate source-side syntax implicitly for NMT. The basic idea is to use the intermediate hidden representations of a well-trained end-to-end dependency parser, which are referred to as syntax-aware word representations (SAWRs). Then, we simply concatenate such SAWRs with ordinary word embeddings to enhance basic NMT models. The method can be straightforwardly integrated into the widely-used sequence-to-sequence (Seq2Seq) NMT models. We start with a representative RNN-based Seq2Seq baseline system, and test the effectiveness of our proposed method on two benchmark datasets of the Chinese-English and English-Vietnamese translation tasks, respectively. Experimental results show that the proposed approach is able to bring significant BLEU score improvements on the two datasets compared with the baseline, 1.74 points for Chinese-English translation and 0.80 point for English-Vietnamese translation, respectively. In addition, the approach also outperforms the explicit Tree-RNN and Tree-Linearization methods. …

KPynq
K-means is a popular but computation-intensive algorithm for unsupervised learning. To address this issue, we present KPynq, a work-efficient triangle-inequality based K-means on FPGA for handling large-size, high-dimension datasets. KPynq leverages an algorithm-level optimization to balance the performance and computation irregularity, and a hardware architecture design to fully exploit the pipeline and parallel processing capability of various FPGAs. In the experiment, KPynq consistently outperforms the CPU-based standard K-means in terms of its speedup (up to 4.2x) and significant energy-efficiency (up to 218x). …

GnnExplainer
Graph Neural Networks (GNNs) are a powerful tool for machine learning on graphs. GNNs combine node feature information with the graph structure by using neural networks to pass messages through edges in the graph. However, incorporating both graph structure and feature information leads to complex non-linear models and explaining predictions made by GNNs remains to be a challenging task. Here we propose GnnExplainer, a general model-agnostic approach for providing interpretable explanations for predictions of any GNN-based model on any graph-based machine learning task (node and graph classification, link prediction). In order to explain a given node’s predicted label, GnnExplainer provides a local interpretation by highlighting relevant features as well as an important subgraph structure by identifying the edges that are most relevant to the prediction. Additionally, the model provides single-instance explanations when given a single prediction as well as multi-instance explanations that aim to explain predictions for an entire class of instances/nodes. We formalize GnnExplainer as an optimization task that maximizes the mutual information between the prediction of the full model and the prediction of simplified explainer model. We experiment on synthetic as well as real-world data. On synthetic data we demonstrate that our approach is able to highlight relevant topological structures from noisy graphs. We also demonstrate GnnExplainer to provide a better understanding of pre-trained models on real-world tasks. GnnExplainer provides a variety of benefits, from the identification of semantically relevant structures to explain predictions to providing guidance when debugging faulty graph neural network models. …

# If you did not already know

In this paper we study the convergence of generative adversarial networks (GANs) from the perspective of the informativeness of the gradient of the optimal discriminative function. We show that GANs without restriction on the discriminative function space commonly suffer from the problem that the gradient produced by the discriminator is uninformative to guide the generator. By contrast, Wasserstein GAN (WGAN), where the discriminative function is restricted to $1$-Lipschitz, does not suffer from such a gradient uninformativeness problem. We further show in the paper that the model with a compact dual form of Wasserstein distance, where the Lipschitz condition is relaxed, also suffers from this issue. This implies the importance of Lipschitz condition and motivates us to study the general formulation of GANs with Lipschitz constraint, which leads to a new family of GANs that we call Lipschitz GANs (LGANs). We show that LGANs guarantee the existence and uniqueness of the optimal discriminative function as well as the existence of a unique Nash equilibrium. We prove that LGANs are generally capable of eliminating the gradient uninformativeness problem. According to our empirical analysis, LGANs are more stable and generate consistently higher quality samples compared with WGAN. …

Since with massive data growth, the need for autonomous and generic anomaly detection system is increased. However, developing one stand-alone generic anomaly detection system that is accurate and fast is still a challenge. In this paper, we propose conventional time-series analysis approaches, the Seasonal Autoregressive Integrated Moving Average (SARIMA) model and Seasonal Trend decomposition using Loess (STL), to detect complex and various anomalies. Usually, SARIMA and STL are used only for stationary and periodic time-series, but by combining, we show they can detect anomalies with high accuracy for data that is even noisy and non-periodic. We compared the algorithm to Long Short Term Memory (LSTM), a deep-learning-based algorithm used for anomaly detection system. We used a total of seven real-world datasets and four artificial datasets with different time-series properties to verify the performance of the proposed algorithm. …

Cloud AutoML (AutoML)
Train high-quality custom machine learning models with minimum effort and machine learning expertise.
• Train custom machine learning models: Cloud AutoML is a suite of machine learning products that enables developers with limited machine learning expertise to train high-quality models specific to their business needs, by leveraging Google´s state-of-the-art transfer learning, and Neural Architecture Search technology.
• State-of-the-art performance: Use Cloud AutoML to leverage Google´s proprietary technology, which offers fast performance and accurate predictions. AutoML puts more than 10 years of Google Research technology in the hands of our users.
• Get up and running fast: Cloud AutoML provides a simple graphical user interface (GUI) for you to train, evaluate, improve, and deploy models based on your own data. You´re only a few minutes away from your own custom machine learning model.
• Generate high-quality training data: You can use Google´s human labeling service to have real-life people annotate or clean your labels to make sure your models are being trained on high-quality data. …

Outlier Exposure
It is important to detect and handle anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data commonly used by deep learning systems are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This approach enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments in vision and natural language processing settings, we find that Outlier Exposure significantly improves the detection performance. Our approach is even applicable to density estimation models and anomaly detectors for large-scale images. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance. …

# If you did not already know

Decision Intelligence
Decision intelligence is an engineering discipline that augments data science with theory from social science, decision theory, and managerial science. Its application provides a framework for best practices in organizational decision-making and processes for applying machine learning at scale. The basic idea is that decisions are based on our understanding of how actions lead to outcomes. Decision intelligence is a discipline for analyzing this chain of cause and effect, and decision modeling is a visual language for representing these chains. A related field, decision engineering, also investigates the improvement of decision-making processes but is not always as closely tied to data science. …

Pix2Vex
We present a novel approach to 3D object reconstruction from its 2D projections. Our unique, GAN-inspired system employs a novel $C^\infty$ smooth differentiable renderer. Unlike the state-of-the-art, our renderer does not display any discontinuities at occlusions and dis-occlusions, facilitating training without 3D supervision and only minimal 2D supervision. Through domain adaptation and a novel training scheme, our network, the Reconstructive Adversarial Network (RAN), is able to train on different types of images. In contrast, previous work can only train on images of a similar appearance to those rendered by a differentiable renderer. We validate our reconstruction method through three shape classes from ShapeNet, and demonstrate that our method is robust to perturbations in view directions, different lighting conditions, and levels of texture details. …

Morpheo
Morpheo is a transparent and secure machine learning platform collecting and analysing large datasets. It aims at building state-of-the art prediction models in various fields where data are sensitive. Indeed, it offers strong privacy of data and algorithm, by preventing anyone to read the data, apart from the owner and the chosen algorithms. Computations in Morpheo are orchestrated by a blockchain infrastructure, thus offering total traceability of operations. Morpheo aims at building an attractive economic ecosystem around data prediction by channelling crypto-money from prediction requests to useful data and algorithms providers. Morpheo is designed to handle multiple data sources in a transfer learning approach in order to mutualize knowledge acquired from large datasets for applications with smaller but similar datasets. …

Lifelong Bayesian Optimization (LBO)
Automatic Machine Learning (Auto-ML) systems tackle the problem of automating the design of prediction models or pipelines for data science. In this paper, we present Lifelong Bayesian Optimization (LBO), an online, multitask Bayesian optimization (BO) algorithm designed to solve the problem of model selection for datasets arriving and evolving over time. To be suitable for Lifelong Bayesian Optimization, an algorithm needs to scale with the ever-increasing size of the dataset, and should be able to leverage past optimizations in learning the current best model. We cast the problem of model selection as a black-box function optimization problem. In LBO, we exploit the correlation between functions by using components of previously learned functions to speed up the learning process for newly arriving datasets. Experiments on real and synthetic data show that LBO outperforms standard BO algorithms applied repeatedly on the data. …

# If you did not already know

Additive white Gaussian noise (AWGN) is a basic noise model used in Information theory to mimic the effect of many random processes that occur in nature. The modifiers denote specific characteristics:
• Additive because it is added to any noise that might be intrinsic to the information system.
• White refers to the idea that it has uniform power across the frequency band for the information system. It is an analogy to the color white which has uniform emissions at all frequencies in the visible spectrum.
• Gaussian because it has a normal distribution in the time domain with an average time domain value of zero.
Wideband noise comes from many natural noise, such as the thermal vibrations of atoms in conductors (referred to as thermal noise or Johnson-Nyquist noise), shot noise, black-body radiation from the earth and other warm objects, and from celestial sources such as the Sun. The central limit theorem of probability theory indicates that the summation of many random processes will tend to have distribution called Gaussian or Normal. …

Collaborative Cross Network (CoNet)
The cross-domain recommendation technique is an effective way of alleviating the data sparsity in recommender systems by leveraging the knowledge from relevant domains. Transfer learning is a class of algorithms underlying these techniques. In this paper, we propose a novel transfer learning approach for cross-domain recommendation by using neural networks as the base model. We assume that hidden layers in two base networks are connected by cross mappings, leading to the collaborative cross networks (CoNet). CoNet enables dual knowledge transfer across domains by introducing cross connections from one base network to another and vice versa. CoNet is achieved in multi-layer feedforward networks by adding dual connections and joint loss functions, which can be trained efficiently by back-propagation. The proposed model is evaluated on two real-world datasets and it outperforms baseline models by relative improvements of 3.56\% in MRR and 8.94\% in NDCG, respectively. …

The present paper proposes generalized Gaussian kernel adaptive filtering, where the kernel parameters are adaptive and data-driven. The Gaussian kernel is parametrized by a center vector and a symmetric positive definite (SPD) precision matrix, which is regarded as a generalization of the scalar width parameter. These parameters are adaptively updated on the basis of a proposed least-square-type rule to minimize the estimation error. The main contribution of this paper is to establish update rules for precision matrices on the SPD manifold in order to keep their symmetric positive-definiteness. Different from conventional kernel adaptive filters, the proposed regressor is a superposition of Gaussian kernels with all different parameters, which makes such regressor more flexible. The kernel adaptive filtering algorithm is established together with a l1-regularized least squares to avoid overfitting and the increase of dimensionality of the dictionary. Experimental results confirm the validity of the proposed method. …

ScoutBot
ScoutBot is a dialogue interface to physical and simulated robots that supports collaborative exploration of environments. The demonstration will allow users to issue unconstrained spoken language commands to ScoutBot. ScoutBot will prompt for clarification if the user’s instruction needs additional input. It is trained on human-robot dialogue collected from Wizard-of-Oz experiments, where robot responses were initiated by a human wizard in previous interactions. The demonstration will show a simulated ground robot (Clearpath Jackal) in a simulated environment supported by ROS (Robot Operating System). …

# If you did not already know

Robust Semi-Supervised Adaptive Concept Factorization (RS2ACF)
Constrained Concept Factorization (CCF) yields the enhanced representation ability over CF by incorporating label information as additional constraints, but it cannot classify and group unlabeled data appropriately. Minimizing the difference between the original data and its reconstruction directly can enable CCF to model a small noisy perturbation, but is not robust to gross sparse errors. Besides, CCF cannot preserve the manifold structures in new representation space explicitly, especially in an adaptive manner. In this paper, we propose a joint label prediction based Robust Semi-Supervised Adaptive Concept Factorization (RS2ACF) framework. To obtain robust representation, RS2ACF relaxes the factorization to make it simultaneously stable to small entrywise noise and robust to sparse errors. To enrich prior knowledge to enhance the discrimination, RS2ACF clearly uses class information of labeled data and more importantly propagates it to unlabeled data by jointly learning an explicit label indicator for unlabeled data. By the label indicator, RS2ACF can ensure the unlabeled data of the same predicted label to be mapped into the same class in feature space. Besides, RS2ACF incorporates the joint neighborhood reconstruction error over the new representations and predicted labels of both labeled and unlabeled data, so the manifold structures can be preserved explicitly and adaptively in the representation space and label space at the same time. Owing to the adaptive manner, the tricky process of determining the neighborhood size or kernel width can be avoided. Extensive results on public databases verify that our RS2ACF can deliver state-of-the-art data representation, compared with other related methods. …

Probabilistic Data Structure
Probabilistic data structures are a group of data structures that are extremely useful for big data and streaming applications. Generally speaking, these data structures use hash functions to randomize and compactly represent a set of items. Collisions are ignored but errors can be well-controlled under certain threshold. Comparing with error-free approaches, these algorithms use much less memory and have constant query time. They usually support union and intersection operations and therefore can be easily parallelized.
http://…/Category:Probabilistic_data_structures

SincNet
Deep learning is currently playing a crucial role toward higher levels of artificial intelligence. This paradigm allows neural networks to learn complex and abstract representations, that are progressively obtained by combining simpler ones. Nevertheless, the internal ‘black-box’ representations automatically discovered by current neural architectures often suffer from a lack of interpretability, making of primary interest the study of explainable machine learning techniques. This paper summarizes our recent efforts to develop a more interpretable neural model for directly processing speech from the raw waveform. In particular, we propose SincNet, a novel Convolutional Neural Network (CNN) that encourages the first layer to discover more meaningful filters by exploiting parametrized sinc functions. In contrast to standard CNNs, which learn all the elements of each filter, only low and high cutoff frequencies of band-pass filters are directly learned from data. This inductive bias offers a very compact way to derive a customized filter-bank front-end, that only depends on some parameters with a clear physical meaning. Our experiments, conducted on both speaker and speech recognition, show that the proposed architecture converges faster, performs better, and is more interpretable than standard CNNs. …

Transferlearning Oriented Minority Over-Sampling Technique (TOMO)
Cross-project defect prediction (CPDP) aims to predict defects of projects lacking training data by using prediction models trained on historical defect data from other projects. However, since the distribution differences between datasets from different projects, it is still a challenge to build high-quality CPDP models. Unfortunately, class imbalanced nature of software defect datasets further increases the difficulty. In this paper, we propose a transferlearning oriented minority over-sampling technique (TOMO) based feature weighting transfer naive Bayes (FWTNB) approach (TOMOFWTNB) for CPDP by considering both classimbalance and feature importance problems. Differing from traditional over-sampling techniques, TOMO not only can balance the data but reduce the distribution difference. And then FWTNB is used to further increase the similarity of two distributions. Experiments are performed on 11 public defect datasets. The experimental results show that (1) TOMO improves the average G-Measure by 23.7\%$\sim$41.8\%, and the average MCC by 54.2\%$\sim$77.8\%. (2) feature weighting (FW) strategy improves the average G-Measure by 11\%, and the average MCC by 29.2\%. (3) TOMOFWTNB improves the average G-Measure value by at least 27.8\%, and the average MCC value by at least 71.5\%, compared with existing state-of-theart CPDP approaches. It can be concluded that (1) TOMO is very effective for addressing class-imbalance problem in CPDP scenario; (2) our FW strategy is helpful for CPDP; (3) TOMOFWTNB outperforms previous state-of-the-art CPDP approaches. …