We present a developmental framework based on a long-term memory and reasoning mechanisms (Vision Similarity and Bayesian Optimisation). This architecture allows a robot to autonomously optimize the hyper-parameters that need to be tuned for any action and/or vision module, treated as a black box. The learning can take advantage of past experiences (stored in the episodic and procedural memories) in order to warm-start the exploration using a set of hyper-parameters previously optimized for objects similar to the new unknown one (stored in a semantic memory). As an example, the system has been used to optimize 9 continuous hyper-parameters of a professional software package (Kamido), both in simulation and with a real robot (a Fanuc industrial robotic arm), with a total of 13 different objects. The robot is able to find a good object-specific optimization in 68 (simulation) or 40 (real) trials. In simulation, we demonstrate the benefit of the transfer learning based on visual similarity, as opposed to an amnesic learning (i.e. learning from scratch all the time). Moreover, with the real robot, we show that the method consistently outperforms the manual optimization from an expert, requiring less than 2 hours of training time to achieve a success rate of more than 88%.
Having the right assortment of shipping boxes in the fulfillment warehouse to pack and ship customers’ online orders is an indispensable and integral part of today’s eCommerce business, as it not only helps maintain a profitable business but also creates great experiences for customers. However, it is an extremely challenging operations task to strategically select the best combination of tens of box sizes from thousands of feasible ones to serve the hundreds of thousands of orders placed daily on millions of inventory products. In this paper, we present a machine learning approach to tackle the task by formulating the box design problem prescriptively as a generalized version of the weighted $k$-medoids clustering problem, where the parameters are estimated through a variety of descriptive analytics. We test this machine learning approach on fulfillment data collected from Walmart U.S. eCommerce, and our approach is shown to be capable of improving the box utilization rate by more than $10\%$.
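To make the clustering objective concrete, here is a minimal sketch of plain weighted $k$-medoids: choose $k$ of the candidate points as medoids so that the weighted sum of distances from every point to its nearest medoid is minimal. It is an illustration under stated assumptions, not the paper's generalized formulation: the function name, the exhaustive search, and the one-dimensional toy data are all hypothetical, and the search is only practical for very small inputs.

```python
from itertools import combinations

def weighted_k_medoids(points, weights, k, dist):
    """Exhaustive weighted k-medoids (toy sizes only).

    Returns (medoid_indices, cost), where cost is
    sum_i weights[i] * min over chosen medoids m of dist(points[i], points[m]).
    """
    best_cost, best_medoids = float("inf"), None
    for medoids in combinations(range(len(points)), k):
        cost = sum(
            w * min(dist(p, points[m]) for m in medoids)
            for p, w in zip(points, weights)
        )
        if cost < best_cost:
            best_cost, best_medoids = cost, medoids
    return best_medoids, best_cost

# Hypothetical toy data: four 1-D "sizes" with demand weights.
points = [1, 2, 10, 11]
weights = [3, 1, 1, 5]
medoids, cost = weighted_k_medoids(points, weights, 2, lambda a, b: abs(a - b))
print(medoids, cost)
```

The weights pull the medoids towards heavily demanded points: here the two chosen medoids are the points with weights 3 and 5.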
Query optimization remains one of the most important and well-studied problems in database systems. However, traditional query optimizers are complex heuristically-driven systems, requiring large amounts of time to tune for a particular database and requiring even more time to develop and maintain in the first place. In this vision paper, we argue that a new type of query optimizer, based on deep reinforcement learning, can drastically improve on the state-of-the-art. We identify potential complications for future research that integrates deep learning with query optimization and describe three novel deep learning based approaches that can lead the way to end-to-end learning-based query optimizers.
Evaluating the computational reproducibility of data analysis pipelines has become a critical issue. It is, however, a cumbersome process for analyses that involve data from large populations of subjects, due to their computational and storage requirements. We present a method to predict the computational reproducibility of data analysis pipelines in large population studies. We formulate the problem as a collaborative filtering process, with constraints on the construction of the training set. We propose 6 different strategies to build the training set, which we evaluate on 2 datasets: a synthetic one modeling a population with a growing number of subject types, and a real one obtained with neuroinformatics pipelines. Results show that one sampling method, ‘Random File Numbers (Uniform)’, is able to predict computational reproducibility with good accuracy. We also analyze the relevance of including file and subject biases in the collaborative filtering model. We conclude that the proposed method is able to speed up reproducibility evaluations substantially, with a reduced accuracy loss.
Gradient control plays an important role in feed-forward networks applied to various computer vision tasks. Previous work has shown that Recurrent Highway Networks minimize the problem of vanishing or exploding gradients. They achieve this by setting the eigenvalues of the temporal Jacobian to 1 across the time steps. In this work, batch normalized recurrent highway networks are proposed to control the gradient flow in an improved way for network convergence. Specifically, the introduced model can be formed by batch normalizing the inputs at each recurrence loop. The proposed model is tested on an image captioning task using the MSCOCO dataset. Experimental results indicate that the batch normalized recurrent highway networks converge faster and perform better than the traditional LSTM and RHN based models.
Total correlation (‘TC’) and dual total correlation (‘DTC’) are two classical measures of correlation for an $n$-tuple of random variables. They both reduce to mutual information when $n=2$. The first part of this paper sets up the theory of TC and DTC for general random variables, not necessarily finite-valued. This generality has not been exposed in the literature before. The second part considers the structural implications when a joint distribution $\mu$ has small TC or DTC. If $\mathrm{TC}(\mu) = o(n)$, then $\mu$ is close in the transportation metric to a product measure: this follows quickly from Marton’s classical transportation-entropy inequality. On the other hand, if $\mathrm{DTC}(\mu) = o(n)$, then the structural consequence is more complicated: $\mu$ is close to a mixture of a controlled number of terms, most of them close to product measures in the transportation metric. This is the main new result of the paper.
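For orientation, the standard definitions in the finite-valued case can be written with Shannon entropy $\mathrm{H}$ as follows. This is the textbook formulation, offered as a sketch; the paper's treatment of general (not necessarily finite-valued) random variables is not reproduced here, and the notation $X_{[n]\setminus i}$ for "all coordinates except the $i$-th" is an assumption of this note.

```latex
\begin{align*}
\mathrm{TC}(X_1,\dots,X_n)  &= \sum_{i=1}^{n} \mathrm{H}(X_i) - \mathrm{H}(X_1,\dots,X_n), \\
\mathrm{DTC}(X_1,\dots,X_n) &= \mathrm{H}(X_1,\dots,X_n) - \sum_{i=1}^{n} \mathrm{H}\bigl(X_i \,\big|\, X_{[n]\setminus i}\bigr).
\end{align*}
% Case n = 2: both quantities equal the mutual information I(X_1;X_2).
\begin{align*}
\mathrm{TC}(X_1,X_2)  &= \mathrm{H}(X_1) + \mathrm{H}(X_2) - \mathrm{H}(X_1,X_2) = \mathrm{I}(X_1;X_2), \\
\mathrm{DTC}(X_1,X_2) &= \mathrm{H}(X_1,X_2) - \mathrm{H}(X_1 \mid X_2) - \mathrm{H}(X_2 \mid X_1) = \mathrm{I}(X_1;X_2).
\end{align*}
```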
We propose a generic numerical measure of the inconsistency of a database with respect to a set of integrity constraints. It is based on an abstract repair semantics. In particular, an inconsistency measure associated to cardinality-repairs is investigated in detail. More specifically, it is shown that it can be computed via answer-set programs, but sometimes its computation can be intractable in data complexity. However, polynomial-time fixed-parameter exact computation, and also deterministic and randomized approximations are exhibited. The behavior of this measure under small updates is analyzed. Furthermore, alternative inconsistency measures are proposed and discussed.
Financial forecasting is challenging and attractive in machine learning. Many classic solutions, as well as many deep learning based methods, have been proposed to deal with it, yielding encouraging performance. Stock time series forecasting is the most representative problem in financial forecasting. Due to the strong connections among stocks, the information valuable for forecasting is contained not only in individual stocks but also in the stocks related to them. However, most previous works focus on a single stock, which easily ignores the valuable information in others. To leverage more information, in this paper we propose a joint forecasting approach that processes multiple time series of related stocks simultaneously, using a multi-task learning framework. Compared to previous works, we use multiple networks to forecast multiple related stocks, using their shared and private information simultaneously through multi-task learning. Moreover, we propose an attention method that learns an optimized weighted combination of shared and private information, based on the idea of the Capital Asset Pricing Model (CAPM), to help forecasting. Experimental results on various data show improved forecasting performance over baseline methods.
Views are known mechanisms for controlling access to data and for sharing data of different schemas. Despite long and intensive research on views in both the database community and the programming language community, we still face difficulties in using views in practice. The main reason is that we lack ways to directly describe view update strategies to deal with the inherent ambiguity of view updating. This paper aims to provide a new language-based approach to controlling and sharing distributed data based on views, and to establish a software foundation for the systematic construction of such data management systems. Our key observation is that a view should be defined through a view update strategy rather than a view definition. We show that Datalog can be used for specifying view update strategies whose unique view definition can be automatically derived, present a novel P2P-based programmable architecture for distributed data management where updatable views are fully utilized for controlling and sharing distributed data, and demonstrate its usefulness through the development of a privacy-preserving ride-sharing alliance system.
Online class imbalance learning constitutes a new problem and an emerging research topic that focusses on the challenges of online learning under class imbalance and concept drift. Class imbalance deals with data streams that have very skewed distributions while concept drift deals with changes in the class imbalance status. Little work exists that addresses these challenges and in this paper we introduce queue-based resampling, a novel algorithm that successfully addresses the co-existence of class imbalance and concept drift. The central idea of the proposed resampling algorithm is to selectively include in the training set a subset of the examples that appeared in the past. Results on two popular benchmark datasets demonstrate the effectiveness of queue-based resampling over state-of-the-art methods in terms of learning speed and quality.
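The central idea, keeping a selected subset of past examples in the training set, can be sketched as below. The abstract does not spell out the selection rule, so this sketch assumes one bounded FIFO queue per class; the class name `QueueResampler`, its methods, and the toy stream are hypothetical and are not taken from the paper.

```python
from collections import deque

class QueueResampler:
    """Keeps the most recent `queue_len` examples of each class (assumed rule)."""

    def __init__(self, classes, queue_len):
        # One bounded queue per class; the oldest example is dropped when full.
        self.queues = {c: deque(maxlen=queue_len) for c in classes}

    def add(self, x, y):
        self.queues[y].append((x, y))

    def training_set(self):
        # Union of all queues: rare-class examples stay available for training
        # even when the stream is dominated by the majority class.
        return [ex for q in self.queues.values() for ex in q]

# Hypothetical skewed stream: nine negatives (label 0) and one positive (label 1).
stream = [(i, 1 if i == 2 else 0) for i in range(10)]
sampler = QueueResampler(classes=(0, 1), queue_len=3)
for x, y in stream:
    sampler.add(x, y)
batch = sampler.training_set()
print(batch)
```

Because each queue is bounded, old examples are eventually forgotten, which is one simple way a learner could follow a drifting stream; whether the paper uses exactly this mechanism is not stated in the abstract.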
The medical knowledge graph is the core component of various medical applications such as automatic diagnosis and question answering. However, medical knowledge is usually associated with certain conditions, which can significantly affect the performance of the supported applications. In light of this challenge, we propose a new truth discovery method to explore medical-related texts and infer trustworthiness degrees of knowledge triples associated with different conditions. Experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed truth discovery method.
This paper proposes a deep learning architecture that attains statistically significant improvements over traditional algorithms in Poisson image denoising, especially when the noise is strong. Poisson noise commonly occurs in low-light and photon-limited settings, where the noise can be most accurately modeled by the Poisson distribution. Poisson noise traditionally prevails only in specific fields such as astronomical imaging. However, with the booming market of surveillance cameras, which commonly operate in low-light environments, or mobile phones, which produce noisy night scene pictures due to lower-grade sensors, the necessity for an advanced Poisson image denoising algorithm has increased. Deep learning has achieved amazing breakthroughs in other imaging problems, such as image segmentation and recognition, and this paper proposes a deep learning denoising network that outperforms traditional algorithms in Poisson denoising, especially when the noise is strong. The architecture incorporates a hybrid of convolutional and deconvolutional layers along with symmetric connections. The denoising network achieved statistically significant 0.38dB, 0.68dB, and 1.04dB average PSNR gains over benchmark traditional algorithms in experiments with image peak values 4, 2, and 1. The denoising network can also operate with shorter computational time while still outperforming the benchmark algorithm by tuning the reconstruction stride sizes.
The connection between inequalities in additive combinatorics and analogous versions in terms of the entropy of random variables has been extensively explored over the past few years. This paper extends a device introduced by Ruzsa in his seminal work introducing this correspondence. This extension provides a toolbox for establishing the equivalence between sumset inequalities and their entropic versions. It supplies simpler proofs of known results and opens a path for obtaining new ones. Some new examples in nonabelian groups illustrate the power of the device.
We consider a natural generalization of scheduling $n$ jobs on $m$ parallel machines so as to minimize the makespan. In our extension the set of jobs is partitioned into several classes and a machine requires a setup whenever it switches from processing jobs of one class to jobs of a different class. During such a setup, a machine cannot process jobs and the duration of a setup may depend on the machine as well as the class of the job to be processed next. For this problem, we study approximation algorithms for non-identical machines. We develop a polynomial-time approximation scheme for uniformly related machines. For unrelated machines we obtain an $O(\log n + \log m)$-approximation, which we show to be optimal (up to constant factors) unless $NP \subset RP$. We also identify two special cases that admit constant factor approximations.
We present a second-order language that can be used to succinctly specify ontologies in a consistent and transparent manner. This language is based on ontology templates (OTTR), a framework for capturing recurring patterns of axioms in ontological modelling. The language and our results are independent of any specific DL. We define the language and its semantics, including the case of negation-as-failure, investigate reasoning over ontologies specified using our language, and show results about the decidability of useful reasoning tasks about the language itself. We also state and discuss some open problems that we believe to be of interest.
Pearson’s $\rho$ is the most used measure of statistical dependence. It gives a complete characterization of dependence in the Gaussian case, and it also works well in some non-Gaussian situations. It is well known, however, that it has a number of shortcomings; in particular for heavy tailed distributions and in nonlinear situations, where it may produce misleading, and even disastrous results. In recent years a number of alternatives have been proposed. In this paper, we will survey these developments, especially results obtained in the last couple of decades. Among measures discussed are the copula, distribution-based measures, the distance covariance, the HSIC measure popular in machine learning, and finally the local Gaussian correlation, which is a local version of Pearson’s $\rho$. Throughout we put the emphasis on conceptual developments and a comparison of these. We point out relevant references to technical details as well as comparative empirical and simulated experiments. There is a broad selection of references under each topic treated.
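A small numerical illustration of the nonlinear shortcoming: when $y$ is a deterministic but symmetric nonlinear function of $x$, Pearson's $\rho$ can be exactly zero even though the dependence is total. The sketch below uses a hand-written sample correlation and hypothetical toy data; it is not drawn from the survey itself.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

# y = x^2 is fully determined by x, yet the linear correlation vanishes.
rho_quadratic = pearson(xs, [x * x for x in xs])

# A linear relation, by contrast, gives a correlation of (numerically) one.
rho_linear = pearson(xs, [3 * x + 1 for x in xs])

print(rho_quadratic, rho_linear)
```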
We present a meta-learning approach for adaptive text-to-speech (TTS) with few data. During training, we learn a multi-speaker model using a shared conditional WaveNet core and independent learned embeddings for each speaker. The aim of training is not to produce a neural network with fixed weights, which is then deployed as a TTS system. Instead, the aim is to produce a network that requires few data at deployment time to rapidly adapt to new speakers. We introduce and benchmark three strategies: (i) learning the speaker embedding while keeping the WaveNet core fixed, (ii) fine-tuning the entire architecture with stochastic gradient descent, and (iii) predicting the speaker embedding with a trained neural network encoder. The experiments show that these approaches are successful at adapting the multi-speaker neural network to new speakers, obtaining state-of-the-art results in both sample naturalness and voice similarity with merely a few minutes of audio data from new speakers.
Convolutional neural networks have achieved astonishing results in different application areas. Various methods which allow us to use these models on mobile and embedded devices have been proposed. Especially binary neural networks seem to be a promising approach for these devices with low computational power. However, understanding binary neural networks and training accurate models for practical applications remains a challenge. In our work, we focus on increasing our understanding of the training process and making it accessible to everyone. We publish our code and models based on BMXNet for everyone to use. Within this framework, we systematically evaluated different network architectures and hyperparameters to provide useful insights on how to train a binary neural network. Further, we present how we improved accuracy by increasing the number of connections in the network.
Quantization of weights and activations in Deep Neural Networks (DNNs) is a powerful technique for network compression, and has enjoyed significant attention and success. However, much of the inference-time benefit of quantization is accessible only through the use of customized hardware accelerators or by providing an FPGA implementation of quantized arithmetic. Building on prior work, we show how to construct arbitrary bit-precise signed and unsigned integer operations using a software technique which logically \emph{embeds} a vector architecture with custom bit-width lanes in universally available fixed-width scalar arithmetic. We evaluate our approach on a high-end Intel Haswell processor, and an embedded ARM processor. Our approach yields very fast implementations of bit-precise custom DNN operations, which often match or exceed the performance of operations quantized to the sizes supported in native arithmetic. At the strongest level of quantization, our approach yields a maximum speedup of $\thicksim6\times$ on the Intel platform, and $\thicksim10\times$ on the ARM platform versus quantization to native 8-bit integers.
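The idea of embedding narrow lanes in an ordinary scalar word can be illustrated with the classic "SIMD within a register" addition trick: mask off the top bit of every lane, add the words with one scalar addition so that carries cannot cross a lane boundary, then restore the top bits with XOR. This is a generic sketch, not the paper's construction; the helper names are hypothetical, Python integers stand in for a fixed-width machine word, and only unsigned wrap-around addition is shown.

```python
def pack(values, width):
    """Pack unsigned lane values into one integer, lane 0 in the lowest bits."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & ((1 << width) - 1)) << (i * width)
    return word

def unpack(word, width, lanes):
    """Extract `lanes` unsigned values of `width` bits each."""
    mask = (1 << width) - 1
    return [(word >> (i * width)) & mask for i in range(lanes)]

def swar_add(a, b, width, lanes):
    """Lane-wise addition modulo 2**width using a single scalar add."""
    high = 0
    for i in range(lanes):
        high |= 1 << (i * width + width - 1)   # top bit of every lane
    full = (1 << (width * lanes)) - 1
    low = full ^ high                          # all bits except the lane tops
    partial = (a & low) + (b & low)            # carries stop at each lane's top bit
    return (partial ^ ((a ^ b) & high)) & full # fold the top bits back in

# Four 3-bit lanes held in one 12-bit value.
a = pack([7, 1, 3, 5], 3)
b = pack([1, 2, 6, 5], 3)
result = unpack(swar_add(a, b, 3, 4), 3, 4)
print(result)  # [0, 3, 1, 2], i.e. each lane summed modulo 8
```

The lane width is a free parameter here, which is the point of the technique: widths such as 3 or 5 bits that have no native integer type can still be processed several at a time.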