Characterizing the neural encoding of behavior remains a challenging task in many research areas due in part to complex and noisy spatiotemporal dynamics of evoked brain activity. An important aspect of modeling these neural encodings involves separation of robust, behaviorally relevant signals from background activity, which often contains signals from irrelevant brain processes and decaying information from previous behavioral events. To achieve this separation, we develop a two-branch State Space Variational AutoEncoder (SSVAE) model to individually describe the instantaneous evoked foreground signals and the context-dependent background signals. We modeled the spontaneous speech-evoked brain dynamics using smoothed Gaussian mixture models. By applying the proposed SSVAE model to track ECoG dynamics in one participant over multiple hours, we find that the model can predict speech-related dynamics more accurately than other latent factor inference algorithms. Our results demonstrate that separately modeling the instantaneous speech-evoked and slow context-dependent brain dynamics can enhance tracking performance, which has important implications for the development of advanced neural encoding and decoding models in various neuroscience sub-disciplines.
Many challenging image processing tasks can be described by an ill-posed linear inverse problem: deblurring, deconvolution, inpainting, compressed sensing, and superresolution all lie in this framework. Traditional inverse problem solvers minimize a cost function consisting of a data-fit term, which measures how well an image matches the observations, and a regularizer, which reflects prior knowledge and promotes images with desirable properties like smoothness. Recent advances in machine learning and image processing have illustrated that it is often possible to learn a regularizer from training data that can outperform more traditional regularizers. We present an end-to-end, data-driven method of solving inverse problems inspired by the Neumann series, which we call a Neumann network. Rather than unroll an iterative optimization algorithm, we truncate a Neumann series which directly solves the linear inverse problem with a data-driven nonlinear regularizer. The Neumann network architecture outperforms traditional inverse problem solution methods, model-free deep learning approaches, and state-of-the-art unrolled iterative methods on standard datasets. Finally, when the images belong to a union of subspaces and under appropriate assumptions on the forward model, we prove there exists a Neumann network configuration that well-approximates the optimal oracle estimator for the inverse problem and demonstrate empirically that the trained Neumann network has the form predicted by theory.
Generating accurate and reliable sales forecasts is crucial in the E-commerce business. The current state-of-the-art techniques are typically univariate methods, which produce forecasts considering only the historical sales data of a single product. However, in a situation where large quantities of related time series are available, conditioning the forecast of an individual time series on past behaviour of similar, related time series can be beneficial. Given that the product assortment hierarchy in an E-commerce platform contains large numbers of related products, in which the sales demand patterns can be correlated, our attempt is to incorporate this cross-series information in a unified model. We achieve this by globally training a Long Short-Term Memory network (LSTM) that exploits the nonlinear demand relationships available in an E-commerce product assortment hierarchy. Aside from the forecasting engine, we propose a systematic pre-processing framework to overcome the challenges in an E-commerce setting. We also introduce several product grouping strategies to supplement the LSTM learning schemes, in situations where sales patterns in a product portfolio are disparate. We empirically evaluate the proposed forecasting framework on a real-world online marketplace dataset from Walmart. com. Our method achieves competitive results on category level and super-departmental level datasets, outperforming state-of-the-art techniques.
Dynamic affinity load balancing of multi-type tasks on multi-skilled servers, when the service rate of each task type on each of the servers is known and can possibly be different from each other, is an open problem for over three decades. The goal is to do task assignment on servers in a real time manner so that the system becomes stable, which means that the queue lengths do not diverge to infinity in steady state (throughput optimality), and the mean task completion time is minimized (delay optimality). The fluid model planning, Max-Weight, and c-$\mu$-rule algorithms have theoretical guarantees on optimality in some aspects for the affinity problem, but they consider a complicated queueing structure and either require the task arrival rates, the service rates of tasks on servers, or both. In many cases that are discussed in the introduction section, both task arrival rates and service rates of different task types on different servers are unknown. In this work, the Blind GB-PANDAS algorithm is proposed which is completely blind to task arrival rates and service rates. Blind GB-PANDAS uses an exploration-exploitation approach for load balancing. We prove that Blind GB-PANDAS is throughput optimal under arbitrary and unknown distributions for service times of different task types on different servers and unknown task arrival rates. Blind GB-PANDAS desires to route an incoming task to the server with the minimum weighted-workload, but since the service rates are unknown, such routing of incoming tasks is not guaranteed which makes the throughput optimality analysis more complicated than the case where service rates are known. Our extensive experimental results reveal that Blind GB-PANDAS significantly outperforms existing methods in terms of mean task completion time at high loads.
A feature selection algorithm should ideally satisfy four conditions: reliably extract relevant features; be able to identify non-linear feature interactions; scale linearly with the number of features and dimensions; allow the incorporation of known sparsity structure. In this work we propose a novel feature selection algorithm, Gradient Boosted Feature Selection (GBFS), which satisfies all four of these requirements. The algorithm is flexible, scalable, and surprisingly straight-forward to implement as it is based on a modification of Gradient Boosted Trees. We evaluate GBFS on several real world data sets and show that it matches or out-performs other state of the art feature selection algorithms. Yet it scales to larger data set sizes and naturally allows for domain-specific side information.
As machine learning transitions increasingly towards real world applications controlling the test-time cost of algorithms becomes more and more crucial. Recent work, such as the Greedy Miser and Speedboost, incorporate test-time budget constraints into the training procedure and learn classifiers that provably stay within budget (in expectation). However, so far, these algorithms are limited to the supervised learning scenario where sufficient amounts of labeled data are available. In this paper we investigate the common scenario where labeled data is scarce but unlabeled data is available in abundance. We propose an algorithm that leverages the unlabeled data (through Laplace smoothing) and learns classifiers with budget constraints. Our model, based on gradient boosted regression trees (GBRT), is, to our knowledge, the first algorithm for semi-supervised budgeted learning.
Traditional network embedding primarily focuses on learning a dense vector representation for each node, which encodes network structure and/or node content information, such that off-the-shelf machine learning algorithms can be easily applied to the vector-format node representations for network analysis. However, the learned dense vector representations are inefficient for large-scale similarity search, which requires to find the nearest neighbor measured by Euclidean distance in a continuous vector space. In this paper, we propose a search efficient binary network embedding algorithm called BinaryNE to learn a sparse binary code for each node, by simultaneously modeling node context relations and node attribute relations through a three-layer neural network. BinaryNE learns binary node representations efficiently through a stochastic gradient descent based online learning algorithm. The learned binary encoding not only reduces memory usage to represent each node, but also allows fast bit-wise comparisons to support much quicker network node search compared to Euclidean distance or other distance measures. Our experiments and comparisons show that BinaryNE not only delivers more than 23 times faster search speed, but also provides comparable or better search quality than traditional continuous vector based network embedding methods.
Deep learning is very effective at jointly learning feature representations and classification models, especially when dealing with high dimensional input patterns. Probabilistic logic reasoning, on the other hand, is capable to take consistent and robust decisions in complex environments. The integration of deep learning and logic reasoning is still an open-research problem and it is considered to be the key for the development of real intelligent agents. This paper presents Deep Logic Models, which are deep graphical models integrating deep learning and logic reasoning both for learning and inference. Deep Logic Models create an end-to-end differentiable architecture, where deep learners are embedded into a network implementing a continuous relaxation of the logic knowledge. The learning process allows to jointly learn the weights of the deep learners and the meta-parameters controlling the high-level reasoning. The experimental results show that the proposed methodology overtakes the limitations of the other approaches that have been proposed to bridge deep learning and reasoning.
Control systems involving unknown parameters appear a natural framework for applications in which the model design has to take into account various uncertainties. In these circumstances the performance criterion can be given in terms of an average cost, providing a paradigm which differs from the more traditional minimax or robust optimization criteria. In this paper, we provide necessary optimality conditions for a nonrestrictive class of optimal control problems in which unknown parameters intervene in the dynamics, the cost function and the right end-point constraint. An important feature of our results is that we allow the unknown parameters belonging to a mere complete separable metric space (not necessarily compact).
Software systems often leverage on open source software libraries to reuse functionalities. Such libraries are readily available through software package managers like npm for JavaScript. Due to the huge amount of packages available in such package distributions, developers often decide to rely on or contribute to a software package based on its popularity. Moreover, it is a common practice for researchers to depend on popularity metrics for data sampling and choosing the right candidates for their studies. However, the meaning of popularity is relative and can be defined and measured in a diversity of ways, that might produce different outcomes even when considered for the same studies. In this paper, we show evidence of how different is the meaning of popularity in software engineering research. Moreover, we empirically analyse the relationship between different software popularity measures. As a case study, for a large dataset of 175k npm packages, we computed and extracted 9 different popularity metrics from three open source tracking systems: libraries.io, npmjs.com and GitHub. We found that indeed popularity can be measured with different unrelated metrics, each metric can be defined within a specific context. This indicates a need for a generic framework that would use a portfolio of popularity metrics drawing from different concepts.
In many problem settings, most notably in game playing, an agent receives a possibly delayed reward for its actions. Often, those rewards are handcrafted and not naturally given. Even simple terminal-only rewards, like winning equals 1 and losing equals -1, can not be seen as an unbiased statement, since these values are chosen arbitrarily, and the behavior of the learner may change with different encodings, such as setting the value of a loss to -0:5, which is often done in practice to encourage learning. It is hard to argue about good rewards and the performance of an agent often depends on the design of the reward signal. In particular, in domains where states by nature only have an ordinal ranking and where meaningful distance information between game state values are not available, a numerical reward signal is necessarily biased. In this paper, we take a look at Monte Carlo Tree Search (MCTS), a popular algorithm to solve MDPs, highlight a reoccurring problem concerning its use of rewards, and show that an ordinal treatment of the rewards overcomes this problem. Using the General Video Game Playing framework we show a dominance of our newly proposed ordinal MCTS algorithm over preference-based MCTS, vanilla MCTS and various other MCTS variants.
During the last few years, spoken language technologies have known a big improvement thanks to Deep Learning. However Deep Learning-based algorithms require amounts of data that are often difficult and costly to gather. Particularly, modeling the variability in speech of different speakers, different styles or different emotions with few data remains challenging. In this paper, we investigate how to leverage fine-tuning on a pre-trained Deep Learning-based TTS model to synthesize speech with a small dataset of another speaker. Then we investigate the possibility to adapt this model to have emotional TTS by fine-tuning the neutral TTS model with a small emotional dataset.
The analysis of natural disaster-related multimedia content got great attention in recent years. Being one of the most important sources of information, social media have been crawled over the years to collect and analyze disaster-related multimedia content. Satellite imagery has also been widely explored for disasters analysis. In this paper, we survey the existing literature on disaster detection and analysis of the retrieved information from social media and satellites. Literature on disaster detection and analysis of related multimedia content on the basis of the nature of the content can be categorized into three groups, namely (i) disaster detection in text; (ii) analysis of disaster-related visual content from social media; and (iii) disaster detection in satellite imagery. We extensively review different approaches proposed in these three domains. Furthermore, we also review benchmarking datasets available for the evaluation of disaster detection frameworks. Moreover, we provide a detailed discussion on the insights obtained from the literature review, and identify future trends and challenges, which will provide an important starting point for the researchers in the field.
It is well understood that Bayesian decision theory and average case analysis are essentially identical. However, if one is interested in performing uncertainty quantification for a numerical task, it can be argued that the decision-theoretic framework is neither appropriate nor sufficient. To this end, we consider an alternative optimality criterion from Bayesian experimental design and study its implied optimal information in the numerical context. This information is demonstrated to differ, in general, from the information that would be used in an average-case-optimal numerical method. The explicit connection to Bayesian experimental design suggests several distinct regimes in which optimal probabilistic numerical methods can be developed.
This article presents the Quotient Hash Table (QHT) a new data structure for duplicate detection in unbounded streams. QHTs stem from a corrected analysis of streaming quotient filters (SQFs), resulting in a 33\% reduction in memory usage for equal performance. We provide a new and thorough analysis of both algorithms, with results of interest to other existing constructions. We also introduce an optimised version of our new data structure dubbed Queued QHT with Duplicates (QQHTD). Finally we discuss the effect of adversarial inputs for hash-based duplicate filters similar to QHT.
Spiking neural networks (SNNs) equipped with latency coding and spike-timing dependent plasticity rules offer an alternative to solve the data and energy bottlenecks of standard computer vision approaches: they can learn visual features without supervision and can be implemented by ultra-low power hardware architectures. However, their performance in image classification has never been evaluated on recent image datasets. In this paper, we compare SNNs to auto-encoders on three visual recognition datasets, and extend the use of SNNs to color images. Results show that SNNs are not competitive yet with traditional feature learning approaches, especially for color features. Further analyses of the results allow us to identify some of the bottlenecks of SNNs and provide specific directions towards improving their performance on vision tasks.