Optimizing and Contrasting Recurrent Neural Network Architectures

Recurrent Neural Networks (RNNs) have long been recognized for their potential to model complex time series. However, it remains to be determined what optimization techniques and recurrent architectures can be used to best realize this potential. The experiments presented take a deep look into Hessian-free optimization, a powerful second-order optimization method that has shown promising results but still does not enjoy widespread use. This algorithm was used to train a number of RNN architectures, including standard RNNs, long short-term memory (LSTM), multiplicative RNNs, and stacked RNNs, on the task of character prediction. The insights from these experiments led to the creation of a new multiplicative LSTM hybrid architecture that outperformed both LSTM and multiplicative RNNs. When tested on a larger scale, multiplicative LSTM achieved character-level modelling results competitive with the state of the art for RNNs using very different methodology.

Change Detection in Multivariate Datastreams: Likelihood and Detectability Loss

We address the problem of detecting changes in multivariate datastreams, and we investigate the intrinsic difficulty that change-detection methods face when the data dimension scales. In particular, we consider the general approach that detects changes by comparing the distribution of the log-likelihood of the datastream over different time windows. Although this approach is the framework for several change-detection methods, its effectiveness when the data dimension scales has never been investigated, and investigating it is the goal of our paper. We show that the magnitude of the change can be naturally measured by the Kullback-Leibler divergence between the pre- and post-change distributions, and that the detectability of a change of a given magnitude worsens when the data dimension increases. This structural problem, which we refer to as \emph{detectability loss}, is due to the linear relationship between the variance of the log-likelihood and the data dimension, and it proves harmful even at low data dimensions (say, 10). We analytically derive the detectability loss on Gaussian-distributed datastreams, and empirically demonstrate that the problem also holds on a real-world dataset.
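
The linear relationship between the variance of the log-likelihood and the data dimension can be checked numerically in the simplest setting. The sketch below is an illustration only, not the paper's experiment: it assumes a standard Gaussian N(0, I_d), for which the log-likelihood equals a constant minus half a chi-square variable with d degrees of freedom, so its variance is d/2. The function names gaussian_loglik and loglik_variance are hypothetical.

```python
import math
import random
import statistics


def gaussian_loglik(x):
    """Log-density of a standard d-dimensional Gaussian N(0, I) at point x."""
    d = len(x)
    return -0.5 * d * math.log(2 * math.pi) - 0.5 * sum(v * v for v in x)


def loglik_variance(d, n_samples=20000, seed=0):
    """Monte Carlo estimate of Var[log p(X)] for X ~ N(0, I_d)."""
    rng = random.Random(seed)
    values = [
        gaussian_loglik([rng.gauss(0.0, 1.0) for _ in range(d)])
        for _ in range(n_samples)
    ]
    return statistics.variance(values)


if __name__ == "__main__":
    # Compare the empirical variance with the theoretical value d / 2.
    for d in (1, 2, 5, 10, 20):
        print(d, round(loglik_variance(d), 3), d / 2)
```

Under this assumption the estimated variance grows in proportion to d, so a shift of fixed size in the mean log-likelihood becomes harder to distinguish from noise as the dimension increases.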

A Survey: Time Travel in Deep Learning Space: An Introduction to Deep Learning Models and How Deep Learning Models Evolved from the Initial Ideas

This report traces how deep learning has evolved. It goes back as far as the early connectionist view of modelling the brain, and then looks at its early-stage realization: neural networks. With neural networks as background, we gradually introduce how convolutional neural networks, as a representative of deep discriminative models, were developed from neural networks, together with many practical techniques that help in optimizing neural networks. We also trace the evolution of deep generative models, to see how researchers balanced representation power and computational complexity to arrive at the Restricted Boltzmann Machine and, eventually, Deep Belief Nets. Further, we look into the history of modelling time series data with neural networks. We start with Time Delay Neural Networks and move on to the currently popular Recurrent Neural Network and its extension, Long Short-Term Memory. We also briefly look at how to construct deep recurrent neural networks. Finally, we conclude the report with some interesting open-ended questions about deep neural networks.

Multi-dimensional intra-tile parallelization for memory-starved stencil computations

ETH-Hardness for Signaling in Symmetric Zero-Sum Games

Multilevel particle filter

Normalization of Relative and Incomplete Temporal Expressions in Clinical Narratives

Nonparametric estimation of infinitely divisible distributions based on variational analysis on measures

Holographic Embeddings of Knowledge Graphs

Bad Universal Priors and Notions of Optimality

Linear sequential dynamical systems and the Möbius functions of partially ordered sets

Simpler Online Updates for Arbitrary-Order Central Moments

Hybridization of Interval CP and Evolutionary Algorithms for Optimizing Difficult Problems

Robust Partially-Compressed Least-Squares

High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations

How to initialize a second class particle?

Data Allocation in a Heterogeneous Disk Array – HDA with Multiple RAID Levels for Database Applications

Efficient Approaches for Enclosing the United Solution Set of the Interval Generalized Sylvester Matrix Equation

Chvátal-type results for degree sequence Ramsey numbers

How to (Not) Estimate Gini Coefficients for Fat Tailed Variables

Evaluating the Competency of a First-Order Ontology

SGD with Variance Reduction beyond Empirical Risk Minimization

On the emergence of random initial conditions in fluid limits

Improving the Competency of First-Order Ontologies

Scalable MCMC for Mixed Membership Stochastic Blockmodels

On the decomposition of random hypergraphs

Projection predictive input variable selection for Gaussian process models

Quantification in-the-wild: data-sets and baselines

Independent random variables on Abelian groups with independent the sum and difference

Improved Solution to the Non-Domination Level Update Problem

A Graph Traversal Based Approach to Answer Non-Aggregation Questions Over DBpedia

Scaling Limit of Two-component Interacting Brownian Motions

Tensor vs Matrix Methods: Robust Tensor Decomposition under Block Sparse Perturbations

Semiparametric theory and empirical processes in causal inference

A Method for Modeling Co-Occurrence Propensity of Clinical Codes with Application to ICD-10-PCS Auto-Coding

Efficient Replication of Queued Tasks for Latency Reduction in Cloud Systems

First passage times in homogeneous nucleation: dependence on the total number of particles

Site recurrence for coalescing random walk

Uniform measure density condition and game regularity for tug-of-war games

Multi-Language Image Description with Neural Sequence Models