• **Optimizing and Contrasting Recurrent Neural Network Architectures**

Recurrent Neural Networks (RNNs) have long been recognized for their potential to model complex time series. However, it remains to be determined what optimization techniques and recurrent architectures can be used to best realize this potential. The experiments presented take a deep look into Hessian-free optimization, a powerful second-order optimization method that has shown promising results but still does not enjoy widespread use. This algorithm was used to train a number of RNN architectures, including standard RNNs, long short-term memory (LSTM), multiplicative RNNs, and stacked RNNs, on the task of character prediction. The insights from these experiments led to the creation of a new multiplicative LSTM hybrid architecture that outperformed both LSTM and multiplicative RNNs. When tested on a larger scale, multiplicative LSTM achieved character-level modelling results competitive with the state of the art for RNNs using very different methodology.

• **Change Detection in Multivariate Datastreams: Likelihood and Detectability Loss**

We address the problem of detecting changes in multivariate datastreams, and we investigate the intrinsic difficulty that change-detection methods have to face when the data dimension scales. In particular, we consider the general approach that detects changes by comparing the distribution of the log-likelihood of the datastream over different time windows. Although this approach underlies several change-detection methods, its effectiveness as the data dimension scales has never been investigated, and investigating it is the goal of our paper. We show that the magnitude of the change can be naturally measured by the Kullback-Leibler divergence between the pre- and post-change distributions, and that the detectability of a change of a given magnitude worsens when the data dimension increases. This structural problem, which we refer to as *detectability loss*, is due to the linear relationship between the variance of the log-likelihood and the data dimension, and proves to be harmful even at low data dimensions (say, 10). We analytically derive the detectability loss on Gaussian-distributed datastreams, and empirically demonstrate that this problem also holds on a real-world dataset.
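
The linear relationship between log-likelihood variance and data dimension mentioned in this abstract can be checked numerically in the simplest Gaussian case. The sketch below is not the paper's experiment: it assumes a standard Gaussian with identity covariance, and the function name `loglik_variance` and the sample size are illustrative choices. Under that assumption the log-likelihood is a constant minus half a chi-squared variable with `dim` degrees of freedom, so its variance should be close to `dim / 2`.

```python
import numpy as np


def loglik_variance(dim, n_samples=100_000, seed=0):
    """Empirical variance of log N(x; 0, I_dim) for x drawn from N(0, I_dim).

    Assumption: identity covariance, chosen only to keep the sketch short.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, dim))
    # log N(x; 0, I) = -(dim / 2) * log(2 * pi) - ||x||^2 / 2
    loglik = -0.5 * dim * np.log(2.0 * np.pi) - 0.5 * np.sum(x**2, axis=1)
    return loglik.var()


if __name__ == "__main__":
    # Print the empirical variance next to the theoretical value dim / 2.
    for dim in (1, 2, 4, 8, 16):
        print(dim, loglik_variance(dim), dim / 2)
```

Because the spread of the log-likelihood grows with the dimension, a shift of fixed size in its mean becomes harder to separate from noise as the dimension increases, which is the intuition behind the effect the abstract describes.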

• **A Survey: Time Travel in Deep Learning Space: An Introduction to Deep Learning Models and How Deep Learning Models Evolved from the Initial Ideas**

This report shows how deep learning has evolved. It traces back as far as the initial belief in connectionist modelling of the brain, and then looks at its early-stage realization: neural networks. With neural networks as background, we gradually introduce how convolutional neural networks, as a representative of deep discriminative models, developed from neural networks, together with many practical techniques that can help in the optimization of neural networks. We also trace the evolution of deep generative models, to see how researchers balanced representation power and computational complexity to reach the Restricted Boltzmann Machine and eventually Deep Belief Nets. Further, we look into the history of modelling time series data with neural networks. We start with Time Delay Neural Networks and move on to the currently famous Recurrent Neural Network and its extension, Long Short-Term Memory. We also briefly look into how to construct deep recurrent neural networks. Finally, we conclude this report with some interesting open-ended questions about deep neural networks.

• Multi-dimensional intra-tile parallelization for memory-starved stencil computations

• ETH-Hardness for Signaling in Symmetric Zero-Sum Games

• Multilevel particle filter

• Normalization of Relative and Incomplete Temporal Expressions in Clinical Narratives

• Nonparametric estimation of infinitely divisible distributions based on variational analysis on measures

• Holographic Embeddings of Knowledge Graphs

• Bad Universal Priors and Notions of Optimality

• Linear sequential dynamical systems and the Möbius functions of partially ordered sets

• Simpler Online Updates for Arbitrary-Order Central Moments

• Hybridization of Interval CP and Evolutionary Algorithms for Optimizing Difficult Problems

• Robust Partially-Compressed Least-Squares

• High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations

• How to initialize a second class particle?

• Data Allocation in a Heterogeneous Disk Array – HDA with Multiple RAID Levels for Database Applications

• Efficient Approaches for Enclosing the United Solution Set of the Interval Generalized Sylvester Matrix Equation

• Chvátal-type results for degree sequence Ramsey numbers

• How to (Not) Estimate Gini Coefficients for Fat Tailed Variables

• Evaluating the Competency of a First-Order Ontology

• SGD with Variance Reduction beyond Empirical Risk Minimization

• On the emergence of random initial conditions in fluid limits

• Improving the Competency of First-Order Ontologies

• Scalable MCMC for Mixed Membership Stochastic Blockmodels

• On the decomposition of random hypergraphs

• Projection predictive input variable selection for Gaussian process models

• Quantification in-the-wild: data-sets and baselines

• Independent random variables on Abelian groups with independent the sum and difference

• Improved Solution to the Non-Domination Level Update Problem

• A Graph Traversal Based Approach to Answer Non-Aggregation Questions Over DBpedia

• Scaling Limit of Two-component Interacting Brownian Motions

• Tensor vs Matrix Methods: Robust Tensor Decomposition under Block Sparse Perturbations

• Semiparametric theory and empirical processes in causal inference

• A Method for Modeling Co-Occurrence Propensity of Clinical Codes with Application to ICD-10-PCS Auto-Coding

• Efficient Replication of Queued Tasks for Latency Reduction in Cloud Systems

• First passage times in homogeneous nucleation: dependence on the total number of particles

• Site recurrence for coalescing random walk

• Uniform measure density condition and game regularity for tug-of-war games

• Multi-Language Image Description with Neural Sequence Models
