Solving Cold-Start Problem in Large-scale Recommendation Engines: A Deep Learning Approach

Collaborative Filtering (CF) is widely used in large-scale recommendation engines because of its efficiency, accuracy and scalability. However, in practice, recommendation engines based on CF require interactions between users and items before they can make recommendations, which makes them inappropriate for new items that have not yet been exposed to end users. This is known as the cold-start problem. In this paper we introduce a novel approach which employs deep learning to tackle this problem in any CF-based recommendation engine. One of the most important features of the proposed technique is that it can be applied on top of any existing CF-based recommendation engine without changing the CF core. We successfully applied this technique to overcome the item cold-start problem in Careerbuilder’s CF-based recommendation engine. Our experiments show that the proposed technique resolves the cold-start problem very efficiently while maintaining the high accuracy of the CF recommendations.

Algebraic multigrid support vector machines

The support vector machine is a flexible optimization-based technique widely used for classification problems. In practice, training becomes computationally expensive on large-scale data sets for reasons such as the complexity and number of iterations in parameter fitting methods, the underlying optimization solvers, and the nonlinearity of kernels. We introduce a fast multilevel framework for solving support vector machine models that is inspired by algebraic multigrid. Significant improvement in the running time has been achieved without any loss in quality. The proposed technique is highly beneficial on imbalanced sets. We demonstrate computational results on publicly available and industrial data sets.

Fog Computing: A Taxonomy, Survey and Future Directions

In recent years, the number of Internet of Things (IoT) devices/sensors has increased to a great extent. To support the computational demand of real-time latency-sensitive applications of largely geo-distributed IoT devices/sensors, a new computing paradigm named ‘Fog computing’ has been introduced. Generally, Fog computing resides closer to the IoT devices/sensors and extends the Cloud-based computing, storage and networking facilities. In this chapter, we comprehensively analyse the challenges in Fogs acting as an intermediate layer between IoT devices/sensors and Cloud datacentres and review the current developments in this field. We present a taxonomy of Fog computing according to the identified challenges and its key features. We also map the existing works to the taxonomy in order to identify current research gaps in the area of Fog computing. Moreover, based on the observations, we propose future directions for research.

Stochastic Gradient Descent in Continuous Time

We consider stochastic gradient descent for continuous-time models. Traditional approaches for the statistical estimation of continuous-time models, such as batch optimization, can be impractical for large datasets where observations occur over a long period of time. Stochastic gradient descent provides a computationally efficient method for such statistical estimation problems. The stochastic gradient descent algorithm performs an online parameter update in continuous time, with the parameter updates satisfying a stochastic differential equation. The parameters are proven to converge to a local minimum of a natural objective function for the estimation of the continuous-time dynamics. The convergence proof leverages ergodicity by using an appropriate Poisson equation to help describe the evolution of the parameters for large times. Numerical analysis of the stochastic gradient descent algorithm is presented for several examples, including the Ornstein-Uhlenbeck process, Burgers’ stochastic partial differential equation, and reinforcement learning.
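
The online update described in this abstract can be illustrated on the Ornstein-Uhlenbeck example it mentions. The following is a minimal sketch, not the authors' implementation: it assumes the model dX = -theta X dt + sigma dW with known sigma, an Euler discretization of the parameter update, and a learning-rate schedule of the form c / (1 + t). The function name, the default values and the schedule are all illustrative assumptions.

```python
import math
import numpy as np


def sgd_continuous_time_ou(theta_true=1.0, sigma=1.0, T=1000.0, dt=0.01,
                           theta0=3.0, lr_scale=4.0, seed=0):
    """Estimate theta in dX = -theta * X dt + sigma dW from one simulated path.

    Hypothetical helper: the path is simulated and consumed increment by
    increment, so the estimate is updated online rather than in a batch.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(T / dt)
    # Brownian increments with variance dt (converted to a list for speed).
    dw = (rng.standard_normal(n_steps) * math.sqrt(dt)).tolist()

    x, theta = 0.0, theta0
    for k in range(n_steps):
        t = k * dt
        # Observed increment of the true process.
        dx = -theta_true * x * dt + sigma * dw[k]
        # Decaying learning rate (assumed schedule).
        alpha = lr_scale / (1.0 + t)
        # Model drift f(x, theta) = -theta * x, hence grad_theta f = -x.
        # Discretized update: d theta = alpha * grad_f * (dX - f dt) / sigma^2.
        theta += alpha * (-x) * (dx + theta * x * dt) / sigma ** 2
        x += dx
    return theta


if __name__ == "__main__":
    print("estimated theta:", sgd_continuous_time_ou())
```

On average the update moves theta in proportion to -(theta - theta_true) * X^2, which is why the estimate drifts toward the true value as long as the process keeps visiting nonzero states.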

$e$PCA: High Dimensional Exponential Family PCA

Many applications involve large collections of high-dimensional datapoints with noisy entries from exponential family distributions. It is of interest to estimate the covariance and principal components of the noiseless distribution. In photon-limited imaging (e.g. XFEL) we want to estimate the covariance of the pixel intensities of 2-D images, where the pixels are low-intensity Poisson variables. In genomics we want to estimate population structure from biallelic—Binomial(2)—genetic markers such as Single Nucleotide Polymorphisms (SNPs). A standard method for this is Principal Component Analysis (PCA). However, PCA loses some of its optimality properties for non-Gaussian distributions and can be inefficient when applied directly. We develop ePCA (exponential family PCA), a methodology for PCA on exponential family distributions. ePCA can be used for dimensionality reduction and denoising of large data matrices. It involves the eigendecomposition of a new covariance matrix estimator, and is as fast as PCA. It is suitable for datasets with multiple types of variables. The first step of ePCA is a diagonal debiasing of the sample covariance matrix. We obtain the convergence rate for covariance matrix estimation, and the Marchenko-Pastur law in high dimensions. Another key step of ePCA is whitening, a specific variable weighting. For SNPs, this recovers the widely used Hardy-Weinberg equilibrium (HWE) normalization. We show that whitening improves the signal strength, providing justification for HWE normalization. ePCA outperforms PCA in simulations as well as in XFEL and SNP data analysis.
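
The diagonal debiasing step can be made concrete for the Poisson case. The following is a minimal sketch under the assumption that each observed count is Poisson with a random intensity; since Poisson noise has variance equal to its mean, the noise contribution to the sample covariance can be removed by subtracting the diagonal matrix of sample means. The function names are hypothetical, and the whitening step mentioned in the abstract is omitted.

```python
import numpy as np


def poisson_debiased_covariance(Y):
    """Diagonally debiased covariance for Poisson counts (illustrative only).

    Y is an (n_samples, n_features) array of counts.
    """
    Y = np.asarray(Y, dtype=float)
    mean = Y.mean(axis=0)
    sample_cov = np.cov(Y, rowvar=False)
    # Poisson noise variance equals the intensity, estimated by the sample mean.
    return sample_cov - np.diag(mean)


def top_components(cov, r):
    """Return the r largest eigenvalues of cov and their eigenvectors."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:r]
    return eigvals[order], eigvecs[:, order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: rank-one intensity pattern observed through Poisson noise.
    intensity = 2.0 + rng.uniform(0.0, 1.0, size=(500, 1)) * np.linspace(1.0, 3.0, 8)
    Y = rng.poisson(intensity)
    vals, vecs = top_components(poisson_debiased_covariance(Y), r=1)
    print(vals, vecs.shape)
```

Only the diagonal changes, so the cost is dominated by the same eigendecomposition that ordinary PCA performs.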

PolyNet: A Pursuit of Structural Diversity in Very Deep Networks

A number of studies have shown that increasing the depth or width of convolutional networks is a rewarding approach to improve the performance of image recognition. In our study, however, we observed difficulties along both directions. On one hand, the pursuit of very deep networks is met with diminishing returns and increased training difficulty; on the other hand, widening a network would result in a quadratic growth in both computational cost and memory demand. These difficulties motivate us to explore structural diversity in designing deep networks, a new dimension beyond just depth and width. Specifically, we present a new family of modules, namely the PolyInception, which can be flexibly inserted in isolation or in a composition as replacements of different parts of a network. Choosing PolyInception modules with the guidance of architectural efficiency can improve the expressive power while preserving comparable computational cost. A benchmark on the ILSVRC 2012 validation set demonstrates substantial improvements over the state-of-the-art. Compared to Inception-ResNet-v2, it reduces the top-5 error on single crops from 4.9% to 4.25%, and that on multi-crops from 3.7% to 3.45%.
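
The quadratic growth attributed to widening can be checked with simple arithmetic on a single convolutional layer. The following sketch assumes a plain stride-1 convolution with "same" padding and no bias; the layer sizes are illustrative and are not taken from PolyNet.

```python
def conv_cost(c_in, c_out, kernel=3, height=32, width=32):
    """Return (weights, multiply-adds) for one stride-1 'same' convolution."""
    params = kernel * kernel * c_in * c_out  # one k x k filter per channel pair
    mult_adds = params * height * width      # each weight is used at every position
    return params, mult_adds


def widening_ratio(c_in, c_out, factor):
    """Cost ratio when both input and output channel counts are scaled by factor."""
    base_params, base_ops = conv_cost(c_in, c_out)
    wide_params, wide_ops = conv_cost(c_in * factor, c_out * factor)
    return wide_params / base_params, wide_ops / base_ops


if __name__ == "__main__":
    print(conv_cost(64, 64))           # baseline layer
    print(widening_ratio(64, 64, 2))   # doubling the width
```

Because the weight count is proportional to the product of input and output channels, scaling every layer's width by a factor k scales both the weights and the multiply-adds by k squared.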

A Multi-Modal Graph-Based Semi-Supervised Pipeline for Predicting Cancer Survival

Cancer survival prediction is an active area of research that can help prevent unnecessary therapies and improve patients’ quality of life. Gene expression profiling is widely used in cancer studies to discover informative biomarkers that aid in predicting different clinical endpoints. We use multiple modalities of data derived from RNA deep-sequencing (RNA-seq) to predict survival of cancer patients. Despite the wealth of information available in expression profiles of cancer tumors, fulfilling the aforementioned objective remains a big challenge, for the most part, due to the paucity of data samples compared to the high dimension of the expression profiles. As such, analysis of transcriptomic data modalities calls for state-of-the-art big-data analytics techniques that can maximally use all the available data to discover the relevant information hidden within a significant amount of noise. In this paper, we propose a pipeline that predicts cancer patients’ survival by exploiting the structure of the input (manifold learning) and by leveraging the unlabeled samples using Laplacian support vector machines, a graph-based semi-supervised learning (GSSL) paradigm. We show that under certain circumstances, no single modality per se will result in the best accuracy and by fusing different models together via a stacked generalization strategy, we may boost the accuracy synergistically. We apply our approach to two cancer datasets and present promising results. We maintain that a similar pipeline can be used for predictive tasks where labeled samples are expensive to acquire.

Learning to reinforcement learn

In recent years deep reinforcement learning (RL) systems have attained superhuman performance in a number of challenging task domains. However, a major limitation of such applications is their demand for massive amounts of training data. A critical present objective is thus to develop deep RL methods that can adapt rapidly to new tasks. In the present work we introduce a novel approach to this challenge, which we refer to as deep meta-reinforcement learning. Previous work has shown that recurrent networks can support meta-learning in a fully supervised context. We extend this approach to the RL setting. What emerges is a system that is trained using one RL algorithm, but whose recurrent dynamics implement a second, quite separate RL procedure. This second, learned RL algorithm can differ from the original one in arbitrary ways. Importantly, because it is learned, it is configured to exploit structure in the training domain. We unpack these points in a series of seven proof-of-concept experiments, each of which examines a key aspect of deep meta-RL. We consider prospects for extending and scaling up the approach, and also point out some potentially important implications for neuroscience.

What Do Recurrent Neural Network Grammars Learn About Syntax?

Recurrent neural network grammars (RNNG) are a recently proposed probabilistic generative modeling family for natural language. They show state-of-the-art language modeling and parsing performance. We investigate what information they learn, from a linguistic perspective, through various ablations to the model and the data, and by augmenting the model with an attention mechanism (GA-RNNG) to enable closer inspection. We find that explicit modeling of composition is crucial for achieving the best performance. Through the attention mechanism, we find that headedness plays a central role in phrasal representation (with the model’s latent attention largely agreeing with predictions made by hand-crafted rules, albeit with some important differences). By training grammars without non-terminal labels, we find that phrasal representations depend minimally on non-terminals, providing support for the endocentricity hypothesis.

Designing and Training Feedforward Neural Networks: A Smooth Optimisation Perspective

Despite the recent great success of deep neural networks in various applications, designing and training a deep neural network is still among the greatest challenges in the field. In this work, we present a smooth optimisation perspective on designing and training multilayer Feedforward Neural Networks (FNNs) in the supervised learning setting. By characterising the critical point conditions of an FNN based optimisation problem, we identify the conditions to eliminate local optima of the corresponding cost function. Moreover, by studying the Hessian structure of the cost function at the global minima, we develop an approximate Newton FNN algorithm, which is capable of alleviating the vanishing gradient problem. Finally, our results are numerically verified on two classic benchmarks, i.e., the XOR problem and the four region classification problem.

Common Reconstructions in the Successive Refinement Problem with Receiver Side Information

Embedding Projector: Interactive Visualization and Interpretation of Embeddings

The Bayesian Formulation and Well-Posedness of Fractional Elliptic Inverse Problems

Self-calibration-based Approach to Critical Motion Sequences of Rolling-shutter Structure from Motion

Probabilistic Fluorescence-Based Synapse Detection

A new estimate on complexity of binary generalized pseudostandard words

A Hybrid Quasi-Newton Projected-Gradient Method with Application to Lasso and Basis-Pursuit Denoise

Semantic Regularisation for Recurrent Image Annotation

Explicable Robot Planning as Minimizing Distance from Expected Behavior

Enumeration of Commuting Pairs in Lie Algebras over Finite Fields

On the Exploration of Convolutional Fusion Networks for Visual Recognition

Deep Feature Interpolation for Image Content Changes

Bayesian Semiparametric Mixed Effects Markov Chains

Lifting linear preferential attachment trees yields the arcsine coalescent

Deep Action- and Context-Aware Sequence Learning for Activity Recognition and Anticipation

Robust Hashing for Multi-View Data: Jointly Learning Low-Rank Kernelized Similarity Consensus and Hash Functions

Automatic Node Selection for Deep Neural Networks using Group Lasso Regularization

An Improved Integrality Gap for the Calinescu-Karloff-Rabani Relaxation for Multiway Cut

Viscosity Solutions to Path-Dependent HJB Equation and Applications

Duplication Distance to the Root for Binary Sequences

Distributed Continuous-time Nonsmooth Optimization with Coupled Nonlinear Inequality Constraints

Zero-Shot Visual Question Answering

Multiple Access Technologies for cellular M2M Communications: An Overview

Parallel multiple selection by regular sampling

DelugeNets: Deep Networks with Massive and Flexible Cross-layer Information Inflows

Probabilistic Rank and Matrix Rigidity

Boosting Variational Inference

A Fast and Provable Method for Estimating Clique Counts Using Turán’s Theorem

On the extreme values of the Riemann zeta function on random intervals of the critical line

A Note on Quantum-Secure PRPs

Random matrix approach to estimation of high-dimensional factor models

On the central role of the scale invariant Poisson processes on $(0,\infty)$

Instance-aware Image and Sentence Matching with Selective Multimodal LSTM

Convex Optimization of Distributed Cooperative Detection in Multi-Receiver Molecular Communication

Multimodal Memory Modelling for Video Captioning

SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning

Decoupled Signal Detection for the Uplink of Large-Scale MIMO Systems in Heterogeneous Networks

Bayesian inference for multivariate extreme value distributions

Weakly-supervised Learning of Mid-level Features for Pedestrian Attribute Recognition and Localization

Approximate Negative-Binomial Confidence Intervals: Asbestos Fiber Counts

Optical Flow Requires Multiple Strategies (but only one network)

Self-Stabilizing Maximal Matching and Anonymous Networks

Asymptotic expansions of some Toeplitz determinants via the topological recursion

Minor complexities of finite operations

Fictitious play for cooperative action selection in robot teams

Engineering electronic states of periodic and quasiperiodic chains by buckling

Stream Packing for Asynchronous Multi-Context Systems using ASP

Computational tameness of classical non-causal models

Inverting The Generator Of A Generative Adversarial Network

Online and Dynamic Algorithms for Set Cover

Existence and Non-Existence Results for Strong External Difference Families

Iterative Channel Estimation Using LSE and Sparse Message Passing for MmWave MIMO Systems

A characterization of $\mathbb{Z}_2\mathbb{Z}_2[u]$-linear codes

Swap Equilibria under Link and Vertex Destruction

The Matrix Chain Algorithm to Compile Linear Algebra Expressions

Learning to detect and localize many objects from few examples

A Discriminatively Learned CNN Embedding for Person Re-identification

On Affine Invariant $L_p$ Depth Classifiers based on an Adaptive Choice of $p$

Shape Optimization Using the Cut Finite Element Method

Study on Feature Subspace of Archetypal Emotions for Speech Emotion Recognition

Asymptotic shape optimization for Riesz means of the Dirichlet Laplacian over convex domains

End-to-end Learning of Cost-Volume Aggregation for Real-time Dense Stereo

DSAC – Differentiable RANSAC for Camera Localization

Fusing 2D Uncertainty and 3D Cues for Monocular Body Pose Estimation

Factorized Bilinear Models for Image Recognition

Localization Lifetime of a Many-Body System with Periodic Constructed Disorder

Hard-Aware Deeply Cascaded Embedding

GENESIM: genetic extraction of a single, interpretable model

Achievable Uplink Rates for Massive MIMO with Coarse Quantization

Unimodal Thompson Sampling for Graph-Structured Arms

Component structure of the configuration model: barely supercritical case

Optimal Dynamic Coverage Infrastructure for Large-Scale Fleets of Reconnaissance UAVs

Fast Non-Parametric Tests of Relative Dependency and Similarity

Building Deep Networks on Grassmann Manifolds

Compensating for Large In-Plane Rotations in Natural Images

Note on k-planar crossing numbers

Maximizing a Submodular Function with Viability Constraints

Separating quantum communication and approximate rank

Cross-Domain Face Verification: Matching ID Document and Self-Portrait Photographs

Examining the Impact of Blur on Recognition by Convolutional Networks

Heat kernels for non-symmetric diffusion operators with jumps

Distinct spreads in vector spaces over finite fields

Phase transition in random distance graphs on the torus

Reflections at infinity of time changed RBMs on a domain with Liouville branches

Explicit (Polynomial!) Expressions for the Expectation, Variance and Higher Moments of the Size of a (2n + 1, 2n + 3)-core partition with Distinct Parts

DeeperBind: Enhancing Prediction of Sequence Specificities of DNA Binding Proteins

Towards the Modeling of Behavioral Trajectories of Users in Online Social Media

Gap Safe screening rules for sparsity enforcing penalties

Finite reflection groups and graph norms

How Lock-free Data Structures Perform in Dynamic Environments: Models and Analyses

Imprecise Continuous-Time Markov Chains

The Freiburg Groceries Dataset

Splitting schemes for unsteady problems involving the grad-div operator

Filling the gaps: Gaussian mixture models from noisy, truncated or incomplete samples

Nothing Else Matters: Model-Agnostic Explanations By Identifying Prediction Invariance

Qualitative parameter estimation for a class of relaxation oscillators

A Socio-geographic Perspective on Human Activities in Social Media

AutoScaler: Scale-Attention Networks for Visual Correspondence

A smooth transition from Wishart to GOE

Maximizing the minimum achievable secrecy rate of two-way relay networks using the null space beamforming method

Video Processing from Electro-optical Sensors for Object Detection and Tracking in Maritime Environment: A Survey

Contributed Discussion to Bayesian Solution Uncertainty Quantification for Differential Equations

Irreducible components of exotic Springer fibres

Infinite dimensional optimistic optimisation with applications on physical systems

Biodiversity in models of cyclic dominance is preserved by heterogeneity in site-specific invasion rates