Solving Cold-Start Problem in Large-scale Recommendation Engines: A Deep Learning Approach
Collaborative Filtering (CF) is widely used in large-scale recommendation engines because of its efficiency, accuracy and scalability. However, in practice, the fact that recommendation engines based on CF require interactions between users and items before making recommendations makes them inappropriate for new items that have not yet been exposed to end users. This is known as the cold-start problem. In this paper we introduce a novel approach which employs deep learning to tackle this problem in any CF based recommendation engine. One of the most important features of the proposed technique is that it can be applied on top of any existing CF based recommendation engine without changing the CF core. We successfully applied this technique to overcome the item cold-start problem in Careerbuilder’s CF based recommendation engine. Our experiments show that the proposed technique is very effective at resolving the cold-start problem while maintaining high accuracy of the CF recommendations.
Algebraic multigrid support vector machines
The support vector machine is a flexible optimization-based technique widely used for classification problems. In practice, its training part becomes computationally expensive on large-scale data sets because of such reasons as the complexity and number of iterations in parameter fitting methods, underlying optimization solvers, and nonlinearity of kernels. We introduce a fast multilevel framework for solving support vector machine models that is inspired by the algebraic multigrid. Significant improvement in the running time has been achieved without any loss in the quality. The proposed technique is highly beneficial on imbalanced sets. We demonstrate computational results on publicly available and industrial data sets.
Fog Computing: A Taxonomy, Survey and Future Directions
In recent years, the number of Internet of Things (IoT) devices/sensors has increased to a great extent. To support the computational demand of real-time latency-sensitive applications of largely geo-distributed IoT devices/sensors, a new computing paradigm named ‘Fog computing’ has been introduced. Generally, Fog computing resides closer to the IoT devices/sensors and extends the Cloud-based computing, storage and networking facilities. In this chapter, we comprehensively analyse the challenges in Fogs acting as an intermediate layer between IoT devices/sensors and Cloud datacentres and review the current developments in this field. We present a taxonomy of Fog computing according to the identified challenges and its key features. We also map the existing works to the taxonomy in order to identify current research gaps in the area of Fog computing. Moreover, based on the observations, we propose future directions for research.
Stochastic Gradient Descent in Continuous Time
We consider stochastic gradient descent for continuous-time models. Traditional approaches for the statistical estimation of continuous-time models, such as batch optimization, can be impractical for large datasets where observations occur over a long period of time. Stochastic gradient descent provides a computationally efficient method for such statistical estimation problems. The stochastic gradient descent algorithm performs an online parameter update in continuous time, with the parameter updates satisfying a stochastic differential equation. The parameters are proven to converge to a local minimum of a natural objective function for the estimation of the continuous-time dynamics. The convergence proof leverages ergodicity by using an appropriate Poisson equation to help describe the evolution of the parameters for large times. Numerical analysis of the stochastic gradient descent algorithm is presented for several examples, including the Ornstein-Uhlenbeck process, Burgers’ stochastic partial differential equation, and reinforcement learning.
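The online update described in this abstract can be illustrated on the Ornstein-Uhlenbeck example it mentions. The following is only a minimal sketch under stated assumptions, not the paper's implementation: the process is taken to be dX = c(m - X) dt + dW with unit diffusion and known rate c, only the level m is estimated, the path is discretised with an Euler scheme, and the learning rate 1/(1 + t) is an illustrative choice. The function names `simulate_ou` and `sgd_continuous_time_ou` are hypothetical.

```python
import numpy as np

def simulate_ou(m, c, dt, n_steps, rng, x0=0.0):
    """Euler simulation of dX = c (m - X) dt + dW (unit diffusion, assumed)."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    noise = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    for k in range(n_steps):
        x[k + 1] = x[k] + c * (m - x[k]) * dt + noise[k]
    return x

def sgd_continuous_time_ou(path, dt, c, theta0=0.0):
    """Online estimate of the OU level from one observed path.

    Discretised update: theta += lr * grad_f * (dX - f(X; theta) dt),
    where f(x; theta) = c (theta - x) is the model drift.
    """
    theta = theta0
    for k in range(len(path) - 1):
        t = k * dt
        x = path[k]
        dx = path[k + 1] - x           # observed increment of the process
        lr = 1.0 / (1.0 + t)           # decaying learning rate (assumed schedule)
        drift = c * (theta - x)        # model drift at the current estimate
        grad = c                       # derivative of the drift w.r.t. theta
        theta += lr * grad * (dx - drift * dt)
    return theta

rng = np.random.default_rng(0)
path = simulate_ou(m=2.0, c=1.0, dt=0.01, n_steps=20000, rng=rng)
theta_hat = sgd_continuous_time_ou(path, dt=0.01, c=1.0)
print(theta_hat)  # should lie near the true level m = 2.0
```

Each step uses only the latest increment of the path, so the estimate is refined as observations arrive rather than by re-fitting on the whole history.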
$e$PCA: High Dimensional Exponential Family PCA
Many applications involve large collections of high-dimensional datapoints with noisy entries from exponential family distributions. It is of interest to estimate the covariance and principal components of the noiseless distribution. In photon-limited imaging (e.g. XFEL) we want to estimate the covariance of the pixel intensities of 2-D images, where the pixels are low-intensity Poisson variables. In genomics we want to estimate population structure from biallelic, Binomial(2)-distributed genetic markers such as Single Nucleotide Polymorphisms (SNPs). A standard method for this is Principal Component Analysis (PCA). However, PCA loses some of its optimality properties for non-Gaussian distributions and can be inefficient when applied directly. We develop $e$PCA (exponential family PCA), a methodology for PCA on exponential family distributions. $e$PCA can be used for dimensionality reduction and denoising of large data matrices. It involves the eigendecomposition of a new covariance matrix estimator, and is as fast as PCA. It is suitable for datasets with multiple types of variables. The first step of $e$PCA is a diagonal debiasing of the sample covariance matrix. We obtain the convergence rate for covariance matrix estimation, and the Marchenko-Pastur law in high dimensions. Another key step of $e$PCA is whitening, a specific variable weighting. For SNPs, this recovers the widely used Hardy-Weinberg equilibrium (HWE) normalization. We show that whitening improves the signal strength, providing justification for HWE normalization. $e$PCA outperforms PCA in simulations as well as in XFEL and SNP data analysis.
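The diagonal debiasing step mentioned in this abstract can be sketched for the Poisson case. A Poisson variable has variance equal to its mean, so the covariance of the observed counts equals the covariance of the noiseless intensities plus a diagonal matrix of mean intensities; subtracting the sample mean from the diagonal removes that term. The code below is a minimal illustration of this idea only, not the authors' full estimator (the whitening and eigendecomposition steps are left out), and the function name `debiased_covariance_poisson` and the synthetic data are assumptions.

```python
import numpy as np

def debiased_covariance_poisson(counts):
    """Sample covariance of Poisson counts with the noise variance removed.

    counts: array of shape (n_samples, n_features).
    For Poisson noise, Cov(Y) = Cov(X) + diag(E[X]), so the sample mean
    is subtracted from the diagonal of the sample covariance.
    """
    counts = np.asarray(counts, dtype=float)
    sample_mean = counts.mean(axis=0)
    sample_cov = np.cov(counts, rowvar=False)
    return sample_cov - np.diag(sample_mean)

# Synthetic check: rank-one variation in the noiseless intensities.
rng = np.random.default_rng(0)
n, p = 20000, 5
direction = np.linspace(0.5, 1.5, p)
latent = rng.uniform(0.0, 2.0, size=n)            # variance 1/3
intensities = 3.0 + np.outer(latent, direction)   # strictly positive rates
counts = rng.poisson(intensities)

true_cov = (1.0 / 3.0) * np.outer(direction, direction)
naive_cov = np.cov(counts, rowvar=False)
debiased_cov = debiased_covariance_poisson(counts)

print(np.abs(naive_cov - true_cov).max())     # dominated by the diagonal bias
print(np.abs(debiased_cov - true_cov).max())  # much smaller after debiasing
```

Only the diagonal changes, which is why the correction costs essentially nothing beyond computing the sample covariance itself.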
PolyNet: A Pursuit of Structural Diversity in Very Deep Networks
A number of studies have shown that increasing the depth or width of convolutional networks is a rewarding approach to improve the performance of image recognition. In our study, however, we observed difficulties along both directions. On one hand, the pursuit of very deep networks is met with diminishing returns and increased training difficulty; on the other hand, widening a network would result in a quadratic growth in both computational cost and memory demand. These difficulties motivate us to explore structural diversity in designing deep networks, a new dimension beyond just depth and width. Specifically, we present a new family of modules, namely the PolyInception, which can be flexibly inserted in isolation or in a composition as replacements of different parts of a network. Choosing PolyInception modules with the guidance of architectural efficiency can improve the expressive power while preserving comparable computational cost. A benchmark on the ILSVRC 2012 validation set demonstrates substantial improvements over the state-of-the-art. Compared to Inception-ResNet-v2, it reduces the top-5 error on single crops from 4.9% to 4.25%, and that on multi-crops from 3.7% to 3.45%.
A Multi-Modal Graph-Based Semi-Supervised Pipeline for Predicting Cancer Survival
Cancer survival prediction is an active area of research that can help prevent unnecessary therapies and improve patients’ quality of life. Gene expression profiling is being widely used in cancer studies to discover informative biomarkers that aid in predicting different clinical endpoints. We use multiple modalities of data derived from RNA deep-sequencing (RNA-seq) to predict survival of cancer patients. Despite the wealth of information available in expression profiles of cancer tumors, fulfilling the aforementioned objective remains a big challenge, for the most part, due to the paucity of data samples compared to the high dimension of the expression profiles. As such, analysis of transcriptomic data modalities calls for state-of-the-art big-data analytics techniques that can maximally use all the available data to discover the relevant information hidden within a significant amount of noise. In this paper, we propose a pipeline that predicts cancer patients’ survival by exploiting the structure of the input (manifold learning) and by leveraging the unlabeled samples using Laplacian support vector machines, a graph-based semi-supervised learning (GSSL) paradigm. We show that under certain circumstances, no single modality per se will result in the best accuracy and by fusing different models together via a stacked generalization strategy, we may boost the accuracy synergistically. We apply our approach to two cancer datasets and present promising results. We maintain that a similar pipeline can be used for predictive tasks where labeled samples are expensive to acquire.
Learning to reinforcement learn
In recent years deep reinforcement learning (RL) systems have attained superhuman performance in a number of challenging task domains. However, a major limitation of such applications is their demand for massive amounts of training data. A critical present objective is thus to develop deep RL methods that can adapt rapidly to new tasks. In the present work we introduce a novel approach to this challenge, which we refer to as deep meta-reinforcement learning. Previous work has shown that recurrent networks can support meta-learning in a fully supervised context. We extend this approach to the RL setting. What emerges is a system that is trained using one RL algorithm, but whose recurrent dynamics implement a second, quite separate RL procedure. This second, learned RL algorithm can differ from the original one in arbitrary ways. Importantly, because it is learned, it is configured to exploit structure in the training domain. We unpack these points in a series of seven proof-of-concept experiments, each of which examines a key aspect of deep meta-RL. We consider prospects for extending and scaling up the approach, and also point out some potentially important implications for neuroscience.
What Do Recurrent Neural Network Grammars Learn About Syntax?
Recurrent neural network grammars (RNNG) are a recently proposed probabilistic generative modeling family for natural language. They show state-of-the-art language modeling and parsing performance. We investigate what information they learn, from a linguistic perspective, through various ablations to the model and the data, and by augmenting the model with an attention mechanism (GA-RNNG) to enable closer inspection. We find that explicit modeling of composition is crucial for achieving the best performance. Through the attention mechanism, we find that headedness plays a central role in phrasal representation (with the model’s latent attention largely agreeing with predictions made by hand-crafted rules, albeit with some important differences). By training grammars without non-terminal labels, we find that phrasal representations depend minimally on non-terminals, providing support for the endocentricity hypothesis.
Designing and Training Feedforward Neural Networks: A Smooth Optimisation Perspective
Despite the recent great success of deep neural networks in various applications, designing and training a deep neural network is still among the greatest challenges in the field. In this work, we present a smooth optimisation perspective on designing and training multilayer Feedforward Neural Networks (FNNs) in the supervised learning setting. By characterising the critical point conditions of an FNN based optimisation problem, we identify the conditions to eliminate local optima of the corresponding cost function. Moreover, by studying the Hessian structure of the cost function at the global minima, we develop an approximate Newton FNN algorithm, which is capable of alleviating the vanishing gradient problem. Finally, our results are numerically verified on two classic benchmarks, i.e., the XOR problem and the four region classification problem.
• Common Reconstructions in the Successive Refinement Problem with Receiver Side Information
• Embedding Projector: Interactive Visualization and Interpretation of Embeddings
• The Bayesian Formulation and Well-Posedness of Fractional Elliptic Inverse Problems
• Self-calibration-based Approach to Critical Motion Sequences of Rolling-shutter Structure from Motion
• Probabilistic Fluorescence-Based Synapse Detection
• A new estimate on complexity of binary generalized pseudostandard words
• A Hybrid Quasi-Newton Projected-Gradient Method with Application to Lasso and Basis-Pursuit Denoise
• Semantic Regularisation for Recurrent Image Annotation
• Explicable Robot Planning as Minimizing Distance from Expected Behavior
• Enumeration of Commuting Pairs in Lie Algebras over Finite Fields
• On the Exploration of Convolutional Fusion Networks for Visual Recognition
• Deep Feature Interpolation for Image Content Changes
• Bayesian Semiparametric Mixed Effects Markov Chains
• Lifting linear preferential attachment trees yields the arcsine coalescent
• Deep Action- and Context-Aware Sequence Learning for Activity Recognition and Anticipation
• Robust Hashing for Multi-View Data: Jointly Learning Low-Rank Kernelized Similarity Consensus and Hash Functions
• Automatic Node Selection for Deep Neural Networks using Group Lasso Regularization
• An Improved Integrality Gap for the Calinescu-Karloff-Rabani Relaxation for Multiway Cut
• Viscosity Solutions to Path-Dependent HJB Equation and Applications
• Duplication Distance to the Root for Binary Sequences
• Distributed Continuous-time Nonsmooth Optimization with Coupled Nonlinear Inequality Constraints
• Zero-Shot Visual Question Answering
• Multiple Access Technologies for cellular M2M Communications: An Overview
• Parallel multiple selection by regular sampling
• DelugeNets: Deep Networks with Massive and Flexible Cross-layer Information Inflows
• Probabilistic Rank and Matrix Rigidity
• Boosting Variational Inference
• A Fast and Provable Method for Estimating Clique Counts Using Turán’s Theorem
• On the extreme values of the Riemann zeta function on random intervals of the critical line
• A Note on Quantum-Secure PRPs
• Random matrix approach to estimation of high-dimensional factor models
• On the central role of the scale invariant Poisson processes on $(0,\infty)$
• Instance-aware Image and Sentence Matching with Selective Multimodal LSTM
• Convex Optimization of Distributed Cooperative Detection in Multi-Receiver Molecular Communication
• Multimodal Memory Modelling for Video Captioning
• SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning
• Decoupled Signal Detection for the Uplink of Large-Scale MIMO Systems in Heterogeneous Networks
• Bayesian inference for multivariate extreme value distributions
• Weakly-supervised Learning of Mid-level Features for Pedestrian Attribute Recognition and Localization
• Approximate Negative-Binomial Confidence Intervals: Asbestos Fiber Counts
• Optical Flow Requires Multiple Strategies (but only one network)
• Self-Stabilizing Maximal Matching and Anonymous Networks
• Asymptotic expansions of some Toeplitz determinants via the topological recursion
• Minor complexities of finite operations
• Fictitious play for cooperative action selection in robot teams
• Engineering electronic states of periodic and quasiperiodic chains by buckling
• Stream Packing for Asynchronous Multi-Context Systems using ASP
• Computational tameness of classical non-causal models
• Inverting The Generator Of A Generative Adversarial Network
• Online and Dynamic Algorithms for Set Cover
• Existence and Non-Existence Results for Strong External Difference Families
• Iterative Channel Estimation Using LSE and Sparse Message Passing for MmWave MIMO Systems
• A characterization of $\mathbb{Z}_2\mathbb{Z}_2[u]$-linear codes
• Swap Equilibria under Link and Vertex Destruction
• The Matrix Chain Algorithm to Compile Linear Algebra Expressions
• Learning to detect and localize many objects from few examples
• A Discriminatively Learned CNN Embedding for Person Re-identification
• On Affine Invariant $L_p$ Depth Classifiers based on an Adaptive Choice of $p$
• Shape Optimization Using the Cut Finite Element Method
• Study on Feature Subspace of Archetypal Emotions for Speech Emotion Recognition
• Asymptotic shape optimization for Riesz means of the Dirichlet Laplacian over convex domains
• End-to-end Learning of Cost-Volume Aggregation for Real-time Dense Stereo
• DSAC – Differentiable RANSAC for Camera Localization
• Fusing 2D Uncertainty and 3D Cues for Monocular Body Pose Estimation
• Factorized Bilinear Models for Image Recognition
• Localization Lifetime of a Many-Body System with Periodic Constructed Disorder
• Hard-Aware Deeply Cascaded Embedding
• GENESIM: genetic extraction of a single, interpretable model
• Achievable Uplink Rates for Massive MIMO with Coarse Quantization
• Unimodal Thompson Sampling for Graph-Structured Arms
• Component structure of the configuration model: barely supercritical case
• Optimal Dynamic Coverage Infrastructure for Large-Scale Fleets of Reconnaissance UAVs
• Fast Non-Parametric Tests of Relative Dependency and Similarity
• Building Deep Networks on Grassmann Manifolds
• Compensating for Large In-Plane Rotations in Natural Images
• Note on k-planar crossing numbers
• Maximizing a Submodular Function with Viability Constraints
• Separating quantum communication and approximate rank
• Cross-Domain Face Verification: Matching ID Document and Self-Portrait Photographs
• Examining the Impact of Blur on Recognition by Convolutional Networks
• Heat kernels for non-symmetric diffusion operators with jumps
• Distinct spreads in vector spaces over finite fields
• Phase transition in random distance graphs on the torus
• Reflections at infinity of time changed RBMs on a domain with Liouville branches
• Explicit (Polynomial!) Expressions for the Expectation, Variance and Higher Moments of the Size of a (2n + 1, 2n + 3)-core partition with Distinct Parts
• DeeperBind: Enhancing Prediction of Sequence Specificities of DNA Binding Proteins
• Towards the Modeling of Behavioral Trajectories of Users in Online Social Media
• Gap Safe screening rules for sparsity enforcing penalties
• Finite reflection groups and graph norms
• How Lock-free Data Structures Perform in Dynamic Environments: Models and Analyses
• Imprecise Continuous-Time Markov Chains
• The Freiburg Groceries Dataset
• Splitting schemes for unsteady problems involving the grad-div operator
• Filling the gaps: Gaussian mixture models from noisy, truncated or incomplete samples
• Nothing Else Matters: Model-Agnostic Explanations By Identifying Prediction Invariance
• Qualitative parameter estimation for a class of relaxation oscillators
• A Socio-geographic Perspective on Human Activities in Social Media
• AutoScaler: Scale-Attention Networks for Visual Correspondence
• A smooth transition from Wishart to GOE
• Maximizing the minimum achievable secrecy rate of two-way relay networks using the null space beamforming method
• Video Processing from Electro-optical Sensors for Object Detection and Tracking in Maritime Environment: A Survey
• Contributed Discussion to Bayesian Solution Uncertainty Quantification for Differential Equations
• Irreducible components of exotic Springer fibres
• Infinite dimensional optimistic optimisation with applications on physical systems
• Biodiversity in models of cyclic dominance is preserved by heterogeneity in site-specific invasion rates