Personalized Federated Search at LinkedIn

LinkedIn has grown into a platform hosting diverse sources of information, including member profiles, jobs, professional groups, and slideshows. With multiple sources available, a member who issues a query like ‘software engineer’ could be looking for software engineer profiles, jobs, or professional groups. To tackle this problem, we use a data-driven approach that extracts searcher intents from profile data and recent activities at a large scale. Intents such as job seeking, hiring, and content consuming are used to construct features that personalize the federated search experience. We tested the approach on the LinkedIn homepage, and A/B tests show significant improvements in member engagement. As of this writing, the approach powers all federated search on the LinkedIn homepage.


‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier

Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust in a model. Trust is fundamental if one plans to take action based on a prediction, or when choosing whether or not to deploy a new model. Such understanding further provides insights into the model, which can be used to turn an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We further propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). The usefulness of explanations is shown via novel experiments, both simulated and with human subjects. Our explanations empower users in various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and detecting why a classifier should not be trusted.
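
The idea of learning an interpretable model locally around a prediction can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a binary on/off interpretable representation (a feature switched off is replaced by 0.0), an exponential proximity kernel, and a ridge-regularised weighted least-squares fit as the surrogate; the abstract does not fix any of these choices. The names `explain_locally`, `solve`, `predict`, `kernel_width`, and `ridge` are hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def explain_locally(predict, x, n_samples=500, kernel_width=0.75,
                    ridge=1e-3, seed=0):
    """Return one local weight per feature of x for the black box `predict`.

    Assumptions of this sketch: perturbations switch features off (to 0.0),
    proximity is exp(-distance^2 / kernel_width^2), and the surrogate is a
    weighted ridge regression on the on/off masks.
    """
    rng = random.Random(seed)
    d = len(x)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        mask = [rng.randint(0, 1) for _ in range(d)]
        perturbed = [xi if m else 0.0 for xi, m in zip(x, mask)]
        distance = 1.0 - sum(mask) / d  # fraction of features switched off
        rows.append([1.0] + [float(m) for m in mask])  # leading 1.0 = intercept
        targets.append(predict(perturbed))
        weights.append(math.exp(-(distance ** 2) / kernel_width ** 2))

    # Weighted normal equations: (Z^T W Z + ridge * I) beta = Z^T W y,
    # with the ridge penalty left off the intercept term.
    p = d + 1
    A = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows))
          + (ridge if i == j and i > 0 else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(w * r[i] * t for w, r, t in zip(weights, rows, targets))
         for i in range(p)]
    beta = solve(A, b)
    return beta[1:]  # drop the intercept, keep per-feature weights


# Usage: explain a toy logistic "black box" around one instance.
def toy_model(v):
    return 1.0 / (1.0 + math.exp(-(3.0 * v[0] - 2.0 * v[1])))


local_weights = explain_locally(toy_model, [1.0, 1.0, 1.0])
print(local_weights)
```

In this sketch a positive weight means that keeping the feature raises the black-box output near the instance, and a weight near zero means the feature has little local effect.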


Contextual Media Retrieval Using Natural Language Queries

The widespread integration of cameras in hand-held and head-worn devices as well as the ability to share content online enables a large and diverse visual capture of the world that millions of users build up collectively every day. We envision these images as well as associated meta information, such as GPS coordinates and timestamps, to form a collective visual memory that can be queried while automatically taking the ever-changing context of mobile users into account. As a first step towards this vision, in this work we present Xplore-M-Ego: a novel media retrieval system that allows users to query a dynamic database of images and videos using spatio-temporal natural language queries. We evaluate our system using a new dataset of real user queries as well as through a usability study. One key finding is that there is a considerable amount of inter-user variability, for example in the resolution of spatial relations in natural language utterances. We show that our retrieval system can cope with this variability using personalisation through an online learning-based retrieval formulation.


Practical Introduction to Clustering Data

Data clustering is an approach to finding structure in sets of complex data, i.e., sets of ‘objects’. The main objective is to identify groups of objects that are similar to each other, e.g., for classification. Here, an introduction to clustering is given and three basic approaches are introduced: the k-means algorithm, neighbour-based clustering, and an agglomerative clustering method. For all cases, C source code examples are given, allowing for an easy implementation.
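
As a rough illustration of the first of these approaches, the k-means algorithm can be sketched in a few lines. This is a Python sketch of the standard Lloyd iteration, not the C code from the paper. The function name `kmeans`, its parameters, and the choice to initialise centroids by sampling `k` input points are assumptions made for the example.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Cluster equal-length tuples with Lloyd's algorithm.

    Returns (centroids, labels), where labels[i] is the index of the
    centroid assigned to points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: initialise from the data
    labels = None
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, labels


# Usage: two well-separated groups of 2-D points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(data, k=2)
print(labels)
```

The result depends on the initial centroids, so in practice the procedure is often repeated with several seeds and the run with the smallest total within-cluster distance is kept.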


A Ranking Algorithm for Re-finding

Re-finding files on a personal computer is a frequent need for users. When faced with a difficult re-finding task, people may not recall the attributes used by conventional re-finding methods, such as a file’s path, file name, or keywords, and the re-finding attempt fails. We propose a method to support difficult re-finding tasks. The method asks the user a list of questions about the target, such as a document’s number of pages, number of authors, accumulated reading time, and last reading location, and then uses the answers to filter the candidates for the target. Once the user has answered the questions, we evaluate the user’s degree of familiarity with the target file based on the answers. We devise a ranking algorithm that sorts the candidates by comparing the user’s degree of familiarity with the target and with the candidates. We also propose a method to generate re-finding tasks artificially based on the user’s own document corpus.


On the classification of topological phases in periodically driven interacting systems

Black-box optimization with a politician

Memory properties of transformations of linear processes

Local-likelihood transformation kernel density estimation for positive random variables

A Stochastic Performance Model for Pipelined Krylov Methods

Bi-directional LSTM Recurrent Neural Network for Chinese Word Segmentation

POMDP-lite for Robust Robot Planning under Uncertainty

VPSolver 3: Multiple-choice Vector Packing Solver

Implicit Function Computation by Oblivious Mobile Robots

An analytic derivation of the variance for the Abelian distribution

Multi-Source Domain Adaptation Using Approximate Label Matching

Segmentation Rectification for Video Cutout via One-Class Structured Learning

Two-log-convexity of the Catalan-Larcombe-French sequence

Bayesian generalized fused lasso modeling via NEG distribution

Uniform $\varepsilon$-Stability of Distributed Nonlinear Filtering over DNAs: Gaussian-Finite HMMs

Base sizes of imprimitive linear groups and orbits of general linear groups on spanning tuples

Gradient Descent Converges to Minimizers

A diffusion and clustering-based approach for finding coherent motions and understanding crowd scenes

Generalized minimum dominating set and application in automatic text summarization

Greedy Ants Colony Optimization Strategy for Solving the Curriculum Based University Course Timetabling Problem

Reinforcement Learning approach for Real Time Strategy Games Battle city and S3

On Perfect Classification for Gaussian Processes

Q($λ$) with Off-Policy Corrections

Parallel Linear Search with no Coordination for a Randomly Placed Treasure

Local explosion in self-similar growth-fragmentation processes

The Brownian limit of separable permutations

Strongly Universal Reversible Gate Sets

Dynamic portfolio selection without risk-free assets

Stochastic Process Bandits: Upper Confidence Bounds Algorithms via Generic Chaining

Fast strategies in biased Maker-Breaker games

On the Density of 3-Planar Graphs

The Multivariate Generalised von Mises: Inference and applications

Transmission Resonances Anomaly in 1D Disordered Quantum Systems

A Subsequence Interleaving Model for Sequential Pattern Mining

Tight Lower Bounds on Graph Embedding Problems

An introduction to sampling via measure transport

Symmetry Breaking Predicates for SAT-based DFA Identification

Spectrum graph coloring and applications to WiFi channel assignment

Do Public Events Affect Sex Trafficking Activity?

Measuring multivariate redundant information with pointwise common change in surprisal

Robust Covariance Matrix Estimation for Radar Space-Time Adaptive Processing (STAP)

Dependence of the heavily covered point on parameters

Karhunen-Loève expansion for a generalization of Wiener bridge

Generating images with recurrent adversarial networks

Locally Stationary Functional Time Series

A Harmonic Extension Approach for Collaborative Ranking

Interacting Particle Markov Chain Monte Carlo

What is the Observer generated information process?

Occupation times of alternating renewal processes with Lévy applications

The structure of matroids with a spanning clique or projective geometry

Exploration of Faulty Hamiltonian Graphs

Parallel Bayesian Global Optimization of Expensive Functions

Non-symmetric Macdonald polynomials and Demazure-Lusztig operators

A Dirichlet Process Functional Approach to Heteroscedastic-Consistent Covariance Estimation

Fast Learning Requires Good Memory: A Time-Space Lower Bound for Parity Learning

A Bayes interpretation of stacking for M-complete and M-open settings

Composable Industrial Internet Applications for Tiered Architectures

Rainbow perfect matchings and Hamilton cycles in the random geometric graph

Towards a Biologically Plausible Backprop

A Simple Condition for the Existence of Transversals

Five subsets of permutations enumerated as weak sorting permutations

Similarity adapted publication vectors: A note and a correction on measuring cognitive distance in multiple dimensions

On the difference between the Szeged and Wiener index