Assisting Composition of Email Responses: a Topic Prediction Approach

We propose an approach for helping agents compose email replies to customer requests. To enable that, we use LDA to extract latent topics from a collection of email exchanges. We then use these latent topics to label our data, obtaining a so-called ‘silver standard’ topic labelling. We exploit this labelled set to train a classifier to: (i) predict the topic distribution of the agent’s entire email response, based on features of the customer’s email; and (ii) predict the topic distribution of the next sentence in the agent’s reply, based on the customer’s email features and on features of the agent’s current sentence. The experimental results on a large email collection from a contact center in the telecom domain show that the proposed approach is effective in predicting the best topic of the agent’s next sentence. In 80% of the cases, the correct topic is present among the top five recommended topics (out of fifty possible ones). This shows the potential of this method to be applied in an interactive setting, where the agent is presented with a small list of likely topics to choose from for the next sentence.
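
The evaluation criterion mentioned above (the correct topic appearing among the top five recommended topics) can be illustrated with a minimal sketch. The function names `top_k_topics` and `top_k_hit_rate` and the toy distributions are hypothetical and are not taken from the paper; the sketch only assumes that each prediction is a list of per-topic probabilities and that each gold label is a topic index.

```python
def top_k_topics(distribution, k=5):
    """Return the indices of the k highest-probability topics (hypothetical helper)."""
    ranked = sorted(range(len(distribution)),
                    key=lambda t: distribution[t],
                    reverse=True)
    return ranked[:k]


def top_k_hit_rate(predicted, gold, k=5):
    """Fraction of cases where the gold topic is among the top-k predicted topics."""
    hits = sum(1 for dist, g in zip(predicted, gold)
               if g in top_k_topics(dist, k))
    return hits / len(gold)


# Toy data (assumed for illustration): two predictions over three topics.
predicted = [[0.1, 0.6, 0.3],
             [0.5, 0.2, 0.3]]
gold = [2, 1]
rate = top_k_hit_rate(predicted, gold, k=2)
```

In an interactive setting, the list returned by `top_k_topics` would correspond to the short menu of candidate topics shown to the agent.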

HDIdx: High-Dimensional Indexing for Efficient Approximate Nearest Neighbor Search

Fast Nearest Neighbor (NN) search is a fundamental challenge in large-scale data processing and analytics, particularly for analyzing multimedia content, which is often of high dimensionality. Instead of using exact NN search, extensive research efforts have focused on approximate NN search algorithms. In this work, we present ‘HDIdx’, an efficient high-dimensional indexing library for fast approximate NN search, which is open-source and written in Python. It offers a family of state-of-the-art algorithms that convert input high-dimensional vectors into compact binary codes, making them very efficient and scalable for NN search with very low space complexity.
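
The general idea of converting vectors into compact binary codes and searching by Hamming distance can be sketched as follows. This is not the HDIdx API and not necessarily one of the algorithms the library implements: it is a minimal illustration using random-hyperplane (sign) hashing, and all names (`make_hyperplanes`, `encode`, `hamming`, `search`) are hypothetical.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def encode(vec, planes):
    """Pack the sign of each projection into one integer (one bit per plane)."""
    code = 0
    for plane in planes:
        dot = sum(p * x for p, x in zip(plane, vec))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def search(query, codes, planes, k=1):
    """Return indices of the k database codes closest to the query in Hamming distance."""
    q = encode(query, planes)
    return sorted(range(len(codes)), key=lambda i: hamming(q, codes[i]))[:k]


# Toy database (assumed for illustration): a vector and its negation.
planes = make_hyperplanes(dim=4, n_bits=16)
database = [[1.0, 2.0, -1.0, 0.5], [-1.0, -2.0, 1.0, -0.5]]
codes = [encode(v, planes) for v in database]
nearest = search([1.0, 2.0, -1.0, 0.5], codes, planes, k=1)
```

Each vector is stored as a single 16-bit integer here, which is where the low space cost comes from; the price is that Hamming distance only approximates the original similarity.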

Towards a general framework for an observation and knowledge based model of occupant behaviour in office buildings

This paper proposes a new general approach based on Bayesian networks to model human behaviour. This approach represents human behaviour with probabilistic cause-effect relations that are based not only on previous works, but also on conditional probabilities coming either from expert knowledge or deduced from observations. The approach has been used in the co-simulation of building physics and human behaviour in order to assess the CO2 concentration in an office.

Source Localization in Networks: Trees and Beyond

Information diffusion in networks can be used to model many real-world phenomena, including rumor spreading on online social networks, epidemics in human beings, and malware on the Internet. Informally speaking, the source localization problem is to identify a node in the network that provides the best explanation of the observed diffusion. Despite significant efforts and successes over the last few years, theoretical guarantees of source localization algorithms were established only for tree networks due to the complexity of the problem. This paper presents a new source localization algorithm, called the Short-Fat Tree (SFT) algorithm. Loosely speaking, the algorithm selects the node such that the breadth-first search (BFS) tree from the node has the minimum depth but the maximum number of leaf nodes. Performance guarantees of SFT under the independent cascade (IC) model are established for both tree networks and the Erdős-Rényi (ER) random graph. On tree networks, SFT is the maximum a posteriori (MAP) estimator. On the ER random graph, the following fundamental limits have been obtained: (i) when the infection duration is less than $\frac{2}{3}t_u$, SFT identifies the source with probability one asymptotically, where $t_u=\left\lceil\frac{\log n}{\log \mu}\right\rceil+2$ and $\mu$ is the average node degree; (ii) when the infection duration is greater than $t_u$, the probability of identifying the source approaches zero asymptotically under any algorithm; and (iii) when the infection duration is less than $t_u$, the BFS tree starting from the source is a fat tree. Numerical experiments on tree networks, ER random graphs and real-world networks with different evaluation metrics show that the SFT algorithm outperforms existing algorithms.
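
The informal selection rule stated above (minimum BFS-tree depth, then maximum number of leaves) can be sketched as below. This is a simplified reading of that one sentence, not the full SFT algorithm from the paper. It assumes `adj` is an adjacency dictionary for the connected subgraph of infected nodes, and the names `bfs_tree_stats` and `short_fat_source` are hypothetical.

```python
from collections import deque


def bfs_tree_stats(adj, root):
    """Return (depth, leaf_count) of the BFS tree rooted at `root`."""
    depth = {root: 0}
    has_child = set()          # nodes that discovered at least one new node
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                has_child.add(u)
                queue.append(v)
    leaves = sum(1 for v in depth if v not in has_child)
    return max(depth.values()), leaves


def short_fat_source(adj):
    """Pick the node whose BFS tree is shortest, breaking ties by most leaves."""
    stats = {v: bfs_tree_stats(adj, v) for v in adj}
    return min(adj, key=lambda v: (stats[v][0], -stats[v][1]))


# Toy infected subgraphs (assumed for illustration).
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
star_source = short_fat_source(star)
path_source = short_fat_source(path)
```

On the star graph the centre yields a BFS tree of depth 1 with four leaves, while any outer node yields depth 2 with three leaves, so the centre is selected. Running one BFS per candidate costs O(n(n + m)) overall on n nodes and m edges, which is acceptable for a sketch but is not a claim about the paper's complexity.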

A proof of the Cycle Double Cover Conjecture

Local density for two-dimensional one-component plasma

Solving the Quadratic Assignment Problem on heterogeneous environment (CPUs and GPUs) with the application of Level 2 Reformulation and Linearization Technique

The Cusp-Airy Process

Stochastic Optimization for Deep CCA via Nonlinear Orthogonal Iterations

Budget Constraints in Prediction Markets

Asymptotically Optimal Sequential Experimentation Under Generalized Ranking

What future for the Anthropocene? A biophysical perspective

Algebraic Structure of Vector Fields in Financial Diffusion Models and its Applications

VERCE delivers a productive e-Science environment for seismology research

Gröbner basis. A new algorithm for computing the Frobenius number

Pattern size in Gaussian fields from spinodal decomposition

A Box Decomposition Algorithm to Compute the Hypervolume Indicator

Minimal free resolutions of monomial ideals are supported on posets

Large deviations for the two-dimensional two-component plasma

Equivalence classes of ballot paths modulo strings of length 2 and 3

Hierarchical Representation of Prosody for Statistical Speech Synthesis

An Optimal Transport Formulation of the Linear Feedback Particle Filter

Helping Domain Experts Build Speech Translation Systems

Irreducibility of stochastic real Ginzburg-Landau equation driven by $α$-stable noises and applications

Extremal Graph Theory for Degree Sequences

Exponential mixing for SPDEs driven by highly degenerate Lévy noises

The Bruhat order on clans

The Large Deviation Principle and Steady-state Fluctuation Theorem for the Entropy Production Rate of a Stochastic Process in Magnetic Fields

On the Hardest Problem Formulations for the 0/1 Lasserre Hierarchy

Semi-static completeness and robust pricing by informed investors

Using Ontology-Based Context in the Portuguese-English Translation of Homographs in Textual Dialogues

Extreme residuals in regression model. Minimax approach

Asymptotics of Sample Entropy Production Rate for Stochastic Differential Equations

Saddlepoint methods for conditional expectations with applications to risk management

Weak approximation rates for integral functionals of Markov processes

G-Stochastic Calculus Viewed through Rough Paths and the Norris Lemma in G-framework

Pricing the European call option in the model with stochastic volatility driven by Ornstein–Uhlenbeck process. Exact formulas

Graph Operations and Upper Bounds on Graph Homomorphism Counts

Bootstrapping the Mean Vector for the Observations in the Domain of Attraction of a Multivariate Stable Law

Implicit renewal theory for exponential functionals of Lévy processes

Low regret bounds for Bandits with Knapsacks

Efficient Per-Example Gradient Computations

VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback

Conditioned real self-similar Markov processes

Fast Perfect Simulation of Vervaat Perpetuities

Estimating Stochastic Production Frontiers: A One-stage Multivariate Semi-Nonparametric Bayesian Concave Regression Method

Robustness and efficiency of covariate adjusted linear instrumental variable estimators

Boson peak and Ioffe-Regel criterion in amorphous silicon-like materials: the effect of bond directionality

Fuzzy Differences-in-Differences

Doubled patterns are $3$-avoidable

Brane Brick Models, Toric Calabi-Yau 4-Folds and 2d (0,2) Quivers