We propose an approach for helping agents compose email replies to customer requests. To this end, we use LDA to extract latent topics from a collection of email exchanges. We then use these latent topics to label our data, obtaining a so-called ‘silver standard’ topic labelling. We use this labelled set to train a classifier to: (i) predict the topic distribution of the agent’s entire email response, based on features of the customer’s email; and (ii) predict the topic distribution of the next sentence in the agent’s reply, based on the customer’s email features and on features of the agent’s current sentence. Experimental results on a large email collection from a contact center in the telecom domain show that the proposed approach is effective in predicting the best topic for the agent’s next sentence. In 80% of the cases, the correct topic is among the top five recommended topics (out of fifty possible ones). This shows the potential of the method for an interactive setting, in which the agent is presented with a short list of likely topics to choose from for the next sentence.
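The 80% figure above is a top-k accuracy with k = 5: a prediction counts as correct when the silver-standard topic appears among the k highest-scoring topics. The sketch below computes that metric, assuming each prediction is a list of per-topic probabilities and each gold label is a single topic index. The function name `top_k_accuracy` and the toy data are hypothetical and not taken from the paper.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], gold: Sequence[int], k: int = 5) -> float:
    """Fraction of examples whose gold topic index is among the k highest-scoring topics."""
    if len(scores) != len(gold):
        raise ValueError("scores and gold must have the same length")
    if not gold:
        return 0.0
    hits = 0
    for dist, g in zip(scores, gold):
        # Rank topic indices by predicted probability, highest first.
        ranked = sorted(range(len(dist)), key=lambda t: dist[t], reverse=True)
        if g in ranked[:k]:
            hits += 1
    return hits / len(gold)


# Hypothetical toy example with three topics instead of fifty.
preds = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.3, 0.1, 0.6]]
gold = [1, 2, 0]
print(top_k_accuracy(preds, gold, k=1), top_k_accuracy(preds, gold, k=2))
```

With fifty topics and k = 5, the same function would measure the setting described in the abstract.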
Fast Nearest Neighbor (NN) search is a fundamental challenge in large-scale data processing and analytics, particularly for analyzing multimedia content, which is often high-dimensional. Rather than exact NN search, extensive research has focused on approximate NN search algorithms. In this work, we present ‘HDIdx’, an efficient, open-source high-dimensional indexing library for fast approximate NN search, written in Python. It offers a family of state-of-the-art algorithms that convert high-dimensional input vectors into compact binary codes, which makes NN search very efficient and scalable with very low space complexity.
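To illustrate why compact binary codes make approximate NN search cheap, here is a minimal sketch using random-hyperplane (sign random projection) hashing. This is one of the simplest binary-coding schemes and stands in for the more sophisticated algorithms in HDIdx. It does not use or reproduce the HDIdx API, and the names `make_hyperplanes`, `encode`, `hamming` and `search` are hypothetical.

```python
import random
from typing import List, Sequence


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Draw n_bits random Gaussian hyperplane normals in R^dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def encode(vec: Sequence[float], planes: List[List[float]]) -> int:
    """Pack one bit per hyperplane (1 if vec is on its non-negative side) into an int."""
    code = 0
    for i, normal in enumerate(planes):
        if sum(a * b for a, b in zip(vec, normal)) >= 0.0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed codes (XOR, then popcount)."""
    return bin(a ^ b).count("1")


def search(query: Sequence[float], codes: List[int], planes: List[List[float]], k: int = 1) -> List[int]:
    """Indices of the k database codes closest to the query's code in Hamming distance."""
    q = encode(query, planes)
    return sorted(range(len(codes)), key=lambda i: hamming(q, codes[i]))[:k]


# Hypothetical database: 20 random 16-dimensional vectors, 32-bit codes.
rng = random.Random(1)
db = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(20)]
planes = make_hyperplanes(dim=16, n_bits=32)
codes = [encode(v, planes) for v in db]  # computed once, stored as small ints
print(search(db[3], codes, planes, k=3))
```

Each vector is stored as a single 32-bit integer rather than 16 floats, and candidates are ranked with XOR and popcount instead of floating-point distance computations. That is the space and time saving the abstract refers to.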
This paper proposes a new general approach, based on Bayesian networks, to modelling human behaviour. The approach represents human behaviour through probabilistic cause-effect relations that draw not only on previous work but also on conditional probabilities obtained either from expert knowledge or from observations. The approach has been used in a co-simulation of building physics and human behaviour to assess the CO₂ concentration in an office.
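The abstract does not give the paper's actual network, variables or probabilities. The toy sketch below therefore only shows the general mechanism: cause-effect edges plus conditional probability tables define a joint distribution that can be queried. The variables (occupancy and warm outdoor temperature as causes of a window being opened) and all numbers are hypothetical. Inference is done by brute-force enumeration, which is only practical for very small networks.

```python
from itertools import product

# Hypothetical priors, e.g. as they might be elicited from expert knowledge.
P_OCCUPIED = {True: 0.6, False: 0.4}
P_WARM = {True: 0.3, False: 0.7}

# Hypothetical CPT: P(window opened | occupied, warm outside),
# e.g. as it might be estimated from observations.
P_WINDOW_GIVEN = {
    (True, True): 0.7,
    (True, False): 0.2,
    (False, True): 0.05,
    (False, False): 0.01,
}


def joint(occupied: bool, warm: bool, window: bool) -> float:
    """Joint probability factorised along the edges Occupied -> Window <- Warm."""
    p_w = P_WINDOW_GIVEN[(occupied, warm)]
    return P_OCCUPIED[occupied] * P_WARM[warm] * (p_w if window else 1.0 - p_w)


def prob_window_open() -> float:
    """Marginal P(window opened), summing the joint over both causes."""
    return sum(joint(o, t, True) for o, t in product([True, False], repeat=2))


def prob_occupied_given_window_open() -> float:
    """Posterior P(occupied | window opened) by Bayes' rule."""
    numerator = sum(joint(True, t, True) for t in (True, False))
    return numerator / prob_window_open()


print(prob_window_open(), prob_occupied_given_window_open())
```

In a co-simulation, a query like `prob_window_open()` could drive a behavioural event (such as opening a window) that the building-physics model then turns into a change in CO₂ concentration. This coupling is an assumption for illustration, not a description of the paper's setup.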
Information diffusion in networks can be used to model many real-world phenomena, including rumor spreading on online social networks, epidemics in human populations, and malware on the Internet. Informally speaking, the source localization problem is to identify the node in the network that provides the best explanation of the observed diffusion. Despite significant efforts and successes over the last few years, theoretical guarantees for source localization algorithms have been established only for tree networks, owing to the complexity of the problem. This paper presents a new source localization algorithm, called the Short-Fat Tree (SFT) algorithm. Loosely speaking, the algorithm selects the node such that the breadth-first search (BFS) tree from that node has the minimum depth but the maximum number of leaf nodes. Performance guarantees of SFT under the independent cascade (IC) model are established for both tree networks and the Erdős-Rényi (ER) random graph. On tree networks, SFT is the maximum a posteriori (MAP) estimator. On the ER random graph, the following fundamental limits have been obtained: $(i)$ when the infection duration is $<\frac{2}{3}t_u$, SFT identifies the source with probability one asymptotically, where $t_u=\left\lceil\frac{\log n}{\log \mu}\right\rceil+2$, $n$ is the number of nodes, and $\mu$ is the average node degree; $(ii)$ when the infection duration is $>t_u$, the probability of identifying the source approaches zero asymptotically under any algorithm; and $(iii)$ when the infection duration is $<t_u$, the BFS tree starting from the source is a fat tree. Numerical experiments on tree networks, ER random graphs, and real-world networks with different evaluation metrics show that the SFT algorithm outperforms existing algorithms.
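The loose description of SFT above ("minimum depth but maximum number of leaf nodes") can be sketched as a lexicographic selection over candidate nodes. The sketch below rests on several assumptions that are not taken from the paper. Candidates are the observed infected nodes. The BFS runs over the subgraph induced by the infected nodes. "Leaves" are BFS-tree nodes that discover no new nodes. Ties on depth are broken by the larger leaf count. It also computes the threshold $t_u$ from the formula in the abstract. All function names are hypothetical.

```python
import math
from collections import deque
from typing import Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]  # adjacency sets of an undirected graph


def t_u(n: int, mu: float) -> int:
    """Threshold t_u = ceil(log n / log mu) + 2; the log base cancels in the ratio."""
    return math.ceil(math.log(n) / math.log(mu)) + 2


def bfs_depth_and_leaves(graph: Graph, root: Hashable, allowed: Set[Hashable]) -> Tuple[float, int]:
    """BFS from root restricted to `allowed`; return (tree depth, number of leaves).

    Depth is inf if some allowed node cannot be reached from root.
    """
    dist = {root: 0}
    n_children = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v in allowed and v not in dist:
                dist[v] = dist[u] + 1
                n_children[u] += 1  # v becomes a child of u in the BFS tree
                n_children[v] = 0
                queue.append(v)
    if len(dist) < len(allowed):
        return math.inf, 0
    leaves = sum(1 for c in n_children.values() if c == 0)
    return max(dist.values()), leaves


def short_fat_tree_source(graph: Graph, infected: Iterable[Hashable]) -> Hashable:
    """Pick the infected node whose BFS tree is shortest, breaking ties by most leaves."""
    infected = set(infected)
    best_key, best_node = None, None
    for node in infected:
        depth, leaves = bfs_depth_and_leaves(graph, node, infected)
        key = (depth, -leaves)  # lexicographic: minimise depth, then maximise leaves
        if best_key is None or key < best_key:
            best_key, best_node = key, node
    return best_node


# Hypothetical example: a triangle 0-1-2 with a tail node 3 attached to node 2.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(short_fat_tree_source(g, {0, 1, 2, 3}), t_u(10**6, 5))
```

Under these assumptions, each candidate costs one BFS over the infected subgraph, so the whole selection is quadratic in the number of infected nodes in the worst case.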