Micro-Browsing Model
Click-through rate (CTR) is a key signal of relevance for search engine results, both organic and sponsored. The CTR of a result has two core components: (a) the probability that a user examines the result, and (b) the perceived relevance of the result given that the user has examined it. There has been considerable work on user browsing models that model and analyze both the examination and the relevance components of CTR. In this paper, we propose a novel formulation: a micro-browsing model for how users read result snippets. The snippet text of a result often plays a critical role in its perceived relevance. We study how particular words within a line of a snippet can influence user behavior. We validate this new micro-browsing user model on the problem of predicting which snippet will yield the higher CTR, and show that classification accuracy is dramatically higher with our micro-browsing user model. The key insight of this paper is that varying relatively few words within a snippet, and even their location within it, can significantly influence the snippet's click-through rate. …
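The two-component view of CTR can be made concrete as CTR = P(examined) × P(perceived relevant | examined). The sketch below is a toy illustration, not the paper's model: the word weights, the geometric reading decay and all function names are hypothetical assumptions. It only shows why, with examination held fixed (same slot), moving a strong word later in a snippet can lower predicted CTR, and how a pairwise "which snippet wins" prediction falls out of that.

```python
from typing import Sequence


def expected_ctr(p_examine: float, p_relevant_given_examined: float) -> float:
    """CTR = P(examined) * P(perceived relevant | examined)."""
    return p_examine * p_relevant_given_examined


def toy_snippet_relevance(word_weights: Sequence[float], decay: float = 0.8) -> float:
    """Hypothetical micro-browsing score (an assumption, not the paper's model).

    The word at position i is read with probability decay**i, and a word that
    is read convinces the user with probability word_weights[i] (each in [0, 1]),
    independently of the other words. Returns P(at least one read word convinces).
    """
    p_unconvinced = 1.0
    for i, w in enumerate(word_weights):
        p_read = decay ** i
        p_unconvinced *= 1.0 - p_read * w
    return 1.0 - p_unconvinced


def preferred_snippet(a: Sequence[float], b: Sequence[float],
                      p_examine: float = 0.5, decay: float = 0.8) -> str:
    """Pairwise prediction: which of two snippets shown in the same slot gets higher CTR."""
    ctr_a = expected_ctr(p_examine, toy_snippet_relevance(a, decay))
    ctr_b = expected_ctr(p_examine, toy_snippet_relevance(b, decay))
    if ctr_a == ctr_b:
        return "tie"
    return "A" if ctr_a > ctr_b else "B"


# Same strong word placed first (A) versus third (B).
print(preferred_snippet([0.6, 0.0, 0.0], [0.0, 0.0, 0.6]))  # prints "A"
```

Because both snippets share the same examination probability, the comparison reduces to the relevance term; in this toy, A scores 0.6 and B scores 1 − (1 − 0.64 × 0.6) = 0.384.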
Reduced Dynamic Chain Event Graph (RDCEG)
In this paper we introduce a new class of probabilistic graphical models called the Reduced Dynamic Chain Event Graph (RDCEG), a novel mixture of a Chain Event Graph (CEG) and a semi-Markov process (SMP). It has been demonstrated that many real-world scenarios, particularly in the domain of public health and security, can be modelled as an unfolding of events in the life histories of individuals. Our interest lies not only in the future trajectories of an individual with a specified history and set of characteristics but also in the timescale associated with these developments. Such information is critical in developing suitable interventions and informs the prioritisation of policy decisions. The RDCEG was born out of the need for such a model. It is a coloured graph which inherits useful properties from the family of probabilistic graphical models, such as fast conjugate model selection, conditional independence interrogations and support for causal interventions. Its novelty lies in its underlying semi-Markov structure, which allows the holding time at each state to follow an arbitrary distribution. We demonstrate this new decision support system with a simulated intervention to reduce falls in the elderly. …
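The semi-Markov ingredient can be sketched on its own: the next state depends only on the current state, while the time spent in each state is drawn from any distribution we like. The sketch below simulates only that layer; it does not build the coloured CEG, its stages or the conjugate model selection. The states, transition probabilities and holding-time distributions are hypothetical, loosely inspired by the falls example, and the holding time here depends only on the current state (a simplifying assumption).

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical states and parameters for illustration only.
# "exit" has no outgoing transitions, so it is absorbing.
TRANSITIONS: Dict[str, Dict[str, float]] = {
    "community": {"fall": 0.4, "exit": 0.6},
    "fall": {"hospital": 0.5, "community": 0.5},
    "hospital": {"community": 0.7, "exit": 0.3},
}

# Holding time (in days) per transient state. Any sampler works here,
# which is the flexibility a semi-Markov process adds over a Markov chain.
HOLDING: Dict[str, Callable[[random.Random], float]] = {
    "community": lambda rng: rng.weibullvariate(365.0, 1.5),
    "fall": lambda rng: rng.lognormvariate(0.0, 0.5),
    "hospital": lambda rng: rng.lognormvariate(2.0, 0.4),
}


def simulate(start: str, rng: random.Random, max_steps: int = 1000) -> List[Tuple[str, float]]:
    """Return the (state, entry_time) pairs visited until absorption or max_steps."""
    path: List[Tuple[str, float]] = []
    state, t = start, 0.0
    for _ in range(max_steps):
        path.append((state, t))
        outgoing = TRANSITIONS.get(state)
        if not outgoing:  # absorbing state reached
            break
        t += HOLDING[state](rng)  # sojourn time from this state's own distribution
        next_states, probs = zip(*outgoing.items())
        state = rng.choices(next_states, weights=probs, k=1)[0]
    return path


rng = random.Random(0)
trajectory = simulate("community", rng)
# Timescale question: average time until "exit" over many simulated individuals.
mean_time = sum(simulate("community", rng)[-1][1] for _ in range(1000)) / 1000
print(trajectory[-1], round(mean_time, 1))
```

An intervention could be explored in this toy by editing a transition probability (for example lowering "community" → "fall") and comparing the resulting mean times.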
Extreme Gradient Boosting
Extreme Gradient Boosting (XGBoost) is an efficient implementation of the gradient boosting framework. …
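The core of XGBoost-style boosting can be sketched in a few dozen lines: each round fits a tree to the loss gradients g and Hessians h, uses the L2-regularised leaf weight w = −G / (H + λ), and scores splits by ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) − G²/(H+λ)]. This is a minimal sketch under stated assumptions, not the xgboost library API: it uses depth-1 trees (stumps), squared loss and exact greedy split search, and omits the γ split penalty, subsampling, sparsity handling and approximate split finding. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# (feature index, threshold, left leaf weight, right leaf weight)
Stump = Tuple[int, float, float, float]


def leaf_weight(g_sum: float, h_sum: float, lam: float) -> float:
    # Newton step with L2 penalty: w* = -G / (H + lambda)
    return -g_sum / (h_sum + lam)


def structure_score(g_sum: float, h_sum: float, lam: float) -> float:
    return g_sum * g_sum / (h_sum + lam)


def fit_stump(X: Sequence[Sequence[float]], g: List[float], h: List[float], lam: float) -> Stump:
    """Exact greedy search for the depth-1 split with the largest gain."""
    G, H = sum(g), sum(h)
    w = leaf_weight(G, H, lam)
    best: Stump = (0, float("inf"), w, w)  # fallback: no split, a single leaf
    best_gain = 0.0
    for j in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        g_left = h_left = 0.0
        for k in range(len(order) - 1):
            i, i_next = order[k], order[k + 1]
            g_left += g[i]
            h_left += h[i]
            if X[i][j] == X[i_next][j]:
                continue  # cannot split between equal feature values
            g_right, h_right = G - g_left, H - h_left
            gain = 0.5 * (structure_score(g_left, h_left, lam)
                          + structure_score(g_right, h_right, lam)
                          - structure_score(G, H, lam))
            if gain > best_gain:
                best_gain = gain
                threshold = (X[i][j] + X[i_next][j]) / 2
                best = (j, threshold, leaf_weight(g_left, h_left, lam),
                        leaf_weight(g_right, h_right, lam))
    return best


def stump_predict(s: Stump, x: Sequence[float]) -> float:
    j, threshold, w_left, w_right = s
    return w_left if x[j] < threshold else w_right


def fit(X, y, n_rounds: int = 50, eta: float = 0.3, lam: float = 1.0):
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # Squared loss 0.5 * (p - y)^2: gradient p - y, Hessian 1.
        g = [p - t for p, t in zip(pred, y)]
        h = [1.0] * len(y)
        s = fit_stump(X, g, h, lam)
        stumps.append(s)
        pred = [p + eta * stump_predict(s, x) for p, x in zip(pred, X)]
    return base, eta, stumps


def predict(model, x: Sequence[float]) -> float:
    base, eta, stumps = model
    out = base
    for s in stumps:
        out += eta * stump_predict(s, x)  # shrinkage eta on every tree
    return out


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 0.0, 1.0, 1.0]
model = fit(X, y)
print([round(predict(model, x), 3) for x in X])  # prints [0.0, 0.0, 1.0, 1.0]
```

With λ = 1 and η = 0.3, each round here shrinks the residuals by a factor of 0.8, which shows how the regulariser and learning rate together damp every Newton step.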
DBSCAN++
DBSCAN is a classical density-based clustering procedure which has had tremendous practical relevance. However, it implicitly needs to compute the empirical density for each sample point, leading to a quadratic worst-case time complexity, which may be too slow on large datasets. We propose DBSCAN++, a simple modification of DBSCAN which only requires computing the densities for a subset of the points. We show empirically that, compared to traditional DBSCAN, DBSCAN++ can provide not only competitive performance but also added robustness to the bandwidth hyperparameter while taking a fraction of the runtime. We also present statistical consistency guarantees showing the trade-off between computational cost and estimation rates. Surprisingly, up to a certain point, we can enjoy the same estimation rates while lowering computational cost, showing that DBSCAN++ is a sub-quadratic algorithm that attains minimax-optimal rates for level-set estimation, a property that may be of independent interest. …
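The key saving is that densities are computed only for m sampled points rather than all n, so the density step costs O(n·m) distance evaluations instead of O(n²). The text does not say how the subset is chosen or how the remaining points are labelled, so this sketch assumes greedy farthest-point (K-center) selection and assigns each point to the cluster of its nearest core point when that point lies within eps, otherwise to noise. Function names are hypothetical, and distances are computed naively without a spatial index.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def k_center_subset(X: Sequence[Point], m: int) -> List[int]:
    """Greedy farthest-point traversal: a 2-approximation to the K-center problem."""
    chosen = [0]
    dist_to_chosen = [math.dist(x, X[0]) for x in X]
    while len(chosen) < min(m, len(X)):
        i = max(range(len(X)), key=dist_to_chosen.__getitem__)
        if dist_to_chosen[i] == 0.0:
            break  # every remaining point duplicates a chosen one
        chosen.append(i)
        dist_to_chosen = [min(d, math.dist(x, X[i])) for d, x in zip(dist_to_chosen, X)]
    return chosen


def dbscan_pp(X: Sequence[Point], m: int, eps: float, min_pts: int) -> List[int]:
    """Cluster label per point; -1 marks noise."""
    subset = k_center_subset(X, m)
    # Density only for the m sampled points: O(n*m) distance evaluations.
    core = [i for i in subset
            if sum(1 for x in X if math.dist(x, X[i]) <= eps) >= min_pts]
    # Core points within eps of each other share a cluster (connected components).
    core_label: Dict[int, int] = {}
    n_clusters = 0
    for c in core:
        if c in core_label:
            continue
        core_label[c] = n_clusters
        stack = [c]
        while stack:
            u = stack.pop()
            for v in core:
                if v not in core_label and math.dist(X[u], X[v]) <= eps:
                    core_label[v] = n_clusters
                    stack.append(v)
        n_clusters += 1
    # Assumed rule: nearest core point's cluster if within eps, else noise.
    labels = []
    for x in X:
        if not core:
            labels.append(-1)
            continue
        nearest = min(core, key=lambda c: math.dist(x, X[c]))
        labels.append(core_label[nearest] if math.dist(x, X[nearest]) <= eps else -1)
    return labels


points = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (0.1, 0.1),
          (5.0, 5.0), (5.0, 5.1), (5.1, 5.0), (5.1, 5.1),
          (20.0, 20.0)]
labels = dbscan_pp(points, m=6, eps=0.5, min_pts=3)
print(labels)
```

Farthest-point selection spreads the m samples across the data, so every dense region tends to receive at least one candidate core point even when m is much smaller than n.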
If you did not already know
30 Saturday Jul 2022