Interest Narrowness
The number of posts made by a single user account on the social media platform Twitter in any given time interval is usually quite low. However, there is a subset of users whose posting volume is much higher than the median. In this paper, we investigate the content diversity and the social neighborhood of these extreme users and of other users. We define a metric called ‘interest narrowness’ and find that a subset of extreme users, termed anomalous users, write posts with very low topic diversity, including posts with no text content. Using a few interaction patterns, we show that anomalous groups have the strongest within-group interactions, compared to their interactions with others. Further, they exhibit different information-sharing behaviors with other anomalous users compared to non-anomalous extreme tweeters. …
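The abstract does not give the formula for interest narrowness. One plausible way to picture "low topic diversity" is a score built from the Shannon entropy of a user's topic distribution. The sketch below is a hypothetical stand-in, not the paper's metric. It assumes each post has already been assigned a single topic label. The function name and the normalization, 1 minus entropy divided by log(number of topics), are illustrative choices.

```python
import math
from collections import Counter


def interest_narrowness(topic_labels, num_topics):
    """Hypothetical narrowness score in [0, 1].

    Computed as 1 - (Shannon entropy of the user's topic distribution) / log(num_topics).
    1.0 means every post falls into one topic; 0.0 means posts are spread evenly
    over all num_topics topics. This is an illustrative stand-in, not the paper's metric.
    """
    if num_topics < 2:
        raise ValueError("need at least two topics to normalize the entropy")
    counts = Counter(topic_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("user has no labeled posts")
    # Shannon entropy (natural log) of the empirical topic distribution
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return 1.0 - entropy / math.log(num_topics)


# A user who posts only about one topic scores 1.0.
print(interest_narrowness(["sports"] * 10, num_topics=4))
```

Posts with no text content could be given their own label, such as a hypothetical `"empty"` topic. A user who mostly posts such content would then also score close to 1.0.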
Rysearch
In our work, we propose to represent a hierarchical topic model (HTM) as a set of flat models, or layers, and a set of topical hierarchies, or edges. We suggest several quality measures for the edges of hierarchical models, resembling those proposed for flat models. We conduct an assessment experiment and show a strong correlation between the proposed measures and human judgement of topical edge quality. We also introduce a heterogeneous algorithm to build hierarchical topic models for heterogeneous data sources. We show how making certain adjustments to the learning process helps to retain the original structure of customized models while allowing slight, coherent modifications for new documents. We evaluate this approach using the proposed measures and show that the proposed heterogeneous algorithm significantly outperforms the baseline concat approach. Finally, we implement our own exploratory search engine (ESE), called Rysearch, which demonstrates the potential of the ARTM (Additive Regularization of Topic Models) approach for visualizing large heterogeneous document collections. …
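The layers-plus-edges representation described above can be sketched as a small data structure. This is a hypothetical illustration, not Rysearch's or ARTM's actual API:

* The class names, the edge-weight dictionary, and the `children` helper are assumptions.
* The topic-word matrices that a real flat model would hold are omitted.
* Each edge links a topic in layer `i` to a topic in layer `i + 1` with a weight.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One flat topic model, reduced here to its topic names (word distributions omitted)."""
    topics: list[str]


@dataclass
class HierarchicalTopicModel:
    """Hypothetical HTM: a list of flat layers plus weighted parent-child edges."""
    layers: list[Layer] = field(default_factory=list)
    # edges[(level, parent, child)] = weight of the link from `parent` in layer
    # `level` to `child` in layer `level + 1`
    edges: dict[tuple[int, str, str], float] = field(default_factory=dict)

    def add_edge(self, level: int, parent: str, child: str, weight: float) -> None:
        # Only allow edges between topics that exist in adjacent layers.
        if parent not in self.layers[level].topics:
            raise KeyError(f"{parent!r} is not a topic in layer {level}")
        if child not in self.layers[level + 1].topics:
            raise KeyError(f"{child!r} is not a topic in layer {level + 1}")
        self.edges[(level, parent, child)] = weight

    def children(self, level: int, parent: str, min_weight: float = 0.0) -> list[str]:
        """Child topics of `parent` with weight >= min_weight, strongest edge first."""
        kids = [(c, w) for (lvl, p, c), w in self.edges.items()
                if lvl == level and p == parent and w >= min_weight]
        return [c for c, _ in sorted(kids, key=lambda cw: -cw[1])]
```

Keeping the edges separate from the layers mirrors the idea that each flat layer can be evaluated on its own, while edge-level quality measures look only at the links between layers.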
Colors of Noise
In audio engineering, electronics, physics, and many other fields, the color of noise refers to the power spectrum of a noise signal (a signal produced by a stochastic process). Different colors of noise have significantly different properties. For example, as audio signals they sound different to human ears, and as images they have a visibly different texture. Therefore, each application typically requires noise of a specific color. This sense of ‘color’ for noise signals is similar to the concept of timbre in music (which is also called ‘tone color’); however, the latter is almost always used for sound and may consider very detailed features of the spectrum. The practice of naming kinds of noise after colors started with white noise, a signal whose spectrum has equal power within any equal interval of frequencies. That name was given by analogy with white light, which was (incorrectly) assumed to have such a flat power spectrum over the visible range. Other color names, like pink, red, and blue, were then given to noise with other spectral profiles, often (but not always) in reference to the color of light with similar spectra. Some of those names have standard definitions in certain disciplines, while others are very informal and poorly defined. Many of these definitions assume a signal with components at all frequencies, with a power spectral density per unit of bandwidth proportional to 1/f^β; hence, they are examples of power-law noise. For instance, the spectral density of white noise is flat (β = 0), while flicker or pink noise has β = 1, and Brownian noise has β = 2. …
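A common way to produce power-law noise with a chosen β is spectral shaping. The method has three steps:

1. Draw white Gaussian noise.
2. Scale each frequency bin's amplitude by f^(-β/2), so that power, which goes as amplitude squared, falls off as 1/f^β.
3. Transform back to the time domain.

The sketch below is one minimal version using NumPy. The function name is hypothetical. Zeroing the DC bin (where 1/f^β is undefined) and normalizing to unit variance are illustrative choices.

```python
import numpy as np


def power_law_noise(n, beta, rng=None):
    """Return n samples whose power spectral density is roughly proportional to 1/f**beta.

    beta = 0 gives white noise, beta = 1 pink (flicker) noise, beta = 2 Brownian noise.
    """
    rng = np.random.default_rng(rng)
    white = rng.standard_normal(n)
    spectrum = np.fft.rfft(white)          # one-sided spectrum of the white noise
    freqs = np.fft.rfftfreq(n)             # frequencies in cycles per sample, 0 .. 0.5
    scale = np.zeros_like(freqs)           # DC bin stays 0: 1/f**beta is undefined at f = 0
    scale[1:] = freqs[1:] ** (-beta / 2.0) # amplitude ~ f^(-beta/2), so power ~ f^(-beta)
    shaped = np.fft.irfft(spectrum * scale, n=n)
    return shaped / shaped.std()           # normalize to unit variance


# Rough check: the log-log slope of the periodogram should be close to -beta.
x = power_law_noise(2**16, beta=1, rng=0)
psd = np.abs(np.fft.rfft(x)) ** 2
f = np.fft.rfftfreq(x.size)
slope = np.polyfit(np.log(f[1:]), np.log(psd[1:]), 1)[0]
print(round(slope, 2))  # close to -1 for pink noise
```

The periodogram of a single realization is noisy, so the fitted slope scatters slightly around -β rather than matching it exactly.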
Concept2vec
Although there is an emerging trend towards generating embeddings primarily for unstructured data and, more recently, for structured data, there is not yet any systematic suite for measuring the quality of embeddings. This deficiency is felt even more strongly for embeddings generated for structured data. There are no concrete evaluation metrics measuring the quality of the encoded structure and semantic patterns in the embedding space. In this paper, we introduce a framework containing three distinct tasks concerned with the individual aspects of ontological concepts: (i) the categorization aspect, (ii) the hierarchical aspect, and (iii) the relational aspect. Then, in the scope of each task, a number of intrinsic metrics are proposed for evaluating the quality of the embeddings. Furthermore, we ran multiple experimental studies using this framework to compare the quality of the available embedding models. Employing this framework in future research can reduce misjudgment and provide greater insight into quality comparisons of embeddings for ontological concepts. …
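The abstract does not list the proposed metrics. To give a concrete feel for what an intrinsic check of the categorization aspect might look like, here is a hypothetical example. It is not one of the paper's metrics. It measures the fraction of concept instances whose nearest category centroid, by cosine similarity, is the category they actually belong to. The function name, the centroid-based scoring, and the toy data are all assumptions.

```python
import numpy as np


def categorization_accuracy(embeddings, labels):
    """Hypothetical intrinsic score for the categorization aspect.

    Returns the fraction of instances whose closest category centroid (by cosine
    similarity) is their own category. Ties go to the first category in sorted order.
    """
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)          # unit-length rows
    labels = np.asarray(labels)
    categories = np.unique(labels)                              # sorted category names
    centroids = np.stack([X[labels == c].mean(axis=0) for c in categories])
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    sims = X @ centroids.T                                      # cosine similarities
    predicted = categories[np.argmax(sims, axis=1)]
    return float(np.mean(predicted == labels))


# Two well-separated toy categories: every instance sits nearest its own centroid.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(categorization_accuracy(emb, ["A", "A", "B", "B"]))
```

A hierarchical or relational check would need more structure than plain category labels, for example parent-child pairs or relation triples from the ontology.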
If you did not already know
Sunday, 14 Nov 2021
Posted in What is ...