Stochastic Deep Network
Machine learning is increasingly targeting areas where input data cannot be accurately described by a single vector, but can be modeled instead using the more flexible concept of random vectors, namely probability measures or, more simply, point clouds of varying cardinality. Using deep architectures on measures poses, however, many challenging issues. Indeed, deep architectures were originally designed to handle fixed-length vectors or, using recursive mechanisms, ordered sequences thereof. In sharp contrast, measures describe a varying number of weighted observations with no particular order. We propose in this work a deep framework designed to handle crucial aspects of measures, namely permutation invariance and variations in weights and cardinality. Architectures derived from this pipeline can (i) map measures to measures – using the concept of push-forward operators; (ii) bridge the gap between measures and Euclidean spaces – through integration steps. This makes it possible to design discriminative networks (to classify or reduce the dimensionality of input measures), generative architectures (to synthesize measures) and recurrent pipelines (to predict measure dynamics). We provide a theoretical analysis of these building blocks, review our architectures’ approximation abilities and robustness w.r.t. perturbation, and try them on various discriminative and generative tasks. …
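The two building blocks named in the abstract, push-forward and integration, can be illustrated on a small weighted point cloud. The sketch below is only an illustration under stated assumptions, not the authors' architecture: the helper names (`push_forward`, `integrate`, `shift_and_squash`, `features`) and the toy data are hypothetical, and the maps are fixed rather than learned.

```python
from math import tanh

def push_forward(points, weights, f):
    """Push the discrete measure sum_i w_i * delta_{x_i} through f.

    The points move to f(x_i); the weights are unchanged.
    """
    return [f(x) for x in points], list(weights)

def integrate(points, weights, g):
    """Integrate a vector-valued g against the measure: sum_i w_i * g(x_i).

    The result is a fixed-length vector regardless of the number of points.
    Returns None for an empty point cloud.
    """
    total = None
    for x, w in zip(points, weights):
        gx = g(x)
        if total is None:
            total = [0.0] * len(gx)
        total = [t + w * v for t, v in zip(total, gx)]
    return total

# Hypothetical fixed maps standing in for learned layers.
def shift_and_squash(x):
    return tuple(tanh(c + 0.5) for c in x)

def features(x):
    return [x[0], x[1], x[0] * x[1]]

# Toy weighted point cloud in the plane (weights sum to 1).
points = [(0.0, 1.0), (2.0, -1.0), (0.5, 0.5)]
weights = [0.2, 0.3, 0.5]

moved, w = push_forward(points, weights, shift_and_squash)  # measure -> measure
embedding = integrate(moved, w, features)                   # measure -> vector
```

Because the integration step is a weighted sum, reordering the points (together with their weights) leaves `embedding` unchanged up to floating-point rounding, and the output length does not depend on the cardinality of the cloud.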

Self-Exciting Model of Information Cascades (SEISMIC)
Here we focus on predicting the final size of an information cascade spreading through a network. We develop a statistical model based on the theory of self-exciting point processes. A point process indexed by time is called a counting process when it counts the number of instances (reshares, in our case) over time. In contrast to homogeneous Poisson processes, which assume constant intensity over time, self-exciting processes assume that all the previous instances (i.e., reshares) influence the future evolution of the process. Self-exciting point processes are frequently used to model ‘rich get richer’ phenomena. They are ideal for modeling information cascades in networks because every new reshare of a post not only increases its cumulative reshare count by one, but also exposes new followers who may further reshare the post. We develop SEISMIC (Self-Exciting Model of Information Cascades) for predicting the total number of reshares of a given post. In our model, each post is fully characterized by its infectiousness, which measures the reshare probability. We allow the infectiousness to vary freely over time, in agreement with the observation that the infectiousness can drop as the content gets stale. Moreover, our model is able to identify at each time point whether the cascade is in the supercritical or subcritical state, based on whether its infectiousness is above or below a critical threshold. A cascade in the supercritical state is going through an ‘explosion’ period and its final size cannot be predicted accurately at the current time. In contrast, a cascade is tractable if it is in the subcritical state. In this case, we are able to predict its ultimate popularity accurately by modeling the future cascading behavior with a Galton-Watson tree. Our SEISMIC approach makes several contributions: Generative model: SEISMIC imposes no parametric assumptions and requires no expensive feature engineering. Moreover, as the complete social network structure may be hard to obtain, SEISMIC assumes minimal knowledge of the network: the only required input is the time history of reshares and the degrees of the resharing nodes. …
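To make the ideas of a self-exciting intensity and the subcritical/supercritical distinction concrete, here is a minimal sketch. It is not the SEISMIC estimator: the intensity form (infectiousness times a degree-weighted sum of kernel values), the exponential memory kernel, and all function names are assumptions chosen for illustration. The only standard fact used is that a Galton-Watson tree with mean offspring m < 1 has expected total size 1 / (1 - m), counting the root.

```python
import math

def exp_kernel(s, rate=1.0):
    """Hypothetical memory kernel: how strongly a reshare made s time units ago still acts."""
    return rate * math.exp(-rate * s)

def intensity(t, reshare_times, degrees, infectiousness, kernel=exp_kernel):
    """Assumed self-exciting intensity at time t.

    Each past reshare at time t_i by a node with n_i followers contributes
    n_i * kernel(t - t_i); the sum is scaled by the post's infectiousness.
    """
    return infectiousness * sum(
        n * kernel(t - ti)
        for ti, n in zip(reshare_times, degrees)
        if ti <= t
    )

def expected_tree_size(mean_offspring):
    """Expected total size of a Galton-Watson tree started from one node.

    Finite (1 / (1 - m)) only in the subcritical case m < 1; otherwise the
    expectation diverges, which mirrors the 'explosion' regime.
    """
    if mean_offspring >= 1.0:
        return math.inf
    return 1.0 / (1.0 - mean_offspring)

# Toy cascade: reshare times (hours) and follower counts of the resharing nodes.
times = [0.0, 0.5, 1.2]
followers = [10, 200, 35]
rate_now = intensity(2.0, times, followers, infectiousness=0.01)

subcritical_size = expected_tree_size(0.5)    # finite
supercritical_size = expected_tree_size(1.3)  # diverges
```

In this toy setting, each reshare by a well-followed node raises the intensity, and a prediction of final size is only finite while the effective mean number of offspring per reshare stays below one.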

Hypothesizing After the Results are Known (HARK)
Recent advancements in machine learning research, i.e., deep learning, have introduced methods that outperform conventional algorithms as well as humans in several complex tasks, ranging from detection of objects in images and speech recognition to playing difficult strategic games. However, the current methodology of machine learning research, and consequently the implementation of real-world applications of such algorithms, seems to have a recurring HARKing (Hypothesizing After the Results are Known) issue. In this work, we elaborate on the algorithmic, economic and social reasons for and consequences of this phenomenon. We present examples from current common practices of conducting machine learning research (e.g., avoidance of reporting negative results) and of the failure of the proposed algorithms and datasets to generalize in actual real-life usage. Furthermore, a potential future trajectory of machine learning research and development from the perspective of accountable, unbiased, ethical and privacy-aware algorithmic decision making is discussed. We would like to emphasize that with this discussion we neither claim to provide an exhaustive argument nor blame any specific institution or individual for the issues raised. This is simply a discussion put forth by us, insiders of the machine learning field, reflecting on ourselves. …

Compositional Network Embedding
Network embedding has proved extremely useful in a variety of network analysis tasks such as node classification, link prediction, and network visualization. Almost all existing network embedding methods learn to map node IDs to their corresponding node embeddings. This design principle, however, hinders the existing methods from being applied in real cases. A node ID is not generalizable, so the existing methods have to expend great effort on the cold-start problem. A heterogeneous network usually requires extra work to encode node types, as the node type cannot be identified from the node ID. A node ID carries little information, which leads to the criticism that the existing methods are not robust to noise. To address these issues, we introduce Compositional Network Embedding, a general inductive network representation learning framework that generates node embeddings by combining node features based on the principle of compositionality. Instead of directly optimizing an embedding lookup based on arbitrary node IDs, we learn a composition function that infers node embeddings by combining the corresponding node attribute embeddings through a graph-based loss. For evaluation, we conduct experiments on link prediction under four different settings. The results verify the effectiveness and generalization ability of compositional network embeddings, especially on unseen nodes. …
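The contrast between an ID lookup and a composition function can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's method: the composition here is a plain mean of attribute embeddings (the paper learns its composition function through a graph-based loss), and the attribute names, vectors, and the `compose` helper are hypothetical.

```python
def compose(attribute_ids, attribute_table):
    """Infer a node embedding by averaging the embeddings of its attributes.

    Mean pooling is an assumed stand-in for a learned composition function.
    """
    vectors = [attribute_table[a] for a in attribute_ids]
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

# Hypothetical attribute embeddings, shared across all nodes.
attribute_table = {
    "type:user": [1.0, 0.0],
    "city:paris": [0.0, 1.0],
    "lang:fr": [1.0, 1.0],
}

# A node never seen during training still gets an embedding,
# because only its attributes need to be known, not its ID.
unseen_node_attributes = ["type:user", "lang:fr"]
embedding = compose(unseen_node_attributes, attribute_table)
```

An ID-based lookup table would have no row for such a node, which is the cold-start problem described above; composing from attributes avoids it as long as the attributes themselves have been seen.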