Likelihood Likelihood is a funny concept. It’s not a probability, but it is proportional to a probability. The likelihood of a hypothesis (H) given some data (D) is the probability of obtaining D given that H is true, multiplied by an arbitrary positive constant (K). In other words, L(H|D) = K · P(D|H). Since a likelihood isn’t actually a probability, it doesn’t obey various rules of probability; for example, likelihoods need not sum to 1. A critical difference between probability and likelihood lies in what is treated as fixed and what is allowed to vary. In a conditional probability, P(D|H), the hypothesis is fixed and the data are free to vary. Likelihood is the opposite: the likelihood of a hypothesis, L(H|D), conditions on the data as if they were fixed while allowing the hypotheses to vary. The distinction is subtle, so I’ll say it again. For conditional probability, the hypothesis is treated as a given and the data are free to vary. For likelihood, the data are a given and the hypotheses vary. http://…/likelihood
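A minimal Python sketch of the fixed-versus-varying distinction, assuming a binomial model chosen purely for illustration (the coin-flip setup, the helper names `binomial_prob` and `likelihood`, and the 0.1-spaced grid of hypotheses are all hypothetical). Summing P(D|H) over every possible outcome with H held fixed gives 1; summing L(H|D) over a grid of hypotheses with D held fixed generally does not, and ratios of likelihoods do not depend on K.

```python
from math import comb


def binomial_prob(k: int, n: int, p: float) -> float:
    """P(D | H): probability of k successes in n trials when the success rate is p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def likelihood(p: float, k: int, n: int, K: float = 1.0) -> float:
    """L(H | D) = K * P(D | H), with the data (k, n) held fixed and p varying."""
    return K * binomial_prob(k, n, p)


# Conditional probability: fix the hypothesis p = 0.5 and let the data k vary.
n = 10
total_prob = sum(binomial_prob(k, n, 0.5) for k in range(n + 1))

# Likelihood: fix the data (7 successes in 10 trials) and let the hypothesis p vary.
k_obs = 7
grid = [i / 10 for i in range(11)]
total_lik = sum(likelihood(p, k_obs, n) for p in grid)

# Ratios of likelihoods are unaffected by the arbitrary constant K.
ratio_k1 = likelihood(0.7, k_obs, n) / likelihood(0.5, k_obs, n)
ratio_k5 = likelihood(0.7, k_obs, n, K=5.0) / likelihood(0.5, k_obs, n, K=5.0)

print(f"sum over data of P(D|H):       {total_prob:.4f}")
print(f"sum over hypotheses of L(H|D): {total_lik:.4f}")
print(f"L(0.7)/L(0.5) with K=1: {ratio_k1:.4f}, with K=5: {ratio_k5:.4f}")
```

With 7 successes in 10 trials, the likelihood sum over this 11-point grid comes out below 1, and a different grid spacing would give yet another total, which is why likelihoods are compared as ratios rather than read as probabilities.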

Covariant Compositional Network (CCN) Most existing neural networks for learning on graphs address permutation invariance by conceiving of the network as a message-passing scheme, in which each node sums the feature vectors coming from its neighbors. We argue that this limits their representational power, and instead propose a new general architecture for representing objects that consist of a hierarchy of parts, which we call Covariant Compositional Networks (CCNs). Here, covariance means that the activation of each neuron must transform in a specific way under permutations, analogous to steerability in CNNs. We achieve covariance by making each activation transform according to a tensor representation of the permutation group, and derive the corresponding tensor aggregation rules that each neuron must implement. Experiments show that CCNs can outperform competing methods on standard graph learning benchmarks. …
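The following Python sketch illustrates what covariance means for first- and second-order activations. It is not the CCN architecture or its tensor aggregation rules, only a hypothetical toy check: relabelling the nodes by a permutation P should map an order-1 activation v to P v and an order-2 activation T to P T P^T. The graph, features, weights, and helper names (`first_order_layer`, `second_order_activation`) are assumptions made for the example.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def permute_vector(v: Vector, perm: List[int]) -> Vector:
    """Order-1 action of a relabelling: (P v)[i] = v[perm[i]]."""
    return [v[p] for p in perm]


def permute_matrix(m: Matrix, perm: List[int]) -> Matrix:
    """Order-2 action of a relabelling: (P M P^T)[i][j] = M[perm[i]][perm[j]]."""
    return [[m[pi][pj] for pj in perm] for pi in perm]


def first_order_layer(adj: Matrix, x: Vector, w_self: float, w_nbr: float) -> Vector:
    """Message passing: each node sums its neighbours' features, then applies ReLU."""
    n = len(x)
    out = []
    for i in range(n):
        msg = sum(adj[i][j] * x[j] for j in range(n))
        out.append(max(0.0, w_self * x[i] + w_nbr * msg))
    return out


def second_order_activation(adj: Matrix, x: Vector) -> Matrix:
    """Hypothetical order-2 activation indexed by node pairs:
    T[i][j] = sum_k A[i][k] * x[k] * A[k][j], a contraction that preserves covariance."""
    n = len(x)
    return [[sum(adj[i][k] * x[k] * adj[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def vectors_close(a: Vector, b: Vector, tol: float = 1e-12) -> bool:
    return len(a) == len(b) and all(abs(p - q) <= tol for p, q in zip(a, b))


def matrices_close(a: Matrix, b: Matrix, tol: float = 1e-12) -> bool:
    return len(a) == len(b) and all(vectors_close(ra, rb, tol) for ra, rb in zip(a, b))


# Toy undirected graph on 4 nodes (edges 0-1, 0-2, 1-2, 2-3) with scalar node features.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
x = [1.0, -2.0, 0.5, 3.0]
perm = [2, 0, 3, 1]  # node i of the relabelled graph is node perm[i] of the original

A_p = permute_matrix(A, perm)
x_p = permute_vector(x, perm)

# Order 1: computing on the relabelled graph equals relabelling the output (h -> P h).
h = first_order_layer(A, x, w_self=0.5, w_nbr=1.5)
h_p = first_order_layer(A_p, x_p, w_self=0.5, w_nbr=1.5)
order1_ok = vectors_close(h_p, permute_vector(h, perm))

# Order 2: the pairwise activation transforms as T -> P T P^T.
T = second_order_activation(A, x)
T_p = second_order_activation(A_p, x_p)
order2_ok = matrices_close(T_p, permute_matrix(T, perm))

# Summing every entry yields a permutation-invariant graph-level readout.
readout_ok = abs(sum(map(sum, T)) - sum(map(sum, T_p))) <= 1e-12

print(order1_ok, order2_ok, readout_ok)
```

The order-1 layer is the sum-based message passing the entry describes; the order-2 activation shows the kind of pairwise tensor structure a covariant neuron can carry while still transforming predictably under relabelling.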

Conditional Fiducial Model The fiducial distribution is not unique in general, but we prove that in a restricted class of models it is uniquely determined by the sampling distribution of the data. In particular, it does not depend on the choice of a data-generating model. The arguments lead to a generalization of the classical formula found by Fisher (1930). The restricted class includes cases with discrete distributions, the case of the shape parameter in the Gamma distribution, and the case of the correlation coefficient in a bivariate Gaussian model. One of the examples can also be used in a pedagogical context to demonstrate possible difficulties with likelihood, Bayesian, and bootstrap inference. Examples that demonstrate non-uniqueness are also presented, and we explain that they can be seen as cases with restrictions on the parameter space. Motivated by this, we introduce the concept of a conditional fiducial model. This class of models includes the common case of iid samples from a one-parameter model investigated by Hannig (2013), the structural group models investigated by Fraser (1968), and certain models discussed by Fisher (1973) in his final writing on the subject. …
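As a small numerical illustration of the classical one-parameter case only (not the generalization described here), a Python sketch of the fiducial density usually attributed to Fisher, |dF(x; θ)/dθ|, for a single observation x from an exponential distribution with rate θ. The exponential model, the observation x = 2, and the helper names are assumptions chosen for the example. Because F(x; θ) = 1 - exp(-θx) increases in θ, the fiducial distribution function is H(θ) = F(x; θ), so the density should integrate to 1 and match the closed form x exp(-θx).

```python
import math
from typing import Callable


def exp_cdf(x: float, theta: float) -> float:
    """Sampling CDF F(x; theta) of an exponential distribution with rate theta."""
    return 1.0 - math.exp(-theta * x)


def fiducial_density(cdf: Callable[[float, float], float], x: float,
                     theta: float, h: float = 1e-6) -> float:
    """Classical one-parameter fiducial density |dF(x; theta)/dtheta|,
    approximated with a central finite difference in theta."""
    return abs(cdf(x, theta + h) - cdf(x, theta - h)) / (2.0 * h)


def trapezoid(f: Callable[[float], float], a: float, b: float, n: int = 20000) -> float:
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    step = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * step)
    return total * step


x_obs = 2.0  # hypothetical single observation


def density(theta: float) -> float:
    # 1 - exp(-theta * x) is smooth through theta = 0, so the central
    # difference at the left endpoint of the integral is still well defined.
    return fiducial_density(exp_cdf, x_obs, theta)


total_mass = trapezoid(density, 0.0, 20.0)           # should be close to 1
mass_below_1 = trapezoid(density, 0.0, 1.0)          # should be close to H(1) = F(x_obs; 1)
numeric_at_1_5 = density(1.5)
closed_form_at_1_5 = x_obs * math.exp(-1.5 * x_obs)  # x * exp(-theta * x)

print(total_mass, mass_below_1, exp_cdf(x_obs, 1.0))
print(numeric_at_1_5, closed_form_at_1_5)
```

For discrete models the pivot F(X; θ) is not exactly uniform, so this direct recipe does not carry over to them without modification.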