Ever since its discovery in 1807, the Fourier transform has been one of the mainstays of pure mathematics, theoretical physics, and engineering. The ease with which it connects the analytical and algebraic properties of function spaces; the particle and wave descriptions of matter; and the time- and frequency-domain descriptions of waves and vibrations makes the Fourier transform one of the great unifying concepts of mathematics. Deeper examination reveals that the logic of the Fourier transform is dictated by the structure of the underlying space itself. Hence, the classical cases of functions on the real line, the unit circle, and the integers modulo n are only the beginning: harmonic analysis can be generalized to functions on any space on which a group of transformations acts. Here the emphasis is on the word “group” in the mathematical sense of an algebraic system obeying specific axioms. The group might even be non-commutative: the fundamental principles behind harmonic analysis are so general that they apply equally to commutative and non-commutative structures. Thus, the humble Fourier transform leads us into the depths of group theory and abstract algebra, arguably the most extensive formal system ever explored by humans. Should this be of any interest to the practitioner whose eyes are set on concrete applications of machine learning and statistical inference? Hopefully, the present thesis will convince the reader that the answer is an emphatic “yes”. One reason is that groups are the mathematician’s way of capturing symmetries, and symmetries are all around us. Twentieth-century physics has taught us just how powerful symmetry principles are for prying open the secrets of nature. One could hardly ask for a better example of the power of mathematics than particle physics, which translates the abstract machinery of group theory into predictions about the behavior of the elementary building blocks of our universe.
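The claim that the logic of the Fourier transform is dictated by the underlying group can be made concrete in the simplest case mentioned above, the integers modulo n. The following is a minimal sketch in plain Python; the function names `dft_zn`, `translate` and `check_shift_theorem` are illustrative and not taken from any library. It computes the transform of a function f: Z_n → C directly from the definition f_hat(k) = sum over x of f(x) exp(-2πi kx/n), and checks that translating f by a group element t only multiplies each coefficient by the phase exp(-2πi kt/n). One consequence is that the magnitudes |f_hat(k)| are unchanged by translation, a first small example of group structure producing invariants.

```python
import cmath

def dft_zn(f):
    """Fourier transform of f: Z_n -> C, with f given as a list of n values."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
            for k in range(n)]

def translate(f, t):
    """Translate f by the group element t: g(x) = f((x - t) mod n)."""
    n = len(f)
    return [f[(x - t) % n] for x in range(n)]

def check_shift_theorem(f, t, tol=1e-9):
    """Check that the transform of translate(f, t) equals the transform of f
    multiplied coefficient-wise by the phase exp(-2*pi*i*k*t/n)."""
    n = len(f)
    F = dft_zn(f)
    G = dft_zn(translate(f, t))
    return all(abs(G[k] - cmath.exp(-2j * cmath.pi * k * t / n) * F[k]) < tol
               for k in range(n))

if __name__ == "__main__":
    f = [1.0, 2.0, 0.0, -1.0, 3.0]
    print(check_shift_theorem(f, 2))  # True
    # The magnitude spectrum is unchanged by translation (up to rounding error).
    F, G = dft_zn(f), dft_zn(translate(f, 2))
    print(all(abs(abs(a) - abs(b)) < 1e-9 for a, b in zip(F, G)))  # True
```

The quadratic-time sum is used only for transparency; a fast Fourier transform would compute the same coefficients more efficiently.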
I believe that algebra will prove to be just as crucial to the science of data as it has proved to be to the sciences of the physical world. In probability theory and statistics, it was Persi Diaconis who did much of the pioneering work in this realm, which he expounded brilliantly in his little book [Diaconis, 1988]. Since then, several other authors have also contributed to the field. In comparison, the algebraic side of machine learning has until now remained largely unexplored. The present thesis is a first step towards filling this gap. The two main themes of the thesis are (a) learning on domains that have non-trivial algebraic structure; and (b) learning in the presence of invariances. Learning rankings or matchings is the classic example of the first situation, whilst rotation, translation, and scale invariance in machine vision is probably the most immediate example of the latter. The thesis presents examples addressing real-world problems in both domains. However, the beauty of the algebraic approach is that it allows us to discuss these matters at a more general, abstract level, so most of our results apply equally well to a wide range of learning scenarios. The generality of our approach also means that we do not have to commit to just one learning paradigm (frequentist/Bayesian) or one family of algorithms (SVMs/graphical models/boosting/etc.). We do find that some of our ideas regarding symmetrization and learning on groups mesh best with the Hilbert space learning framework, so in Chapters 4 and 5 we focus on this methodology; even there, however, we take a comparative stance, contrasting the SVM with Gaussian Processes and a modified version of the Perceptron. One reason why abstract algebra has so far had little impact on the applied side of computer science is that it is often perceived as a very theoretical field, where computations are difficult if not impossible due to the sheer size of the objects at hand.
For example, while permutations obviously enter many applied problems, calculations on the full symmetric group S_n (the group of all permutations of n objects) are seldom viable, since it has n! elements. However, abstract algebra has recently developed a strong computational side [Bürgisser et al., 1997]. The core algorithms of this new computational algebra, such as the non-commutative FFTs discussed in detail in Chapter 3, form the backbone of the bridge between applied computations and abstract theory. In addition to our machine learning work, the present thesis offers some modest contributions to this field by deriving some useful generalizations of Clausen’s FFT for the symmetric group, and by presenting an efficient, extensible software library implementing the transform. To the best of our knowledge, this is the first time that such a library has been made publicly available. Clearly, a thesis like this one is only a first step towards building a bridge between the theory of groups/representations and machine learning. My hope is that it will offer ideas and inspiration to both sides, as well as a few practical algorithms that I believe are directly applicable to real-world problems.
Group theoretical methods in machine learning