Divergence, Entropy, Information: An Opinionated Introduction to Information Theory

Information theory is a mathematical theory of learning with deep connections to topics as diverse as artificial intelligence, statistical physics, and biological evolution. Many primers on the topic paint a broad picture with relatively little mathematical sophistication, while many others develop specific application areas in detail. In contrast, these informal notes aim to outline some elements of the information-theoretic ‘way of thinking,’ by cutting a rapid and interesting path through some of the theory’s foundational concepts and theorems. We take the Kullback-Leibler divergence as our foundational concept, and then proceed to develop the entropy and mutual information. We discuss some of the main foundational results, including the Chernoff bounds as a characterization of the divergence; Gibbs’ Theorem; and the Data Processing Inequality. A recurring theme is that the definitions of information theory support natural theorems that sound ‘obvious’ when translated into English. More pithily, ‘information theory makes common sense precise.’ Since the focus of the notes is not primarily on technical details, proofs are provided only where the relevant techniques are illustrative of broader themes. Otherwise, proofs and intriguing tangents are referenced in liberally sprinkled footnotes. The notes close with a highly nonexhaustive list of references to resources and other perspectives on the field.
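As a minimal sketch of how entropy and mutual information can be built on top of the Kullback-Leibler divergence, the Python below uses two standard identities for finite distributions, measured in bits: H(p) = log₂ n − D(p ‖ u), where u is uniform on n outcomes, and I(X;Y) = D(p_XY ‖ p_X p_Y). This is an illustration under those assumptions, not code from the notes; the function names `kl_divergence`, `entropy`, and `mutual_information` are hypothetical.

```python
import math

def kl_divergence(p, q):
    """D(p || q) in bits for discrete distributions given as equal-length lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:  # terms with p_i = 0 contribute 0 by convention
            if qi == 0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log2(pi / qi)
    return total

def entropy(p):
    """Entropy as a gap from uniformity: H(p) = log2(n) - D(p || uniform)."""
    n = len(p)
    uniform = [1.0 / n] * n
    return math.log2(n) - kl_divergence(p, uniform)

def mutual_information(joint):
    """I(X;Y) = D(p_XY || p_X p_Y); rows of `joint` index X, columns index Y."""
    px = [sum(row) for row in joint]          # marginal of X
    py = [sum(col) for col in zip(*joint)]    # marginal of Y
    # Flatten both distributions in the same (x, y) row-major order.
    flat_joint = [pxy for row in joint for pxy in row]
    flat_product = [a * b for a in px for b in py]
    return kl_divergence(flat_joint, flat_product)
```

For example, `entropy([0.5, 0.5])` evaluates to 1.0 bit, and the perfectly correlated pair `mutual_information([[0.5, 0], [0, 0.5]])` also gives 1.0 bit, while an independent joint distribution gives 0. Defining both quantities as divergences makes their nonnegativity-style properties follow from those of D.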