(2.1 hours to learn)
Mutual information is a measure of the amount of information one random variable conveys about another. It is one of the fundamental quantities of information theory, and determines the maximum rate at which information can be reliably conveyed over a noisy channel.
This concept has the prerequisites:
- entropy (Mutual information is defined in terms of conditional entropy.)
- independent random variables (Independent random variables have zero mutual information.)
- conditional distributions (Conditional distributions are needed to define mutual information.)
- Know the definition of mutual information (in terms of the difference between entropy and conditional entropy)
- Derive some basic properties of mutual information:
  - that it is symmetric
  - that the mutual information of a random variable with itself is the entropy
  - that it is nonnegative
  - that it is zero for independent random variables
- Know various ways joint entropy decomposes into sums of conditional entropies and mutual information
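As a rough illustration of the goals above, the following Python sketch computes mutual information from a small joint probability table using I(X;Y) = H(X) - H(X|Y), then checks the listed properties numerically. The function names and the example tables are hypothetical, chosen only for illustration, and entropies are measured in bits.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginals(joint):
    """Return (P(X), P(Y)) for a table with joint[i][j] = P(X=i, Y=j)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py

def joint_entropy(joint):
    """H(X,Y): entropy of all cells of the joint table."""
    return entropy(p for row in joint for p in row)

def conditional_entropy(joint):
    """H(X|Y) = H(X,Y) - H(Y)."""
    _, py = marginals(joint)
    return joint_entropy(joint) - entropy(py)

def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y)."""
    px, _ = marginals(joint)
    return entropy(px) - conditional_entropy(joint)

def transpose(joint):
    """Swap the roles of X and Y."""
    return [list(col) for col in zip(*joint)]

# Hypothetical example tables (assumptions for illustration).
dependent = [[0.4, 0.1],
             [0.1, 0.4]]
# Outer product of marginals (0.3, 0.7) and (0.6, 0.4), so X and Y are independent.
independent = [[0.18, 0.12],
               [0.42, 0.28]]
# X paired with itself: all mass on the diagonal.
self_joint = [[0.3, 0.0],
              [0.0, 0.7]]

mi = mutual_information(dependent)
symmetric_gap = abs(mi - mutual_information(transpose(dependent)))
mi_independent = mutual_information(independent)
mi_self = mutual_information(self_joint)
h_self = entropy([0.3, 0.7])

# One decomposition of joint entropy: H(X,Y) = H(X|Y) + H(Y|X) + I(X;Y).
decomposition = (conditional_entropy(dependent)
                 + conditional_entropy(transpose(dependent))
                 + mi)

print(mi, symmetric_gap, mi_independent, mi_self, h_self)
print(joint_entropy(dependent), decomposition)
```

Up to floating-point rounding, the symmetry gap and the independent case should come out as zero, the self-information should match the entropy of X, and the decomposition should match the joint entropy. A numerical check like this is no substitute for the derivations, but it can help catch a mistaken sign or a swapped conditional.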
Core resources (read/watch one of the following)
→ Elements of Information Theory
A graduate-level textbook on information theory.
- Section 2.3, "Relative entropy and mutual information," pages 19-20
- Section 2.4, "Relationship between entropy and mutual information," pages 20-22
- Section 2.5, "Chain rules for entropy, relative entropy, and mutual information," pages 22-25
Supplemental resources (the following are optional, but you may find them useful)
→ Information Theory, Inference, and Learning Algorithms
A graduate-level textbook on machine learning and information theory.
Location: Section 8.1, "More about entropy," pages 138-140
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location: Section 1.6.1, "Relative entropy and mutual information," pages 55-58
→ Machine Learning: A Probabilistic Perspective
A very comprehensive graduate-level machine learning textbook.
Location: Section 2.8.3, "Mutual information," page 59
- Mutual information can also be defined in terms of [KL divergence](kl_divergence).
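
To make the KL divergence connection concrete, here is a minimal Python sketch, assuming the standard form in which mutual information is the KL divergence between the joint distribution p(x, y) and the product of its marginals p(x)p(y). It compares that value with the entropy-based formula on a small table. The function names and the example table are hypothetical, and logarithms are base 2.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information_kl(joint):
    """I(X;Y) as KL(p(x,y) || p(x) p(y)), with joint[i][j] = P(X=i, Y=j)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    total = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:  # cells with zero mass contribute nothing
                total += pxy * math.log2(pxy / (px[i] * py[j]))
    return total

def mutual_information_entropy(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), which equals H(X) - H(X|Y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    h_xy = entropy(p for row in joint for p in row)
    return entropy(px) + entropy(py) - h_xy

# Hypothetical joint distribution over two binary variables.
example = [[0.25, 0.25],
           [0.00, 0.50]]

kl_value = mutual_information_kl(example)
entropy_value = mutual_information_entropy(example)
print(kl_value, entropy_value)
```

The two values should agree up to floating-point rounding. Since a KL divergence is never negative and is zero only when the two distributions coincide, this form also makes the nonnegativity of mutual information, and its vanishing for independent variables, easy to see.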