mutual information
(2.1 hours to learn)
Summary
Mutual information is a measure of the amount of information one random variable conveys about another. It is one of the fundamental quantities of information theory, and determines the capacity of a noisy channel, i.e. the maximum rate at which information can be reliably conveyed over it.
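For reference, the definition in standard notation (a sketch; here X and Y are discrete random variables with joint pmf p(x,y) and marginals p(x), p(y)):

```latex
I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}
       = H(X) - H(X \mid Y)
```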
Context
This concept has the prerequisites:
- entropy (Mutual information is defined in terms of conditional entropy.)
- independent random variables (Independent random variables have zero mutual information.)
- conditional distributions (Conditional distributions are needed to define mutual information.)
Goals
- Know the definition of mutual information (in terms of the difference between entropy and conditional entropy)
- Derive some basic properties of mutual information:
    - that it is symmetric
    - that the mutual information of a random variable with itself is its entropy
    - that it is nonnegative
    - that it is zero for independent random variables
- Know the various ways joint entropy decomposes into sums of conditional entropies and mutual information (the key identities are sketched after this list)
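The decompositions the last goal refers to are, in brief (standard identities, collected here for convenience):

```latex
\begin{aligned}
H(X,Y) &= H(X) + H(Y \mid X) = H(Y) + H(X \mid Y) \\
I(X;Y) &= H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) \\
       &= H(X) + H(Y) - H(X,Y)
\end{aligned}
```

Symmetry is immediate from the last line; taking Y = X gives I(X;X) = H(X), since H(X | X) = 0; and nonnegativity, with equality exactly when X and Y are independent, follows once mutual information is recognized as a KL divergence (see the note under "See also").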
Core resources (read/watch one of the following)
-Paid-
→ Elements of Information Theory
A graduate level textbook on information theory.
- Section 2.3, "Relative entropy and mutual information," pages 19-20
- Section 2.4, "Relationship between entropy and mutual information," pages 20-22
- Section 2.5, "Chain rules for entropy, relative entropy, and mutual information," pages 22-25
Supplemental resources (the following are optional, but you may find them useful)
-Free-
→ Information Theory, Inference, and Learning Algorithms
A graduate-level textbook on machine learning and information theory.
Location:
Section 8.1, "More about entropy," pages 138-140
-Paid-
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location:
Section 1.6.1, "Relative entropy and mutual information," pages 55-58
→ Machine Learning: a Probabilistic Perspective
A very comprehensive graduate-level machine learning textbook.
Location:
Section 2.8.3, "Mutual information," page 59
See also
- Mutual information can also be defined in terms of [KL divergence](kl_divergence).
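To make both formulations concrete, here is a minimal Python sketch (the joint pmf `p_xy` is a made-up illustrative example, and the function names are my own) that computes I(X;Y) both as H(X) - H(X|Y) and as the KL divergence between p(x,y) and p(x)p(y); by the identities above, the two results agree:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a probability vector."""
    p = p[p > 0]          # by convention, 0 log 0 = 0
    return -np.sum(p * np.log2(p))

def mi_via_entropies(p_xy):
    """I(X;Y) = H(X) - H(X|Y), using H(X|Y) = H(X,Y) - H(Y)."""
    p_x = p_xy.sum(axis=1)                 # marginal of X (rows)
    p_y = p_xy.sum(axis=0)                 # marginal of Y (columns)
    h_x_given_y = entropy(p_xy.ravel()) - entropy(p_y)
    return entropy(p_x) - h_x_given_y

def mi_via_kl(p_xy):
    """I(X;Y) = KL( p(x,y) || p(x) p(y) )."""
    p_x = p_xy.sum(axis=1, keepdims=True)  # shape (nx, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # shape (1, ny)
    prod = p_x * p_y                       # broadcasts to the outer product p(x)p(y)
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log2(p_xy[mask] / prod[mask]))

# A small made-up joint pmf; rows index X, columns index Y.
p_xy = np.array([[0.30, 0.10],
                 [0.10, 0.50]])

print(mi_via_entropies(p_xy))  # ~0.2564 bits
print(mi_via_kl(p_xy))         # same value, by the identities above
```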