(1.9 hours to learn)
Entropy is a measure of the information content of a random variable, and one of the fundamental quantities of information theory. It determines the minimum expected code length necessary to encode samples of the random variable.
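In standard notation, for a discrete random variable X with probability mass function p(x), the entropy in bits is

H(X) = \mathbb{E}[-\log_2 p(X)] = -\sum_x p(x) \log_2 p(x),

with the convention that 0 \log 0 = 0; choosing a different base for the logarithm only changes the units.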
This concept has the prerequisites:
- expectation and variance (Entropy is defined in terms of an expectation.)
- conditional distributions (Conditional distributions are needed to define conditional entropy.)
- independent random variables (The joint entropy of a set of independent random variables is the sum of the individual entropies.)
- optimization problems (Maximizing the entropy is an optimization problem.)
Goals:
- Understand the notion of entropy of a discrete random variable.
- What is the largest possible entropy of a discrete random variable which takes on r possible values?
- Know the definitions of joint entropy and conditional entropy.
- Derive the chain rule for writing joint entropy as a sum of conditional entropies.
- Show that the entropy of a set of independent random variables is the sum of the individual entropies. (A numerical sketch illustrating these goals follows this list.)
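These goals can be checked numerically. Below is a minimal Python sketch (assuming NumPy is available; the joint distribution is a made-up example, not taken from any of the resources) that computes entropy, joint entropy, and conditional entropy, and verifies the chain rule and the additivity of entropy for independent variables.

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a discrete distribution p (zero-probability terms contribute 0)."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

# Made-up joint distribution p(x, y) over a 2x3 alphabet (rows: x, columns: y).
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.05, 0.30]])

p_x = p_xy.sum(axis=1)          # marginal of X
p_y = p_xy.sum(axis=0)          # marginal of Y

H_X  = entropy(p_x)
H_XY = entropy(p_xy.ravel())    # joint entropy H(X, Y)

# Conditional entropy H(Y | X) = sum_x p(x) H(Y | X = x)
H_Y_given_X = sum(p_x[i] * entropy(p_xy[i] / p_x[i]) for i in range(len(p_x)))

# Chain rule: H(X, Y) = H(X) + H(Y | X)
assert np.isclose(H_XY, H_X + H_Y_given_X)

# Independence: if p(x, y) = p(x) p(y), then H(X, Y) = H(X) + H(Y)
p_indep = np.outer(p_x, p_y)
assert np.isclose(entropy(p_indep.ravel()), H_X + entropy(p_y))

# Maximum entropy: among distributions on r values, the uniform one attains log2(r) bits.
r = 8
print(entropy(np.full(r, 1 / r)), np.log2(r))   # both print 3.0
```

The last two lines also address the question above: among distributions over r values, entropy is maximized by the uniform distribution, giving log2(r) bits.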
Core resources (read/watch one of the following)
→ Information Theory, Inference, and Learning Algorithms
A graduate-level textbook on machine learning and information theory.
- Section 2.4, "Definition of entropy and related functions," pages 32-33
- Section 2.5, "Decomposability of the entropy," pages 33-34
- Section 4.1, "How to measure the information content of a random variable," pages 67-73
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location: Section 1.6, "Information theory," not including 1.6.1, pages 48-55
→ Elements of Information Theory
A graduate-level textbook on information theory.
- Section 2.1, "Entropy," pages 13-16
- Section 2.2, "Joint entropy and conditional entropy," pages 16-18
Supplemental resources (the following are optional, but you may find them useful)
→ Course on Information Theory, Pattern Recognition, and Neural Networks
Video lectures on machine learning and information theory.
Location: Lecture 2, "Entropy and data compression (I): introduction to compression, information theory, and entropy"
→ Probabilistic Graphical Models: Principles and Techniques
A very comprehensive textbook for a graduate-level course on probabilistic AI.
- Section A.1.1, "Compression and entropy," pages 1135-1137
- Section A.1.2, "Conditional entropy and information," pages 1137-1138
See also:
- Entropy determines the amount by which a message can be compressed.
- Information-theoretic entropy is closely related to the entropy of statistical mechanics.
- Conditional entropy gives the amount of uncertainty in one random variable when the value of another is known.
- Differential entropy is the analogue of entropy for continuous random variables (standard definitions of both are given below).
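For reference, the standard definitions behind the last two items, in the same notation as above (with p(x, y) a joint probability mass function and f a probability density):

H(X \mid Y) = -\sum_{x, y} p(x, y) \log_2 p(x \mid y)

h(X) = -\int f(x) \log f(x) \, dx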