# entropy

(1.9 hours to learn)

## Summary

Entropy is a measure of the information content of a random variable, and one of the fundamental quantities of information theory. It gives the minimum expected code length, in bits per symbol, needed to encode samples of the random variable.
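
As a concrete sketch of the definition, here is a minimal Python function computing the Shannon entropy of a discrete distribution (the function name and example distributions are illustrative, not from any particular resource):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum_i p_i * log2(p_i), with 0*log(0) = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin is maximally uncertain among two-outcome variables: 1 bit.
print(entropy([0.5, 0.5]))   # 1.0
# A biased coin is more predictable, so its entropy is lower.
print(entropy([0.9, 0.1]))   # ~0.47
# The uniform distribution over r outcomes attains the maximum, log2(r).
print(entropy([1/8] * 8))    # 3.0
```

The last line previews one of the goals below: among distributions over r values, the uniform one maximizes entropy, with value log2(r).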

## Context

This concept has the prerequisites:

- expectation and variance (Entropy is defined in terms of an expectation.)
- conditional distributions (Conditional distributions are needed to define conditional entropy.)
- independent random variables (The joint entropy of a set of independent random variables is the sum of the individual entropies.)
- optimization problems (Maximizing the entropy is an optimization problem.)

## Goals

- Understand the notion of entropy of a discrete random variable.

- Determine the largest possible entropy of a discrete random variable that takes on r possible values.

- Know the definitions of joint entropy and conditional entropy.

- Derive the chain rule for writing joint entropy as a sum of conditional entropies.

- Show that the entropy of a set of independent random variables is the sum of the individual entropies.
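
The chain rule and the additivity goals above can be checked numerically. The sketch below uses a small made-up 2x2 joint distribution (the numbers are purely illustrative) to verify H(X, Y) = H(X) + H(Y|X), and a product distribution to verify additivity under independence:

```python
import math

def H(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A small joint distribution p(x, y), chosen arbitrarily for illustration.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# Marginal of X: p(x) = sum_y p(x, y).
px = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}

# Conditional entropy H(Y|X) = sum_x p(x) * H(Y | X = x).
H_y_given_x = sum(
    px[x] * H([joint[(x, y)] / px[x] for y in (0, 1)])
    for x in (0, 1)
)

# Chain rule: H(X, Y) = H(X) + H(Y|X).
print(H(joint.values()), H(px.values()) + H_y_given_x)

# For independent X and Y, p(x, y) = p(x) p(y), and entropies add:
py = {0: 0.6, 1: 0.4}
indep = {(x, y): px[x] * py[y] for x in (0, 1) for y in (0, 1)}
print(H(indep.values()), H(px.values()) + H(py.values()))
```

Both pairs of printed values agree (up to floating-point error), matching the identities the goals ask you to derive.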

## Core resources (read/watch one of the following)

## -Free-

→ Information Theory, Inference, and Learning Algorithms

A graduate-level textbook on machine learning and information theory.

- Section 2.4, "Definition of entropy and related functions," pages 32-33
- Section 2.5, "Decomposability of the entropy," pages 33-34
- Section 4.1, "How to measure the information content of a random variable," pages 67-73

## -Paid-

→ Pattern Recognition and Machine Learning

A textbook for a graduate machine learning course, with a focus on Bayesian methods.

- Section 1.6, "Information theory," not including 1.6.1, pages 48-55

→ Elements of Information Theory

A graduate-level textbook on information theory.

- Section 2.1, "Entropy," pages 13-16
- Section 2.2, "Joint entropy and conditional entropy," pages 16-18

## Supplemental resources (the following are optional, but you may find them useful)

## -Free-

→ Course on Information Theory, Pattern Recognition, and Neural Networks

Video lectures on machine learning and information theory.

## -Paid-

→ Probabilistic Graphical Models: Principles and Techniques

A very comprehensive textbook for a graduate-level course on probabilistic AI.

- Section A.1.1, "Compression and entropy," pages 1135-1137
- Section A.1.2, "Conditional entropy and information," pages 1137-1138

## See also

- Entropy determines the amount by which a message can be compressed.
- The entropy of information theory is closely related to the notion of entropy from statistical mechanics.
- Conditional entropy gives the amount of uncertainty remaining in one random variable when the value of another is known.
- Differential entropy is the analogue of entropy for continuous random variables.