KL divergence

Summary

KL divergence, roughly speaking, is a measure of the distance between two probability distributions P and Q, and corresponds to the expected number of extra bits required to encode samples from P using a code optimized for Q rather than for P. It is not truly a distance function, because it's not symmetric and it doesn't satisfy the triangle inequality. Despite this, it's widely used in information theory and probabilistic inference.
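
For concreteness, the definition for discrete distributions P and Q over the same set of outcomes can be written as

    D_{\mathrm{KL}}(P \,\|\, Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}

where the base of the logarithm fixes the units (base 2 gives bits), and the sum is replaced by an integral over densities in the continuous case.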

Context

This concept has the prerequisites:

Goals

  • Know the definition of KL divergence.
  • Derive some basic properties (illustrated in the numerical sketch after this list):
    • that it is nonnegative
    • that the KL divergence between a distribution and itself is 0
  • Show that it is not a true distance metric (also illustrated in the sketch) because
    • it is not symmetric
    • it doesn't satisfy the triangle inequality
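
The goals above can be checked on small examples. The following is a minimal Python sketch (the kl_divergence helper and the example distributions are illustrative choices, not taken from any resource listed here); it demonstrates that the divergence of a distribution from itself is 0, that it is nonnegative, that it is not symmetric, and that the triangle inequality can fail.

    import numpy as np

    def kl_divergence(p, q):
        """D(p || q) in bits for discrete distributions given as arrays.

        Assumes p and q are probability vectors over the same outcomes,
        with q(x) > 0 wherever p(x) > 0.
        """
        p = np.asarray(p, dtype=float)
        q = np.asarray(q, dtype=float)
        mask = p > 0  # terms with p(x) = 0 contribute 0 by convention
        return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

    p = np.array([0.9, 0.1])
    q = np.array([0.5, 0.5])

    print(kl_divergence(p, p))  # 0.0: zero divergence of a distribution from itself
    print(kl_divergence(p, q))  # ~0.53 bits: nonnegative (Gibbs' inequality)
    print(kl_divergence(q, p))  # ~0.74 bits: differs from D(p || q), so not symmetric

    # The triangle inequality fails here: D(P || R) > D(P || Q) + D(Q || R).
    P, Q, R = np.array([1.0, 0.0]), np.array([0.5, 0.5]), np.array([0.1, 0.9])
    print(kl_divergence(P, R))                        # ~3.32 bits
    print(kl_divergence(P, Q) + kl_divergence(Q, R))  # ~1.74 bits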

Core resources (we're sorry, we haven't finished tracking down resources for this concept yet)

Supplemental resources (the following are optional, but you may find them useful)

See also