KL divergence


KL divergence, roughly speaking, is a measure of the distance between two probability distributions P and Q, and corresponds to the expected number of extra bits required to encode samples from P using a code that is optimal for Q. It is not a true distance function, because it's not symmetric and it doesn't satisfy the triangle inequality. Despite this, it's widely used in information theory and probabilistic inference.
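
For reference, a standard way to write the definition for discrete distributions, using the base-2 logarithm so the result is measured in bits:

    D_{\mathrm{KL}}(P \,\|\, Q) = \sum_x P(x) \log_2 \frac{P(x)}{Q(x)}

The sum runs over the outcomes x in the common support, and terms with P(x) = 0 are taken to contribute zero.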


The goals for this concept are:


  • Know the definition of KL divergence.
  • Derive some basic properties:
    • that it is nonnegative
    • that the KL divergence between a distribution and itself is 0
  • Show that it is not a true distance metric because
    • it is not symmetric
    • it doesn't satisfy the triangle inequality (see the numerical sketch after this list)
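
A minimal numerical sketch of these properties, assuming Python with NumPy; the distributions P, Q, and R below are made-up Bernoulli-style examples chosen purely for illustration:

    import numpy as np

    def kl_divergence(p, q):
        """D(p || q) in bits, for discrete distributions given as
        probability arrays over the same finite support."""
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        mask = p > 0  # convention: terms with p(x) = 0 contribute nothing
        return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

    # Made-up distributions over two outcomes, for illustration only.
    P, Q, R = [0.5, 0.5], [0.4, 0.6], [0.3, 0.7]

    print(kl_divergence(P, Q))                       # nonnegative
    print(kl_divergence(P, P))                       # 0.0: a distribution against itself
    print(kl_divergence(P, Q), kl_divergence(Q, P))  # unequal: not symmetric

    # For these particular choices, D(P || R) exceeds D(P || Q) + D(Q || R),
    # so the triangle inequality fails.
    print(kl_divergence(P, R), kl_divergence(P, Q) + kl_divergence(Q, R))

The last line prints roughly 0.126 versus 0.062, so the "direct route" from P to R costs more bits than going through Q, which is exactly the triangle-inequality failure mentioned above.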

Core resources (we're sorry, we haven't finished tracking down resources for this concept yet)

Supplemental resources (the following are optional, but you may find them useful)


See also