F measure

(50 minutes to learn)

Summary

The F measure (also called the F score) summarizes the accuracy of a test as the weighted harmonic mean of its precision and recall. The F1 score is the special case in which precision and recall are weighted equally.
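
As a rough illustration of this definition, the sketch below computes the weighted harmonic mean using the common F-beta parameterization, F = (1 + beta^2) * P * R / (beta^2 * P + R). The function name f_beta and the example values are assumptions made for illustration and do not come from the resources listed on this page.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (illustrative sketch).

    beta > 1 puts more weight on recall, beta < 1 puts more weight on
    precision, and beta == 1 gives the balanced F1 score.
    """
    if precision < 0 or recall < 0 or beta <= 0:
        raise ValueError("precision and recall must be >= 0 and beta > 0")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero; by convention return 0.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical values: half of the returned items are relevant,
# and every relevant item was returned.
precision, recall = 0.5, 1.0
print("F1  :", f_beta(precision, recall))            # balanced
print("F2  :", f_beta(precision, recall, beta=2.0))  # favors recall
print("F0.5:", f_beta(precision, recall, beta=0.5))  # favors precision
```

With these values the recall-weighted score (beta = 2) comes out higher than F1, and the precision-weighted score (beta = 0.5) comes out lower, because recall is the stronger of the two quantities here.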

Context

This concept has the prerequisites:

Goals

  • Understand why the harmonic mean of the precision and recall is used instead of the arithmetic mean
  • Should a web search engine, such as Google, favor high precision or high recall for its top 10 search results? What weights should it then use for the F measure?
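
One way to explore the first goal is to compare the two means on a degenerate system. The sketch below assumes a hypothetical collection of 1000 documents, 10 of which are relevant, and a system that simply returns every document; the numbers and function names are illustrative only.

```python
def arithmetic_mean(precision: float, recall: float) -> float:
    """Plain average of precision and recall."""
    return (precision + recall) / 2


def harmonic_mean(precision: float, recall: float) -> float:
    """Balanced F measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical collection: 10 relevant documents out of 1000,
# and a system that returns all 1000 documents for every query.
num_relevant = 10
num_returned = 1000

precision = num_relevant / num_returned  # only 1% of returned docs are relevant
recall = num_relevant / num_relevant     # every relevant doc was returned

print(f"precision       = {precision:.3f}")
print(f"recall          = {recall:.3f}")
print(f"arithmetic mean = {arithmetic_mean(precision, recall):.3f}")
print(f"harmonic mean   = {harmonic_mean(precision, recall):.3f}")
```

The arithmetic mean is about 0.5 even though the system is useless, while the harmonic mean is about 0.02, staying close to the smaller of the two values.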

Core resources (read/watch one of the following)

-Free-

Introduction to Information Retrieval
A textbook on information retrieval techniques.
Authors: Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze

Supplemental resources (the following are optional, but you may find them useful)

-Free-

Wikipedia

See also

  • Accuracy is the "traditional" way to measure the performance of a system, but it weights positive and negative results equally. This may be undesirable in an information retrieval system, where the negative (non-relevant) results can vastly outnumber the positive (relevant) ones. The F measure, however, is not invariant to label swapping (switching the positive and negative classes), which may be undesirable when, for example, it is used to evaluate a classifier.
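
Both points can be seen in a small sketch. It assumes a hypothetical collection of 1000 documents with 10 relevant ones and a system that returns nothing; the confusion-matrix counts and function names are made up for illustration.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all decisions (positive and negative) that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1(tp: int, fp: int, fn: int) -> float:
    """Balanced F measure from counts; true negatives are not used."""
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        return 0.0
    return 2 * tp / denominator


# Hypothetical system that returns no documents at all:
# 10 relevant documents are missed, 990 non-relevant ones are correctly skipped.
tp, fp, fn, tn = 0, 0, 10, 990

print("accuracy:", accuracy(tp, fp, fn, tn))
print("F1      :", f1(tp, fp, fn))

# Swap the labels: "non-relevant" becomes the positive class.
# True positives and true negatives trade places, as do fp and fn.
tp_s, fp_s, fn_s, tn_s = tn, fn, fp, tp

print("accuracy after swap:", accuracy(tp_s, fp_s, fn_s, tn_s))
print("F1 after swap      :", f1(tp_s, fp_s, fn_s))
```

Accuracy is high and identical before and after the swap, whereas F1 is zero with "relevant" as the positive class and close to one with "non-relevant" as the positive class.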