# Benford Distribution

The Benford distribution, denoted X ∼ Benford, describes random variables that follow Benford’s law.

Benford’s law (also called the first-digit law) states that the first non-zero digit (i.e., one of the digits 1 to 9) in a wide range of number collections does not follow a uniform distribution, as one might expect. Instead, the leading digits follow the non-uniform Benford distribution, which is a probability distribution over the first digit of the numbers in a set [1].

As an example, approximately 30% of numbers appearing in a text will have a leading digit of 1. If all leading digits were equally likely, the digit 1 would occur 1/9, or about 11.1%, of the time.
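The 30% figure comes from the standard closed form of Benford’s law, P(d) = log10(1 + 1/d) for d = 1, …, 9, which is not written out above. A minimal Python sketch (the function name `benford_pmf` is only illustrative) tabulates these probabilities next to the uniform value 1/9:

```python
import math

def benford_pmf(d: int) -> float:
    """Probability that the first significant digit equals d under Benford's law."""
    if d not in range(1, 10):
        raise ValueError("the first digit must be between 1 and 9")
    return math.log10(1 + 1 / d)

# Compare each Benford probability with the uniform probability 1/9.
for d in range(1, 10):
    print(f"digit {d}: Benford {benford_pmf(d):.3f}   uniform {1 / 9:.3f}")
```

The probabilities decrease steadily from digit 1 to digit 9 and sum to 1, since the terms log10((d + 1)/d) telescope to log10(10).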

The law applies only to numbers written in standard form, stripped of leading zeros and sign. For example, 2072 and −0.02072 both have a first digit of 2.
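One way to extract the first digit in code is sketched below. It assumes the number can be parsed by Python’s `decimal.Decimal`; the helper name `first_digit` is hypothetical, and other approaches (for example, string stripping) would work equally well.

```python
from decimal import Decimal

def first_digit(x) -> int:
    """Return the first non-zero digit of x, ignoring sign and leading zeros."""
    # Decimal keeps the coefficient digits separately from the sign and exponent.
    coefficient_digits = Decimal(str(x)).as_tuple().digits
    for digit in coefficient_digits:
        if digit != 0:
            return digit
    raise ValueError("the value has no non-zero digit")

print(first_digit(2072))      # first digit of 2072
print(first_digit(-0.02072))  # first digit of -0.02072
```

Both calls return 2, matching the example above. A value of zero raises a `ValueError`, because it has no non-zero digit to report.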

## Applications of the Benford distribution

Benford’s law can be used to analyze numbers in texts, but it also has many other applications in probability and statistics, including in products of independent and identically distributed (iid) mixtures and in some stochastic models. The Benford distribution can model accounting data, census data, and stock market data [2], and it is also used to audit financial records. Most people who cook the books probably do not know that numbers follow a Benford distribution, so they do not take it into account when entering fraudulent data [3].
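A screening check of this kind can be sketched as a chi-square goodness-of-fit comparison between observed first-digit counts and the Benford probabilities log10(1 + 1/d). The example below is an illustration on synthetic data, not a method taken from the cited sources: the function names are hypothetical, the inputs are assumed to be positive integers, and the 5% critical value for 8 degrees of freedom (15.507) is hard-coded.

```python
import math
from collections import Counter

# Upper 5% point of the chi-square distribution with 8 degrees of freedom.
CHI2_CRITICAL_DF8_5PCT = 15.507

def benford_chi_square(values) -> float:
    """Chi-square statistic comparing first-digit counts with Benford's law.

    Assumes every value is a positive integer.
    """
    counts = Counter(int(str(v)[0]) for v in values)
    n = sum(counts.values())
    statistic = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic

# Synthetic inputs: powers of 2 are known to follow Benford's law closely,
# while the integers 100..999 have uniformly distributed first digits.
powers_of_two = [2 ** k for k in range(1000)]
three_digit_numbers = list(range(100, 1000))

for name, data in [("powers of two", powers_of_two),
                   ("100..999", three_digit_numbers)]:
    stat = benford_chi_square(data)
    verdict = "consistent with" if stat < CHI2_CRITICAL_DF8_5PCT else "deviates from"
    print(f"{name}: chi-square = {stat:.2f}, {verdict} Benford at the 5% level")
```

A large statistic only flags a data set for closer inspection; legitimate data can deviate from Benford’s law, so a failed check is not evidence of fraud on its own.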

Some parametric survival distributions also satisfy Benford’s law: many parametric lifetime models follow the distribution for certain parameter values [4].

## Why do numbers follow a Benford distribution?

At first, it might seem counterintuitive that the leading digits 1 through 9 aren’t uniformly distributed. However, our number system starts at 1, so it seems plausible that higher leading digits will appear less frequently. There are also many everyday reasons why small leading digits are more common. They include:

• Phone numbers in the US start with the country code “1”.
• Most written texts were published in a year beginning with “1” (1999, 1987, 1892,…).
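• About one third of the days of the month begin with 1 (the 1st and the 10th through 19th) and another third begin with 2 (the 2nd and the 20th through 29th). In a 31-day month only three days begin with 3, and one day each begins with 4 through 9.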
• As a percentage of the population, more people are alive whose ages start with a “1” (around 15%) than whose ages start with a 5, 6, 7, 8, or 9 [5].

## References

[1] Frunza, M. (2015). Solving Modern Crime in Financial Markets: Analytics and Case Studies. Academic Press.

[2] Hill, T. (1995). A Statistical Derivation of the Significant Digit Law. Statistical Science. 10(4):354-363.

[3] Tam Cho, W. & Gaines, B. Breaking the (Benford) Law: Statistical Fraud Detection in Campaign Finance.

[4] Leemis, L. et al. Survival Distributions Satisfying Benford’s Law. Retrieved November 9, 2021 from: http://www.math.wm.edu/~leemis/2000amstat.pdf