# Hurdle distribution

A hurdle distribution (also called a zero-altered distribution) is a two-part mixture distribution that accounts for excess zeros in data. The name refers to the “hurdle” that must be cleared before a non-zero count is observed; this makes the distribution useful for data with many zeros, such as records of rare phenomena.

“[The hurdle distribution] provides a natural means for modeling overdispersion and underdispersion of the data”

Mullahy, 1986, p. 54 [1]

The hurdle distribution was first proposed by Cragg in 1971 [2]. Since then, it has gained popularity and is now commonly used in epidemiology, genetics, insurance claims, marketing, and medicine.

## Hurdle distribution duality

The number of events in a hurdle distribution results from two component distributions [3]:

• A binomial distribution that determines whether zero or a non-zero number of events will be observed. A value of zero can only come from this portion of the model.
• A zero-truncated Poisson or negative binomial distribution that determines the non-zero counts (1, 2, 3, …).
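
The two parts above can be combined into a single probability mass function. The following is a minimal sketch, assuming a Poisson count component (the negative binomial case would follow the same pattern); the function name `hurdle_poisson_pmf` and the parameter names `p_zero` and `lam` are illustrative choices, not taken from the cited sources.

```python
import math


def hurdle_poisson_pmf(k: int, p_zero: float, lam: float) -> float:
    """Return P(Y = k) under a hurdle Poisson model (illustrative sketch).

    p_zero: probability that the observed count is zero (binary part).
    lam:    rate of the Poisson distribution before truncation at zero.
    """
    if k < 0:
        return 0.0
    if k == 0:
        # Zeros come only from the binary part of the model.
        return p_zero
    # Ordinary Poisson probability of k events.
    poisson_k = math.exp(-lam) * lam**k / math.factorial(k)
    # Renormalise over k >= 1, i.e. truncate the Poisson at zero.
    truncated_k = poisson_k / (1.0 - math.exp(-lam))
    # Weight by the probability of clearing the hurdle.
    return (1.0 - p_zero) * truncated_k


if __name__ == "__main__":
    # The probabilities over k = 0, 1, 2, ... should sum to (about) one.
    total = sum(hurdle_poisson_pmf(k, p_zero=0.6, lam=2.0) for k in range(60))
    print(total)
```

Because the zero probability is set directly by `p_zero`, it can be larger or smaller than the zero probability an ordinary Poisson with the same rate would give.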

Another way to model data with excess zeros is with zero-inflated models, such as the zero-inflated Poisson (ZIP) distribution; negative binomial variants exist for both zero-inflated and hurdle models [4]. The two approaches differ in where zeros come from: in zero-inflated models, zeros can arise either from a separate zero-generating component or as an ordinary outcome of the counting variable; in hurdle models, the counting variable is truncated at zero, so zeros can only come from the binary component [5].
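
The difference can be made concrete by writing out both probability mass functions for the Poisson case. The notation here is an assumption made for illustration: $\omega$ is the weight of the extra-zero component in the ZIP model, $\pi$ is the zero probability in the hurdle model, and $\lambda$ is the Poisson rate.

$$
\text{ZIP:}\quad
P(Y = 0) = \omega + (1 - \omega)\,e^{-\lambda},
\qquad
P(Y = k) = (1 - \omega)\,\frac{e^{-\lambda}\lambda^{k}}{k!},\quad k = 1, 2, 3, \dots
$$

$$
\text{Hurdle:}\quad
P(Y = 0) = \pi,
\qquad
P(Y = k) = (1 - \pi)\,\frac{e^{-\lambda}\lambda^{k}}{k!\,\bigl(1 - e^{-\lambda}\bigr)},\quad k = 1, 2, 3, \dots
$$

In the ZIP model the zero probability has two terms, one for each source of zeros, whereas in the hurdle model it has a single term because the count component is renormalised over $k \geq 1$.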

## References

[1] Mullahy, J. (1986). Specification and testing of some modified count data models. Journal of Econometrics, 33(3), 341–365.

[2] Cragg, J. G. (1971). Some statistical models for limited dependent variables with application to the demand for durable goods. Econometrica, 39, 829–844.

[3] Martin, P. (2022). Regression Models for Categorical and Count Data. SAGE Publications.

[4] Min, Y., and Agresti, A. (2005). Random effect models for repeated measures of zero-inflated count data. Statistical Modelling, 5(1), 1–19.