
A **hurdle distribution** (also called a *zero-altered distribution*) is a two-part mixture distribution that accounts for excess zeros in data. It’s called a *hurdle* distribution because an observation must clear the “hurdle” of zero before a positive count is recorded. Excess zeros commonly arise when recording rare phenomena.

“[The hurdle distribution] provides a natural means for modeling overdispersion and underdispersion of the data”

Mullahy, 1986, p. 54 [1]

The hurdle distribution was first proposed by Cragg in 1971 [2]. Since then, the distribution has gained in popularity and is commonly found in epidemiology, genetics, insurance claims, marketing and medicine.

## Hurdle distribution duality

The number of events in a hurdle distribution is a result of two distributions [3]:

- A binary component (a binomial distribution with a single trial) that determines whether the count is zero or non-zero. A value of zero can only come from this portion of the model.
- A zero-truncated Poisson or zero-truncated negative binomial distribution that determines the non-zero counts (1, 2, 3, …).
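The two-part structure above can be sketched in code. This is a minimal illustration, assuming a zero-truncated Poisson for the non-zero counts. The function names and the parameters `p_zero` (probability of a zero from the zero/non-zero part) and `lam` (Poisson rate before truncation) are illustrative choices, not taken from the cited sources.

```python
import math


def poisson_pmf(k, lam):
    """Ordinary (untruncated) Poisson probability P(X = k)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def hurdle_poisson_pmf(k, p_zero, lam):
    """P(Y = k) under a hurdle Poisson model (illustrative sketch).

    p_zero: assumed probability that the hurdle is NOT crossed (Y = 0).
    lam:    assumed rate of the Poisson that is truncated at zero.
    """
    if k < 0:
        return 0.0
    if k == 0:
        # Zeros come only from the zero/non-zero part of the model.
        return p_zero
    # Zero-truncated Poisson: renormalise by the mass above zero.
    truncated = poisson_pmf(k, lam) / (1.0 - math.exp(-lam))
    return (1.0 - p_zero) * truncated


# The probabilities over k = 0, 1, 2, ... should sum to (approximately) 1.
total = sum(hurdle_poisson_pmf(k, p_zero=0.6, lam=2.0) for k in range(60))
print(hurdle_poisson_pmf(0, p_zero=0.6, lam=2.0), total)
```

Because the count part is renormalised after truncation, the zero probability is controlled entirely by `p_zero`, independently of `lam`.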

Another way to model data with excess zeros is to use zero-inflated models such as the zero-inflated Poisson (ZIP) distribution; negative binomial variants exist for both zero-inflated and hurdle models [4]. The two approaches differ in how zeros can arise: in zero-inflated models, zeros can come either from the extra point mass at zero or as an ordinary outcome of the counting variable; in hurdle models, the counting variable is truncated at zero, so zeros can only come from the zero/non-zero part of the model [5].
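The difference in where zeros come from can be made concrete with a small sketch, assuming Poisson counts in both models. The function names and the parameters `pi` (ZIP mixing weight for the extra zeros), `p_zero` (hurdle zero probability) and `lam` (Poisson rate) are hypothetical names chosen for illustration.

```python
import math


def zip_zero_prob(pi, lam):
    """P(Y = 0) in a zero-inflated Poisson (illustrative sketch).

    Zeros arise two ways: from the extra point mass (weight pi) and
    as an ordinary Poisson outcome (weight 1 - pi).
    """
    return pi + (1.0 - pi) * math.exp(-lam)


def hurdle_zero_prob(p_zero):
    """P(Y = 0) in a hurdle model: the count part never produces a zero."""
    return p_zero


lam = 2.0
plain_poisson_zero = math.exp(-lam)

# A ZIP model can only match or exceed the plain Poisson zero probability.
print(zip_zero_prob(0.3, lam) >= plain_poisson_zero)

# A hurdle model can also place fewer zeros than a plain Poisson would.
print(hurdle_zero_prob(0.05) < plain_poisson_zero)
```

Under these assumptions, the hurdle form can represent both an excess and a deficit of zeros relative to the untruncated count distribution, whereas the zero-inflated form only adds zeros.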

## References

[1] Mullahy, J. (1986). Specification and testing of some modified count data models. Journal of Econometrics, 33 (3), 341–365.

[2] Cragg, J. G. (1971). Some statistical models for limited dependent variables with application to the demand for durable goods. Econometrica, 39, 829–844.

[3] Martin, P. (2022). Regression Models for Categorical and Count Data. SAGE Publications.

[4] Min, Y., and Agresti, A. (2005). Random effect models for repeated measures of zero-inflated count data. Statistical Modelling, 5 (1), 1–19.

[5] Zuniga, F. (2021). A New Trivariate Model and Generalized Linear Model for Stochastic Episodes’ Duration, Magnitude and Maximum. Dissertation.