# Hard distribution

A “hard distribution” is so-called because it can be hard to identify or work with components or data points. The opposite is an “easy distribution.”

For example, computer models can be classified as “hard” or “harder” depending on their complexity and how well they model data. If model B is harder than C, that means the best algorithm for B does worse than the best algorithm for C.

## Hard distribution and hardness examples

Hardness is a measure of either sampling complexity or the complexity of computing a distribution function:

• Ting et al. define a hard distribution as one that is hard for a DBSCAN density-based clustering algorithm to identify all the clusters in the distribution [1].
• Rochetto et al. [2] describe hard distributions as those that are conjectured to be difficult to sample without the use of quantum computing.
• MCMC can sample from a hard distribution that can’t be explicitly written out; Often we can’t compute simulations for very large distribution because a Markov Chain has too many states; it could take a computer eons to calculate some of these problems. The workaround is to simulate the Markov Chain for many iterations until a “good” state solution is found [3].

## References

Image: Original: Alisneaky Vector: Zirguezi, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

[1] Ting et. al. August 2016. Overcoming Key Weaknesses of Distance-based Neighbourhood Methods using a Data Dependent Dissimilarity Measure. Conference: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16). DOI: 10.1145/2939672.2939779

[2] Rocchetto, A., Grant, E., Strelchuk, S. et al. Learning hard quantum distributions with variational autoencoders. npj Quantum Inf 4, 28 (2018). https://doi.org/10.1038/s41534-018-0077-z

[3] Tsum A. Probability & Statistics with Applications to Computing. Chapter 9: Applications to Computing. 9.6: Markov Chain Monte Carlo (MCMC).