Discrete probability distributions
Discrete distributions
- class statinf.distributions.discrete.CMPoisson(lambda_=None, nu_=None, j=500, *args, **kwargs)[source]
Bases:
Discrete
Conway-Maxwell Poisson distribution. This class allows one to generate a random variable of a chosen size from selected parameters, but also to fit data and estimate the parameters by means of Maximum Likelihood Estimation (MLE).
Introduced by Conway and Maxwell (1962), the Conway-Maxwell Poisson (also known as CMP) distribution is a generalization of the common Poisson distribution (
statinf.distributions.discrete.Poisson
). The distribution can handle non-equidispersed cases where \(\mathbb{E}(X) \neq \mathbb{V}(X)\). The level of dispersion is captured by \(\nu\): underdispersion corresponds to \(\nu > 1\), equidispersion to \(\nu = 1\), and overdispersion to \(\nu < 1\).
- Formulae
The probability mass function (pmf) is defined by
\[\mathbb{P}(X = x | \lambda, \nu) = \dfrac{\lambda^{x}}{(x!)^{\nu}} \dfrac{1}{Z(\lambda, \nu)}\]
where \(Z(\lambda, \nu) = \sum_{j=0}^{\infty} \dfrac{\lambda^{j}}{(j!)^{\nu}}\) is calculated in statinf.distributions.discrete.CMPoisson.Z().
Special cases of the CMP distribution include well-known distributions:
- When \(\nu = 1\), one recovers the Poisson distribution with parameter \(\lambda\).
- When \(\nu = 0\) and \(\lambda < 1\), one recovers the geometric distribution with parameter \(p = 1 - \lambda\) for the probability of success.
- When \(\nu \rightarrow \infty\), one finds the Bernoulli distribution with parameter \(p = \frac{\lambda}{1 + \lambda}\) for the probability of success.
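The first special case can be illustrated numerically by comparing the CMP pmf with \(\nu = 1\) against the plain Poisson pmf. This is a minimal sketch, assuming both classes are used exactly as documented on this page:
>>> from statinf.distributions import CMPoisson, Poisson
>>> # With nu = 1, the CMP pmf should coincide with the Poisson pmf
>>> cmp_ = CMPoisson(lambda_=2.5, nu_=1.0)
>>> poiss = Poisson(lambda_=2.5)
>>> all(abs(cmp_.pmf(k) - poiss.pmf(k)) < 1e-8 for k in range(10))
... True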
- Parameters
lambda_ (float, optional) – Parameter \(\lambda\) representing the generalized expectation, defaults to None
nu_ (float, optional) – Parameter \(\nu\) representing the level of dispersion, defaults to None
j (int, optional) – Length of the sum for the normalizing constant (see statinf.distributions.discrete.CMPoisson.Z()), defaults to 500
- Example
>>> from statinf.distributions import CMPoisson
>>> # Let us generate a random sample of size 1000
>>> x = CMPoisson(lambda_=2.5, nu_=1.5).sample(size=1000)
>>> # We can also estimate the parameters from the generated sample
>>> # We just need to initialize the class...
>>> cmp = CMPoisson()
>>> # ... and we can fit from the generated sample. The function returns a dictionary
>>> cmp.fit(x)
... {'lambda_': 2.7519745539344687, 'nu_': 1.5624694839612023, 'nll': 1492.0792423744383}
>>> # The class stores the value of the estimated parameters
>>> print(cmp.lambda_)
... 2.7519745539344687
>>> # So we can generate more samples using the fitted parameters
>>> y = cmp.sample(200)
- Reference
Conway, R. W., & Maxwell, W. L. (1962). A queuing model with state dependent service rates. Journal of Industrial Engineering, 12(2), 132-136.
Shmueli, G., Minka, T. P., Kadane, J. B., Borle, S., & Boatwright, P. (2005). A useful distribution for fitting discrete data: revival of the Conway-Maxwell-Poisson distribution. Journal of the Royal Statistical Society: Series C (Applied Statistics), 54(1), 127-142.
Sellers, K. F., Swift, A. W., & Weems, K. S. (2017). A flexible distribution class for count data. Journal of Statistical Distributions and Applications, 4(1), 1-21.
Sellers, K. (2023). The Conway-Maxwell-Poisson Distribution (Institute of Mathematical Statistics Monographs). Cambridge: Cambridge University Press.
- static Z(lambda_, nu_, j=None, log=False) float [source]
Compute the \(Z\) factor, the normalizing constant.
The factor \(Z(\lambda, \nu)\) serves as a normalizing constant such that the distribution satisfies the basic probability axioms (i.e. the probability mass function sums up to 1).
\[Z(\lambda, \nu) = \sum_{j=0}^{\infty} \dfrac{\lambda^{j}}{(j!)^{\nu}}\]
Note
For implementation purposes, the length of the sum cannot be infinite. The parameter j is chosen to be sufficiently large so that the value of the sum converges to its asymptotic value. Note that too large a value for j will imply longer computation time and potential errors (\(j!\) may become too large and might not fit in memory).
- Parameters
lambda_ (float, optional) – First parameter \(\lambda > 0\) of the distribution
nu_ (float, optional) – Second parameter \(\nu > 0\) of the distribution
j (int, optional) – Length of the sum for the normalizing constant; if None then the value from the __init__ method is used, defaults to None
log (bool) – Compute \(\log(Z(\lambda, \nu))\), defaults to False
- Returns
Normalizing factor \(Z\)
- Return type
float
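The note above suggests handling the truncated sum with care. The following is a minimal, self-contained sketch of such a truncated computation (independent of the library's actual implementation; Z_trunc is a hypothetical name), building each term in log-space via lgamma so that \(j!\) never has to be formed explicitly:
>>> import math
>>> def Z_trunc(lambda_, nu_, j=500):
...     # Each term k*log(lambda) - nu*log(k!) is computed in log-space
...     log_terms = [k * math.log(lambda_) - nu_ * math.lgamma(k + 1) for k in range(j + 1)]
...     m = max(log_terms)  # log-sum-exp trick for numerical stability
...     return math.exp(m) * sum(math.exp(t - m) for t in log_terms)
>>> # Once the sum has converged, increasing j should leave the value unchanged
>>> round(Z_trunc(2.5, 1.5, j=100), 10) == round(Z_trunc(2.5, 1.5, j=500), 10)
... True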
- fit(data, method='L-BFGS-B', init_params='auto', j=None, bounds=None) dict [source]
Estimates the parameters \(\lambda\) and \(\nu\) of the distribution from empirical data based on Maximum Likelihood Estimation.
Note
There is no closed form to estimate the parameters, nor a direct relation between the empirical moments (\(\bar{X}\)) and the theoretical ones. Therefore, only MLE is available (no fast method).
- Parameters
data (numpy.array or list or pandas.Series) – Data to fit and estimate parameters from.
method (str, optional) – Optimization method to estimate the parameters, defaults to 'L-BFGS-B'
init_params (numpy.array, optional) – Initial parameters for the optimization method, defaults to np.array([1., 0.5])
- Returns
Estimated parameters
- Return type
dict
- static nloglike(params, data, Z, j=100) float [source]
Static method to estimate the negative log-likelihood (used in the
statinf.distributions.discrete.CMPoisson.fit()
method).
- Formula
The log-likelihood function \(l\) is defined by
\[\mathcal{l}(x_1, ..., x_n | \lambda, \nu) = \log (\lambda) \sum_{i=1}^{n} {x_i} - \nu \sum_{i=1}^{n} {\log (x_i!)} - n \log (Z(\lambda, \nu))\]
- Parameters
params (list) – List of parameters \(\lambda\) and \(\nu\)
data (numpy.array or list or pandas.Series) – Data to evaluate the negative log-likelihood on
j (int, optional) – Length of the infinite sum for the normalizing factor \(Z\), defaults to 100
- Returns
Negative log-likelihood
- Return type
float
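For reference, the MLE performed by fit() can presumably be reproduced by passing this static method to scipy.optimize.minimize. The sketch below reuses the sample x generated in the class example above; it assumes nloglike accepts (params, data, Z) in the documented order, and the bounds shown are purely illustrative:
>>> import numpy as np
>>> from scipy.optimize import minimize
>>> from statinf.distributions import CMPoisson
>>> res = minimize(
...     CMPoisson.nloglike,
...     x0=np.array([1., 0.5]),               # initial (lambda, nu)
...     args=(x, CMPoisson.Z),                # data and the normalizing-constant function
...     method='L-BFGS-B',
...     bounds=[(1e-6, None), (1e-6, None)],  # illustrative bounds: lambda > 0, nu > 0
... )
>>> lambda_hat, nu_hat = res.x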
- pmf(x) float [source]
Computes the probability mass function for a selected value x.
- Formula
The probability mass function (pmf) is computed by
\[\mathbb{P}(X = x | \lambda, \nu) = \dfrac{\lambda^{x}}{(x!)^{\nu}} \dfrac{1}{Z(\lambda, \nu)}\]
where \(Z(\lambda, \nu) = \sum_{j=0}^{\infty} \dfrac{\lambda^{j}}{(j!)^{\nu}}\) is calculated in statinf.distributions.discrete.CMPoisson.Z().
- Parameters
x (int) – Value to be evaluated
- Returns
Probability \(\mathbb{P}(X = x | \lambda, \nu)\)
- Return type
float
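Since \(Z(\lambda, \nu)\) normalizes the distribution, the pmf should sum to approximately 1 once enough support points are accumulated. A quick sanity check, sketched with the documented API:
>>> from statinf.distributions import CMPoisson
>>> cmp = CMPoisson(lambda_=2.5, nu_=1.5)
>>> # The tail beyond x = 50 is negligible for these parameters
>>> round(sum(cmp.pmf(k) for k in range(50)), 6)
... 1.0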
- class statinf.distributions.discrete.Discrete[source]
Bases:
object
A generic class for discrete probability distributions
- class statinf.distributions.discrete.NegativeBinomial(r_=None, p_=None, *args, **kwargs)[source]
Bases:
Discrete
Negative Binomial distribution.
- Formulae
The probability mass function (pmf) is defined by
\[\mathbb{P}(X = x | r, p) = \dfrac{(x + r - 1)!}{(r - 1)! x!} (1 - p)^{x} p^{r}\]
- Example
>>> from statinf.distributions import NegativeBinomial
>>> # Let us generate a random sample of size 1000
>>> x = NegativeBinomial(r_=5, p_=0.15).sample(size=1000)
>>> # We can also estimate the parameters from the generated sample
>>> # We just need to initialize the class...
>>> nb = NegativeBinomial()
>>> # ... and we can fit from the generated sample. The function returns a dictionary
>>> nb.fit(x)
... {'r_': 5, 'p_': 0.15069905301345346, 'convergence': True, 'loglikelihood': -3972.2726626530775}
>>> # The class stores the value of the estimated parameters
>>> print(nb.p_)
... 0.15069905301345346
>>> # So we can generate more samples using the fitted parameters
>>> y = nb.sample(200)
- Reference
DeGroot, M. H., & Schervish, M. J. (2012). Probability and statistics. Pearson Education.
- fit(data, method='L-BFGS-B', init_params=[1, 0.5], bounds=None, **kwargs) dict [source]
Estimates the parameters \(r\) and \(p\) of the distribution from empirical data based on Maximum Likelihood Estimation.
Note
There is no closed form to estimate the parameters. Therefore, only MLE is available (no fast method).
- Parameters
data (numpy.array or list or pandas.Series) – Data to fit and estimate parameters from.
method (str, optional) – Optimization method to estimate the parameters, defaults to 'L-BFGS-B'
init_params (numpy.array, optional) – Initial parameters for the optimization method, defaults to [1, 0.5]
- Returns
Estimated parameters
- Return type
dict
- static nloglike(params, data, eps=0.001) float [source]
Static method to estimate the negative log-likelihood (used in the
statinf.distributions.discrete.NegativeBinomial.fit()
method).
- Formula
The log-likelihood function \(l\) is defined by
\[\begin{split}\mathcal{l}(x_1, ..., x_n | r, p) &= \sum_{i=1}^{n} {\log(\Gamma(x_i + r))} - \sum_{i=1}^{n} {\log(x_i!)} - n \log(\Gamma(r)) \\ &+ \sum_{i=1}^{n} {x_i \log(1-p)} + n r \log(p)\end{split}\]
- Parameters
params (list) – List of parameters \(r\) and \(p\)
data (numpy.array or list or pandas.Series) – Data to evaluate the negative log-likelihood on
- Returns
Negative log-likelihood
- Return type
float
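The formula translates directly into code. Below is a minimal, self-contained sketch (not the library's implementation; nb_nloglike is a hypothetical name) using scipy.special.gammaln, where gammaln(x + 1) plays the role of \(\log(x!)\):
>>> import numpy as np
>>> from scipy.special import gammaln
>>> def nb_nloglike(params, data):
...     # Negative log-likelihood following the formula above
...     r, p = params
...     x = np.asarray(data, dtype=float)
...     n = x.size
...     ll = (gammaln(x + r).sum() - gammaln(x + 1).sum() - n * gammaln(r)
...           + (x * np.log(1 - p)).sum() + n * r * np.log(p))
...     return -ll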
- pmf(x) float [source]
Computes the probability mass function for a selected value x.
- Formula
The probability mass function (pmf) is computed by
\[\mathbb{P}(X = x | r, p) = \dfrac{(x + r - 1)!}{(r - 1)! x!} (1 - p)^{x} p^{r}\]
- Parameters
x (int) – Value to be evaluated
- Returns
Probability \(\mathbb{P}(X = x | r, p)\)
- Return type
float
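This pmf matches the parameterization of scipy.stats.nbinom (number of failures before the r-th success, with success probability p), which allows a quick cross-check. The sketch below assumes statinf follows exactly that convention:
>>> from scipy.stats import nbinom
>>> from statinf.distributions import NegativeBinomial
>>> nb = NegativeBinomial(r_=5, p_=0.15)
>>> all(abs(nb.pmf(k) - nbinom.pmf(k, 5, 0.15)) < 1e-10 for k in (0, 3, 10))
... True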
- class statinf.distributions.discrete.Poisson(lambda_=None, *args, **kwargs)[source]
Bases:
Discrete
Poisson distribution.
The Poisson distribution is the most common probability distribution for count data.
- Formula
The probability mass function is defined by
\[\mathbb{P}(X = x | \lambda) = \dfrac{\lambda^{x}}{x!} e^{- \lambda}\]
The distribution assumes equi-dispersion, meaning that \(\mathbb{E}(X) = \mathbb{V}(X)\).
- Parameters
lambda_ (float, optional) – Parameter \(\lambda\) representing both the location (\(\mathbb{E}(X)\)) and the scale (\(\mathbb{V}(X)\)) of the distribution, defaults to None
- Example
>>> from statinf.distributions import Poisson
>>> # Let us generate a random sample of size 1000
>>> x = Poisson(lambda_=2.5).sample(size=1000)
>>> # We can also estimate the parameter from the generated sample
>>> # We just need to initialize the class...
>>> poiss = Poisson()
>>> # ... and we can fit from the generated sample. The function returns a dictionary
>>> poiss.fit(x)
... {'lambda_': 2.46}
>>> # The class stores the value of the estimated parameters
>>> print(poiss.lambda_)
... 2.46
>>> # So we can generate more samples using the fitted parameters
>>> y = poiss.sample(200)
- Reference
DeGroot, M. H., & Schervish, M. J. (2012). Probability and statistics. Pearson Education.
- fit(data, method='fast', **kwargs) dict [source]
Estimates the parameter \(\lambda\) of the distribution from empirical data based on Maximum Likelihood Estimation.
The Maximum Likelihood Estimator also corresponds to the empirical mean:
\[\hat{\lambda}_{\text{MLE}} = \dfrac{1}{n} \sum_{i=1}^{n} x_i\]
The method = 'fast' option estimates the parameter directly from the empirical mean. Any other value for the parameter method will use classical MLE through numerical optimization.
- Parameters
data (numpy.array or list or pandas.Series) – Data to fit and estimate parameters from.
method (str, optional) – Optimization method to estimate the parameter, as in the scipy library (also allows the 'fast' value), defaults to 'fast'
- Returns
Estimated parameter
- Return type
dict
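Since the MLE has the closed form above, the 'fast' estimate can presumably be reproduced by hand as the sample mean. A one-line sketch, reusing the sample x from the example:
>>> import numpy as np
>>> lambda_hat = np.mean(x)  # closed-form MLE: the empirical mean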
- static nloglike(params, data) float [source]
Static method to estimate the negative log-likelihood (used in the
statinf.distributions.discrete.Poisson.fit()
method).
- Formula
The log-likelihood function \(l\) is defined by
\[\mathcal{l}(x_1, ..., x_n | \lambda) = - n \lambda + \log(\lambda) \sum_{i=1}^{n} {x_i} - \sum_{i=1}^{n} {\log(x_i!)}\]
- Parameters
params (list) – List containing parameter \(\lambda\)
data (numpy.array or list or pandas.Series) – Data to evaluate the negative log-likelihood on
- Returns
Negative log-likelihood
- Return type
float
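As with the other distributions, the formula maps directly to code. A minimal sketch (poisson_nloglike is a hypothetical helper, not the library's implementation), with gammaln(x + 1) standing in for \(\log(x!)\):
>>> import numpy as np
>>> from scipy.special import gammaln
>>> def poisson_nloglike(lambda_, data):
...     # Negative log-likelihood following the formula above
...     x = np.asarray(data, dtype=float)
...     ll = -x.size * lambda_ + np.log(lambda_) * x.sum() - gammaln(x + 1).sum()
...     return -ll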
- pmf(x) float [source]
Computes the probability mass function for a selected value x.
- Formula
The probability mass function (pmf) is computed by
\[\mathbb{P}(X = x | \lambda) = \dfrac{\lambda^{x}}{x!} e^{- \lambda}\]
- Parameters
x (int) – Value to be evaluated
- Returns
Probability \(\mathbb{P}(X = x | \lambda)\)
- Return type
float
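Finally, this pmf can be cross-checked against scipy.stats.poisson, which uses the same parameterization:
>>> from scipy.stats import poisson
>>> from statinf.distributions import Poisson
>>> p = Poisson(lambda_=2.5)
>>> all(abs(p.pmf(k) - poisson.pmf(k, 2.5)) < 1e-10 for k in range(10))
... True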