Discrete probability distributions

Discrete distributions

class statinf.distributions.discrete.CMPoisson(lambda_=None, nu_=None, j=500, *args, **kwargs)[source]

Bases: Discrete

Conway-Maxwell Poisson distribution. This class allows one to generate random samples of a given size for selected parameters, and also to fit data and estimate the parameters by means of Maximum Likelihood Estimation (MLE).

Introduced by Conway and Maxwell (1962), the Conway-Maxwell Poisson (a.k.a. CMP) distribution is a generalization of the common Poisson distribution (statinf.distributions.discrete.Poisson). The distribution can handle non-equidispersed cases where \(\mathbb{E}(X) \neq \mathbb{V}(X)\). The level of dispersion is captured by \(\nu\): underdispersion corresponds to \(\nu > 1\), equidispersion to \(\nu = 1\) and overdispersion to \(\nu < 1\).
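One can verify this dispersion behaviour on generated samples. The sketch below only relies on the sampling interface shown in the example further down; on finite samples the inequalities hold with high probability but are not guaranteed.

>>> import numpy as np
>>> from statinf.distributions import CMPoisson
>>> # Underdispersed sample (nu > 1): variance expected below the mean
>>> x_under = CMPoisson(lambda_=2.5, nu_=2.0).sample(size=5000)
>>> under = np.var(x_under) < np.mean(x_under)  # expected True
>>> # Overdispersed sample (nu < 1): variance expected above the mean
>>> x_over = CMPoisson(lambda_=2.5, nu_=0.5).sample(size=5000)
>>> over = np.var(x_over) > np.mean(x_over)  # expected True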

Formulae

The probability mass function (pmf) is defined by

\[\mathbb{P}(X = x | \lambda, \nu) = \dfrac{\lambda^{x}}{(x!)^{\nu}} \dfrac{1}{Z(\lambda, \nu)}\]

where \(Z(\lambda, \nu) = \sum_{j=0}^{\infty} \dfrac{\lambda^{j}}{(j!)^{\nu}}\) is calculated in statinf.distributions.discrete.CMPoisson.Z().

Special cases of the CMP distribution include well-known distributions.

  • When \(\nu = 1\), one recovers the Poisson distribution with parameter \(\lambda\)

  • When \(\nu = 0\) and \(\lambda < 1\), one recovers the geometric distribution with parameter \(p = 1 - \lambda\) for the probability of success

  • When \(\nu \rightarrow \infty\), one finds the Bernoulli distribution with parameter \(p = \frac{\lambda}{1 + \lambda}\) for the probability of success
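The Poisson special case can be checked numerically; the sketch below assumes pmf() behaves as documented, with agreement holding up to the truncation of \(Z\).

>>> from statinf.distributions import CMPoisson, Poisson
>>> # With nu = 1, the CMP pmf matches the Poisson pmf
>>> cmp_p = CMPoisson(lambda_=2.5, nu_=1.0).pmf(3)
>>> poi_p = Poisson(lambda_=2.5).pmf(3)
>>> close = abs(cmp_p - poi_p) < 1e-8  # expected True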

Parameters
  • lambda_ (float, optional) – Parameter \(\lambda\) representing the generalized expectation, defaults to None

  • nu_ (float, optional) – Parameter \(\nu\) representing the level of dispersion, defaults to None

  • j (int, optional) – Length of the sum for the normalizing constant (see statinf.distributions.discrete.CMPoisson.Z()), defaults to 500

Example

>>> from statinf.distributions import CMPoisson
>>> # Let us generate a random sample of size 1000
>>> x = CMPoisson(lambda_=2.5, nu_=1.5).sample(size=1000)
>>> # We can also estimate the parameters from the generated sample
>>> # We just need to initialize the class...
>>> cmp = CMPoisson()
>>> # ... and we can fit from the generated sample. The function returns a dictionary
>>> cmp.fit(x)
... {'lambda_': 2.7519745539344687, 'nu_': 1.5624694839612023, 'nll': 1492.0792423744383}
>>> # The class stores the value of the estimated parameters
>>> print(cmp.lambda_)
... 2.7519745539344687
>>> # So we can generate more samples using the fitted parameters
>>> y = cmp.sample(200)
Reference
  • Conway, R. W., & Maxwell, W. L. (1962). A queuing model with state dependent service rates. Journal of Industrial Engineering, 12, 132–136.
static Z(lambda_, nu_, j=None, log=False) float[source]

Compute the \(Z\) factor, normalizing constant.

The factor \(Z(\lambda, \nu)\) serves as a normalizing constant such that the distribution satisfies the basic probability axioms (i.e. the probability mass function sums up to 1).

\[Z(\lambda, \nu) = \sum_{j=0}^{\infty} \dfrac{\lambda^{j}}{(j!)^{\nu}}\]

Note

For implementation purposes, the length of the sum cannot be infinite. The parameter j is chosen to be sufficiently large for the value of the sum to converge to its asymptotic value. Note that excessively large values of j imply longer computation times and potential numerical errors (\(j!\) may become too large and might not fit in memory).
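The effect of the truncation can be checked by recomputing the constant for increasing values of j. The following standalone sketch evaluates the truncated sum in log-space with scipy.special.gammaln; it illustrates the idea and is not the library's internal code.

>>> import numpy as np
>>> from scipy.special import gammaln
>>> def z_truncated(lambda_, nu_, j):
...     ks = np.arange(j)
...     # log of each term: k * log(lambda) - nu * log(k!)
...     log_terms = ks * np.log(lambda_) - nu_ * gammaln(ks + 1)
...     return np.exp(log_terms).sum()
>>> # Partial sums stabilize once j is large enough
>>> z_values = [z_truncated(2.5, 1.5, j) for j in (5, 50, 500)]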

Parameters
  • lambda_ (float) – First parameter \(\lambda > 0\) of the distribution

  • nu_ (float) – Second parameter \(\nu > 0\) of the distribution

  • j (int, optional) – Length of the sum for the normalizing constant, if None then we use the value from the __init__ method, defaults to None

  • log (bool) – Compute \(\log(Z(\lambda, \nu))\), defaults to False

Returns

Normalizing factor \(Z\)

Return type

float

fit(data, method='L-BFGS-B', init_params='auto', j=None, bounds=None) dict[source]

Estimates the parameters \(\lambda\) and \(\nu\) of the distribution from empirical data based on Maximum Likelihood Estimation.

Note

There is no closed form for the estimators, nor a direct relation between the empirical moments (e.g. \(\bar{X}\)) and the theoretical ones. Therefore, only numerical MLE is available (no fast method).
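The sketch below shows what such a numerical MLE can look like, using scipy.optimize.minimize on a self-contained negative log-likelihood implementing the formula given in nloglike() below. This is illustrative only; the internals of fit() may differ.

>>> import numpy as np
>>> from scipy.optimize import minimize
>>> from scipy.special import gammaln
>>> from statinf.distributions import CMPoisson
>>> def nll(params, data):
...     lam, nu = params
...     ks = np.arange(500)
...     # log Z(lambda, nu): truncated sum, terms computed in log-space
...     log_z = np.log(np.exp(ks * np.log(lam) - nu * gammaln(ks + 1)).sum())
...     return -(np.log(lam) * data.sum() - nu * gammaln(data + 1).sum()
...              - len(data) * log_z)
>>> data = np.asarray(CMPoisson(lambda_=2.5, nu_=1.5).sample(size=1000))
>>> res = minimize(nll, x0=np.array([1., 0.5]), args=(data,),
...                method='L-BFGS-B', bounds=[(1e-6, 10.), (0.05, 10.)])
>>> lam_hat, nu_hat = res.x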

Parameters
  • data (numpy.array or list or pandas.Series) – Data to fit and estimate parameters from.

  • method (str, optional) – Optimization method to estimate the parameters, defaults to ‘L-BFGS-B’

  • init_params (numpy.array or str, optional) – Initial parameters for the optimization method, defaults to ‘auto’

Returns

Estimated parameters

Return type

dict

static nloglike(params, data, Z, j=100) float[source]

Static method to estimate the negative log-likelihood (used in the statinf.distributions.discrete.CMPoisson.fit() method).

Formula

The log-likelihood function \(l\) is defined by

\[\mathcal{l}(x_1, ..., x_n | \lambda, \nu) = \log (\lambda) \sum_{i=1}^{n} {x_i} - \nu \sum_{i=1}^{n} {\log (x_i!)} - n \log (Z(\lambda, \nu))\]
Parameters
  • params (list) – List of parameters \(\lambda\) and \(\nu\)

  • data (numpy.array or list or pandas.Series) – Data to evaluate the negative log-likelihood on

  • j (int, optional) – Truncation length of the infinite sum for the normalizing factor \(Z\), defaults to 100

Returns

Negative log-likelihood

Return type

float

pmf(x) float[source]

Computes the probability mass function for a selected value x.

Formula

The probability mass function (pmf) is computed by

\[\mathbb{P}(X = x | \lambda, \nu) = \dfrac{\lambda^{x}}{(x!)^{\nu}} \dfrac{1}{Z(\lambda, \nu)}\]

where \(Z(\lambda, \nu) = \sum_{j=0}^{\infty} \dfrac{\lambda^{j}}{(j!)^{\nu}}\) is calculated in statinf.distributions.discrete.CMPoisson.Z().

Parameters

x (int) – Value to be evaluated

Returns

Probability \(\mathbb{P}(X = x | \lambda, \nu)\)

Return type

float
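As a usage sketch, the pmf can be evaluated point by point; summed over a sufficiently large support it should be close to 1, up to the truncation of \(Z\).

>>> from statinf.distributions import CMPoisson
>>> cmp = CMPoisson(lambda_=2.5, nu_=1.5)
>>> p3 = cmp.pmf(3)  # probability of observing the value 3
>>> total = sum(cmp.pmf(k) for k in range(100))  # expected to be close to 1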

class statinf.distributions.discrete.Discrete[source]

Bases: object

A generic class for discrete probability distributions

logp(*args)[source]
static nll(data)[source]
pmf(x)[source]
sample(size, seed=None)[source]
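Subclasses share this interface, so distributions can be used interchangeably. A minimal sketch relying only on the methods listed above:

>>> from statinf.distributions import Poisson, NegativeBinomial
>>> for dist in (Poisson(lambda_=2.), NegativeBinomial(r_=3, p_=0.4)):
...     s = dist.sample(size=100, seed=42)  # reproducible sampling
...     p = dist.pmf(1)  # each subclass implements its own pmf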
class statinf.distributions.discrete.NegativeBinomial(r_=None, p_=None, *args, **kwargs)[source]

Bases: Discrete

Negative Binomial distribution.

Formulae

The probability mass function (pmf) is defined by

\[\mathbb{P}(X = x | r, p) = \dfrac{(x + r - 1)!}{(r - 1)! x!} (1 - p)^{x} p^{r}\]
Example

>>> from statinf.distributions import NegativeBinomial
>>> # Let us generate a random sample of size 1000
>>> x = NegativeBinomial(r_=5, p_=0.15).sample(size=1000)
>>> # We can also estimate the parameters from the generated sample
>>> # We just need to initialize the class...
>>> nb = NegativeBinomial()
>>> # ... and we can fit from the generated sample. The function returns a dictionary
>>> nb.fit(x)
... {'r_': 5, 'p_': 0.15069905301345346, 'convergence': True, 'loglikelihood': -3972.2726626530775}
>>> # The class stores the value of the estimated parameters
>>> print(nb.p_)
... 0.15069905301345346
>>> # So we can generate more samples using the fitted parameters
>>> y = nb.sample(200)
Reference
  • DeGroot, M. H., & Schervish, M. J. (2012). Probability and statistics. Pearson Education.

fit(data, method='L-BFGS-B', init_params=[1, 0.5], bounds=None, **kwargs) dict[source]

Estimates the parameters \(r\) and \(p\) of the distribution from empirical data based on Maximum Likelihood Estimation.

Note

There is no closed form to estimate the parameters. Therefore, only numerical MLE is available (no fast method).

Parameters
  • data (numpy.array or list or pandas.Series) – Data to fit and estimate parameters from.

  • method (str, optional) – Optimization method to estimate the parameters, defaults to ‘L-BFGS-B’

  • init_params (list, optional) – Initial parameters for the optimization method, defaults to [1, 0.5]

Returns

Estimated parameters

Return type

dict

static nloglike(params, data, eps=0.001) float[source]

Static method to estimate the negative log-likelihood (used in the statinf.distributions.discrete.NegativeBinomial.fit() method).

Formula

The log-likelihood function \(l\) is defined by

\[\begin{split}\mathcal{l}(x_1, ..., x_n | r, p) &= \sum_{i=1}^{n} {\log(\Gamma(x_i + r))} - \sum_{i=1}^{n} {\log(x_i!)} - n \log(\Gamma(r)) \\ &+ \sum_{i=1}^{n} {x_i \log(1-p)} + n r \log(p)\end{split}\]
Parameters
  • params (list) – List of parameters \(r\) and \(p\)

  • data (numpy.array or list or pandas.Series) – Data to evaluate the negative log-likelihood on

Returns

Negative log-likelihood

Return type

float
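The formula above translates directly into code with scipy.special.gammaln. A standalone sketch, not the library's implementation (the eps argument of nloglike(), presumably used to keep p strictly inside the unit interval, is omitted here):

>>> import numpy as np
>>> from scipy.special import gammaln
>>> def nb_nll(params, data):
...     r, p = params
...     n = len(data)
...     # log-likelihood as given above; gammaln(x + 1) = log(x!)
...     ll = (gammaln(data + r).sum() - gammaln(data + 1).sum()
...           - n * gammaln(r)
...           + (data * np.log(1 - p)).sum() + n * r * np.log(p))
...     return -ll
>>> data = np.asarray([3, 0, 5, 2, 7])
>>> value = nb_nll([5., 0.15], data)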

pmf(x) float[source]

Computes the probability mass function for a selected value x.

Formula

The probability mass function (pmf) is computed by

\[\mathbb{P}(X = x | r, p) = \dfrac{(x + r - 1)!}{(r - 1)! x!} (1 - p)^{x} p^{r}\]
Parameters

x (int) – Value to be evaluated

Returns

Probability \(\mathbb{P}(X = x | r, p)\)

Return type

float
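This parameterization matches scipy.stats.nbinom, whose pmf(k, n, p) equals \(\binom{k+n-1}{k} p^{n} (1-p)^{k}\), so it can serve as a cross-check, assuming pmf() behaves as documented:

>>> from scipy.stats import nbinom
>>> from statinf.distributions import NegativeBinomial
>>> nb = NegativeBinomial(r_=5, p_=0.15)
>>> delta = abs(nb.pmf(4) - nbinom.pmf(4, 5, 0.15))  # expected to be ~0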

class statinf.distributions.discrete.Poisson(lambda_=None, *args, **kwargs)[source]

Bases: Discrete

Poisson distribution.

The Poisson distribution is the most common probability distribution for count data.

Formula

The probability mass function is defined by

\[\mathbb{P}(X = x | \lambda) = \dfrac{\lambda^{x}}{x!} e^{- \lambda}\]

The distribution assumes equi-dispersion, meaning that \(\mathbb{E}(X) = \mathbb{V}(X)\).

Parameters

lambda_ (float, optional) – Parameter \(\lambda\) representing both the location (\(\mathbb{E}(X)\)) and the scale (\(\mathbb{V}(X)\)) of the distribution, defaults to None

Example

>>> from statinf.distributions import Poisson
>>> # Let us generate a random sample of size 1000
>>> x = Poisson(lambda_=2.5).sample(size=1000)
>>> # We can also estimate the parameter from the generated sample
>>> # We just need to initialize the class...
>>> poiss = Poisson()
>>> # ... and we can fit from the generated sample. The function returns a dictionary
>>> poiss.fit(x)
... {'lambda_': 2.46}
>>> # The class stores the value of the estimated parameters
>>> print(poiss.lambda_)
... 2.46
>>> # So we can generate more samples using the fitted parameters
>>> y = poiss.sample(200)
fit(data, method='fast', **kwargs) dict[source]

Estimates the parameter \(\lambda\) of the distribution from empirical data based on Maximum Likelihood Estimation.

The Maximum Likelihood Estimator also corresponds to the empirical mean:

\[\hat{\lambda}_{\text{MLE}} = \dfrac{1}{n} \sum_{i=1}^{n} x_i\]

The method = ‘fast’ estimates the parameter directly from the empirical mean; any other value for the method parameter will use classical numerical MLE.
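A quick check that the fast method reduces to the sample mean (a sketch using numpy; fit() stores the estimate as documented in the class example):

>>> import numpy as np
>>> from statinf.distributions import Poisson
>>> x = Poisson(lambda_=2.5).sample(size=1000)
>>> poiss = Poisson()
>>> res = poiss.fit(x, method='fast')
>>> same = np.isclose(poiss.lambda_, np.mean(x))  # expected True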

Parameters
  • data (numpy.array or list or pandas.Series) – Data to fit and estimate parameters from.

  • method (str, optional) – Optimization method to estimate the parameter, as in the scipy library, or ‘fast’ for the closed-form estimator, defaults to ‘fast’

Returns

Estimated parameter

Return type

dict

static nloglike(params, data) float[source]

Static method to estimate the negative log-likelihood (used in the statinf.distributions.discrete.Poisson.fit() method).

Formula

The log-likelihood function \(l\) is defined by

\[\mathcal{l}(x_1, ..., x_n | \lambda) = - n \lambda + \log(\lambda) \sum_{i=1}^{n} {x_i} - \sum_{i=1}^{n} {\log(x_i!)}\]
Parameters
  • params (list) – List containing parameter \(\lambda\)

  • data (numpy.array or list or pandas.Series) – Data to evaluate the negative log-likelihood on

Returns

Negative log-likelihood

Return type

float

pmf(x) float[source]

Computes the probability mass function for a selected value x.

Formula

The probability mass function (pmf) is computed by

\[\mathbb{P}(X = x | \lambda) = \dfrac{\lambda^{x}}{x!} e^{- \lambda}\]
Parameters

x (int) – Value to be evaluated

Returns

Probability \(\mathbb{P}(X = x | \lambda)\)

Return type

float
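The values can be cross-checked against scipy.stats.poisson, assuming pmf() behaves as documented:

>>> from scipy.stats import poisson as sp_poisson
>>> from statinf.distributions import Poisson
>>> p = Poisson(lambda_=2.5)
>>> delta = abs(p.pmf(3) - sp_poisson.pmf(3, mu=2.5))  # expected to be ~0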
