Data Generation

statinf.data.GenerateData.generate_dataset(coeffs, n, std_dev, intercept=0.0, distribution='normal', binary=False, seed=None, **kwargs)

Generate an artificial dataset in which an output variable is computed from randomly drawn input variables (covariates).

Parameters
  • coeffs (list) – List of coefficients used to compute the output variable.

  • n (int) – Number of observations to generate.

  • std_dev (list) – Standard deviation of the distribution.

  • intercept (float, optional) – Value of the intercept to be set, defaults to 0.

  • distribution (str, optional) –

    Type of distribution to use for generating the input variables, defaults to ‘normal’. Can be:

    • normal: \(X \sim \mathcal{N}(\mu, \sigma^{2})\)

    • uniform: \(X \sim \mathcal{U}_{[\text{low}, \text{high}]}\)

  • binary (bool, optional) – Whether the output variable is binary, defaults to False.

  • seed (int, optional) – Random seed, defaults to None.

  • **kwargs

    Keyword arguments passed to the distribution function. Can be:

    • normal: loc = \(\mu\) and scale = \(\sigma\) (the standard deviation, not the variance \(\sigma^{2}\))

    • uniform: low and high
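The parameters above can be read as describing a linear data-generating process: covariates are drawn from the chosen distribution and the output is the intercept plus a weighted sum of the covariates. The sketch below illustrates that reading with NumPy; it is not statinf's implementation. The function name `sketch_generate` is hypothetical, and several behaviours are assumptions made for the example: `std_dev` is treated as a scalar standard deviation of additive Gaussian noise, the defaults for `loc`, `scale`, `low` and `high` are chosen arbitrarily, and `binary=True` is implemented by thresholding the output at zero.

```python
import numpy as np


def sketch_generate(coeffs, n, std_dev, intercept=0.0, distribution="normal",
                    binary=False, seed=None, **kwargs):
    """Hypothetical sketch of a linear data-generating process (not statinf's code)."""
    rng = np.random.default_rng(seed)
    coeffs = np.asarray(coeffs, dtype=float)
    p = coeffs.shape[0]  # one covariate per coefficient

    # Draw the n x p covariate matrix from the requested distribution.
    # Assumption: defaults of loc=0, scale=1, low=0, high=1 when kwargs are omitted.
    if distribution == "normal":
        # NumPy's `scale` is the standard deviation sigma, not the variance.
        X = rng.normal(loc=kwargs.get("loc", 0.0),
                       scale=kwargs.get("scale", 1.0), size=(n, p))
    elif distribution == "uniform":
        X = rng.uniform(low=kwargs.get("low", 0.0),
                        high=kwargs.get("high", 1.0), size=(n, p))
    else:
        raise ValueError(f"unsupported distribution: {distribution!r}")

    # Assumption: std_dev is the (scalar) standard deviation of additive Gaussian noise.
    noise = rng.normal(loc=0.0, scale=std_dev, size=n)
    y = intercept + X @ coeffs + noise

    if binary:
        # Assumption: a binary output is obtained by thresholding at zero.
        y = (y > 0).astype(int)
    return X, y


# Example: 500 observations, two covariates, small noise, reproducible seed.
X, y = sketch_generate([1.5, -2.0], n=500, std_dev=0.1, intercept=0.5, seed=42)
```

Passing the same `seed` twice reproduces the same draws, which mirrors the purpose of the documented `seed` parameter.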

Returns

DataFrame with the output variable named Y and the covariates named X0, X1, X2, …

Return type

pandas.DataFrame
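The documented return layout (an output column Y plus covariate columns X0, X1, X2, …) can be reproduced from plain arrays as follows. `to_frame` is a hypothetical helper for illustration only; appending Y as the last column is an assumption, since the column order is not documented here.

```python
import numpy as np
import pandas as pd


def to_frame(X, y):
    """Hypothetical helper: pack covariates and output into the documented column names."""
    X = np.asarray(X)
    # Covariates are named X0, X1, ..., X{p-1}, one column per coefficient.
    columns = [f"X{j}" for j in range(X.shape[1])]
    df = pd.DataFrame(X, columns=columns)
    # Assumption: the output column Y is appended after the covariates.
    df["Y"] = np.asarray(y)
    return df


X = np.array([[0.1, 1.2], [0.3, -0.4], [2.0, 0.0]])
y = X @ np.array([1.0, 2.0])  # Y = 1.0 * X0 + 2.0 * X1
df = to_frame(X, y)
print(list(df.columns))  # ['X0', 'X1', 'Y']
```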