Data Generation
- statinf.data.GenerateData.generate_dataset(coeffs, n, std_dev, intercept=0.0, distribution='normal', binary=False, seed=None, **kwargs)
Generate an artificial dataset
- Parameters
coeffs (list) – List of coefficients to use for computing the output variable.
n (int) – Number of observations to generate.
std_dev (list) – Standard deviation of the distribution.
intercept (float, optional) – Value of the intercept to be set, defaults to 0.
distribution (str, optional) – Type of distribution to use for generating the input variables, defaults to ‘normal’. Can be:
  normal: \(X \sim \mathcal{N}(\mu, \sigma^{2})\)
  uniform: \(X \sim \mathcal{U}_{[\text{low}, \text{high}]}\)
binary (bool, optional) – Defines whether the output is binary, defaults to False.
seed (int, optional) – Random seed, defaults to None.
**kwargs – Arguments to be passed to the distribution function. Can be:
  normal: loc = \(\mu\) and scale = \(\sigma^{2}\)
  uniform: low and high
- Returns
DataFrame with the output variable named Y and covariates named X0, X1, X2, …
- Return type
pandas.DataFrame
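
To make the parameters above concrete, here is a rough sketch of what a generator with this signature might do. It is not statinf's implementation. The name `sketch_generate_rows` is hypothetical, and the sketch returns a list of dicts rather than a pandas.DataFrame so that it needs only the standard library. It assumes that each entry of `std_dev` is the spread of one covariate, that the output is a linear combination of the covariates plus standard normal noise, and that a binary output is obtained by thresholding at zero. None of these assumptions is confirmed by the documentation above.

```python
import random


def sketch_generate_rows(coeffs, n, std_dev, intercept=0.0,
                         distribution="normal", binary=False, seed=None,
                         **kwargs):
    """Hypothetical stand-in for generate_dataset; NOT statinf's code.

    Returns a list of dicts keyed "Y", "X0", "X1", ... instead of a
    pandas.DataFrame, so only the standard library is required.
    """
    if len(std_dev) != len(coeffs):
        raise ValueError("expected one std_dev entry per coefficient")
    rng = random.Random(seed)  # local generator keeps the seed reproducible
    rows = []
    for _ in range(n):
        xs = []
        for sd in std_dev:
            if distribution == "normal":
                # Assumption: std_dev[i] is the spread of covariate i.
                xs.append(rng.gauss(kwargs.get("loc", 0.0), sd))
            elif distribution == "uniform":
                # std_dev is ignored here; the bounds come from kwargs.
                xs.append(rng.uniform(kwargs.get("low", 0.0),
                                      kwargs.get("high", 1.0)))
            else:
                raise ValueError(f"unsupported distribution: {distribution!r}")
        # Assumption: linear model plus standard normal noise.
        y = intercept + sum(c * x for c, x in zip(coeffs, xs))
        y += rng.gauss(0.0, 1.0)
        if binary:
            # Assumption: binary output thresholds the linear score at zero.
            y = 1 if y > 0 else 0
        row = {"Y": y}
        row.update({f"X{i}": x for i, x in enumerate(xs)})
        rows.append(row)
    return rows


rows = sketch_generate_rows(coeffs=[1.5, -0.5, 2.0], n=100,
                            std_dev=[1.0, 1.0, 1.0], intercept=0.3, seed=42)
print(len(rows), sorted(rows[0]))
```

Each row carries one key per covariate plus Y, mirroring the column naming described under Returns. Calling the sketch twice with the same seed yields identical rows, which is the behaviour one would expect the seed parameter to provide.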