Bayesian statistics

class statinf.stats.bayesian.GGM

Bases: object

Gaussian Generative Model. This class implements linear and quadratic classifiers obtained by assuming Gaussian-distributed data and applying Bayes’ theorem.

Warning

This class is still under development. This is a beta version, so some functionality might not be available. The full stable version will be released soon. To be added: class priors.

Formula

\[\mathbb{P}(A \mid B) = \dfrac{\mathbb{P}(B \mid A) \cdot \mathbb{P}(A)}{\mathbb{P}(B)}\]
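
As an illustration of how this formula leads to a classifier, take \(A\) to be the event "the sample belongs to class \(k\)" and \(B\) the observed feature vector \(x\). The notation below (\(\mu_k\), \(\Sigma_k\), \(\mathcal{N}\)) is introduced here for the sketch and is not taken from the library. Assuming Gaussian class-conditional densities, the posterior reads:

\[\mathbb{P}(y = k \mid x) = \dfrac{\mathcal{N}(x \mid \mu_k, \Sigma_k) \cdot \mathbb{P}(y = k)}{\sum_{j} \mathcal{N}(x \mid \mu_j, \Sigma_j) \cdot \mathbb{P}(y = j)}\]

If the class priors are equal and all classes share one covariance \(\Sigma\), maximizing this posterior amounts to choosing the class whose mean is closest in Mahalanobis distance, which reduces to the Euclidean distance when \(\Sigma = \sigma^{2} \mathbb{I}_{n}\):

\[\hat{y}(x) = \arg\min_{k} \, (x - \mu_k)^{\top} \Sigma^{-1} (x - \mu_k)\]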

References

Murphy, K. P. (2012). Machine learning: a probabilistic perspective. MIT press.

fit(data, labels, nb_classes, isotropic=True)

Estimates the mean and the covariance of each class. In the isotropic case, all classes are assumed to share the same covariance. Returns the mean vector of each class and the estimated covariance.

Parameters

data (numpy.ndarray) – Data features.
labels (numpy.ndarray) – Data labels.
nb_classes (int) – Number of classes in the data.
isotropic (bool) – Whether the covariance matrix is the same \(\sigma^{2} \mathbb{I}_{n}\) for every class (LDA) or differs across classes (QDA).
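
The estimation step described above can be sketched with NumPy. This is an illustrative sketch, not the library's implementation: the helper name `estimate_gaussian_params` is hypothetical, labels are assumed to be integers from 0 to nb_classes - 1, and the isotropic case is assumed to use a single pooled variance \(\sigma^{2}\).

```python
import numpy as np

def estimate_gaussian_params(data, labels, nb_classes, isotropic=True):
    """Hypothetical helper: per-class means plus covariance estimate(s)."""
    n_features = data.shape[1]
    # Mean vector of each class, stacked into shape (nb_classes, n_features)
    means = np.stack([data[labels == k].mean(axis=0) for k in range(nb_classes)])
    if isotropic:
        # Assumption: one scalar variance shared by all classes, sigma^2 * I_n
        residuals = data - means[labels]
        sigma2 = (residuals ** 2).mean()
        covariance = sigma2 * np.eye(n_features)
    else:
        # One full covariance matrix per class, shape (nb_classes, n, n)
        covariance = np.stack([
            np.cov(data[labels == k], rowvar=False) for k in range(nb_classes)
        ])
    return means, covariance

# Synthetic data: three Gaussian blobs with standard deviation 0.5
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
labels = np.repeat(np.arange(3), 100)
data = centers[labels] + 0.5 * rng.standard_normal((300, 2))

means, covariance = estimate_gaussian_params(data, labels, nb_classes=3)
```

With isotropic=False the same helper returns one covariance matrix per class instead of a single shared one.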

plot_decision_boundary(X, labels, norm='euclidian', grid_size=100, *args)

Plots the predictions on the entire dataset as well as the decision boundaries for each class.

Parameters

X (numpy.ndarray) – Data features.
labels (numpy.ndarray) – Labels of each point in the training set.
norm (str) – Norm to be used, options are euclidian or mahalonobis, default is euclidian.
grid_size (int) – Size of the square grid to be plotted.
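
One common way to draw decision regions is to classify every node of a square grid covering the data and shade the result. The sketch below shows that idea for 2-D features with a nearest-mean rule. The helper `label_grid` and its `margin` argument are hypothetical, and the library may build its plot differently.

```python
import numpy as np

def label_grid(X, means, grid_size=100, margin=1.0):
    """Hypothetical helper: nearest-mean label for every node of a square grid."""
    # Bounding box of the data, padded by a margin on each side
    x_min, y_min = X.min(axis=0) - margin
    x_max, y_max = X.max(axis=0) + margin
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, grid_size),
                         np.linspace(y_min, y_max, grid_size))
    grid_points = np.column_stack([xx.ravel(), yy.ravel()])
    # Squared Euclidean distance from each grid node to each class mean
    diffs = grid_points[:, None, :] - means[None, :, :]
    zz = (diffs ** 2).sum(axis=2).argmin(axis=1).reshape(xx.shape)
    return xx, yy, zz

X = np.array([[0.0, 0.0], [1.0, 0.5], [4.0, 4.0], [5.0, 4.5]])
means = np.array([[0.5, 0.25], [4.5, 4.25]])
xx, yy, zz = label_grid(X, means, grid_size=50)
# xx, yy, zz can then be passed to a filled-contour routine such as
# matplotlib's plt.contourf(xx, yy, zz) to shade the decision regions.
```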

predict(new_data, norm='euclidian')

Returns a prediction for each sample. Labels are assigned by finding the closest class mean under the chosen distance (Euclidean, or Mahalanobis with isotropic or non-isotropic covariance). The isotropic case uses a Linear Discriminant classifier and the non-isotropic case uses a Quadratic Discriminant classifier.

Parameters

new_data (numpy.ndarray) – New data to evaluate.
norm (str) – Norm to be used, options are euclidian or mahalonobis, default is euclidian.
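
The closest-mean rule can be illustrated with a few lines of NumPy. This is a sketch under stated assumptions, not the library's code: the helper `nearest_mean_predict` is hypothetical, and the Mahalanobis branch assumes a single covariance matrix shared by all classes.

```python
import numpy as np

def nearest_mean_predict(new_data, means, covariance=None):
    """Hypothetical helper: assign each row to the class with the closest mean."""
    # Differences to every class mean, shape (n_samples, n_classes, n_features)
    diffs = new_data[:, None, :] - means[None, :, :]
    if covariance is None:
        # Squared Euclidean distance
        dists = np.einsum("nkd,nkd->nk", diffs, diffs)
    else:
        # Squared Mahalanobis distance with a shared covariance (assumption)
        precision = np.linalg.inv(covariance)
        dists = np.einsum("nkd,de,nke->nk", diffs, precision, diffs)
    return dists.argmin(axis=1)

means = np.array([[0.0, 0.0], [3.0, 1.0]])
point = np.array([[2.5, 0.1]])
cov = np.diag([9.0, 0.04])  # wide spread along x, narrow along y

euclid = nearest_mean_predict(point, means)                 # class 1
mahal = nearest_mean_predict(point, means, covariance=cov)  # class 0
```

The two distances disagree here because the Mahalanobis distance penalizes the small offset along the low-variance axis much more heavily than the large offset along the high-variance axis.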

predict_proba(X, norm='euclidian')

Returns the likelihood probability for each class.

Parameters

X (numpy.ndarray) – Data features.
norm (str) – Norm to be used, options are euclidian or mahalonobis, default is euclidian.
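
One plausible reading of per-class probabilities is the Gaussian likelihood of each class normalized across classes. The sketch below assumes equal priors and a shared covariance, and the helper `gaussian_class_probabilities` is hypothetical. What the library actually returns may differ.

```python
import numpy as np

def gaussian_class_probabilities(X, means, covariance):
    """Hypothetical helper: normalized Gaussian likelihoods under equal priors."""
    precision = np.linalg.inv(covariance)
    diffs = X[:, None, :] - means[None, :, :]
    sq_dists = np.einsum("nkd,de,nke->nk", diffs, precision, diffs)
    # Log-likelihood up to a constant; the shared normalizer cancels out
    log_lik = -0.5 * sq_dists
    # Subtract the row maximum before exponentiating, for numerical stability
    log_lik -= log_lik.max(axis=1, keepdims=True)
    weights = np.exp(log_lik)
    return weights / weights.sum(axis=1, keepdims=True)

means = np.array([[0.0, 0.0], [2.0, 0.0]])
X = np.array([[1.0, 0.0], [0.0, 0.0]])
proba = gaussian_class_probabilities(X, means, np.eye(2))
```

Each row of `proba` sums to 1, and a point equidistant from both means gets probability 0.5 for each class.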

Examples

LDA

from statinf.stats.bayesian import GGM
from sklearn.datasets import make_blobs  # Used to generate synthetic data
from sklearn.model_selection import train_test_split

# Generate data with Scikit-Learn
X, labels = make_blobs(n_samples=[100, 100, 100], cluster_std=[0.5, 0.5, 0.5],
                       centers=None, n_features=2, random_state=0)

# Hold out part of the data for prediction
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=0)

# Initialize and fit the GGM
classifier = GGM()
means, covariance = classifier.fit(X_train, y_train, nb_classes=3, isotropic=True)
# Predict
preds = classifier.predict(X_test, norm="euclidian")
# Plot the decision boundaries
classifier.plot_decision_boundary(X, labels, norm="euclidian")

Output will be:

Decision boundaries for LDA with linear isotropy