Unsupervised

class statinf.stats.unsupervised.GaussianMixture[source]

Bases: object

Class for a Gaussian mixture model, fitted to the data with the expectation-maximization (EM) algorithm.

Warning

This class is still under development. This is a beta version, so some functionality might not be available. A full stable version will be released soon.

References
  • Murphy, K. P. (2012). Machine learning: a probabilistic perspective. MIT press.

fit(X, k, n_epochs=100, improvement_threshold=0.0005)[source]

Fits the model to the data. The fit is initialized with the K-means algorithm.

Parameters
  • X (numpy.ndarray) – data.

  • k (int) – number of clusters (Gaussians).

  • n_epochs (int) – number of epochs, default is 100.

  • improvement_threshold (float, optional) – Threshold from which we consider the likelihood improved, defaults to 0.0005.
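The role of n_epochs and improvement_threshold can be illustrated with a minimal NumPy sketch of EM for a Gaussian mixture. This is not the statinf implementation: the function names (`em_fit_sketch`, `gaussian_pdf`), the `seed` argument, the small covariance regularizer and the random-point initialization (used here in place of K-means to keep the example short) are all assumptions made for illustration.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate normal density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * np.einsum("ij,jk,ik->i", diff, inv, diff)) / norm

def em_fit_sketch(X, k, n_epochs=100, improvement_threshold=0.0005, seed=0):
    """Hypothetical EM loop; stops early when the log-likelihood stalls."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumption: start from k random data points (the library starts from K-means).
    means = X[rng.choice(n, size=k, replace=False)].astype(float)
    covs = np.array([np.atleast_2d(np.cov(X.T)) + 1e-6 * np.eye(d) for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_epochs):
        # E-step: weighted densities, log-likelihood and responsibilities.
        dens = np.column_stack(
            [weights[j] * gaussian_pdf(X, means[j], covs[j]) for j in range(k)]
        )
        ll = np.log(dens.sum(axis=1)).sum()
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and covariances.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
        # Stop once the improvement falls below the threshold.
        if ll - prev_ll < improvement_threshold:
            break
        prev_ll = ll
    # ll is the log-likelihood measured at the start of the last epoch.
    return weights, means, covs, ll
```

In this sketch n_epochs caps the number of EM passes, while improvement_threshold ends the loop earlier when an epoch gains less than that amount of log-likelihood.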

class statinf.stats.unsupervised.KMeans(k=1, max_iter=100, init='random', random_state=0)[source]

Bases: object

K-means clustering implementation.

Warning

This class is still under development. This is a beta version, so some functionality might not be available. A full stable version will be released soon.

Parameters
  • k (int) – number of clusters, default is 1.

  • max_iter (int) – maximum number of iterations, default is 100.

  • init (str) – initialization option, either 'random' or 'kmeans++', default is 'random'.

  • random_state (int) – seed of the random state, default is 0.

  • labels (numpy.array) – labels for each datapoint.

  • centroids (numpy.array) – coordinates of the centroids.

References
  • Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning (Vol. 1, No. 10). New York: Springer series in statistics.
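The difference between the two init options can be sketched as follows. This is an illustration of the standard random and k-means++ seeding schemes, not the library's code; the function name `init_centroids_sketch` is hypothetical, and it assumes X contains at least two distinct points.

```python
import numpy as np

def init_centroids_sketch(X, k, init="random", random_state=0):
    """Hypothetical helper: pick k starting centroids from the rows of X."""
    rng = np.random.default_rng(random_state)
    n = X.shape[0]
    if init == "random":
        # k distinct data points chosen uniformly at random.
        return X[rng.choice(n, size=k, replace=False)].astype(float)
    if init != "kmeans++":
        raise ValueError("init must be 'random' or 'kmeans++'")
    # k-means++: first centroid uniform, later ones drawn with probability
    # proportional to the squared distance to the nearest chosen centroid.
    centroids = [X[rng.integers(n)]]
    for _ in range(1, k):
        diff = X[:, None, :] - np.array(centroids)[None, :, :]
        d2 = (diff ** 2).sum(axis=2).min(axis=1)
        centroids.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centroids, dtype=float)
```

k-means++ tends to spread the starting centroids apart, which usually reduces the number of iterations needed compared with purely random seeding.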

closest_centroid(points, centroids)[source]

Returns an array containing the index of the nearest centroid for each point.

Parameters
  • points (numpy.array) – features of each point.

  • centroids (list) – list of centroid coordinates.
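The assignment step described above can be sketched in NumPy as an argmin over point-to-centroid distances. The function name `closest_centroid_sketch` is hypothetical and the code is an assumption about the behaviour, not the library source.

```python
import numpy as np

def closest_centroid_sketch(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for each point."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Distance matrix of shape (n_points, n_centroids) via broadcasting.
    dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1)

labels = closest_centroid_sketch([[0, 0], [9, 9], [1, 0]], [[0, 0], [10, 10]])
print(labels)  # [0 1 0]
```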

fit(X)[source]

Fit the model to the data using the selected initialization (random init or kmeans++).

Parameters
  • X (numpy.array) – Input data.
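As a rough picture of what fitting involves, the sketch below runs the standard Lloyd iteration: assign each point to its nearest centroid, recompute the centroids, and stop when they no longer move or max_iter is reached. The name `kmeans_fit_sketch`, the random seeding and the convergence check are assumptions, not the statinf implementation.

```python
import numpy as np

def kmeans_fit_sketch(X, k, max_iter=100, random_state=0):
    """Hypothetical Lloyd iteration returning (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(random_state)
    # Assumption: random seeding from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Update step: mean of each cluster; empty clusters keep their centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```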

get_distance(points, centroids)[source]

Returns the Euclidean distance between each point and each centroid.

Parameters
  • points (numpy.array) – features of each point.

  • centroids (list) – list of centroid coordinates.
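A minimal sketch of this distance computation with NumPy broadcasting is shown below. The function name `get_distance_sketch` is hypothetical, and the (n_points, n_centroids) layout of the result is an assumption; the library may orient the array differently.

```python
import numpy as np

def get_distance_sketch(points, centroids):
    """Euclidean distances, one row per point and one column per centroid."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    diff = points[:, None, :] - centroids[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

print(get_distance_sketch([[0, 0], [3, 4]], [[0, 0]]))
# [[0.]
#  [5.]]
```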

move_centroids(points, closest, centroids)[source]

Returns the updated centroids, computed from the points closest to each of them.

Parameters
  • points (numpy.array) – features of each point.

  • closest (numpy.array) – array with the index of closest centroid for each point.

  • centroids (list) – list of centroid coordinates.
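The update step can be sketched as taking the mean of the points assigned to each centroid. The function name `move_centroids_sketch` is hypothetical, and leaving a centroid unchanged when no point is assigned to it is an assumption; the library may handle empty clusters differently.

```python
import numpy as np

def move_centroids_sketch(points, closest, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    points = np.asarray(points, dtype=float)
    closest = np.asarray(closest)
    new_centroids = np.asarray(centroids, dtype=float).copy()
    for j in range(len(new_centroids)):
        members = points[closest == j]
        if len(members) > 0:  # assumption: empty clusters stay where they are
            new_centroids[j] = members.mean(axis=0)
    return new_centroids
```

For example, with points [[0, 0], [2, 0], [10, 10]] and closest = [0, 0, 1], the first centroid moves to [1, 0] and the second to [10, 10].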

silhouette_score(X, labels)[source]

To be added soon
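Until this method is documented, the standard definition of the silhouette score can serve as a reference point. The sketch below is not the library's code; the function name `silhouette_score_sketch` is hypothetical, it assumes Euclidean distances and at least two clusters, and it scores singleton clusters as 0 by convention.

```python
import numpy as np

def silhouette_score_sketch(X, labels):
    """Mean silhouette coefficient (b - a) / max(a, b) over all points."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all points.
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: score stays 0
        # a: mean distance to the other points of the same cluster.
        a = dist[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the points of any other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()
```

Values close to 1 indicate compact, well-separated clusters, while values near 0 or below suggest overlapping or misassigned points.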