Unsupervised

class statinf.stats.unsupervised.GaussianMixture
Bases: object
Class for a Gaussian mixture model. The model is fitted to the data with the EM (expectation-maximization) algorithm.
Warning
This class is still under development. This is a beta version, so some functionalities might not be available. The full stable version will be released soon.
 References
Murphy, K. P. (2012). Machine learning: a probabilistic perspective. MIT press.

fit(X, k, n_epochs=100, improvement_threshold=0.0005)
Fits the model to the data. The fit is initialized with the K-means algorithm.
 Parameters
X (numpy.ndarray) – Data.
k (int) – Number of clusters (Gaussians).
n_epochs (int, optional) – Number of epochs, defaults to 100.
improvement_threshold (float, optional) – Threshold above which the likelihood is considered improved, defaults to 0.0005.
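The EM procedure that this method refers to can be sketched in plain NumPy as follows. This is a minimal illustration, not the statinf implementation: the names `log_gaussian` and `fit_gmm_sketch` are hypothetical, the means are seeded with a simple farthest-point rule instead of the K-means initialization mentioned above, and the stopping rule is one assumed reading of `improvement_threshold`.

```python
import numpy as np

def log_gaussian(X, mean, cov):
    """Log-density of N(mean, cov) evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of every row to the mean.
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def fit_gmm_sketch(X, k, n_epochs=100, improvement_threshold=0.0005):
    """Hypothetical EM loop; returns weights, means, covariances, labels."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Assumption: farthest-point seeding stands in for the K-means init.
    means = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - m) ** 2).sum(axis=1) for m in means], axis=0)
        means.append(X[d2.argmax()])
    means = np.array(means)
    # Start every component from the per-feature variances of the data.
    covs = np.array([np.diag(X.var(axis=0) + 1e-6) for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_epochs):
        # E-step: responsibilities, computed in log space for stability.
        log_p = np.column_stack([
            np.log(weights[j]) + log_gaussian(X, means[j], covs[j])
            for j in range(k)
        ])
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        resp = np.exp(log_p - log_norm[:, None])
        # M-step: re-estimate weights, means and covariances.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
        # Assumed stopping rule: stop once the log-likelihood gain is small.
        ll = log_norm.sum()
        if ll - prev_ll < improvement_threshold:
            break
        prev_ll = ll
    return weights, means, covs, resp.argmax(axis=1)

# Two well-separated synthetic clusters as toy data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(10.0, 1.0, (100, 2))])
weights, means, covs, labels = fit_gmm_sketch(X, k=2)
```

The small `1e-6` term added to each covariance is a common guard against singular matrices; whether statinf does something similar is not stated in this documentation.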

class statinf.stats.unsupervised.KMeans(k=1, max_iter=100, init='random', random_state=0)
Bases: object
K-means clustering implementation.
Warning
This class is still under development. This is a beta version, so some functionalities might not be available. The full stable version will be released soon.
 Parameters
k (int) – Number of clusters, defaults to 1.
max_iter (int) – Maximum number of iterations allowed for convergence, defaults to 100.
init (str) – Initialization option, either 'random' or 'kmeans++', defaults to 'random'.
random_state (int) – Seed of the random state, defaults to 0.
labels (numpy.ndarray) – Labels for each data point.
centroids (numpy.ndarray) – Coordinates of the centroids.
 References
Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning (Vol. 1, No. 10). New York: Springer series in statistics.

closest_centroid(points, centroids)
Returns an array containing the index of the nearest centroid for each point.
 Parameters
points (numpy.ndarray) – Features of each point.
centroids (list) – List of the centroids' coordinates.
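The assignment step described here can be illustrated with a short NumPy sketch. The function name `nearest_centroid_index` is hypothetical and the code is not taken from statinf; it only shows one way to obtain, for each point, the index of its nearest centroid.

```python
import numpy as np

def nearest_centroid_index(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Squared distances with shape (n_points, n_centroids); the square root
    # is not needed because it does not change which centroid is nearest.
    sq_dist = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)

points = np.array([[0.0, 0.1], [5.0, 5.2], [0.2, -0.1]])
centroids = [[0.0, 0.0], [5.0, 5.0]]
closest = nearest_centroid_index(points, centroids)
```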

fit(X)
Fits the model to the data using one of the available initializations (random init or kmeans++).
 Parameters
X (numpy.ndarray) – Input data.
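For readers unfamiliar with the kmeans++ option, the seeding idea can be sketched as below. This is an illustrative version under stated assumptions, not the library's code: the function name `kmeans_plus_plus_init` and its `random_state` handling are hypothetical.

```python
import numpy as np

def kmeans_plus_plus_init(X, k, random_state=0):
    """Pick k initial centroids among the rows of X (k-means++ seeding)."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # The first centroid is drawn uniformly at random.
    centroids = [X[rng.integers(n)]]
    for _ in range(1, k):
        chosen = np.array(centroids)
        # Squared distance from each point to its nearest chosen centroid.
        d2 = ((X[:, None, :] - chosen[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        total = d2.sum()
        # Sample proportionally to d2; fall back to uniform if all are zero.
        probs = d2 / total if total > 0 else np.full(n, 1.0 / n)
        centroids.append(X[rng.choice(n, p=probs)])
    return np.array(centroids)

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [20.0, 0.0]])
init = kmeans_plus_plus_init(X, k=3)
```

Because points that are already chosen have zero sampling probability, the seeds are distinct whenever the data contain at least k distinct points. A random initialization would instead draw all k seeds uniformly.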

get_distance(points, centroids)
Returns the Euclidean distance between each point and the centroids.
 Parameters
points (numpy.ndarray) – Features of each point.
centroids (list) – List of the centroids' coordinates.
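A point-to-centroid Euclidean distance computation can be sketched with NumPy broadcasting. The name `pairwise_euclidean` and the output layout (one row per centroid, one column per point) are assumptions for illustration; the documentation does not state which layout `get_distance` returns.

```python
import numpy as np

def pairwise_euclidean(points, centroids):
    """Return an (n_centroids, n_points) array of Euclidean distances."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Broadcast (n_centroids, 1, d) against (1, n_points, d).
    diff = centroids[:, np.newaxis, :] - points[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

points = np.array([[0.0, 0.0], [3.0, 4.0]])
centroids = [[0.0, 0.0], [3.0, 0.0]]
dist = pairwise_euclidean(points, centroids)
```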

move_centroids(points, closest, centroids)
Returns the new centroids, updated from the points closest to each of them.
 Parameters
points (numpy.ndarray) – Features of each point.
closest (numpy.ndarray) – Array with the index of the closest centroid for each point.
centroids (list) – List of the centroids' coordinates.
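In standard K-means, the update step moves each centroid to the mean of the points assigned to it. The sketch below shows that rule under stated assumptions: the name `update_centroids` is hypothetical, and keeping a centroid in place when no point is assigned to it is a choice made here, not documented statinf behavior.

```python
import numpy as np

def update_centroids(points, closest, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    points = np.asarray(points, dtype=float)
    closest = np.asarray(closest)
    new_centroids = np.array(centroids, dtype=float)  # copy of the input
    for j in range(len(new_centroids)):
        members = points[closest == j]
        # Assumption: a centroid with no assigned points stays where it is.
        if len(members) > 0:
            new_centroids[j] = members.mean(axis=0)
    return new_centroids

points = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
closest = np.array([0, 0, 1])
centroids = [[1.0, 1.0], [9.0, 9.0], [-5.0, -5.0]]
moved = update_centroids(points, closest, centroids)
```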