Unsupervised
- class statinf.stats.unsupervised.GaussianMixture
Bases: object
Gaussian mixture model, fitted to the data with the expectation-maximization (EM) algorithm.
Warning
This class is still under development. This is a beta version; some functionalities might not be available yet. The full stable version will be released soon.
- References
Murphy, K. P. (2012). Machine learning: a probabilistic perspective. MIT Press.
- fit(X, k, n_epochs=100, improvement_threshold=0.0005)
Fits the model to the data. The parameters are initialized with the K-means algorithm.
- Parameters
X (numpy.ndarray) – Data.
k (int) – Number of clusters (Gaussians).
n_epochs (int) – Number of epochs, default is 100.
improvement_threshold (float, optional) – Threshold from which we consider the likelihood improved, defaults to 0.0005.
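The fitting procedure described above can be sketched in plain NumPy. This is a minimal illustration of EM for a Gaussian mixture, not statinf's implementation: the names `fit_gmm`, `kmeans_means`, and `log_gaussian` are hypothetical, the `seed` argument and the return values are assumptions, and `improvement_threshold` is interpreted here as "stop once the log-likelihood gain falls below this value".

```python
import numpy as np

def log_gaussian(X, mean, cov):
    """Log-density of N(mean, cov) evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    # Solve a linear system instead of inverting the covariance explicitly.
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def kmeans_means(X, k, rng, n_iter=10):
    """A few Lloyd iterations, used only to initialize the component means."""
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        sq_dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        # Keep the previous mean if a cluster ends up empty (assumption).
        means = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else means[j]
            for j in range(k)
        ])
    return means

def fit_gmm(X, k, n_epochs=100, improvement_threshold=0.0005, seed=0):
    """Hypothetical EM fit; returns (weights, means, covariances)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    means = kmeans_means(X, k, rng)
    # Start every component from the same diagonal covariance.
    base_cov = np.diag(X.var(axis=0) + 1e-6)
    covs = np.array([base_cov.copy() for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_epochs):
        # E-step: responsibilities, computed in log space for stability.
        log_prob = np.stack(
            [np.log(weights[j]) + log_gaussian(X, means[j], covs[j]) for j in range(k)],
            axis=1,
        )
        log_norm = np.logaddexp.reduce(log_prob, axis=1)
        resp = np.exp(log_prob - log_norm[:, None])
        # M-step: re-estimate weights, means, and covariances.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
        # Stop when the log-likelihood no longer improves enough.
        ll = log_norm.sum()
        if ll - prev_ll < improvement_threshold:
            break
        prev_ll = ll
    return weights, means, covs

# Two well-separated synthetic blobs (illustrative data only).
data_rng = np.random.default_rng(1)
X = np.vstack([
    data_rng.normal(0.0, 0.5, size=(100, 2)),
    data_rng.normal(10.0, 0.5, size=(100, 2)),
])
weights, means, covs = fit_gmm(X, k=2)
```

On data like this, the estimated means should land near the two blob centers and the weights near 0.5 each. The small `1e-6` term added to each covariance is a regularization choice made for this sketch to keep the matrices invertible.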
- class statinf.stats.unsupervised.KMeans(k=1, max_iter=100, init='random', random_state=0)
Bases: object
K-means clustering implementation.
Warning
This class is still under development. This is a beta version; some functionalities might not be available yet. The full stable version will be released soon.
- Parameters
k (int) – Number of clusters, default is 1.
max_iter (int) – Maximum number of iterations for convergence, default is 100.
init (str) – Initialization option, either random or kmeans++, default is random.
random_state (int) – Seed of the random state, default is 0.
labels (numpy.array) – Labels for each data point.
centroids (numpy.array) – Coordinates of the centroids.
- References
Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning (Vol. 1, No. 10). New York: Springer series in statistics.
- closest_centroid(points, centroids)
Returns an array containing the index of the nearest centroid for each point.
- Parameters
points (numpy.array) – Features of each point.
centroids (list) – List of the centroid coordinates.
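The nearest-centroid assignment can be illustrated with a short NumPy sketch. The function name `nearest_centroid_index` is hypothetical and the code is not taken from statinf; it only shows one common way to compute such an index array.

```python
import numpy as np

def nearest_centroid_index(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Squared distances are enough for an argmin, so the square root is skipped.
    sq_dist = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)

points = np.array([[0.0, 0.1], [5.0, 5.2], [0.2, -0.1]])
centroids = [[0.0, 0.0], [5.0, 5.0]]
closest = nearest_centroid_index(points, centroids)
# The first and third points are nearest to centroid 0, the second to centroid 1.
```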
- fit(X)
Fits the model to the data, using the initialization selected by init (random or kmeans++).
- Parameters
X (numpy.array) – Input data.
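To make the two initialization options concrete, here is a minimal standalone sketch of Lloyd's algorithm with random and k-means++ seeding. It is an illustration under assumptions, not statinf's code: `init_centroids`, `assign`, and `kmeans` are hypothetical names, and the stopping rule (centroids no longer move) is a choice made for this example.

```python
import numpy as np

def init_centroids(X, k, init, rng):
    """Pick k starting centroids with random or k-means++ seeding."""
    n = X.shape[0]
    if init == "random":
        return X[rng.choice(n, size=k, replace=False)]
    if init != "kmeans++":
        raise ValueError(f"unknown init option: {init!r}")
    # k-means++: each new centroid is sampled with probability
    # proportional to its squared distance to the nearest chosen centroid.
    chosen = [X[rng.integers(n)]]
    for _ in range(1, k):
        current = np.array(chosen)
        sq_dist = ((X[:, None, :] - current[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        chosen.append(X[rng.choice(n, p=sq_dist / sq_dist.sum())])
    return np.array(chosen)

def assign(X, centroids):
    """Index of the nearest centroid for every row of X."""
    sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)

def kmeans(X, k, max_iter=100, init="random", random_state=0):
    """Hypothetical K-means fit; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(random_state)
    centroids = init_centroids(X, k, init, rng)
    for _ in range(max_iter):
        labels = assign(X, centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid in place
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign(X, centroids), centroids

# Two well-separated synthetic blobs (illustrative data only).
data_rng = np.random.default_rng(42)
X = np.vstack([
    data_rng.normal(0.0, 0.5, size=(50, 2)),
    data_rng.normal(10.0, 0.5, size=(50, 2)),
])
labels, centroids = kmeans(X, k=2, init="kmeans++")
```

k-means++ seeding tends to spread the initial centroids apart, which usually reduces the number of iterations and the risk of a poor local optimum compared with purely random seeding.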
- get_distance(points, centroids)
Returns the Euclidean distance between each point and each centroid.
- Parameters
points (numpy.array) – Features of each point.
centroids (list) – List of the centroid coordinates.
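As an illustration of the distance computation, the sketch below builds a point-by-centroid matrix of Euclidean distances with NumPy broadcasting. The name `pairwise_euclidean` and the `(n_points, n_centroids)` output layout are assumptions made for this example; the layout returned by statinf's get_distance may differ.

```python
import numpy as np

def pairwise_euclidean(points, centroids):
    """Return an (n_points, n_centroids) matrix of Euclidean distances."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Broadcast to (n_points, n_centroids, n_features), then reduce over features.
    diff = points[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

points = np.array([[0.0, 0.0], [3.0, 4.0]])
centroids = [[0.0, 0.0], [6.0, 8.0]]
distances = pairwise_euclidean(points, centroids)
# distances[0] holds 0 and 10; distances[1] holds 5 and 5.
```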
- move_centroids(points, closest, centroids)
Returns the updated centroids, computed from the points closest to each of them.
- Parameters
points (numpy.array) – Features of each point.
closest (numpy.array) – Array with the index of the closest centroid for each point.
centroids (list) – List of the centroid coordinates.
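The centroid update can be sketched as follows, assuming the standard K-means rule in which each centroid moves to the mean of the points assigned to it. The name `update_centroids` is hypothetical, and leaving the centroid of an empty cluster unchanged is a choice made for this example rather than documented statinf behavior.

```python
import numpy as np

def update_centroids(points, closest, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    points = np.asarray(points, dtype=float)
    closest = np.asarray(closest)
    new_centroids = np.array(centroids, dtype=float)  # copy, input is untouched
    for j in range(len(new_centroids)):
        members = points[closest == j]
        if len(members) > 0:  # assumption: an empty cluster keeps its old centroid
            new_centroids[j] = members.mean(axis=0)
    return new_centroids

points = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
closest = np.array([0, 0, 1])
centroids = [[1.0, 1.0], [9.0, 9.0], [50.0, 50.0]]
updated = update_centroids(points, closest, centroids)
# Centroid 0 moves to (1, 0), centroid 1 to (10, 10); centroid 2 has no points and stays.
```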