nemi.workflow module

class nemi.workflow.NEMI(params=None)[source]

Bases: SingleNemi

Main NEMI workflow

Parameters:

params (dict, optional) – Clustering and embedding algorithm parameters.

run(X, n=1)[source]

Run the NEMI pipeline

The pipeline consists of the following steps:

  • fitting the embedding

  • predicting the clusters

  • sorting the clusters by descending size

Parameters:
  • X (ndarray) – The data contained in a sparse matrix of shape (n_samples, n_features)

  • n (int, optional) – Number of iterations to run. Defaults to 1.

plot(to_plot=None, plot_ensemble=False, **kwargs)[source]

assess_overlap(base_id: int = 0, max_clusters=None, **kwargs)[source]

Assess the overlap between the clusters.

Parameters:

base_id (int, optional) – Index (starting at 0) of the ensemble member to use as the base for comparison. Defaults to 0.
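
The text above does not say how overlap is measured. As a purely illustrative sketch (not NEMI's implementation), one generic way to compare a base labeling against another ensemble member is to ask, for each base cluster, what fraction of its points share a single label in the other member. The function name `overlap_fractions` and the plain-list inputs are assumptions made for this example.

```python
from collections import Counter


def overlap_fractions(base_labels, other_labels):
    """For each base cluster, return the fraction of its points that fall
    into the single most common cluster of the other labeling.

    Hypothetical helper for illustration; not part of nemi.
    """
    if len(base_labels) != len(other_labels):
        raise ValueError("labelings must have the same length")

    # Count how often each (base label, other label) pair occurs.
    pairs = Counter(zip(base_labels, other_labels))
    base_sizes = Counter(base_labels)

    # Keep the largest co-occurrence count seen for each base cluster.
    best = {}
    for (base, _other), count in pairs.items():
        best[base] = max(best.get(base, 0), count)

    return {base: best[base] / base_sizes[base] for base in base_sizes}


base = [0, 0, 0, 1, 1]    # labels from the base ensemble member
other = [0, 0, 1, 1, 1]   # labels from another ensemble member
fractions = overlap_fractions(base, other)
```

A fraction of 1.0 means the other member keeps that base cluster intact; lower values mean its points are split across several clusters.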

class nemi.workflow.SingleNemi(params=None)[source]

Bases: object

A single instance of the NEMI pipeline

Parameters:

params (dict, optional) – A dictionary of the embedding and clustering options. Defaults to nemi.workflow.default_params.

run(X, save_steps=True)[source]

Run a single instance of the NEMI pipeline

The pipeline consists of the following steps:

  • fitting the embedding

  • predicting the clusters

  • sorting the clusters by descending size

Parameters:

X (ndarray) – The data contained in a sparse matrix of shape (n_samples, n_features)
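
The three steps listed above map onto the methods documented below (fit_embedding, predict_clusters, sort_clusters). The following minimal sketch shows that call order only. `run_steps` and `StubModel` are hypothetical stand-ins written for this example, and the real run() may do more (for instance, scaling the data or saving intermediate steps).

```python
def run_steps(model, X):
    """Call the documented steps in order and return the sorted labels."""
    model.fit_embedding(X)                 # step 1: fit the embedding
    clusters = model.predict_clusters()    # step 2: cluster the embedding
    return model.sort_clusters(clusters)   # step 3: relabel by descending size


class StubModel:
    """Hypothetical stand-in that records calls; not part of nemi."""

    def __init__(self):
        self.calls = []

    def fit_embedding(self, X):
        self.calls.append("fit_embedding")

    def predict_clusters(self):
        self.calls.append("predict_clusters")
        return [1, 1, 0]

    def sort_clusters(self, clusters):
        self.calls.append("sort_clusters")
        return [0, 0, 1]


stub = StubModel()
labels = run_steps(stub, X=[[0.0], [1.0], [2.0]])
```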

scale_data(X)[source]

Scale the data to have zero mean and unit variance.

Parameters:
  • X (ndarray) – The data to scale. A sparse matrix of shape (n_samples, n_features)

  • **kwargs – keyword arguments to embedding function
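
As a rough illustration of what feature scaling usually means, the sketch below assumes the conventional standardization: each feature (column) is shifted to zero mean and divided by its population standard deviation. It uses plain Python lists to stay dependency-free. `standardize_columns` is a hypothetical helper, not the library's code.

```python
from statistics import fmean, pstdev


def standardize_columns(rows):
    """Scale each column of a list-of-rows table to zero mean, unit variance.

    Hypothetical helper for illustration; not part of nemi.
    """
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Population standard deviation; a constant column has std 0, so fall
    # back to 1.0 to avoid dividing by zero (the column becomes all zeros).
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


scaled = standardize_columns([[1.0, 10.0], [3.0, 10.0]])
```

Here the first column becomes -1 and 1, and the constant second column becomes all zeros.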

fit_embedding(X)[source]

Run the embedding algorithm on the data

Parameters:
  • X (ndarray) – The data to embed. A sparse matrix of shape (n_samples, n_features)

  • **kwargs – keyword arguments to embedding function

predict_clusters()[source]

Run the clustering algorithm on the embedding

Clustering algorithm parameters are set by the clustering_dict attribute.

Returns:

Identified clusters

sort_clusters(clusters)[source]

Relabels the clusters 0, 1, …, k in order of descending size, so that label 0 is assigned to the largest cluster.

Parameters:

clusters (numpy.ndarray or list) – The cluster labels to reorder.

Returns:

An array with the new labels
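
Relabeling by descending size can be sketched as follows. This is a minimal pure-Python version written for illustration. `relabel_by_size` is a hypothetical name, and breaking ties by the old label value is an assumption made here to keep the result deterministic.

```python
from collections import Counter


def relabel_by_size(labels):
    """Return new labels where 0 is the largest cluster, 1 the next, etc.

    Hypothetical helper for illustration; not part of nemi.
    """
    counts = Counter(labels)
    # Order the old labels by descending count; ties go to the smaller old label.
    ordered = sorted(counts, key=lambda lab: (-counts[lab], lab))
    mapping = {old: new for new, old in enumerate(ordered)}
    return [mapping[lab] for lab in labels]


relabeled = relabel_by_size([2, 2, 0, 1, 1, 1])
print(relabeled)  # [1, 1, 2, 0, 0, 0]
```

In the example, old label 1 has three members and becomes 0, old label 2 has two members and becomes 1, and old label 0 has one member and becomes 2.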

save(filename)[source]

load_embedding(filename)[source]

save_embedding(filename)[source]

Save the embedding to a file

Parameters:

filename (str) – Filename to save embedding

plot(to_plot=None, **kwargs)[source]