sc_dr

sc_dr.datasets

class sc_dr.datasets.FromPickle(path)

Load a Dataset from a pickled object. At this stage, however, labels will not be available for the Dataset.

It is currently used in scripts that compare embeddings. Labels can be taken from the full dataset.
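As a rough illustration of the idea behind FromPickle, the sketch below round-trips an unlabeled data matrix through Python's standard pickle module. It does not use sc_dr itself; the helper names save_matrix and load_matrix and the file name embedding.pkl are hypothetical.

```python
import os
import pickle
import tempfile


def save_matrix(matrix, path):
    """Hypothetical helper: pickle a data matrix to `path`."""
    with open(path, "wb") as fh:
        pickle.dump(matrix, fh)


def load_matrix(path):
    """Hypothetical helper: load a pickled object; no labels are attached."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Toy embedding: three cells, two dimensions each.
embedding = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
path = os.path.join(tempfile.mkdtemp(), "embedding.pkl")
save_matrix(embedding, path)
restored = load_matrix(path)
```

Unpickling can execute arbitrary code, so only load pickle files from sources you trust.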

class sc_dr.datasets.FromPickledPanda(path)

Load a dataset from a pickled pandas DataFrame.

class sc_dr.datasets.PCAReducedDuo(path, n_components=2, log_trans=False, log1p=False)

Same as DuoBenchmark, but performs a preliminary PCA on the data. Usage is discouraged because it has not been updated to reflect recent changes to the DuoBenchmark class.

sc_dr.datasets.scale_dataset(ds)

Scale each feature to lie between 0 and 1. Note that this overwrites the original data. In the future, this function should be changed to retain the scaling information.

Parameters: ds – dataset to be scaled
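The per-feature scaling described for scale_dataset can be sketched with NumPy as below. This is a minimal sketch and not the sc_dr implementation: it works on a plain array instead of a Dataset, returns a copy instead of overwriting the input, and also returns the per-column minimum and range so the scaling can be undone. The function name min_max_scale is hypothetical.

```python
import numpy as np


def min_max_scale(data):
    """Hypothetical helper: rescale each column of `data` to [0, 1].

    Returns the scaled copy plus the per-column minimum and range.
    """
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    # Constant columns have zero range; divide by 1 so they map to 0.
    safe_range = np.where(col_range == 0, 1.0, col_range)
    scaled = (data - col_min) / safe_range
    return scaled, col_min, col_range


# Toy matrix: three samples, two features.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])
scaled, col_min, col_range = min_max_scale(X)

# The original values can be recovered from the retained information.
recovered = scaled * col_range + col_min
```

Keeping col_min and col_range is one way to retain the scaling information that the note above asks for.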

sc_dr.metrics

sc_dr.metrics.davies_bouldin_score(X, labels)

Compute the Davies-Bouldin score for the provided clustering. The implementation is taken from https://github.com/scikit-learn/scikit-learn/pull/12760 to avoid errors.
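For orientation, the Davies-Bouldin score can be sketched from its textbook definition: for each cluster, take the worst ratio of summed within-cluster scatter to centroid distance against any other cluster, then average over clusters. The code below is not the code from the pull request and is not necessarily what sc_dr.metrics computes; the function name davies_bouldin_sketch is hypothetical, and the sketch assumes at least two clusters with distinct centroids.

```python
import numpy as np


def davies_bouldin_sketch(X, labels):
    """Hypothetical helper: textbook Davies-Bouldin score (lower is better)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    # Centroid of each cluster.
    centroids = np.array([X[labels == c].mean(axis=0) for c in clusters])
    # Scatter: mean distance of a cluster's points to its own centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(clusters)
    ])
    # Pairwise distances between centroids.
    centroid_dist = np.linalg.norm(
        centroids[:, None, :] - centroids[None, :, :], axis=-1
    )
    # Infinite self-distance makes the i == j ratio zero, so it is never the max.
    np.fill_diagonal(centroid_dist, np.inf)
    ratios = (scatter[:, None] + scatter[None, :]) / centroid_dist
    return ratios.max(axis=1).mean()


# Two tight, well-separated toy clusters.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
labels = np.array([0, 0, 1, 1])
score = davies_bouldin_sketch(X, labels)
```

In this toy case each cluster has scatter 0.5 and the centroids are 5 apart, so the score is (0.5 + 0.5) / 5 = 0.2.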

sc_dr.metrics.dunn_index(X, labels)

Calculate the Dunn index for the provided clustering.

Parameters:
  • X – a numpy array of data points
  • labels – a numpy array of labels for each point
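The Dunn index has several variants; the sketch below uses one common definition, the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. It is an illustration only and may differ from what dunn_index computes. The function name dunn_index_sketch is hypothetical, and the sketch assumes at least two clusters and at least one cluster containing two distinct points.

```python
import numpy as np


def dunn_index_sketch(X, labels):
    """Hypothetical helper: min inter-cluster distance / max cluster diameter."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Full pairwise Euclidean distance matrix (fine for small inputs).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Boolean mask: True where two points share a cluster label.
    same = labels[:, None] == labels[None, :]
    max_diameter = dist[same].max()
    min_separation = dist[~same].min()
    return min_separation / max_diameter


# Two tight, well-separated toy clusters.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
labels = np.array([0, 0, 1, 1])
score = dunn_index_sketch(X, labels)
```

Here the largest within-cluster distance is 1 and the smallest between-cluster distance is 5, so the index is 5. Higher values indicate more compact, better separated clusters.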

sc_dr.summarize

sc_dr.clustering