crested.pp.sort_and_filter_regions_on_specificity

crested.pp.sort_and_filter_regions_on_specificity(adata, top_k, model_name=None, method='gini')

Sort BED regions and their targets/predictions by Gini or proportion score per column, keeping the top k rows per column.

Combines them into a single AnnData object with extra columns indicating the original class name, the rank per column, and the score.

Parameters:
  • adata (AnnData) – The AnnData object containing the matrix (celltypes, regions) to be sorted.

  • top_k (int) – The number of top regions to keep per column.

  • model_name (Optional[str] (default: None)) – The name of the model to look for in adata.layers[model_name] for predictions. If None, will use the targets in adata.X to decide which regions to sort.

  • method (str (default: 'gini')) – The method to use for calculating scores, either ‘gini’ or ‘proportion’.

Return type:

None

Returns:

The AnnData object is modified in place: it holds the sorted and filtered matrix, plus extra columns indicating the original class name, the rank per column, and the score.

Example

>>> crested.pp.sort_and_filter_regions_on_specificity(
...     adata,
...     top_k=500,
...     method="gini",
... )
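
To give a feel for what a Gini or proportion specificity score measures, here is a minimal NumPy sketch on a toy (celltypes, regions) matrix. It is not CREsTED's implementation: the helper names gini_per_region, proportion_per_class and top_k_regions are hypothetical, and the exact scoring and ranking inside the library may differ.

```python
import numpy as np


def gini_per_region(matrix):
    """Gini coefficient of each region (column) across cell types (rows).

    Hypothetical helper: 0 means the signal is spread evenly over cell
    types; values near 1 mean it is concentrated in very few of them.
    """
    m = np.sort(np.asarray(matrix, dtype=float), axis=0)  # ascending per region
    n = m.shape[0]
    totals = m.sum(axis=0)
    ranks = np.arange(1, n + 1)[:, None]
    safe_totals = np.where(totals == 0, 1.0, totals)  # avoid division by zero
    g = 2 * (ranks * m).sum(axis=0) / (n * safe_totals) - (n + 1) / n
    return np.where(totals == 0, 0.0, g)


def proportion_per_class(matrix):
    """Share of each region's total signal that falls in each cell type."""
    m = np.asarray(matrix, dtype=float)
    totals = m.sum(axis=0, keepdims=True)
    return np.divide(m, totals, out=np.zeros_like(m), where=totals != 0)


def top_k_regions(matrix, top_k):
    """For each cell type, indices of the top_k regions by proportion score."""
    scores = proportion_per_class(matrix)
    order = np.argsort(-scores, axis=1, kind="stable")  # descending score
    return order[:, :top_k]


# Toy data: 3 cell types (rows) x 4 regions (columns).
toy = np.array([
    [9.0, 1.0, 2.0, 0.0],
    [1.0, 1.0, 3.0, 8.0],
    [0.0, 1.0, 4.0, 2.0],
])

print(gini_per_region(toy))   # per-region specificity
print(top_k_regions(toy, 1))  # most specific region per cell type
```

In this toy matrix, region 0 is dominated by the first cell type and so gets a high Gini score, while region 1 is uniform across cell types and scores 0.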