crested.pp.sort_and_filter_regions_on_specificity

crested.pp.sort_and_filter_regions_on_specificity(adata, top_k, model_name=None, method='gini', inplace=True)

Sort BED regions and their targets/predictions by highest Gini or proportion score per column, keeping the top k rows per column.

Combines them into a single AnnData object with extra columns indicating the original class name, the rank per column, and the score. To get a sense of how different top_k values affect the result, see sort_and_filter_cutoff().

Parameters:
  • adata (AnnData) – The AnnData object containing the matrix (celltypes, regions) to be sorted.

  • top_k (int) – The number of top regions to keep per column.

  • model_name (str | None (default: None)) – The name of the model to look for in adata.layers[model_name] for predictions. If None, will use the targets in adata.X to decide which regions to sort.

  • method (str (default: 'gini')) – The method to use for calculating scores, either 'gini' or 'proportion'.

  • inplace (bool (default: True)) – Perform computation and modify adata in-place or return a resulting copy of the adata instead.

Return type:

AnnData | None

Returns:

If inplace=True (default), returns nothing and modifies the AnnData in-place with the sorted and filtered matrix, and extra columns indicating the original class name, the rank per column, and the score. If inplace=False, returns a modified copy of the AnnData object instead.

Example

>>> crested.pp.sort_and_filter_regions_on_specificity(
...     adata,
...     top_k=500,
...     method="gini",
... )
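
To make the idea of "top k most specific regions per class" concrete, here is a minimal standalone sketch in plain Python. It is not CREsted's implementation. It assumes one common reading of the description above: each region gets a Gini coefficient over its per-class values, is assigned to the class where its value is highest, and each class keeps its k highest-scoring regions. The helpers `gini` and `top_k_specific_regions` and the output keys `region`, `class`, `rank` and `score` are hypothetical names chosen for illustration.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    0 means the signal is spread evenly; larger values mean it is
    concentrated in fewer entries (i.e. the region is more specific).
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    ordered = sorted(values)
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(ordered, start=1))
    return weighted / (n * total)


def top_k_specific_regions(matrix, class_names, region_names, top_k):
    """Illustrative top-k selection; matrix[c][r] is class c's value at region r.

    Assumption (not taken from CREsted): a region belongs to the class
    where its value is highest, and is scored by its Gini coefficient.
    """
    per_class = {name: [] for name in class_names}
    for r, region in enumerate(region_names):
        column = [row[r] for row in matrix]
        best = max(range(len(column)), key=column.__getitem__)
        per_class[class_names[best]].append((gini(column), region))

    rows = []
    for name in class_names:
        # Highest score first, then keep only the first top_k regions.
        ranked = sorted(per_class[name], key=lambda item: item[0], reverse=True)
        for rank, (score, region) in enumerate(ranked[:top_k], start=1):
            rows.append({"region": region, "class": name, "rank": rank, "score": score})
    return rows


# Toy data: 2 classes x 4 regions (made-up values, for illustration only).
matrix = [
    [9.0, 1.0, 5.0, 0.0],  # class "A"
    [1.0, 1.0, 4.0, 8.0],  # class "B"
]
rows = top_k_specific_regions(matrix, ["A", "B"], ["r1", "r2", "r3", "r4"], top_k=2)
for row in rows:
    print(row)
```

In this toy input, region `r2` has equal values in both classes, so its score is 0 and it is dropped when only two regions per class are kept. This is the kind of trade-off that the choice of `top_k` controls.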