crested.pp.filter_regions_on_specificity

crested.pp.filter_regions_on_specificity(adata, gini_std_threshold=1.0, model_name=None)

Filter BED regions and their targets/predictions, keeping only regions with a high Gini score.

This function filters regions by their specificity, measured with Gini scores. Regions with high Gini scores are retained, and the AnnData object is subset in place to those regions. If model_name is provided, the function uses the corresponding predictions in adata.layers[model_name]. Otherwise, it uses the targets in adata.X to decide which regions to keep.

Parameters:
  • adata (AnnData) – The AnnData object whose matrix, of shape (cell types, regions), is to be filtered.

  • gini_std_threshold (float (default: 1.0)) – The number of standard deviations above the mean Gini score at which the cutoff is set. Regions scoring above this cutoff are considered highly specific and are kept.

  • model_name (Optional[str] (default: None)) – The name of the model whose predictions are stored in adata.layers[model_name]. If None, the targets in adata.X are used to select specific regions.
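
The thresholding rule described by gini_std_threshold can be illustrated with a small, standalone sketch. This is not the library's implementation: the gini and select_specific_regions helpers below are hypothetical, the Gini formula is the standard sorted-rank form, and computing one score per region across cell types is an assumption about how specificity is measured. The sketch only shows how a cutoff of mean + gini_std_threshold * std selects regions.

```python
from statistics import mean, pstdev


def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly uniform)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0  # no signal at all: treat as non-specific
    # Standard sorted-rank formula with 1-based ranks.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def select_specific_regions(matrix, gini_std_threshold=1.0):
    """Return (kept column indices, per-region scores, cutoff).

    `matrix` is a list of rows shaped (cell types, regions).
    Hypothetical helper: one Gini score per region is an assumption.
    """
    n_regions = len(matrix[0])
    scores = [gini([row[j] for row in matrix]) for j in range(n_regions)]
    cutoff = mean(scores) + gini_std_threshold * pstdev(scores)
    kept = [j for j, score in enumerate(scores) if score > cutoff]
    return kept, scores, cutoff


# Toy data: 4 cell types x 5 regions. Only the third region (index 2)
# is active in a single cell type; the others are uniform.
toy = [
    [1.0, 1.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 5.0, 1.0, 1.0],
]
kept, scores, cutoff = select_specific_regions(toy, gini_std_threshold=1.0)
print(kept)
```

Raising gini_std_threshold moves the cutoff further above the mean and keeps fewer regions; lowering it keeps more.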

Return type:

None

Returns:

Nothing is returned. The AnnData object is filtered in place: its matrix and variable names are reduced to the retained regions.

Example

>>> crested.pp.filter_regions_on_specificity(
...     adata,
...     gini_std_threshold=1.0,
... )