crested.import_beds
- crested.import_beds(beds_folder, regions_file=None, chromsizes_file=None, classes_subset=None, remove_empty_regions=True, compress=False)
Import BED files, and optionally a consensus regions BED file, into AnnData format.
Expects a folder of BED files in which each file is named {class_name}.bed. The result is an AnnData object with classes as rows and regions as columns, with the .X values indicating whether a region is open in a class.
Note
This is the default function for importing topic BED files produced by running pycisTopic (https://pycistopic.readthedocs.io/en/latest/) on your data. The result is an AnnData object with topics as rows and consensus regions as columns, with binary values indicating whether a region is present in a topic.
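To make the layout described above concrete, the following is a minimal, standard-library sketch of how a folder of {class_name}.bed files maps onto a binary classes-by-regions table. It is not how crested builds the AnnData object: the helper name `binary_matrix_from_beds`, the `chrom:start-end` region labels, and the exact-coordinate matching are assumptions made only for illustration.

```python
import tempfile
from pathlib import Path


def binary_matrix_from_beds(beds_folder):
    """Hypothetical helper: build a classes x regions 0/1 table from {class_name}.bed files."""
    per_class = {}
    for bed_path in sorted(Path(beds_folder).glob("*.bed")):
        regions = set()
        for line in bed_path.read_text().splitlines():
            # Skip blank lines and common BED header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split("\t")[:3]
            # Assumed label format; regions match only on identical coordinates.
            regions.add(f"{chrom}:{start}-{end}")
        # The class name is the file name without the .bed extension.
        per_class[bed_path.stem] = regions

    classes = sorted(per_class)
    all_regions = sorted(set().union(*per_class.values())) if per_class else []
    # Rows are classes, columns are regions, 1 means the region is listed for that class.
    matrix = [[int(r in per_class[c]) for r in all_regions] for c in classes]
    return classes, all_regions, matrix


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Topic_1.bed").write_text("chr1\t100\t600\nchr1\t1000\t1500\n")
    Path(tmp, "Topic_2.bed").write_text("chr1\t1000\t1500\nchr2\t50\t550\n")
    classes, regions, matrix = binary_matrix_from_beds(tmp)
```

Here `classes` plays the role of the row names, `regions` the column names, and `matrix` the binary .X values.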
- Parameters:
beds_folder (PathLike) – Folder path containing the BED files.
regions_file (Optional[PathLike] (default: None)) – File path of the consensus regions BED file to use as columns in the AnnData object. If None, the regions will be extracted from the files.
classes_subset (Optional[list] (default: None)) – List of classes to include in the AnnData object. If None, all files will be included. Classes should be named after the file name without the extension.
chromsizes_file (Optional[PathLike] (default: None)) – File path of the chromsizes file. Used for checking if the new regions are within the chromosome boundaries. If not provided, will look for a registered genome object.
remove_empty_regions (bool (default: True)) – Remove regions that are not open in any class (only possible if regions_file is provided).
compress (bool (default: False)) – Compress the AnnData.X matrix. If True, the matrix will be stored as a sparse matrix. If False, the matrix will be stored as a dense matrix.
WARNING: Compressing the matrix currently makes training very slow and is not recommended. We are still investigating a workaround.
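As a rough illustration of the dense versus sparse trade-off behind compress, the sketch below converts a small binary table to a CSR-style layout in plain Python and counts the stored values. The CSR layout and the helper name `to_csr` are assumptions for illustration; the documentation only states that the matrix is stored "as a sparse matrix", not which sparse format is used.

```python
def to_csr(dense):
    """Hypothetical helper: convert a dense row-major table to CSR-style lists."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # non-zero value
                indices.append(col)   # its column index
        indptr.append(len(data))      # where the next row starts in `data`
    return data, indices, indptr


# Two classes, eight regions, three open regions in total.
dense = [
    [1, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
]
data, indices, indptr = to_csr(dense)

stored_dense = sum(len(row) for row in dense)
stored_sparse = len(data) + len(indices) + len(indptr)
```

The sparse form stores fewer numbers only when most entries are zero, and reading a row back requires an extra indexing step, which is one plausible reason sparse storage can slow down data loading.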
- Return type:
AnnData
- Returns:
AnnData object with classes as rows and peaks as columns.
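The boundary check mentioned for chromsizes_file can be pictured with the following standard-library sketch. The helper names `read_chromsizes` and `within_bounds` are hypothetical, and filtering out-of-bounds regions is an assumption: the documentation does not say whether crested drops, clips, or reports such regions.

```python
def read_chromsizes(text):
    """Hypothetical helper: parse 'chrom<TAB>size' lines into a dict."""
    sizes = {}
    for line in text.splitlines():
        if line.strip():
            chrom, size = line.split()[:2]
            sizes[chrom] = int(size)
    return sizes


def within_bounds(region, sizes):
    """Return True if a (chrom, start, end) region fits inside a known chromosome."""
    chrom, start, end = region
    return chrom in sizes and 0 <= start < end <= sizes[chrom]


sizes = read_chromsizes("chr1\t2000\nchr2\t500\n")
regions = [("chr1", 100, 600), ("chr2", 50, 550), ("chr3", 0, 10)]

# Assumption: out-of-bounds or unknown-chromosome regions are simply filtered here.
kept = [r for r in regions if within_bounds(r, sizes)]
```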
Example
>>> anndata = crested.import_beds(
...     beds_folder="path/to/beds/folder/",
...     regions_file="path/to/regions.bed",
...     chromsizes_file="path/to/chrom.sizes",
...     classes_subset=["Topic_1", "Topic_2"],
... )