crested.import_beds

crested.import_beds(beds_folder, regions_file=None, chromsizes_file=None, classes_subset=None, remove_empty_regions=True, compress=False)

Import BED files, and optionally a consensus regions BED file, into AnnData format.

Expects a folder of BED files in which each file is named {class_name}.bed. The result is an AnnData object with classes as rows and regions as columns, where the .X values indicate whether a region is open in a class.
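To make the matrix layout concrete, here is a minimal, stdlib-only sketch of how such a class-by-region binary matrix could be assembled. It is not crested's implementation. The helper names (overlaps, binary_matrix) are hypothetical, regions are plain (chrom, start, end) tuples with half-open BED coordinates instead of parsed files, and the sketch assumes a region counts as open in a class when any interval in that class's BED file overlaps it.

```python
# Hypothetical sketch, not crested's implementation.
# Assumption: a column region is "open" in a class if any interval in that
# class's BED overlaps it. Coordinates are BED-style half-open [start, end).

Region = tuple[str, int, int]  # (chrom, start, end)


def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals lie on the same chromosome and overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def binary_matrix(class_regions, consensus=None, remove_empty=True):
    """Return (classes, regions, rows) where rows[i][j] == 1 if region j is open in class i.

    class_regions: dict mapping class name -> list of Region
    consensus: optional list of Region used as columns; if None, the columns
        are the sorted union of all class regions.
    remove_empty: drop columns open in no class (only matters with a consensus set).
    """
    classes = list(class_regions)
    if consensus is None:
        regions = sorted({r for rs in class_regions.values() for r in rs})
    else:
        regions = list(consensus)

    # One row per class, one 0/1 entry per column region.
    rows = [
        [int(any(overlaps(r, c) for c in class_regions[cls])) for r in regions]
        for cls in classes
    ]

    # Regions taken from the class files are open somewhere by construction,
    # so empty columns can only appear when a consensus set is supplied.
    if consensus is not None and remove_empty:
        keep = [j for j in range(len(regions)) if any(row[j] for row in rows)]
        regions = [regions[j] for j in keep]
        rows = [[row[j] for j in keep] for row in rows]
    return classes, regions, rows


beds = {
    "Topic_1": [("chr1", 100, 600), ("chr2", 50, 550)],
    "Topic_2": [("chr1", 400, 900)],
}
consensus = [("chr1", 100, 600), ("chr2", 50, 550), ("chr3", 0, 500)]
classes, regions, X = binary_matrix(beds, consensus)
print(classes)  # ['Topic_1', 'Topic_2']
print(X)        # [[1, 1], [1, 0]]
```

The chr3 column is dropped because no class is open there, which mirrors the documented behaviour of remove_empty_regions. When the columns are instead taken from the union of class regions, every column is open in at least one class.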

Note

This is the default function to import topic BED files coming from running pycisTopic (https://pycistopic.readthedocs.io/en/latest/) on your data. The result is an AnnData object with topics as rows and consensus regions as columns, with binary values indicating whether a region is present in a topic.

Parameters:

beds_folder (list[str] | dict[str, str] | str | PathLike) – List of BED file paths, dict of BED file paths keyed by class name, or path to a folder containing the BED files. If a folder path is given, all BED files are assumed to have the .bed extension.
regions_file (str | PathLike | None (default: None)) – File path of the consensus regions BED file to use as columns in the AnnData object. If None, the regions will be extracted from the files.
chromsizes_file (str | PathLike | None (default: None)) – File path of the chromsizes file. Used for checking if the new regions are within the chromosome boundaries. If not provided, will look for a registered genome object.
classes_subset (list | None (default: None)) – List of classes to include in the AnnData object when providing a folder to read. If None, all files will be included. Classes should be named after the file name without the extension.
remove_empty_regions (bool (default: True)) – Remove regions that are not open in any class (only applies when regions_file is provided).
compress (bool (default: False)) – Compress the AnnData.X matrix. If True, the matrix is stored as a sparse matrix; if False, it is stored as a dense matrix.

WARNING: Compressing the matrix currently makes training very slow and is not recommended. We are still investigating a workaround.
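The three accepted forms of beds_folder, together with classes_subset, could be normalized to a single class-name-to-path mapping roughly as follows. This is a hypothetical helper (resolve_bed_paths), not part of crested's API. It assumes class names are file names without the .bed extension and, following the parameter description above, applies classes_subset only when a folder is given.

```python
import tempfile
from pathlib import Path


def resolve_bed_paths(beds_folder, classes_subset=None):
    """Hypothetical helper: map class name -> BED file path.

    Accepts a dict of class name -> path, a list of BED paths, or a folder
    path. Class names default to the file name without its extension.
    """
    if isinstance(beds_folder, dict):
        # Explicit class names supplied by the caller.
        return {name: Path(p) for name, p in beds_folder.items()}
    if isinstance(beds_folder, (list, tuple)):
        # Class name taken from the file stem, e.g. "Topic_1.bed" -> "Topic_1".
        return {Path(p).stem: Path(p) for p in beds_folder}

    # Folder: pick up only files with the .bed extension.
    paths = {p.stem: p for p in sorted(Path(beds_folder).glob("*.bed"))}
    if classes_subset is not None:
        wanted = set(classes_subset)
        paths = {name: p for name, p in paths.items() if name in wanted}
    return paths


with tempfile.TemporaryDirectory() as tmp:
    for name in ["Topic_1", "Topic_2", "Topic_3"]:
        (Path(tmp) / f"{name}.bed").write_text("chr1\t100\t600\n")
    (Path(tmp) / "notes.txt").write_text("not a BED file\n")
    found = resolve_bed_paths(tmp, classes_subset=["Topic_1", "Topic_2"])
    print(sorted(found))  # ['Topic_1', 'Topic_2']
```

Keying everything by class name early means the later steps (building rows, filtering classes) do not need to care which input form was used.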

Return type:

AnnData

Returns:

AnnData object with classes as rows and peaks as columns.

Example

>>> import crested
>>> anndata = crested.import_beds(
...     beds_folder="path/to/beds/folder/",
...     regions_file="path/to/regions.bed",
...     chromsizes_file="path/to/chrom.sizes",
...     classes_subset=["Topic_1", "Topic_2"],
... )
