crested.import_beds

crested.import_beds(beds_folder, regions_file=None, chromsizes_file=None, classes_subset=None, remove_empty_regions=True, compress=False)

Import BED files, and optionally a consensus regions BED file, into AnnData format.

Expects a folder of BED files in which each file is named {class_name}.bed. The result is an AnnData object with classes as rows and regions as columns, with the .X values indicating whether a region is open in a class.
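To make the expected folder layout and the shape of the result concrete, here is a minimal standard-library sketch. It is not crested's implementation: the helper names `read_bed_regions` and `binary_matrix_from_beds` are hypothetical, the `chrom:start-end` region naming is an assumption, and a region counts as "open" in a class only when the exact same coordinates appear in that class's BED file.

```python
import tempfile
from pathlib import Path


def read_bed_regions(bed_path):
    """Return the regions of a BED file as 'chrom:start-end' strings (assumed naming)."""
    regions = []
    with open(bed_path) as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines and common BED header lines.
            if not line or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split("\t")[:3]
            regions.append(f"{chrom}:{start}-{end}")
    return regions


def binary_matrix_from_beds(beds_folder):
    """Hypothetical helper: classes as rows, regions as columns, 1 = region open in class."""
    bed_files = sorted(Path(beds_folder).glob("*.bed"))
    # The class name is the file name without the .bed extension.
    class_names = [path.stem for path in bed_files]
    per_class = {path.stem: set(read_bed_regions(path)) for path in bed_files}
    # Without a consensus regions file, take the union of all regions seen.
    region_names = sorted(set().union(*per_class.values()))
    matrix = [
        [1 if region in per_class[name] else 0 for region in region_names]
        for name in class_names
    ]
    return class_names, region_names, matrix


# Build a throwaway folder with two class BED files and import it.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "Topic_1.bed").write_text("chr1\t100\t600\nchr1\t1000\t1500\n")
    Path(folder, "Topic_2.bed").write_text("chr1\t1000\t1500\nchr2\t50\t550\n")
    classes, regions, matrix = binary_matrix_from_beds(folder)

for name, row in zip(classes, matrix):
    print(name, row)
```

In the real function the same information is stored in an AnnData object, with the matrix in .X.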

Note

This is the default function to import topic BED files coming from running pycisTopic (https://pycistopic.readthedocs.io/en/latest/) on your data. The result is an AnnData object with topics as rows and consensus regions as columns, with binary values indicating whether a region is present in a topic.

Parameters:
  • beds_folder (PathLike) – Folder path containing the BED files.

  • regions_file (Optional[PathLike] (default: None)) – File path of the consensus regions BED file to use as columns in the AnnData object. If None, the regions will be extracted from the files.

  • classes_subset (Optional[list] (default: None)) – List of classes to include in the AnnData object. If None, all files will be included. Classes should be named after the file name without the extension.

  • chromsizes_file (Optional[PathLike] (default: None)) – File path of the chromsizes file. Used for checking if the new regions are within the chromosome boundaries. If not provided, will look for a registered genome object.

  • remove_empty_regions (bool (default: True)) – Remove regions that are not open in any class (only possible if regions_file is provided).

  • compress (bool (default: False)) –

    Compress the AnnData.X matrix. If True, the matrix will be stored as a sparse matrix. If False, the matrix will be stored as a dense matrix.

    WARNING: Compressing the matrix currently makes training very slow and is never recommended. We are still investigating a workaround.
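The checks described for chromsizes_file and remove_empty_regions can be illustrated with a small standard-library sketch. This is an illustration under assumptions, not crested's code: the helpers `read_chromsizes`, `within_bounds`, and `drop_empty_regions` are hypothetical, regions are assumed to be named `chrom:start-end`, and the sketch only flags out-of-bounds regions, since how crested handles them is not stated here.

```python
def read_chromsizes(lines):
    """Parse 'chrom<TAB>size' lines into a {chrom: size} dict."""
    sizes = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            sizes[fields[0]] = int(fields[1])
    return sizes


def within_bounds(region, chromsizes):
    """True if a 'chrom:start-end' region lies inside its chromosome."""
    chrom, _, span = region.rpartition(":")
    start, end = (int(value) for value in span.split("-"))
    return chrom in chromsizes and 0 <= start < end <= chromsizes[chrom]


def drop_empty_regions(region_names, matrix):
    """Remove columns (regions) that are 0 in every row (class)."""
    keep = [
        col for col in range(len(region_names))
        if any(row[col] for row in matrix)
    ]
    kept_names = [region_names[col] for col in keep]
    kept_matrix = [[row[col] for col in keep] for row in matrix]
    return kept_names, kept_matrix


# Boundary check: chr2 is only 500 bp long, so chr2:50-550 is flagged.
chromsizes = read_chromsizes(["chr1\t2000", "chr2\t500"])
consensus = ["chr1:100-600", "chr1:1000-1500", "chr2:50-550"]
in_bounds = [within_bounds(region, chromsizes) for region in consensus]

# Empty-region removal: only the first region is open in any class.
open_matrix = [[1, 0, 0], [1, 0, 0]]
kept_regions, kept_matrix = drop_empty_regions(consensus, open_matrix)
```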

Return type:

AnnData

Returns:

AnnData object with classes as rows and peaks as columns.

Example

>>> import crested
>>> anndata = crested.import_beds(
...     beds_folder="path/to/beds/folder/",
...     regions_file="path/to/regions.bed",
...     chromsizes_file="path/to/chrom.sizes",
...     classes_subset=["Topic_1", "Topic_2"],
... )