crested.import_bigwigs

crested.import_bigwigs(bigwigs_folder, regions_file, chromsizes_file=None, target='mean', target_region_width=None, compress=False)

Import bigWig files and consensus regions BED file into AnnData format.

This format is required to train a peak prediction model. A target value is calculated from each bigWig file for each region and imported into an AnnData object, with the bigWig file names as .obs and the consensus regions as .var. Optionally, a target region width can be specified to extract values from a wider or narrower region around each consensus region; the original region is still used as the index. This is often useful to extract sequence information around the actual peak region.
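The resizing described above can be pictured with a small standalone sketch. It does not call crested: the helpers `resize_region` and `within_chromosome` are hypothetical, and centering the new window on the midpoint of the consensus region is an assumption about how the width is applied, not a confirmed detail of the library.

```python
# Illustrative sketch only; helper names are hypothetical and not part of crested.
# Assumption: the resized window is centered on the midpoint of the original region.

def resize_region(start, end, width):
    """Return (new_start, new_end) of a window of `width` bp centered on [start, end)."""
    center = (start + end) // 2
    new_start = center - width // 2
    return new_start, new_start + width


def within_chromosome(chrom, start, end, chromsizes):
    """Check that [start, end) lies inside the chromosome, as a chromsizes file allows."""
    return 0 <= start and end <= chromsizes[chrom]


chromsizes = {"chr1": 10_000}  # toy chromosome sizes
chrom, start, end = "chr1", 1000, 1500

# The original consensus region stays the index name, whatever width is used.
index_name = f"{chrom}:{start}-{end}"

wide = resize_region(start, end, width=1000)    # wider than the 500 bp region
narrow = resize_region(start, end, width=200)   # narrower than the region
edge = resize_region(100, 300, width=1000)      # extends past the chromosome start

print(index_name, wide, within_chromosome(chrom, *wide, chromsizes))
print(index_name, narrow, within_chromosome(chrom, *narrow, chromsizes))
print("chr1:100-300", edge, within_chromosome(chrom, *edge, chromsizes))
```

The last case shows why a chromsizes file (or a registered genome) is needed once regions are resized: a window that fit before resizing may no longer fit afterwards.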

Parameters:

bigwigs_folder (list[str] | dict[str, str] | str | PathLike) – List of bigWig file paths, dict of bigWig paths with class name keys, or folder path containing the bigWig files.
regions_file (str | PathLike) – File name of the consensus regions BED file.
chromsizes_file (str | PathLike | None (default: None)) – File name of the chromsizes file. Used for checking if the new regions are within the chromosome boundaries. If not provided, will look for a registered genome object.
target (str (default: 'mean')) – Target value to extract from the bigWigs. Can be 'mean', 'max', 'count', or 'logcount'. 'count' sums the signal over the region while 'mean' averages it, so count == mean * target_region_width. For dense coverage bigWigs use 'mean'; for sparse cut-site bigWigs use 'count' (summing avoids near-zero values). Note: this choice is coupled to the multiplier of CosineMSELogLoss if you train with it. Pair 'mean' targets with multiplier=target_region_width (1000 by default) and 'count' targets with multiplier=1; a mismatch silently collapses the dynamic range of the loss. See default_configs() ('peak_regression_mean' vs 'peak_regression_count').
target_region_width (int | None (default: None)) – Width of region that the bigWig target value will be extracted from. If None, the consensus region width will be used.
compress (bool (default: False)) – Compress the AnnData.X matrix. If True, the matrix will be stored as a sparse matrix. If False, the matrix will be stored as a dense matrix.
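As a rough illustration of the target parameter described above, the sketch below computes 'mean', 'max' and 'count' on a toy per-base signal in plain Python and checks the stated identity count == mean * target_region_width. The helper `suggested_multiplier` is hypothetical (it is not a crested function); it only encodes the pairing rule stated above, and it refuses targets for which no pairing is given.

```python
# Illustrative sketch only; nothing here calls crested.

# Toy per-base coverage over a 4 bp target region.
signal = [0.0, 2.0, 4.0, 2.0]
target_region_width = len(signal)

count_target = sum(signal)                         # 'count': summed signal
mean_target = count_target / target_region_width   # 'mean': averaged signal
max_target = max(signal)                           # 'max': highest per-base value

# The identity stated in the parameter description.
assert count_target == mean_target * target_region_width


def suggested_multiplier(target, target_region_width):
    """Hypothetical helper: multiplier to pair with a target, per the note above."""
    if target == "mean":
        return target_region_width
    if target == "count":
        return 1
    # No pairing is documented for other targets, so do not guess one.
    raise ValueError(f"no documented multiplier pairing for target {target!r}")


print(mean_target, max_target, count_target)
print(suggested_multiplier("mean", 1000), suggested_multiplier("count", 1000))
```

Multiplying a 'mean' target by the region width puts it on the same scale as a 'count' target, which is why the two settings need different multipliers.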

Return type:

AnnData

Returns:

AnnData object with bigWigs as rows and peaks as columns.

Example

>>> anndata = crested.import_bigwigs(
...     bigwigs_folder="path/to/bigwigs",
...     regions_file="path/to/peaks.bed",
...     chromsizes_file="path/to/chrom.sizes",
...     target="max",
...     target_region_width=500,
... )
