crested.import_bigwigs
- crested.import_bigwigs(bigwigs_folder, regions_file, chromsizes_file=None, target='mean', target_region_width=None, compress=False)
Import bigWig files and a consensus regions BED file into AnnData format.
This format is required to train a peak prediction model. A target value is computed from each bigWig file for every consensus region, and the results are stored in an AnnData object with the bigWig file names as .obs and the consensus regions as .var. Optionally, target_region_width can be set to extract values from a wider or narrower window around each consensus region, while the original region is still used as the index. This is often useful for capturing sequence information around the actual peak region.
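The effect of target_region_width can be illustrated with a small, self-contained sketch. It assumes that the resized window is centred on the midpoint of the original region and that a window running past either chromosome end (checked against chromsizes_file) is simply dropped; both choices are assumptions for illustration, and the helper name resize_region is hypothetical, not part of the crested API.

```python
def resize_region(chrom, start, end, target_width, chrom_sizes=None):
    """Hypothetical helper: resize a region to target_width around its midpoint.

    Assumptions (not taken from crested's source): the window is centred on the
    original midpoint, and out-of-bounds windows are rejected by returning None.
    """
    center = (start + end) // 2
    new_start = center - target_width // 2
    new_end = new_start + target_width
    # Reject windows that start before the chromosome begins.
    if new_start < 0:
        return None
    # Reject windows that run past the chromosome end, if sizes are known.
    if chrom_sizes is not None and new_end > chrom_sizes[chrom]:
        return None
    return chrom, new_start, new_end


# The original region is still used as the index; only the extraction window changes.
original = ("chr1", 1000, 1500)
index_name = f"{original[0]}:{original[1]}-{original[2]}"
window = resize_region(*original, target_width=1000)
```

Here index_name stays "chr1:1000-1500" while values would be extracted from the wider window chr1:750-1750.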
- Parameters:
  - bigwigs_folder (PathLike) – Folder containing the bigWig files.
  - regions_file (PathLike) – File name of the consensus regions BED file.
  - chromsizes_file (Optional[PathLike], default: None) – File name of the chromsizes file. Used to check whether the resized regions fall within the chromosome boundaries. If not provided, a registered genome object will be looked up instead.
  - target (str, default: 'mean') – Target value to extract from the bigWigs. One of 'mean', 'max', 'count', or 'logcount'.
  - target_region_width (Optional[int], default: None) – Width of the region from which the bigWig target value is extracted. If None, the consensus region width is used.
  - compress (bool, default: False) – Whether to compress the AnnData.X matrix. If True, the matrix is stored as a sparse matrix; if False, it is stored as a dense matrix.
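The following sketch shows how per-base bigWig signal could be reduced to one target value per region and arranged as a matrix with bigWigs as rows (.obs) and regions as columns (.var). The exact definitions used here are assumptions, not crested's documented behaviour: 'mean' and 'max' ignore NaNs, 'count' is the summed signal over the region, and 'logcount' is log1p of that sum. The function names are hypothetical. With compress=True the resulting matrix would instead be stored in a sparse format; which sparse class crested uses is not stated here.

```python
import numpy as np


def summarize_signal(values, target="mean"):
    """Reduce the per-base signal of one region to a single value.

    Assumed definitions (hypothetical): NaN-ignoring mean/max,
    count = summed signal, logcount = log1p(summed signal).
    """
    values = np.asarray(values, dtype=float)
    if target == "mean":
        return float(np.nanmean(values))
    if target == "max":
        return float(np.nanmax(values))
    if target == "count":
        return float(np.nansum(values))
    if target == "logcount":
        return float(np.log1p(np.nansum(values)))
    raise ValueError(f"Unsupported target: {target!r}")


def build_target_matrix(signals, target="mean"):
    """Build an (n_bigwigs, n_regions) matrix: rows are bigWigs, columns are regions."""
    return np.array(
        [[summarize_signal(region, target) for region in per_bigwig] for per_bigwig in signals]
    )


# Two hypothetical bigWigs, each with per-base signal for the same two regions.
signals = [
    [[0.0, 1.0, 3.0], [2.0, 2.0, 2.0]],
    [[0.0, 0.0, 0.0], [1.0, float("nan"), 5.0]],
]
X_max = build_target_matrix(signals, target="max")
```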
- Return type:
  AnnData
- Returns:
  AnnData object with bigWigs as rows and peaks as columns.
Example
>>> anndata = crested.import_bigwigs(
...     bigwigs_folder="path/to/bigwigs",
...     regions_file="path/to/peaks.bed",
...     chromsizes_file="path/to/chrom.sizes",
...     target="max",
...     target_region_width=500,
... )