crested.get_dataset

crested.get_dataset(dataset)

Fetch an example dataset. This function retrieves a dataset of bigWig or BED files together with its associated regions file, downloading them if they are not already cached, and returns the paths to both.

Provided examples:

- 'mouse_cortex_bed': the BICCN mouse cortex snATAC-seq dataset, processed as BED files per topic. For use in topic classification.
- 'mouse_cortex_bigwig_coverage': the BICCN mouse cortex snATAC-seq dataset, processed as pseudobulked bigWig coverage tracks per cell type. For use in peak regression.
- 'mouse_cortex_bigwig_cut_sites': the BICCN mouse cortex snATAC-seq dataset, processed as pseudobulked bigWig cut site tracks per cell type. For use in peak regression.

The two returned paths (the data directory and the regions file) can be passed to crested.import_bigwigs() or crested.import_beds().

Note

The cache location can be changed by setting the environment variable $CRESTED_DATA_DIR.
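As a minimal sketch of the note above, the variable can be set from Python before the dataset is fetched. The helper name use_cache_dir and the directory name crested_data are hypothetical, chosen for illustration. Whether crested reads the variable at import time or at call time is not stated here, so the sketch assumes the safe order: set it first, import crested afterwards.

```python
import os
import tempfile
from pathlib import Path


def use_cache_dir(cache_dir):
    """Point CRESTED_DATA_DIR at cache_dir (created if missing) and return it.

    Hypothetical helper for illustration; it is not part of crested.
    """
    path = Path(cache_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    os.environ["CRESTED_DATA_DIR"] = str(path)
    return path


# Example location only; any writable directory would do.
cache = use_cache_dir(Path(tempfile.gettempdir()) / "crested_data")

# With crested installed, the fetch would then follow (downloads on first use):
# import crested
# beds_folder, regions_file = crested.get_dataset("mouse_cortex_bed")
```

In a shell, exporting CRESTED_DATA_DIR before starting Python should have the same effect.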

Parameters:

dataset (str) – The name of the dataset to fetch. Options:

- 'mouse_cortex_bed'
- 'mouse_cortex_bigwig_cut_sites'
- 'mouse_cortex_bigwig_coverage'
- 'mouse_cortex_bigwig' (deprecated, same as 'mouse_cortex_bigwig_coverage')
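Because one option is a deprecated alias, calling code may want to normalize names before passing them on. The following is a sketch only: canonical_dataset_name and both constants are hypothetical helpers built from the option list above, and they say nothing about how crested resolves names internally.

```python
# Names taken from the option list above. These constants and the helper are
# illustrative and are not part of crested.
DATASETS = {
    "mouse_cortex_bed",
    "mouse_cortex_bigwig_cut_sites",
    "mouse_cortex_bigwig_coverage",
}
DEPRECATED_ALIASES = {"mouse_cortex_bigwig": "mouse_cortex_bigwig_coverage"}


def canonical_dataset_name(name):
    """Map a deprecated alias to its current name and reject unknown names."""
    name = DEPRECATED_ALIASES.get(name, name)
    if name not in DATASETS:
        raise ValueError(
            f"Unknown dataset {name!r}; expected one of {sorted(DATASETS)}"
        )
    return name
```

The result of canonical_dataset_name("mouse_cortex_bigwig") could then be passed to crested.get_dataset() in place of the deprecated name.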

Returns:

A tuple of two paths: the directory containing the BED or bigWig files, and the consensus regions file.

Example

>>> beds_folder, regions_file = crested.get_dataset("mouse_cortex_bed")
>>> adata = crested.import_beds(beds_folder=beds_folder, regions_file=regions_file)
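The bigWig datasets follow the same pattern, with crested.import_bigwigs() in place of crested.import_beds(). The sketch below picks the import function from the dataset name. The helpers importer_name and load_example are hypothetical, and the code assumes both import functions accept the data directory and the regions file as their first two positional arguments. The crested import is deferred so that the name logic can be checked without the package installed.

```python
def importer_name(dataset):
    """Return the name of the crested import function matching a dataset name.

    Hypothetical helper: BED datasets end in "_bed"; all other listed
    datasets are bigWig tracks.
    """
    return "import_beds" if dataset.endswith("_bed") else "import_bigwigs"


def load_example(dataset):
    """Fetch an example dataset and import it with the matching function."""
    import crested  # deferred: requires crested to be installed

    folder, regions_file = crested.get_dataset(dataset)
    importer = getattr(crested, importer_name(dataset))
    # Assumed positional order: data directory first, regions file second.
    return importer(folder, regions_file)
```

With crested installed, load_example("mouse_cortex_bigwig_cut_sites") would download the dataset on first use and return the imported object.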