crested.tl.data.AnnDataset
- class crested.tl.data.AnnDataset(anndata, genome, split=None, in_memory=True, random_reverse_complement=False, always_reverse_complement=False, max_stochastic_shift=0, deterministic_shift=False)
Dataset class for combining genome files and AnnData objects.
Called by the AnnDataModule class.
- Parameters:
  - anndata (AnnData) – AnnData object containing the data.
  - genome (Genome) – Genome instance.
  - split (Optional[str], default: None) – ‘train’, ‘val’, or ‘test’ split column in anndata.var.
  - in_memory (bool, default: True) – If True, the train and val sequences will be loaded into memory.
  - random_reverse_complement (bool, default: False) – If True, the sequences will be randomly reverse complemented during training.
  - always_reverse_complement (bool, default: False) – If True, all sequences will be augmented with their reverse complement during training.
  - max_stochastic_shift (int, default: 0) – Maximum stochastic shift (n base pairs) to apply randomly to each sequence during training.
  - deterministic_shift (bool, default: False) – If True, each region will be shifted twice with stride 50bp to each side. This is our legacy shifting; we recommend using max_stochastic_shift instead.
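
The augmentation parameters above can be illustrated with a small, self-contained sketch. It does not use or reproduce the library's internals: the function names are hypothetical, regions are treated as plain (start, end) integer pairs, and including the unshifted window in the deterministic case is an assumption rather than documented behavior.

```python
import random

# Complement table for DNA bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def maybe_reverse_complement(seq: str, rng: random.Random) -> str:
    """Hypothetical analogue of random_reverse_complement:
    flip the sequence with probability 0.5 (assumed)."""
    return reverse_complement(seq) if rng.random() < 0.5 else seq


def stochastic_shift_window(start: int, end: int, max_shift: int,
                            rng: random.Random) -> tuple[int, int]:
    """Hypothetical analogue of max_stochastic_shift:
    move the window by a random offset in [-max_shift, max_shift]."""
    shift = rng.randint(-max_shift, max_shift)
    return start + shift, end + shift


def deterministic_shift_windows(start: int, end: int, n_shifts: int = 2,
                                stride: int = 50) -> list[tuple[int, int]]:
    """Hypothetical analogue of deterministic_shift:
    n_shifts windows on each side, stride bp apart.
    Assumption: the unshifted window (offset 0) is also kept."""
    offsets = [k * stride for k in range(-n_shifts, n_shifts + 1)]
    return [(start + o, end + o) for o in offsets]


if __name__ == "__main__":
    rng = random.Random(0)
    print(reverse_complement("ACGTN"))  # NACGT
    print(maybe_reverse_complement("ACGTN", rng))
    print(stochastic_shift_window(1000, 1500, max_shift=3, rng=rng))
    print(deterministic_shift_windows(1000, 1500))
```

In this sketch the stochastic variant yields one randomly displaced window per draw, while the deterministic variant expands each region into a fixed set of windows, which is one way to see why the two options lead to different dataset sizes.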