crested.tl.data.AnnDataset


class crested.tl.data.AnnDataset(anndata, genome, split=None, in_memory=True, random_reverse_complement=False, always_reverse_complement=False, max_stochastic_shift=0, deterministic_shift=False)

Dataset class for combining genome files and AnnData objects.

Called by the AnnDataModule class.

Parameters:
  • anndata (AnnData) – AnnData object containing the data.

  • genome (Genome) – Genome instance.

  • split (Optional[str] (default: None)) – Split to use: ‘train’, ‘val’, or ‘test’, as given in the split column in anndata.var.

  • in_memory (bool (default: True)) – If True, the train and val sequences will be loaded into memory.

  • random_reverse_complement (bool (default: False)) – If True, the sequences will be randomly reverse complemented during training.

  • always_reverse_complement (bool (default: False)) – If True, all sequences will be augmented with their reverse complement during training.

  • max_stochastic_shift (int (default: 0)) – Maximum stochastic shift (n base pairs) to apply randomly to each sequence during training.

  • deterministic_shift (bool (default: False)) – If True, each region will be shifted twice with a stride of 50 bp to each side. This is our legacy shifting; we recommend using max_stochastic_shift instead.
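
The augmentation parameters above can be illustrated with a minimal, library-independent sketch. It is not the AnnDataset implementation. The helper names (reverse_complement, maybe_reverse_complement, stochastic_shift_window, deterministic_shift_windows) are hypothetical, regions are assumed to be simple (start, end) integer pairs, and whether the library keeps the unshifted region alongside the deterministic shifts is an assumption made here for illustration.

```python
import random

# Complement table for DNA bases; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def maybe_reverse_complement(seq: str, rng: random.Random, p: float = 0.5) -> str:
    """Reverse complement the sequence with probability p.

    Illustrates the idea behind random_reverse_complement; the
    probability used by the library is not stated in this page,
    so p=0.5 is an assumption.
    """
    return reverse_complement(seq) if rng.random() < p else seq


def stochastic_shift_window(start: int, end: int, max_shift: int, rng: random.Random):
    """Shift a region by a random offset drawn from [-max_shift, max_shift].

    Illustrates max_stochastic_shift: the window keeps its length,
    only its position changes.
    """
    offset = rng.randint(-max_shift, max_shift) if max_shift > 0 else 0
    return start + offset, end + offset


def deterministic_shift_windows(start: int, end: int, n_shifts: int = 2, stride: int = 50):
    """Shift a region n_shifts times by stride bp to each side.

    Illustrates deterministic_shift (two shifts, 50 bp stride).
    Including the unshifted region (k == 0) is an assumption.
    """
    return [
        (start + k * stride, end + k * stride)
        for k in range(-n_shifts, n_shifts + 1)
    ]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is reproducible
    print(reverse_complement("ACGTN"))
    print(maybe_reverse_complement("ACGTN", rng))
    print(stochastic_shift_window(1000, 1500, max_shift=3, rng=rng))
    print(deterministic_shift_windows(1000, 1500))
```

In this sketch the stochastic shift yields one randomly placed window per call, whereas the deterministic shift yields a fixed set of windows per region.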

Methods table

Methods