crested.Genome#

class crested.Genome(fasta, chrom_sizes=None, annotation=None, name=None)#

A class that encapsulates information about a genome, including its FASTA sequence, its annotation, and chromosome sizes.

Adapted from kaizhang/SnapATAC2.

Parameters:
  • fasta (Path) – The path to the FASTA file.

  • chrom_sizes (Union[dict[str, int], Path, None] (default: None)) – A path to a tab-delimited chromosome sizes file, or a dictionary mapping chromosome names to sizes. If not provided, the chromosome sizes are inferred from the FASTA file.

  • annotation (Optional[Path] (default: None)) – The path to the annotation file.

  • name (Optional[str] (default: None)) – Optional name of the genome.

Examples

>>> from crested import Genome
>>> genome = Genome(
...     fasta="tests/data/test.fa",
...     chrom_sizes="tests/data/test.chrom.sizes",
... )
>>> print(genome.fasta)
<pysam.libcfaidx.FastaFile at 0x7f4d8b4a8f40>
>>> print(genome.chrom_sizes)
{'chr1': 1000, 'chr2': 2000}
>>> print(genome.name)
test
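
The two sources of chromosome sizes described for chrom_sizes can be sketched in plain Python. This is only an illustration, not the package's implementation: the helper names read_chrom_sizes and infer_chrom_sizes are hypothetical, and the inputs are in-memory strings rather than real files.

```python
import io


def read_chrom_sizes(handle):
    """Parse tab-delimited 'name<TAB>size' lines into a dict (hypothetical helper)."""
    sizes = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, size = line.split("\t")[:2]
        sizes[name] = int(size)
    return sizes


def infer_chrom_sizes(handle):
    """Infer sizes by summing sequence line lengths under each FASTA header
    (hypothetical helper; an indexed FASTA reader would normally provide this)."""
    sizes = {}
    current = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # The record name is the first whitespace-separated token of the header.
            current = line[1:].split()[0]
            sizes[current] = 0
        elif current is not None:
            sizes[current] += len(line)
    return sizes


# Toy inputs, made up for illustration.
chrom_sizes_text = "chr1\t1000\nchr2\t2000\n"
fasta_text = ">chr1 test\nACGT\nAC\n>chr2\nGGG\n"

from_file = read_chrom_sizes(io.StringIO(chrom_sizes_text))
from_fasta = infer_chrom_sizes(io.StringIO(fasta_text))
```

Either route ends in the same shape as the chrom_sizes attribute: a dictionary with chromosome names as keys and lengths as values.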

Attributes table#

annotation

The Path to the annotation file.

chrom_sizes

A dictionary with chromosome names as keys and their lengths as values.

fasta

The pysam FastaFile object for the FASTA file.

name

The name of the genome.

Methods table#

fetch([chrom, start, end, strand, region])

Fetch a sequence from a genomic region.

Attributes#

Genome.annotation#

The Path to the annotation file.

Currently not used in the package.

Returns:

The path to the annotation file.

Genome.chrom_sizes#

A dictionary with chromosome names as keys and their lengths as values.

Returns:

A dictionary of chromosome sizes.

Genome.fasta#

The pysam FastaFile object for the FASTA file.

Returns:

The pysam FastaFile object.

Genome.name#

The name of the genome.

Returns:

The name of the genome.

Methods#

Genome.fetch(chrom=None, start=None, end=None, strand='+', region=None)#

Fetch a sequence from a genomic region.

Start and end denote 0-based, half-open intervals, following the BED convention.

Parameters:
  • chrom (default: None) – The chromosome of the region to extract.

  • start (default: None) – The start of the region to extract. Assumes 0-indexed positions.

  • end (default: None) – The end of the region to extract, exclusive.

  • strand (default: '+') – The strand of the region. If '-', the sequence is reverse-complemented.

  • region (default: None) – Alternatively, a region string to parse. If supplied together with chrom/start/end, explicit coordinates take priority.

Return type:

str

Returns:

The requested sequence, as a string.
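
The coordinate and strand conventions described for fetch can be illustrated on a plain string. This is a minimal sketch and does not call the package: fetch_from_string is a hypothetical stand-in that assumes the chromosome sequence is already in memory.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def fetch_from_string(sequence, start, end, strand="+"):
    """Return sequence[start:end] (0-based, half-open), reverse-complemented
    if strand is '-'. Hypothetical helper for illustration only."""
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    subseq = sequence[start:end]  # Python slices are already 0-based, half-open
    if strand == "-":
        subseq = subseq.translate(_COMPLEMENT)[::-1]
    return subseq


chrom = "AAACCCGGGT"  # toy 10-base chromosome
plus = fetch_from_string(chrom, 2, 6)                # positions 2, 3, 4, 5
minus = fetch_from_string(chrom, 2, 6, strand="-")   # same span, other strand
```

Because the interval is half-open, the returned sequence has length end - start, and adjacent regions such as (0, 5) and (5, 10) neither overlap nor leave a gap.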