crested.Genome#

class crested.Genome(fasta, chrom_sizes=None, annotation=None, name=None)#

A class that encapsulates information about a genome, including its FASTA sequence, its annotation, and chromosome sizes.

Adapted from kaizhang/SnapATAC2.

Parameters:

fasta (Path) – The path to the FASTA file.
chrom_sizes (dict[str, int] | Path | None (default: None)) – Either a path to a tab-delimited chromosome sizes file or a dictionary mapping chromosome names to their sizes. If not provided, the chromosome sizes are inferred from the FASTA file.
annotation (Path | None (default: None)) – The path to the annotation file.
name (str | None (default: None)) – Optional name of the genome.
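
As a rough illustration of the two forms that chrom_sizes can take, the sketch below reads a tab-delimited file of chromosome names and lengths into a dictionary, and alternatively counts bases per record in a FASTA file. The helper names `read_chrom_sizes` and `infer_chrom_sizes` are hypothetical and are not part of crested; how the library itself parses these files is an assumption not confirmed here.

```python
from pathlib import Path


def read_chrom_sizes(path):
    """Hypothetical helper: parse a tab-delimited chromosome sizes file."""
    sizes = {}
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        # First column is the chromosome name, second is its length.
        chrom, length = line.split("\t")[:2]
        sizes[chrom] = int(length)
    return sizes


def infer_chrom_sizes(fasta_path):
    """Hypothetical helper: count sequence characters per FASTA record."""
    sizes = {}
    chrom = None
    for line in Path(fasta_path).read_text().splitlines():
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            chrom = line[1:].split()[0]
            sizes[chrom] = 0
        elif chrom is not None:
            sizes[chrom] += len(line.strip())
    return sizes
```

Both helpers return the same kind of mapping as the `chrom_sizes` attribute: chromosome names as keys and lengths as integer values.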

Examples

>>> genome = Genome(
...     fasta="tests/data/test.fa",
...     chrom_sizes="tests/data/test.chrom.sizes",
... )
>>> print(genome.fasta)
<pysam.libcfaidx.FastaFile at 0x7f4d8b4a8f40>
>>> print(genome.chrom_sizes)
{'chr1': 1000, 'chr2': 2000}
>>> print(genome.name)
test

Attributes table#

`annotation`	The Path to the annotation file.
`chrom_sizes`	A dictionary with chromosome names as keys and their lengths as values.
`fasta`	The pysam FastaFile object for the FASTA file.
`name`	The name of the genome.

Methods table#

`fetch([chrom, start, end, strand, region])`	Fetch a sequence from a genomic region.

Attributes#

Genome.annotation#

The Path to the annotation file.

Currently not used in the package.

Returns: The path to the annotation file.

Genome.chrom_sizes#

A dictionary with chromosome names as keys and their lengths as values.

Returns: A dictionary of chromosome sizes.

Genome.fasta#

The pysam FastaFile object for the FASTA file.

Returns: The pysam FastaFile object.

Genome.name#

The name of the genome.

Returns: The name of the genome.

Methods#

Genome.fetch(chrom=None, start=None, end=None, strand='+', region=None)#

Fetch a sequence from a genomic region.

Start and end denote a 0-based, half-open interval, following the BED convention: start is included and end is excluded.
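
To make the coordinate convention concrete, here is a minimal sketch using plain Python string slicing, which follows the same 0-based, half-open rule. The toy sequence is made up for illustration and no crested or pysam calls are involved.

```python
sequence = "ACGTACGTAC"  # toy chromosome of 10 bases (illustrative only)

start, end = 2, 6  # 0-based start, exclusive end
subsequence = sequence[start:end]

# The interval covers positions 2, 3, 4 and 5, so its length is end - start.
assert len(subsequence) == end - start
print(subsequence)  # GTAC

# Adjacent half-open intervals tile a region without overlap or gaps.
assert sequence[0:2] + sequence[2:6] == sequence[0:6]
```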

Parameters:

chrom (str | None (default: None)) – The chromosome of the region to extract.
start (int | None (default: None)) – The start of the region to extract. Assumes 0-indexed positions.
end (int | None (default: None)) – The end of the region to extract, exclusive.
strand (str (default: '+')) – The strand of the region. If '-', the sequence is reverse-complemented.
region (str | None (default: None)) – Alternatively, a region string to parse. If supplied together with chrom/start/end, explicit coordinates take priority.

Return type:

str

Returns:

The requested sequence, as a string.
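
The behaviour described above can be sketched on an in-memory dictionary of sequences. This is not the crested implementation: `fetch_sketch` and `reverse_complement` are hypothetical names, the region string format `chrom:start-end` is an assumption, and the handling of partially specified coordinates is a guess.

```python
# Translation table for complementing DNA bases, including N and lowercase.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def fetch_sketch(sequences, chrom=None, start=None, end=None, strand="+", region=None):
    """Hypothetical stand-in for Genome.fetch, backed by a dict of strings."""
    if chrom is None or start is None or end is None:
        if region is None:
            raise ValueError("Provide either chrom/start/end or a region string.")
        # Assumed region format: "chrom:start-end".
        chrom, span = region.split(":")
        start, end = (int(value) for value in span.split("-"))
    # Otherwise the explicit coordinates are used and any region is ignored.
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'.")
    seq = sequences[chrom][start:end]  # 0-based, half-open
    return reverse_complement(seq) if strand == "-" else seq


toy_genome = {"chr1": "AAACCCGGGT"}  # made-up sequence for illustration
print(fetch_sketch(toy_genome, chrom="chr1", start=1, end=5))              # AACC
print(fetch_sketch(toy_genome, chrom="chr1", start=1, end=5, strand="-"))  # GGTT
print(fetch_sketch(toy_genome, region="chr1:1-5"))                         # AACC
```

When both a region string and a complete set of explicit coordinates are passed, the sketch uses the explicit coordinates, mirroring the priority rule stated for the region parameter.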
