crested.tl.score_gene_locus

crested.tl.score_gene_locus(chr_name, gene_start, gene_end, target_idx, model, genome=None, strand='+', upstream=50000, downstream=10000, central_size=1000, step_size=50, **kwargs)

Score regions upstream and downstream of a gene locus using the model’s prediction.

The model predicts a value for the central central_size bases of each window.
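
The window geometry described above can be sketched in plain Python. This is an illustrative reading of the docstring, not the library's implementation: the helper name central_regions, its region_start and region_end arguments, and the assumption that the central region sits symmetrically inside each window are all hypothetical. Only central_size and step_size come from the signature.

```python
def central_regions(region_start, region_end, window_size,
                    central_size=1000, step_size=50):
    """Yield (window_start, central_start, central_end) per sliding window.

    Hypothetical helper for illustration only; not part of crested.
    Assumes the central region is centred within each window.
    """
    if central_size > window_size:
        raise ValueError("central_size must not exceed window_size")
    # Number of flanking bases before the central region starts.
    offset = (window_size - central_size) // 2
    # Slide full-length windows across the region in steps of step_size.
    last_start = region_end - window_size
    for window_start in range(region_start, last_start + 1, step_size):
        central_start = window_start + offset
        yield window_start, central_start, central_start + central_size


# Example values are made up: a 2500 bp region and a 2114 bp model input.
windows = list(central_regions(0, 2500, window_size=2114))
```

Because step_size is smaller than central_size by default, consecutive central regions overlap, so a given base can fall inside the central region of several windows.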

Parameters:
  • chr_name – The chromosome name.

  • gene_start (int) – The start position of the gene locus (TSS for + strand).

  • gene_end (int) – The end position of the gene locus (TSS for - strand).

  • target_idx (int) – Index of the target class to score. You can usually get this from running list(anndata.obs_names).index(class_name).

  • model (Model | list[Model]) – A (list of) trained keras model(s) to make predictions with.

  • genome (Union[Genome, PathLike, None] (default: None)) – Genome or path to the genome file. Required if no genome is registered.

  • strand (str (default: '+')) – Strand of the gene: ‘+’ for the positive strand, ‘-’ for the negative strand.

  • upstream (int (default: 50000)) – Distance upstream of the gene to score.

  • downstream (int (default: 10000)) – Distance downstream of the gene to score.

  • central_size (int (default: 1000)) – Size of the central region that the model predicts for.

  • step_size (int (default: 50)) – Distance between consecutive windows.

  • **kwargs – Additional keyword arguments to pass to the keras.Model.predict method.
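
As a small illustration of the target_idx lookup mentioned above, the sketch below uses a plain list as a stand-in for list(anndata.obs_names). The class names and the helper get_target_idx are hypothetical and only serve the example.

```python
def get_target_idx(class_names, class_name):
    """Return the position of class_name in class_names.

    Hypothetical helper; mirrors list(anndata.obs_names).index(class_name)
    but raises a more descriptive error when the class is missing.
    """
    names = list(class_names)
    try:
        return names.index(class_name)
    except ValueError:
        raise ValueError(
            f"{class_name!r} not found; available classes: {names}"
        ) from None


# Stand-in for list(anndata.obs_names); these names are made up.
class_names = ["Astro", "Micro", "Oligo"]
target_idx = get_target_idx(class_names, "Micro")
```

The resulting integer is what would be passed as target_idx.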

Return type:

tuple[ndarray, ndarray, int, int, int]

Returns:

scores

An array of prediction scores across the entire genomic range.

coordinates

An array of tuples, each containing the chromosome name and the start and end positions of the sequence for each window.

min_loc

Start position of the entire scored region.

max_loc

End position of the entire scored region.

tss_position

The transcription start site (TSS) position.
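
One way the returned values might be used is to locate the highest-scoring position relative to the TSS. The sketch below is an assumption-laden example: strongest_position is a hypothetical helper, and it assumes that scores[i] corresponds to genomic position min_loc + i, which the description above suggests but does not state explicitly. The toy scores are invented.

```python
def strongest_position(scores, min_loc, tss_position, strand="+"):
    """Return (genomic_position, signed_offset) of the highest score.

    Hypothetical helper; not part of crested.
    Assumes scores[i] belongs to genomic position min_loc + i.
    The offset is positive downstream of the TSS in the direction of
    transcription and negative upstream of it.
    """
    if len(scores) == 0:
        raise ValueError("scores is empty")
    # Index of the maximum; works for lists and NumPy arrays alike.
    best_idx = max(range(len(scores)), key=lambda i: scores[i])
    position = min_loc + best_idx
    offset = position - tss_position
    if strand == "-":
        # On the negative strand, downstream means smaller coordinates.
        offset = -offset
    return position, offset


# In practice these inputs would come from the function's return tuple:
# scores, coordinates, min_loc, max_loc, tss_position = crested.tl.score_gene_locus(...)
toy_scores = [0.1, 0.4, 0.9, 0.3, 0.2]
position, offset = strongest_position(toy_scores, min_loc=1000, tss_position=1004)
```

With these toy values the peak lies upstream of the TSS on the positive strand, so the offset is negative.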