crested.tl.enhancer_design_in_silico_evolution

crested.tl.enhancer_design_in_silico_evolution(n_mutations, target, model, n_sequences=1, return_intermediate=False, no_mutation_flanks=None, target_len=None, enhancer_optimizer=None, starting_sequences=None, acgt_distribution=None, **kwargs)

Create synthetic enhancers for a specified class using in silico evolution (ISE).
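As a rough illustration of the idea, the sketch below runs a greedy form of ISE: at each step it scores every single-nucleotide substitution and keeps the best one. It is not the library's implementation. `toy_score` (GC fraction) is a hypothetical stand-in for a model prediction, and `greedy_ise` is a name introduced here only for illustration.

```python
ALPHABET = "ACGT"


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a model prediction: the GC fraction."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def greedy_ise(seq: str, n_mutations: int, score_fn=toy_score):
    """Apply n_mutations greedy single-nucleotide substitutions to seq."""
    history = []
    for _ in range(n_mutations):
        # Enumerate every single-nucleotide substitution of the current sequence.
        candidates = [
            (pos, base, seq[:pos] + base + seq[pos + 1:])
            for pos in range(len(seq))
            for base in ALPHABET
            if base != seq[pos]
        ]
        # Keep the candidate with the highest score (first one wins on ties).
        pos, base, best = max(candidates, key=lambda c: score_fn(c[2]))
        history.append({"position": pos, "base": base, "score": score_fn(best)})
        seq = best
    return seq, history


designed, steps = greedy_ise("ATATATAT", n_mutations=3)
```

Here `steps` plays the role of the per-step record that `return_intermediate=True` is described as returning, although the real dictionary layout may differ.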

Parameters:
  • n_mutations (int) – Number of mutations to make in each sequence. 20 is a good starting point for most cases.

  • target (int | ndarray) – With the default weighted_difference optimization function, this should be the index of the target class to design enhancers for. The value is passed to the get_best function of the EnhancerOptimizer, so it can represent other target values too.

  • model (Model | list[Model]) – A (list of) trained keras model(s) to design enhancers with. If a list of models is provided, the predictions will be averaged across all models.

  • n_sequences (int (default: 1)) – Number of enhancers to design.

  • return_intermediate (bool (default: False)) – If True, also returns a dictionary per selected sequence with the predictions and changes made in intermediate steps.

  • no_mutation_flanks (Optional[tuple[int, int]] (default: None)) – A tuple of two integers giving the size of the region in each flank in which no mutations are made.

  • target_len (Optional[int] (default: None)) – Length of the area in the center of the sequence to make mutations in. Ignored if no_mutation_flanks is provided.

  • enhancer_optimizer (default: None) – The EnhancerOptimizer whose get_best function receives target and kwargs. If None, the default weighted_difference optimization function is used.

  • starting_sequences (default: None) – Sequences to start the evolution from. If None, random sequences are generated using acgt_distribution.

  • acgt_distribution (Optional[ndarray[float]] (default: None)) – An array of floats representing the distribution of A, C, G, and T in the genome (in that order). If the array is of shape (L, 4), it will be assumed to be per position. If it is of shape (4,), it will be assumed to be overall. If None, a uniform distribution will be used. This will be used to generate random sequences if starting_sequences is not provided. You can calculate these using calculate_nucleotide_distribution().

  • kwargs (dict[str, Any]) – Keyword arguments that will be passed to the get_best function of the EnhancerOptimizer
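The sketch below shows how the two accepted acgt_distribution shapes and the flank/length parameters described above could be interpreted. It is an illustration under stated assumptions, not the library's code: `sample_sequence` and `mutable_window` are hypothetical helpers, and plain Python lists stand in for arrays to keep the example dependency-free.

```python
import random


def sample_sequence(acgt_distribution, seq_len, rng):
    """Draw one random DNA sequence from a (4,) or (L, 4) A/C/G/T distribution."""
    rows = list(acgt_distribution)
    if len(rows) == 4 and not hasattr(rows[0], "__len__"):
        # Overall distribution: reuse the same weights at every position.
        rows = [rows] * seq_len
    if len(rows) != seq_len or any(len(row) != 4 for row in rows):
        raise ValueError("expected a distribution of shape (4,) or (seq_len, 4)")
    # Per-position distribution: one weighted draw per row.
    return "".join(rng.choices("ACGT", weights=row, k=1)[0] for row in rows)


def mutable_window(seq_len, no_mutation_flanks=None, target_len=None):
    """Return (start, end) of the region open to mutation."""
    if no_mutation_flanks is not None:
        # Flanks take precedence; target_len is ignored in this case.
        left, right = no_mutation_flanks
        return left, seq_len - right
    if target_len is not None:
        # Centre a window of target_len positions in the sequence.
        start = (seq_len - target_len) // 2
        return start, start + target_len
    return 0, seq_len


rng = random.Random(0)
start_seq = sample_sequence([0.3, 0.2, 0.2, 0.3], 500, rng)
window = mutable_window(500, target_len=200)
```

With a 500 bp sequence and target_len=200, the window covers positions 150 to 350 (end exclusive). Passing no_mutation_flanks=(100, 50) instead would leave positions 100 to 450 open to mutation.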

Return type:

list | tuple[list[dict], list]

Returns:

A list of designed sequences. If return_intermediate is True, a tuple is returned instead: first a list of dictionaries with the intermediate mutations and predictions, then the list of designed sequences.

Examples

>>> acgt_distribution = crested.utils.calculate_nucleotide_distribution(
...     my_anndata, genome, per_position=True
... )  # shape (L, 4)
>>> # obs_names is a pandas Index, so convert it to a list before calling .index()
>>> target_idx = list(my_anndata.obs_names).index("my_celltype")
>>> (
...     intermediate_results,
...     designed_sequences,
... ) = crested.tl.enhancer_design_in_silico_evolution(
...     n_mutations=20,
...     target=target_idx,
...     model=my_trained_model,
...     n_sequences=1,
...     return_intermediate=True,
...     acgt_distribution=acgt_distribution,
... )