crested.tl.enhancer_design_in_silico_evolution
- crested.tl.enhancer_design_in_silico_evolution(n_mutations, target, model, n_sequences=1, return_intermediate=False, no_mutation_flanks=None, target_len=None, enhancer_optimizer=None, starting_sequences=None, acgt_distribution=None, **kwargs)
Create synthetic enhancers for a specified class using in silico evolution (ISE).
- Parameters:
  - n_mutations (int) – Number of mutations to make in each sequence. 20 is a good starting point for most cases.
  - target (int | ndarray) – With the default weighted_difference optimization function, this should be the index of the target class to design enhancers for. It is passed to the get_best function of the EnhancerOptimizer, so it can represent other target values too.
  - model (Model | list[Model]) – A trained keras model, or a list of trained keras models, to design enhancers with. If a list of models is provided, the predictions are averaged across all models.
  - n_sequences (int (default: 1)) – Number of enhancers to design.
  - return_intermediate (bool (default: False)) – If True, returns a dictionary with predictions and changes made in intermediate steps for selected sequences.
  - no_mutation_flanks (Optional[tuple[int, int]] (default: None)) – A tuple of two integers giving the size of the region in each flank where no mutations are made.
  - target_len (Optional[int] (default: None)) – Length of the area in the center of the sequence to make mutations in. Ignored if no_mutation_flanks is provided.
  - acgt_distribution (Optional[ndarray[float]] (default: None)) – An array of floats representing the distribution of A, C, G, and T in the genome (in that order). If the array has shape (L, 4), it is assumed to be per position. If it has shape (4,), it is assumed to be overall. If None, a uniform distribution is used. The distribution is used to generate random sequences if starting_sequences is not provided. You can calculate it using calculate_nucleotide_distribution().
  - kwargs (dict[str, Any]) – Keyword arguments that are passed to the get_best function of the EnhancerOptimizer.
- Return type:
- Returns:
A list of designed sequences. If return_intermediate is True, a list of dictionaries of intermediate mutations and predictions is returned as well.
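To make the idea of in silico evolution concrete, here is a minimal, self-contained sketch of a greedy search: at each step every single-nucleotide substitution inside the mutable window is scored, and the best one is kept. This is an illustration under stated assumptions, not CREsted's implementation. The names `evolve`, `toy_score`, and `TOY_TARGET` are hypothetical, and the toy scoring function stands in for the model predictions and the EnhancerOptimizer's get_best step.

```python
ALPHABET = "ACGT"
# Hypothetical "ideal" sequence, used only by the toy scoring function below.
TOY_TARGET = "ACGTACGTAC"


def toy_score(seq):
    """Stand-in for a model prediction: number of positions matching TOY_TARGET."""
    return float(sum(a == b for a, b in zip(seq, TOY_TARGET)))


def evolve(seq, n_mutations, score_fn, no_mutation_flanks=(0, 0)):
    """Greedy in silico evolution (assumed scheme): keep the best single substitution per step."""
    bases = list(seq)
    start = no_mutation_flanks[0]
    end = len(bases) - no_mutation_flanks[1]
    if start >= end:
        raise ValueError("no_mutation_flanks leave no mutable positions")

    history = [score_fn(seq)]  # score before any mutation
    for _ in range(n_mutations):
        best = None  # (score, position, base) of the best candidate so far
        for pos in range(start, end):
            original = bases[pos]
            for base in ALPHABET:
                if base == original:
                    continue
                bases[pos] = base
                score = score_fn("".join(bases))
                if best is None or score > best[0]:
                    best = (score, pos, base)
            bases[pos] = original  # undo the trial substitution
        score, pos, base = best
        bases[pos] = base  # commit the best substitution of this step
        history.append(score)
    return "".join(bases), history


designed, history = evolve("A" * 10, 3, toy_score, no_mutation_flanks=(1, 1))
print(designed, history)
```

The `history` list plays the role that the intermediate results play when return_intermediate is True: it records how the score changes after each accepted mutation. The first and last positions are never touched because they lie in the protected flanks.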
Examples
>>> acgt_distribution = crested.utils.calculate_nucleotide_distribution(
...     my_anndata, genome, per_position=True
... )  # shape (L, 4)
>>> # obs_names is a pandas Index, which has no .index() method, so convert it to a list first
>>> target_idx = list(my_anndata.obs_names).index("my_celltype")
>>> (
...     intermediate_results,
...     designed_sequences,
... ) = crested.tl.enhancer_design_in_silico_evolution(
...     n_mutations=20,
...     target=target_idx,
...     model=my_trained_model,
...     n_sequences=1,
...     return_intermediate=True,
...     acgt_distribution=acgt_distribution,
... )
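The two accepted shapes of acgt_distribution can be illustrated with a small sketch of how random starting sequences might be drawn from it. This is an assumption about the behaviour described in the parameter list, not the library's code. The helper `sample_random_sequences` is hypothetical, and it uses plain Python lists and the standard library so that it runs without extra dependencies (NumPy arrays of shape (4,) or (L, 4) work as input too).

```python
import random


def sample_random_sequences(n_sequences, seq_len, acgt_distribution=None, seed=0):
    """Draw random DNA sequences from an overall (4,) or per-position (L, 4) distribution."""
    rng = random.Random(seed)

    if acgt_distribution is None:
        # No distribution given: fall back to uniform A/C/G/T at every position.
        per_position = [[0.25] * 4] * seq_len
    else:
        dist = list(acgt_distribution)
        if len(dist) == 4 and not hasattr(dist[0], "__len__"):
            # Shape (4,): one overall distribution, reused at every position.
            per_position = [dist] * seq_len
        else:
            # Shape (L, 4): one distribution per position.
            per_position = [list(row) for row in dist]
            if len(per_position) != seq_len or any(len(r) != 4 for r in per_position):
                raise ValueError("expected shape (4,) or (seq_len, 4)")

    sequences = []
    for _ in range(n_sequences):
        # Column order is A, C, G, T, matching the documented convention.
        chars = [rng.choices("ACGT", weights=weights)[0] for weights in per_position]
        sequences.append("".join(chars))
    return sequences


print(sample_random_sequences(2, 12, [0.3, 0.2, 0.2, 0.3]))
```

A per-position distribution lets the random starting sequences mimic positional nucleotide biases in the training regions, while the overall form only preserves the global base composition.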