crested.tl.enhancer_design_motif_insertion
- crested.tl.enhancer_design_motif_insertion(patterns, model, target, n_sequences=1, insertions_per_pattern=None, return_intermediate=False, no_mutation_flanks=None, target_len=None, preserve_inserted_motifs=True, enhancer_optimizer=None, starting_sequences=None, acgt_distribution=None, **kwargs)
Create synthetic enhancers using motif insertions.
- Parameters:
  - patterns (dict) – Dictionary of patterns to be implemented, in the form {'pattern_name': 'pattern_sequence'}.
  - model (Model | list[Model]) – A (list of) trained keras model(s) to design enhancers with. If a list of models is provided, the predictions will be averaged across all models.
  - target (int | ndarray) – Using the default weighted_difference optimization function, this should be the index of the target class to design enhancers for. This gets passed to the get_best function of the EnhancerOptimizer, so it can represent other target values too.
  - n_sequences (int (default: 1)) – Number of enhancers to design.
  - insertions_per_pattern (Optional[dict] (default: None)) – Dictionary of number of patterns to be implemented, in the form {'pattern_name': number_of_insertions}. If not provided, each pattern is inserted once.
  - return_intermediate (bool (default: False)) – If True, returns a dictionary with predictions and changes made in intermediate steps.
  - no_mutation_flanks (Optional[tuple[int, int]] (default: None)) – A tuple specifying regions in each flank where no modifications should occur.
  - target_len (Optional[int] (default: None)) – Length of the area in the center of the sequence to make insertions; ignored if no_mutation_flanks is set.
  - preserve_inserted_motifs (bool (default: True)) – If True, prevents motifs from being inserted on top of previously inserted motifs.
  - enhancer_optimizer (Optional[EnhancerOptimizer] (default: None)) – An instance of EnhancerOptimizer, defining how sequences should be optimized. If None, a default EnhancerOptimizer will be initialized using _weighted_difference as optimization function.
  - starting_sequences (Union[str, list, None] (default: None)) – An optional DNA sequence or a list of DNA sequences that will be used instead of randomly generated sequences. If provided, n_sequences is ignored.
  - acgt_distribution (Optional[ndarray[float]] (default: None)) – An array of floats representing the distribution of A, C, G, and T in the genome (in that order). If the array is of shape (L, 4), it will be assumed to be per position. If it is of shape (4,), it will be assumed to be overall. If None, a uniform distribution will be used. This will be used to generate random sequences if starting_sequences is not provided. You can calculate these using calculate_nucleotide_distribution().
  - kwargs (dict[str, Any]) – Additional arguments passed to the get_best function of EnhancerOptimizer.
- Return type:
- Returns:
A list of designed sequences, and if return_intermediate=True, a list of intermediate results.
Examples
>>> acgt_distribution = crested.utils.calculate_nucleotide_distribution(
...     my_anndata, genome, per_position=True
... )  # shape (L, 4)
>>> target_idx = my_anndata.obs_names.get_loc("my_celltype")
>>> my_motifs = {
...     "motif1": "ACGTTTGA",
...     "motif2": "TGCA",
... }
>>> (
...     intermediate_results,
...     designed_sequences,
... ) = crested.tl.enhancer_design_motif_insertion(
...     patterns=my_motifs,
...     target=target_idx,
...     model=my_trained_model,
...     n_sequences=1,
...     return_intermediate=True,
...     acgt_distribution=acgt_distribution,
... )
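To show how the two accepted shapes of acgt_distribution could translate into random starting sequences, here is a minimal standard-library sketch. It is an illustration only, not the library's code: the function name `random_sequence` is hypothetical, and plain Python lists stand in for the NumPy arrays the real function expects.

```python
import random


def random_sequence(length, acgt_distribution=None, rng=None):
    """Draw a random DNA sequence of the given length.

    acgt_distribution may be:
      * None: uniform over A, C, G, T
      * 4 numbers: one overall distribution, reused at every position
      * `length` rows of 4 numbers: one distribution per position
    """
    rng = rng or random.Random()
    alphabet = "ACGT"

    if acgt_distribution is None:
        per_position = [[0.25, 0.25, 0.25, 0.25]] * length
    elif isinstance(acgt_distribution[0], (int, float)):
        # Overall distribution, analogous to shape (4,)
        per_position = [list(acgt_distribution)] * length
    else:
        # Per-position distribution, analogous to shape (L, 4)
        if len(acgt_distribution) != length:
            raise ValueError("per-position distribution must have one row per base")
        per_position = acgt_distribution

    return "".join(
        rng.choices(alphabet, weights=weights, k=1)[0] for weights in per_position
    )
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible, which is useful when comparing designs across runs.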