Embryo10x
The Embryo10x and EmbryoHydrop models are peak regression models trained on ATAC coverage from cell types that were captured using different technologies. They are intended to show how similar the results from these methods are.
Both models were trained using the same preprocessing steps and model architecture.
For preprocessing, the regions on chromosome 2R were split evenly in two, with one half used as the validation set and the other as the test set. The remaining chromosomes were used for training. Peak heights were normalized per chromosome to a target mean accessibility of 0.5. After normalization, z-scores of peak heights were calculated per region. For each cell type, the top 3000 regions with the highest z-scores were kept, and the accessibility values of all other regions were set to zero for that cell type.
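The z-score filtering step described above can be sketched in NumPy. This is only an illustration under stated assumptions, not the code used by the authors: the function name `keep_top_k_per_cell_type`, the `(n_cell_types, n_regions)` array layout, and the handling of zero-variance regions are all hypothetical choices, and the per-chromosome normalization is not covered.

```python
import numpy as np

def keep_top_k_per_cell_type(acc, k=3000):
    """Zero out all but the k highest z-scoring regions for each cell type.

    acc: array of shape (n_cell_types, n_regions) with normalized peak heights
         (this layout is an assumption made for the sketch).
    """
    acc = np.asarray(acc, dtype=float)

    # z-score each region (column) across cell types
    mean = acc.mean(axis=0, keepdims=True)
    std = acc.std(axis=0, keepdims=True)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero (assumed handling)
    z = (acc - mean) / std

    # indices of the k regions with the highest z-score, per cell type (row)
    k = min(k, acc.shape[1])
    top_idx = np.argpartition(-z, k - 1, axis=1)[:, :k]

    # keep the original accessibility for selected regions, zero elsewhere
    mask = np.zeros(acc.shape, dtype=bool)
    np.put_along_axis(mask, top_idx, True, axis=1)
    return np.where(mask, acc, 0.0)

# toy example: 2 cell types, 3 regions, keep 1 region per cell type
toy = np.array([[1.0, 5.0, 2.0],
                [3.0, 1.0, 2.0]])
print(keep_top_k_per_cell_type(toy, k=1))
```

Note that the ranking uses the z-scores while the retained values are the original accessibilities, so a region can be kept for one cell type and zeroed for another.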
For model training, CosineMSELoss (from crested.tl.losses) was used along with the default optimizer and metrics from the default CREsted peak regression configuration.
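To give an intuition for why a loss might combine a cosine term with an MSE term, here is a generic NumPy sketch. It is not the CREsted implementation: the exact formulation and weighting of CosineMSELoss are defined in crested.tl.losses and are not reproduced here. The function name `cosine_plus_mse` and the equal, fixed weighting of the two terms are assumptions made purely for illustration.

```python
import numpy as np

def cosine_plus_mse(y_true, y_pred, eps=1e-8):
    """Illustrative loss: mean squared error plus cosine distance, averaged over the batch.

    NOTE: equal weighting of the two terms is an assumption of this sketch,
    not the weighting used by crested.tl.losses.CosineMSELoss.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # magnitude term: penalizes differences in absolute accessibility
    mse = np.mean((y_true - y_pred) ** 2, axis=-1)

    # shape term: penalizes differences in the profile across cell types
    dot = np.sum(y_true * y_pred, axis=-1)
    norms = np.linalg.norm(y_true, axis=-1) * np.linalg.norm(y_pred, axis=-1)
    cosine_distance = 1.0 - dot / (norms + eps)

    return float(np.mean(mse + cosine_distance))

target = np.array([[1.0, 2.0, 3.0]])
# same profile shape but doubled magnitude: only the MSE term contributes
print(cosine_plus_mse(target, 2 * target))
```

In this sketch, a prediction with the right cell type profile but the wrong scale is penalized only through the MSE term, while a prediction with the right overall scale but the wrong profile is penalized through the cosine term as well.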
The model is a CNN multiclass regression model using the deeptopic_cnn() architecture with a softplus output activation and the following parameters: filters=500, conv_do=0.5.
Details of the data and the model can be found in the original publication.
Citation
Dickmanken, H., Wojno, M., Theunis, K., Eksi, E. C., Mahieu, L., Christiaens, V., Kempynck, N., De Rop, F., Roels, N., Spanier, K. I., Vandepoel, R., Hulselmans, G., Poovathingal, S., Aerts, S. HyDrop v2: Scalable atlas construction for training sequence-to-function models. bioRxiv doi: 10.1101/2025.04.02.646792
Usage
import crested
import keras

# download the model and the names of its output classes
model_path, output_names = crested.get_model("Embryo10x")

# load the model
model = keras.models.load_model(model_path)

# make predictions on a placeholder 500-nucleotide sequence
sequence = "A" * 500
predictions = crested.tl.predict(sequence, model)
print(predictions.shape)
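
To interpret the output, the predicted values can be paired with the class names returned alongside the model path. The sketch below assumes that the predictions have shape (n_sequences, n_classes) and that `output_names` lists the classes in the same order; the helper `rank_cell_types` is hypothetical and not part of CREsted. Stand-in values are used so the example runs without downloading the model.

```python
import numpy as np

def rank_cell_types(predictions, output_names, top_n=3):
    """Return the top_n (name, value) pairs for the first sequence in predictions."""
    predictions = np.asarray(predictions, dtype=float)
    if predictions.shape[-1] != len(output_names):
        raise ValueError("number of predicted values does not match number of names")

    # take the first sequence and sort its values from high to low
    row = predictions.reshape(-1, len(output_names))[0]
    order = np.argsort(row)[::-1][:top_n]
    return [(output_names[i], float(row[i])) for i in order]

# stand-in data; in practice use the predictions and output_names from the model
example_names = ["cell_type_a", "cell_type_b", "cell_type_c"]
example_predictions = np.array([[0.1, 0.7, 0.3]])
print(rank_cell_types(example_predictions, example_names, top_n=2))
```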