crested.tl.modisco.create_tf_ct_matrix

crested.tl.modisco.create_tf_ct_matrix(pattern_tf_dict, all_patterns, df, classes, log_transform=True, normalize_pattern_importances=False, normalize_gex=False, min_tf_gex=0, importance_threshold=0, pattern_parameter='seqlet_count', filter_correlation=False, zscore_threshold=2, correlation_threshold=0.2, verbose=False)

Create a tensor (matrix) of transcription factor (TF) expression and cell type contributions.

Parameters:
  • pattern_tf_dict (dict) – A dictionary with pattern indices and their TFs. See crested.tl.modisco.create_pattern_tf_dict.

  • all_patterns (dict) – A dictionary of patterns with metadata. See crested.tl.modisco.process_patterns.

  • df (DataFrame) – A DataFrame containing gene expression data. See crested.tl.modisco.calculate_mean_expression_per_cell_type.

  • classes (list[str]) – A list of cell type classes.

  • log_transform (bool (default: True)) – Whether to apply log transformation to the gene expression values. Default is True.

  • normalize_pattern_importances (bool (default: False)) – Whether to normalize the contribution scores across the cell types. Default is False.

  • normalize_gex (bool (default: False)) – Whether to normalize gene expression across the cell types. Default is False.

  • min_tf_gex (float (default: 0)) – The minimum gene expression (GEX) value required to select a TF as a potential candidate. Default is 0.

  • importance_threshold (float (default: 0)) – The minimum pattern importance value. Default is 0.

  • pattern_parameter (str (default: 'seqlet_count')) – The parameter used to indicate a pattern’s importance: the average contribution score (‘contrib’), the number of pattern instances (‘seqlet_count’, default), or the log of that count (‘seqlet_count_log’).

  • filter_correlation (bool (default: False)) – Whether to filter based on Pearson correlation between tf_gex and ct_contribs. Default is False.

  • zscore_threshold (float (default: 2)) – Z-score used for filtering TF candidates. If the maximum z-score over the cell types is below this threshold, the TF is discarded. Default is 2.

  • correlation_threshold (float (default: 0.2)) – Minimum Pearson correlation between expression and contribution profile required to keep a column if filtering is enabled. Default is 0.2.

  • verbose (bool (default: False)) – Whether to print intermediate debugging steps.
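
The two filtering parameters, zscore_threshold and correlation_threshold, can be illustrated with plain NumPy. This is a minimal sketch and not CREsted's implementation: the helper names passes_zscore and passes_correlation, the toy profiles, and the choice to z-score the expression profile across cell types are all assumptions made for illustration.

```python
import numpy as np


def passes_zscore(profile, zscore_threshold=2.0):
    """Hypothetical helper: keep a TF if its maximum z-score across
    cell types reaches the threshold."""
    profile = np.asarray(profile, dtype=float)
    std = profile.std()
    if std == 0:
        # A flat profile has no cell-type specificity to score.
        return False
    z = (profile - profile.mean()) / std
    return bool(z.max() >= zscore_threshold)


def passes_correlation(tf_gex, ct_contribs, correlation_threshold=0.2):
    """Hypothetical helper: keep a column if the Pearson correlation between
    the expression profile and the contribution profile reaches the threshold."""
    tf_gex = np.asarray(tf_gex, dtype=float)
    ct_contribs = np.asarray(ct_contribs, dtype=float)
    if tf_gex.std() == 0 or ct_contribs.std() == 0:
        # Pearson correlation is undefined for a constant profile.
        return False
    r = np.corrcoef(tf_gex, ct_contribs)[0, 1]
    return bool(r >= correlation_threshold)


# Toy profiles over eight cell types (made-up values).
specific_tf = np.array([0.1, 0.2, 0.1, 0.15, 0.1, 0.2, 0.1, 5.0])
broad_tf = np.array([1.0, 1.2, 0.9, 1.1, 1.0, 1.05, 0.95, 1.1])
contribs = np.array([0.0, 0.1, 0.0, 0.0, 0.1, 0.0, 0.0, 3.0])

print(passes_zscore(specific_tf))                 # True: one cell type stands out
print(passes_zscore(broad_tf))                    # False: expression is roughly uniform
print(passes_correlation(specific_tf, contribs))  # True: both peak in the same cell type
```

In this sketch a TF expressed in a single cell type clears the default z-score threshold of 2, while a broadly expressed TF does not. The correlation check is only meaningful when filter_correlation is enabled.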

Return type:

tuple[ndarray, list[str]]

Returns:

A tuple containing the TF-cell type matrix and the list of TF pattern annotations.
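
As a rough sketch of how the returned tuple might be consumed, the example below pairs each annotation with the cell type where its value is highest. The array layout is an assumption (one row per entry in classes, one column per annotation), and the class names, annotation strings, and values are hypothetical stand-ins; check the shape of the actual array before indexing it this way, since it may carry additional dimensions.

```python
import numpy as np

# Hypothetical stand-ins for the inputs and the returned tuple.
classes = ["ct_0", "ct_1", "ct_2"]
tf_pattern_annots = ["TF_A_pattern_0", "TF_B_pattern_1"]
tf_ct_matrix = np.array(
    [
        [0.9, 0.1],
        [0.2, 0.8],
        [0.1, 0.3],
    ]
)

# Guard the layout assumption explicitly before relying on it.
assert tf_ct_matrix.shape == (len(classes), len(tf_pattern_annots))

# For each annotation (column), find the cell type (row) with the largest value.
top_class_per_annotation = {
    annot: classes[int(np.argmax(tf_ct_matrix[:, col]))]
    for col, annot in enumerate(tf_pattern_annots)
}

for annot, cell_type in top_class_per_annotation.items():
    print(f"{annot}: {cell_type}")
```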