Infer GRN
- recon.infer_grn.layers.compute_atac_to_rna_links(atac, rna, ref_genome)
Compute ATAC peak-to-gene links using TSS annotation. It uses the CellOracle motif_analysis module to get TSS information for the provided ATAC peaks. It returns a DataFrame with the peak-to-gene links.
- Parameters:
atac (anndata.AnnData) – AnnData object containing the ATAC-seq data. The peak names should be in the format ‘chr_start_end’.
rna (anndata.AnnData) – AnnData object containing the RNA-seq data. The gene names should match those in the ATAC-seq data.
ref_genome (str) – Reference genome to use for TSS annotation. E.g., ‘hg38’, ‘mm10’, etc.
- Returns:
DataFrame with columns [‘source’, ‘target’] representing the peak-to-gene links.
- Return type:
pd.DataFrame
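Because the peak names are expected in the ‘chr_start_end’ format, it can help to check them before calling the function. The following is a minimal sketch using only the standard library; `parse_peak_name` and `find_malformed_peaks` are hypothetical helpers written for this illustration and are not part of recon or CellOracle.

```python
from typing import Iterable, List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int


def parse_peak_name(name: str) -> Peak:
    """Split a 'chr_start_end' peak name into chromosome, start and end."""
    # Split from the right so that chromosome names containing underscores
    # (e.g. 'chr1_KI270706v1_random') stay in one piece.
    parts = name.rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError(f"expected 'chr_start_end', got {name!r}")
    chrom, start, end = parts
    if not chrom or not (start.isdigit() and end.isdigit()):
        raise ValueError(f"non-numeric coordinates in {name!r}")
    start_i, end_i = int(start), int(end)
    if start_i >= end_i:
        raise ValueError(f"start must be smaller than end in {name!r}")
    return Peak(chrom, start_i, end_i)


def find_malformed_peaks(names: Iterable[str]) -> List[str]:
    """Return the peak names that do not follow 'chr_start_end'."""
    bad = []
    for name in names:
        try:
            parse_peak_name(name)
        except ValueError:
            bad.append(name)
    return bad


# Hypothetical usage: with an AnnData object this would be
# find_malformed_peaks(atac.var_names)
malformed = find_malformed_peaks(["chr1_100_200", "chr1:100-200", "chr2_5_3"])
```

Splitting from the right is a design choice for this sketch: it tolerates scaffold names with underscores, at the cost of not validating the chromosome name itself.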
- recon.infer_grn.layers.compute_rna_network(df_exp_mtx: DataFrame | AnnData, tf_names: List[str], temp_dir: Path | None = None, method: Literal['GBM', 'RF'] = 'GBM', n_cpu: int = 1, seed: int = 666) → DataFrame
Calculate TF-to-gene relationships using either Gradient Boosting Machine (GBM) or Random Forest (RF) regression.
This function wraps the infer_partial_network function from the arboreto package, similar to GRNBoost2, and uses joblib to parallelize the inference across target genes. It returns a DataFrame with the TF-to-gene relationships and their importance scores. The implementation is inspired by SCENIC+: https://github.com/aertslab/scenicplus/blob/main/src/scenicplus/TF_to_gene.py
- Parameters:
df_exp_mtx (pd.DataFrame, ad.AnnData) – Gene expression matrix with genes as columns and cells as rows. If an AnnData object is provided, the expression matrix is extracted using the to_df() method.
tf_names (List[str]) – List of transcription factor names to consider as potential regulators.
temp_dir (pathlib.Path, optional) – Path to a temporary directory used to store intermediate files during parallel processing. If None (the default), a temporary directory is created and deleted after use.
method (Literal['GBM', 'RF'], optional) – Method to use for regression. Either ‘GBM’ for Gradient Boosting Machine or ‘RF’ for Random Forest. Default is ‘GBM’.
n_cpu (int, optional) – Number of CPU cores to use for parallel processing. Default is 1.
seed (int, optional) – Random seed for reproducibility. Default is 666.
- Returns:
DataFrame with columns [‘tf’, ‘target’, ‘importance’] representing the TF-to-gene relationships and their importance scores.
- Return type:
pd.DataFrame
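A common follow-up to an importance table of this shape is to keep only the strongest regulators of each target. The sketch below is one way to do that on plain (tf, target, importance) tuples; `top_regulators_per_target` is a hypothetical helper, the gene names and scores are made-up toy values, and nothing here is claimed to be done by recon itself. With pandas, such tuples could be obtained from the returned DataFrame via `df[['tf', 'target', 'importance']].itertuples(index=False, name=None)`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, float]  # (tf, target, importance)


def top_regulators_per_target(
    edges: Iterable[Edge], k: int = 2
) -> Dict[str, List[Tuple[str, float]]]:
    """Keep the k highest-importance TFs for every target gene."""
    by_target = defaultdict(list)
    for tf, target, importance in edges:
        by_target[target].append((tf, float(importance)))
    # Sort each target's regulators by decreasing importance and truncate.
    return {
        target: sorted(regs, key=lambda r: r[1], reverse=True)[:k]
        for target, regs in by_target.items()
    }


# Toy values, made up for illustration only.
edges = [
    ("GATA1", "HBB", 12.5),
    ("TAL1", "HBB", 7.0),
    ("SPI1", "HBB", 0.3),
    ("SPI1", "CSF1R", 9.1),
]
top = top_regulators_per_target(edges, k=2)
```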
- recon.infer_grn.layers.compute_tf_network(rna, tfs_list, method=None)
- recon.infer_grn.layers.compute_tf_to_atac_links(atac, ref_genome, tfs_list: List[str] | None = None, genomes_dir=None, motifs=None, fpr=0.02, verbose=True, indirect=True, n_cpus=-1)
Compute TF-to-ATAC peak links using motif scanning. It uses the CellOracle motif_analysis module to scan for motifs in the provided ATAC peaks. It returns a DataFrame with the TF-to-peak links.
- Parameters:
atac (anndata.AnnData) – AnnData object containing the ATAC-seq data. The peak names should be in the format ‘chr_start_end’.
ref_genome (str) – Reference genome to use for motif scanning. E.g., ‘hg38’, ‘mm10’, etc.
genomes_dir (str, optional) – Directory containing the reference genomes. If None, the default CellOracle genomes directory will be used.
motifs (list, optional) – List of motifs to use for scanning. If None, the default CellOracle motifs will be used.
fpr (float, optional) – False positive rate for motif scanning. Default is 0.02.
verbose (bool, optional) – Whether to print progress messages. Default is True.
indirect (bool, optional) – Whether to include TF-to-peak links supported by indirect evidence. Default is True.
n_cpus (int, optional) – Number of CPUs to use for parallel processing. Default is -1 (use all available CPUs).
- Returns:
DataFrame with columns [‘source’, ‘target’] representing the TF-to-peak links.
- Return type:
pd.DataFrame
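To see how the TF-to-peak links relate to the peak-to-gene links from compute_atac_to_rna_links, the two tables can be chained by hand. The sketch below is a naive join for inspection only, not the HuMMuS-based integration used by generate_grn. It assumes that ‘source’ is the TF and ‘target’ the peak in the TF-to-peak table, and that ‘source’ is the peak and ‘target’ the gene in the peak-to-gene table; `chain_links` and the toy identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Link = Tuple[str, str]  # (source, target)


def chain_links(
    tf_to_peak: Iterable[Link], peak_to_gene: Iterable[Link]
) -> Dict[Tuple[str, str], Set[str]]:
    """Map each (tf, gene) pair to the set of peaks that connect them."""
    genes_by_peak = defaultdict(set)
    for peak, gene in peak_to_gene:
        genes_by_peak[peak].add(gene)

    support = defaultdict(set)
    for tf, peak in tf_to_peak:
        # Peaks without any linked gene contribute nothing.
        for gene in genes_by_peak.get(peak, ()):
            support[(tf, gene)].add(peak)
    return dict(support)


# Toy links, made up for illustration only.
tf_to_peak = [
    ("GATA1", "chr11_5200_5700"),
    ("GATA1", "chr11_5900_6300"),
    ("SPI1", "chr5_100_600"),
]
peak_to_gene = [
    ("chr11_5200_5700", "HBB"),
    ("chr11_5900_6300", "HBB"),
    ("chr1_10_90", "TAL1"),
]
candidates = chain_links(tf_to_peak, peak_to_gene)
```

Counting the supporting peaks per pair, for example with `len(candidates[("GATA1", "HBB")])`, gives a rough sense of how many regulatory regions back a candidate TF-to-gene edge.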
- recon.infer_grn.layers.generate_grn(rna_network, atac_network, tf_network, tf_to_atac_links, atac_to_rna_links, n_jobs=1)
Generate a Gene Regulatory Network (GRN) by integrating TF-to-gene, peak-to-gene, and TF-to-peak relationships. It uses the HuMMuS package to create a multiplex network and perform random walks to rank the nodes. It returns a DataFrame with the ranked nodes.
- Parameters:
rna_network (pd.DataFrame) – DataFrame with columns [‘source’, ‘target’, ‘weight’] representing the TF-to-gene relationships.
atac_network (pd.DataFrame) – DataFrame with columns [‘source’, ‘target’, ‘weight’] representing the peak-to-gene relationships.
tf_network (pd.DataFrame) – DataFrame with columns [‘source’, ‘target’, ‘weight’] representing the TF-to-TF relationships.
tf_to_atac_links (pd.DataFrame) – DataFrame with columns [‘source’, ‘target’] representing the TF-to-peak relationships.
atac_to_rna_links (pd.DataFrame) – DataFrame with columns [‘source’, ‘target’] representing the peak-to-gene relationships.
n_jobs (int, optional) – Number of jobs to use for parallel processing. Default is 1.
- Returns:
DataFrame with ranked nodes from the GRN.
- Return type:
pd.DataFrame
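Since generate_grn expects specific column names in each input table, a quick check before the call can catch mismatches early. The sketch below takes the expected columns from the parameter descriptions above; `missing_columns` is a hypothetical helper, and whether recon renames columns internally is not stated here. The example illustrates one case worth checking: compute_rna_network is documented as returning [‘tf’, ‘target’, ‘importance’], while rna_network is documented as [‘source’, ‘target’, ‘weight’].

```python
from typing import Dict, Iterable, List

# Column names copied from the generate_grn parameter descriptions.
EXPECTED_COLUMNS: Dict[str, List[str]] = {
    "rna_network": ["source", "target", "weight"],
    "atac_network": ["source", "target", "weight"],
    "tf_network": ["source", "target", "weight"],
    "tf_to_atac_links": ["source", "target"],
    "atac_to_rna_links": ["source", "target"],
}


def missing_columns(tables: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Return, for each input table, the expected columns that are absent.

    `tables` maps an argument name to the column names of that table,
    e.g. list(df.columns) for a pandas DataFrame.
    """
    problems: Dict[str, List[str]] = {}
    for name, expected in EXPECTED_COLUMNS.items():
        if name not in tables:
            # A table that was not supplied is missing all its columns.
            problems[name] = list(expected)
            continue
        present = set(tables[name])
        missing = [col for col in expected if col not in present]
        if missing:
            problems[name] = missing
    return problems


# Hypothetical column sets; rna_network still has the compute_rna_network names.
problems = missing_columns({
    "rna_network": ["tf", "target", "importance"],
    "atac_network": ["source", "target", "weight"],
    "tf_network": ["source", "target", "weight"],
    "tf_to_atac_links": ["source", "target"],
    "atac_to_rna_links": ["source", "target"],
})
```

An empty result means every table carries the documented columns; it says nothing about whether the values in those columns are valid.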