{
"cells": [
{
"cell_type": "markdown",
"id": "4bee5c8e",
"metadata": {},
"source": [
".. note::\n",
" To download the tutorial data, use the following commands:\n",
"\n",
" **All tutorial data:**\n",
" ```python\n",
" from recon.data import fetch_all_tutorial_data\n",
" fetch_all_tutorial_data(data_dir='./data')\n",
" ```\n",
"\n",
" **Specific file (e.g., RNA data):**\n",
" ```python\n",
" from recon.data import fetch_tutorial_data\n",
" fetch_tutorial_data('perturbation_tuto/rna.h5ad', data_dir='./data')\n",
" ```"
]
},
{
"cell_type": "markdown",
"id": "1e98cdcd",
"metadata": {},
"source": [
"# Inferring Gene Regulatory Networks with HuMMuS\n",
"\n",
"This tutorial demonstrates how to use ReCoN to infer **Gene Regulatory Networks (GRNs)** from paired single-cell RNA-seq and ATAC-seq data using the HuMMuS methodology.\n",
"\n",
"## What you will learn\n",
"\n",
"1. How to compute individual network layers (RNA, ATAC, TF)\n",
"2. How to link layers via bipartite networks (TF→ATAC, ATAC→RNA)\n",
"3. How to integrate everything into a multilayer GRN using Random Walk with Restart\n",
"\n",
"```{warning}\n",
"**Python 3.10 required!**\n",
"\n",
"GRN inference requires a Python 3.10 environment due to CellOracle and gimmemotifs dependencies.\n",
"See the [Installation guide](../recon_explained/get_ready.rst) for setup instructions.\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "5dc95caa",
"metadata": {},
"source": [
"---\n",
"\n",
"## Prerequisites\n",
"\n",
"```{note}\n",
"**Required packages**\n",
"\n",
"- `bedtools` - for ATAC-seq peak processing (`conda install -c bioconda bedtools`)\n",
"- `muon` - for multi-omics data handling (`pip install muon`)\n",
"- `circe-py` - for ATAC co-accessibility networks (included with ReCoN)\n",
"```\n",
"\n",
"### Check bedtools installation"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "a470e4e2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/Users/remitrimbour/ReCoN_project/ReCoN\n"
]
}
],
"source": [
"cd ../../../"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "30efc2b5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/Users/remitrimbour/miniconda3/envs/recon/bin/bedtools\n",
"bedtools v2.31.1\n"
]
}
],
"source": [
"import os\n",
"import shutil\n",
"\n",
"# Check if bedtools is available\n",
"if shutil.which(\"bedtools\") is None:\n",
" # Try to find bedtools in common conda locations\n",
" conda_prefix = os.environ.get(\"CONDA_PREFIX\", \"\")\n",
" possible_paths = [\n",
" f\"{conda_prefix}/bin\",\n",
" os.path.expanduser(\"~/miniconda3/bin\"),\n",
" os.path.expanduser(\"~/anaconda3/bin\"),\n",
" \"/usr/local/bin\",\n",
" ]\n",
" for path in possible_paths:\n",
" bedtools_path = os.path.join(path, \"bedtools\")\n",
" if os.path.exists(bedtools_path):\n",
" os.environ[\"PATH\"] = path + \":\" + os.environ[\"PATH\"]\n",
" print(f\"Added {path} to PATH\")\n",
" break\n",
" \n",
"# Verify bedtools is now available\n",
"!which bedtools && bedtools --version"
]
},
{
"cell_type": "markdown",
"id": "ff47b089",
"metadata": {},
"source": [
"### Import libraries"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "8c22f5ea",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n",
"Matplotlib is building the font cache; this may take a moment.\n",
"Using celloracle-lite v0.22.0 (lightweight fork for ReCoN/HuMMuS)\n",
"2026-01-14 14:15:45,106 - INFO - Using celloracle-lite v0.22.0 (lightweight fork for ReCoN/HuMMuS)\n",
"To install the original CellOracle, please check: https://github.com/morris-lab/CellOracle\n",
"2026-01-14 14:15:45,109 - INFO - To install the original CellOracle, please check: https://github.com/morris-lab/CellOracle\n"
]
}
],
"source": [
"\n",
"import hummuspy.loader # HuMMuS multilayer network utilities\n",
"import circe as ci # ATAC-seq co-accessibility networks\n",
"import recon # ReCoN GRN inference\n",
"import recon.infer_grn # ReCoN GRN inference"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "47ea9f71",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/scanpy/__init__.py:96: FutureWarning: `__version__` is deprecated, use `importlib.metadata.version('scanpy')` instead\n",
" warnings.warn(msg, FutureWarning, stacklevel=2)\n"
]
}
],
"source": [
"import muon as mu # Multi-omics data handling\n",
"import scanpy as sc # Single-cell analysis"
]
},
{
"cell_type": "markdown",
"id": "2da84c94",
"metadata": {},
"source": [
"---\n",
"\n",
"## Load Multi-omics Data\n",
"\n",
"We use paired scRNA-seq and scATAC-seq data stored in a MuData object.\n",
"\n",
"```{tip}\n",
"The example dataset (PBMC 10X) can be downloaded from the [CIRCE benchmark repository](https://github.com/cantinilab/circe_benchmark).\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "dddd89ba",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"RNA: (9631, 18410), ATAC: (9631, 215676)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/mudata/_core/mudata.py:591: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.\n",
" warnings.warn(\n",
"/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/mudata/_core/mudata.py:591: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.\n",
" warnings.warn(\n"
]
}
],
"source": [
"# Load paired RNA + ATAC data\n",
"mudata = mu.read_h5mu(\"./data/build_grn_tuto/pbmc10x.h5mu\")\n",
"print(f\"RNA: {mudata.mod['rna'].shape}, ATAC: {mudata.mod['atac'].shape}\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "53529504",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Subset - RNA: (5000, 1000), ATAC: (5000, 5000)\n"
]
}
],
"source": [
"# Subset for faster computation (remove for full analysis)\n",
"rna = mudata.mod['rna'][:5000, :1000] # First 1000 genes\n",
"atac = mudata.mod['atac'][:5000, :5000] # First 5000 peaks\n",
"print(f\"Subset - RNA: {rna.shape}, ATAC: {atac.shape}\")"
]
},
{
"cell_type": "markdown",
"id": "172ee80b",
"metadata": {},
"source": [
"---\n",
"\n",
"## Build the Multilayer GRN\n",
"\n",
"The HuMMuS methodology builds a GRN by integrating multiple layers:\n",
"\n",
"| Layer | Description | Method |\n",
"|-------|-------------|--------|\n",
"| **RNA layer** | Gene co-regulation patterns (which genes vary together) | GRNBoost2 (Arboreto) |\n",
"| **ATAC layer** | Peak co-accessibility (which regulatory regions are co-active) | CIRCE |\n",
"| **TF layer** | TF-TF co-expression | Correlation |\n",
"| **TF → ATAC** | Where TFs bind DNA (motif-based) | CellOracle motif scanning |\n",
"| **ATAC → RNA** | Which peaks regulate which genes | CIRCE peak-gene links |\n",
"\n",
"```{note}\n",
"**Why multiple layers?**\n",
"\n",
"GRNBoost2 alone infers statistical associations between TFs and genes, but these can include indirect relationships. By integrating chromatin accessibility (ATAC), we add mechanistic evidence: a TF can only regulate a gene if (1) it binds nearby accessible DNA and (2) that region is linked to the gene. The multilayer RWR combines both direct (TF→gene) and indirect (TF→ATAC→gene) evidence for more robust GRN inference.\n",
"```\n",
"\n",
"### Step 1: Load TF list"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "d096d99a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loaded 914 transcription factors\n"
]
}
],
"source": [
"# Load curated list of human transcription factors\n",
"tfs_list = hummuspy.loader.load_tfs(\"human_tfs_r_hummus\")\n",
"print(f\"Loaded {len(tfs_list)} transcription factors\")"
]
},
{
"cell_type": "markdown",
"id": "624550f0",
"metadata": {},
"source": [
"### Step 2: Compute RNA network (gene co-regulation)\n",
"\n",
"Uses GRNBoost2 to identify genes that co-vary with each TF. This captures statistical associations that may include both direct targets and genes in the same regulatory programs.\n",
"\n",
"```{warning}\n",
"GRNBoost2 edges are used to represent **co-regulated** genes, not final TF-gene regulation. The multilayer integration in the final step will leverage these by requiring chromatin accessibility evidence.\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "e8f63427",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Calculating TF-to-gene importance\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Running using 15 cores: 100%|██████████| 1000/1000 [00:25<00:00, 39.75it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"RNA network: 25881 edges\n"
]
}
],
"source": [
"# Compute TF → gene network using GRNBoost2\n",
"rna_network = recon.infer_grn.compute_rna_network(rna, tf_names=tfs_list, n_cpu=15)\n",
"print(f\"RNA network: {len(rna_network)} edges\")"
]
},
{
"cell_type": "markdown",
"id": "ac25be81",
"metadata": {},
"source": [
"### Step 3: Compute ATAC network (peak co-accessibility)\n",
"\n",
"Uses CIRCE to compute co-accessibility between ATAC-seq peaks."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "11fab776",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/gimmemotifs/__init__.py:30: UserWarning: \n",
" No organism, nor value passed for the parameters: ['window_size', 'distance_constraint', 's'],\n",
" using default values.\n",
" The default values are defined from human and mouse data,\n",
" you might want to change them if you are working with\n",
" another organisms.\n",
"\n",
" Default values used:\n",
" {'window_size': 500000, 'distance_constraint': 250000, 's': 0.75}\n",
"\n",
" You can check how to define them in https://cole-trapnell-lab.github.io/cicero-release/docs_m3/#important-considerations-for-non-human-data.\n",
" \n",
" _warn(*args, **kwargs)\n"
]
},
{
"data": {
"text/html": [
"
/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/gimmemotifs/__init__.py:30: UserWarning: \n",
"install \"ipywidgets\" for Jupyter support\n",
" _warn(*args, **kwargs)\n",
"\n"
],
"text/plain": [
"/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/gimmemotifs/__init__.py:30: UserWarning: \n",
"install \"ipywidgets\" for Jupyter support\n",
" _warn(*args, **kwargs)\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n"
],
"text/plain": []
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n"
],
"text/plain": []
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Calculating co-accessibility scores...\n",
"Concatenating results from all chromosomes...\n",
"ATAC network: 172271 edges\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/anndata/_core/aligned_mapping.py:156: ImplicitModificationWarning: Setting element `.varp['atac_network']` of view, initializing view as actual.\n",
" warnings.warn(\n"
]
}
],
"source": [
"# Prepare ATAC peak names (replace special characters)\n",
"atac.var_names = atac.var_names.str.replace(\"-\", \"_\")\n",
"atac.var_names = atac.var_names.str.replace(\":\", \"_\")\n",
"\n",
"# Add genomic region information\n",
"atac = ci.add_region_infos(atac)\n",
"\n",
"# Compute peak co-accessibility network\n",
"ci.compute_atac_network(atac, njobs=20)\n",
"\n",
"# Extract and format the network\n",
"atac_network = ci.extract_atac_links(atac, key='atac_network')\n",
"atac_network = atac_network.rename(columns={\n",
" \"Peak1\": \"source\",\n",
" \"Peak2\": \"target\",\n",
" \"score\": \"weight\"\n",
"})\n",
"print(f\"ATAC network: {len(atac_network)} edges\")"
]
},
{
"cell_type": "markdown",
"id": "029902a8",
"metadata": {},
"source": [
"### Step 4: Compute TF network (TF-TF interactions)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "4358ea03",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"TF network: 29 edges\n"
]
}
],
"source": [
"# Compute TF-TF correlation network\n",
"tf_network = recon.infer_grn.compute_tf_network(rna, tfs_list=tfs_list)\n",
"print(f\"TF network: {len(tf_network)} edges\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "db25db56",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" source | \n",
" target | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" TFAP2E_TF | \n",
" fake_TF | \n",
"
\n",
" \n",
" | 1 | \n",
" HES2_TF | \n",
" fake_TF | \n",
"
\n",
" \n",
" | 2 | \n",
" ID3_TF | \n",
" fake_TF | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" source target\n",
"0 TFAP2E_TF fake_TF\n",
"1 HES2_TF fake_TF\n",
"2 ID3_TF fake_TF"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tf_network.head(3)"
]
},
{
"cell_type": "markdown",
"id": "f1aaa195",
"metadata": {},
"source": [
"### Step 5: Compute bipartite links\n",
"\n",
"Now we connect the layers:\n",
"- **ATAC → RNA**: Which peaks regulate which genes (based on proximity and correlation)\n",
"- **TF → ATAC**: Where TFs bind DNA (based on motif scanning)\n",
"\n",
"```{note}\n",
"These steps require a **reference genome**. Make sure to install it first:\n",
"```python\n",
"import genomepy\n",
"genomepy.install_genome(\"hg19\", \"UCSC\", annotation=True)\n",
"```\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "4c9c9216",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"que bed peaks: 5000\n",
"tss peaks in que: 180\n",
"ATAC→RNA links: 130 edges\n"
]
}
],
"source": [
"# ATAC → RNA: Peak-gene links based on proximity and correlation\n",
"atac_rna_links = recon.infer_grn.compute_atac_to_rna_links(atac, rna, ref_genome=\"hg19\")\n",
"print(f\"ATAC→RNA links: {len(atac_rna_links)} edges\")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "145fda2f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Peaks before filtering: 5000\n",
"Peaks with invalid chr_name: 0\n",
"Peaks with invalid length: 0\n",
"Peaks after filtering: 5000\n",
"No motif data entered. Loading default motifs for your species ...\n",
" Default motif for vertebrate: gimme.vertebrate.v5.0. \n",
" For more information, please see https://gimmemotifs.readthedocs.io/en/master/overview.html \n",
"\n",
"Initiating scanner... \n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2026-01-14 14:17:58,331 - DEBUG - using background: genome mm10 with size 200\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Calculating FPR-based threshold. This step may take substantial time when you load a new ref-genome. It will be done quicker on the second time. \n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2026-01-14 14:18:00,535 - DEBUG - determining FPR-based threshold\n",
"Matplotlib is building the font cache; this may take a moment.\n",
"Matplotlib is building the font cache; this may take a moment.\n",
"Matplotlib is building the font cache; this may take a moment.\n",
"Matplotlib is building the font cache; this may take a moment.\n",
"Matplotlib is building the font cache; this may take a moment.\n",
"Matplotlib is building the font cache; this may take a moment.\n",
"Matplotlib is building the font cache; this may take a moment.\n",
"Matplotlib is building the font cache; this may take a moment.\n",
"Matplotlib is building the font cache; this may take a moment.\n",
"Matplotlib is building the font cache; this may take a moment.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Motif scan started .. It may take long time.\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Scanning: 0%| | 0/5000 [00:00, ? sequences/s]Matplotlib is building the font cache; this may take a moment.\n",
"Matplotlib is building the font cache; this may take a moment.\n",
"Matplotlib is building the font cache; this may take a moment.\n",
"Matplotlib is building the font cache; this may take a moment.\n",
"Matplotlib is building the font cache; this may take a moment.\n",
"Matplotlib is building the font cache; this may take a moment.\n",
"Matplotlib is building the font cache; this may take a moment.\n",
"Matplotlib is building the font cache; this may take a moment.\n",
"Matplotlib is building the font cache; this may take a moment.\n",
"Matplotlib is building the font cache; this may take a moment.\n",
"Matplotlib is building the font cache; this may take a moment.\n",
"Matplotlib is building the font cache; this may take a moment.\n",
"Matplotlib is building the font cache; this may take a moment.\n",
"Scanning: 100%|██████████| 5000/5000 [03:10<00:00, 26.28 sequences/s] \n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Filtering finished: 473238 -> 100930\n",
"1. Converting scanned results into one-hot encoded dataframe.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 4548/4548 [00:04<00:00, 1093.61it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"2. Converting results into dictionaries.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 1/1 [00:00<00:00, 26.58it/s]\n",
"100%|██████████| 1093/1093 [00:00<00:00, 20999.38it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"TF→ATAC links: 2253638 edges\n"
]
}
],
"source": [
"# TF → ATAC: Motif scanning to find TF binding sites\n",
"tf_atac_links = recon.infer_grn.compute_tf_to_atac_links(atac, ref_genome=\"mm10\", n_cpus=30)\n",
"print(f\"TF→ATAC links: {len(tf_atac_links)} edges\")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "b2b63e5a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Filtered TF→ATAC links: 1053944 edges\n"
]
}
],
"source": [
"# Filter to keep only links from known TFs\n",
"tf_atac_links = tf_atac_links[tf_atac_links['source'].str.replace(\"_TF\", \"\").isin(tfs_list)]\n",
"print(f\"Filtered TF→ATAC links: {len(tf_atac_links)} edges\")"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "bd2ab3eb",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" source | \n",
" target | \n",
"
\n",
" \n",
" \n",
" \n",
" | 5 | \n",
" GATA3_TF | \n",
" chr1_10016349_10016849 | \n",
"
\n",
" \n",
" | 6 | \n",
" GATA2_TF | \n",
" chr1_10016349_10016849 | \n",
"
\n",
" \n",
" | 7 | \n",
" RORC_TF | \n",
" chr1_10016349_10016849 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" source target\n",
"5 GATA3_TF chr1_10016349_10016849\n",
"6 GATA2_TF chr1_10016349_10016849\n",
"7 RORC_TF chr1_10016349_10016849"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tf_atac_links.head(3)"
]
},
{
"cell_type": "markdown",
"id": "69f5cf77",
"metadata": {},
"source": [
"---\n",
"\n",
"## Integrate into Final GRN\n",
"\n",
"Now we combine all layers using Random Walk with Restart (RWR) to propagate information across the multilayer network.\n",
"\n",
"```{tip}\n",
"The RWR integration weighs indirect evidence (TF → ATAC → RNA) alongside direct evidence (TF → RNA), improving GRN quality compared to single-layer approaches.\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "dcc77be3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"TF_0\n",
"ATAC_0\n",
"RNA_0\n",
"Initializing Dask cluster with 1 workers...\n",
"http://192.168.1.150:8787/status\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Running RWR per seed: 100%|██████████| 29/29 [00:01<00:00, 20.07it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Building Dask DataFrame graph with delayed parallel tasks...\n",
"Final GRN: 2929 TF → gene edges\n"
]
}
],
"source": [
"# Generate integrated GRN via RWR across all layers\n",
"grn = recon.infer_grn.generate_grn(\n",
" rna_network=rna_network,\n",
" atac_network=atac_network,\n",
" tf_network=tf_network,\n",
" atac_to_rna_links=atac_rna_links,\n",
" tf_to_atac_links=tf_atac_links,\n",
" n_jobs=1\n",
")\n",
"\n",
"# Keep only RNA layer (TF → gene relationships)\n",
"grn = grn[grn.layer == 'RNA']\n",
"print(f\"Final GRN: {len(grn)} TF → gene edges\")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "ae23cde1",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" index | \n",
" layer | \n",
" target | \n",
" path_layer | \n",
" score | \n",
" seed | \n",
"
\n",
" \n",
" \n",
" \n",
" | 10755 | \n",
" 618 | \n",
" RNA | \n",
" PADI6 | \n",
" RNA_0 | \n",
" 0.000109 | \n",
" GFI1_TF | \n",
"
\n",
" \n",
" | 11123 | \n",
" 156 | \n",
" RNA | \n",
" CLCN6 | \n",
" RNA_0 | \n",
" 0.000098 | \n",
" HIVEP3_TF | \n",
"
\n",
" \n",
" | 11181 | \n",
" 809 | \n",
" RNA | \n",
" SNRNP40 | \n",
" RNA_0 | \n",
" 0.000097 | \n",
" HIVEP3_TF | \n",
"
\n",
" \n",
" | 11303 | \n",
" 759 | \n",
" RNA | \n",
" SDHB | \n",
" RNA_0 | \n",
" 0.000096 | \n",
" HIVEP3_TF | \n",
"
\n",
" \n",
" | 11360 | \n",
" 643 | \n",
" RNA | \n",
" PIK3CD-AS1 | \n",
" RNA_0 | \n",
" 0.000096 | \n",
" NFIA_TF | \n",
"
\n",
" \n",
" | 11813 | \n",
" 242 | \n",
" RNA | \n",
" EFHD2 | \n",
" RNA_0 | \n",
" 0.000091 | \n",
" JUN_TF | \n",
"
\n",
" \n",
" | 11825 | \n",
" 913 | \n",
" RNA | \n",
" TSSK3 | \n",
" RNA_0 | \n",
" 0.000091 | \n",
" JUN_TF | \n",
"
\n",
" \n",
" | 11864 | \n",
" 100 | \n",
" RNA | \n",
" CA6 | \n",
" RNA_0 | \n",
" 0.000091 | \n",
" JUN_TF | \n",
"
\n",
" \n",
" | 11877 | \n",
" 538 | \n",
" RNA | \n",
" MIR34AHG | \n",
" RNA_0 | \n",
" 0.000090 | \n",
" JUN_TF | \n",
"
\n",
" \n",
" | 12039 | \n",
" 747 | \n",
" RNA | \n",
" RUNX3 | \n",
" RNA_0 | \n",
" 0.000079 | \n",
" ZNF683_TF | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" index layer target path_layer score seed\n",
"10755 618 RNA PADI6 RNA_0 0.000109 GFI1_TF\n",
"11123 156 RNA CLCN6 RNA_0 0.000098 HIVEP3_TF\n",
"11181 809 RNA SNRNP40 RNA_0 0.000097 HIVEP3_TF\n",
"11303 759 RNA SDHB RNA_0 0.000096 HIVEP3_TF\n",
"11360 643 RNA PIK3CD-AS1 RNA_0 0.000096 NFIA_TF\n",
"11813 242 RNA EFHD2 RNA_0 0.000091 JUN_TF\n",
"11825 913 RNA TSSK3 RNA_0 0.000091 JUN_TF\n",
"11864 100 RNA CA6 RNA_0 0.000091 JUN_TF\n",
"11877 538 RNA MIR34AHG RNA_0 0.000090 JUN_TF\n",
"12039 747 RNA RUNX3 RNA_0 0.000079 ZNF683_TF"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Preview the final GRN\n",
"grn.head(10)"
]
},
{
"cell_type": "markdown",
"id": "7bfd2a15",
"metadata": {},
"source": [
"---\n",
"\n",
"## Save the GRN\n",
"\n",
"Save the inferred GRN for use in other ReCoN tutorials (e.g., Tutorial 2: Multicellular Coordination)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8bfd6948",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"GRN saved to recon_hummus_grn.csv\n"
]
}
],
"source": [
"# Save to CSV\n",
"grn.to_csv(\"recon_hummus_grn.csv\", index=False)\n",
"print(\"GRN saved to recon_hummus_grn.csv\")"
]
},
{
"cell_type": "markdown",
"id": "f758af52",
"metadata": {},
"source": [
"```{tip}\n",
"**Next steps**\n",
"\n",
"Use this GRN in:\n",
"- [Tutorial 2: Multicellular Coordination](2.recon_multicellular_coordination.ipynb) - Explore upstream regulators\n",
"- [Tutorial 3: Molecular Cascades](3.recon_molecular_cascades.ipynb) - Visualize signaling pathways\n",
"```\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "43f19203",
"metadata": {},
"source": [
"This work uses **ReCoN**’s wrapper of different packages, which are all cited in the Methods section:\n",
"\n",
"- The **RNA layer** uses [**arboreto¹**](#arboreto) \n",
"- The **ATAC layer** uses [**CIRCE²**](#circe) \n",
"- The **TF-ATAC** and **ATAC-RNA bipartites** use [**CellOracle³**](#celloracle) \n",
"- The exploration of the **multilayer** uses [**HuMMuS⁴**](#hummus) and [**MultiXrank⁵**](#multixrank) \n",
"\n",
"---\n",
"\n",
"### References\n",
"\n",
"[1] Moerman, T., Aibar, S., Bravo González-Blas, C., Simm, J., Moreau, Y., Aerts, J., & Aerts, S. (2019). \n",
"**GRNBoost2 and Arboreto: efficient and scalable inference of gene regulatory networks.** \n",
"*Bioinformatics*, 35(12), 2159–2161. doi:[10.1093/bioinformatics/bty916](https://doi.org/10.1093/bioinformatics/bty916)\n",
"\n",
"\n",
"[2] Trimbour, R., Saez Rodriguez J., Cantini L. (2025).\n",
"**CIRCE: a scalable Python package to predict cis-regulatory DNA interactions from single-cell chromatin accessibility data.** doi:[https://doi.org/10.1101/2025.09.23.678054](https://doi.org/10.1101/2025.09.23.678054)\n",
"\n",
"\n",
"[3] Kamimoto, K., Stringa, B., Hoffmann, C. M., Jindal, K., Solnica-Krezel, L., Morris, S. A., et al. (2023). \n",
"**Dissecting cell identity via network inference and in silico gene perturbation.** \n",
"*Nature*, 614, 742–751. doi:[10.1038/s41586-022-05688-9](https://doi.org/10.1038/s41586-022-05688-9)\n",
"\n",
"\n",
"[4] Trimbour, R., Deutschmann I. M., Cantini L. (2024). \n",
"**Molecular mechanisms reconstruction from single-cell multi-omics data with HuMMuS.** \n",
"*Bioinformatics*, 40(5), btae143. doi:[10.1093/bioinformatics/btae143](https://doi.org/10.1093/bioinformatics/btae143)\n",
"\n",
"\n",
"[5] Baptista, A., González, A., & Baudot, A. (2022). \n",
"**Universal multilayer network exploration by random walk with restart.** \n",
"*Communications Physics*, 5, 170. doi:[10.1038/s42005-022-00937-9](https://doi.org/10.1038/s42005-022-00937-9)\n"
]
},
{
"cell_type": "markdown",
"id": "984c1129",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "recon",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.19"
}
},
"nbformat": 4,
"nbformat_minor": 5
}