.. note:: To download the tutorial data, use the following commands:
All tutorial data:
from recon.data import fetch_all_tutorial_data
fetch_all_tutorial_data(data_dir='./data')
Specific file (e.g., RNA data):
from recon.data import fetch_tutorial_data
fetch_tutorial_data('perturbation_tuto/rna.h5ad', data_dir='./data')
Inferring Gene Regulatory Networks with HuMMuS¶
This tutorial demonstrates how to use ReCoN to infer Gene Regulatory Networks (GRNs) from paired single-cell RNA-seq and ATAC-seq data using the HuMMuS methodology.
What you will learn¶
How to compute individual network layers (RNA, ATAC, TF)
How to link layers via bipartite networks (TF→ATAC, ATAC→RNA)
How to integrate everything into a multilayer GRN using Random Walk with Restart
Warning
Python 3.10 required!
GRN inference requires a Python 3.10 environment due to CellOracle and gimmemotifs dependencies. See the Installation guide for setup instructions.
Prerequisites¶
Note
Required packages
bedtools- for ATAC-seq peak processing (conda install -c bioconda bedtools)muon- for multi-omics data handling (pip install muon)circe-py- for ATAC co-accessibility networks (included with ReCoN)
Check bedtools installation¶
cd ../../../
/Users/remitrimbour/ReCoN_project/ReCoN
import os
import shutil
# Check if bedtools is available
if shutil.which("bedtools") is None:
# Try to find bedtools in common conda locations
conda_prefix = os.environ.get("CONDA_PREFIX", "")
possible_paths = [
f"{conda_prefix}/bin",
os.path.expanduser("~/miniconda3/bin"),
os.path.expanduser("~/anaconda3/bin"),
"/usr/local/bin",
]
for path in possible_paths:
bedtools_path = os.path.join(path, "bedtools")
if os.path.exists(bedtools_path):
os.environ["PATH"] = path + ":" + os.environ["PATH"]
print(f"Added {path} to PATH")
break
# Verify bedtools is now available
!which bedtools && bedtools --version
/Users/remitrimbour/miniconda3/envs/recon/bin/bedtools
bedtools v2.31.1
Import libraries¶
import hummuspy.loader # HuMMuS multilayer network utilities
import circe as ci # ATAC-seq co-accessibility networks
import recon # ReCoN GRN inference
import recon.infer_grn # ReCoN GRN inference
/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
Matplotlib is building the font cache; this may take a moment.
Using celloracle-lite v0.22.0 (lightweight fork for ReCoN/HuMMuS)
2026-01-14 14:15:45,106 - INFO - Using celloracle-lite v0.22.0 (lightweight fork for ReCoN/HuMMuS)
To install the original CellOracle, please check: https://github.com/morris-lab/CellOracle
2026-01-14 14:15:45,109 - INFO - To install the original CellOracle, please check: https://github.com/morris-lab/CellOracle
import muon as mu # Multi-omics data handling
import scanpy as sc # Single-cell analysis
/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/scanpy/__init__.py:96: FutureWarning: `__version__` is deprecated, use `importlib.metadata.version('scanpy')` instead
warnings.warn(msg, FutureWarning, stacklevel=2)
Load Multi-omics Data¶
We use paired scRNA-seq and scATAC-seq data stored in a MuData object.
Tip
The example dataset (PBMC 10X) can be downloaded from the CIRCE benchmark repository.
# Load paired RNA + ATAC data
mudata = mu.read_h5mu("./data/build_grn_tuto/pbmc10x.h5mu")
print(f"RNA: {mudata.mod['rna'].shape}, ATAC: {mudata.mod['atac'].shape}")
RNA: (9631, 18410), ATAC: (9631, 215676)
/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/mudata/_core/mudata.py:591: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
warnings.warn(
/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/mudata/_core/mudata.py:591: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
warnings.warn(
# Subset for faster computation (remove for full analysis)
rna = mudata.mod['rna'][:5000, :1000] # First 1000 genes
atac = mudata.mod['atac'][:5000, :5000] # First 5000 peaks
print(f"Subset - RNA: {rna.shape}, ATAC: {atac.shape}")
Subset - RNA: (5000, 1000), ATAC: (5000, 5000)
Build the Multilayer GRN¶
The HuMMuS methodology builds a GRN by integrating multiple layers:
Layer |
Description |
Method |
|---|---|---|
RNA layer |
Gene co-regulation patterns (which genes vary together) |
GRNBoost2 (Arboreto) |
ATAC layer |
Peak co-accessibility (which regulatory regions are co-active) |
CIRCE |
TF layer |
TF-TF co-expression |
Correlation |
TF → ATAC |
Where TFs bind DNA (motif-based) |
CellOracle motif scanning |
ATAC → RNA |
Which peaks regulate which genes |
CIRCE peak-gene links |
Note
Why multiple layers?
GRNBoost2 alone infers statistical associations between TFs and genes, but these can include indirect relationships. By integrating chromatin accessibility (ATAC), we add mechanistic evidence: a TF can only regulate a gene if (1) it binds nearby accessible DNA and (2) that region is linked to the gene. The multilayer RWR combines both direct (TF→gene) and indirect (TF→ATAC→gene) evidence for more robust GRN inference.
Step 1: Load TF list¶
# Load curated list of human transcription factors
tfs_list = hummuspy.loader.load_tfs("human_tfs_r_hummus")
print(f"Loaded {len(tfs_list)} transcription factors")
Loaded 914 transcription factors
Step 2: Compute RNA network (gene co-regulation)¶
Uses GRNBoost2 to identify genes that co-vary with each TF. This captures statistical associations that may include both direct targets and genes in the same regulatory programs.
Warning
GRNBoost2 edges are used to represent co-regulated genes, not final TF-gene regulation. The multilayer integration in the final step will leverage these by requiring chromatin accessibility evidence.
# Compute TF → gene network using GRNBoost2
rna_network = recon.infer_grn.compute_rna_network(rna, tf_names=tfs_list, n_cpu=15)
print(f"RNA network: {len(rna_network)} edges")
Calculating TF-to-gene importance
Running using 15 cores: 100%|██████████| 1000/1000 [00:25<00:00, 39.75it/s]
RNA network: 25881 edges
Step 3: Compute ATAC network (peak co-accessibility)¶
Uses CIRCE to compute co-accessibility between ATAC-seq peaks.
# Prepare ATAC peak names (replace special characters)
atac.var_names = atac.var_names.str.replace("-", "_")
atac.var_names = atac.var_names.str.replace(":", "_")
# Add genomic region information
atac = ci.add_region_infos(atac)
# Compute peak co-accessibility network
ci.compute_atac_network(atac, njobs=20)
# Extract and format the network
atac_network = ci.extract_atac_links(atac, key='atac_network')
atac_network = atac_network.rename(columns={
"Peak1": "source",
"Peak2": "target",
"score": "weight"
})
print(f"ATAC network: {len(atac_network)} edges")
/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/gimmemotifs/__init__.py:30: UserWarning:
No organism, nor value passed for the parameters: ['window_size', 'distance_constraint', 's'],
using default values.
The default values are defined from human and mouse data,
you might want to change them if you are working with
another organisms.
Default values used:
{'window_size': 500000, 'distance_constraint': 250000, 's': 0.75}
You can check how to define them in https://cole-trapnell-lab.github.io/cicero-release/docs_m3/#important-considerations-for-non-human-data.
_warn(*args, **kwargs)
/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/gimmemotifs/__init__.py:30: UserWarning: install "ipywidgets" for Jupyter support _warn(*args, **kwargs)
Calculating co-accessibility scores...
Concatenating results from all chromosomes...
ATAC network: 172271 edges
/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/anndata/_core/aligned_mapping.py:156: ImplicitModificationWarning: Setting element `.varp['atac_network']` of view, initializing view as actual.
warnings.warn(
Step 4: Compute TF network (TF-TF interactions)¶
# Compute TF-TF correlation network
tf_network = recon.infer_grn.compute_tf_network(rna, tfs_list=tfs_list)
print(f"TF network: {len(tf_network)} edges")
TF network: 29 edges
tf_network.head(3)
| source | target | |
|---|---|---|
| 0 | TFAP2E_TF | fake_TF |
| 1 | HES2_TF | fake_TF |
| 2 | ID3_TF | fake_TF |
Step 5: Compute bipartite links¶
Now we connect the layers:
ATAC → RNA: Which peaks regulate which genes (based on proximity and correlation)
TF → ATAC: Where TFs bind DNA (based on motif scanning)
Note
These steps require a reference genome. Make sure to install it first:
import genomepy
genomepy.install_genome("hg19", "UCSC", annotation=True)
# ATAC → RNA: Peak-gene links based on proximity and correlation
atac_rna_links = recon.infer_grn.compute_atac_to_rna_links(atac, rna, ref_genome="hg19")
print(f"ATAC→RNA links: {len(atac_rna_links)} edges")
que bed peaks: 5000
tss peaks in que: 180
ATAC→RNA links: 130 edges
# TF → ATAC: Motif scanning to find TF binding sites
tf_atac_links = recon.infer_grn.compute_tf_to_atac_links(atac, ref_genome="mm10", n_cpus=30)
print(f"TF→ATAC links: {len(tf_atac_links)} edges")
Peaks before filtering: 5000
Peaks with invalid chr_name: 0
Peaks with invalid length: 0
Peaks after filtering: 5000
No motif data entered. Loading default motifs for your species ...
Default motif for vertebrate: gimme.vertebrate.v5.0.
For more information, please see https://gimmemotifs.readthedocs.io/en/master/overview.html
Initiating scanner...
2026-01-14 14:17:58,331 - DEBUG - using background: genome mm10 with size 200
Calculating FPR-based threshold. This step may take substantial time when you load a new ref-genome. It will be done quicker on the second time.
2026-01-14 14:18:00,535 - DEBUG - determining FPR-based threshold
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Motif scan started .. It may take long time.
Scanning: 0%| | 0/5000 [00:00<?, ? sequences/s]Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Scanning: 100%|██████████| 5000/5000 [03:10<00:00, 26.28 sequences/s]
Filtering finished: 473238 -> 100930
1. Converting scanned results into one-hot encoded dataframe.
100%|██████████| 4548/4548 [00:04<00:00, 1093.61it/s]
2. Converting results into dictionaries.
100%|██████████| 1/1 [00:00<00:00, 26.58it/s]
100%|██████████| 1093/1093 [00:00<00:00, 20999.38it/s]
TF→ATAC links: 2253638 edges
# Filter to keep only links from known TFs
tf_atac_links = tf_atac_links[tf_atac_links['source'].str.replace("_TF", "").isin(tfs_list)]
print(f"Filtered TF→ATAC links: {len(tf_atac_links)} edges")
Filtered TF→ATAC links: 1053944 edges
tf_atac_links.head(3)
| source | target | |
|---|---|---|
| 5 | GATA3_TF | chr1_10016349_10016849 |
| 6 | GATA2_TF | chr1_10016349_10016849 |
| 7 | RORC_TF | chr1_10016349_10016849 |
Integrate into Final GRN¶
Now we combine all layers using Random Walk with Restart (RWR) to propagate information across the multilayer network.
Tip
The RWR integration weighs indirect evidence (TF → ATAC → RNA) alongside direct evidence (TF → RNA), improving GRN quality compared to single-layer approaches.
# Generate integrated GRN via RWR across all layers
grn = recon.infer_grn.generate_grn(
rna_network=rna_network,
atac_network=atac_network,
tf_network=tf_network,
atac_to_rna_links=atac_rna_links,
tf_to_atac_links=tf_atac_links,
n_jobs=1
)
# Keep only RNA layer (TF → gene relationships)
grn = grn[grn.layer == 'RNA']
print(f"Final GRN: {len(grn)} TF → gene edges")
TF_0
ATAC_0
RNA_0
Initializing Dask cluster with 1 workers...
http://192.168.1.150:8787/status
Running RWR per seed: 100%|██████████| 29/29 [00:01<00:00, 20.07it/s]
Building Dask DataFrame graph with delayed parallel tasks...
Final GRN: 2929 TF → gene edges
# Preview the final GRN
grn.head(10)
| index | layer | target | path_layer | score | seed | |
|---|---|---|---|---|---|---|
| 10755 | 618 | RNA | PADI6 | RNA_0 | 0.000109 | GFI1_TF |
| 11123 | 156 | RNA | CLCN6 | RNA_0 | 0.000098 | HIVEP3_TF |
| 11181 | 809 | RNA | SNRNP40 | RNA_0 | 0.000097 | HIVEP3_TF |
| 11303 | 759 | RNA | SDHB | RNA_0 | 0.000096 | HIVEP3_TF |
| 11360 | 643 | RNA | PIK3CD-AS1 | RNA_0 | 0.000096 | NFIA_TF |
| 11813 | 242 | RNA | EFHD2 | RNA_0 | 0.000091 | JUN_TF |
| 11825 | 913 | RNA | TSSK3 | RNA_0 | 0.000091 | JUN_TF |
| 11864 | 100 | RNA | CA6 | RNA_0 | 0.000091 | JUN_TF |
| 11877 | 538 | RNA | MIR34AHG | RNA_0 | 0.000090 | JUN_TF |
| 12039 | 747 | RNA | RUNX3 | RNA_0 | 0.000079 | ZNF683_TF |
Save the GRN¶
Save the inferred GRN for use in other ReCoN tutorials (e.g., Tutorial 2: Multicellular Coordination).
# Save to CSV
grn.to_csv("recon_hummus_grn.csv", index=False)
print("GRN saved to recon_hummus_grn.csv")
GRN saved to recon_hummus_grn.csv
Tip
Next steps
Use this GRN in:
Tutorial 2: Multicellular Coordination - Explore upstream regulators
Tutorial 3: Molecular Cascades - Visualize signaling pathways
This work uses ReCoN’s wrapper of different packages, which are all cited in the Methods section:
The RNA layer uses arboreto¹
The ATAC layer uses CIRCE²
The TF-ATAC and ATAC-RNA bipartites use CellOracle³
The exploration of the multilayer uses HuMMuS⁴ and MultiXrank⁵
References¶
[1] Moerman, T., Aibar, S., Bravo González-Blas, C., Simm, J., Moreau, Y., Aerts, J., & Aerts, S. (2019).
GRNBoost2 and Arboreto: efficient and scalable inference of gene regulatory networks.
Bioinformatics, 35(12), 2159–2161. doi:10.1093/bioinformatics/bty916
[2] Trimbour, R., Saez Rodriguez J., Cantini L. (2025). CIRCE: a scalable Python package to predict cis-regulatory DNA interactions from single-cell chromatin accessibility data. doi:https://doi.org/10.1101/2025.09.23.678054
[3] Kamimoto, K., Stringa, B., Hoffmann, C. M., Jindal, K., Solnica-Krezel, L., Morris, S. A., et al. (2023).
Dissecting cell identity via network inference and in silico gene perturbation.
Nature, 614, 742–751. doi:10.1038/s41586-022-05688-9
[4] Trimbour, R., Deutschmann I. M., Cantini L. (2024).
Molecular mechanisms reconstruction from single-cell multi-omics data with HuMMuS.
Bioinformatics, 40(5), btae143. doi:10.1093/bioinformatics/btae143
[5] Baptista, A., González, A., & Baudot, A. (2022).
Universal multilayer network exploration by random walk with restart.
Communications Physics, 5, 170. doi:10.1038/s42005-022-00937-9