.. note:: To download the tutorial data, use the following commands:

All tutorial data:

from recon.data import fetch_all_tutorial_data
fetch_all_tutorial_data(data_dir='./data')

Specific file (e.g., RNA data):

from recon.data import fetch_tutorial_data
fetch_tutorial_data('perturbation_tuto/rna.h5ad', data_dir='./data')

Inferring Gene Regulatory Networks with HuMMuS¶

This tutorial demonstrates how to use ReCoN to infer Gene Regulatory Networks (GRNs) from paired single-cell RNA-seq and ATAC-seq data using the HuMMuS methodology.

What you will learn¶

How to compute individual network layers (RNA, ATAC, TF)
How to link layers via bipartite networks (TF→ATAC, ATAC→RNA)
How to integrate everything into a multilayer GRN using Random Walk with Restart

Warning

Python 3.10 required!

GRN inference requires a Python 3.10 environment due to CellOracle and gimmemotifs dependencies. See the Installation guide for setup instructions.

Prerequisites¶

Note

Required packages

bedtools - for ATAC-seq peak processing (conda install -c bioconda bedtools)
muon - for multi-omics data handling (pip install muon)
circe-py - for ATAC co-accessibility networks (included with ReCoN)

Check bedtools installation¶

cd ../../../

/Users/remitrimbour/ReCoN_project/ReCoN

import os
import shutil

# Check if bedtools is available
if shutil.which("bedtools") is None:
    # Try to find bedtools in common conda locations
    conda_prefix = os.environ.get("CONDA_PREFIX", "")
    possible_paths = [
        f"{conda_prefix}/bin",
        os.path.expanduser("~/miniconda3/bin"),
        os.path.expanduser("~/anaconda3/bin"),
        "/usr/local/bin",
    ]
    for path in possible_paths:
        bedtools_path = os.path.join(path, "bedtools")
        if os.path.exists(bedtools_path):
            os.environ["PATH"] = path + ":" + os.environ["PATH"]
            print(f"Added {path} to PATH")
            break
    
# Verify bedtools is now available
!which bedtools && bedtools --version

/Users/remitrimbour/miniconda3/envs/recon/bin/bedtools
bedtools v2.31.1

Import libraries¶

import hummuspy.loader  # HuMMuS multilayer network utilities
import circe as ci       # ATAC-seq co-accessibility networks
import recon             # ReCoN GRN inference
import recon.infer_grn   # ReCoN GRN inference

/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
Matplotlib is building the font cache; this may take a moment.
Using celloracle-lite v0.22.0 (lightweight fork for ReCoN/HuMMuS)
2026-01-14 14:15:45,106 - INFO - Using celloracle-lite v0.22.0 (lightweight fork for ReCoN/HuMMuS)
To install the original CellOracle, please check: https://github.com/morris-lab/CellOracle
2026-01-14 14:15:45,109 - INFO - To install the original CellOracle, please check: https://github.com/morris-lab/CellOracle

import muon as mu   # Multi-omics data handling
import scanpy as sc # Single-cell analysis

/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/scanpy/__init__.py:96: FutureWarning: `__version__` is deprecated, use `importlib.metadata.version('scanpy')` instead
  warnings.warn(msg, FutureWarning, stacklevel=2)

Load Multi-omics Data¶

We use paired scRNA-seq and scATAC-seq data stored in a MuData object.

Tip

The example dataset (PBMC 10X) can be downloaded from the CIRCE benchmark repository.

# Load paired RNA + ATAC data
mudata = mu.read_h5mu("./data/build_grn_tuto/pbmc10x.h5mu")
print(f"RNA: {mudata.mod['rna'].shape}, ATAC: {mudata.mod['atac'].shape}")

RNA: (9631, 18410), ATAC: (9631, 215676)

/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/mudata/_core/mudata.py:591: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
  warnings.warn(
/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/mudata/_core/mudata.py:591: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
  warnings.warn(

# Subset for faster computation (remove for full analysis)
rna = mudata.mod['rna'][:5000, :1000]   # First 1000 genes
atac = mudata.mod['atac'][:5000, :5000] # First 5000 peaks
print(f"Subset - RNA: {rna.shape}, ATAC: {atac.shape}")

Subset - RNA: (5000, 1000), ATAC: (5000, 5000)

Build the Multilayer GRN¶

The HuMMuS methodology builds a GRN by integrating multiple layers:

Layer	Description	Method
RNA layer	Gene co-regulation patterns (which genes vary together)	GRNBoost2 (Arboreto)
ATAC layer	Peak co-accessibility (which regulatory regions are co-active)	CIRCE
TF layer	TF-TF co-expression	Correlation
TF → ATAC	Where TFs bind DNA (motif-based)	CellOracle motif scanning
ATAC → RNA	Which peaks regulate which genes	CIRCE peak-gene links

Note

Why multiple layers?

GRNBoost2 alone infers statistical associations between TFs and genes, but these can include indirect relationships. By integrating chromatin accessibility (ATAC), we add mechanistic evidence: a TF can only regulate a gene if (1) it binds nearby accessible DNA and (2) that region is linked to the gene. The multilayer RWR combines both direct (TF→gene) and indirect (TF→ATAC→gene) evidence for more robust GRN inference.

Step 1: Load TF list¶

# Load curated list of human transcription factors
tfs_list = hummuspy.loader.load_tfs("human_tfs_r_hummus")
print(f"Loaded {len(tfs_list)} transcription factors")

Loaded 914 transcription factors

Step 2: Compute RNA network (gene co-regulation)¶

Uses GRNBoost2 to identify genes that co-vary with each TF. This captures statistical associations that may include both direct targets and genes in the same regulatory programs.

Warning

GRNBoost2 edges are used to represent co-regulated genes, not final TF-gene regulation. The multilayer integration in the final step will leverage these by requiring chromatin accessibility evidence.

# Compute TF → gene network using GRNBoost2
rna_network = recon.infer_grn.compute_rna_network(rna, tf_names=tfs_list, n_cpu=15)
print(f"RNA network: {len(rna_network)} edges")

Calculating TF-to-gene importance

Running using 15 cores: 100%|██████████| 1000/1000 [00:25<00:00, 39.75it/s]

RNA network: 25881 edges

Step 3: Compute ATAC network (peak co-accessibility)¶

Uses CIRCE to compute co-accessibility between ATAC-seq peaks.

# Prepare ATAC peak names (replace special characters)
atac.var_names = atac.var_names.str.replace("-", "_")
atac.var_names = atac.var_names.str.replace(":", "_")

# Add genomic region information
atac = ci.add_region_infos(atac)

# Compute peak co-accessibility network
ci.compute_atac_network(atac, njobs=20)

# Extract and format the network
atac_network = ci.extract_atac_links(atac, key='atac_network')
atac_network = atac_network.rename(columns={
    "Peak1": "source",
    "Peak2": "target",
    "score": "weight"
})
print(f"ATAC network: {len(atac_network)} edges")

/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/gimmemotifs/__init__.py:30: UserWarning: 
                No organism, nor value passed for the parameters: ['window_size', 'distance_constraint', 's'],
                using default values.
                The default values are defined from human and mouse data,
                you might want to change them if you are working with
                another organisms.

                Default values used:
                {'window_size': 500000, 'distance_constraint': 250000, 's': 0.75}

                You can check how to define them in https://cole-trapnell-lab.github.io/cicero-release/docs_m3/#important-considerations-for-non-human-data.
                
  _warn(*args, **kwargs)

/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/gimmemotifs/__init__.py:30: UserWarning: 
install "ipywidgets" for Jupyter support
  _warn(*args, **kwargs)

Calculating co-accessibility scores...
Concatenating results from all chromosomes...
ATAC network: 172271 edges

/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/anndata/_core/aligned_mapping.py:156: ImplicitModificationWarning: Setting element `.varp['atac_network']` of view, initializing view as actual.
  warnings.warn(

Step 4: Compute TF network (TF-TF interactions)¶

# Compute TF-TF correlation network
tf_network = recon.infer_grn.compute_tf_network(rna, tfs_list=tfs_list)
print(f"TF network: {len(tf_network)} edges")

TF network: 29 edges

tf_network.head(3)

	source	target
0	TFAP2E_TF	fake_TF
1	HES2_TF	fake_TF
2	ID3_TF	fake_TF

Step 5: Compute bipartite links¶

Now we connect the layers:

ATAC → RNA: Which peaks regulate which genes (based on proximity and correlation)
TF → ATAC: Where TFs bind DNA (based on motif scanning)

Note

These steps require a reference genome. Make sure to install it first:

import genomepy
genomepy.install_genome("hg19", "UCSC", annotation=True)

# ATAC → RNA: Peak-gene links based on proximity and correlation
atac_rna_links = recon.infer_grn.compute_atac_to_rna_links(atac, rna, ref_genome="hg19")
print(f"ATAC→RNA links: {len(atac_rna_links)} edges")

que bed peaks: 5000
tss peaks in que: 180
ATAC→RNA links: 130 edges

# TF → ATAC: Motif scanning to find TF binding sites
tf_atac_links = recon.infer_grn.compute_tf_to_atac_links(atac, ref_genome="mm10", n_cpus=30)
print(f"TF→ATAC links: {len(tf_atac_links)} edges")

Peaks before filtering:  5000
Peaks with invalid chr_name:  0
Peaks with invalid length:  0
Peaks after filtering:  5000
No motif data entered. Loading default motifs for your species ...
 Default motif for vertebrate: gimme.vertebrate.v5.0. 
 For more information, please see https://gimmemotifs.readthedocs.io/en/master/overview.html 

Initiating scanner... 

2026-01-14 14:17:58,331 - DEBUG - using background: genome mm10 with size 200

Calculating FPR-based threshold. This step may take substantial time when you load a new ref-genome. It will be done quicker on the second time. 

2026-01-14 14:18:00,535 - DEBUG - determining FPR-based threshold
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.

Motif scan started .. It may take long time.

Scanning:   0%|          | 0/5000 [00:00<?, ? sequences/s]Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Matplotlib is building the font cache; this may take a moment.
Scanning: 100%|██████████| 5000/5000 [03:10<00:00, 26.28 sequences/s]  

Filtering finished: 473238 -> 100930
1. Converting scanned results into one-hot encoded dataframe.

100%|██████████| 4548/4548 [00:04<00:00, 1093.61it/s]

2. Converting results into dictionaries.

100%|██████████| 1/1 [00:00<00:00, 26.58it/s]
100%|██████████| 1093/1093 [00:00<00:00, 20999.38it/s]

TF→ATAC links: 2253638 edges

# Filter to keep only links from known TFs
tf_atac_links = tf_atac_links[tf_atac_links['source'].str.replace("_TF", "").isin(tfs_list)]
print(f"Filtered TF→ATAC links: {len(tf_atac_links)} edges")

Filtered TF→ATAC links: 1053944 edges

tf_atac_links.head(3)

	source	target
5	GATA3_TF	chr1_10016349_10016849
6	GATA2_TF	chr1_10016349_10016849
7	RORC_TF	chr1_10016349_10016849

Integrate into Final GRN¶

Now we combine all layers using Random Walk with Restart (RWR) to propagate information across the multilayer network.

Tip

The RWR integration weighs indirect evidence (TF → ATAC → RNA) alongside direct evidence (TF → RNA), improving GRN quality compared to single-layer approaches.

# Generate integrated GRN via RWR across all layers
grn = recon.infer_grn.generate_grn(
    rna_network=rna_network,
    atac_network=atac_network,
    tf_network=tf_network,
    atac_to_rna_links=atac_rna_links,
    tf_to_atac_links=tf_atac_links,
    n_jobs=1
)

# Keep only RNA layer (TF → gene relationships)
grn = grn[grn.layer == 'RNA']
print(f"Final GRN: {len(grn)} TF → gene edges")

TF_0
ATAC_0
RNA_0
Initializing Dask cluster with 1 workers...
http://192.168.1.150:8787/status

Running RWR per seed: 100%|██████████| 29/29 [00:01<00:00, 20.07it/s]

Building Dask DataFrame graph with delayed parallel tasks...
Final GRN: 2929 TF → gene edges

# Preview the final GRN
grn.head(10)

	index	layer	target	path_layer	score	seed
10755	618	RNA	PADI6	RNA_0	0.000109	GFI1_TF
11123	156	RNA	CLCN6	RNA_0	0.000098	HIVEP3_TF
11181	809	RNA	SNRNP40	RNA_0	0.000097	HIVEP3_TF
11303	759	RNA	SDHB	RNA_0	0.000096	HIVEP3_TF
11360	643	RNA	PIK3CD-AS1	RNA_0	0.000096	NFIA_TF
11813	242	RNA	EFHD2	RNA_0	0.000091	JUN_TF
11825	913	RNA	TSSK3	RNA_0	0.000091	JUN_TF
11864	100	RNA	CA6	RNA_0	0.000091	JUN_TF
11877	538	RNA	MIR34AHG	RNA_0	0.000090	JUN_TF
12039	747	RNA	RUNX3	RNA_0	0.000079	ZNF683_TF

Save the GRN¶

Save the inferred GRN for use in other ReCoN tutorials (e.g., Tutorial 2: Multicellular Coordination).

# Save to CSV
grn.to_csv("recon_hummus_grn.csv", index=False)
print("GRN saved to recon_hummus_grn.csv")

GRN saved to recon_hummus_grn.csv

Tip

Next steps

Use this GRN in:

Tutorial 2: Multicellular Coordination - Explore upstream regulators
Tutorial 3: Molecular Cascades - Visualize signaling pathways

This work uses ReCoN’s wrapper of different packages, which are all cited in the Methods section:

The RNA layer uses arboreto¹
The ATAC layer uses CIRCE²
The TF-ATAC and ATAC-RNA bipartites use CellOracle³
The exploration of the multilayer uses HuMMuS⁴ and MultiXrank⁵

References¶

[1] Moerman, T., Aibar, S., Bravo González-Blas, C., Simm, J., Moreau, Y., Aerts, J., & Aerts, S. (2019).
GRNBoost2 and Arboreto: efficient and scalable inference of gene regulatory networks.
Bioinformatics, 35(12), 2159–2161. doi:10.1093/bioinformatics/bty916

[2] Trimbour, R., Saez Rodriguez J., Cantini L. (2025). CIRCE: a scalable Python package to predict cis-regulatory DNA interactions from single-cell chromatin accessibility data. doi:https://doi.org/10.1101/2025.09.23.678054

[3] Kamimoto, K., Stringa, B., Hoffmann, C. M., Jindal, K., Solnica-Krezel, L., Morris, S. A., et al. (2023).
Dissecting cell identity via network inference and in silico gene perturbation.
Nature, 614, 742–751. doi:10.1038/s41586-022-05688-9

[4] Trimbour, R., Deutschmann I. M., Cantini L. (2024).
Molecular mechanisms reconstruction from single-cell multi-omics data with HuMMuS.
Bioinformatics, 40(5), btae143. doi:10.1093/bioinformatics/btae143

[5] Baptista, A., González, A., & Baudot, A. (2022).
Universal multilayer network exploration by random walk with restart.
Communications Physics, 5, 170. doi:10.1038/s42005-022-00937-9