.. note:: To download the tutorial data, use the following commands:

All tutorial data:

from recon.data import fetch_all_tutorial_data
fetch_all_tutorial_data(data_dir='./data')

Specific file (e.g., RNA data):

from recon.data import fetch_tutorial_data
fetch_tutorial_data('perturbation_tuto/rna.h5ad', data_dir='./data')

Inferring Gene Regulatory Networks with HuMMuS

This tutorial demonstrates how to use ReCoN to infer Gene Regulatory Networks (GRNs) from paired single-cell RNA-seq and ATAC-seq data using the HuMMuS methodology.

What you will learn

  1. How to compute individual network layers (RNA, ATAC, TF)

  2. How to link layers via bipartite networks (TF→ATAC, ATAC→RNA)

  3. How to integrate everything into a multilayer GRN using Random Walk with Restart

Warning

Python 3.10 required!

GRN inference requires a Python 3.10 environment due to CellOracle and gimmemotifs dependencies. See the Installation guide for setup instructions.


Prerequisites

Note

Required packages

  • bedtools - for ATAC-seq peak processing (conda install -c bioconda bedtools)

  • muon - for multi-omics data handling (pip install muon)

  • circe-py - for ATAC co-accessibility networks (included with ReCoN)

Check bedtools installation

cd ../../../
/Users/remitrimbour/ReCoN_project/ReCoN
import os
import shutil

# Check if bedtools is available
if shutil.which("bedtools") is None:
    # Try to find bedtools in common conda locations
    conda_prefix = os.environ.get("CONDA_PREFIX", "")
    possible_paths = [
        f"{conda_prefix}/bin",
        os.path.expanduser("~/miniconda3/bin"),
        os.path.expanduser("~/anaconda3/bin"),
        "/usr/local/bin",
    ]
    for path in possible_paths:
        bedtools_path = os.path.join(path, "bedtools")
        if os.path.exists(bedtools_path):
            os.environ["PATH"] = path + ":" + os.environ["PATH"]
            print(f"Added {path} to PATH")
            break
    
# Verify bedtools is now available
!which bedtools && bedtools --version
/Users/remitrimbour/miniconda3/envs/recon/bin/bedtools
bedtools v2.31.1

Import libraries

import hummuspy.loader  # HuMMuS multilayer network utilities
import circe as ci       # ATAC-seq co-accessibility networks
import recon             # ReCoN GRN inference
import recon.infer_grn   # ReCoN GRN inference
/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
Matplotlib is building the font cache; this may take a moment.
Using celloracle-lite v0.22.0 (lightweight fork for ReCoN/HuMMuS)
2026-01-14 14:15:45,106 - INFO - Using celloracle-lite v0.22.0 (lightweight fork for ReCoN/HuMMuS)
To install the original CellOracle, please check: https://github.com/morris-lab/CellOracle
2026-01-14 14:15:45,109 - INFO - To install the original CellOracle, please check: https://github.com/morris-lab/CellOracle
import muon as mu   # Multi-omics data handling
import scanpy as sc # Single-cell analysis
/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/scanpy/__init__.py:96: FutureWarning: `__version__` is deprecated, use `importlib.metadata.version('scanpy')` instead
  warnings.warn(msg, FutureWarning, stacklevel=2)

Load Multi-omics Data

We use paired scRNA-seq and scATAC-seq data stored in a MuData object.

Tip

The example dataset (PBMC 10X) can be downloaded from the CIRCE benchmark repository.

# Load paired RNA + ATAC data
mudata = mu.read_h5mu("./data/build_grn_tuto/pbmc10x.h5mu")
print(f"RNA: {mudata.mod['rna'].shape}, ATAC: {mudata.mod['atac'].shape}")
RNA: (9631, 18410), ATAC: (9631, 215676)
/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/mudata/_core/mudata.py:591: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
  warnings.warn(
/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/mudata/_core/mudata.py:591: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.
  warnings.warn(
# Subset for faster computation (remove for full analysis)
rna = mudata.mod['rna'][:5000, :1000]   # First 1000 genes
atac = mudata.mod['atac'][:5000, :5000] # First 5000 peaks
print(f"Subset - RNA: {rna.shape}, ATAC: {atac.shape}")
Subset - RNA: (5000, 1000), ATAC: (5000, 5000)

Build the Multilayer GRN

The HuMMuS methodology builds a GRN by integrating multiple layers:

Layer

Description

Method

RNA layer

Gene co-regulation patterns (which genes vary together)

GRNBoost2 (Arboreto)

ATAC layer

Peak co-accessibility (which regulatory regions are co-active)

CIRCE

TF layer

TF-TF co-expression

Correlation

TF → ATAC

Where TFs bind DNA (motif-based)

CellOracle motif scanning

ATAC → RNA

Which peaks regulate which genes

CIRCE peak-gene links

Note

Why multiple layers?

GRNBoost2 alone infers statistical associations between TFs and genes, but these can include indirect relationships. By integrating chromatin accessibility (ATAC), we add mechanistic evidence: a TF can only regulate a gene if (1) it binds nearby accessible DNA and (2) that region is linked to the gene. The multilayer RWR combines both direct (TF→gene) and indirect (TF→ATAC→gene) evidence for more robust GRN inference.

Step 1: Load TF list

# Load curated list of human transcription factors
tfs_list = hummuspy.loader.load_tfs("human_tfs_r_hummus")
print(f"Loaded {len(tfs_list)} transcription factors")
Loaded 914 transcription factors

Step 2: Compute RNA network (gene co-regulation)

Uses GRNBoost2 to identify genes that co-vary with each TF. This captures statistical associations that may include both direct targets and genes in the same regulatory programs.

Warning

GRNBoost2 edges are used to represent co-regulated genes, not final TF-gene regulation. The multilayer integration in the final step will leverage these by requiring chromatin accessibility evidence.

# Compute TF → gene network using GRNBoost2
rna_network = recon.infer_grn.compute_rna_network(rna, tf_names=tfs_list, n_cpu=15)
print(f"RNA network: {len(rna_network)} edges")
Calculating TF-to-gene importance
Running using 15 cores: 100%|██████████| 1000/1000 [00:25<00:00, 39.75it/s]
RNA network: 25881 edges

Step 3: Compute ATAC network (peak co-accessibility)

Uses CIRCE to compute co-accessibility between ATAC-seq peaks.

# Prepare ATAC peak names (replace special characters)
atac.var_names = atac.var_names.str.replace("-", "_")
atac.var_names = atac.var_names.str.replace(":", "_")

# Add genomic region information
atac = ci.add_region_infos(atac)

# Compute peak co-accessibility network
ci.compute_atac_network(atac, njobs=20)

# Extract and format the network
atac_network = ci.extract_atac_links(atac, key='atac_network')
atac_network = atac_network.rename(columns={
    "Peak1": "source",
    "Peak2": "target",
    "score": "weight"
})
print(f"ATAC network: {len(atac_network)} edges")
/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/gimmemotifs/__init__.py:30: UserWarning: 
                No organism, nor value passed for the parameters: ['window_size', 'distance_constraint', 's'],
                using default values.
                The default values are defined from human and mouse data,
                you might want to change them if you are working with
                another organisms.

                Default values used:
                {'window_size': 500000, 'distance_constraint': 250000, 's': 0.75}

                You can check how to define them in https://cole-trapnell-lab.github.io/cicero-release/docs_m3/#important-considerations-for-non-human-data.
                
  _warn(*args, **kwargs)
/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/gimmemotifs/__init__.py:30: UserWarning: 
install "ipywidgets" for Jupyter support
  _warn(*args, **kwargs)


Calculating co-accessibility scores...
Concatenating results from all chromosomes...
ATAC network: 172271 edges
/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/anndata/_core/aligned_mapping.py:156: ImplicitModificationWarning: Setting element `.varp['atac_network']` of view, initializing view as actual.
  warnings.warn(

Step 4: Compute TF network (TF-TF interactions)

# Compute TF-TF correlation network
tf_network = recon.infer_grn.compute_tf_network(rna, tfs_list=tfs_list)
print(f"TF network: {len(tf_network)} edges")
TF network: 29 edges
tf_network.head(3)
source target
0 TFAP2E_TF fake_TF
1 HES2_TF fake_TF
2 ID3_TF fake_TF

Integrate into Final GRN

Now we combine all layers using Random Walk with Restart (RWR) to propagate information across the multilayer network.

Tip

The RWR integration weighs indirect evidence (TF → ATAC → RNA) alongside direct evidence (TF → RNA), improving GRN quality compared to single-layer approaches.

# Generate integrated GRN via RWR across all layers
grn = recon.infer_grn.generate_grn(
    rna_network=rna_network,
    atac_network=atac_network,
    tf_network=tf_network,
    atac_to_rna_links=atac_rna_links,
    tf_to_atac_links=tf_atac_links,
    n_jobs=1
)

# Keep only RNA layer (TF → gene relationships)
grn = grn[grn.layer == 'RNA']
print(f"Final GRN: {len(grn)} TF → gene edges")
TF_0
ATAC_0
RNA_0
Initializing Dask cluster with 1 workers...
http://192.168.1.150:8787/status
Running RWR per seed: 100%|██████████| 29/29 [00:01<00:00, 20.07it/s]
Building Dask DataFrame graph with delayed parallel tasks...
Final GRN: 2929 TF → gene edges
# Preview the final GRN
grn.head(10)
index layer target path_layer score seed
10755 618 RNA PADI6 RNA_0 0.000109 GFI1_TF
11123 156 RNA CLCN6 RNA_0 0.000098 HIVEP3_TF
11181 809 RNA SNRNP40 RNA_0 0.000097 HIVEP3_TF
11303 759 RNA SDHB RNA_0 0.000096 HIVEP3_TF
11360 643 RNA PIK3CD-AS1 RNA_0 0.000096 NFIA_TF
11813 242 RNA EFHD2 RNA_0 0.000091 JUN_TF
11825 913 RNA TSSK3 RNA_0 0.000091 JUN_TF
11864 100 RNA CA6 RNA_0 0.000091 JUN_TF
11877 538 RNA MIR34AHG RNA_0 0.000090 JUN_TF
12039 747 RNA RUNX3 RNA_0 0.000079 ZNF683_TF

Save the GRN

Save the inferred GRN for use in other ReCoN tutorials (e.g., Tutorial 2: Multicellular Coordination).

# Save to CSV
grn.to_csv("recon_hummus_grn.csv", index=False)
print("GRN saved to recon_hummus_grn.csv")
GRN saved to recon_hummus_grn.csv

Tip

Next steps

Use this GRN in:


This work uses ReCoN’s wrapper of different packages, which are all cited in the Methods section:


References

[1] Moerman, T., Aibar, S., Bravo González-Blas, C., Simm, J., Moreau, Y., Aerts, J., & Aerts, S. (2019).
GRNBoost2 and Arboreto: efficient and scalable inference of gene regulatory networks.
Bioinformatics, 35(12), 2159–2161. doi:10.1093/bioinformatics/bty916

[2] Trimbour, R., Saez Rodriguez J., Cantini L. (2025). CIRCE: a scalable Python package to predict cis-regulatory DNA interactions from single-cell chromatin accessibility data. doi:https://doi.org/10.1101/2025.09.23.678054

[3] Kamimoto, K., Stringa, B., Hoffmann, C. M., Jindal, K., Solnica-Krezel, L., Morris, S. A., et al. (2023).
Dissecting cell identity via network inference and in silico gene perturbation.
Nature, 614, 742–751. doi:10.1038/s41586-022-05688-9

[4] Trimbour, R., Deutschmann I. M., Cantini L. (2024).
Molecular mechanisms reconstruction from single-cell multi-omics data with HuMMuS.
Bioinformatics, 40(5), btae143. doi:10.1093/bioinformatics/btae143

[5] Baptista, A., González, A., & Baudot, A. (2022).
Universal multilayer network exploration by random walk with restart.
Communications Physics, 5, 170. doi:10.1038/s42005-022-00937-9