.. note:: To download the tutorial data, use the following commands:

All tutorial data:

from recon.data import fetch_all_tutorial_data
fetch_all_tutorial_data(data_dir='./data')

Specific file (e.g., RNA data):

from recon.data import fetch_tutorial_data
fetch_tutorial_data('perturbation_tuto/rna.h5ad', data_dir='./data')

Visualizing Molecular Cascades with Sankey Diagrams

This tutorial demonstrates how to use ReCoN’s Sankey diagram functions to visualize molecular cascades across cell types. These visualizations help understand how signals propagate from ligands through receptors, transcription factors (TFs), and target genes.

What you will learn:

  • How to create intracellular cascades (Receptor → TF → Gene)

  • How to visualize ligand-receptor signaling (Ligand → Receptor → TF → Gene)

  • How to display full intercellular cascades with upstream regulators

Setup

import numpy as np
import scanpy as sc  # single cell data
import pandas as pd  # data manipulation
import liana as li  # cell communication
import recon  # multilayer and perturbation prediction
import recon.data
import recon.explore
import recon.plot.sankey_paths
# Load example scRNA-seq data
rna = sc.read_h5ad("./data/perturbation_tuto/rna.h5ad")
# Check available cell types
rna.obs["celltype"].unique().tolist()
['B_cell',
 'ILC',
 'Macrophage',
 'MigDC',
 'Monocyte',
 'NK_cell',
 'Neutrophil',
 'T_cell_CD4',
 'T_cell_CD8',
 'T_cell_gd',
 'Treg',
 'cDC1',
 'cDC2',
 'eTAC',
 'pDC']

Create ReCoN’s Multilayer Network

1. Import Gene Regulatory Network (GRN)

grn_path = "./data/perturbation_tuto/grn.csv"
grn = pd.read_csv(grn_path)
grn = grn.sort_values(by="weight", ascending=False)[:500_000]
grn["source"] = grn["source"].str.capitalize()
grn["source"] = grn["source"] + '_TF'
grn["target"] = grn["target"].str.capitalize()
grn.head(3)
Unnamed: 0 target source weight
0 0 Pax5 Mbd1_TF 0.000095
2 2 Pax5 Smad5_TF 0.000092
1 1 Pax5 Smad1_TF 0.000092

2. Compute Cell-Cell Communication with LIANA+

li.method.cellphonedb(
    rna, 
    resource_name="mouseconsensus",
    expr_prop=0.00,
    use_raw=False,
    groupby="celltype",
    verbose=True, 
    key_added='cpdb_res'
)
Using resource `mouseconsensus`.
Using `.X`!
/pasteur/appa/homes/rtrimbou/miniconda3/envs/snakemake/envs/recon-grn/lib/python3.10/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
15364 features of mat are empty, they will be removed.
Make sure that normalized counts are passed!
/pasteur/appa/homes/rtrimbou/miniconda3/envs/snakemake/envs/recon-grn/lib/python3.10/site-packages/liana/method/_pipe_utils/_pre.py:146: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
/pasteur/appa/homes/rtrimbou/miniconda3/envs/snakemake/envs/recon-grn/lib/python3.10/site-packages/liana/method/_pipe_utils/_pre.py:149: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
0.36 of entities in the resource are missing from the data.
Generating ligand-receptor stats for 1296 samples and 937 features
100%|██████████| 1000/1000 [00:04<00:00, 230.63it/s]
# Format cell communication network for ReCoN
ccc_network = rna.uns["cpdb_res"].copy()
ccc_network = ccc_network[["ligand", "receptor", "lr_means", "source", "target"]]
ccc_network = ccc_network.rename(columns={
    "lr_means": "weight",
    "source": "celltype_source",
    "target": "celltype_target",
    "ligand": "source",
    "receptor": "target"
})
ccc_network = ccc_network[ccc_network['weight'] != 0]
ccc_network.head(3)
source target weight celltype_source celltype_target
406685 App Cd74 102.485008 cDC2 cDC1
405645 Copa Cd74 102.370003 cDC1 cDC1
410237 Copa Cd74 102.366211 eTAC cDC1

4. Define Seeds and Parameters

# Define seed genes (example: inflammatory signaling genes)
seed_genes = ["Nfkb1", "Tnf", "Il6", "Ccl2", "Cxcl10"]

# Define focal cell type
focal_celltype = "Macrophage"

# Add celltype suffix for seeds
seeds_with_suffix = {f"{g}::{focal_celltype}": 1.0 for g in seed_genes}
# Network parameters
cell_communication_graph_directed = False
cell_communication_graph_weighted = True
restart_proba = 0.6
grn_graph_weighted = True
grn_graph_directed = False

Assemble the Multicellular Network

celltypes = ["B_cell", "pDC", "Macrophage", "NK_cell", "T_cell_CD4", "T_cell_CD8"]

multicell = recon.explore.Multicell(
    celltypes={celltype: recon.explore.Celltype(
        grn_graph=grn,
        receptor_grn_bipartite=receptor_genes,
        celltype_name=celltype,
        receptor_graph_directed=False,
        receptor_graph_weighted=False,
        grn_graph_directed=grn_graph_directed,
        grn_graph_weighted=grn_graph_weighted,
        receptor_grn_bipartite_graph_directed=False,
        receptor_grn_bipartite_graph_weighted=True,
        seeds=[]
    ) for celltype in celltypes},
    cell_communication_graph=ccc_network.iloc[
        ccc_network["celltype_source"].isin(celltypes).values & 
        ccc_network["celltype_target"].isin(celltypes).values, :
    ],
    cell_communication_graph_directed=cell_communication_graph_directed,
    cell_communication_graph_weighted=cell_communication_graph_weighted,
    bipartite_grn_cell_communication_directed=False,
    bipartite_grn_cell_communication_weighted=False,
    bipartite_cell_communication_receptor_directed=False,
    bipartite_cell_communication_receptor_weighted=False,
    seeds=seeds_with_suffix,
)
/pasteur/helix/projects/ml4ig_hot/Users/rtrimbou/ReCoN/src/recon/explore/recon.py:122: UserWarning: 
                No receptor_graph provided,
                an empty receptor graph will be created.
                
/pasteur/helix/projects/ml4ig_hot/Users/rtrimbou/ReCoN/src/recon/explore/recon.py:387: UserWarning: The celltypes dictionary was converted toa list of Celltype objects.
The keys of the dictionary will be the celltype names.

Set Lambda for Upstream Exploration

multicell.lamb = recon.explore.set_lambda(
    multicell,
    direction="upstream",
    strategy="intercell",
)

Run Random Walk with Restart

# Run multicellular exploration
results = multicell.explore(restart_proba=restart_proba)

# Format results as gene profiles per cell type
cell_type_profiles = recon.explore.format_multicell_results(results, celltypes=celltypes)
Seeds are provided as a dictionary with weights per seed.
Creating a multixrank object with seeds as a dictionary.
cell_communication
receptor
gene
receptor
gene
receptor
gene
receptor
gene
receptor
gene
receptor
gene
Identifying produced ligands in response to the perturbation.
/pasteur/helix/projects/ml4ig_hot/Users/rtrimbou/ReCoN/src/recon/explore/recon.py:588: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
/pasteur/helix/projects/ml4ig_hot/Users/rtrimbou/ReCoN/src/recon/explore/recon.py:588: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
# View results
results.sort_values(by="score", ascending=False)
multiplex node layer score
5662 Macrophage_grn Nfkb1::Macrophage gene 0.200103
2043 Macrophage_grn Cxcl10::Macrophage gene 0.200019
9170 Macrophage_grn Tnf::Macrophage gene 0.200013
705 Macrophage_receptor fake_receptor::Macrophage receptor 0.049329
8302 Macrophage_grn Sp100_TF::Macrophage gene 0.008345
... ... ... ... ...
1798 cell_communication Ptpra-T_cell_CD8 cell_communication 0.000000
1797 cell_communication Ptpra-T_cell_CD4 cell_communication 0.000000
1796 cell_communication Ptpra-NK_cell cell_communication 0.000000
617 cell_communication Clec1b-Macrophage cell_communication 0.000000
1531 cell_communication Mttp-B_cell cell_communication 0.000000

69791 rows × 4 columns


Visualize Molecular Cascades with Sankey Diagrams

ReCoN provides three types of Sankey diagrams to visualize molecular cascades:

Function

Layers

Use Case

plot_intracell_sankey

Receptor → TF → Gene

Intracellular regulation only

plot_ligand_sankey

Ligand → Receptor → TF → Gene

Cross-cell signaling (4 layers)

plot_intercell_sankey

Before-Receptor → Before-TF → Ligand → Receptor → TF → Gene

Full cascade with upstream regulators (6 layers)

1. Intracellular Cascade (Receptor → TF → Gene)

Shows regulation within a single cell type: how receptors connect to transcription factors that regulate the seed genes.

recon.plot.sankey_paths.plot_intracell_sankey(
    multicell_obj=multicell,
    results=results,
    cell_type=focal_celltype,
    seeds=seed_genes,
    top_receptor_n=10,  # Number of top receptors to include
    top_tf_n=15,        # Number of top TFs to include
    flow="upstream",    # Direction: receptor → TF → gene
    save_path=None
)

2. Ligand-Receptor Cascade (Ligand → Receptor → TF → Gene)

Shows how ligands from other cell types activate receptors in the focal cell type, leading to TF activation and gene regulation.

# Define ligand source cell types (cells that send signals)
ligand_source_cells = [ct for ct in celltypes if ct != focal_celltype]
print(f"Ligand source cells: {ligand_source_cells}")
Ligand source cells: ['B_cell', 'pDC', 'NK_cell', 'T_cell_CD4', 'T_cell_CD8']
recon.plot.sankey_paths.plot_ligand_sankey(
    multicell_obj=multicell,
    results=results,
    cell_type=focal_celltype,
    seeds=seed_genes,
    ligand_cells=ligand_source_cells,
    top_ligand_n=20,     # Number of top ligands to include
    top_receptor_n=10,   # Number of top receptors to include
    top_tf_n=10,         # Number of top TFs to include
    per_celltype=True,   # Select top ligands per source cell type (more balanced)
    flow="upstream",
    save_path=None
)

3. Full Intercellular Cascade (6 Layers)

Shows the complete regulatory cascade including:

  1. Before-Receptors: Receptors in ligand-producing cells

  2. Before-TFs: TFs that regulate ligand production

  3. Ligands: Signaling molecules from other cell types

  4. Receptors: Receptors in the focal cell type

  5. TFs: Transcription factors in the focal cell type

  6. Target genes: The seed genes of interest

Warning

Fewer cell types than expected?

The 6-layer cascade requires full connectivity across all layers. Only cell types where the chain Receptor→TF→Ligand is connected to your focal cell will appear. If you see fewer cell types than in plot_ligand_sankey, this is expected - it means some cell types lack the upstream regulatory paths (receptor→TF→ligand edges) in the prior knowledge network.

To debug, use verbose=True to see filtering details.

Tip

If layers become disconnected (e.g., before_tf_ligand has 0 rows), try increasing before_top_n or top_ligand_n to include more candidate edges.

recon.plot.sankey_paths.plot_intercell_sankey(
    multicell_obj=multicell,
    results=results,
    cell_type=focal_celltype,
    seeds=seed_genes,
    ligand_cells=ligand_source_cells,
    top_ligand_n=20,     # Number of top ligands
    top_receptor_n=50,    # Number of top receptors in focal cell
    top_tf_n=20,          # Number of top TFs in focal cell
    before_top_n=10,      # Number of top regulators in ligand-producing cells
    per_celltype=True,
    flow="upstream",
    verbose=True,         # Show filtering details
    save_path=None
)
[extract_receptor_ligand_pairs] === INPUT ===
  Input ligands: 100
    Examples: ['Ifng-NK_cell', 'Tnfsf12-NK_cell', 'Ifng-T_cell_CD4']
  Input receptors: 50
    Examples: ['Ifngr2_receptor::Macrophage', 'Cd40lg_receptor::Macrophage', 'Il18rap_receptor::Macrophage']

[extract_receptor_ligand_pairs] === RECEPTOR-LIGAND LAYER ===
  Shape: (2442, 7)
  Columns: ['ligand', 'receptor', 'receptor_clean', 'weight', 'celltype_source', 'celltype_target', 'network_key']
  Sample rows (first 3):
           ligand          receptor             receptor_clean    weight celltype_source celltype_target        network_key
Rps19::T_cell_CD8 C5ar1::Macrophage C5ar1_receptor::Macrophage 17.714996      T_cell_CD8      Macrophage cell_communication
        Copa::pDC  Cd74::Macrophage  Cd74_receptor::Macrophage 17.494999             pDC      Macrophage cell_communication
    Copa::NK_cell  Cd74::Macrophage  Cd74_receptor::Macrophage 17.480000         NK_cell      Macrophage cell_communication

[extract_receptor_ligand_pairs] === FILTERING RESULT ===
  Filtered pairs: 72 rows
  Sample filtered pairs (first 3):
          ligand             receptor   weight
        Grn::pDC Tnfrsf1b::Macrophage 3.165001
Cd28::T_cell_CD4     Cd80::Macrophage 0.360000
Cd28::T_cell_CD8     Cd80::Macrophage 0.325000

[extract_gene_tf_pairs] === INPUT ===
  Input genes: 5 - ['Nfkb1::Macrophage', 'Tnf::Macrophage', 'Il6::Macrophage']
  Input TFs: 20 - ['Sp100_TF::Macrophage', 'Cebpa_TF::Macrophage', 'Prdm11_TF::Macrophage']

[extract_gene_tf_pairs] === TF-GENE LAYER ===
  Shape: (500000, 5)
  Columns: ['Unnamed: 0', 'target', 'source', 'weight', 'network_key']
  Unique sources (TFs): 360
  Unique targets (genes): 10168
  Sample rows (first 3):
 Unnamed: 0           target               source   weight    network_key
          0 Pax5::Macrophage  Mbd1_TF::Macrophage 0.000095 Macrophage_grn
          2 Pax5::Macrophage Smad5_TF::Macrophage 0.000092 Macrophage_grn
          1 Pax5::Macrophage Smad1_TF::Macrophage 0.000092 Macrophage_grn

[extract_gene_tf_pairs] === FILTERING RESULT ===
  Filtered pairs: 47 rows
  Sample filtered pairs (first 3):
               source            target   weight
Prdm11_TF::Macrophage Nfkb1::Macrophage 0.000029
   Max_TF::Macrophage Nfkb1::Macrophage 0.000013
Zfp691_TF::Macrophage Nfkb1::Macrophage 0.000012

[extract_receptor_tf_pairs] === INPUT ===
  Input TFs: 20 - ['Sp100_TF::Macrophage', 'Cebpa_TF::Macrophage', 'Prdm11_TF::Macrophage']
  Input TFs (cleaned): ['Sp100::Macrophage', 'Cebpa::Macrophage', 'Prdm11::Macrophage']
  Input receptors: 50 - ['Ifngr2_receptor::Macrophage', 'Cd40lg_receptor::Macrophage', 'Il18rap_receptor::Macrophage']

[extract_receptor_tf_pairs] === BIPARTITE LAYER ===
  Shape: (434183, 4)
  Columns: ['col2', 'col1', 'weight', 'network_key']
  Unique col1 (genes/TFs): 9892
  Unique col2 (receptors): 705
  Sample rows (first 3):
                     col2               col1   weight                        network_key
A1bg_receptor::Macrophage  Abca1::Macrophage 0.005156 Macrophage_grn-Macrophage_receptor
A1bg_receptor::Macrophage Abcb1a::Macrophage 0.005877 Macrophage_grn-Macrophage_receptor
A1bg_receptor::Macrophage Abcb1b::Macrophage 0.005877 Macrophage_grn-Macrophage_receptor

[extract_receptor_tf_pairs] === FILTERING RESULT ===
  Filtered pairs: 381 rows
  Sample filtered pairs (first 3):
               col1                        col2   weight
 Arid5b::Macrophage Adam15_receptor::Macrophage 0.005325
Bhlhe40::Macrophage Adam15_receptor::Macrophage 0.009077
  Cebpa::Macrophage Adam15_receptor::Macrophage 0.005940

Saving Plots

All Sankey plot functions accept a save_path parameter to export the interactive HTML file:

# Save ligand cascade to HTML
recon.plot.sankey_paths.plot_ligand_sankey(
    multicell_obj=multicell,
    results=results,
    cell_type=focal_celltype,
    seeds=seed_genes,
    ligand_cells=ligand_source_cells,
    top_ligand_n=20,
    top_receptor_n=20,
    top_tf_n=15,
    per_celltype=True,
    flow="upstream",
    save_path="macrophage_ligand_cascade.html"
)

Key Parameters Reference

Parameter

Description

cell_type

Focal cell type receiving signals

seeds

Target genes of interest (list of gene names without celltype suffix)

ligand_cells

Cell types that can produce ligands

top_ligand_n

Number of top-scoring ligands to include

top_receptor_n

Number of top-scoring receptors to include

top_tf_n

Number of top-scoring TFs to include

before_top_n

Number of upstream regulators in ligand-producing cells (for 6-layer plot)

per_celltype

If True, select top ligands per source cell type (more balanced view)

flow

"upstream" (ligand→gene) or "downstream" (gene→ligand)

save_path

Path to save HTML file (None = display only)

verbose

If True, print debugging information about layer sizes

Interpreting Sankey Diagrams

  • Node layers (left to right in upstream flow): Ligands → Receptors → TFs → Genes

  • Link thickness: Proportional to edge weights in the network

  • Colors: Different colors distinguish between source cell types for ligands

  • Interactive: Hover over nodes and links to see detailed information