.. note:: To download the tutorial data, use the following commands:
All tutorial data:
from recon.data import fetch_all_tutorial_data
fetch_all_tutorial_data(data_dir='./data')
Specific file (e.g., RNA data):
from recon.data import fetch_tutorial_data
fetch_tutorial_data('perturbation_tuto/rna.h5ad', data_dir='./data')
Visualizing Molecular Cascades with Sankey Diagrams¶
This tutorial demonstrates how to use ReCoN’s Sankey diagram functions to visualize molecular cascades across cell types. These visualizations help understand how signals propagate from ligands through receptors, transcription factors (TFs), and target genes.
What you will learn:
How to create intracellular cascades (Receptor → TF → Gene)
How to visualize ligand-receptor signaling (Ligand → Receptor → TF → Gene)
How to display full intercellular cascades with upstream regulators
Setup¶
import numpy as np
import scanpy as sc # single cell data
import pandas as pd # data manipulation
import liana as li # cell communication
import recon # multilayer and perturbation prediction
import recon.data
import recon.explore
import recon.plot.sankey_paths
# Load example scRNA-seq data
rna = sc.read_h5ad("./data/perturbation_tuto/rna.h5ad")
# Check available cell types
rna.obs["celltype"].unique().tolist()
['B_cell',
'ILC',
'Macrophage',
'MigDC',
'Monocyte',
'NK_cell',
'Neutrophil',
'T_cell_CD4',
'T_cell_CD8',
'T_cell_gd',
'Treg',
'cDC1',
'cDC2',
'eTAC',
'pDC']
Create ReCoN’s Multilayer Network¶
1. Import Gene Regulatory Network (GRN)¶
grn_path = "./data/perturbation_tuto/grn.csv"
grn = pd.read_csv(grn_path)
grn = grn.sort_values(by="weight", ascending=False)[:500_000]
grn["source"] = grn["source"].str.capitalize()
grn["source"] = grn["source"] + '_TF'
grn["target"] = grn["target"].str.capitalize()
grn.head(3)
| Unnamed: 0 | target | source | weight | |
|---|---|---|---|---|
| 0 | 0 | Pax5 | Mbd1_TF | 0.000095 |
| 2 | 2 | Pax5 | Smad5_TF | 0.000092 |
| 1 | 1 | Pax5 | Smad1_TF | 0.000092 |
2. Compute Cell-Cell Communication with LIANA+¶
li.method.cellphonedb(
rna,
resource_name="mouseconsensus",
expr_prop=0.00,
use_raw=False,
groupby="celltype",
verbose=True,
key_added='cpdb_res'
)
Using resource `mouseconsensus`.
Using `.X`!
/pasteur/appa/homes/rtrimbou/miniconda3/envs/snakemake/envs/recon-grn/lib/python3.10/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
15364 features of mat are empty, they will be removed.
Make sure that normalized counts are passed!
/pasteur/appa/homes/rtrimbou/miniconda3/envs/snakemake/envs/recon-grn/lib/python3.10/site-packages/liana/method/_pipe_utils/_pre.py:146: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
/pasteur/appa/homes/rtrimbou/miniconda3/envs/snakemake/envs/recon-grn/lib/python3.10/site-packages/liana/method/_pipe_utils/_pre.py:149: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
0.36 of entities in the resource are missing from the data.
Generating ligand-receptor stats for 1296 samples and 937 features
100%|██████████| 1000/1000 [00:04<00:00, 230.63it/s]
# Format cell communication network for ReCoN
ccc_network = rna.uns["cpdb_res"].copy()
ccc_network = ccc_network[["ligand", "receptor", "lr_means", "source", "target"]]
ccc_network = ccc_network.rename(columns={
"lr_means": "weight",
"source": "celltype_source",
"target": "celltype_target",
"ligand": "source",
"receptor": "target"
})
ccc_network = ccc_network[ccc_network['weight'] != 0]
ccc_network.head(3)
| source | target | weight | celltype_source | celltype_target | |
|---|---|---|---|---|---|
| 406685 | App | Cd74 | 102.485008 | cDC2 | cDC1 |
| 405645 | Copa | Cd74 | 102.370003 | cDC1 | cDC1 |
| 410237 | Copa | Cd74 | 102.366211 | eTAC | cDC1 |
3. Load Receptor-Gene Links¶
receptor_genes = recon.data.load_data.load_receptor_genes("mouse_receptor_gene_from_NichenetPKN")
genes = np.unique(grn['source'].tolist() + grn['target'].tolist())
receptor_genes = receptor_genes[receptor_genes['target'].isin(genes)]
receptor_genes.head()
| source | target | weight | |
|---|---|---|---|
| 2 | A1bg | Abca1 | 0.005156 |
| 3 | A1bg | Abcb1a | 0.005877 |
| 4 | A1bg | Abcb1b | 0.005877 |
| 7 | A1bg | Acsl1 | 0.005915 |
| 8 | A1bg | Adk | 0.005092 |
4. Define Seeds and Parameters¶
# Define seed genes (example: inflammatory signaling genes)
seed_genes = ["Nfkb1", "Tnf", "Il6", "Ccl2", "Cxcl10"]
# Define focal cell type
focal_celltype = "Macrophage"
# Add celltype suffix for seeds
seeds_with_suffix = {f"{g}::{focal_celltype}": 1.0 for g in seed_genes}
# Network parameters
cell_communication_graph_directed = False
cell_communication_graph_weighted = True
restart_proba = 0.6
grn_graph_weighted = True
grn_graph_directed = False
Assemble the Multicellular Network¶
celltypes = ["B_cell", "pDC", "Macrophage", "NK_cell", "T_cell_CD4", "T_cell_CD8"]
multicell = recon.explore.Multicell(
celltypes={celltype: recon.explore.Celltype(
grn_graph=grn,
receptor_grn_bipartite=receptor_genes,
celltype_name=celltype,
receptor_graph_directed=False,
receptor_graph_weighted=False,
grn_graph_directed=grn_graph_directed,
grn_graph_weighted=grn_graph_weighted,
receptor_grn_bipartite_graph_directed=False,
receptor_grn_bipartite_graph_weighted=True,
seeds=[]
) for celltype in celltypes},
cell_communication_graph=ccc_network.iloc[
ccc_network["celltype_source"].isin(celltypes).values &
ccc_network["celltype_target"].isin(celltypes).values, :
],
cell_communication_graph_directed=cell_communication_graph_directed,
cell_communication_graph_weighted=cell_communication_graph_weighted,
bipartite_grn_cell_communication_directed=False,
bipartite_grn_cell_communication_weighted=False,
bipartite_cell_communication_receptor_directed=False,
bipartite_cell_communication_receptor_weighted=False,
seeds=seeds_with_suffix,
)
/pasteur/helix/projects/ml4ig_hot/Users/rtrimbou/ReCoN/src/recon/explore/recon.py:122: UserWarning:
No receptor_graph provided,
an empty receptor graph will be created.
/pasteur/helix/projects/ml4ig_hot/Users/rtrimbou/ReCoN/src/recon/explore/recon.py:387: UserWarning: The celltypes dictionary was converted toa list of Celltype objects.
The keys of the dictionary will be the celltype names.
Set Lambda for Upstream Exploration¶
multicell.lamb = recon.explore.set_lambda(
multicell,
direction="upstream",
strategy="intercell",
)
Run Random Walk with Restart¶
# Run multicellular exploration
results = multicell.explore(restart_proba=restart_proba)
# Format results as gene profiles per cell type
cell_type_profiles = recon.explore.format_multicell_results(results, celltypes=celltypes)
Seeds are provided as a dictionary with weights per seed.
Creating a multixrank object with seeds as a dictionary.
cell_communication
receptor
gene
receptor
gene
receptor
gene
receptor
gene
receptor
gene
receptor
gene
Identifying produced ligands in response to the perturbation.
/pasteur/helix/projects/ml4ig_hot/Users/rtrimbou/ReCoN/src/recon/explore/recon.py:588: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
/pasteur/helix/projects/ml4ig_hot/Users/rtrimbou/ReCoN/src/recon/explore/recon.py:588: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
# View results
results.sort_values(by="score", ascending=False)
| multiplex | node | layer | score | |
|---|---|---|---|---|
| 5662 | Macrophage_grn | Nfkb1::Macrophage | gene | 0.200103 |
| 2043 | Macrophage_grn | Cxcl10::Macrophage | gene | 0.200019 |
| 9170 | Macrophage_grn | Tnf::Macrophage | gene | 0.200013 |
| 705 | Macrophage_receptor | fake_receptor::Macrophage | receptor | 0.049329 |
| 8302 | Macrophage_grn | Sp100_TF::Macrophage | gene | 0.008345 |
| ... | ... | ... | ... | ... |
| 1798 | cell_communication | Ptpra-T_cell_CD8 | cell_communication | 0.000000 |
| 1797 | cell_communication | Ptpra-T_cell_CD4 | cell_communication | 0.000000 |
| 1796 | cell_communication | Ptpra-NK_cell | cell_communication | 0.000000 |
| 617 | cell_communication | Clec1b-Macrophage | cell_communication | 0.000000 |
| 1531 | cell_communication | Mttp-B_cell | cell_communication | 0.000000 |
69791 rows × 4 columns
Visualize Molecular Cascades with Sankey Diagrams¶
ReCoN provides three types of Sankey diagrams to visualize molecular cascades:
Function |
Layers |
Use Case |
|---|---|---|
|
Receptor → TF → Gene |
Intracellular regulation only |
|
Ligand → Receptor → TF → Gene |
Cross-cell signaling (4 layers) |
|
Before-Receptor → Before-TF → Ligand → Receptor → TF → Gene |
Full cascade with upstream regulators (6 layers) |
1. Intracellular Cascade (Receptor → TF → Gene)¶
Shows regulation within a single cell type: how receptors connect to transcription factors that regulate the seed genes.
recon.plot.sankey_paths.plot_intracell_sankey(
multicell_obj=multicell,
results=results,
cell_type=focal_celltype,
seeds=seed_genes,
top_receptor_n=10, # Number of top receptors to include
top_tf_n=15, # Number of top TFs to include
flow="upstream", # Direction: receptor → TF → gene
save_path=None
)
2. Ligand-Receptor Cascade (Ligand → Receptor → TF → Gene)¶
Shows how ligands from other cell types activate receptors in the focal cell type, leading to TF activation and gene regulation.
# Define ligand source cell types (cells that send signals)
ligand_source_cells = [ct for ct in celltypes if ct != focal_celltype]
print(f"Ligand source cells: {ligand_source_cells}")
Ligand source cells: ['B_cell', 'pDC', 'NK_cell', 'T_cell_CD4', 'T_cell_CD8']
recon.plot.sankey_paths.plot_ligand_sankey(
multicell_obj=multicell,
results=results,
cell_type=focal_celltype,
seeds=seed_genes,
ligand_cells=ligand_source_cells,
top_ligand_n=20, # Number of top ligands to include
top_receptor_n=10, # Number of top receptors to include
top_tf_n=10, # Number of top TFs to include
per_celltype=True, # Select top ligands per source cell type (more balanced)
flow="upstream",
save_path=None
)
3. Full Intercellular Cascade (6 Layers)¶
Shows the complete regulatory cascade including:
Before-Receptors: Receptors in ligand-producing cells
Before-TFs: TFs that regulate ligand production
Ligands: Signaling molecules from other cell types
Receptors: Receptors in the focal cell type
TFs: Transcription factors in the focal cell type
Target genes: The seed genes of interest
Warning
Fewer cell types than expected?
The 6-layer cascade requires full connectivity across all layers. Only cell types where the chain Receptor→TF→Ligand is connected to your focal cell will appear. If you see fewer cell types than in plot_ligand_sankey, this is expected - it means some cell types lack the upstream regulatory paths (receptor→TF→ligand edges) in the prior knowledge network.
To debug, use verbose=True to see filtering details.
Tip
If layers become disconnected (e.g., before_tf_ligand has 0 rows), try increasing before_top_n or top_ligand_n to include more candidate edges.
recon.plot.sankey_paths.plot_intercell_sankey(
multicell_obj=multicell,
results=results,
cell_type=focal_celltype,
seeds=seed_genes,
ligand_cells=ligand_source_cells,
top_ligand_n=20, # Number of top ligands
top_receptor_n=50, # Number of top receptors in focal cell
top_tf_n=20, # Number of top TFs in focal cell
before_top_n=10, # Number of top regulators in ligand-producing cells
per_celltype=True,
flow="upstream",
verbose=True, # Show filtering details
save_path=None
)
[extract_receptor_ligand_pairs] === INPUT ===
Input ligands: 100
Examples: ['Ifng-NK_cell', 'Tnfsf12-NK_cell', 'Ifng-T_cell_CD4']
Input receptors: 50
Examples: ['Ifngr2_receptor::Macrophage', 'Cd40lg_receptor::Macrophage', 'Il18rap_receptor::Macrophage']
[extract_receptor_ligand_pairs] === RECEPTOR-LIGAND LAYER ===
Shape: (2442, 7)
Columns: ['ligand', 'receptor', 'receptor_clean', 'weight', 'celltype_source', 'celltype_target', 'network_key']
Sample rows (first 3):
ligand receptor receptor_clean weight celltype_source celltype_target network_key
Rps19::T_cell_CD8 C5ar1::Macrophage C5ar1_receptor::Macrophage 17.714996 T_cell_CD8 Macrophage cell_communication
Copa::pDC Cd74::Macrophage Cd74_receptor::Macrophage 17.494999 pDC Macrophage cell_communication
Copa::NK_cell Cd74::Macrophage Cd74_receptor::Macrophage 17.480000 NK_cell Macrophage cell_communication
[extract_receptor_ligand_pairs] === FILTERING RESULT ===
Filtered pairs: 72 rows
Sample filtered pairs (first 3):
ligand receptor weight
Grn::pDC Tnfrsf1b::Macrophage 3.165001
Cd28::T_cell_CD4 Cd80::Macrophage 0.360000
Cd28::T_cell_CD8 Cd80::Macrophage 0.325000
[extract_gene_tf_pairs] === INPUT ===
Input genes: 5 - ['Nfkb1::Macrophage', 'Tnf::Macrophage', 'Il6::Macrophage']
Input TFs: 20 - ['Sp100_TF::Macrophage', 'Cebpa_TF::Macrophage', 'Prdm11_TF::Macrophage']
[extract_gene_tf_pairs] === TF-GENE LAYER ===
Shape: (500000, 5)
Columns: ['Unnamed: 0', 'target', 'source', 'weight', 'network_key']
Unique sources (TFs): 360
Unique targets (genes): 10168
Sample rows (first 3):
Unnamed: 0 target source weight network_key
0 Pax5::Macrophage Mbd1_TF::Macrophage 0.000095 Macrophage_grn
2 Pax5::Macrophage Smad5_TF::Macrophage 0.000092 Macrophage_grn
1 Pax5::Macrophage Smad1_TF::Macrophage 0.000092 Macrophage_grn
[extract_gene_tf_pairs] === FILTERING RESULT ===
Filtered pairs: 47 rows
Sample filtered pairs (first 3):
source target weight
Prdm11_TF::Macrophage Nfkb1::Macrophage 0.000029
Max_TF::Macrophage Nfkb1::Macrophage 0.000013
Zfp691_TF::Macrophage Nfkb1::Macrophage 0.000012
[extract_receptor_tf_pairs] === INPUT ===
Input TFs: 20 - ['Sp100_TF::Macrophage', 'Cebpa_TF::Macrophage', 'Prdm11_TF::Macrophage']
Input TFs (cleaned): ['Sp100::Macrophage', 'Cebpa::Macrophage', 'Prdm11::Macrophage']
Input receptors: 50 - ['Ifngr2_receptor::Macrophage', 'Cd40lg_receptor::Macrophage', 'Il18rap_receptor::Macrophage']
[extract_receptor_tf_pairs] === BIPARTITE LAYER ===
Shape: (434183, 4)
Columns: ['col2', 'col1', 'weight', 'network_key']
Unique col1 (genes/TFs): 9892
Unique col2 (receptors): 705
Sample rows (first 3):
col2 col1 weight network_key
A1bg_receptor::Macrophage Abca1::Macrophage 0.005156 Macrophage_grn-Macrophage_receptor
A1bg_receptor::Macrophage Abcb1a::Macrophage 0.005877 Macrophage_grn-Macrophage_receptor
A1bg_receptor::Macrophage Abcb1b::Macrophage 0.005877 Macrophage_grn-Macrophage_receptor
[extract_receptor_tf_pairs] === FILTERING RESULT ===
Filtered pairs: 381 rows
Sample filtered pairs (first 3):
col1 col2 weight
Arid5b::Macrophage Adam15_receptor::Macrophage 0.005325
Bhlhe40::Macrophage Adam15_receptor::Macrophage 0.009077
Cebpa::Macrophage Adam15_receptor::Macrophage 0.005940
Saving Plots¶
All Sankey plot functions accept a save_path parameter to export the interactive HTML file:
# Save ligand cascade to HTML
recon.plot.sankey_paths.plot_ligand_sankey(
multicell_obj=multicell,
results=results,
cell_type=focal_celltype,
seeds=seed_genes,
ligand_cells=ligand_source_cells,
top_ligand_n=20,
top_receptor_n=20,
top_tf_n=15,
per_celltype=True,
flow="upstream",
save_path="macrophage_ligand_cascade.html"
)
Key Parameters Reference¶
Parameter |
Description |
|---|---|
|
Focal cell type receiving signals |
|
Target genes of interest (list of gene names without celltype suffix) |
|
Cell types that can produce ligands |
|
Number of top-scoring ligands to include |
|
Number of top-scoring receptors to include |
|
Number of top-scoring TFs to include |
|
Number of upstream regulators in ligand-producing cells (for 6-layer plot) |
|
If |
|
|
|
Path to save HTML file ( |
|
If |
Interpreting Sankey Diagrams¶
Node layers (left to right in upstream flow): Ligands → Receptors → TFs → Genes
Link thickness: Proportional to edge weights in the network
Colors: Different colors distinguish between source cell types for ligands
Interactive: Hover over nodes and links to see detailed information