{ "cells": [ { "cell_type": "markdown", "id": "4bee5c8e", "metadata": {}, "source": [ ".. note::\n", " To download the tutorial data, use the following commands:\n", "\n", " **All tutorial data:**\n", " ```python\n", " from recon.data import fetch_all_tutorial_data\n", " fetch_all_tutorial_data(data_dir='./data')\n", " ```\n", "\n", " **Specific file (e.g., RNA data):**\n", " ```python\n", " from recon.data import fetch_tutorial_data\n", " fetch_tutorial_data('perturbation_tuto/rna.h5ad', data_dir='./data')\n", " ```" ] }, { "cell_type": "markdown", "id": "1e98cdcd", "metadata": {}, "source": [ "# Inferring Gene Regulatory Networks with HuMMuS\n", "\n", "This tutorial demonstrates how to use ReCoN to infer **Gene Regulatory Networks (GRNs)** from paired single-cell RNA-seq and ATAC-seq data using the HuMMuS methodology.\n", "\n", "## What you will learn\n", "\n", "1. How to compute individual network layers (RNA, ATAC, TF)\n", "2. How to link layers via bipartite networks (TF→ATAC, ATAC→RNA)\n", "3. How to integrate everything into a multilayer GRN using Random Walk with Restart\n", "\n", "```{warning}\n", "**Python 3.10 required!**\n", "\n", "GRN inference requires a Python 3.10 environment due to CellOracle and gimmemotifs dependencies.\n", "See the [Installation guide](../recon_explained/get_ready.rst) for setup instructions.\n", "```" ] }, { "cell_type": "markdown", "id": "5dc95caa", "metadata": {}, "source": [ "---\n", "\n", "## Prerequisites\n", "\n", "```{note}\n", "**Required packages**\n", "\n", "- `bedtools` - for ATAC-seq peak processing (`conda install -c bioconda bedtools`)\n", "- `muon` - for multi-omics data handling (`pip install muon`)\n", "- `circe-py` - for ATAC co-accessibility networks (included with ReCoN)\n", "```\n", "\n", "### Check bedtools installation" ] }, { "cell_type": "code", "execution_count": 1, "id": "a470e4e2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/remitrimbour/ReCoN_project/ReCoN\n" ] } ], "source": [ "cd ../../../" ] }, { "cell_type": "code", "execution_count": 2, "id": "30efc2b5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/remitrimbour/miniconda3/envs/recon/bin/bedtools\n", "bedtools v2.31.1\n" ] } ], "source": [ "import os\n", "import shutil\n", "\n", "# Check if bedtools is available\n", "if shutil.which(\"bedtools\") is None:\n", " # Try to find bedtools in common conda locations\n", " conda_prefix = os.environ.get(\"CONDA_PREFIX\", \"\")\n", " possible_paths = [\n", " f\"{conda_prefix}/bin\",\n", " os.path.expanduser(\"~/miniconda3/bin\"),\n", " os.path.expanduser(\"~/anaconda3/bin\"),\n", " \"/usr/local/bin\",\n", " ]\n", " for path in possible_paths:\n", " bedtools_path = os.path.join(path, \"bedtools\")\n", " if os.path.exists(bedtools_path):\n", " os.environ[\"PATH\"] = path + \":\" + os.environ[\"PATH\"]\n", " print(f\"Added {path} to PATH\")\n", " break\n", " \n", "# Verify bedtools is now available\n", "!which bedtools && bedtools --version" ] }, { "cell_type": "markdown", "id": "ff47b089", "metadata": {}, "source": [ "### Import libraries" ] }, { "cell_type": "code", "execution_count": 3, "id": "8c22f5ea", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n", "Matplotlib is building the font cache; this may take a moment.\n", "Using celloracle-lite v0.22.0 (lightweight fork for ReCoN/HuMMuS)\n", "2026-01-14 14:15:45,106 - INFO - Using celloracle-lite v0.22.0 (lightweight fork for ReCoN/HuMMuS)\n", "To install the original CellOracle, please check: https://github.com/morris-lab/CellOracle\n", "2026-01-14 14:15:45,109 - INFO - To install the original CellOracle, please check: https://github.com/morris-lab/CellOracle\n" ] } ], "source": [ "\n", "import hummuspy.loader # HuMMuS multilayer network utilities\n", "import circe as ci # ATAC-seq co-accessibility networks\n", "import recon # ReCoN GRN inference\n", "import recon.infer_grn # ReCoN GRN inference" ] }, { "cell_type": "code", "execution_count": 4, "id": "47ea9f71", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/scanpy/__init__.py:96: FutureWarning: `__version__` is deprecated, use `importlib.metadata.version('scanpy')` instead\n", " warnings.warn(msg, FutureWarning, stacklevel=2)\n" ] } ], "source": [ "import muon as mu # Multi-omics data handling\n", "import scanpy as sc # Single-cell analysis" ] }, { "cell_type": "markdown", "id": "2da84c94", "metadata": {}, "source": [ "---\n", "\n", "## Load Multi-omics Data\n", "\n", "We use paired scRNA-seq and scATAC-seq data stored in a MuData object.\n", "\n", "```{tip}\n", "The example dataset (PBMC 10X) can be downloaded from the [CIRCE benchmark repository](https://github.com/cantinilab/circe_benchmark).\n", "```" ] }, { "cell_type": "code", "execution_count": 5, "id": "dddd89ba", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "RNA: (9631, 18410), ATAC: (9631, 215676)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/mudata/_core/mudata.py:591: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.\n", " warnings.warn(\n", "/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/mudata/_core/mudata.py:591: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.\n", " warnings.warn(\n" ] } ], "source": [ "# Load paired RNA + ATAC data\n", "mudata = mu.read_h5mu(\"./data/build_grn_tuto/pbmc10x.h5mu\")\n", "print(f\"RNA: {mudata.mod['rna'].shape}, ATAC: {mudata.mod['atac'].shape}\")" ] }, { "cell_type": "code", "execution_count": 6, "id": "53529504", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Subset - RNA: (5000, 1000), ATAC: (5000, 5000)\n" ] } ], "source": [ "# Subset for faster computation (remove for full analysis)\n", "rna = mudata.mod['rna'][:5000, :1000] # First 1000 genes\n", "atac = mudata.mod['atac'][:5000, :5000] # First 5000 peaks\n", "print(f\"Subset - RNA: {rna.shape}, ATAC: {atac.shape}\")" ] }, { "cell_type": "markdown", "id": "172ee80b", "metadata": {}, "source": [ "---\n", "\n", "## Build the Multilayer GRN\n", "\n", "The HuMMuS methodology builds a GRN by integrating multiple layers:\n", "\n", "| Layer | Description | Method |\n", "|-------|-------------|--------|\n", "| **RNA layer** | Gene co-regulation patterns (which genes vary together) | GRNBoost2 (Arboreto) |\n", "| **ATAC layer** | Peak co-accessibility (which regulatory regions are co-active) | CIRCE |\n", "| **TF layer** | TF-TF co-expression | Correlation |\n", "| **TF → ATAC** | Where TFs bind DNA (motif-based) | CellOracle motif scanning |\n", "| **ATAC → RNA** | Which peaks regulate which genes | CIRCE peak-gene links |\n", "\n", "```{note}\n", "**Why multiple layers?**\n", "\n", "GRNBoost2 alone infers statistical associations between TFs and genes, but these can include indirect relationships. By integrating chromatin accessibility (ATAC), we add mechanistic evidence: a TF can only regulate a gene if (1) it binds nearby accessible DNA and (2) that region is linked to the gene. The multilayer RWR combines both direct (TF→gene) and indirect (TF→ATAC→gene) evidence for more robust GRN inference.\n", "```\n", "\n", "### Step 1: Load TF list" ] }, { "cell_type": "code", "execution_count": 7, "id": "d096d99a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loaded 914 transcription factors\n" ] } ], "source": [ "# Load curated list of human transcription factors\n", "tfs_list = hummuspy.loader.load_tfs(\"human_tfs_r_hummus\")\n", "print(f\"Loaded {len(tfs_list)} transcription factors\")" ] }, { "cell_type": "markdown", "id": "624550f0", "metadata": {}, "source": [ "### Step 2: Compute RNA network (gene co-regulation)\n", "\n", "Uses GRNBoost2 to identify genes that co-vary with each TF. This captures statistical associations that may include both direct targets and genes in the same regulatory programs.\n", "\n", "```{warning}\n", "GRNBoost2 edges are used to represent **co-regulated** genes, not final TF-gene regulation. The multilayer integration in the final step will leverage these by requiring chromatin accessibility evidence.\n", "```" ] }, { "cell_type": "code", "execution_count": 8, "id": "e8f63427", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Calculating TF-to-gene importance\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Running using 15 cores: 100%|██████████| 1000/1000 [00:25<00:00, 39.75it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "RNA network: 25881 edges\n" ] } ], "source": [ "# Compute TF → gene network using GRNBoost2\n", "rna_network = recon.infer_grn.compute_rna_network(rna, tf_names=tfs_list, n_cpu=15)\n", "print(f\"RNA network: {len(rna_network)} edges\")" ] }, { "cell_type": "markdown", "id": "ac25be81", "metadata": {}, "source": [ "### Step 3: Compute ATAC network (peak co-accessibility)\n", "\n", "Uses CIRCE to compute co-accessibility between ATAC-seq peaks." ] }, { "cell_type": "code", "execution_count": 9, "id": "11fab776", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/gimmemotifs/__init__.py:30: UserWarning: \n", " No organism, nor value passed for the parameters: ['window_size', 'distance_constraint', 's'],\n", " using default values.\n", " The default values are defined from human and mouse data,\n", " you might want to change them if you are working with\n", " another organisms.\n", "\n", " Default values used:\n", " {'window_size': 500000, 'distance_constraint': 250000, 's': 0.75}\n", "\n", " You can check how to define them in https://cole-trapnell-lab.github.io/cicero-release/docs_m3/#important-considerations-for-non-human-data.\n", " \n", " _warn(*args, **kwargs)\n" ] }, { "data": { "text/html": [ "
/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/gimmemotifs/__init__.py:30: UserWarning: \n",
       "install \"ipywidgets\" for Jupyter support\n",
       "  _warn(*args, **kwargs)\n",
       "
\n" ], "text/plain": [ "/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/gimmemotifs/__init__.py:30: UserWarning: \n", "install \"ipywidgets\" for Jupyter support\n", " _warn(*args, **kwargs)\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "
\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Calculating co-accessibility scores...\n",
      "Concatenating results from all chromosomes...\n",
      "ATAC network: 172271 edges\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/anndata/_core/aligned_mapping.py:156: ImplicitModificationWarning: Setting element `.varp['atac_network']` of view, initializing view as actual.\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "# Prepare ATAC peak names (replace special characters)\n",
    "atac.var_names = atac.var_names.str.replace(\"-\", \"_\")\n",
    "atac.var_names = atac.var_names.str.replace(\":\", \"_\")\n",
    "\n",
    "# Add genomic region information\n",
    "atac = ci.add_region_infos(atac)\n",
    "\n",
    "# Compute peak co-accessibility network\n",
    "ci.compute_atac_network(atac, njobs=20)\n",
    "\n",
    "# Extract and format the network\n",
    "atac_network = ci.extract_atac_links(atac, key='atac_network')\n",
    "atac_network = atac_network.rename(columns={\n",
    "    \"Peak1\": \"source\",\n",
    "    \"Peak2\": \"target\",\n",
    "    \"score\": \"weight\"\n",
    "})\n",
    "print(f\"ATAC network: {len(atac_network)} edges\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "029902a8",
   "metadata": {},
   "source": [
    "### Step 4: Compute TF network (TF-TF interactions)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "4358ea03",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TF network: 29 edges\n"
     ]
    }
   ],
   "source": [
    "# Compute TF-TF correlation network\n",
    "tf_network = recon.infer_grn.compute_tf_network(rna, tfs_list=tfs_list)\n",
    "print(f\"TF network: {len(tf_network)} edges\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "db25db56",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sourcetarget
0TFAP2E_TFfake_TF
1HES2_TFfake_TF
2ID3_TFfake_TF
\n", "
" ], "text/plain": [ " source target\n", "0 TFAP2E_TF fake_TF\n", "1 HES2_TF fake_TF\n", "2 ID3_TF fake_TF" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tf_network.head(3)" ] }, { "cell_type": "markdown", "id": "f1aaa195", "metadata": {}, "source": [ "### Step 5: Compute bipartite links\n", "\n", "Now we connect the layers:\n", "- **ATAC → RNA**: Which peaks regulate which genes (based on proximity and correlation)\n", "- **TF → ATAC**: Where TFs bind DNA (based on motif scanning)\n", "\n", "```{note}\n", "These steps require a **reference genome**. Make sure to install it first:\n", "```python\n", "import genomepy\n", "genomepy.install_genome(\"hg19\", \"UCSC\", annotation=True)\n", "```\n", "```" ] }, { "cell_type": "code", "execution_count": 12, "id": "4c9c9216", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "que bed peaks: 5000\n", "tss peaks in que: 180\n", "ATAC→RNA links: 130 edges\n" ] } ], "source": [ "# ATAC → RNA: Peak-gene links based on proximity and correlation\n", "atac_rna_links = recon.infer_grn.compute_atac_to_rna_links(atac, rna, ref_genome=\"hg19\")\n", "print(f\"ATAC→RNA links: {len(atac_rna_links)} edges\")" ] }, { "cell_type": "code", "execution_count": 13, "id": "145fda2f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Peaks before filtering: 5000\n", "Peaks with invalid chr_name: 0\n", "Peaks with invalid length: 0\n", "Peaks after filtering: 5000\n", "No motif data entered. Loading default motifs for your species ...\n", " Default motif for vertebrate: gimme.vertebrate.v5.0. \n", " For more information, please see https://gimmemotifs.readthedocs.io/en/master/overview.html \n", "\n", "Initiating scanner... \n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2026-01-14 14:17:58,331 - DEBUG - using background: genome mm10 with size 200\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Calculating FPR-based threshold. This step may take substantial time when you load a new ref-genome. It will be done quicker on the second time. \n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2026-01-14 14:18:00,535 - DEBUG - determining FPR-based threshold\n", "Matplotlib is building the font cache; this may take a moment.\n", "Matplotlib is building the font cache; this may take a moment.\n", "Matplotlib is building the font cache; this may take a moment.\n", "Matplotlib is building the font cache; this may take a moment.\n", "Matplotlib is building the font cache; this may take a moment.\n", "Matplotlib is building the font cache; this may take a moment.\n", "Matplotlib is building the font cache; this may take a moment.\n", "Matplotlib is building the font cache; this may take a moment.\n", "Matplotlib is building the font cache; this may take a moment.\n", "Matplotlib is building the font cache; this may take a moment.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Motif scan started .. It may take long time.\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Scanning: 0%| | 0/5000 [00:00 100930\n", "1. Converting scanned results into one-hot encoded dataframe.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 4548/4548 [00:04<00:00, 1093.61it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "2. Converting results into dictionaries.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 1/1 [00:00<00:00, 26.58it/s]\n", "100%|██████████| 1093/1093 [00:00<00:00, 20999.38it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "TF→ATAC links: 2253638 edges\n" ] } ], "source": [ "# TF → ATAC: Motif scanning to find TF binding sites\n", "tf_atac_links = recon.infer_grn.compute_tf_to_atac_links(atac, ref_genome=\"mm10\", n_cpus=30)\n", "print(f\"TF→ATAC links: {len(tf_atac_links)} edges\")" ] }, { "cell_type": "code", "execution_count": 14, "id": "b2b63e5a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Filtered TF→ATAC links: 1053944 edges\n" ] } ], "source": [ "# Filter to keep only links from known TFs\n", "tf_atac_links = tf_atac_links[tf_atac_links['source'].str.replace(\"_TF\", \"\").isin(tfs_list)]\n", "print(f\"Filtered TF→ATAC links: {len(tf_atac_links)} edges\")" ] }, { "cell_type": "code", "execution_count": 15, "id": "bd2ab3eb", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sourcetarget
5GATA3_TFchr1_10016349_10016849
6GATA2_TFchr1_10016349_10016849
7RORC_TFchr1_10016349_10016849
\n", "
" ], "text/plain": [ " source target\n", "5 GATA3_TF chr1_10016349_10016849\n", "6 GATA2_TF chr1_10016349_10016849\n", "7 RORC_TF chr1_10016349_10016849" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tf_atac_links.head(3)" ] }, { "cell_type": "markdown", "id": "69f5cf77", "metadata": {}, "source": [ "---\n", "\n", "## Integrate into Final GRN\n", "\n", "Now we combine all layers using Random Walk with Restart (RWR) to propagate information across the multilayer network.\n", "\n", "```{tip}\n", "The RWR integration weighs indirect evidence (TF → ATAC → RNA) alongside direct evidence (TF → RNA), improving GRN quality compared to single-layer approaches.\n", "```" ] }, { "cell_type": "code", "execution_count": 18, "id": "dcc77be3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TF_0\n", "ATAC_0\n", "RNA_0\n", "Initializing Dask cluster with 1 workers...\n", "http://192.168.1.150:8787/status\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Running RWR per seed: 100%|██████████| 29/29 [00:01<00:00, 20.07it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Building Dask DataFrame graph with delayed parallel tasks...\n", "Final GRN: 2929 TF → gene edges\n" ] } ], "source": [ "# Generate integrated GRN via RWR across all layers\n", "grn = recon.infer_grn.generate_grn(\n", " rna_network=rna_network,\n", " atac_network=atac_network,\n", " tf_network=tf_network,\n", " atac_to_rna_links=atac_rna_links,\n", " tf_to_atac_links=tf_atac_links,\n", " n_jobs=1\n", ")\n", "\n", "# Keep only RNA layer (TF → gene relationships)\n", "grn = grn[grn.layer == 'RNA']\n", "print(f\"Final GRN: {len(grn)} TF → gene edges\")" ] }, { "cell_type": "code", "execution_count": 19, "id": "ae23cde1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
indexlayertargetpath_layerscoreseed
10755618RNAPADI6RNA_00.000109GFI1_TF
11123156RNACLCN6RNA_00.000098HIVEP3_TF
11181809RNASNRNP40RNA_00.000097HIVEP3_TF
11303759RNASDHBRNA_00.000096HIVEP3_TF
11360643RNAPIK3CD-AS1RNA_00.000096NFIA_TF
11813242RNAEFHD2RNA_00.000091JUN_TF
11825913RNATSSK3RNA_00.000091JUN_TF
11864100RNACA6RNA_00.000091JUN_TF
11877538RNAMIR34AHGRNA_00.000090JUN_TF
12039747RNARUNX3RNA_00.000079ZNF683_TF
\n", "
" ], "text/plain": [ " index layer target path_layer score seed\n", "10755 618 RNA PADI6 RNA_0 0.000109 GFI1_TF\n", "11123 156 RNA CLCN6 RNA_0 0.000098 HIVEP3_TF\n", "11181 809 RNA SNRNP40 RNA_0 0.000097 HIVEP3_TF\n", "11303 759 RNA SDHB RNA_0 0.000096 HIVEP3_TF\n", "11360 643 RNA PIK3CD-AS1 RNA_0 0.000096 NFIA_TF\n", "11813 242 RNA EFHD2 RNA_0 0.000091 JUN_TF\n", "11825 913 RNA TSSK3 RNA_0 0.000091 JUN_TF\n", "11864 100 RNA CA6 RNA_0 0.000091 JUN_TF\n", "11877 538 RNA MIR34AHG RNA_0 0.000090 JUN_TF\n", "12039 747 RNA RUNX3 RNA_0 0.000079 ZNF683_TF" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Preview the final GRN\n", "grn.head(10)" ] }, { "cell_type": "markdown", "id": "7bfd2a15", "metadata": {}, "source": [ "---\n", "\n", "## Save the GRN\n", "\n", "Save the inferred GRN for use in other ReCoN tutorials (e.g., Tutorial 2: Multicellular Coordination)." ] }, { "cell_type": "code", "execution_count": null, "id": "8bfd6948", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GRN saved to recon_hummus_grn.csv\n" ] } ], "source": [ "# Save to CSV\n", "grn.to_csv(\"recon_hummus_grn.csv\", index=False)\n", "print(\"GRN saved to recon_hummus_grn.csv\")" ] }, { "cell_type": "markdown", "id": "f758af52", "metadata": {}, "source": [ "```{tip}\n", "**Next steps**\n", "\n", "Use this GRN in:\n", "- [Tutorial 2: Multicellular Coordination](2.recon_multicellular_coordination.ipynb) - Explore upstream regulators\n", "- [Tutorial 3: Molecular Cascades](3.recon_molecular_cascades.ipynb) - Visualize signaling pathways\n", "```\n", "\n", "---" ] }, { "cell_type": "markdown", "id": "43f19203", "metadata": {}, "source": [ "This work uses **ReCoN**’s wrapper of different packages, which are all cited in the Methods section:\n", "\n", "- The **RNA layer** uses [**arboreto¹**](#arboreto) \n", "- The **ATAC layer** uses [**CIRCE²**](#circe) \n", "- The **TF-ATAC** and **ATAC-RNA bipartites** use [**CellOracle³**](#celloracle) \n", "- The exploration of the **multilayer** uses [**HuMMuS⁴**](#hummus) and [**MultiXrank⁵**](#multixrank) \n", "\n", "---\n", "\n", "### References\n", "\n", "[1] Moerman, T., Aibar, S., Bravo González-Blas, C., Simm, J., Moreau, Y., Aerts, J., & Aerts, S. (2019). \n", "**GRNBoost2 and Arboreto: efficient and scalable inference of gene regulatory networks.** \n", "*Bioinformatics*, 35(12), 2159–2161. doi:[10.1093/bioinformatics/bty916](https://doi.org/10.1093/bioinformatics/bty916)\n", "\n", "\n", "[2] Trimbour, R., Saez Rodriguez J., Cantini L. (2025).\n", "**CIRCE: a scalable Python package to predict cis-regulatory DNA interactions from single-cell chromatin accessibility data.** doi:[https://doi.org/10.1101/2025.09.23.678054](https://doi.org/10.1101/2025.09.23.678054)\n", "\n", "\n", "[3] Kamimoto, K., Stringa, B., Hoffmann, C. M., Jindal, K., Solnica-Krezel, L., Morris, S. A., et al. (2023). \n", "**Dissecting cell identity via network inference and in silico gene perturbation.** \n", "*Nature*, 614, 742–751. doi:[10.1038/s41586-022-05688-9](https://doi.org/10.1038/s41586-022-05688-9)\n", "\n", "\n", "[4] Trimbour, R., Deutschmann I. M., Cantini L. (2024). \n", "**Molecular mechanisms reconstruction from single-cell multi-omics data with HuMMuS.** \n", "*Bioinformatics*, 40(5), btae143. doi:[10.1093/bioinformatics/btae143](https://doi.org/10.1093/bioinformatics/btae143)\n", "\n", "\n", "[5] Baptista, A., González, A., & Baudot, A. (2022). \n", "**Universal multilayer network exploration by random walk with restart.** \n", "*Communications Physics*, 5, 170. doi:[10.1038/s42005-022-00937-9](https://doi.org/10.1038/s42005-022-00937-9)\n" ] }, { "cell_type": "markdown", "id": "984c1129", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "recon", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.19" } }, "nbformat": 4, "nbformat_minor": 5 }