{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "4bee5c8e",
   "metadata": {},
   "source": [
    ".. note::\n",
    "   To download the tutorial data, use the following commands:\n",
    "\n",
    "   **All tutorial data:**\n",
    "   ```python\n",
    "   from recon.data import fetch_all_tutorial_data\n",
    "   fetch_all_tutorial_data(data_dir='./data')\n",
    "   ```\n",
    "\n",
    "   **Specific file (e.g., RNA data):**\n",
    "   ```python\n",
    "   from recon.data import fetch_tutorial_data\n",
    "   fetch_tutorial_data('perturbation_tuto/rna.h5ad', data_dir='./data')\n",
    "   ```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1e98cdcd",
   "metadata": {},
   "source": [
    "# Inferring Gene Regulatory Networks with HuMMuS\n",
    "\n",
    "This tutorial demonstrates how to use ReCoN to infer **Gene Regulatory Networks (GRNs)** from paired single-cell RNA-seq and ATAC-seq data using the HuMMuS methodology.\n",
    "\n",
    "## What you will learn\n",
    "\n",
    "1. How to compute individual network layers (RNA, ATAC, TF)\n",
    "2. How to link layers via bipartite networks (TF→ATAC, ATAC→RNA)\n",
    "3. How to integrate everything into a multilayer GRN using Random Walk with Restart\n",
    "\n",
    "```{warning}\n",
    "**Python 3.10 required!**\n",
    "\n",
    "GRN inference requires a Python 3.10 environment due to CellOracle and gimmemotifs dependencies.\n",
    "See the [Installation guide](../recon_explained/get_ready.rst) for setup instructions.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5dc95caa",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Prerequisites\n",
    "\n",
    "```{note}\n",
    "**Required packages**\n",
    "\n",
    "- `bedtools` - for ATAC-seq peak processing (`conda install -c bioconda bedtools`)\n",
    "- `muon` - for multi-omics data handling (`pip install muon`)\n",
    "- `circe-py` - for ATAC co-accessibility networks (included with ReCoN)\n",
    "```\n",
    "\n",
    "### Check bedtools installation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "a470e4e2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/Users/remitrimbour/ReCoN_project/ReCoN\n"
     ]
    }
   ],
   "source": [
    "cd ../../../"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "30efc2b5",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/Users/remitrimbour/miniconda3/envs/recon/bin/bedtools\n",
      "bedtools v2.31.1\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "import shutil\n",
    "\n",
    "# Check if bedtools is available\n",
    "if shutil.which(\"bedtools\") is None:\n",
    "    # Try to find bedtools in common conda locations\n",
    "    conda_prefix = os.environ.get(\"CONDA_PREFIX\", \"\")\n",
    "    possible_paths = [\n",
    "        f\"{conda_prefix}/bin\",\n",
    "        os.path.expanduser(\"~/miniconda3/bin\"),\n",
    "        os.path.expanduser(\"~/anaconda3/bin\"),\n",
    "        \"/usr/local/bin\",\n",
    "    ]\n",
    "    for path in possible_paths:\n",
    "        bedtools_path = os.path.join(path, \"bedtools\")\n",
    "        if os.path.exists(bedtools_path):\n",
    "            os.environ[\"PATH\"] = path + \":\" + os.environ[\"PATH\"]\n",
    "            print(f\"Added {path} to PATH\")\n",
    "            break\n",
    "    \n",
    "# Verify bedtools is now available\n",
    "!which bedtools && bedtools --version"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff47b089",
   "metadata": {},
   "source": [
    "### Import libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "8c22f5ea",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n",
      "Matplotlib is building the font cache; this may take a moment.\n",
      "Using celloracle-lite v0.22.0 (lightweight fork for ReCoN/HuMMuS)\n",
      "2026-01-14 14:15:45,106 - INFO - Using celloracle-lite v0.22.0 (lightweight fork for ReCoN/HuMMuS)\n",
      "To install the original CellOracle, please check: https://github.com/morris-lab/CellOracle\n",
      "2026-01-14 14:15:45,109 - INFO - To install the original CellOracle, please check: https://github.com/morris-lab/CellOracle\n"
     ]
    }
   ],
   "source": [
    "\n",
    "import hummuspy.loader  # HuMMuS multilayer network utilities\n",
    "import circe as ci       # ATAC-seq co-accessibility networks\n",
    "import recon             # ReCoN GRN inference\n",
    "import recon.infer_grn   # ReCoN GRN inference"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "47ea9f71",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/scanpy/__init__.py:96: FutureWarning: `__version__` is deprecated, use `importlib.metadata.version('scanpy')` instead\n",
      "  warnings.warn(msg, FutureWarning, stacklevel=2)\n"
     ]
    }
   ],
   "source": [
    "import muon as mu   # Multi-omics data handling\n",
    "import scanpy as sc # Single-cell analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2da84c94",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Load Multi-omics Data\n",
    "\n",
    "We use paired scRNA-seq and scATAC-seq data stored in a MuData object.\n",
    "\n",
    "```{tip}\n",
    "The example dataset (PBMC 10X) can be downloaded from the [CIRCE benchmark repository](https://github.com/cantinilab/circe_benchmark).\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "dddd89ba",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "RNA: (9631, 18410), ATAC: (9631, 215676)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/mudata/_core/mudata.py:591: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.\n",
      "  warnings.warn(\n",
      "/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/mudata/_core/mudata.py:591: FutureWarning: From 0.4 .update() will not pull obs/var columns from individual modalities by default anymore. Set mudata.set_options(pull_on_update=False) to adopt the new behaviour, which will become the default. Use new pull_obs/pull_var and push_obs/push_var methods for more flexibility.\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "# Load paired RNA + ATAC data\n",
    "mudata = mu.read_h5mu(\"./data/build_grn_tuto/pbmc10x.h5mu\")\n",
    "print(f\"RNA: {mudata.mod['rna'].shape}, ATAC: {mudata.mod['atac'].shape}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "53529504",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Subset - RNA: (5000, 1000), ATAC: (5000, 5000)\n"
     ]
    }
   ],
   "source": [
    "# Subset for faster computation (remove for full analysis)\n",
    "rna = mudata.mod['rna'][:5000, :1000]   # First 1000 genes\n",
    "atac = mudata.mod['atac'][:5000, :5000] # First 5000 peaks\n",
    "print(f\"Subset - RNA: {rna.shape}, ATAC: {atac.shape}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "172ee80b",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Build the Multilayer GRN\n",
    "\n",
    "The HuMMuS methodology builds a GRN by integrating multiple layers:\n",
    "\n",
    "| Layer | Description | Method |\n",
    "|-------|-------------|--------|\n",
    "| **RNA layer** | Gene co-regulation patterns (which genes vary together) | GRNBoost2 (Arboreto) |\n",
    "| **ATAC layer** | Peak co-accessibility (which regulatory regions are co-active) | CIRCE |\n",
    "| **TF layer** | TF-TF co-expression | Correlation |\n",
    "| **TF → ATAC** | Where TFs bind DNA (motif-based) | CellOracle motif scanning |\n",
    "| **ATAC → RNA** | Which peaks regulate which genes | CIRCE peak-gene links |\n",
    "\n",
    "```{note}\n",
    "**Why multiple layers?**\n",
    "\n",
    "GRNBoost2 alone infers statistical associations between TFs and genes, but these can include indirect relationships. By integrating chromatin accessibility (ATAC), we add mechanistic evidence: a TF can only regulate a gene if (1) it binds nearby accessible DNA and (2) that region is linked to the gene. The multilayer RWR combines both direct (TF→gene) and indirect (TF→ATAC→gene) evidence for more robust GRN inference.\n",
    "```\n",
    "\n",
    "### Step 1: Load TF list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "d096d99a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loaded 914 transcription factors\n"
     ]
    }
   ],
   "source": [
    "# Load curated list of human transcription factors\n",
    "tfs_list = hummuspy.loader.load_tfs(\"human_tfs_r_hummus\")\n",
    "print(f\"Loaded {len(tfs_list)} transcription factors\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "624550f0",
   "metadata": {},
   "source": [
    "### Step 2: Compute RNA network (gene co-regulation)\n",
    "\n",
    "Uses GRNBoost2 to identify genes that co-vary with each TF. This captures statistical associations that may include both direct targets and genes in the same regulatory programs.\n",
    "\n",
    "```{warning}\n",
    "GRNBoost2 edges are used to represent **co-regulated** genes, not final TF-gene regulation. The multilayer integration in the final step will leverage these by requiring chromatin accessibility evidence.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "e8f63427",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Calculating TF-to-gene importance\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Running using 15 cores: 100%|██████████| 1000/1000 [00:25<00:00, 39.75it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "RNA network: 25881 edges\n"
     ]
    }
   ],
   "source": [
    "# Compute TF → gene network using GRNBoost2\n",
    "rna_network = recon.infer_grn.compute_rna_network(rna, tf_names=tfs_list, n_cpu=15)\n",
    "print(f\"RNA network: {len(rna_network)} edges\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac25be81",
   "metadata": {},
   "source": [
    "### Step 3: Compute ATAC network (peak co-accessibility)\n",
    "\n",
    "Uses CIRCE to compute co-accessibility between ATAC-seq peaks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "11fab776",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/gimmemotifs/__init__.py:30: UserWarning: \n",
      "                No organism, nor value passed for the parameters: ['window_size', 'distance_constraint', 's'],\n",
      "                using default values.\n",
      "                The default values are defined from human and mouse data,\n",
      "                you might want to change them if you are working with\n",
      "                another organisms.\n",
      "\n",
      "                Default values used:\n",
      "                {'window_size': 500000, 'distance_constraint': 250000, 's': 0.75}\n",
      "\n",
      "                You can check how to define them in https://cole-trapnell-lab.github.io/cicero-release/docs_m3/#important-considerations-for-non-human-data.\n",
      "                \n",
      "  _warn(*args, **kwargs)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/gimmemotifs/__init__.py:30: UserWarning: \n",
       "install \"ipywidgets\" for Jupyter support\n",
       "  _warn(*args, **kwargs)\n",
       "</pre>\n"
      ],
      "text/plain": [
       "/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/gimmemotifs/__init__.py:30: UserWarning: \n",
       "install \"ipywidgets\" for Jupyter support\n",
       "  _warn(*args, **kwargs)\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Calculating co-accessibility scores...\n",
      "Concatenating results from all chromosomes...\n",
      "ATAC network: 172271 edges\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/remitrimbour/miniconda3/envs/recon/lib/python3.10/site-packages/anndata/_core/aligned_mapping.py:156: ImplicitModificationWarning: Setting element `.varp['atac_network']` of view, initializing view as actual.\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "# Prepare ATAC peak names (replace special characters)\n",
    "atac.var_names = atac.var_names.str.replace(\"-\", \"_\")\n",
    "atac.var_names = atac.var_names.str.replace(\":\", \"_\")\n",
    "\n",
    "# Add genomic region information\n",
    "atac = ci.add_region_infos(atac)\n",
    "\n",
    "# Compute peak co-accessibility network\n",
    "ci.compute_atac_network(atac, njobs=20)\n",
    "\n",
    "# Extract and format the network\n",
    "atac_network = ci.extract_atac_links(atac, key='atac_network')\n",
    "atac_network = atac_network.rename(columns={\n",
    "    \"Peak1\": \"source\",\n",
    "    \"Peak2\": \"target\",\n",
    "    \"score\": \"weight\"\n",
    "})\n",
    "print(f\"ATAC network: {len(atac_network)} edges\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "029902a8",
   "metadata": {},
   "source": [
    "### Step 4: Compute TF network (TF-TF interactions)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "4358ea03",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TF network: 29 edges\n"
     ]
    }
   ],
   "source": [
    "# Compute TF-TF correlation network\n",
    "tf_network = recon.infer_grn.compute_tf_network(rna, tfs_list=tfs_list)\n",
    "print(f\"TF network: {len(tf_network)} edges\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "db25db56",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>source</th>\n",
       "      <th>target</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>TFAP2E_TF</td>\n",
       "      <td>fake_TF</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>HES2_TF</td>\n",
       "      <td>fake_TF</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>ID3_TF</td>\n",
       "      <td>fake_TF</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      source   target\n",
       "0  TFAP2E_TF  fake_TF\n",
       "1    HES2_TF  fake_TF\n",
       "2     ID3_TF  fake_TF"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tf_network.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f1aaa195",
   "metadata": {},
   "source": [
    "### Step 5: Compute bipartite links\n",
    "\n",
    "Now we connect the layers:\n",
    "- **ATAC → RNA**: Which peaks regulate which genes (based on proximity and correlation)\n",
    "- **TF → ATAC**: Where TFs bind DNA (based on motif scanning)\n",
    "\n",
    "```{note}\n",
    "These steps require a **reference genome**. Make sure to install it first:\n",
    "```python\n",
    "import genomepy\n",
    "genomepy.install_genome(\"hg19\", \"UCSC\", annotation=True)\n",
    "```\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "4c9c9216",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "que bed peaks: 5000\n",
      "tss peaks in que: 180\n",
      "ATAC→RNA links: 130 edges\n"
     ]
    }
   ],
   "source": [
    "# ATAC → RNA: Peak-gene links based on proximity and correlation\n",
    "atac_rna_links = recon.infer_grn.compute_atac_to_rna_links(atac, rna, ref_genome=\"hg19\")\n",
    "print(f\"ATAC→RNA links: {len(atac_rna_links)} edges\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "145fda2f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Peaks before filtering:  5000\n",
      "Peaks with invalid chr_name:  0\n",
      "Peaks with invalid length:  0\n",
      "Peaks after filtering:  5000\n",
      "No motif data entered. Loading default motifs for your species ...\n",
      " Default motif for vertebrate: gimme.vertebrate.v5.0. \n",
      " For more information, please see https://gimmemotifs.readthedocs.io/en/master/overview.html \n",
      "\n",
      "Initiating scanner... \n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2026-01-14 14:17:58,331 - DEBUG - using background: genome mm10 with size 200\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Calculating FPR-based threshold. This step may take substantial time when you load a new ref-genome. It will be done quicker on the second time. \n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2026-01-14 14:18:00,535 - DEBUG - determining FPR-based threshold\n",
      "Matplotlib is building the font cache; this may take a moment.\n",
      "Matplotlib is building the font cache; this may take a moment.\n",
      "Matplotlib is building the font cache; this may take a moment.\n",
      "Matplotlib is building the font cache; this may take a moment.\n",
      "Matplotlib is building the font cache; this may take a moment.\n",
      "Matplotlib is building the font cache; this may take a moment.\n",
      "Matplotlib is building the font cache; this may take a moment.\n",
      "Matplotlib is building the font cache; this may take a moment.\n",
      "Matplotlib is building the font cache; this may take a moment.\n",
      "Matplotlib is building the font cache; this may take a moment.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Motif scan started .. It may take long time.\n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Scanning:   0%|          | 0/5000 [00:00<?, ? sequences/s]Matplotlib is building the font cache; this may take a moment.\n",
      "Matplotlib is building the font cache; this may take a moment.\n",
      "Matplotlib is building the font cache; this may take a moment.\n",
      "Matplotlib is building the font cache; this may take a moment.\n",
      "Matplotlib is building the font cache; this may take a moment.\n",
      "Matplotlib is building the font cache; this may take a moment.\n",
      "Matplotlib is building the font cache; this may take a moment.\n",
      "Matplotlib is building the font cache; this may take a moment.\n",
      "Matplotlib is building the font cache; this may take a moment.\n",
      "Matplotlib is building the font cache; this may take a moment.\n",
      "Matplotlib is building the font cache; this may take a moment.\n",
      "Matplotlib is building the font cache; this may take a moment.\n",
      "Matplotlib is building the font cache; this may take a moment.\n",
      "Scanning: 100%|██████████| 5000/5000 [03:10<00:00, 26.28 sequences/s]  \n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Filtering finished: 473238 -> 100930\n",
      "1. Converting scanned results into one-hot encoded dataframe.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 4548/4548 [00:04<00:00, 1093.61it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2. Converting results into dictionaries.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00, 26.58it/s]\n",
      "100%|██████████| 1093/1093 [00:00<00:00, 20999.38it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TF→ATAC links: 2253638 edges\n"
     ]
    }
   ],
   "source": [
    "# TF → ATAC: Motif scanning to find TF binding sites\n",
    "tf_atac_links = recon.infer_grn.compute_tf_to_atac_links(atac, ref_genome=\"mm10\", n_cpus=30)\n",
    "print(f\"TF→ATAC links: {len(tf_atac_links)} edges\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "b2b63e5a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Filtered TF→ATAC links: 1053944 edges\n"
     ]
    }
   ],
   "source": [
    "# Filter to keep only links from known TFs\n",
    "tf_atac_links = tf_atac_links[tf_atac_links['source'].str.replace(\"_TF\", \"\").isin(tfs_list)]\n",
    "print(f\"Filtered TF→ATAC links: {len(tf_atac_links)} edges\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "bd2ab3eb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>source</th>\n",
       "      <th>target</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>GATA3_TF</td>\n",
       "      <td>chr1_10016349_10016849</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>GATA2_TF</td>\n",
       "      <td>chr1_10016349_10016849</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>RORC_TF</td>\n",
       "      <td>chr1_10016349_10016849</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     source                  target\n",
       "5  GATA3_TF  chr1_10016349_10016849\n",
       "6  GATA2_TF  chr1_10016349_10016849\n",
       "7   RORC_TF  chr1_10016349_10016849"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tf_atac_links.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69f5cf77",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Integrate into Final GRN\n",
    "\n",
    "Now we combine all layers using Random Walk with Restart (RWR) to propagate information across the multilayer network.\n",
    "\n",
    "```{tip}\n",
    "The RWR integration weighs indirect evidence (TF → ATAC → RNA) alongside direct evidence (TF → RNA), improving GRN quality compared to single-layer approaches.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "dcc77be3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TF_0\n",
      "ATAC_0\n",
      "RNA_0\n",
      "Initializing Dask cluster with 1 workers...\n",
      "http://192.168.1.150:8787/status\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Running RWR per seed: 100%|██████████| 29/29 [00:01<00:00, 20.07it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Building Dask DataFrame graph with delayed parallel tasks...\n",
      "Final GRN: 2929 TF → gene edges\n"
     ]
    }
   ],
   "source": [
    "# Generate integrated GRN via RWR across all layers\n",
    "grn = recon.infer_grn.generate_grn(\n",
    "    rna_network=rna_network,\n",
    "    atac_network=atac_network,\n",
    "    tf_network=tf_network,\n",
    "    atac_to_rna_links=atac_rna_links,\n",
    "    tf_to_atac_links=tf_atac_links,\n",
    "    n_jobs=1\n",
    ")\n",
    "\n",
    "# Keep only RNA layer (TF → gene relationships)\n",
    "grn = grn[grn.layer == 'RNA']\n",
    "print(f\"Final GRN: {len(grn)} TF → gene edges\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "ae23cde1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>index</th>\n",
       "      <th>layer</th>\n",
       "      <th>target</th>\n",
       "      <th>path_layer</th>\n",
       "      <th>score</th>\n",
       "      <th>seed</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10755</th>\n",
       "      <td>618</td>\n",
       "      <td>RNA</td>\n",
       "      <td>PADI6</td>\n",
       "      <td>RNA_0</td>\n",
       "      <td>0.000109</td>\n",
       "      <td>GFI1_TF</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11123</th>\n",
       "      <td>156</td>\n",
       "      <td>RNA</td>\n",
       "      <td>CLCN6</td>\n",
       "      <td>RNA_0</td>\n",
       "      <td>0.000098</td>\n",
       "      <td>HIVEP3_TF</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11181</th>\n",
       "      <td>809</td>\n",
       "      <td>RNA</td>\n",
       "      <td>SNRNP40</td>\n",
       "      <td>RNA_0</td>\n",
       "      <td>0.000097</td>\n",
       "      <td>HIVEP3_TF</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11303</th>\n",
       "      <td>759</td>\n",
       "      <td>RNA</td>\n",
       "      <td>SDHB</td>\n",
       "      <td>RNA_0</td>\n",
       "      <td>0.000096</td>\n",
       "      <td>HIVEP3_TF</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11360</th>\n",
       "      <td>643</td>\n",
       "      <td>RNA</td>\n",
       "      <td>PIK3CD-AS1</td>\n",
       "      <td>RNA_0</td>\n",
       "      <td>0.000096</td>\n",
       "      <td>NFIA_TF</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11813</th>\n",
       "      <td>242</td>\n",
       "      <td>RNA</td>\n",
       "      <td>EFHD2</td>\n",
       "      <td>RNA_0</td>\n",
       "      <td>0.000091</td>\n",
       "      <td>JUN_TF</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11825</th>\n",
       "      <td>913</td>\n",
       "      <td>RNA</td>\n",
       "      <td>TSSK3</td>\n",
       "      <td>RNA_0</td>\n",
       "      <td>0.000091</td>\n",
       "      <td>JUN_TF</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11864</th>\n",
       "      <td>100</td>\n",
       "      <td>RNA</td>\n",
       "      <td>CA6</td>\n",
       "      <td>RNA_0</td>\n",
       "      <td>0.000091</td>\n",
       "      <td>JUN_TF</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11877</th>\n",
       "      <td>538</td>\n",
       "      <td>RNA</td>\n",
       "      <td>MIR34AHG</td>\n",
       "      <td>RNA_0</td>\n",
       "      <td>0.000090</td>\n",
       "      <td>JUN_TF</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12039</th>\n",
       "      <td>747</td>\n",
       "      <td>RNA</td>\n",
       "      <td>RUNX3</td>\n",
       "      <td>RNA_0</td>\n",
       "      <td>0.000079</td>\n",
       "      <td>ZNF683_TF</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       index layer      target path_layer     score       seed\n",
       "10755    618   RNA       PADI6      RNA_0  0.000109    GFI1_TF\n",
       "11123    156   RNA       CLCN6      RNA_0  0.000098  HIVEP3_TF\n",
       "11181    809   RNA     SNRNP40      RNA_0  0.000097  HIVEP3_TF\n",
       "11303    759   RNA        SDHB      RNA_0  0.000096  HIVEP3_TF\n",
       "11360    643   RNA  PIK3CD-AS1      RNA_0  0.000096    NFIA_TF\n",
       "11813    242   RNA       EFHD2      RNA_0  0.000091     JUN_TF\n",
       "11825    913   RNA       TSSK3      RNA_0  0.000091     JUN_TF\n",
       "11864    100   RNA         CA6      RNA_0  0.000091     JUN_TF\n",
       "11877    538   RNA    MIR34AHG      RNA_0  0.000090     JUN_TF\n",
       "12039    747   RNA       RUNX3      RNA_0  0.000079  ZNF683_TF"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Preview the final GRN\n",
    "grn.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7bfd2a15",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Save the GRN\n",
    "\n",
    "Save the inferred GRN for use in other ReCoN tutorials (e.g., Tutorial 2: Multicellular Coordination)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8bfd6948",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "GRN saved to recon_hummus_grn.csv\n"
     ]
    }
   ],
   "source": [
    "# Save to CSV\n",
    "grn.to_csv(\"recon_hummus_grn.csv\", index=False)\n",
    "print(\"GRN saved to recon_hummus_grn.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f758af52",
   "metadata": {},
   "source": [
    "```{tip}\n",
    "**Next steps**\n",
    "\n",
    "Use this GRN in:\n",
    "- [Tutorial 2: Multicellular Coordination](2.recon_multicellular_coordination.ipynb) - Explore upstream regulators\n",
    "- [Tutorial 3: Molecular Cascades](3.recon_molecular_cascades.ipynb) - Visualize signaling pathways\n",
    "```\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43f19203",
   "metadata": {},
   "source": [
    "This work uses **ReCoN**’s wrapper of different packages, which are all cited in the Methods section:\n",
    "\n",
    "- The **RNA layer** uses [**arboreto¹**](#arboreto)  \n",
    "- The **ATAC layer** uses [**CIRCE²**](#circe)  \n",
    "- The **TF-ATAC** and **ATAC-RNA bipartites** use [**CellOracle³**](#celloracle)  \n",
    "- The exploration of the **multilayer** uses [**HuMMuS⁴**](#hummus) and [**MultiXrank⁵**](#multixrank)  \n",
    "\n",
    "---\n",
    "\n",
    "### References\n",
    "<a id=\"arboreto\"></a>\n",
    "[1] Moerman, T., Aibar, S., Bravo González-Blas, C., Simm, J., Moreau, Y., Aerts, J., & Aerts, S. (2019).  \n",
    "**GRNBoost2 and Arboreto: efficient and scalable inference of gene regulatory networks.**  \n",
    "*Bioinformatics*, 35(12), 2159–2161. doi:[10.1093/bioinformatics/bty916](https://doi.org/10.1093/bioinformatics/bty916)\n",
    "\n",
    "<a id=\"circe\"></a>\n",
    "[2] Trimbour, R., Saez Rodriguez J., Cantini L. (2025).\n",
    "**CIRCE: a scalable Python package to predict cis-regulatory DNA interactions from single-cell chromatin accessibility data.** doi:[https://doi.org/10.1101/2025.09.23.678054](https://doi.org/10.1101/2025.09.23.678054)\n",
    "\n",
    "<a id=\"celloracle\"></a>\n",
    "[3] Kamimoto, K., Stringa, B., Hoffmann, C. M., Jindal, K., Solnica-Krezel, L., Morris, S. A., et al. (2023).  \n",
    "**Dissecting cell identity via network inference and in silico gene perturbation.**  \n",
    "*Nature*, 614, 742–751. doi:[10.1038/s41586-022-05688-9](https://doi.org/10.1038/s41586-022-05688-9)\n",
    "\n",
    "<a id=\"hummus\"></a>\n",
    "[4] Trimbour, R., Deutschmann I. M., Cantini L. (2024).  \n",
    "**Molecular mechanisms reconstruction from single-cell multi-omics data with HuMMuS.**  \n",
    "*Bioinformatics*, 40(5), btae143. doi:[10.1093/bioinformatics/btae143](https://doi.org/10.1093/bioinformatics/btae143)\n",
    "\n",
    "<a id=\"multixrank\"></a>\n",
    "[5] Baptista, A., González, A., & Baudot, A. (2022).  \n",
    "**Universal multilayer network exploration by random walk with restart.**  \n",
    "*Communications Physics*, 5, 170. doi:[10.1038/s42005-022-00937-9](https://doi.org/10.1038/s42005-022-00937-9)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "984c1129",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "recon",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.19"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}