Installation troubleshooting & FAQ¶

Installation¶

Basic installation¶

For basic ReCoN functionality (without GRN inference from ATAC-seq):

pip install recon

Or install from source for development:

git clone https://github.com/cantinilab/recon.git
cd recon
pip install -e .

Installation with GRN inference (optional)¶

You’ll need gimmemotifs<=0.17.2, celloracle (lite branch), and llvmlite to install ReCoN with full GRN inference capabilities.

# Create environment with required dependencies
conda create -n recon python=3.10
conda activate recon

# Install ReCoN with GRN extras
pip install 'recon[grn-lite]'  # quotes keep shells such as zsh from expanding the brackets

Installation on macOS¶

Some packages, notably gimmemotifs and llvmlite, can be difficult to install with pip on macOS because they depend on system libraries. We recommend installing them with conda:

# Create environment with required dependencies
conda create -n recon -c bioconda -c conda-forge python=3.10 gimmemotifs llvmlite cmake

conda activate recon

# Install ReCoN with GRN extras
pip install 'recon[grn-lite]'  # quotes are required in zsh, the default macOS shell

You’ll need cmake, gimmemotifs, and llvmlite to install ReCoN with full GRN inference capabilities including ATAC-seq motif scanning.

Why these dependencies?

gimmemotifs: TF motif scanning for ATAC-seq peaks (requires pre-compiled binaries from conda)
llvmlite: Required by numba for JIT compilation (system LLVM libraries needed on macOS)
cmake: Build tool for compiling C/C++ extensions

Installing reference genomes for ATAC-seq¶

If you plan to use ATAC-seq data for TF-to-peak motif scanning, you need to install reference genomes:

# Install genomepy (included with celloracle)
pip install genomepy

# Install mouse genome (mm10)
genomepy install mm10 -p UCSC -a

# Install human genome (hg38)
genomepy install hg38 -p UCSC -a

# Check installed genomes
genomepy genomes

# List available genomes
genomepy search mouse

Where are genomes stored?

Genomes are downloaded to ~/.local/share/genomes/ by default. These are large files, typically 1-3 GB per genome.

You can customize the location:

# Install to custom directory
genomepy install mm10 -p UCSC -a -g /path/to/genomes

# Or set environment variable
export GENOMES_DIR=/path/to/genomes
genomepy install mm10 -p UCSC -a
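
Before starting a long ATAC-seq run, it can help to confirm that a genome is actually on disk. The sketch below mirrors the lookup order described above (explicit directory, then GENOMES_DIR, then the default path). It assumes genomepy stores each genome as <name>/<name>.fa; genome_fasta_path and genome_is_installed are hypothetical helper names and are not part of ReCoN or genomepy.

```python
import os
from pathlib import Path

# Default location stated in this FAQ; adjust if your setup differs.
DEFAULT_GENOMES_DIR = Path.home() / ".local" / "share" / "genomes"


def genome_fasta_path(name, genomes_dir=None):
    """Expected FASTA path for a genome (assumed layout: <dir>/<name>/<name>.fa)."""
    base = Path(genomes_dir or os.environ.get("GENOMES_DIR") or DEFAULT_GENOMES_DIR)
    return base.expanduser() / name / f"{name}.fa"


def genome_is_installed(name, genomes_dir=None):
    """True if the expected FASTA file exists on disk."""
    return genome_fasta_path(name, genomes_dir).is_file()


if __name__ == "__main__":
    for genome in ("mm10", "hg38"):
        status = "found" if genome_is_installed(genome) else "missing"
        print(f"{genome}: {status} ({genome_fasta_path(genome)})")
```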

Available genome providers:

UCSC: University of California Santa Cruz (recommended)
Ensembl: European Bioinformatics Institute
NCBI: National Center for Biotechnology Information

Common genomes:

mm10: Mouse (GRCm38/mm10)
mm39: Mouse (GRCm39, latest)
hg38: Human (GRCh38)
hg19: Human (GRCh37, older)

Common installation issues¶

Problem: llvmlite build fails on macOS

# Install via conda instead of pip
conda install -c conda-forge llvmlite numba

Problem: gimmemotifs installation fails

Gimmemotifs has C dependencies that may not compile on all systems. Install from conda-forge and bioconda channels:

conda install -c conda-forge -c bioconda gimmemotifs

Problem: genomepy install command fails with “Got unexpected extra argument”

The correct syntax passes the provider with the -p flag:

# WRONG: genomepy install mm10 UCSC --annotation
# RIGHT:
genomepy install mm10 -p UCSC -a

Problem: “Genomes_dir does not exist” error

Create the directory first or specify a custom location:

# Option 1: Create default directory
mkdir -p ~/.local/share/genomes

# Option 2: Use custom directory
genomepy install mm10 -p UCSC -a -g /path/to/genomes

Problem: ATAC tests are skipped with “mm10 genome not installed”

This is expected when the genome isn’t downloaded. To run ATAC-seq tests:

# Install celloracle
pip install 'git+https://github.com/cantinilab/celloracle@lite'

# Install genome
genomepy install mm10 -p UCSC -a

# Verify installation
ls ~/.local/share/genomes/mm10/

# Run tests
pytest tests/test_infer_grn.py -v

Problem: I cannot compute GRNs

CellOracle is required for GRN inference with ATAC-seq data. Options:

Install our ‘lite’ branch directly with recon: pip install 'recon[grn-lite]'
Install it separately: pip install 'git+https://github.com/cantinilab/celloracle@lite'
Compute your GRN externally and provide it to ReCoN.

Problem: Tests are skipped for celloracle functions

This is expected behavior when celloracle is not installed. The tests use @pytest.mark.skipif to gracefully skip ATAC-seq tests when celloracle is unavailable. Install with the [grn-lite] extras to run all tests.
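
To see in advance which optional modules are missing, and therefore which tests are likely to be skipped, a check along these lines may be useful. It is a minimal sketch using only the standard library; missing_modules is a hypothetical helper, and ReCoN's test suite may implement its skip conditions differently. A boolean built this way is the kind of condition that pytest.mark.skipif accepts.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Optional GRN-related modules named in this FAQ.
OPTIONAL_GRN_MODULES = ["celloracle", "gimmemotifs", "genomepy"]

if __name__ == "__main__":
    missing = missing_modules(OPTIONAL_GRN_MODULES)
    if missing:
        print("Not installed (related tests will be skipped):", ", ".join(missing))
    else:
        print("All optional GRN modules are importable.")
```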

Python version compatibility¶

Minimum: Python 3.8
Recommended: Python 3.10+
Note: Some dependencies (like circe-py) use Python 3.10+ type syntax (Type | None). If you encounter TypeError: unsupported operand type(s) for |, upgrade to Python 3.10+.
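
The `Type | None` annotation syntax is only evaluated successfully at runtime from Python 3.10 onwards, which is why older interpreters raise the TypeError mentioned above. A small guard such as the following can surface the problem early with a clearer message; supports_union_syntax is a hypothetical helper name, not a ReCoN function.

```python
import sys


def supports_union_syntax(version_info=None):
    """True if the interpreter is Python 3.10+ (runtime support for `int | None`)."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= (3, 10)


if __name__ == "__main__":
    if not supports_union_syntax():
        print("Python 3.10+ is recommended; some dependencies may fail to import.")
```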

GRN inference¶

What GRN inference methods are available?¶

ReCoN supports multiple approaches:

TF-to-gene (RNA-seq only): Uses GRNBoost2-style gradient boosting (GBM) or Random Forest (RF)
TF-to-gene with ATAC-seq: Adds TF-to-peak motif scanning via CellOracle
Receptor-to-gene: Custom connections for cell surface receptors

When should I use ATAC-seq integration?¶

Use ATAC-seq when:

You want more accurate TF-gene regulatory links
You have matched scRNA-seq + scATAC-seq data
You’re interested in chromatin accessibility effects

Skip ATAC-seq when:

You only have scRNA-seq data
Computational resources are limited (motif scanning is slow)
You’re doing quick exploratory analysis

How accurate is GRN inference?¶

GRN inference is probabilistic and noisy. ReCoN combines:

Expression correlation (GRNBoost2 importance scores)
Motif evidence (TF binding site predictions from ATAC peaks)
Network propagation (RWR to capture indirect effects)

Best practices:

Use biological validation (ChIP-seq, literature)
Focus on highly-ranked edges (top 10-20%)
Combine with perturbation data when available
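
One way to focus on highly-ranked edges is to keep only the top fraction of a GRN table by weight. The sketch below assumes the ['source', 'target', 'weight'] column layout used elsewhere in this FAQ; top_edges is a hypothetical helper, and the right cutoff depends on your data.

```python
import pandas as pd


def top_edges(grn, fraction=0.1, weight_col="weight"):
    """Keep the highest-weighted `fraction` of edges (at least one edge)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, int(round(len(grn) * fraction)))
    return grn.nlargest(n_keep, weight_col).reset_index(drop=True)


# Toy GRN with ten edges of increasing weight.
toy_grn = pd.DataFrame({
    "source": ["TF1"] * 10,
    "target": [f"GENE{i}" for i in range(10)],
    "weight": [i / 10 for i in range(1, 11)],
})
print(top_edges(toy_grn, fraction=0.2))
```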

Can I use my own GRN?¶

Yes! Provide a custom DataFrame with columns ['source', 'target', 'weight']:

import pandas as pd
from recon.explore import Celltype

# Custom GRN from literature or ChIP-seq
custom_grn = pd.DataFrame({
    'source': ['TF1', 'TF1', 'TF2'],
    'target': ['GENE1', 'GENE2', 'GENE3'],
    'weight': [0.8, 0.6, 0.9]
})

# receptor_grn is a receptor-to-gene DataFrame that must be prepared beforehand
celltype = Celltype(
    grn=custom_grn,
    receptor_grn=receptor_grn,
    name="MyCell"
)
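
Before handing a custom GRN to ReCoN, a quick sanity check of the table can catch common mistakes. The checks below (required columns, no missing values, numeric weights, no duplicate edges) are assumptions about what a well-formed edge list looks like, not documented ReCoN requirements; check_grn is a hypothetical helper.

```python
import pandas as pd

REQUIRED_COLUMNS = ["source", "target", "weight"]


def check_grn(grn):
    """Return a list of problems found in a GRN edge table (empty list = looks fine)."""
    problems = []
    missing = [c for c in REQUIRED_COLUMNS if c not in grn.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems  # remaining checks need all three columns
    if grn[REQUIRED_COLUMNS].isna().any().any():
        problems.append("NaN values present")
    if not pd.api.types.is_numeric_dtype(grn["weight"]):
        problems.append("weight column is not numeric")
    if grn.duplicated(subset=["source", "target"]).any():
        problems.append("duplicate source-target pairs")
    return problems
```

An empty list means none of these particular problems were detected; it does not guarantee that the network is biologically sensible.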

ReCoN results¶

What do the scores mean?¶

ReCoN outputs Random Walk with Restart (RWR) scores representing:

Treatment propagation: How molecular perturbations flow through the network
Values 0-1: Higher = more affected by treatment
Relative ranking: Compare scores across genes/cells, not absolute magnitudes

The alpha parameter (default 0.8) controls how far the treatment signal spreads from the seeds:

High alpha (0.8-0.9): Treatment stays local to seeds
Low alpha (0.3-0.5): Treatment diffuses widely across network
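
The effect of this parameter can be illustrated with a generic single-layer random walk with restart, using the textbook update p_next = (1 - r) * W @ p + r * p0, where W is the column-normalised adjacency matrix and p0 the seed distribution. This is a toy sketch and not ReCoN's multilayer implementation; it assumes that alpha plays the role of the restart probability r, as the description above suggests.

```python
import numpy as np


def rwr(adjacency, seed_index, restart, tol=1e-10, max_iter=10_000):
    """Random walk with restart on a small dense graph (toy version)."""
    A = np.asarray(adjacency, dtype=float)
    W = A / A.sum(axis=0, keepdims=True)  # column-normalise: each column sums to 1
    p0 = np.zeros(A.shape[0])
    p0[seed_index] = 1.0
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Path graph 0 - 1 - 2 - 3 - 4, seeded at node 0.
path = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
local = rwr(path, seed_index=0, restart=0.8)
diffuse = rwr(path, seed_index=0, restart=0.3)
print("restart=0.8:", np.round(local, 3))
print("restart=0.3:", np.round(diffuse, 3))
```

With the higher restart value, most of the probability mass stays on the seed node; with the lower value, more of it reaches the far end of the path.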

What seeds should I use?¶

Seeds are your treatment entry points. Common choices:

Differentially expressed genes from treated vs control
Drug targets (e.g., receptor targeted by therapy)
Pathway genes (e.g., all genes in immune response pathway)

# Dictionary format: {gene: score}
seeds = {'RECEPTOR1': 1.0, 'TF1': 0.8}

# Or list format (all seeds weighted equally)
seeds = ['RECEPTOR1', 'TF1', 'GENE1']
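
The two formats carry the same information when every seed has equal weight. The sketch below converts either form to a dictionary, which can be convenient for saving or comparing seed sets. seeds_to_dict is a hypothetical helper, and the default weight of 1.0 for list-style seeds is an assumption rather than documented ReCoN behavior.

```python
def seeds_to_dict(seeds, default_weight=1.0):
    """Return seeds as {gene: weight}, accepting either a dict or a list of genes."""
    if isinstance(seeds, dict):
        return {str(gene): float(weight) for gene, weight in seeds.items()}
    return {str(gene): float(default_weight) for gene in seeds}


print(seeds_to_dict({"RECEPTOR1": 1.0, "TF1": 0.8}))
print(seeds_to_dict(["RECEPTOR1", "TF1", "GENE1"]))
```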

How do I interpret multicellular results?¶

In Multicell objects:

Node names: Suffixed with ::celltype (e.g., CD8::T_cell)
Ligand-receptor connections: LIGAND-celltype → RECEPTOR_receptor::celltype
Cell communication layer: Bipartite graphs between cell types
Lamb matrix: Controls transition probabilities between layers

Higher scores in receiving cells indicate:

Strong cell-cell communication effects
Potential for coordinated responses
Targets for combination therapies
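
Because node names carry the cell type as a ::celltype suffix, scores can be grouped per cell type by splitting on that separator. The following is a minimal sketch that assumes scores are available as a {node_name: score} mapping; split_node and scores_by_celltype are hypothetical helpers, and the actual structure of ReCoN results may differ.

```python
from collections import defaultdict


def split_node(node, sep="::"):
    """Split 'GENE::celltype' into (gene, celltype); celltype is None if absent."""
    gene, found, celltype = node.rpartition(sep)
    if not found:
        return node, None
    return gene, celltype


def scores_by_celltype(scores):
    """Group a {node: score} mapping into {celltype: {gene: score}}."""
    grouped = defaultdict(dict)
    for node, score in scores.items():
        gene, celltype = split_node(node)
        grouped[celltype][gene] = score
    return dict(grouped)


print(scores_by_celltype({"CD8::T_cell": 0.4, "GZMB::T_cell": 0.1, "EGFR::Tumor": 0.3}))
```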

Why are some scores zero?¶

Possible reasons:

Disconnected components: Gene not reachable from seeds in network
Low edge weights: Weak connections filtered out
High restart probability: Treatment didn’t diffuse far enough (try lower restart probability)
Missing edges: Incomplete GRN (add more regulatory links)
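
To test the first explanation, you can check which genes are reachable from the seeds at all. The sketch below runs a breadth-first search over a plain edge list; it treats edges as undirected by default, which is an assumption you may want to change, and reachable_from is a hypothetical helper rather than a ReCoN function.

```python
from collections import defaultdict, deque


def reachable_from(edges, seeds, directed=False):
    """Return the set of nodes reachable from any seed via the given edges."""
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        if not directed:
            neighbours[target].add(source)
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


edges = [("TF1", "GENE1"), ("TF1", "GENE2"), ("TF2", "GENE3")]
print(sorted(reachable_from(edges, ["TF1"])))  # GENE3 and TF2 are in a separate component
```

Genes outside the returned set cannot receive any score from those seeds, whatever the restart probability.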

ReCoN interpretation¶

How to validate ReCoN predictions?¶

Literature search: Check if predicted genes are known treatment targets
Pathway analysis: Are high-scoring genes in expected pathways?
Perturbation data: Compare with experimental knockdown/overexpression
Cross-validation: Split cells into train/test, validate predictions
Temporal data: Do predictions match time-series gene expression?
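
For the literature and pathway checks, a simple starting point is to measure how many of the top-scoring genes appear in a curated gene set. The sketch assumes scores are available as a {gene: score} mapping; top_k_overlap is a hypothetical helper, and the gene names are placeholders.

```python
def top_k_overlap(scores, known_genes, k=20):
    """Return (hits, fraction) for the k highest-scoring genes found in known_genes."""
    known = set(known_genes)
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = [gene for gene in ranked if gene in known]
    fraction = len(hits) / len(ranked) if ranked else 0.0
    return hits, fraction


toy_scores = {"GENE_A": 0.9, "GENE_B": 0.5, "GENE_C": 0.1}
print(top_k_overlap(toy_scores, known_genes={"GENE_A", "GENE_C"}, k=2))
```

A raw overlap fraction ignores the size of the gene universe, so a formal enrichment test is preferable when drawing conclusions.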

What biological insights can ReCoN provide?¶

ReCoN helps answer:

Which genes are affected by a treatment beyond direct targets?
How do cell types coordinate their responses?
What off-target effects might occur?
Why do some cells respond differently than others?
And maybe your own biological question! :)

How to compare conditions?¶

Compare RWR scores between conditions:

# Run ReCoN on both conditions
results_control = celltype.Multixrank(seeds=control_seeds, alpha=0.8)
results_treated = celltype.Multixrank(seeds=treated_seeds, alpha=0.8)

# Compare scores
import pandas as pd
comparison = pd.DataFrame({
    'control': results_control['GRN'],
    'treated': results_treated['GRN']
})
comparison['delta'] = comparison['treated'] - comparison['control']

# Genes with the largest increases in treated vs control
# (rank by comparison['delta'].abs() to include decreases as well)
top_changes = comparison.nlargest(20, 'delta')

ReCoN Visualization¶

What visualization tools are available?¶

Sankey diagrams: Trace treatment flow from seeds through network
Network plots: Visualize multicellular architecture
Heatmaps: Compare scores across cell types or conditions
Custom plotting: Export scores to pandas DataFrames for ggplot/matplotlib

Example Sankey diagram:

from recon.plot import plot_sankey

plot_sankey(
    multicell=multicell,
    results=results,
    source_celltype='Tumor',
    target_celltype='T_cell',
    top_n=10
)

How to export results?¶

Results are pandas DataFrames - use standard methods:

import pandas as pd

# Save to CSV
results['GRN'].to_csv('recon_results.csv')

# Save to Excel with multiple sheets
with pd.ExcelWriter('recon_results.xlsx') as writer:
    results['GRN'].to_excel(writer, sheet_name='GRN')
    results['Receptor'].to_excel(writer, sheet_name='Receptors')

Reproducibility¶

How to ensure reproducible results?¶

Set random seeds: ReCoN uses deterministic algorithms, but upstream GRN inference may not be deterministic
Version dependencies: Document versions of recon, scanpy, etc.
Save parameters: Record alpha, seeds, network sizes
Archive networks: Save GRN/receptor_grn DataFrames

# Save complete configuration
import json

config = {
    'recon_version': '0.1.0',
    'alpha': 0.8,
    'seeds': seeds,
    'graph_types': {'GRN': '01', 'Receptor': '01'},
    'n_genes': len(celltype.multiplexes['GRN'])
}

with open('recon_config.json', 'w') as f:
    json.dump(config, f, indent=2)
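
Rather than hard-coding version strings, installed versions can be read at runtime with the standard library. This is a sketch under the assumption that the packages are installed under the distribution names listed; installed_versions is a hypothetical helper, and packages that are not found are recorded as None.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print(installed_versions(["recon", "scanpy", "numpy", "pandas"]))
```

The resulting dictionary is JSON-serializable, so it can be stored alongside the other configuration values.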

Performance & Scalability¶

How long does ReCoN take?¶

GRN inference (slowest):

GRNBoost2: 10-60 minutes for 5000 genes
ATAC motif scanning: 1-10 hours depending on peak count

RWR computation (fast):

Single celltype: Seconds to minutes
Multicellular (3-5 celltypes): 1-5 minutes

Tips for speed:

Pre-compute GRN once, reuse for multiple seed sets
Limit GRN to top expressed genes (2000-5000)
Use sparse matrices (automatically handled)

Memory requirements?¶

Minimal: ~2 GB for small networks (<1000 genes)
Typical: 4-8 GB for realistic scRNA-seq data
Large: 16+ GB for 10+ cell types with full GRNs

Reduce memory by:

Filtering low-expressed genes before GRN inference
Using fewer celltypes in multicellular models
Clearing intermediate results: del results
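
Restricting the analysis to the most highly expressed genes, as suggested above, can be done with a few lines of NumPy. The sketch assumes a dense cells-by-genes matrix and ranks genes by mean expression; top_expressed_genes is a hypothetical helper, and for sparse or AnnData inputs you would likely rely on your usual preprocessing tools instead.

```python
import numpy as np


def top_expressed_genes(expression, gene_names, n_top=2000):
    """Return the n_top gene names with the highest mean expression (cells x genes input)."""
    X = np.asarray(expression, dtype=float)
    mean_expr = X.mean(axis=0)
    n_top = min(n_top, X.shape[1])
    order = np.argsort(mean_expr)[::-1][:n_top]  # indices sorted by decreasing mean
    return [gene_names[i] for i in order]


toy_expression = [[0, 5, 1],
                  [0, 7, 3]]
print(top_expressed_genes(toy_expression, ["GENE_A", "GENE_B", "GENE_C"], n_top=2))
```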

Can I parallelize ReCoN?¶

GRN inference: Parallel by default (set n_cpu parameter)
RWR computation: Single-threaded (already very fast)
Multiple conditions: Run in parallel with multiprocessing

from multiprocessing import Pool

def run_recon(seed_set):
    return celltype.Multixrank(seeds=seed_set, alpha=0.8)

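# The guard is required where worker processes are spawned (macOS, Windows)
if __name__ == "__main__":
    # seeds1..seeds4 are seed sets defined beforehand
    with Pool(4) as p:
        results = p.map(run_recon, [seeds1, seeds2, seeds3, seeds4])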

Getting Help¶

Where to find more information?¶

Documentation: https://recon.readthedocs.io
GitHub Issues: https://github.com/cantinilab/recon/issues
Examples: See notebooks in docs/source/recon_examples/
Paper: [Add citation when published]

How to report bugs?¶

Open a GitHub issue with:

Python/package versions: pip list | grep recon
Minimal example: Code that reproduces the error
Error message: Full traceback
Expected behavior: What should happen instead

# Get version info for bug report
python -c "import recon; print(recon.__version__)"
python --version
pip list | grep -E "recon|scanpy|numpy|pandas"

Contributing¶

ReCoN is open source (GPL-3.0 license). Contributions welcome:

Code: Submit pull requests on GitHub
Documentation: Fix typos, add examples
Testing: Report issues, suggest improvements
Citations: Cite ReCoN in your publications

License: GPL-3.0 allows free use, modification, and distribution, but derivative works must also be open source.