Installation troubleshooting & FAQ¶
Installation¶
Basic installation¶
For basic ReCoN functionality (without GRN inference from ATAC-seq):
pip install recon
Or install from source for development:
git clone https://github.com/cantinilab/recon.git
cd recon
pip install -e .
Installation with GRN inference (optional)¶
You’ll need gimmemotifs<=0.17.2, celloracle (lite branch), and llvmlite to install ReCoN with full GRN inference capabilities.
# Create environment with required dependencies
conda create -n recon python=3.10
conda activate recon
# Install ReCoN with GRN extras
pip install recon[grn-lite]
Installation with macOS¶
Some packages may be tricky to install with pip on macOS due to system library dependencies. We recommend using conda to manage these dependencies: gimmemotifs and llvmlite
# Create environment with required dependencies
conda create -n recon -c bioconda -c conda-forge python=3.10 gimmemotifs llvmlite cmake
conda activate recon
# Install ReCoN with GRN extras
pip install recon[grn-lite]
You’ll need cmake, gimmemotifs, and llvmlite to install ReCoN with full GRN inference capabilities including ATAC-seq motif scanning.
# Create environment with required dependencies
conda create -n recon -c bioconda -c conda-forge python=3.10 gimmemotifs llvmlite cmake
conda activate recon
# Install ReCoN with GRN extras
pip install recon[grn-lite]
Why these dependencies?
gimmemotifs: TF motif scanning for ATAC-seq peaks (requires pre-compiled binaries from conda)llvmlite: Required by numba for JIT compilation (system LLVM libraries needed on macOS)cmake: Build tool for compiling C/C++ extensions
Installing reference genomes for ATAC-seq¶
If you plan to use ATAC-seq data for TF-to-peak motif scanning, you need to install reference genomes:
# Install genomepy (included with celloracle)
pip install genomepy
# Install mouse genome (mm10)
genomepy install mm10 -p UCSC -a
# Install human genome (hg38)
genomepy install hg38 -p UCSC -a
# Check installed genomes
genomepy genomes
# List available genomes
genomepy search mouse
Where are genomes stored?
Genomes are downloaded to ~/.local/share/genomes/ by default.
There are usually large files: Genomes are typically 1-3 GB each.
You can customize the location:
# Install to custom directory
genomepy install mm10 -p UCSC -a -g /path/to/genomes
# Or set environment variable
export GENOMES_DIR=/path/to/genomes
genomepy install mm10 -p UCSC -a
Available genome providers:
UCSC: University of California Santa Cruz (recommended)Ensembl: European Bioinformatics InstituteNCBI: National Center for Biotechnology Information
Common genomes:
mm10: Mouse (GRCm38/mm10)mm39: Mouse (GRCm39, latest)hg38: Human (GRCh38)hg19: Human (GRCh37, older)
Common installation issues¶
Problem: llvmlite build fails on macOS
# Install via conda instead of pip
conda install -c conda-forge llvmlite numba
Problem: gimmemotifs installation fails
Gimmemotifs has C dependencies that may not compile on all systems. Install from conda-forge and bioconda channels:
conda install -c conda-forge -c bioconda gimmemotifs
Problem: genomepy install command fails with “Got unexpected extra argument”
The correct syntax uses -p flag for provider:
# WRONG: genomepy install mm10 UCSC --annotation
# RIGHT:
genomepy install mm10 -p UCSC -a
Problem: “Genomes_dir does not exist” error
Create the directory first or specify a custom location:
# Option 1: Create default directory
mkdir -p ~/.local/share/genomes
# Option 2: Use custom directory
genomepy install mm10 -p UCSC -a -g /path/to/genomes
Problem: ATAC tests are skipped with “mm10 genome not installed”
This is expected when the genome isn’t downloaded. To run ATAC-seq tests:
# Install celloracle
pip install 'git+https://github.com/cantinilab/celloracle@lite'
# Install genome
genomepy install mm10 -p UCSC -a
# Verify installation
ls ~/.local/share/genomes/mm10/
# Run tests
pytest tests/test_infer_grn.py -v
Problem: I cannot compute GRNs
CellOracle is required for GRN inference with ATAC-seq data. Options:
Install our ‘lite’ branch direclty with recon:
pip install recon[grn-lite]Install it separately:
pip install 'git+https://github.com/cantinilab/celloracle@lite'Compute your GRN externally and provide it to ReCoN.
Problem: Tests are skipped for celloracle functions
This is expected behavior when celloracle is not installed. The tests use @pytest.mark.skipif to gracefully skip ATAC-seq tests when celloracle is unavailable. Install with [grn] extras to run all tests.
Python version compatibility¶
Minimum: Python 3.8
Recommended: Python 3.10+
Note: Some dependencies (like circe-py) use Python 3.10+ type syntax (
Type | None). If you encounterTypeError: unsupported operand type(s) for |, upgrade to Python 3.10+.
GRN inference¶
What GRN inference methods are available?¶
ReCoN supports multiple approaches:
TF-to-gene (RNA-seq only): Uses GRNBoost2-style gradient boosting (GBM) or Random Forest (RF)
TF-to-gene with ATAC-seq: Adds TF-to-peak motif scanning via CellOracle
Receptor-to-gene: Custom connections for cell surface receptors
When should I use ATAC-seq integration?¶
Use ATAC-seq when:
You want more accurate TF-gene regulatory links
You have matched scRNA-seq + scATAC-seq data
You’re interested in chromatin accessibility effects
Skip ATAC-seq when:
You only have scRNA-seq data
Computational resources are limited (motif scanning is slow)
You’re doing quick exploratory analysis
How accurate is GRN inference?¶
GRN inference is probabilistic and noisy. ReCoN combines:
Expression correlation (GRNBoost2 importance scores)
Motif evidence (TF binding site predictions from ATAC peaks)
Network propagation (RWR to capture indirect effects)
Best practices:
Use biological validation (ChIP-seq, literature)
Focus on highly-ranked edges (top 10-20%)
Combine with perturbation data when available
Can I use my own GRN?¶
Yes! Provide a custom DataFrame with columns ['source', 'target', 'weight']:
import pandas as pd
from recon.explore import Celltype
# Custom GRN from literature or ChIP-seq
custom_grn = pd.DataFrame({
'source': ['TF1', 'TF1', 'TF2'],
'target': ['GENE1', 'GENE2', 'GENE3'],
'weight': [0.8, 0.6, 0.9]
})
celltype = Celltype(
grn=custom_grn,
receptor_grn=receptor_grn,
name="MyCell"
)
ReCoN results¶
What do the scores mean?¶
ReCoN outputs Random Walk with Restart (RWR) scores representing:
Treatment propagation: How molecular perturbations flow through the network
Values 0-1: Higher = more affected by treatment
Relative ranking: Compare scores across genes/cells, not absolute magnitudes
The alpha parameter (default 0.8) controls:
High alpha (0.8-0.9): Treatment stays local to seeds
Low alpha (0.3-0.5): Treatment diffuses widely across network
What seeds should I use?¶
Seeds are your treatment entry points. Common choices:
Differentially expressed genes from treated vs control
Drug targets (e.g., receptor targeted by therapy)
Pathway genes (e.g., all genes in immune response pathway)
# Dictionary format: {gene: score}
seeds = {'RECEPTOR1': 1.0, 'TF1': 0.8}
# Or list format (all seeds weighted equally)
seeds = ['RECEPTOR1', 'TF1', 'GENE1']
How do I interpret multicellular results?¶
In Multicell objects:
Node names: Suffixed with
::celltype(e.g.,CD8::T_cell)Ligand-receptor connections:
LIGAND-celltype→RECEPTOR_receptor::celltypeCell communication layer: Bipartite graphs between cell types
Lamb matrix: Controls transition probabilities between layers
Higher scores in receiving cells indicate:
Strong cell-cell communication effects
Potential for coordinated responses
Targets for combination therapies
Why are some scores zero?¶
Possible reasons:
Disconnected components: Gene not reachable from seeds in network
Low edge weights: Weak connections filtered out
High restart probability: Treatment didn’t diffuse far enough (try lower restart probability)
Missing edges: Incomplete GRN (add more regulatory links)
ReCoN interpretation¶
How to validate ReCoN predictions?¶
Literature search: Check if predicted genes are known treatment targets
Pathway analysis: Are high-scoring genes in expected pathways?
Perturbation data: Compare with experimental knockdown/overexpression
Cross-validation: Split cells into train/test, validate predictions
Temporal data: Do predictions match time-series gene expression?
What biological insights can ReCoN provide?¶
ReCoN helps answer:
Which genes are affected by a treatment beyond direct targets?
How do cell types coordinate their responses?
What off-target effects might occur?
Why do some cells respond differently than others?
And maybe your own biological question! :)
How to compare conditions?¶
Compare RWR scores between conditions:
# Run ReCoN on both conditions
results_control = celltype.Multixrank(seeds=control_seeds, alpha=0.8)
results_treated = celltype.Multixrank(seeds=treated_seeds, alpha=0.8)
# Compare scores
import pandas as pd
comparison = pd.DataFrame({
'control': results_control['GRN'],
'treated': results_treated['GRN']
})
comparison['delta'] = comparison['treated'] - comparison['control']
# Genes with largest changes
top_changes = comparison.nlargest(20, 'delta')
ReCoN Visualization¶
What visualization tools are available?¶
Sankey diagrams: Trace treatment flow from seeds through network
Network plots: Visualize multicellular architecture
Heatmaps: Compare scores across cell types or conditions
Custom plotting: Export scores to pandas DataFrames for ggplot/matplotlib
Example Sankey diagram:
from recon.plot import plot_sankey
plot_sankey(
multicell=multicell,
results=results,
source_celltype='Tumor',
target_celltype='T_cell',
top_n=10
)
How to export results?¶
Results are pandas DataFrames - use standard methods:
# Save to CSV
results['GRN'].to_csv('recon_results.csv')
# Save to Excel with multiple sheets
with pd.ExcelWriter('recon_results.xlsx') as writer:
results['GRN'].to_excel(writer, sheet_name='GRN')
results['Receptor'].to_excel(writer, sheet_name='Receptors')
Reproducibility¶
How to ensure reproducible results?¶
Set random seeds: ReCoN uses deterministic algorithms, but upstream GRN inference may not
Version dependencies: Document versions of recon, scanpy, etc.
Save parameters: Record alpha, seeds, network sizes
Archive networks: Save GRN/receptor_grn DataFrames
# Save complete configuration
import json
config = {
'recon_version': '0.1.0',
'alpha': 0.8,
'seeds': seeds,
'graph_types': {'GRN': '01', 'Receptor': '01'},
'n_genes': len(celltype.multiplexes['GRN'])
}
with open('recon_config.json', 'w') as f:
json.dump(config, f, indent=2)
Performance & Scalability¶
How long does ReCoN take?¶
GRN inference (slowest):
GRNBoost2: 10-60 minutes for 5000 genes
ATAC motif scanning: 1-10 hours depending on peak count
RWR computation (fast):
Single celltype: Seconds to minutes
Multicellular (3-5 celltypes): 1-5 minutes
Tips for speed:
Pre-compute GRN once, reuse for multiple seed sets
Limit GRN to top expressed genes (2000-5000)
Use sparse matrices (automatically handled)
Memory requirements?¶
Minimal: ~2 GB for small networks (<1000 genes)
Typical: 4-8 GB for realistic scRNA-seq data
Large: 16+ GB for 10+ cell types with full GRNs
Reduce memory by:
Filtering low-expressed genes before GRN inference
Using fewer celltypes in multicellular models
Clearing intermediate results:
del results
Can I parallelize ReCoN?¶
GRN inference: Parallel by default (set
n_cpuparameter)RWR computation: Single-threaded (already very fast)
Multiple conditions: Run in parallel with multiprocessing
from multiprocessing import Pool
def run_recon(seed_set):
return celltype.Multixrank(seeds=seed_set, alpha=0.8)
with Pool(4) as p:
results = p.map(run_recon, [seeds1, seeds2, seeds3, seeds4])
Getting Help¶
Where to find more information?¶
Documentation: https://recon.readthedocs.io
GitHub Issues: https://github.com/cantinilab/recon/issues
Examples: See notebooks in
docs/source/recon_examples/Paper: [Add citation when published]
How to report bugs?¶
Open a GitHub issue with:
Python/package versions:
pip list | grep reconMinimal example: Code that reproduces the error
Error message: Full traceback
Expected behavior: What should happen instead
# Get version info for bug report
python -c "import recon; print(recon.__version__)"
python --version
pip list | grep -E "recon|scanpy|numpy|pandas"
Contributing¶
ReCoN is open source (GPL-3.0 license). Contributions welcome:
Code: Submit pull requests on GitHub
Documentation: Fix typos, add examples
Testing: Report issues, suggest improvements
Citations: Cite ReCoN in your publications
License: GPL-3.0 allows free use, modification, and distribution, but derivative works must also be open source.