# 13 Scoring Formulas and Cutoffs

## Quick Answer
Scores and thresholds are module-specific. Always report score type, default cutoffs, and whether values are pre-filtered or full-scope.

## What this does
Lists the scoring formula used by each module and explains how to interpret its cutoffs.

## Inputs
- Module score fields (`Confidence_Score`, pathway `p_value/q_value`, communication score)
- User-selected cutoffs/caps

## Outputs
- Ranked lists and charts with explicit thresholds
- Processing notes in UI and exports

## How calculated
Examples:
- Disease priority: grouped disease rows use `100 * (0.28 * evidence_breadth + 0.22 * support_breadth + 0.14 * source_breadth + 0.12 * publication_breadth + 0.12 * direct_fraction + 0.12 * score_signal)`. Evidence, support, and publication burdens are log-normalized to result maxima; source breadth is divided by the result maximum; score signal is the clipped median source score.
- Pathway enrichment: hypergeometric `p_value`, BH-adjusted `q_value`; the Alzheimer manuscript figure state uses `p <= 0.05` and `q <= 0.10`. The minimum overlap is 1 gene for final gene sets under 50 genes and 2 genes otherwise.
- Pathway category links: category nodes summarize significant pathways by pathway count, overlapping genes, mean p-value, and median p-value. A link between two category nodes requires at least two shared genes and a shared-gene Jaccard index `> 0.1`.
- LR expression score: the default method is `(log1p(ligand_expr) + log1p(receptor_expr)) / 2`. `LR_SCORE_METHOD` can switch to `product_log1p` or `geometric_log1p`.
- Communication relevance: `45 * expression_score + 15 * direct_query_ligand + 10 * query_receptor + 10 * EV-reported_ligand + 10 * secreted_or_membrane_ligand + 5 * curated_source_count + 5 * active_system_focus`.
- Cell specificity signal: per-gene expression z-score across cell types, clipped and scaled as `clip(z_score, 0, 3) / 3`.
- Cell context score with direct seeds: `100 * (0.42 * seed_coverage + 0.33 * seed_specificity + 0.13 * marker_support + 0.12 * expanded_support)`.
- Cell context fallback with no direct seed genes: `100 * (0.60 * expanded_support + 0.25 * expanded_specificity + 0.15 * expanded_coverage)`.
- Expanded support: `0.60 * expanded_coverage + 0.40 * expanded_specificity`.
- Communication readiness: `100 * (0.65 * localization_ready_fraction + 0.35 * expressed_query_ligand_fraction)`.
- Composite cell context score: `0.76 * context_score + 0.14 * system_relevance * confidence_scale + 0.10 * communication_readiness`.
- miRNA-target expansion: aggregate `miRNA_targets_scored.parquet` by mRNA using support count, mean `Confidence_Score`, and maximum `Confidence_Score`; target expansion score is `mean_confidence * support_count`.
- miRNA bridge ranking: `1000 * support_count + 10 * mean_confidence + max_confidence`; direct query overlaps are labeled separately from shared target bridges.
- STRING/PPI: high-confidence bridge support uses `combined_score >= 700` in the manuscript state.
- Processing guards (defaults):
  - `MAX_MOLECULES_PER_SEARCH`: 25, for immediate molecule input
  - `STAGED_MOLECULE_TRIGGER`: 6, for staged-load guidance
  - `MAX_GENESET_TARGET_SCAN_ROWS`: 1,500,000
  - `MAX_CELLSPEC_SCAN_ROWS`: 2,000,000
  - `MAX_STRING_ROWS_PER_PROTEIN`: 200,000
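
Several of the formulas above can be sketched as plain Python functions. This is a minimal illustration, not the application's own code: the function names are hypothetical, only the default LR method is shown, and the component inputs are assumed to be fractions in `[0, 1]`. The scale of `system_relevance` and `confidence_scale` is not stated here, so the composite function simply applies the documented weights to whatever is passed in.

```python
import math


def clip(value, low, high):
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))


def lr_expression_score(ligand_expr, receptor_expr):
    """Default LR expression score: mean of log1p-transformed expression."""
    return (math.log1p(ligand_expr) + math.log1p(receptor_expr)) / 2


def specificity_signal(z_score):
    """Cell specificity signal: z-score clipped to [0, 3], scaled to [0, 1]."""
    return clip(z_score, 0.0, 3.0) / 3.0


def expanded_support(expanded_coverage, expanded_specificity):
    """Expanded support: weighted blend of expanded coverage and specificity."""
    return 0.60 * expanded_coverage + 0.40 * expanded_specificity


def cell_context_score(seed_coverage, seed_specificity, marker_support,
                       expanded_coverage, expanded_specificity,
                       has_direct_seeds):
    """Cell context score, using the fallback formula without direct seeds."""
    support = expanded_support(expanded_coverage, expanded_specificity)
    if has_direct_seeds:
        return 100 * (0.42 * seed_coverage
                      + 0.33 * seed_specificity
                      + 0.13 * marker_support
                      + 0.12 * support)
    # Fallback: no direct seed genes, so only expanded terms contribute.
    return 100 * (0.60 * support
                  + 0.25 * expanded_specificity
                  + 0.15 * expanded_coverage)


def composite_cell_context(context_score, system_relevance,
                           confidence_scale, communication_readiness):
    """Composite score with the documented 0.76 / 0.14 / 0.10 weights."""
    return (0.76 * context_score
            + 0.14 * system_relevance * confidence_scale
            + 0.10 * communication_readiness)


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from any real dataset.
    print(specificity_signal(1.5))
    print(cell_context_score(0.5, 0.4, 0.2, 0.6, 0.3, has_direct_seeds=True))
```

Keeping each formula in its own function makes the weights easy to audit against this page, and the `has_direct_seeds` flag makes the switch to the fallback formula explicit.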

## What to download
Include score columns and processing notes when exporting, then reproduce filtering downstream with documented thresholds.
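
As one way to reproduce filtering downstream, the sketch below applies the pathway cutoffs listed under "How calculated" (`p <= 0.05`, `q <= 0.10`, and the size-dependent minimum overlap) to exported rows. It assumes each row is a dictionary with `p_value` and `q_value` fields; the `overlap` field, the function names, and the toy rows are hypothetical and stand in for whatever the export actually contains.

```python
def min_overlap_required(final_gene_set_size):
    """Minimum overlap: 1 gene for final gene sets under 50 genes, else 2."""
    return 1 if final_gene_set_size < 50 else 2


def passes_pathway_cutoffs(row, final_gene_set_size,
                           p_cutoff=0.05, q_cutoff=0.10):
    """Return True if a pathway row meets the p, q, and overlap thresholds."""
    return (row["p_value"] <= p_cutoff
            and row["q_value"] <= q_cutoff
            and row["overlap"] >= min_overlap_required(final_gene_set_size))


def filter_pathways(rows, final_gene_set_size, p_cutoff=0.05, q_cutoff=0.10):
    """Keep only rows that pass all cutoffs, preserving input order."""
    return [row for row in rows
            if passes_pathway_cutoffs(row, final_gene_set_size,
                                      p_cutoff, q_cutoff)]


# Toy rows for illustration only; not real enrichment results.
rows = [
    {"pathway": "toy_pathway_A", "p_value": 0.01, "q_value": 0.04, "overlap": 3},
    {"pathway": "toy_pathway_B", "p_value": 0.04, "q_value": 0.20, "overlap": 5},
    {"pathway": "toy_pathway_C", "p_value": 0.03, "q_value": 0.08, "overlap": 1},
]

kept_large = filter_pathways(rows, final_gene_set_size=120)
kept_small = filter_pathways(rows, final_gene_set_size=30)

if __name__ == "__main__":
    print([row["pathway"] for row in kept_large])
    print([row["pathway"] for row in kept_small])
```

Passing the cutoffs as arguments keeps the thresholds visible next to the result, which makes them straightforward to record alongside the exported processing notes.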

## Known limits
Scores from different modules are not directly interchangeable. Use module-native interpretation. All scores are for ranking and triage unless an upstream source explicitly encodes a validated experimental claim.
