# Data Provenance

EVd3x integrates public resources into canonical analysis tables and keeps each record attributed to its source, so results stay traceable from query to export.

## Provenance principles

- Preserve source database fields in UI and exports.
- Preserve publication identifiers where available.
- Keep direct evidence and derived context separable.
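The principles above can be illustrated with a minimal sketch. It assumes each row is a plain dict with hypothetical keys `source_db`, `pmid`, and `evidence_kind` (`"direct"` or `"derived"`); these names are illustrative only and are not EVd3x's actual column names.

```python
from typing import Iterable

# Hypothetical provenance column names; EVd3x's real schema may differ.
PROVENANCE_FIELDS = ("source_db", "pmid")


def split_evidence(rows: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Separate direct evidence from derived context without dropping provenance fields."""
    direct: list[dict] = []
    derived: list[dict] = []
    for row in rows:
        # Start from explicit None placeholders so provenance fields are always present,
        # then overlay the row's own values (which win when they exist).
        kept = {**{name: None for name in PROVENANCE_FIELDS}, **row}
        # Only rows explicitly marked "direct" count as direct evidence;
        # everything else is kept apart as derived context.
        if kept.get("evidence_kind") == "direct":
            direct.append(kept)
        else:
            derived.append(kept)
    return direct, derived
```

Keeping a missing publication identifier as an explicit `None`, rather than omitting the key, makes the gap visible in exports instead of silently hiding it.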

## Canonical data scope

- EVd3x uses 17 canonical Apache Parquet analysis tables.
- Runtime caches improve speed and support enrichment, but they are not counted among the canonical analysis tables.
- Canonical and auxiliary table metadata is documented in `static/docs/data_source_inventory.json`.
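A minimal sketch for checking the inventory file against the documented table count. It assumes, hypothetically, that `data_source_inventory.json` holds a `tables` list whose entries carry a `name` and a `role` of `"canonical"` or `"auxiliary"`, and that the path is resolved from the repository root; the real file layout may differ, so adjust the keys to match.

```python
import json
from pathlib import Path

# Assumed to be relative to the repository root.
INVENTORY_PATH = Path("static/docs/data_source_inventory.json")


def load_inventory(path: Path = INVENTORY_PATH) -> dict:
    """Read the inventory file as JSON."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def group_tables_by_role(inventory: dict) -> dict[str, list[str]]:
    """Group table names by their declared role (hypothetical 'tables'/'role' keys)."""
    groups: dict[str, list[str]] = {}
    for entry in inventory.get("tables", []):
        # Entries without a role are collected under "unknown" rather than dropped.
        groups.setdefault(entry.get("role", "unknown"), []).append(entry["name"])
    return groups
```

Usage, under the same assumptions: `len(group_tables_by_role(load_inventory()).get("canonical", []))` should equal 17 if the inventory matches this page.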

## Main source groups

- Identity mapping: Ensembl/BioMart, UniProt, miRBase, RNAcentral.
- miRNA-target support: miRTarBase, TarBase, TargetScan.
- EV evidence: ExoCarta, Vesiclepedia, SVAtlas.
- Pathways: Reactome, KEGG, GO, WikiPathways.
- Cell and localization context: HPA, RNALocate, miRmine, miRNATissueAtlas2.
- Communication context: CellPhoneDB, Cellinker, CellTalkDB, OmniPath.
- Disease context: DisGeNET and linked harmonized disease resources.
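When tagging exported rows by source group, a lookup like the one below can help. It restates the list above as a dictionary; the group keys, the `source_group` helper, and the exact spelling of source names in EVd3x exports are assumptions for illustration.

```python
from typing import Optional

# Restates the source groups listed above. Group keys and source spellings are assumptions.
SOURCE_GROUPS: dict[str, tuple[str, ...]] = {
    "identity_mapping": ("Ensembl/BioMart", "UniProt", "miRBase", "RNAcentral"),
    "mirna_target_support": ("miRTarBase", "TarBase", "TargetScan"),
    "ev_evidence": ("ExoCarta", "Vesiclepedia", "SVAtlas"),
    "pathways": ("Reactome", "KEGG", "GO", "WikiPathways"),
    "cell_localization_context": ("HPA", "RNALocate", "miRmine", "miRNATissueAtlas2"),
    "communication_context": ("CellPhoneDB", "Cellinker", "CellTalkDB", "OmniPath"),
    # Only DisGeNET is named explicitly; the other linked disease resources are not listed.
    "disease_context": ("DisGeNET",),
}

# Invert the mapping once, keyed by case-folded source name for tolerant lookups.
_GROUP_BY_SOURCE = {
    source.casefold(): group
    for group, sources in SOURCE_GROUPS.items()
    for source in sources
}


def source_group(source_db: str) -> Optional[str]:
    """Return the source group for a database name, or None if it is not listed."""
    return _GROUP_BY_SOURCE.get(source_db.strip().casefold())
```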

## EV-TRACK interpretation

- EV-TRACK fields provide record-level provenance metadata when available.
- Missing EV-TRACK fields can reflect gaps in what the upstream source provides.
- Interpret EV-TRACK together with source and publication fields.
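A minimal sketch of reading EV-TRACK alongside source and publication fields. The field names `ev_track`, `source_db`, and `pmid` and the `describe_ev_track` helper are hypothetical; the point is that a missing EV-TRACK value is reported as unavailable rather than treated as evidence against the record.

```python
def describe_ev_track(record: dict) -> str:
    """Summarize EV-TRACK status together with source and publication fields.

    Field names ('ev_track', 'source_db', 'pmid') are hypothetical.
    """
    source = record.get("source_db") or "unknown source"
    pmid = record.get("pmid") or "no publication ID"
    ev_track = record.get("ev_track")
    if ev_track in (None, ""):
        # Missing values are flagged as unavailable, not scored as negative evidence.
        status = "EV-TRACK: not available (may reflect upstream coverage)"
    else:
        status = f"EV-TRACK: {ev_track}"
    return f"{status} | source: {source} | publication: {pmid}"
```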

## Reproducibility minimum

When sharing results, include:
- exact query,
- analysis mode,
- active filters and caps,
- exported machine-readable tables,
- version/date of the analyzed state.
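The checklist above can be captured as a small machine-readable manifest shared next to the exported tables. This is a sketch, not an EVd3x feature: the `ReproManifest` class and its field names are hypothetical, and empty filters or caps are allowed because an analysis may legitimately run without them.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ReproManifest:
    """Hypothetical manifest covering the reproducibility minimum listed above."""

    query: str
    analysis_mode: str
    filters: dict = field(default_factory=dict)
    caps: dict = field(default_factory=dict)
    exported_tables: list = field(default_factory=list)
    data_version: str = ""
    data_date: str = ""

    def missing_items(self) -> list[str]:
        """Return required items that are still empty."""
        missing = []
        if not self.query:
            missing.append("query")
        if not self.analysis_mode:
            missing.append("analysis_mode")
        if not self.exported_tables:
            missing.append("exported_tables")
        # Either a version or a date is enough to identify the analyzed state.
        if not (self.data_version or self.data_date):
            missing.append("version/date")
        return missing

    def to_json(self) -> str:
        """Serialize the manifest with stable key order for diffing."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Calling `missing_items()` before sharing gives a quick check that nothing on the list was left out.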
