# 16 Data Inventory Appendix

## Quick Answer
This appendix points to machine-readable provenance files that capture dataset schemas, row counts, source databases, processing scripts, graph definitions, and FAQ/glossary retrieval metadata.

## What this does
Provides direct references to artifacts used by humans and LLM systems for reproducibility and retrieval.

## Inputs
- `static/docs/data_source_inventory.json`
- `static/docs/graph_specs.json`
- `static/docs/docs_index.json`
- `static/docs/glossary.json`
- `static/docs/faq.json`

## Outputs
- Deterministic documentation metadata for auditing and manuscript method sections.

## How calculated
`data_source_inventory.json` is generated by `scripts/build_docs_inventory.py` from:
- parquet schemas in `sample_databases/`
- ingestion file and script references in `C:\Users\jsw82\Documents\EV_dex\data_sources`
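The generation step might look roughly like the sketch below. This is not the contents of `scripts/build_docs_inventory.py`: the function names `build_inventory` and `write_inventory`, the `read_schema` callable, and the output keys (`file`, `columns`, `row_count`) are all assumptions made for illustration. The schema reader is passed in as a parameter so the sketch runs without a parquet library; in practice it could wrap something like `pyarrow.parquet.read_schema` and `pyarrow.parquet.read_metadata`.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Tuple

# Hypothetical reader type: given a parquet path, return
# (column name -> dtype string, row count).
SchemaReader = Callable[[Path], Tuple[Dict[str, str], int]]


def build_inventory(parquet_dir: Path, read_schema: SchemaReader) -> Dict[str, dict]:
    """Collect one entry per parquet file (assumed layout, not the real one)."""
    inventory: Dict[str, dict] = {}
    # Sort paths so repeated runs over the same files yield identical output.
    for path in sorted(parquet_dir.glob("*.parquet")):
        columns, row_count = read_schema(path)
        inventory[path.stem] = {
            "file": path.name,
            "columns": columns,
            "row_count": row_count,
        }
    return inventory


def write_inventory(inventory: Dict[str, dict], out_path: Path) -> None:
    """Serialize with sorted keys so diffs reflect only real changes."""
    out_path.write_text(
        json.dumps(inventory, indent=2, sort_keys=True), encoding="utf-8"
    )
```

Sorting both the file list and the JSON keys is one simple way to keep the output deterministic across runs.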

## What to download
- `static/docs/data_source_inventory.json`
- `static/docs/graph_specs.json`
- `static/docs/citations.bib`

## Known limits
The inventory is only as current as the most recent generator run. Re-run `scripts/build_docs_inventory.py` after pipeline updates or schema changes.
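One way to notice a stale inventory is to compare file modification times, as in the minimal sketch below. The helper name `inventory_is_stale` is hypothetical, and the check assumes the parquet files sit directly inside the given directory. It is only a heuristic: it cannot detect changes in upstream sources that have not yet touched the parquet files.

```python
from pathlib import Path


def inventory_is_stale(inventory_path: Path, parquet_dir: Path) -> bool:
    """Return True if the inventory is missing or older than any parquet file."""
    if not inventory_path.exists():
        return True
    inventory_mtime = inventory_path.stat().st_mtime
    # Any parquet file modified after the inventory suggests a re-run is needed.
    return any(
        p.stat().st_mtime > inventory_mtime
        for p in parquet_dir.glob("*.parquet")
    )
```

For example, `inventory_is_stale(Path("static/docs/data_source_inventory.json"), Path("sample_databases"))` could be run before quoting row counts in a methods section.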

## Artifact map
- `data_source_inventory.json`: per-dataset schema, row count, upstream sources, raw inputs, processing scripts, key IDs, where-used mapping.
- `graph_specs.json`: graph key, container selector, source tables, filter dependencies, export links, interpretation cautions.
- `faq.json`: structured FAQ with category and tag metadata.
- `glossary.json`: canonical term definitions with aliases and section links.
- `docs_index.json`: section metadata and dependency references.
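A consumer of these artifacts may want to confirm that each file is present and parses as JSON before relying on it. The sketch below is illustrative only: `check_artifacts` and its status strings are hypothetical, and nothing is assumed about the internal structure of the files beyond being valid JSON. It reads with `utf-8-sig` so that a leading byte-order mark does not cause a parse failure.

```python
import json
from pathlib import Path
from typing import Dict

# File names taken from the artifact map above.
ARTIFACTS = (
    "data_source_inventory.json",
    "graph_specs.json",
    "faq.json",
    "glossary.json",
    "docs_index.json",
)


def check_artifacts(docs_dir: Path) -> Dict[str, str]:
    """Map each artifact name to 'ok', 'missing', or 'invalid JSON'."""
    status: Dict[str, str] = {}
    for name in ARTIFACTS:
        path = docs_dir / name
        if not path.is_file():
            status[name] = "missing"
            continue
        try:
            # utf-8-sig tolerates an optional byte-order mark at the file start.
            json.loads(path.read_text(encoding="utf-8-sig"))
            status[name] = "ok"
        except ValueError:
            # Covers both json.JSONDecodeError and UnicodeDecodeError.
            status[name] = "invalid JSON"
    return status
```

Calling `check_artifacts(Path("static/docs"))` returns one status per artifact, which can be logged or asserted on in a build step.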
