# Data Policy & Security

Scope
- Strong separation of concerns: the core library and CI run offline by default; data acquisition is manual and opt-in.
- Reproducible, conservative results: artifacts include uncertainty estimates and an attestation with hashes and environment info.

Acquisition
- Prefer CSV/JSON inputs you control. Avoid serialized Python objects (pickle) due to the code-execution risk (see the loader sketch below).
- Materials Project access is manual: use `MP_API_KEY` in the `materials_audit` workflow or in local sessions only (see the session sketch below).
- The RouterBench dataset remains gated; the helper script fetches it from the canonical source and warns about the `.pkl` risk.

CI/CD boundaries
- Default CI does not download external data or call external APIs.
- Manual workflows (`workflow_dispatch`) accept user-provided CSV paths and secrets, and upload artifacts (no Releases).
- An attestation JSON (hashes, env, commit) accompanies outputs for review and reproducibility (attestation and verification sketches below).

Security
- Bandit scans `src/compitum`, `tools`, `examples`, and `scripts` in CI.
- We do not unpickle arbitrary files in CI; `.pkl` files are advisory only and never committed.
- Secrets are scoped to manual jobs and not echoed in logs; outputs exclude secrets.

Recommended layout
- `data/` for local inputs and products (ignored by Git). Suggested subfolders:
  - `data/external/` for third-party CSVs
  - `data/samples/` for small examples (checked in)
- Avoid committing third-party data; instead, reference provenance in the attestation or in a small README next to the file (see the provenance sketch below).

Reproducibility
- Use the offline Matbench workflow to calibrate and evaluate from a repo CSV path, producing:
  - Calibration JSON, Regret CSV/JSON (+ optional groups/budget), Baseline CSV/JSON, Layers CSV/JSON, and Attestation JSON.
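
Illustrative sketches

The sketches below are non-normative examples of the practices described above. Helper names and JSON fields are assumptions made for illustration; they are not part of the repository's API.

This first sketch illustrates the acquisition policy: accept CSV/JSON inputs and refuse serialized Python objects. `load_table` is a hypothetical helper, not a repository function.

```python
# Hypothetical helper illustrating the acquisition policy:
# accept CSV/JSON inputs and refuse serialized Python objects.
from pathlib import Path
import csv
import json


def load_table(path: str) -> list[dict]:
    """Load a small tabular file, accepting only CSV or JSON."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix in {".pkl", ".pickle"}:
        # Unpickling runs arbitrary code; the policy is to never do it here.
        raise ValueError(f"Refusing to unpickle {p}; convert it to CSV/JSON first.")
    if suffix == ".csv":
        with p.open(newline="") as fh:
            return list(csv.DictReader(fh))
    if suffix == ".json":
        with p.open() as fh:
            data = json.load(fh)
            return data if isinstance(data, list) else [data]
    raise ValueError(f"Unsupported input format: {suffix}")
```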
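
A minimal sketch of a manual, opt-in Materials Project session that reads `MP_API_KEY` from the environment and fails fast if it is absent. It assumes the official `mp-api` client is installed locally; CI never runs this code.

```python
# Sketch of a manual, opt-in Materials Project session; never hard-code the key.
import os

from mp_api.client import MPRester  # assumed local dependency, not used in CI


def open_mp_session() -> MPRester:
    api_key = os.environ.get("MP_API_KEY")
    if not api_key:
        # Fail fast rather than falling back to any cached credential.
        raise RuntimeError(
            "MP_API_KEY is not set; Materials Project access is manual and opt-in."
        )
    return MPRester(api_key=api_key)
```

The key is read only from the environment and never logged, matching the secrets policy above.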
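
A sketch of writing an attestation JSON next to a run's outputs, capturing file hashes, environment info, and the commit. The field names are assumptions for illustration; the repository's actual attestation schema may differ.

```python
# Illustrative attestation writer; field names are assumptions, not the
# repository's documented schema.
import hashlib
import json
import platform
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_attestation(outputs: list[Path], dest: Path) -> None:
    """Record hashes, environment, and commit for the given output files."""
    commit = subprocess.run(
        ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
    ).stdout.strip()
    record = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "commit": commit,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "artifacts": {str(p): sha256_of(p) for p in outputs},
    }
    dest.write_text(json.dumps(record, indent=2))
```

A file like this can be uploaded as a workflow artifact alongside the calibration, regret, baseline, and layers outputs.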
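
A matching reproducibility check: re-hash downloaded artifacts and compare them against the attestation JSON. The `artifacts` field mirrors the attestation sketch above and is likewise an assumption about the schema.

```python
# Sketch of verifying downloaded outputs against an attestation JSON.
import hashlib
import json
from pathlib import Path


def verify_against_attestation(attestation_path: str) -> bool:
    """Return True if every listed artifact still matches its recorded hash."""
    record = json.loads(Path(attestation_path).read_text())
    for name, expected in record["artifacts"].items():
        digest = hashlib.sha256(Path(name).read_bytes()).hexdigest()
        if digest != expected:
            print(f"Hash mismatch for {name}")
            return False
    return True
```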
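
Finally, a sketch of recording provenance beside a locally acquired third-party file under the recommended layout, instead of committing the data itself. `note_provenance` and the README format are illustrative only.

```python
# Hypothetical provenance note for a third-party file kept under data/external/.
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def note_provenance(data_file: Path, source_url: str) -> None:
    """Write a small README beside a third-party file instead of committing the data."""
    digest = hashlib.sha256(data_file.read_bytes()).hexdigest()
    readme = data_file.with_name(f"README.{data_file.name}.md")
    readme.write_text(
        f"# Provenance for {data_file.name}\n\n"
        f"- Source: {source_url}\n"
        f"- Retrieved: {datetime.now(timezone.utc).date().isoformat()}\n"
        f"- SHA-256: {digest}\n"
        f"- Note: not committed to Git; see the data policy above.\n"
    )
```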