Data Policy & Security
Scope
Strong separation of concerns: core library and CI run offline by default; data acquisition is manual and opt-in.
Reproducible, conservative results: artifacts include uncertainty and attestation with hashes and environment info.
Acquisition
Prefer CSV/JSON inputs you control. Avoid serialized Python objects (pickle) due to code execution risk.
Materials Project access is manual: use MP_API_KEY in the materials_audit workflow or local sessions only.
RouterBench dataset remains gated; helper script fetches from the canonical source and warns about .pkl risk.
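The input rules above can be sketched as a small loader that accepts only CSV and JSON and refuses pickle files outright. This is a minimal illustration, not the project's own code: the function name load_records and the suffix sets are hypothetical.

```python
import csv
import json
from pathlib import Path

# Hypothetical allow-list for illustration; not taken from the project.
ALLOWED_SUFFIXES = {".csv", ".json"}
PICKLE_SUFFIXES = {".pkl", ".pickle"}


def load_records(path):
    """Load data from a CSV or JSON file; refuse pickle and unknown formats."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix in PICKLE_SUFFIXES:
        # Unpickling can run arbitrary code, so the file is never opened.
        raise ValueError(f"refusing to unpickle {p.name}: pickle can execute code on load")
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported input format: {suffix or '(none)'}")
    with p.open(newline="", encoding="utf-8") as fh:
        if suffix == ".csv":
            # Each row becomes a dict keyed by the header line.
            return list(csv.DictReader(fh))
        return json.load(fh)
```

Rejecting by suffix before opening the file keeps the check cheap and means a .pkl input fails loudly instead of being deserialized by accident.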
CI/CD boundaries
Default CI does not download external data or call external APIs.
Manual workflows (workflow_dispatch) accept user-provided CSV paths and secrets, and upload artifacts (no Releases).
Attestation JSON (hashes, env, commit) accompanies outputs for review reproducibility.
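As an illustration of what an attestation with hashes, environment info, and commit might contain, here is a minimal sketch using only the standard library. The field names and the functions sha256_of, current_commit, and build_attestation are assumptions made for this example; the project's actual attestation schema may differ.

```python
import hashlib
import platform
import subprocess
import sys


def sha256_of(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def current_commit():
    """Return the current Git commit hash, or None if it cannot be determined."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.stdout.strip()


def build_attestation(paths):
    """Collect file hashes, environment info, and commit into one dict.

    The layout here is hypothetical, chosen only for illustration.
    """
    return {
        "hashes": {str(p): sha256_of(p) for p in paths},
        "env": {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
        },
        "commit": current_commit(),
    }
```

The resulting dict can be written with json.dump next to the outputs, so a reviewer can re-hash the artifacts and compare.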
Security
Bandit scans src/compitum, tools, examples, scripts in CI.
We do not unpickle arbitrary files in CI; .pkl files are advisory only and never committed.
Secrets are scoped to manual jobs and not echoed in logs; outputs exclude secrets.
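One way to keep secrets out of logs and outputs is to read them from the environment only where needed and to redact them from any text before it is printed or written. The sketch below assumes this approach; require_secret and redact are hypothetical helpers, not part of the project.

```python
import os


def require_secret(name, environ=None):
    """Fetch a secret from the environment, failing without echoing any value."""
    env = os.environ if environ is None else environ
    value = env.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set; this step is manual and opt-in")
    return value


def redact(text, secrets):
    """Replace every occurrence of each known secret value with a mask."""
    for secret in secrets:
        if secret:  # skip empty strings, which would match everywhere
            text = text.replace(secret, "***")
    return text
```

For example, a manual job could call require_secret("MP_API_KEY") once and pass every log message through redact before emitting it.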
Recommended layout
data/ for local inputs and products (ignored by Git). Suggested subfolders:
data/external/ for third-party CSVs
data/samples/ for small examples (checked in)
Avoid committing third-party data; instead, reference provenance in attestation or a small README next to the file.
Reproducibility
Use the offline Matbench workflow to calibrate and evaluate from a repo CSV path, producing:
Calibration JSON, Regret CSV/JSON (+ groups/budget optional), Baseline CSV/JSON, Layers CSV/JSON, and Attestation JSON.