# Matbench Regret (Offline)

This document defines the regret metrics we use for Matbench-style evaluations and explains how to reproduce results offline from CSVs. The approach is conservative: it reports uncertainty, uses fixed seeds, and avoids comparative claims by default.

## Terminology

- Objective (`y_true`): numeric property to maximize or minimize (declared via `--mode`).
- Score: ranking function used for selection. By default we use the SRMF proxy `kappa − λ·leak`.
- Regret@k: difference between the oracle top-k utility and the model-selected top-k utility; the normalized variant divides by the oracle utility.
- AURC: area under the normalized regret curve over k (lower is better).

## Offline tools

- Calibration: choose λ to minimize validation AURC and report held-out test AURC with confidence intervals (CIs).
  - `tools/calibrate_matbench_srmf.py`
- Evaluation: compute Regret@k and AURC using the chosen λ (or a provided score column).
  - `tools/eval_matbench_regret.py`

## CSV schema

- Required features for SRMF: `band_gap`, `density`, `nsites`, `formation_energy_per_atom`.
- Required objective column: e.g., `y_true` (set via `--objective-col`).
- Optional: `material_id` and `formula_pretty` for reporting.

## Reproducible run

- Calibrate:
  - `python tools/calibrate_matbench_srmf.py --path data.csv --objective-col y_true --mode max --topk-grid 1,5,10 --lambda-grid 0.0,0.5,1.0 --bootstrap 1000 --seed 0 --out-json reports/matbench_calibration.json --scores-out reports/matbench_scores_test.csv`
- Evaluate with the tuned λ:
  - `python tools/eval_matbench_regret.py --path data.csv --objective-col y_true --mode max --use-srmf --lambda-weight $(jq -r .best_lambda reports/matbench_calibration.json) --topk-grid 1,5,10 --out-csv reports/matbench_regret.csv --out-json reports/matbench_regret.json --bootstrap 1000 --seed 0`

## Claims and limitations

- The SRMF mapping is a proxy for manifold geometry and stability signals; it is not a surrogate for ab initio calculations. We report uncertainty, avoid comparative claims by default, and keep live integrations gated.

## Attestation and Groups

- Attestation: `tools/generate_matbench_attestation.py`
- Per-group regret: `tools/eval_matbench_regret.py --group-col group --out-group-csv reports/matbench_regret_groups.csv`

## Baselines and Layers

- Baseline CV regret CLI: `tools/eval_baseline_regret.py`
- Emergence exploration: `tools/explore_matbench_layers.py`
- Example: use quantile layers on `band_gap` and report AURC per layer.
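
## Metric sketch (illustrative)

For readers who want to sanity-check outputs independently of the CLIs, the sketch below shows one way to compute Regret@k, its normalized variant, and AURC from a scored CSV (e.g., `reports/matbench_scores_test.csv`). The function names, the sum-of-`y_true` definition of top-k utility, and the trapezoidal AURC normalization are assumptions made for illustration; they are not taken from the repo's implementations.

```python
# Minimal sketch of Regret@k, normalized regret, and AURC.
# Assumptions (illustrative, not the repo's code): the utility of a top-k set is
# the sum of its y_true values; the "oracle" ranks by y_true, the "model" ranks
# by the score column; higher score is always treated as better.
import numpy as np
import pandas as pd


def regret_at_k(df: pd.DataFrame, score_col: str, objective_col: str = "y_true",
                k: int = 10, mode: str = "max") -> tuple[float, float]:
    """Return (regret@k, normalized regret@k) for a single k."""
    y = df[objective_col].to_numpy(dtype=float)
    s = df[score_col].to_numpy(dtype=float)
    sign = 1.0 if mode == "max" else -1.0       # minimization flips the objective
    oracle_idx = np.argsort(-sign * y)[:k]      # best k by the true objective
    model_idx = np.argsort(-s)[:k]              # best k by the selection score
    oracle_util = float(np.sum(sign * y[oracle_idx]))
    model_util = float(np.sum(sign * y[model_idx]))
    regret = oracle_util - model_util
    # Normalization assumes a positive oracle utility; otherwise it is ill-defined.
    norm = regret / oracle_util if oracle_util > 0 else float("nan")
    return regret, norm


def aurc(df: pd.DataFrame, score_col: str, topk_grid=(1, 5, 10), **kwargs) -> float:
    """Area under the normalized regret curve over k (trapezoidal; lower is better)."""
    ks = sorted(topk_grid)
    norm_regrets = [regret_at_k(df, score_col, k=k, **kwargs)[1] for k in ks]
    if len(ks) == 1:
        return norm_regrets[0]
    return float(np.trapz(norm_regrets, ks) / (ks[-1] - ks[0]))


if __name__ == "__main__":
    # Example: scores written by the calibration step (column names follow the schema above).
    scored = pd.read_csv("reports/matbench_scores_test.csv")
    print(aurc(scored, score_col="score", objective_col="y_true", mode="max"))
```

Because the utility definition and AURC normalization here are assumptions, treat any numbers from this sketch as a cross-check of trends, not as a replacement for `tools/eval_matbench_regret.py` outputs.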