# Matbench Regret (Offline)

This document defines the regret metrics we use for Matbench-style evaluations and explains how to reproduce results offline from CSVs. The approach is conservative: it reports uncertainty, uses fixed seeds, and avoids comparative claims by default.

## Terminology

- Objective (`y_true`): numeric property to maximize or minimize (declared via `--mode`).
- Score: ranking function used for selection. By default we use the SRMF proxy `kappa − λ·leak`.
- Regret@k: difference between the oracle top-k utility and the model-selected top-k utility; the normalized variant divides by the oracle utility.
- AURC: area under the normalized regret curve over k (lower is better).

## Offline tools

- Calibration: choose λ to minimize validation AURC and report held-out test AURC with confidence intervals (CIs).
  - `tools/calibrate_matbench_srmf.py`
- Evaluation: compute Regret@k and AURC using the chosen λ (or a provided score column).
  - `tools/eval_matbench_regret.py`

## CSV schema

- Required features for SRMF: `band_gap`, `density`, `nsites`, `formation_energy_per_atom`.
- Required objective column: e.g., `y_true` (set via `--objective-col`).
- Optional: `material_id` and `formula_pretty` for reporting.

## Reproducible run

- Calibrate:
  - `python tools/calibrate_matbench_srmf.py --path data.csv --objective-col y_true --mode max --topk-grid 1,5,10 --lambda-grid 0.0,0.5,1.0 --bootstrap 1000 --seed 0 --out-json reports/matbench_calibration.json --scores-out reports/matbench_scores_test.csv`
- Evaluate with the tuned λ:
  - `python tools/eval_matbench_regret.py --path data.csv --objective-col y_true --mode max --use-srmf --lambda-weight $(jq -r .best_lambda reports/matbench_calibration.json) --topk-grid 1,5,10 --out-csv reports/matbench_regret.csv --out-json reports/matbench_regret.json --bootstrap 1000 --seed 0`

## Claims and limitations

- The SRMF mapping is a proxy for manifold geometry and stability signals; it is not a surrogate for ab initio calculations. We report uncertainty, avoid comparative claims by default, and keep live integrations gated.

## Attestation and Groups

- Attestation: `tools/generate_matbench_attestation.py`
- Per-group regret: `tools/eval_matbench_regret.py --group-col group --out-group-csv reports/matbench_regret_groups.csv`

## Baselines and Layers

- Baseline CV regret CLI: `tools/eval_baseline_regret.py`
- Emergence exploration: `tools/explore_matbench_layers.py`
- Example: use quantile layers on `band_gap` and report AURC per layer.
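
## Metric sketch (illustrative)

For readers who want to sanity-check outputs independently of the CLIs, the sketch below shows one way to compute Regret@k, its normalized variant, and AURC from a scored CSV (e.g., `reports/matbench_scores_test.csv`). The function names, the sum-of-`y_true` definition of top-k utility, and the trapezoidal AURC normalization are assumptions made for illustration; they are not taken from the repo's implementations.

```python
# Minimal sketch of Regret@k, normalized regret, and AURC.
# Assumptions (illustrative, not the repo's code): the utility of a top-k set is
# the sum of its y_true values; the "oracle" ranks by y_true, the "model" ranks
# by the score column; higher score is always treated as better.
import numpy as np
import pandas as pd


def regret_at_k(df: pd.DataFrame, score_col: str, objective_col: str = "y_true",
                k: int = 10, mode: str = "max") -> tuple[float, float]:
    """Return (regret@k, normalized regret@k) for a single k."""
    y = df[objective_col].to_numpy(dtype=float)
    s = df[score_col].to_numpy(dtype=float)
    sign = 1.0 if mode == "max" else -1.0       # minimization flips the objective
    oracle_idx = np.argsort(-sign * y)[:k]      # best k by the true objective
    model_idx = np.argsort(-s)[:k]              # best k by the selection score
    oracle_util = float(np.sum(sign * y[oracle_idx]))
    model_util = float(np.sum(sign * y[model_idx]))
    regret = oracle_util - model_util
    # Normalization assumes a positive oracle utility; otherwise it is ill-defined.
    norm = regret / oracle_util if oracle_util > 0 else float("nan")
    return regret, norm


def aurc(df: pd.DataFrame, score_col: str, topk_grid=(1, 5, 10), **kwargs) -> float:
    """Area under the normalized regret curve over k (trapezoidal; lower is better)."""
    ks = sorted(topk_grid)
    norm_regrets = [regret_at_k(df, score_col, k=k, **kwargs)[1] for k in ks]
    if len(ks) == 1:
        return norm_regrets[0]
    return float(np.trapz(norm_regrets, ks) / (ks[-1] - ks[0]))


if __name__ == "__main__":
    # Example: scores written by the calibration step (column names follow the schema above).
    scored = pd.read_csv("reports/matbench_scores_test.csv")
    print(aurc(scored, score_col="score", objective_col="y_true", mode="max"))
```

Because the utility definition and AURC normalization here are assumptions, treat any numbers from this sketch as a cross-check of trends, not as a replacement for `tools/eval_matbench_regret.py` outputs.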