Matbench Regret (Offline)
This document defines the regret metrics we use for Matbench-style evaluations and how to reproduce results offline with CSVs. The approach is conservative: it reports uncertainty, uses fixed seeds, and avoids comparative claims by default.
Terminology
Objective (y_true): numeric property to maximize or minimize (declared via --mode).
Score: ranking function for selection. By default, we use SRMF proxies: kappa − λ·leak.
Regret@k: difference between oracle top‑k utility and model-selected top‑k utility; the normalized variant divides by the oracle utility.
AURC: area under the normalized regret curve over k (lower is better).
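The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the code in tools/eval_matbench_regret.py: the function names are hypothetical, and several details are assumptions, namely that utility is the sum of y_true over the selected items, that a higher score always means "select first", that normalization divides by the absolute oracle utility, and that AURC is a trapezoid-rule area divided by the span of the k grid.

```python
from typing import Sequence


def regret_at_k(y_true: Sequence[float], scores: Sequence[float], k: int,
                mode: str = "max", normalized: bool = False) -> float:
    """Oracle top-k utility minus model-selected top-k utility."""
    if mode not in ("max", "min"):
        raise ValueError("mode must be 'max' or 'min'")
    n = len(y_true)
    if len(scores) != n or not 1 <= k <= n:
        raise ValueError("need len(scores) == len(y_true) and 1 <= k <= n")
    # Assumption: flip the sign in "min" mode so that larger is always better.
    sign = 1.0 if mode == "max" else -1.0
    gain = [sign * y for y in y_true]
    # Assumption: a higher score means "select first" in both modes.
    picked = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    oracle = sum(sorted(gain, reverse=True)[:k])
    model = sum(gain[i] for i in picked)
    regret = oracle - model
    if not normalized:
        return regret
    # Assumption: normalize by |oracle utility|; report 0.0 when it is zero.
    return regret / abs(oracle) if oracle != 0 else 0.0


def aurc(y_true: Sequence[float], scores: Sequence[float],
         k_grid: Sequence[int], mode: str = "max") -> float:
    """Trapezoid-rule area under the normalized regret curve over k."""
    ks = sorted(set(k_grid))
    curve = [regret_at_k(y_true, scores, k, mode, normalized=True) for k in ks]
    if len(ks) == 1:
        return curve[0]
    area = sum(0.5 * (curve[i] + curve[i + 1]) * (ks[i + 1] - ks[i])
               for i in range(len(ks) - 1))
    # Assumption: divide by the k span so the value stays on the regret scale.
    return area / (ks[-1] - ks[0])


# Toy data, made up for illustration only.
y_true = [1.0, 5.0, 3.0, 4.0]
scores = [0.9, 0.1, 0.8, 0.7]
print(regret_at_k(y_true, scores, k=1))  # oracle 5.0, model 1.0 -> 4.0
print(aurc(y_true, scores, k_grid=[1, 2, 4]))
```

Because the oracle selection is by construction the best possible top‑k set, regret under these assumptions is never negative, and it reaches zero once k covers the whole dataset.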
Offline tools
Calibration: choose λ to minimize validation AURC, report held‑out test AURC with CIs.
tools/calibrate_matbench_srmf.py
Evaluation: compute Regret@k and AURC using the chosen λ (or a provided score column).
tools/eval_matbench_regret.py
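The calibration step can be pictured as a plain grid search over λ. The sketch below is hedged: it assumes kappa and leak are already available as per-row numbers, handles only the maximize case, reuses an assumed trapezoid-rule AURC, and omits the validation/test split and the confidence intervals. All function names are hypothetical and do not describe the internals of tools/calibrate_matbench_srmf.py.

```python
from typing import List, Sequence, Tuple


def normalized_regret(y: Sequence[float], scores: Sequence[float], k: int) -> float:
    """Normalized Regret@k for a maximized objective (assumed sum utility)."""
    picked = sorted(range(len(y)), key=lambda i: scores[i], reverse=True)[:k]
    oracle = sum(sorted(y, reverse=True)[:k])
    model = sum(y[i] for i in picked)
    return (oracle - model) / abs(oracle) if oracle != 0 else 0.0


def aurc(y: Sequence[float], scores: Sequence[float], k_grid: Sequence[int]) -> float:
    """Assumed AURC: trapezoid area over the k grid, divided by the k span."""
    ks = sorted(set(k_grid))
    curve = [normalized_regret(y, scores, k) for k in ks]
    if len(ks) == 1:
        return curve[0]
    area = sum(0.5 * (curve[i] + curve[i + 1]) * (ks[i + 1] - ks[i])
               for i in range(len(ks) - 1))
    return area / (ks[-1] - ks[0])


def calibrate_lambda(kappa: Sequence[float], leak: Sequence[float],
                     y_val: Sequence[float], lambda_grid: Sequence[float],
                     k_grid: Sequence[int]) -> Tuple[float, float]:
    """Return (best_lambda, validation_aurc) for score = kappa - lambda * leak."""
    results: List[Tuple[float, float]] = []
    for lam in lambda_grid:
        scores = [kp - lam * lk for kp, lk in zip(kappa, leak)]
        results.append((lam, aurc(y_val, scores, k_grid)))
    # Lower AURC is better; on ties, min() keeps the first lambda in the grid.
    return min(results, key=lambda item: item[1])


# Toy validation data, made up for illustration only.
kappa = [0.9, 0.3, 0.6, 0.5]
leak = [0.8, 0.0, 0.4, 0.1]
y_val = [1.0, 5.0, 3.0, 4.0]
best_lambda, best_aurc = calibrate_lambda(kappa, leak, y_val, [0.0, 0.5, 1.0], [1, 2])
print(best_lambda, best_aurc)
```

In this toy case the row with the highest kappa also has the highest leak and the lowest objective value, so a larger λ pushes it down the ranking and lowers the validation AURC.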
CSV schema
Required features for SRMF: band_gap, density, nsites, formation_energy_per_atom.
Required objective: e.g., y_true (set via --objective-col).
Optional: material_id and formula_pretty for reporting.
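Before running the tools, it can help to check a CSV against this schema. The following is a minimal sketch using only the standard library; the helper names (missing_columns, load_rows) and the sample rows are hypothetical and made up for illustration, and the real tools may validate their input differently.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO

REQUIRED_FEATURES = ("band_gap", "density", "nsites", "formation_energy_per_atom")
OPTIONAL_COLUMNS = ("material_id", "formula_pretty")


def missing_columns(header: Iterable[str], objective_col: str = "y_true") -> List[str]:
    """List required columns (features plus objective) absent from the header."""
    present = set(header)
    return [c for c in (*REQUIRED_FEATURES, objective_col) if c not in present]


def load_rows(handle: TextIO, objective_col: str = "y_true") -> List[Dict[str, object]]:
    """Parse a CSV, converting required columns to float and keeping optional ones."""
    reader = csv.DictReader(handle)
    missing = missing_columns(reader.fieldnames or [], objective_col)
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")
    rows: List[Dict[str, object]] = []
    for raw in reader:
        row: Dict[str, object] = {c: float(raw[c])
                                  for c in (*REQUIRED_FEATURES, objective_col)}
        for c in OPTIONAL_COLUMNS:
            if c in raw:
                row[c] = raw[c]  # kept as text, used only for reporting
        rows.append(row)
    return rows


# Made-up sample data; the identifiers and values are placeholders.
sample = io.StringIO(
    "material_id,formula_pretty,band_gap,density,nsites,formation_energy_per_atom,y_true\n"
    "demo-1,AB,1.1,2.3,2,-0.5,0.7\n"
    "demo-2,CD,0.0,7.9,4,-0.1,0.2\n"
)
rows = load_rows(sample)
print(len(rows), rows[0]["band_gap"])
```

Failing early on a missing column gives a clearer error than a failure deep inside the score computation.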
Reproducible run
Calibrate:
python tools/calibrate_matbench_srmf.py --path data.csv --objective-col y_true --mode max --topk-grid 1,5,10 --lambda-grid 0.0,0.5,1.0 --bootstrap 1000 --seed 0 --out-json reports/matbench_calibration.json --scores-out reports/matbench_scores_test.csv
Evaluate with tuned λ:
python tools/eval_matbench_regret.py --path data.csv --objective-col y_true --mode max --use-srmf --lambda-weight $(jq -r .best_lambda reports/matbench_calibration.json) --topk-grid 1,5,10 --out-csv reports/matbench_regret.csv --out-json reports/matbench_regret.json --bootstrap 1000 --seed 0
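Both commands pass --bootstrap 1000 --seed 0. One common way to turn those two settings into a reproducible confidence interval is a seeded percentile bootstrap over rows, sketched below. This is an assumption about the general technique, not a description of what the tools do internally; the helper names and toy data are hypothetical, and Regret@1 stands in for whichever statistic is being reported.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, float]  # (y_true, score)


def regret_at_1(rows: Sequence[Row]) -> float:
    """Regret@1 for a maximized objective: best y_true minus y_true of the top-scored row."""
    oracle = max(y for y, _ in rows)
    model = max(rows, key=lambda r: r[1])[0]
    return oracle - model


def bootstrap_ci(rows: Sequence[Row], statistic: Callable[[List[Row]], float],
                 n_boot: int = 1000, seed: int = 0,
                 alpha: float = 0.05) -> Tuple[float, float]:
    """Percentile bootstrap interval; a fixed seed makes the result repeatable."""
    if n_boot < 1 or not rows:
        raise ValueError("need n_boot >= 1 and at least one row")
    rng = random.Random(seed)  # local generator, so global random state is untouched
    n = len(rows)
    stats = sorted(
        statistic([rows[rng.randrange(n)] for _ in range(n)])  # resample with replacement
        for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * (n_boot - 1))]
    hi = stats[int((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi


# Toy data, made up for illustration only.
rows = [(1.0, 0.9), (5.0, 0.1), (3.0, 0.8), (4.0, 0.7)]
print(regret_at_1(rows))  # oracle 5.0, model 1.0 -> 4.0
print(bootstrap_ci(rows, regret_at_1, n_boot=1000, seed=0))
```

Using a dedicated random.Random(seed) instance rather than the module-level generator keeps the interval identical across runs even when other code also draws random numbers.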
Claims and limitations
SRMF mapping is a proxy for manifold geometry and stability signals; it is not a surrogate for ab initio calculations. We report uncertainty, avoid comparative claims by default, and keep live integrations gated.
Attestation and Groups
Attestation: tools/generate_matbench_attestation.py
Per-group regret: tools/eval_matbench_regret.py --group-col group --out-group-csv reports/matbench_regret_groups.csv
Baselines and Layers
Baseline CV regret CLI: tools/eval_baseline_regret.py
Emergence exploration: tools/explore_matbench_layers.py
Example: use quantile layers on band_gap and report AURC per layer.
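A quantile-layer split can be sketched as follows. This is an illustration under stated assumptions, not the behavior of tools/explore_matbench_layers.py: layers are assigned by rank so that they hold roughly equal numbers of rows, the helper names are hypothetical, and normalized Regret@1 stands in for the per-layer metric (an AURC function with the same signature could be passed instead).

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def quantile_layers(values: Sequence[float], n_layers: int) -> List[int]:
    """Assign each row a layer index 0..n_layers-1 by the rank of its value."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    layers = [0] * n
    for rank, i in enumerate(order):
        layers[i] = rank * n_layers // n  # rank < n, so the index stays below n_layers
    return layers


def normalized_regret_at_1(y: Sequence[float], scores: Sequence[float]) -> float:
    """Stand-in metric for a maximized objective (assumed normalization by |oracle|)."""
    oracle = max(y)
    model = y[max(range(len(y)), key=lambda i: scores[i])]
    return (oracle - model) / abs(oracle) if oracle != 0 else 0.0


def metric_per_layer(layers: Sequence[int], y_true: Sequence[float],
                     scores: Sequence[float],
                     metric: Callable[[List[float], List[float]], float]) -> Dict[int, float]:
    """Evaluate the metric separately on the rows of each layer."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for i, layer in enumerate(layers):
        groups[layer].append(i)
    return {layer: metric([y_true[i] for i in idx], [scores[i] for i in idx])
            for layer, idx in sorted(groups.items())}


# Toy data, made up for illustration only.
band_gap = [0.0, 3.0, 1.0, 2.5, 0.5, 1.5]
y_true = [1.0, 5.0, 3.0, 4.0, 2.0, 6.0]
scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.3]
layers = quantile_layers(band_gap, n_layers=3)
print(layers)
print(metric_per_layer(layers, y_true, scores, normalized_regret_at_1))
```

Reporting the metric per layer shows whether the ranking degrades in a particular band_gap range, which a single pooled number can hide.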