Matbench Regret (Offline)

This document defines the regret metrics we use for Matbench-style evaluations and how to reproduce results offline with CSVs. The approach is conservative: it reports uncertainty, uses fixed seeds, and avoids comparative claims by default.

Terminology

  • Objective (y_true): numeric property to maximize or minimize (declared via --mode).

  • Score: the ranking function used for selection. By default, the score is built from the SRMF proxies kappa and λ·leak (leak weighted by λ).

  • Regret@k: the difference between the oracle top‑k utility and the model-selected top‑k utility. The normalized variant divides this difference by the oracle utility.

  • AURC: Area under the normalized regret curve over k (lower is better).
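
The two metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the code in tools/eval_matbench_regret.py. It assumes that top‑k utility is the sum of y_true over the k selected rows, that a higher score means a row is selected earlier, that normalization divides by the absolute oracle utility, and that the area is a plain trapezoid rule over the k grid. The function names regret_at_k and aurc are hypothetical.

```python
import math


def regret_at_k(y_true, score, k, mode="max"):
    """Return (regret, normalized_regret) for a top-k selection.

    Assumptions (not taken from the tools): utility is the sum of y_true
    over the selected rows, and higher score means "select first".
    """
    n = len(y_true)
    maximize = mode == "max"
    # Model selection: the k rows with the highest score.
    model_idx = sorted(range(n), key=lambda i: score[i], reverse=True)[:k]
    # Oracle selection: the k rows with the best true objective.
    oracle_idx = sorted(range(n), key=lambda i: y_true[i], reverse=maximize)[:k]
    u_model = sum(y_true[i] for i in model_idx)
    u_oracle = sum(y_true[i] for i in oracle_idx)
    # Regret is non-negative in both modes under this sign convention.
    regret = u_oracle - u_model if maximize else u_model - u_oracle
    # Normalization is undefined when the oracle utility is zero.
    normalized = regret / abs(u_oracle) if u_oracle != 0 else math.nan
    return regret, normalized


def aurc(y_true, score, k_grid, mode="max"):
    """Trapezoid-rule area under normalized regret over a grid of k values.

    Needs at least two grid points; a single point yields an area of 0.
    """
    ks = sorted(k_grid)
    curve = [regret_at_k(y_true, score, k, mode)[1] for k in ks]
    area = 0.0
    for j in range(1, len(ks)):
        area += (ks[j] - ks[j - 1]) * (curve[j] + curve[j - 1]) / 2.0
    return area


# Toy data: five candidates, objective to maximize.
y = [5.0, 4.0, 3.0, 2.0, 1.0]
s = [0.1, 0.9, 0.8, 0.2, 0.3]
regret, normalized = regret_at_k(y, s, k=1)
area = aurc(y, s, k_grid=[1, 2, 5])
```

With this sign convention, a perfect ranking gives zero regret at every k and therefore zero AURC.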

Offline tools

  • Calibration: choose λ to minimize validation AURC, then report held‑out test AURC with confidence intervals (CIs).

    • tools/calibrate_matbench_srmf.py

  • Evaluation: compute Regret@k and AURC using the chosen λ (or a provided score column).

    • tools/eval_matbench_regret.py
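
The calibration step can be pictured as a grid search followed by a seeded bootstrap. The sketch below is illustrative only: it assumes a percentile bootstrap over rows resampled with replacement, which may differ from what tools/calibrate_matbench_srmf.py does internally. The names select_lambda and bootstrap_ci are hypothetical, and the metric is passed in as a callable so that any AURC implementation can be plugged in.

```python
import random


def select_lambda(lambda_grid, validation_aurc):
    """Return the grid value with the lowest validation AURC.

    validation_aurc is a callable mapping a lambda value to a float.
    On ties, the first grid value wins.
    """
    return min(lambda_grid, key=validation_aurc)


def bootstrap_ci(metric, n_rows, n_boot=1000, seed=0, alpha=0.05):
    """Percentile confidence interval for metric(indices).

    metric receives a list of row indices sampled with replacement and
    returns a float. A fixed seed makes the interval reproducible.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n_rows) for _ in range(n_rows)]
        stats.append(metric(idx))
    stats.sort()
    lo = stats[int((alpha / 2) * (n_boot - 1))]
    hi = stats[int((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi


# Toy usage: a made-up validation curve and a mean as the test metric.
best = select_lambda([0.0, 0.5, 1.0], lambda lam: (lam - 0.5) ** 2)
data = [0.10, 0.25, 0.05, 0.40, 0.30]
ci = bootstrap_ci(lambda idx: sum(data[i] for i in idx) / len(idx), len(data))
```

Selecting λ on validation data and computing the interval on held-out test data keeps the reported AURC from being tuned on the rows it is evaluated on.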

CSV schema

  • Required features for SRMF: band_gap, density, nsites, formation_energy_per_atom.

  • Required objective column: a numeric column such as y_true (named via --objective-col).

  • Optional: material_id and formula_pretty for reporting.
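
A quick header check before running the tools can catch schema problems early. The following sketch uses only the standard library. The helper name missing_columns is hypothetical, and the check covers column names only, not value types.

```python
import csv
import io

# Feature columns listed in the schema above.
SRMF_FEATURES = ("band_gap", "density", "nsites", "formation_energy_per_atom")


def missing_columns(csv_file, objective_col="y_true"):
    """Return the required column names absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    required = SRMF_FEATURES + (objective_col,)
    return [name for name in required if name not in header]


# In-memory example; for a real file use open("data.csv", newline="").
sample = io.StringIO("material_id,band_gap,density,nsites,y_true\nm-1,1.2,3.4,8,0.7\n")
problems = missing_columns(sample)
```

An empty list means every SRMF feature column and the objective column are present.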

Reproducible run

  • Calibrate:

    • python tools/calibrate_matbench_srmf.py --path data.csv --objective-col y_true --mode max --topk-grid 1,5,10 --lambda-grid 0.0,0.5,1.0 --bootstrap 1000 --seed 0 --out-json reports/matbench_calibration.json --scores-out reports/matbench_scores_test.csv

  • Evaluate with tuned λ:

    • python tools/eval_matbench_regret.py --path data.csv --objective-col y_true --mode max --use-srmf --lambda-weight $(jq -r .best_lambda reports/matbench_calibration.json) --topk-grid 1,5,10 --out-csv reports/matbench_regret.csv --out-json reports/matbench_regret.json --bootstrap 1000 --seed 0
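
Where jq is not available, the tuned value can be read with Python's standard json module instead. The sketch assumes the calibration report stores a numeric best_lambda at the top level, which is the key the jq expression above reads. The helper name is hypothetical.

```python
import json


def best_lambda_from_report(report_text):
    """Parse the calibration report JSON and return best_lambda as a float."""
    report = json.loads(report_text)
    return float(report["best_lambda"])


# Usage with the report path from the calibration command:
# with open("reports/matbench_calibration.json") as f:
#     lam = best_lambda_from_report(f.read())
```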

Claims and limitations

  • SRMF mapping is a proxy for manifold geometry and stability signals; it is not a surrogate for ab initio calculations. We report uncertainty, avoid comparative claims by default, and keep live integrations gated.

Attestation and Groups

  • Attestation: tools/generate_matbench_attestation.py

  • Per-group regret: tools/eval_matbench_regret.py --group-col group --out-group-csv reports/matbench_regret_groups.csv

Baselines and Layers

  • Baseline CV regret CLI: tools/eval_baseline_regret.py

  • Emergence exploration: tools/explore_matbench_layers.py

  • Example: use quantile layers on band_gap and report AURC per layer.
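
One way to picture quantile layers is rank-based binning: sort rows by band_gap and split the ranks into equal-sized bins. The sketch below is an assumption about what a layer could look like, not a description of tools/explore_matbench_layers.py. Both function names are hypothetical.

```python
def quantile_layers(values, n_layers=4):
    """Assign each value a layer index in 0..n_layers-1 by rank.

    Binning is rank-based, so tied values may land in adjacent layers.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    layers = [0] * n
    for rank, i in enumerate(order):
        layers[i] = rank * n_layers // n
    return layers


def rows_by_layer(layers):
    """Map each layer index to the list of row indices it contains."""
    groups = {}
    for i, layer in enumerate(layers):
        groups.setdefault(layer, []).append(i)
    return groups


# Toy band_gap values split into two layers (lower half, upper half).
band_gap = [0.0, 3.1, 1.2, 0.5]
groups = rows_by_layer(quantile_layers(band_gap, n_layers=2))
```

Each group of row indices can then be passed to whichever AURC routine is in use, giving one value per layer.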