---
title: Executive Overview
description: What Compitum is, why it matters, where it wins, and how to evaluate it responsibly.
---

# Executive Overview

## Why Compitum (in 60 seconds)

- Today, teams route across many models with different quality, latency, and price. Heuristics and ad‑hoc cascades leak money and create opaque failure modes; judge‑based feedback loops are slow, hard to audit, and risky in regulated settings.
- Compitum makes the tradeoff explicit and auditable: U = performance − lambda · cost, with hard constraints and a routing certificate that shows exactly why a choice was made, so you can fix the right thing fast.
- Outcome: near‑frontier behavior (efficient use of spend) with constraint compliance by design and immediate, mechanistic signals for operations, safety, and research.

## What is Compitum

- A cost–quality aware routing engine that picks among models using a simple utility, U = performance − lambda · cost, subject to hard constraints (a minimal code sketch appears below, after "When not to use"). Every decision emits a mechanistic routing certificate for audit and ablations.

## Why it’s different

- Mechanistic transparency: the certificate exposes utility components, constraint feasibility and shadow prices, boundary diagnostics (gap, entropy, sigma), and drift monitors.
- Constraint‑first: policy/compliance limits are built in; infeasible routes are rejected by construction.
- Responsible competitiveness: we report per‑baseline win rates at fixed WTP (willingness to pay, i.e., lambda) and frontier gap with CIs, showing near‑frontier behavior even when “envelope wins” are rare.

## Results (high level)

- Per‑baseline wins at fixed WTP slices (0.1, 1.0) on a bounded panel; detailed per‑task summaries available.
- Frontier gap is small with frequent “at‑frontier” cases; 95% bootstrap CIs included.
- Constraint compliance ~100% by design.

## Ethics and Reproducibility

- Offline, deterministic pipeline with fixed seeds; no judge‑based model calls.
- Licensed inputs only; we do not redistribute proprietary datasets. Artifacts are local with a SHA‑256 manifest.
- 100% line+branch coverage; mutation score 1.0; lint/type/security checks are clean. Docs build warning‑free.

## How to try (Windows one‑shot)

```bat
make peer-review
python tools\generate_eval_tables.py
.\.venv\Scripts\python -m sphinx -b html docs docs\_build\html
```

## What to read next

- Results Summary — {doc}`Results-Summary`
- Frontier Gap (with CIs) — {doc}`Frontier-Gap`
- Per‑Baseline Win Rate — {doc}`Per-Baseline-WinRate`
- Panel Summary — {doc}`Panel-Summary`
- Routing Certificate — {doc}`Certificate-Schema`
- Math Brief (plain language) — {doc}`Math-Brief`
- Peer Review (artifact guide) — {doc}`PEER_REVIEW`
- Artifact README (AE checklist) — {doc}`Artifact-README`
- RouterBench Fairness — {doc}`RouterBench-Fairness`

## Where it fits vs. alternatives

- Heuristics/cascades: simple but brittle; Compitum gives you a single utility, constraint handling, and an audit trail for every decision.
- Judge‑based reward loops: flexible but opaque and risky; Compitum uses mechanistic, local signals you can inspect and test.
- Black‑box gates: may win panels but are hard to debug; Compitum favors near‑frontier efficiency with certificates that turn “why” into engineering actions.

## When not to use

- If you have a single model and a fixed, non‑negotiable budget/latency, a simple static policy works.
- If you require external judges or human moderation in the loop, keep them outside the router and use the certificate to decide when to defer.
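## Decision rule and certificate (sketch)

To make the utility rule and the certificate concrete, here is a minimal Python sketch. All names in it (`Candidate`, `route`, the exact certificate keys) are illustrative assumptions rather than Compitum's API, and the real certificate additionally carries shadow prices, entropy/sigma diagnostics, and drift monitors that this sketch omits.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    performance: float   # predicted quality for this request
    cost: float          # predicted spend, in the units lambda trades against
    feasible: bool       # hard constraints (policy, latency, budget) all satisfied

def route(candidates: list[Candidate], lam: float) -> dict:
    """Feasibility-first argmax of U = performance - lam * cost, plus a
    certificate-style record of why the winner won."""
    def utility(c: Candidate) -> float:
        return c.performance - lam * c.cost

    feasible = [c for c in candidates if c.feasible]
    if not feasible:
        # Rejected by construction: defer or escalate outside the router.
        raise ValueError("no feasible route")

    ranked = sorted(feasible, key=utility, reverse=True)
    best = ranked[0]
    # Boundary gap: how close the decision was to flipping (an ambiguity signal).
    gap = utility(best) - utility(ranked[1]) if len(ranked) > 1 else float("inf")
    return {
        "choice": best.name,
        "lambda": lam,
        "utility": {c.name: utility(c) for c in feasible},
        "infeasible": [c.name for c in candidates if not c.feasible],
        "boundary_gap": gap,
    }
```

Reading the output: a small `boundary_gap` flags a near‑tie worth auditing, a name under `infeasible` shows a binding constraint rather than a quality loss, and the no‑feasible‑route branch is exactly where you would defer, per the note above.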
## What to evaluate next (decision checklist)

- Does near‑frontier behavior hold on your tasks at your WTP slices?
- Do certificates help you resolve incidents faster (e.g., binding constraints, ambiguity signals)?
- Can you reduce spend or latency at parity quality for key workloads?

## Contributions (At a Glance)

- Control of Error for routing: instantaneous, judge‑free feedback signals (feasibility, boundary ambiguity, calibrated uncertainty) exposed per decision via a routing certificate.
- Stable online adaptation: a Lyapunov‑inspired trust‑region controller caps update step sizes, and SPD (symmetric positive‑definite) metric updates remain positive definite by construction (sketched in code after this list).
- Constraint‑compliant decision rule: feasibility‑first argmax of utility, with approximate shadow prices reported for auditing.
- Evidence and tooling: fixed‑WTP regret/win‑rate with paired bootstrap CIs; Control‑of‑Error Index (CEI), reliability curve, and control‑KPI helpers for calibration and stability (a paired‑bootstrap sketch also follows this list).
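To illustrate the stability claim in "Stable online adaptation", here is a minimal sketch of a trust‑region‑capped metric update, assuming a Cholesky‑factor parametrization. The name `update_metric`, the cap `max_step`, and the parametrization itself are assumptions for illustration, not Compitum's actual controller.

```python
import numpy as np

def update_metric(L: np.ndarray, grad_L: np.ndarray, max_step: float) -> np.ndarray:
    """Take one capped gradient step on the Cholesky factor L, so the metric
    M = L @ L.T stays symmetric positive definite by construction."""
    step = -grad_L
    norm = np.linalg.norm(step)
    if norm > max_step:                   # trust region: cap the update step size
        step *= max_step / norm
    L_new = np.tril(L + step)             # keep the factor lower-triangular
    # Clamp the diagonal to stay positive, so M = L L^T remains PD.
    d = np.diag(L_new).copy()
    np.fill_diagonal(L_new, np.maximum(d, 1e-8))
    return L_new
```

Updating a triangular factor rather than the metric itself makes positive definiteness a structural invariant, so no projection back onto the SPD cone is ever needed.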
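And to show what "paired bootstrap CIs" means operationally, here is a minimal sketch assuming per‑task utilities for the router and one baseline evaluated on the same tasks; `paired_bootstrap_win_rate` is a hypothetical helper name, not the repo's actual tooling.

```python
import numpy as np

def paired_bootstrap_win_rate(u_router: np.ndarray, u_baseline: np.ndarray,
                              n_boot: int = 10_000, seed: int = 0):
    """Return (win_rate, (lo, hi)) with a 95% CI from resampling task indices."""
    rng = np.random.default_rng(seed)          # fixed seed, matching the deterministic pipeline
    wins = (u_router > u_baseline).astype(float)
    n = len(wins)
    idx = rng.integers(0, n, size=(n_boot, n))  # resample whole tasks, keeping pairs intact
    boot = wins[idx].mean(axis=1)
    lo, hi = np.percentile(boot, [2.5, 97.5])
    return wins.mean(), (lo, hi)
```

Resampling task indices, rather than each system's scores independently, keeps every task's router/baseline pair together; that pairing is what makes the comparison paired and the resulting interval honest about task‑level correlation.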