Executive Overview

Why Compitum (in 60 seconds)

  • Today, teams route across many models with different quality, latency, and price. Heuristics and ad‑hoc cascades leak money and create opaque failure modes; judge‑based feedback loops are slow, hard to audit, and risky in regulated settings.

  • Compitum makes the tradeoff explicit and auditable: U = performance − lambda · cost, with hard constraints and a routing certificate that shows exactly why a choice was made, so you can fix the right thing fast.

  • Outcome: near‑frontier behavior (efficient use of spend) with constraint compliance by design and immediate, mechanistic signals for operations, safety, and research.

What is Compitum

  • A cost–quality aware routing engine that picks among models using a simple utility: U = performance − lambda · cost, subject to hard constraints. Every decision emits a mechanistic routing certificate for audit and ablations.
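The decision rule above (feasibility-first argmax of U = performance − lambda · cost) can be sketched in a few lines. This is illustrative only; `Candidate` and `route` are hypothetical names, not Compitum's actual API:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    performance: float  # predicted quality for this request
    cost: float         # predicted spend (e.g., USD per call)
    feasible: bool      # hard-constraint check (policy, latency, compliance)

def route(candidates, lam):
    """Feasibility-first argmax of U = performance - lam * cost."""
    feasible = [c for c in candidates if c.feasible]
    if not feasible:
        raise RuntimeError("no feasible route")  # infeasible routes rejected by construction
    return max(feasible, key=lambda c: c.performance - lam * c.cost)

panel = [
    Candidate("small", performance=0.72, cost=0.10, feasible=True),
    Candidate("large", performance=0.90, cost=1.00, feasible=True),
    Candidate("frontier", performance=0.95, cost=3.00, feasible=False),
]
print(route(panel, lam=0.1).name)  # low cost penalty -> stronger model ("large")
print(route(panel, lam=1.0).name)  # high cost penalty -> cheaper model ("small")
```

Note how the same panel yields different routes as lambda changes, which is exactly the fixed-WTP slicing used in the evaluation.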

Why it’s different

  • Mechanistic transparency: the certificate exposes utility components, constraint feasibility and shadow prices, boundary diagnostics (gap, entropy, sigma), and drift monitors.

  • Constraint‑first: policy/compliance limits are built in; infeasible routes are rejected by construction.

  • Responsible competitiveness: we report per‑baseline win rates at fixed willingness‑to‑pay (WTP, i.e., lambda) and frontier gap with confidence intervals (CIs), showing near‑frontier behavior even when “envelope wins” are rare.
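As an illustration of what a certificate makes inspectable, here is a hypothetical certificate payload; the field names and values are assumptions for this sketch, not Compitum's actual schema:

```python
# Hypothetical certificate shape; field names and values are illustrative.
certificate = {
    "chosen": "large",
    "utility": {"performance": 0.90, "cost": 1.00, "lambda": 0.10, "U": 0.80},
    "constraints": {
        "latency_ms<=2000": {"feasible": True, "shadow_price": 0.0},
        "region==eu": {"feasible": True, "shadow_price": 0.03},
    },
    "boundary": {"gap": 0.09, "entropy": 0.41, "sigma": 0.02},
    "drift": {"input_shift": 0.01, "alert": False},
}

def binding_constraints(cert, tol=1e-6):
    """Constraints with a nonzero shadow price are the ones worth investigating."""
    return [k for k, v in cert["constraints"].items() if v["shadow_price"] > tol]

print(binding_constraints(certificate))  # ['region==eu']
```

A nonzero shadow price says how much utility is being sacrificed to a constraint, which turns "why did it route here" into a concrete engineering action.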

Results (high level)

  • Per‑baseline wins at fixed WTP slices (0.1, 1.0) on a bounded panel; detailed per‑task summaries available.

  • Frontier gap is small with frequent “at‑frontier” cases; 95% bootstrap CIs included.

  • Constraint compliance ~100% by design; infeasible routes are rejected before selection.
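The bootstrap CIs above come from resampling per-task paired differences. A minimal sketch of that procedure, with an illustrative helper name and toy numbers:

```python
import random

def paired_bootstrap_ci(deltas, n_boot=2000, alpha=0.05, seed=0):
    """95% CI for the mean of per-task paired differences (router - baseline)."""
    rng = random.Random(seed)  # fixed seed for a deterministic, reproducible CI
    n = len(deltas)
    means = sorted(
        sum(rng.choice(deltas) for _ in range(n)) / n for _ in range(n_boot)
    )
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Per-task utility deltas at one fixed WTP slice (toy numbers).
deltas = [0.02, -0.01, 0.03, 0.00, 0.05, 0.01, -0.02, 0.04]
lo, hi = paired_bootstrap_ci(deltas)
print(f"mean={sum(deltas) / len(deltas):.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Pairing by task before resampling keeps task difficulty from inflating the variance of the comparison.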

Ethics and Reproducibility

  • Offline, deterministic pipeline with fixed seeds; no judge‑based model calls.

  • Licensed inputs only; we do not redistribute proprietary datasets. Artifacts are local with SHA‑256 manifest.

  • 100% line+branch coverage; mutation score 1.0; lint/type/security checks are clean. Docs build warning‑free.
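A SHA-256 manifest of local artifacts can be built with the standard library alone. This is a minimal sketch of the idea; `build_manifest` is an illustrative name, not Compitum's tooling:

```python
import hashlib
import pathlib

def build_manifest(root):
    """Map each artifact path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for p in sorted(pathlib.Path(root).rglob("*")):
        if p.is_file():
            digest = hashlib.sha256(p.read_bytes()).hexdigest()
            manifest[str(p.relative_to(root))] = digest
    return manifest
```

Re-running the function and diffing the result against the stored manifest detects any artifact that changed between runs.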

How to try (Windows one‑shot)

make peer-review
python tools\generate_eval_tables.py
.\.venv\Scripts\python -m sphinx -b html docs docs\_build\html

What to read next

  • The sections below cover where Compitum fits versus alternatives, when not to use it, and a decision checklist for evaluation.

Where it fits vs. alternatives

  • Heuristics/cascades: simple but brittle; Compitum gives you a single utility, constraint handling, and an audit trail for every decision.

  • Judge‑based reward loops: flexible but opaque and risky; Compitum uses mechanistic, local signals you can inspect and test.

  • Black‑box gates: may win panels but are hard to debug; Compitum favors near‑frontier efficiency with certificates that turn “why” into engineering actions.

When not to use

  • If you have a single model and a fixed, non‑negotiable budget/latency, a simple static policy works.

  • If you require external judges or human moderation in the loop, keep them outside the router and use the certificate to decide when to defer.
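Using the certificate to decide when to defer can look like the following sketch; the thresholds and field names are illustrative assumptions, not Compitum's schema:

```python
def should_defer(cert, gap_min=0.05, entropy_max=0.7):
    """Defer to a human or external judge outside the router when the
    decision is ambiguous: a small utility gap between the top candidates,
    or high boundary entropy."""
    boundary = cert["boundary"]
    return boundary["gap"] < gap_min or boundary["entropy"] > entropy_max

cert = {"boundary": {"gap": 0.02, "entropy": 0.35}}
print(should_defer(cert))  # True: top-2 utility gap is below the threshold
```

The router stays deterministic and auditable; the human or judge loop sits behind an explicit, inspectable trigger.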

What to evaluate next (decision checklist)

  • Does near‑frontier behavior hold on your tasks at your WTP slices?

  • Do certificates help you resolve incidents faster (e.g., binding constraints, ambiguity signals)?

  • Can you reduce spend or latency at parity quality for key workloads?

Contributions (At a Glance)

  • Control of Error for routing: instantaneous, judge‑free feedback signals (feasibility, boundary ambiguity, calibrated uncertainty) exposed per decision via a routing certificate.

  • Stable online adaptation: a Lyapunov‑inspired trust‑region controller caps update step sizes; symmetric positive‑definite (SPD) metric updates remain positive definite by construction.

  • Constraint‑compliant decision rule: feasibility‑first argmax utility with approximate shadow prices reported for auditing.

  • Evidence and tooling: fixed‑WTP regret/win‑rate with paired bootstrap CIs; a Control‑of‑Error Index (CEI), reliability curves, and control‑KPI helpers for calibration and stability.
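One standard way to keep a metric PD by construction is a capped multiplicative (congruence) update, M ← E M Eᵀ with E invertible, since congruence preserves positive definiteness. A minimal sketch under that assumption (not Compitum's actual controller):

```python
import numpy as np

def trust_region_spd_update(M, G, max_step=0.1):
    """One metric update: cap the step size (trust region), then apply a
    congruence update M <- E @ M @ E.T, which preserves positive
    definiteness whenever E is invertible."""
    step = np.clip(G, -max_step, max_step)   # cap the elementwise step size
    S = 0.5 * (step + step.T)                # symmetrize the update direction
    E = np.eye(M.shape[0]) + 0.5 * S         # small capped steps keep E invertible
    return E @ M @ E.T

M = np.eye(3)
G = np.random.default_rng(0).normal(scale=0.5, size=(3, 3))
M_next = trust_region_spd_update(M, G)
print((np.linalg.eigvalsh(M_next) > 0).all())  # True: PD preserved
```

Capping the step before forming E is the trust-region part; the congruence form is what makes positive definiteness hold by construction rather than by projection.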