Executive Overview
Why Compitum (in 60 seconds)
Today, teams route across many models with different quality, latency, and price. Heuristics and ad‑hoc cascades leak money and create opaque failure modes; judge‑based feedback loops are slow, hard to audit, and risky in regulated settings.
Compitum makes the tradeoff explicit and auditable: U = performance − lambda · cost, with hard constraints and a routing certificate that shows exactly why a choice was made, so you can fix the right thing fast.
Outcome: near‑frontier behavior (efficient use of spend) with constraint compliance by design and immediate, mechanistic signals for operations, safety, and research.
What is Compitum
A cost–quality aware routing engine that picks among models using a simple utility: U = performance − lambda · cost, subject to hard constraints. Every decision emits a mechanistic routing certificate for audit and ablations.
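To make the rule concrete, here is a minimal sketch of feasibility-first utility maximization. The names (Candidate, route) and fields are illustrative assumptions, not Compitum's actual API:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    performance: float  # predicted quality for this request
    cost: float         # predicted spend (e.g., USD per call)
    feasible: bool      # hard constraints (policy, latency, region) satisfied?

def route(candidates: list[Candidate], lam: float) -> Candidate:
    """Feasibility-first argmax of U = performance - lam * cost."""
    feasible = [c for c in candidates if c.feasible]
    if not feasible:
        raise ValueError("no feasible route: constraints reject all candidates")
    return max(feasible, key=lambda c: c.performance - lam * c.cost)

# At lam (WTP) = 1.0 the cheaper model wins unless the quality gap pays for itself.
choice = route(
    [Candidate("big", 0.92, 0.30, True), Candidate("small", 0.85, 0.05, True)],
    lam=1.0,
)
print(choice.name)  # "small": 0.85 - 0.05 = 0.80 beats 0.92 - 0.30 = 0.62
```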
Why it’s different
Mechanistic transparency: the certificate exposes utility components, constraint feasibility and shadow prices, boundary diagnostics (gap, entropy, sigma), and drift monitors.
Constraint‑first: policy/compliance limits are built in; infeasible routes are rejected by construction.
Responsible competitiveness: we report per‑baseline win rates at fixed willingness‑to‑pay (WTP, i.e., lambda) and the frontier gap with confidence intervals, showing near‑frontier behavior even when “envelope wins” are rare.
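As a hypothetical illustration of what such a certificate might carry (field names invented here to mirror the list above; the authoritative layout is the Certificate Schema page):

```python
# Hypothetical certificate payload; field names are illustrative only.
certificate = {
    "chosen": "small",
    "utility": {"performance": 0.85, "cost": 0.05, "lambda": 1.0, "U": 0.80},
    "constraints": {
        "feasible": ["small", "big"],
        "rejected": {"huge": "latency_p95 > 2s"},
        "shadow_prices": {"latency_p95": 0.0},  # approximate; nonzero when binding
    },
    "boundary": {"gap": 0.18, "entropy": 0.31, "sigma": 0.04},  # ambiguity diagnostics
    "drift": {"input_shift": 0.02, "alert": False},
}
```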
Results (high level)
Per‑baseline wins at fixed WTP slices (lambda = 0.1 and 1.0) on a bounded panel; detailed per‑task summaries are available.
Frontier gap is small with frequent “at‑frontier” cases; 95% bootstrap CIs included.
Constraint compliance is ~100% by design: infeasible routes are rejected before selection.
Ethics and Reproducibility
Offline, deterministic pipeline with fixed seeds; no judge‑based model calls.
Licensed inputs only; we do not redistribute proprietary datasets. Artifacts are local with SHA‑256 manifest.
100% line+branch coverage; mutation score 1.0; lint/type/security checks are clean. Docs build warning‑free.
How to try (Windows one‑shot)
```
make peer-review
python tools\generate_eval_tables.py
.\.venv\Scripts\python -m sphinx -b html docs docs\_build\html
```
What to read next
Results Summary
Frontier Gap (with CIs) — Frontier Gap (Standalone)
Per‑Baseline Win Rate — Per-Baseline Win Rate (Standalone)
Panel Summary
Routing Certificate — Certificate Schema
Math Brief (plain language) — Mathematics: A Plain-Language Brief
Peer Review (artifact guide) — Peer Review Package
Artifact README (AE checklist) — Artifact README (Reproducibility)
RouterBench Fairness — RouterBench Fairness Notes
Where it fits vs. alternatives
Heuristics/cascades: simple but brittle; Compitum gives you a single utility, constraint handling, and an audit trail for every decision.
Judge‑based reward loops: flexible but opaque and risky; Compitum uses mechanistic, local signals you can inspect and test.
Black‑box gates: may win panels but are hard to debug; Compitum favors near‑frontier efficiency with certificates that turn “why” into engineering actions.
When not to use
If you have a single model and a fixed, non‑negotiable budget/latency, a simple static policy works.
If you require external judges or human moderation in the loop, keep them outside the router and use the certificate to decide when to defer.
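For the deferral case, here is a sketch of a router-external gate. It assumes the hypothetical certificate fields from the earlier example, and the thresholds are made up; tune them on your own incidents:

```python
def should_defer(cert: dict, gap_min: float = 0.05, entropy_max: float = 0.8) -> bool:
    """Defer to a judge or human outside the router when the decision
    looks ambiguous or a hard constraint is binding.

    Field names and thresholds are illustrative assumptions.
    """
    boundary = cert["boundary"]
    binding = any(p > 0 for p in cert["constraints"]["shadow_prices"].values())
    ambiguous = boundary["gap"] < gap_min or boundary["entropy"] > entropy_max
    return binding or ambiguous
```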
What to evaluate next (decision checklist)
Does near‑frontier behavior hold on your tasks at your WTP slices?
Do certificates help you resolve incidents faster (e.g., binding constraints, ambiguity signals)?
Can you reduce spend or latency at parity quality for key workloads?
Contributions (At a Glance)
Control of Error for routing: instantaneous, judge‑free feedback signals (feasibility, boundary ambiguity, calibrated uncertainty) exposed per decision via a routing certificate.
Stable online adaptation: a Lyapunov‑inspired trust‑region controller caps update step sizes, and symmetric positive‑definite (SPD) metric updates remain positive definite by construction (see the sketch after this list).
Constraint‑compliant decision rule: feasibility‑first argmax utility with approximate shadow prices reported for auditing.
Evidence and tooling: fixed‑WTP regret and win‑rate with paired bootstrap CIs; a Control‑of‑Error Index (CEI), reliability curves, and control‑KPI helpers for calibration and stability.
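A sketch of the stability mechanics from the “Stable online adaptation” item above, assuming a multiplicative exponential‑map update on the SPD cone; this is illustrative, not the shipped controller:

```python
import numpy as np
from scipy.linalg import expm

def capped_step(S: np.ndarray, radius: float) -> np.ndarray:
    """Trust-region cap: scale the update so its Frobenius norm stays <= radius."""
    n = np.linalg.norm(S)
    return S if n <= radius else S * (radius / n)

def spd_update(M: np.ndarray, G: np.ndarray, lr: float, radius: float) -> np.ndarray:
    """Update an SPD metric M with gradient G while staying PD by construction.

    expm of a symmetric matrix is SPD, and a congruence L X L^T with
    invertible L preserves positive definiteness, so the result is SPD.
    """
    S = capped_step(lr * (G + G.T) / 2.0, radius)  # symmetrize, then cap the step
    L = np.linalg.cholesky(M)
    return L @ expm(-S) @ L.T
```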
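And a minimal paired‑bootstrap sketch for the fixed‑WTP win‑rate CI, assuming per‑task utility arrays for router and baseline; this is not the project’s actual tooling:

```python
import numpy as np

def paired_bootstrap_winrate_ci(u_router, u_baseline, n_boot=10_000, alpha=0.05, seed=0):
    """Point estimate and CI for P(router beats baseline), resampling tasks jointly."""
    rng = np.random.default_rng(seed)
    u_router = np.asarray(u_router)
    u_baseline = np.asarray(u_baseline)
    n = len(u_router)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)  # same resampled tasks for both systems (paired)
        stats[b] = np.mean(u_router[idx] > u_baseline[idx])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(np.mean(u_router > u_baseline)), (float(lo), float(hi))
```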