# Language Perspective (cs.CL)

This page frames Compitum for cs.CL reviewers: an NLP router for LLM tasks that trades cost against quality at a fixed willingness-to-pay (lambda), enforces policy/region constraints, detects ambiguity, and emits an auditable certificate per decision, all without a judge model.

> Related: [cs.LG](Learning-Perspective.md) · [cs.SY](Control-Perspective.md) · [stat.ML](Statistical-Notes.md) · [SRMF ⇄ Lyapunov](SRMF-as-Lyapunov.md) · [Peer Review Protocol](PEER_REVIEW.md) · [Certificate Schema](Certificate-Schema.md)

## Problem in NLP Terms

- Inputs: prompts and lightweight prompt-derived features (PGD), plus optional embeddings.
- Models: a small panel of LLM backends (fast, thinking, auto) with different quality/cost/latency profiles.
- Objective: maximize utility U = quality − lambda * total_cost at fixed lambda, subject to constraints (e.g., region/policy/rate).
- Decision: route each prompt to the best feasible model; optionally defer when ambiguity is high (boundary region). A minimal sketch of this selection rule appears at the end of this page's evaluation notes, below.

Code anchors: src/compitum/router.py:80 (route), src/compitum/constraints.py:36 (feasibility), src/compitum/energy.py:33 (utility), src/compitum/boundary.py:19 (ambiguity), src/compitum/router.py:25 (certificate schema).

## What We Emit (Certificate)

Per decision, a certificate exposes:

- utility components (quality, latency, cost) and the overall utility U
- constraint status (feasibility, approximate local shadow prices)
- boundary diagnostics (gap to runner-up, entropy, uncertainty)
- drift/trust-region state (for online adaptation)

This makes routing auditable and suitable for post-hoc error analysis and deferral policies.

## Relation to cs.CL Literature

- Mixture-of-Experts and routing: Compitum is a deterministic router over a small panel of LLMs that optimizes a scalarized utility under constraints, not a learned soft gate within a single model.
- Selective prediction/abstention: boundary flags align with deferral; we measure deferral quality against high-regret items.
- Cost-aware evaluation: fixed-lambda slices capture cost–quality tradeoffs; we report regret, win rate, and cost deltas (when available).
- Calibration: we use calibrated component predictors and report the reliability of uncertainty against absolute regret.

## Constraints and Safety

- Policy/rate/region constraints are enforced before selection (feasibility-first); constraint compliance should be ~100%.
- Approximate shadow prices (finite-difference sensitivities) help identify binding constraints; they are diagnostic only.

## Coherence and Ambiguity

- A metric-aware KDE prior (in whitened feature space) nudges decisions toward familiar contexts with predictable behavior.
- A boundary diagnostic combines gap, entropy, and uncertainty to flag ambiguous prompts where deferral or conservative routing is prudent (see the second sketch below).

## Evaluation for cs.CL

- Panel-level and per-task summaries at fixed lambda (e.g., 0.1 and 1.0):
  - regret and win rate vs. the best baseline
  - boundary/deferral rate and quality
  - optional cost delta on wins, if cost columns are present
- Calibration diagnostics: reliability curve (uncertainty bins) and Spearman rho(uncertainty, |regret|).
- Constraint compliance rate (~100%).
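To make the selection rule concrete, here is a minimal sketch of feasibility-first routing under the fixed-lambda utility, with a gap-based boundary flag. It is illustrative only: the `Candidate` fields, the predictor names (`quality_hat`, `cost_hat`), and the `gap_eps` threshold are assumptions made for this page, not the Compitum API (see the code anchors above for the real entry points).

```python
from dataclasses import dataclass
import math

@dataclass
class Candidate:
    name: str           # backend identifier, e.g., "fast" or "thinking"
    quality_hat: float  # calibrated quality prediction in [0, 1]
    cost_hat: float     # predicted total cost (price, latency proxy)
    feasible: bool      # policy/region/rate constraints already checked

def route(candidates, lam=0.1, gap_eps=0.05):
    """Feasibility-first, fixed-lambda routing with a gap-based boundary flag."""
    # 1) Feasibility first: infeasible backends never enter the argmax.
    pool = [c for c in candidates if c.feasible]
    if not pool:
        return None, {"feasible": False}

    # 2) Scalarized utility U = quality - lambda * total_cost at fixed lambda.
    def utility(c):
        return c.quality_hat - lam * c.cost_hat

    scored = sorted(pool, key=utility, reverse=True)
    best = scored[0]

    # 3) Boundary diagnostic: a small gap to the runner-up marks an
    #    ambiguous prompt where deferral or conservative routing is prudent.
    gap = utility(best) - utility(scored[1]) if len(scored) > 1 else math.inf
    certificate = {
        "feasible": True,
        "utility": utility(best),
        "gap_to_runner_up": gap,
        "boundary": gap < gap_eps,
    }
    return best.name, certificate

choice, cert = route([
    Candidate("fast", quality_hat=0.62, cost_hat=0.10, feasible=True),
    Candidate("thinking", quality_hat=0.80, cost_hat=1.50, feasible=True),
], lam=0.1)
print(choice, cert)  # "thinking" at lam=0.1; "fast" wins once lam is raised
```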
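The boundary diagnostic can also be sketched in isolation. The snippet below combines gap, entropy, and uncertainty into one ambiguity score; the weights and functional form are assumptions, and the actual diagnostic lives at src/compitum/boundary.py:19.

```python
import numpy as np

def boundary_score(utilities, sigma, w=(1.0, 1.0, 1.0)):
    """Toy ambiguity score: a small gap to the runner-up, high softmax
    entropy over utilities, and high predictive uncertainty all raise it."""
    u = np.sort(np.asarray(utilities, dtype=float))[::-1]
    gap = u[0] - u[1]                # margin over the runner-up
    p = np.exp(u - u.max())
    p /= p.sum()                     # softmax over per-model utilities
    entropy = -(p * np.log(p + 1e-12)).sum() / np.log(len(p))  # in [0, 1]
    return w[0] * np.exp(-gap) + w[1] * entropy + w[2] * sigma

# A near-tie with high uncertainty scores as ambiguous:
print(boundary_score([0.65, 0.61, 0.20], sigma=0.30))
# A clear winner with low uncertainty scores much lower:
print(boundary_score([0.90, 0.40, 0.20], sigma=0.05))
```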
Helper commands:

- cs.CL summary:

  ```bat
  python tools\analysis\cl_summary.py ^
    --input data\rb_clean\eval_results\*.csv ^
    --out-json reports\cl_summary.json ^
    --out-md reports\cl_summary.md
  ```

- Reliability curve and CEI: see docs/Statistical-Notes.md and docs/Control-of-Error.md.
- Decision curves (ambiguity-based deferral upper bound plus boundary AP/AUROC):

  ```bat
  python tools\analysis\cl_decision_curves.py ^
    --input data\rb_clean\eval_results\*.csv ^
    --quantiles 0,0.05,0.1,0.2,0.3,0.4,0.5 ^
    --out-json reports\cl_decision_curves.json ^
    --out-md reports\cl_decision_curves.md ^
    --out-png reports\cl_decision_curve.png
  ```

## Reproducibility

- Deterministic evaluation with fixed seeds and offline artifacts.
- Attach reports/cl_summary.md, reliability_curve.md/png, cei_report.md, and fixed_wtp_summary.md.

## Determinism & Explainability (0.1.1)

- Determinism
  - Repeated single-route and batch determinism under fixed seeds/embeddings.
  - Tests: `tests/invariants/test_invariants_router_determinism.py`, `tests/router/test_router_batch_determinism.py`
- Paraphrase robustness
  - A flip budget under small lexical/format edits; flips must be explainable via certificate deltas (distance or feasibility changes).
  - Tests: `tests/invariants/test_paraphrase_invariance.py`, `tests/invariants/test_paraphrase_explainability.py`

## Limits (cs.CL)

- No judge model; utility proxies depend on upstream task scoring and pricing assumptions.
- Shadow prices are approximate diagnostics; the coherence prior is bounded and carries a small weight.
- The router panel is intentionally small; extending to larger panels is future work.
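As a closing illustration, here is a minimal sketch of the deferral upper bound behind the decision curves above. It is not the implementation in tools\analysis\cl_decision_curves.py: the column semantics (`ambiguity`, `regret`) and the synthetic data are assumptions. At each quantile q it defers the top-q fraction of most ambiguous items and reports mean regret on the retained set; ranking by true regret instead of ambiguity would give the oracle upper bound.

```python
import numpy as np

def deferral_curve(ambiguity, regret,
                   quantiles=(0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5)):
    """Mean regret on the retained set after deferring the top-q most
    ambiguous items at each quantile q."""
    ambiguity = np.asarray(ambiguity, dtype=float)
    regret = np.asarray(regret, dtype=float)
    order = np.argsort(-ambiguity)           # most ambiguous first
    curve = {}
    for q in quantiles:
        n_defer = int(round(q * len(regret)))
        kept = order[n_defer:]               # route the rest normally
        curve[q] = float(regret[kept].mean()) if len(kept) else 0.0
    return curve

# Synthetic check: when ambiguity is informative about regret,
# mean retained regret should fall as the deferral quantile grows.
rng = np.random.default_rng(0)
regret = np.abs(rng.normal(0.0, 0.2, size=1000))
ambiguity = regret + rng.normal(0.0, 0.1, size=1000)
print(deferral_curve(ambiguity, regret))
```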