Meet Tess

Instantaneous, judge-free feedback for LLM routing.

Inspired by Montessori’s Control of Error and geometric ML: feasibility-first selection, SPD metric learning, and a Lyapunov-inspired trust-region. Claims below link to verifiable reports.

Regret (fixed WTP)
Win Rate (panel)
Test Coverage (CI)
Cost Delta (wins)

How Tess Learns

Geometry meets pedagogy. Montessori meets machine learning.

Riemannian Geometry

Prompts and models live in a learned curved space (SPD Mahalanobis metric). Distance reflects semantic fit, not raw position.
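As a minimal sketch of the idea (NumPy, with a toy 3-dimensional embedding; the real learned metric and embedding are Tess internals), distance under an SPD matrix M:

```python
import numpy as np

def mahalanobis(x: np.ndarray, y: np.ndarray, M: np.ndarray) -> float:
    """Distance in a learned metric: d(x, y) = sqrt((x - y)^T M (x - y)).

    M must be symmetric positive definite (SPD) so that d is a true metric:
    non-negative, symmetric, and satisfying the triangle inequality.
    """
    diff = x - y
    return float(np.sqrt(diff @ M @ diff))

# A toy SPD metric: A^T A + eps*I is always SPD.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
M = A.T @ A + 1e-3 * np.eye(3)

x, y = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
d = mahalanobis(x, y, M)
```

Because M is SPD, "closeness" is a genuine geometric notion in the learned space, which is what lets distance stand in for semantic fit.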

Utility (free-energy) optimization

We balance quality, latency, cost, distance, and evidence in a single scalarized utility. See protocol.
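A hypothetical scalarization, with made-up weights and sign conventions (the protocol defines the real ones):

```python
def utility(quality: float, latency_s: float, cost_usd: float,
            distance: float, evidence: float,
            w=(1.0, 0.2, 0.5, 0.3, 0.1)) -> float:
    """Free-energy-style scalar utility: reward quality and evidence,
    penalize latency, cost, and metric distance. Weights are illustrative."""
    wq, wl, wc, wd, we = w
    return (wq * quality - wl * latency_s - wc * cost_usd
            - wd * distance + we * evidence)

# Routing then reduces to an argmax over candidate models.
candidates = {
    "model_a": utility(0.9, 1.2, 0.004, 0.8, 0.7),
    "model_b": utility(0.8, 0.4, 0.001, 0.3, 0.5),
}
best = max(candidates, key=candidates.get)
```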

Control of Error

Immediate, judge-free signals (feasibility, ambiguity, uncertainty) drive bounded updates. See Control of Error.

Boundary Detection

High entropy, a low utility gap, or high uncertainty → flag for deferral. See decision curves in cs.CL.
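As a sketch of the deferral rule (thresholds and signal names are illustrative placeholders, not Tess's calibrated values):

```python
import math

def should_defer(probs, utilities, uncertainty,
                 h_max=0.9, gap_min=0.05, u_max=0.3):
    """Flag a request for deferral when routing looks ambiguous:
    high entropy over candidates, a small gap between the top two
    utilities, or high predictive uncertainty."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    top = sorted(utilities, reverse=True)
    gap = top[0] - top[1]
    return entropy > h_max or gap < gap_min or uncertainty > u_max

# Near-uniform probabilities and a tiny utility gap -> defer.
assert should_defer([0.34, 0.33, 0.33], [0.51, 0.50, 0.20], 0.1)
# Confident, well-separated case -> route normally.
assert not should_defer([0.9, 0.05, 0.05], [0.9, 0.4, 0.1], 0.05)
```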

Constraint Satisfaction

Feasibility-first selection enforces region/policy/capability limits. Approximate shadow prices are diagnostic. See constraints.
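A minimal sketch of feasibility-first selection (field names like `region` and `policy_tier` are hypothetical, chosen only to illustrate the pattern):

```python
def feasible(model, req):
    """Hard constraints come first: region, policy, and capability limits."""
    return (model["region"] in req["allowed_regions"]
            and model["policy_tier"] >= req["min_policy_tier"]
            and req["capability"] in model["capabilities"])

def select(models, req, score):
    """Score only the feasible set; infeasible models are never ranked."""
    pool = [m for m in models if feasible(m, req)]
    if not pool:
        return None  # nothing feasible -> defer upstream
    return max(pool, key=score)

models = [
    {"name": "a", "region": "eu", "policy_tier": 2,
     "capabilities": {"code"}, "quality": 0.90},
    {"name": "b", "region": "us", "policy_tier": 1,
     "capabilities": {"code"}, "quality": 0.95},
]
req = {"allowed_regions": {"eu"}, "min_policy_tier": 2, "capability": "code"}
chosen = select(models, req, score=lambda m: m["quality"])  # "b" scores higher but is infeasible
```

The ordering matters: a higher-utility model outside the allowed region is never even a candidate, which is what makes the shadow prices diagnostic rather than part of the objective.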

Adaptive Learning

Trust-region (Lyapunov-inspired) caps step size for online metric updates. Stability indicators in cs.SY.
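A toy version of the capped update (NumPy; the radius, learning rate, and SPD projection here are illustrative stand-ins for the actual controller):

```python
import numpy as np

def trust_region_step(M, grad, lr=0.1, radius=0.05):
    """Apply one online metric update, capping the Frobenius norm of the
    step so a single noisy gradient cannot destabilize the metric, then
    project back to SPD by flooring eigenvalues."""
    step = lr * grad
    norm = np.linalg.norm(step)
    if norm > radius:
        step *= radius / norm
    M_new = 0.5 * ((M - step) + (M - step).T)  # symmetrize
    w, V = np.linalg.eigh(M_new)
    return V @ np.diag(np.maximum(w, 1e-6)) @ V.T

M = np.eye(3)
M1 = trust_region_step(M, grad=100.0 * np.ones((3, 3)))  # deliberately huge gradient
```

However large the incoming gradient, the metric moves at most `radius` per step and stays SPD, which is the stability property the Lyapunov analysis leans on.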

Built for Science

Rigorous. Reproducible. Open.

Tests & QA

Hypothesis with a derandomized profile; invariant tests; mutation testing; a 100% coverage target. See CI and determinism.

One-Shot Reproducible

Run scripts/run_peer_review.bat to regenerate tables, CEI, KPIs, and a consolidated HTML report with a SHA-256 manifest.
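The manifest idea in miniature (stdlib only; the real script's file list and report layout are project-specific):

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_manifest(paths):
    """Map each file name to its SHA-256 digest so a regenerated report
    can be byte-compared against a published run."""
    return {Path(p).name: hashlib.sha256(Path(p).read_bytes()).hexdigest()
            for p in paths}

# Toy check in a scratch directory: identical bytes -> identical digests.
with tempfile.TemporaryDirectory() as d:
    a, b = Path(d, "run1.html"), Path(d, "run2.html")
    a.write_bytes(b"report")
    b.write_bytes(b"report")
    m = sha256_manifest([a, b])
assert m["run1.html"] == m["run2.html"]
```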

Peer-Review Ready

Docs for cs.LG, cs.CL, cs.SY, stat.ML; CEI; decision curves; control KPIs; fixed-WTP summaries. See documentation.

For Reviewers

Direct links to audience-specific notes.

cs.LG

Problem setup, decision rule, evaluation protocol, significance of instantaneous feedback.

cs.CL

Per-task routing behavior, decision curves, deferral quality, calibration.

cs.SY

Closed loop, trust-region controller, stability indicators, control KPIs.

stat.ML

Paired bootstrap, calibration, KDE/shrinkage, robustness notes.

SRMF as Lyapunov (Compitum)

Instantaneous feedback, bounded updates, falsifiable claims.

Statement

In Compitum, the Self-Regulating Mapping Function (SRMF) plays the role of a Lyapunov functional for the discrete update map: metric updates decrease a surrogate energy via line search; the controller’s integral decays under zero drift; stride separation isolates timescales.

See the brief: SRMF as a Lyapunov Functional.
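As a toy illustration of the line-search descent mechanism (Armijo backtracking on a stand-in quadratic energy; the function `V`, its gradient, and all constants are illustrative, not the actual SRMF):

```python
import numpy as np

def backtracking_descent(V, grad, theta, beta=0.5, t0=1.0, c=1e-4):
    """One update with Armijo backtracking: shrink the step until the
    surrogate energy V strictly decreases, so V acts like a discrete
    Lyapunov function along the iterates."""
    g = grad(theta)
    t = t0
    while V(theta - t * g) > V(theta) - c * t * (g @ g):
        t *= beta
    return theta - t * g

# Stand-in energy: an anisotropic quadratic, minimized at the origin.
V = lambda th: float(th[0] ** 2 + 10.0 * th[1] ** 2)
grad = lambda th: np.array([2.0 * th[0], 20.0 * th[1]])

theta = np.array([3.0, -4.0])
energies = [V(theta)]
for _ in range(20):
    theta = backtracking_descent(V, grad, theta)
    energies.append(V(theta))
```

The Armijo condition guarantees each accepted step strictly decreases V whenever the gradient is nonzero, which is the discrete-time analogue of Lyapunov descent the brief formalizes.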

Falsifiability (Tests)

Learning descent, zero-error decay, two-timescale isolation, and routing-level distance descent are tested under tests/invariants/.

Core Science 0.1.1

Geometry • Stability • Coherence • Constraints • Determinism • Pedagogy

  • Geometry: SPD bounds, triangle inequality, ray monotonicity, update descent
  • Stability: Lyapunov decay/saturation/recovery; dV proxy sequences; combined boundedness
  • Coherence: monotone outward, symmetry (±v), inward score direction, mixture discrimination
  • Constraints: feasibility monotone; duals: slack ≈ 0, boundary = 0; monotone/scale sanity
  • Determinism: batch/repeated determinism; paraphrase flip budget + explainability
  • Pedagogy: practice raises evidence/utility (βs > 0); prepared environment fixes constraints