Compitum Test & Benchmark Report

Generated 2025-10-30 04:05 UTC

Overview

This report summarizes Compitum's test and benchmark results alongside common baselines.

Charts: bars include Compitum (blue/green) and baselines (gray). The scatter shows average cost vs performance for all models.

Topline Takeaways

WTP policy: best-of-grid over [0.0001, 0.001, 0.01, 0.1, 1.0]; regret is reported at the best-performing WTP.

Coverage: win-rate denominator = 86 evals; Compitum evals at this WTP = 86.
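The best-of-grid WTP policy above can be sketched as follows. This is an illustrative reconstruction, not the report's actual scoring code: the utility form (performance minus WTP-scaled cost) and the per-eval data are assumptions.

```python
# Sketch of a best-of-grid WTP (willingness-to-pay) policy.
# The utility definition and eval data below are illustrative assumptions.

WTP_GRID = [0.0001, 0.001, 0.01, 0.1, 1.0]

# Hypothetical per-eval (performance, cost) pairs for the router and an oracle.
evals = [
    {"router": (0.70, 3.4), "oracle": (0.72, 3.0)},
    {"router": (0.58, 0.3), "oracle": (0.58, 0.3)},
]

def utility(perf, cost, wtp):
    # Utility trades performance against cost, scaled by willingness-to-pay.
    return perf - wtp * cost

def mean_regret(evals, wtp):
    # Regret per eval = oracle utility minus router utility at this WTP.
    regrets = [
        utility(*e["oracle"], wtp) - utility(*e["router"], wtp)
        for e in evals
    ]
    return sum(regrets) / len(regrets)

# "Best-of-grid": report regret at the WTP value where the router does best.
best_wtp = min(WTP_GRID, key=lambda w: mean_regret(evals, w))
print(best_wtp, mean_regret(evals, best_wtp))
```

With these toy numbers the router's regret grows with WTP, so the grid search lands on the smallest value, mirroring the report's selected WTP of 0.0001.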

Unit Tests

........................................................................ [ 50%]
......................................................................   [100%]

------------------------------------------------------------------------------------------------------------ benchmark: 6 tests ------------------------------------------------------------------------------------------------------------
Name (time in ns)                                  Min                       Max                   Mean                 StdDev                 Median                   IQR             Outliers  OPS (Kops/s)            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_iso_utility_savings_vs_fixed_best         40.5170 (1.0)          4,212.9310 (1.0)          43.5317 (1.0)          19.9865 (1.0)          42.2415 (1.0)          0.8621 (1.0)      933;17557   22,971.7826 (1.0)      192308         116
test_energy_drift                           1,899.9854 (46.89)      190,899.9984 (45.31)     2,121.8334 (48.74)     1,046.0108 (52.34)     2,099.9869 (49.71)      100.0008 (116.00)    532;4360      471.2905 (0.02)     114943           1
test_spd_det_and_trust_radius_bounds        2,799.9922 (69.11)    1,770,700.0079 (420.30)    3,124.5824 (71.78)     6,779.2280 (339.19)    2,999.9937 (71.02)      100.0008 (116.00)    148;2668      320.0428 (0.01)      78125           1
test_constraint_violation_rate              2,899.9930 (71.57)      995,899.9872 (236.39)    3,223.0994 (74.04)     5,168.7014 (258.61)    3,099.9945 (73.39)      100.0008 (116.00)    132;2203      310.2604 (0.01)      46512           1
test_router_throughput_and_latency         19,900.0060 (491.15)     409,600.0048 (97.22)    21,331.2078 (490.02)    6,074.6730 (303.94)   20,700.0121 (490.04)     699.9762 (811.97)    385;1883       46.8797 (0.00)      30675           1
test_mean_regret_and_pareto                74,999.9890 (>1000.0)  1,412,000.0124 (335.16)   82,819.6565 (>1000.0)  28,107.3446 (>1000.0)  78,700.0172 (>1000.0)  2,600.0198 (>1000.0)    112;439       12.0744 (0.00)       3612           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
142 passed in 67.54s (0:01:07)

Exit code: 0
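The OPS column in the benchmark table can be cross-checked against the legend's formula (OPS = 1 / Mean). A quick sanity check on the first row, using its rounded mean of 43.5317 ns (so the last digits may differ slightly from the table, which uses the unrounded mean):

```python
# Verify OPS (Kops/s) = 1 / Mean for the fastest benchmark row above.
mean_ns = 43.5317                      # mean time per operation, nanoseconds
ops_per_sec = 1.0 / (mean_ns * 1e-9)   # operations per second
kops = ops_per_sec / 1e3               # thousands of ops/s, as tabulated
print(round(kops, 4))
```

The result agrees with the tabulated 22,971.7826 Kops/s to within rounding.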

Step Status

Step          Return Code   Timed Out   Duration (s)   Timeout Cap (s)
Unit Tests    0             NO          69.879         900
RouterBench   0             NO          902.056        1200
Compitum      0             NO          159.36         900

Full machine-readable log stored alongside this report as JSON.

RouterBench Artifacts

Compitum Artifacts

Charts: Average Performance; Average Cost; Mean Regret; Avg Cost vs Performance (images not embedded in this text export).

Numerical Summary

Model                  Avg Performance   Avg Total Cost
compitum               0.7091            3.371482
claude-instant-v1      0.3923            0.257096
claude-v1              0.4612            2.491849
claude-v2              0.5111            2.611772
gpt-3.5-turbo-1106     0.5842            0.300727
gpt-4-1106-preview     0.7091            3.371482

Regret & Wins (WTP-selected)

Mean Regret   P95 Regret   Win Rate   Avg Cost Delta on Wins
0.004249      0.025114     86.0%      0.000000

Regret computed at the best WTP = 0.0001, selected by the best-of-grid policy over [0.0001, 0.001, 0.01, 0.1, 1.0].
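The summary statistics in the table above can be derived from per-eval regret values along these lines. This is a sketch under stated assumptions: the sample regrets are invented, the nearest-rank percentile method and the "win = zero regret" definition are guesses at the report's conventions, not confirmed by it.

```python
# Illustrative computation of mean regret, P95 regret, and win rate
# from hypothetical per-eval regret values (not the report's data).
import math

regrets = [0.0, 0.0, 0.01, 0.003, 0.0, 0.02]  # assumed per-eval regrets

mean_regret = sum(regrets) / len(regrets)

def percentile(xs, q):
    # Nearest-rank percentile: smallest value with at least q of the
    # sorted sample at or below it.
    s = sorted(xs)
    k = max(0, math.ceil(q * len(s)) - 1)
    return s[k]

p95_regret = percentile(regrets, 0.95)

# Assumed win definition: the router matches the per-eval best model,
# i.e. regret == 0, over the full eval denominator.
win_rate = sum(r == 0.0 for r in regrets) / len(regrets)
print(mean_regret, p95_regret, win_rate)
```

A zero average cost delta on wins, as in the table, would follow if every win is an exact tie with the per-eval best model.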

Glossary