# Compitum RouterBench Evaluation Summary

This report compares Compitum against baseline routers on a bounded evaluation set. Higher oracle_match indicates lower regret relative to the oracle assignment.

## Metrics

- compitum
  - accuracy_mean: 0.5022
  - cost_mean: 1.0159
- WizardLM/WizardLM-13B-V1.2
  - accuracy_mean: 0.4423
  - cost_mean: 0.0902
- claude-instant-v1
  - accuracy_mean: 0.3923
  - cost_mean: 0.2571
- claude-v1
  - accuracy_mean: 0.4612
  - cost_mean: 2.4918
- claude-v2
  - accuracy_mean: 0.5111
  - cost_mean: 2.6118
- gpt-3.5-turbo-1106
  - accuracy_mean: 0.5842
  - cost_mean: 0.3007
- gpt-4-1106-preview
  - accuracy_mean: 0.7091
  - cost_mean: 3.3715
- meta/code-llama-instruct-34b-chat
  - accuracy_mean: 0.4275
  - cost_mean: 0.2264
- meta/llama-2-70b-chat
  - accuracy_mean: 0.4853
  - cost_mean: 0.2640
- mistralai/mistral-7b-chat
  - accuracy_mean: 0.4401
  - cost_mean: 0.0589
- mistralai/mixtral-8x7b-chat
  - accuracy_mean: 0.5768
  - cost_mean: 0.1757
- oracle
  - accuracy_mean: 0.8704
  - cost_mean: 0.2064
- zero-one-ai/Yi-34B-Chat
  - accuracy_mean: 0.5883
  - cost_mean: 0.2368

## Where Compitum Wins

Each delta is Compitum minus the baseline, so a negative cost delta means Compitum is cheaper and a positive accuracy delta means Compitum is more accurate.

- Cost mean vs WizardLM/WizardLM-13B-V1.2: +0.9257
- Accuracy mean vs WizardLM/WizardLM-13B-V1.2: +0.0599
- Cost mean vs claude-instant-v1: +0.7588
- Accuracy mean vs claude-instant-v1: +0.1100
- Cost mean vs claude-v1: -1.4759
- Accuracy mean vs claude-v1: +0.0410
- Cost mean vs claude-v2: -1.5958
- Accuracy mean vs claude-v2: -0.0089
- Cost mean vs gpt-3.5-turbo-1106: +0.7152
- Accuracy mean vs gpt-3.5-turbo-1106: -0.0820
- Cost mean vs gpt-4-1106-preview: -2.3555
- Accuracy mean vs gpt-4-1106-preview: -0.2069
- Cost mean vs meta/code-llama-instruct-34b-chat: +0.7895
- Accuracy mean vs meta/code-llama-instruct-34b-chat: +0.0747
- Cost mean vs meta/llama-2-70b-chat: +0.7519
- Accuracy mean vs meta/llama-2-70b-chat: +0.0169
- Cost mean vs mistralai/mistral-7b-chat: +0.9570
- Accuracy mean vs mistralai/mistral-7b-chat: +0.0622
- Cost mean vs mistralai/mixtral-8x7b-chat: +0.8402
- Accuracy mean vs mistralai/mixtral-8x7b-chat: -0.0745
- Cost mean vs oracle: +0.8095
- Accuracy mean vs oracle: -0.3682
- Cost mean vs zero-one-ai/Yi-34B-Chat: +0.7791
- Accuracy mean vs zero-one-ai/Yi-34B-Chat: -0.0860

### Regret (accuracy gap to oracle)

- Compitum: +0.3682
- WizardLM/WizardLM-13B-V1.2: +0.4281
- claude-instant-v1: +0.4782
- claude-v1: +0.4092
- claude-v2: +0.3593
- gpt-3.5-turbo-1106: +0.2863
- gpt-4-1106-preview: +0.1613
- meta/code-llama-instruct-34b-chat: +0.4429
- meta/llama-2-70b-chat: +0.3851
- mistralai/mistral-7b-chat: +0.4304
- mistralai/mixtral-8x7b-chat: +0.2937
- zero-one-ai/Yi-34B-Chat: +0.2822

## Determinism

Compitum routing is deterministic given fixed models and parameters, which reduces variance and improves reproducibility compared with routers that rely on stochastic LLM calls for decisions.
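
The deltas under "Where Compitum Wins" and the regret values can be recomputed from the per-router means in the Metrics section. The following is a minimal sketch of that arithmetic, assuming each delta is Compitum minus the baseline and regret is the oracle's accuracy_mean minus the router's. The `METRICS` dictionary holds a subset of the reported values, and `delta` and `regret` are hypothetical helper names used for illustration, not part of Compitum or RouterBench.

```python
# Per-router means copied from the Metrics section (subset shown for brevity).
METRICS = {
    "compitum": {"accuracy_mean": 0.5022, "cost_mean": 1.0159},
    "claude-v1": {"accuracy_mean": 0.4612, "cost_mean": 2.4918},
    "gpt-4-1106-preview": {"accuracy_mean": 0.7091, "cost_mean": 3.3715},
    "oracle": {"accuracy_mean": 0.8704, "cost_mean": 0.2064},
}


def delta(metrics, router, baseline, key):
    """Router minus baseline for one metric (hypothetical helper)."""
    return metrics[router][key] - metrics[baseline][key]


def regret(metrics, router, oracle="oracle"):
    """Accuracy gap to the oracle: oracle accuracy minus router accuracy."""
    return metrics[oracle]["accuracy_mean"] - metrics[router]["accuracy_mean"]


# Print Compitum's deltas against every other entry, then its regret.
for baseline in METRICS:
    if baseline == "compitum":
        continue
    cost = delta(METRICS, "compitum", baseline, "cost_mean")
    acc = delta(METRICS, "compitum", baseline, "accuracy_mean")
    print(f"vs {baseline}: cost {cost:+.4f}, accuracy {acc:+.4f}")

print(f"Compitum regret: {regret(METRICS, 'compitum'):+.4f}")
```

Because the inputs here are already rounded to four decimals, a recomputed value can differ from the reported one in the last digit. For example, the cost delta against gpt-4-1106-preview recomputes to -2.3556 while the report lists -2.3555, which would be consistent with the report having been generated from unrounded means.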