2025-01-31 18:03:33 +11:00

model,judgemark_score,judgemark_score_raw,kendall_tau_bootstrapped,std_dev,kw_stat,ci99_overlap_magnitude_sum_norm,calibrated_score_range_norm,modulated_ci95,emd_norm
ministral/Ministral-3b-instruct,7.51,5.06,0.0,0.099,0.013,0.094,0.858,0.012,0.363
gpt-4o-mini,44.05,44.35,0.628,0.438,0.298,0.324,3.149,0.109,1.187
deepseek-ai/deepseek-r1,76.97,71.77,0.863,0.821,0.905,0.683,6.098,0.6,2.195
claude-3.5-haiku-20241022,77.64,76.88,0.898,0.832,0.891,0.665,5.975,0.575,2.229
claude-3.5-sonnet-20240620,78.84,74.72,0.905,0.83,0.902,0.664,6.255,0.591,2.232
microsoft/wizardlm-2-8x22b,55.42,52.86,0.749,0.566,0.463,0.417,4.62,0.198,1.53
gpt-4o-2024-11-20,82.51,77.75,0.89,0.837,0.985,0.75,6.233,0.814,2.228
gemini-2.0-flash-thinking-exp-1219,71.79,66.1,0.818,0.765,0.817,0.631,5.519,0.453,2.057
deepseek-ai/deepseek-r1-distill-llama-70b,59.67,56.86,0.764,0.642,0.6,0.474,4.485,0.243,1.726
mistralai/mistral-large-instruct-2411,65.63,63.25,0.794,0.732,0.636,0.505,5.873,0.296,1.935
meta-llama/llama-3.1-8b-instruct,35.72,35.67,0.46,0.401,0.201,0.207,3.232,0.063,1.075
meta-llama/llama-3.1-405b-instruct,60.09,58.76,0.791,0.633,0.582,0.471,4.515,0.243,1.712
deepseek/deepseek-r1-distill-qwen-14b,,,,,,,,,
mistralai/mistral-small-24b-instruct-2501,63.64,61.23,0.808,0.67,0.618,0.504,5.194,0.286,1.802
liquid/lfm-7b,10.24,14.33,0.139,0.192,0.027,0.0,1.702,0.0,0.584