codebench/benchmark_results/Apple_M1_Pro_localhost_11434.log
2025-03-04 04:34:23 +01:00

Benchmark Run: 20250303_174821
Server: http://localhost:11434
CPU Information:
python_version: 3.10.16.final.0 (64 bit)
cpuinfo_version: [9, 0, 0]
cpuinfo_version_string: 9.0.0
arch: ARM_8
bits: 64
count: 10
arch_string_raw: arm64
brand_raw: Apple M1 Pro
Benchmark Results:
🏆 Final Model Leaderboard:
qwen2.5-coder:7b-instruct-q4_K_M
Overall Success Rate: 100.0% (72/72 cases)
Average Tokens/sec: 19.33 (18.75 - 19.58)
Average Duration: 17.32s
Min/Max Avg Duration: 8.67s / 17.99s
Test Results:
- Fibonacci: ✅ 18/18 cases (100.0%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ✅ 18/18 cases (100.0%)
falcon3:10b
Overall Success Rate: 100.0% (72/72 cases)
Average Tokens/sec: 13.21 (12.53 - 13.31)
Average Duration: 13.46s
Min/Max Avg Duration: 6.76s / 13.46s
Test Results:
- Fibonacci: ✅ 18/18 cases (100.0%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ✅ 18/18 cases (100.0%)
qwen2.5:14b
Overall Success Rate: 100.0% (72/72 cases)
Average Tokens/sec: 9.78 (9.78 - 9.88)
Average Duration: 35.25s
Min/Max Avg Duration: 30.09s / 35.25s
Test Results:
- Fibonacci: ✅ 18/18 cases (100.0%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ✅ 18/18 cases (100.0%)
qwen2.5-coder:14b-instruct-q4_K_M
Overall Success Rate: 100.0% (72/72 cases)
Average Tokens/sec: 9.68 (9.65 - 9.88)
Average Duration: 37.18s
Min/Max Avg Duration: 23.06s / 37.18s
Test Results:
- Fibonacci: ✅ 18/18 cases (100.0%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ✅ 18/18 cases (100.0%)
phi4:latest
Overall Success Rate: 100.0% (72/72 cases)
Average Tokens/sec: 9.01 (8.96 - 9.32)
Average Duration: 23.44s
Min/Max Avg Duration: 23.44s / 38.82s
Test Results:
- Fibonacci: ✅ 18/18 cases (100.0%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ✅ 18/18 cases (100.0%)
deepseek-r1:14b
Overall Success Rate: 97.2% (70/72 cases)
Average Tokens/sec: 9.05 (8.90 - 9.38)
Average Duration: 278.32s
Min/Max Avg Duration: 174.30s / 482.10s
Test Results:
- Fibonacci: ✅ 18/18 cases (100.0%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ❌ 16/18 cases (88.9%)
- Anagram Check: ✅ 18/18 cases (100.0%)
llama3.2-vision:11b-instruct-q4_K_M
Overall Success Rate: 95.8% (69/72 cases)
Average Tokens/sec: 15.68 (14.92 - 15.92)
Average Duration: 22.33s
Min/Max Avg Duration: 16.31s / 28.85s
Test Results:
- Fibonacci: ❌ 16/18 cases (88.9%)
- Binary Search: ❌ 17/18 cases (94.4%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ✅ 18/18 cases (100.0%)
llama3.2:3b
Overall Success Rate: 94.4% (68/72 cases)
Average Tokens/sec: 36.09 (30.85 - 37.53)
Average Duration: 2.67s
Min/Max Avg Duration: 1.04s / 2.76s
Test Results:
- Fibonacci: ❌ 14/18 cases (77.8%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ✅ 18/18 cases (100.0%)
llama3.1:8b
Overall Success Rate: 94.4% (68/72 cases)
Average Tokens/sec: 17.92 (17.92 - 18.45)
Average Duration: 18.04s
Min/Max Avg Duration: 14.68s / 19.56s
Test Results:
- Fibonacci: ❌ 14/18 cases (77.8%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ✅ 18/18 cases (100.0%)
hhao/qwen2.5-coder-tools:7b
Overall Success Rate: 91.7% (66/72 cases)
Average Tokens/sec: 17.75 (16.05 - 17.75)
Average Duration: 9.35s
Min/Max Avg Duration: 4.17s / 9.35s
Test Results:
- Fibonacci: ❌ 12/18 cases (66.7%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ✅ 18/18 cases (100.0%)
Qwen2.5-Coder-7B-Instruct-s1k:latest
Overall Success Rate: 88.9% (64/72 cases)
Average Tokens/sec: 18.38 (18.38 - 18.94)
Average Duration: 9.95s
Min/Max Avg Duration: 9.06s / 12.91s
Test Results:
- Fibonacci: ❌ 16/18 cases (88.9%)
- Binary Search: ❌ 12/18 cases (66.7%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ✅ 18/18 cases (100.0%)
deepseek-r1:8b
Overall Success Rate: 86.1% (62/72 cases)
Average Tokens/sec: 17.43 (17.29 - 18.01)
Average Duration: 168.97s
Min/Max Avg Duration: 107.91s / 168.97s
Test Results:
- Fibonacci: ✅ 18/18 cases (100.0%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ❌ 16/18 cases (88.9%)
- Anagram Check: ❌ 10/18 cases (55.6%)
llama3.2:1b-instruct-q4_K_M
Overall Success Rate: 81.9% (59/72 cases)
Average Tokens/sec: 88.24 (88.24 - 88.93)
Average Duration: 3.64s
Min/Max Avg Duration: 1.87s / 4.93s
Test Results:
- Fibonacci: ❌ 5/18 cases (27.8%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ✅ 18/18 cases (100.0%)
samantha-mistral:latest
Overall Success Rate: 80.6% (58/72 cases)
Average Tokens/sec: 23.92 (23.91 - 24.79)
Average Duration: 12.21s
Min/Max Avg Duration: 7.59s / 12.21s
Test Results:
- Fibonacci: ❌ 8/18 cases (44.4%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ❌ 16/18 cases (88.9%)
- Anagram Check: ❌ 16/18 cases (88.9%)
marco-o1:latest
Overall Success Rate: 80.6% (58/72 cases)
Average Tokens/sec: 19.19 (19.19 - 19.39)
Average Duration: 41.14s
Min/Max Avg Duration: 33.28s / 51.50s
Test Results:
- Fibonacci: ✅ 18/18 cases (100.0%)
- Binary Search: ❌ 6/18 cases (33.3%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ❌ 16/18 cases (88.9%)
deepseek-r1:7b
Overall Success Rate: 80.6% (58/72 cases)
Average Tokens/sec: 18.01 (18.01 - 19.07)
Average Duration: 336.87s
Min/Max Avg Duration: 78.71s / 336.87s
Test Results:
- Fibonacci: ❌ 10/18 cases (55.6%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ❌ 12/18 cases (66.7%)
- Anagram Check: ✅ 18/18 cases (100.0%)
deepseek-r1:1.5b-qwen-distill-q8_0
Overall Success Rate: 52.8% (38/72 cases)
Average Tokens/sec: 57.37 (53.88 - 59.60)
Average Duration: 137.59s
Min/Max Avg Duration: 41.38s / 371.13s
Test Results:
- Fibonacci: ❌ 11/18 cases (61.1%)
- Binary Search: ❌ 12/18 cases (66.7%)
- Palindrome: ❌ 6/18 cases (33.3%)
- Anagram Check: ❌ 9/18 cases (50.0%)
openthinker:7b
Overall Success Rate: 47.2% (34/72 cases)
Average Tokens/sec: 18.16 (17.98 - 18.29)
Average Duration: 263.00s
Min/Max Avg Duration: 168.91s / 302.79s
Test Results:
- Fibonacci: ❌ 0/18 cases (0.0%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ❌ 12/18 cases (66.7%)
- Anagram Check: ❌ 4/18 cases (22.2%)
wizard-vicuna-uncensored:latest
Overall Success Rate: 9.7% (7/72 cases)
Average Tokens/sec: 22.01 (22.01 - 24.42)
Average Duration: 9.06s
Min/Max Avg Duration: 5.60s / 11.45s
Test Results:
- Fibonacci: ❌ 0/18 cases (0.0%)
- Binary Search: ❌ 0/18 cases (0.0%)
- Palindrome: ❌ 6/18 cases (33.3%)
- Anagram Check: ❌ 1/18 cases (5.6%)
mxbai-embed-large:latest
Overall Success Rate: 0.0% (0/72 cases)
Average Tokens/sec: 0.00 (0.00 - 0.00)
Average Duration: 0.00s
Min/Max Avg Duration: 0.00s / 0.00s
Test Results:
- Fibonacci: ❌ 0/18 cases (0.0%)
- Binary Search: ❌ 0/18 cases (0.0%)
- Palindrome: ❌ 0/18 cases (0.0%)
- Anagram Check: ❌ 0/18 cases (0.0%)
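
The per-model blocks above follow a regular layout (a model name, then an "Overall Success Rate" line, then per-test lines), so they can be read back into structured data for sorting or re-checking the percentages. The following is a minimal sketch of one way to do that. The function name parse_leaderboard and the regular expressions are assumptions written against the text of this log; they are not part of the benchmark tool itself.

```python
import re

# Hypothetical helper: the field labels are copied from the log above, but
# the parsing approach is an assumption, not the benchmark's own code.
RATE = re.compile(r"Overall Success Rate: ([\d.]+)% \((\d+)/(\d+) cases\)")
TPS = re.compile(r"Average Tokens/sec: ([\d.]+)")
# \S+ skips the pass/fail emoji, including any invisible variation selector.
TEST = re.compile(r"- (.+?): \S+ (\d+)/(\d+) cases")


def parse_leaderboard(text):
    """Return one dict per model block, in the order the blocks appear."""
    models = []
    previous = None  # last non-empty line seen so far
    for line in text.splitlines():
        line = line.strip()
        if m := RATE.match(line):
            # The model name is the line just before its success-rate line.
            models.append({
                "model": previous,
                "passed": int(m[2]),
                "total": int(m[3]),
                "tests": {},
            })
        elif models and (m := TPS.match(line)):
            models[-1]["tokens_per_sec"] = float(m[1])
        elif models and (m := TEST.match(line)):
            models[-1]["tests"][m[1]] = (int(m[2]), int(m[3]))
        if line:
            previous = line
    return models


sample = """\
llama3.2:3b
Overall Success Rate: 94.4% (68/72 cases)
Average Tokens/sec: 36.09 (30.85 - 37.53)
Test Results:
- Fibonacci: ❌ 14/18 cases (77.8%)
- Binary Search: ✅ 18/18 cases (100.0%)
"""

for entry in parse_leaderboard(sample):
    rate = 100 * entry["passed"] / entry["total"]
    print(entry["model"], f"{rate:.1f}%", entry["tokens_per_sec"])
# prints: llama3.2:3b 94.4% 36.09
```

Recomputing the rate from the passed/total counts, as the loop does, gives a quick consistency check against the percentage printed in the log.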