Benchmark Run: 20250303_174821
Server: http://localhost:11434
CPU Information:
python_version: 3.10.16.final.0 (64 bit)
cpuinfo_version: [9, 0, 0]
cpuinfo_version_string: 9.0.0
arch: ARM_8
bits: 64
count: 10
arch_string_raw: arm64
brand_raw: Apple M1 Pro
Benchmark Results:
🏆 Final Model Leaderboard:
qwen2.5-coder:7b-instruct-q4_K_M
Overall Success Rate: 100.0% (72/72 cases)
Average Tokens/sec: 19.33 (18.75 - 19.58)
Average Duration: 17.32s
Min/Max Avg Duration: 8.67s / 17.99s
Test Results:
- Fibonacci: ✅ 18/18 cases (100.0%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ✅ 18/18 cases (100.0%)
falcon3:10b
Overall Success Rate: 100.0% (72/72 cases)
Average Tokens/sec: 13.21 (12.53 - 13.31)
Average Duration: 13.46s
Min/Max Avg Duration: 6.76s / 13.46s
Test Results:
- Fibonacci: ✅ 18/18 cases (100.0%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ✅ 18/18 cases (100.0%)
qwen2.5:14b
Overall Success Rate: 100.0% (72/72 cases)
Average Tokens/sec: 9.78 (9.78 - 9.88)
Average Duration: 35.25s
Min/Max Avg Duration: 30.09s / 35.25s
Test Results:
- Fibonacci: ✅ 18/18 cases (100.0%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ✅ 18/18 cases (100.0%)
qwen2.5-coder:14b-instruct-q4_K_M
Overall Success Rate: 100.0% (72/72 cases)
Average Tokens/sec: 9.68 (9.65 - 9.88)
Average Duration: 37.18s
Min/Max Avg Duration: 23.06s / 37.18s
Test Results:
- Fibonacci: ✅ 18/18 cases (100.0%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ✅ 18/18 cases (100.0%)
phi4:latest
Overall Success Rate: 100.0% (72/72 cases)
Average Tokens/sec: 9.01 (8.96 - 9.32)
Average Duration: 23.44s
Min/Max Avg Duration: 23.44s / 38.82s
Test Results:
- Fibonacci: ✅ 18/18 cases (100.0%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ✅ 18/18 cases (100.0%)
deepseek-r1:14b
Overall Success Rate: 97.2% (70/72 cases)
Average Tokens/sec: 9.05 (8.90 - 9.38)
Average Duration: 278.32s
Min/Max Avg Duration: 174.30s / 482.10s
Test Results:
- Fibonacci: ✅ 18/18 cases (100.0%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ❌ 16/18 cases (88.9%)
- Anagram Check: ✅ 18/18 cases (100.0%)
llama3.2-vision:11b-instruct-q4_K_M
Overall Success Rate: 95.8% (69/72 cases)
Average Tokens/sec: 15.68 (14.92 - 15.92)
Average Duration: 22.33s
Min/Max Avg Duration: 16.31s / 28.85s
Test Results:
- Fibonacci: ❌ 16/18 cases (88.9%)
- Binary Search: ❌ 17/18 cases (94.4%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ✅ 18/18 cases (100.0%)
llama3.2:3b
Overall Success Rate: 94.4% (68/72 cases)
Average Tokens/sec: 36.09 (30.85 - 37.53)
Average Duration: 2.67s
Min/Max Avg Duration: 1.04s / 2.76s
Test Results:
- Fibonacci: ❌ 14/18 cases (77.8%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ✅ 18/18 cases (100.0%)
llama3.1:8b
Overall Success Rate: 94.4% (68/72 cases)
Average Tokens/sec: 17.92 (17.92 - 18.45)
Average Duration: 18.04s
Min/Max Avg Duration: 14.68s / 19.56s
Test Results:
- Fibonacci: ❌ 14/18 cases (77.8%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ✅ 18/18 cases (100.0%)
hhao/qwen2.5-coder-tools:7b
Overall Success Rate: 91.7% (66/72 cases)
Average Tokens/sec: 17.75 (16.05 - 17.75)
Average Duration: 9.35s
Min/Max Avg Duration: 4.17s / 9.35s
Test Results:
- Fibonacci: ❌ 12/18 cases (66.7%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ✅ 18/18 cases (100.0%)
Qwen2.5-Coder-7B-Instruct-s1k:latest
Overall Success Rate: 88.9% (64/72 cases)
Average Tokens/sec: 18.38 (18.38 - 18.94)
Average Duration: 9.95s
Min/Max Avg Duration: 9.06s / 12.91s
Test Results:
- Fibonacci: ❌ 16/18 cases (88.9%)
- Binary Search: ❌ 12/18 cases (66.7%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ✅ 18/18 cases (100.0%)
deepseek-r1:8b
Overall Success Rate: 86.1% (62/72 cases)
Average Tokens/sec: 17.43 (17.29 - 18.01)
Average Duration: 168.97s
Min/Max Avg Duration: 107.91s / 168.97s
Test Results:
- Fibonacci: ✅ 18/18 cases (100.0%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ❌ 16/18 cases (88.9%)
- Anagram Check: ❌ 10/18 cases (55.6%)
llama3.2:1b-instruct-q4_K_M
Overall Success Rate: 81.9% (59/72 cases)
Average Tokens/sec: 88.24 (88.24 - 88.93)
Average Duration: 3.64s
Min/Max Avg Duration: 1.87s / 4.93s
Test Results:
- Fibonacci: ❌ 5/18 cases (27.8%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ✅ 18/18 cases (100.0%)
samantha-mistral:latest
Overall Success Rate: 80.6% (58/72 cases)
Average Tokens/sec: 23.92 (23.91 - 24.79)
Average Duration: 12.21s
Min/Max Avg Duration: 7.59s / 12.21s
Test Results:
- Fibonacci: ❌ 8/18 cases (44.4%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ❌ 16/18 cases (88.9%)
- Anagram Check: ❌ 16/18 cases (88.9%)
marco-o1:latest
Overall Success Rate: 80.6% (58/72 cases)
Average Tokens/sec: 19.19 (19.19 - 19.39)
Average Duration: 41.14s
Min/Max Avg Duration: 33.28s / 51.50s
Test Results:
- Fibonacci: ✅ 18/18 cases (100.0%)
- Binary Search: ❌ 6/18 cases (33.3%)
- Palindrome: ✅ 18/18 cases (100.0%)
- Anagram Check: ❌ 16/18 cases (88.9%)
deepseek-r1:7b
Overall Success Rate: 80.6% (58/72 cases)
Average Tokens/sec: 18.01 (18.01 - 19.07)
Average Duration: 336.87s
Min/Max Avg Duration: 78.71s / 336.87s
Test Results:
- Fibonacci: ❌ 10/18 cases (55.6%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ❌ 12/18 cases (66.7%)
- Anagram Check: ✅ 18/18 cases (100.0%)
deepseek-r1:1.5b-qwen-distill-q8_0
Overall Success Rate: 52.8% (38/72 cases)
Average Tokens/sec: 57.37 (53.88 - 59.60)
Average Duration: 137.59s
Min/Max Avg Duration: 41.38s / 371.13s
Test Results:
- Fibonacci: ❌ 11/18 cases (61.1%)
- Binary Search: ❌ 12/18 cases (66.7%)
- Palindrome: ❌ 6/18 cases (33.3%)
- Anagram Check: ❌ 9/18 cases (50.0%)
openthinker:7b
Overall Success Rate: 47.2% (34/72 cases)
Average Tokens/sec: 18.16 (17.98 - 18.29)
Average Duration: 263.00s
Min/Max Avg Duration: 168.91s / 302.79s
Test Results:
- Fibonacci: ❌ 0/18 cases (0.0%)
- Binary Search: ✅ 18/18 cases (100.0%)
- Palindrome: ❌ 12/18 cases (66.7%)
- Anagram Check: ❌ 4/18 cases (22.2%)
wizard-vicuna-uncensored:latest
Overall Success Rate: 9.7% (7/72 cases)
Average Tokens/sec: 22.01 (22.01 - 24.42)
Average Duration: 9.06s
Min/Max Avg Duration: 5.60s / 11.45s
Test Results:
- Fibonacci: ❌ 0/18 cases (0.0%)
- Binary Search: ❌ 0/18 cases (0.0%)
- Palindrome: ❌ 6/18 cases (33.3%)
- Anagram Check: ❌ 1/18 cases (5.6%)
mxbai-embed-large:latest
Overall Success Rate: 0.0% (0/72 cases)
Average Tokens/sec: 0.00 (0.00 - 0.00)
Average Duration: 0.00s
Min/Max Avg Duration: 0.00s / 0.00s
Test Results:
- Fibonacci: ❌ 0/18 cases (0.0%)
- Binary Search: ❌ 0/18 cases (0.0%)
- Palindrome: ❌ 0/18 cases (0.0%)
- Anagram Check: ❌ 0/18 cases (0.0%)
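Note on the figures above: in every entry, the "Overall Success Rate" equals the sum of the four per-test pass counts over the 72 total cases. The following is a minimal sketch of that arithmetic, not the benchmark tool's actual code; the function name and data layout are hypothetical, and only the counts are taken from the deepseek-r1:14b entry.

```python
# Hypothetical helper: recomputes an "Overall Success Rate" line from the
# per-test pass counts. This is an illustration, not the benchmark's own code.
def overall_success_rate(per_test: dict[str, tuple[int, int]]) -> tuple[int, int, float]:
    """Return (passed, total, percent) aggregated over all tests."""
    passed = sum(p for p, _ in per_test.values())
    total = sum(t for _, t in per_test.values())
    # Guard against an empty result set to avoid dividing by zero.
    rate = round(100.0 * passed / total, 1) if total else 0.0
    return passed, total, rate


# Counts copied from the deepseek-r1:14b entry in the leaderboard.
deepseek_r1_14b = {
    "Fibonacci": (18, 18),
    "Binary Search": (18, 18),
    "Palindrome": (16, 18),
    "Anagram Check": (18, 18),
}

passed, total, rate = overall_success_rate(deepseek_r1_14b)
print(f"Overall Success Rate: {rate}% ({passed}/{total} cases)")
# prints: Overall Success Rate: 97.2% (70/72 cases)
```

The same check can be applied to any other entry by swapping in its four per-test counts.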