Benchmark Run: 20250303_174821 Server: http://localhost:11434 CPU Information: python_version: 3.10.16.final.0 (64 bit) cpuinfo_version: [9, 0, 0] cpuinfo_version_string: 9.0.0 arch: ARM_8 bits: 64 count: 10 arch_string_raw: arm64 brand_raw: Apple M1 Pro Benchmark Results: 🏆 Final Model Leaderboard: qwen2.5-coder:7b-instruct-q4_K_M Overall Success Rate: 100.0% (72/72 cases) Average Tokens/sec: 19.33 (18.75 - 19.58) Average Duration: 17.32s Min/Max Avg Duration: 8.67s / 17.99s Test Results: - Fibonacci: ✅ 18/18 cases (100.0%) - Binary Search: ✅ 18/18 cases (100.0%) - Palindrome: ✅ 18/18 cases (100.0%) - Anagram Check: ✅ 18/18 cases (100.0%) falcon3:10b Overall Success Rate: 100.0% (72/72 cases) Average Tokens/sec: 13.21 (12.53 - 13.31) Average Duration: 13.46s Min/Max Avg Duration: 6.76s / 13.46s Test Results: - Fibonacci: ✅ 18/18 cases (100.0%) - Binary Search: ✅ 18/18 cases (100.0%) - Palindrome: ✅ 18/18 cases (100.0%) - Anagram Check: ✅ 18/18 cases (100.0%) qwen2.5:14b Overall Success Rate: 100.0% (72/72 cases) Average Tokens/sec: 9.78 (9.78 - 9.88) Average Duration: 35.25s Min/Max Avg Duration: 30.09s / 35.25s Test Results: - Fibonacci: ✅ 18/18 cases (100.0%) - Binary Search: ✅ 18/18 cases (100.0%) - Palindrome: ✅ 18/18 cases (100.0%) - Anagram Check: ✅ 18/18 cases (100.0%) qwen2.5-coder:14b-instruct-q4_K_M Overall Success Rate: 100.0% (72/72 cases) Average Tokens/sec: 9.68 (9.65 - 9.88) Average Duration: 37.18s Min/Max Avg Duration: 23.06s / 37.18s Test Results: - Fibonacci: ✅ 18/18 cases (100.0%) - Binary Search: ✅ 18/18 cases (100.0%) - Palindrome: ✅ 18/18 cases (100.0%) - Anagram Check: ✅ 18/18 cases (100.0%) phi4:latest Overall Success Rate: 100.0% (72/72 cases) Average Tokens/sec: 9.01 (8.96 - 9.32) Average Duration: 23.44s Min/Max Avg Duration: 23.44s / 38.82s Test Results: - Fibonacci: ✅ 18/18 cases (100.0%) - Binary Search: ✅ 18/18 cases (100.0%) - Palindrome: ✅ 18/18 cases (100.0%) - Anagram Check: ✅ 18/18 cases (100.0%) deepseek-r1:14b Overall Success Rate: 97.2% (70/72 cases) Average Tokens/sec: 9.05 (8.90 - 9.38) Average Duration: 278.32s Min/Max Avg Duration: 174.30s / 482.10s Test Results: - Fibonacci: ✅ 18/18 cases (100.0%) - Binary Search: ✅ 18/18 cases (100.0%) - Palindrome: ❌ 16/18 cases (88.9%) - Anagram Check: ✅ 18/18 cases (100.0%) llama3.2-vision:11b-instruct-q4_K_M Overall Success Rate: 95.8% (69/72 cases) Average Tokens/sec: 15.68 (14.92 - 15.92) Average Duration: 22.33s Min/Max Avg Duration: 16.31s / 28.85s Test Results: - Fibonacci: ❌ 16/18 cases (88.9%) - Binary Search: ❌ 17/18 cases (94.4%) - Palindrome: ✅ 18/18 cases (100.0%) - Anagram Check: ✅ 18/18 cases (100.0%) llama3.2:3b Overall Success Rate: 94.4% (68/72 cases) Average Tokens/sec: 36.09 (30.85 - 37.53) Average Duration: 2.67s Min/Max Avg Duration: 1.04s / 2.76s Test Results: - Fibonacci: ❌ 14/18 cases (77.8%) - Binary Search: ✅ 18/18 cases (100.0%) - Palindrome: ✅ 18/18 cases (100.0%) - Anagram Check: ✅ 18/18 cases (100.0%) llama3.1:8b Overall Success Rate: 94.4% (68/72 cases) Average Tokens/sec: 17.92 (17.92 - 18.45) Average Duration: 18.04s Min/Max Avg Duration: 14.68s / 19.56s Test Results: - Fibonacci: ❌ 14/18 cases (77.8%) - Binary Search: ✅ 18/18 cases (100.0%) - Palindrome: ✅ 18/18 cases (100.0%) - Anagram Check: ✅ 18/18 cases (100.0%) hhao/qwen2.5-coder-tools:7b Overall Success Rate: 91.7% (66/72 cases) Average Tokens/sec: 17.75 (16.05 - 17.75) Average Duration: 9.35s Min/Max Avg Duration: 4.17s / 9.35s Test Results: - Fibonacci: ❌ 12/18 cases (66.7%) - Binary Search: ✅ 18/18 cases (100.0%) - Palindrome: ✅ 18/18 cases (100.0%) - Anagram Check: ✅ 18/18 cases (100.0%) Qwen2.5-Coder-7B-Instruct-s1k:latest Overall Success Rate: 88.9% (64/72 cases) Average Tokens/sec: 18.38 (18.38 - 18.94) Average Duration: 9.95s Min/Max Avg Duration: 9.06s / 12.91s Test Results: - Fibonacci: ❌ 16/18 cases (88.9%) - Binary Search: ❌ 12/18 cases (66.7%) - Palindrome: ✅ 18/18 cases (100.0%) - Anagram Check: ✅ 18/18 cases (100.0%) deepseek-r1:8b Overall Success Rate: 86.1% (62/72 cases) Average Tokens/sec: 17.43 (17.29 - 18.01) Average Duration: 168.97s Min/Max Avg Duration: 107.91s / 168.97s Test Results: - Fibonacci: ✅ 18/18 cases (100.0%) - Binary Search: ✅ 18/18 cases (100.0%) - Palindrome: ❌ 16/18 cases (88.9%) - Anagram Check: ❌ 10/18 cases (55.6%) llama3.2:1b-instruct-q4_K_M Overall Success Rate: 81.9% (59/72 cases) Average Tokens/sec: 88.24 (88.24 - 88.93) Average Duration: 3.64s Min/Max Avg Duration: 1.87s / 4.93s Test Results: - Fibonacci: ❌ 5/18 cases (27.8%) - Binary Search: ✅ 18/18 cases (100.0%) - Palindrome: ✅ 18/18 cases (100.0%) - Anagram Check: ✅ 18/18 cases (100.0%) samantha-mistral:latest Overall Success Rate: 80.6% (58/72 cases) Average Tokens/sec: 23.92 (23.91 - 24.79) Average Duration: 12.21s Min/Max Avg Duration: 7.59s / 12.21s Test Results: - Fibonacci: ❌ 8/18 cases (44.4%) - Binary Search: ✅ 18/18 cases (100.0%) - Palindrome: ❌ 16/18 cases (88.9%) - Anagram Check: ❌ 16/18 cases (88.9%) marco-o1:latest Overall Success Rate: 80.6% (58/72 cases) Average Tokens/sec: 19.19 (19.19 - 19.39) Average Duration: 41.14s Min/Max Avg Duration: 33.28s / 51.50s Test Results: - Fibonacci: ✅ 18/18 cases (100.0%) - Binary Search: ❌ 6/18 cases (33.3%) - Palindrome: ✅ 18/18 cases (100.0%) - Anagram Check: ❌ 16/18 cases (88.9%) deepseek-r1:7b Overall Success Rate: 80.6% (58/72 cases) Average Tokens/sec: 18.01 (18.01 - 19.07) Average Duration: 336.87s Min/Max Avg Duration: 78.71s / 336.87s Test Results: - Fibonacci: ❌ 10/18 cases (55.6%) - Binary Search: ✅ 18/18 cases (100.0%) - Palindrome: ❌ 12/18 cases (66.7%) - Anagram Check: ✅ 18/18 cases (100.0%) deepseek-r1:1.5b-qwen-distill-q8_0 Overall Success Rate: 52.8% (38/72 cases) Average Tokens/sec: 57.37 (53.88 - 59.60) Average Duration: 137.59s Min/Max Avg Duration: 41.38s / 371.13s Test Results: - Fibonacci: ❌ 11/18 cases (61.1%) - Binary Search: ❌ 12/18 cases (66.7%) - Palindrome: ❌ 6/18 cases (33.3%) - Anagram Check: ❌ 9/18 cases (50.0%) openthinker:7b Overall Success Rate: 47.2% (34/72 cases) Average Tokens/sec: 18.16 (17.98 - 18.29) Average Duration: 263.00s Min/Max Avg Duration: 168.91s / 302.79s Test Results: - Fibonacci: ❌ 0/18 cases (0.0%) - Binary Search: ✅ 18/18 cases (100.0%) - Palindrome: ❌ 12/18 cases (66.7%) - Anagram Check: ❌ 4/18 cases (22.2%) wizard-vicuna-uncensored:latest Overall Success Rate: 9.7% (7/72 cases) Average Tokens/sec: 22.01 (22.01 - 24.42) Average Duration: 9.06s Min/Max Avg Duration: 5.60s / 11.45s Test Results: - Fibonacci: ❌ 0/18 cases (0.0%) - Binary Search: ❌ 0/18 cases (0.0%) - Palindrome: ❌ 6/18 cases (33.3%) - Anagram Check: ❌ 1/18 cases (5.6%) mxbai-embed-large:latest Overall Success Rate: 0.0% (0/72 cases) Average Tokens/sec: 0.00 (0.00 - 0.00) Average Duration: 0.00s Min/Max Avg Duration: 0.00s / 0.00s Test Results: - Fibonacci: ❌ 0/18 cases (0.0%) - Binary Search: ❌ 0/18 cases (0.0%) - Palindrome: ❌ 0/18 cases (0.0%) - Anagram Check: ❌ 0/18 cases (0.0%)