# Codebench - Ollama Model Benchmark Tool

A Python-based benchmarking tool for testing and comparing different Ollama models on coding tasks.

## Features

- Test multiple Ollama models against common coding problems
- Measure performance metrics (tokens/sec, response time)
- Track success rates across different coding challenges
- Support for local and remote Ollama servers
- Detailed test results and leaderboard generation
- CPU information tracking for benchmarks

## Prerequisites

- Python 3.8+
- Ollama server (local or remote)
- Together API key (optional, for advanced code analysis)

## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/codebench.git
   cd codebench
   ```

2. Install the required packages:

   ```bash
   pip install -r requirements.txt
   ```

3. (Optional) Set up the Together API:

   ```bash
   export TOGETHER_API_KEY='your_api_key_here'
   ```

## Usage

Basic usage:

```bash
python3 main.py
```

Available options:

```bash
python main.py --server [local|z60] --model [model_name] --number [count|all] --verbose
```

## Arguments

- `--server`: Choose the Ollama server (default: `local`)
- `--model`: Test a specific model only
- `--number`: Number of models to test (or `all`)
- `--verbose`: Enable detailed output

## Supported Tests

The tool currently tests models on these coding challenges:

1. Fibonacci Sequence
2. Binary Search
3. Palindrome Check
4. Anagram Detection

## Test Process & Validation

### Code Generation

1. Each model is prompted with specific coding tasks
2. Generated code is extracted from the model's response
3. Initial syntax validation is performed

### Test Validation

For each test case:

- Input values are provided to the function
- Output is compared with the expected results
- Test results are marked as ✅ (pass) or ❌ (fail)

Example test cases:

```plaintext
Fibonacci:
- Input: 6   Expected: 8
- Input: 0   Expected: 0
- Input: -1  Expected: -1

Binary Search:
- Input: ([1,2,3,4,5], 3)  Expected: 2
- Input: ([], 1)           Expected: -1
- Input: ([1], 1)          Expected: 0
```

## Output

Results are saved in the `benchmark_results` directory with the following naming convention:

```plaintext
[CPU_Model]_[Server_Address].json
```

Example:

```plaintext
Apple_M1_Pro_localhost_11434.json
```

## Server Configuration

Default servers are configured in the code:

- Local: http://localhost:11434
- Z60: http://192.168.196.60:11434

## Example Output

```plaintext
🏆 Final Model Leaderboard:

codellama:13b
  Overall Success Rate: 95.8% (23/24 cases)
  Average Tokens/sec: 145.23
  Average Duration: 2.34s

  Test Results:
  - Fibonacci: ✅ 6/6 cases (100.0%)
  - Binary Search: ✅ 6/6 cases (100.0%)
```

## Contributing

Feel free to submit issues and enhancement requests!

## License

[Your chosen license]
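## Appendix: Validation Sketch

The extraction, syntax-validation, and test-case steps described under "Test Process & Validation" can be sketched in Python as below. This is a minimal illustration, not the tool's actual implementation: `extract_code` and `run_test_cases` are hypothetical names, and it assumes the model's answer arrives as text containing a fenced code block.

```python
import ast

def extract_code(response: str) -> str:
    """Pull the first fenced code block out of a model response,
    falling back to the raw text if no fence is present."""
    if "```" in response:
        # Take the content between the first pair of fences and
        # drop an optional language tag on the opening line.
        block = response.split("```")[1]
        lines = block.splitlines()
        if lines and lines[0].strip() in ("python", "py"):
            lines = lines[1:]
        return "\n".join(lines)
    return response

def run_test_cases(code: str, func_name: str, cases):
    """Syntax-check the generated code, then call the candidate
    function on each (args, expected) pair, mirroring the steps above."""
    try:
        ast.parse(code)  # initial syntax validation
    except SyntaxError:
        return [(args, expected, False) for args, expected in cases]

    namespace = {}
    exec(code, namespace)  # load the candidate function
    func = namespace[func_name]

    results = []
    for args, expected in cases:
        try:
            actual = func(*args)
            results.append((args, expected, actual == expected))
        except Exception:
            results.append((args, expected, False))
    return results

# Example using the Fibonacci cases listed above:
sample = (
    "```python\n"
    "def fibonacci(n):\n"
    "    if n < 0:\n"
    "        return -1\n"
    "    a, b = 0, 1\n"
    "    for _ in range(n):\n"
    "        a, b = b, a + b\n"
    "    return a\n"
    "```"
)
cases = [((6,), 8), ((0,), 0), ((-1,), -1)]
for args, expected, ok in run_test_cases(extract_code(sample), "fibonacci", cases):
    print(f"{'✅' if ok else '❌'} fibonacci{args} expected {expected}")
```

Running untrusted, model-generated code with `exec` is only reasonable in a sandboxed or disposable environment; a real harness would also want per-case timeouts.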