# Codebench - Ollama Model Benchmark Tool
A Python-based benchmarking tool for testing and comparing different Ollama models on coding tasks.
## Features
- Test multiple Ollama models against common coding problems
- Measure performance metrics (tokens/sec, response time)
- Track success rates across different coding challenges
- Support for local and remote Ollama servers
- Detailed test results and leaderboard generation
- CPU information tracking for benchmarks
## Prerequisites
- Python 3.8+
- Ollama server (local or remote)
- Together API key (optional, for advanced code analysis)
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/codebench.git
   cd codebench
   ```

2. Install the required packages:

   ```bash
   pip install -r requirements.txt
   ```

3. (Optional) Set up the Together API key:

   ```bash
   export TOGETHER_API_KEY='your_api_key_here'
   ```
## Usage

Basic usage:

```bash
python3 main.py
```

Available options:

```bash
python main.py --server [local|z60] --model [model_name] --number [count|all] --verbose
```

Arguments:

- `--server`: Choose the Ollama server (default: `local`)
- `--model`: Test a specific model only
- `--number`: Number of models to test, or `all`
- `--verbose`: Enable detailed output
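The flags above map naturally onto Python's `argparse`. The sketch below shows one plausible wiring; the actual option handling in `main.py` may differ.

```python
import argparse

def parse_args(argv=None):
    # Hypothetical sketch of the CLI described above, not the tool's exact code.
    parser = argparse.ArgumentParser(
        description="Benchmark Ollama models on coding tasks"
    )
    parser.add_argument("--server", choices=["local", "z60"], default="local",
                        help="Ollama server to use (default: local)")
    parser.add_argument("--model", default=None,
                        help="test a specific model only")
    parser.add_argument("--number", default="all",
                        help="number of models to test, or 'all'")
    parser.add_argument("--verbose", action="store_true",
                        help="enable detailed output")
    return parser.parse_args(argv)
```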
## Supported Tests
The tool currently tests models on these coding challenges:
- Fibonacci Sequence
- Binary Search
- Palindrome Check
- Anagram Detection
## Test Process & Validation

### Code Generation

- Each model is prompted with specific coding tasks
- Generated code is extracted from the model's response
- Initial syntax validation is performed
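The extraction and syntax-check steps can be sketched as follows. This is a simplified illustration (real responses may contain several code blocks, surrounding prose, or no fences at all); `extract_code` and `is_valid_syntax` are hypothetical names, not the tool's actual API.

```python
import re

# The triple backtick is built indirectly so this example renders cleanly.
FENCE = chr(96) * 3

def extract_code(response: str) -> str:
    """Pull the first fenced code block out of a model response."""
    pattern = FENCE + r"(?:python)?\s*\n(.*?)" + FENCE
    match = re.search(pattern, response, re.DOTALL)
    if match:
        return match.group(1).strip()
    return response.strip()  # fall back to treating the whole reply as code

def is_valid_syntax(code: str) -> bool:
    """Initial syntax check: compile the source without executing it."""
    try:
        compile(code, "<generated>", "exec")
        return True
    except SyntaxError:
        return False
```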
### Test Validation

For each test case:

- Input values are provided to the function
- Output is compared with the expected result
- Test results are marked as ✅ (pass) or ❌ (fail)
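The pass/fail loop above can be sketched like this (a minimal illustration; the real harness would also guard against hangs and crashes in generated code):

```python
def run_test_cases(func, cases):
    """Run a generated function against (args, expected) pairs."""
    results = []
    for args, expected in cases:
        try:
            results.append(func(*args) == expected)
        except Exception:
            results.append(False)  # a crash counts as a failed case
    return results

# Example with an iterative fibonacci candidate:
def fib(n):
    if n < 0:
        return -1
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

cases = [((6,), 8), ((0,), 0), ((-1,), -1)]
print(run_test_cases(fib, cases))  # [True, True, True]
```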
Example test cases:

Fibonacci:

- Input: `6` → Expected: `8`
- Input: `0` → Expected: `0`
- Input: `-1` → Expected: `-1`

Binary Search:

- Input: `([1, 2, 3, 4, 5], 3)` → Expected: `2`
- Input: `([], 1)` → Expected: `-1`
- Input: `([1], 1)` → Expected: `0`
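A model answer that satisfies the binary-search cases above might look like this (a reference sketch, not the tool's prompt or grading code):

```python
def binary_search(arr, target):
    """Classic iterative binary search: index of target, or -1 if absent."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

print(binary_search([1, 2, 3, 4, 5], 3))  # 2
print(binary_search([], 1))               # -1
print(binary_search([1], 1))              # 0
```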
## Output

Results are saved in the `benchmark_results` directory with the following naming convention:

```
[CPU_Model]_[Server_Address].json
```

Example: `Apple_M1_Pro_localhost_11434.json`
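One way to derive such a filename is shown below. This is only a sketch of the naming convention; the tool's actual sanitization rules may differ, and `results_filename` is a hypothetical helper.

```python
import re

def results_filename(cpu_model: str, server_address: str) -> str:
    """Build a results filename from the CPU model and server address."""
    raw = f"{cpu_model}_{server_address}"
    # Replace spaces, dots, colons, and slashes to keep the name filesystem-safe.
    return re.sub(r"[ .:/]+", "_", raw) + ".json"

print(results_filename("Apple M1 Pro", "localhost:11434"))
# Apple_M1_Pro_localhost_11434.json
```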
## Server Configuration

Default servers are configured in the code:

- Local: `http://localhost:11434`
- Z60: `http://192.168.196.60:11434`
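A simple mapping from the `--server` flag to a base URL could look like the following; this mirrors the table above but is a hypothetical sketch, and the actual structure in the code may differ.

```python
# Default servers, keyed by the value accepted by --server.
SERVERS = {
    "local": "http://localhost:11434",
    "z60": "http://192.168.196.60:11434",
}

def resolve_server(name: str) -> str:
    """Map a --server flag value to a base URL, failing loudly on typos."""
    try:
        return SERVERS[name]
    except KeyError:
        raise ValueError(f"unknown server {name!r}; choose from {sorted(SERVERS)}")
```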
## Example Output

```
🏆 Final Model Leaderboard:

codellama:13b
  Overall Success Rate: 95.8% (23/24 cases)
  Average Tokens/sec: 145.23
  Average Duration: 2.34s
  Test Results:
  - Fibonacci: ✅ 6/6 cases (100.0%)
  - Binary Search: ✅ 6/6 cases (100.0%)
```
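The overall success rate shown above is a straightforward rollup of per-test counts. The sketch below reproduces the 95.8% (23/24) figure from assumed per-test tallies; it illustrates the aggregation, not the tool's exact bookkeeping.

```python
def summarize(results):
    """Roll per-test (passed, total) counts up into leaderboard numbers."""
    passed = sum(p for p, _ in results.values())
    total = sum(t for _, t in results.values())
    rate = 100.0 * passed / total if total else 0.0
    return passed, total, rate

# Hypothetical per-test tallies consistent with the example leaderboard.
results = {
    "Fibonacci": (6, 6),
    "Binary Search": (6, 6),
    "Palindrome Check": (5, 6),
    "Anagram Detection": (6, 6),
}
passed, total, rate = summarize(results)
print(f"Overall Success Rate: {rate:.1f}% ({passed}/{total} cases)")
# Overall Success Rate: 95.8% (23/24 cases)
```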
## Contributing

Feel free to submit issues and enhancement requests!

## License

[Your chosen license]