
Codebench - Ollama Model Benchmark Tool

A Python-based benchmarking tool for testing and comparing different Ollama models on coding tasks.

Features

  • Test multiple Ollama models against common coding problems
  • Measure performance metrics (tokens/sec, response time)
  • Track success rates across different coding challenges
  • Support for local and remote Ollama servers
  • Detailed test results and leaderboard generation
  • CPU information tracking for benchmarks

Prerequisites

  • Python 3.8+
  • Ollama server (local or remote)
  • Together API key (optional, for advanced code analysis)

Installation

  1. Clone the repository:

git clone https://github.com/yourusername/codebench.git
cd codebench

  2. Install required packages:

pip install -r requirements.txt

  3. (Optional) Set up your Together API key:

export TOGETHER_API_KEY='your_api_key_here'

Usage

Basic usage:

python3 main.py

Available options:

python3 main.py --server [local|z60] --model [model_name] --number [count|all] --verbose

Arguments:

  • --server : Ollama server to use, local or z60 (default: local)
  • --model : Test only the named model
  • --number : Number of models to test, a count or all
  • --verbose : Enable detailed output
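A minimal sketch of how these options could be parsed with argparse (the flag names match the list above; the defaults and choices shown here are assumptions, not necessarily the tool's actual values):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Flags mirror the README; defaults here are illustrative assumptions.
    parser = argparse.ArgumentParser(
        description="Benchmark Ollama models on coding tasks")
    parser.add_argument("--server", choices=["local", "z60"], default="local",
                        help="Ollama server to benchmark against")
    parser.add_argument("--model", default=None,
                        help="test only this model (default: all models)")
    parser.add_argument("--number", default="all",
                        help="number of models to test, or 'all'")
    parser.add_argument("--verbose", action="store_true",
                        help="enable detailed output")
    return parser
```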

Supported Tests

The tool currently tests models on these coding challenges:

  1. Fibonacci Sequence
  2. Binary Search
  3. Palindrome Check
  4. Anagram Detection

Test Process & Validation

Code Generation

  1. Each model is prompted with specific coding tasks
  2. Generated code is extracted from the model's response
  3. Initial syntax validation is performed
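Steps 2 and 3 can be sketched as a small helper that pulls the first fenced code block out of a response and runs Python's own compiler over it as the initial syntax check (the actual extraction logic in the tool may differ):

```python
import re
from typing import Optional

TICKS = "`" * 3  # a markdown code fence
FENCE_RE = re.compile(TICKS + r"(?:python)?\s*\n(.*?)" + TICKS, re.DOTALL)

def extract_code(response: str) -> Optional[str]:
    """Return the first fenced code block (or the raw response if no
    fence is found), but only if it passes a syntax check."""
    match = FENCE_RE.search(response)
    code = match.group(1) if match else response
    try:
        compile(code, "<generated>", "exec")  # initial syntax validation
    except SyntaxError:
        return None
    return code
```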

Test Validation

For each test case:

  • Input values are provided to the function
  • Output is compared with expected results
  • Test results are marked as (pass) or (fail)

Example test cases:

Fibonacci:
- Input: 6      Expected: 8
- Input: 0      Expected: 0
- Input: -1     Expected: -1

Binary Search:
- Input: ([1,2,3,4,5], 3)    Expected: 2
- Input: ([], 1)             Expected: -1
- Input: ([1], 1)            Expected: 0
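The comparison step above can be sketched as a simple harness; the `fib` function below is an illustrative reference solution for the Fibonacci cases (the real benchmark runs model-generated code instead):

```python
def run_cases(func, cases):
    """Call func on each input and compare the result to the expected value.
    Multi-argument inputs are passed as tuples, e.g. ([1, 2, 3], 1)."""
    results = []
    for args, expected in cases:
        call_args = args if isinstance(args, tuple) else (args,)
        try:
            results.append(func(*call_args) == expected)
        except Exception:
            results.append(False)  # a crashing solution counts as a failed case
    return results

# Reference solution matching the Fibonacci cases listed above.
def fib(n):
    if n < 0:
        return -1
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```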

Output

Results are saved in the benchmark_results directory with the following naming convention:

[CPU_Model]_[Server_Address].json

Example:

Apple_M1_Pro_localhost_11434.json
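A hedged sketch of how such a filename might be derived from the CPU model and server address (the exact sanitization rules are an assumption):

```python
import re

def result_filename(cpu_model: str, server_address: str) -> str:
    """Build '[CPU_Model]_[Server_Address].json', replacing characters
    that are unsafe in filenames with underscores."""
    raw = f"{cpu_model}_{server_address}"
    safe = re.sub(r"[^A-Za-z0-9]+", "_", raw).strip("_")
    return f"{safe}.json"
```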

Server Configuration

Default server addresses are defined directly in the code; edit them there to add or change Ollama endpoints.

Example Output

🏆 Final Model Leaderboard:

codellama:13b
   Overall Success Rate: 95.8% (23/24 cases)
   Average Tokens/sec: 145.23
   Average Duration: 2.34s
   Test Results:
   - Fibonacci: ✅ 6/6 cases (100.0%)
   - Binary Search: ✅ 6/6 cases (100.0%)
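The per-model summary lines above can be produced by a small aggregation step over the individual test runs; a sketch, where the dictionary keys (`passed`, `total`, `tokens_per_sec`, `duration_s`) are assumed names rather than the tool's actual schema:

```python
def summarize(runs):
    """Aggregate per-test runs into leaderboard metrics.
    Each run dict uses assumed keys: passed, total, tokens_per_sec, duration_s."""
    passed = sum(r["passed"] for r in runs)
    total = sum(r["total"] for r in runs)
    n = len(runs)
    return {
        "success_rate": 100.0 * passed / total if total else 0.0,
        "avg_tokens_per_sec": sum(r["tokens_per_sec"] for r in runs) / n,
        "avg_duration_s": sum(r["duration_s"] for r in runs) / n,
    }
```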

Contributing

Feel free to submit issues and enhancement requests!

License

[Your chosen license]