results consistency explanation
parent ee9e3d2a04 · commit 81dc8bdcbe
@@ -1,4 +1,4 @@
-# Codebench - Ollama Model Benchmark Tool
+# Codebench - Ollama Models Python Benchmark Tool
 
 A Python-based benchmarking tool for testing and comparing different Ollama models on coding tasks. This tool allows you to benchmark multiple Ollama models against common coding problems, measure their performance, and visualize the results.
 
@@ -76,6 +76,9 @@ The tool currently tests models on these coding challenges:
 1. Each model is prompted with specific coding tasks
 2. Generated code is extracted from the model's response
 3. Initial syntax validation is performed
+4. Code that fails validation is passed to the Together API for advanced code analysis
+5. Code that passes validation is executed with the given test data and its output is compared to the expected results
+6. Each test is run 4 times for consistency, and only the last 3 results are used for metrics
 
 ### Test Validation
 For each test case:
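
The extraction and syntax-check flow described in steps 2–3 can be pictured with a short Python sketch. This is a minimal illustration rather than Codebench's actual implementation: the function names (`extract_code`, `is_valid_python`) are hypothetical, the regex assumes models wrap code in Markdown fences, and the prompting call assumes the official `ollama` Python client is installed.

```python
import ast
import re

import ollama  # assumption: the official ollama Python client


def extract_code(response_text: str) -> str:
    """Step 2: pull the first fenced code block out of a model response.

    Falls back to the raw response if no fence is found.
    """
    match = re.search(r"```(?:python)?\s*\n(.*?)```", response_text, re.DOTALL)
    return match.group(1) if match else response_text


def is_valid_python(code: str) -> bool:
    """Step 3: cheap syntax validation before any execution is attempted."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False


# Step 1 (illustrative): prompt a model, then run the reply through the checks above.
response = ollama.chat(
    model="codellama",  # hypothetical model choice
    messages=[{"role": "user", "content": "Write a Python function fib(n) that returns the nth Fibonacci number."}],
)
code = extract_code(response["message"]["content"])
if is_valid_python(code):
    print("passes syntax check")
else:
    print("would be sent for advanced analysis (step 4)")
```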
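
Steps 5–6 (running validated code against test data, plus the 4-run / last-3-results consistency rule) could look roughly like the sketch below. Again, this is a hypothetical outline: `run_case` and `benchmark` are illustrative names, and a real harness would isolate `exec` in a sandbox or subprocess rather than running generated code in-process.

```python
import statistics
import time


def run_case(code: str, func_name: str, args: tuple, expected) -> bool:
    """Step 5: execute the generated code with the given data and compare to the expected result."""
    namespace: dict = {}
    exec(code, namespace)  # illustrative only; a real harness sandboxes generated code
    return namespace[func_name](*args) == expected


def benchmark(code: str, func_name: str, args: tuple, expected, runs: int = 4) -> dict:
    """Step 6: run the test 4 times; only the last 3 timings feed the metrics."""
    timings, outcomes = [], []
    for _ in range(runs):
        start = time.perf_counter()
        outcomes.append(run_case(code, func_name, args, expected))
        timings.append(time.perf_counter() - start)
    kept = timings[1:]  # drop the first run, keep the last 3 as described above
    return {
        "passed": all(outcomes),
        "mean_s": statistics.mean(kept),
        "stdev_s": statistics.stdev(kept),
    }


# Example: benchmark a generated fib() implementation against one test case.
sample = "def fib(n):\n    return n if n < 2 else fib(n - 1) + fib(n - 2)\n"
print(benchmark(sample, "fib", (10,), 55))
```

Discarding the first of the four runs keeps one-off startup effects out of the timing statistics, which is presumably why only the last three results count toward the metrics.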