70b analysis doc

This commit is contained in:
leduc 2025-03-15 12:33:21 +01:00
parent 81dc8bdcbe
commit cddbbf844d
3 changed files with 13 additions and 3 deletions

BIN
.DS_Store vendored

Binary file not shown.

View File

@ -78,13 +78,19 @@ The tool currently tests models on these coding challenges:
3. Initial syntax validation is performed
4. Code that fails validation is passed to Together API for advanced code analysis
5. Code that passes validation is executed and validated with given data and compared to expected results
6. Each test is run 4 times for consistency and only the last 3 results are used for metrics
### Test Validation
For each test case:
- Input values are provided to the function
- Output is compared with expected results
- Test results are marked as ✅ (pass) or ❌ (fail)
- The last 3 results of 4are used for metrics
- An online 70b model (via free Together API) : meta-llama/Llama-3.3-70B-Instruct-Turbo-Free is used to give structured explanations for failed code
### Metrics
- Success Rate: Percentage of successful test cases
- Tokens per Second (tk/sec): Number of tokens processed per second
- Response Time: Time taken to generate a response
Example test cases:
```plaintext

View File

@ -111,8 +111,12 @@ Model: {model}"""
analysis = response.choices[0].message.content
should_pass = "SHOULD PASS" in analysis.upper()
if verbose: print(f"\n{BLUE}[{model}] Together Analysis:{ENDC}")
if verbose: print(f"{GREEN if should_pass else RED}{analysis}{ENDC}")
if verbose:
print(f"\n{BLUE}{'='*50}{ENDC}")
print(f"{BLUE}[{model}] TOGETHER API CODE ANALYSIS:{ENDC}")
print(f"{BLUE}{'='*50}{ENDC}")
print(f"{GREEN if should_pass else RED}{analysis}{ENDC}")
print(f"{BLUE}{'='*50}{ENDC}")
return should_pass
except Exception as e:
print(f"\n{RED}Error getting Together API analysis: {e}{ENDC}")