add best of n sampling baselines + generate test baselines #7
prateekiiest wants to merge 11 commits into microsoft:main from
    def run_k_samples(prompt, llm_server, k, seed):
        outputs = []
        for i in range(k):
Do we want to parallelize this?
Alternatively, if we want to run generate test, the feedback from each iteration has to be carried into the next iteration. Can we add that feature here?
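To make the two options concrete, here is a minimal sketch of both, not the PR's implementation. It assumes a hypothetical `generate(prompt, seed)` callable standing in for whatever `llm_server` exposes, and a hypothetical `critic(output)` callable that returns feedback text or `None` when the output is accepted. Independent samples can run concurrently; the feedback loop cannot, because each attempt depends on the previous one.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional

# Hypothetical signature: (prompt, seed) -> completion text.
GenerateFn = Callable[[str, int], str]
# Hypothetical signature: (completion) -> feedback text, or None if accepted.
CriticFn = Callable[[str], Optional[str]]


def run_k_samples_parallel(prompt: str, generate: GenerateFn, k: int, seed: int,
                           max_workers: int = 4) -> List[str]:
    """Best of n: the k samples are independent, so they can run concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One distinct seed per sample; map() yields results in submission order.
        return list(pool.map(lambda i: generate(prompt, seed + i), range(k)))


def run_generate_test(prompt: str, generate: GenerateFn, critic: CriticFn,
                      k: int, seed: int) -> List[str]:
    """Generate test: each attempt sees the critic feedback on the previous one."""
    outputs: List[str] = []
    current_prompt = prompt
    for i in range(k):
        output = generate(current_prompt, seed + i)
        outputs.append(output)
        feedback = critic(output)
        if feedback is None:
            break  # Accepted, so stop before using all k attempts.
        # Assumed prompt layout: original task, last attempt, then feedback.
        current_prompt = (
            f"{prompt}\n\nPrevious attempt:\n{output}\n\nFeedback:\n{feedback}"
        )
    return outputs
```

A thread pool is used here on the assumption that the server call is I/O-bound; if `llm_server` already batches requests, a single batched call would likely be the better fit.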
    ):
        """Run up to K samples, evaluate with critic, and score with ground truth."""
        sample_results = []
        for i in range(k):
@microsoft-github-policy-service agree company="Microsoft"
Added Verina support and helper functions for evaluation.
Add utility functions for Verina benchmark tasks, including loading data, parsing Lean files, and rendering unit tests.
Added a custom JSON encoder to handle numpy types when dumping JSON.
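For readers unfamiliar with the pattern, a custom encoder of this kind usually overrides `json.JSONEncoder.default`. The sketch below is an illustration, not the code from this commit: the class name `ArrayLikeEncoder` is hypothetical, and instead of importing NumPy it duck-types on the `tolist()` method, which NumPy arrays and NumPy scalars both provide.

```python
import json


class ArrayLikeEncoder(json.JSONEncoder):
    """Hypothetical encoder: converts objects exposing tolist() to plain Python."""

    def default(self, obj):
        # NumPy arrays and scalars both expose tolist(), which returns
        # nested lists or a plain Python scalar respectively.
        tolist = getattr(obj, "tolist", None)
        if callable(tolist):
            return tolist()
        # Anything else falls through to the standard TypeError.
        return super().default(obj)
```

It would be passed to the standard library as `json.dumps(results, cls=ArrayLikeEncoder)`, where `results` is whatever structure is being dumped.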