Hello, thanks for sharing this excellent work! I am trying to reproduce the experiments from your paper, but I ran into some problems with the existing code. Specifically, when I used StarCoder for code generation without modifying the prompt in the code, the generated content contained some additional information, such as:
"generate_response": "#--------------------------------------------------\ n #The below code fragment can be found in:\ n #huggingface_diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_onnx_stable_diffusion.py\ n #--------------------------------------------------\ n #).pixel_values astype (image.dtype) \ n ##safety_checker does not support batched inputs yet\ n #images, has_nsfw_concept = [], []\ n #for i in range ("These annotations affect the final metrics. When testing'api-level '"python" coarse2fine "with starcoder, I only got the following metrics:
EM: 0.132, ES: 0.28057119527892604

This is very different from the metrics reported in the paper. However, there is no such problem when using codegen-350M-mono, where the resulting metrics are:
EM: 0.3485, ES: 0.5977332666211187

The code I used to compute the metrics is as follows:
from utils.metrics import compute_batch_EM, compute_batch_ES
if __name__ == "__main__":
    ground_truth_file_path = "RepoEval-Updated/api_level.python.test.jsonl"
    generation_res_file_path = "generation_results/codegen-350M-mono/api_level.python.coarse2fine.10.0.retrieval.codegen-350M-mono.gen_res.jsonl"
    em = compute_batch_EM(ground_truth_file_path, generation_res_file_path)
    es = compute_batch_ES(ground_truth_file_path, generation_res_file_path)
print(f"EM: {em}, ES: {es}")May I ask if the construction method of prompt when you implement the paper is exactly the same as that in the repo, and whether the various sampling parameters of llm are different, thank you!