Hello, thanks for sharing this excellent work! I am trying to reproduce the experiments from your paper, but I ran into some problems with the existing code. Specifically, when I used StarCoder for code generation without modifying the prompt in the code, the generated content contained some additional information, such as:
"generate_response": "#--------------------------------------------------\ n #The below code fragment can be found in:\ n #huggingface_diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_onnx_stable_diffusion.py\ n #--------------------------------------------------\ n #).pixel_values astype (image.dtype) \ n ##safety_checker does not support batched inputs yet\ n #images, has_nsfw_concept = [], []\ n #for i in range ("These annotations affect the final metrics. When testing'api-level '"python" coarse2fine "with starcoder, I only got the following metrics:
EM: 0.132, ES: 0.28057119527892604

This is very different from the metrics reported in the paper. However, there is no such problem when using codegen-350M-mono, where the resulting metrics are:
EM: 0.3485, ES: 0.5977332666211187

The code I used to compute the metrics is as follows:
from utils.metrics import compute_batch_EM, compute_batch_ES
if __name__ == "__main__":
    ground_truth_file_path = "RepoEval-Updated/api_level.python.test.jsonl"
    generation_res_file_path = "generation_results/codegen-350M-mono/api_level.python.coarse2fine.10.0.retrieval.codegen-350M-mono.gen_res.jsonl"
    em = compute_batch_EM(ground_truth_file_path, generation_res_file_path)
    es = compute_batch_ES(ground_truth_file_path, generation_res_file_path)
print(f"EM: {em}, ES: {es}")May I ask if the construction method of prompt when you implement the paper is exactly the same as that in the repo, and whether the various sampling parameters of llm are different, thank you!