
Problems encountered during code reproduction #2

@TimeTrapzz


Hello, thanks for sharing this excellent work! I am trying to reproduce the experiments in your paper, but I ran into some problems with the existing code. Specifically, when I used starcoder for code generation without modifying the prompt in the code, the generated content contained additional information, such as:

"generate_response": "#--------------------------------------------------\n#The below code fragment can be found in:\n#huggingface_diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_onnx_stable_diffusion.py\n#--------------------------------------------------\n#).pixel_values astype (image.dtype)\n##safety_checker does not support batched inputs yet\n#images, has_nsfw_concept = [], []\n#for i in range ("

These comment lines affect the final metrics. When testing the api_level python coarse2fine setting with starcoder, I only got the following metrics:
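One workaround I considered is dropping comment-only lines from the generation before scoring. The following is only a minimal sketch: `strip_comment_lines` is a hypothetical helper that is not part of the repo, and it assumes the extra lines are echoes of the retrieval prompt rather than part of the intended completion. It would also remove legitimate comments, so it may not match how the paper's numbers were produced.

```python
def strip_comment_lines(generated: str) -> str:
    """Hypothetical helper: drop blank lines and lines starting with '#'."""
    kept = [
        line
        for line in generated.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return "\n".join(kept)


# Illustrative input shaped like the response above (contents are made up).
sample = (
    "#--------------------------------------------------\n"
    "#The below code fragment can be found in:\n"
    "#some/path.py\n"
    "images = []\n"
)
cleaned = strip_comment_lines(sample)
print(cleaned)
```

If the ground truth itself can contain comments, a narrower filter that only removes the leading block of comment lines might be safer.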

EM: 0.132, ES: 0.28057119527892604

This is very different from the metrics reported in the paper. However, there is no such problem when using codegen-350M-mono, and the resulting metrics are:

EM: 0.3485, ES: 0.5977332666211187

The code I used to compute the metrics is as follows:

from utils.metrics import compute_batch_EM, compute_batch_ES

if __name__ == "__main__":
    ground_truth_file_path = "RepoEval-Updated/api_level.python.test.jsonl"
    generation_res_file_path = "generation_results/codegen-350M-mono/api_level.python.coarse2fine.10.0.retrieval.codegen-350M-mono.gen_res.jsonl"
    em = compute_batch_EM(ground_truth_file_path, generation_res_file_path)
    es = compute_batch_ES(ground_truth_file_path, generation_res_file_path)
    print(f"EM: {em}, ES: {es}")
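For context, this is how I understand the two metrics; it is a rough sketch under my own assumptions, not the repo's `compute_batch_EM` / `compute_batch_ES`. I assume EM is a whitespace-trimmed string comparison and ES is 1 minus the Levenshtein distance divided by the longer string's length. The function names below are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_similarity(pred: str, target: str) -> float:
    """Assumed ES definition: 1 - distance / length of the longer string."""
    longest = max(len(pred), len(target))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(pred, target) / longest


def exact_match(pred: str, target: str) -> bool:
    """Assumed EM definition: equality after trimming surrounding whitespace."""
    return pred.strip() == target.strip()


print(exact_match(" x = 1 ", "x = 1"), edit_similarity("abcd", "abcf"))
```

Under these definitions, a generation that begins with several echoed comment lines would fail EM outright and would lose most of its ES, which is consistent with the gap I see for starcoder.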

May I ask whether the prompt construction used for the paper is exactly the same as the one in the repo, and whether the LLM sampling parameters differ from those in the code? Thank you!
