NYCU-NLP-Lab/MathEDU

MathEDU

MathEDU: Feedback Generation on Problem-Solving Processes for Mathematical Learning Support

Requirements

Install the requirements with: pip install -r requirements.txt

Dataset

This dataset includes authentic student solutions and expert feedback annotations.

Data Structure

Each entry in the dataset represents a student's response to a specific problem, including the following fields:

  • id: A unique identifier for each entry, which can be mapped to a problem in MathQA.
  • student_id: The ID of the student who answered the problem.
  • student_answer: The student's final answer to the problem.
  • student_process: The student's problem-solving process in LaTeX format.
  • correct_or_not: Indicates whether the student's answer was correct or wrong.
  • the_reason_why_student_cant_solve_ch: The reason the student could not solve the problem, in Chinese.
  • the_reason_why_student_cant_solve_en: The reason the student could not solve the problem, in English.
  • teacher_review: A dictionary containing the teacher's feedback, including:
    • error_counts: The number of errors identified in the student's answer.
    • error: A list of errors, with details including:
      • error_type: The type of error (e.g., "Wrong mathematical operation/concept").
      • error_equation: The specific part of the solution where the error occurred.
      • teacher_advice_ch: Teacher's feedback in Chinese.
      • teacher_advice_en: Teacher's feedback in English.
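
The field list above can be turned into a small sanity check. The following is a minimal sketch, not part of the repository: the helper name missing_fields and the dotted-path output format are hypothetical, and only the field names come from the list above.

```python
from typing import Any

# Field names copied from the data structure description above.
TOP_LEVEL_FIELDS = (
    "id",
    "student_id",
    "student_answer",
    "student_process",
    "correct_or_not",
    "the_reason_why_student_cant_solve_ch",
    "the_reason_why_student_cant_solve_en",
    "teacher_review",
)
ERROR_FIELDS = (
    "error_type",
    "error_equation",
    "teacher_advice_ch",
    "teacher_advice_en",
)


def missing_fields(entry: dict[str, Any]) -> list[str]:
    """Return dotted paths of documented fields absent from `entry`.

    Hypothetical helper for illustration; the repository may validate
    entries differently or not at all.
    """
    missing = [name for name in TOP_LEVEL_FIELDS if name not in entry]
    review = entry.get("teacher_review")
    if not isinstance(review, dict):
        return missing
    for name in ("error_counts", "error"):
        if name not in review:
            missing.append(f"teacher_review.{name}")
    for i, err in enumerate(review.get("error", [])):
        for name in ERROR_FIELDS:
            if name not in err:
                missing.append(f"teacher_review.error[{i}].{name}")
    return missing
```

An empty list means the entry carries every documented field; the check says nothing about whether the values themselves are well-formed.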

Example

Here is an example of an entry in the dataset:

{
    "id": 9420,
    "student_id": 5,
    "student_answer": "3:5",
    "student_process": "ratio of de: bc equal to the ratio of the area, Ans: 3:5",
    "correct_or_not": "wrong",
    "the_reason_why_student_cant_solve_ch": "",
    "the_reason_why_student_cant_solve_en": "",
    "teacher_review": {
        "error_counts": 1,
        "error": [
            {
                "error_type": "Wrong mathematical operation/concept",
                "error_equation": "ratio of de: bc equal to the ratio of the area",
                "teacher_advice_ch": "觀念錯誤,還需考慮到兩者的高和三角形與梯形的面積公式不同的問題,由於de:bc=3:5因此兩者的高的比值為3:(5-3)=3:2,三角形ade的面積為3*3/2=9/2,而梯形debc的面積為(3+5)*2/2=8,因此面積比為(9/2)/8=9/16",
                "teacher_advice_en": "The concept is incorrect. You need to consider the different heights and the different area formulas for triangles and trapezoids. Since DE:BC = 3:5, the ratio of their heights is 3:(5-3) = 3:2. The area of triangle ADE is 3*3/2 = 9/2, and the area of trapezoid BCED is (3+5)*2/2 = 8. Therefore, the ratio of areas is (9/2)/8 = 9/16."
            }
        ]
    }
}
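
As a rough illustration of how such entries might be consumed, the sketch below tallies error types across a list of entries. It assumes the dataset is a JSON array of entries shaped like the example above; the on-disk file name and layout are not stated here, so the sample is embedded inline, and the advice strings are shortened placeholders rather than real annotations. The function name error_type_counts is hypothetical.

```python
import json
from collections import Counter

# Inline sample modeled on the example entry above.
# Assumption: the real dataset is a JSON array of such entries.
SAMPLE = """
[
  {
    "id": 9420,
    "student_id": 5,
    "student_answer": "3:5",
    "student_process": "ratio of de: bc equal to the ratio of the area, Ans: 3:5",
    "correct_or_not": "wrong",
    "the_reason_why_student_cant_solve_ch": "",
    "the_reason_why_student_cant_solve_en": "",
    "teacher_review": {
      "error_counts": 1,
      "error": [
        {
          "error_type": "Wrong mathematical operation/concept",
          "error_equation": "ratio of de: bc equal to the ratio of the area",
          "teacher_advice_ch": "(shortened placeholder)",
          "teacher_advice_en": "(shortened placeholder)"
        }
      ]
    }
  }
]
"""


def error_type_counts(entries: list[dict]) -> Counter:
    """Count how often each error_type appears across all entries."""
    counts: Counter = Counter()
    for entry in entries:
        for err in entry["teacher_review"]["error"]:
            counts[err["error_type"]] += 1
    return counts


entries = json.loads(SAMPLE)
for error_type, n in error_type_counts(entries).most_common():
    print(f"{n}\t{error_type}")
```

To run this against the real data, replace json.loads(SAMPLE) with a json.load call on the dataset file, adjusting for its actual layout.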

Run

  • To obtain the few-shot prompt grading results for Llama3 8B:
python llama3_8b_grading.py
  • To obtain the few-shot prompt grading results for Llama3 70B:
python llama3_70b_grading.py
  • To obtain the few-shot prompt grading results for GPT-3.5:
python gpt_3.5__grading.py
  • To obtain the few-shot prompt grading results for o1-mini:
python o1_mini_grading.py
  • To analyze the model's responses:
python response_analyze.py
  • To see the results of LLM ratings generated by GPT-4:
python gpt4_llm_rating.py
  • To create fine-tuned data:
python create_finetuned_data.py
  • To fine-tune the Llama3 8B model:
huggingface-cli login --token "your_hf_token"
# Edit finetune.yaml for your own setup.
# The leading "!" below is notebook syntax; drop it when running in a regular shell.
!ACCELERATE_USE_FSDP=1 FSDP_CPU_RAM_EFFICIENT_LOADING=1 torchrun --nproc_per_node=4 train.py --config finetune.yaml
  • To run inference with the fine-tuned model:
python inference.py --config finetune.yaml
  • Prompts used in this work: Prompts

Citation

@article{hsu2025mathedu,
  title={MathEDU: Towards Adaptive Feedback for Student Mathematical Problem-Solving},
  author={Hsu, Wei-Ling and Tang, Yu-Chien and Yen, An-Zi},
  journal={arXiv preprint arXiv:2505.18056},
  year={2025}
}
