MathEDU: Feedback Generation on Problem-Solving Processes for Mathematical Learning Support
Install requirements using pip install -r requirements.txt
This dataset includes authentic student solutions and expert feedback annotations.
Each entry in the dataset represents a student's response to a specific problem, including the following fields:
- id: A unique identifier for each entry, which can be mapped to a problem in MathQA.
- student_id: The ID of the student who answered the problem.
- student_answer: The student's final answer to the problem.
- student_process: The student's problem-solving process in LaTeX format.
- correct_or_not: Indicates whether the student's answer was correct or wrong.
- the_reason_why_student_cant_solve_ch: The reason the student could not solve the problem, in Chinese.
- the_reason_why_student_cant_solve_en: The reason the student could not solve the problem, in English.
- teacher_review: A dictionary containing the teacher's feedback, including:
  - error_counts: The number of errors identified in the student's answer.
  - error: A list of errors, with details including:
    - error_type: The type of error (e.g., "Wrong mathematical operation/concept").
    - error_equation: The specific part of the solution where the error occurred.
    - teacher_advice_ch: Teacher's feedback in Chinese.
    - teacher_advice_en: Teacher's feedback in English.
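The field list above can be mirrored as Python type hints. The sketch below is only an illustration, not code from this repository: the value types are inferred from the sample entry (for instance, that `id` and `student_id` are integers), `check_entry` is a hypothetical helper, and the rule that `error_counts` equals the length of `error` is an assumption.

```python
from typing import List, TypedDict


class ErrorDetail(TypedDict):
    error_type: str
    error_equation: str
    teacher_advice_ch: str
    teacher_advice_en: str


class TeacherReview(TypedDict):
    error_counts: int
    error: List[ErrorDetail]


class Entry(TypedDict):
    id: int
    student_id: int
    student_answer: str
    student_process: str
    correct_or_not: str
    the_reason_why_student_cant_solve_ch: str
    the_reason_why_student_cant_solve_en: str
    teacher_review: TeacherReview


def check_entry(entry: dict) -> List[str]:
    """Hypothetical helper: return a list of problems; empty means the entry looks consistent."""
    problems = []
    # Every top-level field named in the README should be present.
    missing = [key for key in Entry.__annotations__ if key not in entry]
    if missing:
        problems.append(f"missing fields: {missing}")
        return problems
    review = entry["teacher_review"]
    # Assumption: error_counts is expected to equal the number of listed errors.
    if review.get("error_counts") != len(review.get("error", [])):
        problems.append("error_counts does not match the length of error")
    return problems
```

Calling `check_entry` on each record before grading or fine-tuning is one way to catch malformed entries early.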
Here is an example of an entry in the dataset:
{
  "id": 9420,
  "student_id": 5,
  "student_answer": "3:5",
  "student_process": "ratio of de: bc equal to the ratio of the area, Ans: 3:5",
  "correct_or_not": "wrong",
  "the_reason_why_student_cant_solve_ch": "",
  "the_reason_why_student_cant_solve_en": "",
  "teacher_review": {
    "error_counts": 1,
    "error": [
      {
        "error_type": "Wrong mathematical operation/concept",
        "error_equation": "ratio of de: bc equal to the ratio of the area",
        "teacher_advice_ch": "觀念錯誤,還需考慮到兩者的高和三角形與梯形的面積公式不同的問題,由於de:bc=3:5因此兩者的高的比值為3:(5-3)=3:2,三角形ade的面積為3*3/2=9/2,而梯形debc的面積為(3+5)*2/2=8,因此面積比為(9/2)/8=9/16",
        "teacher_advice_en": "The concept is incorrect. You need to consider the different heights and the different area formulas for triangles and trapezoids. Since DE:BC = 3:5, the ratio of their heights is 3:(5-3) = 3:2. The area of triangle ADE is 3*3/2 = 9/2, and the area of trapezoid BCED is (3+5)*2/2 = 8. Therefore, the ratio of areas is (9/2)/8 = 9/16."
      }
    ]
  }
}
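An entry like the one above can be read with the standard `json` module. The following is a minimal sketch under stated assumptions: the data is taken to be a JSON array of entries (the README does not name the dataset file, so an inline string stands in for it), the sample is abbreviated, and `summarize` is a hypothetical helper that tallies error types over entries marked "wrong".

```python
import json
from collections import Counter

# Abbreviated copy of the sample entry; in practice this text would come
# from the dataset file, whose name is not given here.
raw = """
[
  {
    "id": 9420,
    "student_id": 5,
    "student_answer": "3:5",
    "correct_or_not": "wrong",
    "teacher_review": {
      "error_counts": 1,
      "error": [
        {
          "error_type": "Wrong mathematical operation/concept",
          "error_equation": "ratio of de: bc equal to the ratio of the area",
          "teacher_advice_en": "The concept is incorrect."
        }
      ]
    }
  }
]
"""


def summarize(entries):
    """Hypothetical helper: count error types across entries marked wrong."""
    counts = Counter()
    for entry in entries:
        if entry["correct_or_not"] != "wrong":
            continue
        for err in entry["teacher_review"]["error"]:
            counts[err["error_type"]] += 1
    return counts


entries = json.loads(raw)
error_type_counts = summarize(entries)
for error_type, n in error_type_counts.items():
    print(f"{error_type}: {n}")
```

The same loop gives access to `error_equation` and the teacher advice fields when a per-error view is needed rather than a tally.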
- To obtain the few-shot prompt grading results for Llama3 8B:
python llama3_8b_grading.py
- To obtain the few-shot prompt grading results for Llama3 70B:
python llama3_70b_grading.py
- To obtain the few-shot prompt grading results for GPT-3.5:
python gpt_3.5__grading.py
- To obtain the few-shot prompt grading results for o1-mini:
python o1_mini_grading.py
- To view the analysis of the model's responses:
python response_analyze.py
- To see the results of LLM ratings generated by GPT-4:
python gpt4_llm_rating.py
- To create fine-tuned data:
python create_finetuned_data.py
- To fine-tune the Llama3 8B model:
huggingface-cli login --token "your_hf_token"
# edit your own finetune.yaml
ACCELERATE_USE_FSDP=1 FSDP_CPU_RAM_EFFICIENT_LOADING=1 torchrun --nproc_per_node=4 train.py --config finetune.yaml
- To run inference with the fine-tuned model:
python inference.py --config finetune.yaml
- Prompts used in this work: Prompts
@article{hsu2025mathedu,
title={MathEDU: Towards Adaptive Feedback for Student Mathematical Problem-Solving},
author={Hsu, Wei-Ling and Tang, Yu-Chien and Yen, An-Zi},
journal={arXiv preprint arXiv:2505.18056},
year={2025}
}