MathEDU

MathEDU: Feedback Generation on Problem-Solving Processes for Mathematical Learning Support

Requirements

Install requirements using pip install -r requirements.txt

Dataset

This dataset includes authentic student solutions and expert feedback annotations.

Data Structure

Each entry in the dataset represents a student's response to a specific problem, including the following fields:

id: A unique identifier for each entry, which can be mapped to a problem in MathQA.
student_id: The ID of the student who answered the problem.
student_answer: The student's final answer to the problem.
student_process: The student's problem-solving process in LATEX format.
correct_or_not: Indicates whether the student's answer was correct or wrong.
the_reason_why_student_cant_solve_ch: A field for explaining the reason why the student could not solve the problem, in Chinese.
the_reason_why_student_cant_solve_en: A field for explaining the reason why the student could not solve the problem, in English.
teacher_review: A dictionary containing the teacher's feedback, including:
- error_counts: The number of errors identified in the student's answer.
- error: A list of errors, with details including:
  - error_type: The type of error (e.g., "Wrong mathematical operation/concept").
  - error_equation: The specific part of the solution where the error occurred.
  - teacher_advice_ch: Teacher's feedback in Chinese.
  - teacher_advice_en: Teacher's feedback in English.

Example

Here is an example of an entry in the dataset:

{
    "id": 9420,
    "student_id": 5,
    "student_answer": "3:5",
    "student_process": "ratio of de: bc equal to the ratio of the area, Ans: 3:5",
    "correct_or_not": "wrong",
    "the_reason_why_student_cant_solve_ch": "",
    "the_reason_why_student_cant_solve_en": "",
    "teacher_review": {
        "error_counts": 1,
        "error": [
            {
                "error_type": "Wrong mathematical operation/concept",
                "error_equation": "ratio of de: bc equal to the ratio of the area",
                "teacher_advice_ch": "觀念錯誤，還需考慮到兩者的高和三角形與梯形的面積公式不同的問題，由於de:bc=3:5因此兩者的高的比值為3:(5-3)=3:2，三角形ade的面積為3*3/2=9/2，而梯形debc的面積為(3+5)*2/2=8，因此面積比為(9/2)/8=9/16",
                "teacher_advice_en": "The concept is incorrect. You need to consider the different heights and the different area formulas for triangles and trapezoids. Since DE:BC = 3:5, the ratio of their heights is 3:(5-3) = 3:2. The area of triangle ADE is 3*3/2 = 9/2, and the area of trapezoid BCED is (3+5)*2/2 = 8. Therefore, the ratio of areas is (9/2)/8 = 9/16."
            }
        ]
    }
}

Run

To obtain the few-shot prompt grading results for Llama3 8B:

python llama3_8b_grading.py

To obtain the few-shot prompt grading results for Llama3 70B:

python llama3_70b_grading.py

To obtain the few-shot prompt grading results for GPT-3.5:

python gpt_3.5__grading.py

To obtain the few-shot prompt grading results for o1-mini:

python o1_mini_grading.py

To Observe the analysis of the model's responses:

python response_analyze.py

To see the results of LLM ratings generated by GPT-4:

python gpt4_llm_rating.py

To create fine-tuned data:

python create_finetuned_data.py

To fine-tune the Llama3 8B model:

huggingface-cli login –token "your_hf_token"
#edit your own finetune.yaml
!ACCELERATE_USE_FSDP=1 FSDP_CPU_RAM_EFFICIENT_LOADING=1 torchrun --nproc_per_node=4 train.py --config finetune.yaml

To inference the fine-tuned model:

python inference.py –config finetune.yaml

Prompts used in this work: Prompts

Citation

@article{hsu2025mathedu,
  title={MathEDU: Towards Adaptive Feedback for Student Mathematical Problem-Solving},
  author={Hsu, Wei-Ling and Tang, Yu-Chien and Yen, An-Zi},
  journal={arXiv preprint arXiv:2505.18056},
  year={2025}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MathEDU

Requirements

Dataset

Data Structure

Example

Run

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
dataset		dataset
Prompts.pdf		Prompts.pdf
README.md		README.md
create_finetuned_data.py		create_finetuned_data.py
finetune.yaml		finetune.yaml
gpt4_llm_rating.py		gpt4_llm_rating.py
gpt_3.5_grading.py		gpt_3.5_grading.py
inference.py		inference.py
llama3_70b_grading.py		llama3_70b_grading.py
llama3_8b_grading.py		llama3_8b_grading.py
o1_mini_grading.py		o1_mini_grading.py
requirement.txt		requirement.txt
response_analyze.py		response_analyze.py
train.py		train.py

Folders and files

Latest commit

History

Repository files navigation

MathEDU

Requirements

Dataset

Data Structure

Example

Run

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages