SCoRe (Self-Correction via Reinforcement Learning)

This is our attempt to implement SCoRe as described in Google's SCoRe paper (Kumar et al., 2024, "Training Language Models to Self-Correct via Reinforcement Learning").
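SCoRe trains in two stages. In the second stage, the paper adds a progress bonus to the reward of the second attempt. The bonus pays more for fixing a wrong first answer than for repeating a correct one, and it penalizes breaking a correct answer. The sketch below shows that shaping as we read it from the paper. It assumes binary correctness rewards (1 for a correct final answer, 0 otherwise) and leaves out the KL penalties toward the base model that the full objective also includes. The function name and the `alpha` value in the example are illustrative assumptions and are not taken from `run.py`.

```python
def shaped_rewards(r1: float, r2: float, alpha: float) -> tuple[float, float]:
    """Return (first-attempt reward, shaped second-attempt reward).

    Sketch of SCoRe Stage II reward shaping: the second attempt receives its
    own reward plus a bonus alpha * (r2 - r1). The bonus is positive when the
    answer improves and negative when it gets worse. alpha is chosen by the
    caller (hypothetical value below), not read from this repository.
    """
    bonus = alpha * (r2 - r1)
    return r1, r2 + bonus


# Binary rewards: 1.0 = correct final answer, 0.0 = incorrect.
alpha = 10.0  # illustrative bonus weight (assumption)
for r1, r2 in [(0.0, 1.0), (1.0, 1.0), (1.0, 0.0), (0.0, 0.0)]:
    print((r1, r2), "->", shaped_rewards(r1, r2, alpha))
```

With a large `alpha`, an incorrect-to-correct episode earns far more than a correct-to-correct one. This is the incentive the paper uses to keep the model from learning to simply repeat its first answer.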

To run SCoRe:

Installation

```
pip install -r requirements.txt
```
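After installing, you can check that every package listed in `requirements.txt` has an installed distribution in the active environment by looking up its metadata. The helper below is a hypothetical sketch that uses only the standard library (`importlib.metadata`). It does not check version constraints, and it skips pip options and URL/VCS requirement lines.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def missing_requirements(path: str = "requirements.txt") -> list[str]:
    """Return requirement names from `path` that have no installed distribution.

    Hypothetical helper: version specifiers are ignored, and pip options
    (lines starting with '-') and URL/VCS requirements are skipped.
    """
    missing = []
    for raw in Path(path).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-") or "://" in line:
            continue
        # The distribution name ends at extras, a version specifier, a marker, or whitespace.
        name = re.split(r"[\[<>=!~;@\s]", line, maxsplit=1)[0]
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

Calling `missing_requirements()` from the repository root returns an empty list when everything is installed.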

Hugging Face setup

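```
huggingface-cli login
```

When prompted, paste your own Hugging Face access token (create one at https://huggingface.co/settings/tokens). Do not commit tokens to the repository.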
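In non-interactive environments, such as a remote GPU job, you can log in from Python with `huggingface_hub.login` instead. The sketch below assumes the token is stored in an environment variable named `HF_TOKEN`. The helper name is hypothetical. `meta-llama` checkpoints such as Llama-3.2-3B-Instruct are gated, so the account that owns the token must accept Meta's license on the model page before the download will succeed.

```python
import os


def hf_login_from_env(env_var: str = "HF_TOKEN") -> None:
    """Log in to the Hugging Face Hub using a token read from an environment variable.

    Hypothetical helper; HF_TOKEN is an assumed variable name.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(
            f"Set {env_var} to a Hugging Face access token "
            "(https://huggingface.co/settings/tokens)."
        )
    # Imported lazily so the missing-token error above does not require the package.
    from huggingface_hub import login

    login(token=token)
```

Recent versions of `huggingface_hub` also read `HF_TOKEN` from the environment on their own. In that case the explicit login call is optional.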

Run

Use this command:

```
python run.py --task MATH --model_variant meta-llama/Llama-3.2-3B-Instruct --data_path ./data --output_dir ./outputs --mixed_precision --no_bleu --no_rouge
```
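After a run, self-correction is typically summarized with the metrics reported in the SCoRe paper:

- Accuracy@t1: accuracy of the first attempt.
- Accuracy@t2: accuracy of the second attempt.
- Δ(t1, t2): the difference between the two.
- Δ^i→c: the fraction of problems that go from incorrect to correct between attempts.
- Δ^c→i: the fraction of problems that go from correct to incorrect between attempts.

The sketch below computes these from per-problem (first attempt correct, second attempt correct) pairs. That input format is an assumption; we have not matched it to the files `run.py` writes to `./outputs`.

```python
from typing import Iterable, Tuple


def self_correction_metrics(results: Iterable[Tuple[bool, bool]]) -> dict:
    """Summarize two-attempt results using the SCoRe paper's metrics.

    `results` holds one (first_correct, second_correct) pair per problem;
    this input format is an assumption, not the format written by run.py.
    """
    pairs = list(results)
    if not pairs:
        raise ValueError("need at least one (first, second) result")
    n = len(pairs)
    acc_t1 = sum(first for first, _ in pairs) / n
    acc_t2 = sum(second for _, second in pairs) / n
    # Fraction of problems fixed by the second attempt.
    inc_to_cor = sum(1 for first, second in pairs if not first and second) / n
    # Fraction of problems broken by the second attempt.
    cor_to_inc = sum(1 for first, second in pairs if first and not second) / n
    return {
        "accuracy@t1": acc_t1,
        "accuracy@t2": acc_t2,
        "delta(t1,t2)": acc_t2 - acc_t1,
        "delta_i->c": inc_to_cor,
        "delta_c->i": cor_to_inc,
    }


example = [(False, True), (True, True), (True, False), (False, False)]
print(self_correction_metrics(example))
```

Δ(t1, t2) always equals Δ^i→c minus Δ^c→i. A positive difference alone therefore does not show how many correct answers were broken along the way, so report both flip rates.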

References

https://github.com/sanowl/Self-Correcting-LLM--Reinforcement-Learning-
