This is our attempt to implement SCoRe according to google's SCoRe paper.
pip install -r requirements.txt
''' huggingface-cli login
Token : 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 '''
Use this command
''' python run.py --task MATH --model_variant meta-llama/Llama-3.2-3B-Instruct --data_path ./data --output_dir ./outputs --mixed_precision --no_bleu --no_rouge '''
https://github.com/sanowl/Self-Correcting-LLM--Reinforcement-Learning-