
@LudWittg

The original regex didn’t correctly match newlines inside the reasoning and answer tags. As a result, soft_format_reward_func always returned 0, since strict_format_reward_func required exactly two newlines inside each tag.
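For readers who want to see the failure mode, here is a minimal sketch. It assumes the soft check uses a pattern of roughly this shape with `re.match`; the pattern, the 0.5 reward value, and the helper name `soft_format_score` are illustrative assumptions, not the exact code changed in this PR. In Python, `.` does not match `\n` unless `re.DOTALL` is set, so a multi-line completion never matches.

```python
import re

# Hypothetical pattern resembling a "soft" format check (an assumption,
# not the notebook's exact regex).
SOFT_PATTERN = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"


def soft_format_score(text: str, dotall: bool) -> float:
    """Return 0.5 if the text matches the soft pattern, else 0.0."""
    # Without re.DOTALL, "." stops at newlines, so tag contents that
    # span several lines can never be matched.
    flags = re.DOTALL if dotall else 0
    return 0.5 if re.match(SOFT_PATTERN, text, flags) else 0.0


# A typical completion with newlines inside both tags.
sample = (
    "<reasoning>\nstep one\nstep two\n</reasoning>\n"
    "<answer>\n42\n</answer>\n"
)

without_flag = soft_format_score(sample, dotall=False)
with_flag = soft_format_score(sample, dotall=True)
print(without_flag, with_flag)
```

Passing `flags=re.DOTALL` is one way to let the pattern cross line breaks; writing `[\s\S]*?` in place of `.*?` would be an alternative with the same effect.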

@Erland366
Collaborator

I moved this PR to #96 so that it is also up to date with the new notebook.

Will close this for now

@Erland366 closed this on Sep 1, 2025