At runtime, it seems I have to set per_device_batch_size * gradient_acc_steps equal to num_generation. Otherwise, the number of responses generated per sample (the printed Completions) ends up being per_device_batch_size * gradient_acc_steps rather than num_generation, and the total number of steps changes as well.
For example, running on 4 GPUs with per_device_batch_size=1, gradient_acc_steps=8, and num_generation=8, each sample produces 8 responses, and the total number of steps is Step = 20 * 3 * num_generation / (num_gpus * per_device_batch_size * gradient_acc_steps) = 480 / 32 = 15.
Keeping everything else constant, changing gradient_acc_steps to 4 results in 4 completions per sample being printed. This equals per_device_batch_size * gradient_acc_steps and no longer equals num_generation (8). However, the total steps change from 15 to 30. The total number of samples processed remains unchanged, and I'm unsure how this affects training.
Changing gradient_acc_steps to 16 should, by the same formula, give Step = 20 * 3 * num_generation / (num_gpus * per_device_batch_size * gradient_acc_steps) = 480 / 64 = 7.5 total steps. However, the actual run reports 6 total steps, while the number of completions is still per_device_batch_size * gradient_acc_steps, i.e. 16 per sample.
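To double-check my reading of the numbers, here is the step arithmetic I am assuming, written out as a small script (the 20 * 3 factor is the same one as in the step formula above, and the flooring at the end is only my guess at how the trainer rounds partial steps):

```python
import math

NUM_GPUS = 4
PER_DEVICE_BATCH_SIZE = 1
NUM_GENERATION = 8
DATASET_FACTOR = 20 * 3  # same 20 * 3 factor as in the step formula above

for grad_acc_steps in (4, 8, 16):
    # Completions printed per prompt seem to track the accumulated micro-batches,
    # not num_generation:
    completions_per_prompt = PER_DEVICE_BATCH_SIZE * grad_acc_steps
    # Total steps as I understand the formula:
    steps = (DATASET_FACTOR * NUM_GENERATION) / (
        NUM_GPUS * PER_DEVICE_BATCH_SIZE * grad_acc_steps
    )
    print(
        f"grad_acc_steps={grad_acc_steps:2d} -> "
        f"completions/prompt={completions_per_prompt:2d}, "
        f"steps={steps} (floor={math.floor(steps)})"
    )
```

For gradient_acc_steps=4 and 8 this matches what I observe (30 and 15 steps), but for 16 it predicts 7 (or 7.5 before rounding) while the trainer reports 6, which is part of what I am asking about.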
I looked for documentation, but it only says that (number_of_gpus * per_device_batch_size * gradient_acc_steps) must be divisible by num_generations.
Does anyone know why this happens? If I want each prompt to generate exactly 8 responses for training, do I have to set per_device_batch_size=1, gradient_acc_steps=8, and num_generation=8? Is that configuration guaranteed to be correct, and what are the drawbacks of other settings? Or does the completions count not matter, as long as (number_of_gpus * per_device_batch_size * gradient_acc_steps) is divisible by num_generation?
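For reference, this is a minimal sketch of the configuration I am trying to pin down, assuming the TRL GRPOTrainer API (as far as I know the actual parameter names there are per_device_train_batch_size, gradient_accumulation_steps, and num_generations; the model id, reward function, and toy dataset are just placeholders):

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy prompt-only dataset; in my real run this is my own data.
train_dataset = Dataset.from_dict({"prompt": ["Write a haiku about GPUs."] * 20})

def dummy_reward(completions, **kwargs):
    # Placeholder reward: longer completions score higher.
    return [float(len(c)) for c in completions]

config = GRPOConfig(
    output_dir="grpo-test",
    per_device_train_batch_size=1,   # per_device_batch_size in my question
    gradient_accumulation_steps=8,   # gradient_acc_steps
    num_generations=8,               # num_generation
    num_train_epochs=3,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",  # placeholder model id
    reward_funcs=dummy_reward,
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```

With 4 GPUs this gives number_of_gpus * per_device_train_batch_size * gradient_accumulation_steps = 32, which is divisible by num_generations = 8, so it satisfies the constraint I found; I just don't know whether satisfying that constraint alone is enough.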