Mismatch between num_generation and the number of model-generated responses (completions): the number of completions per sample always equals per_device_batch_size * gradient_acc_steps. #4346

@Haoyong23

Description

At runtime I have to set per_device_batch_size * gradient_acc_steps equal to num_generation. Otherwise the number of responses per sample (completions) is per_device_batch_size * gradient_acc_steps rather than num_generation, and the total number of steps changes.
For example, running on 4 GPUs with per_device_batch_size=1, gradient_acc_steps=8, and num_generation=8, each sample outputs 8 responses, and the total steps are Step=203num_generation/per_device_batch_size/gradient_acc_steps=15.

Keeping everything else constant and changing gradient_acc_steps to 4, 4 completions per sample are printed. That equals per_device_batch_size * gradient_acc_steps, not num_generation (8). The total steps also change from 15 to 30. The total number of samples processed remains unchanged, and I'm unsure how this affects training.

Changing gradient_acc_steps to 16 should in theory give total steps of 203num_generation/per_device_batch_size/gradient_acc_steps=7.5. However, actual runs show 6 total steps, and the number of completions per sample is still per_device_batch_size * gradient_acc_steps, i.e. 16.
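The three step counts quoted above (15, 30, and 7.5) are consistent with total steps being inversely proportional to gradient_acc_steps when everything else is fixed. The following is a minimal sketch of that arithmetic only. It assumes the proportionality holds, and `expected_total_steps` is a hypothetical helper, not a function from any library.

```python
from fractions import Fraction


def expected_total_steps(baseline_steps, baseline_grad_acc, new_grad_acc):
    """Scale a known step count to a new gradient_acc_steps value.

    Assumption: total steps are inversely proportional to gradient_acc_steps
    while every other setting stays the same.
    """
    return Fraction(baseline_steps * baseline_grad_acc, new_grad_acc)


# Baseline from the report: 15 total steps at gradient_acc_steps=8.
for grad_acc in (4, 8, 16):
    steps = expected_total_steps(15, 8, grad_acc)
    print(f"gradient_acc_steps={grad_acc}: expected steps = {float(steps)}")
```

For gradient_acc_steps=16 the result is fractional (7.5), so the trainer has to round it somehow. The sketch does not model that rounding and does not explain the observed value of 6.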

I looked for documentation, but it only states that (number of GPUs * per_device_batch_size * gradient_acc_steps) % num_generation == 0 must hold.
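The divisibility condition can be checked for the three configurations described in this issue. This is a rough sketch using the parameter names from the report. `check_generation_config` is a hypothetical helper, and the "prompts per global batch" figure assumes the global batch is split into groups of num_generation completions, which is an assumption rather than confirmed trainer behavior.

```python
def check_generation_config(num_gpus, per_device_batch_size,
                            gradient_acc_steps, num_generation):
    """Return (global_batch, divisible, prompts_per_global_batch).

    global_batch is the number of completions seen per optimizer step.
    prompts_per_global_batch is None when the condition is violated.
    """
    global_batch = num_gpus * per_device_batch_size * gradient_acc_steps
    divisible = global_batch % num_generation == 0
    # Assumption: each prompt contributes exactly num_generation completions.
    prompts = global_batch // num_generation if divisible else None
    return global_batch, divisible, prompts


# The three runs from the report: 4 GPUs, per_device_batch_size=1, num_generation=8.
for grad_acc in (8, 4, 16):
    print(grad_acc, check_generation_config(4, 1, grad_acc, 8))
```

All three runs satisfy the condition (global batches of 32, 16, and 64 are each divisible by 8), so the divisibility check alone does not distinguish the configuration that prints 8 completions per sample from the ones that print 4 or 16.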

Does anyone know why this happens? To make sure each prompt generates 8 responses for training, do I have to set per_device_batch_size=1, gradient_acc_steps=8, and num_generation=8? Does that configuration guarantee correct training? What are the limitations of other settings, or does it not matter? Is it enough to ensure only that (number of GPUs * per_device_batch_size * gradient_acc_steps) % num_generation == 0?

Labels: ❓ question (seeking clarification or more information), 🐛 bug (something isn't working), 🚀 deepspeed (related to deepspeed)
