
Conversation

@behroozazarkhalili
Collaborator

Summary

This PR addresses issue #4379 by expanding the Training Customization documentation section with 5 new comprehensive examples rather than removing it.

Resolves #4379

Changes Made

New Examples Added (5):

  1. Custom Callbacks - Shows how to add custom callbacks for logging, monitoring, or early stopping (see the first sketch after this list)
  2. Custom Evaluation Metrics - Demonstrates defining custom metrics to track during training (second sketch below)
  3. Mixed Precision Training - Explains bf16/fp16 usage for speed and memory optimization (sketched in the follow-up comment below)
  4. Gradient Accumulation - Shows how to simulate larger batch sizes with limited GPU memory (also sketched in the follow-up comment below)
  5. Custom Data Collator - Demonstrates custom data preprocessing and padding strategies (third sketch below)
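
To make these concrete: a minimal sketch of example 1, assuming `model`, `training_args`, `train_dataset`, and `tokenizer` are defined as in the existing DPOTrainer examples (the callback class and its log format are illustrative, not taken from the docs):

```python
from transformers import TrainerCallback
from trl import DPOTrainer

class LossLoggingCallback(TrainerCallback):
    """Print the training loss each time the trainer logs metrics."""

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs is not None and "loss" in logs:
            print(f"step {state.global_step}: loss = {logs['loss']:.4f}")

# model, training_args, train_dataset, and tokenizer are assumed to be
# defined as in the surrounding documentation examples
trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
    callbacks=[LossLoggingCallback()],
)
trainer.train()
```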
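For example 2, a sketch of the standard `compute_metrics` hook from `transformers.Trainer`, assuming the TRL trainers forward it through as the "most/all trainers" note below suggests (the metric itself is purely illustrative):

```python
import numpy as np

def compute_metrics(eval_pred):
    # eval_pred is a transformers.EvalPrediction; what .predictions and
    # .label_ids contain depends on the trainer in use
    predictions = eval_pred.predictions
    return {"mean_prediction": float(np.mean(predictions))}

# passed at construction time, alongside the usual arguments
trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    processing_class=tokenizer,
    compute_metrics=compute_metrics,
)
```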
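And for example 5, passing a custom collator to the trainer; the import path reflects the DataCollatorForPreference correction noted under Verification below, though the exact location may differ across TRL versions:

```python
from trl.trainer.dpo_trainer import DataCollatorForPreference

# pad with the tokenizer's pad token; a custom padding or preprocessing
# strategy would go in a subclass or a replacement collator
collator = DataCollatorForPreference(pad_token_id=tokenizer.pad_token_id)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
    data_collator=collator,
)
```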

Documentation Improvements:

  • Updated introduction for better clarity and consistency
  • All new examples follow the same pattern as the existing ones
  • All code examples verified against the codebase
  • Proper imports and configuration options validated

Statistics

  • Original examples: 5
  • New examples: 5
  • Total examples: 10 (doubled!)
  • Lines added: ~150

Verification

✅ All imports verified against codebase
✅ All config options verified in DPOConfig
✅ DataCollatorForPreference import path corrected
✅ Consistent code style with existing examples
✅ Examples apply to most/all trainers as stated

Test Plan

  • Verified all imports exist in the codebase
  • Validated config parameters against DPOConfig
  • Ensured consistent formatting with existing examples
  • Checked that the examples follow the DPOTrainer pattern stated in the intro

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

behroozazarkhalili added a commit that referenced this pull request Nov 3, 2025
- Clarify that bf16 is the default in mixed precision section
- Move gradient accumulation section to reducing memory guide
- Expand gradient accumulation examples to include DPO, SFT, and Reward trainers

Addresses review comments from @qgallouedec on PR #4427
@behroozazarkhalili
Collaborator Author

I've addressed both review comments:

  1. Mixed precision section: Added clarification that bf16=True is the default in TRL. Updated the example to show when and how to override the defaults for older GPUs or to disable mixed precision (see the first sketch after this list).

  2. Gradient accumulation section: Moved from the customization guide to the reducing memory usage guide (reducing_memory_usage.md), as it is primarily a memory optimization technique. Expanded the examples to cover the DPO, SFT, and Reward trainers (second sketch below).
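
To illustrate point 1: a minimal sketch of overriding the precision defaults in DPOConfig, where output_dir is a placeholder (bf16 and fp16 are standard TrainingArguments fields that DPOConfig inherits):

```python
from trl import DPOConfig

# bf16=True is the TRL default, so nothing is needed on bfloat16-capable GPUs.
# On older GPUs without bfloat16 support, fall back to fp16:
training_args = DPOConfig(
    output_dir="dpo-model",
    bf16=False,
    fp16=True,
)

# or disable mixed precision entirely:
training_args = DPOConfig(output_dir="dpo-model", bf16=False, fp16=False)
```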
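And for point 2, the gradient accumulation fields are shared TrainingArguments settings, so the sketch looks the same across DPOConfig, SFTConfig, and RewardConfig (the batch sizes here are illustrative):

```python
from trl import DPOConfig, SFTConfig, RewardConfig

# effective batch size = per_device_train_batch_size * gradient_accumulation_steps
dpo_args = DPOConfig(
    output_dir="dpo-model",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # effective per-device batch size of 16
)

# the same two fields apply unchanged to the other trainers
sft_args = SFTConfig(
    output_dir="sft-model",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
)
reward_args = RewardConfig(
    output_dir="reward-model",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
)
```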

Ready for re-review!



Development

Successfully merging this pull request may close these issues.

Remove or populate "Training customization"
