[DBO] Compile non-dbo cudagraphs for shapes that are close to dbo_decode_token_threshold #27771
      
        
        
        
          
        
      
    
  
  
    
  
    
Purpose
When num_tokens is near dbo_decode_token_threshold, different ranks may make different microbatching decisions (some above the threshold, some below). Since all ranks must agree for DBO to work, any disagreement makes every rank fall back to non-DBO execution. To avoid running without cudagraphs in these mixed cases, this PR adds logic to compile cudagraphs for both microbatching modes.
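The agreement rule and the extra capture can be sketched roughly as follows. This is an illustrative model only, not the code in this PR: the function names, the `>=` comparison, and the choice to capture both modes for every size at or above the threshold are assumptions made for the example (the PR targets shapes "close to" the threshold, and how it picks them is not shown here).

```python
from typing import Iterable, Set, Tuple

# Hypothetical key: (num_tokens, uses_dbo). Not the real vLLM data structure.
GraphKey = Tuple[int, bool]


def rank_wants_dbo(num_tokens: int, threshold: int) -> bool:
    # Assumption: a rank opts into DBO when it is at or above the threshold.
    return num_tokens >= threshold


def all_ranks_agree_on_dbo(tokens_per_rank: Iterable[int], threshold: int) -> bool:
    # DBO only runs if every rank wants it; one dissenting rank forces
    # all ranks back to non-DBO execution.
    return all(rank_wants_dbo(n, threshold) for n in tokens_per_rank)


def capture_keys(capture_sizes: Iterable[int], threshold: int) -> Set[GraphKey]:
    # Every size gets a non-DBO graph. Sizes that could run DBO also get a
    # DBO graph, so a fallback at those sizes still hits a captured graph.
    keys: Set[GraphKey] = set()
    for size in capture_sizes:
        keys.add((size, False))
        if rank_wants_dbo(size, threshold):
            keys.add((size, True))
    return keys


def graph_key_for_rank(
    my_tokens: int, tokens_per_rank: Iterable[int], threshold: int
) -> GraphKey:
    # The mode is a global decision; the size is local to the rank.
    return (my_tokens, all_ranks_agree_on_dbo(tokens_per_rank, threshold))
```

With a threshold of 26 and ranks holding 32 and 25 tokens, the rank with 32 tokens wants DBO but must run non-DBO; because `(32, False)` is in the captured set, it can still replay a graph instead of running eagerly.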
Size before
Graph capturing finished in 33 secs, took 2.46 GiB
Size after
Graph capturing finished in 35 secs, took 2.52 GiB
Test Plan
To test, I ran lm_eval with Deepseek V2 Lite with DP=2 and dbo-decode-threshold=26. Ranks usually get 25-30 tokens in this scenario, so a threshold of 26 ensures some ranks land above it and some below, triggering the mixed-decision case. I added logging to the code and verified that we now run with non-DBO cudagraphs when one rank has 25 tokens. I've also included the lm_eval results.
Test Result