@hodgesds hodgesds commented Nov 12, 2025

scx_p2dq: Add DHQ support and fix migration-disabled task errors

Integrate Double Helix Queue (DHQ) as an alternative to ATQ for LLC-aware task migration.

DHQ Integration:

  • Add --dhq-enabled flag to enable DHQ mode for LLC migration
  • Add --dhq-max-imbalance parameter (default: 3) to control strand balance
  • Create one DHQ per pair of LLCs in same NUMA node
  • Map each LLC to a specific strand (A or B) for cache affinity
  • Each CPU inherits strand from its LLC for proper load distribution
  • DHQ provides cache-aware migration with controlled cross-LLC movement
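
The pairing described above can be pictured with a small sketch. It assumes LLCs within a NUMA node are paired in index order, (0,1), (2,3), and so on, which the description does not state. Every name here (`sketch_map_llc`, `sketch_llc_map`, `SKETCH_STRAND_A`) is hypothetical and not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical illustration only: none of these names come from the patch. */
enum sketch_strand { SKETCH_STRAND_A = 0, SKETCH_STRAND_B = 1 };

struct sketch_llc_map {
	uint32_t dhq_index;        /* which per-pair queue this LLC shares */
	enum sketch_strand strand; /* which strand of that queue it owns */
};

/*
 * Map the n-th LLC inside one NUMA node to a (pair, strand) slot.
 * Assumption: LLCs are paired in index order, so even indices take
 * strand A and odd indices take strand B of the same queue.
 */
static struct sketch_llc_map sketch_map_llc(uint32_t llc_idx_in_node)
{
	struct sketch_llc_map m;

	m.dhq_index = llc_idx_in_node / 2;
	m.strand = (llc_idx_in_node % 2) ? SKETCH_STRAND_B : SKETCH_STRAND_A;
	return m;
}
```

A CPU would then take the strand of its LLC. The sketch does not cover a node with an odd number of LLCs, where the last LLC has no partner.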

Strand-Specific DHQ Operations:
Use scx_dhq_peek_strand() and scx_dhq_pop_strand() instead of generic operations to ensure CPUs only consume from their designated strand. This preserves cache locality and prevents load imbalance.
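
To illustrate why a strand-specific pop matters, here is a minimal userspace sketch of a two-strand FIFO in which a consumer can only drain its own strand. It is not the signature or behavior of scx_dhq_pop_strand(); the types and functions (`sketch_dhq`, `sketch_push`, `sketch_pop_strand`) are hypothetical, and locking and vtime ordering are left out.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CAP 8

/* One fixed-size FIFO ring per strand (hypothetical layout). */
struct sketch_strand_q {
	uint64_t items[SKETCH_CAP];
	uint32_t head, len;
};

struct sketch_dhq {
	struct sketch_strand_q strand[2];
};

/* Append a non-zero task id to one strand; returns -1 if it is full. */
static int sketch_push(struct sketch_dhq *q, int strand, uint64_t task_id)
{
	struct sketch_strand_q *s;

	assert(strand == 0 || strand == 1);
	s = &q->strand[strand];
	if (s->len == SKETCH_CAP)
		return -1;
	s->items[(s->head + s->len) % SKETCH_CAP] = task_id;
	s->len++;
	return 0;
}

/*
 * Pop only from the caller's own strand. Returns 0 when that strand
 * is empty, even if the other strand still holds tasks.
 */
static uint64_t sketch_pop_strand(struct sketch_dhq *q, int strand)
{
	struct sketch_strand_q *s;
	uint64_t task_id;

	assert(strand == 0 || strand == 1);
	s = &q->strand[strand];
	if (s->len == 0)
		return 0;
	task_id = s->items[s->head];
	s->head = (s->head + 1) % SKETCH_CAP;
	s->len--;
	return task_id;
}
```

In this model a CPU on strand B never sees a task queued on strand A, which is the locality property the paragraph above describes.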

Data Structure Changes:

  • Add mig_dhq and dhq_strand to cpu_ctx and llc_ctx
  • Add llc_pair_dhqs[] for shared DHQs between LLC pairs
  • Add llcs_per_node[] to track LLCs per NUMA node
  • Add P2DQ_ENQUEUE_PROMISE_DHQ_VTIME enqueue promise type
  • Add enqueue_promise_dhq struct for DHQ-specific metadata

Configuration:

  • p2dq_config.dhq_enabled: Enable DHQ mode
  • p2dq_config.dhq_max_imbalance: Control strand pairing (0 = unlimited)
  • Priority mode: lowest vtime wins across strands
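
A rough sketch of "lowest vtime wins across strands": compare the head vtime of each strand and pick the smaller one. The function name and the tie-break toward strand A are assumptions, and the plain `<` comparison ignores vtime wrap-around.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: choose which strand to dequeue from in
 * priority mode. Returns 0 for strand A, 1 for strand B, or -1
 * when both strands are empty. Ties go to strand A (assumption).
 */
static int sketch_pick_strand_priority(bool a_has_task, uint64_t a_head_vtime,
				       bool b_has_task, uint64_t b_head_vtime)
{
	if (!a_has_task && !b_has_task)
		return -1;
	if (!a_has_task)
		return 1;
	if (!b_has_task)
		return 0;
	return (b_head_vtime < a_head_vtime) ? 1 : 0;
}
```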

Build System:

  • Add lib/dhq.bpf.c to scx_p2dq and scx_chaos builds
  • Include lib/dhq.h in types.h

scx_chaos Compatibility:

  • Update enqueue promise handling to recognize DHQ type
  • Update the error message to state that neither ATQs nor DHQs are supported

Benefits:

  • Cache affinity: Tasks stay on origin LLC (strand)
  • Controlled migration: max_imbalance prevents migration storms
  • Race-free: Atomic affinity handling eliminates migration-disabled errors
  • Work conservation: Cross-strand stealing when priority demands
  • Scalable: Lock contention distributed across DHQ strands

Signed-off-by: Daniel Hodges <hodgesd@meta.com>

@hodgesds hodgesds changed the title from "scx_p2dq: Add DHQ support and fix migration-disabled task errors" to "scx_p2dq: Add DHQ support" Nov 12, 2025
@hodgesds hodgesds force-pushed the p2dq-dhq branch 6 times, most recently from f611110 to ea228ea November 12, 2025 22:40
@hodgesds
Contributor Author

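Performance tests using stress-ng --cacheline with the number of workers equal to nproc / 4 (44 here). The output shown for each scheduler is the best of 3 runs, and each heading gives the minimum ops/s.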

eevdf (min 1503.89 ops/s):

$ stress-ng --cacheline 44 -t 15 -M
stress-ng: info:  [1950705] setting to a 15 secs run per stressor
stress-ng: info:  [1950705] dispatching hogs: 44 cacheline
stress-ng: metrc: [1950705] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [1950705]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [1950705] cacheline         23884     15.00   1296.73     10.26      1592.24          18.27       198.03          1416
stress-ng: info:  [1950705] skipped: 0
stress-ng: info:  [1950705] passed: 44: cacheline (44)
stress-ng: info:  [1950705] failed: 0
stress-ng: info:  [1950705] metrics untrustworthy: 0
stress-ng: info:  [1950705] successful run completed in 15.01 secs

p2dq (min 1534.57 ops/s):

$ stress-ng --cacheline 44 -t 15 -M
stress-ng: info:  [1943430] setting to a 15 secs run per stressor
stress-ng: info:  [1943430] dispatching hogs: 44 cacheline
stress-ng: metrc: [1943430] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [1943430]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [1943430] cacheline         24847     15.00   1296.88     10.19      1656.48          19.01       198.04          1444
stress-ng: info:  [1943430] skipped: 0
stress-ng: info:  [1943430] passed: 44: cacheline (44)
stress-ng: info:  [1943430] failed: 0
stress-ng: info:  [1943430] metrics untrustworthy: 0
stress-ng: info:  [1943430] successful run completed in 15.01 secs

scx_p2dq --dhq-enabled true (min 1564.45 ops/s):

$ stress-ng --cacheline 44 -t 15 -M
stress-ng: info:  [1989170] setting to a 15 secs run per stressor
stress-ng: info:  [1989170] dispatching hogs: 44 cacheline
stress-ng: metrc: [1989170] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [1989170]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [1989170] cacheline         25199     15.00   1297.93      9.69      1679.89          19.27       198.12          1424
stress-ng: info:  [1989170] skipped: 0
stress-ng: info:  [1989170] passed: 44: cacheline (44)
stress-ng: info:  [1989170] failed: 0
stress-ng: info:  [1989170] metrics untrustworthy: 0
stress-ng: info:  [1989170] successful run completed in 15.00 secs

scx_p2dq --dhq-enabled true --dhq-max-imbalance 8 (min 1589.18 ops/s):

$ stress-ng --cacheline 44 -t 15 -M
stress-ng: info:  [1975046] setting to a 15 secs run per stressor
stress-ng: info:  [1975046] dispatching hogs: 44 cacheline
stress-ng: metrc: [1975046] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [1975046]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [1975046] cacheline         25080     15.00   1297.28      9.96      1671.90          19.19       198.06          1428
stress-ng: info:  [1975046] skipped: 0
stress-ng: info:  [1975046] passed: 44: cacheline (44)
stress-ng: info:  [1975046] failed: 0
stress-ng: info:  [1975046] metrics untrustworthy: 0
stress-ng: info:  [1975046] successful run completed in 15.01 secs
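
For a quick comparison, the relative gain over eevdf can be computed from the "bogo ops/s (real time)" column of the logs above. The helper below is only arithmetic on those four figures; its name is made up for this note.

```c
#include <assert.h>

/* Percentage gain of `candidate` over `baseline` (both in ops/s). */
static double sketch_gain_pct(double baseline, double candidate)
{
	assert(baseline > 0.0);
	return (candidate / baseline - 1.0) * 100.0;
}
```

With the eevdf figure of 1592.24 ops/s as the baseline, this gives roughly 4.0% for p2dq (1656.48), 5.5% for DHQ with the default imbalance (1679.89), and 5.0% for DHQ with --dhq-max-imbalance 8 (1671.90). These are single best-of-3 figures from one workload, so the differences may be within run-to-run noise.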

@hodgesds hodgesds force-pushed the p2dq-dhq branch 4 times, most recently from 05388d2 to 667b6e4 November 13, 2025 17:32
Contributor

@multics69 multics69 left a comment

LGTM. I like the idea of DHQ.

In my understanding, the key idea is to pair two LLC domains for first-level load balancing (under the abstraction of DHQ), so that cache affinity is preserved within the pair.

One minor suggestion is that dhq.h should be included in the first commit rather than the second.

@hodgesds hodgesds added this pull request to the merge queue Nov 17, 2025
@hodgesds hodgesds removed this pull request from the merge queue due to a manual request Nov 17, 2025
@etsal
Contributor

etsal commented Nov 17, 2025

Can we add the README to a separate directory, e.g., docs/?

@hodgesds hodgesds added this pull request to the merge queue Nov 18, 2025
@hodgesds hodgesds removed this pull request from the merge queue due to a manual request Nov 18, 2025
Implement Double Helix Queue (DHQ), a dual-strand priority queue designed
for LLC-aware task migration in multi-cache systems. DHQ maintains two
parallel strands (analogous to DNA's double helix) with coordinated access
to preserve cache affinity while enabling work-stealing.

Key features:
- Fixed-size implementation with pre-allocated capacity for use in
  non-sleepable BPF contexts (enqueue/dispatch callbacks)
- Strand pairing constraint prevents unbounded imbalance between strands
- Three dequeue modes: Priority (lowest vtime), Alternating (fair), and
  Balanced (load-aware)
- Both FIFO and VTime ordering modes within each strand
- Arena-based allocation for scalable concurrent access
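
The Alternating and Balanced dequeue modes can be sketched as a choice of which strand to serve next. The semantics below are assumptions, since the commit message only names the modes: Alternating switches strands on every dequeue, and Balanced serves the longer strand. All identifiers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum sketch_mode { SKETCH_ALTERNATING, SKETCH_BALANCED };

struct sketch_state {
	uint32_t len[2]; /* tasks currently queued on strand A and B */
	int last;        /* strand served by the previous dequeue: 0 or 1 */
};

/*
 * Return the strand to dequeue from (0 or 1), or -1 if both are empty.
 * If the preferred strand is empty, fall back to the other one so the
 * queue stays work-conserving.
 */
static int sketch_next_strand(const struct sketch_state *s, enum sketch_mode mode)
{
	int pick;

	assert(s->last == 0 || s->last == 1);
	if (s->len[0] == 0 && s->len[1] == 0)
		return -1;
	if (mode == SKETCH_ALTERNATING)
		pick = 1 - s->last;                     /* take turns */
	else
		pick = (s->len[1] > s->len[0]) ? 1 : 0; /* drain the longer strand */
	if (s->len[pick] == 0)
		pick = 1 - pick;
	return pick;
}
```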

DHQ advantages over single queues:
- Cache locality: Each strand maps to an LLC, preserving cache warmth
- Controlled migration: max_imbalance parameter limits cross-LLC movement
- Lower lock contention: Operations distributed across two strands
- Work conservation: Priority mode allows stealing while respecting affinity
- Prevents pathological cases where one LLC monopolizes migration queue

Implementation details:
- Backed by red-black trees (via scx_minheap) for O(log n) operations
- Strand constraint enforced at both enqueue and dequeue time
- Returns -EAGAIN when strand imbalance would be violated
- Returns -ENOSPC when capacity is reached
- Thread-safe via arena spinlocks
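
The two error returns above can be illustrated with a small admission check. This is a sketch under assumptions: imbalance is modeled as the difference in queued-task counts between the strands, capacity is checked before imbalance, and 0 means no limit. The struct and function names are hypothetical, and the real implementation may measure imbalance differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct sketch_counts {
	uint32_t len[2];        /* tasks queued on strand A and B */
	uint32_t capacity;      /* fixed total capacity */
	uint32_t max_imbalance; /* 0 = unlimited */
};

/*
 * Decide whether one more task may be enqueued on `strand`.
 * Returns 0 and bumps the count on success, -ENOSPC when the queue
 * is full, or -EAGAIN when the enqueue would leave this strand more
 * than max_imbalance tasks ahead of the other.
 */
static int sketch_try_enqueue(struct sketch_counts *c, int strand)
{
	uint32_t mine, other;

	assert(strand == 0 || strand == 1);
	mine = c->len[strand];
	other = c->len[1 - strand];

	if (c->len[0] + c->len[1] >= c->capacity)
		return -ENOSPC;
	/* mine >= other guards the unsigned subtraction below. */
	if (c->max_imbalance && mine >= other &&
	    mine + 1 - other > c->max_imbalance)
		return -EAGAIN;
	c->len[strand]++;
	return 0;
}
```

With max_imbalance set to 3, a fourth consecutive enqueue on an otherwise empty queue's strand A is refused with -EAGAIN until strand B receives a task.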

Files added:
- lib/dhq.bpf.c: Core DHQ implementation
- lib/DHQ_README.md: Comprehensive documentation with complexity analysis
- lib/selftests/st_dhq.bpf.c: Unit tests for DHQ operations

Designed for LLC migration, where:
- LLC pairs in same NUMA node share one DHQ
- Strand A = LLC 0 tasks (cache-warm to LLC 0)
- Strand B = LLC 1 tasks (cache-warm to LLC 1)
- Priority mode migrates highest-urgency tasks across LLCs
- max_imbalance controls migration rate

Signed-off-by: Daniel Hodges <hodgesd@meta.com>
Integrate Double Helix Queue (DHQ) as an alternative to ATQ for LLC-aware
task migration, and fix a critical race condition that caused
migration-disabled task errors.

DHQ Integration:
- Add --dhq-enabled flag to enable DHQ mode for LLC migration
- Add --dhq-max-imbalance parameter (default: 3) to control strand balance
- Create one DHQ per pair of LLCs in same NUMA node
- Map each LLC to a specific strand (A or B) for cache affinity
- Each CPU inherits strand from its LLC for proper load distribution
- DHQ provides cache-aware migration with controlled cross-LLC movement

Strand-Specific DHQ Operations:
Use scx_dhq_peek_strand() and scx_dhq_pop_strand() instead of generic
operations to ensure CPUs only consume from their designated strand.
This preserves cache locality and prevents load imbalance.

Data Structure Changes:
- Add mig_dhq and dhq_strand to cpu_ctx and llc_ctx
- Add llc_pair_dhqs[] for shared DHQs between LLC pairs
- Add llcs_per_node[] to track LLCs per NUMA node
- Add P2DQ_ENQUEUE_PROMISE_DHQ_VTIME enqueue promise type
- Add enqueue_promise_dhq struct for DHQ-specific metadata

Configuration:
- p2dq_config.dhq_enabled: Enable DHQ mode
- p2dq_config.dhq_max_imbalance: Control strand pairing (0 = unlimited)
- Priority mode: lowest vtime wins across strands

Build System:
- Add lib/dhq.bpf.c to scx_p2dq and scx_chaos builds

scx_chaos Compatibility:
- Update enqueue promise handling to recognize DHQ type
- Update the error message to state that neither ATQs nor DHQs are supported

Benefits:
- Cache affinity: Tasks stay on origin LLC (strand)
- Controlled migration: max_imbalance prevents migration storms
- Race-free: Atomic affinity handling eliminates migration-disabled errors
- Work conservation: Cross-strand stealing when priority demands
- Scalable: Lock contention distributed across DHQ strands

Signed-off-by: Daniel Hodges <hodgesd@meta.com>
@hodgesds hodgesds added this pull request to the merge queue Nov 19, 2025
Merged via the queue into sched-ext:main with commit cf2f4a3 Nov 19, 2025
22 checks passed
@hodgesds hodgesds deleted the p2dq-dhq branch November 19, 2025 19:00
3 participants