scx_p2dq: Add DHQ support #3035

hodgesds · 2025-11-12T20:56:10Z

scx_p2dq: Add DHQ support and fix migration-disabled task errors

Integrate Double Helix Queue (DHQ) as an alternative to ATQ for LLC-aware task migration.

DHQ Integration:

- Add --dhq-enabled flag to enable DHQ mode for LLC migration
- Add --dhq-max-imbalance parameter (default: 3) to control strand balance
- Create one DHQ per pair of LLCs in same NUMA node
- Map each LLC to a specific strand (A or B) for cache affinity
- Each CPU inherits strand from its LLC for proper load distribution
- DHQ provides cache-aware migration with controlled cross-LLC movement
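
The pairing rule in the list above can be pictured with a small userspace sketch. This is not the scx_p2dq code: `struct llc_info`, `assign_dhq_pairs()`, the `STRAND_A`/`STRAND_B` names, pairing LLCs in index order, and leaving an odd LLC unpaired are all assumptions made for illustration. The PR only states that each pair of LLCs in the same NUMA node shares one DHQ and that each LLC maps to one strand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strand labels; the PR only says "strand A or B". */
enum strand { STRAND_A = 0, STRAND_B = 1 };

/* Hypothetical per-LLC record, loosely modelled on the fields the PR
 * adds to llc_ctx (mig_dhq, dhq_strand). */
struct llc_info {
	int node_id;        /* input: NUMA node this LLC belongs to */
	int dhq_id;         /* output: index of the shared DHQ, -1 if unpaired */
	enum strand strand; /* output: strand this LLC enqueues to */
};

/*
 * Pair LLCs that share a NUMA node, in the order they appear.
 * The first LLC of each pair gets strand A, the second strand B.
 * An LLC left without a partner keeps dhq_id == -1 (assumption: the
 * PR does not say how an odd LLC count is handled).
 * Returns the number of DHQs that would be created.
 */
int assign_dhq_pairs(struct llc_info *llcs, size_t nr_llcs)
{
	int nr_dhqs = 0;

	assert(llcs != NULL || nr_llcs == 0);

	for (size_t i = 0; i < nr_llcs; i++) {
		llcs[i].dhq_id = -1;
		llcs[i].strand = STRAND_A;
	}

	for (size_t i = 0; i < nr_llcs; i++) {
		if (llcs[i].dhq_id != -1)
			continue; /* already paired */

		/* Find the next unpaired LLC in the same NUMA node. */
		for (size_t j = i + 1; j < nr_llcs; j++) {
			if (llcs[j].dhq_id != -1 ||
			    llcs[j].node_id != llcs[i].node_id)
				continue;

			llcs[i].dhq_id = nr_dhqs;
			llcs[i].strand = STRAND_A;
			llcs[j].dhq_id = nr_dhqs;
			llcs[j].strand = STRAND_B;
			nr_dhqs++;
			break;
		}
	}
	return nr_dhqs;
}
```

In this sketch a CPU would simply read `dhq_id` and `strand` from its LLC's record, which mirrors the "each CPU inherits strand from its LLC" item.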

Strand-Specific DHQ Operations:
Use scx_dhq_peek_strand() and scx_dhq_pop_strand() instead of generic operations to ensure CPUs only consume from their designated strand. This preserves cache locality and prevents load imbalance.

Data Structure Changes:

- Add mig_dhq and dhq_strand to cpu_ctx and llc_ctx
- Add llc_pair_dhqs[] for shared DHQs between LLC pairs
- Add llcs_per_node[] to track LLCs per NUMA node
- Add P2DQ_ENQUEUE_PROMISE_DHQ_VTIME enqueue promise type
- Add enqueue_promise_dhq struct for DHQ-specific metadata

Configuration:

- p2dq_config.dhq_enabled: Enable DHQ mode
- p2dq_config.dhq_max_imbalance: Control strand pairing (0 = unlimited)
- Priority mode: lowest vtime wins across strands

Build System:

- Add lib/dhq.bpf.c to scx_p2dq and scx_chaos builds
- Include lib/dhq.h in types.h

scx_chaos Compatibility:

- Update enqueue promise handling to recognize DHQ type
- Update the error message to state that neither ATQs nor DHQs are supported

Benefits:

- Cache affinity: Tasks stay on origin LLC (strand)
- Controlled migration: max_imbalance prevents migration storms
- Race-free: Atomic affinity handling eliminates migration-disabled errors
- Work conservation: Cross-strand stealing when priority demands
- Scalable: Lock contention distributed across DHQ strands

Signed-off-by: Daniel Hodges <hodgesd@meta.com>

hodgesds · 2025-11-13T14:06:06Z

Performance tests use stress-ng --cacheline with the number of workers set to nproc / 4. Each result below is the best of 3 runs.

eevdf (min 1503.89 ops/s):

$ stress-ng --cacheline 44 -t 15 -M
stress-ng: info:  [1950705] setting to a 15 secs run per stressor
stress-ng: info:  [1950705] dispatching hogs: 44 cacheline
stress-ng: metrc: [1950705] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [1950705]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [1950705] cacheline         23884     15.00   1296.73     10.26      1592.24          18.27       198.03          1416
stress-ng: info:  [1950705] skipped: 0
stress-ng: info:  [1950705] passed: 44: cacheline (44)
stress-ng: info:  [1950705] failed: 0
stress-ng: info:  [1950705] metrics untrustworthy: 0
stress-ng: info:  [1950705] successful run completed in 15.01 secs

p2dq (min 1534.57 ops/s):

$ stress-ng --cacheline 44 -t 15 -M
stress-ng: info:  [1943430] setting to a 15 secs run per stressor
stress-ng: info:  [1943430] dispatching hogs: 44 cacheline
stress-ng: metrc: [1943430] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [1943430]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [1943430] cacheline         24847     15.00   1296.88     10.19      1656.48          19.01       198.04          1444
stress-ng: info:  [1943430] skipped: 0
stress-ng: info:  [1943430] passed: 44: cacheline (44)
stress-ng: info:  [1943430] failed: 0
stress-ng: info:  [1943430] metrics untrustworthy: 0
stress-ng: info:  [1943430] successful run completed in 15.01 secs

scx_p2dq --dhq-enabled true (min 1564.45 ops/s):

$ stress-ng --cacheline 44 -t 15 -M
stress-ng: info:  [1989170] setting to a 15 secs run per stressor
stress-ng: info:  [1989170] dispatching hogs: 44 cacheline
stress-ng: metrc: [1989170] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [1989170]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [1989170] cacheline         25199     15.00   1297.93      9.69      1679.89          19.27       198.12          1424
stress-ng: info:  [1989170] skipped: 0
stress-ng: info:  [1989170] passed: 44: cacheline (44)
stress-ng: info:  [1989170] failed: 0
stress-ng: info:  [1989170] metrics untrustworthy: 0
stress-ng: info:  [1989170] successful run completed in 15.00 secs

scx_p2dq --dhq-enabled true --dhq-max-imbalance 8 (min 1589.18 ops/s):

$ stress-ng --cacheline 44 -t 15 -M
stress-ng: info:  [1975046] setting to a 15 secs run per stressor
stress-ng: info:  [1975046] dispatching hogs: 44 cacheline
stress-ng: metrc: [1975046] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [1975046]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [1975046] cacheline         25080     15.00   1297.28      9.96      1671.90          19.19       198.06          1428
stress-ng: info:  [1975046] skipped: 0
stress-ng: info:  [1975046] passed: 44: cacheline (44)
stress-ng: info:  [1975046] failed: 0
stress-ng: info:  [1975046] metrics untrustworthy: 0
stress-ng: info:  [1975046] successful run completed in 15.01 secs
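
For a quick comparison, the real-time bogo ops/s figures from the four runs above can be put on a common scale. The sketch below is only a convenience: `pct_change()` and the `results` table are names invented here, and the values are copied from the stress-ng output. Relative to eevdf, the arithmetic works out to roughly +4.0% for p2dq, +5.5% for --dhq-enabled true, and +5.0% for --dhq-max-imbalance 8. These are single 15-second runs, so small differences may be within run-to-run noise.

```c
#include <assert.h>

/* One row per configuration; values copied from the
 * "bogo ops/s (real time)" column of the runs above. */
struct result {
	const char *name;
	double ops_per_sec;
};

static const struct result results[] = {
	{ "eevdf",                                            1592.24 },
	{ "p2dq",                                             1656.48 },
	{ "scx_p2dq --dhq-enabled true",                      1679.89 },
	{ "scx_p2dq --dhq-enabled true --dhq-max-imbalance 8", 1671.90 },
};

/* Percentage change of `value` relative to `baseline`
 * (positive means more ops/s than the baseline). */
double pct_change(double baseline, double value)
{
	assert(baseline > 0.0);
	return (value - baseline) / baseline * 100.0;
}
```

Calling `pct_change(results[0].ops_per_sec, results[i].ops_per_sec)` for each row gives the gain over the eevdf baseline.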

multics69

LGTM. I like the idea of DHQ.

In my understanding, the key idea is to pair two LLC domains for first-level load balancing (under the abstraction of DHQ), so that cache affinity is preserved within the pair.

One minor suggestion is that dhq.h should be included in the first commit rather than the second.

etsal · 2025-11-17T19:00:44Z

Can we add the README to a separate directory, e.g., docs/?

Implement Double Helix Queue (DHQ), a dual-strand priority queue designed for LLC-aware task migration in multi-cache systems. DHQ maintains two parallel strands (analogous to DNA's double helix) with coordinated access to preserve cache affinity while enabling work-stealing.

Key features:
- Fixed-size implementation with pre-allocated capacity for use in non-sleepable BPF contexts (enqueue/dispatch callbacks)
- Strand pairing constraint prevents unbounded imbalance between strands
- Three dequeue modes: Priority (lowest vtime), Alternating (fair), and Balanced (load-aware)
- Both FIFO and VTime ordering modes within each strand
- Arena-based allocation for scalable concurrent access

DHQ advantages over single queues:
- Cache locality: Each strand maps to an LLC, preserving cache warmth
- Controlled migration: max_imbalance parameter limits cross-LLC movement
- Lower lock contention: Operations distributed across two strands
- Work conservation: Priority mode allows stealing while respecting affinity
- Prevents pathological cases where one LLC monopolizes migration queue

Implementation details:
- Backed by red-black trees (via scx_minheap) for O(log n) operations
- Strand constraint enforced at both enqueue and dequeue time
- Returns -EAGAIN when strand imbalance would be violated
- Returns -ENOSPC when capacity is reached
- Thread-safe via arena spinlocks

Files added:
- lib/dhq.bpf.c: Core DHQ implementation
- lib/DHQ_README.md: Comprehensive documentation with complexity analysis
- lib/selftests/st_dhq.bpf.c: Unit tests for DHQ operations

Designed for LLC migration, where:
- LLC pairs in same NUMA node share one DHQ
- Strand A = LLC 0 tasks (cache-warm to LLC 0)
- Strand B = LLC 1 tasks (cache-warm to LLC 1)
- Priority mode migrates highest-urgency tasks across LLCs
- max_imbalance controls migration rate

Signed-off-by: Daniel Hodges <hodgesd@meta.com>

Integrate Double Helix Queue (DHQ) as an alternative to ATQ for LLC-aware task migration, and fix critical race condition causing migration-disabled task errors.

DHQ Integration:
- Add --dhq-enabled flag to enable DHQ mode for LLC migration
- Add --dhq-max-imbalance parameter (default: 3) to control strand balance
- Create one DHQ per pair of LLCs in same NUMA node
- Map each LLC to a specific strand (A or B) for cache affinity
- Each CPU inherits strand from its LLC for proper load distribution
- DHQ provides cache-aware migration with controlled cross-LLC movement

Strand-Specific DHQ Operations:
Use scx_dhq_peek_strand() and scx_dhq_pop_strand() instead of generic operations to ensure CPUs only consume from their designated strand. This preserves cache locality and prevents load imbalance.

Data Structure Changes:
- Add mig_dhq and dhq_strand to cpu_ctx and llc_ctx
- Add llc_pair_dhqs[] for shared DHQs between LLC pairs
- Add llcs_per_node[] to track LLCs per NUMA node
- Add P2DQ_ENQUEUE_PROMISE_DHQ_VTIME enqueue promise type
- Add enqueue_promise_dhq struct for DHQ-specific metadata

Configuration:
- p2dq_config.dhq_enabled: Enable DHQ mode
- p2dq_config.dhq_max_imbalance: Control strand pairing (0 = unlimited)
- Priority mode: lowest vtime wins across strands

Build System:
- Add lib/dhq.bpf.c to scx_p2dq and scx_chaos builds

scx_chaos Compatibility:
- Update enqueue promise handling to recognize DHQ type
- Error message updated to mention both ATQs and DHQs not supported

Benefits:
- Cache affinity: Tasks stay on origin LLC (strand)
- Controlled migration: max_imbalance prevents migration storms
- Race-free: Atomic affinity handling eliminates migration-disabled errors
- Work conservation: Cross-strand stealing when priority demands
- Scalable: Lock contention distributed across DHQ strands

Signed-off-by: Daniel Hodges <hodgesd@meta.com>

hodgesds requested review from JakeHillion, arighi, etsal, htejun, multics69 and tommy-u November 12, 2025 20:56

hodgesds changed the title ~~scx_p2dq: Add DHQ support and fix migration-disabled task errors~~ scx_p2dq: Add DHQ support Nov 12, 2025

hodgesds force-pushed the p2dq-dhq branch 6 times, most recently from f611110 to ea228ea Compare November 12, 2025 22:40

hodgesds force-pushed the p2dq-dhq branch 4 times, most recently from 05388d2 to 667b6e4 Compare November 13, 2025 17:32

multics69 approved these changes Nov 16, 2025

hodgesds force-pushed the p2dq-dhq branch from 667b6e4 to fedd3b4 Compare November 17, 2025 17:11

hodgesds added this pull request to the merge queue Nov 17, 2025

hodgesds removed this pull request from the merge queue due to a manual request Nov 17, 2025

multics69 mentioned this pull request Nov 18, 2025

scx_lavd: perform task stealing in circular distance order. #3061

Merged

hodgesds added this pull request to the merge queue Nov 18, 2025

hodgesds removed this pull request from the merge queue due to a manual request Nov 18, 2025

hodgesds added 2 commits November 19, 2025 10:29

hodgesds force-pushed the p2dq-dhq branch from fedd3b4 to c204ae8 Compare November 19, 2025 18:30

hodgesds enabled auto-merge November 19, 2025 18:32

hodgesds added this pull request to the merge queue Nov 19, 2025

Merged via the queue into sched-ext:main with commit cf2f4a3 Nov 19, 2025
22 checks passed

hodgesds deleted the p2dq-dhq branch November 19, 2025 19:00
