Track external accumulators in tracer instead of using SparkInfo values#10553

Open
charlesmyu wants to merge 4 commits into master from charles.yu/djm-0000/fix-spark-plan-metrics

@charlesmyu charlesmyu commented Feb 9, 2026

What Does This Do

Updates the metrics in the _dd.spark.sql_plan meta field to use distributions calculated from individual task metrics, rather than the summed values provided by Spark's StageInfo objects. StageInfo naively sums all accumulators, even when a sum is not meaningful for the Spark SQL metric in question (e.g. avg hash probes per key for aggregate operations). Instead, we accumulate those values ourselves into distribution metrics and emit them accordingly.
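
To illustrate why a plain sum can mislead for average-style metrics, here is a minimal Java sketch. It is not the PR's implementation: the class name, method names, and sample values are hypothetical, and a min/median/max summary stands in for whatever histogram type the tracer actually uses.

```java
import java.util.Arrays;

// Hypothetical illustration only; not code from this PR.
class TaskMetricDistributionSketch {

  /** Stage-level rollup by plain addition of every task's reported value. */
  static double naiveSum(double[] perTask) {
    return Arrays.stream(perTask).sum();
  }

  /** Minimal distribution summary of per-task values: {min, median, max}. */
  static double[] summarize(double[] perTask) {
    if (perTask.length == 0) {
      throw new IllegalArgumentException("need at least one task value");
    }
    double[] sorted = perTask.clone();
    Arrays.sort(sorted);
    int n = sorted.length;
    // Median: middle element for odd n, mean of the two middle elements for even n.
    double median =
        (n % 2 == 1) ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    return new double[] {sorted[0], median, sorted[n - 1]};
  }
}
```

For made-up per-task "avg hash probes per key" values of 1.0, 1.5, 1.0 and 2.5, the sum is 6.0, which no single task ever observed, while the summary reports a minimum of 1.0, a median of 1.25 and a maximum of 2.5.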

Currently in the UI, this is only used in one place (in the Spark SQL metrics in the DJM product), so we're not too worried about changing the format here. UI update to follow.

If any issues arise with sending traces with a larger number of histograms, we can disable it using the DD_SPARK_TASK_HISTOGRAM_ENABLED feature flag.

Motivation

We'd like accurate metrics for Spark SQL operations that can reflect task-level characteristics as a distribution. This brings us more in line with what is shown in the Spark UI:
[Image: screenshot of the Spark UI]

Additional Notes

We can't get rid of the original map that tracks accumulators to stages, as we still use it to associate Spark SQL operations with stages. However, we no longer need to store the entire accumulator; a simple map of accumulator ID to stage ID is enough. This will be done in a follow-up PR: #10645

Contributor Checklist

Jira ticket: [PROJ-IDENT]

Note: Once your PR is ready to merge, add it to the merge queue by commenting /merge. /merge -c cancels the queue request. /merge -f --reason "reason" skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.

pr-commenter bot commented Feb 9, 2026

Benchmarks

Startup

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master charles.yu/djm-0000/fix-spark-plan-metrics
git_commit_date 1772631867 1772638339
git_commit_sha 70410da 7e4b7de
release_version 1.61.0-SNAPSHOT~70410da0e2 1.61.0-SNAPSHOT~7e4b7dec99
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1772640256 1772640256
ci_job_id 1475494235 1475494235
ci_pipeline_id 100352307 100352307
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-0-xihpldoz 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-0-xihpldoz 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
module Agent Agent
parent None None

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 68 metrics, 3 unstable metrics.

Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.61.0-SNAPSHOT~7e4b7dec99, baseline=1.61.0-SNAPSHOT~70410da0e2

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.057 s) : 0, 1057449
Total [baseline] (8.827 s) : 0, 8827399
Agent [candidate] (1.06 s) : 0, 1059530
Total [candidate] (8.843 s) : 0, 8843339
section iast
Agent [baseline] (1.225 s) : 0, 1224554
Total [baseline] (9.587 s) : 0, 9586709
Agent [candidate] (1.228 s) : 0, 1227830
Total [candidate] (9.588 s) : 0, 9587986
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.057 s -
Agent iast 1.225 s 167.105 ms (15.8%)
Total tracing 8.827 s -
Total iast 9.587 s 759.309 ms (8.6%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.06 s -
Agent iast 1.228 s 168.3 ms (15.9%)
Total tracing 8.843 s -
Total iast 9.588 s 744.647 ms (8.4%)
gantt
    title insecure-bank - break down per module: candidate=1.61.0-SNAPSHOT~7e4b7dec99, baseline=1.61.0-SNAPSHOT~70410da0e2

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.2 ms) : 0, 1200
crashtracking [candidate] (1.207 ms) : 0, 1207
BytebuddyAgent [baseline] (627.634 ms) : 0, 627634
BytebuddyAgent [candidate] (629.037 ms) : 0, 629037
AgentMeter [baseline] (29.045 ms) : 0, 29045
AgentMeter [candidate] (29.109 ms) : 0, 29109
GlobalTracer [baseline] (256.393 ms) : 0, 256393
GlobalTracer [candidate] (257.641 ms) : 0, 257641
AppSec [baseline] (31.44 ms) : 0, 31440
AppSec [candidate] (31.436 ms) : 0, 31436
Debugger [baseline] (58.472 ms) : 0, 58472
Debugger [candidate] (58.329 ms) : 0, 58329
Remote Config [baseline] (603.742 µs) : 0, 604
Remote Config [candidate] (589.84 µs) : 0, 590
Telemetry [baseline] (8.624 ms) : 0, 8624
Telemetry [candidate] (8.703 ms) : 0, 8703
Flare Poller [baseline] (7.944 ms) : 0, 7944
Flare Poller [candidate] (7.4 ms) : 0, 7400
section iast
crashtracking [baseline] (1.195 ms) : 0, 1195
crashtracking [candidate] (1.215 ms) : 0, 1215
BytebuddyAgent [baseline] (794.232 ms) : 0, 794232
BytebuddyAgent [candidate] (797.742 ms) : 0, 797742
AgentMeter [baseline] (11.316 ms) : 0, 11316
AgentMeter [candidate] (11.344 ms) : 0, 11344
GlobalTracer [baseline] (247.028 ms) : 0, 247028
GlobalTracer [candidate] (246.861 ms) : 0, 246861
IAST [baseline] (25.158 ms) : 0, 25158
IAST [candidate] (25.092 ms) : 0, 25092
AppSec [baseline] (26.313 ms) : 0, 26313
AppSec [candidate] (26.277 ms) : 0, 26277
Debugger [baseline] (62.596 ms) : 0, 62596
Debugger [candidate] (62.886 ms) : 0, 62886
Remote Config [baseline] (522.741 µs) : 0, 523
Remote Config [candidate] (531.68 µs) : 0, 532
Telemetry [baseline] (14.864 ms) : 0, 14864
Telemetry [candidate] (14.844 ms) : 0, 14844
Flare Poller [baseline] (5.185 ms) : 0, 5185
Flare Poller [candidate] (4.892 ms) : 0, 4892
Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.61.0-SNAPSHOT~7e4b7dec99, baseline=1.61.0-SNAPSHOT~70410da0e2

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.065 s) : 0, 1064769
Total [baseline] (11.089 s) : 0, 11088576
Agent [candidate] (1.061 s) : 0, 1060971
Total [candidate] (11.041 s) : 0, 11040965
section appsec
Agent [baseline] (1.246 s) : 0, 1246079
Total [baseline] (11.207 s) : 0, 11206686
Agent [candidate] (1.245 s) : 0, 1244613
Total [candidate] (11.083 s) : 0, 11082766
section iast
Agent [baseline] (1.229 s) : 0, 1229372
Total [baseline] (11.387 s) : 0, 11387054
Agent [candidate] (1.233 s) : 0, 1232798
Total [candidate] (11.36 s) : 0, 11360308
section profiling
Agent [baseline] (1.182 s) : 0, 1181885
Total [baseline] (11.133 s) : 0, 11132711
Agent [candidate] (1.189 s) : 0, 1188624
Total [candidate] (11.068 s) : 0, 11068497
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.065 s -
Agent appsec 1.246 s 181.311 ms (17.0%)
Agent iast 1.229 s 164.604 ms (15.5%)
Agent profiling 1.182 s 117.116 ms (11.0%)
Total tracing 11.089 s -
Total appsec 11.207 s 118.11 ms (1.1%)
Total iast 11.387 s 298.478 ms (2.7%)
Total profiling 11.133 s 44.135 ms (0.4%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.061 s -
Agent appsec 1.245 s 183.642 ms (17.3%)
Agent iast 1.233 s 171.827 ms (16.2%)
Agent profiling 1.189 s 127.653 ms (12.0%)
Total tracing 11.041 s -
Total appsec 11.083 s 41.802 ms (0.4%)
Total iast 11.36 s 319.344 ms (2.9%)
Total profiling 11.068 s 27.532 ms (0.2%)
gantt
    title petclinic - break down per module: candidate=1.61.0-SNAPSHOT~7e4b7dec99, baseline=1.61.0-SNAPSHOT~70410da0e2

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.208 ms) : 0, 1208
crashtracking [candidate] (1.19 ms) : 0, 1190
BytebuddyAgent [baseline] (631.588 ms) : 0, 631588
BytebuddyAgent [candidate] (629.063 ms) : 0, 629063
AgentMeter [baseline] (29.32 ms) : 0, 29320
AgentMeter [candidate] (29.077 ms) : 0, 29077
GlobalTracer [baseline] (258.122 ms) : 0, 258122
GlobalTracer [candidate] (257.641 ms) : 0, 257641
AppSec [baseline] (31.575 ms) : 0, 31575
AppSec [candidate] (31.435 ms) : 0, 31435
Debugger [baseline] (59.617 ms) : 0, 59617
Debugger [candidate] (59.429 ms) : 0, 59429
Remote Config [baseline] (586.017 µs) : 0, 586
Remote Config [candidate] (587.179 µs) : 0, 587
Telemetry [baseline] (8.652 ms) : 0, 8652
Telemetry [candidate] (8.636 ms) : 0, 8636
Flare Poller [baseline] (7.974 ms) : 0, 7974
Flare Poller [candidate] (7.849 ms) : 0, 7849
section appsec
crashtracking [baseline] (1.186 ms) : 0, 1186
crashtracking [candidate] (1.184 ms) : 0, 1184
BytebuddyAgent [baseline] (658.141 ms) : 0, 658141
BytebuddyAgent [candidate] (656.856 ms) : 0, 656856
AgentMeter [baseline] (12.002 ms) : 0, 12002
AgentMeter [candidate] (12.05 ms) : 0, 12050
GlobalTracer [baseline] (258.169 ms) : 0, 258169
GlobalTracer [candidate] (257.86 ms) : 0, 257860
IAST [baseline] (23.932 ms) : 0, 23932
IAST [candidate] (23.899 ms) : 0, 23899
AppSec [baseline] (177.978 ms) : 0, 177978
AppSec [candidate] (178.099 ms) : 0, 178099
Debugger [baseline] (65.328 ms) : 0, 65328
Debugger [candidate] (65.496 ms) : 0, 65496
Remote Config [baseline] (569.732 µs) : 0, 570
Remote Config [candidate] (566.084 µs) : 0, 566
Telemetry [baseline] (8.913 ms) : 0, 8913
Telemetry [candidate] (8.814 ms) : 0, 8814
Flare Poller [baseline] (3.615 ms) : 0, 3615
Flare Poller [candidate] (3.549 ms) : 0, 3549
section iast
crashtracking [baseline] (1.189 ms) : 0, 1189
crashtracking [candidate] (1.203 ms) : 0, 1203
BytebuddyAgent [baseline] (796.554 ms) : 0, 796554
BytebuddyAgent [candidate] (800.887 ms) : 0, 800887
AgentMeter [baseline] (11.349 ms) : 0, 11349
AgentMeter [candidate] (11.503 ms) : 0, 11503
GlobalTracer [baseline] (248.323 ms) : 0, 248323
GlobalTracer [candidate] (247.748 ms) : 0, 247748
IAST [baseline] (25.135 ms) : 0, 25135
IAST [candidate] (25.31 ms) : 0, 25310
AppSec [baseline] (26.358 ms) : 0, 26358
AppSec [candidate] (26.419 ms) : 0, 26419
Debugger [baseline] (63.853 ms) : 0, 63853
Debugger [candidate] (63.299 ms) : 0, 63299
Remote Config [baseline] (532.419 µs) : 0, 532
Remote Config [candidate] (525.386 µs) : 0, 525
Telemetry [baseline] (14.943 ms) : 0, 14943
Telemetry [candidate] (14.852 ms) : 0, 14852
Flare Poller [baseline] (4.987 ms) : 0, 4987
Flare Poller [candidate] (4.892 ms) : 0, 4892
section profiling
crashtracking [baseline] (1.166 ms) : 0, 1166
crashtracking [candidate] (1.181 ms) : 0, 1181
BytebuddyAgent [baseline] (682.435 ms) : 0, 682435
BytebuddyAgent [candidate] (687.335 ms) : 0, 687335
AgentMeter [baseline] (8.614 ms) : 0, 8614
AgentMeter [candidate] (8.668 ms) : 0, 8668
GlobalTracer [baseline] (215.44 ms) : 0, 215440
GlobalTracer [candidate] (216.345 ms) : 0, 216345
AppSec [baseline] (31.827 ms) : 0, 31827
AppSec [candidate] (32.059 ms) : 0, 32059
Debugger [baseline] (65.401 ms) : 0, 65401
Debugger [candidate] (65.394 ms) : 0, 65394
Remote Config [baseline] (578.813 µs) : 0, 579
Remote Config [candidate] (578.657 µs) : 0, 579
Telemetry [baseline] (8.186 ms) : 0, 8186
Telemetry [candidate] (8.192 ms) : 0, 8192
Flare Poller [baseline] (3.495 ms) : 0, 3495
Flare Poller [candidate] (3.503 ms) : 0, 3503
ProfilingAgent [baseline] (93.805 ms) : 0, 93805
ProfilingAgent [candidate] (94.251 ms) : 0, 94251
Profiling [baseline] (94.372 ms) : 0, 94372
Profiling [candidate] (94.817 ms) : 0, 94817

Load

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master charles.yu/djm-0000/fix-spark-plan-metrics
git_commit_date 1772631867 1772638339
git_commit_sha 70410da 7e4b7de
release_version 1.61.0-SNAPSHOT~70410da0e2 1.61.0-SNAPSHOT~7e4b7dec99
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1772640819 1772640819
ci_job_id 1475494238 1475494238
ci_pipeline_id 100352307 100352307
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-1-t7yt2yfq 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-1-t7yt2yfq 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 1 performance improvements and 2 performance regressions! Performance is the same for 18 metrics, 15 unstable metrics.

| scenario | Δ mean agg_http_req_duration_p50 | Δ mean agg_http_req_duration_p95 | Δ mean throughput | candidate mean agg_http_req_duration_p50 | candidate mean agg_http_req_duration_p95 | candidate mean throughput | baseline mean agg_http_req_duration_p50 | baseline mean agg_http_req_duration_p95 | baseline mean throughput |
|---|---|---|---|---|---|---|---|---|---|
| scenario:load:insecure-bank:iast:high_load | worse [+204.952µs; +300.122µs] or [+8.558%; +12.532%] | worse [+318.331µs; +727.754µs] or [+4.429%; +10.126%] | unstable [-265.061op/s; +46.061op/s] or [-18.021%; +3.132%] | 2.647ms | 7.710ms | 1361.344op/s | 2.395ms | 7.187ms | 1470.844op/s |
| scenario:load:petclinic:profiling:high_load | better [-1183.826µs; -406.305µs] or [-6.078%; -2.086%] | same [-1384.720µs; +548.202µs] or [-4.506%; +1.784%] | unstable [-17.343op/s; +36.343op/s] or [-7.324%; +15.349%] | 18.681ms | 30.315ms | 246.281op/s | 19.476ms | 30.733ms | 236.781op/s |
Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~7e4b7dec99, baseline=1.61.0-SNAPSHOT~70410da0e2
    dateFormat X
    axisFormat %s
section baseline
no_agent (18.94 ms) : 18748, 19131
.   : milestone, 18940,
appsec (18.363 ms) : 18180, 18546
.   : milestone, 18363,
code_origins (17.903 ms) : 17725, 18081
.   : milestone, 17903,
iast (17.878 ms) : 17696, 18060
.   : milestone, 17878,
profiling (19.718 ms) : 19520, 19916
.   : milestone, 19718,
tracing (17.55 ms) : 17379, 17722
.   : milestone, 17550,
section candidate
no_agent (19.207 ms) : 19010, 19404
.   : milestone, 19207,
appsec (18.585 ms) : 18398, 18772
.   : milestone, 18585,
code_origins (17.64 ms) : 17462, 17818
.   : milestone, 17640,
iast (17.912 ms) : 17734, 18090
.   : milestone, 17912,
profiling (18.953 ms) : 18762, 19144
.   : milestone, 18953,
tracing (17.498 ms) : 17324, 17672
.   : milestone, 17498,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 18.94 ms [18.748 ms, 19.131 ms] -
appsec 18.363 ms [18.18 ms, 18.546 ms] -576.198 µs (-3.0%)
code_origins 17.903 ms [17.725 ms, 18.081 ms] -1.037 ms (-5.5%)
iast 17.878 ms [17.696 ms, 18.06 ms] -1.061 ms (-5.6%)
profiling 19.718 ms [19.52 ms, 19.916 ms] 778.579 µs (4.1%)
tracing 17.55 ms [17.379 ms, 17.722 ms] -1.389 ms (-7.3%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 19.207 ms [19.01 ms, 19.404 ms] -
appsec 18.585 ms [18.398 ms, 18.772 ms] -621.635 µs (-3.2%)
code_origins 17.64 ms [17.462 ms, 17.818 ms] -1.567 ms (-8.2%)
iast 17.912 ms [17.734 ms, 18.09 ms] -1.295 ms (-6.7%)
profiling 18.953 ms [18.762 ms, 19.144 ms] -253.676 µs (-1.3%)
tracing 17.498 ms [17.324 ms, 17.672 ms] -1.709 ms (-8.9%)
Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~7e4b7dec99, baseline=1.61.0-SNAPSHOT~70410da0e2
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.203 ms) : 1192, 1215
.   : milestone, 1203,
iast (3.109 ms) : 3068, 3149
.   : milestone, 3109,
iast_FULL (5.72 ms) : 5664, 5777
.   : milestone, 5720,
iast_GLOBAL (3.477 ms) : 3419, 3534
.   : milestone, 3477,
profiling (2.046 ms) : 2029, 2064
.   : milestone, 2046,
tracing (1.862 ms) : 1846, 1877
.   : milestone, 1862,
section candidate
no_agent (1.182 ms) : 1170, 1193
.   : milestone, 1182,
iast (3.364 ms) : 3324, 3404
.   : milestone, 3364,
iast_FULL (5.752 ms) : 5695, 5810
.   : milestone, 5752,
iast_GLOBAL (3.463 ms) : 3411, 3516
.   : milestone, 3463,
profiling (2.084 ms) : 2065, 2102
.   : milestone, 2084,
tracing (1.796 ms) : 1781, 1811
.   : milestone, 1796,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.203 ms [1.192 ms, 1.215 ms] -
iast 3.109 ms [3.068 ms, 3.149 ms] 1.905 ms (158.3%)
iast_FULL 5.72 ms [5.664 ms, 5.777 ms] 4.517 ms (375.3%)
iast_GLOBAL 3.477 ms [3.419 ms, 3.534 ms] 2.273 ms (188.9%)
profiling 2.046 ms [2.029 ms, 2.064 ms] 842.931 µs (70.0%)
tracing 1.862 ms [1.846 ms, 1.877 ms] 658.317 µs (54.7%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.182 ms [1.17 ms, 1.193 ms] -
iast 3.364 ms [3.324 ms, 3.404 ms] 2.182 ms (184.6%)
iast_FULL 5.752 ms [5.695 ms, 5.81 ms] 4.571 ms (386.8%)
iast_GLOBAL 3.463 ms [3.411 ms, 3.516 ms] 2.281 ms (193.1%)
profiling 2.084 ms [2.065 ms, 2.102 ms] 901.843 µs (76.3%)
tracing 1.796 ms [1.781 ms, 1.811 ms] 614.328 µs (52.0%)

Dacapo

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master charles.yu/djm-0000/fix-spark-plan-metrics
git_commit_date 1772631867 1772638339
git_commit_sha 70410da 7e4b7de
release_version 1.61.0-SNAPSHOT~70410da0e2 1.61.0-SNAPSHOT~7e4b7dec99
See matching parameters
Baseline Candidate
application biojava biojava
ci_job_date 1772640514 1772640514
ci_job_id 1475494239 1475494239
ci_pipeline_id 100352307 100352307
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-0-8h364tmv 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-0-8h364tmv 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics.

Execution time for biojava
gantt
    title biojava - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~7e4b7dec99, baseline=1.61.0-SNAPSHOT~70410da0e2
    dateFormat X
    axisFormat %s
section baseline
no_agent (15.4 s) : 15400000, 15400000
.   : milestone, 15400000,
appsec (15.006 s) : 15006000, 15006000
.   : milestone, 15006000,
iast (17.741 s) : 17741000, 17741000
.   : milestone, 17741000,
iast_GLOBAL (17.357 s) : 17357000, 17357000
.   : milestone, 17357000,
profiling (14.796 s) : 14796000, 14796000
.   : milestone, 14796000,
tracing (15.117 s) : 15117000, 15117000
.   : milestone, 15117000,
section candidate
no_agent (14.97 s) : 14970000, 14970000
.   : milestone, 14970000,
appsec (15.066 s) : 15066000, 15066000
.   : milestone, 15066000,
iast (17.927 s) : 17927000, 17927000
.   : milestone, 17927000,
iast_GLOBAL (17.643 s) : 17643000, 17643000
.   : milestone, 17643000,
profiling (14.817 s) : 14817000, 14817000
.   : milestone, 14817000,
tracing (15.147 s) : 15147000, 15147000
.   : milestone, 15147000,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.4 s [15.4 s, 15.4 s] -
appsec 15.006 s [15.006 s, 15.006 s] -394.0 ms (-2.6%)
iast 17.741 s [17.741 s, 17.741 s] 2.341 s (15.2%)
iast_GLOBAL 17.357 s [17.357 s, 17.357 s] 1.957 s (12.7%)
profiling 14.796 s [14.796 s, 14.796 s] -604.0 ms (-3.9%)
tracing 15.117 s [15.117 s, 15.117 s] -283.0 ms (-1.8%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 14.97 s [14.97 s, 14.97 s] -
appsec 15.066 s [15.066 s, 15.066 s] 96.0 ms (0.6%)
iast 17.927 s [17.927 s, 17.927 s] 2.957 s (19.8%)
iast_GLOBAL 17.643 s [17.643 s, 17.643 s] 2.673 s (17.9%)
profiling 14.817 s [14.817 s, 14.817 s] -153.0 ms (-1.0%)
tracing 15.147 s [15.147 s, 15.147 s] 177.0 ms (1.2%)
Execution time for tomcat
gantt
    title tomcat - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~7e4b7dec99, baseline=1.61.0-SNAPSHOT~70410da0e2
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.481 ms) : 1470, 1493
.   : milestone, 1481,
appsec (3.848 ms) : 3624, 4072
.   : milestone, 3848,
iast (2.287 ms) : 2216, 2357
.   : milestone, 2287,
iast_GLOBAL (2.319 ms) : 2248, 2389
.   : milestone, 2319,
profiling (2.134 ms) : 2076, 2192
.   : milestone, 2134,
tracing (2.086 ms) : 2032, 2141
.   : milestone, 2086,
section candidate
no_agent (1.484 ms) : 1472, 1495
.   : milestone, 1484,
appsec (3.85 ms) : 3625, 4075
.   : milestone, 3850,
iast (2.278 ms) : 2208, 2348
.   : milestone, 2278,
iast_GLOBAL (2.327 ms) : 2256, 2397
.   : milestone, 2327,
profiling (2.125 ms) : 2068, 2183
.   : milestone, 2125,
tracing (2.092 ms) : 2037, 2146
.   : milestone, 2092,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.481 ms [1.47 ms, 1.493 ms] -
appsec 3.848 ms [3.624 ms, 4.072 ms] 2.367 ms (159.8%)
iast 2.287 ms [2.216 ms, 2.357 ms] 805.493 µs (54.4%)
iast_GLOBAL 2.319 ms [2.248 ms, 2.389 ms] 837.496 µs (56.5%)
profiling 2.134 ms [2.076 ms, 2.192 ms] 652.955 µs (44.1%)
tracing 2.086 ms [2.032 ms, 2.141 ms] 605.28 µs (40.9%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.484 ms [1.472 ms, 1.495 ms] -
appsec 3.85 ms [3.625 ms, 4.075 ms] 2.366 ms (159.5%)
iast 2.278 ms [2.208 ms, 2.348 ms] 793.993 µs (53.5%)
iast_GLOBAL 2.327 ms [2.256 ms, 2.397 ms] 842.984 µs (56.8%)
profiling 2.125 ms [2.068 ms, 2.183 ms] 641.356 µs (43.2%)
tracing 2.092 ms [2.037 ms, 2.146 ms] 607.821 µs (41.0%)

@charlesmyu charlesmyu force-pushed the charles.yu/djm-0000/fix-spark-plan-metrics branch from 4e5bdc7 to ba09c80 Compare February 9, 2026 14:48
@charlesmyu charlesmyu force-pushed the charles.yu/djm-0000/fix-spark-plan-metrics branch 5 times, most recently from cde7981 to e52fbc5 Compare February 19, 2026 21:41
@charlesmyu charlesmyu force-pushed the charles.yu/djm-0000/fix-spark-plan-metrics branch from e52fbc5 to e413d1d Compare February 19, 2026 22:02
@charlesmyu charlesmyu added the inst: apache spark (Apache Spark instrumentation) and type: enhancement (Enhancements and improvements) labels Feb 19, 2026
@charlesmyu charlesmyu force-pushed the charles.yu/djm-0000/fix-spark-plan-metrics branch 2 times, most recently from 89df516 to 8651527 Compare February 24, 2026 20:06
@charlesmyu charlesmyu marked this pull request as ready for review March 4, 2026 15:18
@charlesmyu charlesmyu requested review from a team as code owners March 4, 2026 15:18
@charlesmyu charlesmyu requested a review from mcculls March 4, 2026 15:18
@charlesmyu charlesmyu force-pushed the charles.yu/djm-0000/fix-spark-plan-metrics branch from 8651527 to 7e4b7de Compare March 4, 2026 15:37

@pawel-big-lebowski pawel-big-lebowski left a comment

Nice, elegant implementation tackling a complex problem — I only left a small comment.

private static final MethodHandles methodLoader =
    new MethodHandles(ClassLoader.getSystemClassLoader());
private static final MethodHandle externalAccums =
    methodLoader.method(TaskMetrics.class, "externalAccums");

Could you add some documentation on why we need reflection and which Spark versions support externalAccums vs. withExternalAccums?

@charlesmyu charlesmyu Mar 5, 2026

I can't find any good public-facing docs for this (probably since it's an internal API), but it seems like the relevant commit is here: apache/spark@b33a3ee

Somewhere in Spark v3.5.2, there was a change to move from directly accessing externalAccums to using the withExternalAccums pattern. Unfortunately, it seems the change was made to remediate a performance regression, so no backwards compatibility was provided. As a result, we need reflection to figure out which method to use when pulling the accumulators.
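
As a rough sketch of that fallback, the snippet below probes for one method shape and falls back to the other. It uses the JDK's `java.lang.invoke` API rather than the tracer's own `MethodHandles` helper, and the two metrics classes are hypothetical stand-ins, not Spark's `TaskMetrics`: the real accumulator type and the real `withExternalAccums` parameter type differ, so treat the signatures here as assumptions.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; not code from this PR.
class ExternalAccumsLookupSketch {

  /** Stand-in for a version that exposes the accumulators directly. */
  static class DirectStyleMetrics {
    List<String> externalAccums() {
      return Arrays.asList("a", "b");
    }
  }

  /** Stand-in for a version that only exposes them through a callback. */
  static class CallbackStyleMetrics {
    <T> T withExternalAccums(Function<List<String>, T> op) {
      return op.apply(Arrays.asList("c"));
    }
  }

  /** Returns a handle for the method, or null when this class does not have it. */
  static MethodHandle findOrNull(Class<?> owner, String name, MethodType type) {
    try {
      return MethodHandles.lookup().findVirtual(owner, name, type);
    } catch (NoSuchMethodException | IllegalAccessException e) {
      return null;
    }
  }

  /** Reads the accumulators through whichever method shape is present. */
  @SuppressWarnings("unchecked")
  static List<String> readAccums(Object metrics) {
    Class<?> cls = metrics.getClass();
    try {
      // A real integration would resolve these handles once, not per call.
      MethodHandle direct =
          findOrNull(cls, "externalAccums", MethodType.methodType(List.class));
      if (direct != null) {
        return (List<String>) direct.invoke(metrics);
      }
      // The generic method erases to (Function) -> Object.
      MethodHandle callback =
          findOrNull(
              cls, "withExternalAccums", MethodType.methodType(Object.class, Function.class));
      if (callback != null) {
        // Copy inside the callback so the result is safe to use afterwards.
        Function<List<String>, List<String>> copy = ArrayList::new;
        return (List<String>) callback.invoke(metrics, copy);
      }
      return Collections.emptyList();
    } catch (Throwable t) {
      throw new IllegalStateException("accumulator lookup failed", t);
    }
  }
}
```

Returning an empty list when neither method exists keeps the caller on a safe path if a future version changes the API again.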
