Use this guide when benchmark ratios point at a real hotspot and you need call-stack evidence before changing the runtime.
The repo ships a focused benchmark-runner profile mode plus a perf wrapper script:

    bash scripts/profile-benchmark-hot-path.sh codecmapper-serialize 120000 person-batch-25

That command:
- runs one deterministic benchmark operation repeatedly in Release
- captures perf stat counters into perf.stat.txt
- captures sampled stacks into perf.data
- emits both .NET perf maps and JIT dump metadata so managed frames can be symbolized
- renders a text call-stack report into perf.report.txt

Artifacts land under .artifacts/profiling/<operation>-<scenario-or-records>-<iterations>iters/.
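
To make the stages and the artifact layout concrete, here is a dry-run sketch that only prints the kind of perf commands such a wrapper might issue. It is not copied from scripts/profile-benchmark-hot-path.sh: the flags (-k 1, -g), the intermediate file name perf.jit.data, and the ./benchmark-runner command line are assumptions chosen for illustration.

```shell
#!/bin/sh
# Dry-run sketch: prints commands, runs nothing.
# Assumption: flags and the perf.jit.data name are illustrative only.

# Builds the artifact directory from the documented pattern
# <operation>-<scenario-or-records>-<iterations>iters.
artifact_dir() (
  operation="$1"
  iterations="$2"
  scenario="$3"
  printf '.artifacts/profiling/%s-%s-%siters\n' "$operation" "$scenario" "$iterations"
)

# Prints one line per stage. Arguments after the first three are the
# (hypothetical) benchmark command line to profile.
print_perf_pipeline() (
  out="$(artifact_dir "$1" "$2" "$3")"
  shift 3
  echo "perf stat -o $out/perf.stat.txt -- $*"
  echo "perf record -k 1 -g -o $out/perf.data -- $*"
  echo "perf inject --jit -i $out/perf.data -o $out/perf.jit.data"
  echo "perf report --stdio -i $out/perf.jit.data > $out/perf.report.txt"
)

print_perf_pipeline codecmapper-serialize 120000 person-batch-25 ./benchmark-runner
```

Printing the commands instead of running them keeps the sketch safe to try on machines where perf is missing or restricted.
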
Supported operations:
- codecmapper-serialize
- codecmapper-deserialize-bytes
- stj-serialize
- stj-deserialize
- newtonsoft-serialize
- newtonsoft-deserialize
- our-parser-scan-bytes
- utf8jsonreader-scan-bytes
- typed-experiment-deserialize-bytes

Profile the CodecMapper path first:

    bash scripts/profile-benchmark-hot-path.sh codecmapper-serialize 120000 person-batch-25
    bash scripts/profile-benchmark-hot-path.sh codecmapper-deserialize-bytes 40000 person-batch-25-unknown-fields

Then capture the System.Text.Json baseline on the same scenario:

    bash scripts/profile-benchmark-hot-path.sh stj-serialize 120000 person-batch-25
    bash scripts/profile-benchmark-hot-path.sh stj-deserialize 40000 person-batch-25-unknown-fields

Read perf.stat.txt for high-level counters and perf.report.txt for the hottest call paths.

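
When skimming a report, it can help to drop the header noise first. The helper below is a small sketch, not part of the repo: top_report_lines is a hypothetical name, and it relies only on the fact that the text output of perf report marks its header lines with a leading #.

```shell
#!/bin/sh
# Prints the first N lines of a perf report text file, skipping
# '#' header lines and blank lines. N defaults to 20.
# Assumption: top_report_lines is a hypothetical helper name.
top_report_lines() (
  report="$1"
  count="${2:-20}"
  grep -v -e '^#' -e '^[[:space:]]*$' "$report" | head -n "$count"
)

# Example (path components in angle brackets are placeholders):
#   top_report_lines .artifacts/profiling/<run-directory>/perf.report.txt 30
```

Running it once on the CodecMapper report and once on the System.Text.Json report for the same scenario gives two short listings that are easy to compare side by side.
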
- The profile wrapper now defaults to the person-batch-25 scenario from the shared benchmark matrix.
- Pass a scenario name such as telemetry-500 or escaped-articles-20 to profile one of the standard workloads.
- The diagnostics matrix now also includes decode-focused scenarios such as decode-strings-1000, decode-flat-objects-400, and decode-flat-objects-400-unknown-fields when you need to isolate string handling, flat record assembly, or unknown-field skipping.
- Passing a plain integer as the third argument still uses the legacy nested-record batch with --records <n>.
- The wrapper sets DOTNET_PerfMapEnabled=3 and COMPlus_PerfMapEnabled=3 so perf inject --jit has the metadata it needs for managed symbol names.
- If perf record is blocked by local kernel permissions, the script will fail before writing perf.report.txt. In that case, fix local perf permissions first and rerun the same command.
- Keep comparisons on the same power profile and CPU governor, otherwise the counter deltas are noisy enough to mislead.
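
The last two notes can be checked before a profiling run. The following is a minimal preflight sketch, assuming a Linux host that exposes the standard perf_event_paranoid sysctl and the cpufreq sysfs entry for cpu0. The function names are hypothetical and the wrapper script is not known to perform this check itself.

```shell
#!/bin/sh
# Prints the contents of a file, or "unknown" when it cannot be read
# (for example on hosts without cpufreq support).
read_or_unknown() (
  if [ -r "$1" ]; then
    cat "$1"
  else
    echo unknown
  fi
)

# Reports the two settings that most often affect a perf run:
# the kernel's perf access level and the CPU frequency governor.
preflight() {
  echo "perf_event_paranoid: $(read_or_unknown /proc/sys/kernel/perf_event_paranoid)"
  echo "cpu0 governor: $(read_or_unknown /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor)"
}

preflight
```

Recording both values alongside each run makes it easier to tell whether two sets of counters were captured under comparable conditions.
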