Directly Build UstrMap For Properties #525

krakow10 · 2025-05-03T19:17:30Z

~~Fixes Unreliable Benchmarks #529~~
Substantial performance regression on Windows
Substantial performance improvement on Linux, possibly bugged

If there's no reason why InstanceBuilder.properties can't be directly assigned to Instance.properties, then let's do that. The same logic is applied to building the properties in rbx_xml. ~~The "Miner's Haven/Deserialize" benchmark says throughput is increased by 30%. I was mistaken, the performance uplift was due to #529 comparing to a bad baseline run on master branch.~~ There seems to be a performance disparity between Windows and Linux for this change.

Dekkonot · 2025-05-03T23:22:04Z

For what it's worth, this was a performance decision because a lot of time is spent inserting into that hashmap: #467

On my machine the change in this PR actually degrades performance by quite a lot:

Are you able to replicate the boost in performance? What are your system specs like?

krakow10 · 2025-05-04T01:09:14Z

Excellent, I assumed it must be that way for performance reasons, but I still don't understand how it's faster to build a vec and then build a hashmap instead of just directly building a hashmap. Is it because of rehashing while allocating more capacity? I'm running the benchmarks on Linux using a 5950X like I mentioned on #529.
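To make the "rehashing while allocating more capacity" question concrete, here is a minimal sketch. It is not the rbx-dom code: it uses `std::collections::HashMap` as a stand-in for `UstrMap`, and the function names `build_directly` and `build_via_vec` are hypothetical. It only counts how often the map's capacity changes under each strategy, and says nothing about which one is faster on a given machine.

```rust
use std::collections::HashMap;

/// Insert pairs one at a time into a map that starts empty, counting how
/// many times the map's capacity changes. Each change means the table was
/// reallocated and its existing entries were rehashed into the new table.
pub fn build_directly(items: &[(u32, u32)]) -> (HashMap<u32, u32>, usize) {
    let mut map = HashMap::new();
    let mut last_capacity = map.capacity();
    let mut resizes = 0;
    for &(key, value) in items {
        map.insert(key, value);
        if map.capacity() != last_capacity {
            last_capacity = map.capacity();
            resizes += 1;
        }
    }
    (map, resizes)
}

/// Stage the pairs in a Vec first, then build the map with enough capacity
/// reserved up front, so the table never needs to grow during insertion.
pub fn build_via_vec(items: &[(u32, u32)]) -> (HashMap<u32, u32>, usize) {
    let staged: Vec<(u32, u32)> = items.to_vec();
    let mut map = HashMap::with_capacity(staged.len());
    let mut last_capacity = map.capacity();
    let mut resizes = 0;
    for (key, value) in staged {
        map.insert(key, value);
        if map.capacity() != last_capacity {
            last_capacity = map.capacity();
            resizes += 1;
        }
    }
    (map, resizes)
}
```

Calling both functions on the same slice yields equal maps; the direct build reports at least one capacity change, while the staged build reports none because `with_capacity` guarantees room for the requested number of entries. Whether that saving outweighs the extra Vec is exactly what the benchmarks in this thread disagree on.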

I reran the benchmarks first on master and then on properties branch, here is the full output for the properties branch:

     Running benches/suite/main.rs (target/release/deps/suite-4220ebfc15de915e)
Gnuplot not found, using plotters backend
100 Folders/Serialize   time:   [20.673 µs 20.685 µs 20.699 µs]
                        thrpt:  [36.445 MiB/s 36.469 MiB/s 36.490 MiB/s]
                 change:
                        time:   [+0.2859% +0.3785% +0.4745%] (p = 0.00 < 0.05)
                        thrpt:  [-0.4723% -0.3771% -0.2850%]
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
100 Folders/Deserialize time:   [34.843 µs 34.905 µs 34.985 µs]
                        thrpt:  [20.771 MiB/s 20.819 MiB/s 20.857 MiB/s]
                 change:
                        time:   [-6.5392% -6.4015% -6.2569%] (p = 0.00 < 0.05)
                        thrpt:  [+6.6745% +6.8393% +6.9967%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

100 Deep Folders/Serialize
                        time:   [22.916 µs 22.935 µs 22.952 µs]
                        thrpt:  [32.866 MiB/s 32.891 MiB/s 32.918 MiB/s]
                 change:
                        time:   [-2.3817% -2.0812% -1.8326%] (p = 0.00 < 0.05)
                        thrpt:  [+1.8668% +2.1254% +2.4398%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
100 Deep Folders/Deserialize
                        time:   [40.367 µs 40.395 µs 40.423 µs]
                        thrpt:  [18.001 MiB/s 18.013 MiB/s 18.026 MiB/s]
                 change:
                        time:   [-5.9860% -5.8948% -5.8109%] (p = 0.00 < 0.05)
                        thrpt:  [+6.1694% +6.2640% +6.3671%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

100 100-line ModuleScripts/Serialize
                        time:   [87.191 µs 87.308 µs 87.453 µs]
                        thrpt:  [42.181 MiB/s 42.251 MiB/s 42.307 MiB/s]
                 change:
                        time:   [-0.9348% -0.0077% +1.3494%] (p = 0.98 > 0.05)
                        thrpt:  [-1.3314% +0.0077% +0.9436%]
                        No change in performance detected.
Found 18 outliers among 100 measurements (18.00%)
  2 (2.00%) high mild
  16 (16.00%) high severe
100 100-line ModuleScripts/Deserialize
                        time:   [148.15 µs 148.28 µs 148.43 µs]
                        thrpt:  [24.576 MiB/s 24.601 MiB/s 24.622 MiB/s]
                 change:
                        time:   [-4.6385% -4.1208% -3.6891%] (p = 0.00 < 0.05)
                        thrpt:  [+3.8305% +4.2979% +4.8642%]
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  7 (7.00%) high mild
  4 (4.00%) high severe

1,000 Parts/Serialize   time:   [1.7785 ms 1.7821 ms 1.7863 ms]
                        thrpt:  [2.3432 MiB/s 2.3487 MiB/s 2.3535 MiB/s]
                 change:
                        time:   [+0.8976% +1.1602% +1.4339%] (p = 0.00 < 0.05)
                        thrpt:  [-1.4136% -1.1469% -0.8896%]
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe
1,000 Parts/Deserialize time:   [1.9735 ms 1.9768 ms 1.9807 ms]
                        thrpt:  [2.0516 MiB/s 2.0556 MiB/s 2.0591 MiB/s]
                 change:
                        time:   [+6.9658% +7.2007% +7.4589%] (p = 0.00 < 0.05)
                        thrpt:  [-6.9412% -6.7170% -6.5122%]
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

Benchmarking Miner's Haven/Serialize: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 14.3s, or reduce sample count to 30.
Miner's Haven/Serialize time:   [142.46 ms 142.93 ms 143.44 ms]
                        thrpt:  [52.796 MiB/s 52.986 MiB/s 53.159 MiB/s]
                 change:
                        time:   [-9.0355% -8.5930% -8.1495%] (p = 0.00 < 0.05)
                        thrpt:  [+8.8726% +9.4008% +9.9330%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe
Benchmarking Miner's Haven/Deserialize: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 18.0s, or reduce sample count to 20.
Miner's Haven/Deserialize
                        time:   [178.69 ms 179.64 ms 180.70 ms]
                        thrpt:  [41.957 MiB/s 42.205 MiB/s 42.428 MiB/s]
                 change:
                        time:   [-26.671% -25.882% -25.149%] (p = 0.00 < 0.05)
                        thrpt:  [+33.600% +34.920% +36.371%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe

Seems positive and reproducible.

kennethloeffler · 2025-05-05T23:53:45Z

This change overall degrades performance on my M1 MacBook Air. Here is the deserialize Miner's Haven benchmark plotted (n = 1000, red is latest master, blue is this branch):

I'm not entirely sure why there is such a significant difference, but I did some cursory profiling and noticed that these changes cause ~10% more L1 misses and ~12% more L2 misses when deserializing miners-haven.rbxl on my machine. So, one reason could be that using a scratch Vec for instance builder properties somehow improves cache locality over UstrMap (at least, on my machine and on this workload).

I highly recommend resolving #529 before moving on with any performance work! The benchmarks are only as good as the data they can collect. If the data is polluted with, e.g., CPU boosts, then it's much, much harder to get a good signal.

krakow10 · 2025-05-06T01:11:20Z

I looked up how to set the cpu performance governor, and it can be done using cpupower like so:

sudo cpupower frequency-set -g performance

This supposedly just sets the CPU to always run at the maximum frequency. The default governor was called "powersave".

Repeated benchmark runs on the properties branch have been consistent within 5% across 10 runs. Running 5 benchmarks on either the master or entries branch, however, elicits the 70% variance issue in both cases; going by feel, I'd say it's fast 50% of the time and slow 50% of the time. The properties branch's consistent result is 10% slower than master's "fast" variant of the "1,000 Parts/Deserialize" benchmark.

Ustr uses precomputed hashes which I assume are the same for every hashmap. Could it be accidental quadratic behaviour when doing new_map = map.iter().collect();?
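The scenario in that question can be sketched as follows. This is an illustration under stated assumptions, not a reproduction: `PassThroughHasher`, `FixedMap`, `sample_map`, `copy_incrementally`, and `copy_presized` are hypothetical names, and the hasher is only a stand-in for a precomputed-hash scheme in which every map computes the same hash for the same key. The sketch shows the two ways of copying one such map into another; it does not measure or prove quadratic behaviour.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical hasher that passes u64 keys through unchanged, so every map
/// built with it places a given key at the same hash value.
#[derive(Default)]
pub struct PassThroughHasher(u64);

impl Hasher for PassThroughHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Fallback for non-u64 input: fold the bytes into the state.
        for &byte in bytes {
            self.0 = self.0.rotate_left(8) ^ u64::from(byte);
        }
    }
    fn write_u64(&mut self, value: u64) {
        self.0 = value;
    }
    fn finish(&self) -> u64 {
        self.0
    }
}

pub type FixedMap = HashMap<u64, u64, BuildHasherDefault<PassThroughHasher>>;

/// Build a source map whose keys are spread over the u64 range, imitating
/// precomputed hashes. The odd multiplier keeps the keys distinct.
pub fn sample_map(count: u64) -> FixedMap {
    (0..count)
        .map(|i| (i.wrapping_mul(0x9E37_79B9_7F4A_7C15), i))
        .collect()
}

/// Copy by inserting in the source's iteration order into a map that starts
/// empty and grows as it goes. This is the pattern the question worries about.
pub fn copy_incrementally(src: &FixedMap) -> FixedMap {
    let mut dst = FixedMap::default();
    for (&key, &value) in src {
        dst.insert(key, value);
    }
    dst
}

/// Copy into a map whose capacity is reserved up front, so the destination
/// never has to grow while the entries arrive in the source's order.
pub fn copy_presized(src: &FixedMap) -> FixedMap {
    let mut dst =
        FixedMap::with_capacity_and_hasher(src.len(), BuildHasherDefault::default());
    dst.extend(src.iter().map(|(&key, &value)| (key, value)));
    dst
}
```

Both copies produce a map equal to the source. One relevant detail: the standard library's `collect()` into a `HashMap` reserves space from the iterator's size hint, and a map iterator reports its exact length, so `map.iter().collect()` behaves like `copy_presized` rather than `copy_incrementally`.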

Dekkonot · 2025-11-19T04:13:21Z

Tried this again to see if any of the other changes had magically fixed the performance regression but nope, this still regresses performance by a fair bit. I ran it around a dozen times and it was worse every time, so I'm still not sure what you're observing.

krakow10 · 2025-11-19T23:22:18Z

I've updated the description to include pertinent information. The originally reported performance uplift was because I was comparing to a low-performance run of master branch. Compared to the high-performance variant, which is the expected master branch performance, this change introduces a substantial performance regression.

krakow10 · 2025-11-20T01:57:06Z

Closing after revisiting benchmarks with a new governor tool, see #529 (comment).

krakow10 · 2025-11-20T04:22:00Z

~~Okay, I can no longer repro #529~~, but I still see a performance improvement on this PR, upwards of 30% on the micro benchmarks! I'm more interested to see why there is a disparity between Windows and Linux / what the bug is than I am to have the actual changes merged. I even hit a deserialize unwrap during benchmarking after rebasing:

thread 'main' (485409) panicked at /home/quat/git/rbx-dom/rbx_binary/src/deserializer/state.rs:527:80:
called `Option::unwrap()` on a `None` value

which corresponds to this line:

rbx-dom/rbx_binary/src/deserializer/state.rs

Line 527 in 1c5b4bc

let instance = self.instances_by_ref.get_mut(referent).unwrap();

I hit a segfault on the next run:

Benchmarking Miner's Haven/Deserialize: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 17.8s, or reduce sample count to 20.
Benchmarking Miner's Haven/Deserialize: Collecting 100 samples in estimated 17.803 s (100 iterations)error: bench failed, to rerun pass `-p rbx_binary --bench suite`

Caused by:
  process didn't exit successfully: `/home/quat/git/rbx-dom/target/release/deps/suite-4db239afe3b2da11 --bench` (signal: 11, SIGSEGV: invalid memory reference)

Very curious! Undefined behaviour!

First successful run:

(Nov 20)
I ran the benchmarks on the master branch today and my computer hard crashed and rebooted! What the hell!

(Nov 29)
I am once again getting #529, even with cpu-power-gui.

krakow10 mentioned this pull request May 13, 2025

Unreliable Benchmarks #529

Closed

krakow10 force-pushed the properties branch from f1e9ccd to 66a4569 Compare November 7, 2025 04:41

krakow10 closed this Nov 20, 2025

krakow10 deleted the properties branch November 20, 2025 01:57

krakow10 added 3 commits November 19, 2025 20:11

rbx_dom_weak: use UstrMap for InstanceBuilder.properties

1fde975

rbx_dom_weak: directly insert InstanceBuilder.properties into Instance.properties

057ff9d

rbx_xml: directly build UstrMap for Instance.properties

b131883

krakow10 restored the properties branch November 20, 2025 04:21

krakow10 reopened this Nov 20, 2025

krakow10 force-pushed the properties branch from 66a4569 to b131883 Compare November 20, 2025 04:37

krakow10 marked this pull request as draft November 21, 2025 00:28
