Two unrelated improvements are possible:

1. ~~Test the same thing for 128-bit atomics on x64. They were also initially implemented with a cx16-based load, but now use SSE2, so the same issue applies.~~ Not yet, actually; see #4480.
2. The test has UB, which is noted in a comment in the test, but it can be avoided with `atomic_ref`.

I'm not sure either is worth doing. This is an old test for a customer's situation, so it may be better to preserve it as it is.