enable MEM_FORCE_MEMORY_ACCESS=2 for RISC-V targets with zicclsm #4596
Polaris-911 wants to merge 1 commit into facebook:dev
Conversation
This looks good to me. Ultimately, it would be better if the compiler was able to handle optimizing CC @Cyan4973 in case you have an opinion, before I merge
@Polaris-911 I don't see the same improvement to decompression speed as we see in PR #4584. Do you know what function the difference in decompression speed is coming from between these two PRs? Other than these functions, the decompressor really shouldn't be doing any unaligned accesses, so I don't know where the difference would be coming from.
@terrelln Thanks for pointing that out! I suspect the difference in decompression speed might be related to ZSTD_wildcopy rather than the MEM_read/MEM_write macros. While the decompressor doesn't directly use MEM_read* for data manipulation, ZSTD_execSequence relies heavily on ZSTD_wildcopy (which eventually calls __builtin_memcpy via ZSTD_copy8/ZSTD_copy16) to copy both literals and matches.
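To make the suspected code path concrete, here is a minimal sketch of a wildcopy-style loop built on an 8-byte memcpy helper. The names `copy8` and `wildcopy8` are hypothetical stand-ins, not zstd's actual `ZSTD_copy8`/`ZSTD_wildcopy`, and the real implementation differs in detail. The point it illustrates is that every step is a fixed-size memcpy on a possibly misaligned pointer, so the speed depends on how the compiler lowers that memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an 8-byte copy helper: the fixed-size memcpy
 * lets the compiler decide how to lower a possibly misaligned access. */
static void copy8(void* dst, const void* src)
{
    memcpy(dst, src, 8);
}

/* Hypothetical wildcopy-style loop: copies in 8-byte steps, so it may write
 * up to 7 bytes past dst + length. The caller must provide that slack.
 * The regions must not overlap in this simplified version. */
static void wildcopy8(void* dst, const void* src, size_t length)
{
    unsigned char* d = (unsigned char*)dst;
    const unsigned char* s = (const unsigned char*)src;
    unsigned char* const end = d + length;

    assert(dst != NULL && src != NULL);
    while (d < end) {
        copy8(d, s);
        d += 8;
        s += 8;
    }
}
```

If the compiler assumes strict alignment, each `copy8` may be expanded into byte loads and stores; with misaligned access permitted, it can become a single 64-bit load and store.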
@terrelln @Cyan4973 Hi, I submitted a similar PR (#4524) before, and at that time @Cyan4973 was concerned about undefined behavior (UB). I completely understand those concerns, and that the compiler should handle memcpy optimizations natively (Method 0). However, current RISC-V compilers still fall short in optimizing memcpy for unaligned accesses, which leaves a significant performance gap on hardware that supports them (such as targets with zicclsm). Would you be open to accepting either this MEM_FORCE_MEMORY_ACCESS=2 approach or explicitly adding -mno-strict-align to the RISC-V build flags (#4584) as a temporary workaround? We can revert this once the upstream compilers improve their memcpy optimizations for RISC-V. In the meantime, users would benefit from the performance gains immediately.
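For readers unfamiliar with the trade-off discussed above, the following sketch contrasts a memcpy-based read (the Method 0 style) with a direct pointer dereference (the Method 2 style). The function names `read32_memcpy` and `read32_direct` are illustrative assumptions and are not copied from mem.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Method 0 style: portable and well defined for any alignment. Performance
 * depends on the compiler turning the memcpy into a single load. */
static uint32_t read32_memcpy(const void* p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Method 2 style: direct dereference. The C standard leaves this undefined
 * when p is not suitably aligned (and it can violate aliasing rules), so it
 * relies on the target and compiler tolerating misaligned accesses. */
static uint32_t read32_direct(const void* p)
{
    return *(const uint32_t*)p;
}
```

Both functions return the same value for a suitably aligned `uint32_t` object; only the memcpy variant is guaranteed to work on an arbitrary byte offset, which is the source of the UB concern.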
Summary
This PR updates lib/common/mem.h to select MEM_FORCE_MEMORY_ACCESS=2 for RISC-V builds when both __riscv and __riscv_zicclsm are defined.
Motivation
Zicclsm indicates support for misaligned loads/stores in main memory. For these targets, using method 2 enables direct unaligned memory access in mem.h.
References
GCC Zicclsm
RVA20U64 specification
Changes
MEM_FORCE_MEMORY_ACCESS selection logic:
MEM_FORCE_MEMORY_ACCESS=2 when defined(__riscv) && defined(__riscv_zicclsm)
MEM_FORCE_MEMORY_ACCESS=1 for other GCC targets
MEM_FORCE_MEMORY_ACCESS is explicitly set by the build system.
Validation
mem.h.
Benchmark Data (Compression Speed)
Data screenshot
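The selection order listed under Changes could be sketched with the preprocessor as follows. This is an illustration of the described logic, not the actual diff: the use of `__GNUC__` to mean "other GCC targets", the outer `#ifndef` guard for a value set by the build system, and the helper `selected_method` are assumptions added for the example.

```c
#include <assert.h>

/* Assumption: a value supplied by the build system (e.g. via -D) is kept. */
#ifndef MEM_FORCE_MEMORY_ACCESS
#  if defined(__riscv) && defined(__riscv_zicclsm)
#    define MEM_FORCE_MEMORY_ACCESS 2   /* direct access on Zicclsm targets */
#  elif defined(__GNUC__)
#    define MEM_FORCE_MEMORY_ACCESS 1   /* other GCC-compatible targets */
#  endif
#endif

/* Hypothetical helper: reports which method the macros above selected,
 * falling back to 0 (the memcpy-based method) when nothing was chosen. */
static int selected_method(void)
{
#ifdef MEM_FORCE_MEMORY_ACCESS
    return MEM_FORCE_MEMORY_ACCESS;
#else
    return 0;
#endif
}
```

Compiling with `-DMEM_FORCE_MEMORY_ACCESS=0` would bypass the automatic choice, which matches the intent that an explicit build setting takes priority.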