Skip to content

[LLVM,X64] Add support for SHA intrinsics#37

Open
TechnoPorg wants to merge 1 commit intotpde2:masterfrom
TechnoPorg:push-tpxsssznxpsy
Open

[LLVM,X64] Add support for SHA intrinsics#37
TechnoPorg wants to merge 1 commit intotpde2:masterfrom
TechnoPorg:push-tpxsssznxpsy

Conversation

@TechnoPorg
Copy link
Copy Markdown
Contributor

Adds the LLVM x86 sha1* and sha256* intrinsics, but not vsha512* (not supported by fadec yet).

I'm not convinced I've done the fixed register in SHA256RNDS2 properly. Apologies if I've misunderstood something.

Copy link
Copy Markdown
Member

@aengelke aengelke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Comment on lines +623 to +627
auto [dst_ref, dst_part] = this->val_ref_single(inst->getOperand(0));
auto [src_ref, src_part] = this->val_ref_single(inst->getOperand(1));
ASM(SHA1MSG1rr,
dst_part.load_to_reg(),
src_part.load_to_reg());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This destroys a value that might still be used. Use into_temporary, see crc32 above. Likewise for other cases.

case llvm::Intrinsic::x86_sha256rnds2: {
auto [dst_ref, dst_part] = this->val_ref_single(inst->getOperand(0));
auto [src_ref, src_part] = this->val_ref_single(inst->getOperand(1));
auto [round_constants_ref, round_constants_part] = this->val_ref_single(inst->getOperand(2));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use load_to_specific and do this before loading any other values. FixedRegBackup should never be used outside of generated code.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: you will have to call into_temporary first as otherwise it will break for fixed assignments.

; X64-LABEL: <sha1msg1>:
; X64: sha1msg1 xmm0, xmm1
; X64-NEXT: ret
%res = call <4 x i32> @llvm.x86.sha1msg1(<4 x i32> %a, <4 x i32> %b)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests lack variant where destination reg cannot be reused.

@T0b1-iOS
Copy link
Copy Markdown
Member

T0b1-iOS commented Apr 2, 2026

Maybe just add an #ifdef __x86_64__ to the encode template and use encodegen for all the intrinsics except the one with the immediate argument. Otherwise you can also merge the cases and do another if/switch in them for the intrinsics where there is code duplication. Doesn‘t look very nice like this.

@TechnoPorg
Copy link
Copy Markdown
Contributor Author

Why?

rust's core::arch module uses LLVM intrinsics, and these were some of the first that I ran across in the wild. If you're not interested in supporting them, please let me know before I make more such contributions.

Alternatively, if you'd prefer to generate code for all* architecture-specific intrinsics by scraping LLVM's TableGen files, I could instead work on that.

@aengelke
Copy link
Copy Markdown
Member

aengelke commented Apr 2, 2026

Is this something that comes up while building reasonably common code? In that case, it would be interesting to know how exactly this comes about (I'd expect crypto intrinsics to be only used in crypto libraries and these routines typically don't benefit from inlining, so shouldn't be compiled that often). Or does it just happen when building some fairly/the standard libraries? Standard libraries tend to contain a lot of rarely used constructs, which tend to require a lot of effort.

The primary goal for TPDE-LLVM should be to compile typical modules (95%?) as generated by front-ends without or with light optimizations. So unless a non-tiny amount of modules contain these intrinsics, I'd rather focus on missing support for more important IR constructs.

Support for all intrinsics is most likely a non-goal: it adds a lot of code and substantially increases maintenance costs while providing very little benefit in practice.

@TechnoPorg
Copy link
Copy Markdown
Contributor Author

Is this something that comes up while building reasonably common code?

I encountered it when (perhaps overly ambitiously) trying to build pingora. As you said, I don't expect the SHA intrinsics specifically would be used outside of web/crypto code and the standard library. But most Rust projects will use an intrinsic somewhere in the dependency tree, and this remains a problem for rustc_codegen_cranelift on real-world code (rust-lang/rustc_codegen_cranelift#171). It currently implements intrinsics manually, but is looking into automating the process (rust-lang/rustc_codegen_cranelift#1547). Would you prefer for rustc_codegen_tpde to handle the intrinsics itself?

Support for all intrinsics is most likely a non-goal: it adds a lot of code and substantially increases maintenance costs while providing very little benefit in practice.

Thanks for the clarification.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants