@leiwen2025

This PR introduces an optimized adler32_rvv implementation for vlen=128.

The optimization has been verified on the SG2044 platform, where adler32_warm throughput improves by about 2.5x (3015.15 MB/s to 7541.43 MB/s):

        new: adler32_warm: runtime =    3062471 usecs, bandwidth 23095 MB in 3.0625 sec = 7541.43 MB/s
        old: adler32_warm: runtime =    3062465 usecs, bandwidth 9233 MB in 3.0625 sec = 3015.15 MB/s

Signed-off-by: WenLei <lei.wen2@zte.com.cn>
@pablodelara
Contributor

@sunyuechi can you review this? Thanks!

addi sp, sp, -32
sd ra, 24(sp)
sd s1, 16(sp)
sd s2, 8(sp)
Contributor

You can use registers that are currently unused (at least a7 and t5) to reduce the stack operations.

slli s1, a0, 48
srli s1, s1, 48 // s1: A = adler32 & 0xffff
srliw s2, a0, 16 // s2: B = adler32 >> 16
add s3, a1, a2 // s3 = end
Contributor

s3 unused?
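
For readers following the hunk above: the shift instructions split the incoming 32-bit checksum into its two 16-bit halves, A in the low bits and B in the high bits. A minimal scalar sketch in C of that split, followed by the standard byte-at-a-time Adler-32 update, is shown below. The function name `adler32_ref` is hypothetical and is not part of this PR; it only illustrates what the vector code has to reproduce.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u /* largest prime below 2^16 */

/* Hypothetical scalar reference, not taken from the PR. */
static uint32_t adler32_ref(uint32_t adler, const unsigned char *buf, size_t len)
{
    uint32_t a = adler & 0xffff; /* A: low 16 bits (slli/srli by 48 in the hunk) */
    uint32_t b = adler >> 16;    /* B: high 16 bits (srliw by 16 in the hunk) */

    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_MOD; /* running sum of bytes */
        b = (b + a) % ADLER_MOD;      /* running sum of the A values */
    }
    return (b << 16) | a;
}
```

Starting from the conventional initial value 1, `adler32_ref(1, (const unsigned char *)"Wikipedia", 9)` returns `0x11E60398`.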

Signed-off-by: WenLei <lei.wen2@zte.com.cn>
la a7, factors
vle8.v v0, (a7)
vmv.v.i v4, 0
vmv.v.i v8, 0
Contributor

v4 hasn’t been modified, so you can just use v4 here instead of zeroing a second register.

Author

Done, thanks for the review!
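
The contents of the `factors` table loaded into v0 are not shown in this conversation. A common way to vectorize Adler-32, and presumably what such a table supports, is to process the input in fixed-size blocks and weight each byte by its distance from the end of the block, so that B can be updated once per block instead of once per byte. The scalar C sketch below illustrates that idea under stated assumptions: the name `adler32_blocked`, the `weights` array, and the 16-byte block size (one 128-bit register) are illustrative choices, not taken from the PR.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u
#define BLOCK 16 /* assumed: 16 bytes, i.e. one 128-bit vector register */

/* Hypothetical weights 16, 15, ..., 1; stands in for the PR's `factors` table. */
static const uint8_t weights[BLOCK] = {16, 15, 14, 13, 12, 11, 10, 9,
                                       8,  7,  6,  5,  4,  3,  2,  1};

static uint32_t adler32_blocked(uint32_t adler, const unsigned char *buf, size_t len)
{
    uint32_t a = adler & 0xffff;
    uint32_t b = adler >> 16;

    while (len >= BLOCK) {
        uint32_t sum = 0;  /* plain byte sum for the block */
        uint32_t wsum = 0; /* weighted byte sum for the block */
        for (int i = 0; i < BLOCK; i++) {
            sum += buf[i];
            wsum += (uint32_t)weights[i] * buf[i];
        }
        /* Equivalent to BLOCK sequential byte updates: the old A is added
         * BLOCK times, and byte i is added (BLOCK - i) times. */
        b = (b + BLOCK * a + wsum) % ADLER_MOD;
        a = (a + sum) % ADLER_MOD;
        buf += BLOCK;
        len -= BLOCK;
    }

    /* Tail shorter than one block: fall back to the byte-at-a-time update. */
    while (len--) {
        a = (a + *buf++) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}
```

In a vector version the two inner sums map naturally onto a widening reduction and a widening multiply-accumulate against the weight vector, which is why the weights only need to be loaded once before the loop.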

mv t2, t1
1:
mv a3, t5
mv a4, t6
Contributor

@sunyuechi, Dec 5, 2025

t5, t6 -> a3, a4
update a3, a4
a3, a4 -> t5, t6

The copies don’t seem to be needed here. Is it fine to just update t5 and t6 directly?

Author

Done, thanks for the review!

Signed-off-by: WenLei <lei.wen2@zte.com.cn>
@leiwen2025 force-pushed the rv64-igzip-adler32rvv128 branch from 787635b to 7ed6de8 on December 6, 2025 11:24