Add bounds reasoning comments to AArch64 backend #667
base: main
Mac Mini (M1, 2020) benchmarks (opt)
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 46421 cycles | 46419 cycles | 1.00 |
| ML-DSA-44 sign | 132731 cycles | 132731 cycles | 1 |
| ML-DSA-44 verify | 47842 cycles | 47839 cycles | 1.00 |
| ML-DSA-65 keypair | 81446 cycles | 81443 cycles | 1.00 |
| ML-DSA-65 sign | 219217 cycles | 219226 cycles | 1.00 |
| ML-DSA-65 verify | 80140 cycles | 80140 cycles | 1 |
| ML-DSA-87 keypair | 132750 cycles | 132771 cycles | 1.00 |
| ML-DSA-87 sign | 280896 cycles | 280924 cycles | 1.00 |
| ML-DSA-87 verify | 130322 cycles | 130330 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Mac Mini (M1, 2020) benchmarks (no-opt)
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 115261 cycles | 115269 cycles | 1.00 |
| ML-DSA-44 sign | 431714 cycles | 431703 cycles | 1.00 |
| ML-DSA-44 verify | 122155 cycles | 122175 cycles | 1.00 |
| ML-DSA-65 keypair | 197428 cycles | 197429 cycles | 1.00 |
| ML-DSA-65 sign | 700994 cycles | 700966 cycles | 1.00 |
| ML-DSA-65 verify | 197676 cycles | 197672 cycles | 1.00 |
| ML-DSA-87 keypair | 325419 cycles | 325394 cycles | 1.00 |
| ML-DSA-87 sign | 884517 cycles | 884436 cycles | 1.00 |
| ML-DSA-87 verify | 328665 cycles | 328656 cycles | 1.00 |
oqs-bot
left a comment
Intel Xeon 4th gen (c7i)
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 35434 cycles | 35082 cycles | 1.01 |
| ML-DSA-44 sign | 120642 cycles | 120829 cycles | 1.00 |
| ML-DSA-44 verify | 38292 cycles | 38217 cycles | 1.00 |
| ML-DSA-65 keypair | 63150 cycles | 62792 cycles | 1.01 |
| ML-DSA-65 sign | 201518 cycles | 201578 cycles | 1.00 |
| ML-DSA-65 verify | 63025 cycles | 62619 cycles | 1.01 |
| ML-DSA-87 keypair | 95825 cycles | 94417 cycles | 1.01 |
| ML-DSA-87 sign | 235564 cycles | 230458 cycles | 1.02 |
| ML-DSA-87 verify | 94899 cycles | 93737 cycles | 1.01 |
oqs-bot
left a comment
Intel Xeon 4th gen (c7i) (no-opt)
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 95193 cycles | 95251 cycles | 1.00 |
| ML-DSA-44 sign | 349301 cycles | 349405 cycles | 1.00 |
| ML-DSA-44 verify | 100948 cycles | 100993 cycles | 1.00 |
| ML-DSA-65 keypair | 165304 cycles | 164441 cycles | 1.01 |
| ML-DSA-65 sign | 567264 cycles | 567124 cycles | 1.00 |
| ML-DSA-65 verify | 165488 cycles | 165226 cycles | 1.00 |
| ML-DSA-87 keypair | 268465 cycles | 268259 cycles | 1.00 |
| ML-DSA-87 sign | 724063 cycles | 724173 cycles | 1.00 |
| ML-DSA-87 verify | 272280 cycles | 272251 cycles | 1.00 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 226343 cycles | 235855 cycles | 0.96 |
| ML-DSA-44 sign | 669578 cycles | 697418 cycles | 0.96 |
| ML-DSA-44 verify | 227619 cycles | 235679 cycles | 0.97 |
| ML-DSA-65 keypair | 413279 cycles | 412522 cycles | 1.00 |
| ML-DSA-65 sign | 1121384 cycles | 1115820 cycles | 1.00 |
| ML-DSA-65 verify | 395696 cycles | 392924 cycles | 1.01 |
| ML-DSA-87 keypair | 675837 cycles | 654605 cycles | 1.03 |
| ML-DSA-87 sign | 1486942 cycles | 1432265 cycles | 1.04 |
| ML-DSA-87 verify | 653743 cycles | 637053 cycles | 1.03 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
The benchmark result of this commit is worse than the previous result by more than the threshold ratio of 1.03.
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-87 keypair | 675837 cycles | 654605 cycles | 1.03 |
| ML-DSA-87 sign | 1486942 cycles | 1432265 cycles | 1.04 |
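A minimal sketch of how such an alert can be decided, assuming the Ratio column is simply Current / Previous (cycle counts, so smaller is better) and that an alert fires when the unrounded ratio strictly exceeds the 1.03 threshold. `bench_ratio` and `exceeds_threshold` are hypothetical helpers for illustration, not github-action-benchmark code.

```c
#include <assert.h>

/* Hypothetical helper: ratio shown in the tables, current / previous.
 * Cycle counts are "smaller is better", so a ratio above 1 means slower. */
static double bench_ratio(double current, double previous)
{
  assert(previous > 0);
  return current / previous;
}

/* Hypothetical helper: flag a regression when the unrounded ratio strictly
 * exceeds the threshold. The tables print the ratio rounded to two decimals,
 * so a displayed "1.03" may or may not exceed a 1.03 threshold. */
static int exceeds_threshold(double current, double previous, double threshold)
{
  return bench_ratio(current, previous) > threshold;
}
```

With the Cortex-A72 (opt) numbers, ML-DSA-87 keypair (675837 / 654605 ≈ 1.032) and sign (≈ 1.038) exceed 1.03, while ML-DSA-87 verify (653743 / 637053 ≈ 1.026) is displayed as 1.03 after rounding but stays below the threshold, which is consistent with its absence from the alert.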
oqs-bot
left a comment
AMD EPYC 3rd gen (c6a)
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 69634 cycles | 69408 cycles | 1.00 |
| ML-DSA-44 sign | 185866 cycles | 185588 cycles | 1.00 |
| ML-DSA-44 verify | 69632 cycles | 69022 cycles | 1.01 |
| ML-DSA-65 keypair | 119519 cycles | 119527 cycles | 1.00 |
| ML-DSA-65 sign | 296085 cycles | 296151 cycles | 1.00 |
| ML-DSA-65 verify | 115496 cycles | 115310 cycles | 1.00 |
| ML-DSA-87 keypair | 201515 cycles | 201740 cycles | 1.00 |
| ML-DSA-87 sign | 385521 cycles | 385807 cycles | 1.00 |
| ML-DSA-87 verify | 193326 cycles | 193642 cycles | 1.00 |
oqs-bot
left a comment
Graviton2
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 117109 cycles | 115867 cycles | 1.01 |
| ML-DSA-44 sign | 380933 cycles | 377525 cycles | 1.01 |
| ML-DSA-44 verify | 121685 cycles | 120368 cycles | 1.01 |
| ML-DSA-65 keypair | 200537 cycles | 200218 cycles | 1.00 |
| ML-DSA-65 sign | 623530 cycles | 623046 cycles | 1.00 |
| ML-DSA-65 verify | 198644 cycles | 198371 cycles | 1.00 |
| ML-DSA-87 keypair | 328184 cycles | 327430 cycles | 1.00 |
| ML-DSA-87 sign | 792111 cycles | 790806 cycles | 1.00 |
| ML-DSA-87 verify | 325565 cycles | 324857 cycles | 1.00 |
oqs-bot
left a comment
AMD EPYC 4th gen (c7a)
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 42714 cycles | 42050 cycles | 1.02 |
| ML-DSA-44 sign | 130840 cycles | 130623 cycles | 1.00 |
| ML-DSA-44 verify | 44222 cycles | 44275 cycles | 1.00 |
| ML-DSA-65 keypair | 72977 cycles | 72347 cycles | 1.01 |
| ML-DSA-65 sign | 210931 cycles | 211393 cycles | 1.00 |
| ML-DSA-65 verify | 73284 cycles | 72769 cycles | 1.01 |
| ML-DSA-87 keypair | 110228 cycles | 109988 cycles | 1.00 |
| ML-DSA-87 sign | 248784 cycles | 248764 cycles | 1.00 |
| ML-DSA-87 verify | 111640 cycles | 109522 cycles | 1.02 |
oqs-bot
left a comment
AMD EPYC 3rd gen (c6a) (no-opt)
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 135099 cycles | 135380 cycles | 1.00 |
| ML-DSA-44 sign | 540292 cycles | 539702 cycles | 1.00 |
| ML-DSA-44 verify | 148192 cycles | 148584 cycles | 1.00 |
| ML-DSA-65 keypair | 228667 cycles | 228833 cycles | 1.00 |
| ML-DSA-65 sign | 890151 cycles | 892086 cycles | 1.00 |
| ML-DSA-65 verify | 238302 cycles | 238446 cycles | 1.00 |
| ML-DSA-87 keypair | 373734 cycles | 372875 cycles | 1.00 |
| ML-DSA-87 sign | 1105910 cycles | 1106065 cycles | 1.00 |
| ML-DSA-87 verify | 387102 cycles | 387563 cycles | 1.00 |
oqs-bot
left a comment
Graviton4
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 69576 cycles | 69610 cycles | 1.00 |
| ML-DSA-44 sign | 213709 cycles | 213787 cycles | 1.00 |
| ML-DSA-44 verify | 72626 cycles | 72475 cycles | 1.00 |
| ML-DSA-65 keypair | 123669 cycles | 123352 cycles | 1.00 |
| ML-DSA-65 sign | 350942 cycles | 350469 cycles | 1.00 |
| ML-DSA-65 verify | 120890 cycles | 120415 cycles | 1.00 |
| ML-DSA-87 keypair | 202348 cycles | 201192 cycles | 1.01 |
| ML-DSA-87 sign | 450126 cycles | 449091 cycles | 1.00 |
| ML-DSA-87 verify | 198811 cycles | 197923 cycles | 1.00 |
oqs-bot
left a comment
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 120412 cycles | 120466 cycles | 1.00 |
| ML-DSA-44 sign | 453079 cycles | 455604 cycles | 0.99 |
| ML-DSA-44 verify | 129899 cycles | 129917 cycles | 1.00 |
| ML-DSA-65 keypair | 205074 cycles | 205155 cycles | 1.00 |
| ML-DSA-65 sign | 734147 cycles | 735520 cycles | 1.00 |
| ML-DSA-65 verify | 210029 cycles | 209895 cycles | 1.00 |
| ML-DSA-87 keypair | 339889 cycles | 337601 cycles | 1.01 |
| ML-DSA-87 sign | 931618 cycles | 924871 cycles | 1.01 |
| ML-DSA-87 verify | 347721 cycles | 345776 cycles | 1.01 |
oqs-bot
left a comment
Graviton3
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 74307 cycles | 74268 cycles | 1.00 |
| ML-DSA-44 sign | 228607 cycles | 228732 cycles | 1.00 |
| ML-DSA-44 verify | 78259 cycles | 78126 cycles | 1.00 |
| ML-DSA-65 keypair | 130499 cycles | 130397 cycles | 1.00 |
| ML-DSA-65 sign | 378327 cycles | 378266 cycles | 1.00 |
| ML-DSA-65 verify | 129295 cycles | 129145 cycles | 1.00 |
| ML-DSA-87 keypair | 209579 cycles | 211710 cycles | 0.99 |
| ML-DSA-87 sign | 479323 cycles | 479642 cycles | 1.00 |
| ML-DSA-87 verify | 208629 cycles | 210191 cycles | 0.99 |
oqs-bot
left a comment
Graviton2 (no-opt)
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 214694 cycles | 214295 cycles | 1.00 |
| ML-DSA-44 sign | 782368 cycles | 794948 cycles | 0.98 |
| ML-DSA-44 verify | 230498 cycles | 230029 cycles | 1.00 |
| ML-DSA-65 keypair | 385625 cycles | 385839 cycles | 1.00 |
| ML-DSA-65 sign | 1309399 cycles | 1307192 cycles | 1.00 |
| ML-DSA-65 verify | 375860 cycles | 376224 cycles | 1.00 |
| ML-DSA-87 keypair | 607551 cycles | 606942 cycles | 1.00 |
| ML-DSA-87 sign | 1624842 cycles | 1625604 cycles | 1.00 |
| ML-DSA-87 verify | 618005 cycles | 617304 cycles | 1.00 |
oqs-bot
left a comment
Intel Xeon 3rd gen (c6i)
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 57069 cycles | 57296 cycles | 1.00 |
| ML-DSA-44 sign | 180312 cycles | 180557 cycles | 1.00 |
| ML-DSA-44 verify | 61275 cycles | 61279 cycles | 1.00 |
| ML-DSA-65 keypair | 99888 cycles | 99806 cycles | 1.00 |
| ML-DSA-65 sign | 296356 cycles | 296199 cycles | 1.00 |
| ML-DSA-65 verify | 100376 cycles | 100207 cycles | 1.00 |
| ML-DSA-87 keypair | 154114 cycles | 154547 cycles | 1.00 |
| ML-DSA-87 sign | 353395 cycles | 352845 cycles | 1.00 |
| ML-DSA-87 verify | 152549 cycles | 152958 cycles | 1.00 |
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 826071 cycles | 824630 cycles | 1.00 |
| ML-DSA-44 sign | 3328025 cycles | 3328785 cycles | 1.00 |
| ML-DSA-44 verify | 919467 cycles | 918716 cycles | 1.00 |
| ML-DSA-65 keypair | 1403179 cycles | 1404566 cycles | 1.00 |
| ML-DSA-65 sign | 5468362 cycles | 5454585 cycles | 1.00 |
| ML-DSA-65 verify | 1466427 cycles | 1465675 cycles | 1.00 |
| ML-DSA-87 keypair | 2303521 cycles | 2303418 cycles | 1.00 |
| ML-DSA-87 sign | 6816366 cycles | 6808316 cycles | 1.00 |
| ML-DSA-87 verify | 2403866 cycles | 2402810 cycles | 1.00 |
oqs-bot
left a comment
Graviton4 (no-opt)
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 128332 cycles | 128280 cycles | 1.00 |
| ML-DSA-44 sign | 456357 cycles | 457208 cycles | 1.00 |
| ML-DSA-44 verify | 136286 cycles | 136322 cycles | 1.00 |
| ML-DSA-65 keypair | 220788 cycles | 220689 cycles | 1.00 |
| ML-DSA-65 sign | 746911 cycles | 746178 cycles | 1.00 |
| ML-DSA-65 verify | 220358 cycles | 220384 cycles | 1.00 |
| ML-DSA-87 keypair | 365198 cycles | 365074 cycles | 1.00 |
| ML-DSA-87 sign | 944750 cycles | 944382 cycles | 1.00 |
| ML-DSA-87 verify | 368963 cycles | 368862 cycles | 1.00 |
oqs-bot
left a comment
Graviton3 (no-opt)
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 138815 cycles | 138766 cycles | 1.00 |
| ML-DSA-44 sign | 493048 cycles | 493633 cycles | 1.00 |
| ML-DSA-44 verify | 148351 cycles | 148356 cycles | 1.00 |
| ML-DSA-65 keypair | 242409 cycles | 242242 cycles | 1.00 |
| ML-DSA-65 sign | 809815 cycles | 809933 cycles | 1.00 |
| ML-DSA-65 verify | 240722 cycles | 240609 cycles | 1.00 |
| ML-DSA-87 keypair | 396739 cycles | 396619 cycles | 1.00 |
| ML-DSA-87 sign | 1027495 cycles | 1027094 cycles | 1.00 |
| ML-DSA-87 verify | 401595 cycles | 401371 cycles | 1.00 |
oqs-bot
left a comment
Intel Xeon 3rd gen (c6i) (no-opt)
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 159949 cycles | 157993 cycles | 1.01 |
| ML-DSA-44 sign | 571755 cycles | 566309 cycles | 1.01 |
| ML-DSA-44 verify | 172156 cycles | 169599 cycles | 1.02 |
| ML-DSA-65 keypair | 271448 cycles | 271036 cycles | 1.00 |
| ML-DSA-65 sign | 926175 cycles | 925911 cycles | 1.00 |
| ML-DSA-65 verify | 276594 cycles | 275788 cycles | 1.00 |
| ML-DSA-87 keypair | 451462 cycles | 451772 cycles | 1.00 |
| ML-DSA-87 sign | 1183639 cycles | 1183487 cycles | 1.00 |
| ML-DSA-87 verify | 461547 cycles | 460706 cycles | 1.00 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 309643 cycles | 304613 cycles | 1.02 |
| ML-DSA-44 sign | 1177956 cycles | 1238978 cycles | 0.95 |
| ML-DSA-44 verify | 327779 cycles | 342557 cycles | 0.96 |
| ML-DSA-65 keypair | 558913 cycles | 574348 cycles | 0.97 |
| ML-DSA-65 sign | 1934782 cycles | 2000543 cycles | 0.97 |
| ML-DSA-65 verify | 521783 cycles | 544249 cycles | 0.96 |
| ML-DSA-87 keypair | 865936 cycles | 896114 cycles | 0.97 |
| ML-DSA-87 sign | 2492678 cycles | 2558078 cycles | 0.97 |
| ML-DSA-87 verify | 895228 cycles | 929425 cycles | 0.96 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 115721 cycles | 115633 cycles | 1.00 |
| ML-DSA-44 sign | 377208 cycles | 377229 cycles | 1.00 |
| ML-DSA-44 verify | 120332 cycles | 120215 cycles | 1.00 |
| ML-DSA-65 keypair | 200117 cycles | 200075 cycles | 1.00 |
| ML-DSA-65 sign | 622821 cycles | 622903 cycles | 1.00 |
| ML-DSA-65 verify | 198196 cycles | 198200 cycles | 1.00 |
| ML-DSA-87 keypair | 327645 cycles | 326771 cycles | 1.00 |
| ML-DSA-87 sign | 791183 cycles | 789996 cycles | 1.00 |
| ML-DSA-87 verify | 325316 cycles | 324410 cycles | 1.00 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 214094 cycles | 213856 cycles | 1.00 |
| ML-DSA-44 sign | 781550 cycles | 782246 cycles | 1.00 |
| ML-DSA-44 verify | 230039 cycles | 230309 cycles | 1.00 |
| ML-DSA-65 keypair | 385262 cycles | 385241 cycles | 1.00 |
| ML-DSA-65 sign | 1327081 cycles | 1314056 cycles | 1.01 |
| ML-DSA-65 verify | 375526 cycles | 375747 cycles | 1.00 |
| ML-DSA-87 keypair | 606484 cycles | 606555 cycles | 1.00 |
| ML-DSA-87 sign | 1621418 cycles | 1622829 cycles | 1.00 |
| ML-DSA-87 verify | 617191 cycles | 617434 cycles | 1.00 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 292600 cycles | 292046 cycles | 1.00 |
| ML-DSA-44 sign | 929575 cycles | 928646 cycles | 1.00 |
| ML-DSA-44 verify | 296239 cycles | 295851 cycles | 1.00 |
| ML-DSA-65 keypair | 492833 cycles | 494958 cycles | 1.00 |
| ML-DSA-65 sign | 1524447 cycles | 1521425 cycles | 1.00 |
| ML-DSA-65 verify | 481202 cycles | 484056 cycles | 0.99 |
| ML-DSA-87 keypair | 838549 cycles | 846764 cycles | 0.99 |
| ML-DSA-87 sign | 2061507 cycles | 2058225 cycles | 1.00 |
| ML-DSA-87 verify | 820738 cycles | 820143 cycles | 1.00 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)
| Benchmark suite | Current: 37d7c09 | Previous: 314ad8e | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 469687 cycles | 469757 cycles | 1.00 |
| ML-DSA-44 sign | 2212428 cycles | 2214917 cycles | 1.00 |
| ML-DSA-44 verify | 550456 cycles | 550782 cycles | 1.00 |
| ML-DSA-65 keypair | 782601 cycles | 782351 cycles | 1.00 |
| ML-DSA-65 sign | 3630929 cycles | 3630275 cycles | 1.00 |
| ML-DSA-65 verify | 853006 cycles | 848857 cycles | 1.00 |
| ML-DSA-87 keypair | 1263087 cycles | 1268638 cycles | 1.00 |
| ML-DSA-87 sign | 4489471 cycles | 4502481 cycles | 1.00 |
| ML-DSA-87 verify | 1373034 cycles | 1369173 cycles | 1.00 |
Branch force-pushed from 37d7c09 to c685654.
dev/aarch64_clean/src/mld_polyvecl_pointwise_acc_montgomery_l4.S
Outdated
Branch force-pushed from 0f08ab9 to db7d706.
```asm
pmull v24, v25, v16, v20
pmull v26, v27, v17, v21
pmull v28, v29, v18, v22
pmull v30, v31, v19, v23
```
General comment, which I didn't notice when the code was introduced: we should really use symbolic register names (via `.req`), not architectural ones. It's much easier to read.
Good idea. Thanks!
While coming up with names for the four registers holding coefficients from a and b, I wanted to use a0, ..., a3 and b0, ..., b3. However, b0, ..., b3 are already built-in register names (the 8-bit views of the SIMD&FP registers).
I'll resort to using a_0, ..., a_3 and b_0, ..., b_3 for now. Suggestions for other naming schemes are welcome!
```asm
// load -q^-1 = 4236238847
movz wtmp, #57343
movk wtmp, #64639, lsl #16
// load q^-1 = 58728449
```
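The two instructions above build a 32-bit constant from 16-bit halves: `movz` writes the low half and zeroes the rest of the register, and `movk ..., lsl #16` overwrites bits 31:16 while keeping the low half. The C model below (`movz_movk` and `split_imm32` are hypothetical helpers) checks the old immediates against 4236238847 and derives the split for q^-1 = 58728449. The new immediates are not visible in this excerpt, so the 8193 / 896 split is computed here, not quoted from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the 32-bit W-register result of:
 *   movz wtmp, #lo           -> wtmp = lo, all other bits cleared
 *   movk wtmp, #hi, lsl #16  -> bits [31:16] = hi, bits [15:0] kept */
static uint32_t movz_movk(uint16_t lo, uint16_t hi)
{
  uint32_t w = lo;                                /* movz */
  w = (w & 0x0000FFFFu) | ((uint32_t)hi << 16);   /* movk ..., lsl #16 */
  return w;
}

/* Split a 32-bit constant into the (lo, hi) immediates for movz/movk. */
static void split_imm32(uint32_t c, uint16_t *lo, uint16_t *hi)
{
  *lo = (uint16_t)(c & 0xFFFFu);
  *hi = (uint16_t)(c >> 16);
}
```

For example, `split_imm32(58728449u, &lo, &hi)` yields `lo = 8193` and `hi = 896`, and the two constants add up to 2^32, as expected for q^-1 and -q^-1 modulo 2^32.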
We don't support it in assembly files yet, but can you add:
`/* check-magic: 58728449 == unsigned_mod(pow(MLDSA_Q, -1, 2^32), 2^32) */`
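Until check-magic can scan assembly files, the annotated equation can be checked by hand. The C sketch below is only a stand-in for that tooling, not part of it: `inv_mod_2_32` is a hypothetical helper that inverts an odd number modulo 2^32 by Newton (Hensel) iteration, which is enough to confirm that 58728449 is q^-1 mod 2^32 and that 4236238847 is its negation.

```c
#include <assert.h>
#include <stdint.h>

#define MLDSA_Q 8380417u /* q = 2^23 - 2^13 + 1 */

/* Inverse of an odd number modulo 2^32 via x <- x * (2 - a*x).
 * If a*x == 1 (mod 2^k), the updated x satisfies a*x == 1 (mod 2^(2k)).
 * Starting from x = a (a*a == 1 mod 8 for odd a), four steps give
 * 3 -> 6 -> 12 -> 24 -> 48 >= 32 correct low bits. */
static uint32_t inv_mod_2_32(uint32_t a)
{
  assert(a & 1u); /* only odd numbers are invertible modulo 2^32 */
  uint32_t x = a;
  for (int i = 0; i < 4; i++)
  {
    x *= 2u - a * x; /* unsigned arithmetic wraps modulo 2^32 */
  }
  return x;
}
```

`inv_mod_2_32(MLDSA_Q)` returns 58728449, and `0u - 58728449u` is 4236238847, the -q^-1 constant loaded by the previous version of the code.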
Good idea. Working on it now.
Branch force-pushed from db7d706 to de20938.
Signed-off-by: jammychiou1 <jammy.chiou1@gmail.com>
Signed-off-by: jammychiou1 <jammy.chiou1@gmail.com>
The Montgomery reduction is changed to the positive/subtractive variant to match `mld_montgomery_reduce()`. This avoids having to explain that the positive and negative variants have exactly the same bounds despite their different appearance.
Signed-off-by: jammychiou1 <jammy.chiou1@gmail.com>
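To illustrate why the two forms need separate (if identical) bound arguments, here is a C sketch assuming reference-style formulas. `mont_reduce_sub` and `mont_reduce_add` are hypothetical names, and this is not a copy of `mld_montgomery_reduce()`. Both return a value congruent to a·2^-32 mod q with |r| < q whenever |a| < 2^31·q, but their outputs are not always identical: when a·q^-1 ≡ 2^31 (mod 2^32), both signed lifts equal -2^31 and the results differ by exactly q.

```c
#include <assert.h>
#include <stdint.h>

#define MLDSA_Q 8380417
#define QINV 58728449u                              /* q^-1 mod 2^32 */
#define MONT_BOUND (((int64_t)1 << 31) * MLDSA_Q)   /* input bound 2^31 * q */

/* Positive/subtractive form:
 *   t = a * q^-1 mod 2^32, lifted to a signed value in [-2^31, 2^31)
 *   r = (a - t*q) / 2^32, exact because a - t*q == 0 (mod 2^32)
 * |a| < 2^31*q and |t*q| <= 2^31*q give |a - t*q| < 2^32*q, so |r| < q. */
static int32_t mont_reduce_sub(int64_t a)
{
  assert(a > -MONT_BOUND && a < MONT_BOUND);
  /* uint32 -> int32 is implementation-defined above INT32_MAX;
   * GCC and Clang wrap modulo 2^32, which is what is intended here. */
  int32_t t = (int32_t)((uint32_t)a * QINV);
  /* >> on a negative int64 is assumed to be an arithmetic shift. */
  return (int32_t)((a - (int64_t)t * MLDSA_Q) >> 32);
}

/* Negative/additive form: t' = a * (-q^-1) mod 2^32, r' = (a + t'*q) / 2^32.
 * The same argument gives |r'| < q. */
static int32_t mont_reduce_add(int64_t a)
{
  assert(a > -MONT_BOUND && a < MONT_BOUND);
  int32_t t = (int32_t)((uint32_t)a * (0u - QINV));
  return (int32_t)((a + (int64_t)t * MLDSA_Q) >> 32);
}
```

For example, a = 2^31 hits the corner case: the subtractive form returns 4190209 and the additive form returns -4190208. Both lie in (-q, q) and differ by q, so the bound is the same even though the values are not.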
Signed-off-by: jammychiou1 <jammy.chiou1@gmail.com>
Signed-off-by: jammychiou1 <jammy.chiou1@gmail.com>
Branch force-pushed from de20938 to 4b0bcb6.
Thank you @hanno-becker and @mkannwischer for your suggestions. I've implemented them in the new patch. I also added annotations for the ranges of intermediates inside decompose.
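For reference while checking those annotations, here is a sketch of the `decompose()` rounding from the pq-crystals Dilithium reference C code, with intermediate ranges written next to each step. It assumes the input is already in [0, q) and that the AArch64 routine uses the same multiply-and-shift trick, which this thread does not confirm. `decompose_ref` is a hypothetical name, and the ranges are given for γ2 = (q-1)/32 and γ2 = (q-1)/88.

```c
#include <assert.h>
#include <stdint.h>

#define MLDSA_Q 8380417

/* Returns a1 and writes a0 such that a == a1*2*gamma2 + a0 (mod q),
 * with a0 in [-gamma2, gamma2]. Input: a in [0, q). */
static int32_t decompose_ref(int32_t *a0, int32_t a, int32_t gamma2)
{
  int32_t a1;
  assert(a >= 0 && a < MLDSA_Q);

  a1 = (a + 127) >> 7;                    /* ceil(a / 128), in [0, 65472] */
  if (gamma2 == (MLDSA_Q - 1) / 32)
  {
    /* a1*1025 + 2^21 <= 69205952 < 2^31, so no overflow; a1 in [0, 16] */
    a1 = (a1 * 1025 + (1 << 21)) >> 22;
    a1 &= 15;                             /* wrap 16 -> 0; a1 in [0, 15] */
  }
  else
  {
    assert(gamma2 == (MLDSA_Q - 1) / 88);
    /* a1*11275 + 2^23 <= 746585408 < 2^31, so no overflow; a1 in [0, 44] */
    a1 = (a1 * 11275 + (1 << 23)) >> 24;
    /* wrap 44 -> 0 (assumes arithmetic >> on negative int32); a1 in [0, 43] */
    a1 ^= ((43 - a1) >> 31) & a1;
  }

  *a0 = a - a1 * 2 * gamma2;
  /* Subtract q if a0 > (q-1)/2: only happens in the wrap-around case,
   * where a1 was reset to 0 and a0 == a is close to q. */
  *a0 -= (((MLDSA_Q - 1) / 2 - *a0) >> 31) & MLDSA_Q;
  return a1;
}
```

An exhaustive loop over all a in [0, q) for both values of γ2 is cheap (about 8.4 million iterations each) and is a simple way to double-check range annotations like these.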
Towards resolving #529 (Add bounds reasoning comments to AArch64 backend). Still need to do decompose.
Note that the Montgomery reduction within the basemuls is changed to the positive/subtractive variant in order to match `mld_montgomery_reduce()` and reuse the same bound (and reasoning, when we add it later in #602). The performance is expected to be identical.