@jammychiou1 (Contributor) commented Nov 11, 2025

Note that the Montgomery reduction within the basemuls is changed to the positive/subtractive variant, in order to match mld_montgomery_reduce() and reuse the same bound (and reasoning, when we add it later in #602). The performance is expected to be identical.
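
For readers comparing the two variants, here is a minimal scalar sketch in Python (not the AArch64 code from this PR). It assumes the usual 32-bit Montgomery setting for ML-DSA with q = 8380417 and q^-1 mod 2^32 = 58728449; the function names and the `to_int32` helper are hypothetical. Both variants return a value congruent to a * 2^-32 mod q, and since |t| <= 2^31 in both, the same bound |r| < q holds whenever |a| < 2^31 * q.

```python
Q = 8380417              # ML-DSA modulus
QINV = 58728449          # q^-1 mod 2^32
NEG_QINV = 2**32 - QINV  # -q^-1 mod 2^32 = 4236238847


def to_int32(x):
    """Hypothetical helper: wrap an integer to a signed 32-bit value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def montgomery_reduce_sub(a):
    """Positive/subtractive variant: t = a * q^-1, r = (a - t*q) / 2^32."""
    t = to_int32(a * QINV)
    # a - t*q is divisible by 2^32, so the shift is an exact division.
    return (a - t * Q) >> 32


def montgomery_reduce_add(a):
    """Negative/additive variant: t = a * (-q^-1), r = (a + t*q) / 2^32."""
    t = to_int32(a * NEG_QINV)
    # a + t*q is divisible by 2^32 as well.
    return (a + t * Q) >> 32
```

The two results coincide except when the low 32-bit product is exactly -2^31 (for example a = 2^31), where they differ by q: the subtractive variant gives 4190209 and the additive one gives -4190208. Both values still lie strictly between -q and q.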

@github-actions github-actions bot left a comment

Mac Mini (M1, 2020) benchmarks (opt)

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 46421 cycles 46419 cycles 1.00
ML-DSA-44 sign 132731 cycles 132731 cycles 1
ML-DSA-44 verify 47842 cycles 47839 cycles 1.00
ML-DSA-65 keypair 81446 cycles 81443 cycles 1.00
ML-DSA-65 sign 219217 cycles 219226 cycles 1.00
ML-DSA-65 verify 80140 cycles 80140 cycles 1
ML-DSA-87 keypair 132750 cycles 132771 cycles 1.00
ML-DSA-87 sign 280896 cycles 280924 cycles 1.00
ML-DSA-87 verify 130322 cycles 130330 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Mac Mini (M1, 2020) benchmarks (no-opt)

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 115261 cycles 115269 cycles 1.00
ML-DSA-44 sign 431714 cycles 431703 cycles 1.00
ML-DSA-44 verify 122155 cycles 122175 cycles 1.00
ML-DSA-65 keypair 197428 cycles 197429 cycles 1.00
ML-DSA-65 sign 700994 cycles 700966 cycles 1.00
ML-DSA-65 verify 197676 cycles 197672 cycles 1.00
ML-DSA-87 keypair 325419 cycles 325394 cycles 1.00
ML-DSA-87 sign 884517 cycles 884436 cycles 1.00
ML-DSA-87 verify 328665 cycles 328656 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 35434 cycles 35082 cycles 1.01
ML-DSA-44 sign 120642 cycles 120829 cycles 1.00
ML-DSA-44 verify 38292 cycles 38217 cycles 1.00
ML-DSA-65 keypair 63150 cycles 62792 cycles 1.01
ML-DSA-65 sign 201518 cycles 201578 cycles 1.00
ML-DSA-65 verify 63025 cycles 62619 cycles 1.01
ML-DSA-87 keypair 95825 cycles 94417 cycles 1.01
ML-DSA-87 sign 235564 cycles 230458 cycles 1.02
ML-DSA-87 verify 94899 cycles 93737 cycles 1.01

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 95193 cycles 95251 cycles 1.00
ML-DSA-44 sign 349301 cycles 349405 cycles 1.00
ML-DSA-44 verify 100948 cycles 100993 cycles 1.00
ML-DSA-65 keypair 165304 cycles 164441 cycles 1.01
ML-DSA-65 sign 567264 cycles 567124 cycles 1.00
ML-DSA-65 verify 165488 cycles 165226 cycles 1.00
ML-DSA-87 keypair 268465 cycles 268259 cycles 1.00
ML-DSA-87 sign 724063 cycles 724173 cycles 1.00
ML-DSA-87 verify 272280 cycles 272251 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 226343 cycles 235855 cycles 0.96
ML-DSA-44 sign 669578 cycles 697418 cycles 0.96
ML-DSA-44 verify 227619 cycles 235679 cycles 0.97
ML-DSA-65 keypair 413279 cycles 412522 cycles 1.00
ML-DSA-65 sign 1121384 cycles 1115820 cycles 1.00
ML-DSA-65 verify 395696 cycles 392924 cycles 1.01
ML-DSA-87 keypair 675837 cycles 654605 cycles 1.03
ML-DSA-87 sign 1486942 cycles 1432265 cycles 1.04
ML-DSA-87 verify 653743 cycles 637053 cycles 1.03

@github-actions github-actions bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-87 keypair 675837 cycles 654605 cycles 1.03
ML-DSA-87 sign 1486942 cycles 1432265 cycles 1.04
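
The rounded ratios can make it hard to see why only two rows were flagged. The sketch below recomputes them from the cycle counts above, assuming the alert compares the unrounded ratio current/previous against the 1.03 threshold (an assumption about the tool's behaviour, not something stated in this thread). The `verify` row is taken from the full Cortex-A72 (opt) table for contrast; the function names are hypothetical.

```python
THRESHOLD = 1.03  # alert threshold quoted in the alert text

# (benchmark, current cycles, previous cycles) from the Cortex-A72 (opt) table
ROWS = [
    ("ML-DSA-87 keypair", 675837, 654605),
    ("ML-DSA-87 sign", 1486942, 1432265),
    ("ML-DSA-87 verify", 653743, 637053),
]


def ratio(current, previous):
    """Unrounded slowdown factor; values above 1 mean the new commit is slower."""
    return current / previous


def displayed(current, previous):
    """Ratio rounded to two decimals, as shown in the tables."""
    return f"{ratio(current, previous):.2f}"


def flagged(rows, threshold=THRESHOLD):
    """Names of rows whose unrounded ratio exceeds the threshold."""
    return [name for name, cur, prev in rows if ratio(cur, prev) > threshold]
```

Under that assumption, `flagged(ROWS)` returns the keypair and sign rows only: keypair is displayed as 1.03 but is roughly 1.032 unrounded, while verify is also displayed as 1.03 but is roughly 1.026 and stays below the threshold.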

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 69634 cycles 69408 cycles 1.00
ML-DSA-44 sign 185866 cycles 185588 cycles 1.00
ML-DSA-44 verify 69632 cycles 69022 cycles 1.01
ML-DSA-65 keypair 119519 cycles 119527 cycles 1.00
ML-DSA-65 sign 296085 cycles 296151 cycles 1.00
ML-DSA-65 verify 115496 cycles 115310 cycles 1.00
ML-DSA-87 keypair 201515 cycles 201740 cycles 1.00
ML-DSA-87 sign 385521 cycles 385807 cycles 1.00
ML-DSA-87 verify 193326 cycles 193642 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton2

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 117109 cycles 115867 cycles 1.01
ML-DSA-44 sign 380933 cycles 377525 cycles 1.01
ML-DSA-44 verify 121685 cycles 120368 cycles 1.01
ML-DSA-65 keypair 200537 cycles 200218 cycles 1.00
ML-DSA-65 sign 623530 cycles 623046 cycles 1.00
ML-DSA-65 verify 198644 cycles 198371 cycles 1.00
ML-DSA-87 keypair 328184 cycles 327430 cycles 1.00
ML-DSA-87 sign 792111 cycles 790806 cycles 1.00
ML-DSA-87 verify 325565 cycles 324857 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 42714 cycles 42050 cycles 1.02
ML-DSA-44 sign 130840 cycles 130623 cycles 1.00
ML-DSA-44 verify 44222 cycles 44275 cycles 1.00
ML-DSA-65 keypair 72977 cycles 72347 cycles 1.01
ML-DSA-65 sign 210931 cycles 211393 cycles 1.00
ML-DSA-65 verify 73284 cycles 72769 cycles 1.01
ML-DSA-87 keypair 110228 cycles 109988 cycles 1.00
ML-DSA-87 sign 248784 cycles 248764 cycles 1.00
ML-DSA-87 verify 111640 cycles 109522 cycles 1.02

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 135099 cycles 135380 cycles 1.00
ML-DSA-44 sign 540292 cycles 539702 cycles 1.00
ML-DSA-44 verify 148192 cycles 148584 cycles 1.00
ML-DSA-65 keypair 228667 cycles 228833 cycles 1.00
ML-DSA-65 sign 890151 cycles 892086 cycles 1.00
ML-DSA-65 verify 238302 cycles 238446 cycles 1.00
ML-DSA-87 keypair 373734 cycles 372875 cycles 1.00
ML-DSA-87 sign 1105910 cycles 1106065 cycles 1.00
ML-DSA-87 verify 387102 cycles 387563 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton4

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 69576 cycles 69610 cycles 1.00
ML-DSA-44 sign 213709 cycles 213787 cycles 1.00
ML-DSA-44 verify 72626 cycles 72475 cycles 1.00
ML-DSA-65 keypair 123669 cycles 123352 cycles 1.00
ML-DSA-65 sign 350942 cycles 350469 cycles 1.00
ML-DSA-65 verify 120890 cycles 120415 cycles 1.00
ML-DSA-87 keypair 202348 cycles 201192 cycles 1.01
ML-DSA-87 sign 450126 cycles 449091 cycles 1.00
ML-DSA-87 verify 198811 cycles 197923 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 120412 cycles 120466 cycles 1.00
ML-DSA-44 sign 453079 cycles 455604 cycles 0.99
ML-DSA-44 verify 129899 cycles 129917 cycles 1.00
ML-DSA-65 keypair 205074 cycles 205155 cycles 1.00
ML-DSA-65 sign 734147 cycles 735520 cycles 1.00
ML-DSA-65 verify 210029 cycles 209895 cycles 1.00
ML-DSA-87 keypair 339889 cycles 337601 cycles 1.01
ML-DSA-87 sign 931618 cycles 924871 cycles 1.01
ML-DSA-87 verify 347721 cycles 345776 cycles 1.01

@oqs-bot oqs-bot left a comment

Graviton3

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 74307 cycles 74268 cycles 1.00
ML-DSA-44 sign 228607 cycles 228732 cycles 1.00
ML-DSA-44 verify 78259 cycles 78126 cycles 1.00
ML-DSA-65 keypair 130499 cycles 130397 cycles 1.00
ML-DSA-65 sign 378327 cycles 378266 cycles 1.00
ML-DSA-65 verify 129295 cycles 129145 cycles 1.00
ML-DSA-87 keypair 209579 cycles 211710 cycles 0.99
ML-DSA-87 sign 479323 cycles 479642 cycles 1.00
ML-DSA-87 verify 208629 cycles 210191 cycles 0.99

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 214694 cycles 214295 cycles 1.00
ML-DSA-44 sign 782368 cycles 794948 cycles 0.98
ML-DSA-44 verify 230498 cycles 230029 cycles 1.00
ML-DSA-65 keypair 385625 cycles 385839 cycles 1.00
ML-DSA-65 sign 1309399 cycles 1307192 cycles 1.00
ML-DSA-65 verify 375860 cycles 376224 cycles 1.00
ML-DSA-87 keypair 607551 cycles 606942 cycles 1.00
ML-DSA-87 sign 1624842 cycles 1625604 cycles 1.00
ML-DSA-87 verify 618005 cycles 617304 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 57069 cycles 57296 cycles 1.00
ML-DSA-44 sign 180312 cycles 180557 cycles 1.00
ML-DSA-44 verify 61275 cycles 61279 cycles 1.00
ML-DSA-65 keypair 99888 cycles 99806 cycles 1.00
ML-DSA-65 sign 296356 cycles 296199 cycles 1.00
ML-DSA-65 verify 100376 cycles 100207 cycles 1.00
ML-DSA-87 keypair 154114 cycles 154547 cycles 1.00
ML-DSA-87 sign 353395 cycles 352845 cycles 1.00
ML-DSA-87 verify 152549 cycles 152958 cycles 1.00

@github-actions github-actions bot left a comment

SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 826071 cycles 824630 cycles 1.00
ML-DSA-44 sign 3328025 cycles 3328785 cycles 1.00
ML-DSA-44 verify 919467 cycles 918716 cycles 1.00
ML-DSA-65 keypair 1403179 cycles 1404566 cycles 1.00
ML-DSA-65 sign 5468362 cycles 5454585 cycles 1.00
ML-DSA-65 verify 1466427 cycles 1465675 cycles 1.00
ML-DSA-87 keypair 2303521 cycles 2303418 cycles 1.00
ML-DSA-87 sign 6816366 cycles 6808316 cycles 1.00
ML-DSA-87 verify 2403866 cycles 2402810 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 128332 cycles 128280 cycles 1.00
ML-DSA-44 sign 456357 cycles 457208 cycles 1.00
ML-DSA-44 verify 136286 cycles 136322 cycles 1.00
ML-DSA-65 keypair 220788 cycles 220689 cycles 1.00
ML-DSA-65 sign 746911 cycles 746178 cycles 1.00
ML-DSA-65 verify 220358 cycles 220384 cycles 1.00
ML-DSA-87 keypair 365198 cycles 365074 cycles 1.00
ML-DSA-87 sign 944750 cycles 944382 cycles 1.00
ML-DSA-87 verify 368963 cycles 368862 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 138815 cycles 138766 cycles 1.00
ML-DSA-44 sign 493048 cycles 493633 cycles 1.00
ML-DSA-44 verify 148351 cycles 148356 cycles 1.00
ML-DSA-65 keypair 242409 cycles 242242 cycles 1.00
ML-DSA-65 sign 809815 cycles 809933 cycles 1.00
ML-DSA-65 verify 240722 cycles 240609 cycles 1.00
ML-DSA-87 keypair 396739 cycles 396619 cycles 1.00
ML-DSA-87 sign 1027495 cycles 1027094 cycles 1.00
ML-DSA-87 verify 401595 cycles 401371 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 159949 cycles 157993 cycles 1.01
ML-DSA-44 sign 571755 cycles 566309 cycles 1.01
ML-DSA-44 verify 172156 cycles 169599 cycles 1.02
ML-DSA-65 keypair 271448 cycles 271036 cycles 1.00
ML-DSA-65 sign 926175 cycles 925911 cycles 1.00
ML-DSA-65 verify 276594 cycles 275788 cycles 1.00
ML-DSA-87 keypair 451462 cycles 451772 cycles 1.00
ML-DSA-87 sign 1183639 cycles 1183487 cycles 1.00
ML-DSA-87 verify 461547 cycles 460706 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 309643 cycles 304613 cycles 1.02
ML-DSA-44 sign 1177956 cycles 1238978 cycles 0.95
ML-DSA-44 verify 327779 cycles 342557 cycles 0.96
ML-DSA-65 keypair 558913 cycles 574348 cycles 0.97
ML-DSA-65 sign 1934782 cycles 2000543 cycles 0.97
ML-DSA-65 verify 521783 cycles 544249 cycles 0.96
ML-DSA-87 keypair 865936 cycles 896114 cycles 0.97
ML-DSA-87 sign 2492678 cycles 2558078 cycles 0.97
ML-DSA-87 verify 895228 cycles 929425 cycles 0.96

@github-actions github-actions bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 115721 cycles 115633 cycles 1.00
ML-DSA-44 sign 377208 cycles 377229 cycles 1.00
ML-DSA-44 verify 120332 cycles 120215 cycles 1.00
ML-DSA-65 keypair 200117 cycles 200075 cycles 1.00
ML-DSA-65 sign 622821 cycles 622903 cycles 1.00
ML-DSA-65 verify 198196 cycles 198200 cycles 1.00
ML-DSA-87 keypair 327645 cycles 326771 cycles 1.00
ML-DSA-87 sign 791183 cycles 789996 cycles 1.00
ML-DSA-87 verify 325316 cycles 324410 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 214094 cycles 213856 cycles 1.00
ML-DSA-44 sign 781550 cycles 782246 cycles 1.00
ML-DSA-44 verify 230039 cycles 230309 cycles 1.00
ML-DSA-65 keypair 385262 cycles 385241 cycles 1.00
ML-DSA-65 sign 1327081 cycles 1314056 cycles 1.01
ML-DSA-65 verify 375526 cycles 375747 cycles 1.00
ML-DSA-87 keypair 606484 cycles 606555 cycles 1.00
ML-DSA-87 sign 1621418 cycles 1622829 cycles 1.00
ML-DSA-87 verify 617191 cycles 617434 cycles 1.00

@jammychiou1 jammychiou1 changed the title from "Add bounds reasoning comments to AVX2 [I]NTT and basemul" to "Add bounds reasoning comments to AArch64 [I]NTT and basemul" on Nov 11, 2025

@github-actions github-actions bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 292600 cycles 292046 cycles 1.00
ML-DSA-44 sign 929575 cycles 928646 cycles 1.00
ML-DSA-44 verify 296239 cycles 295851 cycles 1.00
ML-DSA-65 keypair 492833 cycles 494958 cycles 1.00
ML-DSA-65 sign 1524447 cycles 1521425 cycles 1.00
ML-DSA-65 verify 481202 cycles 484056 cycles 0.99
ML-DSA-87 keypair 838549 cycles 846764 cycles 0.99
ML-DSA-87 sign 2061507 cycles 2058225 cycles 1.00
ML-DSA-87 verify 820738 cycles 820143 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

Benchmark suite Current: 37d7c09 Previous: 314ad8e Ratio
ML-DSA-44 keypair 469687 cycles 469757 cycles 1.00
ML-DSA-44 sign 2212428 cycles 2214917 cycles 1.00
ML-DSA-44 verify 550456 cycles 550782 cycles 1.00
ML-DSA-65 keypair 782601 cycles 782351 cycles 1.00
ML-DSA-65 sign 3630929 cycles 3630275 cycles 1.00
ML-DSA-65 verify 853006 cycles 848857 cycles 1.00
ML-DSA-87 keypair 1263087 cycles 1268638 cycles 1.00
ML-DSA-87 sign 4489471 cycles 4502481 cycles 1.00
ML-DSA-87 verify 1373034 cycles 1369173 cycles 1.00

@jammychiou1 jammychiou1 force-pushed the aarch64-bound-comments branch from 37d7c09 to c685654 Compare November 11, 2025 09:19
@jammychiou1 jammychiou1 marked this pull request as ready for review November 11, 2025 09:19
@jammychiou1 jammychiou1 requested a review from a team as a code owner November 11, 2025 09:19
@jammychiou1 jammychiou1 force-pushed the aarch64-bound-comments branch 2 times, most recently from 0f08ab9 to db7d706 Compare November 11, 2025 16:53
@hanno-becker (Contributor)

Stopping CI for now to free resources for #668 and, then, main. I believe we need to explicitly trigger a nix cache build on main once #668 is merged.

Comment on lines 75 to 78
pmull v24, v25, v16, v20
pmull v26, v27, v17, v21
pmull v28, v29, v18, v22
pmull v30, v31, v19, v23
Contributor

General comment (I didn't notice this when the code was introduced): we should really use symbolic register names (via .req) rather than architectural ones. It's much easier to read.

@jammychiou1 (Contributor Author) commented Nov 12, 2025

Good idea. Thanks!

While coming up with names for the 4 registers holding coefficients from a and b, I wanted to use a0, ..., a3, b0, ..., b3. However, b0, ..., b3 are already built-in register names.

I'll resort to using a_0, ..., a_3, b_0, ..., b_3 for now. Suggestions for other naming schemes are appreciated!

// load -q^-1 = 4236238847
movz wtmp, #57343
movk wtmp, #64639, lsl #16
// load q^-1 = 58728449
Contributor

We don't support it yet in assembly files, but can you add

/* check-magic: 58728449 == unsigned_mod(pow(MLDSA_Q, -1, 2^32), 2^32) */
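
As a quick sanity check of the constants in this snippet, here is a small Python sketch. It assumes MLDSA_Q = 8380417, reads `2^32` in the check-magic line as exponentiation (written `2**32` in Python), and treats `unsigned_mod` as a plain non-negative remainder; `movz_movk` is a hypothetical helper that models the value left in a 32-bit register by the `movz`/`movk` pair. `pow` with a negative exponent needs Python 3.8 or newer.

```python
MLDSA_Q = 8380417  # 2^23 - 2^13 + 1


def unsigned_mod(x, m):
    """Assumed semantics: remainder in [0, m). Python's % already does this for m > 0."""
    return x % m


# q^-1 mod 2^32, and its negation mod 2^32
QINV = unsigned_mod(pow(MLDSA_Q, -1, 2**32), 2**32)
NEG_QINV = unsigned_mod(-QINV, 2**32)


def movz_movk(lo16, hi16):
    """Value after `movz w, #lo16` followed by `movk w, #hi16, lsl #16`."""
    assert 0 <= lo16 < 1 << 16 and 0 <= hi16 < 1 << 16
    return (hi16 << 16) | lo16
```

With these definitions, `QINV` is 58728449, `NEG_QINV` is 4236238847, and `movz_movk(57343, 64639)` reproduces 4236238847, i.e. the two immediates in the snippet encode -q^-1 mod 2^32.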

Contributor Author

Good idea. Working on it now.

@jammychiou1 jammychiou1 force-pushed the aarch64-bound-comments branch from db7d706 to de20938 Compare November 12, 2025 03:33
Signed-off-by: jammychiou1 <jammy.chiou1@gmail.com>
Signed-off-by: jammychiou1 <jammy.chiou1@gmail.com>
The Montgomery reduction is changed to the positive/subtractive variant
to match mld_montgomery_reduce(). This avoids the need to explain that
the positive and the negative variants have exactly the same bounds
despite their different appearance.

Signed-off-by: jammychiou1 <jammy.chiou1@gmail.com>
Signed-off-by: jammychiou1 <jammy.chiou1@gmail.com>
Signed-off-by: jammychiou1 <jammy.chiou1@gmail.com>
@jammychiou1 jammychiou1 force-pushed the aarch64-bound-comments branch from de20938 to 4b0bcb6 Compare November 12, 2025 03:34
@jammychiou1 (Contributor Author)

Thank you @hanno-becker and @mkannwischer for your suggestions. I've implemented them in the new patch.

I also added annotations for the ranges of intermediate values inside decompose.

@jammychiou1 jammychiou1 changed the title from "Add bounds reasoning comments to AArch64 [I]NTT and basemul" to "Add bounds reasoning comments to AArch64 backend" on Nov 12, 2025
Development

Successfully merging this pull request may close these issues.

Add bounds reasoning comments to AArch64 backend

5 participants