@jakemas
Contributor

@jakemas jakemas commented Nov 7, 2025

This PR includes:

  • An update to s2n-bignum version to support proofs
  • Addition of list_proofs.sh, build-proof.sh, dump_bytecode.ml utilities
  • Addition of mldsa-specs.ml and mldsa-utils.ml to support HOL-Light proofs
  • Addition of first HOL-Light proof for ML-DSA x86 AVX2 NTT in mldsa_ntt.ml
  • Addition of hol_light to scripts/tests to wrap and run proofs
  • READMEs for usage descriptions
  • Autogen updated to use dump_bytecode.ml via the --update-hol-light-bytecode argument (add --dry-run to check without updating)
  • Autogen of HOL-Light assembly files via gen_hol_light_asm_file and gen_hol_light_asm
  • Three new CI jobs for HOL-Light proofs

See the added proofs/hol_light/x86_64/README.md for details.

Running make -C proofs/hol_light/x86_64 produces an mldsa_ntt.correct file that ends in the following:

Running time: 18147.000000 sec, Start unixtime: 1762547981.000000, End unixtime: 1762566128.000000

(About 5 hours on my test EC2 m5.2xlarge; successful in 249 minutes in CI.)
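
As a quick cross-check of the figures above, the reported running time should equal the difference between the two Unix timestamps. A minimal sketch using only the numbers from the log line (the variable names are illustrative):

```python
# Values copied from the "Running time" line of mldsa_ntt.correct.
start_unixtime = 1762547981
end_unixtime = 1762566128

elapsed = end_unixtime - start_unixtime   # seconds between start and end
hours = elapsed / 3600                    # convert seconds to hours

print(f"{elapsed} s is about {hours:.2f} h")
```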

Discussion

There is an issue where the x86 bytecode generated by clang on an ARM Mac differs from the bytecode generated by gcc on Linux. clang swaps the source operands of some instructions (VPADDD etc.) to get a shorter encoding. As a result, the define_assert_from_elf check fails when compiling with clang on an ARM Mac, because the proof is written against the machine code as compiled with gcc on Linux (I do have proofs for both). We will have the same issue in mlkem-native when the x86 proofs arrive. So I am looking for opinions on: (1) accepting that the x86 proofs do not work in clang ARM Mac environments; (2) fixing the selective swapping of source operands to VPADDD to "optimise" the instructions down to the shorter size on gcc Linux, so both compile consistently; (3) adding both proof types. Likely everything but (1) is out of scope for this PR -- just keeping notes.

@jakemas jakemas marked this pull request as ready for review November 8, 2025 02:22
@jakemas jakemas requested a review from a team as a code owner November 8, 2025 02:22
Contributor

@mkannwischer mkannwischer left a comment

Thanks @jakemas - this is awesome progress!

I tried to run the proofs on a c8g.4xlarge in the nix develop .#hol_light shell, but
it does not work as x86_64-linux-gnu-objdump and x86_64-linux-gnu-as are not installed.
We can probably fix cross compilation in a follow-up PR -- running it on x86 now.

ubuntu@ip-172-31-34-86:~/mldsa-native$ tests hol_light -p mldsa_ntt
INFO  > HOL_LIGHT (1/1)    None        (native):         Starting HOL-Light proof for mldsa_ntt
ERROR > HOL_LIGHT (1/1)    None        (native):            FAILED (after 0s)
ERROR > HOL_LIGHT (1/1)    None        (native):         sh: line 1: x86_64-linux-gnu-as: command not found
sh: line 1: x86_64-linux-gnu-objdump: command not found
sh: line 1: x86_64-linux-gnu-as: command not found
make: *** [Makefile:70: mldsa/mldsa_ntt.o] Error 127

1 tests FAILED
* HOL-Light proof for mldsa_ntt
ubuntu@ip-172-31-34-86:~/mldsa-native$ make -C proofs/hol_light/x86_64 
make: Entering directory '/home/ubuntu/mldsa-native/proofs/hol_light/x86_64'
Preparing mldsa/mldsa_ntt.o ...
sh: line 1: x86_64-linux-gnu-as: command not found
AS: 
sh: line 1: x86_64-linux-gnu-objdump: command not found
OBJDUMP: 
[ -d mldsa ] || mkdir -p mldsa
cat mldsa/mldsa_ntt.S | gcc -E -xassembler-with-cpp - | tr ';' '\n' | x86_64-linux-gnu-as -o mldsa/mldsa_ntt.o -
sh: line 1: x86_64-linux-gnu-as: command not found
make: *** [Makefile:70: mldsa/mldsa_ntt.o] Error 127
make: Leaving directory '/home/ubuntu/mldsa-native/proofs/hol_light/x86_64'
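
The failure above only surfaces once make invokes the assembler. A pre-flight check could report the missing cross tools up front. The following is a minimal sketch, not part of the PR: the tool names come from the log above, while the helper `missing_tools` is hypothetical.

```python
import shutil

# Tool names taken from the error log above; the helper itself is hypothetical.
REQUIRED_TOOLS = ("x86_64-linux-gnu-as", "x86_64-linux-gnu-objdump")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing cross tools:", ", ".join(missing))
    else:
        print("all cross tools found")
```

The `which` parameter exists only so the lookup can be stubbed out in tests.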

mkannwischer added a commit that referenced this pull request Nov 8, 2025
This aligns the lint script with mlkem-native:
 - Fixing a typo that was discovered in #640
 - Uncommenting the shell linting

I have no memories why the latter was commented out.

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Contributor

@mkannwischer mkannwischer left a comment

Thanks @jakemas. I took the liberty to also add it to the root README.

I verified that the proof runs okay.
I tested on an m4.4xlarge machine and the proofs completed after 20393s.

I tried to understand the main theorem that is being proven - the bounds look good and are indeed much tighter than what we require.

It currently only works on x86, but we can address that in a follow up.
Here is a list of follow-up issues that we should tackle:

@hanno-becker, WDYT?

Contributor

@hanno-becker hanno-becker left a comment

Thanks a lot @jakemas @jargh for the work! 🎉 It's great to see the first correctness proof for x86_64, let alone such a complicated one.

A few things:

  1. The documentation still needs adjusting after copy-over from mlkem-native. Most importantly, we cannot claim to have verified all x86_64 assembly, which is currently stated at the top of proofs/hol_light/x86_64/README.md.
  2. I would like to stick to a homogeneous assembly syntax, for now AT&T everywhere. We can uniformly change this at a later point, but I don't want different syntaxes across mlkem-native and mldsa-native, or even within mldsa-native.
  3. We should have CI and autogen in place before we merge.

@jakemas Can you have a stab at those? 1 and 2 should be straightforward (note that simpasm should give you AT&T by default); if you have issues with 3, let us know. I may get to it as well, but the alpha release has priority.

@hanno-becker
Contributor

> (2) fix the selective swapping of source operands to VPADDD to "optimise" the instructions down to the shorter size on gcc Linux, so both compile consistently

If we can avoid the issue by swapping VPADD operands, I think we should do that. It should have no impact on the proof, right?

@jakemas
Contributor Author

jakemas commented Nov 10, 2025

> Thanks a lot @jakemas @jargh for the work! 🎉 It's great to see the first correctness proof for x86_64, let alone such a complicated one.
>
> A few things:
>
>   1. The documentation still needs adjusting after copy-over from mlkem-native. Most importantly, we cannot claim to have verified all x86_64 assembly, which is currently stated at the top of proofs/hol_light/x86_64/README.md.
>   2. I would like to stick to a homogeneous assembly syntax, for now AT&T everywhere. We can uniformly change this at a later point, but I don't want different syntaxes across mlkem-native and mldsa-native, or even within mldsa-native.
>   3. We should have CI and autogen in place before we merge.
>
> @jakemas Can you have a stab at those? 1 and 2 should be straightforward (note that simpasm should give you AT&T by default); if you have issues with 3, let us know. I may get to it as well, but the alpha release has priority.

Ok, fixed (1) and (2). I will test on my x86 machine, then clean up the commits. (I'd rather add the non-simpasm assembly as it appears upstream (https://github.com/pq-crystals/dilithium/blob/master/avx2/ntt.S), with macros etc., to demonstrate clearly the applicability to the reference. Fortunately, John requires in s2n-bignum that the AT&T versions are generated from the Intel ones, so we have both.)

>   3. We should have CI and autogen in place before we merge.

By this, do you mean that these should merge first:
#645, #646, #647, #648?

I ask because the bytecode check happens as part of CI. Should it be part of this PR or another?

@jakemas
Contributor Author

jakemas commented Nov 10, 2025

Testing - Automatic bytecode generation/insertion

Ok, I hooked up the dump_bytecode.ml utility to autogen. I verified that changing the first byte of the bytecode to 0x99, e.g. in mldsa_ntt.ml:

let mldsa_ntt_mc = define_assert_from_elf "mldsa_ntt_mc" "mldsa/mldsa_ntt.o"
(*** BYTECODE START ***)
[
  0x99; 0x0f; 0x1e; 0xfa;  (* ENDBR64 *)

was correctly found by python scripts/autogen --update-hol-light-bytecode --dry-run

error proofs/hol_light/x86_64/proofs/mldsa_ntt.ml
Autogenerated file proofs/hol_light/x86_64/proofs/mldsa_ntt.ml needs updating. Have you called scripts/autogen? Wrote new version to proofs/hol_light/x86_64/proofs/mldsa_ntt.ml.new.
20c20
<   0x99; 0x0f; 0x1e; 0xfa;  (* ENDBR64 *)
---
>   0xf3; 0x0f; 0x1e; 0xfa;  (* ENDBR64 *)

Running python scripts/autogen --update-hol-light-bytecode then fixed the bytecode, which I verified by running python scripts/autogen --update-hol-light-bytecode --dry-run again to see:

✓ Finalized and checked files are up to date (8.3s)
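
The check exercised above amounts to comparing the byte list embedded in the .ml file against the bytes dumped from the object file. The following is a minimal sketch of that comparison, not the actual scripts/autogen implementation: the helpers `parse_bytecode` and `first_mismatch` are hypothetical, and the input strings mimic the OCaml list syntax shown above.

```python
import re


def parse_bytecode(ml_text):
    """Extract byte values from an OCaml list such as '[ 0xf3; 0x0f; (* ... *) ]'."""
    # Drop OCaml comments first so nothing inside them is parsed as a byte.
    without_comments = re.sub(r"\(\*.*?\*\)", "", ml_text, flags=re.DOTALL)
    return [int(tok, 16) for tok in re.findall(r"0x[0-9a-fA-F]{2}\b", without_comments)]


def first_mismatch(embedded, expected):
    """Return (index, embedded_byte, expected_byte) for the first difference, else None."""
    for i, (a, b) in enumerate(zip(embedded, expected)):
        if a != b:
            return (i, a, b)
    if len(embedded) != len(expected):
        # One list is a strict prefix of the other; report where they diverge.
        return (min(len(embedded), len(expected)), None, None)
    return None


# The tampered first byte from the experiment above versus the regenerated one.
embedded = parse_bytecode("[ 0x99; 0x0f; 0x1e; 0xfa;  (* ENDBR64 *) ]")
expected = parse_bytecode("[ 0xf3; 0x0f; 0x1e; 0xfa;  (* ENDBR64 *) ]")
print(first_mismatch(embedded, expected))  # (0, 153, 243)
```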

To generate the bytecode:

cd proofs/hol_light/x86_64 && make dump_bytecode
=== bytecode start: mldsa/mldsa_ntt.o ================
[
  0xf3; 0x0f; 0x1e; 0xfa;  (* ENDBR64 *)
  0xc5; 0xfd; 0x6f; 0x06;  (* VMOVDQA (%_% ymm0) (Memop Word256 (%% (rsi,0))) *)
  0xc4; 0xe2; 0x7d; 0x58; 0x8e; 0x84; 0x00; 0x00; 0x00;
                           (* VPBROADCASTD (%_% ymm1) (Memop Doubleword (%% (rsi,132))) *)
  0xc4; 0xe2; 0x7d; 0x58; 0x96; 0x24; 0x05; 0x00; 0x00;
                           (* VPBROADCASTD (%_% ymm2) (Memop Doubleword (%% (rsi,1316))) *)
  0xc5; 0xfd; 0x6f; 0x27;  (* VMOVDQA (%_% ymm4) (Memop Word256 (%% (rdi,0))) *)
  0xc5; 0xfd; 0x6f; 0xaf; 0x80; 0x00; 0x00; 0x00;
                           (* VMOVDQA (%_% ymm5) (Memop Word256 (%% (rdi,128))) *)
  0xc5; 0xfd; 0x6f; 0xb7; 0x00; 0x01; 0x00; 0x00;
                           (* VMOVDQA (%_% ymm6) (Memop Word256 (%% (rdi,256))) *)
  0xc5; 0xfd; 0x6f; 0xbf; 0x80; 0x01; 0x00; 0x00;
                           (* VMOVDQA (%_% ymm7) (Memop Word256 (%% (rdi,384))) *)
  0xc5; 0x7d; 0x6f; 0x87; 0x00; 0x02; 0x00; 0x00;
                           (* VMOVDQA (%_% ymm8) (Memop Word256 (%% (rdi,512))) *)
  0xc5; 0x7d; 0x6f; 0x8f; 0x80; 0x02; 0x00; 0x00;
                           (* VMOVDQA (%_% ymm9) (Memop Word256 (%% (rdi,640))) *)
  0xc5; 0x7d; 0x6f; 0x97; 0x00; 0x03; 0x00; 0x00;
                           (* VMOVDQA (%_% ymm10) (Memop Word256 (%% (rdi,768))) *)
  0xc5; 0x7d; 0x6f; 0x9f; 0x80; 0x03; 0x00; 0x00;
                           (* VMOVDQA (%_% ymm11) (Memop Word256 (%% (rdi,896))) *)
  0xc4; 0x62; 0x3d; 0x28; 0xe9;
                           (* VPMULDQ (%_% ymm13) (%_% ymm8) (%_% ymm1) *)
  0xc4; 0x41; 0x7e; 0x1

...
                           (* VMOVDQA (Memop Word256 (%% (rdi,960))) (%_% ymm3) *)
  0xc5; 0x7d; 0x7f; 0x9f; 0xe0; 0x03; 0x00; 0x00;
                           (* VMOVDQA (Memop Word256 (%% (rdi,992))) (%_% ymm11) *)
  0xc3                     (* RET *)
];;
==== bytecode end =====================================

Running time: 96.000000 sec, Start unixtime: 1762801050.000000, End unixtime: 1762801146.000000

@jakemas
Contributor Author

jakemas commented Nov 10, 2025

HOL-Light Proofs in CI

I've added .github/workflows/hol_light.yml, which adds some of this to CI (#645). I have not added CI jobs before, but I looked at the existing examples. I chose x86 runners for the proofs to best emulate the running environment, but I don't know whether the instances are set up to run long enough, or are powerful enough, for the proofs. Please take a look, @mkannwischer, and let me know what you think of my CI setup.

As in mlkem-native, it is a three-part test:

The interactive shell test in mlkem-native (https://github.com/pq-code-package/mlkem-native/blob/main/.github/workflows/hol_light.yml#L66-L68) runs needs "proofs/mlkem_poly_tobytes.ml";;. However, in mldsa-native we have no small proof to run here. I tried running needs "proofs/mldsa_ntt.ml";; but running it in the interactive shell takes longer than the actual CI job for HOL-Light / HOL Light proof for mldsa_ntt.S. So instead, I think it's reasonable to test loading the mldsa-specific environment, i.e. needs "x86/proofs/base.ml";; and needs "proofs/mldsa_specs.ml";;. When I add a smaller proof, I will swap this out for needs "proofs/mldsa_small_proof";;.

@jakemas jakemas changed the title Hol-Light: Add Hol-light proof framework and Forward NTT proof Hol-Light: Add Hol-light proof framework and x86 AVX2 NTT proof Nov 10, 2025
@jakemas jakemas force-pushed the hol-light-ntt branch 2 times, most recently from c3596c3 to fdd4867 Compare November 11, 2025 03:09
Contributor

@mkannwischer mkannwischer left a comment

Thanks @jakemas. The CI looks good to me.
Three minor comments on autogen above.

Can you also autogenerate mldsa_ntt.S please (see gen_hol_light_asm in mlkem-native)?

@jakemas
Contributor Author

jakemas commented Nov 11, 2025

> Thanks @jakemas. The CI looks good to me. Three minor comments on autogen above.
>
> Can you also autogenerate mldsa_ntt.S please (see gen_hol_light_asm in mlkem-native)?

Okay, also added autogen.

@hanno-becker
Contributor

> By this do you mean these merge first:
> #645, #646, #647, #648 ?

#648 and #646 are not required for this PR (in my mind), but we should have #645, #647.

@jakemas
Contributor Author

jakemas commented Nov 11, 2025

> > By this do you mean these merge first:
> > #645, #646, #647, #648 ?
>
> #648 and #646 are not required for this PR (in my mind), but we should have #645, #647.

Ok, they are implemented in this PR now. I have kept the description up to date.

@jargh

jargh commented Nov 11, 2025

On the problem of clang-based and gcc-based assemblers behaving differently (the former selectively swapping arguments of commutative operations like VPADD to get a more compact encoding), there could be an argument for forcing the optimization by switching the order in the code, in the hope that it becomes more platform-independent. A possible issue with doing that simply via the clean code is that whether the optimization triggers depends on which registers are used, so it might be tricky to enforce at the macro level. It would be interesting to know exactly when the swap gets triggered. Maybe it's when one argument is >= ymm8 and the other is <= ymm7 (which I think is the only situation that makes the encoding shorter) or maybe it's simply when one register code is greater than another.
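
The first hypothesis can be made concrete from the VEX encoding rules. For a register-to-register instruction in opcode map 0F with W ignored (which covers VPADDD), the 2-byte VEX prefix (C5) has no B bit, so the second source operand, encoded in ModRM.rm, must be ymm0..ymm7. Otherwise the 3-byte prefix (C4) is needed. The sketch below models only this encoding-length rule. It does not claim to reproduce the exact condition under which clang performs the swap, and the function names are illustrative.

```python
def vex_prefix_len(src2):
    """VEX prefix length for a reg-reg op in opcode map 0F with W ignored.

    `src2` is the register number (0..15) of the second source operand,
    which is encoded in ModRM.rm. The 2-byte (C5) form cannot extend
    ModRM.rm, so ymm8..ymm15 there force the 3-byte (C4) form.
    """
    return 2 if src2 < 8 else 3


def swap_shortens(src1, src2):
    """True if swapping the sources of a commutative op saves one byte."""
    return vex_prefix_len(src1) < vex_prefix_len(src2)


# Enumerate all source register pairs for which a swap would help.
helpful = [(a, b) for a in range(16) for b in range(16) if swap_shortens(a, b)]
print(len(helpful))
```

Under this model a swap helps exactly when the first source is ymm0..ymm7 and the second is ymm8..ymm15. The destination register does not matter, because its extension bit (R) is available in both prefix forms.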

@jargh

jargh commented Nov 11, 2025

That is, I'd instinctively vote for option (2) above if it isn't too troublesome:

> (2) fix the selective swapping of source operands to VPADDD to "optimise" the instructions down to the shorter size on gcc Linux, so both compile consistently

Add HOL-Light framework with NTT proof. This includes:
- list_proofs.sh, build-proof.sh, dump_bytecode.ml utilities
- mldsa-specs.ml and mldsa-utils.ml to support HOL-Light proofs
- HOL-Light proof for ML-DSA x86 AVX2 NTT in mldsa_ntt.ml
- Addition of hol_light to scripts/tests to wrap and run proofs
- READMEs for usage descriptions
- Autogenerate embedded bytecode using dump_bytecode.ml
- Add CI jobs to test and run hol-light proofs
- Add autogen for asmb files

Signed-off-by: Jake Massimo <jakemas@amazon.com>
@hanno-becker hanno-becker changed the title Hol-Light: Add Hol-light proof framework and x86 AVX2 NTT proof HOL-Light: Add HOL-Lght proof framework and x86 AVX2 NTT proof Nov 12, 2025
@hanno-becker hanno-becker changed the title HOL-Light: Add HOL-Lght proof framework and x86 AVX2 NTT proof HOL-Light: Add HOL-Light proof framework and x86 AVX2 NTT proof Nov 12, 2025
@mkannwischer mkannwischer merged commit 76e23ff into main Nov 12, 2025
273 checks passed
@mkannwischer mkannwischer deleted the hol-light-ntt branch November 12, 2025 23:04

Development

Successfully merging this pull request may close these issues.

  • HOL-Light: Autogenerate assembly files
  • HOL-Light: Autogenerate embedded byte-code
  • HOL-Light: Run proofs in CI
  • Hol-Light: Prove AVX2 NTT

5 participants