Segmentation fault during embedBatch on ARM64 Linux (aarch64) #590
Description
node-llama-cpp segfaults during embedBatch calls on ARM64 Linux (aarch64). The crash occurs consistently when embedding documents using the embeddinggemma-300M-Q8_0.gguf model, even with as few as 2 documents (36 chunks).
The issue reproduces on both Bun 1.3.8 and 1.3.11.
Environment
- OS: Ubuntu 24.04, Linux 6.17.0-1009-oracle aarch64
- CPU: ARM Neoverse (Oracle Cloud ARM instance, neon fp aes crc32 atomics)
- node-llama-cpp version: 3.15.1
- Bun versions tested: 1.3.8, 1.3.11
- Model: hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf
- RAM: 24GB available, process uses ~700MB RSS before crash
- No GPU; falls back to CPU ([node-llama-cpp] A prebuilt binary was not found, falling back to using no GPU)
Reproduction
Using qmd, which calls node-llama-cpp for embeddings:

qmd embed   # crashes after the model loads and embedding computation begins

Minimal reproduction: any call to session.embedBatch(texts) with the embedding model loaded on ARM64 Linux. Crashes regardless of batch size (tested with 2 docs / 36 chunks up to 2353 docs / 4648 chunks).
Crash Output
Chunking 2 documents by token count...
[node-llama-cpp] A prebuilt binary was not found, falling back to using no GPU
Embedding 2 documents (36 chunks, 78.8 KB)
Model: embeddinggemma
============================================================
Bun v1.3.11 (af24e281) Linux arm64
Linux Kernel v6.17.0 | glibc v2.39
CPU: neon fp aes crc32 atomics
panic: Segmentation fault at address 0xE20BE9728980
oh no: Bun has crashed. This indicates a bug in Bun, not your code.
Bun crash report: https://bun.report/1.3.8/La1b64edcbijEugggCuzynqE+lF_2huF+y3F+95FmmjM291jBm13jBuhklB+2imBm8/Lur+L+t9t9C+zzvgD21shB2096BA2+43DgwkvzJ
Notes
- The crash happens deep in the native embedding/GGML code, not in JS/TS
- Same model and code works fine on x86_64 Linux
- Process reaches ~700MB RSS / 74GB VSZ before the segfault
- The prebuilt binary message suggests no optimized ARM64 build is available, so it falls back to a generic build
- Workaround: using Ollama with nomic-embed-text via its HTTP API for embeddings instead of node-llama-cpp
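The workaround in the last note can be sketched as a small HTTP client against Ollama's embeddings endpoint. This assumes an Ollama server running on the default port with nomic-embed-text already pulled; embedViaOllama is a hypothetical helper name, not part of qmd or node-llama-cpp:

```typescript
// Request an embedding from a local Ollama server over HTTP, bypassing
// node-llama-cpp's native binding entirely (and thus the aarch64 crash).
async function embedViaOllama(text: string): Promise<number[]> {
    const res = await fetch("http://localhost:11434/api/embeddings", {
        method: "POST",
        headers: {"Content-Type": "application/json"},
        body: JSON.stringify({model: "nomic-embed-text", prompt: text})
    });
    if (!res.ok) throw new Error(`Ollama embedding failed: ${res.status}`);
    const data = await res.json() as {embedding: number[]};
    return data.embedding;
}
```

Trading the in-process native binding for an HTTP round trip costs some latency per chunk, but keeps the embedding computation out of the Bun process entirely, so a crash in GGML code cannot take the caller down with it.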
Expected Behavior
embedBatch should complete without crashing on ARM64 Linux.
Actual Behavior
Segmentation fault after loading the model and beginning embedding computation. Occurs consistently on every run.