Segmentation fault during embedBatch on ARM64 Linux (aarch64) #590

@yeHHH1g

Description

node-llama-cpp segfaults during embedBatch calls on ARM64 Linux (aarch64). The crash occurs consistently when embedding documents using the embeddinggemma-300M-Q8_0.gguf model, even with as few as 2 documents (36 chunks).

The issue reproduces on both Bun 1.3.8 and 1.3.11.

Environment

  • OS: Ubuntu 24.04, Linux 6.17.0-1009-oracle aarch64
  • CPU: ARM Neoverse (Oracle Cloud ARM instance, neon fp aes crc32 atomics)
  • node-llama-cpp version: 3.15.1
  • Bun versions tested: 1.3.8, 1.3.11
  • Model: hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf
  • RAM: 24GB available, process uses ~700MB RSS before crash
  • No GPU — falls back to CPU ([node-llama-cpp] A prebuilt binary was not found, falling back to using no GPU)

Reproduction

Using qmd, which calls node-llama-cpp for embeddings:

qmd embed  # crashes after model loads and begins embedding computation

Minimal reproduction: any call to session.embedBatch(texts) with the embedding model loaded on ARM64 Linux. Crashes regardless of batch size (tested with 2 docs / 36 chunks up to 2353 docs / 4648 chunks).
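For a reproduction without qmd, something like the following should hit the same native code path (a sketch using node-llama-cpp's documented embedding API; `embedBatch` is qmd's wrapper, and `resolveModelFile` / the exact chunk texts here are assumptions, not taken from qmd's source):

```typescript
// Standalone repro sketch on ARM64 Linux (assumes node-llama-cpp ~3.15 is installed).
import {getLlama, resolveModelFile} from "node-llama-cpp";

// Download/resolve the same model used above.
const modelPath = await resolveModelFile(
    "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf"
);

const llama = await getLlama(); // logs the "falling back to using no GPU" message on this machine
const model = await llama.loadModel({modelPath});
const embeddingContext = await model.createEmbeddingContext();

// Any small batch of chunks is enough; the segfault occurs in native
// embedding/GGML code while these resolve.
const chunks = ["first chunk of text", "second chunk of text"];
const embeddings = await Promise.all(
    chunks.map((chunk) => embeddingContext.getEmbeddingFor(chunk))
);
console.log(embeddings.length);
```

On x86_64 Linux this completes normally; on the aarch64 instance above it crashes before the `console.log` is reached.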

Crash Output

Chunking 2 documents by token count...
[node-llama-cpp] A prebuilt binary was not found, falling back to using no GPU
Embedding 2 documents (36 chunks, 78.8 KB)
Model: embeddinggemma

============================================================
Bun v1.3.11 (af24e281) Linux arm64
Linux Kernel v6.17.0 | glibc v2.39
CPU: neon fp aes crc32 atomics

panic: Segmentation fault at address 0xE20BE9728980
oh no: Bun has crashed. This indicates a bug in Bun, not your code.

Bun crash report: https://bun.report/1.3.8/La1b64edcbijEugggCuzynqE+lF_2huF+y3F+95FmmjM291jBm13jBuhklB+2imBm8/Lur+L+t9t9C+zzvgD21shB2096BA2+43DgwkvzJ

Notes

  • The crash happens deep in the native embedding/GGML code, not in JS/TS
  • Same model and code works fine on x86_64 Linux
  • Process reaches ~700MB RSS / 74GB VSZ before the segfault
  • The prebuilt binary message suggests no optimized ARM64 build is available, so it falls back to a generic build
  • Workaround: Using Ollama with nomic-embed-text via HTTP API for embeddings instead of node-llama-cpp
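The Ollama workaround from the last note looks roughly like this (a sketch; it assumes a local Ollama server on the default port 11434 with `nomic-embed-text` already pulled):

```typescript
// Replace node-llama-cpp's embedBatch with a call to Ollama's embeddings endpoint.
const res = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    headers: {"Content-Type": "application/json"},
    body: JSON.stringify({
        model: "nomic-embed-text",
        prompt: "some chunk of text",
    }),
});
const {embedding} = await res.json() as {embedding: number[]};
console.log(embedding.length); // 768-dimensional for nomic-embed-text
```

This sidesteps the native GGML code entirely, which is why it avoids the segfault.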

Expected Behavior

embedBatch should complete without crashing on ARM64 Linux.

Actual Behavior

Segmentation fault after loading the model and beginning embedding computation. Occurs consistently on every run.
