Description
Grobid version
Docker, 0.8.2, CRF only
Operating System and architecture (arm64, amd64, x86, etc.)
MacOS M1 Apple Silicon
What is your Java version
No response
Log and information
Environment:
- Host: Apple Silicon (ARM64, macOS)
- Docker: running grobid/grobid:0.8.2 (AMD64 image via QEMU emulation)
- Image architecture: linux/amd64 — no ARM64 variant available
What happens:
The default config (grobid.yaml) uses engine: "delft" for 8 models (header, header-article-light, header-article-light-ref, reference-segmenter, affiliation-address, citation, patent-citation, funding-acknowledgement).
Loading any DeLFT model initializes TensorFlow, which was compiled with AVX instructions. QEMU does not emulate AVX, causing an immediate hard crash:
The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine.
qemu: uncaught target signal 6 (Aborted) - core dumped
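One way to check for this condition before TensorFlow is loaded might be to look for the avx flag in /proc/cpuinfo from inside the container. The sketch below is only an illustration: the helper name has_avx is hypothetical, and it assumes that /proc/cpuinfo inside the emulated container reflects what the emulated CPU supports, which I have not verified under QEMU.

```python
def has_avx(cpuinfo_text):
    """Return True if an x86-style 'flags' line in cpuinfo lists 'avx'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # x86 Linux kernels list CPU capabilities on lines starting with "flags".
        # ARM kernels use "Features" instead, so no "flags" line means no AVX.
        if key.strip() == "flags":
            return "avx" in value.split()
    return False


def container_has_avx(path="/proc/cpuinfo"):
    """Read cpuinfo from the given path (Linux only) and check for AVX."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return has_avx(handle.read())
```

If container_has_avx() returns False, falling back to the Wapiti engine before any DeLFT model is requested would avoid the abort.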
After this crash, all subsequent API calls return HTTP 500 with no diagnostic information. Depending on timing, /api/isalive may still return true, giving the false impression that the service is healthy.
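Because /api/isalive alone can be misleading here, a client-side check could also send one small processing request and treat anything other than HTTP 200 as unhealthy. This is a minimal sketch, not part of GROBID: the function names are hypothetical, and using /api/processCitation with a citations form field as the probe is my own choice of a cheap request that exercises a model.

```python
import urllib.error
import urllib.parse
import urllib.request


def http_status(request, timeout=30):
    """Return the HTTP status code, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; keep the code.
        return err.code
    except OSError:
        # Connection refused, DNS failure, timeout, and similar.
        return None


def grobid_really_healthy(base_url, fetch=http_status):
    """Healthy only if both the liveness check and a real request succeed."""
    alive = fetch(urllib.request.Request(base_url + "/api/isalive"))
    # Assumption: a short citation string is enough to exercise a model.
    form = urllib.parse.urlencode(
        {"citations": "Smith J. (2020). A title. Journal, 1(2), 3-4."}
    ).encode()
    # Passing data makes urllib send a POST request.
    probe = fetch(urllib.request.Request(base_url + "/api/processCitation", data=form))
    return alive == 200 and probe == 200
```

For a local container this could be called as grobid_really_healthy("http://localhost:8070"), assuming the default port mapping.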
Expected behavior:
Either (a) an ARM64-native image is provided so TF runs without emulation, or (b) the failure surfaces a clear error message to the API client explaining the DeLFT/AVX incompatibility rather than a generic 500.
Workaround:
Mount a custom grobid.yaml that forces all models to engine: "wapiti", preventing TensorFlow from being loaded at all. This loses DeLFT quality improvements but produces correct output:
volumes:
  - ./grobid.yaml:/opt/grobid/grobid-home/config/grobid.yaml:ro
In the mounted file, every model entry is set to engine: "wapiti" and modelPreload is set to false. This appears to fix it well enough for the service to function.
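For anyone who wants to generate the modified file rather than edit it by hand, the rewrite can be sketched as a plain text substitution. This is an illustration under assumptions: the function name force_wapiti is hypothetical, and it assumes the keys appear literally as engine: "delft" and modelPreload: true on their own lines, as described above. It does not parse YAML, so commented-out lines and the rest of the file are left untouched.

```python
import re

# Match uncommented lines such as:   engine: "delft"
ENGINE_RE = re.compile(r'^([ \t]*engine:[ \t]*)"delft"', re.MULTILINE)
# Match uncommented lines such as:   modelPreload: true
PRELOAD_RE = re.compile(r'^([ \t]*modelPreload:[ \t]*)true\b', re.MULTILINE)


def force_wapiti(yaml_text):
    """Switch every DeLFT engine entry to Wapiti and disable model preload."""
    text = ENGINE_RE.sub(r'\g<1>"wapiti"', yaml_text)
    return PRELOAD_RE.sub(r'\g<1>false', text)


def rewrite_config(src_path, dst_path):
    """Read the original config and write the Wapiti-only copy."""
    with open(src_path, encoding="utf-8") as src:
        patched = force_wapiti(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(patched)
```

The output of rewrite_config is the file to mount read-only over the default config, as in the volumes entry above.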
Further information
No response