Tool-Neuron GGML Backend

CPU-only LLM/VLM inference engine for Android, built on llama.cpp.

Overview

A production fork of llama.cpp stripped to the CPU backend and optimized for ARM Android devices. All GPU backends (CUDA, Metal, Vulkan, OpenCL) have been removed. A set of engine components is built on top for the Tool-Neuron Android app.

Kotlin SDK (gguf_lib)
    |
JNI bridge
    |
Engine layer (engine/)
  - GGMLEngine    model load/unload, generation, KV cache, context tracking
  - VLM Engine    vision and audio understanding (20+ architectures)
  - ToolManager   model-agnostic tool calling (JSON, XML, function-call)
  - RAG Engine    late chunking, binary quantized retrieval
  - Logging       callback-based, routes to Android logcat or custom handler
    |
llama.cpp core (src/ + common/)
    |
GGML CPU backend (ggml/)
  - NEON, i8mm, dotprod, fp16, bf16
  - KleidiAI ARM micro-kernels

Directory Structure

src/             llama.cpp model loading, tokenization, inference, sampling
include/         public C/C++ headers (llama.h, llama-cpp.h)
ggml/            tensor library, CPU backend only, ARM optimized
common/          chat templates, JSON schema grammar, sampling, jinja
engine/          engine layer (ggml-engine, vlm, tool-manager, rag-engine, tn-log)
  vlm/           vision/audio encoders (CLIP, SigLIP, Whisper, 20+ architectures)
vendor/          nlohmann/json, stb_image, miniaudio
cmake/           build-info, license, compiler flags
docs/            API reference, architecture, build guide, benchmarks

Supported Models

Any GGUF model works. All compute graphs from upstream llama.cpp are preserved.

  • Text: LLaMA, Mistral, Phi, Qwen, Gemma, DeepSeek, Command-R, and 100+ architectures
  • Vision: SmolVLM, LLaVA, Qwen2-VL, Qwen3-VL, InternVL, Pixtral, Gemma3-Vision, and 20+ VLM architectures
  • Audio: Whisper, Conformer encoders
  • Quantization: Q4_0, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16, F32, IQ variants

Usage

This repo is consumed as a CMake subdirectory by an Android library module:

set(LLAMA_DIR "/path/to/this/repo")
add_subdirectory(${LLAMA_DIR} ${CMAKE_CURRENT_BINARY_DIR}/llama)
target_link_libraries(my_jni_lib tn-engine llama common ggml)

All public engine headers are pure C (extern "C") and safe for JNI binding.

Build

See docs/BUILD.md for full details. Key CMake variables:

Variable             Value                          Purpose
GGML_CPU             ON                             CPU backend
GGML_CPU_ARM_ARCH    armv8.6-a+i8mm+dotprod+fp16    ARM feature flags
GGML_CPU_KLEIDIAI    ON                             ARM KleidiAI micro-kernels
GGML_LTO             ON                             Link-time optimization
BUILD_SHARED_LIBS    OFF                            Static link into a single .so
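A plausible NDK cross-compile invocation combining these variables with the standard Android toolchain flags (the NDK path and API level are placeholders; docs/BUILD.md remains the canonical reference):

cmake -B build-android \
  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-28 \
  -DGGML_CPU=ON \
  -DGGML_CPU_ARM_ARCH=armv8.6-a+i8mm+dotprod+fp16 \
  -DGGML_CPU_KLEIDIAI=ON \
  -DGGML_LTO=ON \
  -DBUILD_SHARED_LIBS=OFF
cmake --build build-android -j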

Performance

Tested on Cortex-X3 (armv9, i8mm, bf16, NEON, dotprod):

Model           Quant     Generation
LFM2-350M       Q8_0      29-30 t/s
SmolVLM-500M    Q8_0      28 t/s text, 22 t/s with vision
Qwen3-0.6B      Q8_0      17-19 t/s
Gemma3-1B       Q4_K_M    14 t/s

Documentation

Document         Description
API Reference    C API for GGMLEngine, VLM, ToolManager, RAG, Logging
Architecture     Stack diagram, directory map, data flows
Build Guide      CMake variables, NDK cross-compilation
Performance      Benchmarks, ARM optimizations, threading
Models           Supported architectures, quantization, sizing

License

MIT License -- see LICENSE.

Based on llama.cpp by Georgi Gerganov and contributors.
