# BitNet Core
Sujan Mishra edited this page Jun 26, 2025
A pure Rust, streaming-friendly core engine for BitNet models, focused on high-performance inference, quantization, and kernel dispatch. Includes all performance-critical logic, model definitions, and backend implementations for both CPU and GPU (WGSL).
- Purpose
- Main Modules
- Architecture
- How to Use
- Features
- Kernel & Quantization
- Test Coverage
- Implementation Notes
## Purpose

- Serve as the backend engine for BitNet inference (and planned training)
- Provide modular, extensible components for model architecture, quantization, and kernel dispatch
- Support both CPU (SIMD) and GPU (WGSL) backends
- Enable streaming-friendly, per-block model loading and execution
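The per-block execution goal above can be sketched as follows. This is an illustrative toy, not the crate's actual API: `BlockWeights`, `load_block`, and `run_block` are invented names standing in for real disk reads and transformer math.

```rust
// Hypothetical sketch of streaming, per-block execution. The point:
// only one block's weights are resident in memory at a time.
struct BlockWeights {
    scale: f32, // stand-in for a block's real quantized weights
}

// Stand-in for reading one transformer block's weights from storage.
fn load_block(_index: usize) -> BlockWeights {
    BlockWeights { scale: 1.0 }
}

// Stand-in for running attention + feed-forward for a single block.
fn run_block(w: &BlockWeights, hidden: &mut [f32]) {
    for h in hidden.iter_mut() {
        *h += w.scale;
    }
}

fn main() {
    let num_blocks = 4;
    let mut hidden = vec![0.0f32; 8];
    for i in 0..num_blocks {
        let w = load_block(i); // weights loaded on demand…
        run_block(&w, &mut hidden);
    } // …and dropped at the end of each iteration.
    assert!(hidden.iter().all(|&h| h == num_blocks as f32));
}
```

Because each `BlockWeights` is dropped before the next is loaded, peak memory is bounded by one block rather than the whole model.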
## Main Modules

- `model.rs`: Pure Rust Transformer model architecture (no burn dependency)
- `attention.rs`, `feed_forward.rs`, `rms_norm.rs`: Core model submodules (pure Rust)
- `bitnet_linear.rs`: BitLinear quantized layer, packing, and quantization utilities
- `kernels/`: CPU/GPU kernel implementations (WGSL, SIMD)
- `settings.rs`: Inference and generation settings
- `embedding.rs`: Embedding layer
- `tokenizer.rs`: Tokenizer and chat template logic
- `error.rs`: Error types and handling
- `gui/`: (Optional) Core-level visualization and debugging UI for developers (feature-gated)
- `training.rs`, `visualization.rs`: (Planned) Training and logging/metrics hooks
## Architecture

- Pure Rust, burn-free: All core logic is implemented in Rust, with no dependency on the burn framework for inference
- Streaming-friendly: Model weights are loaded per-block, supporting large models and efficient memory usage
- Quantized & packed: Uses ternary quantization and efficient packing for weights and activations
- GPU kernel integration: Includes WGSL kernels for high-performance inference on modern GPUs
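One way to picture the "quantized & packed" point above: a ternary weight needs only 2 bits, so four weights fit in one byte. The sketch below is an illustrative encoding only; `pack_ternary` and `unpack_ternary` are invented names, and the crate's actual packing in `src/kernels.rs` may use a different layout.

```rust
/// Illustrative ternary packing: four 2-bit codes per byte.
/// Encoding assumption: -1 -> 0b00, 0 -> 0b01, +1 -> 0b10.
fn pack_ternary(weights: &[i8]) -> Vec<u8> {
    weights
        .chunks(4)
        .map(|chunk| {
            let mut byte = 0u8;
            for (i, &w) in chunk.iter().enumerate() {
                let code = (w + 1) as u8; // map -1/0/+1 to 0/1/2
                byte |= code << (2 * i);
            }
            byte
        })
        .collect()
}

fn unpack_ternary(packed: &[u8], len: usize) -> Vec<i8> {
    (0..len)
        .map(|i| {
            let byte = packed[i / 4];
            let code = (byte >> (2 * (i % 4))) & 0b11;
            code as i8 - 1 // map 0/1/2 back to -1/0/+1
        })
        .collect()
}

fn main() {
    let w: Vec<i8> = vec![-1, 0, 1, 1, 0, -1, 1, 0];
    let packed = pack_ternary(&w);
    assert_eq!(packed.len(), 2); // 8 ternary weights -> 2 bytes
    assert_eq!(unpack_ternary(&packed, w.len()), w);
}
```

The 4x size reduction over `i8` storage (16x over `f32`) is what makes per-block streaming of large models practical.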
## How to Use

Add to your `Cargo.toml`:

```toml
bitnet-core = { path = "../bitnet-core" }
```

Then in your code:

```rust
use bitnet_core::model::Transformer;
// ...
```

## Features

- Modular, extensible design
- Optional GPU and core-gui features (feature flags)
- Designed for correctness, performance, and portability
- Streaming-friendly model loading and execution
- Robust error handling and test coverage
## Kernel & Quantization

- WGSL GPU kernel: See `src/kernels/bitnet_kernel.wgsl` for the main ternary matmul kernel
- Packing utilities: See `src/kernels.rs` for pure Rust packing and scale calculation
- Quantization: Scalar and SIMD quantization utilities for activations and weights
- Tested against scalar reference: All kernels are validated against pure Rust reference implementations
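A scalar reference of the kind the kernels are validated against might look like the sketch below. This is a hedged illustration, not the crate's actual reference implementation; the function name and signature are assumptions. With ternary weights, the inner product reduces to additions and subtractions, followed by a dequantization scale.

```rust
/// Illustrative scalar reference for a ternary matmul:
/// y = (W · x) * scale, with W a row-major rows x cols matrix in {-1, 0, +1}.
fn ternary_matmul_ref(weights: &[i8], x: &[f32], rows: usize, cols: usize, scale: f32) -> Vec<f32> {
    assert_eq!(weights.len(), rows * cols);
    assert_eq!(x.len(), cols);
    (0..rows)
        .map(|r| {
            let mut acc = 0.0f32;
            for c in 0..cols {
                // No multiplies needed: ternary weights only add, subtract, or skip.
                match weights[r * cols + c] {
                    1 => acc += x[c],
                    -1 => acc -= x[c],
                    _ => {}
                }
            }
            acc * scale
        })
        .collect()
}

fn main() {
    let w: [i8; 6] = [1, -1, 0, 0, 1, 1]; // 2x3 ternary matrix
    let x = [2.0, 3.0, 4.0];
    let y = ternary_matmul_ref(&w, &x, 2, 3, 0.5);
    // row 0: (2 - 3) * 0.5 = -0.5; row 1: (3 + 4) * 0.5 = 3.5
    assert_eq!(y, vec![-0.5, 3.5]);
}
```

A test can then run the SIMD or WGSL kernel on the same inputs and compare its output against this scalar result within a floating-point tolerance.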
## Test Coverage

- Unit tests for packing, quantization, and kernel correctness
- Direct wgpu kernel launch tests (no burn dependency)
- End-to-end model pipeline validation (see `tests/pipeline_validation.rs`)
- Streaming and per-block model loading tests
- Optional stress test: A long-running stress test (`stress_test_maximum_dimension_support`) is available but ignored by default. To run it, set the `RUN_STRESS_TESTS` environment variable:
  - PowerShell: `$env:RUN_STRESS_TESTS="1"; cargo test --package bitnet-core --test kernel_tests -- --nocapture`
  - Linux/macOS: `RUN_STRESS_TESTS=1 cargo test --package bitnet-core --test kernel_tests -- --nocapture`
## Implementation Notes

- See the project plan for architecture and validation strategies
- Use feature flags to enable GPU or core-gui modules
- For kernel and quantization details, see the code comments in `src/kernels.rs` and `src/kernels/bitnet_kernel.wgsl`
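The feature-flag bullet above might look like this in a consumer's `Cargo.toml`. The flag names `gpu` and `core-gui` are assumptions taken from this page; verify them against the crate's own `[features]` table.

```toml
# Hypothetical: enable the GPU backend and the developer GUI.
# Flag names assumed from this page, not confirmed against the crate.
[dependencies]
bitnet-core = { path = "../bitnet-core", features = ["gpu", "core-gui"] }
```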
For questions or contributions, see the main project README or open an issue.