- WebGPU: Any environment with a WebGPU device — browsers (Chrome 113+, Edge 113+, Firefox Nightly, Safari 18+), Deno, Node.js with wgpu/Dawn bindings
- Node.js: 18+ (for build tooling / development)
- GPU VRAM: Depends on model size — see estimates below
| Model | Parameters | Approximate VRAM |
|---|---|---|
| BitNet b1.58 2B-4T | 2B | ~1.5 GB |
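Before loading a model, you can feature-detect WebGPU up front. A minimal browser-side sketch using the standard `navigator.gpu` API (not specific to 0xBitNet):

```ts
// Feature-detect WebGPU before attempting to load a model.
if (!("gpu" in navigator)) {
  throw new Error("WebGPU is not supported in this browser");
}

// requestAdapter() resolves to null when no suitable adapter exists.
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) {
  throw new Error("No WebGPU adapter available");
}
```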
Install from npm:

```bash
npm install 0xbitnet
```

Pass a URL to a GGUF file. The model is downloaded, parsed, and uploaded to the GPU automatically. In browser environments, subsequent loads use the IndexedDB cache — no re-download needed.

```ts
import { BitNet } from "0xbitnet";
const model = await BitNet.load(
"https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf/resolve/main/ggml-model-i2_s.gguf"
);
```

To track loading progress, pass an `onProgress` callback:

```ts
const model = await BitNet.load(url, {
onProgress(p) {
// p.phase: "download" | "parse" | "upload"
// p.fraction: 0.0 – 1.0
console.log(`${p.phase}: ${(p.fraction * 100).toFixed(1)}%`);
},
});
```

The `LoadProgress` object contains:
| Field | Type | Description |
|---|---|---|
| `phase` | `"download" \| "parse" \| "upload"` | Current loading phase |
| `loaded` | `number` | Bytes/tensors processed so far |
| `total` | `number` | Total bytes/tensors |
| `fraction` | `number` | Progress ratio (0.0 – 1.0) |
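During the download phase, `loaded` and `total` are byte counts, while parse and upload report tensor counts, so a progress display may want to treat the phases differently. A small sketch building on the `onProgress` callback above (`url` as before):

```ts
const model = await BitNet.load(url, {
  onProgress(p) {
    if (p.phase === "download") {
      // loaded/total are bytes while downloading
      const mb = (n: number) => (n / 1024 / 1024).toFixed(1);
      console.log(`download: ${mb(p.loaded)} / ${mb(p.total)} MB`);
    } else {
      // parse/upload count tensors; the fraction is simpler to show
      console.log(`${p.phase}: ${(p.fraction * 100).toFixed(1)}%`);
    }
  },
});
```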
`generate()` returns an `AsyncGenerator<string>`, yielding one token at a time:

```ts
for await (const token of model.generate("The meaning of life is")) {
process.stdout.write(token);
}
```

Generation options can be tuned via the second argument:

```ts
for await (const token of model.generate("Once upon a time", {
maxTokens: 512, // default: 256
temperature: 0.8, // default: 1.0
topK: 40, // default: 50
repeatPenalty: 1.1, // default: 1.0
repeatLastN: 64, // default: 64
})) {
process.stdout.write(token);
}
```

If you prefer a callback style over `for await`, use `onToken`:

```ts
const tokens: string[] = [];
// eslint-disable-next-line @typescript-eslint/no-unused-vars
for await (const _ of model.generate("Hello", {
onToken(token) {
tokens.push(token);
},
})) {
// tokens are also available via the callback
}
```

Pass an array of `ChatMessage` objects to use the built-in chat template:

```ts
import type { ChatMessage } from "0xbitnet";
const messages: ChatMessage[] = [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Explain quantum computing in one sentence." },
];
for await (const token of model.generate(messages, { maxTokens: 128 })) {
process.stdout.write(token);
}
```

The `ChatMessage` interface:

```ts
interface ChatMessage {
role: "system" | "user" | "assistant";
content: string;
}
```
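Because `generate()` takes the full message array on each call, multi-turn chat amounts to appending every reply to the history before the next request. A sketch reusing the `model` loaded earlier; the `ask` helper is illustrative, not part of the API:

```ts
import type { ChatMessage } from "0xbitnet";

const history: ChatMessage[] = [
  { role: "system", content: "You are a helpful assistant." },
];

// Illustrative helper: send the accumulated history, then record the reply.
async function ask(question: string): Promise<string> {
  history.push({ role: "user", content: question });
  let reply = "";
  for await (const token of model.generate(history, { maxTokens: 128 })) {
    reply += token;
  }
  history.push({ role: "assistant", content: reply });
  return reply;
}

console.log(await ask("What is 1-bit quantization?"));
console.log(await ask("Why does it speed up inference?"));
```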
In browser environments, models are automatically cached in IndexedDB after the first download. Subsequent calls to `BitNet.load()` with the same URL will use the cached copy instantly. In non-browser environments (Deno, Node.js), caching is skipped gracefully.

```ts
import { listCachedModels } from "0xbitnet";
const urls = await listCachedModels();
console.log("Cached:", urls);
// ["https://huggingface.co/.../model.gguf"]import { deleteCachedModel } from "0xbitnet";
await deleteCachedModel("https://huggingface.co/.../model.gguf");
```
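The two helpers compose if you want to clear the cache entirely, for example:

```ts
import { listCachedModels, deleteCachedModel } from "0xbitnet";

// Delete every cached model, e.g. to reclaim IndexedDB space.
for (const url of await listCachedModels()) {
  await deleteCachedModel(url);
}
```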
Both `load()` and `generate()` accept an `AbortSignal` for cancellation:

```ts
const controller = new AbortController();
// Cancel after 30 seconds
setTimeout(() => controller.abort(), 30_000);
try {
const model = await BitNet.load(url, { signal: controller.signal });
} catch (err) {
if (err instanceof DOMException && err.name === "AbortError") {
console.log("Loading cancelled");
}
}
```

An `AbortController` cancels generation the same way:

```ts
const controller = new AbortController();
// Cancel after 100 tokens
let count = 0;
for await (const token of model.generate("Hello", { signal: controller.signal })) {
process.stdout.write(token);
if (++count >= 100) controller.abort();
}
```

Always call `dispose()` when you're done with a model to release GPU resources:
```ts
model.dispose();
```
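If generation can throw or be aborted, a `try`/`finally` block keeps the cleanup deterministic. A minimal sketch using only the API shown above:

```ts
const model = await BitNet.load(url);
try {
  for await (const token of model.generate("Hello")) {
    process.stdout.write(token);
  }
} finally {
  // Runs even if generation throws or the signal aborts.
  model.dispose();
}
```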
0xBitNet runs in Node.js using the `webgpu` npm package, which provides Dawn (Google's WebGPU implementation) bindings.

```bash
npm install 0xbitnet webgpu
```

```ts
import { create, globals } from "webgpu";
import { BitNet } from "0xbitnet";
// Inject WebGPU globals (GPUBufferUsage, GPUMapMode, etc.)
Object.assign(globalThis, globals);
// Create a Dawn WebGPU device
const gpu = create([]);
const adapter = await gpu.requestAdapter({ powerPreference: "high-performance" });
const device = await adapter!.requestDevice({
requiredLimits: {
maxBufferSize: adapter!.limits.maxBufferSize,
maxStorageBufferBindingSize: adapter!.limits.maxStorageBufferBindingSize,
},
});
// Pass the device to BitNet.load()
const model = await BitNet.load(
"https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf/resolve/main/ggml-model-i2_s.gguf",
{ device, onProgress: (p) => console.log(`${p.phase}: ${(p.fraction * 100).toFixed(1)}%`) }
);
for await (const token of model.generate("The meaning of life is")) {
process.stdout.write(token);
}
model.dispose();
```

Key points:
- `Object.assign(globalThis, globals)` is required — it sets up `GPUBufferUsage`, `GPUMapMode`, and other WebGPU constants that the core library reads from `globalThis`
- Pass your `device` via `LoadOptions.device` so the library skips its browser-oriented `navigator.gpu` init
- IndexedDB caching is automatically skipped in Node.js
See `examples/node-cli/` for a complete interactive CLI example.
- API Reference — Full API documentation
- Architecture — How 0xBitNet works internally
- Model Compatibility — Supported models and GGUF requirements