Add local inference service for task summarization #219
Conversation
Adds local GGUF model inference using llama.cpp via yzma for task summarization and branch name generation.

Key components:

- InferenceService: handles model loading and text generation
- ModelDownloader: downloads and caches GGUF models from HuggingFace
- LibraryDownloader: auto-downloads llama.cpp libraries for the current platform
- summarize command: CLI interface for generating summaries
- download command: pre-downloads the model and libraries
- REST API endpoint: POST /v1/inference/summarize

Critical fix: addSpecial=true must be used when tokenizing prompts for Gemma models so that the BOS token is included. Without it, the model produces incorrect outputs (it was echoing examples from the prompt instead of generating actual summaries).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
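The BOS issue called out above can be illustrated with a mock tokenizer. This is not yzma's actual API; `bosID` and the byte-level `tokenize` are stand-ins showing why the `addSpecial` flag matters for Gemma-family models, which expect a BOS token at the start of every prompt.

```go
package main

import "fmt"

// Illustrative only: yzma's real bindings differ. Gemma-family models
// expect a BOS (Beginning of Sequence) token first in every prompt.
const bosID = 2 // hypothetical BOS token id

// tokenize mimics the shape of llama.cpp-style tokenization: when
// addSpecial is true, the special BOS token is prepended.
func tokenize(prompt string, addSpecial bool) []int {
	// Stand-in for real subword tokenization: one id per byte.
	ids := make([]int, 0, len(prompt)+1)
	if addSpecial {
		ids = append(ids, bosID)
	}
	for _, b := range []byte(prompt) {
		ids = append(ids, int(b)+100) // offset past special ids
	}
	return ids
}

func main() {
	withBOS := tokenize("Summarize: fix login bug", true)
	withoutBOS := tokenize("Summarize: fix login bug", false)
	fmt.Println(withBOS[0] == bosID)    // true: model sees BOS first
	fmt.Println(withoutBOS[0] == bosID) // false: Gemma misbehaves
}
```

Without the BOS token the model never sees the sequence boundary it was trained on, which is consistent with the symptom described: it treated the few-shot examples in the prompt as the content to complete.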
Force-pushed the branch from f347259 to 8069e87.
- Truncate parts slice to max 3 elements before loop
- Add nolint comment for false-positive gosec warning
- Update golangci-lint version to 2.6.2 to match CI

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
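The slice truncation in the first bullet can be sketched as follows; `truncateParts` and `maxParts` are illustrative names, not the PR's actual identifiers.

```go
package main

import (
	"fmt"
	"strings"
)

// truncateParts caps a slice at max elements before iterating,
// so the loop below has a bounded number of iterations.
func truncateParts(parts []string, max int) []string {
	if len(parts) > max {
		return parts[:max]
	}
	return parts
}

func main() {
	parts := strings.Split("fix/login/button/color/bug", "/")
	for _, p := range truncateParts(parts, 3) {
		fmt.Println(p) // prints fix, login, button on separate lines
	}
}
```

Truncating once before the loop (rather than breaking inside it) keeps the bound in one place, which also tends to satisfy linters that flag unbounded loops over user-derived input.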
Force-pushed the branch from 8069e87 to ec38067.
- Implement non-blocking background initialization for the inference service
- Add state management (initializing/ready/failed/disabled) with progress tracking
- Return 503 with status info while the model downloads in the background
- Add retry logic with exponential backoff (3 attempts)
- Use golang.org/x/sys/unix for cross-platform stderr suppression
- Clean up .gitignore (remove models/) and .goreleaser.yml (remove bundled libs)

The inference service now starts immediately and downloads libraries and the model in the background. Enable with the CATNIP_INFERENCE=1 environment variable.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Specify stable versions (yarn@4, pnpm@9, npm@10) instead of letting corepack pick dev versions that may not be available.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Force-pushed the branch from 58b9e28 to 1cfccbe.
Summary
Adds local GGUF model inference using llama.cpp via yzma, enabling on-device task summarization and git branch name generation with our fine-tuned Gemma 3 270M model.
Key Features
CLI Commands

- catnip summarize "task description" - Generate a task summary and branch name
- catnip download - Pre-download the model and llama.cpp libraries

REST API

- POST /v1/inference/summarize - Inference endpoint for programmatic access
- GET /v1/inference/status - Check inference service availability

Auto-downloading

- Models are cached in ~/.catnip/models/
- llama.cpp libraries are cached in ~/.catnip/lib/

Critical Bug Fix
Fixed inference producing incorrect outputs (always returning "Add Dark Mode" from examples instead of actual summaries).
Root cause: Missing BOS (Beginning of Sequence) token when tokenizing prompts for Gemma models.
Fix: Set addSpecial=true in the tokenization call to include the required special tokens.

Test plan
- catnip summarize produces varied, contextually appropriate outputs

🤖 Generated with Claude Code