
RealtimeIntent


Intent recognition for voice AI — 50× faster than LLM routing

<100 ms latency  ·  50+ intents  ·  1 GPU  ·  3 commands to run


Quick Start  ·   API Docs  ·   Realtime (main project)  ·   Report Bug


curl -X POST http://localhost:8000/intent \
  -H "Content-Type: application/json" \
  -d '{"conversation": [{"role": "user", "content": "Will it rain in Beijing tomorrow?"}]}'

# → {"intent": "agent.information.weather"}    # 48 ms

Tip

RealtimeIntent is the intent router for SquadyAI Realtime — the open-source voice AI engine (Rust, <450 ms end-to-end, 100+ concurrent sessions). Use them together for a complete voice assistant, or use RealtimeIntent standalone as a drop-in intent API for any chatbot.

Why not just use LLM function calling?

|  | LLM Function Calling | RealtimeIntent |
| --- | --- | --- |
| Latency | 500–2,000 ms (full LLM inference) | <100 ms (embed → vector search → rerank) |
| Cost | Token cost per request | Zero API cost — runs on local 0.6B–4B models |
| Determinism | Prompt-sensitive, may drift | Same input → same output, always |
| Customization | Edit prompts and hope | Add examples via API, instant effect |
| Multi-turn | Context window fills up | Async LLM summary keeps context compact |

RealtimeIntent handles the "what does the user want?" question so your LLM can focus on "how to respond" — faster, cheaper, and more reliably.

How It Works

"Will it rain in Beijing tomorrow?"
        │
        ▼
 ┌─────────────┐     ┌────────────────┐     ┌──────────────┐     ┌──────────────┐
 │  Embedding  │────▶│ Qdrant Vector  │────▶│   Reranker   │────▶│ Intent Label │
 │  (4B model) │     │  Search top-K  │     │ (0.6B model) │     │   + Score    │
 └─────────────┘     └────────────────┘     └──────────────┘     └──────────────┘
                                                                        │ async
                                                                        ▼
                                                                 ┌──────────────┐
                                                                 │ LLM Summary  │ (optional)
                                                                 │ for context  │ multi-turn
                                                                 └──────────────┘

No training required. Add intent examples via API or seed script — the system learns from examples, not fine-tuning.
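Conceptually, the route from text to label is three steps. The following is a minimal offline sketch of that flow, not the service code: toy 2-D vectors stand in for the Qwen3 embedding model and Qdrant, the cosine score doubles as the reranker score, and only the threshold default is taken from the real config.

```python
# Offline sketch of embed -> vector search -> rerank -> threshold.
# Toy 2-D vectors replace the real embedding model and Qdrant; the
# cosine score stands in for the cross-encoder reranker score.
import math

EXAMPLES = {
    "agent.information.weather": [1.0, 0.1],
    "agent.music.play": [0.1, 1.0],
}
NO_INTENT_RERANK_THRESHOLD = 0.55  # same default as the service config

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def route(query_vec, top_k=6):
    # 1) vector search: rank stored examples by similarity, keep top-K
    candidates = sorted(
        EXAMPLES.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True
    )[:top_k]
    # 2) "rerank": score the best candidate (cosine as a stand-in)
    label, vec = candidates[0]
    score = cosine(query_vec, vec)
    # 3) threshold: low-confidence matches fall through to __no_intent__
    return label if score >= NO_INTENT_RERANK_THRESHOLD else "__no_intent__"

print(route([0.9, 0.2]))    # → agent.information.weather
print(route([-1.0, -1.0]))  # → __no_intent__
```

In the real service the three stages are separate model calls, but the shape of the decision is the same: retrieve top-K candidates, rerank, and compare the best score against NO_INTENT_RERANK_THRESHOLD.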

Where It Fits

┌──────────┐     ┌──────────┐     ┌───────────────────┐     ┌─────────┐     ┌──────────┐
│   User   │────▶│   VAD    │────▶│       ASR         │────▶│   LLM   │────▶│   TTS    │
│  Audio   │     │ (Silero) │     │  (WhisperLive)    │     │ (Qwen)  │     │(MiniMax) │
└──────────┘     └──────────┘     └──────────┬────────┘     └────▲────┘     └──────────┘
                                             │                   │
                                             ▼                   │
                                    ┌─────────────────┐          │
                                    │ RealtimeIntent  │──────────┘
                                    │  < 100ms route  │  agent.weather → call weather tool
                                    └─────────────────┘  agent.music   → call music tool
                                                         __no_intent__ → just chat

See SquadyAI Realtime for the full voice conversation engine that orchestrates this pipeline.
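On the consuming side, routing on the returned label is a plain dispatch table, as the diagram suggests. A minimal sketch with hypothetical tool handlers (the handler names are illustrative, not part of this repo):

```python
# Hypothetical handlers; in a real assistant these would call the
# weather/music tools or forward the text to the LLM unchanged.
def call_weather_tool(text): return f"weather({text!r})"
def call_music_tool(text): return f"music({text!r})"
def just_chat(text): return f"chat({text!r})"

ROUTES = {
    "agent.information.weather": call_weather_tool,
    "agent.music.play": call_music_tool,
}

def dispatch(intent, user_text):
    # __no_intent__ and any unrecognized label fall through to plain chat
    return ROUTES.get(intent, just_chat)(user_text)

print(dispatch("agent.information.weather", "rain tomorrow?"))  # → weather('rain tomorrow?')
print(dispatch("__no_intent__", "uh"))                          # → chat('uh')
```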

Supported Intents

50+ built-in intent categories:

| Category | Intents |
| --- | --- |
| Weather & News | weather, news, date, time, currency, event, movie |
| Q&A | general, domain (wiki), daily, visual (camera) |
| Music & Media | play, query, control, setting, audiobook, podcast, radio |
| Calendar | query, set, remove |
| Reminders | query, set, remove |
| Volume | up, down, mute |
| Lists | query, set, remove |
| Navigation | direction, traffic, taxi, transit tickets |
| Smart Home | lights (on/off/dim/color), plugs, coffee, cleaning |
| Device | battery, camera (photo/video), recorder |
| Language | translate |
| Search | web search, stock, cooking/recipe |
| Social | post, query, email, contacts |
| General | greet, joke, goodbye, creative content, math |
| Special | `__no_intent__` (noise / ASR false trigger) |

Custom intents can be added at runtime via the API — no retraining needed.

Quick Start

Full GPU stack (recommended)

git clone https://github.com/SquadyAI/RealtimeIntent.git && cd RealtimeIntent
cp .env.example .env
docker compose -f docker-compose.gpu.yml up -d

Wait for models to load (~2 min), then seed the database:

pip install datasets
python scripts/seed/index_massive_to_qdrant.py    # loads ~4,600 intent examples from HuggingFace

Test:

curl -s -X POST http://localhost:8000/intent \
  -H "Content-Type: application/json" \
  -d '{"conversation": [{"role": "user", "content": "Play some jazz"}]}' | python -m json.tool

GPU config: Both models default to GPU 0. Set EMBEDDING_GPU=0 and RERANKER_GPU=1 in .env for separate devices.

Bring your own models

Already running embedding / reranker services? Just point to them:

git clone https://github.com/SquadyAI/RealtimeIntent.git && cd RealtimeIntent
cp .env.example .env
# Edit .env: set EMBEDDING_API_URL and RERANK_API_URL
docker compose up -d    # starts intent-service + Qdrant only

No Docker

pip install -r requirements.txt
export EMBEDDING_API_URL=http://localhost:30003/v1/embeddings
export RERANK_API_URL=http://localhost:30000/score
uvicorn app.main:app --host 0.0.0.0 --port 8000 --loop uvloop

Adding Your Own Intents

No retraining needed. Add examples at runtime:

# Single
curl -X POST http://localhost:8000/insert_entry \
  -H "Content-Type: application/json" \
  -d '{"label": "agent.smart_home.light", "text": "Turn off the living room lights"}'

# Batch
curl -X POST http://localhost:8000/batch_insert \
  -H "Content-Type: application/json" \
  -d '{
    "items": [
      {"label": "agent.smart_home.light", "text": "把卧室灯调暗一点"},
      {"label": "agent.smart_home.light", "text": "Dim the bedroom lights"},
      {"label": "agent.smart_home.ac", "text": "Set AC to 24 degrees"}
    ]
  }'

Or bulk-load from HuggingFace MASSIVE (supports zh-CN, en-US, ja-JP, ko-KR, and more):

python scripts/seed/index_massive_to_qdrant.py --dataset-name SetFit/amazon_massive_intent_en-US
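For your own datasets, the same /batch_insert endpoint handles bulk loading; send chunks rather than one request per example. Below is a sketch with a stubbed transport so it runs offline — swap `fake_post` for a real `requests.post(...).json()` against `http://localhost:8000/batch_insert`:

```python
# Chunked bulk insert against POST /batch_insert.
# `post` is injected so the chunking logic can run offline in this sketch.
def chunked(items, size):
    for i in range(0, len(items), size):
        yield items[i:i + size]

def bulk_insert(items, post, size=100):
    inserted = 0
    for batch in chunked(items, size):
        resp = post({"items": batch})  # request body for /batch_insert
        inserted += resp["count"]
    return inserted

# Offline stub standing in for the HTTP round-trip
def fake_post(body):
    return {"success": True, "count": len(body["items"])}

rows = [{"label": "agent.music.play", "text": f"play track {i}"} for i in range(250)]
print(bulk_insert(rows, fake_post))  # → 250
```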

API Reference

Interactive docs at http://localhost:8000/docs (Swagger UI).

POST /intent — Detect intent

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| conversation | List[Dict] | Yes | Conversation history in `[{"role": "user", "content": "..."}]` format |
| timeout | float | No | Request timeout in seconds |
| debug | bool | No | Return match details (score, source) |
curl -X POST http://localhost:8000/intent \
  -H "Content-Type: application/json" \
  -d '{
    "conversation": [
      {"role": "user", "content": "What is the weather like today?"},
      {"role": "assistant", "content": "Sunny, 25°C."},
      {"role": "user", "content": "What about tomorrow?"}
    ],
    "debug": true
  }'
{
  "intent": "agent.information.weather",
  "payload": {
    "payload": { "label": "agent.information.weather", "text": "明天天气怎么样", "source_file": "SetFit/amazon_massive_intent_zh-CN" },
    "score": 0.766
  }
}

Multi-turn works out of the box — the optional LLM summary keeps context across turns ("What about tomorrow?" → still weather).

GET /labels — List all intents
curl http://localhost:8000/labels
# → {"success": true, "count": 4, "labels": ["agent.information.weather", ...]}
POST /insert_entry — Add one example
curl -X POST http://localhost:8000/insert_entry \
  -H "Content-Type: application/json" \
  -d '{"label": "agent.information.weather", "text": "明天会下雨吗"}'
# → {"success": true, "point_id": "..."}
POST /batch_insert — Add examples in bulk
curl -X POST http://localhost:8000/batch_insert \
  -H "Content-Type: application/json" \
  -d '{"items": [
    {"label": "agent.information.weather", "text": "今天天气怎么样"},
    {"label": "agent.conversation.end", "text": "再见,下次聊"}
  ]}'
# → {"success": true, "count": 2, "point_ids": ["...", "..."]}
POST /get_label_content — Browse examples by label
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| label | string | Yes | Intent label to query |
| limit | int | No | Page size (1–100) |
| offset | string | No | Pagination cursor from previous response |
curl -X POST http://localhost:8000/get_label_content \
  -H "Content-Type: application/json" \
  -d '{"label": "agent.calendar.set", "limit": 10}'
# → {"success": true, "count": 10, "has_next": true, "next_page_offset": "...", "points": [...]}
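The `has_next` / `next_page_offset` pair is a cursor: pass the returned offset back until `has_next` is false. A sketch of the paging loop, with a stubbed transport standing in for the HTTP call:

```python
# Page through POST /get_label_content using the next_page_offset cursor.
def iter_label_points(label, post, limit=10):
    offset = None
    while True:
        body = {"label": label, "limit": limit}
        if offset is not None:
            body["offset"] = offset  # cursor from the previous response
        page = post(body)
        yield from page["points"]
        if not page.get("has_next"):
            break
        offset = page["next_page_offset"]

# Offline stub: 25 points served in pages of `limit`
DATA = [{"id": i} for i in range(25)]
def fake_post(body):
    start = body.get("offset", 0)
    end = start + body["limit"]
    return {"points": DATA[start:end], "has_next": end < len(DATA),
            "next_page_offset": end}

points = list(iter_label_points("agent.calendar.set", fake_post))
print(len(points))  # → 25
```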
DELETE /delete_point/{point_id} — Remove one example
curl -X DELETE http://localhost:8000/delete_point/123e4567-e89b-12d3-a456-426614174000
POST /batch_delete_points — Batch remove
curl -X POST http://localhost:8000/batch_delete_points \
  -H "Content-Type: application/json" \
  -d '{"point_ids": ["id1", "id2"]}'
POST /delete_by_label — Remove all examples for a label

Warning: This deletes all data under the label and cannot be undone.

curl -X POST http://localhost:8000/delete_by_label \
  -H "Content-Type: application/json" \
  -d '{"label": "agent.calendar.set", "confirm": true}'
Monitoring
| Method | Path | Description |
| --- | --- | --- |
| GET | /health | Service health check |
| GET | /metrics | Latency percentiles, cache hit rates |
| GET | /stats | Detailed service statistics |

Error Handling

| Status | Description |
| --- | --- |
| 200 | Success |
| 400 | Bad request — invalid params or missing required fields |
| 404 | Resource not found |
| 408 | Timeout — processing exceeded the specified timeout |
| 500 | Internal error — dependency or database failure |

Error response format: {"detail": "error message"}
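A client can map these statuses onto exceptions so a retryable timeout is distinguishable from a hard failure. A sketch (the exception names are illustrative; only the status codes and the `detail` field come from the table above):

```python
# Map the documented status codes to exceptions.
class IntentError(Exception):
    pass

class IntentTimeout(IntentError):
    pass

def check_response(status, body):
    if status == 200:
        return body
    detail = body.get("detail", "unknown error")
    if status == 408:
        # processing exceeded the requested timeout; often worth a retry
        raise IntentTimeout(detail)
    raise IntentError(f"{status}: {detail}")

print(check_response(200, {"intent": "agent.information.weather"}))
```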

Client Examples

Python
import requests

def detect_intent(conversation, timeout=5.0):
    resp = requests.post("http://localhost:8000/intent", json={
        "conversation": conversation,
        "timeout": timeout,
    })
    resp.raise_for_status()
    return resp.json()["intent"]

intent = detect_intent([
    {"role": "user", "content": "今天天气怎么样?"},
    {"role": "assistant", "content": "今天天气晴朗。"},
    {"role": "user", "content": "明天呢?"},
])
print(intent)  # agent.information.weather
JavaScript
async function detectIntent(conversation, timeout = 5.0) {
  const res = await fetch("http://localhost:8000/intent", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ conversation, timeout }),
  });
  if (!res.ok) throw new Error(`${res.status} ${await res.text()}`);
  return (await res.json()).intent;
}

const intent = await detectIntent([
  { role: "user", content: "今天天气怎么样?" },
  { role: "assistant", content: "今天天气晴朗。" },
  { role: "user", content: "明天呢?" },
]);
console.log(intent); // agent.information.weather

Best Practices

  • Conversation context: Provide 2–3 turns for best accuracy on follow-up queries
  • Batch operations: Use /batch_insert for bulk data loading instead of calling /insert_entry in a loop
  • Pagination: Use /get_label_content with limit + offset for large datasets

Configuration

All via environment variables (.env.example has the full list):

| Variable | Default | What it does |
| --- | --- | --- |
| EMBEDDING_API_URL | http://localhost:30003/v1/embeddings | OpenAI-compatible embedding endpoint |
| EMBEDDING_MODEL | Qwen/Qwen3-Embedding-4B | Model name sent to embedding API |
| RERANK_API_URL | http://localhost:30000/score | Cross-encoder reranker endpoint |
| RERANK_MODEL | Qwen3-Reranker-4B | Model name sent to reranker API |
| QDRANT_HOST | localhost | Qdrant server |
| QDRANT_COLLECTION | massive_intents | Collection name |
| TOP_K | 6 | Retrieval candidates before reranking |
| NO_INTENT_RERANK_THRESHOLD | 0.55 | Below this score → `__no_intent__` |
| INSTRUCT_API_URL | http://localhost:8001/v1/chat/completions | (optional) LLM for multi-turn summary |
| LOG_LEVEL | INFO | Logging verbosity |
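Put together, a `.env` for the bring-your-own-models setup might look like this; the values simply mirror the defaults listed above, so adjust hosts and ports to your deployment:

```shell
# Point the service at existing embedding/reranker endpoints
EMBEDDING_API_URL=http://localhost:30003/v1/embeddings
EMBEDDING_MODEL=Qwen/Qwen3-Embedding-4B
RERANK_API_URL=http://localhost:30000/score
RERANK_MODEL=Qwen3-Reranker-4B
QDRANT_HOST=localhost
QDRANT_COLLECTION=massive_intents
TOP_K=6
NO_INTENT_RERANK_THRESHOLD=0.55
LOG_LEVEL=INFO
```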

Model Deployment

docker-compose.gpu.yml handles this automatically. For manual deployment:

| Service | Model | Framework | Command |
| --- | --- | --- | --- |
| Embedding | Qwen3-Embedding-4B | SGLang | `python -m sglang.launch_server --model-path Qwen/Qwen3-Embedding-4B --port 30003 --is-embedding` |
| Reranker | Qwen3-Reranker-0.6B | vLLM | `vllm serve Qwen/Qwen3-Reranker-0.6B --task score --port 30000` |
| LLM (optional) | Qwen3-4B-AWQ | Any OpenAI-compatible | |

Any OpenAI-compatible embedding API and cross-encoder reranker work as drop-in replacements.
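"OpenAI-compatible" here means the standard `/v1/embeddings` request shape. The sketch below only builds the JSON body (no network call), so you can check what a replacement endpoint must accept:

```python
# Build the request body an OpenAI-compatible embeddings endpoint expects;
# send it with any HTTP client to the URL in EMBEDDING_API_URL.
import json

def embedding_request(texts, model="Qwen/Qwen3-Embedding-4B"):
    # {"model": ..., "input": [...]} is the standard embeddings payload
    return json.dumps({"model": model, "input": texts})

payload = embedding_request(["Will it rain in Beijing tomorrow?"])
print(payload)
```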

The SquadyAI Ecosystem

RealtimeIntent is one piece of a larger open-source voice AI platform:

| Project | What it does | Link |
| --- | --- | --- |
| RealtimeAPI | Core voice AI engine — ASR→LLM→TTS pipeline orchestration in Rust, <450 ms E2E | SquadyAI/RealtimeAPI |
| RealtimeIntent | Intent classification — vector search + neural reranking, <100 ms | you are here |
| RealtimeSearch | Multi-engine search gateway with automatic failover | SquadyAI/RealtimeSearch |
┌─────────────────────────────────── RealtimeAPI ───────────────────────────────────┐
│                                                                                   │
│   Audio ──▶ VAD ──▶ ASR ──▶ ┌──────────────────┐ ──▶ LLM ──▶ TTS ──▶ Audio      │
│                             │ ★ RealtimeIntent  │      │                          │
│                             │   < 100ms intent  │      │                          │
│                             └──────────────────┘      │                          │
│                                                  ┌────▼───────────────┐           │
│                                                  │  RealtimeSearch    │           │
│                                                  │  web search tool   │           │
│                                                  └────────────────────┘           │
│                                                                                   │
└───────────────────────────────────────────────────────────────────────────────────┘

Important

If you find RealtimeIntent useful, check out RealtimeAPI — the full voice conversation engine that powers real-time voice assistants with <450 ms latency and 100+ concurrent sessions.

License

Apache License 2.0


Built by SquadyAI contributors  ·  Part of the RealtimeAPI voice AI stack
