From 5585f89846ca6c5b881dbc0addd970db43bdc0e0 Mon Sep 17 00:00:00 2001
From: digitallysavvy
Date: Wed, 11 Mar 2026 21:07:56 -0400
Subject: [PATCH 1/4] 2 tier discovery

---
 skills/agora/SKILL.md                         | 16 ++--
 skills/agora/intake/SKILL.md                  |  2 +-
 .../references/cloud-recording/README.md      | 13 +--
 .../references/conversational-ai/README.md    | 63 ++++++------
 skills/agora/references/doc-fetching.md       | 39 ++++++++
 skills/agora/references/mcp-tools.md          | 95 ++++++-------------
 skills/agora/references/rtc/README.md         |  8 ++
 skills/agora/references/rtm/README.md         |  5 +
 .../agora/references/server-gateway/README.md | 11 +--
 skills/agora/references/server/README.md      | 56 +++++++++--
 10 files changed, 182 insertions(+), 126 deletions(-)
 create mode 100644 skills/agora/references/doc-fetching.md

diff --git a/skills/agora/SKILL.md b/skills/agora/SKILL.md
index 03c2a80..a3a944d 100644
--- a/skills/agora/SKILL.md
+++ b/skills/agora/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: agora
-description: Write code using Agora SDKs (agora.io) for real-time communication. Covers RTC (video/voice calling, live streaming), RTM (signaling, messaging, presence), Conversational AI (voice AI agents), Cloud Recording, and server-side token generation. Use when the user wants to build real-time audio/video applications, integrate Agora SDKs (Web JS/TS, React, iOS Swift, Android Kotlin/Java, Go, Python), manage channels, tracks, tokens, use RTM for messaging/signaling, or build Conversational AI with the agent-toolkit. Triggers on mentions of Agora, agora.io, RTC, RTM, video calling, voice calling, real-time communication, agora-rtc-sdk-ng, agora-rtc-react, agora-rtm, conversational AI with Agora, Agora token generation, Cloud Recording, agora-agent-client-toolkit, agora-agent-client-toolkit-react, AgoraVoiceAI, useConversationalAI, useTranscript, useAgentState, agent transcript, agent state hook.
+description: Write code using Agora SDKs (agora.io) for real-time communication. Covers RTC (video/voice calling, live streaming), RTM (signaling, messaging, presence), Conversational AI (voice AI agents), Cloud Recording, and server-side token generation. Use when the user wants to build real-time audio/video applications, integrate Agora SDKs (Web JS/TS, React, iOS Swift, Android Kotlin/Java, Go, Python), manage channels, tracks, tokens, use RTM for messaging/signaling, or build Conversational AI with the agent-toolkit. Triggers on mentions of Agora, agora.io, RTC, RTM, video calling, voice calling, real-time communication, agora-rtc-sdk-ng, agora-rtc-react, agora-rtm, conversational AI with Agora, Agora token generation, Cloud Recording, agora-agent-client-toolkit, agora-agent-client-toolkit-react, agora-agent-server-sdk, AgoraVoiceAI, AgoraClient, useConversationalAI, useTranscript, useAgentState, agent transcript, agent state hook.
 metadata:
   author: agora
   version: '1.1.0'
@@ -78,17 +78,15 @@ Examples of clear requests:
 - "Generate RTC token in Go" → `references/server/tokens.md`

 **Vague or multi-product request:** Route through `intake/SKILL.md`.
+Intake handles product identification, combination recommendations, and routing.

-Examples of vague requests:
+## Documentation Lookup

-- "I want to build an AI customer service bot" (product unclear)
-- "Help me set up live streaming with recording" (multi-product)
-- "What do I need to build a voice app?" (product unknown)
+Check bundled references first (Level 1). If they don't cover the detail needed,
+fetch `https://docs.agora.io/en/llms.txt`, find the relevant URL, and fetch it (Level 2).
+See [references/doc-fetching.md](references/doc-fetching.md) for the full procedure and freeze-forever decision table.

-## MCP Integration
-
-When MCP is configured, product skills use the Agora Doc MCP server for fast-moving
-content. See [mcp-tools.md](references/mcp-tools.md) for tool reference and compatibility notes.
+If a user explicitly asks about the Agora MCP server, see [references/mcp-tools.md](references/mcp-tools.md).

 ## Web Framework Notes

diff --git a/skills/agora/intake/SKILL.md b/skills/agora/intake/SKILL.md
index 4462ad0..a9a321c 100644
--- a/skills/agora/intake/SKILL.md
+++ b/skills/agora/intake/SKILL.md
@@ -10,7 +10,7 @@ description: |
 license: MIT
 metadata:
   author: agora
-  version: "1.0.0"
+  version: "1.1.0"
 ---

 # Agora Intake — Product Routing & Needs Analysis
diff --git a/skills/agora/references/cloud-recording/README.md b/skills/agora/references/cloud-recording/README.md
index 59f9744..6dfb755 100644
--- a/skills/agora/references/cloud-recording/README.md
+++ b/skills/agora/references/cloud-recording/README.md
@@ -12,15 +12,12 @@ Server-side recording of RTC channel audio/video. REST API only — no client SD
 | Prerequisite | Cloud Recording enabled in Agora Console |
 | Depends on | Active RTC channel with participants |

-## MCP Quick Start
+## Documentation

-When MCP is available, fetch the full quick start guide before writing any code:
-
-```text
-get-doc-content {"uri": "docs://default/cloud-recording/restful/get-started/quick-start"}
-```
-
-If MCP is unavailable:
+The bundled reference below covers the recording lifecycle, modes, and error handling.
+For the full REST API field reference and request/response schemas, use Level 2 fetch
+(see [doc-fetching.md](../doc-fetching.md)) or fetch directly:
+

 ## Recording Lifecycle

diff --git a/skills/agora/references/conversational-ai/README.md b/skills/agora/references/conversational-ai/README.md
index 26ce199..bb103b3 100644
--- a/skills/agora/references/conversational-ai/README.md
+++ b/skills/agora/references/conversational-ai/README.md
@@ -2,6 +2,14 @@

 REST API-driven voice AI agents. Create agents that join RTC channels and converse with users via speech. Front-end clients connect via RTC+RTM.

+The TypeScript, Go, and Python SDKs are convenience wrappers around this REST API.
+For any other backend language (Java, Ruby, PHP, C#, etc.), call the REST API directly.
+The live OpenAPI spec is the authoritative source for request/response schemas:
+
+```
+GET https://docs-md.agora.io/api/conversational-ai-api-v2.x.yaml
+```
+
 ## Architecture

 ```text
@@ -19,27 +27,23 @@ ASR → LLM → TTS Receives audio + transcripts
 3. ASR converts speech to text → LLM generates response → TTS converts to speech
 4. The agent publishes audio back to the channel; transcripts arrive via RTC data channel or RTM

-## MCP Integration
-
-The ConvoAI REST API documentation is fast-moving. Use MCP to fetch current parameter
-details rather than relying on inline content.
-
-**When MCP is available:** Call `get-doc-content` with the Quick Start URI for your language:
+## Documentation Lookup

-- Python/curl: `docs://default/convoai/restful/get-started/quick-start`
-- Go: `docs://default/convoai/restful/get-started/quick-start-go`
-- Java: `docs://default/convoai/restful/get-started/quick-start-java`
+The bundled references in this file cover gotchas, generation rules, and the stable
+behavioral contracts. For content that changes with doc updates, use Level 2:

-**When MCP is unavailable:**
+1. Fetch `https://docs.agora.io/en/llms.txt`
+2. Scan for a URL matching your topic (e.g., `conversational-ai`, `quick-start`, `rest-api`)
+3. Fetch that URL

-1. Fetch the live OpenAPI spec: `https://docs-md.agora.io/api/conversational-ai-api-v2.x.yaml`
-2. Fall back to:
-3. Notify the user: "MCP unavailable — using local fallback. Please verify against
-   current docs before deploying."
+Common topics to fetch via Level 2: quick-start code (Python, Go, Java), TTS/ASR/LLM
+vendor configs, error code listings.

-The behavioral guidance and gotchas in this file (uid types, agent name uniqueness, MLLM location field, etc.) are always valid regardless of MCP status.
+For full request/response schemas, fetch the OpenAPI spec directly — it is always
+current and covers every endpoint and field:
+`https://docs-md.agora.io/api/conversational-ai-api-v2.x.yaml`

-See [../mcp-tools.md](../mcp-tools.md) for full MCP tool reference.
+See [../doc-fetching.md](../doc-fetching.md) for the full procedure.

 ## Authentication

@@ -126,26 +130,29 @@ Each file maps to one repo in [AgoraIO-Conversational-AI](https://github.com/Ago

 Full request/response details for all endpoints:

-- **[Start Agent (Join)](https://docs.agora.io/en/conversational-ai/rest-api/agent/join)** — POST /join: start agent with LLM/TTS/ASR config
-- **[Stop Agent (Leave)](https://docs.agora.io/en/conversational-ai/rest-api/agent/leave)** — POST /leave: stop agent
-- **[Update Agent](https://docs.agora.io/en/conversational-ai/rest-api/agent/update)** — POST /update: update token, LLM config
-- **[Query Agent Status](https://docs.agora.io/en/conversational-ai/rest-api/agent/query)** — GET /agents/{id}: query status
-- **[List Agents](https://docs.agora.io/en/conversational-ai/rest-api/agent/list)** — GET /agents: list with filters
-- **[Broadcast Message (Speak)](https://docs.agora.io/en/conversational-ai/rest-api/agent/speak)** — POST /speak: broadcast TTS
-- **[Interrupt Agent](https://docs.agora.io/en/conversational-ai/rest-api/agent/interrupt)** — POST /interrupt
-- **[Conversation History](https://docs.agora.io/en/conversational-ai/rest-api/agent/history)** — GET /history
+- **[Start Agent (Join)](https://docs-md.agora.io/en/conversational-ai/rest-api/agent/join.md)** — POST /join: start agent with LLM/TTS/ASR config
+- **[Stop Agent (Leave)](https://docs-md.agora.io/en/conversational-ai/rest-api/agent/leave.md)** — POST /leave: stop agent
+- **[Update Agent](https://docs-md.agora.io/en/conversational-ai/rest-api/agent/update.md)** — POST /update: update token, LLM config
+- **[Query Agent Status](https://docs-md.agora.io/en/conversational-ai/rest-api/agent/query.md)** — GET /agents/{id}: query status
+- **[List Agents](https://docs-md.agora.io/en/conversational-ai/rest-api/agent/list.md)** — GET /agents: list with filters
+- **[Broadcast Message (Speak)](https://docs-md.agora.io/en/conversational-ai/rest-api/agent/speak.md)** — POST /speak: broadcast TTS
+- **[Interrupt Agent](https://docs-md.agora.io/en/conversational-ai/rest-api/agent/interrupt.md)** — POST /interrupt
+- **[Conversation History](https://docs-md.agora.io/en/conversational-ai/rest-api/agent/history.md)** — GET /history

 ## Agent Configuration (join payload `properties` object)

-- **[Custom LLM Guide](https://docs.agora.io/en/conversational-ai/develop/custom-llm)** — LLM vendor, model, url, api_key, system prompt, greeting, style; TTS vendor, model, voice settings; ASR vendor, language, model
-- **[Gemini Live MLLM](https://docs.agora.io/en/conversational-ai/models/mllm/gemini)** — Multimodal: vendor, model, credentials, location
-- **[Join Endpoint (full schema)](https://docs.agora.io/en/conversational-ai/rest-api/agent/join)** — Complete properties schema: channel, token, turn detection, VAD, tools, avatars, encryption, filler words
-- **[Release Notes](https://docs.agora.io/en/conversational-ai/overview/release-notes)** — New parameters and features
+- **[Custom LLM Guide](https://docs-md.agora.io/en/conversational-ai/develop/custom-llm.md)** — LLM vendor, model, url, api_key, system prompt, greeting, style; TTS vendor, model, voice settings; ASR vendor, language, model
+- **[Gemini Live MLLM](https://docs-md.agora.io/en/conversational-ai/models/mllm/gemini.md)** — Multimodal: vendor, model, credentials, location
+- **[Join Endpoint (full schema)](https://docs-md.agora.io/en/conversational-ai/rest-api/agent/join.md)** — Complete properties schema: channel, token, turn detection, VAD, tools, avatars, encryption, filler words
+- **[Release Notes](https://docs-md.agora.io/en/conversational-ai/overview/release-notes.md)** — New parameters and features

 ## Gotchas & Quirks

 Things the official docs don't emphasize that cause frequent mistakes:

+- **`agent_rtc_uid` is a string, not an int** — pass `"0"` (string) for auto-assignment, not `0`. Passing an integer will cause a type error at the API boundary.
+- **`remote_rtc_uids` is an array of strings** — use `["*"]` to subscribe to all users, not `"*"` or `["0"]`. The wildcard must be in array form.
+- **Agent name must be unique per project** — collisions return HTTP 409. Use a short UUID suffix: `agent_{uuid[:8]}`. On 409, generate a new name and retry; do not retry with the same name.
 - **Token auth is not in the official docs yet — use it anyway.** The ConvoAI REST API accepts `Authorization: agora token=` using a combined RTC + RTM token from `RtcTokenBuilder.buildTokenWithRtm`. This is **safer than Basic Auth**: tokens are scoped to a single App ID + channel, while Customer ID/Secret grants access to every project on the account. Default to token auth unless the user explicitly requests Basic Auth. See [Authentication → Option A](#authentication) for the implementation.

 - **`/update` overwrites `params` entirely** — sending `{ "llm": { "params": { "max_tokens": 2048 } } }` erases `model` and everything else in `params`. Always send the full object.
diff --git a/skills/agora/references/doc-fetching.md b/skills/agora/references/doc-fetching.md
new file mode 100644
index 0000000..31da701
--- /dev/null
+++ b/skills/agora/references/doc-fetching.md
@@ -0,0 +1,39 @@
+# Documentation Lookup
+
+**Level 1 — Bundled references (always try first)**
+
+Check the relevant file under `skills/agora/references/`. These are inline-stable:
+RTC init patterns, RTM messaging, token generation, ConvoAI gotchas and generation
+rules. If the answer is here, stop — no fetch needed.
+
+**Level 2 — Live docs (when Level 1 is insufficient)**
+
+When bundled references don't cover the detail needed (full request/response schemas,
+vendor-specific configs, language-specific quick-start code):
+
+1. Fetch the Agora docs sitemap:
+   ```
+   GET https://docs.agora.io/en/llms.txt
+   ```
+2. Scan the response for a URL matching the product and topic.
+3. Fetch that URL and use its content to answer.
+
+## Fallback
+
+If `llms.txt` is unreachable or the fetched URL returns no useful content, try these
+known markdown entry points directly:
+
+| Product | Markdown URL |
+|---|---|
+| RTC | https://docs-md.agora.io/en/video-calling/get-started/get-started-sdk.md |
+| RTM | https://docs-md.agora.io/en/signaling/get-started/sdk-quickstart.md |
+| ConvoAI | https://docs-md.agora.io/en/conversational-ai/get-started/quickstart.md |
+| Cloud Recording | https://docs-md.agora.io/en/cloud-recording/get-started/getstarted.md |
+| Server Gateway | https://docs-md.agora.io/en/server-gateway/get-started/integrate-sdk.md |
+| Tokens | https://docs-md.agora.io/en/video-calling/token-authentication/deploy-token-server.md |
+
+## Agora MCP Server (optional)
+
+Agora also provides an MCP server that gives AI assistants direct tool-call access
+to documentation — an alternative to the Level 2 HTTP fetch above. If a user asks
+about installing or using the Agora MCP, see [mcp-tools.md](mcp-tools.md).
diff --git a/skills/agora/references/mcp-tools.md b/skills/agora/references/mcp-tools.md
index caad6a2..dcd093d 100644
--- a/skills/agora/references/mcp-tools.md
+++ b/skills/agora/references/mcp-tools.md
@@ -1,80 +1,43 @@
-# Agora Doc MCP Tools
+# Agora Doc MCP Server

-Internal guide for the model. Describes how to use the Agora Doc MCP server
-to fetch up-to-date documentation during skill execution.
+The Agora Doc MCP server gives AI assistants direct tool-call access to Agora
+documentation. It is an optional enhancement — the skill works without it using
+the two-tier fetch approach in [doc-fetching.md](doc-fetching.md).
+
+**Only use MCP when the user explicitly asks for it.** The default documentation
+lookup is the two-tier fetch approach in [doc-fetching.md](doc-fetching.md) — use
+that regardless of whether MCP is installed.

 **MCP endpoint:** `https://mcp.agora.io`

 ## Tools

-| Tool | Input | Returns | When to use |
-| ----------------- | ---------------------------------- | ------------------------- | ------------------------------- |
-| `get-doc-content` | `{"uri": "docs://..."}` | Full markdown content | Read a specific doc (preferred) |
-| `search-docs` | `{"query": "keyword"}` | List of matching doc URIs | Find docs when URI is unknown |
-| `list-docs` | `{"category": "...", "limit": 20}` | All docs in a category | Browse available docs |
-
-## Preferred Approach: Direct URI
-
-When the doc URI is known, call `get-doc-content` directly — no search needed.
-
-> **After fetching quick-start docs:** use the fetched content for API structure and field names only. Do NOT copy sample code verbatim — quick-start examples typically hardcode credentials and omit production requirements. Apply the gotchas and rules in `references/conversational-ai/README.md` to any generated code (token auth, uid types, agent name uniqueness, credential env vars).
-
-```text
-get-doc-content {"uri": "docs://default/convoai/restful/get-started/quick-start"}
-```
-
-## Known Doc URIs
-
-| Product | Topic | URI |
-| --------------- | ------------------------- | ---------------------------------------------------------------- |
-| ConvoAI | Quick Start (Python/curl) | `docs://default/convoai/restful/get-started/quick-start` |
-| ConvoAI | Quick Start (Go) | `docs://default/convoai/restful/get-started/quick-start-go` |
-| ConvoAI | Quick Start (Java) | `docs://default/convoai/restful/get-started/quick-start-java` |
-| RTC | Quick Start (Web) | `docs://default/rtc/javascript/get-started/quick-start` |
-| RTC | Quick Start (Android) | `docs://default/rtc/android/get-started/quick-start` |
-| RTC | Quick Start (iOS) | `docs://default/rtc/ios/get-started/quick-start` |
-| RTM | Quick Start (Web) | `docs://default/rtm2/javascript/get-started/quick-start` |
-| Cloud Recording | Quick Start | `docs://default/cloud-recording/restful/get-started/quick-start` |
+| Tool | Input | Returns |
+|---|---|---|
+| `get-doc-content` | `{"uri": "docs://..."}` | Full markdown content |
+| `search-docs` | `{"query": "keyword"}` | List of matching doc URIs |
+| `list-docs` | `{"category": "...", "limit": 20}` | All docs in a category |

-## Fallback: Search Then Read
+Use `search-docs` when the topic is known but the URI isn't. Use `get-doc-content`
+directly when the URI is known.

-When the URI is unknown, search first:
+## Installation

-```text
-Step 1: search-docs {"query": "convoai "}
-        → returns [{uri: "docs://...", text: "..."}, ...]
-
-Step 2: get-doc-content {"uri": "docs://..."}
-        → returns full doc content
+**Claude Code:**
+```bash
+claude mcp add agora-docs --transport http https://mcp.agora.io
 ```

-## When to Call MCP
-
-**Always call for:**
-
-- ConvoAI API field details, request/response schemas, vendor configurations (TTS, ASR)
-- Error codes and their meanings (ConvoAI, Cloud Recording)
-- Any content that changes with documentation updates
-
-**Do NOT call for:**
-
-- RTC initialization, track management, event registration — stable, in `references/rtc/`
-- RTM messaging patterns — stable, in `references/rtm/`
-- Token generation patterns — stable, in `references/server/`
-- ConvoAI gotchas and critical rules — behavioral knowledge, inline in `references/conversational-ai/README.md`
-
-## Freeze-Forever Content
-
-The "When to Call MCP" section above is the categorization table. Rule: if content changes with doc updates or vendor releases, call MCP. If it's a stable SDK pattern, use inline skill content.
-
-## AI Assistant MCP Support
+**Cursor / Windsurf / other MCP-compatible tools:** Add `https://mcp.agora.io` as
+an HTTP MCP server in your tool's MCP settings. See your tool's documentation for
+the exact configuration format.

-MCP tool calls require an MCP-compatible AI assistant with the Agora Doc MCP server
-configured at `https://doc-mcp.shengwang.cn/mcp`.
+For the latest setup instructions and any changes to the endpoint, see:
+

-- **Claude Code**: MCP supported. Install the Agora Doc MCP server per official instructions.
-- **Cursor, Windsurf, GitHub Copilot**: MCP support varies by version. Check your tool's documentation.
+## Usage Note

-When MCP is not available: use the graceful degradation paths defined in each product skill.
-For ConvoAI: see the MCP Fallback section in `references/conversational-ai/README.md`.
-For all other products: inline code in the skill files is the primary source — no degradation needed.
+After fetching quick-start docs via MCP, use the content for API structure and field
+names only. Do NOT copy sample code verbatim — quick-start examples typically hardcode
+credentials and omit production requirements. Apply the gotchas and generation rules
+in `references/conversational-ai/README.md` to any generated code.
diff --git a/skills/agora/references/rtc/README.md b/skills/agora/references/rtc/README.md
index 7d1810d..1121ee7 100644
--- a/skills/agora/references/rtc/README.md
+++ b/skills/agora/references/rtc/README.md
@@ -93,3 +93,11 @@ Read the file matching the user's platform:
 - **[android.md](android.md)** — `RtcEngine` (Kotlin/Java): engine setup, callbacks, permissions

 For test setup and mocking patterns, see [references/testing-guidance/SKILL.md](../testing-guidance/SKILL.md).
+
+## Live Docs
+
+For content not covered by the bundled platform files (advanced features, new SDK
+capabilities, additional platforms), fetch the entry point directly:
+
+- **Video calling:**
+- **Voice calling:**
diff --git a/skills/agora/references/rtm/README.md b/skills/agora/references/rtm/README.md
index e4bc0bb..9bcaad2 100644
--- a/skills/agora/references/rtm/README.md
+++ b/skills/agora/references/rtm/README.md
@@ -22,3 +22,8 @@ Signaling, text messaging, presence, and metadata — used alongside or independ
 ## Platform Reference Files

 - **[web.md](web.md)** — `agora-rtm` v2 (JS/TS): RTM client, messaging, presence, v1 legacy API
+- **iOS / Android** — fetch the entry point below and follow platform-specific links
+
+## Live Docs
+
+
diff --git a/skills/agora/references/server-gateway/README.md b/skills/agora/references/server-gateway/README.md
index 828118a..65ddb0b 100644
--- a/skills/agora/references/server-gateway/README.md
+++ b/skills/agora/references/server-gateway/README.md
@@ -67,15 +67,10 @@ Hardware minimum: 8-core CPU 1.8 GHz, 2 GB RAM (4 GB recommended).
 ## Platform Reference Files

 - **[linux-cpp.md](linux-cpp.md)** — C++ full implementation: init, senders, receivers, video mixing, shutdown sequence
+- **Java, Go, Python** — see the official documentation links below for each platform

 ## Official Documentation

-- **[Product Overview](https://docs.agora.io/en/server-gateway/overview/product-overview)**
-- **[Integrate the SDK — C++](https://docs.agora.io/en/server-gateway/get-started/integrate-sdk?platform=linux-cpp)**
-- **[Integrate the SDK — Java](https://docs.agora.io/en/server-gateway/get-started/integrate-sdk?platform=linux-java)**
-- **[Integrate the SDK — Go](https://docs.agora.io/en/server-gateway/get-started/integrate-sdk?platform=go)** — Go SDK: `github.com/AgoraIO-Extensions/Agora-Golang-Server-SDK`
-- **[Integrate the SDK — Python](https://docs.agora.io/en/server-gateway/get-started/integrate-sdk?platform=python)**
-- **[Send and Receive Media Streams](https://docs.agora.io/en/server-gateway/develop/send-receive-media-streams)**
-- **[API Reference](https://docs.agora.io/en/server-gateway/reference/api)**
+- **[Product Overview](https://docs-md.agora.io/en/server-gateway/overview/product-overview.md)**
+- **[Integrate the SDK](https://docs-md.agora.io/en/server-gateway/get-started/integrate-sdk.md)** — covers C++, Java, Go (`github.com/AgoraIO-Extensions/Agora-Golang-Server-SDK`), Python
 - **[SDK Downloads](https://docs.agora.io/en/sdks)**
-- **[Release Notes](https://docs.agora.io/en/server-gateway/overview/release-notes)**
diff --git a/skills/agora/references/server/README.md b/skills/agora/references/server/README.md
index bebe232..769a6ad 100644
--- a/skills/agora/references/server/README.md
+++ b/skills/agora/references/server/README.md
@@ -15,18 +15,62 @@ Server-side utilities for Agora — primarily token generation for secure authen
 - **RTM Token**: Grants access to RTM services for a specific user ID.
 - **AccessToken2**: Current token format. Supports privilege expiration per service and can bundle RTC + RTM privileges in a single token.

-## ConvoAI REST API Authentication
+## ConvoAI Agent Server SDKs

-The `agora-agent-sdk` TypeScript SDK supports both token-based auth and Basic Auth for the ConvoAI REST API:
+The TypeScript, Go, and Python SDKs are convenience wrappers around the ConvoAI REST API.
+For any other backend language, call the REST API directly — fetch the live OpenAPI spec
+for the full schema: `https://docs-md.agora.io/api/conversational-ai-api-v2.x.yaml`

-- **Token auth (preferred)**: Pass `appId` + `appCertificate` when creating the client — the SDK generates a combined RTC + RTM token (via `RtcTokenBuilder.buildTokenWithRtm`) for each API call automatically. Or pass a pre-built token via `authToken`.
-- **Basic Auth (legacy)**: Pass `customerId` + `customerSecret` (from Agora Console → Developer Toolkit → RESTful API).
+Use these SDKs when building with TypeScript, Go, or Python to avoid writing REST boilerplate:
+
+### TypeScript — `agora-agent-server-sdk`
+
+```bash
+npm install agora-agent-server-sdk
+```
+
+Builder pattern — configure the AI pipeline then create sessions:
+
+```typescript
+import { AgoraClient, Agent, Area } from 'agora-agent-server-sdk';
+
+const client = new AgoraClient({
+  area: Area.US,
+  appId: process.env.AGORA_APP_ID,
+  appCertificate: process.env.AGORA_APP_CERTIFICATE,
+});
+
+const agent = new Agent({
+  name: `agent_${crypto.randomUUID().slice(0, 8)}`, // must be unique per project
+  instructions: 'You are a helpful voice assistant.',
+  greeting: 'Hello! How can I help you today?',
+})
+  .withStt(new DeepgramSTT({ apiKey: process.env.DEEPGRAM_API_KEY }))
+  .withLlm(new OpenAI({ apiKey: process.env.OPENAI_API_KEY }))
+  .withTts(new ElevenLabsTTS({ apiKey: process.env.ELEVENLABS_API_KEY }));
+
+// Start a session (joins the agent to a channel)
+const session = agent.createSession({ channel: 'my-channel', agentUid: 0 });
+const sessionId = await session.start();
+
+// Stop from the same process
+await session.stop();
+
+// Stop from a stateless server (e.g. a different request handler)
+await client.stopAgent(sessionId);
+```
+
+Token auth is handled automatically when `appCertificate` is provided. For vendor-specific STT/LLM/TTS import paths and MLLM (OpenAI Realtime, Gemini Live) config, see the [SDK README](https://github.com/AgoraIO-Conversational-AI/agent-server-sdk-ts).
+
+### Go / Python

-See the agent SDK READMEs for full examples:
-- [agent-server-sdk-ts](https://github.com/AgoraIO-Conversational-AI/agent-server-sdk-ts)
 - [agent-server-sdk-go](https://github.com/AgoraIO-Conversational-AI/agent-server-sdk-go)
 - [agent-server-sdk-python](https://github.com/AgoraIO-Conversational-AI/agent-server-sdk-python)

+## Live Docs
+
+
+
 ## Reference Files

 - **[tokens.md](tokens.md)** — Token generation for Node.js, Python, and Go. Express server example, security best practices.
From f762e13e7415ea530981b5a3ee232f6737a92f17 Mon Sep 17 00:00:00 2001
From: digitallysavvy
Date: Wed, 11 Mar 2026 21:08:32 -0400
Subject: [PATCH 2/4] updated CLAUDE

---
 CLAUDE.md | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index 6c2e393..4fdd9cf 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -12,12 +12,15 @@ skills/
     ├── SKILL.md ← entry point; do not restructure this file
     ├── intake/SKILL.md ← intake router; do not restructure this file
     └── references/
-        ├── mcp-tools.md ← MCP reference + freeze-forever decision table
+        ├── doc-fetching.md ← two-tier lookup procedure (agent-facing)
+        ├── mcp-tools.md ← Agora MCP server install guide (user-facing)
         ├── rtc/
         ├── rtm/
         ├── conversational-ai/
         ├── server/
-        └── cloud-recording/
+        ├── cloud-recording/
+        ├── server-gateway/
+        └── testing-guidance/
 ```

 ## Protected Files
@@ -36,9 +39,20 @@ These files contain stable, high-value inline examples. Edits require a verified
 Before adding any inline content, ask: **will this still be correct in 6 months without any updates?**

 - **Yes** → put it inline (stable APIs, initialization sequences, gotchas)
-- **No** → route to MCP or an external link (REST API schemas, SDK changelogs, vendor configs, model names)
-
-See [`skills/agora/references/mcp-tools.md`](skills/agora/references/mcp-tools.md) for the full decision table.
+- **No** → route to Level 2 fetch or an external link (REST API schemas, SDK changelogs, vendor configs, model names)
+
+| Content type | Where it lives |
+|---|---|
+| RTC initialization, track management, event registration | Inline — `references/rtc/` |
+| RTM messaging and presence patterns | Inline — `references/rtm/` |
+| Token generation (RTC, RTM, AccessToken2) | Inline — `references/server/` |
+| ConvoAI gotchas, field-type rules, lifecycle | Inline — `references/conversational-ai/README.md` |
+| ConvoAI quick-start code (Python, Go, Java) | Level 2 fetch |
+| ConvoAI full request/response schemas | Level 2 fetch |
+| TTS / ASR / LLM vendor configs and model names | Level 2 fetch |
+| Cloud Recording REST API field details | Level 2 fetch |
+| Error code listings | Level 2 fetch |
+| Release notes and new parameters | Level 2 fetch |

 ## Naming Rule

From e9d9e92b618f6194d47ddb7774a848bf489d857a Mon Sep 17 00:00:00 2001
From: digitallysavvy
Date: Wed, 11 Mar 2026 21:31:15 -0400
Subject: [PATCH 3/4] update links in references

---
 skills/agora/references/cloud-recording/README.md   |  8 +-------
 skills/agora/references/conversational-ai/README.md |  2 +-
 skills/agora/references/rtc/README.md               | 10 ++--------
 skills/agora/references/rtm/README.md               |  8 ++------
 skills/agora/references/server/README.md            |  5 +----
 5 files changed, 7 insertions(+), 26 deletions(-)

diff --git a/skills/agora/references/cloud-recording/README.md b/skills/agora/references/cloud-recording/README.md
index 6dfb755..802ab55 100644
--- a/skills/agora/references/cloud-recording/README.md
+++ b/skills/agora/references/cloud-recording/README.md
@@ -11,13 +11,7 @@ Server-side recording of RTC channel audio/video. REST API only — no client SD
 | Auth | HTTP Basic Auth (`AGORA_CUSTOMER_KEY:AGORA_CUSTOMER_SECRET`) |
 | Prerequisite | Cloud Recording enabled in Agora Console |
 | Depends on | Active RTC channel with participants |
-
-## Documentation
-
-The bundled reference below covers the recording lifecycle, modes, and error handling.
-For the full REST API field reference and request/response schemas, use Level 2 fetch
-(see [doc-fetching.md](../doc-fetching.md)) or fetch directly:
-
+| Full REST API reference | |

 ## Recording Lifecycle

diff --git a/skills/agora/references/conversational-ai/README.md b/skills/agora/references/conversational-ai/README.md
index bb103b3..7b35286 100644
--- a/skills/agora/references/conversational-ai/README.md
+++ b/skills/agora/references/conversational-ai/README.md
@@ -153,7 +153,7 @@ Things the official docs don't emphasize that cause frequent mistakes:
 - **`agent_rtc_uid` is a string, not an int** — pass `"0"` (string) for auto-assignment, not `0`. Passing an integer will cause a type error at the API boundary.
 - **`remote_rtc_uids` is an array of strings** — use `["*"]` to subscribe to all users, not `"*"` or `["0"]`. The wildcard must be in array form.
 - **Agent name must be unique per project** — collisions return HTTP 409. Use a short UUID suffix: `agent_{uuid[:8]}`. On 409, generate a new name and retry; do not retry with the same name.
-- **Token auth is not in the official docs yet — use it anyway.** The ConvoAI REST API accepts `Authorization: agora token=` using a combined RTC + RTM token from `RtcTokenBuilder.buildTokenWithRtm`. This is **safer than Basic Auth**: tokens are scoped to a single App ID + channel, while Customer ID/Secret grants access to every project on the account. Default to token auth unless the user explicitly requests Basic Auth. See [Authentication → Option A](#authentication) for the implementation.
+- **Token auth is not in the official docs — use it for production.** The ConvoAI REST API accepts `Authorization: agora token=` using a combined RTC + RTM token from `RtcTokenBuilder.buildTokenWithRtm`. This is **safer than Basic Auth**: tokens are scoped to a single App ID + channel, while Customer ID/Secret grants access to every project on the account. Default to token auth unless the user explicitly requires officially documented auth methods, in which case use Basic Auth. See [Authentication → Option A](#authentication) for the implementation. - **`/update` overwrites `params` entirely** — sending `{ "llm": { "params": { "max_tokens": 2048 } } }` erases `model` and everything else in `params`. Always send the full object. - **`/speak` priority enum** — `"INTERRUPT"` (immediate, default), `"APPEND"` (queued after current speech), `"IGNORE"` (skip if agent is busy). `interruptable: false` prevents users from cutting in. diff --git a/skills/agora/references/rtc/README.md b/skills/agora/references/rtc/README.md index 1121ee7..5e381d6 100644 --- a/skills/agora/references/rtc/README.md +++ b/skills/agora/references/rtc/README.md @@ -92,12 +92,6 @@ Read the file matching the user's platform: - **[ios.md](ios.md)** — `AgoraRtcEngineKit` (Swift): engine setup, delegation, permissions - **[android.md](android.md)** — `RtcEngine` (Kotlin/Java): engine setup, callbacks, permissions -For test setup and mocking patterns, see [references/testing-guidance/SKILL.md](../testing-guidance/SKILL.md). - -## Live Docs +For additional platforms and advanced features: — voice-only: -For content not covered by the bundled platform files (advanced features, new SDK -capabilities, additional platforms), fetch the entry point directly: - -- **Video calling:** -- **Voice calling:** +For test setup and mocking patterns, see [references/testing-guidance/SKILL.md](../testing-guidance/SKILL.md). 
diff --git a/skills/agora/references/rtm/README.md b/skills/agora/references/rtm/README.md index 9bcaad2..e372b0d 100644 --- a/skills/agora/references/rtm/README.md +++ b/skills/agora/references/rtm/README.md @@ -17,13 +17,9 @@ Signaling, text messaging, presence, and metadata — used alongside or independ - **Presence**: Track who is online, user status metadata. - **Storage**: Channel and user metadata (key-value store with versioning). - **Lock**: Distributed locking for shared resources. -- RTM uses **string UIDs** (not numeric like RTC). +- RTM uses **string UIDs** (not numeric like RTC). When using RTC and RTM together, use `String(rtcUid)` as the RTM user ID to maintain a consistent mapping. ## Platform Reference Files - **[web.md](web.md)** — `agora-rtm` v2 (JS/TS): RTM client, messaging, presence, v1 legacy API -- **iOS / Android** — fetch the entry point below and follow platform-specific links - -## Live Docs - - +- **iOS / Android** — diff --git a/skills/agora/references/server/README.md b/skills/agora/references/server/README.md index 769a6ad..b0be62c 100644 --- a/skills/agora/references/server/README.md +++ b/skills/agora/references/server/README.md @@ -67,10 +67,7 @@ Token auth is handled automatically when `appCertificate` is provided. For vendo - [agent-server-sdk-go](https://github.com/AgoraIO-Conversational-AI/agent-server-sdk-go) - [agent-server-sdk-python](https://github.com/AgoraIO-Conversational-AI/agent-server-sdk-python) -## Live Docs - - - ## Reference Files - **[tokens.md](tokens.md)** — Token generation for Node.js, Python, and Go. Express server example, security best practices. 
+- **Full token auth guide** — From 23e4fbdfefdc7b108389db553e4ad3db1b26501e Mon Sep 17 00:00:00 2001 From: digitallysavvy Date: Fri, 13 Mar 2026 11:10:52 -0400 Subject: [PATCH 4/4] fix metadata, plugin config, and agentic skill quality across agora-skills Corrected stale metadata (SECURITY.md repo URL, marketplace.json version mismatch, plugin.json repository URL), added .claude-plugin/mcp-config.json with Agora MCP server and wired it via mcpServers in plugin.json, fixed Claude Code install instructions to use the correct /plugin marketplace add and /plugin install slash commands, updated README file tree and products covered to reflect current repo state, added CHANGELOG.md, and removed dead blocklist.txt. Expanded CONTRIBUTING.md with eval execution, version bumping, plugin registration, and URL verification guidance. Improved agentic skill quality by expanding the RTM README with 8 verified gotchas, adding a new cross-platform-coordination.md for RTC, disambiguating server-side product routing in the intake skill, strengthening SKILL.md triggers and MCP fallback instructions, completing doc-fetching.md fallback URLs for all languages, and adding 5 new eval cases covering previously untested gotchas. Updated references to use level 2 fetch, and improved visibility of gotchas. 
--- .claude-plugin/marketplace.json | 2 +- .claude-plugin/mcp-config.json | 6 ++ .claude-plugin/plugin.json | 3 +- CHANGELOG.md | 37 +++++++ CONTRIBUTING.md | 74 ++++++++++++- README.md | 100 +++++++++++------- SECURITY.md | 2 +- scripts/blocklist.txt | 14 --- skills/agora/SKILL.md | 10 +- skills/agora/intake/SKILL.md | 16 ++- .../references/cloud-recording/README.md | 4 + .../references/conversational-ai/README.md | 41 +++---- skills/agora/references/doc-fetching.md | 16 ++- skills/agora/references/rtc/README.md | 5 + .../rtc/cross-platform-coordination.md | 66 ++++++++++++ skills/agora/references/rtm/README.md | 65 +++++++++--- .../agora/references/server-gateway/README.md | 14 ++- tests/eval-cases.md | 30 ++++++ 18 files changed, 405 insertions(+), 100 deletions(-) create mode 100644 .claude-plugin/mcp-config.json create mode 100644 CHANGELOG.md delete mode 100644 scripts/blocklist.txt create mode 100644 skills/agora/references/rtc/cross-platform-coordination.md diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json index 6ea950d..b6f8c8c 100644 --- a/.claude-plugin/marketplace.json +++ b/.claude-plugin/marketplace.json @@ -11,7 +11,7 @@ "name": "agora", "source": "./", "description": "Real-time communication with Agora SDKs — RTC, RTM, Conversational AI, and token generation", - "version": "1.0.0" + "version": "1.1.0" } ] } diff --git a/.claude-plugin/mcp-config.json b/.claude-plugin/mcp-config.json new file mode 100644 index 0000000..ba8421b --- /dev/null +++ b/.claude-plugin/mcp-config.json @@ -0,0 +1,6 @@ +{ + "agora-docs": { + "type": "http", + "url": "https://mcp.agora.io" + } +} diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json index df874f8..ba113b9 100644 --- a/.claude-plugin/plugin.json +++ b/.claude-plugin/plugin.json @@ -5,8 +5,9 @@ "author": { "name": "Agora" }, - "repository": "https://github.com/AgoraIO-Conversational-AI/skills", + "repository": "https://github.com/AgoraIO/skills", "license": "MIT", + 
"mcpServers": "./mcp-config.json", "keywords": [ "agora", "rtc", diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 0000000..b07c2b3 --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,37 @@ +# Changelog + +All notable changes to this project will be documented in this file. + +The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/). + +## [1.1.0] + +### Added + +- Cloud Recording references (`references/cloud-recording/`) — REST API acquire/start/query/stop lifecycle +- Server Gateway references (`references/server-gateway/`) — Linux C++ SDK setup and media pipeline +- Testing Guidance skill (`references/testing-guidance/SKILL.md`) — ConvoAI and RTC test patterns +- Next.js RTC pattern (`references/rtc/nextjs.md`) — SSR-safe dynamic import guidance +- ConvoAI agent client toolkit React references (`references/conversational-ai/agent-client-toolkit-react.md`) — provider, hooks, transcript, state +- Intake router (`skills/agora/intake/SKILL.md`) — multi-product needs analysis for ambiguous requests +- Agora token-based auth for ConvoAI REST API — inline gotcha + implementation in `conversational-ai/README.md` +- OpenAI Realtime MLLM configuration in `agent-samples.md` +- Agora MCP server config bundled in `.claude-plugin/mcp-config.json` + +### Changed + +- `plugin.json` repository URL corrected to `AgoraIO/skills` +- `marketplace.json` version aligned to `1.1.0` +- `SECURITY.md` vulnerability report URL corrected to `AgoraIO/skills` + +## [1.0.0] + +### Added + +- RTC references for Web, React, iOS (Swift), Android (Kotlin/Java) +- RTM Web references — messaging, presence, stream channels +- Conversational AI references — REST API, agent config, 5 recipe files +- Server-side token generation references +- 4-layer progressive disclosure architecture (`SKILL.md` → product README → topic file) +- Eval cases in `tests/eval-cases.md` (25 cases across R, C, F, I series) +- Validation script (`scripts/validate-skills.sh`) diff --git 
a/CONTRIBUTING.md b/CONTRIBUTING.md index 6546755..8c521b8 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -10,6 +10,9 @@ Changes should improve routing accuracy, code generation quality, and maintainab ## Adding a New Product Skill +Examples of existing products added this way: Cloud Recording (`references/cloud-recording/`), +Server Gateway (`references/server-gateway/`), Testing Guidance (`references/testing-guidance/`). + 1. Create `skills/agora/references/{product}/README.md` (Layer 3 — overview, critical rules, topic links, 20–100 lines) 2. Add an entry to the **Products** section of `skills/agora/SKILL.md` @@ -23,6 +26,7 @@ Changes should improve routing accuracy, code generation quality, and maintainab - Add relevant rows to Common Product Combinations - Add a routing entry to the Step 4 table - Add a Decision Shortcuts row if the product has a clear keyword trigger +7. Bump the version in all three version files (see [Version Bumping](#version-bumping)) ## Adding a New Platform @@ -50,7 +54,7 @@ without any updates?** ConvoAI request/response schemas): put it **behind an MCP call** or an **external link**. Never hardcode fast-moving content. -Ben's existing link-first vs inline decision table in `README.md` already encodes this +The link-first vs inline decision table in `README.md` already encodes this principle. Follow it. When in doubt, add a new row to that table and document your reasoning in the PR description. @@ -120,6 +124,74 @@ Rules: bash scripts/validate-skills.sh +## Running Evals + +Eval cases live in `tests/eval-cases.md`. To run them: + +1. Load the skill in your AI coding assistant (see [README — Installation](README.md#installation)) +2. For each case, send the "User Input" to the assistant with the skill active +3. Compare the response against "Expected Behavior" and "Pass Criteria" +4. Record `PASS` or `FAIL` in the Result field +5. 
Add the run to the **Evaluation Log** table at the bottom of `tests/eval-cases.md` + (date, skill version, pass/fail counts, failed case IDs, fix actions taken) + +Run the full suite after every non-trivial skill change. Failed cases drive targeted +skill edits — don't ship a fix without verifying the case now passes. + +## Version Bumping + +Versions must stay in sync across three files. Bump all three together: + +| File | Field | +|------|-------| +| `skills/agora/SKILL.md` | `metadata.version` in frontmatter | +| `.claude-plugin/plugin.json` | `"version"` | +| `.claude-plugin/marketplace.json` | `plugins[0].version` | + +Version rules: +- **Patch** (`x.y.Z`): gotcha fixes, broken link repairs, content corrections +- **Minor** (`x.Y.0`): new product or platform added, new eval cases, new topic files +- **Major** (`X.0.0`): breaking restructure of skill entry points or routing logic + +Document the change in `CHANGELOG.md` under a new `[x.y.z]` heading. + +## Plugin & Marketplace Registration + +This skill is published to: + +- **[agentskills.io](https://agentskills.io)** — open skill registry (`.claude-plugin/marketplace.json`) +- **Claude Code plugin marketplace** — hosted at `AgoraIO/skills` on GitHub (`.claude-plugin/plugin.json` + `.claude-plugin/marketplace.json`) + +Users install via two slash commands inside Claude Code: + +``` +/plugin marketplace add AgoraIO/skills +/plugin install agora@agora-skills +``` + +(`agora-skills` is the marketplace `name` in `marketplace.json`; `agora` is the plugin `name`.) + +To update a registration after a version bump: +1. Submit a PR with the bumped version in both JSON files +2. Once merged, users get the update automatically when Claude Code refreshes (`/plugin marketplace update`) +3. 
For agentskills.io manual updates, follow the [agentskills.io submission guide](https://agentskills.io) + +The Agora MCP server config is bundled in `.claude-plugin/mcp-config.json` and +referenced from `plugin.json` via `"mcpServers": "./mcp-config.json"`. + +## Verifying URLs + +Before opening a PR, check that all `https://` links in skill files are reachable: + +```bash +grep -roh 'https://[^ )]*' skills/ | sort -u | while read url; do + code=$(curl -s -o /dev/null -w "%{http_code}" -L --max-time 10 "$url") + echo "$code $url" +done +``` + +Any non-200 response (except intentional 301 redirects) should be investigated and fixed. + ## Code of Conduct This project follows the [Contributor Covenant Code of Conduct](CODE_OF_CONDUCT.md). By participating, you agree to uphold this code. diff --git a/README.md b/README.md index 5391292..9b9b786 100644 --- a/README.md +++ b/README.md @@ -4,41 +4,57 @@ Structured reference knowledge for [Agora](https://www.agora.io) (agora.io) real ## Installation -### Option A: Skills CLI (recommended) +### Skills CLI (recommended) ```bash -npx skills add github:AgoraIO-Conversational-AI/skills +npx skills add github:AgoraIO/skills ``` Skills activate automatically when your agent detects relevant tasks (e.g., "build a voice agent", "integrate Agora RTC", "generate a token"). -### Option B: Git clone +### Claude Code Plugin (recommended if using Claude) + +Install Agora skills and the Agora Docs MCP server as a Claude Code plugin. Run these two slash commands inside Claude Code: + +``` +/plugin marketplace add AgoraIO/skills +/plugin install agora@agora-skills +``` + +The Agora MCP server (`mcp.agora.io`) is bundled automatically — no separate MCP configuration needed. 
+ +### Git clone Clone the repo once, then point your tool at `skills/agora/`: ```bash -git clone https://github.com/AgoraIO-Conversational-AI/skills.git ~/agora-skills +git clone https://github.com/AgoraIO/skills.git ~/agora-skills ``` +### Configure with your Agent or IDE (optional) + **Claude Code — symlink (user-level):** +When installing the skill using the Skills CLI, you can symlink the skill to your home directory. This will make the skill available to all your agents. + ```bash -mkdir -p ~/.claude/skills ln -s ~/agora-skills/skills/agora ~/.claude/skills/agora ``` **Claude Code — copy (project-level, shared with team):** +When installing the skill using the Claude Code Plugin, you can copy the skill to your project directory. This will make the skill available to all your agents in the project. + ```bash mkdir -p .claude/skills cp -r ~/agora-skills/skills/agora .claude/skills/agora ``` -**Cursor:** Copy or symlink into `.cursor/rules/`. +**Cursor:** Copy or symlink into `.cursor/rules/`. See [Cursor skills docs](https://cursor.com/docs/skills#skill-directories). -**Windsurf:** Add `skills/agora/` to your Cascade context. +**Windsurf:** Add `skills/agora/` to your Cascade context. See [Windsurf skills docs](https://docs.windsurf.com/windsurf/cascade/skills). -**GitHub Copilot:** Reference via `@workspace` or add to `.github/copilot-instructions.md`. +**GitHub Copilot:** Reference via `@workspace` or add to `.github/copilot-instructions.md`. See [Copilot CLI skills](https://docs.github.com/en/copilot/how-tos/copilot-cli/customize-copilot/create-skills) and [Copilot Agents skills](https://docs.github.com/en/copilot/how-tos/use-copilot-agents/coding-agent/create-skills). **Any other tool:** The skill files are plain markdown. Point your tool at `skills/agora/` or load individual files directly. Use `SKILL.md` as the entry point — it links to everything else. 
@@ -50,11 +66,13 @@ This repo contains markdown skill files that give AI coding assistants deep know **Products covered:** -- **RTC (Video/Voice SDK)** — Web, React, iOS (Swift), Android (Kotlin/Java) +- **RTC (Video/Voice SDK)** — Web, React, Next.js, iOS (Swift), Android (Kotlin/Java) - **RTM (Signaling)** — Web (JS/TS) messaging, presence, metadata, stream channels -- **Conversational AI** — REST API, agent config, 5 recipe repos (agent-samples, agent-toolkit, agent-ui-kit, server-custom-llm, server-mcp) +- **Conversational AI** — REST API, agent config, Gemini Live + OpenAI Realtime MLLM, 6 recipe repos (agent-samples, agent-toolkit, agent-client-toolkit-react, agent-ui-kit, server-custom-llm, server-mcp) +- **Cloud Recording** — REST API acquire/start/query/stop lifecycle +- **Server Gateway** — Linux SDK (C++) for server-side RTC - **Server-Side** — Token generation for Node.js, Python, Go - +- **Testing Guidance** — ConvoAI and RTC testing patterns ## Design — 4-Layer Progressive Disclosure @@ -79,37 +97,47 @@ Not all content belongs inline. The skill uses two strategies depending on how f | **RTC / RTM** | Inline code examples | Stable APIs, official docs lack good examples | | **Server / Tokens** | TOC + links to official docs | Well-documented at docs.agora.io | - -ConvoAI files are aligned 1:1 with repos in [AgoraIO-Conversational-AI](https://github.com/AgoraIO-Conversational-AI). Each file maps to one repo and links to its README and AGENT.md as sources of truth. Gotchas and quirks that LLMs consistently get wrong stay inline in the ConvoAI README. +ConvoAI files are aligned 1:1 with repos in [AgoraIO-Conversational-AI](https://github.com/orgs/AgoraIO-Conversational-AI/repositories). Each file maps to one repo and links to its README and AGENT.md as sources of truth. Gotchas and quirks that LLMs consistently get wrong stay inline in the ConvoAI README. 
## File Structure ``` skills/ -└── agora/ Skill root - ├── SKILL.md (72 lines) Entry point, product index +└── agora/ Skill root + ├── SKILL.md Entry point, product index + ├── intake/ + │ └── SKILL.md Multi-product needs analysis router └── references/ - ├── mcp-tools.md (93 lines) MCP tool reference and graceful degradation - ├── rtc/ RTC (Video/Voice SDK) - │ ├── README.md (85 lines) Critical rules, encoder profiles, cross-platform notes - │ ├── web.md (498 lines) agora-rtc-sdk-ng: client, tracks, events, screen share - │ ├── react.md (295 lines) agora-rtc-react: hooks, custom patterns - │ ├── nextjs.md Next.js / SSR dynamic import patterns - │ ├── ios.md (301 lines) AgoraRtcEngineKit (Swift): setup, delegation - │ └── android.md (340 lines) RtcEngine (Kotlin/Java): setup, callbacks - ├── rtm/ RTM (Signaling / Messaging) - │ ├── README.md (25 lines) Key concepts, platform links - │ └── web.md (375 lines) agora-rtm v2: messaging, presence, stream channels - ├── conversational-ai/ Conversational AI (Voice AI Agents) - │ ├── README.md (100 lines) Architecture, endpoints, auth, lifecycle, REST API + config links, gotchas - │ ├── agent-samples.md (80 lines) Backend, React clients, profiles, MLLM, deployment - │ ├── agent-toolkit.md (57 lines) @agora/conversational-ai SDK: API, helpers, hooks - │ ├── agent-ui-kit.md (52 lines) @agora/agent-ui-kit React components - │ ├── server-custom-llm.md (36 lines) Custom LLM proxy: RAG, tools, memory - │ └── server-mcp.md (38 lines) MCP memory server: persistent per-user memory - ├── server/ Server-Side (Tokens) - │ ├── README.md (20 lines) Token types, when tokens are needed - │ └── tokens.md (34 lines) Token generation TOC + links to official docs + ├── doc-fetching.md Two-tier lookup procedure (agent-facing) + ├── mcp-tools.md MCP tool reference and graceful degradation + ├── rtc/ RTC (Video/Voice SDK) + │ ├── README.md Critical rules, encoder profiles, cross-platform notes + │ ├── web.md agora-rtc-sdk-ng: client, tracks, 
events, screen share + │ ├── react.md agora-rtc-react: hooks, custom patterns + │ ├── nextjs.md Next.js / SSR dynamic import patterns + │ ├── ios.md AgoraRtcEngineKit (Swift): setup, delegation + │ └── android.md RtcEngine (Kotlin/Java): setup, callbacks + ├── rtm/ RTM (Signaling / Messaging) + │ ├── README.md Key concepts, platform links + │ └── web.md agora-rtm v2: messaging, presence, stream channels + ├── conversational-ai/ Conversational AI (Voice AI Agents) + │ ├── README.md Architecture, endpoints, auth, lifecycle, gotchas + │ ├── agent-samples.md Backend, React clients, profiles, MLLM, deployment + │ ├── agent-toolkit.md @agora/conversational-ai SDK: API, helpers, hooks + │ ├── agent-client-toolkit-react.md React hooks: provider, transcript, state + │ ├── agent-ui-kit.md @agora/agent-ui-kit React components + │ ├── server-custom-llm.md Custom LLM proxy: RAG, tools, memory + │ └── server-mcp.md MCP memory server: persistent per-user memory + ├── cloud-recording/ Cloud Recording (REST API) + │ └── README.md acquire/start/query/stop lifecycle, storage config + ├── server-gateway/ Server Gateway (Linux SDK) + │ ├── README.md Overview, use cases, critical notes + │ └── linux-cpp.md C++ SDK: setup, callbacks, media pipeline + ├── server/ Server-Side (Tokens) + │ ├── README.md Token types, when tokens are needed + │ └── tokens.md Token generation TOC + links to official docs + └── testing-guidance/ Testing Patterns + └── SKILL.md ConvoAI and RTC test setup, mocking patterns ``` ## Maintaining and Extending diff --git a/SECURITY.md b/SECURITY.md index 9f5fdbf..2aaa1e4 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -6,7 +6,7 @@ If you discover a security vulnerability in this project, please report it respo **Do not open a public issue.** -Instead, use [GitHub's private vulnerability reporting](https://github.com/BenWeekes/skills/security/advisories/new) to submit your report. This ensures the issue can be assessed and addressed before public disclosure. 
+Instead, use [GitHub's private vulnerability reporting](https://github.com/AgoraIO/skills/security/advisories/new) to submit your report. This ensures the issue can be assessed and addressed before public disclosure. For security issues in the broader Agora platform, please follow the reporting process at [https://www.agora.io/en/security](https://www.agora.io/en/security). diff --git a/scripts/blocklist.txt b/scripts/blocklist.txt deleted file mode 100644 index 3d7cc7f..0000000 --- a/scripts/blocklist.txt +++ /dev/null @@ -1,14 +0,0 @@ -# Forbidden patterns for scripts/validate-skills.sh (PRD-03) -# -# Format: each non-comment line is either: -# REGEX: — treated as a grep -E extended regex -# LITERAL: — treated as a grep -F fixed string -# -# Any match in agora/ or tests/ is a CI failure. -# ------------------------------------------------------- - -# Absolute local paths — never in public docs -REGEX:/Users/[a-zA-Z0-9._-]+/ - -# Internal cross-project reference — separate project, not a dependency -LITERAL:ben-agent-toolkit diff --git a/skills/agora/SKILL.md b/skills/agora/SKILL.md index a3a944d..ea5e91f 100644 --- a/skills/agora/SKILL.md +++ b/skills/agora/SKILL.md @@ -1,6 +1,6 @@ --- name: agora -description: Write code using Agora SDKs (agora.io) for real-time communication. Covers RTC (video/voice calling, live streaming), RTM (signaling, messaging, presence), Conversational AI (voice AI agents), Cloud Recording, and server-side token generation. Use when the user wants to build real-time audio/video applications, integrate Agora SDKs (Web JS/TS, React, iOS Swift, Android Kotlin/Java, Go, Python), manage channels, tracks, tokens, use RTM for messaging/signaling, or build Conversational AI with the agent-toolkit. 
Triggers on mentions of Agora, agora.io, RTC, RTM, video calling, voice calling, real-time communication, agora-rtc-sdk-ng, agora-rtc-react, agora-rtm, conversational AI with Agora, Agora token generation, Cloud Recording, agora-agent-client-toolkit, agora-agent-client-toolkit-react, agora-agent-server-sdk, AgoraVoiceAI, AgoraClient, useConversationalAI, useTranscript, useAgentState, agent transcript, agent state hook. +description: Write code using Agora SDKs (agora.io) for real-time communication. Covers RTC (video/voice calling, live streaming, screen sharing), RTM (signaling, messaging, presence), Conversational AI (voice AI agents), Cloud Recording, Server Gateway, and server-side token generation. Use when the user wants to build real-time audio/video applications, integrate Agora SDKs (Web JS/TS, React, iOS Swift, Android Kotlin/Java, Go, Python), manage channels, tracks, tokens, use RTM for messaging/signaling, record RTC sessions, or build Conversational AI with the agent-toolkit. Triggers on mentions of Agora, agora.io, RTC, RTM, video calling, voice calling, real-time communication, screen share, screen sharing, record session, record calls, Cloud Recording, Server Gateway, Linux media SDK, agora-rtc-sdk-ng, agora-rtc-react, agora-rtm, conversational AI with Agora, Agora token generation, Agora authentication, agora-agent-client-toolkit, agora-agent-client-toolkit-react, agora-agent-server-sdk, AgoraVoiceAI, AgoraClient, useConversationalAI, useTranscript, useAgentState, agent transcript, agent state hook. metadata: author: agora version: '1.1.0' @@ -40,7 +40,7 @@ Text messaging, signaling, presence, and metadata. Independent from RTC — chan REST API-driven voice AI agents. Create agents that join RTC channels and converse with users via speech. Front-end clients connect via RTC+RTM. 
-**[references/conversational-ai/README.md](references/conversational-ai/README.md)** — REST API, agent config, 5 recipe repos (agent-samples, agent-toolkit, agent-ui-kit, server-custom-llm, server-mcp) +**[references/conversational-ai/README.md](references/conversational-ai/README.md)** — REST API, agent config, 6 recipe repos (agent-samples, agent-toolkit, agent-client-toolkit-react, agent-ui-kit, server-custom-llm, server-mcp) ### Cloud Recording @@ -84,7 +84,11 @@ Intake handles product identification, combination recommendations, and routing. Check bundled references first (Level 1). If they don't cover the detail needed, fetch `https://docs.agora.io/en/llms.txt`, find the relevant URL, and fetch it (Level 2). -See [references/doc-fetching.md](references/doc-fetching.md) for the full procedure and freeze-forever decision table. +See [references/doc-fetching.md](references/doc-fetching.md) for the full procedure, fallback URLs, and freeze-forever decision table. + +**Always fetch Level 2 before answering questions about**: TTS/ASR/LLM vendor configs, model names, full request/response schemas, error code listings, or release notes. These change frequently — do not answer from training data or memory. + +**If MCP is unavailable or Level 2 fetch fails**: use the fallback URLs in `doc-fetching.md` to reach the official markdown docs directly. Never fabricate API parameters — always tell the user to verify against official docs if live fetch is unavailable. If a user explicitly asks about the Agora MCP server, see [references/mcp-tools.md](references/mcp-tools.md). diff --git a/skills/agora/intake/SKILL.md b/skills/agora/intake/SKILL.md index a9a321c..0ee6329 100644 --- a/skills/agora/intake/SKILL.md +++ b/skills/agora/intake/SKILL.md @@ -94,6 +94,16 @@ Based on the user's description, determine: Use the Product Relationships and Common Combinations tables to make this determination. 
+**Server-side disambiguation** — "backend", "Python/Go/Node.js server", or "server-side" alone is ambiguous. Clarify which server role is needed before routing: + +| User says | They need | +|-----------|-----------| +| "authenticate users", "issue tokens", "token server" | Server/Tokens → `references/server/` | +| "start an AI agent", "call the ConvoAI API", "Python ConvoAI backend" | ConvoAI → `references/conversational-ai/` | +| "my server sends audio/video", "server joins RTC channel", "Linux media SDK" | Server Gateway → `references/server-gateway/` | + +If unclear, ask: *"Does your server need to (a) generate auth tokens, (b) call the ConvoAI REST API to start agents, or (c) send/receive audio-video media directly?"* + **Example analysis:** > User: "I want to build an AI customer service bot where users call in and an AI answers" @@ -145,8 +155,10 @@ For common patterns, skip the full intake flow: | User says | Shortcut | |-----------|----------| | "video call" / "live stream" / "RTC" | → `references/rtc/README.md` directly | +| "screen share" / "screen sharing" | → `references/rtc/README.md` → `cross-platform-coordination.md` | | "chat" / "messaging" / "signaling" | → `references/rtm/README.md` directly | | "voice bot" / "AI assistant" / "ConvoAI" | → `references/conversational-ai/README.md` directly | -| "recording" / "record sessions" | → `references/cloud-recording/README.md` directly | -| "generate token" / "token server" | → `references/server/README.md` directly | +| "recording" / "record sessions" / "record calls" | → `references/cloud-recording/README.md` directly | +| "generate token" / "token server" / "App Certificate" | → `references/server/README.md` directly | +| "Server Gateway" / "Linux SDK" / "server sends audio" | → `references/server-gateway/README.md` directly | diff --git a/skills/agora/references/cloud-recording/README.md b/skills/agora/references/cloud-recording/README.md index 802ab55..53997de 100644 --- 
a/skills/agora/references/cloud-recording/README.md +++ b/skills/agora/references/cloud-recording/README.md @@ -80,3 +80,7 @@ Authorization: Basic base64("{AGORA_CUSTOMER_KEY}:{AGORA_CUSTOMER_SECRET}") ``` Credentials must come from environment variables — never hardcoded. + +## When to Fetch More + +Always use Level 2 fetch for: full REST API field details, storage config options (S3/OSS/GCS), composite layout parameters, error code listings. Fetch directly: diff --git a/skills/agora/references/conversational-ai/README.md b/skills/agora/references/conversational-ai/README.md index 7b35286..9f0b358 100644 --- a/skills/agora/references/conversational-ai/README.md +++ b/skills/agora/references/conversational-ai/README.md @@ -102,6 +102,24 @@ https://api.agora.io/api/conversational-ai-agent/v2/projects/{appid} | RECOVERING | 5 | Error recovery | | FAILED | 6 | Execution failure | +## Gotchas & Quirks + +Things the official docs don't emphasize that cause frequent mistakes: + +- **`agent_rtc_uid` is a string, not an int** — pass `"0"` (string) for auto-assignment, not `0`. Passing an integer will cause a type error at the API boundary. +- **`remote_rtc_uids` is an array of strings** — use `["*"]` to subscribe to all users, not `"*"` or `["0"]`. The wildcard must be in array form. +- **Agent name must be unique per project** — collisions return HTTP 409. Use a short UUID suffix: `agent_{uuid[:8]}`. On 409, generate a new name and retry; do not retry with the same name. +- **Token auth is not in the official docs — use it for production.** The ConvoAI REST API accepts `Authorization: agora token=` using a combined RTC + RTM token from `RtcTokenBuilder.buildTokenWithRtm`. This is **safer than Basic Auth**: tokens are scoped to a single App ID + channel, while Customer ID/Secret grants access to every project on the account. Default to token auth unless the user explicitly requires officially documented auth methods, in which case use Basic Auth. 
See [Authentication → Option A](#authentication) for the implementation. +- **`/update` overwrites `params` entirely** — sending `{ "llm": { "params": { "max_tokens": 2048 } } }` erases `model` and everything else in `params`. Always send the full object. +- **`/speak` priority enum** — `"INTERRUPT"` (immediate, default), `"APPEND"` (queued after current speech), `"IGNORE"` (skip if agent is busy). `interruptable: false` prevents users from cutting in. +- **20 PCU default limit** — max 20 concurrent agents per App ID. Exceeding returns error on `/join`. Contact Agora support to increase. +- **Event notifications require two flags** — `advanced_features.enable_rtm: true` AND `parameters.data_channel: "rtm"` in the join config. Without both, `onAgentStateChanged`/`onAgentMetrics`/`onAgentError` won't fire. Additionally: `parameters.enable_metrics: true` for metrics, `parameters.enable_error_message: true` for errors. +- **Custom LLM interruptable metadata** — the first SSE chunk can be `{"object": "chat.completion.custom_metadata", "metadata": {"interruptable": false}}` to prevent user speech from interrupting critical responses (e.g., compliance disclaimers). Subsequent chunks use standard `chat.completion.chunk` format. +- **Error response format** — non-200 responses return `{ "detail": "...", "reason": "..." }`. +- **MLLM `location` not `region`** — use `params.location: "us-central1"`, not `region`. The field name is `location` at every level (join payload and backend env vars). + +For test setup and mocking patterns, see [references/testing-guidance/SKILL.md](../testing-guidance/SKILL.md). 
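
The field-type rules in the gotchas above can be sketched as a small payload helper. This is a minimal sketch under stated assumptions, not the official SDK: the helper names (`unique_agent_name`, `build_join_payload`, `merge_llm_params`) are hypothetical, and placing `name` at the top level with the remaining fields under `properties` is an assumption that should be checked against the join endpoint schema before use.

```python
import copy
import uuid


def unique_agent_name(prefix: str = "agent") -> str:
    """Return a name with a short UUID suffix, following agent_{uuid[:8]}."""
    return f"{prefix}_{uuid.uuid4().hex[:8]}"


def build_join_payload(channel: str, token: str, llm: dict, tts: dict) -> dict:
    """Assemble a /join body that respects the field-type gotchas.

    Assumption: `name` is top-level and everything else sits under
    `properties`; verify against the join endpoint schema.
    """
    return {
        "name": unique_agent_name(),      # fresh name per call avoids HTTP 409
        "properties": {
            "channel": channel,
            "token": token,
            "agent_rtc_uid": "0",         # string "0" = auto-assign, not int 0
            "remote_rtc_uids": ["*"],     # wildcard must be in array form
            "llm": llm,
            "tts": tts,
        },
    }


def merge_llm_params(current_params: dict, changes: dict) -> dict:
    """Return the full params object with changes applied.

    /update overwrites `params` entirely, so the caller sends the merged
    result rather than only the changed keys. The input is not mutated.
    """
    merged = copy.deepcopy(current_params)
    merged.update(changes)
    return merged
```

On a 409 response, calling `build_join_payload` again yields a new name, which matches the "generate a new name and retry" rule. For `/update`, the merged dictionary would go where the gotcha's example puts `params` (`{"llm": {"params": merged}}`), so `model` and the other existing keys survive the call.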
+ ## REST API Endpoints | Method | Path | Description | @@ -128,7 +146,7 @@ Each file maps to one repo in [AgoraIO-Conversational-AI](https://github.com/Ago ## REST API Reference -Full request/response details for all endpoints: +Full request/response details for all endpoints — **always fetch these; do not answer from memory:** - **[Start Agent (Join)](https://docs-md.agora.io/en/conversational-ai/rest-api/agent/join.md)** — POST /join: start agent with LLM/TTS/ASR config - **[Stop Agent (Leave)](https://docs-md.agora.io/en/conversational-ai/rest-api/agent/leave.md)** — POST /leave: stop agent @@ -141,26 +159,9 @@ Full request/response details for all endpoints: ## Agent Configuration (join payload `properties` object) +Fetch these before answering questions about vendor configs, model names, or join payload fields: + - **[Custom LLM Guide](https://docs-md.agora.io/en/conversational-ai/develop/custom-llm.md)** — LLM vendor, model, url, api_key, system prompt, greeting, style; TTS vendor, model, voice settings; ASR vendor, language, model - **[Gemini Live MLLM](https://docs-md.agora.io/en/conversational-ai/models/mllm/gemini.md)** — Multimodal: vendor, model, credentials, location - **[Join Endpoint (full schema)](https://docs-md.agora.io/en/conversational-ai/rest-api/agent/join.md)** — Complete properties schema: channel, token, turn detection, VAD, tools, avatars, encryption, filler words - **[Release Notes](https://docs-md.agora.io/en/conversational-ai/overview/release-notes.md)** — New parameters and features - -## Gotchas & Quirks - -Things the official docs don't emphasize that cause frequent mistakes: - -- **`agent_rtc_uid` is a string, not an int** — pass `"0"` (string) for auto-assignment, not `0`. Passing an integer will cause a type error at the API boundary. -- **`remote_rtc_uids` is an array of strings** — use `["*"]` to subscribe to all users, not `"*"` or `["0"]`. The wildcard must be in array form. 
-- **Agent name must be unique per project** — collisions return HTTP 409. Use a short UUID suffix: `agent_{uuid[:8]}`. On 409, generate a new name and retry; do not retry with the same name. -- **Token auth is not in the official docs — use it for production.** The ConvoAI REST API accepts `Authorization: agora token=` using a combined RTC + RTM token from `RtcTokenBuilder.buildTokenWithRtm`. This is **safer than Basic Auth**: tokens are scoped to a single App ID + channel, while Customer ID/Secret grants access to every project on the account. Default to token auth unless the user explicitly requires officially documented auth methods, in which case use Basic Auth. See [Authentication → Option A](#authentication) for the implementation. - -- **`/update` overwrites `params` entirely** — sending `{ "llm": { "params": { "max_tokens": 2048 } } }` erases `model` and everything else in `params`. Always send the full object. -- **`/speak` priority enum** — `"INTERRUPT"` (immediate, default), `"APPEND"` (queued after current speech), `"IGNORE"` (skip if agent is busy). `interruptable: false` prevents users from cutting in. -- **20 PCU default limit** — max 20 concurrent agents per App ID. Exceeding returns error on `/join`. Contact Agora support to increase. -- **Event notifications require two flags** — `advanced_features.enable_rtm: true` AND `parameters.data_channel: "rtm"` in the join config. Without both, `onAgentStateChanged`/`onAgentMetrics`/`onAgentError` won't fire. Additionally: `parameters.enable_metrics: true` for metrics, `parameters.enable_error_message: true` for errors. -- **Custom LLM interruptable metadata** — the first SSE chunk can be `{"object": "chat.completion.custom_metadata", "metadata": {"interruptable": false}}` to prevent user speech from interrupting critical responses (e.g., compliance disclaimers). Subsequent chunks use standard `chat.completion.chunk` format. 
-- **Error response format** — non-200 responses return `{ "detail": "...", "reason": "..." }`. -- **MLLM `location` not `region`** — use `params.location: "us-central1"`, not `region`. The field name is `location` at every level (join payload and backend env vars). - -For test setup and mocking patterns, see [references/testing-guidance/SKILL.md](../testing-guidance/SKILL.md). diff --git a/skills/agora/references/doc-fetching.md b/skills/agora/references/doc-fetching.md index 31da701..b55c386 100644 --- a/skills/agora/references/doc-fetching.md +++ b/skills/agora/references/doc-fetching.md @@ -23,13 +23,21 @@ vendor-specific configs, language-specific quick-start code): If `llms.txt` is unreachable or the fetched URL returns no useful content, try these known markdown entry points directly: -| Product | Markdown URL | +| Product / Language | Markdown URL | |---|---| -| RTC | https://docs-md.agora.io/en/video-calling/get-started/get-started-sdk.md | -| RTM | https://docs-md.agora.io/en/signaling/get-started/sdk-quickstart.md | +| RTC (Web/general) | https://docs-md.agora.io/en/video-calling/get-started/get-started-sdk.md | +| RTC (voice-only) | https://docs-md.agora.io/en/voice-calling/get-started/get-started-sdk.md | +| RTM (Web/general) | https://docs-md.agora.io/en/signaling/get-started/sdk-quickstart.md | +| RTM (iOS) | https://docs-md.agora.io/en/signaling/get-started/sdk-quickstart?platform=ios.md | +| RTM (Android) | https://docs-md.agora.io/en/signaling/get-started/sdk-quickstart?platform=android.md | | ConvoAI | https://docs-md.agora.io/en/conversational-ai/get-started/quickstart.md | +| ConvoAI (TypeScript SDK) | https://docs-md.agora.io/en/conversational-ai/develop/integrate-sdk.md | +| ConvoAI (Python SDK) | https://docs-md.agora.io/en/conversational-ai/develop/integrate-sdk?platform=python.md | | Cloud Recording | https://docs-md.agora.io/en/cloud-recording/get-started/getstarted.md | -| Server Gateway | 
https://docs-md.agora.io/en/server-gateway/get-started/integrate-sdk.md | +| Server Gateway (C++) | https://docs-md.agora.io/en/server-gateway/get-started/integrate-sdk.md | +| Server Gateway (Java) | https://docs-md.agora.io/en/server-gateway/get-started/integrate-sdk?platform=java.md | +| Server Gateway (Python) | https://docs-md.agora.io/en/server-gateway/get-started/integrate-sdk?platform=python.md | +| Server Gateway (Go) | https://docs-md.agora.io/en/server-gateway/get-started/integrate-sdk?platform=go.md | | Tokens | https://docs-md.agora.io/en/video-calling/token-authentication/deploy-token-server.md | ## Agora MCP Server (optional) diff --git a/skills/agora/references/rtc/README.md b/skills/agora/references/rtc/README.md index 5e381d6..3202470 100644 --- a/skills/agora/references/rtc/README.md +++ b/skills/agora/references/rtc/README.md @@ -91,7 +91,12 @@ Read the file matching the user's platform: - **[nextjs.md](nextjs.md)** — Next.js / SSR dynamic import patterns (App Router + Pages Router) - **[ios.md](ios.md)** — `AgoraRtcEngineKit` (Swift): engine setup, delegation, permissions - **[android.md](android.md)** — `RtcEngine` (Kotlin/Java): engine setup, callbacks, permissions +- **[cross-platform-coordination.md](cross-platform-coordination.md)** — UID strategy, codec interop, screen sharing across platforms, audio routing, common cross-platform bugs For additional platforms and advanced features: — voice-only: For test setup and mocking patterns, see [references/testing-guidance/SKILL.md](../testing-guidance/SKILL.md). + +## When to Fetch More + +Always use Level 2 fetch for: encoder profile parameter details, error code listings, release notes, Flutter/Windows/Electron/React Native platform quick-starts. See [../doc-fetching.md](../doc-fetching.md). 
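
A Level 2 lookup (fetch `llms.txt`, find the relevant URL, fetch that page) can be sketched roughly as below. This assumes `llms.txt` lists pages as Markdown links of the form `[Title](https://...)`, which should be checked against the live file. `findDocUrls` is a hypothetical helper, and the keyword ranking is only one simple way to pick a candidate.

```javascript
// Sketch only: assumes llms.txt entries are Markdown links, [Title](https://...).
function findDocUrls(llmsTxt, keywords) {
  const linkPattern = /\[([^\]]+)\]\((https?:\/\/[^)\s]+)\)/g;
  const wanted = keywords.map((k) => k.toLowerCase());
  const matches = [];
  for (const [, title, url] of llmsTxt.matchAll(linkPattern)) {
    // Count how many keywords appear in the link title or URL.
    const haystack = `${title} ${url}`.toLowerCase();
    const hits = wanted.filter((k) => haystack.includes(k)).length;
    if (hits > 0) matches.push({ title, url, hits });
  }
  // Most keyword hits first; sort is stable, so ties keep file order.
  return matches.sort((a, b) => b.hits - a.hits);
}
```

The text passed in would come from fetching `https://docs.agora.io/en/llms.txt`. If nothing matches, fall back to the known entry points listed in `doc-fetching.md`.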
diff --git a/skills/agora/references/rtc/cross-platform-coordination.md b/skills/agora/references/rtc/cross-platform-coordination.md new file mode 100644 index 0000000..e4a1205 --- /dev/null +++ b/skills/agora/references/rtc/cross-platform-coordination.md @@ -0,0 +1,66 @@ +# RTC Cross-Platform Coordination + +Patterns for apps where users on different platforms (Web, iOS, Android) join the same Agora RTC channel. + +## UID Strategy + +Agora assigns UIDs per channel. For multi-platform apps: + +- **Auto-assign on all clients**: Pass `null`/`0` to `join()` — each platform auto-receives a unique numeric UID. Clients subscribe to all remote users regardless of platform. +- **Fixed UIDs**: Assign specific UIDs per user role (e.g., `1001` for host, `1002` for co-host) when you need deterministic lookup. Must be unique per channel — duplicates cause undefined behavior. +- **RTC + RTM coordination**: After RTC join, use `String(rtcUid)` as the RTM user ID to correlate users across both systems (see [../rtm/README.md](../rtm/README.md)). + +## Codec Interoperability + +Agora handles codec negotiation automatically for most scenarios. What to know: + +| Codec | Notes | +|-------|-------| +| H.264 | Default on iOS and Android. Web supports it but may require software decode on low-end devices. | +| VP8 | Web default. iOS/Android require transcoding — adds ~50–100ms latency. | +| H.265 (HEVC) | Not universally supported on Web; avoid for cross-platform channels. | + +**Recommendation**: Enable H.264 explicitly on Web clients when iOS/Android users are present. Transcoding introduces latency and is billed separately. + +```javascript +// Web: force H.264 to match mobile clients +AgoraRTC.setParameter('CODEC', 'h264'); +// or via client config: +const client = AgoraRTC.createClient({ mode: 'rtc', codec: 'h264' }); +``` + +## Screen Sharing (Cross-Platform) + +Screen share is a separate track/stream, not a replacement for the camera track. 
+ +- **Web**: `AgoraRTC.createScreenVideoTrack()` — publishes as a second video track. See [web.md](web.md) for dual-stream setup. +- **iOS**: `AgoraRtcEngineKit.startScreenCapture(_:)` with broadcast extension — different lifecycle than camera. +- **Android**: `MediaProjection` API + `RtcEngine.startScreenCapture()`. + +Remote users on any platform subscribe to the screen share UID as a normal remote user — the stream is just another video track from a different UID. + +**Key rule**: Screen share uses a separate channel join with a different UID. Never publish camera and screen share from the same UID. + +## Audio Routing Differences + +| Platform | Default audio output | Override | +|----------|---------------------|---------| +| Web | Speaker (browser-controlled) | Not configurable via SDK | +| iOS | Earpiece for `rtc` mode | `setDefaultAudioRouteToSpeakerphone(true)` for speaker | +| Android | Earpiece by default | `setEnableSpeakerphone(true)` | + +When a user plugs in headphones, iOS/Android switch automatically. Web relies on the browser and OS audio routing — the SDK cannot override this. + +## Testing Multi-Platform Channels Locally + +1. **Web + Simulator/Emulator**: Connect both to the same channel; verify remote tracks appear on both sides. +2. **Different UIDs on same machine**: Open two browser tabs or two simulator instances — each gets its own UID automatically. +3. **Cross-device**: Use the Agora [Web Demo](https://webdemo.agora.io) to join from a browser while testing your native app in the same channel. +4. **Codec check**: In the Agora Console → Real-time Monitoring, inspect active streams to confirm codec negotiation. + +## Common Cross-Platform Bugs + +- **Remote user appears then immediately disappears on iOS** — usually a token expiry or UID collision. Check `connectionStateChanged` delegate for the reason code. +- **No video from Android on Web** — codec mismatch. Android may be sending H.265; Web can't decode it. 
Force H.264 on Android via `VideoEncoderConfiguration`. +- **Audio works, video doesn't on mobile** — camera permission not granted. Check permission before calling `startPreview()`. +- **Web user sees mobile user but not vice versa** — `user-published` event not registered before `client.join()`. Event handlers must be set up before joining — see RTC critical rules in [README.md](README.md). diff --git a/skills/agora/references/rtm/README.md b/skills/agora/references/rtm/README.md index e372b0d..f92f244 100644 --- a/skills/agora/references/rtm/README.md +++ b/skills/agora/references/rtm/README.md @@ -5,21 +5,62 @@ Signaling, text messaging, presence, and metadata — used alongside or independ ## When to Use RTM - Text chat during video calls -- Signaling (call invitations, control messages) -- User presence/status tracking -- Custom data exchange (VAD signals, resolution requests) -- Sending text messages to AI agents (Conversational AI) +- Signaling (call invitations, control messages, hang-up) +- User presence and status tracking +- Custom data exchange (VAD signals, resolution requests, state sync) +- Receiving transcripts from Conversational AI agents + +## Channel Types + +RTM has two channel types with different semantics: + +| | Message Channel | Stream Channel | +| ------------------- | ---------------------------------------- | ------------------------------------------- | +| **Model** | Pub/sub | Join + topic subscribe | +| **Join required** | No — subscribe to publish/receive | Yes — must join before publishing | +| **Topics** | No | Yes — messages published per topic | +| **Use for** | Signaling, chat, ConvoAI transcripts | High-frequency data, custom media streams | +| **Presence events** | Via `presence.getOnlineUsers()` or event | Built-in via channel join/leave events | + +**Default choice**: Use message channels for most use cases. 
Use stream channels only if you need topic-based filtering or high-frequency updates (e.g., cursor positions, sensor data). ## Key Concepts -- **Message channels**: Pub/sub messaging. Subscribe to receive messages, publish to send. -- **Stream channels**: Joined channels with topics. More structured than message channels. -- **Presence**: Track who is online, user status metadata. -- **Storage**: Channel and user metadata (key-value store with versioning). -- **Lock**: Distributed locking for shared resources. -- RTM uses **string UIDs** (not numeric like RTC). When using RTC and RTM together, use `String(rtcUid)` as the RTM user ID to maintain a consistent mapping. +- **Presence**: Track online users and their metadata per channel. Subscribe to `presence` events to detect joins, leaves, and state changes in real time. +- **Storage**: Channel and user metadata — key-value store with versioning and compare-and-set (CAS) for conflict resolution. +- **Lock**: Distributed locking for coordinating shared resources across users. +- **RTM UIDs are strings** — not numeric like RTC. When using RTC and RTM together, use `String(rtcUid)` as the RTM user ID to keep both systems in sync. + +## Gotchas & Critical Rules + +- **UID type mismatch causes silent failures** — RTC UIDs are numbers; RTM UIDs are strings. Always use `String(rtcUid)` as the RTM user ID. Type mismatches don't throw errors — they silently break user lookups across both systems. +- **Namespace isolation** — RTC channels and RTM channels are completely separate. Joining RTC channel `"meeting-1"` does NOT auto-subscribe you to RTM channel `"meeting-1"`. Subscribe both explicitly. +- **Login before all operations** — `rtmClient.login()` must complete before any subscribe, publish, or presence call. Operations attempted before login resolves fail silently, not with an error. +- **Subscribe before presence** — Presence events (joins/leaves) require an active channel subscription. 
Publishing to a channel without subscribing means you won't receive presence notifications or responses. +- **RTM v2 API is a full rewrite** — Do NOT apply v1 patterns (`AgoraRTM.createInstance()`, `.createChannel()`) to v2. The APIs are incompatible. The Web reference (`web.md`) covers v2 only. +- **ConvoAI transcript delivery requires two flags** — For AI agent transcripts to arrive via RTM, the ConvoAI `/join` payload must include both `advanced_features.enable_rtm: true` AND `parameters.data_channel: "rtm"`. One flag alone is not sufficient. + +## RTC + RTM Coordination Pattern + +When pairing RTC and RTM in the same app: + +1. Join RTC channel with numeric UID (or `0` for auto-assignment) +2. After RTC join resolves, log in to RTM with `String(rtcUid)` +3. Subscribe to the RTM message channel +4. Use RTC for media (audio/video tracks), RTM for all signaling and metadata + +```javascript +// RTC join resolves with the assigned numeric UID +const rtcUid = await rtcClient.join(appId, channelName, token, null); + +// Mirror UID into RTM as a string — keeps both systems in sync +await rtmClient.login({ uid: String(rtcUid) }); +const { status } = await rtmClient.subscribe(channelName); +``` + +RTM channel name does not need to match the RTC channel name, but using the same name is the conventional approach. 
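
The silent-failure risk of mixing numeric and string UIDs is easy to reproduce without either SDK. The sketch below uses hypothetical helpers (`toRtmUserId`, `indexByRtmUserId`) and assumes remote RTC users are plain objects with a numeric `uid` field. It is meant to show the lookup behavior, not a required implementation.

```javascript
// Sketch only: hypothetical helpers, not part of the RTC or RTM SDKs.

// Convert at a single boundary so the rest of the app only sees string IDs.
function toRtmUserId(rtcUid) {
  if (!Number.isInteger(rtcUid) || rtcUid < 0) {
    throw new TypeError(`expected a non-negative integer RTC UID, got ${rtcUid}`);
  }
  return String(rtcUid);
}

// Index RTC remote users by RTM user ID so RTM events can be correlated.
function indexByRtmUserId(remoteRtcUsers) {
  return new Map(remoteRtcUsers.map((user) => [toRtmUserId(user.uid), user]));
}

const users = indexByRtmUserId([{ uid: 1001 }, { uid: 1002 }]);
users.get("1001"); // returns the { uid: 1001 } object
users.get(1001);   // returns undefined: no error, just a silent miss
```

Keeping the conversion in one function means a stray numeric lookup shows up as a consistent `undefined` that is easy to trace, not as a mismatch scattered across presence and message handlers.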
## Platform Reference Files -- **[web.md](web.md)** — `agora-rtm` v2 (JS/TS): RTM client, messaging, presence, v1 legacy API -- **iOS / Android** — +- **[web.md](web.md)** — `agora-rtm` v2 (JS/TS): client, messaging, presence, stream channels, v1 legacy notes +- **iOS / Android** — Level 2 fetch required: use [doc-fetching.md](../doc-fetching.md) or fetch directly from diff --git a/skills/agora/references/server-gateway/README.md b/skills/agora/references/server-gateway/README.md index 65ddb0b..9d23355 100644 --- a/skills/agora/references/server-gateway/README.md +++ b/skills/agora/references/server-gateway/README.md @@ -67,10 +67,14 @@ Hardware minimum: 8-core CPU 1.8 GHz, 2 GB RAM (4 GB recommended). ## Platform Reference Files - **[linux-cpp.md](linux-cpp.md)** — C++ full implementation: init, senders, receivers, video mixing, shutdown sequence -- **Java, Go, Python** — see the official documentation links below for each platform +- **Java, Go, Python** — Level 2 fetch required; use [../doc-fetching.md](../doc-fetching.md) or fetch directly from the links below -## Official Documentation +## When to Fetch More -- **[Product Overview](https://docs-md.agora.io/en/server-gateway/overview/product-overview.md)** -- **[Integrate the SDK](https://docs-md.agora.io/en/server-gateway/get-started/integrate-sdk.md)** — covers C++, Java, Go (`github.com/AgoraIO-Extensions/Agora-Golang-Server-SDK`), Python -- **[SDK Downloads](https://docs.agora.io/en/sdks)** +Always use Level 2 fetch for Java, Go, and Python quick-starts, SDK download links, and any platform-specific method signatures. 
Direct fallback URLs: + +- **Java** — +- **Go** — +- **Python** — +- **Product Overview** — +- **SDK Downloads** — diff --git a/tests/eval-cases.md b/tests/eval-cases.md index aa70c38..d1229c6 100644 --- a/tests/eval-cases.md +++ b/tests/eval-cases.md @@ -138,6 +138,24 @@ For each case: - Pass Criteria: Token auth option is mentioned and explained; if token auth is used, imports `agora-token` and calls `buildTokenWithRtm`; does not present Basic Auth as the only option - Result: ___ +### C-10: ConvoAI `/update` — full params object required + +- User Input: "Update the max tokens for my ConvoAI agent's LLM" +- Expected Behavior: Generated code sends the full `params` object in the update payload, not just the changed field +- Pass Criteria: Update body includes `model` alongside `max_tokens` (or notes that omitting `model` will erase it); references the "overwrites entirely" gotcha + +### C-11: RTM + RTC UID consistency + +- User Input: "I'm building an app with both RTC and RTM — how do I join both?" +- Expected Behavior: Uses `String(rtcUid)` as the RTM user ID after RTC join resolves +- Pass Criteria: RTM login receives `String(rtcUid)`; does not use a separate hardcoded string or numeric UID for RTM + +### C-12: ConvoAI token auth is the default + +- User Input: "Show me a ConvoAI join request" +- Expected Behavior: Presents token-based auth as the default; does not default to Basic Auth (Customer ID + Secret) without being asked +- Pass Criteria: `Authorization: agora token=` pattern appears in the primary example; Basic Auth shown as an alternative only + --- ## 3. 
Failure Paths (F-series) @@ -171,6 +189,18 @@ For each case: - Pass Criteria: Warning is issued before or instead of generating code; includes advice to enable App Certificate; does not silently generate code that passes `null` as token without the warning - Result: ___ +### F-05: Hardcoded credentials in user code + +- User Input: "Here's my code: `const client = AgoraRTC.createClient(...); await client.join('my-app-id', channel, 'my-app-certificate', uid)`" +- Expected Behavior: Warns that App Certificate must never appear in client-side code; explains the token generation flow +- Pass Criteria: Warning is issued before or instead of continuing with the code; advises moving App Certificate to a server-side token generator; does not silently continue with the insecure pattern + +### F-06: Non-existent product asked about + +- User Input: "How do I use Agora's Cloud Recording SDK?" +- Expected Behavior: Clarifies that Cloud Recording is REST API only — there is no client SDK; describes the acquire/start/stop REST API pattern +- Pass Criteria: Does not fabricate a "Cloud Recording SDK" package or import; routes to `references/cloud-recording/README.md` + --- ## 4. Intake Accuracy (I-series)