Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
221 changes: 221 additions & 0 deletions contrib/sarvam-ai/CAPABILITIES.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,221 @@
# Sarvam AI - ADK Integration Capabilities

## Overview

The Sarvam AI module provides a comprehensive, production-grade integration of Sarvam AI services into the Google Agent Development Kit (ADK) for Java. It spans five service domains -- Chat, Speech-to-Text, Text-to-Speech, Vision, and Live Connections -- covering both REST and WebSocket protocols with full observability, resilience, and multi-turn agentic support.

**Module path:** `contrib/sarvam-ai`
**Package:** `com.google.adk.models.sarvamai`
**Branch:** `sarvam-ai`

---

## 1. Chat Completions (LLM)

**Class:** `SarvamAi` extends `BaseLlm`
**Endpoint:** `POST /v1/chat/completions` (OpenAI-compatible)

| Capability | Details |
|---|---|
| Blocking (non-streaming) | Full request/response cycle via `generateContent(request, false)` |
| SSE Streaming | Real-time token-by-token delivery via `generateContent(request, true)` with backpressure (RxJava `Flowable`) |
| Function / Tool Calling | ADK `FunctionDeclaration` serialized to OpenAI `tools` JSON with `tool_choice: auto` |
| Multi-turn Tool History | Prior `tool_calls` correctly formatted as assistant messages with `tool_call_id`, `function.name`, `function.arguments`; tool responses sent as `role: tool` |
| Streaming Function Calls | Chunked `name` and `arguments` accumulated across SSE deltas, emitted as final `FunctionCall` Part |
| Token Usage Tracking | `prompt_tokens`, `completion_tokens`, `total_tokens` extracted for both blocking and streaming modes. Streaming uses `stream_options: {"include_usage": true}` |
| System Instructions | ADK `GenerateContentConfig.systemInstruction` mapped to OpenAI `system` role message |
| Temperature Control | Forwarded from `GenerateContentConfig.temperature` (default 0.7) |
| Max Output Tokens | `GenerateContentConfig.maxOutputTokens` forwarded as `max_tokens` |
| Top-P Sampling | Configurable via `SarvamAiConfig.topP()` |
| Frequency / Presence Penalty | Configurable via `SarvamAiConfig` builder |
| Reasoning Effort | Sarvam-specific `reasoning_effort` parameter (low / medium / high) |
| Wiki Grounding | Sarvam-specific `wiki_grounding` toggle for factual grounding |
| Role Translation | ADK `model` -> OpenAI `assistant`, `user` -> `user`, `functionResponse` -> `tool` |
| Schema Normalization | Type strings lowercased, nested `items.properties` recursively normalized for OpenAI schema compatibility |
| Graceful Degradation | Empty choices return empty text response instead of crashing |

### Dual Implementation

| Implementation | Location | Use Case |
|---|---|---|
| `SarvamBaseLM` | `core/src/main/java/.../models/SarvamBaseLM.java` | Lightweight, env-var driven. Used by `AgentModelConfig` and `LlmRegistry` for `Sarvam\|model` config strings |
| `SarvamAi` | `contrib/sarvam-ai/src/.../SarvamAi.java` | Full-featured, Builder-pattern, OkHttp-based. Supports all chat parameters plus subservice access |

---

## 2. Speech-to-Text (STT)

**Class:** `SarvamSttService` implements `TranscriptionService`
**Model:** `saaras:v3`

| Capability | Details |
|---|---|
| REST Synchronous | `transcribe(byte[] audioData, TranscriptionConfig)` via `POST /speech-to-text` with multipart/form-data |
| REST Async | `transcribeAsync()` executes on RxJava IO scheduler |
| WebSocket Streaming | Real-time streaming via `wss://api.sarvam.ai/speech-to-text/streaming` with VAD (Voice Activity Detection) signals |
| Transcription Modes | `transcribe`, `translate`, `verbatim`, `translit`, `codemix` |
| Language Detection | Auto-detection supported; explicit BCP-47 codes (e.g., `hi-IN`, `en-IN`) also accepted |
| VAD Signals | `speech_start` and `speech_end` events for voice activity boundaries |
| ADK TranscriptionService | Full implementation of ADK's `TranscriptionService` interface including `isAvailable()`, `getServiceType()`, `getHealth()` |

---

## 3. Text-to-Speech (TTS)

**Class:** `SarvamTtsService`
**Model:** `bulbul:v3`

| Capability | Details |
|---|---|
| REST Synchronous | `synthesize(text, languageCode)` returns decoded WAV audio bytes |
| REST Async | `synthesizeAsync()` on IO scheduler |
| WebSocket Streaming | `synthesizeStream()` via `wss://api.sarvam.ai/text-to-speech/streaming` for low-latency progressive audio chunk delivery |
| 30+ Speaker Voices | Configurable via `SarvamAiConfig.ttsSpeaker()` (default: `shubh`) |
| Pace Control | Adjustable speech pace (0.5x to 2.0x) |
| Sample Rate | Configurable output sample rate |
| Base64 Decoding | Audio chunks automatically decoded from base64 to raw bytes |
| WebSocket Lifecycle | Config frame -> text frame -> flush frame -> audio chunks -> final event -> close |

---

## 4. Vision / Document Intelligence

**Class:** `SarvamVisionService`
**Model:** Sarvam Vision 3B VLM

| Capability | Details |
|---|---|
| Multi-Language OCR | 23 languages (22 Indian + English) |
| Input Formats | PDF, PNG, JPG, ZIP |
| Output Formats | HTML or Markdown |
| Async Job Pipeline | `createJob` -> `uploadDocument` (presigned URL) -> `startJob` -> `getJobStatus` (poll) -> `downloadResults` |
| Convenience Method | `processDocument(filePath, languageCode, outputFormat)` runs the full pipeline with adaptive exponential backoff polling |
| Polling Backoff | Starts at 2s, doubles up to 10s cap, max 60 polls (~2 min timeout) |

---

## 5. Live Bidirectional Connection

**Class:** `SarvamAiLlmConnection` implements `BaseLlmConnection`

| Capability | Details |
|---|---|
| Multi-Turn Context | Maintains conversation history across turns, accumulates full model responses |
| sendHistory | Replace full conversation context |
| sendContent | Append a single turn and trigger streaming response |
| receive | Returns `Flowable<LlmResponse>` via `PublishSubject` for reactive consumers |
| Thread Safety | History list synchronized for concurrent access |
| Realtime Guard | `sendRealtime(Blob)` throws `UnsupportedOperationException` with guidance to use STT/TTS services |

---

## 6. Resilience & Configuration

### Retry with Exponential Backoff

**Class:** `SarvamRetryInterceptor` (OkHttp `Interceptor`)

| Parameter | Value |
|---|---|
| Retryable codes | 429 (rate limit), 503, 5xx (server errors) |
| Base delay | 500ms |
| Max delay | 30s |
| Strategy | Exponential backoff with 20% jitter |
| Default max retries | 3 |

### Immutable Configuration

**Class:** `SarvamAiConfig` (Builder pattern)

| Parameter | Default |
|---|---|
| Chat endpoint | `https://api.sarvam.ai/v1/chat/completions` |
| STT endpoint | `https://api.sarvam.ai/speech-to-text` |
| STT WebSocket | `wss://api.sarvam.ai/speech-to-text/streaming` |
| TTS endpoint | `https://api.sarvam.ai/text-to-speech` |
| TTS WebSocket | `wss://api.sarvam.ai/text-to-speech/streaming` |
| Vision endpoint | `https://api.sarvam.ai/document-intelligence` |
| Connect timeout | 30s |
| Read timeout | 120s |
| Max retries | 3 |
| API key resolution | Explicit value > `SARVAM_API_KEY` env var |

### Structured Error Handling

**Class:** `SarvamAiException` extends `RuntimeException`

| Field | Purpose |
|---|---|
| `statusCode` | HTTP status code from API |
| `errorCode` | Sarvam-specific error code |
| `requestId` | Sarvam request ID for support tracing |
| `isRetryable()` | Programmatic check (429, 503, 5xx) |

---

## 7. Authentication

| Method | Header | Used By |
|---|---|---|
| API Subscription Key | `api-subscription-key: <key>` | `SarvamAi`, STT, TTS, Vision (contrib module) |
| Bearer Token | `Authorization: Bearer <key>` | `SarvamBaseLM` (core module, OpenAI-compatible) |
| Key Resolution | `SARVAM_API_KEY` env var or explicit via Builder | Both |
| Fail-Fast Validation | Warning logged at construction if key is missing | `SarvamBaseLM` |

---

## 8. Test Coverage

| Test Class | Tests | Scope |
|---|---|---|
| `SarvamBaseLMTest` | 10 | Response parsing (text, null, tool calls), construction, connection type |
| `SarvamAiTest` | - | Chat completion blocking and streaming |
| `SarvamAiConfigTest` | - | Config builder validation, defaults, env var resolution |
| `ChatRequestTest` | - | Request serialization from LlmRequest |
| `SarvamSttServiceTest` | - | STT REST and WebSocket transcription |
| `SarvamTtsServiceTest` | - | TTS REST and WebSocket synthesis |
| `SarvamRetryInterceptorTest` | - | Retry logic, delay calculation, jitter |
| `SarvamIntegrationTest` (rae) | 20 | End-to-end config wiring across properties, YAML, LlmRegistry |

---

## 9. RAE Integration (Consumer Project)

| Integration Point | Mechanism | File |
|---|---|---|
| Code-based agents | `AgentModelConfig` recognizes `Sarvam\|` prefix, instantiates `SarvamBaseLM` | `AgentModelConfig.java` |
| YAML-based agents | `LlmRegistry.registerLlm("Sarvam\\|.*", ...)` factory | `ApplicationRegistry.java` |
| Model metadata | `sarvam:` provider in `models.yaml` with feature declarations | `models.yaml` |
| Config format | `Sarvam\|sarvam-m` -- single string works across both paths | `agent-models.properties` + `*.yaml` |
| Global coverage | 43 code-based + 28 YAML agent configs switched to Sarvam | All agent config files |

---

## Architecture Summary

```
contrib/sarvam-ai/
src/main/java/com/google/adk/models/sarvamai/
SarvamAi.java # BaseLlm (chat, Builder pattern, OkHttp)
SarvamAiConfig.java # Immutable config for all services
SarvamAiException.java # Structured error with status/code/requestId
SarvamAiLlmConnection.java # Live bidirectional multi-turn connection
SarvamRetryInterceptor.java # Exponential backoff with jitter
chat/
ChatRequest.java # OpenAI-compatible request model
ChatResponse.java # Response deserialization
ChatChoice.java # Choice wrapper
ChatMessage.java # Message model
ChatUsage.java # Token usage tracking
stt/
SarvamSttService.java # REST + WebSocket STT (TranscriptionService)
tts/
SarvamTtsService.java # REST + WebSocket TTS
TtsRequest.java # TTS request model
TtsResponse.java # TTS response model
vision/
SarvamVisionService.java # Async job pipeline for document OCR

core/src/main/java/com/google/adk/models/
SarvamBaseLM.java # Lightweight BaseLlm for agent config integration
```
133 changes: 133 additions & 0 deletions contrib/sarvam-ai/pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,133 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright 2025 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<parent>
<groupId>com.google.adk</groupId>
<artifactId>google-adk-parent</artifactId>
<version>0.5.1-SNAPSHOT</version><!-- {x-version-update:google-adk:current} -->
<relativePath>../../pom.xml</relativePath>
</parent>

<artifactId>google-adk-sarvam-ai</artifactId>
<name>Agent Development Kit - Sarvam AI</name>
<description>Sarvam AI integration for the Agent Development Kit.</description>

<dependencies>
<!-- Main dependencies -->
<dependency>
<groupId>com.google.adk</groupId>
<artifactId>google-adk</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>com.google.adk</groupId>
<artifactId>google-adk-dev</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>com.squareup.okhttp3</groupId>
<artifactId>okhttp</artifactId>
<version>${okhttp.version}</version>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</dependency>
<dependency>
<groupId>com.google.errorprone</groupId>
<artifactId>error_prone_annotations</artifactId>
</dependency>

<!-- Test dependencies -->
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-api</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-params</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-engine</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.google.truth</groupId>
<artifactId>truth</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.assertj</groupId>
<artifactId>assertj-core</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-junit-jupiter</artifactId>
<version>${mockito.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.squareup.okhttp3</groupId>
<artifactId>mockwebserver</artifactId>
<version>${okhttp.version}</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<artifactId>maven-surefire-plugin</artifactId>
<version>3.5.2</version>
<dependencies>
<dependency>
<groupId>me.fabriciorby</groupId>
<artifactId>maven-surefire-junit5-tree-reporter</artifactId>
<version>0.1.0</version>
</dependency>
<!-- Explicitly add JUnit Jupiter Engine for Surefire's classpath -->
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-engine</artifactId>
<version>${junit.version}</version>
</dependency>
<!-- Explicitly add Mockito JUnit Jupiter for Surefire's classpath -->
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-junit-jupiter</artifactId>
<version>${mockito.version}</version>
</dependency>
</dependencies>
<configuration>
<reportFormat>plain</reportFormat>
<statelessTestsetInfoReporter
implementation="org.apache.maven.plugin.surefire.extensions.junit5.JUnit5StatelessTestsetInfoTreeReporter" />
<includes>
<include>**/*Test.java</include>
</includes>
<!-- Explicitly define test source directory -->
<testSourceDirectory>${project.basedir}/src/test/java</testSourceDirectory>
</configuration>
</plugin>
</plugins>
</build>
</project>
Loading
Loading