feat(telegram): Voice pipeline refactor with STT integration and configurable routing by maemreyo · Pull Request #33 · nextlevelbuilder/goclaw

maemreyo · 2026-03-01T13:38:13Z

🎯 Overview

This PR refactors the Telegram voice pipeline to improve code organization and testability, and adds configurable voice agent routing with STT (Speech-to-Text) integration.

✨ Key Features

1. Nested Voice Configuration Structure

Introduced TelegramVoiceConfig to group all voice-related settings under a single voice JSON key
Clear separation between base channel settings and voice pipeline configuration
Backward compatible with legacy flat config layout via automatic promotion
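A minimal sketch of how the automatic promotion could work, assuming the legacy layout stored the same values as top-level keys. The struct names, the two fields shown, and the flat key names (voice_agent_id, stt_proxy_url) are illustrative assumptions, not the PR's actual definitions:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// VoiceConfig mirrors two of the nested voice settings named in this PR.
type VoiceConfig struct {
	AgentID     string `json:"agent_id,omitempty"`
	STTProxyURL string `json:"stt_proxy_url,omitempty"`
}

// instanceConfig accepts both layouts. The legacy flat key names are
// assumptions made for this sketch.
type instanceConfig struct {
	Voice             VoiceConfig `json:"voice"`
	LegacyVoiceAgent  string      `json:"voice_agent_id,omitempty"`
	LegacySTTProxyURL string      `json:"stt_proxy_url,omitempty"`
}

// loadVoiceConfig decodes raw JSON and promotes legacy flat fields into the
// nested struct, but only where the nested value is still empty.
func loadVoiceConfig(raw []byte) (VoiceConfig, error) {
	var c instanceConfig
	if err := json.Unmarshal(raw, &c); err != nil {
		return VoiceConfig{}, err
	}
	if c.Voice.AgentID == "" {
		c.Voice.AgentID = c.LegacyVoiceAgent
	}
	if c.Voice.STTProxyURL == "" {
		c.Voice.STTProxyURL = c.LegacySTTProxyURL
	}
	return c.Voice, nil
}

func main() {
	legacy := []byte(`{"voice_agent_id":"speaking-agent","stt_proxy_url":"https://stt.example.com"}`)
	v, err := loadVoiceConfig(legacy)
	fmt.Println(v.AgentID, v.STTProxyURL, err)
}
```

In this sketch a nested value always wins over its flat counterpart, so a row that carries both layouts resolves deterministically.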

2. Voice Agent Routing System

Configurable voice agent routing with priority-based decision chain:
1. Audio/Voice Media → Always routes to voice agent (highest priority)
2. /start Command → Bootstraps voice session with customizable message
3. Intent Keywords → Text-based routing via configurable keyword matching
4. Session Affinity → Sticky routing with TTL-based expiration
5. Affinity Clear Keywords → User-initiated switch back to default agent
DM-only routing logic (groups excluded except for audio media)
Case-insensitive keyword matching with defensive normalization
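The sticky routing with TTL-based expiration described above could be backed by a small mutex-guarded map. The following is a sketch under assumptions: the type and method names are hypothetical, and the real dmAgentAffinity map may be structured differently.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type affinityEntry struct {
	agentID string
	expires time.Time
}

// affinityStore is a hypothetical TTL map keyed by DM chat ID.
type affinityStore struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[int64]affinityEntry
}

func newAffinityStore(ttlMinutes int) *affinityStore {
	return &affinityStore{
		ttl:     time.Duration(ttlMinutes) * time.Minute,
		entries: make(map[int64]affinityEntry),
	}
}

// Set records that chatID keeps routing to agentID until the TTL elapses.
func (s *affinityStore) Set(chatID int64, agentID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entries[chatID] = affinityEntry{agentID: agentID, expires: time.Now().Add(s.ttl)}
}

// Get returns the sticky agent and evicts the entry if it has expired.
func (s *affinityStore) Get(chatID int64) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.entries[chatID]
	if !ok {
		return "", false
	}
	if !time.Now().Before(e.expires) {
		delete(s.entries, chatID)
		return "", false
	}
	return e.agentID, true
}

// Clear evicts the affinity, for example after an affinity-clear keyword.
func (s *affinityStore) Clear(chatID int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.entries, chatID)
}

func main() {
	s := newAffinityStore(360) // 360 minutes, as in the sample config
	s.Set(42, "speaking-agent")
	fmt.Println(s.Get(42))
	s.Clear(42)
	fmt.Println(s.Get(42))
}
```

Callers would only invoke Set for DM chats, which matches the group affinity fix listed under Bug Fixes.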

3. STT (Speech-to-Text) Integration

Multipart form-data contract with /transcribe_audio endpoint
Bearer token authentication support
Tenant ID forwarding for multi-tenant deployments
Configurable timeout (default: 30s)
Concurrency control via buffered-channel semaphore (max 4 concurrent calls per channel)
Shared HTTP client with connection pooling for performance
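As an illustration of the request contract listed above, here is a hedged sketch of building the multipart call. The function name is hypothetical, and sending the tenant ID as a tenant_id form field is an assumption; the PR only states that the tenant ID is forwarded and that it falls back to "default" when unset.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
	"strings"
)

// buildSTTRequest assembles a multipart/form-data POST to /transcribe_audio.
// The "tenant_id" form field is an assumption made for this sketch.
func buildSTTRequest(baseURL, apiKey, tenantID, filename string, audio []byte) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)

	// The audio bytes go in the "audio" field, not the legacy "file" field.
	part, err := w.CreateFormFile("audio", filename)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(audio); err != nil {
		return nil, err
	}
	if tenantID == "" {
		tenantID = "default"
	}
	if err := w.WriteField("tenant_id", tenantID); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}

	// Trim whitespace and trailing slashes so the path is joined exactly once.
	endpoint := strings.TrimRight(strings.TrimSpace(baseURL), "/") + "/transcribe_audio"
	req, err := http.NewRequest(http.MethodPost, endpoint, &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	if apiKey != "" {
		req.Header.Set("Authorization", "Bearer "+apiKey)
	}
	return req, nil
}

func main() {
	req, err := buildSTTRequest("https://stt.example.com/", "secret-key", "", "voice.ogg", []byte("fake-audio"))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

The request would then be sent through a client whose Timeout reflects the configured value (30s by default).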

4. Audio Guard System

Extracted into dedicated voiceguard package for better separation of concerns
Zero dependencies on Telegram SDK or message bus
Pure string→string transformation for easy unit testing
Intercepts technical error language in voice agent replies
User-friendly fallback messages with transcript support
Customizable error markers (these replace the built-in defaults when set)
Supports both English and Vietnamese error detection
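To make the pure string→string idea concrete, here is a minimal sketch of a guard that swaps technical error text for a fallback. The Guard fields, the method signature, and the default marker list are assumptions; the package's real defaults are not shown in this PR description.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultMarkers is illustrative only; it mixes English and Vietnamese
// phrases the way the feature list describes.
var defaultMarkers = []string{"system error", "rate limit", "lỗi hệ thống"}

// Guard is a hypothetical stand-in for the voiceguard type.
type Guard struct {
	Markers  []string // when non-empty, replaces defaultMarkers entirely
	Fallback string   // message returned instead of a technical reply
}

// SanitizeReply returns reply unchanged unless it contains an error marker.
// It performs no I/O, so it can be unit tested with plain strings.
func (g Guard) SanitizeReply(reply string) string {
	markers := defaultMarkers
	if len(g.Markers) > 0 {
		markers = g.Markers
	}
	lower := strings.ToLower(reply)
	for _, m := range markers {
		m = strings.ToLower(strings.TrimSpace(m))
		if m != "" && strings.Contains(lower, m) {
			return g.Fallback
		}
	}
	return reply
}

func main() {
	g := Guard{Fallback: "Sorry, something went wrong. Please try again!"}
	fmt.Println(g.SanitizeReply("Upstream returned: System Error 500"))
	fmt.Println(g.SanitizeReply("Great pronunciation!"))
}
```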

5. Enhanced Testability

resolveTargetAgent() extracted as pure function (no I/O side effects)
14 table-driven test cases covering all routing scenarios
Race condition testing with -race flag support
13 unit tests for audio guard logic
Comprehensive STT test coverage

6. Agent Loop Improvements

Rate limit model fallback support
ForwardMedia field for delegation artifact forwarding
Improved error handling and tracing
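The rate limit fallback can be pictured as trying a de-duplicated list of models in order and moving on only when a call is rate limited. This is a sketch under assumptions: the helper names follow the commit notes in this PR, but their signatures and the matched error substrings are guesses.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// modelCandidates returns primary followed by fallbacks, skipping blanks
// and duplicates.
func modelCandidates(primary string, fallbacks []string) []string {
	seen := map[string]bool{}
	var out []string
	for _, m := range append([]string{primary}, fallbacks...) {
		m = strings.TrimSpace(m)
		if m == "" || seen[m] {
			continue
		}
		seen[m] = true
		out = append(out, m)
	}
	return out
}

// isRateLimitFailure is a heuristic; the substrings are assumptions.
func isRateLimitFailure(status int, err error) bool {
	if status == 429 {
		return true
	}
	if err == nil {
		return false
	}
	msg := strings.ToLower(err.Error())
	return strings.Contains(msg, "rate limit") || strings.Contains(msg, "too many requests")
}

// callWithFallback tries each model in turn and returns the one that
// actually served the call, so a tracing span can record it.
func callWithFallback(models []string, call func(model string) (int, error)) (string, error) {
	lastErr := errors.New("no models configured")
	for _, m := range models {
		status, err := call(m)
		if !isRateLimitFailure(status, err) {
			return m, err // success, or a failure that fallback should not mask
		}
		lastErr = fmt.Errorf("model %q was rate limited (status %d)", m, status)
	}
	return "", lastErr
}

func main() {
	models := modelCandidates("primary-model", []string{"backup-model", "primary-model"})
	used, err := callWithFallback(models, func(m string) (int, error) {
		if m == "primary-model" {
			return 429, errors.New("rate limit exceeded")
		}
		return 200, nil
	})
	fmt.Println(used, err)
}
```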

📊 Changes Summary

18 files changed
+2020 insertions
-247 deletions

New Files

internal/channels/telegram/voiceguard/guard.go - Audio guard logic
internal/channels/telegram/voiceguard/guard_test.go - Audio guard tests (13 tests)
internal/channels/telegram/handlers_voice_routing_test.go - Voice routing tests (14 tests)
internal/config/config_load_voice_test.go - Voice config tests
internal/agent/loop_fallback_test.go - Model fallback tests
cmd/gateway_consumer_audio_sanitize_test.go - Audio sanitization tests

Modified Files

internal/config/config_channels.go - New TelegramVoiceConfig struct
internal/channels/telegram/factory.go - Legacy config promotion logic
internal/channels/telegram/handlers.go - Voice routing implementation
internal/channels/telegram/stt.go - STT concurrency control & HTTP client pooling
internal/agent/loop.go - ForwardMedia support & improved structure
cmd/gateway_consumer.go - Integration with voiceguard package

🔄 Migration Path

For Existing Deployments

No immediate action required! The refactor is fully backward compatible:

Existing DB rows with flat config layout continue to work
Legacy fields are automatically promoted to nested structure on load
No database migration needed

For New Deployments

Use the nested structure for cleaner configuration:

{
  "voice": {
    "agent_id": "speaking-agent",
    "stt_proxy_url": "https://stt.example.com",
    "stt_api_key": "secret-key",
    "intent_keywords": ["speaking", "pronunciation"],
    "affinity_clear_keywords": ["homework", "payment"],
    "affinity_ttl_minutes": 360,
    "dm_context_template": "Context:\n- tenant: {tenant_id}\n- user_id: {user_id}",
    "audio_guard_fallback_transcript": "🎙️ Got your voice: \"%s\". Please try again!",
    "audio_guard_error_markers": ["system error", "rate limit"]
  }
}
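The dm_context_template value above uses {tenant_id} and {user_id} placeholders. A minimal sketch of how such a template might be rendered, assuming plain string substitution (the function name is hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// renderDMContext fills the {tenant_id} and {user_id} placeholders.
// Unknown placeholders are left untouched.
func renderDMContext(tmpl, tenantID, userID string) string {
	r := strings.NewReplacer("{tenant_id}", tenantID, "{user_id}", userID)
	return r.Replace(tmpl)
}

func main() {
	tmpl := "Context:\n- tenant: {tenant_id}\n- user_id: {user_id}"
	fmt.Println(renderDMContext(tmpl, "acme", "12345"))
}
```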

🧪 Testing

All tests pass:

✅ internal/channels/telegram - 14 routing tests
✅ internal/channels/telegram/voiceguard - 13 audio guard tests  
✅ internal/channels/telegram - STT tests updated
✅ cmd - audio sanitization tests

Run with race detector:

go test ./internal/channels/telegram/... -race -v

🔧 Environment Variables

New environment variable support:

GOCLAW_VOICE_AGENT_ID - Override voice agent ID
GOCLAW_STT_TENANT_ID - Override STT tenant ID
GOCLAW_VOICE_DM_CONTEXT_TEMPLATE - Override DM context template
GOCLAW_AUDIO_GUARD_FALLBACK_TRANSCRIPT - Override transcript fallback
GOCLAW_AUDIO_GUARD_FALLBACK_NO_TRANSCRIPT - Override no-transcript fallback
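One way to apply these overrides is a small helper that prefers the environment value over the loaded config. This is a sketch: the helper name is hypothetical, and treating an empty variable as "not set" is an assumption about the intended behavior.

```go
package main

import (
	"fmt"
	"os"
)

// overrideFromEnv returns the environment value for key when it is set and
// non-empty, otherwise the value loaded from config. The lookup function is
// injectable so tests do not have to mutate the process environment.
func overrideFromEnv(current, key string, lookup func(string) (string, bool)) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return current
}

func main() {
	agentID := overrideFromEnv("speaking-agent", "GOCLAW_VOICE_AGENT_ID", os.LookupEnv)
	fmt.Println(agentID)
}
```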

📝 Documentation

Voice Routing Priority Chain

Audio/voice media present → voice agent (applies to groups too)
/start or start text (DM only) → voice agent + rewrite content
Text matches intent keywords (DM only) → voice agent + set affinity
Existing non-expired affinity (DM only) → continue routing to affinity agent
Text matches clear keywords (DM only) → evict affinity, route to default
Fallback → default agent
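The chain above can be sketched as a pure function over a small input struct. All names here are hypothetical and this is not the PR's resolveTargetAgent. Two choices are assumptions of the sketch: keywords match as substrings, and clear keywords are tested before a live affinity is honoured, since otherwise the clear step could never fire while an affinity exists. The content rewrite for /start is omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// routeInput carries everything the decision needs, so the function needs
// no I/O and no shared state.
type routeInput struct {
	IsDM         bool
	HasAudio     bool
	Text         string
	AffinityLive bool // a non-expired affinity exists for this chat
}

type routeConfig struct {
	VoiceAgent, DefaultAgent      string
	IntentKeywords, ClearKeywords []string
}

type routeResult struct {
	Agent         string
	SetAffinity   bool
	ClearAffinity bool
}

// containsAny lowercases both sides so config keywords may use any casing.
func containsAny(text string, keywords []string) bool {
	text = strings.ToLower(text)
	for _, k := range keywords {
		k = strings.ToLower(strings.TrimSpace(k))
		if k != "" && strings.Contains(text, k) {
			return true
		}
	}
	return false
}

func resolveTarget(in routeInput, cfg routeConfig) routeResult {
	if in.HasAudio {
		return routeResult{Agent: cfg.VoiceAgent} // applies to groups too
	}
	if !in.IsDM {
		return routeResult{Agent: cfg.DefaultAgent}
	}
	text := strings.ToLower(strings.TrimSpace(in.Text))
	switch {
	case text == "/start" || text == "start":
		return routeResult{Agent: cfg.VoiceAgent}
	case containsAny(text, cfg.IntentKeywords):
		return routeResult{Agent: cfg.VoiceAgent, SetAffinity: true}
	case in.AffinityLive && containsAny(text, cfg.ClearKeywords):
		return routeResult{Agent: cfg.DefaultAgent, ClearAffinity: true}
	case in.AffinityLive:
		return routeResult{Agent: cfg.VoiceAgent}
	}
	return routeResult{Agent: cfg.DefaultAgent}
}

func main() {
	cfg := routeConfig{
		VoiceAgent:     "speaking-agent",
		DefaultAgent:   "default",
		IntentKeywords: []string{"speaking", "pronunciation"},
		ClearKeywords:  []string{"homework", "payment"},
	}
	fmt.Printf("%+v\n", resolveTarget(routeInput{IsDM: true, Text: "Help with Pronunciation"}, cfg))
}
```

Because the function only reads its arguments, each routing scenario reduces to one row in a table-driven test.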

Audio Guard Behavior

Only triggers for voice agent on Telegram DMs with audio/voice media
Checks reply for technical error language
Replaces with user-friendly fallback when error detected
Supports custom error markers (these replace the defaults when set)
Extracts and includes transcript in fallback when available
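The commit notes in this PR mention switching the fallback template handling from fmt.Sprintf to strings.ReplaceAll. A small sketch of why that matters, assuming the %s placeholder used in the sample configuration (the function name is hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// renderFallback substitutes the transcript for every "%s" in tmpl. Unlike
// fmt.Sprintf, a template without a placeholder is returned unchanged.
func renderFallback(tmpl, transcript string) string {
	return strings.ReplaceAll(tmpl, "%s", transcript)
}

func main() {
	fmt.Println(renderFallback(`Got your voice: "%s". Please try again!`, "hello"))
	fmt.Println(renderFallback("Please try again!", "hello"))
	// For comparison, fmt.Sprintf("Please try again!", "hello") would append
	// "%!(EXTRA string=hello)" to the message.
}
```

With this approach an operator can configure a custom fallback that omits the transcript entirely without producing formatting debris in the user-facing reply.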

🐛 Bug Fixes

Fixed group affinity leak (affinity no longer stored for group chats)
Fixed the variable declaration at the resolveTargetAgent call site in handleMessage (= changed to :=)
Normalized voice routing keywords to lowercase for case-insensitive matching
Fixed STT contract to use audio field (not legacy file field)

🔍 Code Quality

Zero breaking changes for existing deployments
Comprehensive test coverage (27 new tests)
Clear separation of concerns (voiceguard package)
Improved code organization and maintainability
Detailed inline documentation
Performance optimizations (HTTP client pooling, concurrency control)
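The concurrency control mentioned above is described elsewhere in this PR as a buffered-channel semaphore with at most 4 concurrent STT calls per channel. A hedged sketch of that pattern follows; the type and function names are hypothetical and the real sttSem field may be used differently.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"sync"
	"time"
)

// sharedSTTClient is reused across calls so connections are pooled.
var sharedSTTClient = &http.Client{Timeout: 30 * time.Second}

// sttLimiter bounds parallel STT calls with a buffered channel.
type sttLimiter chan struct{}

func newSTTLimiter(n int) sttLimiter { return make(sttLimiter, n) }

// run executes fn while holding a slot, or gives up when ctx is cancelled.
func (l sttLimiter) run(ctx context.Context, fn func()) error {
	select {
	case l <- struct{}{}: // acquire: blocks while n calls are in flight
	case <-ctx.Done():
		return ctx.Err()
	}
	defer func() { <-l }() // release the slot
	fn()
	return nil
}

// peakConcurrency pushes jobs through a limiter and reports the highest
// number of callbacks that were running at the same time.
func peakConcurrency(jobs, limit int) int {
	l := newSTTLimiter(limit)
	var mu sync.Mutex
	var wg sync.WaitGroup
	cur, peak := 0, 0
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = l.run(context.Background(), func() {
				mu.Lock()
				cur++
				if cur > peak {
					peak = cur
				}
				mu.Unlock()
				time.Sleep(time.Millisecond) // stands in for the HTTP call
				mu.Lock()
				cur--
				mu.Unlock()
			})
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("client timeout:", sharedSTTClient.Timeout)
	fmt.Println("peak within limit:", peakConcurrency(16, 4) <= 4)
}
```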

📚 Related Issues

Closes: (if any issue numbers)

🙏 Acknowledgments

This refactor builds upon the existing voice pipeline foundation and improves it with better structure, testability, and configurability for production deployments.

- Add dmAgentAffinity map for sticky DM routing to voice agent
- Add STT config fields (STTProxyURL, STTAPIKey, STTTenantID, STTTimeoutSec, VoiceAgentID)
- Implement looksLikeSpeakingIntent and looksLikeNonSpeakingIntent for smart routing
- Add session affinity with 6h TTL for DM conversations
- Improve STT URL handling with proper trimming
- Add logging for transcript attachment

- Add modelFallbacks to Loop config for fallback model support
- Implement callProviderWithFallback for automatic model switching on 429 errors
- Add modelCandidates helper to deduplicate primary + fallback models
- Add isRateLimitFailure detection for 429 status and common rate limit error messages
- Update emitLLMSpan to track actual model used in span

- Change form field from 'file' to 'audio' for speaking-service contract
- Add default tenant_id fallback ('default') when not configured
- Add speaking-agent Telegram audio guard for student replies
- Add internal identity prompt for speaking-agent in DM
- Add sanitizeSpeakingAudioStudentReply to handle technical errors
- Update STT tests for new contract

Replace hardcoded speaking-agent logic with configurable Telegram channel settings:
- VoiceStartMessage, VoiceIntentKeywords, VoiceAffinityClearKeywords, VoiceAffinityTTLMinutes
- VoiceDMContextTemplate (injects context with {user_id} substitution)
- AudioGuardFallbackTranscript/NoTranscript for custom fallback messages
- GOCLAW_STT_TENANT_ID and GOCLAW_VOICE_DM_CONTEXT_TEMPLATE env var overrides

This allows deployments to customize voice routing behavior and error fallback messages without code changes. Includes new tests for voice routing logic and audio guard sanitization.

- Replace fmt.Sprintf with strings.ReplaceAll in audio fallback template handling to prevent "%!(EXTRA string=...)" garbage when custom templates lack %s placeholder
- Lowercase config keywords defensively in matchesVoiceIntent and matchesAffinityClear since inbound text is normalized but DB keywords may have mixed case
- Add comprehensive test coverage for custom fallback templates with and without placeholders
- Add test cases for mixed-case keyword matching in voice intent and affinity-clear routing
- Ensure operators can safely configure keywords with any casing without breaking voice routing logic

- Extract voice agent reply sanitization into new voiceguard package with Guard type
- Add voiceguard.SanitizeReply function to handle technical error detection and fallback messaging
- Refactor voice configuration from flat fields (VoiceAgentID, VoiceDMContextTemplate) to nested Voice struct
- Support both nested and flat JSON layouts in telegramInstanceConfig for backward compatibility
- Add sttSem field to Channel for bounding parallel STT HTTP calls
- Update gateway_consumer to use voiceguard package instead of inline sanitization logic
- Remove sanitizeVoiceAgentReply, containsTechnicalErrorLanguage, and extractTranscriptFromInbound functions from gateway_consumer
- Clean up unused imports (html, regexp) from gateway_consumer

- Fix assignment operator from `=` to `:=` in handleMessage for proper variable declaration
- Remove duplicate test code block at end of handlers_voice_routing_test.go
- Clean up test file structure to eliminate redundant package declaration and imports

maemreyo added 9 commits March 1, 2026 20:04

chore: remove patch files after apply

6d08429

chore: remove patch files after apply

6996c42

maemreyo wants to merge 9 commits into nextlevelbuilder:main from maemreyo:maemreyo/telegram-voice-stt
