feat(telegram): Voice pipeline refactor with STT integration and configurable routing #33
Open
maemreyo wants to merge 9 commits into nextlevelbuilder:main
- Add dmAgentAffinity map for sticky DM routing to voice agent
- Add STT config fields (STTProxyURL, STTAPIKey, STTTenantID, STTTimeoutSec, VoiceAgentID)
- Implement looksLikeSpeakingIntent and looksLikeNonSpeakingIntent for smart routing
- Add session affinity with 6h TTL for DM conversations
- Improve STT URL handling with proper trimming
- Add logging for transcript attachment
- Add modelFallbacks to Loop config for fallback model support
- Implement callProviderWithFallback for automatic model switching on 429 errors
- Add modelCandidates helper to deduplicate primary + fallback models
- Add isRateLimitFailure detection for 429 status and common rate limit error messages
- Update emitLLMSpan to track actual model used in span
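A rough sketch of the fallback flow this commit describes follows. The function names echo the commit message, but the signatures, the `providerError` type, and the matched error strings are assumptions made for illustration and may differ from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// providerError is a hypothetical error type carrying an HTTP status code.
type providerError struct {
	Status int
	Msg    string
}

func (e *providerError) Error() string { return fmt.Sprintf("status %d: %s", e.Status, e.Msg) }

// modelCandidates returns the primary model followed by the fallbacks,
// skipping blank names and duplicates while preserving order.
func modelCandidates(primary string, fallbacks []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, m := range append([]string{primary}, fallbacks...) {
		m = strings.TrimSpace(m)
		if m == "" || seen[m] {
			continue
		}
		seen[m] = true
		out = append(out, m)
	}
	return out
}

// isRateLimitFailure reports whether err looks like a rate-limit response:
// either a 429 status or a message containing a common rate-limit phrase.
func isRateLimitFailure(err error) bool {
	if err == nil {
		return false
	}
	var pe *providerError
	if errors.As(err, &pe) && pe.Status == 429 {
		return true
	}
	msg := strings.ToLower(err.Error())
	return strings.Contains(msg, "rate limit") || strings.Contains(msg, "too many requests")
}

// callWithFallback tries each model in order. It moves to the next model only
// after a rate-limit failure and reports which model produced the reply.
func callWithFallback(models []string, call func(model string) (string, error)) (string, string, error) {
	lastErr := errors.New("no model candidates")
	for _, m := range models {
		reply, err := call(m)
		if err == nil {
			return reply, m, nil
		}
		if !isRateLimitFailure(err) {
			return "", m, err
		}
		lastErr = err
	}
	return "", "", lastErr
}

func main() {
	models := modelCandidates("model-a", []string{"model-b", "model-a"})
	reply, used, err := callWithFallback(models, func(m string) (string, error) {
		if m == "model-a" {
			return "", &providerError{Status: 429, Msg: "too many requests"}
		}
		return "hello from " + m, nil
	})
	fmt.Println(reply, used, err)
}
```

Returning the model that actually answered is what would let a tracing span record the real model rather than the configured primary.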
- Change form field from 'file' to 'audio' for speaking-service contract
- Add default tenant_id fallback ('default') when not configured
- Add speaking-agent Telegram audio guard for student replies
- Add internal identity prompt for speaking-agent in DM
- Add sanitizeSpeakingAudioStudentReply to handle technical errors
- Update STT tests for new contract
Replace hardcoded speaking-agent logic with configurable Telegram channel settings:
- VoiceStartMessage, VoiceIntentKeywords, VoiceAffinityClearKeywords, VoiceAffinityTTLMinutes
- VoiceDMContextTemplate (injects context with {user_id} substitution)
- AudioGuardFallbackTranscript/NoTranscript for custom fallback messages
- GOCLAW_STT_TENANT_ID and GOCLAW_VOICE_DM_CONTEXT_TEMPLATE env var overrides
This allows deployments to customize voice routing behavior and error fallback messages without code changes. Includes new tests for voice routing logic and audio guard sanitization.
- Replace fmt.Sprintf with strings.ReplaceAll in audio fallback template handling to prevent "%!(EXTRA string=...)" garbage when custom templates lack %s placeholder
- Lowercase config keywords defensively in matchesVoiceIntent and matchesAffinityClear since inbound text is normalized but DB keywords may have mixed case
- Add comprehensive test coverage for custom fallback templates with and without placeholders
- Add test cases for mixed-case keyword matching in voice intent and affinity-clear routing
- Ensure operators can safely configure keywords with any casing without breaking voice routing logic
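The two fixes in this commit can be sketched as follows. The helper names are hypothetical, and substring matching is an assumption; the PR's `matchesVoiceIntent` and `matchesAffinityClear` may compare text differently.

```go
package main

import (
	"fmt"
	"strings"
)

// renderFallback substitutes the transcript for every %s in the template.
// Unlike fmt.Sprintf, a template with no %s is returned unchanged instead of
// gaining a "%!(EXTRA string=...)" suffix.
func renderFallback(tmpl, transcript string) string {
	return strings.ReplaceAll(tmpl, "%s", transcript)
}

// matchesKeyword reports whether the already-lowercased text contains any
// keyword. Keywords are lowercased here so operators can store them in any case.
func matchesKeyword(normalizedText string, keywords []string) bool {
	for _, kw := range keywords {
		kw = strings.ToLower(strings.TrimSpace(kw))
		if kw != "" && strings.Contains(normalizedText, kw) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(renderFallback("Got your voice: \"%s\". Please try again!", "hello"))
	fmt.Println(renderFallback("Please try again!", "hello"))
	fmt.Println(matchesKeyword("i want speaking practice", []string{"Speaking"}))
}
```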
- Extract voice agent reply sanitization into new voiceguard package with Guard type
- Add voiceguard.SanitizeReply function to handle technical error detection and fallback messaging
- Refactor voice configuration from flat fields (VoiceAgentID, VoiceDMContextTemplate) to nested Voice struct
- Support both nested and flat JSON layouts in telegramInstanceConfig for backward compatibility
- Add sttSem field to Channel for bounding parallel STT HTTP calls
- Update gateway_consumer to use voiceguard package instead of inline sanitization logic
- Remove sanitizeVoiceAgentReply, containsTechnicalErrorLanguage, and extractTranscriptFromInbound functions from gateway_consumer
- Clean up unused imports (html, regexp) from gateway_consumer
- Fix assignment operator from `=` to `:=` in handleMessage for proper variable declaration
- Remove duplicate test code block at end of handlers_voice_routing_test.go
- Clean up test file structure to eliminate redundant package declaration and imports
🎯 Overview
This PR refactors the Telegram voice pipeline to improve code organization and testability, and adds configurable voice agent routing with STT (Speech-to-Text) integration.
✨ Key Features
1. Nested Voice Configuration Structure
- `TelegramVoiceConfig` to group all voice-related settings under a single `voice` JSON key
2. Voice Agent Routing System
3. STT (Speech-to-Text) Integration
- `/transcribe_audio` endpoint
4. Audio Guard System
- `voiceguard` package for better separation of concerns
5. Enhanced Testability
- `resolveTargetAgent()` extracted as pure function (no I/O side effects)
- `-race` flag support
6. Agent Loop Improvements
- `ForwardMedia` field for delegation artifact forwarding
📊 Changes Summary
New Files
- `internal/channels/telegram/voiceguard/guard.go` - Audio guard logic
- `internal/channels/telegram/voiceguard/guard_test.go` - Audio guard tests (13 tests)
- `internal/channels/telegram/handlers_voice_routing_test.go` - Voice routing tests (14 tests)
- `internal/config/config_load_voice_test.go` - Voice config tests
- `internal/agent/loop_fallback_test.go` - Model fallback tests
- `cmd/gateway_consumer_audio_sanitize_test.go` - Audio sanitization tests
Modified Files
- `internal/config/config_channels.go` - New `TelegramVoiceConfig` struct
- `internal/channels/telegram/factory.go` - Legacy config promotion logic
- `internal/channels/telegram/handlers.go` - Voice routing implementation
- `internal/channels/telegram/stt.go` - STT concurrency control & HTTP client pooling
- `internal/agent/loop.go` - ForwardMedia support & improved structure
- `cmd/gateway_consumer.go` - Integration with voiceguard package
🔄 Migration Path
For Existing Deployments
No immediate action required! The refactor is fully backward compatible:
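One way such backward compatibility could work is to decode both layouts and promote legacy flat values into the nested struct when the nested value is empty. The sketch below illustrates that idea only: the flat key names (`voice_agent_id`, `stt_proxy_url`), the struct names, and the rule that nested values win are all assumptions, not the PR's actual promotion logic.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// voiceConfig mirrors a subset of the nested "voice" object.
type voiceConfig struct {
	AgentID     string `json:"agent_id"`
	STTProxyURL string `json:"stt_proxy_url"`
}

// instanceConfig accepts both layouts. The flat key names are hypothetical.
type instanceConfig struct {
	Voice              *voiceConfig `json:"voice"`
	LegacyVoiceAgentID string       `json:"voice_agent_id"`
	LegacySTTProxyURL  string       `json:"stt_proxy_url"`
}

// loadVoice decodes raw JSON and promotes legacy flat fields into the nested
// config wherever the nested value is missing. Nested values take precedence.
func loadVoice(raw []byte) (voiceConfig, error) {
	var c instanceConfig
	if err := json.Unmarshal(raw, &c); err != nil {
		return voiceConfig{}, err
	}
	var v voiceConfig
	if c.Voice != nil {
		v = *c.Voice
	}
	if v.AgentID == "" {
		v.AgentID = c.LegacyVoiceAgentID
	}
	if v.STTProxyURL == "" {
		v.STTProxyURL = c.LegacySTTProxyURL
	}
	return v, nil
}

func main() {
	legacy := []byte(`{"voice_agent_id": "speaking-agent"}`)
	nested := []byte(`{"voice": {"agent_id": "speaking-agent"}}`)
	for _, raw := range [][]byte{legacy, nested} {
		v, err := loadVoice(raw)
		fmt.Println(v.AgentID, err)
	}
}
```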
For New Deployments
Use the nested structure for cleaner configuration:
{
  "voice": {
    "agent_id": "speaking-agent",
    "stt_proxy_url": "https://stt.example.com",
    "stt_api_key": "secret-key",
    "intent_keywords": ["speaking", "pronunciation"],
    "affinity_clear_keywords": ["homework", "payment"],
    "affinity_ttl_minutes": 360,
    "dm_context_template": "Context:\n- tenant: {tenant_id}\n- user_id: {user_id}",
    "audio_guard_fallback_transcript": "🎙️ Got your voice: \"%s\". Please try again!",
    "audio_guard_error_markers": ["system error", "rate limit"]
  }
}

🧪 Testing
All tests pass:
Run with race detector:
go test ./internal/channels/telegram/... -race -v

🔧 Environment Variables
New environment variable support:
- `GOCLAW_VOICE_AGENT_ID` - Override voice agent ID
- `GOCLAW_STT_TENANT_ID` - Override STT tenant ID
- `GOCLAW_VOICE_DM_CONTEXT_TEMPLATE` - Override DM context template
- `GOCLAW_AUDIO_GUARD_FALLBACK_TRANSCRIPT` - Override transcript fallback
- `GOCLAW_AUDIO_GUARD_FALLBACK_NO_TRANSCRIPT` - Override no-transcript fallback
📝 Documentation
Voice Routing Priority Chain
- `/start` or `start` text (DM only) → voice agent + rewrite content
Audio Guard Behavior
🐛 Bug Fixes
- `resolveTargetAgent` call
- `audio` field (not legacy `file` field)
🔍 Code Quality
📚 Related Issues
Closes: (if any issue numbers)
🙏 Acknowledgments
This refactor builds upon the existing voice pipeline foundation and improves it with better structure, testability, and configurability for production deployments.