feat: add multimedia support for Telegram and Discord adapters by slysian · Pull Request #196 · RightNow-AI/openfang

slysian · 2026-03-02T06:28:14Z

Add comprehensive media handling for channel adapters:

Telegram (receive):

Voice messages: download + transcribe via Groq Whisper (fallback: OpenAI Whisper)
Photos: download + recognize via Gemini Vision API
Documents: download + extract text content or recognize images
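
The voice path above (Groq Whisper first, OpenAI Whisper as fallback) can be pictured roughly as follows. This is a minimal sketch, not the PR's code: `transcribe_with_fallback` is a hypothetical name, and the two providers are passed in as closures so the example does not assume any particular HTTP client or response shape.

```rust
/// Try the primary transcriber first; if it fails, try the fallback.
/// Both providers are closures (hypothetical stand-ins for the real
/// Groq / OpenAI calls), so this sketch has no network dependency.
fn transcribe_with_fallback<P, F>(
    audio: &[u8],
    primary: P,
    fallback: F,
) -> Result<String, String>
where
    P: FnOnce(&[u8]) -> Result<String, String>,
    F: FnOnce(&[u8]) -> Result<String, String>,
{
    match primary(audio) {
        Ok(text) => Ok(text),
        // Keep both error messages so a double failure is debuggable.
        Err(primary_err) => fallback(audio).map_err(|fallback_err| {
            format!("primary failed: {primary_err}; fallback failed: {fallback_err}")
        }),
    }
}
```

Reporting both errors on a double failure is a design choice of this sketch; the adapter may well log the first error and return only the second.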

Telegram & Discord (send):

File sending via multipart upload (sendDocument / Discord files API)
Image sending with optional captions
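
For readers unfamiliar with multipart uploads, here is a rough sketch of what a `multipart/form-data` body looks like on the wire. `build_multipart_body` is a hypothetical helper written for illustration; the PR most likely relies on its HTTP client's multipart support instead. The sketch does not escape quotes in field or file names.

```rust
/// Build a multipart/form-data body: one part per text field, then one file part.
/// The caller is expected to send it with the header
/// `Content-Type: multipart/form-data; boundary=<boundary>`.
/// Assumption: names and values contain no quotes or CR/LF (no escaping is done).
fn build_multipart_body(
    boundary: &str,
    fields: &[(&str, &str)],
    file_field: &str,
    file_name: &str,
    file_mime: &str,
    file_bytes: &[u8],
) -> Vec<u8> {
    let mut body = Vec::new();
    for (name, value) in fields {
        body.extend_from_slice(format!("--{boundary}\r\n").as_bytes());
        body.extend_from_slice(
            format!("Content-Disposition: form-data; name=\"{name}\"\r\n\r\n{value}\r\n")
                .as_bytes(),
        );
    }
    // File part: disposition with filename, content type, blank line, raw bytes.
    body.extend_from_slice(format!("--{boundary}\r\n").as_bytes());
    body.extend_from_slice(
        format!(
            "Content-Disposition: form-data; name=\"{file_field}\"; filename=\"{file_name}\"\r\n"
        )
        .as_bytes(),
    );
    body.extend_from_slice(format!("Content-Type: {file_mime}\r\n\r\n").as_bytes());
    body.extend_from_slice(file_bytes);
    // Closing delimiter ends with an extra "--".
    body.extend_from_slice(format!("\r\n--{boundary}--\r\n").as_bytes());
    body
}
```

For Telegram's `sendDocument`, the text fields would be `chat_id` and optionally `caption`, and the file field is `document`.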

Discord (receive):

Attachment processing: images via Gemini Vision, text files extracted
Mixed content (text + attachments) handled correctly
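
One plausible reading of "mixed content handled correctly" is that the message text and the per-attachment results end up in a single string for the agent. The sketch below illustrates that idea only; `merge_message` and the blank-line separator are assumptions, not taken from the PR.

```rust
/// Combine the user's text with the text produced for each attachment
/// (image description or extracted file content). Empty pieces are skipped
/// so an attachment-only message does not start with blank lines.
fn merge_message(text: &str, attachment_texts: &[String]) -> String {
    let mut parts: Vec<&str> = Vec::new();
    let trimmed = text.trim();
    if !trimmed.is_empty() {
        parts.push(trimmed);
    }
    for attachment in attachment_texts {
        let attachment = attachment.trim();
        if !attachment.is_empty() {
            parts.push(attachment);
        }
    }
    // Blank line between parts (an arbitrary choice for this sketch).
    parts.join("\n\n")
}
```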

Shared utilities (new media_utils module):

Gemini Vision image recognition
MIME type detection from magic bytes
Text file detection by extension/MIME
HTTP download helper
Attachment-to-text processing pipeline
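
To make the two detection utilities concrete, here is a small sketch of magic-byte MIME sniffing and extension/MIME-based text detection. The function names, the signature table, and the extension list are illustrative assumptions; the actual `media_utils` module may cover different formats.

```rust
/// Guess a MIME type from the leading bytes of a file.
/// Only a handful of well-known signatures are checked here.
fn detect_mime(bytes: &[u8]) -> Option<&'static str> {
    const SIGNATURES: &[(&[u8], &str)] = &[
        (b"\x89PNG\r\n\x1a\n", "image/png"),
        (b"\xFF\xD8\xFF", "image/jpeg"),
        (b"GIF87a", "image/gif"),
        (b"GIF89a", "image/gif"),
        (b"%PDF-", "application/pdf"),
        (b"OggS", "audio/ogg"), // Telegram voice notes are Ogg/Opus
    ];
    for &(magic, mime) in SIGNATURES {
        if bytes.starts_with(magic) {
            return Some(mime);
        }
    }
    // WebP is a RIFF container: "RIFF" + 4-byte size + "WEBP".
    if bytes.len() >= 12 && bytes.starts_with(b"RIFF") && bytes[8..].starts_with(b"WEBP") {
        return Some("image/webp");
    }
    None
}

/// Decide whether a file should be treated as plain text, using the
/// declared MIME type first and the file extension as a fallback.
fn is_text_file(file_name: &str, mime: Option<&str>) -> bool {
    const TEXT_EXTENSIONS: &[&str] =
        &["txt", "md", "json", "toml", "yaml", "yml", "csv", "rs", "py", "log"];
    if let Some(m) = mime {
        if m.starts_with("text/") || m == "application/json" {
            return true;
        }
    }
    match file_name.rsplit_once('.') {
        Some((_, ext)) => {
            let ext = ext.to_ascii_lowercase();
            TEXT_EXTENSIONS.contains(&ext.as_str())
        }
        None => false,
    }
}
```

A pipeline built on these would send images to the vision model, read text files directly, and reject everything else.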

Closes #158

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

nurikk · 2026-03-02T08:55:57Z

crates/openfang-channels/src/media_utils.rs

+    });
+
+    let url = format!(
+        "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key={gemini_key}"


It would be better not to hardcode model names and instead rely on the global model resolution flow through config.toml.

Replace hardcoded model names and API endpoints with LazyLock statics that read from environment variables at first use, with sensible defaults:

- VISION_MODEL (default: gemini-2.5-flash)
- VISION_API_BASE (default: generativelanguage.googleapis.com/v1beta)
- GROQ_STT_MODEL (default: whisper-large-v3-turbo)
- GROQ_STT_URL (default: api.groq.com/openai/v1/audio/transcriptions)
- OPENAI_STT_MODEL (default: whisper-1)
- OPENAI_STT_URL (default: api.openai.com/v1/audio/transcriptions)

This allows users to swap models or providers without recompiling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
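
The pattern described in this commit might look roughly like the sketch below. It is an assumption-laden illustration rather than the committed code: `env_or` and `vision_url` are hypothetical helpers, treating an empty variable as unset is a choice made here, and the `https://` scheme on the default base is inferred from the URL in the reviewed diff. `std::sync::LazyLock` requires Rust 1.80 or later.

```rust
use std::env;
use std::sync::LazyLock;

/// Read an environment variable, falling back to `default` when it is
/// unset, not valid Unicode, or blank (the blank check is an assumption).
fn env_or(name: &str, default: &str) -> String {
    match env::var(name) {
        Ok(value) if !value.trim().is_empty() => value,
        _ => default.to_string(),
    }
}

// Each static is initialised once, on first access.
static VISION_MODEL: LazyLock<String> =
    LazyLock::new(|| env_or("VISION_MODEL", "gemini-2.5-flash"));
static VISION_API_BASE: LazyLock<String> = LazyLock::new(|| {
    env_or(
        "VISION_API_BASE",
        "https://generativelanguage.googleapis.com/v1beta",
    )
});

/// Assemble the generateContent URL from the configurable pieces,
/// mirroring the shape of the previously hardcoded URL.
fn vision_url(api_key: &str) -> String {
    format!(
        "{}/models/{}:generateContent?key={}",
        VISION_API_BASE.trim_end_matches('/'),
        *VISION_MODEL,
        api_key
    )
}
```

Because the values are cached on first use, changing an environment variable after startup has no effect until the process restarts. This also differs from the reviewer's suggestion, which is to resolve models through config.toml rather than through environment variables.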

slysian wants to merge 2 commits into RightNow-AI:main from slysian:pr/multimedia-support