feat(elevenlabs): add speech-to-text transcription support #5171

apappascs · 2026-01-02T15:10:40Z

Implements the Spring AI TranscriptionModel interface for the ElevenLabs Speech-to-Text API, providing audio transcription with support for multiple languages, speaker diarization, word-level timestamps, and advanced transcription features.

Implementation:

- ElevenLabsAudioTranscriptionModel: Main model implementing TranscriptionModel
- ElevenLabsAudioTranscriptionOptions: Configuration with 15 transcription parameters
- ElevenLabsSpeechToTextApi: Low-level API client for ElevenLabs STT endpoints
- ElevenLabsAudioTranscriptionMetadata: Rich metadata including language detection and word timing


Implements Spring AI TranscriptionModel interface for ElevenLabs Speech-to-Text API, providing audio transcription with support for multiple languages, speaker diarization, word-level timestamps, and advanced transcription features.

Implementation:
- ElevenLabsAudioTranscriptionModel: Main model implementing TranscriptionModel
- ElevenLabsAudioTranscriptionOptions: Configuration with 15 transcription parameters
- ElevenLabsSpeechToTextApi: Low-level API client for ElevenLabs STT endpoints
- ElevenLabsAudioTranscriptionMetadata: Rich metadata including language detection and word timing

Key Features:
- Full TranscriptionModel interface compliance
- Speaker diarization with configurable speaker count
- Word-level timestamps and audio event tagging
- Async transcription support via webhook integration
- Language detection with confidence scores
- Builder pattern for model and options construction
- Comprehensive error handling with retry support

Design Decisions:
- Uses transcribe(Resource) convenience method from interface (not call(Resource))
- Result-level metadata for transcription-specific information
- Defensive copying for immutability (List.copyOf, HashMap copy)
- Proper null handling: returns empty AudioTranscription("") instead of null

Testing:
- 15 unit tests with MockRestServiceServer
- 4 integration tests against live API
- Tests cover basic transcription, options, diarization, and convenience methods

Documentation:
- Comprehensive Antora reference documentation
- Module README with TTS and STT capabilities
- Alignment report documenting design decisions vs OpenAI/Azure implementations

All tests passing (26 integration tests total across module).

Signed-off-by: Alexandros Pappas <apappascs@gmail.com>
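To illustrate two of the design decisions listed in the commit message (defensive copying via List.copyOf and a HashMap copy, and returning an empty result instead of null), here is a minimal sketch in plain Java. It is not the code from this PR: `DefensiveCopyDemo`, `Metadata`, `Transcription`, `ofNullable`, and the `languageCode` key are hypothetical names chosen for illustration, and the real ElevenLabsAudioTranscriptionMetadata and AudioTranscription classes may be structured differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; every type here is a hypothetical stand-in.
class DefensiveCopyDemo {

    // Hypothetical metadata holder that protects its state with defensive copies.
    static final class Metadata {

        private final List<String> words;
        private final Map<String, Object> extras;

        Metadata(List<String> words, Map<String, Object> extras) {
            // List.copyOf returns an unmodifiable snapshot of the caller's list.
            this.words = List.copyOf(words);
            // Copy the map so later changes to the caller's map are not visible here.
            this.extras = new HashMap<>(extras);
        }

        List<String> getWords() {
            // Safe to return directly: the list is already unmodifiable.
            return this.words;
        }

        Map<String, Object> getExtras() {
            // Return a fresh copy so callers cannot mutate internal state.
            return new HashMap<>(this.extras);
        }
    }

    // Hypothetical result type that never carries a null text.
    record Transcription(String text) {

        static Transcription ofNullable(String text) {
            // Substitute an empty string rather than propagating null.
            return new Transcription(text == null ? "" : text);
        }
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(List.of("hello", "world"));
        Map<String, Object> extras = new HashMap<>();
        extras.put("languageCode", "en"); // hypothetical metadata key

        Metadata metadata = new Metadata(words, extras);

        // Mutating the inputs afterwards does not affect the metadata object.
        words.add("mutated");
        extras.put("languageCode", "de");

        System.out.println(metadata.getWords());
        System.out.println(metadata.getExtras());
        System.out.println(Transcription.ofNullable(null).text().isEmpty());
    }
}
```

The trade-off in a design like this is a small allocation cost per copy in exchange for objects that callers can share without worrying about later mutation, and for call sites that never need a null check on the transcription text.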

apappascs · 2026-01-02T15:16:11Z

cc @markpollack

apappascs force-pushed the feature/elevenlabs-speech-to-text branch from 9d30cf5 to 4f951dd on January 2, 2026 15:11
