AI Voice Generator API - TTS

AI Voice Generator API — Gemini TTS by Google. Create natural, human-like voices with Gemini TTS at . Features include text-to-speech, multilingual voice support and studio-quality audio generation — with expressive tones for developers and creators. 👉 Check Pricing.

Note

If you want a full editor, consider GeminiGen AI Voice Generator - a web app makes video creation and editing effortless, with powerful AI options, intuitive tools, and a clean, user-friendly interface designed for everyone. Along with a dedicated platform that lets users with API access generate text to speech effortlessly.

✨ Features

Gemini TTS is Google’s most advanced and versatile text-to-speech model. It enables you to generate and customize natural, human-like voices seamlessly through simple input, blending studio-quality audio with expressive flexibility.

With this state-of-the-art model at the core, our app unlocks a suite of powerful tools designed to make audio production effortless and inspiring:

🎙️ Two Powerful Models — Choose between Gemini 2.5 Flash (fast & efficient) and Gemini 2.5 Pro (higher quality, expressive voices).
📝 Text & Document to Speech — Instantly convert text or full documents into natural-sounding audio.
💬 AI Dialogue Generation — Build multi-speaker conversations by adding multiple dialogue blocks, each assigned to a different voice. Perfect for creating roleplays, interviews, podcasts, or dramatized storytelling.
🔊 Multiple Voice Sources — Use Gemini Voices (AI voices), System Voices (native device voices), or even Your Own Custom Voices.
🎭 Custom Prompt Control — Go beyond presets and define your own custom vocal instructions (e.g., soft whisper, news anchor style, bedtime story tone) to fine-tune the output exactly as you need.
😊 Emotion & Style Control — Select from presets (Casual, Excited, Firm, Informative, etc.) or define a custom prompt for fine-grained vocal expression.
🌍 400+ AI Voices — Wide variety of accents, tones, genders, and styles across different languages and countries.
⚡ Speed Adjustment — Fine-tune playback from 0.25x up to 4x for perfect pacing.
🎧 Multiple Output Formats — Export to MP3 or WAV for maximum compatibility.
⭐ Voice Favorites & Management — Save your own preferred voices and reuse them easily.
📊 Real-Time Credit Cost Estimate — See exactly how many credits your generation will use before you hit “Generate Speech”.
🔗 Developer-Friendly API — Simple endpoints to control model, emotion, text, speed, format, and voice selection.

🎯 Usage Guide

🗣️ Text to Speech

Choose Model — Select between Gemini 2.5 Flash (fast, cost-efficient) or Gemini 2.5 Pro (expressive, high-quality).
Enter Text — Type or paste up to 30,000 characters into the text box.
Select Emotion & Style — Pick from presets (Casual, Excited, Firm, Informative, etc.) or use a Custom Prompt for advanced tone control.
Pick a Voice — Browse Gemini Voices, System Voices, or your Saved Custom Voices.
Adjust Speed — Fine-tune playback speed from 0.25x to 4x.
Select Output Format — Choose MP3 or WAV for download.
Check Credits & Cost — The UI displays cost per character.
Generate Speech — Click Generate Speech to synthesize your audio. The file will be available for playback or download.

📄 Document to Speech

Choose Model — Select Gemini 2.5 Flash or Gemini 2.5 Pro depending on your needs (speed vs. quality).
Upload Document — Drag & drop or click to upload supported files: DOCX, XLSX, PPTX, PDF, EPUB, MOBI, TXT, HTML, ODT, ODS, ODP, AZW, AZW3, ...
Pick a Voice — Choose from Gemini Voices, System Voices, or your Saved Custom Voices.
Adjust Speed — Set reading pace from 0.25x to 4x.
Select Output Format — Export the narration as MP3 or WAV.
Check Credits & Cost — The system shows exact credit usage before generating.
Generate Speech — Hit Generate Speech and your document will be converted into high-quality spoken audio.

💬 Dialogue TTS (Multi-Voice Conversations)

Choose Model — Pick between Gemini 2.5 Flash or Gemini 2.5 Pro depending on your use case.
Assign Voices — Select Voice 1, Voice 2 from the Gemini Voices list.
Add Dialogue Blocks — Each block is tied to a specific voice (e.g., Voice 1, Voice 2).
Enter Dialogues — Type the spoken lines for each voice in their respective blocks.
Customize Emotion & Style — Apply presets or use a Custom Prompt (e.g., whispering, news anchor style, bedtime storyteller).
Adjust Speed — Control playback pacing between 0.25x and 4x.
Select Output Format — Export the conversation as MP3 or WAV.
Check Credits & Cost — Preview exact usage before generating.
Generate Dialogue — Click Generate to create a multi-speaker audio simulating natural conversation.

💡 Tips for Best Results

✍️ Be clear with text: Write clean, well-punctuated sentences for more natural speech.
🎭 Use emotions wisely: Select an emotion (Casual, Excited, Firm, etc.) or a custom prompt to match the mood of your content.
🌍 Pick the right voice: Choose voices that fit your audience’s language, accent, and gender preference.
⚡ Balance model choice: Use Gemini 2.5 Flash for speed and cost savings, or Gemini 2.5 Pro for expressive, studio-quality narration.
🎧 Adjust speed: Fine-tune playback speed (0.25x–4x) to suit audiobooks, learning materials, or fast demos.
📄 Leverage Document to Speech: For long content (PDFs, DOCX, EPUB, etc.), upload documents directly instead of copy-pasting text.

🔊 Results & Examples

See the power of AI text-to-speech (TTS) in action. Easily generate natural, expressive speech from any text with customizable voices, speeds, and emotions.
We will take the example: convert a short sentence into realistic speech.

📄 Text/Document to speech

Output audio: Listen to audio

💬 Dialogue

Output audio: Listen to audio

🤝 Contributing

We warmly welcome contributions in the form of:

💡 Feature Suggestions — Share ideas to improve or expand the app’s capabilities.
🛠 Enhancements — Propose improvements to performance, UI/UX, or AI model usage.
🐞 Bug Reports — Help us identify and fix issues to ensure a smooth user experience.
📈 Collaboration — We are open to discussions about potential partnerships or integrations.

👉 If you’d like to contribute, please reach out via Issues or contact us directly at contact@geminigen.ai

🔗 Links & Resources

👨‍💻 Our website: https://geminigen.ai
📚 Our API Documentation: https://docs.geminigen.ai

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
assets		assets
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

AI Voice Generator API - TTS

✨ Features

🎯 Usage Guide

🗣️ Text to Speech

📄 Document to Speech

💬 Dialogue TTS (Multi-Voice Conversations)

💡 Tips for Best Results

🔊 Results & Examples

📄 Text/Document to speech

💬 Dialogue

🤝 Contributing

🔗 Links & Resources

About

Uh oh!

Releases

Packages

License

GeminiGenAI/TTS-AI-API

Folders and files

Latest commit

History

Repository files navigation

AI Voice Generator API - TTS

✨ Features

🎯 Usage Guide

🗣️ Text to Speech

📄 Document to Speech

💬 Dialogue TTS (Multi-Voice Conversations)

💡 Tips for Best Results

🔊 Results & Examples

📄 Text/Document to speech

💬 Dialogue

🤝 Contributing

🔗 Links & Resources

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages