Revolutionizing Voice Conversion with State-of-the-Art AI Technology
The Ultimate Voice Conversion Experience - Powered by Advanced AI Algorithms
- Overview
- Key Features
- Performance Improvements
- Installation
- Getting Started
- UI Components
- Multi-language Support
- YouTube Audio Processing
- Text-to-Speech (TTS)
- Audio Separation
- Realtime Voice Changer
- Settings & Configuration
- Terms of Use
- Disclaimer
- Credits
Advanced RVC Inference V3.1 is a WebUI for fast, straightforward voice conversion inference. Built on Applio with significant enhancements, it aims to provide a comprehensive and user-friendly voice conversion experience.
Perfect for: Content creators, voice actors, musicians, researchers, and AI enthusiasts
- RVC Inference: High-quality voice conversion with multiple pitch extraction algorithms (RMVPE, CREPE, FCPE, SWIFT)
- V1 & V2 Model Support: Full compatibility with both RVC model generations
- Multiple Embedder Models: ContentVec, Chinese-Hubert, Japanese-Hubert, Korean-Hubert, and custom support
- Pitch Control: Adjustable pitch with autotune capabilities
- Index Rate Management: Precision control over voice characteristics
- YouTube Audio Downloader: Direct download from YouTube with WAV format support
- Multi-format Support: WAV, MP3, FLAC, OGG, M4A, AAC, ALAC and more
- Audio Separation: Advanced vocal separation using Mel-Roformer, BS-Roformer, and MDX23C models
- Post-processing Effects: Reverb, volume control, and audio enhancement tools
- 150+ TTS Voices: Access to more than 150 high-quality voices across multiple languages
- Speech Rate Control: Adjustable speed from -50% to +50%
- Voice Customization: Tone, pitch, and expression controls
- Low-latency Processing: Real-time voice conversion with minimal delay
- Audio Device Management: Support for ASIO, WASAPI, and standard audio devices
- VAD (Voice Activity Detection): Automatic silence detection and processing
- Cross-platform Support: Works on Windows, macOS, and Linux
- Gradio 5.23.1 Integration: Modern, responsive interface with advanced features
- Multi-tab Interface: Organized workflow with dedicated sections
- GPU Acceleration: Automatic hardware utilization detection
- Theme Support: Customizable appearance and dark/light modes
- 16+ Languages Supported: Internationalization with growing community support
- Auto-detection: System language recognition with manual override
- Easy Translation System: Community-driven translation improvements
Advanced RVC Inference V3.1 has been significantly optimized with the following enhancements:
- Caching Mechanism: Prevents repeated file system operations, reducing I/O overhead by up to 90%
- Time-based Refresh: Directory scans happen only every 30 seconds, preventing unnecessary loops
- Efficient Memory Usage: Optimized data structures and reduced memory footprint
- Lazy Loading: Components load only when needed, improving startup time
- Modern Gradio Syntax: Updated all deprecated `__type__` calls to the `gr.update()` method
- Error Handling: Improved error catching and user notifications
- Responsive Design: Better UI responsiveness with reduced lag
- Optimized Event Handling: Cleaner event chains for better performance
- Directory Scanning: Repeated O(n) directory scans replaced by an O(1) lookup of the cached result between refreshes
- UI Updates: Up to 5x faster response times for dropdown refreshes
- Memory Usage: 30% reduction in memory consumption during operations
- Stability: Eliminated crashes from circular dependencies and syntax errors
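The caching and time-based refresh described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the class name `CachedDirectoryListing`, its `scan_count` counter, and the injectable `clock` are hypothetical, and only the 30-second interval comes from this README.

```python
import os
import time


class CachedDirectoryListing:
    """Cache a directory listing and rescan only after `ttl_seconds` have passed.

    Hypothetical helper for illustration; the project's own cache may differ.
    """

    def __init__(self, path, ttl_seconds=30.0, clock=time.monotonic):
        self.path = path
        self.ttl_seconds = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested without sleeping
        self._entries = None
        self._scanned_at = None
        self.scan_count = 0        # number of real file system scans performed

    def list(self, force=False):
        """Return the cached listing, rescanning if it expired or `force` is set."""
        now = self._clock()
        expired = (
            self._scanned_at is None
            or now - self._scanned_at >= self.ttl_seconds
        )
        if force or expired:
            self._entries = sorted(os.listdir(self.path))
            self._scanned_at = now
            self.scan_count += 1
        return list(self._entries)  # copy, so callers cannot mutate the cache
```

A refresh button in the UI would map naturally onto `list(force=True)`, while dropdown updates would call `list()` and hit the cache in most cases.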
- Python 3.8 or higher
- FFmpeg (for audio processing)
- Git (for cloning the repository)
- Clone the Repository

      git clone https://github.com/ArkanDash/Advanced-RVC-Inference.git
      cd Advanced-RVC-Inference

- Install Python Dependencies

      pip install -r requirements.txt

- Install FFmpeg

  Download and add to your system PATH, or follow OS-specific installation guides:
  - Windows: Use Chocolatey: `choco install ffmpeg`
  - macOS: Use Homebrew: `brew install ffmpeg`
  - Linux: `sudo apt install ffmpeg` (Ubuntu/Debian) or `sudo dnf install ffmpeg` (Fedora)

- (Optional) Install GPU Support

  For NVIDIA GPU acceleration:

      pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
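
After installing the GPU build of PyTorch, a quick check like the one below can confirm whether CUDA is visible. It is a small sketch using standard PyTorch calls; the function name `pick_device` is hypothetical and not part of this project.

```python
import importlib.util


def pick_device():
    """Return "cuda" when PyTorch is installed and sees a CUDA GPU, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed at all
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

If this prints `cpu` on a machine with an NVIDIA GPU, the installed PyTorch wheel is probably a CPU-only build or the driver is missing.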
Run the application locally:

    python app.py

Run it with the `--share` flag:

    python app.py --share

Click the "Open in Colab" badge at the top of this README to run in your browser without any local installation.
- Launch the application
- Access the UI at the URL shown in your terminal
- Place your models in the `logs` folder (create subfolders for each model)
- Add audio files to `audio_files/original_files/` for processing
- Refresh the UI using the refresh buttons to load new content
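
To check that models are laid out as the steps above expect (one subfolder per model inside the logs folder), a scan along these lines can help. It is a rough sketch under assumptions: `find_models` is a hypothetical helper, and the rule of taking the first `.pth` and first `.index` file in each subfolder is an illustration, not the application's documented matching logic.

```python
from pathlib import Path


def find_models(logs_dir="logs"):
    """Map each model subfolder name to its .pth weights and optional .index file."""
    models = {}
    root = Path(logs_dir)
    if not root.is_dir():
        return models
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        weights = sorted(folder.glob("*.pth"))
        if not weights:
            continue  # no weights file, so not treated as a model folder
        indexes = sorted(folder.glob("*.index"))
        models[folder.name] = {
            "pth": weights[0],
            "index": indexes[0] if indexes else None,  # index file is optional
        }
    return models


if __name__ == "__main__":
    for name, files in find_models().items():
        print(name, files["pth"].name, files["index"])
```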
- Voice Model Selection: Choose from all available RVC models
- Index File Matching: Automatic index file detection and pairing
- Audio Input: Upload or select from existing audio files
- Advanced RVC Settings: Pitch, filtering, blending, protection ratios
- Audio Post-processing: Reverb, volume adjustments, and export format selection
- Process Control: Split audio, pitch extraction algorithms, embedder selection
- URL-based Downloads: Direct download from various sources
- File Drop Support: Drag and drop .pth and .index files directly
- Automatic Organization: Files automatically placed in correct model folders
- Text Input: Multi-line text area for input
- Voice Selection: 150+ voices with preview names
- Speech Rate Control: Adjustable speed from -50% to +50%
- Output Configuration: Customizable file naming and format selection
- Language Settings: 16+ language support with auto-detection
- Theme Management: Light/dark mode and color customization
- Audio Preferences: Format defaults and file handling
- Performance Options: Thread management and optimization settings
- Notification Controls: Completion and error notifications
- File Management: Backup and cleanup utilities
- Debug Options: Logging and error tracking
Advanced RVC Inference now supports 16+ languages with community-driven translations, making it accessible to users worldwide.
- English (US) - `en_US`
- German (Deutsch) - `de_DE`
- Spanish (Español) - `es_ES`
- French (Français) - `fr_FR`
- Indonesian (Bahasa Indonesia) - `id_ID`
- Japanese (日本語) - `ja_JP`
- Portuguese (Português) - `pt_BR`
- Chinese (中文) - `zh_CN`
- Arabic (العربية) - `ar_SA`
- Hindi (हिन्दी) - `hi_IN`
- Italian (Italiano) - `it_IT`
- Korean (한국어) - `ko_KR`
- Dutch (Nederlands) - `nl_NL`
- Polish (Polski) - `pl_PL`
- Russian (Русский) - `ru_RU`
- Turkish (Türkçe) - `tr_TR`
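
Auto-detection with a manual override could work roughly as sketched here. The locale codes are the ones listed above; the function `resolve_language`, the fallback to `en_US`, and the language-only matching (for example `pt_PT` resolving to `pt_BR`) are assumptions made for illustration, not the application's documented behavior.

```python
SUPPORTED = [
    "en_US", "de_DE", "es_ES", "fr_FR", "id_ID", "ja_JP", "pt_BR", "zh_CN",
    "ar_SA", "hi_IN", "it_IT", "ko_KR", "nl_NL", "pl_PL", "ru_RU", "tr_TR",
]


def resolve_language(system_locale, override=None, default="en_US"):
    """Pick a supported locale code, preferring the manual override."""
    for candidate in (override, system_locale):
        if not candidate:
            continue
        # Normalize values such as "de_DE.UTF-8" or "de-DE" to "de_DE".
        code = candidate.split(".")[0].replace("-", "_")
        if code in SUPPORTED:
            return code
        # Otherwise fall back to the first entry with the same language part.
        language = code.split("_")[0].lower()
        for supported in SUPPORTED:
            if supported.split("_")[0] == language:
                return supported
    return default
```

In practice the first argument would come from something like `locale.getlocale()[0]`, which may be `None` on some systems; the sketch then returns the default.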
- Launch the application
- Navigate to the "Settings" tab
- Select "Language" sub-tab
- Choose your preferred language from the dropdown
- Restart the application for changes to take effect
We welcome translations from the community! If you'd like to add support for your language or improve existing translations, please follow our Translation Guide.
- Go to the "Full Inference" tab
- Navigate to the "Download Music" sub-tab
- Paste your YouTube URL in the text box
- Click "Download" to process the audio
- Audio will be available in the audio selection dropdown after completion
- YouTube URLs
- Other popular video platforms
- Audio file links (when compatible)
- Automatic format conversion to WAV
- Preserved original quality
- Organized storage in `audio_files/original_files/`
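
For audio that was obtained outside the UI, the same WAV conversion and storage layout can be approximated with FFmpeg. The sketch below only builds the command; `build_wav_command` is a hypothetical helper, and the README does not state which tools or options the application itself uses.

```python
from pathlib import Path


def build_wav_command(source, out_dir="audio_files/original_files"):
    """Build an FFmpeg command that converts `source` to a WAV file in `out_dir`."""
    source = Path(source)
    target = Path(out_dir) / (source.stem + ".wav")
    # -y overwrites an existing output file; -vn drops any video stream.
    return ["ffmpeg", "-y", "-i", str(source), "-vn", str(target)]


if __name__ == "__main__":
    print(" ".join(build_wav_command("downloads/song.m4a")))
```

Passing the returned list to `subprocess.run(cmd, check=True)` runs the conversion, provided FFmpeg is on the PATH and the output folder exists.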
- Navigate to the "TTS" tab
- Enter your text in the multi-line text area
- Select your preferred voice from 150+ options
- Adjust speech rate (-50% to +50%) as needed
- Optionally specify output filename
- Click "Generate Speech" to create the audio file
- Voice Selection: Multiple voices per language with expressive capabilities
- Speed Control: Adjustable from very slow to very fast
- Output Format: WAV, MP3, FLAC, OGG support
- Quality Settings: High-quality synthesis with natural intonation
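
The -50% to +50% speech rate range can be enforced with a small helper such as the one below. The signed percent string it produces (for example `+20%`) is a format some TTS engines accept; this README does not name the engine used here, so treat both the format and the `format_rate` name as assumptions.

```python
def format_rate(percent):
    """Clamp a speech rate to the -50..+50 range and format it as a signed percent."""
    clamped = max(-50, min(50, int(round(percent))))
    return f"{clamped:+d}%"


if __name__ == "__main__":
    for value in (20, -80, 0):
        print(value, "->", format_rate(value))
```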
Advanced RVC Inference includes powerful audio separation capabilities:
- Mel-Roformer by KimberleyJSN: State-of-the-art vocal isolation
- BS-Roformer by ViperX: High-quality instrumental separation
- MDX23C: Advanced neural network processing
- Karaoke Models: Separate vocals from instrumentals
- Dereverb Models: Remove reverb and room effects
- Deecho Models: Eliminate echo and acoustic artifacts
- Denoise Models: Reduce background noise and artifacts
- Upload or select audio file
- Choose separation model type
- Configure processing parameters
- Select input/output devices (for realtime)
- Start the separation process
- Process separated tracks with RVC
- Apply post-processing effects
- Export final audio in desired format
The advanced realtime voice changer offers:
- Input Device: Microphone or audio interface selection
- Output Device: Virtual cable or speaker configuration
- Monitor Device: Separate monitoring path (optional)
- Input/Output Gain: Independent volume controls (0-200%)
- ASIO Channel Selection: Specific channel routing (-1 to 16)
- WASAPI Exclusive Mode: Lower latency on Windows
- VAD Sensitivity: Voice Activity Detection (0-5)
- Pitch Control: Range from -24 to +24 semitones
- Autotune: Soft auto-tuning with adjustable strength
- Proposed Pitch: Automatic pitch adjustment for voice range
- Speaker ID: Multi-speaker model selection
- Chunk Size: Buffer size control (2.7ms - 2730.7ms)
- Crossfade Overlap: Audio transition smoothing (0.05s - 0.2s)
- Extra Conversion: Context buffer (0.1s - 5.0s)
- Silence Threshold: Noise floor detection (-90dB to -60dB)
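
Several of the ranges above are easier to reason about as plain numbers. The sketch below uses standard audio formulas: a pitch shift of n semitones scales frequency by 2^(n/12), and a level in dB maps to a linear amplitude of 10^(dB/20). The 48 kHz sample rate in `chunk_ms` is an inference, chosen because 128 and 131072 samples at that rate give the documented 2.7 ms and 2730.7 ms limits; the README itself does not state a sample rate.

```python
def semitones_to_ratio(semitones):
    """Frequency ratio for a pitch shift given in semitones."""
    return 2.0 ** (semitones / 12.0)


def db_to_amplitude(db):
    """Linear amplitude (1.0 = full scale) for a level in decibels."""
    return 10.0 ** (db / 20.0)


def chunk_ms(samples, sample_rate=48000):
    """Duration in milliseconds of a buffer of `samples` frames (48 kHz assumed)."""
    return 1000.0 * samples / sample_rate


if __name__ == "__main__":
    print(semitones_to_ratio(12))       # one octave up doubles the frequency
    print(db_to_amplitude(-60))         # upper end of the silence threshold range
    print(round(chunk_ms(128), 1))      # smallest chunk under the 48 kHz assumption
    print(round(chunk_ms(131072), 1))   # largest chunk under the 48 kHz assumption
```

Smaller chunks reduce latency but leave less audio context per conversion step, which is the trade-off the Chunk Size and Extra Conversion settings expose.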
- Theme Mode: Light or dark mode selection
- Primary Color: 9 color options (red, orange, yellow, green, blue, purple, pink, slate, gray)
- Font Size: Small, medium, or large text options
- Max Threads: 1-16 thread configuration
- Memory Optimization: Automatic memory management
- GPU Acceleration: Enable/disable hardware acceleration
- Completion Notifications: Success/failure alerts
- Error Notifications: Issue reporting
- Sound Effects: Audio feedback for events
- Auto Cleanup: Automatic temporary file removal
- Cleanup Interval: Schedule (1-168 hours)
- Backup System: Configuration and model preservation
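
The Auto Cleanup and Cleanup Interval settings amount to deleting temporary files older than a chosen age. A minimal sketch of that idea follows, assuming a flat folder of temporary files; `remove_stale_files` is a hypothetical helper and the application's real cleanup routine may differ.

```python
import os
import time
from pathlib import Path


def remove_stale_files(folder, max_age_hours, now=None):
    """Delete files in `folder` last modified more than `max_age_hours` ago.

    Returns the sorted names of the removed files.
    """
    if not 1 <= max_age_hours <= 168:  # same range as the Cleanup Interval setting
        raise ValueError("max_age_hours must be between 1 and 168")
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    removed = []
    for path in Path(folder).iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Subfolders are left untouched here, so model folders and backups would need to live outside the directory being cleaned.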
The converted voices must not be used for:
- Harmful Content: Criticizing, attacking, or defaming individuals
- Political/Religious Propaganda: Advocating or opposing political positions, religions, or ideologies
- Inappropriate Content: Public display of strongly stimulating expressions without proper content warnings
- Commercial Exploitation: Selling voice models, generated voice clips, or monetizing without proper licensing
- Identity Fraud: Malicious impersonation of original voice owners or fraudulent activities
- Deceptive Practices: Identity theft, deceptive calls, or misleading communications
- Personal Entertainment: Non-commercial creative projects
- Artistic Expression: Music, comedy, and entertainment applications
- Educational Purposes: Academic research and learning
- Accessibility: Tools for those with speech difficulties
The author is not liable for any direct, indirect, consequential, incidental, or special damages arising from the use, misuse, or inability to use this software.
- Keep your voice models secure
- Do not share sensitive personal voice data
- Use appropriate content filters
- Be responsible with generated content
- Respect the rights of voice owners
- Obtain proper permissions when required
- Follow local laws and regulations
- Use the technology ethically and responsibly
- Applio: Original project foundation and core RVC implementation
- RVC Project: Core voice conversion technology
- Shirou's RVC AI Cover Maker UI: Initial project structure
- ArkanDash: Project owner and lead developer
This is an open-source project. Contributions, bug reports, and feature suggestions are welcome through GitHub issues and pull requests.