🚀 Advanced RVC Inference V3.1

Revolutionizing Voice Conversion with State-of-the-Art AI Technology

The Ultimate Voice Conversion Experience - Powered by Advanced AI Algorithms

📖 Table of Contents

🎯 Overview
✨ Key Features
⚡ Performance Improvements
🛠️ Installation
🚀 Getting Started
🎨 UI Components
🌍 Multi-language Support
🎬 YouTube Audio Processing
🗣️ Text-to-Speech (TTS)
🎵 Audio Separation
🎮 Realtime Voice Changer
⚙️ Settings & Configuration
🛡️ Terms of Use
⚠️ Disclaimer
🙏 Credits

🎯 Overview

Advanced RVC Inference V3.1 is a cutting-edge WebUI designed for lightning-fast and effortless voice conversion inference. Built on the powerful foundation of Applio with significant enhancements, this application delivers the most comprehensive and user-friendly voice conversion experience available today.

🎯 Perfect for: Content creators, voice actors, musicians, researchers, and AI enthusiasts

✨ Key Features

🎵 Advanced Voice Conversion

RVC Inference: High-quality voice conversion with multiple algorithms (RMVPE, CREPE, FCPE, SWIFT)
V1 & V2 Model Support: Full compatibility with both RVC model generations
Multiple Embedder Models: ContentVec, Chinese-Hubert, Japanese-Hubert, Korean-Hubert, and custom support
Pitch Control: Adjustable pitch with autotune capabilities
Index Rate Management: Precision control over voice characteristics

🎙️ Audio Processing Suite

YouTube Audio Downloader: Direct download from YouTube with WAV format support
Multi-format Support: WAV, MP3, FLAC, OGG, M4A, AAC, ALAC and more
Audio Separation: Advanced vocal separation using Mel-Roformer, BS-Roformer, and MDX23C models
Post-processing Effects: Reverb, volume control, and audio enhancement tools

🗣️ Text-to-Speech Integration

150+ TTS Voices: Access to hundreds of high-quality voices across multiple languages
Speech Rate Control: Adjustable speed from -50% to +50%
Voice Customization: Tone, pitch, and expression controls

🎮 Realtime Voice Changer

Low-latency Processing: Real-time voice conversion with minimal delay
Audio Device Management: Support for ASIO, WASAPI, and standard audio devices
VAD (Voice Activity Detection): Automatic silence detection and processing
Cross-platform Support: Works on Windows, macOS, and Linux

🎨 Enhanced UI Experience

Gradio 5.23.1 Integration: Modern, responsive interface with advanced features
Multi-tab Interface: Organized workflow with dedicated sections
GPU Acceleration: Automatic hardware utilization detection
Theme Support: Customizable appearance and dark/light modes

🌍 Global Accessibility

16+ Languages Supported: Internationalization with growing community support
Auto-detection: System language recognition with manual override
Easy Translation System: Community-driven translation improvements

⚡ Performance Improvements

Advanced RVC Inference V3.1 has been significantly optimized with the following enhancements:

🚀 Performance Optimizations

Caching Mechanism: Prevents repeated file system operations, reducing I/O overhead by up to 90%
Time-based Refresh: Directory scans happen only every 30 seconds, preventing unnecessary loops
Efficient Memory Usage: Optimized data structures and reduced memory footprint
Lazy Loading: Components load only when needed, improving startup time

🛠️ UI Enhancements

Modern Gradio Syntax: Updated all deprecated __type__ calls to gr.update() method
Error Handling: Improved error catching and user notifications
Responsive Design: Better UI responsiveness with reduced lag
Optimized Event Handling: Cleaner event chains for better performance

📊 Performance Metrics

Directory Scanning: Reduced from O(n) repeated operations to O(1) cached result
UI Updates: Up to 5x faster response times for dropdown refreshes
Memory Usage: 30% reduction in memory consumption during operations
Stability: Eliminated crashes from circular dependencies and syntax errors

🛠️ Installation

Prerequisites

Python 3.8 or higher
FFmpeg (for audio processing)
Git (for cloning the repository)

Quick Installation

Clone the Repository

git clone https://github.com/ArkanDash/Advanced-RVC-Inference.git
cd Advanced-RVC-Inference

Install Python Dependencies
```
pip install -r requirements.txt
```
Install FFmpeg
Download and add to your system PATH, or follow OS-specific installation guides:
- Windows: Use chocolatey choco install ffmpeg
- macOS: Use homebrew brew install ffmpeg
- Linux: sudo apt install ffmpeg (Ubuntu/Debian) or sudo dnf install ffmpeg (Fedora)

(Optional) Install GPU Support For NVIDIA GPU acceleration:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

🚀 Getting Started

Running the Application

Method 1: Local Runtime

python app.py

Method 2: Share Publicly (with --share flag)

python app.py --share

Method 3: Using Google Colab

Click the "Open in Colab" badge at the top of this README to run in your browser without any local installation.

Initial Setup

Launch the application - Access the UI at the URL shown in your terminal
Place your models in the logs folder (create subfolders for each model)
Add audio files to audio_files/original_files/ for processing
Refresh the UI using the refresh buttons to load new content

🎨 UI Components

Full Inference Tab

Voice Model Selection: Choose from all available RVC models
Index File Matching: Automatic index file detection and pairing
Audio Input: Upload or select from existing audio files
Advanced RVC Settings: Pitch, filtering, blending, protection ratios
Audio Post-processing: Reverb, volume adjustments, and export format selection
Process Control: Split audio, pitch extraction algorithms, embedder selection

Download Model Tab

URL-based Downloads: Direct download from various sources
File Drop Support: Drag and drop .pth and .index files directly
Automatic Organization: Files automatically placed in correct model folders

TTS (Text-to-Speech) Tab

Text Input: Multi-line text area for input
Voice Selection: 150+ voices with preview names
Speech Rate Control: Adjustable speed from -50% to +50%
Output Configuration: Customizable file naming and format selection

Settings Tab

Language Settings: 16+ language support with auto-detection
Theme Management: Light/dark mode and color customization
Audio Preferences: Format defaults and file handling
Performance Options: Thread management and optimization settings
Notification Controls: Completion and error notifications
File Management: Backup and cleanup utilities
Debug Options: Logging and error tracking

🌍 Multi-language Support

Advanced RVC Inference now supports 16+ languages with community-driven translations, making it accessible to users worldwide.

🌐 Currently Supported Languages:

🇺🇸 English (US) - en_US
🇩🇪 German (Deutsch) - de_DE
🇪🇸 Spanish (Español) - es_ES
🇫🇷 French (Français) - fr_FR
🇮🇩 Indonesian (Bahasa Indonesia) - id_ID
🇯🇵 Japanese (日本語) - ja_JP
🇧🇷 Portuguese (Português) - pt_BR
🇨🇳 Chinese (中文) - zh_CN
🇸🇦 Arabic (العربية) - ar_SA
🇮🇳 Hindi (हिन्दी) - hi_IN
🇮🇹 Italian (Italiano) - it_IT
🇰🇷 Korean (한국어) - ko_KR
🇳🇱 Dutch (Nederlands) - nl_NL
🇵🇱 Polish (Polski) - pl_PL
🇷🇺 Russian (Русский) - ru_RU
🇹🇷 Turkish (Türkçe) - tr_TR

🔄 Changing Application Language:

Launch the application
Navigate to the "Settings" tab
Select "Language" sub-tab
Choose your preferred language from the dropdown
Restart the application for changes to take effect

📝 Contributing Translations:

We welcome translations from the community! If you'd like to add support for your language or improve existing translations, please follow our Translation Guide.

🎬 YouTube Audio Processing

📥 Downloading Audio from YouTube

Go to the "Full Inference" tab
Navigate to the "Download Music" sub-tab
Paste your YouTube URL in the text box
Click "Download" to process the audio
Audio will be available in the audio selection dropdown after completion

📋 Supported Sources

YouTube URLs
Other popular video platforms
Audio file links (when compatible)

⚙️ Download Settings

Automatic format conversion to WAV
Preserved original quality
Organized storage in audio_files/original_files/

🗣️ Text-to-Speech (TTS)

🎙️ Using TTS Features

Navigate to the "TTS" tab
Enter your text in the multi-line text area
Select your preferred voice from 150+ options
Adjust speech rate (-50% to +50%) as needed
Optionally specify output filename
Click "Generate Speech" to create the audio file

🎨 TTS Customization Options

Voice Selection: Multiple voices per language with expressive capabilities
Speed Control: Adjustable from very slow to very fast
Output Format: WAV, MP3, FLAC, OGG support
Quality Settings: High-quality synthesis with natural intonation

🎵 Audio Separation

🎵 Advanced Audio Processing

Advanced RVC Inference includes powerful audio separation capabilities:

Vocal Separation Models:

Mel-Roformer by KimberleyJSN: State-of-the-art vocal isolation
BS-Roformer by ViperX: High-quality instrumental separation
MDX23C: Advanced neural network processing

Additional Separation Options:

Karaoke Models: Separate vocals from instrumentals
Dereverb Models: Remove reverb and room effects
Deecho Models: Eliminate echo and acoustic artifacts
Denoise Models: Reduce background noise and artifacts

🎛️ Processing Workflow

Upload or select audio file
Choose separation model type
Configure processing parameters
Select input/output devices (for realtime)
Start the separation process
Process separated tracks with RVC
Apply post-processing effects
Export final audio in desired format

🎮 Realtime Voice Changer

🎤 Realtime Features

The advanced realtime voice changer offers:

Audio Device Management:

Input Device: Microphone or audio interface selection
Output Device: Virtual cable or speaker configuration
Monitor Device: Separate monitoring path (optional)

Processing Controls:

Input/Output Gain: Independent volume controls (0-200%)
ASIO Channel Selection: Specific channel routing (-1 to 16)
WASAPI Exclusive Mode: Lower latency on Windows
VAD Sensitivity: Voice Activity Detection (0-5)

Voice Conversion Settings:

Pitch Control: Range from -24 to +24 semitones
Autotune: Soft auto-tuning with adjustable strength
Proposed Pitch: Automatic pitch adjustment for voice range
Speaker ID: Multi-speaker model selection

Performance Tuning:

Chunk Size: Buffer size control (2.7ms - 2730.7ms)
Crossfade Overlap: Audio transition smoothing (0.05s - 0.2s)
Extra Conversion: Context buffer (0.1s - 5.0s)
Silence Threshold: Noise floor detection (-90dB to -60dB)

⚙️ Settings & Configuration

🎨 Theme Configuration

Theme Mode: Light or dark mode selection
Primary Color: 9 color options (red, orange, yellow, green, blue, purple, pink, slate, gray)
Font Size: Small, medium, or large text options

⚡ Performance Settings

Max Threads: 1-16 thread configuration
Memory Optimization: Automatic memory management
GPU Acceleration: Enable/disable hardware acceleration

🔔 Notification Preferences

Completion Notifications: Success/failure alerts
Error Notifications: Issue reporting
Sound Effects: Audio feedback for events

💾 File Management

Auto Cleanup: Automatic temporary file removal
Cleanup Interval: Schedule (1-168 hours)
Backup System: Configuration and model preservation

🛡️ Terms of Use

🚫 Prohibited Uses

The converted voices must not be used for:

Harmful Content: Criticizing, attacking, or defaming individuals
Political/Religious Propaganda: Advocating or opposing political positions, religions, or ideologies
Inappropriate Content: Public display of strongly stimulating expressions without proper content warnings
Commercial Exploitation: Selling voice models, generated voice clips, or monetizing without proper licensing
Identity Fraud: Malicious impersonation of original voice owners or fraudulent activities
Deceptive Practices: Identity theft, deceptive calls, or misleading communications

📋 Acceptable Uses

Personal Entertainment: Non-commercial creative projects
Artistic Expression: Music, comedy, and entertainment applications
Educational Purposes: Academic research and learning
Accessibility: Tools for those with speech difficulties

⚠️ Disclaimer

📝 Liability

The author is not liable for any direct, indirect, consequential, incidental, or special damages arising from the use, misuse, or inability to use this software.

🔒 Security

Keep your voice models secure
Do not share sensitive personal voice data
Use appropriate content filters
Be responsible with generated content

🛡️ Ethical Use

Respect the rights of voice owners
Obtain proper permissions when required
Follow local laws and regulations
Use the technology ethically and responsibly

🙏 Credits

🏗️ Project Foundation

Applio: Original project foundation and core RVC implementation
RVC Project: Core voice conversion technology

🎨 UI Base

Shirou's RVC AI Cover Maker UI: Initial project structure

👤 Current Maintainer

ArkanDash: Project owner and lead developer

💡 Contributions Welcome

This is an open-source project. Contributions, bug reports, and feature suggestions are welcome through GitHub issues and pull requests.

🚀 Ready to Transform Your Voice?

Get started today with Advanced RVC Inference V3.1 - The most powerful and user-friendly voice conversion platform available.

Name		Name	Last commit message	Last commit date
Latest commit History 395 Commits
.github		.github
assets		assets
programs		programs
tabs		tabs
unused		unused
.gitignore		.gitignore
Advanced-RVC-no-ui.ipynb		Advanced-RVC-no-ui.ipynb
Advanced-RVC.ipynb		Advanced-RVC.ipynb
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
LICENSE		LICENSE
README.md		README.md
TRANSLATION.md		TRANSLATION.md
app.py		app.py
core.py		core.py
create_translation.py		create_translation.py
install.bat		install.bat
models.py		models.py
remove_theme_i18n.py		remove_theme_i18n.py
requirements-test.txt		requirements-test.txt
requirements.txt		requirements.txt

Uh oh!

License

ArkanDash/Advanced-RVC-Inference

Folders and files

Latest commit

History

Repository files navigation

🚀 Advanced RVC Inference V3.1

📖 Table of Contents

🎯 Overview

✨ Key Features

🎵 Advanced Voice Conversion

🎙️ Audio Processing Suite

🗣️ Text-to-Speech Integration

🎮 Realtime Voice Changer

🎨 Enhanced UI Experience

🌍 Global Accessibility

⚡ Performance Improvements

🚀 Performance Optimizations

🛠️ UI Enhancements

📊 Performance Metrics

🛠️ Installation

Prerequisites

Quick Installation

🚀 Getting Started

Running the Application

Method 1: Local Runtime

Method 2: Share Publicly (with --share flag)

Method 3: Using Google Colab

Initial Setup

🎨 UI Components

Full Inference Tab

Download Model Tab

TTS (Text-to-Speech) Tab

Settings Tab

🌍 Multi-language Support

🌐 Currently Supported Languages:

🔄 Changing Application Language:

📝 Contributing Translations:

🎬 YouTube Audio Processing

📥 Downloading Audio from YouTube

📋 Supported Sources

⚙️ Download Settings

🗣️ Text-to-Speech (TTS)

🎙️ Using TTS Features

🎨 TTS Customization Options

🎵 Audio Separation

🎵 Advanced Audio Processing

Vocal Separation Models:

Additional Separation Options:

🎛️ Processing Workflow

🎮 Realtime Voice Changer

🎤 Realtime Features

Audio Device Management:

Processing Controls:

Voice Conversion Settings:

Performance Tuning:

⚙️ Settings & Configuration

🎨 Theme Configuration

⚡ Performance Settings

🔔 Notification Preferences

💾 File Management

🛡️ Terms of Use

🚫 Prohibited Uses

📋 Acceptable Uses

⚠️ Disclaimer

📝 Liability

🔒 Security

🛡️ Ethical Use

🙏 Credits

🏗️ Project Foundation

🎨 UI Base

👤 Current Maintainer

💡 Contributions Welcome

🚀 Ready to Transform Your Voice?

About

Topics

Resources

License

Code of conduct