Next-Generation Professional Music Video Karaoke Maker
High-performance, native desktop application for creating professional karaoke videos with AI-powered vocal separation, automatic transcription, and synchronized lyrics.
Welcome to the C++ Qt6 Rebirth of NC-KTV.
NC-KTV was originally written in Python. We have since completely overhauled the engine in modern C++17 and Qt6 for faster processing, hardware-accelerated rendering, and a more responsive editing experience. The rewrite enables real-time waveform rendering, precise audio seeking, and seamless subtitle processing.
NC-KTV automates the entire karaoke video creation workflow:
- Import any media file natively via FFmpeg.
- Extract vocals cleanly using UVR-compatible ONNX models (MDX-Net) with a native C++ Mixed-Radix FFT DSP pipeline.
- Transcribe lyrics automatically (Whisper) or import industry standards.
- Sync lyrics with sub-millisecond precision using the new Hardware-Accelerated Timeline and one-click Set Start/End controls.
- Export to professional-grade formats (ASS, MP4, MKV) with GPU-accelerated encoding and custom resolution scaling.
- High-Performance Bridge: Driven by the authentic `audio-separator` Python library for 100% matching UVR quality.
- Hardware Acceleration: Automatic target detection for CUDA (NVIDIA) via PyTorch. Local GPU libraries in `models/whisper/cudn12/` are auto-bundled into the portable build.
- Full Model Support: Supports all UVR models including MDX-Net, VR Architecture, and Roformer (`.onnx` and `.pth`).
- Native Qt6 UI: Butter-smooth 60fps+ rendering of complex timeline data via `QPainter` and hardware acceleration, featuring smart render-debouncing.
- Interactive Waveforms: Zoom, scrub, and manipulate gigabytes of audio data instantaneously using pixel-bucketing compression without UI blocking.
- Precision Karaoke Builder Studio Mode: Brand-new fully vertical syllable tracking interface. Features cascading blocks locked to a Y-axis left-waveform display, a dedicated instant-update Lyrics Map sidebar for transcription tuning, fully-synchronized auto-scrolling, and millisecond-accurate "Play Segment" vocal isolation capabilities.
- One-Click Timing Sync: Dedicated "Set Start" and "Set End" buttons in the transport bar to instantly align lyrics to the current playhead.
- GPU-Accelerated Export: High-performance video rendering utilizing NVENC (NVIDIA), QSV (Intel), or AMF (AMD) with configurable resolution scaling (1080p to 360p).
- Dockable Workspace Panels: The Synchronization Queue and Properties panels are full `QDockWidget` instances: tear off, float, and re-dock them anywhere for a completely custom workspace layout.
- In-Editor AI Support: Start Whisper transcription directly from editor mode, complete with language override parameters and dimming modal overlays.
- Live Karaoke Preview: Configurable zero-latency ASS subtitle rendering overlaid onto the active video track with a smooth horizontal linear wipe effect per word.
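The pixel-bucketing idea mentioned under Interactive Waveforms can be illustrated with a small sketch. The version below is an assumption about the general technique, not code from the NC-KTV engine: it reduces an arbitrary number of samples to one (min, max) pair per horizontal pixel, so the cost of a repaint depends on the widget width rather than on the length of the audio. The names `Peak` and `bucketPeaks` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One (min, max) amplitude pair per horizontal pixel column.
using Peak = std::pair<float, float>;

// Hypothetical helper (not from the NC-KTV sources): reduce `samples`
// to `width` min/max pairs so a painter can draw one vertical line per pixel.
std::vector<Peak> bucketPeaks(const std::vector<float>& samples, std::size_t width) {
    std::vector<Peak> peaks;
    const std::size_t n = samples.size();
    if (n == 0 || width == 0) return peaks;
    peaks.reserve(width);
    for (std::size_t px = 0; px < width; ++px) {
        // Integer bucket boundaries for this pixel column.
        std::size_t begin = px * n / width;
        std::size_t end = (px + 1) * n / width;
        // When there are more pixels than samples, reuse the nearest sample.
        if (end == begin) end = begin + 1;
        assert(begin < end && end <= n);
        const auto first = samples.begin() + static_cast<std::ptrdiff_t>(begin);
        const auto last = samples.begin() + static_cast<std::ptrdiff_t>(end);
        const auto [lo, hi] = std::minmax_element(first, last);
        peaks.emplace_back(*lo, *hi);
    }
    return peaks;
}
```

A real editor would typically run this on a worker thread and cache the result per zoom level; those details are outside this sketch.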
- Whisper Powered: Uses OpenAI's Whisper via Python subprocess for robust, high-speed transcription.
- Word-Level Precision: Automatic word-timestamp generation for perfect syllable alignment.
- Targeted AI Control: Select Whisper models (base, small, medium, large, turbo) and specific ISO codes (en, id, ms, ja, ko) inside the UI to balance speed vs. accuracy.
- Tap-to-Sync Engine: Rebuilt event-driven synchronization for perfect rhythm matching.
- Auto-Romanization: Lightning-fast transliteration of global scripts (Korean/Japanese to Latin).
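Word-level timestamps map naturally onto ASS karaoke override tags, where `\kf` produces a left-to-right fill over a syllable and takes its duration in centiseconds. The sketch below shows one way to turn word timings into the text of an ASS dialogue event. It is an illustration only: the `Word` struct and `buildKaraokeText` are hypothetical names, and the source does not state which tags NC-KTV actually emits.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical word timing record (times in milliseconds).
struct Word {
    std::string text;
    int startMs;
    int endMs;
};

// Build the text field of an ASS Dialogue event, one \kf tag per word.
// Times are converted to centiseconds relative to the line start; durations
// are taken as differences of rounded positions so rounding cannot drift.
std::string buildKaraokeText(const std::vector<Word>& words, int lineStartMs) {
    auto toCs = [lineStartMs](int ms) { return (ms - lineStartMs + 5) / 10; };
    std::string out;
    int cursorCs = 0;
    for (std::size_t i = 0; i < words.size(); ++i) {
        const Word& w = words[i];
        assert(w.startMs >= lineStartMs && w.endMs >= w.startMs);
        const int startCs = toCs(w.startMs);
        const int endCs = toCs(w.endMs);
        // A silent gap becomes an empty syllable that only consumes time.
        if (startCs > cursorCs) {
            out += "{\\k" + std::to_string(startCs - cursorCs) + "}";
        }
        out += "{\\kf" + std::to_string(endCs - startCs) + "}" + w.text;
        if (i + 1 < words.size()) out += ' ';
        cursorCs = endCs;
    }
    return out;
}
```

For two words spanning 1000-1500 ms and 1700-2300 ms on a line starting at 1000 ms, the function yields `{\kf50}Hello {\k20}{\kf60}world`.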
- Transcribe with Gemini: One-click button in the Source Lyrics tab that compresses the active audio to a small MP3 file and opens your custom Gemini Gems link in the browser. Simply upload the MP3, copy Gemini's output, and click Paste & Sync in the app.
- Smart Paste & Sync: Parses Gemini/AI transcription text (plain or LRC format with range timestamps like `[00:15.15 - 00:19.30]`) and directly loads it into the Synchronization Queue.
- Configurable URL: Paste your own Gemini Gem link directly in the UI so the app always opens the right transcription tool.
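As an illustration of the range-timestamp format shown above, here is a minimal parsing sketch. It assumes each line looks like `[mm:ss.ff - mm:ss.ff] lyric text`, with one to three fractional digits; the actual parser in NC-KTV may accept other variants. `TimedLine` and `parseRangeLine` are hypothetical names.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical result type: one lyric line with its time range in milliseconds.
struct TimedLine {
    long startMs;
    long endMs;
    std::string text;
};

// Parse "[mm:ss.ff - mm:ss.ff] text". Returns std::nullopt when the line
// does not match the assumed format or the range is reversed.
std::optional<TimedLine> parseRangeLine(const std::string& line) {
    static const std::regex re(
        R"(^\s*\[(\d+):(\d{2})\.(\d{1,3})\s*-\s*(\d+):(\d{2})\.(\d{1,3})\]\s*(.*)$)");
    std::smatch m;
    if (!std::regex_match(line, m, re)) return std::nullopt;

    // Convert minutes, seconds, and a 1-3 digit fraction to milliseconds.
    auto toMs = [](const std::string& min, const std::string& sec, std::string frac) {
        frac.append(3 - frac.size(), '0');  // "15" -> "150", "5" -> "500"
        return std::stol(min) * 60000L + std::stol(sec) * 1000L + std::stol(frac);
    };

    TimedLine out{toMs(m[1], m[2], m[3]), toMs(m[4], m[5], m[6]), m[7]};
    if (out.endMs < out.startMs) return std::nullopt;
    return out;
}
```

Lines without a timestamp return `std::nullopt`, so a caller could fall back to treating them as plain, unsynced lyrics.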
NC-KTV is now built using standard CMake and requires a modern C++17 compliant toolchain.
- MinGW-w64 (GCC 13.x): Part of the bundled Qt 6.8.2 toolchain.
- CMake 3.25+: Essential for project configuration.
- Qt 6.8.2: Core, Gui, Widgets, Multimedia, Network.
- Python 3.10+: For the AI bridge (Whisper/UVR).
NC-KTV now features a streamlined, high-performance build pipeline using MinGW and a custom PowerShell script that handles both C++ compilation and Python environment bundling.
# 1. Open PowerShell and navigate to the project root
# 2. Run the automated build script
# This script configures CMake, builds the C++ engine (Ninja),
# packages the Python AI bridge (PyInstaller), and assembles
# the portable directory with all necessary DLLs.
.\build_portable_release.ps1

Once completed, the final portable application will be available in the root directory as ncktv.exe, with all dependencies (Qt, FFmpeg, ONNX, Python Bridge) properly staged.
For manual development/debugging:
- Open the project in VS Code or Qt Creator.
- Select the `windows-debug` or `windows-release` CMake preset.
- Build using the standard CMake workflow (`Ctrl+Shift+B` in VS Code).
For a deep dive into the completely revamped C++ architecture, hardware-accelerated UI patterns, and the multithreaded audio pipeline, please see our dedicated Technical Documentation (DOCS.md).
We welcome contributions to the NC-KTV C++ engine!
- Please ensure PRs targeting core systems compile successfully across MSVC, GCC, and Clang.
- Run the included `GTest` suite via `ctest` before opening a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.