A Rust application that mixes multiple audio input devices for speaker output while independently running ASR (Automatic Speech Recognition) on each input and routing the recognized text to multiple destinations (e.g., Discord). Controllable via a TUI during runtime. Supports macOS and Linux.
Designed for scenarios such as routing multiple radio receivers (e.g., license-free radios, digital simplex radios) connected to the PC via 3.5mm audio jacks, where each receiver's audio is captured as a separate input device.
InputDevice1 ──┬──→ Mixer ──→ OutputDevice (Speaker)
InputDevice2 ──┤
InputDeviceN ──┘
InputDevice1 ──→ ASR Engine ──→ Text ──→ [Destination1(prefix), Destination2(prefix)]
InputDevice2 ──→ ASR Engine ──→ Text ──→ [Destination3(prefix)]
Each input device feeds into a shared mixer for speaker output. Simultaneously, each input is tapped and sent to an ASR engine. The recognized text is then routed to one or more configured destinations with optional per-destination prefixes.
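The routing step described above can be sketched with a small lookup table. This is a minimal illustration and not the project's actual router: `Route`, `RoutingTable`, and `route_text` are hypothetical names, and the table is assumed to be keyed by input ID.

```rust
use std::collections::HashMap;

/// One routing target: a destination name plus an optional prefix.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, Clone, PartialEq)]
pub struct Route {
    pub destination: String,
    pub prefix: Option<String>,
}

/// Hypothetical routing table: input ID -> configured routes.
pub type RoutingTable = HashMap<String, Vec<Route>>;

/// Returns one `(destination, message)` pair per route configured for
/// `input_id`, with the route's prefix prepended to the recognized text.
/// Unknown inputs produce an empty list.
pub fn route_text(table: &RoutingTable, input_id: &str, text: &str) -> Vec<(String, String)> {
    table
        .get(input_id)
        .map(|routes| {
            routes
                .iter()
                .map(|route| {
                    let prefix = route.prefix.as_deref().unwrap_or("");
                    (route.destination.clone(), format!("{prefix}{text}"))
                })
                .collect()
        })
        .unwrap_or_default()
}
```

Keeping the prefix on the route rather than on the destination lets the same destination receive text from several inputs, each with its own label.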
The project is organized as a Cargo workspace:
voxmux/
├── Cargo.toml # Workspace root
├── config.example.toml
├── crates/
│ ├── voxmux-core/ # Common traits, types, config, errors
│ ├── voxmux-audio/ # Audio capture, mixer, output (cpal + ringbuf)
│ ├── voxmux-engine/ # ASR plugin host + whisper integration
│ ├── voxmux-destination/ # Destination plugin host + discord integration
│ └── voxmux-tui/ # TUI (ratatui + crossterm)
└── src/main.rs # Binary entry point
| Crate | Description |
|---|---|
| `voxmux-core` | Shared traits, config schema (TOML), error types, and audio primitives (AudioChunk, RecognitionResult, etc.) |
| `voxmux-audio` | Device enumeration, audio capture via cpal, lock-free SPSC ring buffers (ringbuf), N-to-1 mixer, and speaker output |
| `voxmux-engine` | AsrEngine trait, plugin registry, and whisper-rs integration (feature-gated) |
| `voxmux-destination` | Destination trait, plugin registry, and Discord integration via serenity (feature-gated) |
| `voxmux-tui` | Terminal UI with ratatui + crossterm: dashboard, input/output controls, and log viewer |
#[async_trait]
pub trait AsrEngine: Send + Sync {
fn name(&self) -> &str;
async fn initialize(&mut self, config: toml::Value) -> Result<(), AsrError>;
async fn feed_audio(&self, chunk: AudioChunk) -> Result<(), AsrError>;
fn set_result_sender(&mut self, sender: mpsc::UnboundedSender<RecognitionResult>);
async fn shutdown(&self) -> Result<(), AsrError>;
}

#[async_trait]
pub trait Destination: Send + Sync {
fn name(&self) -> &str;
async fn initialize(&mut self, config: toml::Value) -> Result<(), DestinationError>;
async fn send_text(&self, text: &str, metadata: &TextMetadata) -> Result<(), DestinationError>;
fn is_healthy(&self) -> bool;
async fn shutdown(&self) -> Result<(), DestinationError>;
}

- Phase 1: Compile-time registration via `PluginRegistry` with feature flags
- Future: Dynamic loading via `libloading` (feature-gated)
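As a rough sketch of what compile-time registration behind feature flags might look like, the example below uses a simplified synchronous `Plugin` trait as a stand-in for the async traits above. The registry's real API is not shown in this README, so `register`, `create`, `build_registry`, the two placeholder plugin types, and the `discord` feature name are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Simplified, synchronous stand-in for the async plugin traits.
pub trait Plugin: Send + Sync {
    fn name(&self) -> &str;
}

/// A factory builds a fresh, uninitialized plugin instance.
type Factory = Box<dyn Fn() -> Box<dyn Plugin> + Send + Sync>;

/// Hypothetical registry mapping plugin names to factories.
#[derive(Default)]
pub struct PluginRegistry {
    factories: HashMap<String, Factory>,
}

impl PluginRegistry {
    /// Registers a factory under `name`, replacing any previous entry.
    pub fn register<F>(&mut self, name: &str, factory: F)
    where
        F: Fn() -> Box<dyn Plugin> + Send + Sync + 'static,
    {
        self.factories.insert(name.to_string(), Box::new(factory));
    }

    /// Creates a plugin by name, or returns `None` if it was not compiled in.
    pub fn create(&self, name: &str) -> Option<Box<dyn Plugin>> {
        self.factories.get(name).map(|factory| factory())
    }
}

/// Placeholder plugin that is always available.
pub struct FileDestination;

impl Plugin for FileDestination {
    fn name(&self) -> &str {
        "file"
    }
}

/// Placeholder plugin that exists only when the assumed `discord`
/// feature is enabled.
#[cfg(feature = "discord")]
pub struct DiscordDestination;

#[cfg(feature = "discord")]
impl Plugin for DiscordDestination {
    fn name(&self) -> &str {
        "discord"
    }
}

/// Builds the registry; feature-gated plugins are registered only when
/// their feature is compiled in.
pub fn build_registry() -> PluginRegistry {
    let mut registry = PluginRegistry::default();
    registry.register("file", || Box::new(FileDestination) as Box<dyn Plugin>);
    #[cfg(feature = "discord")]
    registry.register("discord", || Box::new(DiscordDestination) as Box<dyn Plugin>);
    registry
}
```

With this shape, a config entry that names a plugin which was not compiled in resolves to `None`, so it can be reported as a configuration error at startup instead of failing later.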
cpal input callback → SPSC ring buffer (lock-free, per device)
↓
Mixer thread: read all input ring buffers → apply gain/mute → sum → output ring buffer
↓
cpal output callback ← output ring buffer
CaptureNode → mpsc channel → ASR task (parallel tap, does not block the mixer)
- cpal handles cross-platform audio I/O
- ringbuf provides lock-free SPSC ring buffers between the real-time audio callbacks and processing threads
- Volume and mute are controlled via atomics for lock-free, real-time-safe adjustment
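The gain/mute/sum step of the mixer thread can be sketched with standard-library atomics. This is an illustration under stated assumptions and not the project's mixer: `InputControl` and `mix_block` are hypothetical names, the gain is assumed to be an `f32` stored as raw bits in an `AtomicU32`, and clamping the sum to [-1.0, 1.0] is an assumption not stated above.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

/// Per-input controls shared between the control side and the mixer thread.
pub struct InputControl {
    /// `f32` gain stored as raw bits so it can be updated without a lock.
    gain_bits: AtomicU32,
    muted: AtomicBool,
}

impl InputControl {
    pub fn new(gain: f32) -> Self {
        Self {
            gain_bits: AtomicU32::new(gain.to_bits()),
            muted: AtomicBool::new(false),
        }
    }

    pub fn set_gain(&self, gain: f32) {
        self.gain_bits.store(gain.to_bits(), Ordering::Relaxed);
    }

    pub fn set_muted(&self, muted: bool) {
        self.muted.store(muted, Ordering::Relaxed);
    }

    /// Gain to apply to the next block: 0.0 while muted.
    pub fn effective_gain(&self) -> f32 {
        if self.muted.load(Ordering::Relaxed) {
            0.0
        } else {
            f32::from_bits(self.gain_bits.load(Ordering::Relaxed))
        }
    }
}

/// Mixes one block from each input into `out`.
/// Each input is scaled by its effective gain, the results are summed,
/// and the sum is clamped to [-1.0, 1.0] (an assumption of this sketch).
pub fn mix_block(inputs: &[(&[f32], &InputControl)], out: &mut [f32]) {
    out.fill(0.0);
    for (samples, control) in inputs {
        // Read the gain once per block so it stays constant within the block.
        let gain = control.effective_gain();
        for (o, s) in out.iter_mut().zip(samples.iter()) {
            *o += s * gain;
        }
    }
    for o in out.iter_mut() {
        *o = o.clamp(-1.0, 1.0);
    }
}
```

Relaxed ordering is used here because each control value is independent; a gain change only needs to become visible to the mixer eventually, not in a particular order relative to other memory operations.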
Configuration is defined in TOML. Environment variables can be interpolated with ${VAR_NAME} syntax.
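One way the `${VAR_NAME}` interpolation could be implemented is as a text pass over the raw config before TOML parsing. The sketch below is an assumption about the approach, not the project's code: `interpolate` and `interpolate_env` are hypothetical names, and treating a missing variable as an error is a design choice made here for illustration.

```rust
/// Replaces every `${NAME}` in `input` with the value returned by `lookup`.
/// Returns `Err(name)` for the first variable that `lookup` cannot resolve.
pub fn interpolate<F>(input: &str, lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        // Copy everything before the placeholder unchanged.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                let name = &after[..end];
                let value = lookup(name).ok_or_else(|| name.to_string())?;
                out.push_str(&value);
                rest = &after[end + 1..];
            }
            None => {
                // Unterminated placeholder: keep the remaining text verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    Ok(out)
}

/// Convenience wrapper that resolves variables from the process environment.
pub fn interpolate_env(input: &str) -> Result<String, String> {
    interpolate(input, |name| std::env::var(name).ok())
}
```

Taking the lookup as a closure keeps the function testable without touching the real environment. A sample configuration follows.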
[general]
log_level = "info"
sample_rate = 48000
buffer_size = 1024
[output]
device_name = "default"
play_mixed_input = true
[asr]
engine = "whisper"
[asr.whisper]
model_path = "./models/ggml-base.bin"
language = "ja"
[[input]]
id = "mic_main"
device_name = "MacBook Pro Microphone"
enabled = true
volume = 1.0
muted = false
[[input.destinations]]
plugin = "discord"
prefix = "[Main] "
channel_id = 123456789
[destinations.discord]
token = "${DISCORD_TOKEN}"
guild_id = 987654321

The TUI provides four tabs:
| Tab | Contents |
|---|---|
| Dashboard | Overall status, VU meters, latest recognized text |
| Inputs | Per-device volume, mute, and enable controls |
| Outputs | Speaker output settings, play-mixed-input toggle |
| Logs | Scrollable tracing log viewer |
Communication between the TUI and the router:
- TUI → Router: `UiCommand` sent via mpsc channel (volume changes, mute toggles, etc.)
- Router → TUI: `RouterState` broadcast via `watch::Sender` for real-time state synchronization
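The command side of this exchange can be sketched as follows. To keep the example dependency-free it uses `std::sync::mpsc` as a stand-in for the async mpsc channel, and it leaves out the `watch::Sender` broadcast entirely. The variants of `UiCommand` and the fields of `RouterState` are not listed in this README, so the ones below are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc::Receiver;

/// Hypothetical command set sent from the TUI to the router.
#[derive(Debug, Clone, PartialEq)]
pub enum UiCommand {
    SetVolume { input_id: String, volume: f32 },
    ToggleMute { input_id: String },
}

/// Hypothetical per-input state shown in the TUI.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct InputState {
    pub volume: f32,
    pub muted: bool,
}

/// Hypothetical snapshot of router state.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct RouterState {
    pub inputs: HashMap<String, InputState>,
}

/// Applies one command; returns `true` if the state changed.
/// Commands for unknown inputs are ignored.
pub fn apply_command(state: &mut RouterState, cmd: UiCommand) -> bool {
    match cmd {
        UiCommand::SetVolume { input_id, volume } => match state.inputs.get_mut(&input_id) {
            Some(input) => {
                // Negative volumes are clamped to silence.
                input.volume = volume.max(0.0);
                true
            }
            None => false,
        },
        UiCommand::ToggleMute { input_id } => match state.inputs.get_mut(&input_id) {
            Some(input) => {
                input.muted = !input.muted;
                true
            }
            None => false,
        },
    }
}

/// Drains all pending commands without blocking.
/// Returns `true` if any of them changed the state, which is the point
/// where a router would publish a fresh snapshot to the TUI.
pub fn drain_commands(rx: &Receiver<UiCommand>, state: &mut RouterState) -> bool {
    let mut changed = false;
    for cmd in rx.try_iter() {
        changed |= apply_command(state, cmd);
    }
    changed
}
```

Publishing only when `drain_commands` reports a change avoids waking the TUI for commands that had no effect.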
| Crate | Purpose |
|---|---|
| cpal | Cross-platform audio I/O |
| ringbuf | Lock-free SPSC ring buffer |
| ratatui + crossterm | Terminal UI |
| tokio | Async runtime |
| serde + toml | Configuration |
| thiserror / anyhow | Error handling |
| tracing | Logging |
| whisper-rs | Whisper ASR engine (feature-gated) |
| serenity | Discord bot (feature-gated) |
| async-trait | Async trait support |
| notify | Filesystem watcher for config hot-reload |
| clap | CLI argument parsing |
- Cargo workspace and crate scaffolding
- Core types: config schema, error types, audio primitives
- Single input → speaker output passthrough
- N-to-1 mixer with gain/mute per input
- Concurrent multi-device capture
- Real-time volume/mute control via atomics
- `AsrEngine` trait and plugin registry
- ASR tap on each capture node
- whisper-rs engine integration
- `Destination` trait and plugin registry
- Config-driven routing table
- File destination (for testing)
- Discord destination (serenity)
- Per-destination prefix support
- Event loop and application state
- Four-tab layout (Dashboard / Inputs / Outputs / Logs)
- VU meters and volume sliders
- Bidirectional TUI ↔ Router communication
- Peak level tracking and health status
- Input/output handles with enable/disable controls
- ASR result forwarding to TUI
- Config hot-reload via filesystem watcher
This project is licensed under the GNU General Public License v3.0.