voxmux

A Rust application that mixes multiple audio input devices for speaker output while independently running ASR (Automatic Speech Recognition) on each input and routing the recognized text to multiple destinations (e.g., Discord). Controllable at runtime via a TUI. Supports macOS and Linux.

Designed for scenarios such as routing multiple radio receivers (e.g., license-free radios, digital simplex radios) connected to the PC via 3.5mm audio jacks, where each receiver's audio is captured as a separate input device.

Data Flow

InputDevice1 ──┬──→ Mixer ──→ OutputDevice (Speaker)
InputDevice2 ──┤
InputDeviceN ──┘

InputDevice1 ──→ ASR Engine ──→ Text ──→ [Destination1(prefix), Destination2(prefix)]
InputDevice2 ──→ ASR Engine ──→ Text ──→ [Destination3(prefix)]

Each input device feeds into a shared mixer for speaker output. Simultaneously, each input is tapped and sent to an ASR engine. The recognized text is then routed to one or more configured destinations with optional per-destination prefixes.
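The fan-out step can be sketched as a lookup from an input ID to its configured routes. This is a minimal std-only illustration: `Route` and `route_text` are hypothetical names, not voxmux's actual types, and it assumes the prefix is simply prepended to the recognized text.

```rust
use std::collections::HashMap;

/// One configured destination for an input (hypothetical type that mirrors the
/// `plugin` and `prefix` keys of an `[[input.destinations]]` entry).
#[derive(Debug, Clone)]
pub struct Route {
    pub plugin: String,
    pub prefix: String,
}

/// Returns (plugin name, prefixed text) pairs for every route attached to `input_id`.
/// An input with no routes yields an empty list, so its text goes nowhere.
pub fn route_text(
    table: &HashMap<String, Vec<Route>>,
    input_id: &str,
    text: &str,
) -> Vec<(String, String)> {
    table
        .get(input_id)
        .map(|routes| {
            routes
                .iter()
                .map(|r| (r.plugin.clone(), format!("{}{}", r.prefix, text)))
                .collect()
        })
        .unwrap_or_default()
}
```

Keeping the prefix on the route rather than on the destination lets the same destination receive text from several inputs, each labeled differently.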

Project Structure

The project is organized as a Cargo workspace:

voxmux/
├── Cargo.toml              # Workspace root
├── config.example.toml
├── crates/
│   ├── voxmux-core/        # Common traits, types, config, errors
│   ├── voxmux-audio/       # Audio capture, mixer, output (cpal + ringbuf)
│   ├── voxmux-engine/      # ASR plugin host + whisper integration
│   ├── voxmux-destination/ # Destination plugin host + discord integration
│   └── voxmux-tui/         # TUI (ratatui + crossterm)
└── src/main.rs             # Binary entry point

Crate Responsibilities

Crate	Description
`voxmux-core`	Shared traits, config schema (TOML), error types, and audio primitives (`AudioChunk`, `RecognitionResult`, etc.)
`voxmux-audio`	Device enumeration, audio capture via cpal, lock-free SPSC ring buffers (ringbuf), N-to-1 mixer, and speaker output
`voxmux-engine`	`AsrEngine` trait, plugin registry, and whisper-rs integration (feature-gated)
`voxmux-destination`	`Destination` trait, plugin registry, and Discord integration via serenity (feature-gated)
`voxmux-tui`	Terminal UI with ratatui + crossterm — dashboard, input/output controls, and log viewer

Core Traits

AsrEngine

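use async_trait::async_trait;
use tokio::sync::mpsc;
// AudioChunk, RecognitionResult, and AsrError are defined in voxmux-core.

#[async_trait]
pub trait AsrEngine: Send + Sync {
    fn name(&self) -> &str;
    /// Receives the engine-specific settings from the config file.
    async fn initialize(&mut self, config: toml::Value) -> Result<(), AsrError>;
    async fn feed_audio(&self, chunk: AudioChunk) -> Result<(), AsrError>;
    /// Recognized text is delivered on this channel rather than returned from feed_audio.
    fn set_result_sender(&mut self, sender: mpsc::UnboundedSender<RecognitionResult>);
    async fn shutdown(&self) -> Result<(), AsrError>;
}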
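Because `feed_audio` takes `&self` while results flow out through the sender registered with `set_result_sender`, an engine typically needs interior mutability for its buffered state. The sketch below shows that pattern as a synchronous mock, using `std::sync::mpsc` in place of tokio's unbounded channel and dropping `async`. `MockEngine` and the fields of the stand-in `AudioChunk` and `RecognitionResult` types are hypothetical.

```rust
use std::sync::mpsc::Sender;
use std::sync::Mutex;

/// Hypothetical stand-ins for the voxmux-core types; the real fields may differ.
pub struct AudioChunk {
    pub input_id: String,
    pub samples: Vec<f32>,
}

#[derive(Debug, PartialEq)]
pub struct RecognitionResult {
    pub input_id: String,
    pub text: String,
}

/// A mock engine for tests: instead of running a model, it emits a placeholder
/// result each time a full window of samples has been buffered.
pub struct MockEngine {
    window: usize,
    buffered: Mutex<usize>,
    sender: Option<Sender<RecognitionResult>>,
}

impl MockEngine {
    pub fn new(window: usize) -> Self {
        Self { window, buffered: Mutex::new(0), sender: None }
    }

    /// Mirrors `set_result_sender`: results are pushed to the channel, not returned.
    pub fn set_result_sender(&mut self, sender: Sender<RecognitionResult>) {
        self.sender = Some(sender);
    }

    /// Mirrors `feed_audio(&self, ...)`: the Mutex lets a shared engine update its buffer.
    pub fn feed_audio(&self, chunk: AudioChunk) -> Result<(), String> {
        let mut buffered = self.buffered.lock().map_err(|_| "poisoned lock".to_string())?;
        *buffered += chunk.samples.len();
        // Emit one result per complete window; the remainder stays buffered.
        while *buffered >= self.window {
            *buffered -= self.window;
            if let Some(tx) = &self.sender {
                tx.send(RecognitionResult {
                    input_id: chunk.input_id.clone(),
                    text: format!("<{} samples>", self.window),
                })
                .map_err(|e| e.to_string())?;
            }
        }
        Ok(())
    }
}
```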

Destination

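use async_trait::async_trait;
// TextMetadata and DestinationError are defined in voxmux-core.

#[async_trait]
pub trait Destination: Send + Sync {
    fn name(&self) -> &str;
    async fn initialize(&mut self, config: toml::Value) -> Result<(), DestinationError>;
    /// Delivers one piece of recognized text.
    async fn send_text(&self, text: &str, metadata: &TextMetadata) -> Result<(), DestinationError>;
    /// Synchronous (non-async) health check.
    fn is_healthy(&self) -> bool;
    async fn shutdown(&self) -> Result<(), DestinationError>;
}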
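The roadmap mentions a file destination used for testing. A minimal synchronous sketch of that idea writes one line per result to any `std::io::Write` sink and tracks health from the last write. `WriterDestination` is hypothetical, the real trait methods are async, and carrying the prefix in `TextMetadata` is an assumption made for this example.

```rust
use std::io::Write;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

/// Hypothetical stand-in for voxmux-core's TextMetadata (only the field this sketch needs).
pub struct TextMetadata {
    pub prefix: String,
}

/// Writes one line per recognized text to a file, stdout, or a Vec<u8> in tests.
pub struct WriterDestination<W: Write> {
    sink: Mutex<W>,
    healthy: AtomicBool,
}

impl<W: Write> WriterDestination<W> {
    pub fn new(sink: W) -> Self {
        Self { sink: Mutex::new(sink), healthy: AtomicBool::new(true) }
    }

    pub fn send_text(&self, text: &str, metadata: &TextMetadata) -> std::io::Result<()> {
        let mut sink = self.sink.lock().expect("sink lock poisoned");
        let result = writeln!(sink, "{}{}", metadata.prefix, text).and_then(|()| sink.flush());
        // Record the outcome so `is_healthy` reports the most recent write failure.
        self.healthy.store(result.is_ok(), Ordering::Relaxed);
        result
    }

    pub fn is_healthy(&self) -> bool {
        self.healthy.load(Ordering::Relaxed)
    }

    /// Returns the underlying sink, e.g. to inspect what was written in a test.
    pub fn into_inner(self) -> W {
        self.sink.into_inner().expect("sink lock poisoned")
    }
}
```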

Plugin System

Phase 1: Compile-time registration via PluginRegistry with feature flags
Future: Dynamic loading via libloading (feature-gated)
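Compile-time registration can be sketched as a map from plugin name (the `plugin = "..."` value in the config) to a factory function, with optional plugins gated behind `#[cfg(feature = ...)]`. `PluginRegistry` is named in the text above, but its shape here, the simplified `Plugin` trait, the stub types, and the `discord` feature name are all assumptions.

```rust
use std::collections::HashMap;

/// Minimal synchronous stand-in for the plugin traits (the real ones are async).
pub trait Plugin {
    fn name(&self) -> &str;
}

/// A factory that builds a fresh plugin instance.
pub type Factory = fn() -> Box<dyn Plugin>;

/// Hypothetical registry mapping a plugin name to its factory.
#[derive(Default)]
pub struct PluginRegistry {
    factories: HashMap<&'static str, Factory>,
}

impl PluginRegistry {
    pub fn register(&mut self, name: &'static str, factory: Factory) {
        self.factories.insert(name, factory);
    }

    /// Returns None for names that were never registered (or were compiled out).
    pub fn create(&self, name: &str) -> Option<Box<dyn Plugin>> {
        self.factories.get(name).map(|f| f())
    }
}

/// Hypothetical always-available stub plugin.
struct FileDestination;
impl Plugin for FileDestination {
    fn name(&self) -> &str {
        "file"
    }
}

/// Hypothetical feature-gated stub plugin; absent unless the `discord` feature is on.
#[cfg(feature = "discord")]
struct DiscordDestination;
#[cfg(feature = "discord")]
impl Plugin for DiscordDestination {
    fn name(&self) -> &str {
        "discord"
    }
}

/// Builds the registry; gated plugins are only compiled in when their feature is enabled.
pub fn builtin_registry() -> PluginRegistry {
    let mut registry = PluginRegistry::default();
    registry.register("file", || Box::new(FileDestination) as Box<dyn Plugin>);
    #[cfg(feature = "discord")]
    registry.register("discord", || Box::new(DiscordDestination) as Box<dyn Plugin>);
    registry
}
```

Because disabled plugins are removed at compile time, their dependencies (for example serenity) never need to be built.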

Audio Pipeline

cpal input callback → SPSC ring buffer (lock-free, per device)
                          ↓
Mixer thread: read all input ring buffers → apply gain/mute → sum → output ring buffer
                          ↓
cpal output callback ← output ring buffer

CaptureNode → mpsc channel → ASR task (parallel tap, does not block the mixer)
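The tap only needs to hand each chunk off without waiting. A sketch using std's bounded `sync_channel` as a stand-in for the async mpsc channel: `try_send` either enqueues the chunk or drops it, so a slow ASR task never stalls the audio path. `AsrTap` is a hypothetical name, and drop-on-full is an assumed policy.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical capture-side tap that forwards copies of audio to the ASR side.
pub struct AsrTap {
    tx: SyncSender<Vec<f32>>,
    dropped: u64,
}

impl AsrTap {
    /// `capacity` bounds how many chunks may wait for the ASR task.
    pub fn new(capacity: usize) -> (Self, Receiver<Vec<f32>>) {
        let (tx, rx) = sync_channel(capacity);
        (Self { tx, dropped: 0 }, rx)
    }

    /// Never blocks: returns false and counts a drop if the queue is full or the receiver is gone.
    pub fn push(&mut self, samples: &[f32]) -> bool {
        match self.tx.try_send(samples.to_vec()) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
                self.dropped += 1;
                false
            }
        }
    }

    pub fn dropped(&self) -> u64 {
        self.dropped
    }
}
```

Note that `to_vec` allocates; code running inside a real-time cpal callback would usually avoid allocation, for example by tapping from a non-real-time thread instead.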

cpal handles cross-platform audio I/O
ringbuf provides lock-free SPSC ring buffers between the real-time audio callbacks and processing threads
Volume and mute are controlled via atomics for lock-free, real-time-safe adjustment
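The mixer step (apply gain/mute, then sum) and the atomic controls can be sketched together. std has no `AtomicF32`, so a common trick stores the gain's bit pattern in an `AtomicU32`. `InputControl` and `mix_block` are hypothetical, and the hard clamp to [-1.0, 1.0] is an assumption; the real mixer may handle overload differently.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

/// Per-input controls shared between the UI side and the mixer thread.
pub struct InputControl {
    gain_bits: AtomicU32, // f32 gain stored as raw bits
    muted: AtomicBool,
}

impl InputControl {
    pub fn new(gain: f32) -> Self {
        Self { gain_bits: AtomicU32::new(gain.to_bits()), muted: AtomicBool::new(false) }
    }
    pub fn set_gain(&self, gain: f32) {
        self.gain_bits.store(gain.to_bits(), Ordering::Relaxed);
    }
    pub fn gain(&self) -> f32 {
        f32::from_bits(self.gain_bits.load(Ordering::Relaxed))
    }
    pub fn set_muted(&self, muted: bool) {
        self.muted.store(muted, Ordering::Relaxed);
    }
    pub fn is_muted(&self) -> bool {
        self.muted.load(Ordering::Relaxed)
    }
}

/// Sums one block from each input into `out` with gain and mute applied, then clamps.
/// An input shorter than `out` contributes silence for the missing samples.
pub fn mix_block(inputs: &[(&[f32], &InputControl)], out: &mut [f32]) {
    out.iter_mut().for_each(|s| *s = 0.0);
    for (samples, control) in inputs {
        if control.is_muted() {
            continue;
        }
        let gain = control.gain();
        for (o, s) in out.iter_mut().zip(samples.iter()) {
            *o += s * gain;
        }
    }
    for o in out.iter_mut() {
        *o = (*o).clamp(-1.0, 1.0);
    }
}
```

`Ordering::Relaxed` is enough here because each control is an independent value; the mixer only needs to observe a new gain or mute flag on some subsequent block, not in sync with other memory.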

Configuration

Configuration is defined in TOML. Environment variables can be interpolated with ${VAR_NAME} syntax.

[general]
log_level = "info"
sample_rate = 48000
buffer_size = 1024

[output]
device_name = "default"
play_mixed_input = true

[asr]
engine = "whisper"

[asr.whisper]
model_path = "./models/ggml-base.bin"
language = "ja"

[[input]]
id = "mic_main"
device_name = "MacBook Pro Microphone"
enabled = true
volume = 1.0
muted = false

[[input.destinations]]
plugin = "discord"
prefix = "[Main] "
channel_id = 123456789

[destinations.discord]
token = "${DISCORD_TOKEN}"
guild_id = 987654321
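The `${DISCORD_TOKEN}` reference above relies on environment-variable interpolation. A std-only sketch follows, assuming substitution happens on the raw text before TOML parsing and that an unset variable is an error (both assumptions; `interpolate_env` is a hypothetical name).

```rust
use std::env;

/// Replaces every `${VAR_NAME}` in `input` with the value of that environment variable.
/// Assumption: an unset variable or an unterminated `${` is reported as an error.
pub fn interpolate_env(input: &str) -> Result<String, String> {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after
            .find('}')
            .ok_or_else(|| format!("unterminated ${{ near: {}", &rest[start..]))?;
        let name = &after[..end];
        let value = env::var(name).map_err(|_| format!("environment variable {name} is not set"))?;
        out.push_str(&value);
        // Continue scanning after the closing brace.
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}
```

Failing on an unset variable surfaces a missing token at startup instead of as a later Discord login error; a real implementation might instead fall back to an empty string.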

TUI

The TUI provides four tabs:

Tab	Contents
Dashboard	Overall status, VU meters, latest recognized text
Inputs	Per-device volume, mute, and enable controls
Outputs	Speaker output settings, play-mixed-input toggle
Logs	Scrollable tracing log viewer

Communication between the TUI and the router:

TUI → Router: UiCommand sent via mpsc channel (volume changes, mute toggles, etc.)
Router → TUI: RouterState broadcast via watch::Sender for real-time state synchronization
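The command/state split can be sketched with std types: an `mpsc` receiver drained by the router, and an `Arc<RwLock<_>>` standing in for tokio's `watch` channel. Unlike `watch`, the stand-in does not notify readers of changes, so the TUI would simply re-read it on each frame. The `UiCommand` variants, `RouterState` fields, and `drain_commands` function are hypothetical.

```rust
use std::sync::mpsc::Receiver;
use std::sync::{Arc, RwLock};

/// Hypothetical command set; the real UiCommand variants may differ.
#[derive(Debug)]
pub enum UiCommand {
    SetVolume { input_id: String, volume: f32 },
    ToggleMute { input_id: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct InputState {
    pub id: String,
    pub volume: f32,
    pub muted: bool,
}

/// Hypothetical snapshot published to the TUI.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct RouterState {
    pub inputs: Vec<InputState>,
}

/// Applies all pending commands without blocking, then publishes a fresh snapshot.
pub fn drain_commands(rx: &Receiver<UiCommand>, state: &mut RouterState, shared: &Arc<RwLock<RouterState>>) {
    for cmd in rx.try_iter() {
        match cmd {
            UiCommand::SetVolume { input_id, volume } => {
                if let Some(i) = state.inputs.iter_mut().find(|i| i.id == input_id) {
                    i.volume = volume;
                }
            }
            UiCommand::ToggleMute { input_id } => {
                if let Some(i) = state.inputs.iter_mut().find(|i| i.id == input_id) {
                    i.muted = !i.muted;
                }
            }
        }
    }
    // Readers only ever see the latest complete state, as with a watch channel.
    *shared.write().expect("state lock poisoned") = state.clone();
}
```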

Dependencies

Crate	Purpose
cpal	Cross-platform audio I/O
ringbuf	Lock-free SPSC ring buffer
ratatui + crossterm	Terminal UI
tokio	Async runtime
serde + toml	Configuration
thiserror / anyhow	Error handling
tracing	Logging
whisper-rs	Whisper ASR engine (feature-gated)
serenity	Discord bot (feature-gated)
async-trait	Async trait support
notify	Filesystem watcher for config hot-reload
clap	CLI argument parsing

Roadmap

Phase 1: Foundation ✅

Cargo workspace and crate scaffolding
Core types: config schema, error types, audio primitives
Single input → speaker output passthrough

Phase 2: Multi-Input Mixing ✅

N-to-1 mixer with gain/mute per input
Concurrent multi-device capture
Real-time volume/mute control via atomics

Phase 3: ASR Integration ✅

AsrEngine trait and plugin registry
ASR tap on each capture node
whisper-rs engine integration

Phase 4: Destination Routing ✅

Destination trait and plugin registry
Config-driven routing table
File destination (for testing)
Discord destination (serenity)
Per-destination prefix support

Phase 5: TUI ✅

Event loop and application state
Four-tab layout (Dashboard / Inputs / Outputs / Logs)
VU meters and volume sliders
Bidirectional TUI ↔ Router communication

Phase 6: Polish ✅

Peak level tracking and health status
Input/output handles with enable/disable controls
ASR result forwarding to TUI
Config hot-reload via filesystem watcher

License

This project is licensed under the GNU General Public License v3.0.
