OP-88/Verba.devops

🎯 Verba AI Transcription

Offline-First Audio Transcription with Speaker Diarization & AI Summarization

MIT License • Python 3.8+ • React 18 • FastAPI • Whisper AI • Vercel

🛡️ Privacy-First • 🚀 Lightning-Fast • 🤖 AI-Powered • 📱 Cross-Platform

Complete offline transcription with speaker identification, AI summaries, and export capabilities


🎯 Why Verba?

🔥 What Makes It Special

🛡️ 100% Privacy-First - All AI processing happens offline
🎙️ Speaker Diarization - Identifies who said what
🤖 AI Summarization - Automatic key points & action items
📄 Multi-Format Export - Markdown, PDF, JSON, SRT with metadata
⚡ Enhanced VAD - Smart voice activity detection
⌨️ Keyboard Shortcuts - Power user friendly
📱 Desktop Apps - Tauri-powered native applications

🚀 Perfect For

🎓 Students - Record lectures, meetings, interviews
💼 Professionals - Meeting notes, voice memos
🎬 Content Creators - Video subtitles, podcasts
♿ Accessibility - Voice-to-text for everyone
🔬 Researchers - Interview transcriptions
📝 Writers - Voice-to-draft your ideas


🛠️ Cutting-Edge Tech Stack

| Backend Powerhouse | Frontend Excellence | AI & Processing |
| --- | --- | --- |
| FastAPI | React | OpenAI |
| Python | TypeScript | Whisper |
| SQLite | Vite | WebRTC |

⚡ Quick Start Guide

🎬 Get Running in 4 Steps!

# 🔥 Step 1: Clone the repo
git clone https://github.com/OP-88/Verba.devops.git
cd Verba.devops

# 🚀 Step 2: Backend setup
cd backend
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt

# ⚡ Step 3: Start backend server
uvicorn src.run_fastapi_audio_fixed:app --reload --host 0.0.0.0 --port 8000

# 🎨 Step 4: Frontend setup (new terminal, starting from the repository root)
cd frontend
export VITE_API_URL=http://localhost:8000  # Windows: set VITE_API_URL=http://localhost:8000
npm install && npm run dev

🎉 Visit http://localhost:8080 and start transcribing! 🎉

🌐 Try the Live Demo • 📝 Read the Docs • 🐛 Report Issues


🎆 Complete Feature Set

🎙️ Audio Processing

  • ⚡ Real-Time Transcription - Live microphone recording with instant text conversion
  • 📁 File Upload Support - Process WAV, MP3, M4A, and more audio formats
  • 🔊 Smart VAD - Enhanced voice activity detection with Silero VAD
  • 🎯 Noise Reduction - Advanced audio preprocessing for clarity
  • 📊 Audio Visualization - Real-time waveform and level monitoring

🧑‍💼 Speaker Intelligence

  • 🎙️ Speaker Diarization - Automatic "who said what" identification using pyannote.audio
  • 📊 Speaker Statistics - Speaking time analysis and dominant speaker detection
  • 🏷️ Smart Labeling - Automatic speaker assignment to transcript segments
  • 🔄 Segment Merging - Intelligent combining of short speech segments
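
As a rough illustration of the segment merging and speaker statistics listed above, here is a minimal sketch. It is not Verba's actual implementation: the `Segment` shape, the `max_gap` threshold, and the speaker labels are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Segment(NamedTuple):
    """Hypothetical diarized transcript segment (not Verba's real schema)."""
    speaker: str  # e.g. "SPEAKER_00"
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_segments(segments: List[Segment], max_gap: float = 0.5) -> List[Segment]:
    """Merge consecutive same-speaker segments separated by at most max_gap seconds."""
    merged: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        prev = merged[-1] if merged else None
        if prev is not None and prev.speaker == seg.speaker and seg.start - prev.end <= max_gap:
            # Extend the previous segment instead of starting a new one.
            merged[-1] = Segment(
                prev.speaker,
                prev.start,
                max(prev.end, seg.end),
                f"{prev.text} {seg.text}".strip(),
            )
        else:
            merged.append(seg)
    return merged


def speaking_time(segments: List[Segment]) -> Dict[str, float]:
    """Total speaking time in seconds per speaker."""
    totals: Dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)


def dominant_speaker(segments: List[Segment]) -> Optional[str]:
    """Speaker with the largest total speaking time, or None for empty input."""
    totals = speaking_time(segments)
    return max(totals, key=totals.get) if totals else None
```

The gap threshold decides how readily short fragments are folded together. A larger value gives fewer, longer segments, at the cost of hiding brief pauses.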

🤖 AI-Powered Analysis

  • 📝 Auto-Summarization - T5-powered summaries with key points extraction
  • 🎯 Action Items - Automatic detection of tasks and follow-ups
  • 📈 Sentiment Analysis - Meeting tone and mood detection
  • 💬 Smart Chat - AI assistant for transcript queries (hybrid mode)

📊 Export & Sharing

  • 📄 Multiple Formats - Markdown, PDF, JSON, TXT, SRT with full metadata
  • ⚙️ Customizable Exports - Include/exclude metadata, speakers, summaries
  • 📋 One-Click Copy - Instant clipboard access with formatting
  • 💾 Auto-Save - SQLite database with full history tracking
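
To make the SRT export concrete, the sketch below turns timed segments into SubRip text. It only illustrates the format. The `(start, end, text)` tuple layout is an assumption and is not taken from Verba's export code.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) tuples (an assumed shape) as SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue: running index, time range, text, then a blank separator line.
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)
```

Cues are numbered from 1 and separated by a blank line, which is what most subtitle players expect.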

⌨️ Keyboard Shortcuts

  • Ctrl+R - Start/Stop recording
  • Ctrl+P - Pause/Resume recording
  • Ctrl+C - Copy transcription
  • Ctrl+E - Edit transcription
  • Ctrl+S - Save/Export
  • Esc - Cancel current action

📱 Cross-Platform

  • 🌐 Web App - Modern React interface with PWA support
  • 🖥️ Desktop Apps - Native Tauri applications for Windows, macOS, Linux
  • 📱 Mobile Responsive - Touch-optimized interface for tablets and phones
  • ☁️ Cloud Deploy - One-click Vercel deployment ready

๐Ÿ—๏ธ Project Architecture

graph TB
    A[๐ŸŽค Audio Input] --> B[๐ŸŒŠ WebRTC Stream]
    B --> C[โšก FastAPI Backend]
    C --> D[๐Ÿค– Whisper AI]
    D --> E[๐Ÿ“ Transcription]
    E --> F[๐Ÿ’พ SQLite Storage]
    F --> G[๐Ÿ“ฑ React Frontend]
    G --> H[๐Ÿ‘ค Beautiful UI]
Loading

๐Ÿ“ Crystal Clear Structure

๐Ÿ  verba/
โ”œโ”€โ”€ ๐Ÿš€ backend/           # FastAPI powerhouse
โ”‚   โ”œโ”€โ”€ ๐ŸŽฏ main.py        # Server magic starts here
โ”‚   โ”œโ”€โ”€ ๐Ÿ—ƒ๏ธ models/        # Database schemas
โ”‚   โ”œโ”€โ”€ ๐Ÿ›ฃ๏ธ routes/        # API endpoints
โ”‚   โ”œโ”€โ”€ โš™๏ธ services/      # Whisper AI integration
โ”‚   โ””โ”€โ”€ ๐Ÿ“‹ requirements.txt
โ”œโ”€โ”€ ๐Ÿ’Ž frontend/          # React brilliance
โ”‚   โ”œโ”€โ”€ ๐ŸŽจ src/
โ”‚   โ”‚   โ”œโ”€โ”€ ๐Ÿงฉ components/  # Reusable UI magic
โ”‚   โ”‚   โ”œโ”€โ”€ ๐Ÿ“„ pages/      # Main app screens
โ”‚   โ”‚   โ”œโ”€โ”€ ๐Ÿ”— services/   # API communication
โ”‚   โ”‚   โ””โ”€โ”€ ๐ŸŽฏ types/      # TypeScript definitions
โ”‚   โ”œโ”€โ”€ ๐Ÿ“ฆ package.json
โ”‚   โ””โ”€โ”€ โšก vite.config.ts
โ””โ”€โ”€ ๐Ÿ“š docs/              # Everything you need to know

🎯 API Endpoints

🌐 RESTful API That Just Works

| 🚀 Method | 🎯 Endpoint | 💡 What It Does | ✨ Magic |
| --- | --- | --- | --- |
| GET | /health | 💚 Server heartbeat | Always alive |
| POST | /transcribe | 🎤 Transform audio → text | AI-powered |
| GET | /history | 📜 Your transcription story | Full history |
| POST | /history | 💾 Save your gems | Instant storage |
| DELETE | /history/{id} | 🗑️ Clean up | One-click delete |
| GET | /export/{id} | 📤 Download magic | Multiple formats |
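
For orientation, here is a hedged sketch of calling these endpoints from Python using only the standard library. The endpoint paths and port come from this README. The multipart form field name (`file`) and the JSON response bodies are assumptions, so check the backend's route definitions before relying on them.

```python
import json
import urllib.request
import uuid
from typing import Tuple

BASE_URL = "http://localhost:8000"  # backend address used in the Quick Start


def build_multipart(field: str, filename: str, payload: bytes, boundary: str) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def transcribe_request(audio: bytes, filename: str,
                       base_url: str = BASE_URL,
                       field: str = "file") -> urllib.request.Request:
    """Build (but do not send) a POST /transcribe request.

    The form field name "file" is an assumption, not confirmed by the README.
    """
    boundary = uuid.uuid4().hex
    body, content_type = build_multipart(field, filename, audio, boundary)
    return urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def get_json(path: str, base_url: str = BASE_URL, timeout: float = 5.0):
    """GET a path such as /health or /history and parse the body as JSON (assumed)."""
    with urllib.request.urlopen(f"{base_url}{path}", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage once the backend is running:
#   print(get_json("/health"))
#   with open("clip.wav", "rb") as fh:
#       req = transcribe_request(fh.read(), "clip.wav")
#   with urllib.request.urlopen(req, timeout=120) as resp:
#       print(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the payload first, or to adjust the field name if the backend expects a different one.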

🔌 WebSocket Superpowers

| 🎯 Endpoint | 💫 Real-Time Magic |
| --- | --- |
| /ws/transcribe | ⚡ Live transcription stream |

🎨 Supported Formats & Languages

🎵 Audio Formats

📀 Input Support:
🔊 WAV • MP3 • M4A • FLAC
⚡ Real-time: WebRTC streams
🎯 Optimal: 16 kHz, 16-bit
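
To check whether a WAV file already matches the optimal 16 kHz, 16-bit settings before uploading, a small standard-library helper like the one below may be enough. The function names are illustrative and are not part of Verba.

```python
import io
import wave
from typing import BinaryIO, Union


def is_optimal_wav(source: Union[str, BinaryIO],
                   rate: int = 16_000,
                   sample_width_bytes: int = 2) -> bool:
    """Return True if the WAV at `source` is 16 kHz with 16-bit (2-byte) samples."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate() == rate and wav.getsampwidth() == sample_width_bytes


def make_silent_wav(rate: int, seconds: float = 0.1) -> io.BytesIO:
    """Create a short mono, 16-bit silent WAV in memory (handy for trying the check)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    buffer.seek(0)
    return buffer
```

This only inspects WAV headers. MP3, M4A, and FLAC files would need a decoder, which is outside the scope of this sketch.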

🤖 AI Models:
⚡ Whisper Tiny  → Lightning fast
🎯 Whisper Base  → Balanced magic
🔥 Whisper Large → Ultimate accuracy

๐ŸŒ Global Language Support

๐ŸŒ 90+ Languages Including:
๐Ÿ‡บ๐Ÿ‡ธ English     ๐Ÿ‡ช๐Ÿ‡ธ Spanish     ๐Ÿ‡ซ๐Ÿ‡ท French
๐Ÿ‡ฉ๐Ÿ‡ช German      ๐Ÿ‡ฎ๐Ÿ‡น Italian     ๐Ÿ‡ต๐Ÿ‡น Portuguese  
๐Ÿ‡ท๐Ÿ‡บ Russian     ๐Ÿ‡ฏ๐Ÿ‡ต Japanese    ๐Ÿ‡ฐ๐Ÿ‡ท Korean
๐Ÿ‡จ๐Ÿ‡ณ Chinese     ๐Ÿ‡ฆ๐Ÿ‡ช Arabic      ๐Ÿ‡ฎ๐Ÿ‡ณ Hindi
๐Ÿ”„ Auto-detection magic built-in!

💪 System Requirements

🎯 Minimum Specs

💾 RAM: 4 GB
💿 Storage: 2 GB free
⚡ CPU: Dual-core
🌐 Browser: Chrome 80+ | Firefox 75+ | Safari 13+

🚀 Recommended Power

🔥 RAM: 8 GB+
💿 Storage: 5 GB free
⚡ CPU: Quad-core+
🎮 GPU: CUDA-compatible (optional boost!)

🗺️ Development Roadmap

🎯 The Journey to Transcription Excellence

gantt
    title 🚀 Verba Development Timeline
    dateFormat  YYYY-MM-DD
    section 🏗️ Foundation
    Backend API Core    :active, 2024-09-15, 7d
    Database Schema     :active, 2024-09-16, 5d
    Whisper Integration :2024-09-20, 4d
    section 🎨 Frontend
    React UI Base       :2024-09-18, 6d
    WebRTC Recording    :2024-09-22, 5d
    Real-time Display   :2024-09-25, 4d
    section ✨ Polish
    Export Features     :2024-09-28, 3d
    UI/UX Enhancement   :2024-09-30, 5d
    Testing & Deploy    :2024-10-03, 4d

🎯 Feature Status

| Phase | Feature | Status | Timeline |
| --- | --- | --- | --- |
| 🏗️ | Core API | 🔄 In Progress | Week 1 |
| 🤖 | Whisper AI | ⏳ Planned | Week 2 |
| 🎨 | React UI | 🔄 In Progress | Week 2 |
| ⚡ | Real-time | ⏳ Planned | Week 3 |
| 💎 | Export | ⏳ Planned | Week 4 |

๐Ÿค Join the Revolution

๐ŸŒŸ We Need You!

Help us build the future of voice transcription!

๐ŸŽฏ How to Contribute

# ๐Ÿด Fork it
git clone https://github.com/YOUR-USERNAME/Verba.devops.git

# ๐ŸŒฑ Branch it  
git checkout -b feature/amazing-transcription-magic

# โœจ Code it
# ... your brilliant contributions ...

# ๐Ÿš€ Push it
git push origin feature/amazing-transcription-magic

# ๐ŸŽ‰ PR it - Open a Pull Request!

💡 Contribution Ideas

🎨 Frontend Magic

  • UI/UX improvements
  • New themes & designs
  • Mobile responsiveness
  • Accessibility features

⚡ Backend Power

  • API optimizations
  • New endpoints
  • Database improvements
  • Performance tuning

🤖 AI Enhancement

  • Model optimizations
  • Language support
  • Accuracy improvements
  • Processing speed

๐Ÿ† Recognition Wall

๐ŸŒŸ Hall of Fame ๐ŸŒŸ

Coming soon - your name could be here!

Be the first to contribute and earn your place in Verba history! ๐Ÿš€


๐Ÿ› Known Issues & Solutions

๐Ÿ”ง We're Transparent About Everything

๐Ÿ› Issue ๐Ÿ’ก Status ๐ŸŽฏ Solution
Repository URL verification ๐Ÿ”„ Working Testing clone process
Development environment โšก Priority Automated setup script
Dependency management ๐Ÿ”„ Active Version compatibility check

📞 Get Help & Support

💬 We're Here for You!

🆘 Need Help?

  1. 📚 Check Documentation - docs/ folder
  2. 🔍 Search Issues - GitHub Issues tab
  3. 💬 Ask Questions - Create new issue
  4. 🐛 Report Bugs - Detailed bug reports


📄 License

📜 MIT License - Freedom to Innovate

This project is licensed under the MIT License - see the LICENSE file for details.

🎉 Free to use, modify, and distribute! 🎉


🌟 Star the Repo • Share the Love • Build the Future 🌟


๐Ÿ’ Built with โค๏ธ for Developers by Developers

Transforming the way we interact with audio, one transcription at a time

๐Ÿš€ Ready to revolutionize transcription? Let's build something amazing together! ๐Ÿš€


Made with ๐Ÿ”ฅ passion and โšก cutting-edge technology
