A real-time speech transcription and collaborative editing application powered by OpenAI GPT-4o.
- Real-time Transcription: Stream audio directly to OpenAI's GPT-4o Realtime API for instant speech-to-text
- Collaborative Editing: Google Docs-style real-time collaborative text editing using Yjs and Tiptap
- AI-powered Rewriting: Rewrite and improve transcribed text using GPT-4o with customizable prompts
- Session Management: Create and join transcription sessions with shareable URLs
- Audio Recording Support: Process pre-recorded audio files for transcription
- Framework: Next.js 15 (App Router)
- Real-time Sync: Yjs + Hocuspocus + Tiptap
- AI: OpenAI API (GPT-4o Realtime / Transcribe models)
- State Management: Jotai
- Styling: Tailwind CSS v4
- Node.js 18+
- OpenAI API key with GPT-4o Realtime API access
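The two prerequisites above can be checked before the server starts. The following is a minimal sketch, not code from the CollaRecoX repository: `checkEnvironment` and `EnvCheck` are hypothetical names, and the placeholder string is the one used in the setup instructions in this README.

```typescript
// Hypothetical startup check; the names here are illustrative assumptions.
export interface EnvCheck {
  ok: boolean;
  problems: string[];
}

export function checkEnvironment(
  nodeVersion: string,
  env: Record<string, string | undefined>,
): EnvCheck {
  const problems: string[] = [];

  // process.version looks like "v18.17.0"; compare the major component only.
  const majorText = nodeVersion.replace(/^v/, "").split(".")[0] ?? "";
  const major = Number.parseInt(majorText, 10);
  if (Number.isNaN(major) || major < 18) {
    problems.push(`Node.js 18+ is required, found ${nodeVersion}`);
  }

  // Reject both a missing key and the untouched placeholder value.
  const key = env["OPENAI_API_KEY"];
  if (!key || key === "your_openai_api_key_here") {
    problems.push("OPENAI_API_KEY is missing or still set to the placeholder");
  }

  return { ok: problems.length === 0, problems };
}
```

In a Node.js process this could be called as `checkEnvironment(process.version, process.env)`. Note that it cannot verify whether the key actually has Realtime API access; that only shows up on the first API call.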
# Clone the repository
git clone https://github.com/uehaj/CollaRecoX.git
cd CollaRecoX
# Install dependencies
npm install
# Configure environment variables
cp .env.example .env.local
# Edit .env.local and add your OpenAI API key
Create .env.local with:
OPENAI_API_KEY=your_openai_api_key_here
# Recommended: Use the development script (handles proxy and environment)
bin/dev.sh
# Or with options
bin/dev.sh -f # Force kill existing process on port 8888
bin/dev.sh -l # Enable log file output
bin/dev.sh -f -l # Both options
npm run build
npm run start

- Main Application: http://localhost:8888/realtime
- Collaborative Editor: http://localhost:8888/editor/[sessionId]
- Start a Session: Create or join a transcription session from the main page
- Begin Transcription: Click "Start Recording" to stream audio to OpenAI
- Real-time Updates: Transcribed text appears instantly in the collaborative editor
- Collaborate: Share the session URL for others to view and edit in real-time
- AI Rewrite: Select text and use AI-powered rewriting with custom prompts
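The AI Rewrite step can be pictured as turning the selected text and a custom prompt into a chat-style request. This is a sketch under assumptions rather than the project's implementation: `buildRewriteRequest`, `DEFAULT_PROMPT`, and the use of a system/user message pair are hypothetical, while the `gpt-4o-mini` model name comes from the architecture diagram in this README.

```typescript
// Hypothetical request builder for the rewrite step (illustrative only).
export interface ChatMessage {
  role: "system" | "user";
  content: string;
}

export interface RewriteRequest {
  model: string;
  messages: ChatMessage[];
}

// Assumed fallback prompt; the real application may use a different one.
const DEFAULT_PROMPT =
  "Rewrite the following transcript excerpt so it reads clearly. Keep the meaning unchanged.";

export function buildRewriteRequest(
  selectedText: string,
  customPrompt?: string,
  model: string = "gpt-4o-mini",
): RewriteRequest {
  const text = selectedText.trim();
  if (text.length === 0) {
    throw new Error("Nothing selected to rewrite");
  }
  // An empty or whitespace-only custom prompt falls back to the default.
  const prompt = customPrompt?.trim() || DEFAULT_PROMPT;
  return {
    model,
    messages: [
      { role: "system", content: prompt },
      { role: "user", content: text },
    ],
  };
}
```

Keeping the prompt in the system message and the selection in the user message means the selected text is treated as material to rewrite, not as instructions.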
┌───────────────────────────────┐
│  Browser (Transcription Page) │
│  ┌─────────────────────────┐  │
│  │  Microphone Input       │  │
│  │  Transcription Controls │  │
│  └─────────────────────────┘  │
└───────────────┬───────────────┘
                │ WebSocket (Audio)
                ▼
┌──────────────────────────────────────────┐       ┌─────────────────────────┐
│              Next.js Server              │       │       OpenAI API        │
│                                          │       │                         │
│  ┌────────────────┐   ┌────────────────┐ │       │  ┌───────────────────┐  │
│  │  WebSocket     │──▶│  Hocuspocus    │ │◀─────▶│  │   Realtime API    │  │
│  │  Proxy         │   │  (Yjs Server)  │ │       │  │(gpt-4o-transcribe)│  │
│  │                │◀──│       ▲        │ │       │  └───────────────────┘  │
│  └────────────────┘   └───────┼────────┘ │       │                         │
│                               │          │       │  ┌───────────────────┐  │
│  ┌────────────────┐           │          │◀─────▶│  │    gpt-4o-mini    │  │
│  │  AI Rewrite    │──────────▶│          │       │  │   (AI Rewrite)    │  │
│  └───────▲────────┘           │          │       │  └───────────────────┘  │
│          │                    │          │       │                         │
│          │                    │          │       │                         │
│          │                    │          │       │                         │
└──────────┼────────────────────┼──────────┘       └─────────────────────────┘
           │ AI Rewrite         │WebSocket
           │ Request            │(Yjs Sync)
           │                    ▼
┌──────────┴────────────────────────────────────────┐
│          Browser (Proofreading Page) × N          │
│  ┌─────────────────────────────────────────────┐  │
│  │  Collaborative Editor (Tiptap)              │  │
│  │  AI Rewrite Controls                        │  │
│  └─────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────┘
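The audio leg of the diagram (browser to WebSocket proxy to the Realtime API) can be illustrated with a short sketch. It assumes, without confirmation from the CollaRecoX source, that microphone samples arrive as `Float32Array` chunks in the range -1 to 1 and that the session uses the `pcm16` input format, in which audio is sent as base64 inside an `input_audio_buffer.append` event. The function names are hypothetical.

```typescript
// Hypothetical helpers for preparing microphone audio (illustrative only).
export interface AppendEvent {
  type: "input_audio_buffer.append";
  audio: string; // base64-encoded 16-bit little-endian PCM
}

// Convert float samples in [-1, 1] to 16-bit little-endian PCM bytes.
export function floatToPcm16Bytes(samples: Float32Array): Uint8Array {
  const view = new DataView(new ArrayBuffer(samples.length * 2));
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    const clamped = Math.max(-1, Math.min(1, samples[i] ?? 0));
    const value =
      clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
    view.setInt16(i * 2, value, true); // true = little-endian
  }
  return new Uint8Array(view.buffer);
}

// Wrap one chunk of samples in the event shape expected for appended audio.
export function toAppendEvent(samples: Float32Array): AppendEvent {
  const bytes = floatToPcm16Bytes(samples);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i] ?? 0);
  }
  // btoa is available in browsers and as a global in Node.js 18+.
  return { type: "input_audio_buffer.append", audio: btoa(binary) };
}
```

A client would typically serialize the result with `JSON.stringify` and send it over the WebSocket; whether the conversion happens in the browser or in the proxy is a design choice this README does not specify.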
| Path | Description |
|---|---|
| `/realtime` | Main transcription control panel |
| `/editor/[sessionId]` | Collaborative editing session |
| `/recorder` | Batch audio processing mode |
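The routes above, together with the shareable session URLs mentioned earlier, could be captured in a small helper. This is a sketch only: `routes`, `shareableEditorUrl`, and `DEFAULT_ORIGIN` are hypothetical names, and the default origin is the local development address listed in this README.

```typescript
// Hypothetical route helpers (illustrative only, not from the project source).
const DEFAULT_ORIGIN = "http://localhost:8888";

export const routes = {
  realtime: (): string => "/realtime",
  recorder: (): string => "/recorder",
  // Encode the session ID so characters such as "/" or spaces stay in one path segment.
  editor: (sessionId: string): string => `/editor/${encodeURIComponent(sessionId)}`,
};

export function shareableEditorUrl(
  sessionId: string,
  origin: string = DEFAULT_ORIGIN,
): string {
  if (sessionId.length === 0) {
    throw new Error("sessionId must not be empty");
  }
  return new URL(routes.editor(sessionId), origin).toString();
}
```

For example, `shareableEditorUrl("abc123")` yields `http://localhost:8888/editor/abc123`, which matches the Collaborative Editor URL pattern given earlier.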
MIT License - see LICENSE file for details.
Junji Uehara (@uehaj)