CollaRecoX

A real-time speech transcription and collaborative editing application powered by OpenAI GPT-4o.

Features

Real-time Transcription: Stream microphone audio through the server's WebSocket proxy to OpenAI's Realtime API (gpt-4o-transcribe) for low-latency speech-to-text
Collaborative Editing: Google Docs-style real-time collaborative text editing using Yjs and Tiptap
AI-powered Rewriting: Rewrite and improve transcribed text with customizable prompts, using GPT-4o mini
Session Management: Create and join transcription sessions with shareable URLs
Audio Recording Support: Process pre-recorded audio files for transcription

Tech Stack

Framework: Next.js 15 (App Router)
Real-time Sync: Yjs + Hocuspocus + Tiptap
AI: OpenAI API (GPT-4o Realtime / Transcribe models)
State Management: Jotai
Styling: Tailwind CSS v4

Prerequisites

Node.js 18+
OpenAI API key with GPT-4o Realtime API access

Installation

# Clone the repository
git clone https://github.com/uehaj/CollaRecoX.git
cd CollaRecoX

# Install dependencies
npm install

# Configure environment variables
cp .env.example .env.local
# Edit .env.local and add your OpenAI API key

Configuration

Set the following in .env.local:

OPENAI_API_KEY=your_openai_api_key_here
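Next.js loads .env.local into process.env for server-side code, so a missing or unedited key can be caught before any request reaches OpenAI. A minimal sketch of such a fail-fast check; the helper name requireOpenAIKey is hypothetical and not part of this repository:

```typescript
// Hypothetical helper: returns the OpenAI key from an environment object
// (call it as requireOpenAIKey(process.env) on the server) and throws if the
// key is missing, blank, or still the placeholder from .env.example.
type Env = Record<string, string | undefined>;

const PLACEHOLDER = "your_openai_api_key_here";

function requireOpenAIKey(env: Env): string {
  const key = env.OPENAI_API_KEY?.trim();
  if (!key || key === PLACEHOLDER) {
    throw new Error("OPENAI_API_KEY is not set; add it to .env.local");
  }
  return key;
}
```

Passing the environment in explicitly, rather than reading process.env inside the helper, keeps the check easy to test.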

Usage

Development

# Recommended: Use the development script (handles proxy and environment)
bin/dev.sh

# Or with options
bin/dev.sh -f      # Force-kill any existing process on port 8888
bin/dev.sh -l      # Enable log file output
bin/dev.sh -f -l   # Both options

Production

npm run build
npm run start

Access

Main Application: http://localhost:8888/realtime
Collaborative Editor: http://localhost:8888/editor/[sessionId]

How It Works

Start a Session: Create or join a transcription session from the main page
Begin Transcription: Click "Start Recording" to stream audio to OpenAI
Real-time Updates: Transcribed text is inserted into the collaborative editor as it arrives
Collaborate: Share the session URL so others can view and edit the text in real time
AI Rewrite: Select text and use AI-powered rewriting with custom prompts
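The AI Rewrite step comes down to sending the selected text together with the user's prompt to a chat model. A minimal sketch of building such a request body, assuming the rewrite goes through OpenAI's Chat Completions API with gpt-4o-mini (the model shown in the architecture diagram); the helper name buildRewriteRequest and the default instruction text are hypothetical:

```typescript
// Shape of an OpenAI Chat Completions message and request body.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface RewriteRequest {
  model: string;
  messages: ChatMessage[];
}

// Hypothetical default used when the user leaves the prompt empty.
const DEFAULT_INSTRUCTION =
  "Rewrite the following transcribed text to be clear and grammatical. Keep its meaning.";

// The custom prompt becomes the system message; the text selected in the
// editor becomes the user message.
function buildRewriteRequest(
  selectedText: string,
  customPrompt?: string,
  model = "gpt-4o-mini",
): RewriteRequest {
  const instruction = customPrompt?.trim() || DEFAULT_INSTRUCTION;
  return {
    model,
    messages: [
      { role: "system", content: instruction },
      { role: "user", content: selectedText },
    ],
  };
}
```

On the server, such a body would be POSTed to https://api.openai.com/v1/chat/completions with an Authorization: Bearer header carrying OPENAI_API_KEY, so the key never reaches the browser; the rewritten text comes back in choices[0].message.content.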

Architecture

┌───────────────────────────────┐
│  Browser (Transcription Page) │
│  ┌─────────────────────────┐  │
│  │  Microphone Input       │  │
│  │  Transcription Controls │  │
│  └─────────────────────────┘  │
└───────────────┬───────────────┘
                │ WebSocket (Audio)
                ▼
┌──────────────────────────────────────────┐       ┌───────────────────────┐
│  Next.js Server                          │       │  OpenAI API           │
│                                          │       │                       │
│  ┌────────────────┐   ┌────────────────┐ │       │  ┌─────────────────┐  │
│  │  WebSocket     │──▶│  Hocuspocus    │ │◀─────▶│  │ Realtime API    │  │
│  │  Proxy         │   │  (Yjs Server)  │ │       │  │gpt-4o-transcribe│  │
│  │                │◀──│       ▲        │ │       │  └─────────────────┘  │
│  └────────────────┘   └───────┼────────┘ │       │                       │
│                               │          │       │  ┌─────────────────┐  │
│  ┌────────────────┐           │          │◀─────▶│  │ gpt-4o-mini     │  │
│  │  AI Rewrite    │──────────▶│          │       │  │ (AI Rewrite)    │  │
│  └───────▲────────┘           │          │       │  └─────────────────┘  │
│          │                    │          │       │                       │
│          │                    │          │       │                       │
│          │                    │          │       │                       │
└──────────┼────────────────────┼──────────┘       └───────────────────────┘
           │ AI Rewrite         │WebSocket
           │ Request            │(Yjs Sync)
           │                    ▼
┌──────────┴────────────────────────────────────────┐
│  Browser (Proofreading Page) × N                  │
│  ┌─────────────────────────────────────────────┐  │
│  │  Collaborative Editor (Tiptap)              │  │
│  │  AI Rewrite Controls                        │  │
│  └─────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────┘
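On the audio path, the browser captures microphone samples as 32-bit floats, while OpenAI's Realtime API accepts audio as base64-encoded 16-bit little-endian PCM (the pcm16 input format, 24 kHz mono) inside input_audio_buffer.append client events. A minimal sketch of that conversion, assuming the proxy forwards such events to OpenAI unchanged; the function names are hypothetical, and resampling the microphone stream to 24 kHz is not shown:

```typescript
// Converts Web Audio float samples in [-1, 1] to 16-bit little-endian PCM.
function floatTo16BitPCM(samples: Float32Array): Uint8Array {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale asymmetrically so -1 maps to -32768 and 1 to 32767.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buffer);
}

// Base64-encodes raw bytes with btoa (available in browsers and Node.js 16+).
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Wraps one chunk of microphone audio as a Realtime API client event.
function makeAppendEvent(samples: Float32Array): string {
  return JSON.stringify({
    type: "input_audio_buffer.append",
    audio: toBase64(floatTo16BitPCM(samples)),
  });
}
```

Routing these events through the Next.js server instead of connecting the browser straight to OpenAI keeps the API key server-side, which matches the WebSocket Proxy box in the diagram.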

Key Routes

Path                   Description
`/realtime`            Main transcription control panel
`/editor/[sessionId]`  Collaborative editing session
`/recorder`            Batch audio processing mode
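Because each session lives at /editor/[sessionId], sharing a session only requires building that URL, and the editor page only needs the segment after /editor/. A small sketch of both directions with hypothetical helper names; in the actual app, Next.js's App Router supplies sessionId through the dynamic route segment, so manual parsing like this is illustrative only:

```typescript
// Builds a shareable editor URL; the ID is percent-encoded so unusual
// characters survive the round trip.
function editorUrl(baseUrl: string, sessionId: string): string {
  return new URL(`/editor/${encodeURIComponent(sessionId)}`, baseUrl).toString();
}

// Extracts the session ID from an editor path, or returns null for other routes.
function parseSessionId(pathname: string): string | null {
  const match = /^\/editor\/([^/]+)\/?$/.exec(pathname);
  return match ? decodeURIComponent(match[1]) : null;
}
```

For example, editorUrl("http://localhost:8888", "abc 1") yields http://localhost:8888/editor/abc%201, and parseSessionId("/editor/abc%201") recovers "abc 1".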

License

MIT License. See the LICENSE file for details.

Author

Junji Uehara (@uehaj)
