feat: Add OpenAI Whisper transcription and Gemini translation support by danyuchn · Pull Request #3 · op7418/Youtube-clipper-skill

danyuchn · 2026-01-26T03:08:50Z

🎯 Motivation

Many YouTube videos lack existing subtitles
Claude API costs can be prohibitive for batch processing
Users need more flexible API options

✨ New Features

1. Auto-transcription with OpenAI Whisper API

Handles videos without existing subtitles
Supports long audio (no token limits)
Cost: ~$0.006/minute
Tested: 73-minute video → 209 seconds processing, 2427 segments
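
As a rough check on the quoted rate, transcription cost scales linearly with audio length. The sketch below is illustrative only: `estimate_whisper_cost` is a hypothetical helper, not part of the PR, and the rate is the ~$0.006/minute figure quoted above, which may change.

```python
WHISPER_USD_PER_MINUTE = 0.006  # rate quoted in this PR; verify against current pricing


def estimate_whisper_cost(duration_minutes: float) -> float:
    """Return the estimated transcription cost in USD, rounded to 3 decimals."""
    if duration_minutes < 0:
        raise ValueError("duration_minutes must be non-negative")
    return round(duration_minutes * WHISPER_USD_PER_MINUTE, 3)


# The 73-minute test video from this PR.
print(estimate_whisper_cost(73))
```

For the 73-minute test video this comes to roughly $0.44.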

2. Gemini API integration for translation

93% cost reduction vs Claude API
Gemini 2.5 Flash Lite for batch translation (30 items/batch)
Gemini 2.5 Flash for content generation
Maintains translation quality

3. YouTube HTTP 403 bypass

Uses iOS/Android client parameters
Documented in references/yt-dlp-guide.md
Tested successfully on multiple videos
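
The exact parameters live in references/yt-dlp-guide.md and are not reproduced in this description. As a minimal sketch, assuming the bypass relies on yt-dlp's `--extractor-args` option with the YouTube `player_client` setting, a command line could be assembled as follows. The helper name and the placeholder URL are hypothetical, and which clients work can vary between yt-dlp versions.

```python
import shlex


def build_ytdlp_command(url: str, clients=("ios", "android")) -> list:
    """Build a yt-dlp argv that requests specific YouTube player clients."""
    # Assumption: the guide uses the youtube:player_client extractor argument.
    extractor_args = "youtube:player_client=" + ",".join(clients)
    return ["yt-dlp", "--extractor-args", extractor_args, url]


cmd = build_ytdlp_command("https://www.youtube.com/watch?v=VIDEO_ID")
print(shlex.join(cmd))
```

Returning an argument list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting.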

📊 Tested On

Video: 73-minute Chinese GMAT lecture (no existing subtitles)
Chapters: 18 processed
Success rate: 100%
Total cost: $0.74-0.89 (vs $6.30 with Claude API only)
Processing time: 25-30 minutes

Cost Comparison (18 chapters)

| API | Translation | Content | Total |
| --- | --- | --- | --- |
| Claude API | ~$2.70 | ~$3.60 | ~$6.30 |
| Gemini API | ~$0.15 | ~$0.30 | ~$0.45 |
| Savings | 94% | 92% | 93% |
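
The savings row can be reproduced from the other two rows. The snippet below is only a worked check of that arithmetic; the dollar figures are copied from the table and `savings_percent` is a hypothetical helper.

```python
# Estimated costs in USD for 18 chapters, copied from the table above.
claude = {"translation": 2.70, "content": 3.60}
gemini = {"translation": 0.15, "content": 0.30}


def savings_percent(old: float, new: float) -> int:
    """Percentage saved when moving from `old` cost to `new`, rounded to an integer."""
    return round((1 - new / old) * 100)


for task in claude:
    print(task, savings_percent(claude[task], gemini[task]))
print("total", savings_percent(sum(claude.values()), sum(gemini.values())))
```

The table covers translation and content generation only. At the quoted ~$0.006/minute, Whisper transcription of a 73-minute video adds roughly $0.44, which may account for the gap between the ~$0.45 shown here and the $0.74-0.89 total reported above, though the PR does not break this down.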

🔧 Technical Details

New Scripts

scripts/transcribe_with_openai.py - Whisper API transcription
scripts/translate_with_gemini.py - Gemini batch translation (30 items/batch)
scripts/merge_bilingual_from_json.py - JSON to SRT format conversion
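
To illustrate the kind of conversion scripts/merge_bilingual_from_json.py performs, here is a minimal sketch of turning JSON segments into bilingual SRT. The JSON schema (`start`, `end`, `original`, `translation`), the line order within each cue, and the function names are assumptions made for illustration; the actual script may differ.

```python
import json


def to_srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def json_to_bilingual_srt(payload: str) -> str:
    """Convert a JSON list of segments into bilingual SRT text.

    Assumed (hypothetical) keys per item: start, end (seconds), original, translation.
    """
    blocks = []
    for index, seg in enumerate(json.loads(payload), start=1):
        blocks.append(
            f"{index}\n"
            f"{to_srt_timestamp(seg['start'])} --> {to_srt_timestamp(seg['end'])}\n"
            f"{seg['translation']}\n{seg['original']}\n"
        )
    # A blank line separates consecutive SRT cues.
    return "\n".join(blocks)


sample = json.dumps([
    {"start": 0.0, "end": 2.5, "original": "你好", "translation": "Hello"},
    {"start": 3661.5, "end": 3663.0, "original": "再见", "translation": "Goodbye"},
])
print(json_to_bilingual_srt(sample))
```

Note that SRT uses a comma before the milliseconds, whereas VTT uses a period.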

Updated Documentation

TECHNICAL_NOTES.md: Added sections 11-15 for new technical issues
- Section 11: YouTube HTTP 403 Forbidden
- Section 12: Whisper API transcription
- Section 13: Gemini batch translation optimization
- Section 14: Content generation anti-truncation
- Section 15: JSON → SRT format conversion
FIXES_AND_IMPROVEMENTS.md: Added 2026-01-25 version with complete test results
README.md & README.zh-CN.md: Added API keys configuration section
references/yt-dlp-guide.md: Added HTTP 403 solution
.env.example: Added OPENAI_API_KEY and GEMINI_API_KEY

API Keys Required

# OpenAI API Key (for Whisper transcription)
OPENAI_API_KEY=sk-proj-...

# Gemini API Key (for translation and content generation)
GEMINI_API_KEY=AIza...

📋 Breaking Changes

None - This PR only adds new optional features:

Original Claude API translation support is retained
All new features are opt-in via API keys
Existing workflows continue to work unchanged
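
One way to picture the opt-in behaviour is a selector that falls back to the original path whenever a key is missing. This is a sketch under assumptions: the function name and the backend labels are hypothetical, and only the two environment variable names come from this PR.

```python
import os


def select_backends(env=os.environ) -> dict:
    """Pick transcription and translation backends from the API keys that are set."""
    return {
        # Whisper is used only when a key is present; otherwise rely on existing subtitles.
        "transcription": "whisper" if env.get("OPENAI_API_KEY") else "existing-subtitles",
        # Gemini is opt-in; the original Claude path remains the default.
        "translation": "gemini" if env.get("GEMINI_API_KEY") else "claude",
    }


print(select_backends({"GEMINI_API_KEY": "AIza-example"}))
```

With no keys configured, the selector returns the pre-existing behaviour, which is what "no breaking changes" implies.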

🔍 Key Implementation Details

Whisper Transcription

No token limits for long videos (unlike Gemini 2.0 Flash: 1M token limit)
Automatic language detection
High-quality timestamps in VTT format
Handles 73-minute video without issues
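
Since the transcript arrives as VTT, downstream steps need it split into timed segments. The parser below is a simplified sketch, not the PR's code: `parse_vtt` and its output keys are hypothetical, and it ignores VTT features such as cue settings, styles, and notes.

```python
import re

TS = r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
CUE_LINE = re.compile(rf"^{TS}\s+-->\s+{TS}")


def _seconds(h, m, s, ms) -> float:
    """Convert captured timestamp parts to seconds (hours are optional in VTT)."""
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_vtt(text: str) -> list:
    """Parse WebVTT text into a list of {start, end, text} segments."""
    segments = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = CUE_LINE.match(line)
            if match:
                parts = match.groups()
                segments.append({
                    "start": _seconds(*parts[:4]),
                    "end": _seconds(*parts[4:]),
                    "text": " ".join(lines[i + 1:]).strip(),
                })
                break  # one timing line per cue block
    return segments


sample = """WEBVTT

00:00:00.000 --> 00:00:02.500
first line

00:01:02.250 --> 00:01:04.000
second line
"""
print(parse_vtt(sample))
```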

Gemini Translation

Batch size optimized from 20 → 30 items
Temperature: 0.3 for consistency
JSON output format for easy validation
95% reduction in API calls vs single-item requests
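
The batching and validation ideas above can be sketched without any API call. The helper names are hypothetical, and the assumption that the model returns a JSON array with one translation per input item is made for illustration; the PR only states that output is JSON.

```python
import json
import math

BATCH_SIZE = 30  # batch size reported in this PR


def make_batches(items, size=BATCH_SIZE):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def validate_batch_response(raw_json: str, expected_count: int) -> list:
    """Parse a JSON array of translations and require one output per input."""
    translations = json.loads(raw_json)
    if not isinstance(translations, list) or len(translations) != expected_count:
        raise ValueError("translation count does not match batch size")
    return translations


segments = [f"segment {n}" for n in range(2427)]  # segment count from the test video
batches = make_batches(segments)
print(len(batches), math.ceil(len(segments) / BATCH_SIZE))
```

For the 2427 segments of the test video, batches of 30 mean 81 requests instead of 2427. A count mismatch in a response is a cheap signal that a batch should be retried.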

Anti-truncation Mechanism

max_output_tokens: Increased from 3000 → 8000
3-retry system with completeness validation
Checks that all required sections are present: Xiaohongshu (小红书), Douyin (抖音), WeChat Official Account (微信公众号)
100% success rate after optimization
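
The retry-with-validation idea can be sketched as a loop around any text generator. This is an assumption-laden illustration rather than the PR's implementation: `generate_with_retries` and `is_complete` are hypothetical names, the completeness check is a plain substring test, and the stand-in generator replaces a real Gemini call.

```python
REQUIRED_SECTIONS = ("小红书", "抖音", "微信公众号")  # section names listed in this PR
MAX_ATTEMPTS = 3


def is_complete(text: str) -> bool:
    """True when every required section name appears in the generated text."""
    return all(section in text for section in REQUIRED_SECTIONS)


def generate_with_retries(generate, prompt: str, max_attempts: int = MAX_ATTEMPTS) -> str:
    """Call `generate(prompt)` until the output is complete or attempts run out."""
    for _ in range(max_attempts):
        text = generate(prompt)
        if is_complete(text):
            return text
    raise RuntimeError(f"output still incomplete after {max_attempts} attempts")


# Stand-in generator: truncated on the first call, complete on the second.
outputs = iter(["小红书 ...", "小红书 ... 抖音 ... 微信公众号 ..."])
result = generate_with_retries(lambda prompt: next(outputs), "chapter 1")
print(is_complete(result))
```

Raising after the final attempt, instead of returning partial text, keeps truncated output from silently reaching later steps.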

🧪 Testing

✅ Fully tested on production workload
✅ 73-minute video, 18 chapters, 100% success
✅ All scripts validated with real API calls
✅ Documentation verified and cross-referenced

💡 Future Improvements (Not in this PR)

Support for more Gemini models
Parallel chapter processing
Auto-retry on API failures

Note: This PR represents real-world usage and optimization based on processing a complete 73-minute video with 18 chapters. All features have been tested and validated in production scenarios.

## New Features

- **Auto-transcription**: OpenAI Whisper API for videos without subtitles
  - Supports long audio (no token limits)
  - Cost: ~$0.006/minute
  - Tested on 73-minute video (2427 segments, 209 seconds)
- **Gemini API integration**: 93% cost reduction vs Claude API
  - Gemini 2.5 Flash Lite for translation (batch size: 30)
  - Gemini 2.5 Flash for content generation
  - Cost: ~$0.45 vs ~$6.30 for 18 chapters
- **YouTube HTTP 403 bypass**: iOS/Android client parameters
  - Documented in references/yt-dlp-guide.md

## New Scripts

- scripts/transcribe_with_openai.py: Whisper API transcription
- scripts/translate_with_gemini.py: Gemini batch translation
- scripts/merge_bilingual_from_json.py: JSON to SRT conversion

## Updated Documentation

- TECHNICAL_NOTES.md: Added sections 11-15 for new technical issues
- FIXES_AND_IMPROVEMENTS.md: Added 2026-01-25 version with test results
- README.md & README.zh-CN.md: Added API keys configuration
- references/yt-dlp-guide.md: Added HTTP 403 solution
- .env.example: Added OPENAI_API_KEY and GEMINI_API_KEY

## Test Results

- 73-minute Chinese video (no subtitles)
- 18 chapters processed
- 100% success rate
- Total cost: $0.74-0.89
- Processing time: 25-30 minutes

## Backward Compatibility

- Original Claude API support retained
- All new features are optional
- No breaking changes

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

danyuchn wants to merge 1 commit into op7418:main from danyuchn:feature/multi-api-support
