🎥 YouTube Transcript Processor

Turn YouTube videos into language-learning materials by extracting their transcripts and adapting them to different proficiency levels.

✨ Features

📝 Automatic Transcript Extraction - Fetch transcripts from YouTube videos
🌏 Multi-language Support - Optimized for Cantonese (粵語) transcripts
🧠 AI-Powered Processing - Transform content to match different language proficiency levels
⚡ Smart Text Chunking - Split content into chunks that fit the token limit without breaking sentences
📊 Token Counting - Precise token management using tiktoken
💾 File Output - Save processed results to text files
🎧 Podcast Integration - Output files can be imported into ElevenReader to turn YouTube videos into English podcasts

🚀 Quick Start

Basic Usage

```python
from main import process_youtube

# Process a YouTube video
video_url = "https://www.youtube.com/watch?v=YOUR_VIDEO_ID"
results = process_youtube(video_url, level="b1", max_tokens=4000)

# `results` is a list of processed text chunks; they are also saved
# automatically to text files in the text/ directory
```

Command Line Usage

```shell
python main.py
```

📋 How It Works

🔗 URL Parsing - Extracts the video ID from a YouTube URL
📜 Transcript Retrieval - Fetches the Cantonese transcript using the YouTube Transcript API
✂️ Smart Chunking - Splits the text into chunks that fit the token limit while keeping sentences intact
🤖 AI Processing - Sends each chunk to the AI model for language-level adaptation
💾 File Export - Saves processed content to organized text files
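The URL-parsing step can be sketched with only the standard library. This is a hypothetical helper, not the code in main.py; it assumes the common `watch?v=`, `youtu.be/`, `/embed/`, `/shorts/` and `/live/` URL shapes and the usual 11-character video ID format.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Video IDs are typically 11 characters drawn from letters, digits, "_" and "-".
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a common YouTube URL, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]

    if host == "youtu.be":
        # Short link: https://youtu.be/VIDEO_ID
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Standard link: https://www.youtube.com/watch?v=VIDEO_ID
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif parsed.path.startswith(("/embed/", "/shorts/", "/live/")):
            # Path-style links: /embed/VIDEO_ID, /shorts/VIDEO_ID, /live/VIDEO_ID
            candidate = parsed.path.split("/")[2]
        else:
            return None
    else:
        return None

    return candidate if _VIDEO_ID.fullmatch(candidate) else None
```

For example, `extract_video_id("https://youtu.be/dQw4w9WgXcQ?t=42")` returns `"dQw4w9WgXcQ"`; extra query parameters such as timestamps are ignored.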

🛠️ Configuration

Language Levels (CEFR)

a1 - Beginner
a2 - Elementary
b1 - Intermediate
b2 - Upper Intermediate
c1 - Advanced
c2 - Proficient
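If you call `process_youtube` from your own code, a small guard like the one below can catch typos in `level` early. It is a hypothetical helper; this README does not say whether main.py validates the value or accepts upper-case codes.

```python
# CEFR levels accepted by `level`, as listed above.
CEFR_LEVELS = {
    "a1": "Beginner",
    "a2": "Elementary",
    "b1": "Intermediate",
    "b2": "Upper Intermediate",
    "c1": "Advanced",
    "c2": "Proficient",
}


def normalize_level(level: str) -> str:
    """Return `level` as a lower-case CEFR code, raising ValueError if unknown."""
    code = level.strip().lower()
    if code not in CEFR_LEVELS:
        expected = ", ".join(CEFR_LEVELS)
        raise ValueError(f"unknown level {level!r}; expected one of: {expected}")
    return code
```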

Token Limits

Default: 4000 tokens per chunk
Adjustable via `max_tokens` to fit your AI model's context window
Sentences that exceed the token limit on their own are handled automatically
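Sentence-preserving chunking can be sketched as a greedy packer: add whole sentences to the current chunk until the next one would exceed `max_tokens`, and hard-split only a sentence that is too long by itself. This is an illustrative sketch, not the project's implementation. The token counter is passed in as `count_tokens` so the example runs without tiktoken, and the sentence splitter is a naive punctuation rule (an assumption; main.py may split differently).

```python
import re
from typing import Callable, List

# A sentence is any run of text ending in Western or full-width CJK punctuation,
# or a trailing fragment with no end punctuation.
_SENTENCE = re.compile(r"[^.!?。！？]*[.!?。！？]+|[^.!?。！？]+")


def split_sentences(text: str) -> List[str]:
    """Naively split text into sentences (decimals like 3.5 will also split)."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def _hard_split(sentence: str, max_tokens: int, count_tokens: Callable[[str], int]) -> List[str]:
    """Split one oversized sentence character by character to respect max_tokens."""
    pieces: List[str] = []
    piece = ""
    for ch in sentence:
        if piece and count_tokens(piece + ch) > max_tokens:
            pieces.append(piece)
            piece = ch
        else:
            piece += ch
    if piece:
        pieces.append(piece)
    return pieces


def chunk_text(
    text: str,
    max_tokens: int,
    count_tokens: Callable[[str], int],
    sep: str = "",
) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens tokens.

    Use sep="" for Chinese text and sep=" " for space-delimited languages.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    chunks: List[str] = []
    current = ""
    for sentence in split_sentences(text):
        if count_tokens(sentence) > max_tokens:
            # Flush what we have, then break the long sentence into pieces.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(_hard_split(sentence, max_tokens, count_tokens))
            continue
        candidate = f"{current}{sep}{sentence}" if current else sentence
        if count_tokens(candidate) <= max_tokens:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

With `count_tokens=len` (characters as a stand-in for tokens), `chunk_text("你好。我哋今日去行山。好開心！", 12, len)` yields `["你好。我哋今日去行山。", "好開心！"]`. With tiktoken installed, a real counter could be built from `enc = tiktoken.get_encoding("cl100k_base")` as `lambda s: len(enc.encode(s))`; tiktoken uses OpenAI's tokenizers, so for Gemini its counts are an approximation.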

📁 Project Structure

```
youtube-transcript-processor/
├── main.py           # Main processing script
├── robot.py          # AI model interface
├── text/             # Output directory for processed files
├── requirements.txt  # Python dependencies
└── README.md         # This file
```
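The `text/` output directory is where processed files land. A minimal sketch of that export step, assuming one UTF-8 file per video and level; the `save_chunks` name and the `<video_id>_<level>.txt` naming scheme are hypothetical, not taken from main.py.

```python
from pathlib import Path
from typing import List


def save_chunks(chunks: List[str], video_id: str, level: str, out_dir: str = "text") -> Path:
    """Write processed chunks to <out_dir>/<video_id>_<level>.txt and return the path."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{video_id}_{level}.txt"
    # A blank line between chunks keeps the file readable and easy to upload as plain text.
    path.write_text("\n\n".join(chunks), encoding="utf-8")
    return path
```

A single plain-text file per video is also the simplest thing to upload to a text-to-speech reader.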

🔧 API Reference

`process_youtube(link, level, max_tokens=4000, is_chinese=True)`

Parameters:

link (str): YouTube video URL
level (str): Target language proficiency level (a1, a2, b1, b2, c1, or c2)
max_tokens (int): Maximum tokens per chunk (default 4000)
is_chinese (bool): Enable Chinese text processing (default True)

Returns:

List of processed text chunks

`get_youtube_transcript(video_url)`

Parameters:

video_url (str): YouTube video URL

Returns:

The full transcript text, or None if an error occurs
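This README does not show how `get_youtube_transcript` assembles the text. A minimal sketch, assuming the transcript arrives as a list of caption segments, each a dict with a `"text"` key (the raw shape youtube-transcript-api has historically returned, alongside `"start"` and `"duration"`). The fetching call is injected as `fetch` so the example runs without network access; both function names are hypothetical.

```python
from typing import Callable, Iterable, Mapping, Optional

Segment = Mapping[str, object]


def transcript_to_text(segments: Iterable[Segment], sep: str = "") -> str:
    """Join each segment's "text", collapsing internal whitespace and skipping blanks."""
    parts = []
    for segment in segments:
        # Caption lines often contain "\n"; rejoin their words with `sep`.
        text = sep.join(str(segment.get("text", "")).split())
        if text:
            parts.append(text)
    return sep.join(parts)


def get_transcript_text(
    video_id: str,
    fetch: Callable[[str], Iterable[Segment]],
    sep: str = "",
) -> Optional[str]:
    """Fetch segments with `fetch` and join them; return None if anything fails."""
    try:
        return transcript_to_text(fetch(video_id), sep)
    except Exception:
        # Mirrors the documented "None if an error occurs" contract.
        return None
```

`sep=""` suits Chinese captions, which do not separate words with spaces; pass `sep=" "` for space-delimited languages.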

🎯 Use Cases

Language Learning - Adapt YouTube content to your proficiency level
Content Creation - Generate educational materials from videos
Research - Process video content for analysis
Accessibility - Create readable transcripts from video content
🎧 Podcast Creation - Use with ElevenReader to transform YouTube videos into English podcasts for on-the-go learning

🔍 Example Output

Input: A complex Cantonese YouTube video
Output: English text adapted to B1 level, with sentence structure and vocabulary suited to that level

Original: 今日我哋要講嘅係一個好複雜嘅概念...
Processed: Today what we're going to talk about is a very complex concept...
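The AI step in robot.py is not documented here. One way to prompt a model for this kind of level-adapted translation is sketched below; the prompt wording, `build_prompt`, and `adapt_chunks` are hypothetical, and `complete` stands in for whatever model call robot.py actually makes (the Acknowledgments credit Gemini).

```python
from typing import Callable, List


def build_prompt(chunk: str, level: str) -> str:
    """Build a hypothetical instruction asking a model for level-adapted English."""
    return (
        "Translate the following Cantonese transcript excerpt into English for a "
        f"CEFR {level.upper()} learner. Use vocabulary and sentence structures "
        "suited to that level while keeping the original meaning.\n\n"
        f"Transcript:\n{chunk}"
    )


def adapt_chunks(chunks: List[str], level: str, complete: Callable[[str], str]) -> List[str]:
    """Send each chunk's prompt to `complete` (any text-in, text-out model call) in order."""
    return [complete(build_prompt(chunk, level)).strip() for chunk in chunks]
```

Keeping `complete` injectable means the loop can be tested with a stub and pointed at any model SDK without changing the chunking or prompting code.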

🤝 Contributing

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

YouTube Transcript API for transcript extraction
tiktoken for accurate token counting
Gemini for AI processing capabilities

📞 Support

If you encounter any issues or have questions:

Open an issue on GitHub
Check the Wiki for detailed documentation
Join our Discussions for community support

🔄 Integration with ElevenReader

Transform your processed transcripts into engaging audio content:

Process YouTube Video - Extract and adapt transcript using this tool
Export Text File - Save the processed content to a text file
Upload to ElevenReader - Visit ElevenReader.io and upload your text file
Generate Podcast - Convert your adapted transcript into an English podcast
Listen & Learn - Enjoy your personalized audio content on any device

Perfect Workflow:

YouTube Video → Transcript Extraction → AI Processing → Text File → ElevenReader → English Podcast

This integration allows you to:

Turn YouTube videos that have transcripts into English learning material
Create audio content at your desired proficiency level (use with ElevenReader)
Learn through multiple modalities (reading + listening)

Made with ❤️ for language learners worldwide
