Transform any EPUB book into different English proficiency levels using AI! Perfect for language learners who want to enjoy their favorite books while improving their English skills.
## ✨ Features

- 🔄 Smart Text Processing: Converts EPUB files to different English proficiency levels (A1, A2, B1, B2, C1, C2)
- 🧠 AI-Powered Translation: Uses advanced AI models to maintain story coherence while adjusting language complexity
- 📖 EPUB Support: Handles standard EPUB format with proper HTML content extraction
- 🌐 Multi-language Support: Special handling for Chinese text with custom sentence tokenization
- ⚡ Token-Aware Chunking: Intelligently splits text while preserving sentence boundaries
- 📝 Clean Output: Generates processed text files ready for reading
## 📦 Installation

```bash
pip install ebooklib beautifulsoup4 nltk tiktoken requests rich
```

## 🚀 Quick Start

```python
from epub_transfer import process_epub_file

# Process an EPUB file to B1 level English
results = process_epub_file(
    epub_path="your_book.epub",
    level="b1",
    max_tokens=4000,
    is_chinese=False,  # Set to True for Chinese books
)

# Save the result
with open("output_book_b1.txt", "w", encoding="utf-8") as f:
    f.write("".join(results))
```

## 🛠️ How It Works

- 📖 Text Extraction: Reads EPUB files and extracts clean text content
- ✂️ Smart Chunking: Splits text into manageable chunks while preserving sentence structure
- 🤖 AI Processing: Sends each chunk to an AI model for language level conversion
- 📝 Output Generation: Combines processed chunks into a coherent final text
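The chunking step above can be sketched as a greedy sentence packer. This is an illustrative sketch, not the tool's actual implementation: `chunk_by_sentences` and the `count_tokens` callback are hypothetical names, and the regex split stands in for NLTK's sentence tokenizer.

```python
import re

def chunk_by_sentences(text, max_tokens, count_tokens):
    """Greedily pack whole sentences into chunks of at most max_tokens
    tokens; a single over-long sentence still becomes its own chunk."""
    # Naive split on terminal punctuation; the real tool uses NLTK.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sentence in sentences:
        candidate = (current + " " + sentence).strip()
        if current and count_tokens(candidate) > max_tokens:
            chunks.append(current)  # close the current chunk
            current = sentence      # start a new one with this sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Because chunks only break between sentences, each piece sent to the AI model stays grammatically complete.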
## 📈 Proficiency Levels

| Level | Name | Description |
|---|---|---|
| A1 | Beginner | Basic words and phrases |
| A2 | Elementary | Simple sentences and common topics |
| B1 | Intermediate | Clear standard language on familiar topics |
| B2 | Upper-Intermediate | Complex texts and abstract topics |
| C1 | Advanced | Flexible and effective language use |
| C2 | Proficient | Near-native fluency |
## ⚙️ Configuration

Adjust the `max_tokens` parameter based on your AI model's capabilities:

```python
max_tokens = 4000  # Standard limit
max_tokens = 8000  # For more context-aware processing
```

For Chinese books, enable special sentence tokenization:

```python
process_epub_file(
    epub_path="chinese_book.epub",
    level="b1",
    is_chinese=True,  # Enables Chinese sentence splitting
)
```
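Standard English sentence splitters miss full-width Chinese punctuation, which is why `is_chinese=True` switches to a custom tokenizer. A minimal sketch of that idea (the function name and regex are illustrative, not the tool's actual code):

```python
import re

def split_chinese_sentences(text):
    """Split Chinese text after full-width terminal punctuation
    (。！？…), keeping each mark attached to its sentence."""
    parts = re.split(r"(?<=[。！？…])", text)
    return [p.strip() for p in parts if p.strip()]
```

For example, `split_chinese_sentences("你好。我们走吧？")` yields `["你好。", "我们走吧？"]`.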
## 📊 Token Counting
The tool uses `tiktoken` for accurate token counting, ensuring chunks stay within AI model limits:
```python
from epub_transfer import count_tokens
text = "Your sample text here"
token_count = count_tokens(text)
print(f"Text contains {token_count} tokens")
```
## 🤝 Contributing

We welcome contributions! Here are some ways you can help:
- 🐛 Bug Reports: Found an issue? Let us know!
- 💡 Feature Requests: Have ideas for improvements?
- 🔧 Code Contributions: Submit pull requests
- 📚 Documentation: Help improve our docs
## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments

- Built with ebooklib for EPUB processing
- Uses NLTK for natural language processing
- Powered by tiktoken for token counting
- Enhanced with Beautiful Soup for HTML parsing
⭐ Star this repo if you find it helpful! ⭐
Transform your reading experience, one book at a time. 📖✨