A ComfyUI custom node package that provides OCR (Optical Character Recognition) powered by PaddleOCR, plus configurable text rendering onto images.
- Text Recognition: Extract text from images with high accuracy using PaddleOCR
- Layout Analysis: Analyze document structure and text positioning
- Bounding Box Detection: Get precise coordinates of detected text regions
- Multi-language Support: Supports various languages through PaddleOCR
- JSON Output: Structured OCR results with text content and positioning data
- JSON-based Text Configuration: Define text properties through JSON input including:
  - Text content and positioning
  - Font selection from included font library
  - Font size and styling options
  - Text color and stroke effects
  - Text alignment and wrapping
- Rich Font Library: Includes multiple Chinese fonts (华文系列, 微软雅黑)
- Advanced Typography: Support for character-per-line control, leading, and text wrapping
- Stroke Effects: Customizable text outlines with color and width control
Performs OCR on input images and extracts text with positioning information.
Inputs:
image: Input image for OCR processing
Outputs:
Texts: Extracted text content
x_offsets, y_offsets: Text position coordinates
widths, heights: Text bounding box dimensions
img_width, img_height: Original image dimensions
ocr_results_json: Complete OCR results in JSON format
Mask Image: Generated mask for detected text regions
Result Image: Annotated image with OCR results
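The positional outputs read like parallel lists with one entry per detected text region. Assuming that is the case (the exact types are not documented here), the sketch below zips them into per-region records. `TextBox` and `to_boxes` are hypothetical names used only for illustration; they are not part of this package.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBox:
    """One detected text region (hypothetical helper type)."""
    text: str
    x: int
    y: int
    width: int
    height: int

    @property
    def right(self) -> int:
        return self.x + self.width

    @property
    def bottom(self) -> int:
        return self.y + self.height


def to_boxes(texts, x_offsets, y_offsets, widths, heights) -> List[TextBox]:
    """Zip the node's parallel output lists into TextBox records.

    Assumes every list has one entry per detected region.
    """
    lengths = {len(texts), len(x_offsets), len(y_offsets), len(widths), len(heights)}
    if len(lengths) != 1:
        raise ValueError("OCR output lists must all have the same length")
    return [
        TextBox(t, x, y, w, h)
        for t, x, y, w, h in zip(texts, x_offsets, y_offsets, widths, heights)
    ]


# Example with made-up values
boxes = to_boxes(["Hello", "World"], [10, 10], [20, 60], [120, 130], [30, 30])
```

Grouping the values this way makes downstream steps, such as building a mask or positioning translated text, easier to write than indexing five separate lists.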
Renders text onto images with extensive customization options.
Inputs:
background_image: Base image for text overlay
text: Text content to render (supports multiline)
font_file: Font selection from available fonts
align: Text alignment options
char_per_line: Characters per line for text wrapping
leading: Line spacing control
font_size: Text size (1-2500px)
text_color: Text color specification
stroke_width: Outline thickness
stroke_color: Outline color
x_offset, y_offset: Text positioning
text_width, text_height: Text area dimensions
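To illustrate how `char_per_line` and `leading` might interact, here is a minimal sketch of character-count wrapping and line placement. It assumes wrapping is a plain character count (reasonable for CJK text) and that `leading` is extra pixels added between lines; the node's actual behaviour may differ. The function names are hypothetical.

```python
from typing import List, Tuple


def wrap_by_char_count(text: str, char_per_line: int) -> List[str]:
    """Split text into lines of at most `char_per_line` characters.

    Existing newlines are kept as hard breaks. This is a plain character
    count; it does not look for word boundaries.
    """
    if char_per_line < 1:
        raise ValueError("char_per_line must be at least 1")
    lines: List[str] = []
    for paragraph in text.split("\n"):
        if not paragraph:
            lines.append("")  # preserve blank lines
            continue
        for start in range(0, len(paragraph), char_per_line):
            lines.append(paragraph[start:start + char_per_line])
    return lines


def line_positions(lines, x_offset, y_offset, font_size, leading) -> List[Tuple[int, int]]:
    """Top-left position of each line.

    Assumption: `leading` is extra vertical space in pixels between lines.
    """
    step = font_size + leading
    return [(x_offset, y_offset + i * step) for i in range(len(lines))]


lines = wrap_by_char_count("ComfyUI文字渲染示例\nOCR", char_per_line=6)
positions = line_positions(lines, x_offset=100, y_offset=200, font_size=72, leading=8)
```

Each `(x, y)` pair could then be passed to a drawing call such as Pillow's `ImageDraw.text`, which accepts `stroke_width` and `stroke_fill` for outlines.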
Converts bounding box coordinates to image masks.
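The conversion can be sketched in plain Python as follows. This is only an illustration of the logic, assuming boxes are given as `(x, y, width, height)` in pixels; `boxes_to_mask` is a hypothetical name, and a real ComfyUI node would normally produce a tensor rather than nested lists.

```python
from typing import Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def boxes_to_mask(boxes: Iterable[Box], img_width: int, img_height: int) -> List[List[int]]:
    """Return a row-major mask: 1 inside any box, 0 elsewhere.

    Boxes are clipped to the image bounds.
    """
    mask = [[0] * img_width for _ in range(img_height)]
    for x, y, w, h in boxes:
        x0, y0 = max(0, x), max(0, y)
        x1, y1 = min(img_width, x + w), min(img_height, y + h)
        if x1 <= x0:
            continue  # box lies entirely outside the image horizontally
        for row in range(y0, y1):
            mask[row][x0:x1] = [1] * (x1 - x0)
    return mask


# Second box extends past the image and is clipped
mask = boxes_to_mask([(1, 1, 3, 2), (6, 3, 10, 10)], img_width=8, img_height=5)
```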
Processes translation results and applies them to images.
The text rendering system supports JSON input for complex text layouts:
{
"text": "Your text content",
"font": "font_name.ttf",
"size": 72,
"color": "#FFFFFF",
"stroke_color": "#000000",
"stroke_width": 2,
"align": "center",
"x": 100,
"y": 200,
"width": 400,
"height": 100
}
- Clone this repository into your ComfyUI custom_nodes directory:
cd ComfyUI/custom_nodes/
git clone https://github.com/your-repo/Comfyui-PaddleOCR.git
- Install required dependencies:
pip install paddleocr pillow opencv-python numpy torch torchvision
- Restart ComfyUI to load the new nodes.
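If the nodes fail to appear after a restart, a quick check that the dependencies are importable can help narrow things down. The script below is a small sketch, not something shipped with this package; it maps each pip package name from the install command to the module name it provides and reports any that cannot be found.

```python
import importlib.util

# pip package name -> importable module name
REQUIRED_MODULES = {
    "paddleocr": "paddleocr",
    "pillow": "PIL",
    "opencv-python": "cv2",
    "numpy": "numpy",
    "torch": "torch",
    "torchvision": "torchvision",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names whose module cannot be found in this environment."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it with the same Python interpreter that ComfyUI uses, since packages installed into a different environment will not be visible to the nodes.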
The package includes a comprehensive font library located in nodes/font_dir/:
- Multiple Chinese fonts (华文中宋, 华文仿宋, 华文彩云, etc.)
- Microsoft YaHei (微软雅黑) regular and bold
- Support for TTF and OTF font formats
To add custom fonts:
- Place font files (.ttf or .otf) in the nodes/font_dir/ directory
- Restart ComfyUI to refresh the font list
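To confirm that a newly added font would be picked up, the directory can be scanned for the supported extensions. The following is a hedged sketch of such a scan; `list_fonts` is a hypothetical helper and may not match how the package builds its font list internally.

```python
from pathlib import Path
from typing import List

FONT_EXTENSIONS = {".ttf", ".otf"}


def list_fonts(font_dir) -> List[str]:
    """Return sorted file names of TTF/OTF fonts found directly in `font_dir`."""
    directory = Path(font_dir)
    if not directory.is_dir():
        return []
    return sorted(
        p.name
        for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in FONT_EXTENSIONS
    )
```

For example, `list_fonts("nodes/font_dir")` run from the package root would return the font file names available for selection, under the assumptions above.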
- Document Processing: Extract and analyze text from scanned documents
- Image Translation: OCR text extraction followed by translation overlay
- Content Creation: Add styled text overlays to images
- Data Extraction: Automated text extraction from images for data processing
- Multilingual Content: Handle text in various languages with appropriate fonts
- ComfyUI
- Python 3.8+
- PaddleOCR
- PIL (Pillow)
- OpenCV
- NumPy
- PyTorch
Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.