transcript-pl is a Node.js application designed to process document files (such as PDFs) by converting them into images and extracting text using OCR technologies (Tesseract and Google Vision). It supports AI-based transcription and can be configured for multiple languages.
- PDF to image conversion
- OCR with Tesseract and Google Vision
- AI-powered transcription (OpenAI GPT)
- Configurable via YAML file
- CLI interface for flexible usage
`git clone <repository-url>`

`cd transcript-pl`

`npm install`

`npm run setup:tessdata`

The `setup:tessdata` script downloads the required .traineddata files for Tesseract OCR into resources/tessdata.
Edit the configuration file at src/config/config.yaml to set your API keys, languages, and other options.
- For Google Vision, set the path to your API key JSON file.
- For OpenAI, set your API key and model.
Run the application from the command line:
`node src/app.js --document <path-to-document> [options]`

| Argument | Alias | Type | Description | Required |
|---|---|---|---|---|
| --document | -d | string | Path to the document file to process | Yes |
| --pages | -p | string | List of pages to process via AI and image generation (e.g., 1,3,5) | No |
| --lang | -l | string | Source language code for the document (e.g., pl, en, fr). Used for all OCR engines. | No |
| --target-langs | -t | string | Comma-separated list of target languages for AI translation (e.g., fr,en,de) | No |
| --example | -e | string | Path to an example file to improve AI transcription | No |
| --docType | -D | string | Type of document (e.g., mémoire historique, acte de naissance). Used for AI prompt. | Yes |
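
The argument table can be illustrated with a small parser. This is only a sketch built on Node's built-in `util.parseArgs`; the helper name `parseCliArgs` and the shape of the returned object are assumptions, and the application's actual parser may be implemented differently.

```javascript
const { parseArgs } = require('node:util');

// Hypothetical helper: option names and aliases mirror the table above,
// but this is not necessarily how src/app.js parses its arguments.
function parseCliArgs(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      document: { type: 'string', short: 'd' },
      pages: { type: 'string', short: 'p' },
      lang: { type: 'string', short: 'l' },
      'target-langs': { type: 'string', short: 't' },
      example: { type: 'string', short: 'e' },
      docType: { type: 'string', short: 'D' },
    },
  });

  // --document and --docType are marked as required in the table.
  for (const required of ['document', 'docType']) {
    if (!values[required]) {
      throw new Error(`Missing required argument --${required}`);
    }
  }

  // Split a comma-separated string into trimmed, non-empty items.
  const splitList = (s) =>
    s ? s.split(',').map((item) => item.trim()).filter(Boolean) : [];

  return {
    document: values.document,
    docType: values.docType,
    lang: values.lang,
    example: values.example,
    pages: splitList(values.pages).map(Number),
    targetLangs: splitList(values['target-langs']),
  };
}
```

A caller would typically pass `process.argv.slice(2)`. In this sketch, omitting `--pages` or `--target-langs` yields an empty array; how the application itself treats a missing page list is not specified here.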
Example:
`node src/app.js --document input/sample.pdf --pages 1,2,3 --lang pl --target-langs fr,en --docType "mémoire historique"`

All main settings are in src/config/config.yaml:
- lang:
  - source: language code of the document (used for OCR, required)
  - target: array of language codes for AI translation (optional, can be empty)
- docType: Not in YAML! Always provide via CLI.
- ai: Enable/disable AI, set OpenAI API key, model, etc.
- google_vision: Enable/disable, set API key path.
- tesseract: Enable/disable.
- pdf_to_image: Set DPI for image conversion.
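
The rules for the `lang` section (a required `source`, an optional `target` array) could be checked once the YAML has been parsed into an object. The following is a minimal sketch under that assumption; `validateLangConfig` is a hypothetical helper, not a function from the project, and it deliberately ignores the other sections because their exact keys are not documented here.

```javascript
// Hypothetical validator for the already-parsed config object.
// Only the lang.source / lang.target rules described above are checked.
function validateLangConfig(config) {
  const lang = (config && config.lang) || {};

  // lang.source is required and must be a non-empty language code.
  if (typeof lang.source !== 'string' || lang.source.trim() === '') {
    throw new Error('lang.source is required (e.g. "pl")');
  }

  // lang.target is optional; a missing value is treated as an empty array.
  const target = lang.target ?? [];
  if (!Array.isArray(target) || target.some((code) => typeof code !== 'string')) {
    throw new Error('lang.target must be an array of language codes');
  }

  return { source: lang.source.trim(), target };
}
```

For example, `validateLangConfig({ lang: { source: 'pl' } })` returns a `target` of `[]`, which corresponds to the case where no translation is requested.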
- The PDF is converted to images, and each page is processed by all enabled OCR engines (Tesseract, Google Vision).
- For each page, the AI (OpenAI GPT) improves the text in the source language using all available OCR transcriptions.
- If target languages are specified, the improved text is then translated by the AI into each target language.
- Result:
  - If no target language is set, you get only the improved text in the source language.
  - If target languages are set, you get both the improved source text and AI translations for each target language.
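
The per-page flow described above can be sketched as a single async function. Everything passed in (`ocrEngines`, `improveText`, `translateText`) is a hypothetical stand-in for the real Tesseract, Google Vision, and OpenAI integrations, whose actual interfaces are not shown in this document.

```javascript
// Sketch of the per-page flow. The engine objects and the improveText /
// translateText callbacks are assumptions used for illustration only.
async function processPage(pageImage, options) {
  const { ocrEngines, improveText, translateText, sourceLang, targetLangs = [] } = options;

  // 1. Run every enabled OCR engine on the same page image.
  const transcriptions = await Promise.all(
    ocrEngines.map(async (engine) => ({
      engine: engine.name,
      text: await engine.recognize(pageImage, sourceLang),
    }))
  );

  // 2. Let the AI produce one improved text from all OCR transcriptions.
  const improved = await improveText(transcriptions, sourceLang);

  // 3. Translate the improved text into each target language, if any.
  const translations = {};
  for (const lang of targetLangs) {
    translations[lang] = await translateText(improved, sourceLang, lang);
  }

  return { improved, translations };
}
```

With an empty `targetLangs`, the returned `translations` object is empty and only `improved` carries content, matching the first result case above.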
`npm run setup:tessdata`: Downloads Tesseract .traineddata files for supported languages.
Note: Make sure you have the required API keys for Google Vision and OpenAI before running the application.