A Windows Forms application for OCR text extraction from images using EmguCV and Tesseract.
This project was originally created ten years ago and had remained unchanged since then. In 2025 I decided to modernize it and remove the cumbersome manual EmguCV/OpenCV installation process. The new version pulls in its dependencies as NuGet packages instead, which makes setup much simpler.
- OCR text extraction from images
- Support for multiple image formats (JPG, PNG, GIF, BMP)
- Modern async/await patterns for responsive UI
- No manual environment setup required
- All dependencies managed through NuGet
- Windows 10 or later
- .NET 10 SDK
- Visual Studio 2022 or VS Code with C# Dev Kit
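
To confirm the SDK requirement before building, a quick check along these lines may help. This is a minimal sketch for a POSIX shell such as Git Bash; `has_sdk_major` is a hypothetical helper and not part of the repository. It relies only on `dotnet --list-sdks`, which prints one installed SDK version per line.

```shell
# Hypothetical helper (not part of this repository): reads the output of
# `dotnet --list-sdks` on stdin and succeeds if any listed SDK version
# starts with the given major number.
has_sdk_major() {
  grep -q "^$1\."
}

# Report whether a .NET 10 SDK is installed; also handles a missing dotnet CLI.
if command -v dotnet >/dev/null 2>&1 && dotnet --list-sdks | has_sdk_major 10; then
  echo ".NET 10 SDK found"
else
  echo ".NET 10 SDK not found"
fi
```
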
No manual environment configuration needed. The build process automatically downloads required Tesseract language data.
git clone https://github.com/brakmic/OpenCV.git
cd OpenCV
dotnet build

The first build will download tessdata files automatically.
By default, only English language data is downloaded. To use other languages, download them manually using the provided script.
Example for German:
pwsh -ExecutionPolicy Bypass -File scripts/Download-Tessdata.ps1 -TessdataPath CharacterRecognition/tessdata -Language deu

Then update the Language property in CharacterRecognition/appsettings.json to match your chosen language code.
Available language codes: eng, deu, fra, spa, ita, por, and many more.
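
Before switching the Language property, it can be useful to see which language files are actually present. The sketch below assumes the data files sit directly in the tessdata folder under the usual Tesseract naming, `<code>.traineddata`; `missing_langs` is a hypothetical helper and not part of the repository. It is written for a POSIX shell such as Git Bash.

```shell
# Hypothetical helper (not part of this repository): prints every requested
# language code whose <code>.traineddata file is absent from the given folder.
missing_langs() {
  local dir="$1"
  shift
  local lang
  for lang in "$@"; do
    # Assumption: files are stored directly in the folder as <code>.traineddata.
    [ -f "$dir/$lang.traineddata" ] || printf '%s\n' "$lang"
  done
}

# List which of these three languages still need to be downloaded.
missing_langs CharacterRecognition/tessdata eng deu fra
```

Each code it prints can then be passed to the `-Language` parameter of the download script shown above, so only the missing files are fetched.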
dotnet run --project CharacterRecognition/CharacterRecognition.csproj

Or open OpenCV.sln in Visual Studio and press F5.
- Start the application
- Load one of the sample images from the Assets folder (these are invoice documents)
- Click on Analyze and wait for the OCR task to complete
- View the extracted text in the result window
- Framework: .NET 10
- UI: Windows Forms
- OCR Engine: Tesseract 4 via EmguCV 4.12
- Architecture: x64
