This Python script uses Tesseract OCR and regular expressions to extract phone numbers from images.
It processes all images inside an images folder and saves results into a CSV file.
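As a rough illustration of the regex step, the sketch below pulls phone-like strings out of text that OCR has already produced. The pattern and the `extract_phone_numbers` name are assumptions made for this example; the actual expression in `main.py` may differ.

```python
import re

# Hypothetical pattern (not taken from main.py): an optional "+", then
# 8 to 15 digits, with up to two separator characters between digits.
PHONE_PATTERN = re.compile(r"\+?\d(?:[ .\-()]{0,2}\d){7,14}")


def extract_phone_numbers(text):
    """Return unique phone-like strings from OCR text, in order of appearance."""
    found = []
    for match in PHONE_PATTERN.finditer(text):
        raw = match.group()
        # Keep a leading "+" and drop every other non-digit character.
        number = ("+" if raw.startswith("+") else "") + re.sub(r"\D", "", raw)
        if number not in found:
            found.append(number)
    return found


sample = "Call +880 1712-345678 or 01712 345678 today. Ref: 42."
print(extract_phone_numbers(sample))
# ['+8801712345678', '01712345678']
```

Short digit runs such as `42` are ignored because the pattern requires at least eight digits. Tightening or loosening that bound is a trade-off between missed numbers and false positives.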
- ✅ If phone numbers are found → image is moved to `success/`
- ❌ If no phone number or error → image is moved to `failed/`
- 📊 A `numbers.csv` file is generated with extracted numbers and source filenames.
- 🔄 The main `images` folder will be emptied after processing.
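The routing behaviour in the list above could be sketched as follows. This is a minimal sketch, not the code in `main.py`: the function names `route_image` and `process_results`, and the CSV header `phone_number, source_file`, are hypothetical.

```python
import csv
import shutil
from pathlib import Path


def route_image(image_path, numbers, base_dir, csv_writer):
    """Move one image to success/ or failed/ and record any numbers found."""
    image_path = Path(image_path)
    target = Path(base_dir) / ("success" if numbers else "failed")
    target.mkdir(parents=True, exist_ok=True)  # created automatically if missing
    for number in numbers:
        csv_writer.writerow([number, image_path.name])
    shutil.move(str(image_path), str(target / image_path.name))
    return target.name


def process_results(results, base_dir):
    """results maps each image path to the list of numbers extracted from it."""
    csv_path = Path(base_dir) / "numbers.csv"
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["phone_number", "source_file"])  # hypothetical header
        return {
            Path(path).name: route_image(path, numbers, base_dir, writer)
            for path, numbers in results.items()
        }
```

Because every image is moved to one of the two folders, the input folder ends up empty once all results have been routed.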
project/
│── main.py
│── README.md
│── numbers.csv (generated automatically)
│── images/ # put your input images here
│── success/ # created automatically
│── failed/ # created automatically
- Clone the repository:

      git clone https://github.com/pythonicshariful/phone-number-extractor.git
      cd phone-number-extractor

- Install dependencies:

      pip install -r requirements.txt

- Install Tesseract OCR:
  - Windows: download and run the Tesseract installer
  - Linux (Ubuntu/Debian): `sudo apt install tesseract-ocr`
  - macOS (Homebrew): `brew install tesseract`

- Update the path to `tesseract.exe` inside `main.py` if needed:

      pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
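If you would rather not hard-code the path, one option is to look for `tesseract` on the system `PATH` first and fall back to the default Windows location only when that fails. The helper below is a sketch under that assumption; `find_tesseract` is a hypothetical name and is not part of `main.py` or of `pytesseract`.

```python
import os
import shutil

# Default Windows install location from the step above; adjust if yours differs.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(windows_default=WINDOWS_DEFAULT):
    """Return a path to the tesseract executable, or None if it cannot be found."""
    on_path = shutil.which("tesseract")  # works on Linux, macOS, and Windows
    if on_path:
        return on_path
    if os.path.isfile(windows_default):
        return windows_default
    return None


tesseract_path = find_tesseract()

try:
    import pytesseract
except ImportError:
    pytesseract = None  # library not installed yet

# Only override pytesseract's default when an executable was actually found.
if pytesseract is not None and tesseract_path is not None:
    pytesseract.pytesseract.tesseract_cmd = tesseract_path
```

On Linux and macOS the package managers normally put `tesseract` on the `PATH`, so the Windows fallback is never reached there.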
- Place your images inside the `images/` folder. Supported formats: `.png`, `.jpg`, `.jpeg`

- Run the script:

      python main.py

- Results:
  - Extracted phone numbers → `numbers.csv`
  - Successfully processed images → `success/`
  - Failed images → `failed/`
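To check which files the script would pick up before running it, a small helper along these lines can list the supported images in a folder. It is only a sketch: `list_images` is a hypothetical name, and matching extensions case-insensitively is an assumption that `main.py` may not share.

```python
from pathlib import Path

# Formats listed as supported in the usage steps.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def list_images(folder="images"):
    """Return supported image files in the folder, sorted by path."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(
        path
        for path in folder.iterdir()
        # Assumption: extensions are compared case-insensitively.
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Calling `list_images()` from the project root returns the files currently waiting in `images/`; anything with another extension is left out of the list.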
- Python 3.8+
- Tesseract OCR
- Python libraries: `pillow`, `pytesseract`, `tqdm`

Install them with:

    pip install pillow pytesseract tqdm

MIT License