
A Python script that extracts phone numbers from images using Tesseract OCR and Regex. Automatically organizes processed images into success and failed folders, and saves results to a CSV file.


pythonicshariful/phone-number-extractor


Phone Number Extractor from Images

This Python script uses Tesseract OCR and Regex to extract phone numbers from images.
It processes all images inside an images folder and saves results into a CSV file.

  • ✅ If phone numbers are found → image is moved to success/
  • ❌ If no phone number is found or an error occurs → image is moved to failed/
  • 📊 A numbers.csv file is generated with extracted numbers and source filenames.
  • 🔄 The main images folder will be emptied after processing.
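The workflow above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the regex `PHONE_RE`, the function names, and the CSV column names are assumptions, and the OCR step is passed in as a callable so the sorting logic can be read (and tried) without Tesseract installed.

```python
import csv
import re
import shutil
from pathlib import Path

# Hypothetical pattern: an optional "+", then 8 to 15 digits that may be
# separated by single spaces, dots, or hyphens. The script's real regex may differ.
PHONE_RE = re.compile(r"\+?\d(?:[\s.-]?\d){7,14}")
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def extract_phone_numbers(text):
    """Return the phone-like strings found in OCR text, without duplicates."""
    seen = []
    for match in PHONE_RE.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


def process_folder(root, ocr):
    """Run `ocr` on every image in root/images, then sort the images.

    `ocr` is any callable that takes a Path and returns text, for example
    lambda p: pytesseract.image_to_string(Image.open(p)).
    """
    root = Path(root)
    success, failed = root / "success", root / "failed"
    success.mkdir(exist_ok=True)
    failed.mkdir(exist_ok=True)

    rows = []
    # sorted() materializes the listing first, so moving files is safe.
    for image in sorted((root / "images").iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        try:
            numbers = extract_phone_numbers(ocr(image))
        except Exception:
            numbers = []  # treat OCR errors like "no number found"
        rows.extend((image.name, number) for number in numbers)
        target = success if numbers else failed
        shutil.move(str(image), str(target / image.name))

    with open(root / "numbers.csv", "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filename", "phone_number"])  # assumed column names
        writer.writerows(rows)
    return rows
```

Passing the OCR function as a parameter is a design choice made for this sketch only; it keeps the move-and-record logic independent of Tesseract.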

📂 Folder Structure

project/
│── main.py
│── README.md
│── numbers.csv (generated automatically)
│── images/        # put your input images here
│── success/       # created automatically
│── failed/        # created automatically

⚙️ Installation

  1. Clone the repository:

    git clone https://github.com/pythonicshariful/phone-number-extractor.git
    cd phone-number-extractor
  2. Install dependencies:

    pip install -r requirements.txt
  3. Install Tesseract OCR:

    • Windows: download and run the Tesseract installer
    • Linux (Ubuntu/Debian):
      sudo apt install tesseract-ocr
    • macOS (Homebrew):
      brew install tesseract
  4. On Windows, update the path to tesseract.exe inside main.py if your install location differs:

    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
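If you would rather not hard-code the path, one possible approach is to look for the `tesseract` executable on `PATH` first and fall back to the Windows location shown above. The helper name `find_tesseract` is hypothetical and is not part of main.py; this is only a sketch.

```python
import shutil
from pathlib import Path

# Install location from the step above; adjust if yours differs.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallback=WINDOWS_DEFAULT):
    """Return a path to the tesseract executable, or None if none is found."""
    found = shutil.which("tesseract")  # searches the directories on PATH
    if found:
        return found
    if Path(fallback).is_file():
        return fallback
    return None


# Possible usage inside main.py (requires pytesseract to be installed):
#   path = find_tesseract()
#   if path:
#       pytesseract.pytesseract.tesseract_cmd = path
```

On Linux and macOS the package managers above normally put `tesseract` on `PATH`, in which case no path needs to be set at all.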

▶️ Usage

  1. Place your images inside the images/ folder.
    Supported formats: .png, .jpg, .jpeg

  2. Run the script:

    python main.py
  3. Results:

    • Extracted phone numbers → numbers.csv
    • Successfully processed images → success/
    • Failed images → failed/
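Once the run finishes, numbers.csv can be read back with the standard library. The sketch below assumes a header row with columns named `filename` and `phone_number`; check the header of your generated file and adjust the names, since the actual column names are not documented here.

```python
import csv
from collections import defaultdict


def load_results(csv_path):
    """Map each source filename to the list of numbers extracted from it."""
    by_file = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Column names are assumptions; rename to match your CSV header.
            by_file[row["filename"]].append(row["phone_number"])
    return dict(by_file)
```

For example, `load_results("numbers.csv")` would return a dictionary keyed by image filename.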

🛠 Requirements

  • Python 3.8+
  • Tesseract OCR
  • Python libraries:
    pillow
    pytesseract
    tqdm

Install them with:

pip install pillow pytesseract tqdm
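To confirm that the libraries installed correctly, a quick check along these lines may help. Note that the pillow package is imported as `PIL`. The helper name `missing_packages` is hypothetical, and the check only covers the Python libraries, not the Tesseract binary itself.

```python
import importlib.util

# pip package name -> module name used in import statements
REQUIRED = {"pillow": "PIL", "pytesseract": "pytesseract", "tqdm": "tqdm"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing: pip install " + " ".join(missing))
    else:
        print("All Python dependencies are installed.")
```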

📜 License

MIT License
