TobyCyan/scrape-mei
Anime Figurine Image Scraper

A desktop GUI application for automatically scraping and downloading product images from anime figurine company websites.

Features

  • Desktop GUI built with Tkinter
  • Multi-company support: Good Smile Company and Kotobukiya
  • URL domain validation: Ensures URLs match the selected company
  • Async image downloads with configurable concurrency (up to 5 parallel downloads)
  • Automatic retry logic for failed downloads (up to 3 attempts)
  • Real-time progress updates in the GUI
  • OOP architecture using Factory Pattern for easy extensibility
  • Automatic file organization with sanitized naming
  • Error handling for network issues, invalid URLs, and missing images

Architecture

The application follows Object-Oriented Programming principles with a modular design:

ScraperParser (Orchestrator)
    ↓
ScraperFactory (Factory Pattern)
    ↓
BaseScraper (Abstract Base Class)
    ├─ GoodSmileScraper
    └─ KotobukiyaScraper
    ↓
ImageDownloader (Async Downloads)

Core Components

  • ScraperParser: Main orchestrator that validates inputs, coordinates scraping, and manages downloads
  • ScraperFactory: Creates appropriate scraper instances based on company type
  • BaseScraper: Abstract base class defining the scraper interface
  • Company Scrapers: Implement company-specific scraping logic
  • ImageDownloader: Handles async image downloads with retry logic
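The factory flow can be sketched as follows. Class and method names mirror the component list above, but the bodies are simplified stand-ins, not the project's actual implementation:

```python
from abc import ABC, abstractmethod

class BaseScraper(ABC):
    """Interface every company scraper must implement."""

    @abstractmethod
    def get_product_name(self, html: str) -> str: ...

    @abstractmethod
    def get_image_urls(self, html: str) -> list[str]: ...

class GoodSmileScraper(BaseScraper):
    def get_product_name(self, html: str) -> str:
        return "stub-product"   # the real scraper parses the page with BeautifulSoup
    def get_image_urls(self, html: str) -> list[str]:
        return []               # the real scraper returns product image URLs

class ScraperFactory:
    # Maps the company name shown in the GUI to its scraper class
    _scrapers = {"Good Smile": GoodSmileScraper}

    @classmethod
    def create(cls, company: str) -> BaseScraper:
        try:
            return cls._scrapers[company]()
        except KeyError:
            raise ValueError(f"Unsupported company: {company!r}")
```

Because the factory holds a plain dictionary, ScraperParser never needs to know which concrete scraper class it is driving.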

Installation

Prerequisites

  • Python 3.11 or higher
  • pip (Python package manager)

Setup

  1. Clone or download this repository

  2. Install dependencies:

pip install -r requirements.txt

Testing

The project includes a comprehensive test suite using Python's unittest framework.

Running All Tests

Run all test suites with a single command:

python run_tests.py

This will automatically discover and run all tests in the src/tests/ directory.

Test Suites

  • TestBugFixes: Validates bug fixes (factory duplicates, aliases, DPI awareness)
  • TestURLValidation: Tests URL domain validation for each company
  • TestImageFiltering: Verifies filtering of social media and UI images

Running Individual Tests

Run a specific test file:

python src/tests/test_fixes.py
python src/tests/test_url_validation.py
python src/tests/test_image_filtering.py

Run with different verbosity:

python run_tests.py -v      # Verbose output
python run_tests.py -q      # Quiet output

Usage

Running the Application

python run.py

Using the GUI

  1. Enter Product URL: Paste the product page URL from the manufacturer's website
  2. Select Company: Choose from the dropdown (Good Smile or Kotobukiya)
    • The scraper will validate that the URL domain matches the selected company
    • Good Smile accepts: goodsmile.info, goodsmileus.com, goodsmilecompany.com
    • Kotobukiya accepts: kotobukiya.co.jp
  3. Set Output Directory: Specify where to save images (default: downloads/)
  4. Click "Start Scraping": The application will:
    • Validate your inputs and URL domain
    • Fetch the product page
    • Extract product name and images
    • Download all images to <output_dir>/<product_name>/
    • Display real-time progress
  5. View Results: Check the status window for download statistics
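The domain check in step 2 can be sketched like this, using the accepted domains listed above. The `COMPANY_DOMAINS` mapping and function name are illustrative, not the project's actual identifiers:

```python
from urllib.parse import urlparse

COMPANY_DOMAINS = {
    "Good Smile": {"goodsmile.info", "goodsmileus.com", "goodsmilecompany.com"},
    "Kotobukiya": {"kotobukiya.co.jp"},
}

def url_matches_company(url: str, company: str) -> bool:
    """Return True if the URL's host belongs to the selected company."""
    host = urlparse(url).netloc.lower()
    host = host.split(":")[0].removeprefix("www.")  # drop port and leading www.
    # Accept the domain itself or any subdomain of it
    return any(host == d or host.endswith("." + d)
               for d in COMPANY_DOMAINS.get(company, set()))
```

Matching on the full host (rather than a substring) avoids false positives such as `notgoodsmile.info`.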

File Naming Convention

Downloaded images are saved as:

<output_directory>/<sanitized_product_name>/
    ├─ sanitized_product_name_001.jpg
    ├─ sanitized_product_name_002.jpg
    └─ ...
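Sanitization in this spirit might look like the following sketch (the project's actual utility functions may differ):

```python
import re

def sanitize_name(name: str) -> str:
    """Replace filesystem-unsafe characters and collapse whitespace."""
    cleaned = re.sub(r'[<>:"/\\|?*]', "_", name)           # Windows-illegal chars
    return re.sub(r"\s+", "_", cleaned.strip().strip("."))

def image_filename(product: str, index: int, ext: str = "jpg") -> str:
    # Zero-padded index matches the _001, _002, ... convention above
    return f"{sanitize_name(product)}_{index:03d}.{ext}"
```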

Project Structure

ScrapeMei/
├── run.py                       # Application entry point
├── build.py                     # Standard build script (single .exe)
├── build_folder.py              # Folder distribution build
├── build.bat                    # Windows build wrapper
├── run_tests.py                 # Test runner
├── requirements.txt             # Python dependencies
├── README.md                    # This file
├── QUICKSTART.md                # Quick start guide
├── src/logic/                   # Main source code
│   ├── main.py                  # Application launcher
│   ├── gui.py                   # GUI components (AnimeScraperGUI)
│   ├── parser.py                # Scraping orchestrator (ScraperParser)
│   ├── downloader.py            # Async image downloader
│   ├── utils.py                 # Utility functions
│   └── scraper/                 # Scraper package
│       └── ...                  # base, factory, and company scrapers
├── src/tests/                   # Unit tests
│   └── ...
├── logs/                        # Application logs
│   └── scraper.log
└── downloads/                   # Default download directory
    └── <product_name>/

Adding New Companies

The application is designed for easy extensibility. To add support for a new company:

  1. Create a new scraper class inheriting from BaseScraper
  2. Implement get_product_name() and get_image_urls() methods
  3. Register in ScraperFactory._scrapers dictionary

The new company will automatically appear in the GUI dropdown.
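The three steps can be sketched as below. "Alter" is a hypothetical example company, and the `BaseScraper`/`ScraperFactory` stand-ins are simplified versions of the real classes in src/logic/scraper/:

```python
from abc import ABC, abstractmethod

class BaseScraper(ABC):
    @abstractmethod
    def get_product_name(self, html: str) -> str: ...
    @abstractmethod
    def get_image_urls(self, html: str) -> list[str]: ...

class ScraperFactory:
    _scrapers: dict = {}

    @classmethod
    def create(cls, company: str) -> BaseScraper:
        return cls._scrapers[company]()

# Steps 1 and 2: subclass BaseScraper and implement both required methods
class AlterScraper(BaseScraper):
    def get_product_name(self, html: str) -> str:
        return "stub-product"   # real code would parse the product page
    def get_image_urls(self, html: str) -> list[str]:
        return []

# Step 3: register the class so the factory (and GUI dropdown) can find it
ScraperFactory._scrapers["Alter"] = AlterScraper
```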

Packaging as .exe (Windows)

To create a standalone executable, use the provided build scripts:

Option 1: Single-File Executable (Recommended)

python build.py

Creates a single ScrapeMei.exe file in the dist/ folder.

Option 2: Folder Distribution (Most Reliable)

python build_folder.py

Creates a dist/ScrapeMei/ folder with the .exe and all dependencies. More reliable on some systems.

Option 3: Batch File Wrapper

build.bat

Windows batch wrapper for the standard build.

Option 4: Debug Build

python build.py --debug

Creates an .exe with a visible console window to diagnose errors.

What the Build Script Does

The build script will:

  • Clean previous build artifacts
  • Automatically read dependencies from requirements.txt
  • Generate hidden imports for PyInstaller
  • Bundle the application with PyInstaller
  • Create executable file(s) in the dist/ folder
  • Verify the build was successful

Note: The build script automatically parses requirements.txt and includes all dependencies in the executable. When you add a new package to requirements.txt, it will automatically be bundled in the next build.
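Parsing requirements.txt into package names can be sketched as below. Note this is illustrative only: the real build script may additionally map distribution names to import names (e.g. beautifulsoup4 imports as bs4) before passing them to PyInstaller as hidden imports:

```python
import re
from pathlib import Path

def read_requirements(path: str = "requirements.txt") -> list[str]:
    """Parse package names, skipping comments, blank lines, and version pins."""
    names = []
    for line in Path(path).read_text().splitlines():
        line = line.split("#")[0].strip()   # drop inline comments
        if not line:
            continue
        # "beautifulsoup4>=4.12" -> "beautifulsoup4"
        names.append(re.split(r"[<>=!~\[;]", line)[0].strip())
    return names
```

Each resulting name would then be passed to PyInstaller via `--hidden-import <name>`.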

The executable is standalone and does not require Python to be installed on the target system.

Troubleshooting Build Issues

"Failed to start python embedded interpreter"

  1. Try the folder distribution: python build_folder.py
  2. Build in debug mode: python build.py --debug
  3. Disable antivirus temporarily or add dist/ to exclusions
  4. Install Visual C++ Redistributables
  5. Run the .exe as administrator

"ModuleNotFoundError" when running .exe

  • Add the missing package to requirements.txt
  • Rebuild with python build.py

Manual Build (Advanced)

# Single file
pyinstaller --onefile --windowed --name ScrapeMei --noupx run.py

# Folder distribution
pyinstaller --windowed --name ScrapeMei --noupx run.py

Technical Details

Dependencies

  • requests: HTTP requests for fetching web pages
  • beautifulsoup4: HTML parsing and element selection
  • lxml: Fast HTML/XML parser backend
  • aiohttp: Async HTTP client for parallel downloads
  • tkinter: GUI framework (included with Python)
  • pyinstaller: Creating standalone executables

Performance

  • Supports up to 100 images per product efficiently
  • Configurable parallel downloads (default: 5 concurrent)
  • Async I/O for non-blocking downloads
  • Automatic URL deduplication
  • Exponential backoff for retries
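The concurrency, deduplication, and backoff behavior can be sketched with the standard library alone. The real ImageDownloader fetches via aiohttp; here the `fetch` coroutine is injected so the sketch stays self-contained, and the function names are illustrative:

```python
import asyncio

async def download_with_retry(fetch, url: str, retries: int = 3) -> bytes:
    """Try fetch(url) up to `retries` times with exponential backoff (1s, 2s)."""
    for attempt in range(retries):
        try:
            return await fetch(url)
        except Exception:
            if attempt == retries - 1:
                raise
            await asyncio.sleep(2 ** attempt)

async def download_all(fetch, urls, concurrency: int = 5) -> list[bytes]:
    sem = asyncio.Semaphore(concurrency)   # at most 5 downloads in flight
    unique = list(dict.fromkeys(urls))     # deduplicate, preserving order

    async def bounded(url: str) -> bytes:
        async with sem:
            return await download_with_retry(fetch, url)

    return await asyncio.gather(*(bounded(u) for u in unique))
```

With aiohttp, `fetch` would be a thin wrapper around `session.get(url)` that raises on HTTP errors and returns the response body.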

License

This project is provided as-is for educational and personal use.


Version: 1.0.0 (MVP)
Last Updated: March 2026
