
🤖 Image Captioning System (BLIP-Powered)

An advanced AI-powered platform that automatically generates contextual, human-like captions for images using state-of-the-art Salesforce BLIP Transformer models.


🌟 System Overview

This project provides a robust solution for automated image-to-text generation. It transitions from a legacy EfficientNet+BiLSTM architecture to a modern, high-accuracy Transformer-based pipeline. Key objectives include:

  • Accuracy: Leveraging Vision-Language Pre-training (VLP) for human-like descriptions.
  • Scalability: Utilizing Celery + Redis to handle heavy ML inference asynchronously.
  • Accessibility: Providing a seamless API for developers to integrate captioning into any website via a simple JS snippet.
  • Modern UI: A premium dark-themed React frontend with real-time status polling.
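As a toy illustration of the asynchronous pattern described above (Celery and Redis replaced here by a stdlib thread pool, and BLIP inference by a stub — this is a sketch of the submit/poll shape, not the project's actual worker code):

```python
import concurrent.futures
import uuid

# In the real system, Celery workers pull jobs from a Redis broker; here a
# thread pool stands in so the pattern can be shown without extra services.
executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)
tasks: dict[str, concurrent.futures.Future] = {}

def fake_caption(image_name: str) -> str:
    """Stand-in for the heavy BLIP inference step."""
    return f"a caption for {image_name}"

def submit(image_name: str) -> str:
    """Enqueue a job and immediately return a task id to the caller."""
    task_id = str(uuid.uuid4())
    tasks[task_id] = executor.submit(fake_caption, image_name)
    return task_id

task_id = submit("cat.jpg")
print(tasks[task_id].result())  # → a caption for cat.jpg
```

The key point is that the HTTP request returns a task id right away; the expensive model call runs on a worker, and the client checks back for the result.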

🏗️ System Architecture & Diagrams

1. High-Level Logic Flow (Activity Diagram)

Activity Diagram

2. Async Communication (Sequence Diagram)

Sequence Diagram
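The client side of the sequence above is a polling loop. A rough sketch (the status shape and field names are hypothetical, not the project's actual API):

```python
import time

def poll_until_done(fetch_status, task_id, interval=1.0, timeout=30.0):
    """Poll a status source until the async caption task finishes.

    `fetch_status` is any callable returning a dict shaped like
    {"state": "PENDING" | "SUCCESS" | "FAILURE", "caption": str | None}.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(task_id)
        if status["state"] == "SUCCESS":
            return status["caption"]
        if status["state"] == "FAILURE":
            raise RuntimeError(f"captioning task {task_id} failed")
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish in {timeout}s")

# Demo with a fake status source that "finishes" on the third poll.
responses = iter([
    {"state": "PENDING", "caption": None},
    {"state": "PENDING", "caption": None},
    {"state": "SUCCESS", "caption": "a dog playing in the park"},
])
caption = poll_until_done(lambda task_id: next(responses), "task-42", interval=0.01)
print(caption)  # → a dog playing in the park
```

In the real frontend this loop runs in JavaScript against the status endpoint; the structure is the same.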

3. Database Schema (ERD Diagram)

ERD Diagram

4. Class Structure

Class Diagram

5. Interaction (Use Case Diagram)

Use Case Diagram


🚀 How to Use

1. Backend Setup

  1. Environment: Create a virtual environment and install dependencies.
    python -m venv venv
    venv\Scripts\activate   # Windows; on macOS/Linux use: source venv/bin/activate
    pip install -r backend/requirements.txt
  2. Database: Run migrations.
    python manage.py migrate
  3. Redis: Ensure Redis is running (default: localhost:6379).
  4. Worker: Start the Celery worker in a separate terminal (the solo pool avoids prefork limitations on Windows).
    celery -A backendImageCaption worker -l INFO --pool=solo
  5. Server: Run the Django development server.
    python manage.py runserver

2. Frontend Setup

  1. Install:
    cd frontend
    npm install
  2. Start:
    npm start

3. API Integration

To add captions to any website, include the JS snippet found on the Documentation page, replacing YOUR_API_KEY_HERE with a valid key generated from the platform.
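For server-side integration, the same call can be made from Python. The endpoint URL, request body, and header name below are assumptions for illustration — check the Documentation page for the actual values:

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY_HERE"                   # key generated from the platform
ENDPOINT = "https://example.com/api/caption/"   # hypothetical URL — see the Documentation page

def build_caption_request(image_url: str) -> urllib.request.Request:
    """Build (but do not send) a captioning request carrying the API key."""
    payload = json.dumps({"image_url": image_url}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-API-Key": API_KEY,   # header name is an assumption
        },
    )

req = build_caption_request("https://example.com/cat.jpg")
# urllib normalizes header capitalization, hence the lookup spelling:
print(req.get_header("X-api-key"))
```

Sending it with `urllib.request.urlopen(req)` would then return the task id to poll.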


🛠️ Tools & Technologies

| Layer     | Tool / Framework        | Version |
|-----------|-------------------------|---------|
| Frontend  | React                   | 19.x    |
| Backend   | Django                  | 5.x     |
| API       | Django REST Framework   | 3.15.x  |
| Model     | Salesforce BLIP (Base)  | -       |
| Inference | PyTorch / Transformers  | 5.3.0   |
| Workers   | Celery / Redis          | 5.x     |
| Styling   | Vanilla CSS / Bootstrap | 5.3     |

⚖️ License & Hard Constraints

License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0).

⚠️ HARD CONSTRAINTS

  • DAILY QUOTA: Strictly capped at 1,000 requests per day per API key.
  • COMMERCIAL USE: Commercial use of the Salesforce BLIP model weights MUST comply with the Salesforce BLIP license terms (BSD 3-Clause).
  • ATTRIBUTION: You MUST maintain all existing copyright notices and "Powered by Salesforce BLIP" markers in any derivative works.
  • IMAGE LIMITS: Max file size is 10MB. Supported: PNG, JPG, JPEG, GIF.
  • WARRANTY: This is a research project. The authors provide NO WARRANTY and assume NO LIABILITY for model output.
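The image limits above could be enforced server-side with a check like this (an illustrative sketch, not the project's actual validation code):

```python
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024                  # 10MB hard cap from the constraints
ALLOWED = {".png", ".jpg", ".jpeg", ".gif"}   # supported formats

def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if an upload breaks the image limits."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED:
        raise ValueError(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        raise ValueError(f"file too large: {size_bytes} bytes (max {MAX_BYTES})")

validate_upload("photo.JPG", 2_000_000)   # OK — extensions compared case-insensitively
```

Running the check before enqueueing the Celery task keeps oversized or unsupported files from ever reaching the worker.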
