
🤖 Image Captioning System (BLIP-Powered)

An advanced AI-powered platform that automatically generates contextual, human-like captions for images using state-of-the-art Salesforce BLIP Transformer models.


🌟 System Overview

This project provides a robust solution for automated image-to-text generation. It transitions from a legacy EfficientNet+BiLSTM architecture to a modern, high-accuracy Transformer-based pipeline. Key objectives include:

  • Accuracy: Leveraging Vision-Language Pre-training (VLP) for human-like descriptions.
  • Scalability: Utilizing Celery + Redis to handle heavy ML inference asynchronously.
  • Accessibility: Providing a seamless API for developers to integrate captioning into any website via a simple JS snippet.
  • Modern UI: A premium dark-themed React frontend with real-time status polling.
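As a toy illustration of the asynchronous pattern described above (Celery and Redis replaced here by a stdlib thread pool, and BLIP inference by a stub — this is a sketch of the submit/poll shape, not the project's actual worker code):

```python
import concurrent.futures
import uuid

# In the real system, Celery workers pull jobs from a Redis broker; here a
# thread pool stands in so the pattern can be shown without extra services.
executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)
tasks: dict[str, concurrent.futures.Future] = {}

def fake_caption(image_name: str) -> str:
    """Stand-in for the heavy BLIP inference step."""
    return f"a caption for {image_name}"

def submit(image_name: str) -> str:
    """Enqueue a job and immediately return a task id to the caller."""
    task_id = str(uuid.uuid4())
    tasks[task_id] = executor.submit(fake_caption, image_name)
    return task_id

task_id = submit("cat.jpg")
print(tasks[task_id].result())  # → a caption for cat.jpg
```

The key point is that the HTTP request returns a task id right away; the expensive model call runs on a worker, and the client checks back for the result.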

🏗️ System Architecture & Diagrams

1. High-Level Logic Flow (Activity Diagram)

Activity Diagram

2. Async Communication (Sequence Diagram)

Sequence Diagram
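The client side of the sequence above is a polling loop. A rough sketch (the status shape and field names are hypothetical, not the project's actual API):

```python
import time

def poll_until_done(fetch_status, task_id, interval=1.0, timeout=30.0):
    """Poll a status source until the async caption task finishes.

    `fetch_status` is any callable returning a dict shaped like
    {"state": "PENDING" | "SUCCESS" | "FAILURE", "caption": str | None}.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(task_id)
        if status["state"] == "SUCCESS":
            return status["caption"]
        if status["state"] == "FAILURE":
            raise RuntimeError(f"captioning task {task_id} failed")
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish in {timeout}s")

# Demo with a fake status source that "finishes" on the third poll.
responses = iter([
    {"state": "PENDING", "caption": None},
    {"state": "PENDING", "caption": None},
    {"state": "SUCCESS", "caption": "a dog playing in the park"},
])
caption = poll_until_done(lambda task_id: next(responses), "task-42", interval=0.01)
print(caption)  # → a dog playing in the park
```

In the real frontend this loop runs in JavaScript against the status endpoint; the structure is the same.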

3. Database Schema (ERD Diagram)

ERD Diagram

4. Class Structure

Class Diagram

5. Interaction (Use Case Diagram)

Use Case Diagram


🚀 How to Use

1. Backend Setup

  1. Environment: Create a virtual environment and install dependencies.
    python -m venv venv
    venv\Scripts\activate   # Windows; on macOS/Linux use: source venv/bin/activate
    pip install -r backend/requirements.txt
  2. Database: Run migrations.
    python manage.py migrate
  3. Redis: Ensure Redis is running (default: localhost:6379).
  4. Worker: Start the Celery worker in a separate terminal (the solo pool avoids prefork limitations on Windows).
    celery -A backendImageCaption worker -l INFO --pool=solo
  5. Server: Run the Django development server.
    python manage.py runserver

2. Frontend Setup

  1. Install:
    cd frontend
    npm install
  2. Start:
    npm start

3. API Integration

To add captions to any website, include the JS snippet found on the Documentation page, replacing YOUR_API_KEY_HERE with a valid key generated from the platform.
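For server-side integration, the same call can be made from Python. The endpoint URL, request body, and header name below are assumptions for illustration — check the Documentation page for the actual values:

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY_HERE"                   # key generated from the platform
ENDPOINT = "https://example.com/api/caption/"   # hypothetical URL — see the Documentation page

def build_caption_request(image_url: str) -> urllib.request.Request:
    """Build (but do not send) a captioning request carrying the API key."""
    payload = json.dumps({"image_url": image_url}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-API-Key": API_KEY,   # header name is an assumption
        },
    )

req = build_caption_request("https://example.com/cat.jpg")
# urllib normalizes header capitalization, hence the lookup spelling:
print(req.get_header("X-api-key"))
```

Sending it with `urllib.request.urlopen(req)` would then return the task id to poll.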


🛠️ Tools & Technologies

| Layer     | Tool / Framework        | Version |
|-----------|-------------------------|---------|
| Frontend  | React                   | 19.x    |
| Backend   | Django                  | 5.x     |
| API       | Django REST Framework   | 3.15.x  |
| Model     | Salesforce BLIP (Base)  | -       |
| Inference | PyTorch / Transformers  | 5.3.0   |
| Workers   | Celery / Redis          | 5.x     |
| Styling   | Vanilla CSS / Bootstrap | 5.3     |

⚖️ License & Hard Constraints

License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0).

⚠️ HARD CONSTRAINTS

  • DAILY QUOTA: Strictly capped at 1,000 requests per day per API key.
  • COMMERCIAL USE: Commercial use of the Salesforce BLIP model weights MUST comply with the Salesforce BLIP license terms (BSD 3-Clause).
  • ATTRIBUTION: You MUST maintain all existing copyright notices and "Powered by Salesforce BLIP" markers in any derivative works.
  • IMAGE LIMITS: Max file size is 10MB. Supported: PNG, JPG, JPEG, GIF.
  • WARRANTY: This is a research project. The authors provide NO WARRANTY and assume NO LIABILITY for model output.
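The image limits above could be enforced server-side with a check like this (an illustrative sketch, not the project's actual validation code):

```python
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024                  # 10MB hard cap from the constraints
ALLOWED = {".png", ".jpg", ".jpeg", ".gif"}   # supported formats

def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if an upload breaks the image limits."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED:
        raise ValueError(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        raise ValueError(f"file too large: {size_bytes} bytes (max {MAX_BYTES})")

validate_upload("photo.JPG", 2_000_000)   # OK — extensions compared case-insensitively
```

Running the check before enqueueing the Celery task keeps oversized or unsupported files from ever reaching the worker.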
