An advanced AI-powered platform that automatically generates contextual, human-like captions for images using state-of-the-art Salesforce BLIP Transformer models.
This project provides a robust solution for automated image-to-text generation. It moves from a legacy EfficientNet+BiLSTM architecture to a Transformer-based pipeline built on BLIP. Key objectives include:
- Accuracy: Leveraging Vision-Language Pre-training (VLP) for human-like descriptions.
- Scalability: Utilizing Celery + Redis to handle heavy ML inference asynchronously.
- Accessibility: Providing a seamless API for developers to integrate captioning into any website via a simple JS snippet.
- Modern UI: A premium dark-themed React frontend with real-time status polling.
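
The asynchronous flow described in the Scalability and Modern UI points can be sketched as a small polling loop on the client side. This is a minimal sketch, not the project's actual client: the function name `poll_caption`, the `fetch_status` callable, and the response keys (`status`, `caption`, `error`) with the values `"PENDING"`, `"SUCCESS"`, and `"FAILURE"` are all assumptions made for illustration. In practice `fetch_status` would wrap a request to whatever status endpoint the backend exposes.

```python
import time
from typing import Callable, Dict


def poll_caption(
    fetch_status: Callable[[], Dict[str, str]],
    interval: float = 1.0,
    timeout: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a (hypothetical) status source until the caption is ready.

    fetch_status is assumed to return a dict such as
    {"status": "PENDING"} or {"status": "SUCCESS", "caption": "..."}.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "SUCCESS":
            return result["caption"]
        if status == "FAILURE":
            raise RuntimeError(result.get("error", "captioning failed"))
        # Any other status is treated as "still working".
        if time.monotonic() >= deadline:
            raise TimeoutError("caption was not ready before the timeout")
        sleep(interval)


if __name__ == "__main__":
    # Stand-in for real HTTP calls: first response is pending, second is done.
    fake_responses = iter([
        {"status": "PENDING"},
        {"status": "SUCCESS", "caption": "a dog on a beach"},
    ])
    print(poll_caption(lambda: next(fake_responses), interval=0.0))
```

Passing the status source in as a callable keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.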
- Environment: Create a virtual environment and install dependencies.
python -m venv venv
venv\Scripts\activate
pip install -r backend/requirements.txt
- Database: Run migrations.
python manage.py migrate
- Redis: Ensure Redis is running (default: localhost:6379).
- Worker: Start the Celery worker in a separate terminal.
celery -A backendImageCaption worker -l INFO --pool=solo
- Server: Run the Django development server.
python manage.py runserver
- Install:
cd frontend
npm install
- Start:
npm start
To add captions to any website, include the JS snippet found in the Documentation page. Replace YOUR_API_KEY_HERE with a valid key generated from the platform.
| Layer | Tool / Framework | Version |
|---|---|---|
| Frontend | React | 19.x |
| Backend | Django | 5.x |
| API | Django REST Framework | 3.15.x |
| Model | Salesforce BLIP (Base) | - |
| Inference | PyTorch / Transformers | 5.3.0 |
| Workers | Celery / Redis | 5.x |
| Styling | Vanilla CSS / Bootstrap | 5.3 |
This project is licensed under the GNU General Public License v3.0 (GPL-3.0).
- DAILY QUOTA: Strictly capped at 1,000 requests per day per API key.
- COMMERCIAL USE: Commercial use of the Salesforce BLIP model weights MUST comply with the Salesforce BLIP license terms (BSD 3-Clause).
- ATTRIBUTION: You MUST maintain all existing copyright notices and "Powered by Salesforce BLIP" markers in any derivative works.
- IMAGE LIMITS: Max file size is 10MB. Supported: PNG, JPG, JPEG, GIF.
- WARRANTY: This is a research project. The authors provide NO WARRANTY and assume NO LIABILITY for model output.
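
The daily quota and image limits above can be expressed as simple checks. The following is a minimal sketch under stated assumptions, not the platform's actual enforcement code: the names `is_acceptable_image` and `DailyQuota` are hypothetical, "10MB" is read as 10 × 1024 × 1024 bytes, file type is judged by extension only, and the counter lives in memory (a deployed service could keep it in Redis or the database instead).

```python
import os
from collections import defaultdict
from datetime import date
from typing import Optional

MAX_FILE_BYTES = 10 * 1024 * 1024  # "10MB" interpreted as 10 MiB (assumption)
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif"}
DAILY_QUOTA = 1000  # requests per day per API key


def is_acceptable_image(filename: str, size_bytes: int) -> bool:
    """Return True if the file has a supported extension and size."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in ALLOWED_EXTENSIONS and 0 < size_bytes <= MAX_FILE_BYTES


class DailyQuota:
    """In-memory per-key, per-day request counter (illustrative only)."""

    def __init__(self, limit: int = DAILY_QUOTA) -> None:
        self.limit = limit
        self._counts = defaultdict(int)

    def try_consume(self, api_key: str, today: Optional[date] = None) -> bool:
        """Count one request; return False once the day's limit is reached."""
        bucket = (api_key, today or date.today())
        if self._counts[bucket] >= self.limit:
            return False
        self._counts[bucket] += 1
        return True


if __name__ == "__main__":
    quota = DailyQuota()
    if is_acceptable_image("photo.jpg", 2_000_000) and quota.try_consume("demo-key"):
        print("request accepted")
```

Keying the counter on the pair (API key, date) means each key's count starts fresh on a new day without a separate reset job.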
