This guide will help you set up and run the Paper2Poster FastAPI service using Docker and Docker Compose.
Prerequisites:

- Docker and Docker Compose installed on your system
- NVIDIA Docker runtime (if using local GPU models with vLLM)
- API keys for AI services (OpenAI, Anthropic, etc.)
- 5GB+ free disk space for OCR/parsing models (downloaded on first run)
Copy the example environment file and configure your API keys:
```bash
cp env.example .env
```

Edit `.env` and add your API keys:

```bash
OPENAI_API_KEY=your_actual_openai_api_key
# Add other API keys as needed
```

Create the required directories:

```bash
mkdir -p input_papers output_posters model_cache
```

Build and start the API service:

```bash
# Build and start the API service
docker-compose up -d

# Check if the service is running
docker-compose ps

# Monitor the startup logs (first run will download models)
docker-compose logs -f paper2poster

# Check service health
curl http://localhost:6025/health

# View API documentation
open http://localhost:6025/docs
```

**Note on First Run:** On the first container startup, the service automatically downloads the required OCR and parsing models (Docling and Marker). This takes 10-30 minutes depending on your internet speed. The models are cached in the `model_cache` volume and won't need to be downloaded again.
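The first-run behavior described above can be sketched as a simple marker-file check. Everything here is illustrative: the `models_ready`/`mark_downloaded` names and the exact startup logic are assumptions, not the service's actual implementation.

```python
from pathlib import Path

# Marker file the startup is assumed to drop after a successful download
# (hypothetical reconstruction; the real startup script may differ).
MARKER = ".models_downloaded"

def models_ready(cache_dir: str, skip_download: bool = False) -> bool:
    """Return True if startup can skip the model download."""
    if skip_download:  # mirrors SKIP_MODEL_DOWNLOAD=true
        return True
    return (Path(cache_dir) / MARKER).exists()

def mark_downloaded(cache_dir: str) -> None:
    """Record a completed download so later startups skip it."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    (cache / MARKER).touch()
```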
To skip the model download (not recommended for production):

```bash
# Set environment variable to skip model download
SKIP_MODEL_DOWNLOAD=true docker-compose up -d
```

The FastAPI service will be available at:
- API Endpoint: http://localhost:6025
- Interactive Docs: http://localhost:6025/docs
- ReDoc: http://localhost:6025/redoc
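Before submitting jobs from a script, it can help to wait for `/health` to come up. This sketch takes the probe as a callable so it can be tested without a running container; in practice you would pass something like `lambda: requests.get("http://localhost:6025/health").ok`. The helper itself is not part of the service.

```python
import time

def wait_until_healthy(probe, attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll `probe()` until it returns True; give up after `attempts` tries."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False
```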
```bash
# Generate a poster using the REST API
curl -X POST "http://localhost:6025/generate-poster" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "pdf_file=@/path/to/your/paper.pdf" \
  -F "model_name_t=4o" \
  -F "model_name_v=4o" \
  -F "poster_width_inches=48" \
  -F "poster_height_inches=36"

# Response will include a job_id like:
# {"job_id":"abc123...","message":"Poster generation started","status":"queued"}
```

Alternatively, use the interactive docs:

- Navigate to http://localhost:6025/docs
- Click on "POST /generate-poster"
- Click "Try it out"
- Upload your PDF file and set parameters
- Click "Execute"
Or drive the API from Python with `requests`:

```python
import requests
import time

# Upload and start poster generation
with open('/path/to/your/paper.pdf', 'rb') as f:
    response = requests.post(
        'http://localhost:6025/generate-poster',
        files={'pdf_file': f},
        data={
            'model_name_t': '4o',
            'model_name_v': '4o',
            'poster_width_inches': 48,
            'poster_height_inches': 36
        }
    )
job_id = response.json()['job_id']
print(f"Job started: {job_id}")

# Check job status
while True:
    status_response = requests.get(f'http://localhost:6025/jobs/{job_id}')
    status = status_response.json()
    print(f"Status: {status['status']} - {status['progress']}")
    if status['status'] in ['completed', 'failed']:
        break
    time.sleep(10)

# Download the generated poster
if status['status'] == 'completed':
    poster_response = requests.get(f'http://localhost:6025/download/{job_id}?file_type=pptx')
    with open('generated_poster.pptx', 'wb') as f:
        f.write(poster_response.content)
    print("Poster downloaded successfully!")
```

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | API information and available endpoints |
| `/health` | GET | Health check endpoint |
| `/generate-poster` | POST | Upload PDF and start poster generation |
| `/jobs/{job_id}` | GET | Get job status and progress |
| `/jobs` | GET | List all jobs (admin) |
| `/download/{job_id}` | GET | Download generated poster files |
| `/jobs/{job_id}` | DELETE | Delete a job |
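The endpoint table maps naturally onto a thin client. This sketch only assembles request URLs from the documented routes; the class and method names are illustrative, not an official SDK.

```python
class Paper2PosterURLs:
    """Build request URLs for the endpoints documented above."""

    def __init__(self, base: str = "http://localhost:6025"):
        self.base = base.rstrip("/")

    def generate_poster(self) -> str:  # POST
        return f"{self.base}/generate-poster"

    def job_status(self, job_id: str) -> str:  # GET (DELETE removes the job)
        return f"{self.base}/jobs/{job_id}"

    def download(self, job_id: str, file_type: str = "pptx") -> str:  # GET
        return f"{self.base}/download/{job_id}?file_type={file_type}"
```

For example, `Paper2PosterURLs().download("abc123")` yields the PPTX download URL for that job.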
```bash
# Check job status
curl http://localhost:6025/jobs/{job_id}

# List all jobs
curl http://localhost:6025/jobs

# Download poster (PPTX format)
curl -O -J "http://localhost:6025/download/{job_id}?file_type=pptx"

# Download poster (PNG format)
curl -O -J "http://localhost:6025/download/{job_id}?file_type=png"
```

To run with local models served by vLLM:

- Start the vLLM services:

```bash
docker-compose --profile vllm up -d
```

- Generate a poster with local models via the API:
```bash
curl -X POST "http://localhost:6025/generate-poster" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "pdf_file=@/path/to/your/paper.pdf" \
  -F "model_name_t=vllm_qwen" \
  -F "model_name_v=vllm_qwen_vl" \
  -F "poster_width_inches=48" \
  -F "poster_height_inches=36"
```

You can still access the original command-line interface if needed:
```bash
# Access container shell
docker-compose exec paper2poster bash

# Run original pipeline
python -m PosterAgent.new_pipeline \
    --poster_path=/app/input_papers/my_paper/paper.pdf \
    --model_name_t="4o" \
    --model_name_v="4o"

# Create evaluation dataset
python -m PosterAgent.create_dataset

# Evaluate posters
python -m Paper2Poster-eval.eval_poster_pipeline \
    --paper_name="my_paper" \
    --poster_method="4o_4o_generated_posters" \
    --metric=qa
```

The Docker Compose setup defines three services:

- `paper2poster`: Main application container
- `vllm-server`: Text generation model server (Qwen2.5-7B-Instruct)
- `vllm-vl-server`: Vision-language model server (Qwen2.5-VL-7B-Instruct)

Start the vLLM services:
```bash
docker-compose --profile vllm up -d
```

The project directory layout:

```
Paper2Poster/
├── input_papers/                 # Place your PDF papers here
│   └── paper_name/
│       └── paper.pdf
├── output_posters/               # Generated posters will appear here
├── model_cache/                  # Cached models (OCR/parsing models + vLLM)
│   └── .models_downloaded        # Marker file indicating models are downloaded
├── tmp/                          # Temporary files
├── contents/                     # Parsed paper contents
├── tree_splits/                  # Layout information
├── <4o_4o>_images_and_tables/    # Extracted images and tables
├── <4o_4o>_generated_posters/    # Final generated posters
└── .env                          # Your API keys and configuration
```
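Given the layout above, a script can compute where a paper is expected to live before invoking the pipeline. The helper below is a convenience sketch, not part of the project:

```python
from pathlib import Path

def input_paper_path(root: str, paper_name: str) -> Path:
    """Expected PDF location under the documented layout:
    <root>/input_papers/<paper_name>/paper.pdf
    """
    return Path(root) / "input_papers" / paper_name / "paper.pdf"
```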
Available text models (`model_name_t`):

- `4o`: GPT-4o (requires OpenAI API key)
- `4o-mini`: GPT-4o Mini (requires OpenAI API key)
- `vllm_qwen`: Qwen2.5-7B-Instruct via vLLM (local)
- `claude-3-5-sonnet`: Claude 3.5 Sonnet (requires Anthropic API key)

Available vision models (`model_name_v`):

- `4o`: GPT-4o Vision (requires OpenAI API key)
- `vllm_qwen_vl`: Qwen2.5-VL-7B-Instruct via vLLM (local)
- `claude-3-5-sonnet`: Claude 3.5 Sonnet Vision (requires Anthropic API key)

Poster size options:

- `--poster_width_inches`: Width in inches (default: 48)
- `--poster_height_inches`: Height in inches (default: 36)

Additional pipeline flags:

- `--no_blank_detection`: Disable blank detection for severe overflow
- `--ablation_no_tree_layout`: Disable tree layout (ablation study)
- `--ablation_no_commenter`: Disable commenter (ablation study)
- `--ablation_no_example`: Disable examples (ablation study)
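The size and ablation flags could be wired to an argument parser roughly like this. It is a hypothetical reconstruction from the flag descriptions above; the pipeline's real parser may define them differently.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser for the documented flags (names from this guide)."""
    p = argparse.ArgumentParser(prog="new_pipeline")
    p.add_argument("--poster_width_inches", type=float, default=48)
    p.add_argument("--poster_height_inches", type=float, default=36)
    p.add_argument("--no_blank_detection", action="store_true")
    p.add_argument("--ablation_no_tree_layout", action="store_true")
    p.add_argument("--ablation_no_commenter", action="store_true")
    p.add_argument("--ablation_no_example", action="store_true")
    return p
```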
If the service won't start:

```bash
# Check logs
docker-compose logs paper2poster

# Rebuild the image
docker-compose build --no-cache paper2poster
```

For GPU issues, make sure you have the NVIDIA Docker runtime installed:

```bash
# Test GPU access
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
```

For permission errors on the mounted directories:

```bash
# Fix permissions
sudo chown -R $USER:$USER input_papers output_posters model_cache
```

For API key errors:

- Make sure your `.env` file is properly configured
- Check that API keys are valid and have sufficient credits
- Verify the container can access the `.env` file
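A quick way to catch the missing-API-key case before starting the container is to scan the `.env` text yourself. `missing_keys` is an illustrative helper, not how the service actually validates its configuration:

```python
def missing_keys(env_text: str, required=("OPENAI_API_KEY",)) -> list:
    """Return the required keys that are absent or empty in .env-style text."""
    present = {}
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        present[key.strip()] = value.strip()
    return [k for k in required if not present.get(k)]
```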
If the service fails to start due to model download problems:

```bash
# Remove the marker file and restart
rm model_cache/.models_downloaded
docker-compose restart paper2poster

# Or manually download models
docker-compose exec paper2poster python /app/download_models.py

# Check model cache size
du -sh model_cache/
```

Other useful commands:

```bash
# Open a shell in the container
docker-compose exec paper2poster bash

# Follow the service logs
docker-compose logs -f paper2poster

# Stop all services
docker-compose down

# Stop vLLM services only
docker-compose --profile vllm down
```

Minimum system requirements:

- 8GB RAM
- 2 CPU cores
- 15GB storage (10GB for Docker image + 5GB for OCR/parsing models)
Recommended:

- 16GB RAM
- 4 CPU cores
- 20GB storage
- NVIDIA GPU (optional, for faster processing)
For local vLLM models:

- 16GB+ RAM
- NVIDIA GPU with 8GB+ VRAM
- 50GB+ storage for vLLM model cache
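To sanity-check the storage figures above before pulling images or models, the standard library can report free disk space. This is a convenience sketch, not part of the project:

```python
import shutil

def has_disk_headroom(path: str, required_gb: float) -> bool:
    """True if `path` has at least `required_gb` gigabytes free."""
    return shutil.disk_usage(path).free >= required_gb * 1024 ** 3
```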
For issues and questions, please refer to the main README.md or create an issue in the repository.