ManojBuilds/doc-to-pdf

DOC/DOCX to PDF Conversion API

A high-accuracy, self-hosted Node.js Express API for converting DOC, DOCX, DOCM, RTF, and ODT documents to PDF using LibreOffice.

Features

  • High Accuracy: Uses LibreOffice for professional-quality conversions
  • No External Services: Fully self-hosted; conversion runs on a local LibreOffice install
  • Multiple Formats: Supports DOC, DOCX, DOCM, RTF, ODT
  • Production Ready: Includes logging, health checks, rate limiting
  • Docker Support: Easy deployment with Docker
  • TypeScript Support: Full TypeScript integration for Next.js

Quick Start

1. Automated Setup

# Clone or create the project directory
mkdir pdf-conversion-api && cd pdf-conversion-api

# Create all the files (copy the provided code)
# Run the installation script
chmod +x install.sh
./install.sh

2. Manual Setup

Prerequisites

  • Node.js 16+
  • LibreOffice installed on your system

Install LibreOffice

Ubuntu/Debian:

sudo apt-get update
sudo apt-get install -y libreoffice default-jre fonts-liberation

CentOS/RHEL:

sudo yum install -y libreoffice java-1.8.0-openjdk

macOS:

brew install --cask libreoffice

Windows: Download from LibreOffice.org

Install Dependencies

npm install

3. Run the Server

# Start in development mode
npm run dev
# or
./dev.sh

# Start in production mode
npm start
# or
./start.sh

4. Docker Deployment

# Build and run with Docker Compose
docker-compose up --build

# Or build manually
docker build -t pdf-converter .
docker run -p 3001:3001 pdf-converter

API Endpoints

Convert Document to PDF

POST /api/convert/to-pdf
Content-Type: multipart/form-data

file: [document file]

Example:

curl -X POST \
  -F "file=@document.docx" \
  -o converted.pdf \
  http://localhost:3001/api/convert/to-pdf
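
The same request can be sent from TypeScript. The following is a minimal sketch, assuming Node.js 18+ (for the global fetch, FormData, and Blob) and a service listening on http://localhost:3001. The names buildConvertRequest and convertToPdf are illustrative and not part of this project.

```typescript
// Hypothetical client helper; not part of the repository.
interface ConvertRequest {
  url: string;
  body: FormData;
}

// Build the multipart request for POST /api/convert/to-pdf.
// The form field is named "file", matching the curl example.
function buildConvertRequest(baseUrl: string, doc: Blob, filename: string): ConvertRequest {
  const body = new FormData();
  body.append("file", doc, filename);
  // Strip trailing slashes so the path is not doubled.
  const url = `${baseUrl.replace(/\/+$/, "")}/api/convert/to-pdf`;
  return { url, body };
}

// Send the document and return the PDF bytes, or throw on a non-2xx status.
async function convertToPdf(baseUrl: string, doc: Blob, filename: string): Promise<Uint8Array> {
  const { url, body } = buildConvertRequest(baseUrl, doc, filename);
  // Do not set Content-Type by hand: fetch adds the multipart boundary itself.
  const response = await fetch(url, { method: "POST", body });
  if (!response.ok) {
    throw new Error(`Conversion failed with HTTP ${response.status}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}
```

Keeping the request construction separate from the network call makes the URL and form field easy to check without a running service.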

Health Checks

GET /api/health                # Basic health check
GET /api/health/detailed        # Detailed system info
GET /api/health/libreoffice     # LibreOffice-specific check
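
A small script can poll these endpoints from a monitoring job. The sketch below assumes Node.js 18+ and relies only on the HTTP status code, because the response body format is not documented here. The names healthUrls and checkHealth are hypothetical.

```typescript
// Health endpoints listed above.
const HEALTH_PATHS = ["/api/health", "/api/health/detailed", "/api/health/libreoffice"] as const;

// Build the absolute URL of every health endpoint.
function healthUrls(baseUrl: string): string[] {
  const root = baseUrl.replace(/\/+$/, "");
  return HEALTH_PATHS.map((path) => `${root}${path}`);
}

// Query each endpoint and record whether it answered with a 2xx status.
async function checkHealth(baseUrl: string, timeoutMs = 5000): Promise<Record<string, boolean>> {
  const results: Record<string, boolean> = {};
  for (const url of healthUrls(baseUrl)) {
    try {
      const response = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
      results[url] = response.ok;
    } catch {
      results[url] = false; // network error or timeout
    }
  }
  return results;
}
```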

File Validation

POST /api/convert/validate
Content-Type: multipart/form-data

file: [document file]

Integration with Next.js

1. Update Your Next.js Environment

Add to .env.local:

CONVERSION_SERVICE_URL=http://localhost:3001

2. Use the Updated API Route

The provided Next.js API route (app/api/convert-to-pdf/route.ts) will automatically use your conversion service.
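
The route file is not reproduced in this README. As a rough idea of what such a proxy route could look like, here is a sketch that assumes the Next.js App Router convention of exporting a POST handler that receives a Web Request, and that the browser uploads the document in a form field named file. The helper conversionEndpoint is hypothetical, and the route shipped with the project may differ.

```typescript
// Hypothetical sketch of app/api/convert-to-pdf/route.ts.

// Resolve the conversion endpoint, falling back to the local default port.
function conversionEndpoint(serviceUrl: string | undefined): string {
  const root = (serviceUrl ?? "http://localhost:3001").replace(/\/+$/, "");
  return `${root}/api/convert/to-pdf`;
}

// Small helper for JSON error responses.
function jsonError(message: string, status: number): Response {
  return new Response(JSON.stringify({ error: message }), {
    status,
    headers: { "Content-Type": "application/json" },
  });
}

export async function POST(request: Request): Promise<Response> {
  const incoming = await request.formData();
  const file = incoming.get("file");
  // Uploaded files arrive as File entries; plain strings are rejected.
  if (file === null || typeof file === "string") {
    return jsonError("Expected a 'file' upload", 400);
  }

  // Forward the upload unchanged to the conversion service.
  const outgoing = new FormData();
  outgoing.append("file", file, file.name);
  const upstream = await fetch(conversionEndpoint(process.env.CONVERSION_SERVICE_URL), {
    method: "POST",
    body: outgoing,
  });
  if (!upstream.ok) {
    return jsonError("Conversion service failed", 502);
  }
  // Stream the PDF back to the browser.
  return new Response(upstream.body, {
    status: 200,
    headers: { "Content-Type": "application/pdf" },
  });
}
```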

3. Update Your React Component

Your existing NewDocumentDialog component should work without changes; conversions are now handled by the LibreOffice-powered service.

Supported File Formats

| Format | Extension | MIME Type |
|---|---|---|
| Microsoft Word | .doc | application/msword |
| Microsoft Word | .docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document |
| Word Macro-Enabled | .docm | application/vnd.ms-word.document.macroEnabled.12 |
| Rich Text Format | .rtf | application/rtf |
| OpenDocument Text | .odt | application/vnd.oasis.opendocument.text |
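
A client can use the table above to reject unsupported files before uploading. This is a sketch of such a pre-check; the function name expectedMimeType is illustrative, and the server's own validation remains authoritative.

```typescript
// Extension-to-MIME map copied from the table above.
const SUPPORTED_TYPES: Record<string, string> = {
  ".doc": "application/msword",
  ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  ".docm": "application/vnd.ms-word.document.macroEnabled.12",
  ".rtf": "application/rtf",
  ".odt": "application/vnd.oasis.opendocument.text",
};

// Return the expected MIME type for a filename, or undefined if unsupported.
function expectedMimeType(filename: string): string | undefined {
  const dot = filename.lastIndexOf(".");
  if (dot < 0) return undefined; // no extension at all
  return SUPPORTED_TYPES[filename.slice(dot).toLowerCase()];
}
```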

Configuration

Environment Variables

# Server Configuration
# NODE_ENV: development or production
NODE_ENV=development
PORT=3001
ALLOWED_ORIGINS=http://localhost:3000,http://localhost:3001
# LOG_LEVEL: one of debug, info, warn, error
LOG_LEVEL=info

# File Upload Limits
MAX_FILE_SIZE=50MB
MAX_FILES=1
UPLOAD_TIMEOUT=60000
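
Values such as MAX_FILE_SIZE=50MB and the comma-separated ALLOWED_ORIGINS need parsing before use. How the service parses them is not shown here, so the following is only a sketch. It assumes binary units (1 MB = 1024 × 1024 bytes), and the names parseSize and parseOrigins are hypothetical.

```typescript
// Convert a human-readable size such as "50MB" into bytes.
// Assumes binary units (1 MB = 1024 * 1024 bytes); the service may differ.
function parseSize(value: string): number {
  const match = /^(\d+(?:\.\d+)?)\s*(B|KB|MB|GB)$/i.exec(value.trim());
  if (match === null) {
    throw new Error(`Unrecognized size: ${value}`);
  }
  const units: Record<string, number> = { B: 1, KB: 1024, MB: 1024 ** 2, GB: 1024 ** 3 };
  return Math.round(Number(match[1]) * units[match[2].toUpperCase()]);
}

// Split the comma-separated ALLOWED_ORIGINS value into a clean list.
function parseOrigins(value: string): string[] {
  return value
    .split(",")
    .map((origin) => origin.trim())
    .filter((origin) => origin.length > 0);
}
```

Under these assumptions, parseSize("50MB") yields 52428800 bytes.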

Docker Environment

environment:
  - NODE_ENV=production
  - PORT=3001
  - ALLOWED_ORIGINS=http://localhost:3000

Production Deployment

1. VPS/Server Deployment

# Install LibreOffice on your server
sudo apt-get update
sudo apt-get install -y libreoffice default-jre

# Clone your application
git clone your-repo
cd your-app

# Install production dependencies only (npm 7+; older npm uses --only=production)
npm ci --omit=dev

# Start with PM2 (recommended)
npm install -g pm2
pm2 start ecosystem.config.js

2. Docker Deployment

# Use the provided Docker setup
docker-compose up -d

# Or with custom settings
docker run -d \
  --name pdf-converter \
  -p 3001:3001 \
  -e NODE_ENV=production \
  -v "$(pwd)/logs:/app/logs" \
  your-pdf-converter-image

3. Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pdf-converter
spec:
  replicas: 2
  selector:
    matchLabels:
      app: pdf-converter
  template:
    metadata:
      labels:
        app: pdf-converter
    spec:
      containers:
        - name: pdf-converter
          image: your-pdf-converter:latest
          ports:
            - containerPort: 3001
          env:
            - name: NODE_ENV
              value: "production"
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"

Monitoring and Logging

Health Monitoring

Monitor the /api/health/detailed endpoint for:

  • LibreOffice conversion functionality
  • System resource usage
  • File system accessibility
  • Response times

Logs

Logs are written to:

  • logs/combined.log - All logs
  • logs/error.log - Error logs only
  • Console output in development

Metrics

The API provides metrics via health endpoints:

  • Conversion times
  • Memory usage
  • System load
  • Success/failure rates

Troubleshooting

LibreOffice Issues

Error: "LibreOffice not found"

# Check if LibreOffice is installed
libreoffice --version

# Check if it's in PATH
which libreoffice

Error: "Conversion timeout"

  • Increase timeout in configuration
  • Check system resources
  • Verify LibreOffice fonts are installed
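
For context on where a conversion timeout applies, the sketch below runs a headless LibreOffice conversion with a hard time limit using Node's child_process.execFile. It is not this project's implementation: the binary name soffice, the 60-second default, and the function names are assumptions.

```typescript
import { execFile } from "node:child_process";

// Arguments for a headless LibreOffice PDF conversion.
function buildConvertArgs(inputPath: string, outputDir: string): string[] {
  return ["--headless", "--convert-to", "pdf", "--outdir", outputDir, inputPath];
}

// Run the conversion and reject if it fails or exceeds timeoutMs.
// "soffice" is assumed to be on PATH; some installs expose "libreoffice" instead.
function convertWithTimeout(
  inputPath: string,
  outputDir: string,
  timeoutMs = 60_000,
  binary = "soffice",
): Promise<void> {
  return new Promise((resolve, reject) => {
    execFile(binary, buildConvertArgs(inputPath, outputDir), { timeout: timeoutMs }, (error) => {
      if (error) {
        // When the timeout fires, Node kills the child and sets error.killed.
        reject(error.killed ? new Error(`Conversion timed out after ${timeoutMs} ms`) : error);
      } else {
        resolve();
      }
    });
  });
}
```

Raising timeoutMs helps with large documents, while a conversion that always times out usually points to resource or font problems instead.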

Memory Issues

High memory usage:

  • LibreOffice can use significant memory for complex documents
  • Consider implementing a queue system for high load
  • Monitor with health endpoints

File Format Issues

Unsupported file error:

  • Check MIME type detection
  • Verify the file isn't corrupted
  • Test with simple documents first

Performance Optimization

1. System Level

  • Adequate RAM (2GB+ recommended)
  • SSD storage for temporary files
  • Multiple CPU cores for concurrent conversions

2. Application Level

  • Implement conversion queues for high load
  • Add Redis for job management
  • Scale horizontally behind a load balancer
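
As a starting point before introducing Redis, a conversion queue can be as simple as an in-process concurrency limiter. The class below is a sketch of that simpler technique, not a Redis-backed job queue and not code from this project; ConversionQueue and its members are hypothetical names.

```typescript
// Minimal in-process queue that caps how many conversions run at once.
class ConversionQueue {
  private readonly limit: number;
  private readonly waiting: Array<() => void> = [];
  private active = 0;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Number of jobs currently holding a slot.
  get running(): number {
    return this.active;
  }

  // Number of jobs waiting for a free slot.
  get pending(): number {
    return this.waiting.length;
  }

  // Schedule a job; it takes a slot immediately if one is free, otherwise waits.
  run<T>(job: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = (): void => {
        this.active += 1;
        // Promise.resolve().then(job) also captures synchronous throws from job.
        Promise.resolve()
          .then(job)
          .then(resolve, reject)
          .finally(() => {
            this.active -= 1;
            const next = this.waiting.shift();
            if (next !== undefined) next();
          });
      };
      if (this.active < this.limit) start();
      else this.waiting.push(start);
    });
  }
}
```

Each API instance would hold one queue, for example new ConversionQueue(2), and wrap every conversion call in queue.run(...). State is lost on restart, which is the gap a Redis-backed queue would close.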

3. LibreOffice Optimization

  • Pre-installed fonts for consistent rendering
  • Headless mode (automatically enabled)
  • Regular cleanup of temporary files

Security Considerations

  1. File Validation: Strict MIME type checking
  2. Size Limits: Configurable file size limits
  3. Rate Limiting: Built-in rate limiting
  4. Sanitization: Temporary file cleanup
  5. Non-root User: Docker runs as non-root user
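
MIME types sent by clients can be spoofed, so file validation is often paired with a check of the file's leading bytes. The project's own validation code is not shown here; the sketch below illustrates that complementary technique using the well-known signatures of the container formats behind the supported types. The name sniffContainer is hypothetical.

```typescript
type ContainerKind = "ole2" | "zip" | "rtf" | "unknown";

// Leading byte signatures of the container formats behind the supported types.
const OLE2_MAGIC = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]; // legacy .doc
const ZIP_MAGIC = [0x50, 0x4b, 0x03, 0x04]; // .docx, .docm, and .odt are ZIP archives
const RTF_MAGIC = [0x7b, 0x5c, 0x72, 0x74, 0x66]; // the ASCII text "{\rtf"

// True when the buffer begins with the given signature.
function startsWith(bytes: Uint8Array, magic: number[]): boolean {
  return bytes.length >= magic.length && magic.every((value, index) => bytes[index] === value);
}

// Classify an upload by its first bytes, independent of the declared MIME type.
function sniffContainer(bytes: Uint8Array): ContainerKind {
  if (startsWith(bytes, OLE2_MAGIC)) return "ole2";
  if (startsWith(bytes, ZIP_MAGIC)) return "zip";
  if (startsWith(bytes, RTF_MAGIC)) return "rtf";
  return "unknown";
}
```

A ZIP signature alone does not prove the file is a DOCX or ODT, so this check narrows the input but does not replace the conversion step's own error handling.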

License

MIT License - see LICENSE file for details.

About

Simple, lightweight, and free DOC-to-PDF API
