Skip to content
/ EuroCV Public

Convert PDF/DOCX resumes to Europass CV format via CLI, API, or Docker

License

Notifications You must be signed in to change notification settings

Bermpje/EuroCV

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

22 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

EuroCV

Convert resumes (PDF/DOCX) to Europass XML/JSON format with CLI, Python API, and HTTP service support.

Features

  • πŸ“„ Multi-format support: Parse PDF and DOCX resumes with auto-detection
  • πŸ€– Smart Extractors: LinkedIn PDF, Generic PDF (with Dutch support), and DOCX extractors
  • 🌍 Multi-language: Native support for English and Dutch resumes
  • πŸ‡ͺπŸ‡Ί Europass compliant: Validates against official Europass schemas
  • 🐍 Python API: Use as a library in your Python projects
  • πŸ’» CLI tool: Command-line interface for batch processing
  • 🌐 HTTP service: FastAPI-based REST API
  • 🐳 Docker ready: Containerized for easy deployment
  • πŸ”’ AVG/GDPR compliant: Stateless processing, no data retention
  • 🌍 Multi-locale: Support for multiple date/number formats (including Dutch)

Installation

From PyPI (coming soon)

pip install eurocv

From source

git clone https://github.com/emiel/eurocv.git
cd eurocv
pip install -e .

With OCR support

pip install eurocv[ocr]
# Also requires tesseract-ocr system package

Docker

docker pull ghcr.io/emiel/eurocv:latest

Usage

CLI

# Convert single file
eurocv convert resume.pdf --out output.json

# Convert with XML output
eurocv convert resume.docx --out-json output.json --out-xml output.xml

# Batch conversion
eurocv batch resumes/*.pdf --out-dir output/ --parallel 4

# Dutch locale and no photo (GDPR-friendly)
eurocv convert cv.pdf --locale nl-NL --no-photo --out output.json

Python API

from eurocv import convert_to_europass

# Simple conversion
europass_json = convert_to_europass("resume.pdf")

# With options
europass_json = convert_to_europass(
    "resume.pdf",
    locale="nl-NL",
    include_photo=False,
    output_format="json"
)

# Get both JSON and XML
result = convert_to_europass("resume.pdf", output_format="both")
print(result.json)
print(result.xml)

Docker

# Convert a file
docker run --rm -v $PWD:/data ghcr.io/emiel/eurocv \
  convert /data/resume.pdf --out /data/output.json

# Batch processing
docker run --rm -v $PWD:/data ghcr.io/emiel/eurocv \
  batch /data/resumes/*.pdf --out-dir /data/output

HTTP Service

Start the server:

# Using CLI
eurocv serve --host 0.0.0.0 --port 8000

# Using Docker
docker run -p 8000:8000 ghcr.io/emiel/eurocv serve

# Using uvicorn directly
uvicorn eurocv.api.main:app --host 0.0.0.0 --port 8000

API endpoints:

# Convert a resume
curl -X POST http://localhost:8000/convert \
  -F "file=@resume.pdf" \
  -F "locale=nl-NL" \
  -F "include_photo=false"

# Validate Europass JSON
curl -X POST http://localhost:8000/validate \
  -H "Content-Type: application/json" \
  -d @europass.json

# Get Europass JSON Schema
curl http://localhost:8000/schema > europass-schema.json

# Health check
curl http://localhost:8000/healthz

# Interactive API docs
open http://localhost:8000/docs

API Schema & Type Generation

The API provides a fully-typed JSON Schema for the Europass CV format. Use this for client code generation:

# Download the schema
curl http://localhost:8000/schema > europass-schema.json

# Generate TypeScript types
npx quicktype europass-schema.json -o europass.ts

# Generate Python types
datamodel-codegen --input europass-schema.json --output europass_types.py

# Generate Go types
quicktype europass-schema.json -o europass.go --lang go

# Generate Java classes
json schema2pojo --source europass-schema.json --target java-gen/

Benefits:

  • Type-safe clients: Auto-generate types for any language
  • IDE autocomplete: Full IntelliSense support
  • Validation: Validate responses against official schema
  • Documentation: Self-describing API

Example TypeScript usage:

import { EuropassCVResponse } from './europass';

async function convertResume(file: File): Promise<EuropassCVResponse> {
  const response = await fetch('http://localhost:8000/convert', {
    method: 'POST',
    body: formData
  });
  return await response.json(); // Fully typed!
}

// IDE knows all fields:
const firstName = result.data.LearnerInfo.Identification.PersonName.FirstName;

Architecture

eurocv/
β”œβ”€β”€ core/
β”‚   β”œβ”€β”€ extract/      # PDF/DOCX parsing
β”‚   β”œβ”€β”€ map/          # Resume β†’ Europass mapping
β”‚   └── validate/     # Schema validation
β”œβ”€β”€ cli/              # CLI interface (Typer)
β”œβ”€β”€ api/              # HTTP service (FastAPI)
└── schemas/          # Europass XML/JSON schemas

Development

Setup

# Clone repository
git clone https://github.com/emiel/eurocv.git
cd eurocv

# Install with dev dependencies
pip install -e ".[dev,ocr]"

# Run tests
pytest

# Format code
black src/ tests/
ruff check src/ tests/

# Type checking
mypy src/

Running locally

# CLI
python -m eurocv.cli.main convert test.pdf --out output.json

# API server
uvicorn eurocv.api.main:app --reload

Docker build

# Build image
docker build -t eurocv:local .

# Run
docker run --rm -v $PWD:/data eurocv:local convert /data/test.pdf

Privacy & Compliance (AVG/GDPR)

  • Stateless processing: No data is stored on disk
  • No photo by default: Use --no-photo flag to exclude photos (GDPR-friendly)
  • Local processing: Run in your own infrastructure
  • Encrypted storage: Use --store=encrypted only when necessary

Europass Resources

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For issues and questions:

Roadmap

  • Support for JSON Resume format
  • Enhanced OCR with layout analysis
  • Support for more input formats (LinkedIn, etc.)
  • AI-powered field extraction
  • Europass PDF rendering
  • Multi-language support

About

Convert PDF/DOCX resumes to Europass CV format via CLI, API, or Docker

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Languages