Convert resumes (PDF/DOCX) to Europass XML/JSON format with CLI, Python API, and HTTP service support.
- Multi-format support: Parse PDF and DOCX resumes with auto-detection
- Smart extractors: LinkedIn PDF, generic PDF (with Dutch support), and DOCX extractors
- Multi-language: Native support for English and Dutch resumes
- Europass compliant: Validates against official Europass schemas
- Python API: Use as a library in your Python projects
- CLI tool: Command-line interface for batch processing
- HTTP service: FastAPI-based REST API
- Docker ready: Containerized for easy deployment
- AVG/GDPR compliant: Stateless processing, no data retention
- Multi-locale: Support for multiple date/number formats (including Dutch)
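To make the multi-locale point concrete, here is a minimal sketch of what parsing Dutch date and number formats involves. The function names `parse_dutch_date` and `parse_dutch_number` are hypothetical and are not taken from the eurocv codebase; the sketch only handles the `day month-name year` date form and the `1.234,56` number form.

```python
from datetime import date

# Dutch month names mapped to month numbers.
DUTCH_MONTHS = {
    "januari": 1, "februari": 2, "maart": 3, "april": 4,
    "mei": 5, "juni": 6, "juli": 7, "augustus": 8,
    "september": 9, "oktober": 10, "november": 11, "december": 12,
}


def parse_dutch_date(text: str) -> date:
    """Parse a date written as day, Dutch month name, year (e.g. '3 maart 2021')."""
    day, month_name, year = text.strip().lower().split()
    return date(int(year), DUTCH_MONTHS[month_name], int(day))


def parse_dutch_number(text: str) -> float:
    """Parse a number where '.' groups thousands and ',' is the decimal mark."""
    return float(text.strip().replace(".", "").replace(",", "."))
```

A real extractor would also need to cope with abbreviated month names and numeric dates such as `03-03-2021`, which this sketch deliberately leaves out.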
# Install from PyPI
pip install eurocv
# Install from source
git clone https://github.com/emiel/eurocv.git
cd eurocv
pip install -e .
# Install with OCR support
pip install "eurocv[ocr]"
# Also requires tesseract-ocr system package
# Pull the Docker image
docker pull ghcr.io/emiel/eurocv:latest
# Convert single file
eurocv convert resume.pdf --out output.json
# Convert with XML output
eurocv convert resume.docx --out-json output.json --out-xml output.xml
# Batch conversion
eurocv batch resumes/*.pdf --out-dir output/ --parallel 4
# Dutch locale and no photo (GDPR-friendly)
eurocv convert cv.pdf --locale nl-NL --no-photo --out output.json
from eurocv import convert_to_europass
# Simple conversion
europass_json = convert_to_europass("resume.pdf")
# With options
europass_json = convert_to_europass(
"resume.pdf",
locale="nl-NL",
include_photo=False,
output_format="json"
)
# Get both JSON and XML
result = convert_to_europass("resume.pdf", output_format="both")
print(result.json)
print(result.xml)
# Convert a file
docker run --rm -v $PWD:/data ghcr.io/emiel/eurocv \
convert /data/resume.pdf --out /data/output.json
# Batch processing
docker run --rm -v $PWD:/data ghcr.io/emiel/eurocv \
batch /data/resumes/*.pdf --out-dir /data/output
Start the server:
# Using CLI
eurocv serve --host 0.0.0.0 --port 8000
# Using Docker
docker run -p 8000:8000 ghcr.io/emiel/eurocv serve
# Using uvicorn directly
uvicorn eurocv.api.main:app --host 0.0.0.0 --port 8000
API endpoints:
# Convert a resume
curl -X POST http://localhost:8000/convert \
-F "file=@resume.pdf" \
-F "locale=nl-NL" \
-F "include_photo=false"
# Validate Europass JSON
curl -X POST http://localhost:8000/validate \
-H "Content-Type: application/json" \
-d @europass.json
# Get Europass JSON Schema
curl http://localhost:8000/schema > europass-schema.json
# Health check
curl http://localhost:8000/healthz
# Interactive API docs
open http://localhost:8000/docs
The API provides a fully typed JSON Schema for the Europass CV format. Use it for client code generation:
# Download the schema
curl http://localhost:8000/schema > europass-schema.json
# Generate TypeScript types
npx quicktype -s schema europass-schema.json -o europass.ts
# Generate Python types
datamodel-codegen --input europass-schema.json --output europass_types.py
# Generate Go types
npx quicktype -s schema europass-schema.json -o europass.go --lang go
# Generate Java classes
jsonschema2pojo --source europass-schema.json --target java-gen/
Benefits:
- Type-safe clients: Auto-generate types for any language
- IDE autocomplete: Full IntelliSense support
- Validation: Validate responses against official schema
- Documentation: Self-describing API
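Python clients do not need generated types to get started. As a minimal sketch, the `/convert` upload shown with curl earlier can be built with only the standard library. The helper name `build_convert_request` is hypothetical, and the form fields (`file`, `locale`, `include_photo`) are taken from the curl example; the request is only constructed here, not sent.

```python
import urllib.request
import uuid


def build_convert_request(
    filename: str,
    content: bytes,
    locale: str = "nl-NL",
    include_photo: bool = False,
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build a multipart/form-data POST for /convert without sending it."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text form fields, mirroring the -F options of the curl example.
    for name, value in (("locale", locale), ("include_photo", str(include_photo).lower())):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The uploaded resume itself.
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + content
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        f"{base_url}/convert",
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

With a server running, the request can be sent with `urllib.request.urlopen(req)` and the body decoded with `json.load`.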
Example TypeScript usage:
import { EuropassCVResponse } from './europass';

async function convertResume(file: File): Promise<EuropassCVResponse> {
  // Build the multipart body that /convert expects
  const formData = new FormData();
  formData.append('file', file);
  const response = await fetch('http://localhost:8000/convert', {
    method: 'POST',
    body: formData
  });
  return await response.json(); // Fully typed!
}
// IDE knows all fields (`file` is a File, e.g. from an <input type="file">):
const result = await convertResume(file);
const firstName = result.data.LearnerInfo.Identification.PersonName.FirstName;
eurocv/
├── core/
│   ├── extract/     # PDF/DOCX parsing
│   ├── map/         # Resume → Europass mapping
│   └── validate/    # Schema validation
├── cli/             # CLI interface (Typer)
├── api/             # HTTP service (FastAPI)
└── schemas/         # Europass XML/JSON schemas
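The layout above suggests a three-stage flow: extract, map, validate. The sketch below illustrates that flow on plain text with toy logic. Every name in it (`RawResume`, `extract`, `map_to_europass`, `validate`, `convert`) is hypothetical and not taken from the eurocv codebase, and the `Surname` field is an assumption about the Europass structure.

```python
from dataclasses import dataclass, field


@dataclass
class RawResume:
    """Hypothetical intermediate form produced by the extract stage."""
    name: str
    lines: list[str] = field(default_factory=list)


def extract(text: str) -> RawResume:
    """Stand-in for core/extract: treat the first non-empty line as the name."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return RawResume(name=lines[0] if lines else "", lines=lines[1:])


def map_to_europass(raw: RawResume) -> dict:
    """Stand-in for core/map: place fields into a Europass-like nested dict."""
    first, _, last = raw.name.partition(" ")
    return {
        "LearnerInfo": {
            "Identification": {"PersonName": {"FirstName": first, "Surname": last}}
        }
    }


def validate(doc: dict) -> list[str]:
    """Stand-in for core/validate: return a list of problems (empty means OK)."""
    person = doc.get("LearnerInfo", {}).get("Identification", {}).get("PersonName", {})
    return [f"missing {key}" for key in ("FirstName", "Surname") if not person.get(key)]


def convert(text: str) -> dict:
    """Run the three stages in order and fail loudly on validation errors."""
    doc = map_to_europass(extract(text))
    errors = validate(doc)
    if errors:
        raise ValueError("; ".join(errors))
    return doc
```

Keeping the stages as separate functions means each one can be tested on its own, which is presumably why the package splits them into separate directories.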
# Clone repository
git clone https://github.com/emiel/eurocv.git
cd eurocv
# Install with dev dependencies
pip install -e ".[dev,ocr]"
# Run tests
pytest
# Format code
black src/ tests/
ruff check src/ tests/
# Type checking
mypy src/
# CLI
python -m eurocv.cli.main convert test.pdf --out output.json
# API server
uvicorn eurocv.api.main:app --reload
# Build image
docker build -t eurocv:local .
# Run
docker run --rm -v $PWD:/data eurocv:local convert /data/test.pdf
- Stateless processing: No data is stored on disk
- No photo by default: Use the --no-photo flag to exclude photos (GDPR-friendly)
- Local processing: Run in your own infrastructure
- Encrypted storage: Use --store=encrypted only when necessary
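For Europass JSON that was produced elsewhere, the effect of excluding photos can be approximated after the fact. This is a minimal sketch, assuming the photo is stored under `LearnerInfo.Identification.Photo`; the helper name `strip_photo` is hypothetical and is not part of the eurocv API.

```python
import copy


def strip_photo(europass: dict) -> dict:
    """Return a copy of the document with any Identification.Photo entry removed."""
    cleaned = copy.deepcopy(europass)  # leave the caller's document untouched
    identification = cleaned.get("LearnerInfo", {}).get("Identification", {})
    identification.pop("Photo", None)  # no-op when there is no photo
    return cleaned
```

Working on a deep copy keeps the function free of side effects, in line with the stateless processing described above.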
MIT License - see LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
For issues and questions:
- GitHub Issues: https://github.com/emiel/eurocv/issues
- Email: [your-email]
Roadmap:
- Support for JSON Resume format
- Enhanced OCR with layout analysis
- Support for more input formats (LinkedIn, etc.)
- AI-powered field extraction
- Europass PDF rendering
- Multi-language support