CV Parser (Vietnamese + English)

This project extracts structured information (name, contact details, skills, education, experience, and languages) from Vietnamese and English CVs in PDF or DOCX format. It auto-detects each CV's language and routes the extracted text to the matching parser.

📦 External Requirements

Install the following dependencies before running the project:

Dependency	Version	Description
Python	3.12.7+	Recommended interpreter
Poppler	24.08.0	Required for accurate PDF text extraction
Tesseract-OCR	5.x or later	OCR fallback for scanned/non-text PDFs
PyTorch	Compatible with Transformers	Required for spaCy's transformer pipeline
Transformers	Latest	Used for spaCy's `en_core_web_trf` model

Linux (Debian/Ubuntu)

sudo apt update
sudo apt install poppler-utils tesseract-ocr

macOS (Homebrew)

brew install poppler tesseract
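To confirm the system tools are visible before running the parser, a small pre-flight check can look for their executables on PATH. This is a minimal sketch, not part of the repository: it assumes Poppler is detectable through its `pdftotext`/`pdftoppm` binaries and Tesseract through `tesseract`, and `check_system_tools` is a hypothetical helper name.

```python
import shutil

# Assumed executables that indicate a working install: Poppler ships
# pdftotext/pdftoppm, and Tesseract ships the tesseract CLI.
REQUIRED_TOOLS = {
    "poppler": ["pdftotext", "pdftoppm"],
    "tesseract": ["tesseract"],
}


def check_system_tools(required=REQUIRED_TOOLS):
    """Return {dependency: bool}, True only if all of its executables are on PATH."""
    return {
        name: all(shutil.which(exe) is not None for exe in executables)
        for name, executables in required.items()
    }


if __name__ == "__main__":
    for name, ok in check_system_tools().items():
        print(f"{name:<10} {'found' if ok else 'MISSING'}")
```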

Python Setup

1. Clone the repo

git clone https://github.com/Watch650/ResumeParser.git
cd ResumeParser

2. Create a virtual environment

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

3. Install Python dependencies

pip install -r requirements.txt

4. Download spaCy models

python -m spacy download en_core_web_trf
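One way to check that the Python-side pieces, including the downloaded model, are installed is to ask the import system for them without actually loading them. A minimal sketch; the module list is an assumption based on the requirements table, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed top-level import names for spaCy, PyTorch, Hugging Face
# Transformers, and the en_core_web_trf model package.
REQUIRED_MODULES = ["spacy", "torch", "transformers", "en_core_web_trf"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names the import system cannot locate (nothing is imported)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All modules found" if not missing else "Missing: " + ", ".join(missing))
```

Using `find_spec` instead of a real import avoids paying PyTorch's load time just to run the check.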

Folder Structure

ResumeParser/
├── CV/                     # Drop .pdf/.docx files here
├── text_extract/           # Output: Cleaned text per language
├── parsed_data/            # Output: Structured JSON data
├── extractors/             # Text extractors (PDF, DOCX)
├── utils/                  # Helper utilities (cleaning, OCR)
├── file_router.py          # Routes and processes input CV files
├── file_parser_en.py       # English CV parser
└── file_parser_vn.py       # Vietnamese CV parser
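The CV/ folder is the drop point for inputs. A hypothetical helper for collecting only the supported files from it might look like the following; the case-insensitive suffix check and the flat (non-recursive) scan are assumptions, not documented behavior of file_router.py.

```python
from pathlib import Path

# Input formats the README says the pipeline accepts.
SUPPORTED_SUFFIXES = {".pdf", ".docx"}


def find_cv_files(folder="CV"):
    """Return supported files directly inside `folder`, sorted by path."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


if __name__ == "__main__":
    for path in find_cv_files():
        print(path)
```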

How to Use

1. Drop files

Place .pdf or .docx files in the CV/ folder.

2. Extract & route text

python file_router.py

3. Parse CVs

python file_parser_en.py   # For English CVs
python file_parser_vn.py   # For Vietnamese CVs
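The README does not say how file_router.py decides a CV's language. As one possible approach, a hypothetical heuristic can measure the share of letters carrying diacritics, which is high in Vietnamese text and near zero in English. The 5% threshold and the function names are assumptions to tune against real CVs.

```python
import unicodedata

# Assumed threshold: share of diacritic-bearing letters above which a
# text is treated as Vietnamese.
VN_DIACRITIC_RATIO = 0.05


def has_diacritic(ch):
    """True for đ/Đ or any Latin letter that decomposes into a base letter plus marks."""
    if ch in "đĐ":  # đ has no Unicode decomposition, so check it explicitly
        return True
    decomposed = unicodedata.normalize("NFD", ch)
    return len(decomposed) > 1 and decomposed[0].isascii() and decomposed[0].isalpha()


def detect_language(text, threshold=VN_DIACRITIC_RATIO):
    """Return "vn" or "en" based on the share of diacritic-bearing letters."""
    # NFC first, so decomposed input (base letter + combining mark) counts once.
    text = unicodedata.normalize("NFC", text)
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "en"
    marked = sum(1 for ch in letters if has_diacritic(ch))
    return "vn" if marked / len(letters) >= threshold else "en"
```

Using a ratio rather than an "any diacritic" test keeps loanwords such as "résumé" in an English CV from flipping the result.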

Output Format

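Parsed results are saved to parsed_data/extracted_cv_data.json. The sample entry below uses Vietnamese keys: ho_ten (full name), so_dien_thoai (phone number), hoc_van (education), kinh_nghiem (experience), ky_nang (skills), and ngoai_ngu (foreign languages).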

{
  "ho_ten": "Vu Hoang Lan",
  "email": "example@gmail.com",
  "so_dien_thoai": "+84987654321",
  "hoc_van": [...],
  "kinh_nghiem": [...],
  "ky_nang": [...],
  "ngoai_ngu": [...],
  "source_file": "CV_1_extracted_text.txt"
}
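The README does not describe how individual fields are extracted. As an illustration only, contact fields such as email and so_dien_thoai could be pulled with regular expressions and assembled under the same keys as the sample. The patterns, the +84 normalization, and `extract_contact` are assumptions, not the project's code.

```python
import json
import re

# Hypothetical patterns: a simple email regex, and Vietnamese numbers written
# as +84 or 0 followed by 9 digits (optional spaces, dots, or dashes between).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"(?:\+84|0)(?:[\s.-]?\d){9}")


def extract_contact(text, source_file):
    """Build a partial entry using the README's output keys."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    digits = re.sub(r"\D", "", phone.group()) if phone else None
    return {
        "email": email.group() if email else None,
        # Normalize to +84 plus the last 9 digits, matching the sample's format.
        "so_dien_thoai": "+84" + digits[-9:] if digits else None,
        "source_file": source_file,
    }


if __name__ == "__main__":
    entry = extract_contact("Email: example@gmail.com\nPhone: 0987 654 321", "CV_1_extracted_text.txt")
    print(json.dumps(entry, ensure_ascii=False, indent=2))
```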

Logging

Logs are written to the logs/ folder.
