🦀 crapdf

Extract text from a PDF file in Python. Built on the Rust lopdf crate. Kind of crappy.

from crapdf import extract, extract_bytes

# Extract from file path
texts: list[str] = extract("file.pdf")

# Extract from bytes
with open("file.pdf", "rb") as f:
    content = f.read()

# Reuses the name annotated above, so the annotation is not repeated
texts = extract_bytes(content)
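
If you don't know ahead of time whether you'll get a path or raw bytes, a small wrapper can choose between the two functions above. This is only a sketch: `extract_any` is a made-up helper, not part of crapdf, and it assumes `extract` takes a plain `str` path as in the example.

```python
from pathlib import Path
from typing import Union


def extract_any(source: Union[str, Path, bytes, bytearray]) -> list[str]:
    """Hypothetical helper: dispatch to crapdf.extract or crapdf.extract_bytes."""
    # Imported lazily so this module still imports where crapdf is not installed.
    from crapdf import extract, extract_bytes

    if isinstance(source, (bytes, bytearray)):
        # Raw PDF content already in memory.
        return extract_bytes(bytes(source))
    # Assumption: extract() expects a str path, so Path objects are converted.
    return extract(str(source))
```

Usage would look like `extract_any("file.pdf")` or `extract_any(content)`.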

Performance

Run the benchmarks using bench.py. Make sure to install dev dependencies from requirements-dev.txt.

The overall performance is similar to pypdf.
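
For a quick comparison of your own, something like the timing helper below may be enough. It is not the repo's bench.py (its contents are not shown here), just a minimal sketch: `time_extractor` and `compare` are made-up names, and `compare` assumes both crapdf and pypdf are installed and that the given PDF exists.

```python
import statistics
import time
from typing import Callable


def time_extractor(fn: Callable[[], object], repeats: int = 5) -> dict[str, float]:
    """Call fn() `repeats` times and summarize wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return {
        "min": min(durations),
        "mean": statistics.mean(durations),
        "max": max(durations),
    }


def compare(path: str = "file.pdf") -> None:
    """Hypothetical comparison; requires crapdf and pypdf to be installed."""
    # Imported lazily so time_extractor stays usable without either package.
    from crapdf import extract
    from pypdf import PdfReader

    def with_pypdf() -> list[str]:
        return [page.extract_text() for page in PdfReader(path).pages]

    print("crapdf:", time_extractor(lambda: extract(path)))
    print("pypdf: ", time_extractor(with_pypdf))
```

Timings from a single small file are noisy, so treat the numbers as a rough indication only.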

By AWeirdDev. GitHub repository: AWeirdDev/crapdf

