Papermill

A general-purpose FastAPI microservice that extracts text, structured data from tables, and metadata from various document types (TXT, DOCX, PDF, etc.).

Core Functions:

Document Parsing: Handles multiple file types (TXT, DOCX, and PDF).
Metadata Extraction: Extracts standard metadata such as author and creation date.
Output: Returns a single JSON object containing the extracted text, table data, and metadata.
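
The response schema is not documented in this README. As a rough sketch only, a client might consume the returned JSON along these lines. The field names `text`, `tables`, and `metadata`, and the sample values, are assumptions made for illustration, not the service's confirmed schema.

```python
import json

# Hypothetical response body. The field names ("text", "tables", "metadata")
# and all values are illustrative assumptions, not the service's real schema.
sample_response = """
{
  "text": "Quarterly report...",
  "tables": [[["Region", "Sales"], ["North", "120"]]],
  "metadata": {"author": "Jane Doe", "created": "2024-01-15"}
}
"""


def summarize(payload: str) -> dict:
    """Return a small summary of a (hypothetical) extraction response."""
    data = json.loads(payload)
    return {
        "characters": len(data.get("text", "")),
        "table_count": len(data.get("tables", [])),
        "author": data.get("metadata", {}).get("author"),
    }


summary = summarize(sample_response)
print(summary)
```

Using `.get()` with defaults keeps the sketch tolerant of missing keys, which is useful while the actual schema is unknown. Check the interactive documentation at /docs for the real field names.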

Installation and Setup

# Fork and clone the repository
git clone https://github.com/<username>/papermill.git
cd papermill

# Create a branch for your changes
git checkout -b feature/your-feature-name

# Create and activate a virtual environment
uv venv
source .venv/bin/activate  # On Windows use `.venv\Scripts\activate`

# Install dependencies
uv pip sync requirements.txt

# Start the FastAPI server
uvicorn src.app:app --reload

Push your branch and open a pull request when your changes are ready.

Endpoints

POST /extract: Upload a document and receive extracted text and metadata.
POST /extract/url: Provide a URL to a document for extraction.
GET /health: Check the health status of the service.
GET /docs: Access the interactive API documentation.
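
The endpoints above could be called from Python roughly as follows. This is a minimal sketch using only the standard library. It assumes the server runs on uvicorn's default address (`http://127.0.0.1:8000`) and that POST /extract/url accepts a JSON body with a `url` field; the request format is not specified in this README, so confirm it in /docs. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: uvicorn's default host and port.
BASE_URL = "http://127.0.0.1:8000"


def health_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET /health request."""
    return urllib.request.Request(f"{base_url}/health", method="GET")


def extract_url_request(
    document_url: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST /extract/url request.

    Assumption: the endpoint accepts a JSON body with a "url" field.
    """
    body = json.dumps({"url": document_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/extract/url",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Build a request without sending it, so this runs without a live server.
req = extract_url_request("https://example.com/sample.pdf")
print(req.get_method(), req.full_url)
```

With the server running, `send(health_request())` or `send(req)` would perform the actual call. Uploading a file to POST /extract needs a multipart form body, which is easier with a third-party HTTP client than with `urllib`.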
