# text-extraction-from-pdf

Here are 8 public repositories matching this topic...

purunep / pdfparser

The PDF Parser API is an intelligent and modular document processing service that converts unstructured PDF files into structured, machine-readable data. It automatically detects and extracts text blocks, tables, key-value pairs, and images, and returns the results in a hierarchical JSON format, complete with page-level and spatial metadata.

pdf pdf-parser document-processing fastapi text-extraction-from-pdf image-extraction-from-pdf rag-pdf

Updated May 29, 2025
Python

akoutsop1909 / pdf-to-txt-converter

A simple Java CLI tool for batch-converting PDF files to TXT format. Supports file filtering by filename wildcards and last modified date.

Updated Jan 25, 2026
Java
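
The filtering described for this tool (filename wildcards plus last-modified date) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the repository's Java code: the function name `select_pdfs` and its parameters are hypothetical, and the actual CLI may apply its filters differently.

```python
from datetime import datetime
from fnmatch import fnmatch
from pathlib import Path


def select_pdfs(folder, pattern="*.pdf", modified_after=None):
    """Hypothetical helper: list PDFs in `folder` whose file name matches the
    shell-style wildcard `pattern` and whose last-modified time is at or
    after `modified_after` (a naive local datetime, or None for no date filter)."""
    selected = []
    for path in sorted(Path(folder).iterdir()):
        # Only regular files with a .pdf extension are candidates.
        if not path.is_file() or path.suffix.lower() != ".pdf":
            continue
        # Wildcard match on the file name only, not the full path.
        if not fnmatch(path.name, pattern):
            continue
        if modified_after is not None:
            mtime = datetime.fromtimestamp(path.stat().st_mtime)
            if mtime < modified_after:
                continue
        selected.append(path)
    return selected
```

For example, `select_pdfs("inbox", "report_*.pdf", datetime(2025, 1, 1))` would return only the matching reports modified since the start of 2025; each returned path could then be passed to whatever PDF-to-text step is in use.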

Var1035 / legai-bond-extractor

Automated extraction of structured data from legal bond and agreement documents using Python.

automation python3 contract-analysis data-extraction-and-pre-processing legal-documents legal-tech text-extraction-from-pdf mistrial

Updated Mar 24, 2026
HTML

smmehrab / igm-extractor

PoC to showcase text extraction from IGM documents using VLMs

ocr prototype gemini-api igm vlms text-extraction-from-pdf

Updated Jan 4, 2026
Java

josh-janse / pdf-to-markdown-extractor

Convert PDF documents to clean markdown using Google's Gemini API.

nodejs markdown pdf ai text-extraction document-processing gemini-api text-extraction-from-pdf

Updated Jun 17, 2025
JavaScript

jeetuverma2002 / Document-PDF-Analyzer

AI-powered tool to analyze and extract insights from PDFs and other documents.

nlp pdf ai text-extraction pdf-viewer pdf-document pdf-analyzer document-analyzer text-extraction-from-pdf

Updated Apr 25, 2026

FelixCAxO / PdfPy

Split PDFs into sections using bookmarks, text-style detection, OCR, or manual page starts.

python pdf automation text-extraction-from-pdf

Updated Feb 20, 2026
Python

D3M-Sudo / Anura

Extract text from images, videos, QR codes, and more.

ocr tesseract qr-code text-extraction tesseract-ocr optical-character-recognition ocr-recognition ocr-python anura text-extraction-from-image qr-decoder text-extraction-from-pdf

Updated Apr 26, 2026
Python
