This is the repository for Docling Java, a Java API for using Docling.
Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem.
- 🗂️ Parsing of multiple document formats incl. PDF, DOCX, PPTX, XLSX, HTML, WAV, MP3, VTT, images (PNG, TIFF, JPEG, ...), and more
- 📑 Advanced PDF understanding incl. page layout, reading order, table structure, code, formulas, image classification, and more
- 🧬 Unified, expressive DoclingDocument representation format
- ↪️ Various export formats and options, including Markdown, HTML, DocTags and lossless JSON
- 🔒 Local execution capabilities for sensitive data and air-gapped environments
- 🤖 Plug-and-play integrations including LangChain4j
- 🔍 Extensive OCR support for scanned PDFs and images
- 👓 Support for several Visual Language Models (GraniteDocling)
- 🎙️ Audio support with Automatic Speech Recognition (ASR) models
This project aims to provide the following artifacts:
- docling-api: The core API for interacting with Docling. It is intended to be framework-agnostic.
- docling-client: A reference implementation of the docling-api using Java's HttpClient and Jackson.
- docling-testing: Utilities for testing Docling.
- docling-testcontainers: A Testcontainers module for running Docling in a Docker container.
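To give a feel for what a client built on Java's HttpClient does, here is a minimal sketch that only builds (and does not send) a conversion request using the JDK alone. It is not the docling-client implementation. The endpoint path `/v1/convert/source`, the JSON payload shape, the port `5001`, and the names `ConvertRequestSketch`, `jsonBody`, and `buildRequest` are all assumptions made for illustration; check your Docling server's API documentation for the actual contract.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class ConvertRequestSketch {

    // Assumed endpoint path; verify it against your Docling server's API docs.
    static final String CONVERT_PATH = "/v1/convert/source";

    // Builds the JSON body by hand to stay dependency-free.
    // The payload shape is an assumption; a real client would serialize with Jackson.
    static String jsonBody(String documentUrl) {
        String escaped = documentUrl.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"sources\":[{\"kind\":\"http\",\"url\":\"" + escaped + "\"}]}";
    }

    // Creates a POST request for the given server and document URL without sending it.
    static HttpRequest buildRequest(String baseUrl, String documentUrl) {
        return HttpRequest.newBuilder()
            .uri(URI.create(baseUrl + CONVERT_PATH))
            .header("Content-Type", "application/json")
            .header("Accept", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody(documentUrl), StandardCharsets.UTF_8))
            .build();
    }

    public static void main(String[] args) {
        // The base URL is a placeholder for wherever your Docling server is running.
        HttpRequest request = buildRequest("http://localhost:5001", "https://arxiv.org/pdf/2408.09869");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

In practice you would rely on docling-client rather than hand-rolling requests like this; the sketch only illustrates the kind of plumbing a reference implementation takes care of.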
Use DoclingApi.convertSource() to convert individual documents. For example:
import java.net.URI;

import ai.docling.api.DoclingApi;
import ai.docling.api.convert.request.ConvertDocumentRequest;
import ai.docling.api.convert.response.ConvertDocumentResponse;
import ai.docling.client.DoclingClient;

DoclingApi doclingApi = DoclingClient.builder()
    .baseUrl("<location of docling server>")
    .build();
ConvertDocumentRequest request = ConvertDocumentRequest.builder()
    .addHttpSources(URI.create("https://arxiv.org/pdf/2408.09869"))
    .build();
ConvertDocumentResponse response = doclingApi.convertSource(request);
System.out.println(response.document().markdownContent());

More usage information is available in the docs.
Please feel free to connect with us using the discussion section.
Please read Contributing to Docling Java for details.
The Docling codebase is under MIT license. For individual model usage, please refer to the model licenses found in the original packages.
The project was started by the AI for knowledge team at IBM Research Zurich.
Thanks go to these wonderful people (emoji key):
| Eric Deandrea 💻 🖋 📖 🤔 🚇 🚧 📆 | Thomas Vitale 💻 🖋 📖 🤔 🚇 🚧 📆 | Alex Soto 🤔 📆 | 
This project follows the all-contributors specification. Contributions of any kind welcome!