Comparison: Basic versus Layout Mode #327
JorjMcKie
announced in
Announcements
Here is an example of a Jupyter notebook execution dynamically using package tqdm:
When using PyMuPDF with package PyMuPDF-Layout, many new features become available under PyMuPDF4LLM.
Use the following import statements in your scripts to activate layout execution mode. The first statement is mandatory and the sequence of statements is important.
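A minimal sketch of the ordering requirement, assuming that package PyMuPDF-Layout is exposed as the module `pymupdf.layout` and that PyMuPDF4LLM is imported as `pymupdf4llm`. The helper `activate_layout_mode` and the constant `LAYOUT_IMPORT_ORDER` are hypothetical names used only for illustration; they are not part of either package.

```python
import importlib

# Assumed module names (not confirmed by the text above):
#   "pymupdf.layout" -> provided by package PyMuPDF-Layout, must come first
#   "pymupdf4llm"    -> imported afterwards
# In a plain script this would simply be two import statements in this order.
LAYOUT_IMPORT_ORDER = ("pymupdf.layout", "pymupdf4llm")


def activate_layout_mode(order=LAYOUT_IMPORT_ORDER):
    """Import the modules strictly in the given order.

    Returns a dict mapping module name to module object, or None as soon
    as one of the modules cannot be imported.
    """
    loaded = {}
    for name in order:
        try:
            loaded[name] = importlib.import_module(name)
        except ImportError:
            return None
    return loaded


modules = activate_layout_mode()
print("layout mode active" if modules else "layout packages not installed")
```

Wrapping the imports in a helper is only a convenience for environments where the layout package may be absent; the point being illustrated is the sequence, with the layout module imported before `pymupdf4llm`.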
If you are using PyMuPDF-Pro, code your imports as follows:
New Features
- .to_text() creates the plain text version of the text. Tables are written using package tabulate.
- .to_json() creates a JSON version of the document's metadata, together with the selected pages.
- margins parameter obsolete
Postponed and Obsolete Features
The following table shows the parameter availability of all three methods to_markdown(), to_text() and to_json().
The entries in the Comments column have the following meaning:
Unavailable Features in Layout Mode
In addition to the ignored parameters shown in the table above, some features are unavailable when PyMuPDF-Layout is active.
- IdentifyHeaders is unavailable in layout mode. Titles and section headers are detected with much higher precision. Detection does not depend on things like font size, which is the approach used in basic mode. However, there is no way to retrieve section header levels. Therefore, only two markdown header level tags are used: "#" for titles and "##" for section headers.
- TocHeaders is unavailable in layout mode. Titles and section headers are exclusively detected by PyMuPDF-Layout.
General Comments
While PyMuPDF-Layout is AI-empowered, it is different from most other tools that employ artificial intelligence.
It is not vision-based: its models do not depend on rendered page images.
Its Graph Neural Networks work directly on PDF internals, combining precision with up to 10 times higher speed.
Here is a short list of the characteristics: