How to Extract Alternative Text (Alt Text) from Images in PDF? #4764
Unanswered
krish-tech02 asked this question in Looking for help
Replies: 3 comments 1 reply
@JorjMcKie, I am looking to fetch the alt text shown in the screenshot below (this is from Microsoft Word):
When exporting Word to PDF: which options do you choose?
Description
I'm trying to extract the alternative text (alt text) from images embedded in PDF documents using PyMuPDF. Alt text is typically used for accessibility purposes in tagged PDFs (PDF/UA).
What I've Tried
I've attempted several approaches to extract the alt text, but none of them have worked successfully:
Approach 1: Using xref_get_key
Approach 2: Checking ActualText
Approach 3: Checking Structure Tree
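The code for these approaches did not survive on this page, so the following is only a minimal sketch of what such a lookup could look like, not the original attempt. It relies on two facts: in a tagged PDF the Alt and ActualText entries normally sit on structure elements reachable from the catalog's StructTreeRoot rather than on the image XObject itself, and xref_get_key returns a (type, value) tuple, with ("null", "null") for a missing key. The helper names has_structure_tree, find_text_alternatives, and main are hypothetical, and the sketch assumes a PyMuPDF release that can be imported as pymupdf (older releases use import fitz).

```python
KEYS = ("Alt", "ActualText")


def has_structure_tree(doc):
    """Return True if the catalog has a /StructTreeRoot entry (tagged PDF)."""
    kind, _ = doc.xref_get_key(doc.pdf_catalog(), "StructTreeRoot")
    return kind != "null"


def find_text_alternatives(doc, keys=KEYS):
    """Scan every object and collect (xref, key, value) for string-valued keys.

    This is a brute-force scan over all xrefs. It does not walk the
    structure tree and does not map an entry back to a specific image.
    """
    found = []
    for xref in range(1, doc.xref_length()):
        for key in keys:
            try:
                kind, value = doc.xref_get_key(xref, key)
            except Exception:
                break  # object could not be read as a dictionary; skip it
            if kind == "string":
                found.append((xref, key, value))
    return found


def main(path):
    # Assumption: PyMuPDF is installed and importable as `pymupdf`.
    import pymupdf

    doc = pymupdf.open(path)
    print("tagged:", has_structure_tree(doc))
    for xref, key, value in find_text_alternatives(doc):
        print(xref, key, repr(value))
```

Calling main("sample.pdf") with a placeholder path would print whether the file has a structure tree and list any Alt or ActualText strings found. If the list is empty for a file that does have a structure tree, the text alternatives may be stored elsewhere (for example as indirect references or inside marked-content property lists), which this sketch does not follow.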
Results
None of the above approaches successfully extracted the alt text from my PDF, even though:
The xref_get_key method either returns None or throws exceptions when trying to access the Alt or ActualText keys.
Questions
Environment
Sample PDF
I'm attaching a sample PDF file that contains images with alt text. The PDF is created with accessibility features (PDF/UA compliant).
Breast Care After Birth 10–02-2025 FR - Copy.pdf
Expected Behavior
I expect to be able to extract the alternative text associated with images in the PDF, similar to how it appears in Adobe Acrobat's accessibility checker or other PDF readers that support accessibility features.
Any guidance on accessing these properties through PyMuPDF would be greatly appreciated!
Thank you for maintaining this excellent library!
@JorjMcKie