Unprecedented KeyError occurring #138

@nikhil-aguru

Description

Bug

I was using docling to convert a PDF file and hit a KeyError raised inside docling at this line:
doc = converter.convert(pdf_path)
where pdf_path is the path to the PDF file.
The error keeps occurring for the same PDF, but the number reported in the KeyError varies.
...
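
To document how the reported key varies, the conversion call can be repeated and the missing keys tallied. This is only a minimal sketch: `collect_key_errors` is a hypothetical helper (not part of docling), and it takes any callable so that `converter.convert` can be passed in from the calling code.

```python
from collections import Counter
from typing import Callable


def collect_key_errors(convert: Callable[[str], object], pdf_path: str, runs: int = 5) -> Counter:
    """Call convert(pdf_path) `runs` times and count which keys the KeyErrors report."""
    missing_keys: Counter = Counter()
    for _ in range(runs):
        try:
            convert(pdf_path)
        except KeyError as exc:
            # exc.args[0] is the key that was not found (22 in the traceback of this report).
            key = exc.args[0] if exc.args else None
            missing_keys[key] += 1
    return missing_keys
```

Assuming a `converter` and `pdf_path` as in the report, `collect_key_errors(converter.convert, pdf_path)` returns a Counter mapping each missing key to the number of runs that failed on it. An empty Counter means no KeyError was raised.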

Steps to reproduce

Unfortunately, I can't share the PDF.
...

Docling version

Docling version: 2.58.0
Docling Core version: 2.49.0
Docling IBM Models version: 3.10.1
Docling Parse version: 4.7.0
Python: cpython-312 (3.12.3)
Platform: Linux-6.14.0-1012-aws-x86_64-with-glibc2.39
...
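
The same package versions can also be read from inside the failing environment with the standard library. A minimal sketch, assuming the PyPI distribution names below match the installed packages; `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names assumed to correspond to the packages listed above.
PACKAGES = ("docling", "docling-core", "docling-ibm-models", "docling-parse")


def installed_versions(packages=PACKAGES) -> dict:
    """Map each distribution name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```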

Python version

  File "home/src/extractors/pdf_markdown.py", line 79, in process_single_page
    doc = converter.convert(pdf_path)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "home/path/venv./lib/python3.12/site-packages/pydantic/_internal/_validate_call.py", line 39, in wrapper_function
    return wrapper(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "home/path/venv./lib/python3.12/site-packages/pydantic/_internal/_validate_call.py", line 136, in __call__
    res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args, kwargs))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "home/path/venv./lib/python3.12/site-packages/docling/document_converter.py", line 237, in convert
    return next(all_res)
           ^^^^^^^^^^^^^
  File "home/path/venv./lib/python3.12/site-packages/docling/document_converter.py", line 260, in convert_all
    for conv_res in conv_res_iter:
  File "home/path/venv./lib/python3.12/site-packages/docling/document_converter.py", line 332, in _convert
    for item in map(
  File "home/path/venv./lib/python3.12/site-packages/docling/document_converter.py", line 379, in _process_document
    conv_res = self._execute_pipeline(in_doc, raises_on_error=raises_on_error)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "home/path/venv./lib/python3.12/site-packages/docling/document_converter.py", line 402, in _execute_pipeline
    conv_res = pipeline.execute(in_doc, raises_on_error=raises_on_error)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "home/path/venv./lib/python3.12/site-packages/docling/pipeline/base_pipeline.py", line 80, in execute
    raise e
  File "home/path/venv./lib/python3.12/site-packages/docling/pipeline/base_pipeline.py", line 73, in execute
    conv_res = self._assemble_document(conv_res)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "home/path/venv./lib/python3.12/site-packages/docling/pipeline/standard_pdf_pipeline.py", line 153, in _assemble_document
    conv_res.document = self.reading_order_model(conv_res)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "home/path/venv./lib/python3.12/site-packages/docling/models/readingorder_model.py", line 410, in __call__
    sorted_elements = self.ro_model.predict_reading_order(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "home/path/venv./lib/python3.12/site-packages/docling_ibm_models/reading_order/reading_order_rb.py", line 108, in predict_reading_order
    page_to_elems[page_no] = self._predict_page(elems)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "home/path/venv./lib/python3.12/site-packages/docling_ibm_models/reading_order/reading_order_rb.py", line 239, in _predict_page
    self._init_ud_maps(page_elements)
  File "home/path/venv./lib/python3.12/site-packages/docling_ibm_models/reading_order/reading_order_rb.py", line 366, in _init_ud_maps
    self.dn_map[i].append(j)
    ~~~~~~~~~~~^^^
KeyError: 22
...
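
The last frame fails on `self.dn_map[i].append(j)`, which means `dn_map` had no entry for index `i` (22 here) when the append ran. The sketch below reproduces that general failure pattern with a plain dict. It is not the code from reading_order_rb.py: `build_down_map`, `known_ids`, and `edges` are hypothetical names, and whether the library hits the error for this reason is not established by the traceback alone.

```python
def build_down_map(known_ids, edges):
    """Build {id: [successor ids]} for known_ids, then fill it from (i, j) pairs."""
    dn_map = {i: [] for i in known_ids}
    for i, j in edges:
        # Raises KeyError(i) when an edge refers to an id that was never registered.
        dn_map[i].append(j)
    return dn_map


if __name__ == "__main__":
    try:
        build_down_map(known_ids=[0, 1, 2], edges=[(0, 1), (22, 2)])
    except KeyError as exc:
        print(f"KeyError: {exc}")
```

In this pattern the reported key is whichever unregistered index is reached first, so different inputs produce different numbers in the KeyError.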

Labels

bug: Something isn't working
