Hi there.
I tried running create_memmaps.sh with the DocVQA data and the T5-large model, following the instructions.
It seems to succeed with microsoft_cv (although there are warnings about PNG files, as in #6), but it fails with tesseract, giving the error in the title.
I found that the tesseract entries in documents_content.jsonl (at least for DocVQA) use 'tokens_layer' instead of 'common_format', so falling back to 'tokens_layer' when 'common_format' is not found should fix the error for now. The relevant lookup is here:
https://github.com/due-benchmark/baselines/blob/master/benchmarker/data/reader/benchmark_dataset.py#L54-L57
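A minimal sketch of the suggested fallback, assuming each line of documents_content.jsonl is a JSON object with a "contents" list whose entries carry a "tool_name" plus either "common_format" or "tokens_layer". The helpers `get_layout` and `find_content` are hypothetical illustrations, not the actual code in benchmark_dataset.py:

```python
from typing import Any, Dict


def get_layout(content: Dict[str, Any]) -> Dict[str, Any]:
    """Return the token layout of one OCR entry (hypothetical helper).

    Prefers 'common_format' and falls back to 'tokens_layer', which the
    tesseract entries in DocVQA appear to use instead.
    """
    for key in ("common_format", "tokens_layer"):
        if key in content:
            return content[key]
    raise KeyError(
        f"neither 'common_format' nor 'tokens_layer' found for tool "
        f"{content.get('tool_name')!r}"
    )


def find_content(doc: Dict[str, Any], ocr_engine: str) -> Dict[str, Any]:
    """Pick the entry produced by the requested OCR engine (hypothetical)."""
    for content in doc["contents"]:
        if content.get("tool_name") == ocr_engine:
            return content
    raise ValueError(f"no contents produced by {ocr_engine!r}")


if __name__ == "__main__":
    # Toy document mimicking the assumed structure; not real DocVQA data.
    doc = {
        "contents": [
            {"tool_name": "microsoft_cv", "common_format": {"tokens": ["a"]}},
            {"tool_name": "tesseract", "tokens_layer": {"tokens": ["b"]}},
        ]
    }
    for engine in ("microsoft_cv", "tesseract"):
        print(engine, get_layout(find_content(doc, engine)))
```

Checking for 'common_format' first keeps the existing behaviour for microsoft_cv unchanged and only changes what happens when that key is missing.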
jshtok