Some questions about the 'Pre-processing' part #4

@ImmortalCi

Hello, thanks for your interesting work!

I'm trying to reproduce the COCO pre-training, and I noticed that I need to preprocess the dataset first.
This is mentioned in ./COCO-DR/COCO/README.md:
[image]

But when I follow the instructions in it, something goes wrong in pre_processing_coco.sh.
It calls COCO-DR/COCO/helper/create_train_co_short.py, which contains a function called encode_one().

On lines 35 and 36, item is a dict, but it has no 'group' or 'spans' key. This raises KeyError: 'group'.

[image]

The log is as follows:
[image]

I noticed that there are only four keys in each line of the dataset: 'id', 'title', 'text', 'metadata'.
Did I miss some steps before preprocessing?
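
A quick way to confirm the mismatch is a check like the one below. This is only a minimal sketch, assuming the dataset is in JSON Lines format (one JSON object per line). The helper name `count_missing_keys` and the sample records are hypothetical; only the key names come from the error and the dataset described above.

```python
import json
from collections import Counter

# Keys that encode_one() tries to read, according to the error above.
REQUIRED_KEYS = ("group", "spans")


def count_missing_keys(lines, required=REQUIRED_KEYS):
    """Count, for each required key, how many JSON records lack it."""
    missing = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for key in required:
            if key not in record:
                missing[key] += 1
    return missing


if __name__ == "__main__":
    # Hypothetical records with the four keys I see in the dataset.
    sample = [
        '{"id": "0", "title": "t0", "text": "x", "metadata": {}}',
        '{"id": "1", "title": "t1", "text": "y", "metadata": {}}',
    ]
    print(count_missing_keys(sample))
```

To run it on the real data, an open file handle can be passed instead of `sample`, since iterating over a file yields its lines. If every record is reported as missing both keys, the file is probably the raw corpus and not the input that encode_one() expects.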
I'm eagerly looking forward to your reply!!! Thanks a lot!

Best regards!
