zipfile.BadZipFile: File is not a zip file #39

@Intellouis

When I run "python local_nlp_evaluation.py" from https://gitlab.aicrowd.com/aicrowd/challenges/iglu-challenge-2022/iglu-2022-rl-mhb-baseline, the script is supposed to download a dataset through IGLU gridworld. This happens in local_nlp_evaluation.py, line 29:
dataset = IGLUDataset(task_kwargs=None, force_download=False, )

That in turn calls the library function in ~/.conda/envs/iglu_mhb/lib/python3.9/site-packages/gridworld/data/iglu_dataset.py, line 181:
download(url=url, destination=path, data_prefix=data_path, description='downloading multiturn dataset')

which is defined in ~/.conda/envs/iglu_mhb/lib/python3.9/site-packages/gridworld/data/load.py, line 9:

def download(url, destination, data_prefix, description='downloading dataset into'):
    os.makedirs(data_prefix, exist_ok=True)
    r = requests.get(url, stream=True)
    CHUNK_SIZE = 1048576
    total_length = int(r.headers.get('content-length'))
    with open(destination, "wb") as f:
        with tqdm(desc=description, 
                  total=(total_length // CHUNK_SIZE) + 1) as pbar:
            for chunk in r.iter_content(chunk_size=CHUNK_SIZE): 
                if chunk: # filter out keep-alive new chunks
                    f.write(chunk)
                    pbar.update(1)
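
Two details of the function above are worth noting: int(r.headers.get('content-length')) raises a TypeError when the server omits that header, and the progress-bar total adds one extra chunk even when the size divides evenly. The sketch below shows one way to compute that total defensively. The helper name progress_total is hypothetical and not part of gridworld; it only assumes a mapping with a lowercase 'content-length' key, as used in the original code.

```python
from typing import Mapping, Optional


def progress_total(headers: Mapping[str, str],
                   chunk_size: int = 1048576) -> Optional[int]:
    """Return the number of chunks to expect, or None if the size is unknown.

    Hypothetical helper for illustration; not part of the gridworld library.
    """
    raw = headers.get("content-length")
    if raw is None:
        # The header can be absent (for example with chunked transfer
        # encoding); int(None) would raise TypeError.
        return None
    total_length = int(raw)
    # Ceiling division, so an exactly divisible size gets no extra chunk.
    return max(1, -(-total_length // chunk_size))


# The value reported in this issue: a 248-byte response is a single chunk.
print(progress_total({"content-length": "248"}))
```

A None result can be passed straight to tqdm(total=None), which then shows a counter without a percentage.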

I checked the values inside download(...) and found CHUNK_SIZE=1048576 and total_length=248, so (total_length // CHUNK_SIZE) + 1 = 1. The downloaded zip file is 248 bytes on disk, which matches total_length.
Next, ~/.conda/envs/iglu_mhb/lib/python3.9/site-packages/gridworld/data/iglu_dataset.py, line 186, runs:
with ZipFile(path) as zfile:
This raised the following error:

zipfile.BadZipFile: File is not a zip file

This indicates that the downloaded zip file is incomplete, and it confirms my suspicion that total_length (248 bytes) is unexpectedly small, far below CHUNK_SIZE. How can I solve this problem?
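
A file this small may be an error response from the server (for example an HTML or XML message) saved under a .zip name rather than a truncated archive. One way to check is to look at the file directly. The following is a minimal diagnostic sketch using only the standard library; inspect_download is a hypothetical helper name, and the path passed to it is whatever destination the download wrote to.

```python
import zipfile
from pathlib import Path


def inspect_download(path, preview_bytes=120):
    """Summarize a downloaded file: size, zip validity, and its first bytes.

    Hypothetical helper for illustration; not part of the gridworld library.
    """
    path = Path(path)
    with path.open("rb") as f:
        head = f.read(preview_bytes)
    return {
        "size": path.stat().st_size,
        # is_zipfile checks for the zip end-of-central-directory record.
        "is_zip": zipfile.is_zipfile(str(path)),
        # Zip archives normally begin with b"PK"; an error page will not.
        "starts_with_pk": head.startswith(b"PK"),
        # Decoding with errors="replace" keeps binary junk from raising.
        "preview": head.decode("utf-8", errors="replace"),
    }
```

If "is_zip" is False and the preview is readable text, the text itself usually says why the server refused or redirected the request, which points at the download URL rather than at the chunking logic.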

The full traceback is below, for reference:

(iglu_mhb) lab@lab-Precision-Tower-7910:~/Desktop/IGLU/codes/iglu-2022-rl-mhb-baseline$ python local_nlp_evaluation.py 

Invalid data stream
Loading parsed dataset failed. Downloading full dataset.
downloading multiturn dataset: 100%|███████████████| 1/1 [00:00<00:00, 799.98it/s]
Traceback (most recent call last):
  File "/home/lab/Desktop/IGLU/codes/iglu-2022-rl-mhb-baseline/local_nlp_evaluation.py", line 62, in <module>
    main()
  File "/home/lab/Desktop/IGLU/codes/iglu-2022-rl-mhb-baseline/local_nlp_evaluation.py", line 29, in main
    dataset = IGLUDataset(task_kwargs=None, force_download=False, )
  File "/home/lab/.conda/envs/iglu_mhb/lib/python3.9/site-packages/gridworld/data/iglu_dataset.py", line 139, in __init__
    self.download_dataset(data_path, force_download)
  File "/home/lab/.conda/envs/iglu_mhb/lib/python3.9/site-packages/gridworld/data/iglu_dataset.py", line 188, in download_dataset
    with ZipFile(path) as zfile:
  File "/home/lab/.conda/envs/iglu_mhb/lib/python3.9/zipfile.py", line 1266, in __init__
    self._RealGetContents()
  File "/home/lab/.conda/envs/iglu_mhb/lib/python3.9/zipfile.py", line 1333, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
