"togethercomputer/RedPajama-Data-1T-Sample" dataset does not exist. #33

@y-vectorfield

I tried to use qtip with an LLM, but the following error occurred. The "togethercomputer/RedPajama-Data-1T-Sample" dataset no longer seems to exist.

Loading checkpoint shards: 100%|██████████| 16/16 [00:03<00:00,  5.16it/s]
I1127 04:29:45.708108 3912 quantize_finetune_llama.py:134] loaded model
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/root/qtip/quantize_llama/quantize_finetune_llama.py", line 214, in <module>
    main(args)
  File "/root/qtip/quantize_llama/quantize_finetune_llama.py", line 136, in main
    devset = utils.sample_rp1t(tokenizer, args.devset_size, args.ctx_size,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/qtip/lib/utils/data_utils.py", line 197, in sample_rp1t
    dataset = load_dataset('togethercomputer/RedPajama-Data-1T-Sample',
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/py/.venv/lib/python3.11/site-packages/datasets/load.py", line 2594, in load_dataset
    builder_instance = load_dataset_builder(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/py/.venv/lib/python3.11/site-packages/datasets/load.py", line 2266, in load_dataset_builder
    dataset_module = dataset_module_factory(
                     ^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/py/.venv/lib/python3.11/site-packages/datasets/load.py", line 1908, in dataset_module_factory
    raise e1 from None
  File "/root/.local/share/py/.venv/lib/python3.11/site-packages/datasets/load.py", line 1858, in dataset_module_factory
    raise DatasetNotFoundError(f"Dataset '{path}' doesn't exist on the Hub or cannot be accessed.") from e
datasets.exceptions.DatasetNotFoundError: Dataset 'togethercomputer/RedPajama-Data-1T-Sample' doesn't exist on the Hub or cannot be accessed.

The screenshot below shows the list of togethercomputer datasets on the Hugging Face Hub. The dataset no longer appears in it.

[Image: screenshot of the togethercomputer dataset list]

https://huggingface.co/togethercomputer/datasets
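
A possible stopgap, as a minimal sketch rather than a fix from the qtip authors: try a list of dataset IDs in order and use the first one that loads. The helper name `load_first_available` and the `loader` parameter are hypothetical and not part of qtip or `datasets`. The sketch also assumes that `DatasetNotFoundError` is a subclass of `FileNotFoundError`, which I believe is the case in recent `datasets` releases but have not checked for every version.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def load_first_available(
    candidates: Iterable[str],
    loader: Optional[Callable[..., Any]] = None,
    **loader_kwargs: Any,
) -> Tuple[str, Any]:
    """Try each dataset ID in order; return (id, dataset) for the first that loads.

    Hypothetical helper for illustration only.
    """
    if loader is None:
        # Imported lazily so the helper can be exercised with a stub loader
        # even when the `datasets` package is not installed.
        from datasets import load_dataset

        loader = load_dataset

    errors = []
    for repo_id in candidates:
        try:
            return repo_id, loader(repo_id, **loader_kwargs)
        except FileNotFoundError as exc:
            # Assumption: DatasetNotFoundError derives from FileNotFoundError.
            errors.append(f"{repo_id}: {type(exc).__name__}: {exc}")

    raise RuntimeError(
        "None of the candidate datasets could be loaded:\n" + "\n".join(errors)
    )
```

In `sample_rp1t`, the call to `load_dataset` could then be replaced by something like `load_first_available(['togethercomputer/RedPajama-Data-1T-Sample', '<your-mirror-id>'], ...)`, passing the same keyword arguments as the existing call. `<your-mirror-id>` is a placeholder: I do not know of an official replacement for the dataset.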
