Tool pool: data serialization, dataset tools7, keyword title, data volume 0 BUG #126

@zanguixuan3

Description

2026-01-04 03:21:49 | INFO | data_engine.utils.logger_utils:144 - Create logger ID 3 with loglevel: INFO, export to /data/dataflow/元数据序列化tools7title_48b47295-15dd-4a88-a58a-9681ed70ed43/output/log/tool_serialize_meta_preprocess_internal_time_20260104032149.txt
2026-01-04 03:21:49 | INFO | data_engine.core.executor_tools:52 - Preparing tool...
2026-01-04 03:21:49 | INFO | data_engine.tools.base_tool:44 - Setting up data ingester...
2026-01-04 03:21:49 | INFO | data_engine.ingester.csghub_ingester:30 - Using dataset_path: /data/dataflow/元数据序列化tools7title_48b47295-15dd-4a88-a58a-9681ed70ed43/input, repo:longrui/tools7, branch:main
2026-01-04 03:21:49 | INFO | data_engine.tools.base_tool:55 - Preparing exporter...
2026-01-04 03:21:49 | INFO | data_engine.core.executor_tools:59 - Launching tool...
2026-01-04 03:21:49 | INFO | data_engine.ingester.csghub_ingester:41 - model_id:longrui/tools7
2026-01-04 03:21:49 | INFO | data_engine.ingester.csghub_ingester:43 - endpoint:http://modelhub.cmr-co.com
2026-01-04 03:21:49 | INFO | data_engine.ingester.csghub_ingester:44 - 入参:repo_id:longrui/tools7, repo_type:dataset, revision:main, cache_dir:/data/dataflow/元数据序列化tools7title_48b47295-15dd-4a88-a58a-9681ed70ed43/input, endpoint:http://modelhub.cmr-co.com, token:b2bc8452d426461d8e4aac51b82fdebc

Downloading .gitattributes: 0%| | 0.00/2.34k [00:00<?, ?B/s]
Downloading .gitattributes: 100%|##########| 2.34k/2.34k [00:00<00:00, 2.50MB/s]

Downloading README.md: 0%| | 0.00/25.0 [00:00<?, ?B/s]
Downloading README.md: 100%|##########| 25.0/25.0 [00:00<00:00, 35.5kB/s]

Downloading data.jsonl: 0%| | 0.00/162 [00:00<?, ?B/s]
Downloading data.jsonl: 100%|##########| 162/162 [00:00<00:00, 221kB/s]
2026-01-04 03:21:50 | INFO | data_engine.ingester.csghub_ingester:54 - result: /data/dataflow/元数据序列化tools7title_48b47295-15dd-4a88-a58a-9681ed70ed43/input, _src_path: /data/dataflow/元数据序列化tools7title_48b47295-15dd-4a88-a58a-9681ed70ed43/input
2026-01-04 03:21:50 | INFO | data_engine.tools.base_tool:95 - Data ingested from /data/dataflow/元数据序列化tools7title_48b47295-15dd-4a88-a58a-9681ed70ed43/input
_accelerator 5555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555
2026-01-04 03:21:50 | DEBUG | data_engine.tools.base_tool:137 - Op [serialize_meta_preprocess_internal] running with number of procs:3
2026-01-04 03:21:50 | INFO | data_engine.tools.base_tool:109 - Processing tool...
/data/dataflow/元数据序列化tools7title_48b47295-15dd-4a88-a58a-9681ed70ed43/input/data.jsonl
_accelerator -5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5-5
2026-01-04 03:21:50 | INFO | data_engine.tools.base_tool:114 - Tool are done in 0.156s.
2026-01-04 03:21:50 | INFO | data_engine.tools.base_tool:121 - Exporting dataset to somewhere...
2026-01-04 03:21:50 | INFO | data_engine.exporter.csghub_exporter:97 - Start to upload /data/dataflow/元数据序列化tools7title_48b47295-15dd-4a88-a58a-9681ed70ed43/output/_df_dataset.jsonl/_data to repo: longrui/tools7 with branch: main
2026-01-04 03:21:50 | INFO | data_engine.exporter.csghub_exporter:200 - repo longrui/tools7 all branches: ['main', 'refs-convert-parquet', 'v1']
2026-01-04 03:21:50 | INFO | data_engine.exporter.csghub_exporter:153 - Start to push /data/dataflow/元数据序列化tools7title_48b47295-15dd-4a88-a58a-9681ed70ed43/output/_df_dataset.jsonl/_data to repo: longrui/tools7 with branch: v2,user_name: longrui, token: b2bc8452d426461d8e4aac51b82fdebc
2026-01-04 03:21:51 | INFO | data_engine.exporter.csghub_exporter:166 - Done push /data/dataflow/元数据序列化tools7title_48b47295-15dd-4a88-a58a-9681ed70ed43/output/_df_dataset.jsonl/_data to repo: longrui/tools7 with branch: v2
2026-01-04 03:21:51 | INFO | data_engine.exporter.csghub_exporter:169 - Remove /data/dataflow/元数据序列化tools7title_48b47295-15dd-4a88-a58a-9681ed70ed43/output/_git
2026-01-04 03:21:51 | INFO | data_engine.exporter.csghub_exporter:172 - Remove /data/dataflow/元数据序列化tools7title_48b47295-15dd-4a88-a58a-9681ed70ed43/output/_df_dataset.jsonl/_data
2026-01-04 03:21:51 | WARNING | data_server.job.JobExecutor:127 - Job 120 still in PROCESSING state in finally block, marking as FAILED

In fact, this task created the file successfully, even though the job was marked as FAILED.
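The WARNING at the end of the log suggests the job was still in the PROCESSING state when a `finally` block ran, so a fallback marked it FAILED although the push had completed. The following is a minimal sketch of one way to avoid that, not the actual `data_server.job.JobExecutor` code: `JobStatus`, `Job`, and `run_job` are hypothetical names, and the real root cause may be different.

```python
from enum import Enum


class JobStatus(Enum):
    # Hypothetical states; only PROCESSING and FAILED appear in the log.
    PROCESSING = "PROCESSING"
    FINISHED = "FINISHED"
    FAILED = "FAILED"


class Job:
    # Hypothetical stand-in for the real job record.
    def __init__(self, job_id):
        self.job_id = job_id
        self.status = JobStatus.PROCESSING


def run_job(job, tool):
    """Run `tool` and leave `job` in a terminal state."""
    try:
        tool()
        # Record success inside the try block, so the fallback in
        # the finally block no longer sees PROCESSING.
        job.status = JobStatus.FINISHED
    except Exception:
        job.status = JobStatus.FAILED
        raise
    finally:
        # Fallback only: still PROCESSING here means neither branch
        # above set a terminal state (for example, the tool was
        # interrupted by a BaseException such as KeyboardInterrupt).
        if job.status is JobStatus.PROCESSING:
            job.status = JobStatus.FAILED
    return job.status
```

With this shape, a tool that returns normally always ends as FINISHED, and the `finally` fallback applies only to runs that never reached a terminal state.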

Metadata

Assignees

No one assigned

Labels

P0, bug (Something isn't working)
