Pull requests: datajuicer/data-juicer
feat: add DocumentLineDeduplicator for cross-document line-level dedup
#961 opened Apr 4, 2026 by amanyara
fix(service): propagate text_keys to ops via get_init_configs in _set…
#960 opened Apr 3, 2026 by cmgzn
[Env] Reduce the size of default deps to speed up the installation.
dj:ci/cd (issues/PRs about CI/CD of Data-Juicer)
dj:efficiency (regarding efficiency issues and enhancements)
environment (related to third-party dependencies, DJ-pypi, DJ-docker, etc.)
#959 opened Apr 1, 2026 by HYLcool
1 task done
feat(agent): interaction quality ops & recipe, bad-case HTML report, and robust JSONL / HF meta loading
agent (related to agent)
dj:op (issues/PRs about some specific OPs)
dj:post-tuning (issues/PRs about post-tuning scenarios)
enhancement (New feature or request)
#957 opened Mar 25, 2026 by yxdyc
feat(semantic_ops): MVP extract + condition filter; join/agg/top-k planned
#948 opened Mar 19, 2026 by yxdyc
In PyArrow 20.0.0+, when using open_json to read data in batches, an …
#942 opened Mar 17, 2026 by HunterLine
fix(tests): Specify top_level_dir to resolve tools module ambiguity in test discovery
#940 opened Mar 16, 2026 by cmgzn
feat: add human-centric video understanding operators for HumanVBench
#938 opened Mar 13, 2026 by SYSUzhouting
[WIP] feat: Integrate ElasticJuicer Core Modules
#934 opened Mar 11, 2026 by fengrui-z
1 of 4 tasks
[WIP] arXiv/PDF to Markdown mappers + dj-op one-shot runner
dj:op (issues/PRs about some specific OPs)
#917 opened Feb 14, 2026 by yxdyc
[WIP] Multi-branch executor
dj:core (issues/PRs about the core functions of Data-Juicer)
enhancement (New feature or request)
#916 opened Feb 13, 2026 by yxdyc
[WIP] feat: Add combined_logical_filter operator with AND/OR support
dj:op (issues/PRs about some specific OPs)
#914 opened Feb 13, 2026 by yxdyc
[WIP] Feat: Add video_calibration_mapper and video_split_by_frame_mapper
#902 opened Feb 1, 2026 by 1van2ha0
3 tasks
[WIP] feat: Pr 839 s3 download checkpoint resume and unittest for s3 download
#870 opened Dec 25, 2025 by Dludora
Depth seg new op
dj:op (issues/PRs about some specific OPs)
#862 opened Dec 22, 2025 by archernsy
[NewOp] Add generate_challenging_qa_mapper based on MindGYM principles
#703 opened Jun 14, 2025 by Bat-Reality
Optimization framework
dj:core (issues/PRs about the core functions of Data-Juicer)
dj:efficiency (regarding efficiency issues and enhancements)
#702 opened Jun 13, 2025 by cyruszhang (Draft)
[NewOp] Add domain_diversity_selector based on DaaR principles
#699 opened Jun 12, 2025 by lingzhq