# Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges
A benchmark and data synthesis framework for creative code generation across combinatorial and exploratory settings.
## News

- [2026.03.13] CreativeBench paper is available on arXiv.
- [2026.03.13] CreativeBench dataset is available on Hugging Face.
- [2026.03.13] The project homepage is now live.
CreativeBench is an open-source benchmark and data synthesis framework for creative code generation, featuring two complementary pipelines:
- Combo (reverse-engineering): Combines solutions from different domains to synthesize new problems and tests.
- Explore (self-play): Evolves problems through progressive constraints to elicit novel solutions.
This repository provides the pipelines, templates, and artifacts needed to reproduce the dataset generation process.
- News
- Introduction
- Project Structure
- Data Resources
- Combo Pipeline (Reverse-Engineering)
- Explore Pipeline (Self-Play)
- Evaluation
- Todo
- License
## Introduction

CreativeBench targets creative code generation: the ability to produce correct, novel solutions under new constraints or from cross-domain recombination. We provide:
- Combo: cross-domain code recombination + sandbox feedback, yielding novel tasks with verified tests.
- Explore: progressive constraint self-play, encouraging diverse solution strategies beyond the baseline.
The framework is designed for reproducibility and extensibility, and can be adapted to other languages or models.
## Project Structure

```
.
├── CreativeGen/
│   ├── combo/           # reverse-engineering pipeline
│   └── explore/         # self-play pipeline
├── datasets-subset/     # sampled datasets only
├── evaluation/          # evaluation utilities
└── inference/           # inference utilities
```
## Data Resources

We provide sampled datasets in datasets-subset/.
Field definitions (each JSONL line):
- question: problem statement
- canonical_solution: reference solution
- demo_test_func: public tests
- full_test_func: comprehensive tests
- language: programming language
- difficulty: difficulty label
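Records in this schema can be loaded and sanity-checked with a few lines of Python; the helper below is an illustrative sketch, not part of the repository:

```python
import json

# Fields expected on every CreativeBench JSONL record.
REQUIRED_FIELDS = (
    "question", "canonical_solution", "demo_test_func",
    "full_test_func", "language", "difficulty",
)

def load_records(path):
    """Load a JSONL file and verify each record carries the expected fields."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            record = json.loads(line)
            missing = [k for k in REQUIRED_FIELDS if k not in record]
            if missing:
                raise ValueError(f"line {line_no}: missing fields {missing}")
            records.append(record)
    return records
```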
## Combo Pipeline (Reverse-Engineering)

Pipeline steps:

1. Select domain pairs and build combo prompts
2. Generate combined solutions
3. Validate in sandbox
4. Fix failed solutions using feedback
5. Generate tests and questions
6. Format final dataset
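The generate, validate, and fix steps above form a repair loop driven by sandbox feedback. A minimal, model-agnostic sketch (the `generate`, `validate`, and `fix` callables are hypothetical stand-ins for the pipeline's LLM and sandbox calls, not the repository's actual API):

```python
def synthesize(prompt, generate, validate, fix, max_fix_attempts=3):
    """Generate a combined solution, then repair it using sandbox feedback.

    generate(prompt) -> solution string
    validate(solution) -> (ok: bool, feedback: str)
    fix(solution, feedback) -> revised solution string
    """
    solution = generate(prompt)
    ok, feedback = validate(solution)
    for _ in range(max_fix_attempts):
        if ok:
            break
        solution = fix(solution, feedback)   # repair with sandbox feedback
        ok, feedback = validate(solution)    # re-check in the sandbox
    return solution if ok else None          # give up after max_fix_attempts
```

This mirrors the `<max_fix_attempts>` argument of the pipeline script: each failed validation buys one more repair attempt before the candidate is discarded.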
Run the pipeline with:

```bash
bash CreativeGen/combo/run_combo_pipeline.sh \
    <num_combos> <max_fix_attempts> <input_jsonl>
```

Example:

```bash
bash CreativeGen/combo/run_combo_pipeline.sh 5 3 /path/to/input.jsonl
```

A run folder is created under:

```
CreativeGen/combo/runs/run_YYYYMMDD_HHMMSS/
```
Key artifacts:
- combo_final_success.jsonl
- test_func.jsonl
- combo_final_dataset.jsonl
- combo_final_formatted.jsonl
## Explore Pipeline (Self-Play)

Pipeline steps:

1. Filter source dataset to Python-only (or target language)
2. Identify key techniques in baseline solutions
3. Add progressive constraints
4. Generate constrained solutions
5. Verify compliance and run sandbox validation
6. Compute creativity scores
7. Convert results to inference-ready flat dataset
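The progressive-constraint steps above can be sketched as a loop that tightens the problem one constraint at a time, keeping only constraints whose solutions both comply and still pass the tests. The `solve`, `complies`, and `run_tests` callables below are hypothetical stand-ins for the pipeline's model and sandbox calls, not the repository's actual API:

```python
def evolve(problem, constraints, solve, complies, run_tests):
    """Tighten the problem one constraint at a time.

    solve(problem, constraint_list) -> solution string
    complies(solution, constraint_list) -> bool  (constraint compliance check)
    run_tests(solution) -> bool                  (sandbox validation)
    """
    baseline = solve(problem, [])        # unconstrained reference solution
    active, rounds = [], []
    for c in constraints:
        candidate = solve(problem, active + [c])
        # Keep the constraint only if the solution obeys it and still passes.
        if complies(candidate, active + [c]) and run_tests(candidate):
            active.append(c)
            rounds.append({"constraint": c, "solution": candidate})
    return {"baseline": baseline, "rounds": rounds}
```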
Run the pipeline with:

```bash
bash CreativeGen/explore/run_explore_pipeline.sh \
    /path/to/autocodebench.jsonl
```

A run folder is created under:

```
CreativeGen/explore/runs/run_YYYYMMDD_HHMMSS/
```

Key artifacts:
- creativity_evolution_results.json
- creativity_analysis.png
- CreativeGen/explore/data/converted/*_infer_*.jsonl
## Evaluation

If you have the sandbox server running, you can validate solutions with:
```bash
python3 CreativeGen/combo/src/call_sandbox.py \
    --input_file path/to/data.jsonl \
    --output path/to/output.jsonl \
    --solution_key canonical_solution
```

Sandbox usage details will be documented here.
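Without the sandbox service, a crude Python-only check can still catch obviously broken solutions. The helper below is an illustrative stand-in, not part of the repository, and it runs code without any isolation, so never use it on untrusted generated code:

```python
import subprocess
import sys

def quick_check(solution_code, test_code, timeout=10):
    """Run a Python solution plus its test code in a subprocess.

    A rough, unsandboxed stand-in for the sandbox service:
    Python-only, no isolation, no resource limits.
    Returns (passed, stderr).
    """
    program = solution_code + "\n\n" + test_code
    proc = subprocess.run(
        [sys.executable, "-c", program],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.returncode == 0, proc.stderr
```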
The sandbox implementation is being cleaned up, and MultiLanguageSandbox/ is not included in the current public release yet.
## Todo

- Upload and document MultiLanguageSandbox/ for code execution and verification.
- Release setup instructions for the sandbox service used by the combo and explore pipelines.
- Add end-to-end verification examples for benchmark generation and inference evaluation.
- Expand sandbox support and documentation for additional programming languages.
## License

This project is released under the MIT License. See LICENSE for details.
## Citation

If you use CreativeBench in your work, please cite:
```bibtex
@misc{wang2026creativebenchbenchmarkingenhancingmachine,
      title={CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges},
      author={Zi-Han Wang and Lam Nguyen and Zhengyang Zhao and Mengyue Yang and Chengwei Qin and Yujiu Yang and Linyi Yang},
      year={2026},
      eprint={2603.11863},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2603.11863},
}
```