
CreativeBench

Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges

A benchmark and data synthesis framework for creative code generation across combinatorial and exploratory settings.

Homepage | Paper | Dataset

News

  • [2026.03.13] CreativeBench paper is available on arXiv.
  • [2026.03.13] CreativeBench dataset is available on Hugging Face.
  • [2026.03.13] The project homepage is now live.

CreativeBench is an open-source benchmark and data synthesis framework for creative code generation, featuring two complementary pipelines:

  • Combo (reverse-engineering): Combines solutions from different domains to synthesize new problems and tests.
  • Explore (self-play): Evolves problems through progressive constraints to elicit novel solutions.

This repository provides the pipelines, templates, and artifacts needed to reproduce the dataset generation process.


Introduction

CreativeBench targets creative code generation: the ability to produce correct, novel solutions under new constraints or from cross-domain recombination. We provide:

  • Combo: cross-domain code recombination + sandbox feedback, yielding novel tasks with verified tests.
  • Explore: progressive constraint self-play, encouraging diverse solution strategies beyond the baseline.

The framework is designed for reproducibility and extensibility, and can be adapted to other languages or models.


Project Structure

.
├── CreativeGen/
│   ├── combo/                 # reverse-engineering pipeline
│   └── explore/               # self-play pipeline
├── datasets-subset/           # sampled datasets only
├── evaluation/                # evaluation utilities
└── inference/                 # inference utilities

Data Resources

We provide sampled datasets in datasets-subset/; the full dataset is hosted on Hugging Face (see News above).

Field definitions (each JSONL line):

  • question: problem statement
  • canonical_solution: reference solution
  • demo_test_func: public tests
  • full_test_func: comprehensive tests
  • language: programming language
  • difficulty: difficulty label
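Each JSONL line is a standalone JSON object, so the files can be loaded with a few lines of Python. The sketch below is illustrative; the exact filenames under datasets-subset/ are placeholders:

```python
import json

def load_jsonl(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

if __name__ == "__main__":
    import glob
    # Print the language/difficulty of the first record in each file.
    for path in glob.glob("datasets-subset/*.jsonl"):
        for record in load_jsonl(path):
            print(record["language"], record["difficulty"])
            break
```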

Combo Pipeline (Reverse-Engineering)

Overview

  1. Select domain pairs and build combo prompts
  2. Generate combined solutions
  3. Validate in sandbox
  4. Fix failed solutions using feedback
  5. Generate tests and questions
  6. Format final dataset
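In outline, the steps above form a generate-validate-fix loop. The sketch below is purely illustrative: the helpers passed in (generate, sandbox, fix) are dummy stand-ins, not the repository's actual API, and steps 5-6 are elided:

```python
def combo_pipeline(domain_pairs, max_fix_attempts, generate, sandbox, fix):
    """Toy sketch of the combo loop: generate, validate, repair, keep survivors."""
    dataset = []
    for a, b in domain_pairs:
        solution = generate(f"Combine {a} with {b}")      # steps 1-2
        ok = sandbox(solution)                            # step 3
        attempts = 0
        while not ok and attempts < max_fix_attempts:     # step 4
            solution = fix(solution)
            ok = sandbox(solution)
            attempts += 1
        if ok:
            dataset.append({"solution": solution})        # steps 5-6 elided
    return dataset
```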

Run

bash CreativeGen/combo/run_combo_pipeline.sh \
  <num_combos> <max_fix_attempts> <input_jsonl>

Example:

bash CreativeGen/combo/run_combo_pipeline.sh 5 3 /path/to/input.jsonl

Outputs

A run folder is created under:

CreativeGen/combo/runs/run_YYYYMMDD_HHMMSS/

Key artifacts:

  • combo_final_success.jsonl
  • test_func.jsonl
  • combo_final_dataset.jsonl
  • combo_final_formatted.jsonl
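A quick way to inspect the most recent run is to count the records in each artifact. This sketch assumes the run layout shown above:

```python
import glob
import json
import os

ARTIFACTS = [
    "combo_final_success.jsonl",
    "test_func.jsonl",
    "combo_final_dataset.jsonl",
    "combo_final_formatted.jsonl",
]

def latest_run(base="CreativeGen/combo/runs"):
    """Return the lexicographically latest run_* folder, or None."""
    runs = sorted(glob.glob(os.path.join(base, "run_*")))
    return runs[-1] if runs else None

if __name__ == "__main__":
    run_dir = latest_run()
    if run_dir:
        for name in ARTIFACTS:
            path = os.path.join(run_dir, name)
            if os.path.exists(path):
                with open(path, encoding="utf-8") as f:
                    n = sum(1 for line in f if line.strip())
                print(f"{name}: {n} records")
```

Sorting works here because the run_YYYYMMDD_HHMMSS naming makes lexicographic and chronological order coincide.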

Explore Pipeline (Self-Play)

Overview

  1. Filter source dataset to Python-only (or target language)
  2. Identify key techniques in baseline solutions
  3. Add progressive constraints
  4. Generate constrained solutions
  5. Verify compliance and run sandbox validation
  6. Compute creativity scores
  7. Convert results to inference-ready flat dataset
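Step 1 amounts to a simple filter over the JSONL records using the language field from the dataset schema above:

```python
def filter_language(records, language="python"):
    """Keep only records whose `language` field matches (case-insensitive)."""
    return [r for r in records if r.get("language", "").lower() == language]

records = [
    {"language": "python", "question": "..."},
    {"language": "cpp", "question": "..."},
]
print(len(filter_language(records)))  # → 1
```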

Run

bash CreativeGen/explore/run_explore_pipeline.sh \
  /path/to/autocodebench.jsonl

Outputs

CreativeGen/explore/runs/run_YYYYMMDD_HHMMSS/
  creativity_evolution_results.json
  creativity_analysis.png
CreativeGen/explore/data/converted/*_infer_*.jsonl

Evaluation

If you have the sandbox server running, you can validate solutions with:

python3 CreativeGen/combo/src/call_sandbox.py \
  --input_file path/to/data.jsonl \
  --output path/to/output.jsonl \
  --solution_key canonical_solution

Sandbox usage details will be documented here.
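Assuming each output line from call_sandbox.py carries a boolean pass field (an assumption — adjust to whatever schema the script actually emits), an overall pass rate can be computed like this:

```python
import json

def pass_rate(path):
    """Fraction of JSONL records whose `pass` field is truthy (assumed field name)."""
    total = passed = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            rec = json.loads(line)
            total += 1
            passed += bool(rec.get("pass"))
    return passed / total if total else 0.0
```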


Todo

The sandbox implementation is being cleaned up, and MultiLanguageSandbox/ is not yet included in the current public release.

  • Upload and document MultiLanguageSandbox/ for code execution and verification.
  • Release setup instructions for the sandbox service used by the combo and explore pipelines.
  • Add end-to-end verification examples for benchmark generation and inference evaluation.
  • Expand sandbox support and documentation for additional programming languages.

License

This project is released under the MIT License. See LICENSE for details.


Citation

If you use CreativeBench in your work, please cite:

@misc{wang2026creativebenchbenchmarkingenhancingmachine,
  title={CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges},
  author={Zi-Han Wang and Lam Nguyen and Zhengyang Zhao and Mengyue Yang and Chengwei Qin and Yujiu Yang and Linyi Yang},
  year={2026},
  eprint={2603.11863},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2603.11863},
}
