SemEval-2026 Task 13: Detecting Machine-Generated Code with Multiple Programming Languages, Generators, and Application Scenarios
The rise of generative models has made it increasingly difficult to distinguish machine-generated code from human-written code — especially across different programming languages, domains, and generation techniques.
SemEval-2026 Task 13 challenges participants to build systems that can detect machine-generated code under diverse conditions by evaluating generalization to unseen languages, generator families, and code application scenarios.
The task consists of three subtasks:
Subtask A: Binary Machine-Generated Code Detection
Goal:
Given a code snippet, predict whether it is:
- (i) Fully human-written, or
- (ii) Fully machine-generated
Training Languages: C++, Python, Java
Training Domain: Algorithmic (e.g., Leetcode-style problems)
Evaluation Settings:
| Setting | Language | Domain |
|---|---|---|
| (i) Seen Languages & Seen Domains | C++, Python, Java | Algorithmic |
| (ii) Unseen Languages & Seen Domains | Go, PHP, C#, C, JS | Algorithmic |
| (iii) Seen Languages & Unseen Domains | C++, Python, Java | Research, Production |
| (iv) Unseen Languages & Unseen Domains | Go, PHP, C#, C, JS | Research, Production |
Dataset Size:
- Train - 500K samples (238K Human-Written | 262K Machine-Generated)
- Validation - 100K samples
Target Metric: Macro F1-score (the leaderboard will be built on it), but you are free to use whatever metric works best for your approach during training.
Subtask B: Author (LLM Family) Identification
Goal:
Given a code snippet, predict its author:
- (i) Human
- (ii–xi) One of 10 LLM families:
DeepSeek-AI, Qwen, 01-ai, BigCode, Gemma, Phi, Meta-LLaMA, IBM-Granite, Mistral, OpenAI
Evaluation Settings:
- Seen authors: test-time generators appeared in the training data
- Unseen authors: test-time generators are new but come from known model families
Dataset Size:
- Train - 500K samples (442K Human | 4K DeepSeek-AI | 8K Qwen | 3K 01-ai | 2K BigCode | 2K Gemma | 5K Phi | 8K Meta-LLaMA | 8K IBM-Granite | 4K Mistral | 10K OpenAI)
- Validation - 100K samples
Target Metric: Macro F1-score (the leaderboard will be built on it), but you are free to use whatever metric works best for your approach during training.
Subtask C: Hybrid and Adversarial Machine-Generated Code Detection
Goal:
Classify each code snippet as one of:
- Human-written
- Machine-generated
- Hybrid — partially written or completed by LLM
- Adversarial — generated via adversarial prompts or RLHF to mimic humans
Dataset Size:
- Train - 900K samples (485K Human-written | 210K Machine-generated | 85K Hybrid | 118K Adversarial)
- Validation - 200K samples
Target Metric: Macro F1-score (the leaderboard will be built on it), but you are free to use whatever metric works best for your approach during training.
All data will be released via:
- Kaggle
- HuggingFace Datasets
- This GitHub repo, as `.parquet` files
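As a quick sanity check, the released splits can be read directly with pandas. This is a minimal sketch; the file name below is hypothetical and should be replaced with the actual path from the repo, Kaggle, or HuggingFace release.

```python
# Minimal sketch: load one released .parquet split with pandas.
# "task_a_train.parquet" is a hypothetical file name, not the official one.
import pandas as pd

train_df = pd.read_parquet("task_a_train.parquet")  # requires pyarrow or fastparquet
print(train_df.columns.tolist())                    # inspect the available columns
print(train_df["label"].value_counts())             # rough class balance
```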
For each subtask:
- The dataset contains `code`, `label` (the label ID), and additional metadata such as the programming language (`language`) and the `generator`.
- Label mappings (`label_to_id.json` and `id_to_label.json`) are provided in each task folder.
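For illustration, a small sketch of how the label mappings might be used; only the file names come from the description above, and the example mapping values in the comment are assumptions.

```python
# Minimal sketch: use id_to_label.json to make predicted IDs human-readable.
# The mapping values shown in the comment are illustrative, not the official ones.
import json

with open("id_to_label.json") as f:
    id_to_label = json.load(f)      # JSON keys are strings, e.g. {"0": "human", "1": "machine"}

predicted_ids = [0, 1, 1, 0]        # placeholder model outputs
readable = [id_to_label[str(i)] for i in predicted_ids]
print(readable)
```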
- The use of additional training data is not allowed. Participants must use only the official training sets provided for each subtask.
- It is also not permitted to use models that have been pre-trained specifically for AI-generated code detection by third parties.
- However, participants are allowed to use general-purpose or code-oriented pre-trained models (e.g., CodeBERT, StarCoder); see the fine-tuning sketch below.
Please adhere strictly to these rules to ensure a fair comparison across submissions. If you have any doubts, contact the task organizers.
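For example, fine-tuning an allowed general-purpose code model as a sequence classifier is in scope. The sketch below assumes the training split has been loaded with the `code` and `label` columns described above; the file name, checkpoint choice, and hyperparameters are illustrative, not a recommended configuration.

```python
# Minimal sketch: fine-tune an allowed general-purpose code model (CodeBERT)
# as a sequence classifier. File name and hyperparameters are illustrative.
import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

train_df = pd.read_parquet("task_a_train.parquet")          # hypothetical file name
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=int(train_df["label"].nunique()))

def tokenize(batch):
    # Truncate long snippets to the model's 512-token context window.
    return tokenizer(batch["code"], truncation=True, max_length=512)

train_ds = Dataset.from_pandas(train_df[["code", "label"]]).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="codebert-detector",
                           per_device_train_batch_size=16,
                           num_train_epochs=1),
    train_dataset=train_ds,
    tokenizer=tokenizer,    # enables dynamic padding via DataCollatorWithPadding
)
trainer.train()
```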
- Submit a `.csv` file with two columns (see the sketch after this list):
  - `id`: unique identifier of the code snippet
  - `label`: the label ID (not the string label)
- Sample submission files are available in each task's folder
- A single scorer script (`scorer.py`) is used for all subtasks
- Evaluation measure: macro F1 for all subtasks
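Below is a minimal sketch of producing a submission file and sanity-checking macro F1 locally with scikit-learn. It assumes the validation split exposes the `id` column used in the submission format and that `predictions` holds label IDs in the same row order; the provided `scorer.py` remains the reference implementation.

```python
# Minimal sketch: write a submission .csv and sanity-check macro F1 locally.
# File name is hypothetical; replace the placeholder predictions with model outputs.
import pandas as pd
from sklearn.metrics import f1_score

val_df = pd.read_parquet("task_a_val.parquet")      # hypothetical file name
predictions = [0] * len(val_df)                     # placeholder: replace with model outputs

submission = pd.DataFrame({"id": val_df["id"], "label": predictions})
submission.to_csv("submission.csv", index=False)

# Local sanity check only; the official evaluation uses the provided scorer.py.
print("macro F1:", f1_score(val_df["label"], predictions, average="macro"))
```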
The Kaggle competitions for the SemEval task are now live! You can submit your system outputs using the following links:
At the moment, only the public test set is available. The leaderboard shown now is for convenience only - it reflects results on the public test set. We will release the private test set on Jan. 10, which will be used for the final evaluation and ranking.
Please make sure to resubmit your final predictions once the private test set is released, as only those submissions will be considered for the official evaluation. We will inform all participants when the private test data becomes available.
Q1: What’s the participation process and how do I register?
We will release our Kaggle website soon for participant registration. You can register anytime before the evaluation phase begins. Once registered, simply prepare your detection results and submit them before the evaluation deadline.
Q2: There are three tasks. Do I have to participate in all of them, or can I choose?
You are free to participate in one, two, or all three tasks—it’s completely up to you.
Q3: What are the important dates for the project?
We aim to align all key dates with the official SemEval committee schedule and will announce them accordingly.
Q4: What methods and technologies can I use?
Be creative! The only restrictions are that we do not allow the use of AI-generated content detectors trained by third parties, and that training data is limited to the provided official training sets. Other than that, feel free to use any methods and technologies you prefer.
Q5: How do I start?
You can use the Starter Files to get some initial direction. You may experiment with backbone models, training strategies, etc. Also feel free to ask questions and share your ideas on the Discussion page of the Kaggle competitions.
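As one lightweight starting point (not the official starter files), a character n-gram TF-IDF baseline with logistic regression can give a first reference score; the file names and hyperparameters below are illustrative.

```python
# Minimal baseline sketch: character n-gram TF-IDF + logistic regression.
# File names and all hyperparameters are illustrative.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

train_df = pd.read_parquet("task_a_train.parquet")  # hypothetical file name
val_df = pd.read_parquet("task_a_val.parquet")      # hypothetical file name

baseline = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4), max_features=200_000),
    LogisticRegression(max_iter=1000),
)
baseline.fit(train_df["code"], train_df["label"])
print("macro F1:", f1_score(val_df["label"], baseline.predict(val_df["code"]), average="macro"))
```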
- Sample data ready: 15 July 2025
- Training data ready: 1 September 2025
- Evaluation data ready: 1 December 2025 (we already released the training and validation datasets)
- Evaluation data ready and evaluation start: 10 January 2026 (we will share private test data at this time)
- Evaluation end: 24 January 2026
- Paper submission due: February 2026
- Notification to authors: March 2026
- Camera-ready due: April 2026
- SemEval workshop: Summer 2026 (co-located with a major NLP conference)
Our task is based on enriched data from our previous works. Please consider citing them when using data from this task.
Droid: A Resource Suite for AI-Generated Code Detection
@misc{orel2025textttdroidresourcesuiteaigenerated,
title={$\texttt{Droid}$: A Resource Suite for AI-Generated Code Detection},
author={Daniil Orel and Indraneil Paul and Iryna Gurevych and Preslav Nakov},
year={2025},
eprint={2507.10583},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2507.10583},
}
CoDet-M4: Detecting Machine-Generated Code in Multi-Lingual, Multi-Generator and Multi-Domain Settings
@inproceedings{orel-etal-2025-codet,
title = "{C}o{D}et-M4: Detecting Machine-Generated Code in Multi-Lingual, Multi-Generator and Multi-Domain Settings",
author = "Orel, Daniil and
Azizov, Dilshod and
Nakov, Preslav",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-acl.550/",
pages = "10570--10593",
ISBN = "979-8-89176-256-5",
}