Multi-Agent MT is a pipeline-based AI agent framework that progressively refines translations via Translate → Post-edit → Proofread.
- Dec 27, 2025 — Version 1.0 officially released
- Nov 8, 2025 — Our work was accepted and presented at the Conference on Machine Translation (WMT) 2025
- Support for both single-task and multi-task execution modes
- Integration of Rubric-MQM as an automatic post-editing (APE) component
- Fully asynchronous OpenAI API integration
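The asynchronous batching pattern behind the API integration can be illustrated with a minimal sketch. This is an illustrative pattern only, not the project's actual engine code; `call_model`, the echo response, and the concurrency limit are assumptions for the example.

```python
import asyncio

async def call_model(prompt: str, sem: asyncio.Semaphore) -> str:
    # Placeholder for an async OpenAI API call; here we just echo the prompt.
    async with sem:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return f"translation of: {prompt}"

async def run_batch(prompts: list[str], concurrency: int = 8) -> list[str]:
    # Bound concurrency with a semaphore and dispatch all requests at once.
    sem = asyncio.Semaphore(concurrency)
    return await asyncio.gather(*(call_model(p, sem) for p in prompts))

results = asyncio.run(run_batch(["Hello", "World"]))
print(results)
```

A real engine would swap the placeholder for an `AsyncOpenAI` client call; the semaphore keeps the number of in-flight requests bounded regardless of batch size.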
```bash
export OPENAI_API_KEY=sk-xxxx
# or
export OPENAI_API_KEYS=sk-key1,sk-key2
```

This project uses Rubric-MQM (version 2.0) as an automatic post-editing (APE) component. Clone the v2.0 release under the name `rubric_mqm` and add it to `PYTHONPATH`:
```bash
git clone --branch v2.0 https://github.com/trotacodigos/Rubric-MQM.git rubric_mqm
export PYTHONPATH=$PYTHONPATH:/your/full/path/to/rubric_mqm
```

You can run the system in either single-task mode (one agent) or multi-task mode (the full pipeline: translate → postedit → proofread). Model selection and decoding parameters are fully configurable via YAML.
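To confirm the `rubric_mqm` clone is actually visible to Python, a quick stdlib-only check (this helper is not part of the framework):

```python
import importlib.util

def module_available(name: str) -> bool:
    """Return True if `name` can be imported from the current sys.path."""
    return importlib.util.find_spec(name) is not None

print(module_available("rubric_mqm"))  # True once the clone is on PYTHONPATH
```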
🤖 Single-tasker Example (config file ↗)

```yaml
task: proofread
model:
  name: gpt-5
  temperature: 0.7
  max_tokens: 1024
```

🤖🤖🤖 Multi-tasker Example (config file ↗)
```yaml
model:
  translate:
    name: gpt-4.1
    temperature: 0.7
    max_tokens: 1024
  postedit:
    name: gpt-4o
    temperature: 0.7
    max_tokens: 1024
  proofread:
    name: gpt-5
    temperature: 0.7
    max_tokens: 1024
```

- The `target` field is ignored by the Translate agent but required by the Post-edit and Proofread agents, where it serves as the initial hypothesis.
- If you already have a translation, you can skip the Translate agent by setting `skip_translate_if_provided: true` in your multi-task configuration.
  - Skips the translate step and proceeds directly to postedit → proofread
  - Only available in multi-task mode
- During multi-task execution, agents iteratively update the `target` field.
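The skip behavior described above can be sketched as follows. This is an illustrative sketch only: the agent functions are stand-ins, and only the `skip_translate_if_provided` flag and the `target` field come from the documentation above.

```python
def run_pipeline(row: dict, config: dict) -> dict:
    """Run translate → postedit → proofread, updating `target` at each step."""
    # Stand-in agents: each reads the row and returns an updated `target`.
    def translate(r): return {**r, "target": f"T({r['src_text']})"}
    def postedit(r):  return {**r, "target": f"P({r['target']})"}
    def proofread(r): return {**r, "target": f"F({r['target']})"}

    # Skip translation only if the flag is set AND a hypothesis was provided.
    if not (config.get("skip_translate_if_provided") and row.get("target")):
        row = translate(row)
    row = postedit(row)
    return proofread(row)

# With a provided hypothesis and the flag set, translation is skipped:
out = run_pipeline({"src_text": "你好", "target": "hyp"},
                   {"skip_translate_if_provided": True})
print(out["target"])  # F(P(hyp))
```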
Input data must be provided as a CSV file. The required columns for all modes are:
- src_lang
- tgt_lang
- src_text
Example CSV format
| src_lang | tgt_lang | src_text | target | ref_text | domain |
| ... | ... | ... | ... | ... | ... |
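Input files can be checked for the required columns with a short stdlib-only validation (a sketch; the framework may perform this check differently):

```python
import csv
import io

REQUIRED = {"src_lang", "tgt_lang", "src_text"}

def missing_columns(csv_text: str) -> set[str]:
    """Return the set of required columns absent from a CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return REQUIRED - set(reader.fieldnames or [])

sample = "src_lang,tgt_lang,src_text,target\nzh,en,你好,\n"
print(missing_columns(sample))  # set() → all required columns present
```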
🧑🏫 Source: 你永远主动联系不上这个专员,也不知道她的工号,也没有直线联系电话,就是你联系不上她,只有她联系你。
🧑🏫 Reference: Since you don't know the commissioner's job number and there isn't a direct phone number to call, you'll never make the effort to get in touch with her, She is the only one who can reach you, You can't.
🤖 Translate: You never actively contact the commissioner, you never know her job number, you never have a direct telephone line, you never contact her, she only contacts you.
🤖 Postedit: You can never proactively reach this specialist. You don't know her employee ID, nor do you have a direct phone number. It's always that you cannot contact her; only she can contact you.
🤖 Proofread: You can never proactively contact the commissioner, you never know her employee ID, you never have a direct telephone line, you cannot reach her, she only contacts you.
🤖🤖🤖 Multi-agent Translation[1]: You can never proactively reach this commissioner, as you don’t know her employee ID or have a direct phone number; only she contacts you, and you cannot get in touch with her.
[1] This translation differs from those produced by single-agent systems, which generate outputs based on a previously provided translation. In contrast, the multi-agent approach performs the translation process collaboratively from scratch.
```
Multi-AgentMT/
├─ agents/
│  ├─ run.py              # CLI entry point
│  ├─ core/
│  │  ├─ engine.py        # Async batch execution engine
│  │  └─ call_api.py      # Async OpenAI API wrapper
│  ├─ modules/
│  │  ├─ singletasker.py  # Single-task execution
│  │  ├─ multitasker.py   # Multi-task pipeline
│  │  └─ dispatcher/      # Prompt & parameter dispatch
│  ├─ parser/             # Output parsing
│  └─ prompt/             # Prompt templates
│
├─ rubric_mqm/            # (submodule) MQM evaluation toolkit
│  └─ metric/
│
├─ data/
│  └─ sample.csv
│
└─ agents/config/
   ├─ single.yaml
   └─ multi.yaml
```
If you use this framework in your research or projects, please cite it as follows:
```bibtex
@inproceedings{kim-2025-multi,
    title = "Multi-agent{MT}: Deploying {AI} Agent in the {WMT}25 Shared Task",
    author = "Kim, Ahrii",
    editor = "Haddow, Barry and Kocmi, Tom and Koehn, Philipp and Monz, Christof",
    booktitle = "Proceedings of the Tenth Conference on Machine Translation",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.wmt-1.53/",
    doi = "10.18653/v1/2025.wmt-1.53",
    pages = "769--777",
    ISBN = "979-8-89176-341-8",
    abstract = "We present Multi-agentMT, our system for the WMT25 General Shared Task. The model adopts Prompt Chaining, a multi-agent workflow combined with Rubric-MQM, an automatic MQM-based error annotation metric. Our primary submission follows a Translate{--}Postedit{--}Proofread pipeline, in which error positions are explicitly marked and iteratively refined. Results suggest that a semi-autonomous agent scheme for machine translation is feasible with a smaller, earlier-generation model in low-resource settings, achieving comparable quality at roughly half the cost of larger systems."
}
```
```bibtex
@inproceedings{kim-2025-preliminary,
    title = "A Preliminary Study of {AI} Agent Model in Machine Translation",
    author = "Kim, Ahrii",
    editor = "Haddow, Barry and Kocmi, Tom and Koehn, Philipp and Monz, Christof",
    booktitle = "Proceedings of the Tenth Conference on Machine Translation",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.wmt-1.32/",
    doi = "10.18653/v1/2025.wmt-1.32",
    pages = "583--586",
    ISBN = "979-8-89176-341-8",
    abstract = "We present IR{\_}Multi-agentMT, our submission to the WMT25 General Shared Task. The system adopts an AI-agent paradigm implemented through a multi-agent workflow, Prompt Chaining, in combination with RUBRIC-MQM, an automatic MQM-based error annotation metric. Our primary configuration follows the Translate{--}Postedit{--}Proofread paradigm, where each stage progressively enhances translation quality. We conduct a preliminary study to investigate (i) the impact of initial translation quality and (ii) the effect of enforcing explicit responses from the Postedit Agent. Our findings highlight the importance of both factors in shaping the overall performance of multi-agent translation systems."
}
```
