OpenSakura

🌸 OpenSakura 🌸

English | 中文介绍 (Chinese introduction)

🌸 About Us

We build datasets, train models, run evaluations, and publish benchmarks for literary and niche-domain translation: light novels, visual novels, galgames, and web fiction, the domains where generic machine translation collapses into polite nonsense.

At OpenSakura, we believe in open science and reproducible results. Everything we do is done in public, with receipts.

🎯 What We Do

📚 Datasets — Curated, schema-validated, with documented lineage and licensing. High-quality parallel corpora designed specifically for our target domains.
🔬 Experiments — Fully reproducible training runs with pinned revisions, deterministic setups, and meticulously logged metrics.
📊 Benchmarks — Comprehensive LLM-as-a-judge pairwise evaluations, establishing Elo-style rankings for translation models.
⚔️ Arena — Blind A/B human evaluation platform to gather high-fidelity preference data for RLHF and DPO.
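As a rough illustration of how pairwise verdicts can become an Elo-style ranking, here is a minimal sketch. It is not OpenSakura's actual benchmark code: the function names, the K-factor of 32, the starting rating of 1000, and the model names are all assumptions chosen for the example.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, model_a, model_b, score_a, k=32.0):
    """Apply one pairwise result in place.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    k (assumed value) controls how far a single comparison moves the ratings.
    """
    exp_a = expected_score(ratings[model_a], ratings[model_b])
    ratings[model_a] += k * (score_a - exp_a)
    ratings[model_b] += k * ((1.0 - score_a) - (1.0 - exp_a))


def rank(comparisons, initial=1000.0, k=32.0):
    """Turn (model_a, model_b, score_a) tuples into a sorted leaderboard."""
    ratings = defaultdict(lambda: initial)  # unseen models start at `initial`
    for model_a, model_b, score_a in comparisons:
        update_elo(ratings, model_a, model_b, score_a, k)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical verdicts: each tuple is one judged pair of translations.
    verdicts = [
        ("model-x", "model-y", 1.0),
        ("model-y", "model-z", 0.5),
        ("model-x", "model-z", 1.0),
    ]
    for name, rating in rank(verdicts):
        print(f"{name}: {rating:.1f}")
```

Sequential Elo updates depend on the order of comparisons, so a production leaderboard may shuffle and average over several passes or fit all ratings jointly instead; the sketch only shows the basic update.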

🚀 Quick Links

Resource	URL
📊 Benchmark Dashboard	bench.opensakura.com
⚔️ Translation Arena	arena.opensakura.com
🤗 Hugging Face	huggingface.co/OpenSakura

📂 Key Repositories

Repository	Description
📖 Users-Please-Come-And-See-This	User-facing guide: Find our datasets, models, benchmarks, and FAQ here.
🛠️ Contributors-You-Dare-Not-Watch-This	Contribution rules: Naming conventions, PR flow, and guidelines.
🗄️ OpenSakura-DS-260130-LN-SFT-Template	Dataset repo template: Schema, validation, and tooling.
🧪 OpenSakura-EXP-260213-LN-SFT-Template	Experiment repo template: Run format, metrics, and env capture.
🏟️ OpenSakura-Arena	Arena platform source code: Built with Next.js, FastAPI, and PostgreSQL.

🤝 Get Involved

We welcome contributions of all kinds! Whether you want to contribute datasets, tooling, evaluation scripts, documentation, or just file issues when something breaks, we need your help!

👉 Start here: Contribution Guidelines

If you use OpenSakura artifacts and they help you, tell a friend. If they don't, tell us what failed (with examples).

🌱 Origins

OpenSakura grew out of the SakuraLLM community — the pioneering open-source project for Japanese-to-Chinese ACGN translation models. Members of that community came together to push the mission further: broader language pairs, rigorous reproducibility, public benchmarks, and a more open development process. We stand on their shoulders and are deeply grateful for the foundation they built.

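🌸 Chinese Introduction (中文介绍) 🌸

OpenSakura is an open-source community project focused on domain-specific large language model (LLM) translation. We build high-quality datasets, train dedicated models, run rigorous evaluations, and publish benchmarks, covering light novels, visual novels, galgames, web fiction, and other vertical domains where general-purpose machine translation readily goes off the rails.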

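🎯 What We Do

📚 Datasets: Carefully curated and strictly validated, with complete provenance tracking and clear open-source licensing.
🔬 Experiments: Fully reproducible training pipelines, with pinned dataset and model versions and complete logs of training metrics.
📊 Benchmarks: Pairwise evaluation with an LLM acting as the judge (LLM-as-a-judge), used to build an Elo ranking system.
⚔️ Arena: Blind A/B human evaluation that supplies high-quality human preference data for subsequent RLHF and DPO training.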

🚀 Quick Links

Resource	Link
📊 Benchmark Dashboard	bench.opensakura.com
⚔️ Translation Arena	arena.opensakura.com
🤗 Hugging Face	huggingface.co/OpenSakura

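📂 Core Repositories

Repository	Description
📖 Users-Please-Come-And-See-This	User guide: Covers datasets, models, benchmark notes, and frequently asked questions (FAQ).
🛠️ Contributors-You-Dare-Not-Watch-This	Contribution rules: Naming conventions, PR submission flow, and community code of conduct.
🗄️ OpenSakura-DS-260130-LN-SFT-Template	Dataset repo template: Schema structure, data validation, and tooling.
🧪 OpenSakura-EXP-260213-LN-SFT-Template	Experiment repo template: Standardized run records, metric monitoring, and environment snapshots.
🏟️ OpenSakura-Arena	Arena platform source code: Built with Next.js, FastAPI, and PostgreSQL.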

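🤝 Get Involved

We warmly welcome open-source contributions of every kind! Whether you provide datasets, develop tooling, write evaluation scripts, improve documentation, or simply file an issue to report a bug, we need your help!

👉 Start here: Contribution Guidelines

If OpenSakura's output helps you, please recommend it to your friends! If you run into a bad translation, send us feedback with concrete examples so we can keep improving.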

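🌱 Community Origins

OpenSakura grew out of the SakuraLLM community, the open-source pioneer in Japanese-to-Chinese ACGN translation. Community members regrouped with the aim of taking that work further: supporting more language pairs, applying stricter reproducibility standards, building open and transparent benchmark evaluations, and adopting a more open development process.

We stand on the shoulders of those who came before us, and we offer our deepest respect and thanks for the solid foundation SakuraLLM laid.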
