⚡️ Z-Image-Turbo
An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Welcome to the official repository for the Z-Image (造相) project!

✨ Z-Image

Z-Image is a powerful and highly efficient image generation model with 6B parameters. Currently there are three variants:

🚀 Z-Image-Turbo – A distilled version of Z-Image that matches or exceeds leading competitors with only 8 NFEs (Number of Function Evaluations). It offers ⚡️sub-second inference latency⚡️ on enterprise-grade H800 GPUs and fits comfortably on consumer devices with 16 GB of VRAM. It excels at photorealistic image generation, bilingual text rendering (English & Chinese), and robust instruction adherence.
🧱 Z-Image-Base – The non-distilled foundation model. By releasing this checkpoint, we aim to unlock the full potential for community-driven fine-tuning and custom development.
✍️ Z-Image-Edit – A variant fine-tuned on Z-Image specifically for image editing tasks. It supports creative image-to-image generation with impressive instruction-following capabilities, allowing for precise edits based on natural language prompts.
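
To make the "8 NFEs" figure concrete: each function evaluation is one forward pass of the model during sampling. The sketch below is a generic fixed-step Euler sampler over a toy velocity field, not the Z-Image-Turbo sampler itself; `euler_sample` and `toy_velocity` are hypothetical names used only for illustration.

```python
from typing import Callable, Tuple


def euler_sample(
    velocity: Callable[[float, float], float],
    x0: float,
    num_steps: int = 8,
) -> Tuple[float, int]:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler.

    Returns the final state and the number of function evaluations (NFEs).
    """
    x = x0
    nfe = 0
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        # One call to `velocity` stands in for one forward pass of the model.
        x = x + dt * velocity(x, t)
        nfe += 1
    return x, nfe


def toy_velocity(x: float, t: float) -> float:
    # Hypothetical stand-in for a learned network: simple decay toward zero.
    return -x


x_final, nfe = euler_sample(toy_velocity, x0=1.0, num_steps=8)
print(f"final state: {x_final:.4f}, function evaluations: {nfe}")
```

With a first-order sampler like this one, the step count and the NFE count are the same, so 8 steps cost 8 model calls.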

🌟 Features

⚡️ Ultra-Fast Generation: Only 8 inference steps needed (sub-second on enterprise GPUs)
📸 Photorealistic Quality: Strong photorealistic image generation with excellent aesthetic quality
📖 Bilingual Text Rendering: Excels at rendering complex Chinese and English text
🎨 Advanced Architecture: Scalable Single-Stream Diffusion Transformer (S3-DiT) with Decoupled-DMD
🚀 Optimized Performance: Includes xFormers and Flash Attention support

💾 Installation

Download the latest build from the Releases page.
Extract the archive into any folder you prefer.
On Windows: run Zimage.exe to finalize setup.

🖼️ Showcase

📸 Photorealistic Quality: Z-Image-Turbo delivers strong photorealistic image generation while maintaining excellent aesthetic quality.

📖 Accurate Bilingual Text Rendering: Z-Image-Turbo excels at accurately rendering complex Chinese and English text.

💡 Prompt Enhancing & Reasoning: The Prompt Enhancer gives the model reasoning capabilities, letting it go beyond surface-level descriptions and draw on underlying world knowledge.

🧠 Creative Image Editing: Z-Image-Edit shows a strong understanding of bilingual editing instructions, enabling imaginative and flexible image transformations.

🏗️ Model Architecture

We adopt a Scalable Single-Stream DiT (S3-DiT) architecture. In this setup, text, visual semantic tokens, and image VAE tokens are concatenated at the sequence level to serve as a unified input stream, maximizing parameter efficiency compared to dual-stream approaches.
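
The sequence-level concatenation described above can be sketched in a few lines. This is a minimal illustration and not the actual S3-DiT code: it assumes every token has already been projected to one shared hidden width, and the function name `build_single_stream` and the per-token segment ids are hypothetical additions for clarity.

```python
from typing import List, Tuple

Token = List[float]  # one token embedding of fixed hidden width


def build_single_stream(
    text_tokens: List[Token],
    semantic_tokens: List[Token],
    vae_tokens: List[Token],
) -> Tuple[List[Token], List[int]]:
    """Concatenate the three token groups along the sequence axis.

    Returns the unified sequence plus a segment id per token
    (0 = text, 1 = visual semantic, 2 = image VAE).
    """
    groups = [text_tokens, semantic_tokens, vae_tokens]
    widths = {len(tok) for group in groups for tok in group}
    if len(widths) > 1:
        raise ValueError("all tokens must share one hidden width")

    stream: List[Token] = []
    segment_ids: List[int] = []
    for seg_id, group in enumerate(groups):
        stream.extend(group)
        segment_ids.extend([seg_id] * len(group))
    return stream, segment_ids


HIDDEN = 4  # toy width, chosen only for the example
text = [[0.1] * HIDDEN for _ in range(5)]
semantic = [[0.2] * HIDDEN for _ in range(3)]
vae = [[0.3] * HIDDEN for _ in range(12)]

stream, segment_ids = build_single_stream(text, semantic, vae)
print(len(stream), "tokens in one stream")
```

Because all three modalities share one sequence, a single stack of transformer blocks processes them together, which is the source of the parameter-efficiency claim relative to dual-stream designs that keep separate weights per modality.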

📈 Performance

According to the Elo-based Human Preference Evaluation (on Alibaba AI Arena), Z-Image-Turbo shows highly competitive performance against other leading models, while achieving state-of-the-art results among open-source models.
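
For readers unfamiliar with Elo-style rankings, the sketch below shows the standard Elo update applied to one pairwise human preference vote. The exact rating procedure used by the arena is not described here, so the K-factor of 32, the 400-point scale, and the function names are assumptions taken from the textbook formulation.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under standard Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(
    rating_a: float,
    rating_b: float,
    score_a: float,
    k: float = 32.0,  # assumed K-factor; the arena's real value is not stated
) -> Tuple[float, float]:
    """Update both ratings after one comparison.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Two models start level; a rater prefers model A's image.
new_a, new_b = update_ratings(1000.0, 1000.0, score_a=1.0)
print(new_a, new_b)
```

The update is zero-sum: the points the preferred model gains are exactly the points the other model loses, so a leaderboard built this way reflects relative rather than absolute quality.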
