This repository contains a collection of resources and papers on diffusion language models.
Diffusion language models
Dieleman, Sander
[Website]
Gemini Diffusion
Google
[Website]
Diffusion Models for Non-autoregressive Text Generation: A Survey
[https://arxiv.org/abs/2303.06574]
A Survey of Diffusion Models in Natural Language Processing
[https://arxiv.org/abs/2305.14671]
Discrete Diffusion in Large Language and Multimodal Models: A Survey
[https://arxiv.org/pdf/2506.13759]
Structured Denoising Diffusion Models in Discrete State-Spaces
D3PM
[https://arxiv.org/abs/2107.03006]
Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution
SEDD
[https://arxiv.org/abs/2310.16834]
Simple and Effective Masked Diffusion Language Models
MDLM, NeurIPS 2024
[https://openreview.net/forum?id=L4uaAR4ArM]
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
ICLR 2025
[https://arxiv.org/abs/2503.09573]
Simplified and Generalized Masked Diffusion for Discrete Data
NeurIPS 2024, DeepMind
[https://github.com/google-deepmind/md4]
Energy-Based Diffusion Language Models for Text Generation
ICLR 2025, Stefano Ermon
[https://arxiv.org/abs/2410.21357]
LaViDa: A Large Diffusion Language Model for Multimodal Understanding
[https://arxiv.org/abs/2505.16839]
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding
[https://arxiv.org/pdf/2505.16990]
Diffusion Language Models Are Versatile Protein Learners
[arxiv]
NeurIPS 2025 papers: Most of these focus on discrete diffusion or diffusion language models, with a few covering other areas.
-
Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion
-
Theoretical Benefit and Limitation of Diffusion Language Model
-
STEAD: Robust Provably Secure Linguistic Steganography with Diffusion Language Model
-
StateSpaceDiffuser: Bringing Long-Context Content to Diffusion World Models
-
State Size Independent Statistical Error Bound for Discrete Diffusion Models
-
Remasking Discrete Diffusion Models with Inference-Time Scaling
-
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models
-
On Efficiency-Effectiveness Trade-off of Diffusion-based Recommenders
-
Non-Markovian Discrete Diffusion with Causal Language Models
-
Next Semantic Scale Prediction via Hierarchical Diffusion Language Models
-
MRO: Enhancing Reasoning in Diffusion Language Models via Multi-Reward Optimization
-
MMaDA: Unraveling The Design Space for Multimodal Large Diffusion Language Models
-
Learnable Sampler Distillation for Discrete Diffusion Models
-
LaViDa: A Large Diffusion Model for Vision-Language Understanding
-
Language Modeling by Language Models
-
Large Language Diffusion Models
-
KLASS: KL-Adaptive Stability Sampling for Fast Inference in Masked Diffusion Models
-
Informed Correctors for Discrete Diffusion Models
-
Heterogeneous Diffusion Structure Inference for Network Cascade
-
GeoAda: Efficiently Finetune Geometric Diffusion Models with Equivariant Adapters
-
Generative Pre-trained Autoregressive Diffusion Transformer
-
Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods
-
Fast Solvers for Discrete Diffusion Models: Theory and Applications of High-Order Algorithms
-
Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning
-
Fading to Grow: Growing Preference Ratios via Preference Fading Discrete Diffusion for Recommendation
-
Encoder-Decoder Block Diffusion Language Models for Efficient Training and Inference
-
Don’t Let It Fade: Preserving Edits in Diffusion Language Models via Token Timestep Allocation
-
Absorb and Converge: Provable Convergence Guarantee for Absorbing Discrete Diffusion Models
-
Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking
-
Accelerating Diffusion LLMs via Adaptive Parallel Decoding
-
Ambient Diffusion Omni: Training Good Models with Bad Data
-
Ambient Proteins - Training Diffusion Models on Noisy Structures
-
Anchored Diffusion Language Model
-
Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking
-
Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents
-
Consistent Sampling and Simulation: Molecular Dynamics with Energy-Based Diffusion Models
-
Constrained Discrete Diffusion
-
Continuous Diffusion Model for Language Modeling
-
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
-
Deep Compositional Phase Diffusion for Long Motion Sequence Generation
-
Diffusion Beats AR in Data-Constrained Settings
-
DINGO: Constrained Inference for Diffusion LLMs
-
Discrete Diffusion Models: Novel Analysis and New Sampler Guarantees
-
Discrete Spatial Diffusion: Intensity-Preserving Diffusion Modeling
-
dKV-Cache: The Cache for Diffusion Language Models
-
Hierarchical Koopman Diffusion: Fast Generation with Interpretable Diffusion Trajectory
-
Neural Hamiltonian Diffusions for Modeling Structured Geometric Dynamics
-
NeuralPLexer3: Accurate Biomolecular Complex Structure Prediction with Flow Models
-
Unifying Text Semantics and Graph Structures for Temporal Text-attributed Graphs with Large Language Models
-
Towards a Cascaded LLM Framework for Cost-effective Human-AI Decision-Making
-
C3PO: Optimized Large Language Model Cascades with Probabilistic Cost Constraints for Reasoning
-
LAW 2025: Bridging Language, Agent, and World Models for Reasoning and Planning
-
WALL-E: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents
-
World Models Should Prioritize the Unification of Physical and Social Dynamics
-
SimWorld: An Open-ended Simulator for Agents in Physical and Social Worlds
-
Social World Model-Augmented Mechanism Design Policy Learning
-
PhysDiff: A Physically-Guided Diffusion Model for Multivariate Time Series Anomaly Detection
-
SegMASt3R: Leveraging Geometric Foundation Models for Wide-Baseline Segment Matching
-
Towards Multiscale Graph-based Protein Learning with Geometric Secondary Structural Motifs