Google open-sources a diffusion model that speeds up text generation by up to 4x
Worth-reading index 75.0 · NO. 010 · 2026.06.11
Published 2026/06/10 · Score 200 · Comments 44
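Why it's worth reading
Google has released DiffusionGemma, an experimental open-source model with up to 4x faster inference on dedicated GPUs. This is directly useful for local AI applications that need low-latency interaction (such as real-time chat and code completion), and an engineering path for diffusion language models is starting to come into focus.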
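Editor's take
Using diffusion models for text generation is not a new direction, but earlier attempts such as CAR and Mercury all got stuck on balancing the discreteness of text against the number of inference steps. That Google is willing to release this model and emphasize the 4x speed suggests an engineering breakthrough in either compressing the number of sampling steps or in parallel decoding.
The key thing to watch is the niche the Gemma family occupies: it is positioned as lightweight and open-source. If a diffusion architecture can beat same-size autoregressive models in the 2B-4B parameter range, the logic of on-device and edge deployment will be rewritten. Teams building local AI tools should watch its latency-quality tradeoff curve closely, which matters more than the headline 4x figure.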
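Community feedback
Opinions divided · 44 comments
Core debate: the local speed advantage of diffusion models vs. their disadvantage at cloud scale and the quality gap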
A few days ago I was just thinking that Google never talked about their diffusion text generation model after demoing it at I/O a year ago. The rumor is that it was too expensive to run, but with the provided chart using the same 1x H100 hardware and comparing DiffusionGemma to regular Gemma, t
> I'm curious what the downside for this speed is here "DiffusionGemma's speedup is designed for local and low-concurrency inference. In high-QPS cloud serving, autoregressive models can be deployed to saturate compute efficiently, so DiffusionGemma's parallel decoding offers diminishing returns and
Well with a standard autoregressive model you can generate for example 256 tokens at once if you have 256 users, with this approach you can generate 256 tokens for a single user but you need several forward steps. So the diffusion process takes more GFLOPs, if you have enough users you can already b
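The arithmetic behind the last comment can be sketched with a toy cost model. This is a rough illustration only, not DiffusionGemma's actual decoding scheme: it assumes an autoregressive model emits one token per forward pass, that a diffusion model refines a whole block of tokens in parallel over a fixed number of denoising steps, and that compute scales with the number of token positions pushed through the model. The step count of 32 is an arbitrary placeholder, and all function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodeCost:
    sequential_passes: int   # forward passes a single user has to wait through
    positions_computed: int  # token positions processed in total (crude compute proxy)


def autoregressive_cost(tokens: int) -> DecodeCost:
    # Assumption: one new token per pass, one new position computed per pass.
    return DecodeCost(sequential_passes=tokens, positions_computed=tokens)


def diffusion_cost(tokens: int, denoise_steps: int) -> DecodeCost:
    # Assumption: every denoising step reprocesses the whole block in parallel.
    return DecodeCost(
        sequential_passes=denoise_steps,
        positions_computed=tokens * denoise_steps,
    )


def compare(tokens: int, denoise_steps: int) -> dict:
    ar = autoregressive_cost(tokens)
    df = diffusion_cost(tokens, denoise_steps)
    return {
        # How many fewer sequential passes one user waits for.
        "latency_speedup": ar.sequential_passes / df.sequential_passes,
        # How many more positions are computed for the same output.
        "compute_overhead": df.positions_computed / ar.positions_computed,
    }


# 256 tokens for one user; 32 denoising steps is a made-up value for illustration.
result = compare(tokens=256, denoise_steps=32)
print(result)
```

Under these assumptions a single user waits through fewer sequential passes, but the model does more total work per output token. On a local GPU serving one user that extra work can use capacity that would otherwise sit idle, whereas a server that already batches many users has little idle capacity to spend, which is consistent with the "diminishing returns" note quoted above.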