Google Open-Sources a Diffusion Model That Speeds Up Text Generation 4x

#ARTICLE HackerNews 2026.06.11

Worth-reading score 75.0 · NO. 010 · 2026.06.11

Published 2026/06/10 · Score 200 · Comments 44

Why it's worth reading

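Google has released DiffusionGemma, an experimental open model whose inference runs up to 4x faster on dedicated GPUs. This is directly useful for local AI applications that need low-latency interaction (such as real-time chat and code completion), and the engineering path for diffusion language models is starting to come into focus.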


Editor's take

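Using diffusion models for text generation is not a new direction, but earlier attempts such as CAR and Mercury all got stuck balancing the discrete nature of text against the number of inference steps. Google's willingness to release this one and headline a 4x speedup suggests it has made an engineering breakthrough in compressing the number of sampling steps or in parallel decoding.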

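The key thing to watch is the Gemma family's niche: it is positioned as a lightweight open model line. If a diffusion architecture can beat same-size autoregressive models in the 2B-4B parameter range, the logic of on-device and edge deployment will be rewritten. Teams building local AI tools should keep a close eye on its latency-quality tradeoff curve, which matters more than the headline 4x figure.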

Community feedback

Opinions divided · 44 comments

Core debate: the local speed advantage of diffusion models vs. their disadvantage in cloud serving at scale, plus the quality gap

minimaxir

A few days ago I was just thinking that Google never talked about their diffusion text generation model after demoing it at I/O a year ago. The rumor is that it was too expensive to run, but with the provided chart using the same 1x H100 hardware and comparing DiffusionGemma to regular Gemma, t

ac29

> I'm curious what the downside for this speed is here "DiffusionGemma's speedup is designed for local and low-concurrency inference. In high-QPS cloud serving, autoregressive models can be deployed to saturate compute efficiently, so DiffusionGemma's parallel decoding offers diminishing returns and

GaggiX

Well with a standard autoregressive model you can generate for example 256 tokens at once if you have 256 users, with this approach you can generate 256 tokens for a single user but you need several forward steps. So the diffusion process takes more GFLOPs, if you have enough users you can already b
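The throughput argument in the comments above can be made concrete with a toy cost model. This is a minimal sketch under loudly stated assumptions, not DiffusionGemma's actual decoding scheme: a forward pass costs at least one unit (a memory-bound floor for reading the weights) and becomes compute-bound once it processes more than a hypothetical `capacity` of token positions; an autoregressive (AR) model needs one pass per token but only processes one new position per user; a diffusion model needs only `steps` passes but re-processes the whole block for every user on each step (no per-token KV-cache reuse assumed). The function names and the values of `TOKENS`, `STEPS`, and `CAPACITY` are hypothetical.

```python
# Toy latency model contrasting autoregressive (AR) decoding with
# diffusion-style parallel decoding. All constants are hypothetical.


def pass_time(positions: int, capacity: int) -> float:
    """Cost of one forward pass in arbitrary units.

    A pass costs at least 1 unit (memory-bound floor: the weights are read
    regardless of batch size). Beyond `capacity` token positions it becomes
    compute-bound and its cost grows linearly with the work.
    """
    return max(1.0, positions / capacity)


def ar_time(users: int, tokens: int, capacity: int) -> float:
    # One sequential pass per generated token; each pass handles one new
    # position per user (prefix cached, attention cost ignored).
    return tokens * pass_time(users, capacity)


def diffusion_time(users: int, tokens: int, steps: int, capacity: int) -> float:
    # `steps` denoising passes; each pass re-processes the whole block of
    # `tokens` positions for every user (assumption: no KV-cache reuse).
    return steps * pass_time(users * tokens, capacity)


if __name__ == "__main__":
    TOKENS, STEPS, CAPACITY = 256, 32, 256  # hypothetical values
    for users in (1, 256):
        ar = ar_time(users, TOKENS, CAPACITY)
        df = diffusion_time(users, TOKENS, STEPS, CAPACITY)
        print(f"users={users:3d}  AR={ar:7.1f}  diffusion={df:7.1f}  "
              f"speedup={ar / df:.3f}x")
```

With these made-up numbers, a single user sees the diffusion decoder finish 8x sooner (32 passes instead of 256, both under the memory-bound floor), while at 256 concurrent users the AR batch already saturates compute and the diffusion decoder's 32x extra work makes it about 32x slower. That mirrors the "local and low-concurrency" framing quoted above; real crossover points depend on hardware, step count, and caching tricks this sketch ignores.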

Alternatives: Gemma · Mercury · Fable · GPT 5.5 · Gemini 3.5 · OpenCode
