Gemma 4 Quantized Models for Phones Released

#ARTICLE HackerNews 2026.06.06

Recommendation score 64.0 · NO. 013 · 2026.06.06

Published 2026/06/05 · Score 145 · Comments 28

Why it's worth reading

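Google has released Quantization-Aware Training (QAT) quantized models for Gemma 4, and the 4B-parameter version can run locally on a phone. For on-device AI developers, this means no more complex post-training quantization tuning for mobile deployment, with smaller accuracy loss out of the box.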

Editor's take

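The pain point of on-device deployment has never been whether a model exists; it is how much accuracy collapses after quantization and how costly the tuning is. By making QAT officially supported, Google has in effect absorbed and standardized the assorted community quantization recipes that grew up around MLX and llama.cpp.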
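
To make the accuracy concern concrete, here is a minimal sketch of the round-to-nearest quantize/dequantize step (often called "fake quantization") applied to one block of weights. Post-training quantization applies this kind of rounding once, after training; QAT in general simulates it during training so the weights can adapt to it. The function names and the simplified symmetric scheme are assumptions for illustration only. This is not Google's recipe, and it is not the exact block layout used by llama.cpp.

```python
def fake_quantize_block(block, bits=4):
    """Quantize one block of weights to signed integers, then dequantize.

    Simplified symmetric scheme (an assumption for illustration): one scale
    per block, chosen so the largest-magnitude weight maps to the top level.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    amax = max(abs(w) for w in block)
    if amax == 0.0:
        return [0.0 for _ in block]
    scale = amax / qmax
    # Round each weight to the nearest level, clamp, and map back to floats.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in block]


def mean_squared_error(original, restored):
    """Average squared difference between original and restored weights."""
    return sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)


if __name__ == "__main__":
    # Hypothetical weights, chosen only to show the rounding error.
    weights = [0.70, -0.35, 0.10, 0.02, -0.61, 0.44, 0.0, -0.08]
    restored = fake_quantize_block(weights)
    print("restored:", [round(w, 3) for w in restored])
    print("mse:", mean_squared_error(weights, restored))
```

Each restored weight lands within half a quantization step of the original. That residual error is what the accuracy comparison between PTQ and QAT is ultimately about.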

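For teams building AI applications, the engineering time saved is not trivial: previously you had to maintain separate FP16/INT8/INT4 branches, whereas now a single Gemma 4 QAT pipeline covers everything from phones to laptops. Teams already running a Llama 3 on-device stack should run Gemma 4 4B QAT against their own benchmarks. Google's QAT does keep perplexity loss noticeably lower than PTQ (post-training quantization), and if latency and memory footprint also come out even, the switching cost is low.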
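
For the memory side of that comparison, a back-of-the-envelope estimate is a useful starting point before running a real benchmark. The sketch below assumes a nominal 4e9 parameters for the "4B" model (the exact count is not given here) and maps INT8/INT4 onto llama.cpp's Q8_0 and Q4_0 block formats, in which 32 weights share one fp16 scale. It ignores the KV cache, activations, and any tensors kept at higher precision, so real files will differ.

```python
# Bits per weight for a few storage formats. The Q8_0 and Q4_0 figures follow
# llama.cpp's block layout: 32 weights plus one 16-bit scale per block.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": (32 * 8 + 16) / 32,  # 8.5 bits per weight
    "Q4_0": (32 * 4 + 16) / 32,  # 4.5 bits per weight
}


def weight_memory_gb(n_params: float, fmt: str) -> float:
    """Approximate weight storage in GB (10^9 bytes); weights only."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


if __name__ == "__main__":
    n_params = 4e9  # assumption: nominal size of the "4B" model
    for fmt in BITS_PER_WEIGHT:
        print(f"{fmt}: {weight_memory_gb(n_params, fmt):.2f} GB")
```

With a nominal 12e9 parameters, the same formula gives 6.75 GB for Q4_0, which is in line with the 6.7GB figure quoted in the comments below.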

Community feedback

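Opinions divided · 26 comments

Core debate: whether the QAT quantized models deliver a real accuracy gain or are just marketing packaging, and whether the chaotic release cadence hurts the developer experience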

minimaxir

It's a bit awkward to release Gemma 4 12B (https://news.ycombinator.com/item?id=48385906), and then a canonical Q4_0 Gemma 4 12B a couple days later. It's good that this post lists the expected VRAM usage for the models with Q4_0 Gemma 4 12B being 6.7GB, which will indeed fit Google's

netdur

not sure if I understand you, but 4Q and QAT 4Q are different

refulgentis

It's super annoying when you have products that utilize these because there's...4? releases in 3 weeks? - Gemma 4 2B/4B/27BE3B/31B - Gemma 4 2B/4B/27BE3B/31B x "assistant" / MTP drafter models (i.e. multitoken prediction) - Gemma 4 12B (2 days ago? 1?) - Gemma 4 QA

Alternatives: Unsloth Studio (Unsloth)
