Redis Creator Hand-Writes an Inference Engine Dedicated to DeepSeek V4

#ARTICLE HackerNews 2026.05.08

Recommendation score 76.0 · NO. 012 · 2026.05.08

Published 2026/05/07 · Score 145 · Comments 46

Why it's worth reading

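antirez (the creator of Redis) has open-sourced ds4, a native Metal inference engine built specifically for DeepSeek V4 Flash that deliberately refuses to be a generic GGUF wrapper. For engineers who need to squeeze every bit of local inference performance out of Apple Silicon, it is a more aggressive, single-model optimization approach than llama.cpp.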

Editor's take

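llama.cpp's generality is a double-edged sword: supporting hundreds of models means every layer of abstraction carries a performance cost. antirez goes the other way and hard-codes the optimization path for a single model, DS4 Flash, much as Redis once used a single-threaded design to outperform multithreaded databases.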

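The key point is how far the Metal graph executor is customized: the KV cache layout, prompt rendering, and even the server API glue are all DS4-specific. By the editor's estimate, this kind of specialization typically yields 20-40% more tokens/s. The cost is zero generality: switch models and the engine is useless.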
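To put the 20-40% tokens/s figure above in latency terms, here is a minimal arithmetic sketch. It assumes single-stream decoding, where time per token is simply the reciprocal of throughput. The percentages are the editor's estimate, not a measured ds4 benchmark, and the function name `latency_reduction` is made up for illustration.

```python
def latency_reduction(throughput_gain: float) -> float:
    """Fractional drop in per-token latency for a fractional gain in tokens/s.

    Assumes latency per token = 1 / throughput (single-stream decode).
    """
    # new_latency = old_latency / (1 + gain), so the saved fraction is 1 - 1/(1 + gain)
    return 1.0 - 1.0 / (1.0 + throughput_gain)


# The 20% and 40% values are the editor's estimate, not benchmark data.
for gain in (0.20, 0.40):
    print(f"+{gain:.0%} tokens/s -> {latency_reduction(gain):.1%} less time per token")
```

Under that assumption, a 20-40% throughput gain works out to roughly 17-29% less wall-clock time per generated token, which is the number a latency-bound team would actually feel.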

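It best suits two groups: teams already shipping products on DeepSeek V4 Flash and bottlenecked by latency, and low-level engineers who want to study the trade-off between a dedicated inference engine and a general-purpose runtime. The code is also worth reading purely for learning: antirez's C code is textbook-clean.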

Community feedback

Opinions divided · 42 comments

Core debate: can open-source local models close the gap with frontier models, or is the gap in cost and capability unbridgeable?

maherbeg

This is so sick. I'm really curious to see what focused effort on optimizing a single open source model can look like over many months. Not only on the inference serving side, but also on the harness optimization side and building custom workflows to narrow the gap between things frontier models can

dakolli

There will always be a huge gap between frontier models and open source models (unless you're very rich). This whole industry makes no sense, everyone is ignoring the unit economics. It cost 20k a month to running Kimi 2.6 at decent tok/ps, to sell those tokens at a profit you'd need your hardw

bensyverson

If you looked at a graph of GPU power in consumer hardware and model capability per billion parameters over time, it seems inevitable that in the next few years a "good enough" model will run on entry-level hardware. Of course there will always be larger flagship models, but if you can count on dece

Alternatives: llama.cpp, GGUF, Claude 4.7 Max thinking, GPT-5.5, Codex, OpenCode Go, CUDA
