Diffusion-style decoding generates text about 4x faster with no loss
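Orthrus uses a dual-view diffusion architecture to let an LLM generate tokens in parallel, the way a diffusion model does, reaching a 4.25x speedup on Qwen3 while guaranteeing strictly lossless output. That is a substantive gain for AI product teams sensitive to inference cost, and it may change the engineering paradigm of autoregressive generation.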
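DeepSeek-V4-Flash is an open-source model that runs locally and performs close to low-end closed-source models. Paired with DwarfStar 4, a stripped-down inference framework, it makes activation steering accessible at low cost. Engineers can experiment with intervening directly in the model's internal state to guide its output, without relying on an API, which opens new room for controllable generation and interpretability research.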
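Steering went quiet for a while after Anthropic's Golden Gate Claude, because closed models do not release their weights and researchers could only watch from the sidelines. What matters about DeepSeek-V4-Flash is not how strong it is, but that it is small, fast, and open enough for an individual developer to reproduce the whole process on a laptop: intervene on the activation vector at layer N and watch the output shift in real time.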
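To make "intervene on the activation vector at layer N" concrete, here is a minimal toy sketch in plain Python. It is not DwarfStar 4's or DeepSeek-V4-Flash's actual code: `forward`, `mean_difference`, and the two toy layers are hypothetical names invented for illustration, and deriving the direction as a difference of mean activations is only one common choice, assumed here.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def mean_difference(pos: Sequence[Vector], neg: Sequence[Vector]) -> Vector:
    """Assumed recipe: direction = mean(pos activations) - mean(neg activations)."""
    dim = len(pos[0])
    mean_pos = [sum(a[d] for a in pos) / len(pos) for d in range(dim)]
    mean_neg = [sum(a[d] for a in neg) / len(neg) for d in range(dim)]
    return [p - n for p, n in zip(mean_pos, mean_neg)]


def forward(
    layers: Sequence[Layer],
    x: Vector,
    steer_layer: Optional[int] = None,
    steer_vec: Optional[Vector] = None,
    alpha: float = 0.0,
) -> Vector:
    """Run the toy layer stack, adding alpha * steer_vec to one layer's output."""
    h = list(x)
    for i, layer in enumerate(layers):
        h = layer(h)
        if i == steer_layer and steer_vec is not None:
            # The intervention itself: h <- h + alpha * v at the chosen layer.
            h = [hi + alpha * vi for hi, vi in zip(h, steer_vec)]
    return h


# Two stand-in "layers" (hypothetical, not a real transformer block).
toy_layers: List[Layer] = [
    lambda h: [2.0 * x for x in h],
    lambda h: [x + 1.0 for x in h],
]

baseline = forward(toy_layers, [1.0, 2.0])
steered = forward(toy_layers, [1.0, 2.0], steer_layer=0,
                  steer_vec=[1.0, 0.0], alpha=0.5)
direction = mean_difference([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [2.0, 2.0]])
```

In a real model the same addition would be applied to the residual-stream activations at the chosen layer during each forward pass. The layer index and the strength `alpha` are the two knobs an experimenter would typically sweep.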
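Until now, anyone who wanted to try steering had to either make do with Anthropic's limited API or train their own model, and both were expensive. Now DwarfStar 4 strips llama.cpp down to run only this one model, which cuts startup time and memory use and suits fast experimental iteration. If you have hit a ceiling with prompt engineering, or your alignment research needs a controllable means of intervention, this is currently the cheapest way in.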
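Two practical scenarios are worth watching. The first is using steering as a lightweight alternative for style or safety guardrails, avoiding the cost of fine-tuning. The second is combining it with an automated vector search to find the most effective intervention direction and strength for a given task, which is more systematic than hand-tuning prompts.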
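The second scenario, searching for the best direction and strength, can be sketched as a plain grid search. This is only an illustration under stated assumptions: `search_steering` is a hypothetical helper, and `toy_score` stands in for whatever task evaluator you would actually run against steered model outputs.

```python
from typing import Callable, Iterable, Optional, Tuple

Result = Tuple[str, float, float]  # (direction name, alpha, score)


def search_steering(
    directions: Iterable[str],
    alphas: Iterable[float],
    score: Callable[[str, float], float],
) -> Result:
    """Exhaustively score every (direction, alpha) pair and keep the best."""
    alpha_list = list(alphas)
    best: Optional[Result] = None
    for name in directions:
        for alpha in alpha_list:
            s = score(name, alpha)
            if best is None or s > best[2]:
                best = (name, alpha, s)
    if best is None:
        raise ValueError("no candidates to search")
    return best


def toy_score(name: str, alpha: float) -> float:
    """Hypothetical evaluator: 'formal' peaks at alpha=2, 'casual' is always worse."""
    if name == "formal":
        return -((alpha - 2.0) ** 2)
    return -((alpha - 1.0) ** 2) - 1.0


best = search_steering(["formal", "casual"], [0.0, 1.0, 2.0, 4.0], toy_score)
```

A grid is the simplest possible search. With a real evaluator each call to `score` means generating and grading model output, so the number of candidates is the main cost to keep in check.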
Core debate: is using activation steering to remove a model's refusal behavior a legitimate research use, or is it equivalent to building a harmful tool?
> inspired to write this post by antirez’s recent project DwarfStar 4, which is a version of llama.cpp that’s been stripped down to run only DeepSeek-V4-Flash

This is not true, it is its own project. Indebted to llama.cpp, sure, but not a stripped down version
The truth seems to sit somewhere in between. DwarfStar 4 seems to exist mainly because of llama.cpp: the authors were clearly very inspired by llama.cpp's code, and in some places have literally copied pieces from it, all with proper attribution and everything. I'm not trying to say this is
Send patches! But remember that many speedups end up being not exactly correct, and the logits drift. There is extensive testing, though, and now even ds4-eval to test how it performs.
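The kind of drift check hinted at above can be sketched as follows. This is a generic illustration, not how ds4-eval works: `compare_logits` and `DriftReport` are hypothetical names, and the logits are hand-written toy values rather than real model output.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DriftReport:
    max_abs_diff: float    # largest per-logit deviation across all positions
    top1_agreement: float  # fraction of positions with the same argmax token


def argmax(row: Sequence[float]) -> int:
    return max(range(len(row)), key=lambda i: row[i])


def compare_logits(
    reference: Sequence[Sequence[float]],
    candidate: Sequence[Sequence[float]],
) -> DriftReport:
    """Compare per-position logits from a reference and an optimized build."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("need the same non-zero number of positions")
    max_diff = 0.0
    agree = 0
    for ref_row, cand_row in zip(reference, candidate):
        if len(ref_row) != len(cand_row):
            raise ValueError("vocabulary sizes differ")
        row_diff = max(abs(r - c) for r, c in zip(ref_row, cand_row))
        max_diff = max(max_diff, row_diff)
        agree += int(argmax(ref_row) == argmax(cand_row))
    return DriftReport(max_diff, agree / len(reference))


# Toy logits: a small numeric drift in position 1 flips the top-1 token.
ref = [[0.1, 2.0, -1.0], [1.5, 1.4, 0.0]]
opt = [[0.1, 2.01, -1.0], [1.39, 1.4, 0.0]]
report = compare_logits(ref, opt)
```

The toy data shows why a small maximum difference is not enough on its own: when two logits are close, a tiny drift can change which token is picked, so it is worth tracking agreement on the chosen token as well.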