Gemini API adds multimodal file search

#ARTICLE HackerNews 2026.05.11

Recommendation score 56.0 · NO. 006 · 2026.05.11

Published 2026/05/10 · Score 138 · Comments 37

Why it's worth reading

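Google has extended the Gemini API's file search feature to support RAG retrieval over multimodal content such as PDFs, images, and video, with built-in citation tracing. Developers can therefore skip building their own vector database and parsing pipeline and use the managed service directly to build question-answering systems whose answer sources can be verified.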


Editor's take

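With this move, Google is using a managed service to compete directly for the RAG infrastructure market. Until now, a team building multimodal RAG had to chain together Unstructured for parsing, a self-hosted vector store, and hand-written reranking logic. Now a single API covers all of it, at the cost of lock-in to the Google ecosystem.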
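To make concrete what the self-built pipeline involves, here is a minimal sketch of retrieval with citation tracing. Everything in it is an assumption for illustration: the chunk data, the `embed`, `cosine`, and `retrieve` names, and the bag-of-words "embedding", which stands in for a real embedding model and vector store. It does not show the Gemini API or Unstructured.

```python
import math
import re
from collections import Counter

# Hypothetical parsed chunks: (source file, page, text), as a parsing stage might emit.
CHUNKS = [
    ("handbook.pdf", 3, "Refunds are accepted within 30 days of purchase."),
    ("handbook.pdf", 7, "Shipping takes five business days within the country."),
    ("faq.pdf", 1, "Contact support by email for warranty questions."),
]


def embed(text):
    """Stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, k=2):
    """Return up to k matching chunks, each carrying a citation to its source."""
    q = embed(query)
    scored = [
        (cosine(q, embed(text)), source, page, text)
        for source, page, text in CHUNKS
    ]
    scored.sort(key=lambda row: row[0], reverse=True)
    return [
        {"score": score, "citation": f"{source}#page={page}", "text": text}
        for score, source, page, text in scored[:k]
        if score > 0  # drop chunks with no term overlap at all
    ]


if __name__ == "__main__":
    for hit in retrieve("refunds within how many days"):
        print(hit["citation"], "-", hit["text"])
```

A production pipeline would replace `embed` with a real model, `CHUNKS` with a vector database, and add a reranking step. That is the work the managed service claims to absorb.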

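For teams already on Vertex AI, this is an option that saves an engineering team's worth of work. For teams on OpenAI or an in-house stack, the migration cost is not low, especially since the citation format and chunking strategy are outside their control.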

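The thing most worth watching is pricing: if retrieval tokens cost 3-5 times as much as input tokens, self-hosting may still be cheaper for high-frequency workloads. We recommend waiting for benchmarks before deciding whether to migrate a core pipeline.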
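The pricing argument can be turned into a rough break-even estimate. The sketch below is a back-of-the-envelope model, not real pricing: the fixed monthly cost, tokens per query, and input price are hypothetical placeholders, and only the 3-5x multiplier comes from the scenario above. It assumes a self-hosted stack pays the plain input price for retrieved tokens plus a fixed infrastructure cost, while the managed service charges the multiplied price and has no fixed cost.

```python
def break_even_queries(fixed_monthly_usd, tokens_per_query,
                       input_price_per_mtok, multiplier):
    """Monthly query volume above which self-hosting becomes cheaper.

    Assumptions (all hypothetical):
      - managed cost per query = tokens * input price * multiplier
      - self-hosted cost per query = tokens * input price,
        plus fixed_monthly_usd for infrastructure
    """
    if multiplier <= 1:
        raise ValueError("no break-even: managed retrieval carries no premium")
    # Extra amount the managed service charges per query over self-hosting.
    premium_per_query = (
        tokens_per_query / 1_000_000 * input_price_per_mtok * (multiplier - 1)
    )
    return fixed_monthly_usd / premium_per_query


if __name__ == "__main__":
    # Placeholder inputs: $2,000/month infrastructure, 4,000 retrieved tokens
    # per query, $1 per million input tokens.
    for m in (3, 5):
        volume = break_even_queries(2000, 4000, 1.0, m)
        print(f"multiplier {m}x: break-even near {volume:,.0f} queries/month")
```

Under these placeholder inputs the break-even falls at roughly 250,000 queries per month for a 3x multiplier and 125,000 for 5x. The real figures depend on published pricing and on engineering costs that this model leaves out.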

Community feedback

Negative · 32 comments

Core debate: whether Gemini's model capabilities are falling behind, and whether Google's product execution is holding it back

FrequentLurker

This might be great and all but I am still miffed at how simple search on AI Studio is. You can only search the titles of your conversations and nothing inside them. On top of that they messed with the scrolling so Ctrl+F doesn't work reliably.

greesil

Too bad they can't just easily vibe code new features.

bloqs

Yeah, what happened to no more SWE

Alternatives: OpenAI, Anthropic, Claude, ChatGPT, Codex, Claude Code, Claude Desktop, DeepSeek, GPT-OSS, OpenCode
