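ArXiv cs.AI Weekly: Tracking AI Agent Papers (First Week of May 2026)
This week the ArXiv cs.AI category listed 98 papers, 30 of which focus on agent research. Multi-agent reasoning achieves Pareto-optimal test-time scaling, easing the compute-efficiency bottleneck of single-agent approaches; Agent Capsules cuts token consumption by 51% through quality-gated granularity control; and RAG-Gym offers a systematic optimization framework for retrieval-augmented generation with language agents.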
Data Overview
- Snapshot week: 2026-05-01 to 2026-05-07
- Tracker: ArXiv AI agent papers (all snapshots: /tech/ai-agents/data/?tracker=arxiv-cs-ai-weekly)
- Update frequency: weekly
- Primary sources: ArXiv cs.AI RSS, ArXiv cs.CL RSS, ArXiv API
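Key Facts
- Who: 98 papers listed under cs.AI this week; 30 agent-related submissions spanning the cs.AI, cs.CL, and cs.MA categories
- What: a multi-agent reasoning paper introduces Pareto-optimal test-time scaling; Agent Capsules achieves a 51% token reduction; 10 papers reach a trend score of 7 or higher
- When: the week of May 1 to 7, 2026
- Impact: multi-agent frameworks lead with 15 papers; RAG optimization research grew 50% week over week; token efficiency is becoming a key design consideration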
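Methodology
This tracker monitors AI agent submissions in the ArXiv cs.AI and cs.CL categories. Papers are selected for relevance to agents, multi-agent systems, tool use, reasoning, and retrieval-augmented generation (RAG). The trend score (1 to 10) is assigned based on novelty, citation velocity, practical applicability, and industry relevance. Data is collected from ArXiv RSS feeds and ArXiv API queries. This snapshot covers papers published between May 1 and 7, 2026.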
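The collection and filtering steps described above might look roughly like the following minimal sketch. The endpoint and the `search_query`, `sortBy`, `sortOrder`, and `max_results` parameters belong to the public arXiv API; the keyword pattern and the function names `build_query_url` and `is_agent_related` are illustrative assumptions, since the tracker's actual filter is not described in detail.

```python
import re
from urllib.parse import urlencode

# Assumption: an illustrative keyword pattern, not the tracker's real filter.
AGENT_PATTERN = re.compile(
    r"\b(agents?|agentic|multi-agent|rag|retrieval-augmented|tool[- ]use)\b"
)


def build_query_url(categories, max_results=100):
    """Build an arXiv API URL for the newest submissions in the given categories."""
    search = " OR ".join(f"cat:{c}" for c in categories)
    params = {
        "search_query": search,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return "http://export.arxiv.org/api/query?" + urlencode(params)


def is_agent_related(title, abstract=""):
    """Return True when the title or abstract mentions an agent-related keyword."""
    text = f"{title} {abstract}".lower()
    return AGENT_PATTERN.search(text) is not None


if __name__ == "__main__":
    print(build_query_url(["cs.AI", "cs.CL"]))
    print(is_agent_related("Faithful Mobile GUI Agents with Guided Advantage Estimator"))
```

Matching on word boundaries keeps a short keyword such as "rag" from firing on unrelated words like "storage" or "average".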
This Week's Data
Top 20 Papers by Trend Score
| ArXiv ID | Title | Trend Score | Key Topics |
|---|---|---|---|
| 2605.01566 | Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling | 9 | multi-agent, reasoning, test-time scaling, compute efficiency |
| 2605.00410 | Agent Capsules: Quality-Gated Granularity Control for Multi-Agent LLM Pipelines | 8 | multi-agent, pipeline optimization, token efficiency, quality gating |
| 2502.13957 | RAG-Gym: Systematic Optimization of Language Agents for Retrieval-Augmented Generation | 8 | RAG, agent optimization, systematic framework, language agents |
| 2605.01986 | 12 Angry AI Agents: Evaluating Multi-Agent LLM Decision-Making Through Cinematic Jury Deliberation | 7 | multi-agent, decision-making, evaluation, jury simulation |
| 2605.01604 | Evaluating Agentic AI in the Wild: Failure Modes, Drift Patterns, and a Production Evaluation Framework | 7 | agentic AI, evaluation, failure modes, production deployment |
| 2605.03534 | SURE-RAG: Sufficiency and Uncertainty-Aware Evidence Verification for Selective RAG | 7 | RAG, evidence verification, uncertainty awareness, selective retrieval |
| 2605.03476 | CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification | 7 | multi-agent, RAG, hallucination detection, medical AI, GraphRAG |
| 2412.20138 | TradingAgents: Multi-Agents LLM Financial Trading Framework | 7 | multi-agent, LLM, trading, financial analysis |
| 2410.02958 | AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML | 7 | multi-agent, LLM, AutoML, automation |
| 2412.12881 | RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement | 7 | RAG, reasoning, MCTS, verification, refinement |
| 2605.01920 | A Language for Describing Agentic LLM Contexts | 6 | agentic, LLM, context specification, formal language |
| 2605.01208 | Faithful Mobile GUI Agents with Guided Advantage Estimator | 6 | agents, GUI, mobile, advantage estimator |
| 2605.01293 | Lifting Traces to Logic: Programmatic Skill Induction with Neuro-Symbolic Learning for Long-Horizon Agentic Tasks | 6 | agentic, neuro-symbolic, skill induction, long-horizon tasks |
| 2605.01879 | Sheaf-Theoretic Planning: A Categorical Foundation for Resilient Multi-Agent Autonomous Systems | 6 | multi-agent, autonomy, planning, sheaf theory |
| 2605.02910 | CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing | 6 | agents, reasoning, creativity, tool repurposing |
| 2605.02967 | AutoRAGTuner: A Declarative Framework for Automatic Optimization of RAG Pipelines | 6 | RAG, optimization, pipelines, declarative |
| 2605.03228 | MAGE: Safeguarding LLM Agents against Long-Horizon Threats via Shadow Memory | 6 | agents, LLM, safety, shadow memory, long-horizon threats |
| 2605.04018 | Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems | 6 | agentic, reasoning, retrieval, search systems |
| 2401.07324 | Small LLMs Are Weak Tool Learners: A Multi-LLM Agent | 6 | multi-agent, tool use, LLM, decomposition |
| 2605.01102 | Towards Multi-Agent Autonomous Reasoning in Hydrodynamics | 5 | multi-agent, autonomy, reasoning, hydrodynamics |
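Featured Paper Summaries
Multi-Agent Reasoning (2605.01566): Shows that multi-agent reasoning achieves Pareto-optimal compute efficiency in test-time scaling, systematically outperforming single-agent methods. The paper presents empirical evidence that distributing reasoning across multiple agents yields better performance per unit of compute than scaling single-agent inference.
Agent Capsules (2605.00410): Introduces an adaptive execution runtime that treats multi-agent pipeline execution as an optimization problem with empirical quality constraints. It achieves a 51% token reduction relative to a hand-written implementation while holding quality thresholds through dynamic granularity control.
RAG-Gym (2502.13957): A comprehensive platform for systematically optimizing RAG language agents along three dimensions: prompt engineering, actor tuning, and critic training. It provides reproducible benchmarks and optimization protocols for RAG agent development.
12 Angry AI Agents (2605.01986): Evaluates multi-agent LLM decision-making through a cinematic jury deliberation scenario, testing collaboration, consensus formation, and deliberative reasoning under conflicting evidence.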
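For readers unfamiliar with the term, a configuration is Pareto-optimal when no other configuration is at least as good on both compute cost and accuracy and strictly better on one. The sketch below illustrates that definition only; it is not the method of 2605.01566, and the configuration names and numbers are hypothetical values chosen for illustration.

```python
def pareto_frontier(points):
    """Return the points that no other point dominates, sorted by cost.

    Each point is (name, compute_cost, accuracy).
    Lower cost is better; higher accuracy is better.
    """
    frontier = []
    for name, cost, acc in points:
        dominated = any(
            (c2 <= cost and a2 >= acc) and (c2 < cost or a2 > acc)
            for _, c2, a2 in points
        )
        if not dominated:
            frontier.append((name, cost, acc))
    return sorted(frontier, key=lambda p: p[1])


# Hypothetical numbers for illustration only; not taken from any paper.
configs = [
    ("single-agent, short", 1.0, 0.60),
    ("single-agent, long", 4.0, 0.68),
    ("multi-agent, 3 agents", 3.0, 0.72),
    ("multi-agent, 5 agents", 5.0, 0.75),
]

if __name__ == "__main__":
    for name, cost, acc in pareto_frontier(configs):
        print(f"{name}: cost={cost}, accuracy={acc}")
```

In this made-up data, "single-agent, long" drops off the frontier because the three-agent configuration is both cheaper and more accurate.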
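Week-over-Week Summary
| Metric | This Week | Last Week | Change |
|---|---|---|---|
| Total papers (cs.AI) | 98 | 30 | +227% |
| Agent-related papers | 30 | 25 | +20% |
| Multi-agent papers | 15 | 15 | 0% |
| RAG-related papers | 12 | 8 | +50% |
| High-impact papers (trend score 7+) | 10 | 8 | +25% |
| Reasoning-focused papers | 8 | 10 | -20% |
| Tool-use papers | 4 | 4 | 0% |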
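The Change column is consistent with the usual week-over-week formula, (this week - last week) / last week, rounded to a whole percent. A minimal sketch that recomputes it from the counts in the table follows; the function name `pct_change` and the dictionary keys are illustrative.

```python
def pct_change(this_week, last_week):
    """Week-over-week change in percent, rounded to the nearest whole number."""
    if last_week == 0:
        raise ValueError("last_week must be non-zero")
    return round((this_week - last_week) / last_week * 100)


# (this week, last week) counts copied from the summary table.
metrics = {
    "total_cs_ai": (98, 30),
    "agent_related": (30, 25),
    "multi_agent": (15, 15),
    "rag_related": (12, 8),
    "high_impact": (10, 8),
    "reasoning": (8, 10),
    "tool_use": (4, 4),
}

if __name__ == "__main__":
    for name, (now, before) in metrics.items():
        print(name, pct_change(now, before))
```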
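Ecosystem Metrics
Category Distribution
| Category | Papers | Share |
|---|---|---|
| cs.AI (Artificial Intelligence) | 45 | 45.9% |
| cs.CL (Computation and Language) | 35 | 35.7% |
| cs.MA (Multiagent Systems) | 8 | 8.2% |
| cs.LG (Machine Learning) | 5 | 5.1% |
| Other | 5 | 5.1% |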
Hot Topics This Week
| Topic | Papers | Representative Papers |
|---|---|---|
| Multi-agent LLM frameworks | 15 | 2605.01566, 2605.00410, 2605.01986 |
| RAG optimization and evaluation | 12 | 2502.13957, 2605.03534, 2605.03476 |
| Agent reasoning and decision-making | 8 | 2605.01566, 2605.02910, 2605.04018 |
| Tool use and function calling | 4 | 2605.02910, 2401.07324 |
| Autonomous system design | 6 | 2605.01879, 2605.01102, 2605.01293 |
Keyword Frequency
| Keyword | Frequency | Week-over-Week Change |
|---|---|---|
| agent | 28 | +7% |
| multi-agent | 15 | 0% |
| RAG | 12 | +50% |
| reasoning | 8 | -20% |
| autonomous | 6 | +50% |
| LLM | 6 | +20% |
| optimization | 5 | +67% |
| evaluation | 5 | +25% |
| tool-use | 4 | 0% |
| safety | 3 | +50% |
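Trends and Observations
Emerging Patterns
- Pareto-optimal scaling in multi-agent reasoning: the first papers to explicitly address the compute-efficiency trade-offs of multi-agent test-time scaling have appeared, moving beyond the single-agent optimization paradigm.
- Token efficiency as a first-class design constraint: Agent Capsules' 51% token reduction signals a shift from capability-oriented to efficiency-oriented multi-agent pipeline design.
- Systematic RAG optimization frameworks: RAG-Gym introduces Gym-style environments for reproducible RAG agent optimization, similar to reinforcement learning training paradigms.
- GraphRAG integration for hallucination detection: CuraView demonstrates an approach that combines knowledge graphs with multi-agent verification to improve medical AI reliability.
- Production evaluation frameworks emerge: several papers address failure modes, drift patterns, and deployment monitoring for agentic AI systems.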
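Notable Changes from Last Week
- First Pareto-optimal scaling paper: explicit compute-efficiency optimization for multi-agent systems
- Quality-gated granularity control: a new paradigm for adaptive multi-agent pipeline execution
- RAG-related papers up 50% week over week: retrieval and agent architectures continue to converge
- Token allocation as a system design principle: a marginal token allocator framework is proposed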
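🔺 Exclusive Intelligence: Insights You Won't See Elsewhere
Confidence: High | Novelty score: 62/100
Conventional coverage tracks individual paper releases, but the aggregate signal from this week's 30 agent papers points to a strategic shift in research focus. Multi-Agent Reasoning (2605.01566) shows that multi-agent methods achieve Pareto-optimal scaling, which challenges the orthodoxy of single-agent scaling. Agent Capsules' 51% token reduction (2605.00410) indicates that efficiency gains can come from architectural optimization rather than from model improvements alone. RAG-Gym (2502.13957) introduces reproducible optimization protocols, addressing the "every RAG system is custom-built" problem that hinders enterprise adoption.
Key takeaway: platform teams should invest in multi-agent orchestration infrastructure now. Single-agent scaling is showing diminishing marginal returns, and a 51% token-efficiency gain translates directly into lower costs at production scale. RAG-Gym's systematic approach supports standardized evaluation and shortens the path from prototype to production.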
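To make the cost claim concrete, here is a back-of-the-envelope sketch. Only the 51% reduction comes from the report; the monthly token volume and the per-million-token price are hypothetical placeholders, and the calculation assumes the reduction applies uniformly to all billed tokens at a flat price.

```python
def monthly_cost(tokens_per_month, usd_per_million_tokens):
    """Cost in USD for a given monthly token volume at a flat per-million price."""
    return tokens_per_month / 1_000_000 * usd_per_million_tokens


# Hypothetical inputs; substitute your own workload and pricing.
BASELINE_TOKENS = 2_000_000_000  # assumed tokens per month
PRICE_PER_MILLION = 3.00         # assumed USD per million tokens
TOKEN_REDUCTION = 0.51           # reduction reported for Agent Capsules

baseline = monthly_cost(BASELINE_TOKENS, PRICE_PER_MILLION)
reduced = monthly_cost(BASELINE_TOKENS * (1 - TOKEN_REDUCTION), PRICE_PER_MILLION)
savings = baseline - reduced

if __name__ == "__main__":
    print(f"baseline: ${baseline:,.2f}")
    print(f"reduced:  ${reduced:,.2f}")
    print(f"savings:  ${savings:,.2f}")
```

Under these assumed inputs, the savings come to about $3,060 on a $6,000 monthly baseline. Real savings depend on how much of a workload the reduction actually applies to.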
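Historical Snapshots
- Week of April 23 to 30, 2026: ClawNet introduces cross-user agent collaboration; HERA achieves a 38.69% improvement
- Week of April 16 to 23, 2026: Actor-Observer Asymmetry research emerges; benchmark papers surge 133%
- Week of April 9 to 16, 2026: earlier snapshot data
View all historical snapshots: /tech/ai-agents/data/?tracker=arxiv-cs-ai-weekly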
Sources
- ArXiv cs.AI RSS Feed, ArXiv, May 2026
- ArXiv cs.CL RSS Feed, ArXiv, May 2026
- ArXiv API, AI Agent Papers, ArXiv, May 2026
Full Paper List (30 papers)
| ArXiv ID | Title | Authors | Categories | Published | Trend Score |
|---|---|---|---|---|---|
| 2605.01566 | Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling | Florian Valentin Wunderlich, Lars Benedikt Kaesberg, Jan Philip Wahle, Terry Ruas, Bela Gipp | cs.AI | 2026-05-06 | 9 |
| 2605.00410 | Agent Capsules: Quality-Gated Granularity Control for Multi-Agent LLM Pipelines | Aninda Ray | cs.CL, cs.AI | 2026-05-01 | 8 |
| 2502.13957 | RAG-Gym: Systematic Optimization of Language Agents for Retrieval-Augmented Generation | Guangzhi Xiong, Qiao Jin, Xiao Wang, Yin Fang, Haolin Liu, Yifan Yang, Fangyuan Chen, Zhixing Song, Dengyu Wang, Minjia Zhang, Zhiyong Lu, Aidong Zhang | cs.CL, cs.AI | 2025-02-19 | 8 |
| 2605.01986 | 12 Angry AI Agents: Evaluating Multi-Agent LLM Decision-Making Through Cinematic Jury Deliberation | Ahmet Bahaddin Ersoz | cs.AI | 2026-05-06 | 7 |
| 2605.01604 | Evaluating Agentic AI in the Wild: Failure Modes, Drift Patterns, and a Production Evaluation Framework | Mukund Pandey | cs.AI | 2026-05-06 | 7 |
| 2605.03534 | SURE-RAG: Sufficiency and Uncertainty-Aware Evidence Verification for Selective RAG | Jingxi Qiu, Zeyu Han, Cheng Huang | cs.CL | 2026-05-06 | 7 |
| 2605.03476 | CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification | Severin Ye, Xiao Kong, Xiaopeng He, Guangsu Yan, Dongsuk Oh | cs.CL | 2026-05-06 | 7 |
| 2412.20138 | TradingAgents: Multi-Agents LLM Financial Trading Framework | Yijia Xiao, Edward Sun, Di Luo, Wei Wang | q-fin.TR, cs.AI | 2024-12-28 | 7 |
| 2410.02958 | AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML | Patara Trirat, Wonyong Jeong, Sung Ju Hwang | cs.LG, cs.AI | 2024-10-03 | 7 |
| 2412.12881 | RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement | Jinhao Jiang, Jiayi Chen, Junyi Li, Ruiyang Ren, Shijie Wang, Wayne Xin Zhao, Yang Song, Tao Zhang | cs.CL, cs.AI | 2024-12-17 | 7 |
| 2605.01920 | A Language for Describing Agentic LLM Contexts | Noga Peleg Pelc, Gal A. Kaminka, Yoav Goldberg | cs.AI | 2026-05-06 | 6 |
| 2605.01208 | Faithful Mobile GUI Agents with Guided Advantage Estimator | Haowen Hu, Pengzhou Cheng, Zheng Wu, Lingzhong Dong, Gongshen Liu, Zhuosheng Zhang | cs.AI | 2026-05-06 | 6 |
| 2605.01293 | Lifting Traces to Logic: Programmatic Skill Induction with Neuro-Symbolic Learning for Long-Horizon Agentic Tasks | Jie-Jing Shao, Haiyan Yin, Yueming Lyu, Xingrui Yu, Lan-Zhe Guo, Ivor Tsang, James Kwok, Yu-Feng Li | cs.AI | 2026-05-06 | 6 |
| 2605.01879 | Sheaf-Theoretic Planning: A Categorical Foundation for Resilient Multi-Agent Autonomous Systems | Manuel Hernandez, Eduardo Sanchez-Soto | cs.AI | 2026-05-06 | 6 |
| 2605.02910 | CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing | Cheng Qian, Hyeonjeong Ha, Jiayu Liu, Bingxiang He, Jeonghwan Kim, Jiateng Liu, Bingxuan Li, Aditi Tiwari, Dwip Dalal, Zhenhailong Wang, Xiusi Chen, Mahdi Namazifar, Yunzhu Li, Heng Ji | cs.AI | 2026-05-06 | 6 |
| 2605.02967 | AutoRAGTuner: A Declarative Framework for Automatic Optimization of RAG Pipelines | Xintan Zeng, Yongchao Liu, Yice Luo, Jiajun Zhen | cs.AI | 2026-05-06 | 6 |
| 2605.03228 | MAGE: Safeguarding LLM Agents against Long-Horizon Threats via Shadow Memory | Alexandria K. Vail, Marcelo Cicconet, Katie Aafjes-van Doorn, Ryan Maroney, Marc Aafjes | cs.CL | 2026-05-06 | 6 |
| 2605.04018 | Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems | Yilun Zhao, Jinbiao Wei, Tingyu Song, Siyue Zhang, Chen Zhao, Arman Cohan | cs.CL | 2026-05-06 | 6 |
| 2401.07324 | Small LLMs Are Weak Tool Learners: A Multi-LLM Agent | Weizhou Shen, Chenliang Li, Hongzhan Chen, Ming Yan, Xiaojun Quan, Hehong Chen, Ji Zhang, Fei Huang | cs.AI, cs.CL | 2024-01-14 | 6 |
| 2605.01102 | Towards Multi-Agent Autonomous Reasoning in Hydrodynamics | Jinpai Zhao, Albert Cerrone, Joannes Westerink, Clint Dawson | cs.AI | 2026-05-06 | 5 |
| 2605.01101 | Virtual Speech Therapist: A Clinician-in-the-Loop AI Speech Therapy Agent | Shakeel Sheikh, Patrick Marmaroli, MD Sahidullah, Slim Ouni, Fabrice Hirsch, Goncalo Leal, Bjorn W Schuller | cs.AI | 2026-05-06 | 5 |
| 2605.00841 | AI Agents for Sustainable SMEs: A Green ESG Assessment Framework | Viet Trinh, Tan Nguyen, Minh-Huyen Phan, Quan Luu | cs.AI | 2026-05-06 | 5 |
| 2605.01214 | Agentic AI Systems Should Be Designed as Marginal Token Allocators | Siqi Zhu | cs.AI | 2026-05-06 | 5 |
| 2605.01758 | Catching the Infection Before It Spreads: Foresight-Guided Defense in Multi-Agent Systems | Yue Ma, Ziyuan Yang, Yi Zhang | cs.AI | 2026-05-06 | 5 |
| 2605.01675 | CP-SynC: Multi-Agent Zero-Shot Constraint Modeling in MiniZinc with Synthesized Checkers | Yuliang Song, Eldan Cohen | cs.AI | 2026-05-06 | 5 |
| 2605.03314 | When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning | Jiaqi Wei, Xuehang Guo, Pengfei Yu, Xiang Zhang, Wanli Ouyang, Siqi Sun, Qingyun Wang, Chenyu You | cs.CL | 2026-05-06 | 5 |
| 2605.00846 | ClinicBot: A Guideline-Grounded Clinical Chatbot with Prioritized Evidence RAG and Verifiable Citations | Navapat Nananukul, Mayank Kejriwal | cs.AI | 2026-05-06 | 5 |
| 2605.01789 | DataEvolver: Let Your Data Build and Improve Itself via Goal-Driven Loop Agents | Qisong Zhang, Wenzhuo Wu, Zhuangzhuang Jia, Yunhao Yang, Huayu Zhang, Xianghao Zang, Zhixiang He, Zhongjiang He, Kongming Liang, Zhanyu Ma | cs.AI | 2026-05-06 | 5 |
| 2605.01847 | NeuroState-Bench: A Human-Calibrated Benchmark for Commitment Integrity in LLM Agent Profiles | Jia Xiao | cs.AI | 2026-05-06 | 5 |