
ArXiv cs.AI Weekly: Tracking AI Agent Papers (First Week of May 2026)

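ArXiv's cs.AI category logged 98 papers this week, and 30 agent-related papers were tracked. Multi-agent reasoning achieves Pareto-optimal test-time scaling, breaking through the compute-efficiency bottleneck of single-agent approaches; Agent Capsules cuts token consumption by 51% through quality-gated granularity control; and RAG-Gym provides a systematic framework for optimizing language agents for retrieval-augmented generation.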

AgentScout · 10 min read
#arxiv #ai-agents #multi-agent #rag #reasoning #llm

Data Overview

  • Snapshot week: 2026-05-01 to 2026-05-07
  • Tracker: ArXiv AI agent papers (all snapshots: /tech/ai-agents/data/?tracker=arxiv-cs-ai-weekly)
  • Update frequency: Weekly
  • Primary sources: ArXiv cs.AI RSS, ArXiv cs.CL RSS, ArXiv API

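Key Facts

  • Who: 98 papers logged in cs.AI this week; 30 agent-related submissions across the cs.AI, cs.CL, and cs.MA categories
  • What: a multi-agent reasoning paper introduces Pareto-optimal test-time scaling; Agent Capsules achieves a 51% token reduction; 10 papers have a trend score of 7 or higher
  • When: the week of May 1 to 7, 2026
  • Impact: multi-agent frameworks dominate with 15 papers; RAG optimization research is up 50% week over week; token efficiency has become a key design consideration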

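Methodology

This tracker monitors AI agent submissions in ArXiv's cs.AI and cs.CL categories. Papers are selected for relevance to agents, multi-agent systems, tool use, reasoning, and retrieval-augmented generation (RAG). Each paper receives a trend score from 1 to 10 based on novelty, citation velocity, practical utility, and industry relevance. Data is collected through ArXiv RSS feeds and ArXiv API queries. This snapshot covers papers published May 1 to 7, 2026, although the lists below also include several earlier submissions (2502.13957, 2412.20138, 2410.02958, 2412.12881, and 2401.07324).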
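The collection step described above can be sketched against the public arXiv API, which returns an Atom feed. This is an assumption about how such a tracker might work, not AgentScout's actual pipeline: `build_query_url`, `parse_entries`, `is_agent_related`, and the keyword pattern are hypothetical, and the date window assumes the API's `submittedDate:[... TO ...]` range filter. Fetching is left out so the parsing can be tested offline.

```python
import re
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical relevance filter; the tracker's real selection criteria are not published.
AGENT_PATTERN = re.compile(
    r"\b(agent\w*|tool[- ]use|rag|retrieval-augmented|reasoning)\b", re.IGNORECASE
)


def build_query_url(categories, start_day, end_day, max_results=200):
    """Build an arXiv API URL for papers submitted between two days (YYYYMMDD)."""
    cats = " OR ".join(f"cat:{c}" for c in categories)
    query = f"({cats}) AND submittedDate:[{start_day}0000 TO {end_day}2359]"
    params = {
        "search_query": query,
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_entries(atom_xml):
    """Extract (arxiv_id, title, summary) tuples from an arXiv Atom feed string."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.findall(f"{ATOM}entry"):
        # <id> looks like http://arxiv.org/abs/2605.01566v1; drop the version suffix.
        raw_id = entry.findtext(f"{ATOM}id", default="")
        arxiv_id = raw_id.rsplit("/", 1)[-1].split("v")[0]
        # Collapse the line breaks arXiv inserts into long titles and abstracts.
        title = " ".join(entry.findtext(f"{ATOM}title", default="").split())
        summary = " ".join(entry.findtext(f"{ATOM}summary", default="").split())
        entries.append((arxiv_id, title, summary))
    return entries


def is_agent_related(title, summary):
    """Naive whole-word keyword match on title plus abstract."""
    return AGENT_PATTERN.search(f"{title} {summary}") is not None
```

In practice the URL would be fetched with any HTTP client and the response body passed to `parse_entries`; a keyword filter like this only produces candidates, and the manual relevance and trend-score judgments would still follow.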

This Week's Data

Top 20 Papers by Trend Score

| ArXiv ID | Title | Trend score | Key topics |
| --- | --- | --- | --- |
| 2605.01566 | Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling | 9 | multi-agent, reasoning, test-time scaling, compute efficiency |
| 2605.00410 | Agent Capsules: Quality-Gated Granularity Control for Multi-Agent LLM Pipelines | 8 | multi-agent, pipeline optimization, token efficiency, quality gating |
| 2502.13957 | RAG-Gym: Systematic Optimization of Language Agents for Retrieval-Augmented Generation | 8 | RAG, agent optimization, systematic framework, language agents |
| 2605.01986 | 12 Angry AI Agents: Evaluating Multi-Agent LLM Decision-Making Through Cinematic Jury Deliberation | 7 | multi-agent, decision-making, evaluation, jury simulation |
| 2605.01604 | Evaluating Agentic AI in the Wild: Failure Modes, Drift Patterns, and a Production Evaluation Framework | 7 | agentic AI, evaluation, failure modes, production deployment |
| 2605.03534 | SURE-RAG: Sufficiency and Uncertainty-Aware Evidence Verification for Selective RAG | 7 | RAG, evidence verification, uncertainty awareness, selective retrieval |
| 2605.03476 | CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification | 7 | multi-agent, RAG, hallucination detection, medical AI, GraphRAG |
| 2412.20138 | TradingAgents: Multi-Agents LLM Financial Trading Framework | 7 | multi-agent, LLM, trading, financial analysis |
| 2410.02958 | AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML | 7 | multi-agent, LLM, AutoML, automation |
| 2412.12881 | RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement | 7 | RAG, reasoning, MCTS, verification, refinement |
| 2605.01920 | A Language for Describing Agentic LLM Contexts | 6 | agentic, LLM, context specification, formal language |
| 2605.01208 | Faithful Mobile GUI Agents with Guided Advantage Estimator | 6 | agents, GUI, mobile, advantage estimator |
| 2605.01293 | Lifting Traces to Logic: Programmatic Skill Induction with Neuro-Symbolic Learning for Long-Horizon Agentic Tasks | 6 | agentic, neuro-symbolic, skill induction, long-horizon tasks |
| 2605.01879 | Sheaf-Theoretic Planning: A Categorical Foundation for Resilient Multi-Agent Autonomous Systems | 6 | multi-agent, autonomy, planning, sheaf theory |
| 2605.02910 | CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing | 6 | agents, reasoning, creativity, tool repurposing |
| 2605.02967 | AutoRAGTuner: A Declarative Framework for Automatic Optimization of RAG Pipelines | 6 | RAG, optimization, pipelines, declarative |
| 2605.03228 | MAGE: Safeguarding LLM Agents against Long-Horizon Threats via Shadow Memory | 6 | agents, LLM, safety, shadow memory, long-horizon threats |
| 2605.04018 | Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems | 6 | agentic, reasoning, retrieval, search systems |
| 2401.07324 | Small LLMs Are Weak Tool Learners: A Multi-LLM Agent | 6 | multi-agent, tool use, LLM, decomposition |
| 2605.01102 | Towards Multi-Agent Autonomous Reasoning in Hydrodynamics | 5 | multi-agent, autonomy, reasoning, hydrodynamics |

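Featured Paper Summaries

Multi-agent reasoning (2605.01566): Shows that multi-agent reasoning achieves Pareto-optimal compute efficiency in test-time scaling, systematically outperforming single-agent approaches. The paper provides empirical evidence that distributing reasoning across multiple agents yields better performance per unit of compute than scaling up a single agent's reasoning.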
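To make "Pareto-optimal" concrete: given (compute, accuracy) measurements for several configurations, a configuration is on the Pareto front if no other configuration uses no more compute and scores at least as well while being strictly better on one of the two. The helper below is a generic sketch of that definition, not code from 2605.01566; `RunResult` and `pareto_front` are illustrative names, and any inputs would be your own measurements.

```python
from typing import NamedTuple


class RunResult(NamedTuple):
    name: str        # configuration label, e.g. "3 agents x 2 rounds" (illustrative)
    compute: float   # cost measure, e.g. total generated tokens or FLOPs
    accuracy: float  # task score, higher is better


def dominates(a: RunResult, b: RunResult) -> bool:
    """True if a is no worse than b on both axes and strictly better on at least one."""
    no_worse = a.compute <= b.compute and a.accuracy >= b.accuracy
    strictly_better = a.compute < b.compute or a.accuracy > b.accuracy
    return no_worse and strictly_better


def pareto_front(results: list[RunResult]) -> list[RunResult]:
    """Return the non-dominated results, sorted by increasing compute (O(n^2))."""
    front = [r for r in results if not any(dominates(o, r) for o in results)]
    return sorted(front, key=lambda r: (r.compute, -r.accuracy))
```

If multi-agent configurations make up most of the front across budgets, they are the better use of compute in the sense the summary describes.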

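Agent Capsules (2605.00410): Introduces an adaptive execution runtime that treats multi-agent pipeline execution as an optimization problem under empirical quality constraints. It uses 51% fewer tokens than hand-built implementations while keeping output quality above set thresholds through dynamic granularity control.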
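One way to read "quality-gated granularity control" is as constrained selection: among candidate execution granularities, pick the cheapest one whose measured quality clears a threshold. The sketch below illustrates only that selection rule; `Granularity`, `select_granularity`, and the fallback to the highest-quality option when nothing passes the gate are hypothetical and not taken from the Agent Capsules paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Granularity:
    name: str           # e.g. "fused" (one call) vs. "per-step" (one call per stage); illustrative
    mean_tokens: float  # empirically measured token cost per pipeline run
    quality: float      # empirically measured quality score in [0, 1]


def select_granularity(options: list[Granularity], quality_floor: float) -> Granularity:
    """Cheapest option meeting the quality floor; otherwise the best-quality option."""
    if not options:
        raise ValueError("no granularity options supplied")
    passing = [o for o in options if o.quality >= quality_floor]
    if passing:
        return min(passing, key=lambda o: o.mean_tokens)
    # Nothing passes the gate: prefer quality over cost (an assumed fallback policy).
    return max(options, key=lambda o: o.quality)


def token_reduction(chosen: Granularity, baseline: Granularity) -> float:
    """Fractional token saving of the chosen option relative to a baseline."""
    return 1.0 - chosen.mean_tokens / baseline.mean_tokens
```

A reported figure such as 51% corresponds to `token_reduction` returning 0.51 against a hand-built baseline; the quality measurements themselves would have to come from evaluation runs.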

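RAG-Gym (2502.13957): A comprehensive platform for systematically optimizing RAG language agents along three dimensions: prompt engineering, actor tuning, and critic training. It provides reproducible benchmarks and optimization protocols for RAG agent development.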
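As a rough illustration of what a Gym-style interface for a RAG agent can look like (this is not RAG-Gym's API), the sketch below defines a toy episode in which the agent either issues a search query or submits an answer, and is rewarded only for a correct final answer. `ToyRagEnv`, its `("search" | "answer", text)` action format, the keyword retriever, and the exact-match reward are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRagEnv:
    """Minimal Gym-style loop: reset() -> obs, step(action) -> (obs, reward, done)."""
    corpus: dict[str, str]  # doc_id -> text (toy in-memory corpus)
    question: str
    gold_answer: str
    max_steps: int = 4
    _retrieved: list[str] = field(default_factory=list, init=False)
    _steps: int = field(default=0, init=False)

    def reset(self) -> dict:
        self._retrieved, self._steps = [], 0
        return {"question": self.question, "retrieved": []}

    def _search(self, query: str) -> list[str]:
        # Naive retriever: documents sharing any lowercase token with the query.
        terms = set(query.lower().split())
        return [t for t in self.corpus.values() if terms & set(t.lower().split())]

    def step(self, action: tuple[str, str]) -> tuple[dict, float, bool]:
        kind, payload = action
        self._steps += 1
        if kind == "search":
            self._retrieved.extend(self._search(payload))
            obs = {"question": self.question, "retrieved": list(self._retrieved)}
            return obs, 0.0, self._steps >= self.max_steps
        if kind == "answer":
            # Exact-match reward on the final answer, case-insensitive.
            reward = 1.0 if payload.strip().lower() == self.gold_answer.lower() else 0.0
            obs = {"question": self.question, "retrieved": list(self._retrieved)}
            return obs, reward, True
        raise ValueError(f"unknown action kind: {kind}")
```

An actor policy would choose these actions, and a critic could be trained to score intermediate search steps; tuning those two components corresponds to the actor-tuning and critic-training dimensions named above.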

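12 Angry AI Agents (2605.01986): Evaluates multi-agent LLM decision-making through a cinematic jury-deliberation scenario, testing collaboration, consensus formation, and deliberative reasoning when the evidence conflicts.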
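A deliberation evaluation needs a stopping rule. A minimal sketch, assuming a unanimity requirement and a fixed round limit (both assumptions, not details from 2605.01986): each juror is a callable that sees the previous round's votes and returns its own, and the loop reports whether and when consensus formed. In a real harness each juror would wrap an LLM call; `deliberate` and `majority_share` are hypothetical names.

```python
from collections import Counter
from typing import Callable, Optional

# A juror maps the previous round's votes (empty in round 1) to a new vote.
Juror = Callable[[list[str]], str]


def deliberate(jurors: list[Juror], max_rounds: int = 5) -> tuple[Optional[str], int, list[list[str]]]:
    """Run voting rounds until every juror agrees or max_rounds is reached.

    Returns (verdict, or None for a hung jury; rounds used; per-round vote history).
    """
    history: list[list[str]] = []
    previous: list[str] = []
    for round_no in range(1, max_rounds + 1):
        votes = [juror(previous) for juror in jurors]
        history.append(votes)
        if len(set(votes)) == 1:
            return votes[0], round_no, history
        previous = votes
    return None, max_rounds, history


def majority_share(votes: list[str]) -> float:
    """Fraction of jurors backing the most common vote in one round."""
    return Counter(votes).most_common(1)[0][1] / len(votes)
```

Tracking `majority_share` per round gives a simple view of how quickly (or whether) agents converge.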

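Week-over-Week Summary

| Metric | This week | Last week | Change |
| --- | --- | --- | --- |
| Total papers (cs.AI) | 98 | 30 | +227% |
| Agent-related papers | 30 | 25 | +20% |
| Multi-agent papers | 15 | 15 | 0% |
| RAG-related papers | 12 | 8 | +50% |
| High-impact papers (trend score 7+) | 10 | 8 | +25% |
| Reasoning-focused papers | 8 | 10 | -20% |
| Tool-use papers | 4 | 4 | 0% |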

Previous snapshot: arxiv-cs-ai-weekly-20260430
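The change column is the relative week-over-week difference, (this week minus last week) divided by last week, rounded to a whole percent. The small helper below reproduces it; `pct_change` is an illustrative name, and the check values come from the summary table.

```python
def pct_change(this_week: int, last_week: int) -> str:
    """Week-over-week change formatted like the summary table, e.g. '+227%'."""
    if last_week == 0:
        raise ValueError("change is undefined when last week's count is 0")
    pct = round((this_week - last_week) / last_week * 100)
    if pct == 0:
        return "0%"
    return f"{pct:+d}%"
```

For example, the total-paper row is (98 - 30) / 30 = 2.267, which rounds to +227%.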

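Ecosystem Metrics

Category Distribution

| Category | Papers | Share |
| --- | --- | --- |
| cs.AI (Artificial Intelligence) | 45 | 45.9% |
| cs.CL (Computation and Language) | 35 | 35.7% |
| cs.MA (Multiagent Systems) | 8 | 8.2% |
| cs.LG (Machine Learning) | 5 | 5.1% |
| Other | 5 | 5.1% |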
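The share column divides each category's count by the 98-paper total, which means each paper is counted under exactly one category. Many papers in the full list are cross-listed, so a primary-category rule is implied; the source does not state it. A minimal sketch of the share computation, with `category_shares` as an illustrative name:

```python
def category_shares(counts: dict[str, int]) -> dict[str, float]:
    """Each category's share of the total, as a percentage rounded to one decimal."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no papers to distribute")
    return {cat: round(100 * n / total, 1) for cat, n in counts.items()}
```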

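Top Topics This Week

| Topic | Papers | Representative papers |
| --- | --- | --- |
| Multi-agent LLM frameworks | 15 | 2605.01566, 2605.00410, 2605.01986 |
| RAG optimization and evaluation | 12 | 2502.13957, 2605.03534, 2605.03476 |
| Agent reasoning and decision-making | 8 | 2605.01566, 2605.02910, 2605.04018 |
| Tool use and function calling | 4 | 2605.02910, 2401.07324 |
| Autonomous system design | 6 | 2605.01879, 2605.01102, 2605.01293 |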

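Keyword Frequency

| Keyword | Frequency | Week-over-week change |
| --- | --- | --- |
| agent | 28 | +7% |
| multi-agent | 15 | 0% |
| RAG | 12 | +50% |
| reasoning | 8 | -20% |
| autonomous | 6 | +50% |
| LLM | 6 | +20% |
| optimization | 5 | +67% |
| evaluation | 5 | +25% |
| tool-use | 4 | 0% |
| safety | 3 | +50% |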
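Keyword frequencies like these can be produced by counting how many papers mention each keyword. The source does not say whether titles, abstracts, or tags were scanned, so the sketch below assumes per-paper presence in titles, with case-insensitive whole-word matching that also accepts a plural "s". `keyword_frequency` is a hypothetical helper, and its counts will not necessarily match the table.

```python
import re
from collections import Counter


def keyword_frequency(texts: list[str], keywords: list[str]) -> Counter:
    """Count how many texts mention each keyword at least once (case-insensitive)."""
    # Whole-word match with an optional plural "s", so "Agents" counts for "agent".
    patterns = {k: re.compile(rf"\b{re.escape(k)}s?\b", re.IGNORECASE) for k in keywords}
    counts: Counter = Counter()
    for text in texts:
        for keyword, pattern in patterns.items():
            if pattern.search(text):
                counts[keyword] += 1
    return counts
```

Counting presence per paper rather than raw occurrences keeps a single keyword-heavy abstract from inflating the totals.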

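Trends and Observations

Emerging Patterns

  1. Pareto-optimal scaling in multi-agent reasoning: The first papers that explicitly address the compute-efficiency trade-offs of multi-agent test-time scaling have appeared, moving beyond the single-agent optimization paradigm.

  2. Token efficiency becomes a first-class design constraint: Agent Capsules' 51% token reduction signals a shift from capability-driven to efficiency-driven multi-agent pipeline design.

  3. Systematic RAG optimization frameworks: RAG-Gym introduces Gym-style environments for reproducible RAG agent optimization, mirroring the reinforcement learning training paradigm.

  4. GraphRAG integration for hallucination detection: CuraView shows how knowledge graphs can be combined with multi-agent verification to make medical AI more reliable.

  5. Production evaluation frameworks emerge: Several papers address failure modes, drift patterns, and deployment monitoring for agentic AI systems.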
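The knowledge-graph idea in pattern 4 can be illustrated with a simplified check: represent a generated answer as (subject, relation, object) claims and test each one against a graph, flagging unsupported or contradicted claims as possible hallucinations. This is a toy sketch, not CuraView's multi-agent pipeline; `Triple`, `verify_claims`, and the rule that a different object for a known subject and relation counts as a contradiction (which assumes single-valued relations) are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


def _normalize(t: Triple) -> Triple:
    """Lowercase and trim every field so matching ignores case and stray spaces."""
    return Triple(*(part.strip().lower() for part in t))


def verify_claims(claims: list[Triple], graph: set[Triple]) -> dict[str, list[Triple]]:
    """Sort claims into supported / contradicted / unverified against a triple store."""
    kg = {_normalize(t) for t in graph}
    known_keys = {(t.subject, t.relation) for t in kg}
    report: dict[str, list[Triple]] = {"supported": [], "contradicted": [], "unverified": []}
    for claim in claims:
        c = _normalize(claim)
        if c in kg:
            report["supported"].append(claim)
        elif (c.subject, c.relation) in known_keys:
            # Same subject and relation, different object: assumes single-valued relations.
            report["contradicted"].append(claim)
        else:
            report["unverified"].append(claim)
    return report
```

"Unverified" is deliberately kept separate from "contradicted": a missing edge may reflect an incomplete graph rather than a hallucination.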

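Notable Changes from Last Week

  • First Pareto-optimal scaling paper: explicit optimization of multi-agent compute efficiency
  • Quality-gated granularity control: a new paradigm for adaptive multi-agent pipeline execution
  • RAG-related papers up 50% week over week: retrieval and agent architectures continue to converge
  • Token allocation becomes a system design principle: a marginal token allocator framework has been proposed (2605.01214)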
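The "marginal token allocator" framing suggests spending each additional block of tokens where it buys the most expected value. A standard way to do that, assuming each agent or step has a known, non-decreasing value curve, is greedy allocation: repeatedly give the next fixed-size chunk to whichever consumer gains the most from it. When every curve is concave this greedy rule is optimal for the discretized budget. This is a generic sketch, not the formulation in 2605.01214; the value curves, chunk size, and `allocate_tokens` are assumptions.

```python
import heapq
from typing import Callable

# value(tokens) -> expected utility for one consumer; assumed non-decreasing.
ValueCurve = Callable[[int], float]


def allocate_tokens(curves: dict[str, ValueCurve], budget: int, chunk: int = 256) -> dict[str, int]:
    """Greedily hand out the budget in fixed-size chunks to the largest marginal gain."""
    alloc = {name: 0 for name in curves}

    def gain(name: str) -> float:
        cur = alloc[name]
        return curves[name](cur + chunk) - curves[name](cur)

    # heapq is a min-heap, so store negated gains to pop the largest gain first.
    heap = [(-gain(name), name) for name in curves]
    heapq.heapify(heap)
    remaining = budget
    while remaining >= chunk and heap:
        neg_gain, name = heapq.heappop(heap)
        if -neg_gain <= 0:
            break  # no consumer benefits from more tokens; leave the rest unspent
        alloc[name] += chunk
        remaining -= chunk
        # Only this consumer's marginal gain changed, so only it is re-queued.
        heapq.heappush(heap, (-gain(name), name))
    return alloc
```

Note the early exit: a marginal allocator can legitimately leave budget unspent once further tokens stop adding value.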

🔺 Exclusive Intelligence: Insights You Won't Find Elsewhere

Confidence: High | Novelty score: 62/100

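Conventional coverage tracks individual paper releases, but the aggregate signal from this week's 30 agent papers points to a strategic shift in research focus. The multi-agent reasoning paper (2605.01566) shows that multi-agent approaches achieve Pareto-optimal scaling, fundamentally challenging the orthodoxy of scaling a single agent. Agent Capsules' 51% token reduction (2605.00410) shows that efficiency gains can come from architectural optimization rather than from model improvements alone. RAG-Gym (2502.13957) introduces reproducible optimization protocols, addressing the "every RAG system is custom-built" problem that holds back enterprise adoption.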

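Key takeaway: Platform teams should invest in multi-agent orchestration infrastructure now. Single-agent scaling is showing diminishing marginal returns, and a 51% reduction in token usage translates directly into lower costs at production scale. RAG-Gym's systematic approach enables standardized evaluation, shortening the path from prototype to production.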
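The cost argument is simple arithmetic: under flat per-token pricing, a fractional token reduction lowers spend by the same fraction. The sketch below uses hypothetical inputs (monthly token volume and price per million tokens are placeholders, not figures from the source); keep in mind that the 51% figure is measured against hand-built pipelines in one paper and may not transfer to other workloads.

```python
def monthly_savings(tokens_per_month: float, usd_per_million_tokens: float, reduction: float) -> float:
    """Dollar savings from cutting token usage by `reduction` (0..1) at flat per-token pricing."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be a fraction between 0 and 1")
    baseline_cost = tokens_per_month / 1_000_000 * usd_per_million_tokens
    return baseline_cost * reduction
```

For example, a hypothetical workload of 2 billion tokens per month at 5 USD per million tokens costs 10,000 USD, so a 51% reduction saves about 5,100 USD per month. Real pricing often differs for input and output tokens, which this flat model ignores.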

Historical Snapshots

View all historical snapshots: /tech/ai-agents/data/?tracker=arxiv-cs-ai-weekly



Full Paper List (30 papers)

| ArXiv ID | Title | Authors | Categories | Published | Trend score |
| --- | --- | --- | --- | --- | --- |
| 2605.01566 | Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling | Florian Valentin Wunderlich, Lars Benedikt Kaesberg, Jan Philip Wahle, Terry Ruas, Bela Gipp | cs.AI | 2026-05-06 | 9 |
| 2605.00410 | Agent Capsules: Quality-Gated Granularity Control for Multi-Agent LLM Pipelines | Aninda Ray | cs.CL, cs.AI | 2026-05-01 | 8 |
| 2502.13957 | RAG-Gym: Systematic Optimization of Language Agents for Retrieval-Augmented Generation | Guangzhi Xiong, Qiao Jin, Xiao Wang, Yin Fang, Haolin Liu, Yifan Yang, Fangyuan Chen, Zhixing Song, Dengyu Wang, Minjia Zhang, Zhiyong Lu, Aidong Zhang | cs.CL, cs.AI | 2025-02-19 | 8 |
| 2605.01986 | 12 Angry AI Agents: Evaluating Multi-Agent LLM Decision-Making Through Cinematic Jury Deliberation | Ahmet Bahaddin Ersoz | cs.AI | 2026-05-06 | 7 |
| 2605.01604 | Evaluating Agentic AI in the Wild: Failure Modes, Drift Patterns, and a Production Evaluation Framework | Mukund Pandey | cs.AI | 2026-05-06 | 7 |
| 2605.03534 | SURE-RAG: Sufficiency and Uncertainty-Aware Evidence Verification for Selective RAG | Jingxi Qiu, Zeyu Han, Cheng Huang | cs.CL | 2026-05-06 | 7 |
| 2605.03476 | CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification | Severin Ye, Xiao Kong, Xiaopeng He, Guangsu Yan, Dongsuk Oh | cs.CL | 2026-05-06 | 7 |
| 2412.20138 | TradingAgents: Multi-Agents LLM Financial Trading Framework | Yijia Xiao, Edward Sun, Di Luo, Wei Wang | q-fin.TR, cs.AI | 2024-12-28 | 7 |
| 2410.02958 | AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML | Patara Trirat, Wonyong Jeong, Sung Ju Hwang | cs.LG, cs.AI | 2024-10-03 | 7 |
| 2412.12881 | RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement | Jinhao Jiang, Jiayi Chen, Junyi Li, Ruiyang Ren, Shijie Wang, Wayne Xin Zhao, Yang Song, Tao Zhang | cs.CL, cs.AI | 2024-12-17 | 7 |
| 2605.01920 | A Language for Describing Agentic LLM Contexts | Noga Peleg Pelc, Gal A. Kaminka, Yoav Goldberg | cs.AI | 2026-05-06 | 6 |
| 2605.01208 | Faithful Mobile GUI Agents with Guided Advantage Estimator | Haowen Hu, Pengzhou Cheng, Zheng Wu, Lingzhong Dong, Gongshen Liu, Zhuosheng Zhang | cs.AI | 2026-05-06 | 6 |
| 2605.01293 | Lifting Traces to Logic: Programmatic Skill Induction with Neuro-Symbolic Learning for Long-Horizon Agentic Tasks | Jie-Jing Shao, Haiyan Yin, Yueming Lyu, Xingrui Yu, Lan-Zhe Guo, Ivor Tsang, James Kwok, Yu-Feng Li | cs.AI | 2026-05-06 | 6 |
| 2605.01879 | Sheaf-Theoretic Planning: A Categorical Foundation for Resilient Multi-Agent Autonomous Systems | Manuel Hernandez, Eduardo Sanchez-Soto | cs.AI | 2026-05-06 | 6 |
| 2605.02910 | CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing | Cheng Qian, Hyeonjeong Ha, Jiayu Liu, Bingxiang He, Jeonghwan Kim, Jiateng Liu, Bingxuan Li, Aditi Tiwari, Dwip Dalal, Zhenhailong Wang, Xiusi Chen, Mahdi Namazifar, Yunzhu Li, Heng Ji | cs.AI | 2026-05-06 | 6 |
| 2605.02967 | AutoRAGTuner: A Declarative Framework for Automatic Optimization of RAG Pipelines | Xintan Zeng, Yongchao Liu, Yice Luo, Jiajun Zhen | cs.AI | 2026-05-06 | 6 |
| 2605.03228 | MAGE: Safeguarding LLM Agents against Long-Horizon Threats via Shadow Memory | Alexandria K. Vail, Marcelo Cicconet, Katie Aafjes-van Doorn, Ryan Maroney, Marc Aafjes | cs.CL | 2026-05-06 | 6 |
| 2605.04018 | Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems | Yilun Zhao, Jinbiao Wei, Tingyu Song, Siyue Zhang, Chen Zhao, Arman Cohan | cs.CL | 2026-05-06 | 6 |
| 2401.07324 | Small LLMs Are Weak Tool Learners: A Multi-LLM Agent | Weizhou Shen, Chenliang Li, Hongzhan Chen, Ming Yan, Xiaojun Quan, Hehong Chen, Ji Zhang, Fei Huang | cs.AI, cs.CL | 2024-01-14 | 6 |
| 2605.01102 | Towards Multi-Agent Autonomous Reasoning in Hydrodynamics | Jinpai Zhao, Albert Cerrone, Joannes Westerink, Clint Dawson | cs.AI | 2026-05-06 | 5 |
| 2605.01101 | Virtual Speech Therapist: A Clinician-in-the-Loop AI Speech Therapy Agent | Shakeel Sheikh, Patrick Marmaroli, MD Sahidullah, Slim Ouni, Fabrice Hirsch, Goncalo Leal, Bjorn W Schuller | cs.AI | 2026-05-06 | 5 |
| 2605.00841 | AI Agents for Sustainable SMEs: A Green ESG Assessment Framework | Viet Trinh, Tan Nguyen, Minh-Huyen Phan, Quan Luu | cs.AI | 2026-05-06 | 5 |
| 2605.01214 | Agentic AI Systems Should Be Designed as Marginal Token Allocators | Siqi Zhu | cs.AI | 2026-05-06 | 5 |
| 2605.01758 | Catching the Infection Before It Spreads: Foresight-Guided Defense in Multi-Agent Systems | Yue Ma, Ziyuan Yang, Yi Zhang | cs.AI | 2026-05-06 | 5 |
| 2605.01675 | CP-SynC: Multi-Agent Zero-Shot Constraint Modeling in MiniZinc with Synthesized Checkers | Yuliang Song, Eldan Cohen | cs.AI | 2026-05-06 | 5 |
| 2605.03314 | When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning | Jiaqi Wei, Xuehang Guo, Pengfei Yu, Xiang Zhang, Wanli Ouyang, Siqi Sun, Qingyun Wang, Chenyu You | cs.CL | 2026-05-06 | 5 |
| 2605.00846 | ClinicBot: A Guideline-Grounded Clinical Chatbot with Prioritized Evidence RAG and Verifiable Citations | Navapat Nananukul, Mayank Kejriwal | cs.AI | 2026-05-06 | 5 |
| 2605.01789 | DataEvolver: Let Your Data Build and Improve Itself via Goal-Driven Loop Agents | Qisong Zhang, Wenzhuo Wu, Zhuangzhuang Jia, Yunhao Yang, Huayu Zhang, Xianghao Zang, Zhixiang He, Zhongjiang He, Kongming Liang, Zhanyu Ma | cs.AI | 2026-05-06 | 5 |
| 2605.01847 | NeuroState-Bench: A Human-Calibrated Benchmark for Commitment Integrity in LLM Agent Profiles | Jia Xiao | cs.AI | 2026-05-06 | 5 |
