ArXiv cs.AI Weekly — Week of May 1, 2026
98 papers this week, 30 of them agent-related. Multi-Agent Reasoning reports Pareto-optimal test-time scaling; Agent Capsules reduces token usage by 51%; RAG-Gym provides a systematic optimization framework.
Data Overview
- Snapshot Week: 2026-05-01 to 2026-05-07
- Tracker: ArXiv AI Agent Papers (view all snapshots: /tech/ai-agents/data/?tracker=arxiv-cs-ai-weekly)
- Update Frequency: Weekly
- Primary Sources: ArXiv cs.AI RSS, ArXiv cs.CL RSS, ArXiv API
Key Facts
- Who: 98 total papers in cs.AI this week; 30 agent-related submissions across cs.AI, cs.CL, and cs.MA categories
- What: Multi-Agent Reasoning paper reports Pareto-optimal test-time scaling; Agent Capsules achieves 51% token reduction; 10 papers with Trend Score 7+
- When: Week of May 1-7, 2026
- Impact: Multi-agent frameworks dominate with 15 papers; RAG optimization research shows 50% week-over-week growth; token efficiency emerges as key design consideration
Methodology
This tracker monitors ArXiv cs.AI and cs.CL categories for AI agent-related submissions. Papers are filtered for relevance to agents, multi-agent systems, tool-use, reasoning, and RAG (Retrieval-Augmented Generation). Trend scores (1-10) are assigned based on novelty, citation velocity, practical applicability, and relevance to current industry trends. Data is collected via ArXiv RSS feeds and ArXiv API queries. This snapshot covers papers published during the week of May 1-7, 2026.
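As a rough illustration of the collection step described above, the sketch below builds a query URL for the public arXiv API and applies a keyword relevance filter to titles. The endpoint and query parameters are the standard arXiv API ones; the keyword pattern and the function names `build_query_url` and `is_agent_related` are assumptions made for this example, since the tracker's actual filter terms are not stated.

```python
import re
from urllib.parse import urlencode

# Public arXiv API endpoint.
ARXIV_API = "http://export.arxiv.org/api/query"

# Hypothetical relevance pattern; the tracker's real keyword list is not published.
AGENT_PATTERN = re.compile(
    r"\b(agent|agents|agentic|multi-agent|tool[- ]use|reasoning|rag|retrieval-augmented)\b",
    re.IGNORECASE,
)


def build_query_url(categories, max_results=100):
    """Build an arXiv API URL for the newest submissions in the given categories."""
    search = " OR ".join(f"cat:{c}" for c in categories)
    params = {
        "search_query": search,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def is_agent_related(title, abstract=""):
    """Return True if the title or abstract contains one of the keywords as a whole word."""
    return AGENT_PATTERN.search(f"{title} {abstract}") is not None


url = build_query_url(["cs.AI", "cs.CL"])
print(url)
print(is_agent_related("SURE-RAG: Sufficiency and Uncertainty-Aware Evidence Verification for Selective RAG"))
```

Whole-word matching is used so that "RAG" does not match inside words such as "paragraph"; a production filter would likely need a richer rule set than this.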
This Week’s Data
Top 20 Papers by Trend Score
| ArXiv ID | Title | Trend Score | Key Topics |
|---|---|---|---|
| 2605.01566 | Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling | 9 | multi-agent, reasoning, test-time scaling, compute efficiency |
| 2605.00410 | Agent Capsules: Quality-Gated Granularity Control for Multi-Agent LLM Pipelines | 8 | multi-agent, pipeline optimization, token efficiency, quality gating |
| 2502.13957 | RAG-Gym: Systematic Optimization of Language Agents for Retrieval-Augmented Generation | 8 | RAG, agent optimization, systematic framework, language agent |
| 2605.01986 | 12 Angry AI Agents: Evaluating Multi-Agent LLM Decision-Making Through Cinematic Jury Deliberation | 7 | multi-agent, decision-making, evaluation, jury simulation |
| 2605.01604 | Evaluating Agentic AI in the Wild: Failure Modes, Drift Patterns, and a Production Evaluation Framework | 7 | agentic AI, evaluation, failure modes, production deployment |
| 2605.03534 | SURE-RAG: Sufficiency and Uncertainty-Aware Evidence Verification for Selective RAG | 7 | RAG, evidence verification, uncertainty-aware, selective retrieval |
| 2605.03476 | CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification | 7 | multi-agent, RAG, hallucination detection, medical AI, GraphRAG |
| 2412.20138 | TradingAgents: Multi-Agents LLM Financial Trading Framework | 7 | multi-agent, LLM, trading, financial analysis |
| 2410.02958 | AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML | 7 | multi-agent, LLM, AutoML, automation |
| 2412.12881 | RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement | 7 | RAG, reasoning, MCTS, verification, refinement |
| 2605.01920 | A Language for Describing Agentic LLM Contexts | 6 | agentic, LLM, context specification, formal language |
| 2605.01208 | Faithful Mobile GUI Agents with Guided Advantage Estimator | 6 | agent, GUI, mobile, advantage estimator |
| 2605.01293 | Lifting Traces to Logic: Programmatic Skill Induction with Neuro-Symbolic Learning for Long-Horizon Agentic Tasks | 6 | agentic, neuro-symbolic, skill induction, long-horizon tasks |
| 2605.01879 | Sheaf-Theoretic Planning: A Categorical Foundation for Resilient Multi-Agent Autonomous Systems | 6 | multi-agent, autonomous, planning, sheaf theory |
| 2605.02910 | CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing | 6 | agent, reasoning, creativity, tool repurposing |
| 2605.02967 | AutoRAGTuner: A Declarative Framework for Automatic Optimization of RAG Pipelines | 6 | RAG, optimization, pipeline, declarative |
| 2605.03228 | MAGE: Safeguarding LLM Agents against Long-Horizon Threats via Shadow Memory | 6 | agent, LLM, safety, shadow memory, long-horizon threats |
| 2605.04018 | Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems | 6 | agentic, reasoning, retrieval, search systems |
| 2401.07324 | Small LLMs Are Weak Tool Learners: A Multi-LLM Agent | 6 | multi-agent, tool use, LLM, decomposition |
| 2605.01102 | Towards Multi-Agent Autonomous Reasoning in Hydrodynamics | 5 | multi-agent, autonomous, reasoning, hydrodynamics |
Notable Papers Summary
Multi-Agent Reasoning (2605.01566): Reports that multi-agent reasoning reaches Pareto-optimal compute efficiency in test-time scaling, systematically outperforming single-agent approaches. The paper provides empirical evidence that distributing reasoning across multiple agents yields better performance per unit of compute than scaling single-agent inference.
Agent Capsules (2605.00410): Introduces an adaptive execution runtime that treats multi-agent pipeline execution as an optimization problem with empirical quality constraints. Achieves a 51% token reduction compared to hand-crafted implementations while maintaining quality thresholds through dynamic granularity control.
RAG-Gym (2502.13957): A platform for systematic optimization of language agents for RAG across three dimensions: prompt engineering, actor tuning, and critic training. Provides reproducible benchmarks and optimization protocols for RAG agent development.
12 Angry AI Agents (2605.01986): Evaluates multi-agent LLM decision-making through cinematic jury deliberation scenarios, testing collaboration, consensus formation, and deliberative reasoning under conflicting evidence.
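To make the "Pareto-optimal" claim in the Multi-Agent Reasoning summary concrete, the sketch below computes a Pareto front over (compute cost, accuracy) pairs. A configuration is on the front when no other configuration is at least as cheap and at least as accurate, and strictly better on one of the two. The configuration names and numbers are invented for illustration and are not results from the paper.

```python
def pareto_front(points):
    """Return the points that no other point dominates, sorted by cost.

    Each point is (name, compute_cost, accuracy). Point A dominates point B when
    A costs no more and scores no lower, and is strictly better on at least one.
    """
    front = []
    for name, cost, acc in points:
        dominated = any(
            (c <= cost and a >= acc) and (c < cost or a > acc)
            for _, c, a in points
        )
        if not dominated:
            front.append((name, cost, acc))
    return sorted(front, key=lambda p: p[1])


# Made-up numbers for illustration only; NOT taken from 2605.01566.
configs = [
    ("single-agent, 1 sample", 1.0, 0.60),
    ("single-agent, 8 samples", 8.0, 0.70),
    ("multi-agent, 3 agents", 4.0, 0.72),
    ("multi-agent, 6 agents", 9.0, 0.78),
]

front = pareto_front(configs)
for name, cost, acc in front:
    print(f"{name}: cost={cost}, accuracy={acc}")
```

In this toy data the 8-sample single-agent configuration drops off the front because the 3-agent configuration is both cheaper and more accurate.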
Week-over-Week Summary
| Metric | This Week | Last Week | Change |
|---|---|---|---|
| Total papers (cs.AI) | 98 | 30 | +227% |
| Agent-related papers | 30 | 25 | +20% |
| Multi-agent papers | 15 | 15 | 0% |
| RAG-related papers | 12 | 8 | +50% |
| High-impact (Trend Score 7+) | 10 | 8 | +25% |
| Reasoning-focused papers | 8 | 10 | -20% |
| Tool-use papers | 4 | 4 | 0% |
Last week snapshot: arxiv-cs-ai-weekly-20260430
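The Change column in the table above follows the usual percent-change formula, (this week − last week) / last week, rounded to a whole percent. A minimal sketch that reproduces the column from the two count columns (the function name `pct_change` is chosen for this example):

```python
def pct_change(this_week, last_week):
    """Week-over-week change as a whole-number percentage."""
    if last_week == 0:
        raise ValueError("last_week must be non-zero")
    return round((this_week - last_week) / last_week * 100)


# (metric, this week, last week) copied from the Week-over-Week Summary table.
rows = [
    ("Total papers (cs.AI)", 98, 30),
    ("Agent-related papers", 30, 25),
    ("Multi-agent papers", 15, 15),
    ("RAG-related papers", 12, 8),
    ("High-impact (Trend Score 7+)", 10, 8),
    ("Reasoning-focused papers", 8, 10),
    ("Tool-use papers", 4, 4),
]

changes = {metric: pct_change(now, prev) for metric, now, prev in rows}
for metric, value in changes.items():
    print(f"{metric}: {value:+d}%")
```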
Ecosystem Metrics
Category Distribution
| Category | Paper Count | Percentage |
|---|---|---|
| cs.AI (Artificial Intelligence) | 45 | 45.9% |
| cs.CL (Computation and Language) | 35 | 35.7% |
| cs.MA (Multiagent Systems) | 8 | 8.2% |
| cs.LG (Machine Learning) | 5 | 5.1% |
| Other | 5 | 5.1% |
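The Percentage column is each category's count divided by the 98-paper total, rounded to one decimal place. A short check using the counts from the table:

```python
# Paper counts copied from the Category Distribution table.
counts = {"cs.AI": 45, "cs.CL": 35, "cs.MA": 8, "cs.LG": 5, "Other": 5}

total = sum(counts.values())
shares = {cat: round(n / total * 100, 1) for cat, n in counts.items()}

print(total)
for cat, share in shares.items():
    print(f"{cat}: {share}%")
```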
Trending Topics This Week
| Topic | Paper Count | Notable Papers |
|---|---|---|
| Multi-agent LLM frameworks | 15 | 2605.01566, 2605.00410, 2605.01986 |
| RAG optimization and evaluation | 12 | 2502.13957, 2605.03534, 2605.03476 |
| Agent reasoning and decision-making | 8 | 2605.01566, 2605.02910, 2605.04018 |
| Tool use and function calling | 4 | 2605.02910, 2401.07324 |
| Autonomous system design | 6 | 2605.01879, 2605.01102, 2605.01293 |
Keyword Frequency
| Keyword | Frequency | Week-over-Week Change |
|---|---|---|
| agent | 28 | +7% |
| multi-agent | 15 | 0% |
| RAG | 12 | +50% |
| reasoning | 8 | -20% |
| autonomous | 6 | +50% |
| LLM | 6 | +20% |
| optimization | 5 | +67% |
| evaluation | 5 | +25% |
| tool-use | 4 | 0% |
| safety | 3 | +50% |
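How the tracker counts keywords is not stated. One plausible approach, sketched here as an assumption, is to split each paper's Key Topics field on commas and tally the tags. The sample rows are the first three entries of the Top 20 table, so the tallies below cover only those three papers and will not match the full-week frequencies.

```python
from collections import Counter

# Key Topics strings for the first three rows of the Top 20 table.
key_topics = [
    "multi-agent, reasoning, test-time scaling, compute efficiency",
    "multi-agent, pipeline optimization, token efficiency, quality gating",
    "RAG, agent optimization, systematic framework, language agent",
]

# Normalize to lowercase and strip whitespace so "RAG" and "rag" count together.
tags = Counter(
    tag.strip().lower()
    for topics in key_topics
    for tag in topics.split(",")
)

print(tags.most_common(3))
```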
Trends & Observations
Emergent Patterns
- Pareto-optimal scaling in multi-agent reasoning: First papers explicitly addressing compute efficiency tradeoffs in multi-agent test-time scaling, moving beyond single-agent optimization paradigms.
- Token efficiency as a first-class design constraint: Agent Capsules’ 51% token reduction signals a shift from capability-focused to efficiency-focused multi-agent pipeline design.
- Systematic RAG optimization frameworks: RAG-Gym introduces Gym-style environments for reproducible RAG agent optimization, similar to RL training paradigms.
- GraphRAG integration for hallucination detection: CuraView demonstrates combining knowledge graphs with multi-agent verification for medical AI reliability.
- Production evaluation frameworks emerging: Multiple papers address failure modes, drift patterns, and deployment monitoring for agentic AI systems.
Notable Changes from Last Week
- First week with Pareto-optimal scaling papers — Explicit multi-agent compute efficiency optimization
- Quality-gated granularity control — New paradigm for adaptive multi-agent pipeline execution
- +50% RAG-related papers — Continued convergence of retrieval and agent architectures
- Token allocation as system design principle — Marginal token allocator frameworks proposed
🔺 Scout Intel: What Others Missed
Confidence: high | Novelty Score: 62/100
Standard coverage tracks individual paper releases, but the convergent signal across this week’s 30 agent papers points to a shift in research priorities. Multi-Agent Reasoning (2605.01566) reports that multi-agent approaches achieve Pareto-optimal scaling, which challenges the assumption that scaling a single agent is the most compute-efficient path. Agent Capsules’ 51% token reduction (2605.00410) suggests that efficiency gains can come from architecture, not just model improvement. RAG-Gym (2502.13957) introduces reproducible optimization protocols, addressing the “each RAG system is bespoke” problem that has hindered enterprise adoption.
Key Implication: Platform teams should consider investing in multi-agent orchestration infrastructure now. If these results hold, single-agent scaling offers diminishing returns, and a 51% token reduction lowers token spend roughly in proportion at production scale. RAG-Gym’s systematic approach enables standardized evaluation, which can shorten the path from prototype to production.
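As a back-of-the-envelope view of the cost argument, the sketch below applies the reported 51% token reduction to a monthly token bill. The token volume and the per-million-token price are hypothetical placeholders, and the calculation assumes a single blended price with the reduction applying uniformly across all tokens, which may not hold for a real workload.

```python
def monthly_token_cost(tokens_per_month, usd_per_million_tokens):
    """Cost in USD for a monthly token volume at a flat blended price."""
    return tokens_per_month / 1_000_000 * usd_per_million_tokens


TOKEN_REDUCTION = 0.51           # reported for Agent Capsules (2605.00410)
BASELINE_TOKENS = 2_000_000_000  # hypothetical monthly volume
PRICE_PER_MILLION = 3.00         # hypothetical blended USD price

baseline = monthly_token_cost(BASELINE_TOKENS, PRICE_PER_MILLION)
reduced = monthly_token_cost(BASELINE_TOKENS * (1 - TOKEN_REDUCTION), PRICE_PER_MILLION)
savings = baseline - reduced

print(f"baseline ${baseline:,.0f}, reduced ${reduced:,.0f}, savings ${savings:,.0f}")
```

With these placeholder inputs the saving is 51% of the baseline bill; the absolute figure scales linearly with whatever volume and price apply in practice.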
Previous Snapshots
- Week of Apr 23-30, 2026 — ClawNet introduces cross-user agent collaboration; HERA achieves 38.69% improvement
- Week of Apr 16-23, 2026 — Actor-Observer Asymmetry research emerges; benchmark papers surge 133%
- Week of Apr 9-16, 2026 — Earlier snapshot data
View all historical snapshots: /tech/ai-agents/data/?tracker=arxiv-cs-ai-weekly
Sources
- ArXiv cs.AI RSS Feed — ArXiv, May 2026
- ArXiv cs.CL RSS Feed — ArXiv, May 2026
- ArXiv API - AI Agent Papers — ArXiv, May 2026
Complete Paper List (29 of 30 papers listed)
| ArXiv ID | Title | Authors | Category | Published | Trend Score |
|---|---|---|---|---|---|
| 2605.01566 | Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling | Florian Valentin Wunderlich, Lars Benedikt Kaesberg, Jan Philip Wahle, Terry Ruas, Bela Gipp | cs.AI | 2026-05-06 | 9 |
| 2605.00410 | Agent Capsules: Quality-Gated Granularity Control for Multi-Agent LLM Pipelines | Aninda Ray | cs.CL, cs.AI | 2026-05-01 | 8 |
| 2502.13957 | RAG-Gym: Systematic Optimization of Language Agents for Retrieval-Augmented Generation | Guangzhi Xiong, Qiao Jin, Xiao Wang, Yin Fang, Haolin Liu, Yifan Yang, Fangyuan Chen, Zhixing Song, Dengyu Wang, Minjia Zhang, Zhiyong Lu, Aidong Zhang | cs.CL, cs.AI | 2025-02-19 | 8 |
| 2605.01986 | 12 Angry AI Agents: Evaluating Multi-Agent LLM Decision-Making Through Cinematic Jury Deliberation | Ahmet Bahaddin Ersoz | cs.AI | 2026-05-06 | 7 |
| 2605.01604 | Evaluating Agentic AI in the Wild: Failure Modes, Drift Patterns, and a Production Evaluation Framework | Mukund Pandey | cs.AI | 2026-05-06 | 7 |
| 2605.03534 | SURE-RAG: Sufficiency and Uncertainty-Aware Evidence Verification for Selective RAG | Jingxi Qiu, Zeyu Han, Cheng Huang | cs.CL | 2026-05-06 | 7 |
| 2605.03476 | CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification | Severin Ye, Xiao Kong, Xiaopeng He, Guangsu Yan, Dongsuk Oh | cs.CL | 2026-05-06 | 7 |
| 2412.20138 | TradingAgents: Multi-Agents LLM Financial Trading Framework | Yijia Xiao, Edward Sun, Di Luo, Wei Wang | q-fin.TR, cs.AI | 2024-12-28 | 7 |
| 2410.02958 | AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML | Patara Trirat, Wonyong Jeong, Sung Ju Hwang | cs.LG, cs.AI | 2024-10-03 | 7 |
| 2412.12881 | RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement | Jinhao Jiang, Jiayi Chen, Junyi Li, Ruiyang Ren, Shijie Wang, Wayne Xin Zhao, Yang Song, Tao Zhang | cs.CL, cs.AI | 2024-12-17 | 7 |
| 2605.01920 | A Language for Describing Agentic LLM Contexts | Noga Peleg Pelc, Gal A. Kaminka, Yoav Goldberg | cs.AI | 2026-05-06 | 6 |
| 2605.01208 | Faithful Mobile GUI Agents with Guided Advantage Estimator | Haowen Hu, Pengzhou Cheng, Zheng Wu, Lingzhong Dong, Gongshen Liu, Zhuosheng Zhang | cs.AI | 2026-05-06 | 6 |
| 2605.01293 | Lifting Traces to Logic: Programmatic Skill Induction with Neuro-Symbolic Learning for Long-Horizon Agentic Tasks | Jie-Jing Shao, Haiyan Yin, Yueming Lyu, Xingrui Yu, Lan-Zhe Guo, Ivor Tsang, James Kwok, Yu-Feng Li | cs.AI | 2026-05-06 | 6 |
| 2605.01879 | Sheaf-Theoretic Planning: A Categorical Foundation for Resilient Multi-Agent Autonomous Systems | Manuel Hernandez, Eduardo Sanchez-Soto | cs.AI | 2026-05-06 | 6 |
| 2605.02910 | CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing | Cheng Qian, Hyeonjeong Ha, Jiayu Liu, Bingxiang He, Jeonghwan Kim, Jiateng Liu, Bingxuan Li, Aditi Tiwari, Dwip Dalal, Zhenhailong Wang, Xiusi Chen, Mahdi Namazifar, Yunzhu Li, Heng Ji | cs.AI | 2026-05-06 | 6 |
| 2605.02967 | AutoRAGTuner: A Declarative Framework for Automatic Optimization of RAG Pipelines | Xintan Zeng, Yongchao Liu, Yice Luo, Jiajun Zhen | cs.AI | 2026-05-06 | 6 |
| 2605.03228 | MAGE: Safeguarding LLM Agents against Long-Horizon Threats via Shadow Memory | Alexandria K. Vail, Marcelo Cicconet, Katie Aafjes-van Doorn, Ryan Maroney, Marc Aafjes | cs.CL | 2026-05-06 | 6 |
| 2605.04018 | Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems | Yilun Zhao, Jinbiao Wei, Tingyu Song, Siyue Zhang, Chen Zhao, Arman Cohan | cs.CL | 2026-05-06 | 6 |
| 2401.07324 | Small LLMs Are Weak Tool Learners: A Multi-LLM Agent | Weizhou Shen, Chenliang Li, Hongzhan Chen, Ming Yan, Xiaojun Quan, Hehong Chen, Ji Zhang, Fei Huang | cs.AI, cs.CL | 2024-01-14 | 6 |
| 2605.01102 | Towards Multi-Agent Autonomous Reasoning in Hydrodynamics | Jinpai Zhao, Albert Cerrone, Joannes Westerink, Clint Dawson | cs.AI | 2026-05-06 | 5 |
| 2605.01101 | Virtual Speech Therapist: A Clinician-in-the-Loop AI Speech Therapy Agent | Shakeel Sheikh, Patrick Marmaroli, MD Sahidullah, Slim Ouni, Fabrice Hirsch, Goncalo Leal, Bjorn W Schuller | cs.AI | 2026-05-06 | 5 |
| 2605.00841 | AI Agents for Sustainable SMEs: A Green ESG Assessment Framework | Viet Trinh, Tan Nguyen, Minh-Huyen Phan, Quan Luu | cs.AI | 2026-05-06 | 5 |
| 2605.01214 | Agentic AI Systems Should Be Designed as Marginal Token Allocators | Siqi Zhu | cs.AI | 2026-05-06 | 5 |
| 2605.01758 | Catching the Infection Before It Spreads: Foresight-Guided Defense in Multi-Agent Systems | Yue Ma, Ziyuan Yang, Yi Zhang | cs.AI | 2026-05-06 | 5 |
| 2605.01675 | CP-SynC: Multi-Agent Zero-Shot Constraint Modeling in MiniZinc with Synthesized Checkers | Yuliang Song, Eldan Cohen | cs.AI | 2026-05-06 | 5 |
| 2605.03314 | When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning | Jiaqi Wei, Xuehang Guo, Pengfei Yu, Xiang Zhang, Wanli Ouyang, Siqi Sun, Qingyun Wang, Chenyu You | cs.CL | 2026-05-06 | 5 |
| 2605.00846 | ClinicBot: A Guideline-Grounded Clinical Chatbot with Prioritized Evidence RAG and Verifiable Citations | Navapat Nananukul, Mayank Kejriwal | cs.AI | 2026-05-06 | 5 |
| 2605.01789 | DataEvolver: Let Your Data Build and Improve Itself via Goal-Driven Loop Agents | Qisong Zhang, Wenzhuo Wu, Zhuangzhuang Jia, Yunhao Yang, Huayu Zhang, Xianghao Zang, Zhixiang He, Zhongjiang He, Kongming Liang, Zhanyu Ma | cs.AI | 2026-05-06 | 5 |
| 2605.01847 | NeuroState-Bench: A Human-Calibrated Benchmark for Commitment Integrity in LLM Agent Profiles | Jia Xiao | cs.AI | 2026-05-06 | 5 |