ArXiv cs.AI Weekly — Week of May 1, 2026

Name: ArXiv cs.AI Weekly — Week of May 1, 2026
Creator: AgentScout
Published: 2026-05-07T00:00:00.000Z
Keywords: arxiv, ai-agents, multi-agent, rag, reasoning, llm

98 papers this week with 30 agent-related submissions. Multi-Agent Reasoning achieves Pareto-optimal test-time scaling; Agent Capsules reduces token usage by 51%; RAG-Gym provides systematic optimization framework.

AgentScout · Published May 7, 2026 · Updated May 7, 2026 · 10 min read

#arxiv #ai-agents #multi-agent #rag #reasoning #llm

Analyzing Data Nodes...

SIG_CONF:CALCULATING

Verified Sources

Data Overview

Snapshot Week: 2026-05-01 to 2026-05-07
Tracker: ArXiv AI Agent Papers (view all snapshots: /tech/ai-agents/data/?tracker=arxiv-cs-ai-weekly)
Update Frequency: Weekly
Primary Sources: ArXiv cs.AI RSS, ArXiv cs.CL RSS, ArXiv API

Key Facts

Who: 98 total papers in cs.AI this week; 30 agent-related submissions across cs.AI, cs.CL, and cs.MA categories
What: Multi-Agent Reasoning paper introduces Pareto-optimal test-time scaling; Agent Capsules achieves 51% token reduction; 5 papers with Trend Score 7+
When: Week of May 1-7, 2026
Impact: Multi-agent frameworks dominate with 15 papers; RAG optimization research shows 50% week-over-week growth; token efficiency emerges as key design consideration

Methodology

This tracker monitors ArXiv cs.AI and cs.CL categories for AI agent-related submissions. Papers are filtered for relevance to agents, multi-agent systems, tool-use, reasoning, and RAG (Retrieval-Augmented Generation). Trend scores (1-10) are assigned based on novelty, citation velocity, practical applicability, and relevance to current industry trends. Data is collected via ArXiv RSS feeds and ArXiv API queries. This snapshot covers papers published during the week of May 1-7, 2026.

This Week’s Data

Top 20 Papers by Trend Score

ArXiv ID	Title	Trend Score	Key Topics
2605.01566	Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling	9	multi-agent, reasoning, test-time scaling, compute efficiency
2605.00410	Agent Capsules: Quality-Gated Granularity Control for Multi-Agent LLM Pipelines	8	multi-agent, pipeline optimization, token efficiency, quality gating
2502.13957	RAG-Gym: Systematic Optimization of Language Agents for Retrieval-Augmented Generation	8	RAG, agent optimization, systematic framework, language agent
2605.01986	12 Angry AI Agents: Evaluating Multi-Agent LLM Decision-Making Through Cinematic Jury Deliberation	7	multi-agent, decision-making, evaluation, jury simulation
2605.01604	Evaluating Agentic AI in the Wild: Failure Modes, Drift Patterns, and a Production Evaluation Framework	7	agentic AI, evaluation, failure modes, production deployment
2605.03534	SURE-RAG: Sufficiency and Uncertainty-Aware Evidence Verification for Selective RAG	7	RAG, evidence verification, uncertainty-aware, selective retrieval
2605.03476	CuraView: Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification	7	multi-agent, RAG, hallucination detection, medical AI, GraphRAG
2412.20138	TradingAgents: Multi-Agents LLM Financial Trading Framework	7	multi-agent, LLM, trading, financial analysis
2410.02958	AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML	7	multi-agent, LLM, AutoML, automation
2412.12881	RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement	7	RAG, reasoning, MCTS, verification, refinement
2605.01920	A Language for Describing Agentic LLM Contexts	6	agentic, LLM, context specification, formal language
2605.01208	Faithful Mobile GUI Agents with Guided Advantage Estimator	6	agent, GUI, mobile, advantage estimator
2605.01293	Lifting Traces to Logic: Programmatic Skill Induction with Neuro-Symbolic Learning	6	agentic, neuro-symbolic, skill induction, long-horizon tasks
2605.01879	Sheaf-Theoretic Planning: A Categorical Foundation for Resilient Multi-Agent Autonomous Systems	6	multi-agent, autonomous, planning, sheaf theory
2605.02910	CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing	6	agent, reasoning, creativity, tool repurposing
2605.02967	AutoRAGTuner: A Declarative Framework for Automatic Optimization of RAG Pipelines	6	RAG, optimization, pipeline, declarative
2605.03228	MAGE: Safeguarding LLM Agents against Long-Horizon Threats via Shadow Memory	6	agent, LLM, safety, shadow memory, long-horizon threats
2605.04018	Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems	6	agentic, reasoning, retrieval, search systems
2401.07324	Small LLMs Are Weak Tool Learners: A Multi-LLM Agent	6	multi-agent, tool use, LLM, decomposition
2605.01102	Towards Multi-Agent Autonomous Reasoning in Hydrodynamics	5	multi-agent, autonomous, reasoning, hydrodynamics

Notable Papers Summary

Multi-Agent Reasoning (2605.01566) — Demonstrates that multi-agent reasoning achieves Pareto-optimal compute efficiency in test-time scaling, systematically outperforming single-agent approaches. The paper provides empirical evidence that distributing reasoning across multiple agents yields better performance per compute unit than scaling single-agent inference.

Agent Capsules (2605.00410) — Introduces adaptive execution runtime treating multi-agent pipeline execution as an optimization problem with empirical quality constraints. Achieves 51% token reduction compared to hand-crafted implementations while maintaining quality thresholds through dynamic granularity control.

RAG-Gym (2502.13957) — Comprehensive platform for systematic optimization of language agents for RAG across three dimensions: prompt engineering, actor tuning, and critic training. Provides reproducible benchmarks and optimization protocols for RAG agent development.

12 Angry AI Agents (2605.01986) — Evaluates multi-agent LLM decision-making capabilities through cinematic jury deliberation scenarios, testing collaboration, consensus formation, and deliberative reasoning under conflicting evidence conditions.

Week-over-Week Summary

Metric	This Week	Last Week	Change
Total papers (cs.AI)	98	30	+227%
Agent-related papers	30	25	+20%
Multi-agent papers	15	15	0%
RAG-related papers	12	8	+50%
High-impact (Trend Score 7+)	10	8	+25%
Reasoning-focused papers	8	10	-20%
Tool-use papers	4	4	0%

Last week snapshot: arxiv-cs-ai-weekly-20260430

Ecosystem Metrics

Category Distribution

Category	Paper Count	Percentage
cs.AI (Artificial Intelligence)	45	45.9%
cs.CL (Computation and Language)	35	35.7%
cs.MA (Multiagent Systems)	8	8.2%
cs.LG (Machine Learning)	5	5.1%
Other	5	5.1%

Topic	Paper Count	Notable Papers
Multi-agent LLM frameworks	15	2605.01566, 2605.00410, 2605.01986
RAG optimization and evaluation	12	2502.13957, 2605.03534, 2605.03476
Agent reasoning and decision-making	8	2605.01566, 2605.02910, 2605.04018
Tool use and function calling	4	2605.02910, 2401.07324
Autonomous system design	6	2605.01879, 2605.01102, 2605.01293

Keyword Frequency

Keyword	Frequency	Week-over-Week Change
agent	28	+7%
multi-agent	15	0%
RAG	12	+50%
reasoning	8	-20%
autonomous	6	+50%
LLM	6	+20%
optimization	5	+67%
evaluation	5	+25%
tool-use	4	0%
safety	3	+50%

Trends & Observations

Emergent Patterns

Pareto-optimal scaling in multi-agent reasoning — First papers explicitly addressing compute efficiency tradeoffs in multi-agent test-time scaling, moving beyond single-agent optimization paradigms.
Token efficiency as first-class design constraint — Agent Capsules’ 51% token reduction signals shift from capability-focused to efficiency-focused multi-agent pipeline design.
Systematic RAG optimization frameworks — RAG-Gym introduces Gym-style environments for reproducible RAG agent optimization, similar to RL training paradigms.
GraphRAG integration for hallucination detection — CuraView demonstrates combining knowledge graphs with multi-agent verification for medical AI reliability.
Production evaluation frameworks emerging — Multiple papers address failure modes, drift patterns, and deployment monitoring for agentic AI systems.

Notable Changes from Last Week

First week with Pareto-optimal scaling papers — Explicit multi-agent compute efficiency optimization
Quality-gated granularity control — New paradigm for adaptive multi-agent pipeline execution
+50% RAG-related papers — Continued convergence of retrieval and agent architectures
Token allocation as system design principle — Marginal token allocator frameworks proposed

🔺 Scout Intel: What Others Missed

Confidence: high | Novelty Score: 62/100

Standard coverage tracks individual paper releases, but the convergent signal across this week’s 30 agent papers reveals a strategic pivot in research priorities. Multi-Agent Reasoning (2605.01566) demonstrates that multi-agent approaches achieve Pareto-optimal scaling, fundamentally challenging the single-agent scaling orthodoxy. Agent Capsules’ 51% token reduction (2605.00410) validates that efficiency gains come from architecture, not just model improvement. RAG-Gym (2502.13957) introduces reproducible optimization protocols, addressing the “each RAG system is bespoke” problem that has hindered enterprise adoption.

Key Implication: Platform teams should invest in multi-agent orchestration infrastructure now — single-agent scaling has diminishing returns, and the 51% token efficiency gain translates directly to cost reduction at production scale. RAG-Gym’s systematic approach enables standardized evaluation, accelerating the path from prototype to production.

Previous Snapshots

Week of Apr 23-30, 2026 — ClawNet introduces cross-user agent collaboration; HERA achieves 38.69% improvement
Week of Apr 16-23, 2026 — Actor-Observer Asymmetry research emerges; benchmark papers surge 133%
Week of Apr 9-16, 2026 — Earlier snapshot data

View all historical snapshots: /tech/ai-agents/data/?tracker=arxiv-cs-ai-weekly

Sources

ArXiv cs.AI RSS Feed — ArXiv, May 2026
ArXiv cs.CL RSS Feed — ArXiv, May 2026
ArXiv API - AI Agent Papers — ArXiv, May 2026

Complete Paper List (30 papers)

ArXiv ID	Title	Authors	Category	Published	Trend Score
2605.01566	Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling	Florian Valentin Wunderlich, Lars Benedikt Kaesberg, Jan Philip Wahle, Terry Ruas, Bela Gipp	cs.AI	2026-05-06	9
2605.00410	Agent Capsules: Quality-Gated Granularity Control for Multi-Agent LLM Pipelines	Aninda Ray	cs.CL, cs.AI	2026-05-01	8
2502.13957	RAG-Gym: Systematic Optimization of Language Agents for Retrieval-Augmented Generation	Guangzhi Xiong, Qiao Jin, Xiao Wang, Yin Fang, Haolin Liu, Yifan Yang, Fangyuan Chen, Zhixing Song, Dengyu Wang, Minjia Zhang, Zhiyong Lu, Aidong Zhang	cs.CL, cs.AI	2025-02-19	8
2605.01986	12 Angry AI Agents: Evaluating Multi-Agent LLM Decision-Making Through Cinematic Jury Deliberation	Ahmet Bahaddin Ersoz	cs.AI	2026-05-06	7
2605.01604	Evaluating Agentic AI in the Wild: Failure Modes, Drift Patterns, and a Production Evaluation Framework	Mukund Pandey	cs.AI	2026-05-06	7
2605.03534	SURE-RAG: Sufficiency and Uncertainty-Aware Evidence Verification for Selective RAG	Jingxi Qiu, Zeyu Han, Cheng Huang	cs.CL	2026-05-06	7
2605.03476	CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification	Severin Ye, Xiao Kong, Xiaopeng He, Guangsu Yan, Dongsuk Oh	cs.CL	2026-05-06	7
2412.20138	TradingAgents: Multi-Agents LLM Financial Trading Framework	Yijia Xiao, Edward Sun, Di Luo, Wei Wang	q-fin.TR, cs.AI	2024-12-28	7
2410.02958	AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML	Patara Trirat, Wonyong Jeong, Sung Ju Hwang	cs.LG, cs.AI	2024-10-03	7
2412.12881	RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement	Jinhao Jiang, Jiayi Chen, Junyi Li, Ruiyang Ren, Shijie Wang, Wayne Xin Zhao, Yang Song, Tao Zhang	cs.CL, cs.AI	2024-12-17	7
2605.01920	A Language for Describing Agentic LLM Contexts	Noga Peleg Pelc, Gal A. Kaminka, Yoav Goldberg	cs.AI	2026-05-06	6
2605.01208	Faithful Mobile GUI Agents with Guided Advantage Estimator	Haowen Hu, Pengzhou Cheng, Zheng Wu, Lingzhong Dong, Gongshen Liu, Zhuosheng Zhang	cs.AI	2026-05-06	6
2605.01293	Lifting Traces to Logic: Programmatic Skill Induction with Neuro-Symbolic Learning for Long-Horizon Agentic Tasks	Jie-Jing Shao, Haiyan Yin, Yueming Lyu, Xingrui Yu, Lan-Zhe Guo, Ivor Tsang, James Kwok, Yu-Feng Li	cs.AI	2026-05-06	6
2605.01879	Sheaf-Theoretic Planning: A Categorical Foundation for Resilient Multi-Agent Autonomous Systems	Manuel Hernandez, Eduardo Sanchez-Soto	cs.AI	2026-05-06	6
2605.02910	CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing	Cheng Qian, Hyeonjeong Ha, Jiayu Liu, Bingxiang He, Jeonghwan Kim, Jiateng Liu, Bingxuan Li, Aditi Tiwari, Dwip Dalal, Zhenhailong Wang, Xiusi Chen, Mahdi Namazifar, Yunzhu Li, Heng Ji	cs.AI	2026-05-06	6
2605.02967	AutoRAGTuner: A Declarative Framework for Automatic Optimization of RAG Pipelines	Xintan Zeng, Yongchao Liu, Yice Luo, Jiajun Zhen	cs.AI	2026-05-06	6
2605.03228	MAGE: Safeguarding LLM Agents against Long-Horizon Threats via Shadow Memory	Alexandria K. Vail, Marcelo Cicconet, Katie Aafjes-van Doorn, Ryan Maroney, Marc Aafjes	cs.CL	2026-05-06	6
2605.04018	Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems	Yilun Zhao, Jinbiao Wei, Tingyu Song, Siyue Zhang, Chen Zhao, Arman Cohan	cs.CL	2026-05-06	6
2401.07324	Small LLMs Are Weak Tool Learners: A Multi-LLM Agent	Weizhou Shen, Chenliang Li, Hongzhan Chen, Ming Yan, Xiaojun Quan, Hehong Chen, Ji Zhang, Fei Huang	cs.AI, cs.CL	2024-01-14	6
2605.01102	Towards Multi-Agent Autonomous Reasoning in Hydrodynamics	Jinpai Zhao, Albert Cerrone, Joannes Westerink, Clint Dawson	cs.AI	2026-05-06	5
2605.01101	Virtual Speech Therapist: A Clinician-in-the-Loop AI Speech Therapy Agent	Shakeel Sheikh, Patrick Marmaroli, MD Sahidullah, Slim Ouni, Fabrice Hirsch, Goncalo Leal, Bjorn W Schuller	cs.AI	2026-05-06	5
2605.00841	AI Agents for Sustainable SMEs: A Green ESG Assessment Framework	Viet Trinh, Tan Nguyen, Minh-Huyen Phan, Quan Luu	cs.AI	2026-05-06	5
2605.01214	Agentic AI Systems Should Be Designed as Marginal Token Allocators	Siqi Zhu	cs.AI	2026-05-06	5
2605.01758	Catching the Infection Before It Spreads: Foresight-Guided Defense in Multi-Agent Systems	Yue Ma, Ziyuan Yang, Yi Zhang	cs.AI	2026-05-06	5
2605.01675	CP-SynC: Multi-Agent Zero-Shot Constraint Modeling in MiniZinc with Synthesized Checkers	Yuliang Song, Eldan Cohen	cs.AI	2026-05-06	5
2605.03314	When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning	Jiaqi Wei, Xuehang Guo, Pengfei Yu, Xiang Zhang, Wanli Ouyang, Siqi Sun, Qingyun Wang, Chenyu You	cs.CL	2026-05-06	5
2605.00846	ClinicBot: A Guideline-Grounded Clinical Chatbot with Prioritized Evidence RAG and Verifiable Citations	Navapat Nananukul, Mayank Kejriwal	cs.AI	2026-05-06	5
2605.01789	DataEvolver: Let Your Data Build and Improve Itself via Goal-Driven Loop Agents	Qisong Zhang, Wenzhuo Wu, Zhuangzhuang Jia, Yunhao Yang, Huayu Zhang, Xianghao Zang, Zhixiang He, Zhongjiang He, Kongming Liang, Zhanyu Ma	cs.AI	2026-05-06	5
2605.01847	NeuroState-Bench: A Human-Calibrated Benchmark for Commitment Integrity in LLM Agent Profiles	Jia Xiao	cs.AI	2026-05-06	5

ArXiv cs.AI Weekly — Week of May 1, 2026

AgentScout · Published May 7, 2026 · Updated May 7, 2026 · 10 min read

#arxiv #ai-agents #multi-agent #rag #reasoning #llm

Analyzing Data Nodes...

SIG_CONF:CALCULATING

Verified Sources

Data Overview

Snapshot Week: 2026-05-01 to 2026-05-07
Tracker: ArXiv AI Agent Papers (view all snapshots: /tech/ai-agents/data/?tracker=arxiv-cs-ai-weekly)
Update Frequency: Weekly
Primary Sources: ArXiv cs.AI RSS, ArXiv cs.CL RSS, ArXiv API

Key Facts

Who: 98 total papers in cs.AI this week; 30 agent-related submissions across cs.AI, cs.CL, and cs.MA categories
What: Multi-Agent Reasoning paper introduces Pareto-optimal test-time scaling; Agent Capsules achieves 51% token reduction; 5 papers with Trend Score 7+
When: Week of May 1-7, 2026
Impact: Multi-agent frameworks dominate with 15 papers; RAG optimization research shows 50% week-over-week growth; token efficiency emerges as key design consideration

Methodology

This Week’s Data

Top 20 Papers by Trend Score

ArXiv ID	Title	Trend Score	Key Topics
2605.01566	Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling	9	multi-agent, reasoning, test-time scaling, compute efficiency
2605.00410	Agent Capsules: Quality-Gated Granularity Control for Multi-Agent LLM Pipelines	8	multi-agent, pipeline optimization, token efficiency, quality gating
2502.13957	RAG-Gym: Systematic Optimization of Language Agents for Retrieval-Augmented Generation	8	RAG, agent optimization, systematic framework, language agent
2605.01986	12 Angry AI Agents: Evaluating Multi-Agent LLM Decision-Making Through Cinematic Jury Deliberation	7	multi-agent, decision-making, evaluation, jury simulation
2605.01604	Evaluating Agentic AI in the Wild: Failure Modes, Drift Patterns, and a Production Evaluation Framework	7	agentic AI, evaluation, failure modes, production deployment
2605.03534	SURE-RAG: Sufficiency and Uncertainty-Aware Evidence Verification for Selective RAG	7	RAG, evidence verification, uncertainty-aware, selective retrieval
2605.03476	CuraView: Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification	7	multi-agent, RAG, hallucination detection, medical AI, GraphRAG
2412.20138	TradingAgents: Multi-Agents LLM Financial Trading Framework	7	multi-agent, LLM, trading, financial analysis
2410.02958	AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML	7	multi-agent, LLM, AutoML, automation
2412.12881	RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement	7	RAG, reasoning, MCTS, verification, refinement
2605.01920	A Language for Describing Agentic LLM Contexts	6	agentic, LLM, context specification, formal language
2605.01208	Faithful Mobile GUI Agents with Guided Advantage Estimator	6	agent, GUI, mobile, advantage estimator
2605.01293	Lifting Traces to Logic: Programmatic Skill Induction with Neuro-Symbolic Learning	6	agentic, neuro-symbolic, skill induction, long-horizon tasks
2605.01879	Sheaf-Theoretic Planning: A Categorical Foundation for Resilient Multi-Agent Autonomous Systems	6	multi-agent, autonomous, planning, sheaf theory
2605.02910	CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing	6	agent, reasoning, creativity, tool repurposing
2605.02967	AutoRAGTuner: A Declarative Framework for Automatic Optimization of RAG Pipelines	6	RAG, optimization, pipeline, declarative
2605.03228	MAGE: Safeguarding LLM Agents against Long-Horizon Threats via Shadow Memory	6	agent, LLM, safety, shadow memory, long-horizon threats
2605.04018	Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems	6	agentic, reasoning, retrieval, search systems
2401.07324	Small LLMs Are Weak Tool Learners: A Multi-LLM Agent	6	multi-agent, tool use, LLM, decomposition
2605.01102	Towards Multi-Agent Autonomous Reasoning in Hydrodynamics	5	multi-agent, autonomous, reasoning, hydrodynamics

Notable Papers Summary

Week-over-Week Summary

Metric	This Week	Last Week	Change
Total papers (cs.AI)	98	30	+227%
Agent-related papers	30	25	+20%
Multi-agent papers	15	15	0%
RAG-related papers	12	8	+50%
High-impact (Trend Score 7+)	10	8	+25%
Reasoning-focused papers	8	10	-20%
Tool-use papers	4	4	0%

Last week snapshot: arxiv-cs-ai-weekly-20260430

Ecosystem Metrics

Category Distribution

Category	Paper Count	Percentage
cs.AI (Artificial Intelligence)	45	45.9%
cs.CL (Computation and Language)	35	35.7%
cs.MA (Multiagent Systems)	8	8.2%
cs.LG (Machine Learning)	5	5.1%
Other	5	5.1%

Topic	Paper Count	Notable Papers
Multi-agent LLM frameworks	15	2605.01566, 2605.00410, 2605.01986
RAG optimization and evaluation	12	2502.13957, 2605.03534, 2605.03476
Agent reasoning and decision-making	8	2605.01566, 2605.02910, 2605.04018
Tool use and function calling	4	2605.02910, 2401.07324
Autonomous system design	6	2605.01879, 2605.01102, 2605.01293

Keyword Frequency

Keyword	Frequency	Week-over-Week Change
agent	28	+7%
multi-agent	15	0%
RAG	12	+50%
reasoning	8	-20%
autonomous	6	+50%
LLM	6	+20%
optimization	5	+67%
evaluation	5	+25%
tool-use	4	0%
safety	3	+50%

Trends & Observations

Emergent Patterns

Pareto-optimal scaling in multi-agent reasoning — First papers explicitly addressing compute efficiency tradeoffs in multi-agent test-time scaling, moving beyond single-agent optimization paradigms.
Token efficiency as first-class design constraint — Agent Capsules’ 51% token reduction signals shift from capability-focused to efficiency-focused multi-agent pipeline design.
Systematic RAG optimization frameworks — RAG-Gym introduces Gym-style environments for reproducible RAG agent optimization, similar to RL training paradigms.
GraphRAG integration for hallucination detection — CuraView demonstrates combining knowledge graphs with multi-agent verification for medical AI reliability.
Production evaluation frameworks emerging — Multiple papers address failure modes, drift patterns, and deployment monitoring for agentic AI systems.

Notable Changes from Last Week

First week with Pareto-optimal scaling papers — Explicit multi-agent compute efficiency optimization
Quality-gated granularity control — New paradigm for adaptive multi-agent pipeline execution
+50% RAG-related papers — Continued convergence of retrieval and agent architectures
Token allocation as system design principle — Marginal token allocator frameworks proposed

🔺 Scout Intel: What Others Missed

Confidence: high | Novelty Score: 62/100

Previous Snapshots

Week of Apr 23-30, 2026 — ClawNet introduces cross-user agent collaboration; HERA achieves 38.69% improvement
Week of Apr 16-23, 2026 — Actor-Observer Asymmetry research emerges; benchmark papers surge 133%
Week of Apr 9-16, 2026 — Earlier snapshot data

View all historical snapshots: /tech/ai-agents/data/?tracker=arxiv-cs-ai-weekly

Sources

ArXiv cs.AI RSS Feed — ArXiv, May 2026
ArXiv cs.CL RSS Feed — ArXiv, May 2026
ArXiv API - AI Agent Papers — ArXiv, May 2026

Complete Paper List (30 papers)

ArXiv ID	Title	Authors	Category	Published	Trend Score
2605.01566	Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling	Florian Valentin Wunderlich, Lars Benedikt Kaesberg, Jan Philip Wahle, Terry Ruas, Bela Gipp	cs.AI	2026-05-06	9
2605.00410	Agent Capsules: Quality-Gated Granularity Control for Multi-Agent LLM Pipelines	Aninda Ray	cs.CL, cs.AI	2026-05-01	8
2502.13957	RAG-Gym: Systematic Optimization of Language Agents for Retrieval-Augmented Generation	Guangzhi Xiong, Qiao Jin, Xiao Wang, Yin Fang, Haolin Liu, Yifan Yang, Fangyuan Chen, Zhixing Song, Dengyu Wang, Minjia Zhang, Zhiyong Lu, Aidong Zhang	cs.CL, cs.AI	2025-02-19	8
2605.01986	12 Angry AI Agents: Evaluating Multi-Agent LLM Decision-Making Through Cinematic Jury Deliberation	Ahmet Bahaddin Ersoz	cs.AI	2026-05-06	7
2605.01604	Evaluating Agentic AI in the Wild: Failure Modes, Drift Patterns, and a Production Evaluation Framework	Mukund Pandey	cs.AI	2026-05-06	7
2605.03534	SURE-RAG: Sufficiency and Uncertainty-Aware Evidence Verification for Selective RAG	Jingxi Qiu, Zeyu Han, Cheng Huang	cs.CL	2026-05-06	7
2605.03476	CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification	Severin Ye, Xiao Kong, Xiaopeng He, Guangsu Yan, Dongsuk Oh	cs.CL	2026-05-06	7
2412.20138	TradingAgents: Multi-Agents LLM Financial Trading Framework	Yijia Xiao, Edward Sun, Di Luo, Wei Wang	q-fin.TR, cs.AI	2024-12-28	7
2410.02958	AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML	Patara Trirat, Wonyong Jeong, Sung Ju Hwang	cs.LG, cs.AI	2024-10-03	7
2412.12881	RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement	Jinhao Jiang, Jiayi Chen, Junyi Li, Ruiyang Ren, Shijie Wang, Wayne Xin Zhao, Yang Song, Tao Zhang	cs.CL, cs.AI	2024-12-17	7
2605.01920	A Language for Describing Agentic LLM Contexts	Noga Peleg Pelc, Gal A. Kaminka, Yoav Goldberg	cs.AI	2026-05-06	6
2605.01208	Faithful Mobile GUI Agents with Guided Advantage Estimator	Haowen Hu, Pengzhou Cheng, Zheng Wu, Lingzhong Dong, Gongshen Liu, Zhuosheng Zhang	cs.AI	2026-05-06	6
2605.01293	Lifting Traces to Logic: Programmatic Skill Induction with Neuro-Symbolic Learning for Long-Horizon Agentic Tasks	Jie-Jing Shao, Haiyan Yin, Yueming Lyu, Xingrui Yu, Lan-Zhe Guo, Ivor Tsang, James Kwok, Yu-Feng Li	cs.AI	2026-05-06	6
2605.01879	Sheaf-Theoretic Planning: A Categorical Foundation for Resilient Multi-Agent Autonomous Systems	Manuel Hernandez, Eduardo Sanchez-Soto	cs.AI	2026-05-06	6
2605.02910	CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing	Cheng Qian, Hyeonjeong Ha, Jiayu Liu, Bingxiang He, Jeonghwan Kim, Jiateng Liu, Bingxuan Li, Aditi Tiwari, Dwip Dalal, Zhenhailong Wang, Xiusi Chen, Mahdi Namazifar, Yunzhu Li, Heng Ji	cs.AI	2026-05-06	6
2605.02967	AutoRAGTuner: A Declarative Framework for Automatic Optimization of RAG Pipelines	Xintan Zeng, Yongchao Liu, Yice Luo, Jiajun Zhen	cs.AI	2026-05-06	6
2605.03228	MAGE: Safeguarding LLM Agents against Long-Horizon Threats via Shadow Memory	Alexandria K. Vail, Marcelo Cicconet, Katie Aafjes-van Doorn, Ryan Maroney, Marc Aafjes	cs.CL	2026-05-06	6
2605.04018	Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems	Yilun Zhao, Jinbiao Wei, Tingyu Song, Siyue Zhang, Chen Zhao, Arman Cohan	cs.CL	2026-05-06	6
2401.07324	Small LLMs Are Weak Tool Learners: A Multi-LLM Agent	Weizhou Shen, Chenliang Li, Hongzhan Chen, Ming Yan, Xiaojun Quan, Hehong Chen, Ji Zhang, Fei Huang	cs.AI, cs.CL	2024-01-14	6
2605.01102	Towards Multi-Agent Autonomous Reasoning in Hydrodynamics	Jinpai Zhao, Albert Cerrone, Joannes Westerink, Clint Dawson	cs.AI	2026-05-06	5
2605.01101	Virtual Speech Therapist: A Clinician-in-the-Loop AI Speech Therapy Agent	Shakeel Sheikh, Patrick Marmaroli, MD Sahidullah, Slim Ouni, Fabrice Hirsch, Goncalo Leal, Bjorn W Schuller	cs.AI	2026-05-06	5
2605.00841	AI Agents for Sustainable SMEs: A Green ESG Assessment Framework	Viet Trinh, Tan Nguyen, Minh-Huyen Phan, Quan Luu	cs.AI	2026-05-06	5
2605.01214	Agentic AI Systems Should Be Designed as Marginal Token Allocators	Siqi Zhu	cs.AI	2026-05-06	5
2605.01758	Catching the Infection Before It Spreads: Foresight-Guided Defense in Multi-Agent Systems	Yue Ma, Ziyuan Yang, Yi Zhang	cs.AI	2026-05-06	5
2605.01675	CP-SynC: Multi-Agent Zero-Shot Constraint Modeling in MiniZinc with Synthesized Checkers	Yuliang Song, Eldan Cohen	cs.AI	2026-05-06	5
2605.03314	When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning	Jiaqi Wei, Xuehang Guo, Pengfei Yu, Xiang Zhang, Wanli Ouyang, Siqi Sun, Qingyun Wang, Chenyu You	cs.CL	2026-05-06	5
2605.00846	ClinicBot: A Guideline-Grounded Clinical Chatbot with Prioritized Evidence RAG and Verifiable Citations	Navapat Nananukul, Mayank Kejriwal	cs.AI	2026-05-06	5
2605.01789	DataEvolver: Let Your Data Build and Improve Itself via Goal-Driven Loop Agents	Qisong Zhang, Wenzhuo Wu, Zhuangzhuang Jia, Yunhao Yang, Huayu Zhang, Xianghao Zang, Zhixiang He, Zhongjiang He, Kongming Liang, Zhanyu Ma	cs.AI	2026-05-06	5
2605.01847	NeuroState-Bench: A Human-Calibrated Benchmark for Commitment Integrity in LLM Agent Profiles	Jia Xiao	cs.AI	2026-05-06	5

0zik0ld3z2heg0860asdrgf░░░aa97arp6lf9raohzgyi2mqzbtgqnxuj6░░░re1jf467k5r8xi3afujk6s8co2umx1dxa░░░j28xb2ecf3kuxbbdyqsyf9xeh1ncz2wu████xbz5j07ucrn3kil3c1zq4988x2kqhd2sj░░░et04d2p98opnn4lhlzahgpne4ubo58gn████5du55zdetgepdi54htkat9vflwv9o88u░░░70kt0oxljidw181nbkxt1vwx11kblzog░░░pzvtm4cx15mopq06pv8qbm8y7ufdbi26y████rmwlowhh75nlfjvxmicbu94pfc5i95aru░░░w2vi3xiks08mlz4wp7ox8mb0fow18r19l████4tbhnt1maasalo0en6awdlnn82t5siaon░░░razijhlh9fl99foas4y90imc3xetsqtii████3l8wkggihxc8g67qfqeja35jx3mot61oi░░░bzl3nyuuulrinng3oit3so9nkcdhmmso░░░lmwvdy3ywv3k7qxlboghascj1lcyrrgs░░░1i7my7790rgfwqb0jbb5bourh9k6b1x5p░░░oxyi7d2zecqz2qb7c5eqte9j2izjor░░░3ic80tvo8arvq6ta4v8rbrz1rl5r9rbws░░░jc5otao8updszvkq9qm746zmssft1e5f████swwvsydzmxsl44maqg5129n4kklcr5z5░░░ef0k7y4a014e9kqpvuh844wathpeohan8████lsc6hbiagz4b0ce5austsai72s5rtk6░░░7ql7pv8dqcl1r8k9mggolnmcse4r4pek░░░k9vs578qm161e7btb983xkan8mzmmim████2gfjr72avxcddyqg4080roj8w3rw42njh████zk0tybi8a5nizjjgja87m8zfbrhc1amp░░░jdomts47q2tuz5c9u213eq8776l78fi7x████av7cycaczc0n8rjh0krnhnrgq16dxu4░░░xdzoglyp31ijx5ry6gip88e4ihvonol8░░░9m4y6b52ysw1fm8vp839yoweuejhah8l████slyhw7orl3njdman430f38z2wtdoznbed░░░vt6jdv2947mlvm7cfbbpdl861cutkjbbv████iu89f7oh6jy9rcstb29l0yi84karzukm████2d8hcbndwkfjykz8bb4g1mpq8hjezhqs░░░qf5t7xr4iangf8troo5x5r0jzf8lg79a░░░oe397n0zs8j4i8qv1vi5jtlfbvah6qnpr░░░agry4zft7zrzza9a24yrnjp1lfw6f6cjd████0wp1h0j064ok1ec1jq2hc6ni7oyyg7kkj░░░e55frnocvdhxxjtc92m3zfk8ap8svmke░░░5fggrqcjt9enh3hf7w05q1jr0lmvuakk████5lrvxr8oofo4rrtuqov0n51ay2mv1g3ki████fgzcx6yprnoffwu4vtu37ay78b42twctg████h5wemdu5i4jt1y0ps3wqzbmolanh63d1████ggt1zpf95rf3h4ehjpf6n7ngkxb2cq2q░░░ksvnh91bs0de10e5uaeg9flgx22tdopxs░░░u8r8ptnuzggb2f6pads7gtcaqfpvs3b8d████sjzqbvyzeseu18t2g9g2hqw9tknwwaliq░░░91e0mas14cygtw4s9lxqk7dey9kt0w3t░░░ybmdon56yrgnpjjvo0ykmrqomvrimjml████w9uc7r8c07j

Related Intel

Data Jun 25, 2026

ArXiv cs.AI Weekly Papers Tracker — Week of Jun 25, 2026

ArXiv cs.AI papers for Jun 18-25, 2026: 32 total, 68.8% agent-related (22 papers), avg trend score 9.14. Notable: RIFT-Bench, Metis self-evolving agents, 14 new benchmarks.

#arxiv #cs-ai #agents #benchmarks

Data Jun 23, 2026

LLM Product Release Tracker — Week of Jun 17, 2026

Weekly snapshot of LLM vendor product releases, feature updates, and enterprise announcements. This week: Anthropic Korea expansion, Google TTS streaming.

#llm #product-release #anthropic #google

Data Jun 22, 2026

GitHub AI Agent Repository Stars Tracker — Week of Jun 22, 2026

hermes-agent hits 198,941 stars (+2.82% WoW). Python/TypeScript dominate 77% of top 30. Ecosystem grows to 158 repos.

#github #ai-agents #stars-tracker #open-source

Data Overview

Key Facts

Methodology

This Week’s Data

Top 20 Papers by Trend Score

Notable Papers Summary

Week-over-Week Summary

Ecosystem Metrics

Category Distribution

Trending Topics This Week

Keyword Frequency

Trends & Observations

Emergent Patterns

Notable Changes from Last Week

🔺 Scout Intel: What Others Missed

Previous Snapshots

Sources

Data Overview

Key Facts

Methodology

This Week’s Data

Top 20 Papers by Trend Score

Notable Papers Summary

Week-over-Week Summary

Ecosystem Metrics

Category Distribution

Trending Topics This Week

Keyword Frequency

Trends & Observations

Emergent Patterns

Notable Changes from Last Week

🔺 Scout Intel: What Others Missed

Previous Snapshots

Sources

Related Intel

ArXiv cs.AI Weekly Papers Tracker — Week of Jun 25, 2026

LLM Product Release Tracker — Week of Jun 17, 2026

GitHub AI Agent Repository Stars Tracker — Week of Jun 22, 2026