The AI Agent Research Report, September ’25 Edition

Article

By Stephen Harrison

September 17, 2025

The AI Agent Research Report, September ’25 Edition: The Leap from Language Model to Autonomous Actor by AryaXAI

The past few months, July to September '25, have been transformative for AI agents: systems that combine large language models (LLMs) with planning, reasoning and acting capabilities. Researchers have proposed new frameworks, benchmarks and applications that show how AI agents can autonomously shop online, control mobile interfaces, coordinate multi‑agent simulations and even deliberate on court cases.

In this AI Agent research report, September '25 Edition, we dive deep into the latest and most impactful AI agent research from August and September 2025. Each paper is linked to its source, and we break down the technical contributions, emerging trends and real‑world implications.

For readers hungry for context, check out our related pieces on understanding AI alignment and balancing AI alignment with model performance.

Papers Covered in This Article

  1. A Comprehensive Survey of Self‑Evolving AI Agents
  2. A Comprehensive Review of AI Agents
  3. An Economy of AI Agents
  4. AI Agents and the Law 
  5. What Is Your AI Agent Buying? Agentic E‑Commerce
  6. AI Agents for Web Testing (WebProber)
  7. Incident Analysis for AI Agents
  8. Efficient Agents
  9. KG‑RAG: Enhancing GUI Agent Decision‑Making
  10. MobiAgent: A Systematic Framework for Customizable Mobile Agents
  11. Auras: Boosting Embodied AI Agents
  12. Automatic Differentiation of Agent‑Based Models 
  13. SAMVAD: Simulating Judicial Deliberations
  14. ShortageSim: Simulating Drug Shortages
  15. Synthetic Founders
  16. Nash Q‑Network for Cybersecurity
  17. Controller Synthesis Method for Multi‑Agent Systems
  18. Aspective Agentic AI
  19. Contemporary Agent Technology: LLM‑Driven Advancements vs Classic MAS
  20. Agentic AI for Software: Thoughts from the Software Engineering Community

1. Foundational Surveys and Frameworks

A Comprehensive Survey of Self‑Evolving AI Agents

Self‑evolving agents are systems that automatically upgrade themselves based on interaction data. The authors propose a unified conceptual framework consisting of four components: system inputs, the agent system, the environment and optimisers. They review domain‑specific evolution strategies and also discuss evaluation, safety and ethics, emphasising that self‑evolving agents require careful oversight. This survey provides a roadmap for building adaptive agents that learn and improve after deployment.
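To make the four‑component loop concrete, here is a minimal sketch of how system inputs, an agent, an environment and an optimiser can close the loop. The toy agent, its single tunable threshold and the update rule are our own illustrative inventions, not constructs from the survey.

```python
# Illustrative self-evolution loop: inputs -> agent -> environment feedback
# -> optimiser updates the agent. All names here are hypothetical.

class Agent:
    def __init__(self, threshold=0.5):
        self.threshold = threshold  # the single parameter being evolved

    def act(self, signal):
        return "engage" if signal >= self.threshold else "ignore"

class Environment:
    def feedback(self, signal, action):
        # Reward engaging strong signals (>= 0.6) and ignoring weak ones
        correct = (signal >= 0.6) == (action == "engage")
        return 1.0 if correct else -1.0

def optimiser(agent, history, lr=0.05):
    # Nudge the threshold away from decisions that earned negative feedback
    for signal, action, reward in history:
        if reward < 0:
            agent.threshold += lr if action == "engage" else -lr

agent, env = Agent(), Environment()
signals = [0.1, 0.55, 0.7, 0.3, 0.9, 0.58]   # the "system inputs"
for epoch in range(50):
    history = []
    for s in signals:
        a = agent.act(s)
        history.append((s, a, env.feedback(s, a)))
    optimiser(agent, history)

print(round(agent.threshold, 2))  # settles at the 0.6 decision boundary
```

The point is structural: the agent is only modified by the optimiser, and the optimiser only sees interaction data, which is the separation of concerns the survey's framework formalises.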

A Comprehensive Review of AI Agents

This comprehensive review examines the architectural principles, components and paradigms of AI agents. It describes cognitive models, hierarchical reinforcement learning and LLM‑based reasoning architectures, and highlights ethical, safety and interpretability concerns. For developers designing agentic systems, this review offers a broad view of the field’s current state and emerging challenges.

An Economy of AI Agents

As agents start to act on our behalf, economic and institutional frameworks become critical. This chapter argues that 2025 is “the year of agents” and explores how AI agents will interact with humans and each other to shape markets. It highlights unpredictability due to misalignment and reward specification issues and calls for institutions to manage the behaviour of economic agents. For a deeper look at ethical and economic considerations, see our article on AI alignment vs. performance.

2. Legal, Social and Ethical Implications

AI Agents and the Law

This paper explores socio‑legal issues that arise when AI agents act on behalf of users. It compares technical notions of agency with legal concepts such as loyalty and disclosure, revealing gaps between the two. The authors argue that alignment mechanisms must address these gaps to ensure agents act responsibly. They also warn about agentic e‑commerce (scenarios in which sellers might optimise product descriptions for AI agents rather than for humans) and call for regulation.

What Is Your AI Agent Buying? Agentic E‑Commerce

Using a sandbox environment called ACES, this study combines a vision‑language model agent with a mock marketplace to evaluate how agents shop. It finds strong but heterogeneous position effects: agents heavily favour products at the top of lists, yet exhibit diverse preferences across models. Sellers can even optimise their descriptions to capture agent attention. These results reveal new risks for consumer protection and antitrust law as AI agents start to purchase goods on our behalf.

AI Agents for Web Testing

WebProber is a prototype agent that autonomously explores websites to identify bugs and usability issues. In a case study, the agent discovered 29 issues missed by traditional testing frameworks. By simulating diverse user behaviours and producing detailed reports, WebProber demonstrates how AI agents can improve software quality assurance.

Incident Analysis for AI Agents

Failures in agent systems can be hard to diagnose. This paper proposes an incident analysis framework that categorises factors leading to incidents into system‑related, contextual and cognitive groups and recommends capturing information such as activity logs, system documentation and tool states. The goal is to build a culture of transparency and learning from failures, much like aviation safety.
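A report structured around the paper's three factor groups might look like the sketch below; the field names and the example incident are our own, not the paper's schema.

```python
# Hedged sketch of an incident record with the three factor groups the
# paper describes (system-related, contextual, cognitive) plus the
# evidence it recommends capturing. Field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class IncidentReport:
    summary: str
    system_factors: list = field(default_factory=list)     # e.g. tool misconfiguration
    contextual_factors: list = field(default_factory=list) # e.g. ambiguous user request
    cognitive_factors: list = field(default_factory=list)  # e.g. faulty plan decomposition
    activity_log: list = field(default_factory=list)       # timestamped agent actions
    tool_states: dict = field(default_factory=dict)        # snapshots at failure time

report = IncidentReport(
    summary="Agent deleted the wrong calendar entry",
    contextual_factors=["two events shared the same title"],
    cognitive_factors=["agent skipped a disambiguation step"],
    tool_states={"calendar_api": "authenticated, rate-limited"},
)
print(report.summary)
```

Keeping factors in separate buckets makes post-incident reviews comparable across teams, which is what the aviation-style learning culture depends on.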

3. Efficiency, Performance and Tools

Efficient Agents: Building Effective Agents While Reducing Cost

Although this paper was released in late July, its insights underpin much of the August and September work. The authors systematically study the efficiency‑effectiveness trade‑off in LLM‑driven agents on the GAIA benchmark. They introduce the cost‑of‑pass metric and design Efficient Agents that retain 96.7% of baseline performance while reducing operational cost by 28.4%. With AI services incurring significant compute bills, efficiency techniques are becoming essential.
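One common reading of cost‑of‑pass is the expected spend to obtain a single successful task completion, i.e. per‑run cost divided by success rate; consult the paper for its exact definition. The dollar figures below are hypothetical, chosen only to show how the paper's reported 96.7% retention and 28.4% cost reduction interact.

```python
# cost-of-pass: expected cost to obtain one successful completion.
# An agent that never succeeds has unbounded cost-of-pass.

def cost_of_pass(avg_cost_per_run: float, success_rate: float) -> float:
    if success_rate <= 0:
        return float("inf")
    return avg_cost_per_run / success_rate

# Hypothetical baseline: $0.40 per run at a 50% success rate.
baseline = cost_of_pass(avg_cost_per_run=0.40, success_rate=0.50)

# Apply the paper's headline numbers: 28.4% cheaper, 96.7% of performance.
efficient = cost_of_pass(avg_cost_per_run=0.40 * (1 - 0.284),
                         success_rate=0.50 * 0.967)

print(f"{baseline:.2f} {efficient:.2f}")  # cost per pass drops despite lower accuracy
```

The interesting property is that a small accuracy sacrifice can still lower cost‑of‑pass when the per‑run savings are large enough.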

KG‑RAG: Enhancing GUI Agent Decision‑Making via Knowledge Graph‑Driven Retrieval‑Augmented Generation

Many GUI agents struggle to understand screen transitions and context. KG‑RAG transforms UI transition graphs into structured knowledge graphs and uses knowledge‑graph‑driven retrieval‑augmented generation. This approach improves success rate and decision accuracy and reduces the number of steps needed to complete tasks. The authors also introduce KG‑Android‑Bench and KG‑Harmony‑Bench to evaluate agents in Chinese mobile ecosystems.
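The core idea can be sketched as storing UI transitions as (state, action, next_state) triples and retrieving an action plan over that graph before prompting the model. The screens, actions and breadth‑first retrieval below are illustrative stand‑ins, not KG‑RAG's actual implementation.

```python
# Toy UI transition knowledge graph: (screen, action, next_screen) triples.
from collections import deque

transitions = [
    ("home", "tap_search", "search"),
    ("search", "type_query", "results"),
    ("results", "tap_item", "detail"),
    ("home", "tap_profile", "profile"),
]

def retrieve_path(graph, start, goal):
    """Breadth-first search over the transition graph for an action plan."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, plan = queue.popleft()
        if state == goal:
            return plan
        for s, action, nxt in graph:
            if s == state and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, plan + [action]))
    return None  # goal screen unreachable from start

print(retrieve_path(transitions, "home", "detail"))
```

Feeding a retrieved plan like this into the prompt is what lets the agent skip exploratory steps, which is consistent with the reported reduction in steps per task.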

MobiAgent: A Systematic Framework for Customizable Mobile Agents

Mobile applications pose unique challenges for AI agents. MobiAgent comprises three components: MobiMind agent models, the AgentRR acceleration framework and MobiFlow benchmarking. It includes an AI‑assisted data collection pipeline that reduces annotation costs and achieves state‑of‑the‑art performance on real‑world mobile tasks. This framework makes it easier to build and evaluate agents that interact with smartphone apps.

Boosting Embodied AI Agents through Perception‑Generation Disaggregation and Asynchronous Pipeline Execution

Embodied agents, such as household robots, need to process sensor data and generate actions in real time. The Auras framework disaggregates perception and generation modules and uses pipeline parallelism to increase throughput while maintaining accuracy. It also introduces a “public context” to share perceptions and avoid data staleness.
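The disaggregation pattern can be sketched with two pipeline stages connected by a queue, plus a shared "public context" that always holds the freshest perception. The stage contents are placeholders; Auras's real system targets embodied‑AI model serving, not toy strings.

```python
# Perception and generation run concurrently as pipeline stages.
# A shared "public context" lets generation read the newest perception
# even when it is still working through queued frames.
import queue
import threading

public_context = {}
lock = threading.Lock()
frames = queue.Queue()
actions = []

def perception():
    for t in range(5):
        obs = f"obs-{t}"
        with lock:
            public_context["latest"] = obs   # freshest data, avoids staleness
        frames.put(obs)
    frames.put(None)                         # sentinel: stream finished

def generation():
    while (obs := frames.get()) is not None:
        with lock:
            latest = public_context["latest"]
        actions.append(f"act-for-{obs} (context {latest})")

p = threading.Thread(target=perception)
g = threading.Thread(target=generation)
p.start(); g.start(); p.join(); g.join()
print(len(actions))
```

Because the two stages overlap in time, throughput is bounded by the slower stage rather than the sum of both, which is the essence of the pipeline‑parallel speedup.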

Automatic Differentiation of Agent‑Based Models

Agent‑based models (ABMs) are widely used to simulate social, biological and physical systems, but calibrating them can be challenging. This paper applies automatic differentiation to ABMs, enabling gradient‑based parameter calibration via variational inference and improving performance on models such as Sugarscape and SIR. It bridges the gap between simulation modelling and deep learning tools.
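As a compact illustration of differentiating through a simulation, forward‑mode dual numbers can push d/dβ through a discrete SIR model, yielding the exact gradient of the final infected count for calibration. This mirrors the idea of gradient‑based ABM calibration, not the paper's variational‑inference machinery.

```python
# Forward-mode autodiff via dual numbers: each value carries (val, dval/dbeta).

class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o): return Dual(self.val + o.val, self.dot + o.dot)
    def __sub__(self, o): return Dual(self.val - o.val, self.dot - o.dot)
    def __mul__(self, o):  # product rule
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)

def sir_final_infected(beta, gamma=0.1, steps=20):
    """Discrete SIR: returns infected fraction (and its beta-gradient)."""
    S, I = Dual(0.99), Dual(0.01)
    g = Dual(gamma)
    for _ in range(steps):
        new_inf = beta * S * I
        S = S - new_inf
        I = I + new_inf - g * I
    return I

out = sir_final_infected(Dual(0.3, 1.0))   # seed dbeta/dbeta = 1
print(out.dot > 0)   # more transmission -> more infections, as expected
```

The same gradient, obtained with finite differences, would require re‑running the simulation per parameter; autodiff gets it in one pass, which is what makes gradient‑based calibration of ABMs practical.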

4. Domain‑Specific Applications and Multi‑Agent Simulations

SAMVAD: A Multi‑Agent System for Simulating Judicial Deliberation Dynamics in India

In this ambitious system, AI agents represent the judge, prosecution, defence and multiple adjudicators. Each agent retrieves relevant legal materials and generates arguments using retrieval‑augmented generation, while the adjudicators discuss and arrive at a verdict. The platform enables researchers to explore legal reasoning, fairness and consensus in a controlled environment.

ShortageSim: Simulating Drug Shortages under Information Asymmetry

How do manufacturers, buyers and regulators react to drug shortages when information is asymmetric? ShortageSim creates a multi‑agent simulation in which LLM‑based agents play the roles of manufacturers, buyers and regulatory agencies, each with different information. The system significantly reduces resolution lag (the time it takes for a shortage to be resolved) and comes with a public dataset of 2,925 FDA shortage events. Such simulations help policymakers design more resilient supply chains.

Synthetic Founders: AI‑Generated Social Simulations for Startup Validation Research

This study compares interviews with human startup founders to conversations with AI‑generated “synthetic founders.” It finds many convergent and partial themes, along with some unique insights, suggesting that LLM‑driven personas can serve as complementary simulation tools for entrepreneurial research. By simulating diverse founder archetypes, researchers can explore biases and iterate on business ideas without recruiting participants for every test.

Nash Q‑Network for Multi‑Agent Cybersecurity Simulation

Cybersecurity often involves adversarial interactions. This paper introduces a policy‑based Nash Q‑learning network that combines Proximal Policy Optimisation (PPO), Deep Q‑Networks (DQN) and Nash‑Q to learn Nash‑optimal strategies in competitive environments. The approach yields robust defence strategies in multi‑agent security games.
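The Nash‑Q idea is easiest to see in the simplest possible case: a one‑state, zero‑sum 2×2 game, where the stage‑game Nash value replaces the usual max in the Q‑update. The closed‑form solver below assumes a fully mixed equilibrium (as in matching pennies); the paper's method layers PPO and DQN on top of this principle rather than using a closed form.

```python
# Nash value of a zero-sum 2x2 game with an interior mixed equilibrium,
# plugged into a single Nash-Q update. Payoffs and hyperparameters are
# illustrative.

def nash_value_2x2(Q):
    """Row player's value, assuming a fully mixed equilibrium."""
    (a, b), (c, d) = Q
    denom = a - b - c + d
    assert denom != 0, "sketch assumes an interior mixed equilibrium"
    return (a * d - b * c) / denom

pennies = [[1, -1], [-1, 1]]      # row player's payoffs in matching pennies
v = nash_value_2x2(pennies)       # value is 0: neither side can do better

# Nash-Q update: Q <- (1 - alpha) * Q + alpha * (r + gamma * NashValue)
alpha, gamma, reward, q_old = 0.1, 0.9, 0.0, 0.0
q_new = (1 - alpha) * q_old + alpha * (reward + gamma * v)
print(v, q_new)
```

Replacing max with the Nash value is what lets the learner converge to strategies that are robust against an adapting adversary instead of a fixed one.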

Controller Synthesis for Multi‑Agent Probabilistic Systems

Designing controllers for semi‑cooperative, semi‑competitive multi‑agent systems is hard. This method uses temporal logic specifications and probabilistic model checking to synthesise controllers that satisfy complex requirements. It shows how formal methods can be applied to modern agentic systems.

Aspective Agentic AI: Situating Agents in Their Environment

Many agent architectures leak information because they rely on global world models. Aspective Agentic AI proposes a bottom‑up framework where behaviours are triggered only by local environmental changes, and each agent perceives a different subset of the environment (called umwelt aspects). This reduces information leakage from 83% to zero and improves both security and efficiency. It highlights the importance of situational awareness and privacy in agent design.
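The umwelt‑aspect idea can be sketched as a per‑agent filter over a shared environment: fields outside an agent's aspect simply never reach it. The environment keys and agent names below are illustrative, not from the paper.

```python
# Each agent sees only the slice of the environment in its declared aspect,
# so sensitive fields cannot leak to agents that do not need them.

environment = {
    "room_temperature": 21.5,
    "user_credit_card": "4111-....",   # sensitive: no agent aspect includes it
    "door_open": True,
}

ASPECTS = {
    "thermostat_agent": {"room_temperature"},
    "security_agent": {"door_open"},
}

def perceive(agent_name, env):
    """Return only the environment slice inside the agent's umwelt."""
    allowed = ASPECTS.get(agent_name, set())
    return {k: v for k, v in env.items() if k in allowed}

print(sorted(perceive("thermostat_agent", environment)))
```

Leakage prevention here is structural rather than policy‑based: an agent cannot mishandle data it was never able to perceive.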

5. Reflections and Future Directions

Contemporary Agent Technology: LLM‑Driven Advancements vs Classic Multi‑Agent Systems

This reflective piece contrasts modern LLM‑driven agents with traditional multi‑agent systems. It notes that LLM agents excel at reasoning and language tasks, while classic systems offer proven frameworks for coordination and negotiation. The authors emphasise that integrating both paradigms will be key to building robust, reliable agents.

Agentic AI for Software: Thoughts from the Software Engineering Community

Software developers are increasingly turning to agents for tasks beyond code generation—testing, program repair, architecture exploration and more. This comment underscores the need to clarify developer intent, incorporate verification and validation and treat agents as team members. For more on AI in software engineering, see our article on explainability and AI tools for developers.

A Comprehensive Analysis of AI Incidents: What We Can Learn

As agents become more pervasive, incidents are inevitable. The incident analysis framework outlined earlier provides practical guidance for building safer systems. Lessons include the importance of maintaining detailed logs, documenting tool states and proactively sharing incident reports to prevent repeat failures.

Conclusion

The research from August and September 2025 paints a rich picture of AI agents maturing across dimensions:

  1. Frameworks and Taxonomies: Surveys and frameworks provide structure, enabling researchers to unify concepts and highlight safety and ethical considerations.
  2. Safety and Regulation: Legal and socio‑economic analyses remind us that alignment, fairness and accountability must evolve alongside technological advances.
  3. Efficiency and Scale: Tools like Efficient Agents and Auras show that performance gains and cost reductions can go hand‑in‑hand.
  4. Domain‑Specific Applications: From healthcare supply chains to judicial deliberations and cybersecurity, agents are being deployed in increasingly complex environments.
  5. Information Privacy and Locality: Aspective Agentic AI demonstrates that carefully designing how agents perceive their environment reduces information leakage and improves security.

Together, these papers signal that the age of autonomous agents has moved from concept to practice. As agents take on more responsibilities in our digital and physical worlds, integrating insights from efficiency, safety, regulation and domain‑specific research will be critical. Stay tuned to AryaXAI for continuous updates on the evolving landscape of AI agents, alignment and interpretability.
