
Agent-oriented software engineering


Agent-oriented software engineering (AOSE) is a software engineering paradigm that arose to apply best practices to the development of complex multi-agent systems (MAS) by using agents, and organizations (communities) of agents, as the main abstractions. The field of software product lines (SPL) covers the entire software development lifecycle needed to build a family of products, in which concrete products are derived systematically and rapidly.

Commentary

With the advent of biologically inspired, pervasive, and autonomic computing, the advantages and the necessity of agent-based technologies and MASs have become apparent. Current AOSE methodologies, however, are dedicated to developing a single MAS at a time. Since many MASs use substantially the same techniques, adaptations, and approaches, the field is ripe for exploiting the benefits of SPLs, such as reduced costs and improved time-to-market, and for making agent technology more industrially applicable.

Multiagent systems product lines (MAS-PL) is a research field devoted to combining the two approaches: applying the SPL philosophy to building a MAS. This affords the advantages of SPLs while making MAS development more practical.
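
To illustrate the MAS-PL idea, the sketch below (in Python) models a family of agents as a small product line: shared behaviours serve as core assets, and a concrete agent product is derived by selecting features. Every feature, class, and behaviour name here is a hypothetical illustration, not an established MAS-PL framework.

    from dataclasses import dataclass, field

    # Reusable core assets: behaviours shared across the product line.
    def reactive_behaviour(percept):
        return f"react to {percept}"

    def deliberative_behaviour(percept):
        return f"plan around {percept}"

    def negotiation_behaviour(percept):
        return f"negotiate about {percept}"

    FEATURES = {
        "reactive": reactive_behaviour,
        "deliberative": deliberative_behaviour,
        "negotiation": negotiation_behaviour,
    }

    @dataclass
    class Agent:
        name: str
        behaviours: list = field(default_factory=list)

        def perceive(self, percept):
            # Each selected feature contributes one behaviour to the agent.
            return [b(percept) for b in self.behaviours]

    def derive_agent(name, selected_features):
        """Derive a concrete agent (product) from the product line by
        selecting features, as an SPL derives products from core assets."""
        unknown = set(selected_features) - FEATURES.keys()
        if unknown:
            raise ValueError(f"features not in the product line: {unknown}")
        return Agent(name, [FEATURES[f] for f in selected_features])

    # Two products of the same family, derived systematically rather than
    # hand-built from scratch:
    scout = derive_agent("scout", ["reactive"])
    broker = derive_agent("broker", ["deliberative", "negotiation"])
    print(scout.perceive("obstacle"))
    print(broker.perceive("new offer"))

Deriving two different agents from the same core assets, as above, is the SPL promise of systematic and rapid product derivation applied to a MAS.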

Benchmarks

Several benchmarks have been developed to evaluate the capabilities of AI coding agents and large language models on software engineering and related tasks. Some of the key benchmarks are listed below.

Agentic software engineering benchmarks
SWE-bench: Assesses the ability of AI models to resolve real-world software engineering issues sourced from GitHub repositories; a sketch of the evaluation flow follows the table. The benchmark involves:
  • Providing agents with a code repository and issue description
  • Challenging them to generate a patch that resolves the described problem
  • Evaluating the generated patch against unit tests
ML-Agent-Bench: Designed to evaluate AI agent performance on machine learning tasks
τ-Bench: Developed by Sierra AI to evaluate AI agent performance and reliability in real-world settings; an illustrative episode sketch also follows the table. It focuses on:
  • Testing agents on complex tasks with dynamic user and tool interactions
  • Assessing the ability to follow domain-specific policies
  • Measuring consistency and reliability at scale
WebArena: Evaluates AI agents in a simulated web environment. The benchmark tasks include:
  • Navigating complex websites to complete user-driven tasks
  • Extracting relevant information from the web
  • Testing the adaptability of agents to diverse web-based challenges
AgentBench: A benchmark designed to assess the capabilities of AI agents in handling multi-agent coordination tasks. The key areas of evaluation include:
  • Communication and cooperation between agents
  • Task efficiency and resource management
  • Adaptability in dynamic environments
MMLU-Redux: An enhanced version of the MMLU benchmark, focusing on evaluating AI models across a broad range of academic subjects and domains. It measures:
  • Subject matter expertise across multiple disciplines
  • Ability to handle complex problem-solving tasks
  • Consistency in providing accurate answers across topics
McEval: A coding benchmark designed to test AI models' ability to solve coding challenges. The benchmark evaluates:
  • Code correctness and efficiency
  • Ability to handle diverse programming languages
  • Performance across different coding paradigms and tasks
CS-Bench: A specialized benchmark for evaluating AI performance in computer science-related tasks. The key focus areas include:
  • Algorithms and data structures
  • Computational complexity and optimization
  • Theoretical and applied computer science concepts
WildBench: Tests AI models on understanding and reasoning about challenging, real-world "in the wild" inputs. It emphasizes:
  • Handling noisy and unstructured data
  • Adapting to unpredictable changes in the environment
  • Performing well in multi-modal scenarios with real-world relevance
Test of Time: A benchmark focused on evaluating AI models' ability to reason about temporal sequences and events over time. It assesses:
  • Understanding of temporal logic and sequence prediction
  • Ability to make decisions based on time-dependent data
  • Performance in tasks requiring long-term planning and foresight
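
To make the SWE-bench evaluation flow concrete, the sketch below (in Python) shows a minimal apply-patch-then-test step. The repository path, the patch text, and the pytest test command are hypothetical placeholders; the official SWE-bench harness is more elaborate, distinguishing fail-to-pass from pass-to-pass tests among other things.

    import subprocess

    def apply_patch(repo_dir: str, patch_text: str) -> bool:
        """Apply an agent-generated unified diff to the working tree."""
        result = subprocess.run(
            ["git", "apply", "-"],  # read the patch from stdin
            cwd=repo_dir,
            input=patch_text,
            text=True,
            capture_output=True,
        )
        return result.returncode == 0

    def run_tests(repo_dir: str, test_command: list) -> bool:
        """Run the project's test suite; exit code 0 counts as resolved."""
        result = subprocess.run(test_command, cwd=repo_dir, capture_output=True)
        return result.returncode == 0

    def evaluate(repo_dir: str, patch_text: str) -> str:
        if not apply_patch(repo_dir, patch_text):
            return "patch failed to apply"
        # This sketch collapses all tests into a single run; the real
        # harness checks specific fail-to-pass and pass-to-pass tests.
        return "resolved" if run_tests(repo_dir, ["pytest", "-q"]) else "tests failed"

    # Hypothetical usage, assuming the repository is checked out at the
    # issue's base commit:
    # print(evaluate("/tmp/some-repo", agent_patch))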
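
Similarly, a τ-Bench-style episode pairs an agent with simulated users and domain tools, and scores the run on whether the final state matches the goal while domain policy was respected. The toy domain below (in Python) is entirely hypothetical and stands in for τ-Bench's real task domains; a stub replaces the LLM agent so the state-comparison scoring can be shown.

    # Hypothetical single-tool domain: an airline agent may rebook a flight
    # only if the fare class allows it (the "domain policy").
    BOOKINGS = {"B1": {"flight": "AA100", "fare": "basic"}}
    POLICY = {"basic": False, "flex": True}  # may this fare class be rebooked?

    def rebook(booking_id, new_flight):
        """Tool call: rebook if the policy allows it, otherwise refuse."""
        booking = BOOKINGS[booking_id]
        if not POLICY[booking["fare"]]:
            return "refused: basic fares cannot be rebooked"
        booking["flight"] = new_flight
        return f"rebooked {booking_id} to {new_flight}"

    def run_episode(user_request):
        # In τ-Bench the agent is an LLM deciding which tools to call;
        # this stub simply forwards the request to the tool.
        return rebook(*user_request)

    # Scoring: did the final database state match the expected state?
    result = run_episode(("B1", "AA200"))
    expected = {"B1": {"flight": "AA100", "fare": "basic"}}  # unchanged: policy forbids
    print(result, "| passed:", BOOKINGS == expected)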

Software engineering agent systems

There are several software engineering (SWE) agent systems in development; some examples are listed below, followed by a sketch of the propose-and-verify loop many of them share.

List of SWE agent systems (system: backend LLM)
  • Salesforce Research DEIBASE-1: GPT-4o
  • Cosine Genie: fine-tuned OpenAI GPT
  • CodeStory Aide: GPT-4o + Claude 3.5 Sonnet
  • AbanteAI MentatBot: GPT-4o
  • Salesforce Research DEIBASE-2: GPT-4o
  • Salesforce Research DEI-Open: GPT-4o
  • Bytedance MarsCode: GPT-4o
  • Alibaba Lingma: gpt-4-1106-preview
  • Factory Code Droid: Anthropic + OpenAI models
  • AutoCodeRover: GPT-4o
  • Amazon Q Developer: (unknown)
  • CodeR: gpt-4-1106-preview
  • MASAI: (unknown)
  • SIMA: GPT-4o
  • Agentless: GPT-4o
  • Moatless Tools: Claude 3.5 Sonnet
  • IBM Research Agent: (unknown)
  • Aider: GPT-4o + Claude 3 Opus
  • OpenDevin + CodeAct: GPT-4o
  • AgileCoder: (various)
  • ChatDev: (unknown)
  • MetaGPT: GPT-4o
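
Despite their different backends, many of these systems share a propose-and-verify loop: prompt the backend LLM with the issue and the relevant code, obtain a candidate patch, check it against the tests, and retry with the failure output until a budget is exhausted. The outline below (in Python) sketches that loop; query_llm, apply_and_test, the prompt format, and the retry budget are hypothetical stand-ins, not the implementation of any system listed above.

    from typing import Optional, Tuple

    def query_llm(prompt: str) -> str:
        """Placeholder for a call to the backend LLM via an API client."""
        raise NotImplementedError("wire up a real LLM client here")

    def apply_and_test(patch: str) -> Tuple[bool, str]:
        """Placeholder: apply the patch and run the test suite, e.g. with
        the evaluation step sketched in the Benchmarks section."""
        raise NotImplementedError

    def solve_issue(issue: str, code_context: str, max_attempts: int = 3) -> Optional[str]:
        feedback = ""
        for attempt in range(max_attempts):
            prompt = (
                f"Issue:\n{issue}\n\nRelevant code:\n{code_context}\n"
                f"{feedback}\nReturn a unified diff that fixes the issue."
            )
            patch = query_llm(prompt)
            ok, test_output = apply_and_test(patch)
            if ok:
                return patch  # verified against the tests
            feedback = f"\nAttempt {attempt + 1} failed with:\n{test_output}\n"
        return None  # budget exhausted without a passing patch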

External links

  • Agent-Oriented Software Engineering: Reflections on Architectures, Methodologies, Languages, and Frameworks ISBN 978-3642544316


Categories: Agent-oriented software engineering