This repository contains the research and implementation for a novel framework that enhances Multi-Agent Systems (MAS) by dynamically selecting the optimal Large Language Model (LLM) for each task, overcoming the limitations of traditional homogeneous systems.
Developed during an internship at Ericsson Research, this work addresses key challenges in AI-driven telecommunications, such as network optimization and the deployment of intelligence in resource-constrained environments.
Traditional homogeneous MAS rely on a single LLM for all agents, which introduces significant limitations:
- Inherent Model Biases: The system inherits all the biases and blind spots of the single LLM.
- Inconsistent Performance: A single model cannot be optimal across all domains (e.g., coding, law, medicine).
- Constrained Collective Intelligence: Limits the system's ability to leverage specialized expertise.
- Hindered Error Correction: Reduces the system's robustness and ability to self-correct.
We propose a dynamic, Agentic AI framework that intelligently routes tasks within a MAS to the best-suited LLM from a diverse, heterogeneous pool.
Core Innovation: An LLM-powered orchestrator agent that uses real-time performance data to make intelligent LLM selection decisions, automating a selection step that prior research (e.g., the X-MAS paper) identified as a manual gap.
We extended the existing X-MAS-Bench with new domains and comprehensive metrics to thoroughly evaluate LLM performance.
| Domain | Number of Queries | Status |
|---|---|---|
| Coding | 3,495 | Original |
| Finance | 2,120 | Original |
| General-Knowledge | 2,919 | New |
| Law | 208 | New |
| Mathematics | 1,792 | Original |
| Medical | 2,550 | Original |
| Research | 43 | New |
| Science | 2,349 | Original |
Total Evaluations: ~2.09 million (query × function × model), exceeding the scale of the original X-MAS-Bench.
Tracked Metrics: Accuracy, Latency, Peak Memory, Prompt/Completion Tokens, and Throughput.
An intelligent agent (built with LangGraph) that decides which LLM to assign to a given task based on the following criteria (a minimal sketch follows the list):
- Task Domain/Sub-domain (e.g., "Medical", "Python Coding")
- Performance Metrics (Accuracy, Latency, Cost)
- Resource Constraints (e.g., available memory on an edge device)
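To make this concrete, here is a minimal sketch of how such an orchestrator could be wired as a LangGraph state graph. The node names, state schema, domain heuristic, and stubbed metrics are illustrative assumptions, not the project's actual implementation.

```python
# Illustrative orchestrator graph; node names, state schema, and the stubbed
# domain/metric logic are assumptions, not this project's actual code.
from typing import TypedDict

from langgraph.graph import END, StateGraph


class OrchestratorState(TypedDict, total=False):
    query: str
    constraints: dict   # e.g., {"max_memory_gb": 8}
    domain: str
    candidates: dict    # model name -> metrics from the MCP score matrix
    selected_llm: str


def map_domain(state: OrchestratorState) -> dict:
    # Map the query to a domain/sub-domain (in practice this could itself be an LLM call).
    query = state["query"].lower()
    return {"domain": "Medical" if "diagnosis" in query else "General-Knowledge"}


def fetch_scores(state: OrchestratorState) -> dict:
    # Query the MCP server for per-model metrics in this domain (stubbed here).
    return {"candidates": {
        "model-a": {"accuracy": 0.81, "latency_s": 1.2},
        "model-b": {"accuracy": 0.74, "latency_s": 0.4},
    }}


def select_llm(state: OrchestratorState) -> dict:
    # Pick the candidate with the best accuracy-vs-latency trade-off.
    best = max(state["candidates"].items(),
               key=lambda kv: kv[1]["accuracy"] - 0.1 * kv[1]["latency_s"])
    return {"selected_llm": best[0]}


graph = StateGraph(OrchestratorState)
graph.add_node("map_domain", map_domain)
graph.add_node("fetch_scores", fetch_scores)
graph.add_node("select_llm", select_llm)
graph.set_entry_point("map_domain")
graph.add_edge("map_domain", "fetch_scores")
graph.add_edge("fetch_scores", "select_llm")
graph.add_edge("select_llm", END)
app = graph.compile()

result = app.invoke({"query": "Suggest a differential diagnosis", "constraints": {}})
print(result["selected_llm"])
```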
The MCP server acts as the central knowledge base for the orchestrator, hosting a detailed score matrix of LLM performance across all domains and metrics.
- Analogy: MCP is like USB-C for LLMs—a standard protocol for connecting models to data and tools.
- Function: The orchestrator agent queries the MCP server via tool calls to access the latest performance data before making a selection.
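As a hedged illustration of the server side, the official MCP Python SDK's FastMCP helper can expose such a score matrix as tools. The tool names, matrix shape, and numbers below are placeholders, not the project's actual server.

```python
# Illustrative MCP server exposing a score matrix as tools; tool names and
# data are placeholders, not the project's actual server.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("llm-score-matrix")

# Hypothetical score matrix: domain -> model -> metrics.
SCORES = {
    "Medical": {
        "model-a": {"accuracy": 0.81, "latency_s": 1.2, "peak_memory_gb": 14.0},
        "model-b": {"accuracy": 0.74, "latency_s": 0.4, "peak_memory_gb": 6.5},
    },
}


@mcp.tool()
def list_domains() -> list[str]:
    """Return the domains covered by the benchmark."""
    return sorted(SCORES)


@mcp.tool()
def get_scores(domain: str) -> dict:
    """Return per-model performance metrics for a domain."""
    return SCORES.get(domain, {})


if __name__ == "__main__":
    mcp.run()
```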
1. Receive Input: The MAS receives a user query.
2. Query MCP: The orchestrator agent queries the MCP server for available domains and LLM performance data.
3. Domain Mapping: The query is mapped to its relevant domain and sub-domain.
4. Score Calculation: A weighted score is computed for each candidate LLM based on task priorities (e.g., high accuracy vs. low latency); one possible formula is sketched after this list.
5. LLM Selection & Assignment: The optimal LLM is selected and assigned to the agent responsible for the task.
6. Response: The result is returned through the MAS pipeline.
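One way the weighted score in step 4 might be computed; the metric names, sign conventions, and weights here are illustrative assumptions, not the project's actual formula.

```python
def weighted_score(metrics: dict, weights: dict) -> float:
    """Combine benchmark metrics into one score (illustrative formula).

    Higher-is-better metrics (accuracy, throughput) add to the score;
    lower-is-better metrics (latency, memory, cost) subtract from it.
    """
    score = 0.0
    for name, weight in weights.items():
        value = metrics.get(name, 0.0)
        if name in ("latency_s", "peak_memory_gb", "cost_usd"):
            score -= weight * value
        else:
            score += weight * value
    return score


# Example: prioritize accuracy but penalize latency for an edge deployment.
candidates = {
    "model-a": {"accuracy": 0.81, "latency_s": 1.2},
    "model-b": {"accuracy": 0.74, "latency_s": 0.4},
}
weights = {"accuracy": 1.0, "latency_s": 0.2}
best = max(candidates, key=lambda m: weighted_score(candidates[m], weights))
print(best)  # -> "model-b" (0.81 - 0.24 = 0.57 vs. 0.74 - 0.08 = 0.66)
```

Changing the weights shifts the choice: with accuracy alone, model-a wins; penalizing latency flips the decision to model-b.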
The dynamic selector seamlessly integrates into existing MAS topologies (e.g., X-MAS-Proto), intercepting tasks and assigning the best LLM for each step in a pipeline (Planning → QA → Revision → Aggregation → Evaluation).
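A rough sketch of that interception: per-stage assignment could look like the following, where `select_llm_for` is a hypothetical stand-in for invoking the orchestrator graph sketched earlier.

```python
# Hypothetical per-stage assignment; stage names come from the pipeline above.
PIPELINE = ["Planning", "QA", "Revision", "Aggregation", "Evaluation"]


def select_llm_for(task: str) -> str:
    # Stand-in for invoking the orchestrator graph sketched earlier.
    return "model-b"


def assign_models(query: str) -> dict[str, str]:
    # Each stage may map to a different domain/sub-domain, so every step
    # can receive its own best-suited model instead of one homogeneous LLM.
    return {stage: select_llm_for(f"{stage}: {query}") for stage in PIPELINE}


print(assign_models("Optimize handover thresholds for a 5G cell"))
```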
| Feature | Agentic AI (Our Approach) | Multi-Armed Bandits (MAB) |
|---|---|---|
| Decision Process | Interpretable, reasoning-based | Statistical, black-box |
| Flexibility | High; can incorporate complex constraints (e.g., cost, memory) | Medium; primarily optimizes for a single reward metric |
| Adaptability | Excels in dynamic environments with new tasks/domains | Slower to adapt to completely new contexts |
| Initialization | Can leverage pre-existing benchmark data (warm-start) | Requires a cold-start or exploration phase |
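For context on the MAB column, a minimal epsilon-greedy router over the same model pool might look like this; the scalar reward definition and epsilon value are illustrative, not a reference implementation.

```python
import random


class EpsilonGreedyRouter:
    """Minimal MAB baseline: each LLM is an arm; reward is a single scalar
    (e.g., task success in {0, 1}), in contrast with the orchestrator,
    which reasons over several metrics and constraints."""

    def __init__(self, models: list[str], epsilon: float = 0.1):
        self.epsilon = epsilon
        self.counts = {m: 0 for m in models}
        self.values = {m: 0.0 for m in models}  # running mean reward per arm

    def select(self) -> str:
        if random.random() < self.epsilon:               # explore
            return random.choice(list(self.counts))
        return max(self.values, key=self.values.get)     # exploit

    def update(self, model: str, reward: float) -> None:
        self.counts[model] += 1
        n = self.counts[model]
        self.values[model] += (reward - self.values[model]) / n


router = EpsilonGreedyRouter(["model-a", "model-b"])
choice = router.select()
router.update(choice, reward=1.0)
```

The all-zero initial estimates make the cold-start row of the table concrete; the agentic orchestrator, by contrast, can warm-start from the benchmark score matrix.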
- Completed: Implementation of the Agentic AI orchestrator, conceptual comparison with MAB approaches, and extension of the benchmark suite.
- Initial Results: The agentic orchestrator demonstrates superior adaptability and interpretability in dynamic task environments.
- Future Work:
- Automated LLM discovery and onboarding.
- Enhanced explainability for the orchestrator's decisions.
- Training specialized MAS-specific agents.
- Deployment and testing on edge devices (e.g., smartphones, Raspberry Pis).
This work builds upon and extends the foundational research presented in the X-MAS paper. It also leverages the open standard Model Context Protocol (MCP).
For related advancements, please see:
- [X-MAS Paper] (Link to be added)
- MCP Universe: Evaluating LLMs with Real-World MCP Servers - Salesforce AI Research
- Reinforcement Learning for dynamic agent assignment.
- Projects on deploying LLMs/SLMs to edge devices.
This project is licensed under the MIT License - see the LICENSE file for details.
Developed as part of an AI Research Internship at Ericsson Research.
Disclaimer: This project is a research prototype. The code and models are intended for experimental use. Unauthorized use or reproduction is prohibited.
