Epic: Implement a Self-Configuring Provider Generation System
Labels: epic, research, architecture, agent-autonomy, llm
Opened: 2025-07-14
1. Overview & Motivation
The ARLA platform has successfully implemented a decoupled provider pattern, which separates the core `agent-engine` from simulation-specific logic. This is a major architectural strength. However, it still requires a human developer to write the concrete provider implementations (e.g., `SoulSimRewardCalculator`) for each new simulation environment.
This epic proposes the next major evolution of ARLA's architecture: to make the agents themselves responsible for creating their own providers on the fly.
The vision is to move from a "pluggable" system to a "self-configuring" one. When an agent enters a new, unseen environment, it should be able to introspect its own components and the structure of the world, reason about the "physics" of its new reality, and dynamically generate the provider code it needs to function and learn. This represents a significant step towards truly general and autonomous agents, shifting the developer's role from writing world-specific logic to designing the agent's meta-learning and self-configuration capabilities.
2. Architectural Vision: The "Provider Generation" Meta-System
The core of this feature will be a new, high-level cognitive system that runs once for each agent at the beginning of a simulation.
New System: `ProviderGenerationSystem` (`agent-engine/systems/`)
- Purpose: To inspect the agent's components and the environment, and then use the LLM via the `CognitiveScaffold` to write and load the necessary provider classes (`RewardCalculator`, `StateEncoder`, etc.) at runtime.
High-Level Workflow:
1. Introspection: At the start of a simulation, the `ProviderGenerationSystem` for a given agent will perform a detailed scan of:
   - Its own components (e.g., `PortfolioComponent`, `HealthComponent`).
   - The attributes of those components (e.g., `cash_balance: float`).
   - The components of other entities it can observe in the environment.
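The introspection step above can be sketched with Python's standard reflection facilities. The component classes and the function name below are illustrative assumptions, not ARLA's actual API:

```python
from dataclasses import dataclass, fields

# Hypothetical components for illustration; ARLA's real PortfolioComponent
# and HealthComponent will differ in detail.
@dataclass
class PortfolioComponent:
    cash_balance: float = 0.0
    total_value: float = 0.0

@dataclass
class HealthComponent:
    current_health: int = 100

def introspect_components(components):
    """Map each component's class name to its attribute names and types."""
    schema = {}
    for comp in components:
        schema[type(comp).__name__] = {
            f.name: getattr(f.type, "__name__", str(f.type))
            for f in fields(comp)
        }
    return schema

world_schema = introspect_components([PortfolioComponent(), HealthComponent()])
# world_schema == {'PortfolioComponent': {'cash_balance': 'float',
#                                         'total_value': 'float'},
#                  'HealthComponent': {'current_health': 'int'}}
```

The resulting schema dictionary is exactly the "world physics" summary that the meta-prompt step consumes.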
2. Meta-Prompt Construction: The system will then construct a detailed, structured prompt to send to the LLM. This prompt is critical and will contain:
   - The Goal: A high-level objective for the agent (e.g., "My primary objective is to maximize my `TimeBudgetComponent.current_time_budget`.").
   - The Interfaces: The full source code of the provider interfaces it needs to implement (e.g., `RewardCalculatorInterface`, `StateEncoderInterface`). This gives the LLM a clear "contract" to fulfill.
   - The "Physics" (World Schema): A summary of the components and attributes discovered during introspection. For example: "I have a component called `PortfolioComponent` with an attribute `total_value: float`. My goal is to maximize survival, which is tied to my `TimeBudgetComponent`. Therefore, a good reward signal would likely be a positive change in my `PortfolioComponent.total_value`."
   - The Task: A clear instruction to the LLM: "You are an expert AI research programmer. Your task is to write the complete Python code for a set of provider classes that correctly implement the given interfaces. The logic in these classes should be designed to help me achieve my primary objective given the world schema. Output only the raw, executable Python code."
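A minimal sketch of the prompt builder follows. The function name, argument shapes, and prompt wording are assumptions for illustration; the real system would obtain interface source via `inspect.getsource` and the schema from the introspection step:

```python
def build_meta_prompt(goal, interface_sources, world_schema):
    """Assemble the goal, interface contracts, and world schema into one prompt."""
    schema_lines = "\n".join(
        f"- {component}: {attrs}" for component, attrs in world_schema.items()
    )
    return (
        "You are an expert AI research programmer.\n\n"
        f"GOAL:\n{goal}\n\n"
        "INTERFACES TO IMPLEMENT (your contract):\n"
        + "\n\n".join(interface_sources) + "\n\n"
        "WORLD SCHEMA (discovered by introspection):\n"
        f"{schema_lines}\n\n"
        "TASK: Write the complete Python code for provider classes that "
        "correctly implement the interfaces above and serve the goal. "
        "Output only the raw, executable Python code."
    )

prompt = build_meta_prompt(
    goal="Maximize TimeBudgetComponent.current_time_budget.",
    interface_sources=["class RewardCalculatorInterface: ..."],
    world_schema={"PortfolioComponent": {"total_value": "float"}},
)
```

Keeping the prompt a pure function of (goal, interfaces, schema) makes it easy to snapshot-test and to cache per agent type.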
3. Dynamic Code Generation and Injection:
   - The `ProviderGenerationSystem` sends the prompt through the `CognitiveScaffold`.
   - It receives a string containing the complete Python code for the new provider classes.
   - It uses a function like `exec()` to execute this code in a restricted namespace, dynamically defining the new classes (e.g., `AutoGeneratedRewardCalculator`) in the agent's runtime memory.
   - It then instantiates these newly defined classes and injects them into the other core systems (`QLearningSystem`, `ActionSystem`, etc.), which are waiting for their dependencies.
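A sketch of the `exec()`-based loading step, under the assumption that the LLM returned the string below. Note that a trimmed `__builtins__` dict is not a real sandbox; Phase 4 covers stronger guardrails:

```python
# Stand-in for the string the CognitiveScaffold would return from the LLM.
GENERATED_CODE = '''
class AutoGeneratedRewardCalculator:
    def calculate_reward(self, old_state, new_state):
        # Reward the change in portfolio value between ticks.
        return new_state.get("total_value", 0.0) - old_state.get("total_value", 0.0)
'''

SAFE_BUILTINS = {
    "__build_class__": __build_class__,  # required for `class` statements
    "abs": abs, "min": min, "max": max, "float": float,
}

def load_generated_providers(code):
    """Execute generated code in a restricted namespace; return defined classes."""
    namespace = {"__builtins__": SAFE_BUILTINS, "__name__": "generated_providers"}
    exec(code, namespace)
    return {name: obj for name, obj in namespace.items() if isinstance(obj, type)}

providers = load_generated_providers(GENERATED_CODE)
calculator = providers["AutoGeneratedRewardCalculator"]()
reward = calculator.calculate_reward({"total_value": 10.0}, {"total_value": 12.5})
# reward == 2.5
```

Returning the classes as a name-to-class dict lets the injection step look up each provider by the name the interface contract expects.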
3. Implementation Plan
This is a major feature and should be rolled out in phases.
Phase 1: Core System and Dynamic Execution
- Create the `ProviderGenerationSystem` in the `agent-engine`.
- Implement the introspection logic to scan an agent's components.
- Implement the meta-prompt construction logic.
- Implement a safe wrapper around `exec()` to execute the LLM's code and retrieve the newly defined classes.
Phase 2: Dependency Injection Mechanism
- Modify the `SystemManager` to support late-stage dependency injection. The manager needs to be able to hold off on fully initializing systems like `QLearningSystem` until after the `ProviderGenerationSystem` has run and created the necessary providers.
- The `ProviderGenerationSystem` will need a way to pass the newly created provider instances to the `SystemManager`, which will then complete the initialization of the other systems.
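One way to structure the deferred initialization is a two-phase registration: systems declare which providers they still need, and the manager completes them once providers arrive. All class and method names below are illustrative stand-ins, not ARLA's actual API:

```python
class QLearningSystem:
    def __init__(self):
        self.reward_calculator = None  # filled in after provider generation

    def late_init(self, reward_calculator):
        self.reward_calculator = reward_calculator

class SystemManager:
    def __init__(self):
        self._waiting = []     # (system, names of providers it still needs)
        self._providers = {}

    def register_deferred(self, system, required):
        """Register a system whose full initialization must wait for providers."""
        self._waiting.append((system, required))

    def supply_providers(self, providers):
        """Called by the provider-generation step; completes pending systems."""
        self._providers.update(providers)
        still_waiting = []
        for system, required in self._waiting:
            if all(name in self._providers for name in required):
                system.late_init(**{name: self._providers[name] for name in required})
            else:
                still_waiting.append((system, required))
        self._waiting = still_waiting

manager = SystemManager()
q_system = QLearningSystem()
manager.register_deferred(q_system, ["reward_calculator"])
manager.supply_providers({"reward_calculator": object()})
# q_system.reward_calculator is now set and the waiting list is empty
```

An alternative design is to pass lazy provider handles at construction time, but explicit two-phase init keeps the "not yet ready" state visible and testable.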
Phase 3: Simplified Simulation Entry Point
- Refactor the `simulations/soul_sim/run.py` file to remove the manual instantiation of all the `SoulSim...` providers.
- Instead, the `run.py` file will now only need to register the `ProviderGenerationSystem` alongside the other core systems.
Phase 4: Safety, Validation, and Testing
- Implement guardrails for the `exec()` call to minimize security risks. The executed code should run in a restricted environment with no file system or network access.
- Before injecting a generated provider, the `ProviderGenerationSystem` should validate it with `isinstance()` to ensure it correctly implements the required interface.
- Write unit tests for the `ProviderGenerationSystem` itself, likely using a mock LLM that returns a pre-written, valid provider implementation as a string.
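The validation and mock-LLM testing ideas above can be combined in one sketch. The interface here is a stand-in for the real `RewardCalculatorInterface` in `agent-engine`:

```python
from abc import ABC, abstractmethod

class RewardCalculatorInterface(ABC):
    @abstractmethod
    def calculate_reward(self, old_state, new_state): ...

# What a mock LLM returns in a unit test: pre-written, valid provider code.
MOCK_LLM_RESPONSE = '''
class AutoGeneratedRewardCalculator(RewardCalculatorInterface):
    def calculate_reward(self, old_state, new_state):
        return new_state["total_value"] - old_state["total_value"]
'''

def validate_and_instantiate(code, interface):
    """Execute generated code; return an instance that passes isinstance()."""
    namespace = {interface.__name__: interface}
    exec(code, namespace)
    for obj in list(namespace.values()):
        if isinstance(obj, type) and obj is not interface and issubclass(obj, interface):
            instance = obj()  # raises TypeError if abstract methods are missing
            if isinstance(instance, interface):
                return instance
    raise TypeError("Generated code did not implement the required interface")

calculator = validate_and_instantiate(MOCK_LLM_RESPONSE, RewardCalculatorInterface)
```

Because `ABC` refuses to instantiate classes with unimplemented abstract methods, the `obj()` call doubles as a structural check before the `isinstance()` validation the epic calls for.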
4. Definition of Done
- A new `ProviderGenerationSystem` exists in the `agent-engine`.
- A simulation can be successfully launched by registering only this meta-system (and the other core systems), without any simulation-specific providers being manually instantiated in `run.py`.
- An agent, when placed in the existing `soul-sim` environment, can dynamically generate a functional set of providers that allow it to learn and act in the world.
- The process is robust, with clear error handling for when the LLM produces invalid or non-compliant code.
- The new architecture is documented, explaining the workflow and the new, simplified process for creating new simulation environments.