
AEC Turn Environment - Heterogeneous agents #3215

@Moritz-Link

Description

I'm trying to train a turn-based AEC (Agent-Environment-Cycle) environment from PettingZoo with two heterogeneous agents, where the second agent's observation includes the first agent's action. However, my agents are not learning, and I suspect the cause is how I handle the state transitions and the advantage computation.
Environment Setup

Environment Type: PettingZoo AEC (turn-based)
Number of Agents: 2 (heterogeneous)
Agent Dependencies: Agent 2's observation includes Agent 1's action (see the sketch after this list)
Reward Structure: Sparse reward only given after both agents have acted
Framework: TorchRL with PPO
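
For concreteness, here is a stripped-down sketch of such an environment against the PettingZoo AEC API. The state size, the action spaces, the one-hot encoding, and the `_transition`/`_joint_reward` helpers are placeholder stand-ins, not my real code:

```python
import functools

import numpy as np
from gymnasium import spaces
from pettingzoo import AECEnv
from pettingzoo.utils import agent_selector


class TwoStepTurnEnv(AECEnv):
    """agent_2 observes agent_1's buffered action; the shared state
    only advances after agent_2 acts. Termination logic omitted."""

    metadata = {"name": "two_step_turn_v0"}

    def __init__(self):
        super().__init__()
        self.possible_agents = ["agent_1", "agent_2"]

    @functools.lru_cache(maxsize=None)
    def observation_space(self, agent):
        # agent_2 additionally sees agent_1's one-hot encoded action
        dim = 8 if agent == "agent_1" else 8 + 4
        return spaces.Box(-np.inf, np.inf, shape=(dim,), dtype=np.float32)

    @functools.lru_cache(maxsize=None)
    def action_space(self, agent):
        return spaces.Discrete(4 if agent == "agent_1" else 2)

    def reset(self, seed=None, options=None):
        self.agents = self.possible_agents[:]
        self._state = np.zeros(8, dtype=np.float32)      # S0
        self._a1_onehot = np.zeros(4, dtype=np.float32)  # buffered action
        self.rewards = {a: 0.0 for a in self.agents}
        self._cumulative_rewards = {a: 0.0 for a in self.agents}
        self.terminations = {a: False for a in self.agents}
        self.truncations = {a: False for a in self.agents}
        self.infos = {a: {} for a in self.agents}
        self._selector = agent_selector(self.agents)
        self.agent_selection = self._selector.next()

    def observe(self, agent):
        if agent == "agent_1":
            return self._state
        return np.concatenate([self._state, self._a1_onehot])

    def step(self, action):
        agent = self.agent_selection
        self.rewards = {a: 0.0 for a in self.agents}
        if agent == "agent_1":
            # Buffer the action; the shared state does NOT change here,
            # which is why agent_1 sees next_state == current_state.
            self._a1_onehot = np.eye(4, dtype=np.float32)[action]
        else:
            # Both actions are known: advance the state and pay the
            # sparse joint reward.
            self._state = self._transition(self._a1_onehot, action)
            r = self._joint_reward(self._state)
            self.rewards = {a: r for a in self.agents}
        self._accumulate_rewards()
        self.agent_selection = self._selector.next()

    def _transition(self, a1_onehot, a2_action):
        # Placeholder dynamics: any function of both actions works here.
        new_state = self._state.copy()
        new_state[:4] += a1_onehot
        new_state[4] += float(a2_action)
        return new_state

    def _joint_reward(self, state):
        # Placeholder sparse reward.
        return float(state.sum() > 2.0)
```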

Problem
During sample collection with the TorchRL collector, each sample contains data for both agents, but only one agent is active at a time (tracked via an agent_mask). I filter each agent's samples out of the collector's output dict using that mask, roughly as in the sketch below.
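
The filtering, assuming the batch is a TensorDict and an illustrative key layout ("agents", "agent_mask") that is not a fixed TorchRL convention:

```python
from tensordict import TensorDict


def split_by_agent(batch: TensorDict, agent_names=("agent_1", "agent_2")):
    # Assumes batch["agents", "agent_mask"] is a bool tensor of shape
    # [T, n_agents] marking which agent was active at each step.
    per_agent = {}
    for i, name in enumerate(agent_names):
        mask = batch["agents", "agent_mask"][..., i]  # [T] bool
        per_agent[name] = batch[mask]  # boolean indexing over the time dim
    return per_agent
```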
The key issue is that the environment state only updates after the second agent acts, which means:

Agent 1 acts: current_state = S0 → next_state = S0 (unchanged!)
Agent 2 acts: current_state = S0 → next_state = S1 (finally updated)

This creates problems for Agent 1's advantage computation: since next_state == current_state, V(next_state) = V(S0), so the TD error r + γ·V(next_state) − V(S0) collapses to r + (γ − 1)·V(S0); and because the sparse reward only arrives after Agent 2's turn, r is typically 0 here as well, leaving no learning signal for Agent 1's step.
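
A sketch of the kind of fix I have in mind (unverified, and not something I found in TorchRL's docs): collapse the trace into one trajectory per agent, take the observation at the agent's next own turn as next_state (S1 for Agent 1, after Agent 2 has acted), credit the joint reward back to Agent 1's turn, and run GAE over that per-agent chain. `value_net` and all shapes are illustrative:

```python
import torch


def gae_over_own_turns(obs, next_turn_obs, rewards, dones, value_net,
                       gamma=0.99, lmbda=0.95):
    # obs, next_turn_obs: [T, obs_dim] for ONE agent's turns only;
    # next_turn_obs[t] is what this agent observes on its next turn,
    # not the unchanged state the AEC env reports after its action.
    # rewards: [T] sparse joint reward re-credited to this agent's turns.
    with torch.no_grad():
        v = value_net(obs).squeeze(-1)                 # V(s_t)
        v_next = value_net(next_turn_obs).squeeze(-1)  # V(s_{t+1})
    advantages = torch.zeros_like(rewards)
    gae = torch.zeros(())
    for t in reversed(range(rewards.shape[0])):
        not_done = 1.0 - dones[t].float()
        delta = rewards[t] + gamma * v_next[t] * not_done - v[t]
        gae = delta + gamma * lmbda * not_done * gae
        advantages[t] = gae
    return advantages, advantages + v  # advantages and value targets
```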

Is there an example, or other documentation, on how to train in such a setting, and on what to mask or adapt compared to a normal ParallelEnv?
