Description
I'm trying to train two heterogeneous agents in a turn-based AEC (Agent Environment Cycle) environment from PettingZoo, where the second agent's observation includes the first agent's action. However, my agents are not learning, and I suspect it is related to how I handle the state transitions and the advantage computation.
Environment Setup
- Environment Type: PettingZoo AEC (turn-based)
- Number of Agents: 2 (heterogeneous)
- Agent Dependencies: Agent 2's observation includes Agent 1's action
- Reward Structure: sparse reward, only given after both agents have acted
- Framework: TorchRL with PPO (a rough wrapping sketch follows this list)
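A minimal sketch of how such an AEC environment can be wrapped, assuming TorchRL's PettingZooWrapper and its use_mask option (the tic-tac-toe environment is just a placeholder for a turn-based AEC env; treat the exact flag names as assumptions for your TorchRL version):

```python
from pettingzoo.classic import tictactoe_v3  # placeholder turn-based AEC env
from torchrl.envs.libs.pettingzoo import PettingZooWrapper

# Wrap the AEC environment. With use_mask=True the wrapper is documented to add
# a per-agent "mask" entry marking which agent is acting at each step, which is
# what I rely on for filtering below.
base_env = tictactoe_v3.env()
env = PettingZooWrapper(env=base_env, use_mask=True)

td = env.reset()
print(td)  # grouped per-agent observations plus the acting-agent mask
```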
Problem
During sample collection with the TorchRL collector, each sample contains entries for both agents, but only one agent is active at a time (tracked via an agent_mask). I filter the collector's output TensorDict by this mask to get per-agent batches. The key issue is:
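For illustration, this is roughly the kind of per-agent filtering meant here, using boolean indexing on a TensorDict (shapes and key names are made up, not my actual spec):

```python
import torch
from tensordict import TensorDict

# Toy rollout of 8 turns where the two agents alternate; keys are placeholders.
rollout = TensorDict(
    {
        "observation": torch.randn(8, 4),
        "action": torch.randint(0, 2, (8,)),
        "agent_mask": torch.tensor([1, 0, 1, 0, 1, 0, 1, 0], dtype=torch.bool),
        ("next", "reward"): torch.zeros(8, 1),
    },
    batch_size=[8],
)

# Boolean indexing over the batch dimension splits the rollout per agent.
agent1_batch = rollout[rollout["agent_mask"]]   # turns where agent 1 acted
agent2_batch = rollout[~rollout["agent_mask"]]  # turns where agent 2 acted
print(agent1_batch.batch_size, agent2_batch.batch_size)  # torch.Size([4]) each
```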
The environment state only updates after the second agent acts, which means:
- Agent 1 acts: current_state = S0 → next_state = S0 (unchanged!)
- Agent 2 acts: current_state = S0 → next_state = S1 (finally updated)
This breaks Agent 1's advantage computation: because current_state == next_state, V(next_state) = V(S0), and since the sparse reward is still zero at this step, the one-step TD error reduces to 0 + gamma * V(S0) - V(S0) = (gamma - 1) * V(S0), which carries no information about Agent 1's action.
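A toy numerical check of that collapse (arbitrary values, just to make the point concrete):

```python
import torch

gamma = 0.99
reward = torch.tensor(0.0)   # sparse reward: nothing yet after agent 1's turn
v_s0 = torch.tensor(0.37)    # arbitrary value estimate V(S0)
v_next = v_s0                # next_state == current_state, so V(next) == V(S0)

# td_error = r + gamma * V(next) - V(S0) = (gamma - 1) * V(S0)
td_error = reward + gamma * v_next - v_s0
print(td_error)  # tensor(-0.0037): independent of whatever agent 1 did
```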
Is there an example or other documentation on how to train in such a setting, and what should be masked or adapted compared to a normal ParallelEnv?