
Feature - Experiment - Implement Complex Social Rewards to Study AI Alignment #45

@bordumb

Description

Problem Description

Our latest A/B test, "Sugarscape - Full Cognitive Comparison," produced a critical finding: while there were statistically significant differences in agent survival, there was no significant difference in social behaviors (attacking, sharing, reproducing) across the different cognitive architectures.

The analysis of total_attacks and total_shares yielded p-values of 0.1696 and 0.0675 respectively (both above the conventional 0.05 threshold), indicating that, statistically, the agent strategies were socially indistinguishable.

Root Cause: The current implementation of SugarscapeRewardCalculator only provides rewards for harvesting sugar. There are no explicit incentives or penalties for social actions. As a result, the learning agents have no feedback signal to optimize their social strategies, and their behavior in this domain defaults to random exploration.

Proposed Solution

To properly test hypotheses related to AI alignment and emergent social dynamics, we must introduce a richer incentive structure that creates a social dilemma for the agents.

We need to update the SugarscapeRewardCalculator in simulations/sugarscape_sim/providers.py to provide explicit rewards for social actions.

Implementation Details

  1. Modify SugarscapeRewardCalculator: The calculate_final_reward method should be updated to check the action_type.action_id.

  2. Attack Reward: For a successful attack action, the reward should be a significant bonus, likely proportional to the stolen_energy value found in the outcome_details dictionary. This makes aggression a viable, high-risk/high-reward strategy.

  3. Share Reward: For a share action, provide a small, fixed positive reward. This incentivizes pro-social, cooperative behavior.

  4. Reproduce Reward: For a successful reproduce action, provide a large positive reward, reflecting its biological imperative and making it a desirable long-term goal.

  5. Reward Breakdown: The reward_breakdown dictionary returned by the function should be updated to include these new reward components for clear logging and analysis.
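A minimal sketch of what steps 1–5 could look like, shown as a standalone function for clarity. The names `action_id`, `outcome_details`, `stolen_energy`, and `reward_breakdown` come from this issue; the reward constants, the `success` flag, and the function signature are illustrative assumptions, not the actual `SugarscapeRewardCalculator` API:

```python
# Hypothetical sketch of the proposed calculate_final_reward logic.
# Constants are placeholder values to be tuned experimentally.
ATTACK_REWARD_MULTIPLIER = 2.0   # assumed: bonus proportional to stolen energy
SHARE_REWARD = 1.0               # assumed: small fixed pro-social bonus
REPRODUCE_REWARD = 10.0          # assumed: large bonus for successful reproduction

def calculate_final_reward(action_id, outcome_details, base_reward=0.0):
    """Return (total_reward, reward_breakdown) for one resolved action."""
    breakdown = {"base": base_reward}

    if action_id == "attack" and outcome_details.get("success"):
        # Aggression pays off in proportion to the energy stolen (step 2).
        stolen = outcome_details.get("stolen_energy", 0.0)
        breakdown["attack_bonus"] = ATTACK_REWARD_MULTIPLIER * stolen
    elif action_id == "share":
        # Small fixed incentive for cooperation (step 3).
        breakdown["share_bonus"] = SHARE_REWARD
    elif action_id == "reproduce" and outcome_details.get("success"):
        # Large long-term incentive (step 4).
        breakdown["reproduce_bonus"] = REPRODUCE_REWARD

    # The breakdown is returned alongside the total for logging (step 5).
    return sum(breakdown.values()), breakdown
```

Keeping every component in the breakdown dictionary (rather than summing inline) is what makes per-action reward analysis possible in later experiment runs.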

Acceptance Criteria

  • The calculate_final_reward method in simulations/sugarscape_sim/providers.py is updated with the new reward logic for attack, share, and reproduce.
  • A new experiment run using the updated reward calculator shows statistically significant differences in the total_attacks and total_shares metrics between the different agent groups.
  • The learning agents (especially Q-Learning and LLM-based agents) should demonstrate clear adaptation to the new incentive structure, developing either pro-social (sharing) or anti-social (attacking) strategies.
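To verify the second criterion, the per-group `total_attacks` and `total_shares` samples need a significance test. As an illustrative sketch (not code from this repo), a seeded two-group permutation test on the difference in means avoids distributional assumptions; the function name and arguments are hypothetical:

```python
import random

def permutation_test_pvalue(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns an estimated p-value for the null hypothesis that the two
    groups (e.g. total_attacks per agent, by cognitive architecture)
    are drawn from the same distribution.
    """
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the statistic.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a
                   - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed:
            extreme += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (extreme + 1) / (n_perm + 1)
```

With the new reward structure in place, the criterion is met when this p-value drops below 0.05 for both metrics across agent groups.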
