104 changes: 104 additions & 0 deletions environments/game_environments/universal_paperclips/README.md
# Universal Paperclips LLM Agent

An LLM agent that plays the [Universal Paperclips](https://www.decisionproblem.com/paperclips/index2.html) incremental game. Stage 1 of the game is fully supported; stages 2 and 3 are still work in progress.

## Architecture

The core architecture is shown below.

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ PaperclipsAtroposEnv β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ setup() / teardown() β”‚ β”‚
β”‚ β”‚ Manages shared Playwright Browser β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β–Ό β–Ό β–Ό β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ EpisodeContext β”‚ β”‚ EpisodeContext β”‚ β”‚ EpisodeContext β”‚ ... β”‚
β”‚ β”‚ (Episode 1) β”‚ β”‚ (Episode 2) β”‚ β”‚ (Episode 3) β”‚ β”‚
β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚
β”‚ β”‚ - Own context β”‚ β”‚ - Own context β”‚ β”‚ - Own context β”‚ β”‚
β”‚  β”‚ - Isolated State β”‚  β”‚ - Isolated State β”‚  β”‚ - Isolated State β”‚      β”‚
β”‚ β”‚ - Fresh game β”‚ β”‚ - Fresh game β”‚ β”‚ - Fresh game β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ β”‚ β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β–Ό β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ collect_trajectories() (parallel) β”‚ β”‚
β”‚ β”‚ Communicates with LLM & saves JSONL β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Universal Paperclips Game β”‚
β”‚ (https://decisionproblem.com/paperclips) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
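The `collect_trajectories()` fan-out over isolated episodes can be sketched with plain `asyncio`. The names below (`EpisodeContext`, `run_episode`) mirror the diagram but are illustrative stand-ins, not the actual implementation, which drives a real Playwright page per episode:

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class EpisodeContext:
    """Illustrative stand-in: each episode owns its own isolated state."""
    episode_id: int
    clips: int = 0
    history: list = field(default_factory=list)


async def run_episode(ctx: EpisodeContext, steps: int) -> EpisodeContext:
    """Simulate one episode; the real loop would act on a browser page."""
    for step in range(steps):
        ctx.clips += 1          # placeholder for one game action
        ctx.history.append(step)
        await asyncio.sleep(0)  # yield control, as real browser I/O would
    return ctx


async def collect_trajectories(num_episodes: int, steps: int):
    """Run all episodes concurrently; no state is shared between them."""
    contexts = [EpisodeContext(i) for i in range(num_episodes)]
    return await asyncio.gather(*(run_episode(c, steps) for c in contexts))


results = asyncio.run(collect_trajectories(3, 5))
```

Because each `EpisodeContext` owns its own state (and, in the real environment, its own browser context), the parallel runs cannot contaminate each other.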

## File structure

| File | Description |
|------|-------------|
| `universal_paperclips_env.py` | Main Atropos environment class (`PaperclipsAtroposEnv`) and `EpisodeContext` for browser management. |
| `config.py` | Configuration classes (`PaperclipsEnvConfig`) and system prompts for the LLM agent. |
| `js_scripts/` | Local copies of the original game source files, kept for reference only. |

## Installation

Additional requirements are needed to run this environment; install them with:

```bash
pip install -r environments/game_environments/universal_paperclips/requirements.txt
```

## Actions
The agent is given the list of actions available at the current point in the game, not all of which are necessarily affordable given current resources. This is intentional, so that the agent also learns not to waste steps clicking unaffordable actions. All active projects also form part of this list and, as with other actions, not all of them are unlockable.

The agent can perform these actions in Stage 1 (Human Stage):

| Category | Action | Description |
|----------|--------|-------------|
| **Core** | `make_paperclip` | Manually produce one paperclip |
| | `wait` | Do nothing for this step |
| **Manufacturing** | `buy_wire` | Purchase a spool of wire |
| | `buy_autoclipper` | Buy an automated clipping machine |
| | `buy_megaclipper` | Buy a high-speed clipping machine |
| | `toggle_wirebuyer` | Turn the automatic wire buyer ON/OFF |
| **Business** | `lower_price` | Decrease price per clip |
| | `raise_price` | Increase price per clip |
| | `expand_marketing` | Increase marketing level |
| **Computational** | `add_processor` | Allocate trust to processors (Operations/sec) |
| | `add_memory` | Allocate trust to memory (Max Operations) |
| **Investments** | `deposit_funds` | Move funds to the investment engine |
| | `withdraw_funds` | Withdraw cash from investments |
| | `improve_investments` | Upgrade investment engine (costs Yomi) |
| | `set_investment_risk_*` | Set risk to `low`, `med`, or `high` |
| **Tournaments** | `new_tournament` | Start a new strategic modeling tournament |
| | `run_tournament` | Run tournament rounds |
| | `select_strategy_*` | Select strategy for tournaments |
| | `toggle_autotourney` | Turn automatic tournaments ON/OFF |
| **Projects** | `project_*` | Unlock and apply various projects (e.g., `project_projectButton1`) to boost clip production |
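Since the agent must respond with exactly one action name from this list (and an unavailable choice wastes the turn), response validation could look like the following minimal sketch. The helper name and normalization rules are assumptions for illustration, not the environment's actual code:

```python
def parse_action(response: str, available_actions: list[str]) -> str:
    """Normalize an LLM reply and check it against the available actions.

    Falls back to 'wait' when the reply is not a recognized action, so a
    malformed response still consumes the step instead of crashing.
    """
    action = response.strip().strip("'\"`").lower()
    return action if action in available_actions else "wait"


available = ["make_paperclip", "buy_wire", "wait"]
```

For example, `parse_action(" 'buy_wire' ", available)` yields `buy_wire`, while an unknown reply degrades to `wait`.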

## Game State Observation

The `GameState` provided to the LLM includes the following metrics:
- **Business**: Price, demand, marketing level/cost, Clips, funds, wire, unsold inventory
- **Manufacturing**: AutoClippers/MegaClippers count, costs and boost levels, clips per second, wire cost, wire buyer toggle
- **Computational**: Trust, processors, memory, operations, max ops, creativity, clips needed to gain next trust
- **Investments**: Cash, stocks, risk level
- **Strategic Modeling**: Yomi, current strategy
- **Flags**: Flags for wirebuyer, autotournament, creativity, investments, etc.
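A hedged sketch of what the `GameState` container and its prompt rendering could look like, covering a subset of the categories above. Field names and formatting are illustrative; the real class lives in `universal_paperclips_env.py`:

```python
from dataclasses import dataclass


@dataclass
class GameState:
    """Illustrative subset of the observation fields listed above."""
    clips: int = 0
    funds: float = 0.0
    wire: int = 1000
    price: float = 0.25
    demand: float = 0.0
    autoclippers: int = 0
    trust: int = 2
    processors: int = 1
    memory: int = 1

    def to_prompt(self) -> str:
        """Render the state as a plain-text block for the LLM prompt."""
        return (
            f"Clips: {self.clips}\n"
            f"Funds: ${self.funds:.2f} | Price: ${self.price:.2f} | "
            f"Demand: {self.demand:.0f}%\n"
            f"Wire: {self.wire} | AutoClippers: {self.autoclippers}\n"
            f"Trust: {self.trust} "
            f"(Processors: {self.processors}, Memory: {self.memory})"
        )


state_text = GameState(clips=120, funds=5.5, demand=32.0).to_prompt()
```

The rendered text is what `get_action_prompt` in `config.py` interpolates as `state_text`.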


## Notes and Future Improvements

- **Stage 1 Implementation**: Only the first stage (human stage) is fully implemented. Stages 2 and 3 are currently WIP.
- **Browser Isolation**: Each episode uses a unique browser context to prevent `localStorage` contamination between parallel runs.
- **Reward Function**: Rewards are based on the log-increase in paperclips produced, which encourages exponential growth. However, this is a poor signal for many of the actions the agent should take in order to maximize clip production, and needs to be revised.
- **Lag**: There is a lag between when the agent selects an action and when it is executed in the game, so the state the agent observed can differ from the state at execution time, especially when auto/mega clippers are ON. One way around this could be to model the lag itself as part of agent inference.
- **Quantum Computing**: This is also part of stage 1 in the game and is primarily useful for gaining some extra ops. It isn't part of the Atropos implementation yet, mainly because the photonic chips needed to leverage it change their darkness very quickly, so executing this action would be heavily affected by the lag problem discussed above. It is also mainly beneficial when clicks are very fast (faster than the latency of current models), so a better way to accommodate this action in Atropos is still needed.
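The log-increase reward noted above can be written as a small function. This sketch assumes the formula `log(clips_t + eps) - log(clips_{t-1} + eps)`, using the `reward_eps` default from `config.py`; it matches the description but may differ from the exact implementation:

```python
import math


def log_increase_reward(prev_clips: float, curr_clips: float,
                        eps: float = 1e-8) -> float:
    """Reward the relative (multiplicative) growth in clip count.

    Doubling production yields roughly the same reward whether the count
    goes 10 -> 20 or 10_000 -> 20_000, which is what makes exponential
    growth the optimal policy under this signal.
    """
    return math.log(curr_clips + eps) - math.log(prev_clips + eps)


small = log_increase_reward(10, 20)        # ~log(2)
large = log_increase_reward(10_000, 20_000)  # also ~log(2)
```

The scale-invariance is also the weakness flagged above: actions with delayed payoff (e.g. buying trust upgrades) produce no immediate clip growth and thus no reward.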
31 changes: 31 additions & 0 deletions environments/game_environments/universal_paperclips/__init__.py
"""
Universal Paperclips Atropos Environment

This package provides an Atropos-compatible environment for training RL agents
to play the Universal Paperclips incremental game.

Key components:
- PaperclipsAtroposEnv: Main environment class compatible with Atropos BaseEnv
- PaperclipsEnvConfig: Configuration for the environment
- EpisodeContext: Manages isolated browser contexts for parallel episodes
"""

from .config import (
PAPERCLIPS_SYSTEM_PROMPT,
PaperclipsEnvConfig,
get_action_prompt,
)
from .universal_paperclips_env import (
EpisodeContext,
GameState,
PaperclipsAtroposEnv,
)

__all__ = [
"PaperclipsAtroposEnv",
"PaperclipsEnvConfig",
"EpisodeContext",
"GameState",
"PAPERCLIPS_SYSTEM_PROMPT",
"get_action_prompt",
]
115 changes: 115 additions & 0 deletions environments/game_environments/universal_paperclips/config.py
"""
Configuration classes for the Universal Paperclips Atropos environment.
"""

from typing import Optional

from pydantic import Field

from atroposlib.envs.base import BaseEnvConfig


class PaperclipsEnvConfig(BaseEnvConfig):
"""
Configuration for the Universal Paperclips environment.
"""

headless: bool = Field(
default=True, description="Run browser in headless mode (no visible window)"
)

max_steps_per_episode: int = Field(
default=50, description="Maximum steps per episode before truncation"
)
ticks_per_step: int = Field(
default=5,
description="""Number of ticks to wait before fetching a new state.
This is primarily used so that the game has room
to undergo sufficient change
before the agent takes a (meaningful) new step.
""",
)
target_clips: Optional[int] = Field(
default=None,
description="Target clip count to end episode (None = no target, relies on max_steps)",
)

reward_eps: Optional[float] = Field(
default=1e-8, description="Epsilon used in reward calculation"
)

# currently we take only the current state & available actions as context
# max_context_turns: int = Field(
# default=1,
# description="Maximum conversation turns to keep in context"
# )

game_url: str = Field(
default="https://www.decisionproblem.com/paperclips/index2.html",
description="URL of the Universal Paperclips game",
)

num_eval_episodes: int = Field(
default=1, description="Number of episodes to run during evaluation"
)

trajectory_output_dir: Optional[str] = Field(
default=None,
description="Directory to save trajectory JSONL files. If None, trajectories are not saved locally.",
)


# System prompt for the LLM agent
PAPERCLIPS_SYSTEM_PROMPT = """You are an AI agent playing the Universal Paperclips game,
an incremental game where your goal is to maximize paperclip production.

GAME OVERVIEW:
- You start as a simple AI making paperclips manually
- Earn money by selling paperclips
- Use money to buy upgrades that help you make and
sell more paperclips (eg. autoclippers, marketing level upgrades, wire etc.)
- Build up computational resources (processors, memory, operations) as they help you unlock some projects.
- Complete projects to unlock new capabilities like earning yomi, investing money, launching drones etc.
- Your ultimate goal: produce as many paperclips as possible

KEY STRATEGIES:
1. Early game: Focus on manual clipping and buying autoclippers,
megaclippers and projects that increase their efficiency
2. Keep wire spools stocked - you cannot make clips without wire!
3. Balance price to ensure unsold inventory isn't piling up while still making profit
4. Invest in marketing to increase demand
5. Use Trust to balance processors (faster operations/creativity) and memory (more operations capacity)
6. Activate projects only when you can afford them

IMPORTANT RULES:
- You must select exactly ONE action from the available actions list
- Unavailable actions cannot be executed - choosing them wastes your turn
- Respond with ONLY the action name, nothing else
- Focus on maximizing total paperclip production"""


def get_action_prompt(state_text: str, actions_text: str) -> str:
"""
Create the user prompt containing the current state and available actions.

Args:
state_text: current game state
actions_text: all available actions regardless of affordability

Returns:
user prompt for the agent!
"""
return f"""=== Current Game State ===
{state_text}

=== Available Actions ===
{actions_text}

Based on the current game state and available actions, select the BEST single action to take.
Consider:
- Do you have enough wire to keep producing?
- Can you afford any upgrades that would boost production?
- Are there affordable projects that unlock new capabilities?
- Is your pricing optimized for current demand?

Respond with ONLY the action name (e.g., 'buy_wire' or 'make_paperclip')."""