This guide provides a comprehensive reference for selecting and configuring the execution modes in the ACS2 project. It covers all command-line parameters for the unified entry point:
`acs2.py` (Unified CPU/GPU Framework)
It is important to distinguish between how the experiments are distributed (Mode) and where the actual math is performed (Device).
These flags control the architectural backend. They decide which strategy manages the experiment phases:
- `gpu`: Uses Batched Execution. It takes all `n_exp` experiments and packs them into huge tensors. All experiments are updated simultaneously in a single loop. Highly efficient for large populations and simple logic.
- `cpu_mp`: Uses Parallel Execution. It starts multiple independent Python processes (one per CPU core). Each process runs one experiment at a time using standard object-oriented logic. Best for complex logic that is hard to vectorize.
- `cpu_single`: Uses the same CPU object-oriented logic in a single process. Best for debugging and deterministic step-by-step inspection.
- Note: `acs2.py` can now run fully on one backend or use a hybrid handoff between backends.
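The difference between the batched and object-oriented strategies can be sketched as follows. This is a minimal illustration with a toy delta-rule update; the function names and the update formula are assumptions for demonstration, not the project's actual API.

```python
import numpy as np

def update_batched(qualities: np.ndarray, rewards: np.ndarray, beta: float) -> np.ndarray:
    # GPU-style strategy: all n_exp experiments packed into one array,
    # advanced simultaneously in a single vectorized operation.
    return qualities + beta * (rewards - qualities)

def update_object_oriented(qualities: list, rewards: list, beta: float) -> list:
    # CPU-style strategy: each experiment is advanced one at a time
    # with ordinary Python control flow.
    updated = []
    for q, r in zip(qualities, rewards):
        updated.append(q + beta * (r - q))
    return updated
```

Both produce identical numbers; the batched form simply amortizes the work across all experiments at once, which is what makes it efficient on tensor hardware.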
This flag is specific to the GPU backend. Once you choose the `gpu` backend, `device` tells PyTorch which physical hardware to use for those tensor calculations:
- `cuda`: Tensors are sent to your NVIDIA graphics card. This provides maximum speed.
- `cpu`: Tensors stay in RAM and are processed by your CPU's instruction sets (AVX/SSE).
- Note: `mode: gpu` with `device: cpu` is often slower than `mode: cpu_mp`, because you get the overhead of tensors without the benefit of multi-core process parallelization.
- Note: `auto` automatically detects whether a CUDA-capable GPU is available.
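The `auto` setting can be resolved with a small helper like the one below. This is an illustrative sketch, not the actual helper in `acs2.py`; the fallback-to-CPU behavior when no CUDA device (or no PyTorch install) is present is an assumption.

```python
def resolve_device(requested: str = "auto") -> str:
    """Map the --device flag to a concrete PyTorch device string."""
    if requested != "auto":
        # Explicit choice: pass it through unchanged.
        return requested
    try:
        import torch
        # Use the GPU only when a CUDA-capable device is actually present.
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        # No torch available at all: tensors can only live on the CPU.
        return "cpu"
```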
| Backend Mode | Device Setting | Resulting Behavior |
|---|---|---|
| `gpu` | `cuda` | Tensor-batched math on the graphics card (fastest for discovery). |
| `gpu` | `cpu` | Tensor-batched math on the CPU (testing GPU logic without a card). |
| `cpu_mp` | (ignored) | Multi-core parallel processes (fastest for CPU-based logic). |
| `cpu_single` | (ignored) | One process, one core (best for debugging). |
| Command Pattern | Meaning |
|---|---|
| `--explore_mode cpu_single --exploit_mode cpu_single` | Full single-core CPU run. |
| `--explore_mode cpu_mp --exploit_mode cpu_mp` | Full multiprocessing CPU run. |
| `--explore_mode gpu --exploit_mode gpu` | Full GPU run across all phases. |
| `--explore_mode gpu --exploit_mode cpu_mp` | Hybrid run: GPU explore, CPU multiprocessing exploit. |
| `--explore_mode <mode>` | If `--exploit_mode` is omitted, `exploit1` and `exploit2` default to `cpu_single`. |
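The defaulting rule in the last row can be expressed with `argparse`. The sketch below covers only the three backend flags and assumes the real parser in `acs2.py` works the same way; the actual entry point accepts many more arguments.

```python
import argparse

def build_mode_parser() -> argparse.ArgumentParser:
    # Illustrative subset of the CLI: only the backend-selection flags.
    parser = argparse.ArgumentParser()
    modes = ["cpu_single", "cpu_mp", "gpu"]
    parser.add_argument("--explore_mode", choices=modes, default="cpu_single")
    # When --exploit_mode is omitted, exploit1/exploit2 fall back to cpu_single.
    parser.add_argument("--exploit_mode", choices=modes, default="cpu_single")
    parser.add_argument("--device", choices=["auto", "cpu", "cuda"], default="auto")
    return parser
```

For example, `--explore_mode gpu` alone yields a hybrid run with `exploit_mode` left at `cpu_single`.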
For the system to function correctly, files and results must be organized as follows:
acs2_project/
├── acs2.py # Unified entry point
├── experiments/
│ └── configs/ # YAML configuration files (.yaml)
├── reports/ # Generated PNG dashboards
│ └── saved_states/ # Raw data (CSV, NPY, Metadata YAML)
├── src/ # Core logic
│ ├── hybrid_utils.py # Merging and translation logic
│ ├── universal_runner.py # Phase dispatcher and handoff
│ ├── models/ # ACS2 Agent implementations (CPU/GPU)
│ ├── config*.py # Configuration models
│   ├── experiment_runner*.py # Hardware-specific loop runners
│ ├── data_handler*.py # Saving and Loading mechanisms
│ └── visualization*.py # Plotting and dashboard logic
├── environment/ # Environment definitions (Mazes, Multiplexers)
└── tests/ # Verification tests
If not specified in a YAML config or via CLI, the following defaults are used:
| Parameter | Default Value | Description |
|---|---|---|
| `n_exp` | 5 | Number of averaged experiment runs. |
| `n_steps` | 100 | Max steps per episode. |
| `beta` | 0.05 | Global learning rate. |
| `gamma` | 0.95 | Discount factor. |
| `theta_i` | 0.1 | Inadequacy threshold. |
| `theta_r` | 0.9 | Reliability threshold. |
| `epsilon` | 0.1 | Global exploration rate. |
| `u_max` | 1 | Max attributes in covering. |
| `theta_ga` | 100 | GA invocation interval. |
| `mu` | 0.3 | Mutation probability. |
| `chi` | 0.8 | Crossover probability. |
| `theta_as` | 50 | Action set size limit. |
| `theta_exp` | 20 | Experience for subsumption. |
| `no_subsumption` | false | Disable subsumption in CPU and GPU backends. |
| `alp_mark_only_incorrect` | true | Restricted marking mode. |
| `metric_calculation_frequency` | 1 | Compute metrics every N episodes. |
| `max_population` | 1024 | GPU-only population limit. |
| `device` | auto | Auto-select CUDA if available. |
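A YAML config in `experiments/configs/` can override any of these defaults. The fragment below is a hypothetical example: the key names mirror the CLI flags, but the exact schema accepted by `acs2.py` should be checked against the configs shipped in the repository.

```yaml
# Hypothetical config fragment; keys mirror the CLI flag names.
n_exp: 5
n_steps: 100
beta: 0.05
gamma: 0.95
theta_r: 0.9
epsilon: 0.1
u_max: 1
max_population: 1024
device: auto
```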
Phase Defaults (Total 2500 Episodes):
| Phase | Episodes | Epsilon | Beta | ALP | GA | Decay |
|---|---|---|---|---|---|---|
| explore | 1500 | 0.8 | 0.05 | ON | ON | OFF |
| exploit1 | 500 | 0.2 | 0.05 | ON | OFF | OFF |
| exploit2 | 500 | 0.0 | 0.05 | OFF | OFF | OFF |
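The three-phase schedule above can be written down as plain data. The dict layout below is illustrative only; `acs2.py`'s internal representation may differ.

```python
# Default three-phase schedule (2500 episodes total).
PHASE_DEFAULTS = {
    "explore":  {"episodes": 1500, "epsilon": 0.8, "beta": 0.05,
                 "alp": True,  "ga": True,  "decay": False},
    "exploit1": {"episodes": 500,  "epsilon": 0.2, "beta": 0.05,
                 "alp": True,  "ga": False, "decay": False},
    "exploit2": {"episodes": 500,  "epsilon": 0.0, "beta": 0.05,
                 "alp": False, "ga": False, "decay": False},
}

total_episodes = sum(p["episodes"] for p in PHASE_DEFAULTS.values())
```

Note how `exploit2` is a pure-greedy evaluation phase: exploration, ALP, and the GA are all switched off.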
- `--explore_mode <cpu_single|cpu_mp|gpu>`: Backend used for the `explore` phase.
- `--exploit_mode <cpu_single|cpu_mp|gpu>`: Backend used for `exploit1` and `exploit2`. Default: `cpu_single`.
- `--device <auto|cpu|cuda>`: Physical device used by the GPU backend.
- `--config <path>`: Path to YAML config (e.g., `experiments/configs/mux_11.yaml`).
- `--n_exp <int>`: Number of averaged experiment runs.
- `--n_steps <int>`: Max steps per episode. (Set to `1` for Multiplexer/Parity.)
- `--beta <float>`: Learning rate (0.0 - 1.0).
- `--gamma <float>`: Discount factor (0.0 - 1.0).
- `--theta_i <float>`: Inadequacy threshold for deletion.
- `--theta_r <float>`: Reliability threshold for metrics/knowledge.
- `--epsilon <float>`: General exploration probability.
- `--u_max <int>`: Max attributes in covering (smaller = more general).
- `--steps_to_goal_threshold <int>`: Optional threshold used by some evaluation/selection workflows.
- `--metric_calculation_frequency <int>`: Metric computation interval.
- `--alp_mark_only_incorrect <bool>`: ALP marking strategy. This flag is directly part of the Anticipatory Learning Process (ALP) and controls how environmental features are recorded in a classifier's "mark" ($M$):
  - `true` (Restricted Marking): Marks are added ONLY to classifiers that made an incorrect prediction. The agent only seeks to specialize a rule when it is proven wrong. This strongly favors generalization and keeps the population compact.
  - `false` (Full Action Set Marking): Marks are added to EVERY classifier in the action set on every step, regardless of accuracy. This leads to rapid specialization and captures environmental nuances faster, but often results in a significantly larger, more specific population (linear growth).

  | Feature | Restricted (`true`) | Full Action Set (`false`) |
  |---|---|---|
  | Population size | Compact (consolidates into rules) | Large (captures specific states) |
  | Generalization | High (logical focus) | Low (contextual focus) |
  | Learning speed | Slower initial discovery | Rapid initial specialization |
  | Recommended for | Standard logic tasks (Mux, Parity) | Complex/noisy environments |

- `--no_subsumption [true|false]`: Disable classifier subsumption in the active backend(s). This applies to CPU, GPU, and hybrid runs. The default is `false`, meaning standard subsumption is enabled.
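The marking decision controlled by `--alp_mark_only_incorrect` boils down to a single predicate per classifier per step. The helper below is an illustrative sketch, not the project's actual function.

```python
def should_mark(prediction_correct: bool, restricted: bool = True) -> bool:
    """Decide whether the current situation is added to a classifier's mark.

    restricted=True  -> Restricted Marking (alp_mark_only_incorrect: true):
                        mark only classifiers whose anticipation was wrong.
    restricted=False -> Full Action Set Marking: mark every classifier in
                        the action set on every step.
    """
    if restricted:
        return not prediction_correct
    return True
```

Under restricted marking, a correct classifier accumulates no mark and therefore feels no pressure to specialize, which is exactly why that mode keeps the population compact.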
- `--theta_ga <int>`: Episodes between GA triggers.
- `--mu <float>`: Mutation probability.
- `--chi <float>`: Crossover probability.
- `--theta_as <int>`: Action set size limit (triggers deletion).
- `--theta_exp <int>`: Experience required for subsumption.
- `--no_subsumption <bool>`: Turns subsumption off entirely while keeping the rest of ALP and GA active.
- `--max_population <int>`: GPU-only hard cap on classifier population tensors.
- `--environment_type <str>`: High-level environment family.
- `--environment_name <str>`: Specific registered environment name.
These are environment-dependent. Only a subset is meaningful for a given environment:
- `--rows <int>`: Grid height for maze-like environments.
- `--cols <int>`: Grid width for maze-like environments.
- `--start_pos <row col>`: Start position as two integers.
- `--goal_pos <row col>`: Goal position as two integers.
- `--obstacles <r1 c1 r2 c2 ...>`: Flat list of obstacle coordinates; values are paired into `(row, col)` entries.
- `--reset_to_random_start <bool>`: Randomize the starting position between episodes when supported.
- `--address_bits <int>`: Number of address bits for multiplexer tasks.
- `--problem_kind <str>`: Problem type for boolean environments such as parity/carry.
- `--input_bits <int>`: Total input width for boolean environments.
- `--left_bits <int>`: Left operand size for split-input tasks.
- `--right_bits <int>`: Right operand size for split-input tasks.
- `--sampling <str>`: Sampling mode for generated environments.
- `--env_id <str>`: Gymnasium environment identifier when using Gym-based environments.
- `--observation_encoding <str>`: Observation preprocessing mode, typically `auto` unless you need a specific encoding.
- `--bins <int ...>`: Optional discretization bins for continuous observations.
- `--is_slippery <bool>`: FrozenLake-style transition stochasticity flag.
- `--<phase>_episodes <int>`: Phase length.
- `--<phase>_epsilon <float>`: Phase exploration rate.
- `--<phase>_beta <float>`: Phase learning rate.
- `--<phase>_alp <bool>`: Toggle ALP for this phase.
- `--<phase>_ga <bool>`: Toggle GA for this phase.
- `--<phase>_decay <bool>`: Toggle epsilon decay.
- `--save_dashboard_data`: Export raw data to `reports/saved_states/`.
- `--load_dashboard_data <timestamp>`: Load previously saved data and generate plots without re-running training.
- `--plot_all_dashboards`: Generate the full report (ends in `_all.png`).
- `--plot_steps`: Show Steps-to-Goal chart.
- `--plot_population`: Show Micro/Macro population chart.
- `--plot_knowledge`: Show Knowledge and Generalization chart.
- `--plot_reward_quality`: Show Reward and Quality chart.
- `--plot_policy_map`: Show policy map (maze-like environments only).
- `--plot_top_rules`: Show the top-rules table.
- `--plot_origin_distribution`: Show relative origin distribution over time.
- `--plot_origin_distribution_abs`: Show absolute origin counts.
- `--plot_creation_dist <key>`: Show birth episodes for a rule type.
Plotting behavior:
- During a normal experiment run, if you do not provide any `--plot_*` toggle, `acs2.py` generates the full dashboard by default.
- During a `--load_dashboard_data` run, you can request one or more specific plots using the `--plot_*` flags.
The final dashboard displays several key performance indicators (KPIs):
- Steps to Goal: The number of steps the agent took to reach the goal state in an episode. Lower is better.
- Knowledge (%): The percentage of the environment's state-action space for which the agent has a reliable, correctly predicting classifier.
- Generalization (%): The ratio of wildcards (`#`) to total attributes in the population. Higher indicates a more compact, generalized knowledge base.
- Micro-population: The total number of classifiers in the population, including their numerosities (copies).
- Macro-population: The number of unique classifiers in the population.
- Reliable (Rel): The average calculated exclusively for the subset of rules meeting the reliability threshold ($q > \theta_r$).
- Reward (R): The average internal reward prediction of the classifiers.
- Quality (Q): The average accuracy/reliability of the classifiers' predictions.
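Two of these KPIs can be computed with simple helpers. The string-condition and dict representations below are illustrative, not the project's internal data model.

```python
def generalization_pct(conditions: list[str]) -> float:
    """Share of '#' wildcards across all condition attributes, as a percentage."""
    total = sum(len(c) for c in conditions)
    wild = sum(c.count("#") for c in conditions)
    return 100.0 * wild / total if total else 0.0

def micro_macro(population: list[dict]) -> tuple[int, int]:
    """Micro = sum of numerosities (copies); macro = number of unique classifiers."""
    micro = sum(cl["numerosity"] for cl in population)
    macro = len(population)
    return micro, macro
```

For instance, a population whose conditions are `"##01"` and `"0###"` carries 5 wildcards over 8 attributes, i.e. 62.5% generalization.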
By default, `acs2.py` generates a comprehensive dashboard (`_all.png`) when no specific plot toggle is provided. To generate individual charts for specific metrics (for example, for inclusion in a paper), use `--load_dashboard_data` with one or more plot toggles:

- Find the timestamp of your run in `reports/saved_states/` (e.g., `20260329_123456`).
- Run the command with the desired toggles:

```
# Example: Generate only the Knowledge and Population charts
.\.venv\Scripts\python.exe acs2.py --load_dashboard_data 20260329_123456 --plot_knowledge --plot_population
```

Available toggles:

- `--plot_steps`
- `--plot_population`
- `--plot_knowledge`
- `--plot_reward_quality`
- `--plot_policy_map` (mazes only)
- `--plot_top_rules`
- `--plot_origin_distribution`
- `--plot_origin_distribution_abs`
- `--plot_creation_dist <key>` (keys: `alp_expected`, `covering`, `ga`, `alp_unexpected`, `alp_covering`)
- Multiplexer Runs: Always use `n_steps: 1`.
- Linear Population Growth: If the population doesn't plateau, consider enabling Action Set Subsumption (standard ACS2).
- Knowledge Reporting: Ensure your classifiers reach `theta_r` (0.9) to be counted as "knowing" the environment.
- Default Workflow: If you set only `--explore_mode`, the two exploit phases run in `cpu_single`.
- Hybrid Workflow: Use `acs2.py --explore_mode gpu --exploit_mode cpu_mp` when you want fast tensorized discovery and CPU-based exploit evaluation.
- Full GPU Workflow: Use `acs2.py --explore_mode gpu --exploit_mode gpu --device cuda` when you want every phase to stay on the GPU backend.
- No-Subsumption Ablation: Use `acs2.py --no_subsumption ...` when you want a direct CPU/GPU comparison without subsumption overhead.
The batch benchmark script supports the same subsumption toggle:
```
python run_maze_benchmarks.py
python run_maze_benchmarks.py --no_subsumption
```
When `--no_subsumption` is used, the output CSV includes a `no_subsumption` column so the benchmark setting is recorded explicitly.