A Gymnasium reinforcement learning environment for Trackmania United Forever (TMUF), powered by TMInterface and TMLoader. This package enables RL agents to interact with a running instance of Trackmania United Forever via socket communication.
The Linesight project is great and runs like butter if you want to train an agent out of the box with a cracked algorithm, but the code itself is pure spaghetti and almost unreadable unless you have already sunk 200 hours into it yourself. So if you want to try out a specific RL algorithm yourself (Deep Q-Learning is a good start for Trackmania), you will be helplessly flailing around trying to adapt the code from that project. This project was born out of that need: as long as you understand the Gym API, you will be able to do some delicious Reinforcement Learning. May your agent rise to sentience.
This project is still very minimal. The reward function is extremely simple: +1 for crossing a checkpoint, +10 for finishing the race, and -0.1 per second as a time penalty. It should be made more sophisticated in the future. For example, Linesight tracks progress on a map against a pre-generated reference trajectory, which gives the agent a meaningful reward at every time step.
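The reward rules stated above can be written out as a small function. This is only a sketch of those rules, not the package's actual implementation: the name `step_reward` and its parameters are hypothetical, and the source does not say whether the finish line also counts as a checkpoint.

```python
def step_reward(checkpoints_crossed: int, finished: bool, elapsed_seconds: float) -> float:
    """Reward for one env step under the simple scheme described above.

    Hypothetical helper for illustration; name and signature are assumptions.
    """
    checkpoint_reward = 1.0        # +1 per checkpoint crossed
    finish_reward = 10.0           # +10 for finishing the race
    time_penalty_per_second = 0.1  # -0.1 per second of game time

    reward = checkpoint_reward * checkpoints_crossed
    if finished:
        reward += finish_reward
    # The time penalty scales with the game time covered by this step
    reward -= time_penalty_per_second * elapsed_seconds
    return reward
```

Under these numbers a checkpoint is worth as much as 10 seconds of time penalty, so rewards stay sparse between checkpoints. That sparsity is what a reference-trajectory reward would address.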
- Full Gymnasium compatibility
- Real-time game state via TMInterface
- Control multiple game instances using `GameInstanceManager`
- Clone the repository:
    git clone https://github.com/JackOE3/tmufrl.git
    cd tmufrl
- Install in editable mode (recommended for development):
    pip install -e .
This allows you to modify the code and have changes reflected immediately.
Alternatively, for a standard install:
    pip install .
Requirements:
- Python >= 3.8
- Trackmania United Forever
- ModLoader
- TMInterface
- TMI Plugin: `Python_Link.as` (put this inside your `TMInterface\Plugins` folder)
Warning: Windows-only (due to `pywin32` and game dependencies)
Before you can use the environment, set these two environment variables.
Set the path to your TMLoader executable:
TMLOADER_PATH=C:/Path/To/TMLoader.exe
Set the profile name to use with TMLoader:
TMLOADER_PROFILE_NAME=MyTMProfile
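It can help to check both variables before launching anything, so a missing value fails with a clear message. The following is a minimal sketch; `read_tmloader_settings` is a hypothetical helper and not part of tmufrl, which may read the variables differently.

```python
import os
from pathlib import Path


def read_tmloader_settings(environ=os.environ):
    """Return (loader_path, profile_name) or raise if either variable is unset.

    Hypothetical helper for illustration; not part of the tmufrl package.
    """
    required = ("TMLOADER_PATH", "TMLOADER_PROFILE_NAME")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return Path(environ["TMLOADER_PATH"]), environ["TMLOADER_PROFILE_NAME"]
```

Calling `read_tmloader_settings()` with no arguments reads the real process environment; passing a plain dict makes the check easy to try out.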
In TMInterface/config.txt, add:
set autologin 1
Replace `1` with your desired profile number.
from tmufrl import GameInstanceManager
# Specify the port that TMInterface will use for this instance
manager = GameInstanceManager(tmi_port=8477)
This will:
- Launch `TMLoader.exe` with the specified profile
- Wait for TMInterface to connect on the given port (for this you will need to activate the Python_Link plugin in-game)
import gymnasium as gym
from gymnasium.wrappers import GrayscaleObservation
from tmufrl import GameInstanceManager
# Specify the path to the map you want the agent to play
env = gym.make("Trackmania-v0", manager=manager, map_path="My Challenges/VeryCoolTrack")
# Grayscale observations are recommended; this Gymnasium wrapper handles the conversion
env = GrayscaleObservation(env)
# Reset environment
obs, info = env.reset()
done = False
total_reward = 0
while not done:
    # Replace with your policy (e.g., random action)
    action = env.action_space.sample()
    # Step the environment
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
    if done:
        print(f"Episode finished! Total reward: {total_reward}")
env.close()
manager.close_game()  # Cleanly shut down the game instance
Use `gymnasium.vector.AsyncVectorEnv` to run multiple Trackmania instances in parallel.
This setup suits on-policy algorithms like Proximal Policy Optimization (PPO), which make heavy use of vectorized environments. For off-policy algorithms (like Deep Q-Learning) it is suboptimal, because the environments are paused while the agent trains. For reference, the Linesight project uses a Learner-Worker architecture in which the network is trained while multiple agents are simultaneously driving around and collecting experience. That squeezes the maximum efficiency out of parallel computing, but the added complexity breaks the vanilla RL loop laid out here.
from functools import partial
import gymnasium as gym
from tmufrl.utils.misc import clear_tm_instances, launch_tm_instances
def make_env(manager):
    env = gym.make("Trackmania-v0", manager=manager, map_path="My Challenges/VeryCoolTrack")
    return env
# Guard the entry point so that the subprocesses do not re-execute this code
if __name__ == '__main__':
    # Launch 2 parallel game instances
    N_ENVS = 2
    managers = launch_tm_instances(N_ENVS)  # auto-assigns ports: 8477, 8478
    env_fns = [partial(make_env, manager) for manager in managers]
    envs = gym.vector.AsyncVectorEnv(env_fns)
    obs, infos = envs.reset()
    # Step all environments
    actions = envs.action_space.sample()  # shape: (N_ENVS,)
    obs, rewards, terms, truncs, infos = envs.step(actions)
    # Close all
    envs.close()
    clear_tm_instances()
Each instance runs on a unique `tmi_port` (8477, 8478, ...). Use `AsyncVectorEnv` for non-blocking execution.
- Observation Space: `Box(0, 255, (120, 160, 3), uint8)` (RGB image of size 120x160, the dimensions can be modified)
- Action Space: `Discrete(12)`
- Actions:
| Index | Action |
|---|---|
| 0 | NO_OP |
| 1 | FORWARD |
| 2 | FORWARD_LEFT |
| 3 | FORWARD_RIGHT |
| 4 | LEFT |
| 5 | RIGHT |
| 6 | BRAKE |
| 7 | BRAKE_LEFT |
| 8 | BRAKE_RIGHT |
| 9 | DRIFT_LEFT |
| 10 | DRIFT_RIGHT |
| 11 | FORWARD_BRAKE |
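For hand-written policies or debugging, named actions are easier to read than raw indices. The enum below simply mirrors the table; the class name `Action` is an assumption for illustration, and the package may or may not expose something similar.

```python
from enum import IntEnum


class Action(IntEnum):
    """Names for the 12 discrete actions, copied from the table above.

    Hypothetical convenience type; not confirmed to exist in tmufrl.
    """
    NO_OP = 0
    FORWARD = 1
    FORWARD_LEFT = 2
    FORWARD_RIGHT = 3
    LEFT = 4
    RIGHT = 5
    BRAKE = 6
    BRAKE_LEFT = 7
    BRAKE_RIGHT = 8
    DRIFT_LEFT = 9
    DRIFT_RIGHT = 10
    FORWARD_BRAKE = 11
```

A call such as `env.step(int(Action.FORWARD))` then reads more clearly than `env.step(1)`, and `Action(index).name` turns a sampled index back into a label for logging.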
env = gym.make(
    "Trackmania-v0",
    map_path="My Challenges/VeryCoolTrack",  # TM map to load
    manager=manager,
    max_episode_steps=1000,  # Max steps per episode
    game_speed=1,  # Game speed multiplier
    game_ticks_per_step=5,  # Game ticks per env.step()
    image_dim=(120, 160),  # (height, width) of observation
)
Default parameters are shown here.
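To get a feel for what the step-related parameters mean in time, here is a rough calculation. It assumes that one game tick is 10 ms of in-game time and that `game_speed` divides wall-clock time linearly; both are assumptions that the source does not state, and the helper names are hypothetical.

```python
# ASSUMPTION: one game tick corresponds to 10 ms of in-game time
TICK_SECONDS = 0.01


def seconds_per_step(game_ticks_per_step: int = 5) -> float:
    """In-game time covered by a single env.step() call."""
    return game_ticks_per_step * TICK_SECONDS


def max_episode_seconds(max_episode_steps: int = 1000, game_ticks_per_step: int = 5) -> float:
    """In-game time after which an episode is truncated."""
    return max_episode_steps * seconds_per_step(game_ticks_per_step)


def wall_clock_seconds(game_seconds: float, game_speed: float = 1) -> float:
    """ASSUMPTION: game_speed is a linear multiplier on simulation speed."""
    return game_seconds / game_speed
```

With the defaults and under these assumptions, each step covers about 0.05 s of game time and an episode is cut off after roughly 50 s, so longer maps may need a larger `max_episode_steps`.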
- Only tested on Windows
- Ensure TMInterface is running in the game
- Multiple environments require different `tmi_port` values
- Use `manager.close_game()` to properly terminate the game process
- TMInterface Python client (`tminterface2.py`): tminterface2.py by Linesight
- TMI Plugin: Original Python_Link by Linesight