HumanCompatibleAI · bmielnicki · Oct 7, 2020 · Oct 13, 2020
diff --git a/.gitignore b/.gitignore
@@ -7,3 +7,5 @@ node_modules/
 **/master_agents/
 **/__pycache__/
 **/agents/
+**/trajectories/
+.env
diff --git a/README.md b/README.md
@@ -3,7 +3,7 @@
 <img src="./server/static/images/browser_view.png" >
 </p>
 
-A web application where humans can play Overcooked with trained AI agents.
+A web application where humans can play Overcooked with trained AI agents and replay trajectories.
 
 * [Installation](#installation)
 * [Usage](#usage)
@@ -21,7 +21,7 @@ Building the server image requires [Docker](https://docs.docker.com/get-docker/)
 
 The server can be deployed locally using the driver script included in the repo. To run the production server, use the command
 ```bash
-./up.sh production
+./up.sh --env production
 ```
 
 In order to build and run the development server, which includes a deterministic scheduler and helpful debugging logs, run
@@ -40,28 +40,37 @@ In order to kill the production server, run
 
 The Overcooked-Demo server relies on both the [overcooked-ai](https://github.com/HumanCompatibleAI/overcooked_ai) and [human-aware-rl](https://github.com/HumanCompatibleAI/human_aware_rl) repos. The former contains the game logic, the latter contains the rl training code required for managing agents. Both repos are automatically cloned and installed in the Docker builds.
 
-The branch of `overcooked_ai` and `human_aware_rl` imported in both the development and production servers can be specified by the `OVERCOOKED_BRANCH` and `HARL_BRANCH` environment variables, respectively. For example, to use the branch `foo` from `overcooked-ai` and branch `bar` from `human_aware_rl`, run
+The branch of `overcooked_ai` and `human_aware_rl` imported in both the development and production servers can be specified by the `--overcooked-branch` and `--harl-branch` parameters, respectively. For example, to use the branch `foo` from `overcooked-ai` and branch `bar` from `human_aware_rl`, run
 ```bash
-OVERCOOKED_BRANCH=foo HARL_BRANCH=bar ./up.sh
+./up.sh --overcooked-branch foo --harl-branch bar
 ```
 The default branch for both repos is currently `master`.
 
 ## Using Pre-trained Agents
 
-Overcooked-Demo can dynamically load pre-trained agents provided by the user. In order to use a pre-trained agent, a pickle file should be added to the `agents` directory. The final structure will look like `static/assets/agents/<agent_name>/agent.pickle`. Note, to use the pre-defined rllib loading routine, the agent directory name must start with 'rllib', and contain the appropriate rllib checkpoint, config, and metadata files. For more detailed info and instructions see the [RllibDummy_CrampedRoom](server/static/assets/agents/RllibDummy_CrampedRoom/) example agent.
+Overcooked-Demo can dynamically load pre-trained agents provided by the user. In order to use a pre-trained agent, a pickle file should be added to the `agents` directory. The final structure will look like `static/assets/agents/<agent_name>/agent.pickle`. You can also specify a different agents directory with the `--agents-dir` parameter.
+Note, to use the pre-defined rllib loading routine, the agent directory name must start with 'rllib', and contain the appropriate rllib checkpoint, config, and metadata files. For more detailed info and instructions see the [RllibDummy_CrampedRoom](server/static/assets/agents/RllibDummy_CrampedRoom/) example agent.
 
 If a more complex or custom loading routing is necessary, one can subclass the `OvercookedGame` class and override the `get_policy` method, as done in [DummyOvercookedGame](server/game.py#L420). Make sure the subclass is properly imported [here](server/app.py#L5)
 
+## Saving trajectories
+Trajectories from games run in Overcooked-Demo can be saved. Use the `--trajectories-dir` parameter to specify the directory in which trajectories are saved. By default, trajectories are saved in the `static/assets/trajectories` directory.
+
+
+## Replaying trajectories
+Trajectories from the directory specified by `--trajectories-dir` can be replayed at `http://localhost/replay`.
+
+
 ## Updating Overcooked_ai
-This repo was designed to be as flexible to changes in overcooked_ai as possible. To change the branch used, use the `OVERCOOKED_BRANCH` environment variable shown above.
+This repo was designed to be as flexible to changes in overcooked_ai as possible. To change the branch used, use the `--overcooked-branch` parameter shown above.
 
 Changes to the JSON state representation of the game will require updating the JS graphics. At the highest level, a graphics implementation must implement the functions `graphics_start`, called at the start of each game, `graphics_end`, called at the end of each game, and `drawState`, called at every timestep tick. See [dummy_graphcis.js](server/graphics/dummy_graphics.js) for a barebones example.
 
-The graphics file is dynamically loaded into the docker container and served to the client. Which file is loaded is determined by the `GRAPHICS` environment variable. For example, to server `dummy_graphics.js` one would run
+The graphics file is dynamically loaded into the docker container and served to the client. Which file is loaded is determined by the `--graphics` parameter. For example, to serve `dummy_graphics.js` one would run
 ```bash
-GRAPHICS=dummy_graphics.js ./up.sh
+./up.sh --graphics dummy_graphics.js
 ```
-The default graphics file is currently `overcooked_graphics_v2.1.js`
+The default graphics file is currently `overcooked_graphics_v2.2.js`
 
 
 ## Configuration

diff --git a/docker-compose.yml b/docker-compose.yml
@@ -2,6 +2,8 @@ version : '3.7'
 
 services:
     app:
+        env_file:
+        - .env
         build:
             context: ./server
             args:
@@ -13,4 +15,6 @@ services:
             FLASK_ENV: "${BUILD_ENV:-production}"
         ports:
             - "80:5000"
-
+        volumes:
+            - "${AGENTS_DIR:-./server/static/assets/agents}:/app/static/assets/agents"
+            - "${TRAJECTORIES_DIR:-./server/static/assets/trajectories}:/app/static/assets/trajectories"
diff --git a/server/Dockerfile b/server/Dockerfile
@@ -1,21 +1,18 @@
 FROM python:3.7-stretch
-
-ARG BUILD_ENV
-ARG OVERCOOKED_BRANCH
-ARG HARL_BRANCH
-ARG GRAPHICS
-
 WORKDIR /app
 
 # Install non-chai dependencies
 COPY ./requirements.txt ./requirements.txt
 RUN pip install -r requirements.txt
 
 # Install eventlet production server if production build
+ARG BUILD_ENV
 RUN if [ "$BUILD_ENV" = "production" ] ; then pip install eventlet ; fi
 
 # Clone chai code
+ARG OVERCOOKED_BRANCH
 RUN git clone https://github.com/HumanCompatibleAI/overcooked_ai.git --branch $OVERCOOKED_BRANCH --single-branch /overcooked_ai
+ARG HARL_BRANCH
 RUN git clone https://github.com/HumanCompatibleAI/human_aware_rl.git --branch $HARL_BRANCH --single-branch /human_aware_rl
 
 # Dummy data_dir so things don't break
@@ -31,6 +28,7 @@ RUN apt-get install -y libgl1-mesa-dev
 # Copy over remaining files
 COPY ./static ./static
 COPY ./*.py ./
+ARG GRAPHICS
 COPY ./graphics/$GRAPHICS ./static/js/graphics.js
 COPY ./config.json ./config.json
 

diff --git a/server/app.py b/server/app.py
@@ -13,7 +13,7 @@
 from flask_socketio import SocketIO, join_room, leave_room, emit
 from game import OvercookedGame, OvercookedTutorial, Game, OvercookedPsiturk
 import game
-
+from overcooked_ai_py.utils import load_from_json, cumulative_rewards_from_rew_list
 
 ### Thoughts -- where I'll log potential issues/ideas as they come up
 # Should make game driver code more error robust -- if overcooked randomlly errors we should catch it and report it to user
@@ -45,6 +45,8 @@
 # Path to where pre-trained agents will be stored on server
 AGENT_DIR = CONFIG['AGENT_DIR']
 
+TRAJECTORIES_DIR = CONFIG["TRAJECTORIES_DIR"]
+
 # Maximum number of games that can run concurrently. Contrained by available memory and CPU
 MAX_GAMES = CONFIG['MAX_GAMES']
 
@@ -91,7 +93,7 @@
     "psiturk" : OvercookedPsiturk
 }
 
-game._configure(MAX_GAME_LENGTH, AGENT_DIR)
+game._configure(MAX_GAME_LENGTH, AGENT_DIR, TRAJECTORIES_DIR)
 
 
 
@@ -326,6 +328,10 @@ def _ensure_consistent_state():
 def get_agent_names():
     return [d for d in os.listdir(AGENT_DIR) if os.path.isdir(os.path.join(AGENT_DIR, d))]
 
+def get_trajectories_names():
+    def remove_file_extension(name):
+        return os.path.splitext(name)[0]  # strip only the final extension, so "a.b.json" -> "a.b"
+    return sorted([remove_file_extension(name) for name in os.listdir(TRAJECTORIES_DIR) if name.endswith(".json")])
 
 ######################
 # Application routes #
@@ -339,6 +345,10 @@ def index():
     agent_names = get_agent_names()
     return render_template('index.html', agent_names=agent_names, layouts=LAYOUTS)
 
+@app.route('/replay')
+def replay():
+    return render_template('replay.html', trajectories=get_trajectories_names())
+
 @app.route('/psiturk')
 def psiturk():
     uid = request.args.get("UID")
@@ -509,6 +519,21 @@ def on_disconnect():
 
 
 
+@socketio.on('trajectory_selected')
+def on_trajectory_selected(data):
+    traj_idx = int(data["trajectory_idx"] or 0)
+    trajectories = load_from_json(os.path.join(TRAJECTORIES_DIR, data["trajectory_file"]))
+    trajectory_states = trajectories["ep_states"][traj_idx]
+    trajectory_rewards = trajectories["ep_rewards"][traj_idx]
+    scores = cumulative_rewards_from_rew_list(trajectory_rewards)
+    states = [{"state":state, "time_left": time_left, "score": score} for state, score, time_left in zip(trajectory_states, scores, reversed(range(len(trajectory_states))))]
+    terrain = trajectories["mdp_params"][traj_idx]["terrain"]
+    start_info = {
+        "terrain": terrain,
+        "state": states[0],
+    }
+    emit("replay_trajectory", {"start_info": start_info, "states": states, "max_trajectory_idx": len(trajectories["ep_states"])-1})  # reply only to the requesting client; socketio.emit would broadcast to every connected client
+
 
 # Exit handler for server
 def on_exit():
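
Note on the replay logic added to `server/app.py` above: the sketch below isolates its two pure pieces, listing trajectory names and pairing each state with a score and a countdown, so they can be checked without a running server. It is a minimal illustration, not the PR's code. The function names `trajectory_names` and `build_replay_frames` are hypothetical, and `itertools.accumulate` is used as a stand-in on the assumption that `cumulative_rewards_from_rew_list` yields a running total; the real helper in `overcooked_ai_py.utils` may differ in detail (for example, in whether the total includes the current step).

```python
import os
from itertools import accumulate


def trajectory_names(filenames):
    """Return the sorted base names of the .json files in `filenames`.

    Hypothetical helper: only the final extension is stripped, so a
    file called "run.v2.json" is listed as "run.v2".
    """
    return sorted(
        os.path.splitext(name)[0]
        for name in filenames
        if name.endswith(".json")
    )


def build_replay_frames(states, rewards):
    """Pair each state with a cumulative score and the steps remaining.

    Assumption: the score is an inclusive running total of `rewards`;
    this stands in for cumulative_rewards_from_rew_list.
    """
    scores = list(accumulate(rewards))
    last_index = len(states) - 1
    return [
        {"state": state, "time_left": last_index - i, "score": score}
        for i, (state, score) in enumerate(zip(states, scores))
    ]


if __name__ == "__main__":
    print(trajectory_names(["b.json", "a.v2.json", "notes.txt"]))
    for frame in build_replay_frames(["s0", "s1", "s2"], [0, 20, 0]):
        print(frame)
```

Here `time_left` counts steps rather than seconds, matching the `reversed(range(len(trajectory_states)))` expression in the handler.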

diff --git a/server/config.json b/server/config.json
@@ -4,6 +4,7 @@
     "MAX_GAMES" : 10,
     "MAX_GAME_LENGTH" : 120,
     "AGENT_DIR" : "./static/assets/agents",
+    "TRAJECTORIES_DIR": "./static/assets/trajectories",
     "MAX_FPS" : 30,
     "psiturk" : {
         "experimentParams" : {