Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 17 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -83,10 +83,24 @@ The entry point for the conjecture-prove loop is in [learning/bootstrap.py](boot
[learning] $ python bootstrap.py theory=groups
```

We use hydra for configuration -- the relevant file here is [config/bootstrap.yaml](config/bootstrap.yaml). This will run the loop in "sequential" mode, in a single process. There is a distributed mode, backed by a [https://docs.celeryq.dev/en/stable/](Celery queue), that you can use to leverage multiple CPUs/GPUs, either in the same or different machines (it doesn't matter, as long as they can connect to the queue). The setup is a bit manual: you must first spin up a Redis server, then run Celery worker processes backed by the Redis server, and finally run bootstrap.py with a DISTRIBUTED=1 environment variable:
We use hydra for configuration -- the relevant file here is [config/bootstrap.yaml](config/bootstrap.yaml). This will run the loop in "sequential" mode, in a single process. There is a distributed mode, backed by a [https://docs.celeryq.dev/en/stable/](Celery queue), that you can use to leverage multiple CPUs/GPUs, either in the same or different machines (it doesn't matter, as long as they can connect to the queue).

```sh
[learning] $ DISTRIBUTED=1 python bootstrap.py theory=groups
The setup is a bit manual:
1. Build the redis container
```
apptainer build redis.sif redis.def
```
1. Start the redis container
```
sh launch/start_redis.sh
```
2. Run Celery worker process
```
sh launch/start_worker.sh
```
3. Run bootstrap.py in distributed mode
```
sh launch/run_bootstrap_distributed.sh
```

Feel free to open an issue if you're interested in setting this up, and I can expand on the documentation. The details might get a little bit cluster-specific, though the general setup is just that you need (a) a Redis server, (b) a number of worker processes that connect to it, and (c) a teacher process that runs the bootstrapping loop, also connecting to the same Redis server.
2 changes: 1 addition & 1 deletion environment/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -25,4 +25,4 @@ smallvec = "1.10.0"

[dependencies.pyo3]
version = "0.20.2"
features = ["abi3-py38"]
features = ["extension-module"]
9 changes: 9 additions & 0 deletions launch/run_bootstrap_distributed.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
#!/bin/bash
# Read the Redis port
export REDIS=$(cat redis_hostname_port.txt)
export DISTRIBUTED=1

# Run the bootstrap script
# TODO(f.srambical): Make theory a command line argument
cd learning/
python bootstrap.py theory=groups
31 changes: 31 additions & 0 deletions launch/start_redis.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
#!/bin/bash

LOGLEVEL=""
while [[ $# -gt 0 ]]; do
case $1 in
-v|--verbose)
LOGLEVEL="--loglevel verbose"
shift
;;
esac
done

# Get a random available port or use a specific one assigned by your cluster
export REDIS_PORT=$(python -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')
echo "Starting Redis on port $REDIS_PORT"

# Save the port to a file for other processes to read
echo "$(hostname):$REDIS_PORT" > redis_hostname_port.txt

# Cleanup function to remove port file and kill Redis when the script exits
cleanup() {
rm -f redis_hostname_port.txt
pkill -f "redis-server --port $REDIS_PORT"
exit
}

# Set up trap to catch script termination
trap cleanup SIGINT SIGTERM

# Start Redis container in the foreground
apptainer run --env REDIS_PORT=$REDIS_PORT redis.sif redis-server --port $REDIS_PORT --protected-mode no --bind 0.0.0.0 $LOGLEVEL
7 changes: 7 additions & 0 deletions launch/start_worker.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
#!/bin/bash
# Read the Redis port
export REDIS=$(cat redis_hostname_port.txt)

# Start a Celery worker
cd learning/
celery -A worker worker --concurrency=1 -n $REDIS
2 changes: 2 additions & 0 deletions redis.def
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Bootstrap: docker
From: redis:7.2