Segfault reported when running Primus on 8 MI250X GPUs

Hello there, 

I'm trying to run Primus through an Apptainer container on a node with 8 MI250X GPUs. Inside the running container, I'm using `primus-cli direct`. The container image is `rocm/primus:v26.1` from Docker Hub. Running inside the Apptainer container is effectively the same as running directly on the node: it has access to all 8 GPUs and the network.


Clone Primus first
```
git clone --recursive -b v0.7.0 https://github.com/AMD-AGI/Primus/
```

To build the container

```
apptainer build primusdockerhub.sif docker://docker.io/rocm/primus:v26.1
```

To run Primus
```
# start a shell with the container
apptainer shell primusdockerhub.sif

# By default this starts a container shell and also mounts your current working directory from the host,
# so you should still see your current directory inside the container shell when you run `ls`.

# Now we're in the running container 
cd ./Primus # cd-ing into the Primus repository we had cloned earlier in the current directory
# running the qwen2.5 pretrain from the MI300X examples. There was no directory for MI250X.
./runner/primus-cli direct -- train pretrain --config ./examples/megatron/configs/MI300X/qwen2.5_7B-BF16-pretrain.yaml

```
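For anyone who wants to reproduce this without an interactive shell, the same steps could be run through `apptainer exec` with the output saved to a file. This is only a sketch: the helper name `run_primus_repro` and the `APPTAINER` override variable are hypothetical, and it assumes `bash` is available inside the image and that `./Primus` sits in the current directory, as in the steps above.

```shell
# Hypothetical helper: run the same pretrain command via `apptainer exec`
# and save stdout+stderr to a log file while still printing it to the terminal.
# $1 = image file (default: primusdockerhub.sif)
# $2 = log file   (default: interactiveoutput.txt)
# APPTAINER can be set to override the binary (an assumption, handy for dry runs).
run_primus_repro() {
  sif="${1:-primusdockerhub.sif}"
  log="${2:-interactiveoutput.txt}"
  "${APPTAINER:-apptainer}" exec "$sif" bash -c \
    'cd ./Primus && ./runner/primus-cli direct -- train pretrain --config ./examples/megatron/configs/MI300X/qwen2.5_7B-BF16-pretrain.yaml' \
    2>&1 | tee "$log"
}
```

Calling `run_primus_repro` with no arguments uses the image and log file names from this report. Setting `APPTAINER=echo` first prints the command line instead of running it.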

I've attached the output from the `./runner/primus-cli direct` run. At the very end of the output, the processes started by torchrun report SIGSEGV.

[interactiveoutput.txt](https://github.com/user-attachments/files/25369595/interactiveoutput.txt)
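To jump straight to the relevant part of the attached log, something like the following should work. It is a minimal sketch that assumes the log was saved locally as `interactiveoutput.txt` and that the crash lines contain the literal string `SIGSEGV`. The function names are made up for illustration.

```shell
# Print every line mentioning SIGSEGV, prefixed with its line number.
# $1 = log file (default: interactiveoutput.txt)
find_segv_lines() {
  grep -n 'SIGSEGV' "${1:-interactiveoutput.txt}"
}

# Count the lines mentioning SIGSEGV.
# grep -c prints 0 but exits non-zero when nothing matches, hence `|| true`.
count_segv_lines() {
  grep -c 'SIGSEGV' "${1:-interactiveoutput.txt}" || true
}
```

With 8 ranks, `count_segv_lines` gives a rough idea of whether every process crashed or only some of them, although a single rank may produce more than one matching line.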

EDIT (2026-02-19): Updated some of the instructions above because they were incorrect. The repository needs to be cloned recursively (`git clone --recursive`), and the Apptainer `--bind` flags are not necessary.
