
@Matthew-Whitlock
Collaborator

Testing against these versions (and any future releases past v5.0.9) will hopefully resolve our CI issues. Dropping the repeat-after-timeouts functionality from the testing, since it will hopefully no longer be needed.

@Matthew-Whitlock
Collaborator Author

We still got some test failures when running against the fixed ompi versions, but I was only ever able to reproduce them when running oversubscribed in a Docker container. Running the tests normally, or in an oversubscribed Slurm allocation, never reproduced them, even after dozens of attempts.

The processes in the Docker containers were still reaching MPI_Finalize, though, and specifically the PMIx barrier at the end. So I set the tests to skip that barrier with the async_mpi_finalize parameter.

Since that's now working great, I set the tests to run 5x to help detect intermittent errors that might pop up.
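A minimal sketch of the "run every test several times" strategy, in contrast to the dropped repeat-after-timeouts approach. `run_repeatedly` and `RepeatResult` are hypothetical names for illustration only, not part of this project's CI. If the suite happens to be driven by CTest (an assumption, since the thread doesn't say), CTest 3.17+ offers similar behavior directly: `ctest --repeat until-fail:5` requires five consecutive passing runs, while `--repeat after-timeout:<n>` re-runs only tests that time out.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RepeatResult:
    runs: int = 0
    # 1-based indices of the runs that failed
    failures: List[int] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.failures


def run_repeatedly(test: Callable[[], bool], repeats: int = 5) -> RepeatResult:
    """Run `test` `repeats` times and record every failing run.

    Unlike retry-after-timeout, a failure is never retried away: any
    failing run marks the whole result as failed, so intermittent
    errors surface instead of being masked.
    """
    result = RepeatResult()
    for i in range(1, repeats + 1):
        result.runs += 1
        try:
            ok = test()
        except Exception:
            # An error raised by the test (e.g. a timeout surfaced as an
            # exception) counts as a failure rather than a retry trigger.
            ok = False
        if not ok:
            result.failures.append(i)
    return result
```

For example, a test that fails only on its third invocation yields `runs == 5`, `failures == [3]`, and `passed == False`. This sketch deliberately keeps running after the first failure so the report shows how often a flaky test fails; CTest's `until-fail` mode instead stops at the first failure.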
