Possible data corruption for Fortran in the presence of MPI_IN_PLACE #46

@jgraciahlrs

Description

A range of MPI operations allow the receive buffer to serve as both input and output by passing the special constant MPI_IN_PLACE as the send buffer. With Fortran applications this can lead to data corruption when running under mpiP.

The corruption can be demonstrated with this simple code:

PROGRAM sample_allreduce
  USE mpi
  IMPLICIT NONE

  INTEGER :: ierr
  INTEGER :: rank, rank_in_place
  INTEGER :: rank_sum

  CALL MPI_Init(ierr)
  CALL MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  rank_in_place = rank

  PRINT *, 'Rank: ', rank
  CALL MPI_Allreduce(rank, rank_sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD, ierr)                           
  CALL MPI_Allreduce(MPI_IN_PLACE, rank_in_place, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD, ierr)              
  PRINT *, 'Sum: ', rank_sum, ' - ', rank_in_place

  CALL MPI_Finalize(ierr)
END PROGRAM sample_allreduce

Executing without mpiP instrumentation produces the expected output:

$ mpirun -np 3 ./a.out                                                           
 Rank:            0
 Sum:            3  -            3
 Rank:            1
 Sum:            3  -            3
 Rank:            2
 Sum:            3  -            3

while executing with mpiP corrupts the data:

$ mpirun -np 3 env LD_PRELOAD=$HLRS_MPIP_ROOT/lib/libmpiP.so ./a.out
mpiP: 
mpiP: mpiP V3.5.0 (Build Mar 16 2023/14:16:24)
mpiP: 
 Rank:            0
 Sum:            3  -            0
 Rank:            1
 Sum:            3  -            0
 Rank:            2
 Sum:            3  -            0
mpiP: 
mpiP: Storing mpiP output in [./a.out.3.1905013.1.mpiP].
mpiP: 

Note that the second column (which used MPI_IN_PLACE) is "0" while it should be "3".

I suspect that the underlying problem is missing or incorrect handling of special constants such as MPI_IN_PLACE in the transition from the Fortran to the C PMPI interface. A similar problem has been observed in other projects/tools that use PMPI, such as here. In fact, the code above is taken from that issue.
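For illustration, here is a minimal sketch (not mpiP's actual code) of what a Fortran-to-C shim for MPI_Allreduce would have to do. The symbol name mpi_allreduce_ assumes the common lowercase-plus-trailing-underscore Fortran name mangling, and is_fortran_mpi_in_place() is a hypothetical helper, since recognizing the Fortran MPI_IN_PLACE sentinel from C is implementation specific (MPI libraries typically back the Fortran constant with an internal symbol whose address can be compared against). If this translation is skipped, the C library sees an ordinary buffer address instead of MPI_IN_PLACE, which would explain the corrupted values above.

/* Sketch of a Fortran->C shim for MPI_Allreduce (illustration only,
 * not mpiP's actual code). */
#include <mpi.h>

/* Hypothetical helper: returns non-zero if addr is the address that
 * backs the Fortran MPI_IN_PLACE constant. How to perform this check
 * depends on the MPI implementation. */
extern int is_fortran_mpi_in_place(const void *addr);

void mpi_allreduce_(void *sendbuf, void *recvbuf, MPI_Fint *count,
                    MPI_Fint *datatype, MPI_Fint *op, MPI_Fint *comm,
                    MPI_Fint *ierr)
{
    MPI_Datatype c_type = MPI_Type_f2c(*datatype);
    MPI_Op       c_op   = MPI_Op_f2c(*op);
    MPI_Comm     c_comm = MPI_Comm_f2c(*comm);

    /* The crucial step: map the Fortran MPI_IN_PLACE sentinel to the
     * C MPI_IN_PLACE constant before entering the C interface.
     * Without it, sendbuf is treated as a real buffer and recvbuf is
     * overwritten with whatever that memory happens to contain. */
    if (is_fortran_mpi_in_place(sendbuf))
        sendbuf = MPI_IN_PLACE;

    *ierr = (MPI_Fint) PMPI_Allreduce(sendbuf, recvbuf, (int) *count,
                                      c_type, c_op, c_comm);
}

The same kind of translation is presumably needed for MPI_BOTTOM, and for the other collectives in which MPI_IN_PLACE may appear (e.g. as the send buffer at the root of MPI_Reduce or MPI_Gather, or as the receive buffer at the root of MPI_Scatter).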

I have observed this behavior for mpiP v3.4.1 and v3.5 using GCC v10.2 with either OpenMPI v4.1.4 or HPE's MPI implementation MPT 2.26.

Also note that the code runs correctly when use mpi is replaced with use mpi_f08, at least with OpenMPI (but not with MPT).
