From 1d256289b5cb12594c4b3575c1d734f292d9f19c Mon Sep 17 00:00:00 2001 From: Min Si Date: Tue, 29 Jan 2019 13:49:36 -0600 Subject: [PATCH 01/17] Rewrite the interoperability annex --- content/backmatter.tex | 200 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 200 insertions(+) diff --git a/content/backmatter.tex b/content/backmatter.tex index dbd707baa..7d34e91c7 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -184,7 +184,207 @@ \chapter{Undefined Behavior in OpenSHMEM}\label{sec:undefined} \end{longtable} +\color{ForestGreen} +\chapter{Interoperability with other Programming Models}\label{sec:interoperability} + +OpenSHMEM routines may be used in conjunction with the routines of other +communication libraries or parallel languages in the same program. This section +describes the interoperability with other programming models including +clarification of undefined behaviors caused by mixed use of different models, +advice to \openshmem library users and developers that may improve the portability +and performance of hybrid programs, and the definition of an OpenSHMEM extension +API that queries the interoperability features provided by an \openshmem library. + + +\section{MPI Interoperability} + +\openshmem and MPI are two commonly used parallel programming models for distributed +memory systems. The user can choose to utilize both models in the same program +to efficiently and easily support various communication patterns. + +A vendor may implement the \openshmem and MPI libraries in different ways. For +instance, one may implement both \openshmem and MPI as standalone libraries +and each of them allocates and initializes fully isolated communication +resources. Consequently, an \openshmem call does not interfere with any MPI +communication in the same application. As the other common approach, however, +a vendor may also implement both \openshmem and MPI interfaces within the +same software system in order to share communication resource when possible. +In such a case, internal interference may occur. + +To improve interoperability and portability in \openshmem + MPI hybrid +programming, we clarify several aspects in the following subsections. + + +\subsection{Initialization} +To ensure that a hybrid program can be portably performed with different vendor +implementations, the \openshmem environment of the program must be initialized by +a call to \FUNC{shmem\_init} or \FUNC{shmem\_init\_thread}, and be finalized by +a call to \FUNC{shmem\_finalize}; the MPI environment of the program must be initialized +by a call to \FUNC{MPI\_Init} or \FUNC{MPI\_Init\_thread}, and be finalized by a +call to \FUNC{MPI\_Finalize}. + +\apiimpnotes{ +Portable implementations of OpenSHMEM and MPI must ensure that the initialization +calls can be made in an arbitrary order within a program; the same rule also +applies to the finalization calls. A software runtime that utilizes shared +communication resource for \openshmem and MPI communication may maintain an +internal reference counter in order to ensure that the shared resource is +initialized only once, and no shared resource is released until the last +finalization call is made. +} + + +\subsection{Dynamic Process Creation and MPMD Programming} +\label{subsec:interoperability:mpmd} + +MPI defines the dynamic process model that allows creation of processes after +an MPI application has started, and provides the mechanism to establish communication +between the newly created processes and the existing MPI application. 
This model +can be useful when implementing a MPMD application by dynamically starting multiple +groups of processes, and each of these groups may launch a different executable +MPI program. The communication performed within a process group is identified by +an intracommunicator, and that performed between two process groups is identified +by an intercommunicator. The two types of communication do not interfere with +each other. + +Unlike MPI, \openshmem requires all PEs to collectively allocate and initialize +resources used by the \openshmem library before any other \openshmem routine may +be called. Thus, the dynamic process model is not supported in \openshmem. For +instance, the processes newly created by a call to \FUNC{MPI\_Comm\_spawn} cannot +join the existing \openshmem environment that was initialized by other existing +PEs. The \FUNC{shmem\_pe\_accessible} routine can be used in this scenario to +portably ensure that a remote PE is accessible via \openshmem communication. + + +\subsection{Thread Safety} +\label{subsec:interoperability:thread} +Both \openshmem and MPI define the interaction with user threads in a program +with routines that can be used for initializing and querying the thread +environment. In a hybrid program, the user can request different thread levels +at the initialization calls of \openshmem and MPI environments, however, the +returned support level provided by the \openshmem library might be different +from that returned in an \openshmem-only program. For instance, the former +initialization call in a hybrid program may initialize resource with the user +requested thread level but the supported level cannot be updated by the latter +initialization call, if the underlying software runtime of \openshmem and MPI +share the same internal communication resource. +The program should always check the \VAR{provided} thread level returned +at the corresponding initialization call to portably ensure thread support in each +communication environment. + + +\subsection{Mapping Process Identification Numbers} +\label{subsec:interoperability:id} + +Similar to the PE identifier in \openshmem, MPI defines rank as the +identification number of a process in a communicator. Both \openshmem PE +and MPI rank are unique integers assigned from zero to one less than the total +number of processes. In a hybrid program, one may observe that the \openshmem +PE and the MPI rank in \VAR{MPI\_COMM\_WORLD} of a process can be equal. +This feature, however, may be provided by only some of the \openshmem and MPI +implementations (e.g., if both environments share the same underlying process +manager), and is not portably guaranteed. A portable program should always +use the standard functions in each model, i.e., \FUNC{shmem\_my\_pe} in \openshmem +and \FUNC{MPI\_Comm\_rank} in MPI, to query the process identification numbers +in each communication environment and manage the mapping of identifiers in the +program when necessary. + + +\subsection{RMA Synchronization, Ordering and Atomicity} +\label{subsec:interoperability:rma} + +Both \openshmem and MPI define similar RMA and atomic operations with additional +semantics and synchronization routines to ensure the operations' ordering and +completion. A synchronization call in \openshmem, however, does not interfere +with the outstanding operations issued in the MPI environment. For instance, +the \FUNC{shmem\_quiet} function only ensures completion of \openshmem RMA, +AMO, and memory store operations. 
It does not force the completion +of any MPI outstanding operations. To ensure the completion of RMA operations +in MPI, the program should use an appropriate MPI synchronization routine in the +MPI context (e.g., using \FUNC{MPI\_Win\_flush\_all} to ensure remote completion +of all outstanding operations in the passive-target mode). Similarly, \openshmem +guarantees only the atomicity of concurrent AMO operations that operate on +symmetric data with the same datatype. Access to the same symmetric object with +MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may result in undefined +result. + +\apiimpnotes{ +In the implementations that share the same communication resources for \openshmem +and MPI, the memory or network synchronization internally issued for one +programming model may also effect the status of operations in the other model. +Although the user program must make necessary synchronization calls for both models +in order to ensure semantics correctness, a high performance implementation may +internally avoid the later synchronization made by the other model when no +subsequent operation is issued between these two synchronization calls. +} + +\subsection{Communication Progress} +\label{subsec:interoperability:progress} + +\openshmem promises the progression of communication both with and without +\openshmem calls and requires the software progress mechanism in implementation +(e.g., a progress thread) when the hardware does not provide asynchronous communication +capabilities. In MPI, however, a weak progress semantics is applied. That is, +an MPI communication call is only guaranteed to complete in finite time. For +instance, an MPI Put may be completed only when the remote process makes an MPI +call which internally triggers the progress of MPI, if the underlying hardware +does not support asynchronous communication. A portable hybrid program +should not assume that a call to the \openshmem library also makes progress for MPI, +and it may have to explicitly manage the asynchronous communication in MPI in +order to prevent any deadlock or performance degradation. + +\apiimpnotes{ +Implementations that provide both \openshmem and MPI interfaces should try +to ensure progress for both models when necessary and possible, for performance +reasons. For instance, a high-quality implementation may start making progress for +both \openshmem and MPI whenever possible, after the user program has called +\FUNC{shmem\_init} and \FUNC{MPI\_init} provided by the same system. +} + +To avoid unnecessary overhead and programming complexity in the user program, +the \openshmem implementation may provide an extended \openshmem routine that +allows the user program to query the progress support for the MPI environment. +We introduce the definition and semantics of this routine in +Section~\ref{subsec:interoperability:query}. + + +\section{Interoperability Query API} +\label{subsec:interoperability:query} + +Determines whether an interoperability feature is supported by the \openshmem +library implementation. + +\begin{apidefinition} + +\begin{Csynopsis} +int @\FuncDecl{shmemx\_query\_interoperability}@(int property); +\end{Csynopsis} + +\begin{apiarguments} + \apiargument{IN}{property}{The interoperability property queried by the user.} +\end{apiarguments} + +% compiling error ? +% \apidescription{ +\FUNC{shmemx\_query\_interoperability} is an extended \openshmem routine that queries +whether an interoperability property is supported by the \openshmem library. 
One of the +following property can be queried in an \openshmem program after finishing the +initialization call to \openshmem and that of the relevant programming models +being used in the program. An OpenSHMEM library implementation may extend the +available properties. +\begin{itemize} + \item \VAR{SHMEM\_PROGRESS\_MPI} Query whether the \openshmem + implementation makes progress for the MPI communication used in the user program. +\end{itemize} +% } + +\apireturnvalues{ + The return value is \CONST{1} if \VAR{property} is supported by the \openshmem library; + otherwise, it is \CONST{0}. +} +\end{apidefinition} +\color{black} \chapter{History of OpenSHMEM}\label{sec:openshmem_history} From 430157f6a74eec2e5d8b473dca1d02890d51206f Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 1 Apr 2019 17:08:09 -0500 Subject: [PATCH 02/17] Update dynamic process creation subsection --- content/backmatter.tex | 39 +++++++++++++++++++++++++-------------- 1 file changed, 25 insertions(+), 14 deletions(-) diff --git a/content/backmatter.tex b/content/backmatter.tex index 7d34e91c7..fbe70eee0 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -238,22 +238,33 @@ \subsection{Dynamic Process Creation and MPMD Programming} \label{subsec:interoperability:mpmd} MPI defines the dynamic process model that allows creation of processes after -an MPI application has started, and provides the mechanism to establish communication -between the newly created processes and the existing MPI application. This model -can be useful when implementing a MPMD application by dynamically starting multiple -groups of processes, and each of these groups may launch a different executable -MPI program. The communication performed within a process group is identified by -an intracommunicator, and that performed between two process groups is identified -by an intercommunicator. The two types of communication do not interfere with -each other. - +an MPI application has started (e.g., by calling \FUNC{MPI\_Comm\_spawn}), +and provides the mechanism to establish communication +between the newly created processes and the existing MPI application (see +MPI standard version 3.1, Chapter 10). Unlike MPI, \openshmem requires all PEs to collectively allocate and initialize resources used by the \openshmem library before any other \openshmem routine may -be called. Thus, the dynamic process model is not supported in \openshmem. For -instance, the processes newly created by a call to \FUNC{MPI\_Comm\_spawn} cannot -join the existing \openshmem environment that was initialized by other existing -PEs. The \FUNC{shmem\_pe\_accessible} routine can be used in this scenario to -portably ensure that a remote PE is accessible via \openshmem communication. +be called. Hence, attention must be paid when using \openshmem together with the +MPI dynamic process routines. Specifically, we clarify the following three scenarios: + +\begin{enumerate} +\item After MPI initialization and before any PEs start \openshmem initialization, +it is implementation defined whether processes created by a call to MPI dynamic +process routine are able to join the call to \FUNC{shmem\_init} or +\FUNC{shmem\_init\_thread} and establish the same \openshmem environment together +with other existing PEs. + +\item After \openshmem initialization, a process newly created by +the MPI dynamic process routine cannot join the existing \openshmem environment +that was initialized by other existing PEs. 
The \FUNC{shmem\_pe\_accessible} routine +may be used in this scenario to portably ensure that a remote PE is accessible +via \openshmem communication. + +\item After \openshmem initialization, it is implementation defined whether +processes newly created by MPI dynamic process routine can make a call to +\FUNC{shmem\_init} or \FUNC{shmem\_init\_thread} and establish a separate +\openshmem environment. +\end{enumerate} \subsection{Thread Safety} From 1d38dc30d0d71d471008e2b4fb40fae944a709a0 Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 1 Apr 2019 17:19:32 -0500 Subject: [PATCH 03/17] Typo fix and minor word adjustment --- content/backmatter.tex | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/backmatter.tex b/content/backmatter.tex index fbe70eee0..e0bc354f3 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -322,7 +322,7 @@ \subsection{RMA Synchronization, Ordering and Atomicity} \apiimpnotes{ In the implementations that share the same communication resources for \openshmem and MPI, the memory or network synchronization internally issued for one -programming model may also effect the status of operations in the other model. +programming model may also affect the status of operations in the other model. Although the user program must make necessary synchronization calls for both models in order to ensure semantics correctness, a high performance implementation may internally avoid the later synchronization made by the other model when no @@ -347,7 +347,7 @@ \subsection{Communication Progress} \apiimpnotes{ Implementations that provide both \openshmem and MPI interfaces should try to ensure progress for both models when necessary and possible, for performance -reasons. For instance, a high-quality implementation may start making progress for +reasons. For instance, an implementation may start making progress for both \openshmem and MPI whenever possible, after the user program has called \FUNC{shmem\_init} and \FUNC{MPI\_init} provided by the same system. } From 8df294699d201397fe1c503a5283cad0f7ba222d Mon Sep 17 00:00:00 2001 From: Min Si Date: Tue, 2 Apr 2019 13:53:52 -0500 Subject: [PATCH 04/17] Add more details in RMA semantics subsection --- content/backmatter.tex | 61 ++++++++++++++++++++++++++++++++++++------ 1 file changed, 53 insertions(+), 8 deletions(-) diff --git a/content/backmatter.tex b/content/backmatter.tex index e0bc354f3..82b5ee166 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -301,23 +301,68 @@ \subsection{Mapping Process Identification Numbers} program when necessary. -\subsection{RMA Synchronization, Ordering and Atomicity} +\subsection{RMA Memory Semantics, Completion, Ordering and Atomicity} \label{subsec:interoperability:rma} -Both \openshmem and MPI define similar RMA and atomic operations with additional -semantics and synchronization routines to ensure the operations' ordering and -completion. A synchronization call in \openshmem, however, does not interfere -with the outstanding operations issued in the MPI environment. For instance, +Both \openshmem and MPI define similar RMA and atomic operations for remote memory +access, however, each model defines different semantics for memory synchronization, +operation completion, ordering, and atomicity. +We clarify the semantics differences and interoperability of these two models +as below. + +\begin{itemize} + +\item Memory Semantics: MPI defines the concept of public and private copies +for each RMA window. 
Any remote RMA operation can access only the +public copy of that window, and memory load\slash store can access only the +private copy. MPI defines two memory models for memory +synchronization between the copies: RMA separate and RMA unified (see definition +in MPI standard version 3.1, Section 11.4), and requires additional RMA +synchronization call to ensure consistent view on memory in each memory model +(see requirement of RMA synchronization in MPI standard version 3.1, Section 11.7). +Unlike MPI, the memory model in \openshmem is implicit. +However, additional synchronization is still required to ensure consistent view +between remote memory access and memory load\slash store (e.g., \FUNC{shmem\_barrier}). + +To ensure portability, a hybrid program should always make appropriate \openshmem +and MPI synchronization calls for remote access in each environment respectively +in order to ensure any remote updates are visible to the target PE +and also become visible to other remote access operations. For instance, a program +can make a call to \FUNC{shmem\_barrier} on both local and target PEs after +a \FUNC{shmem\_put} operation in order to ensure the remote update is visible to +the target PE, and then make a call to \FUNC{MPI\_Win\_sync} on the target +PE before the data can be accessed by other PEs using MPI RMA operations. + +\item Completion: Unlike \openshmem RMA operations, all MPI RMA communication +operations including the atomic operations such as \FUNC{MPI\_Accumulate} are +nonblocking. Similar to \openshmem nonblocking RMA, the program should perform +additional MPI synchronization to ensure any local buffers involved in the outstanding +MPI RMA operations can be safely reused (see definition of MPI RMA synchronization +in MPI standard version 3.1, Section 11.5). +A synchronization call in \openshmem, however, does not interfere +with any outstanding operations issued in the MPI environment. For instance, the \FUNC{shmem\_quiet} function only ensures completion of \openshmem RMA, AMO, and memory store operations. It does not force the completion of any MPI outstanding operations. To ensure the completion of RMA operations in MPI, the program should use an appropriate MPI synchronization routine in the MPI context (e.g., using \FUNC{MPI\_Win\_flush\_all} to ensure remote completion -of all outstanding operations in the passive-target mode). Similarly, \openshmem +of all outstanding operations in the passive-target mode). + +\item Ordering: Unlike \openshmem ordering semantics, MPI does not ensure the +ordering of {\PUT} and {\GET} operations, however, it guarantees ordering between +MPI atomic operations from one process to the same (or overlapping) memory +locations at another process via the same window. A call to \FUNC{shmem\_fence} +forces neither ordering of any MPI operations, nor ordering between outstanding +MPI operations +and \openshmem operations. + +\item Atomicity: \openshmem guarantees only the atomicity of concurrent AMO operations that operate on symmetric data with the same datatype. Access to the same symmetric object with -MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may result in undefined -result. +MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may result in +undefined result. 
+ +\end{itemize} \apiimpnotes{ In the implementations that share the same communication resources for \openshmem From eef40c40a089bcad9e1ea95e4d01db899f5ba89f Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 20 May 2019 16:46:33 -0500 Subject: [PATCH 05/17] Made a pass by English editor --- content/backmatter.tex | 64 ++++++++++++++++++++++-------------------- 1 file changed, 33 insertions(+), 31 deletions(-) diff --git a/content/backmatter.tex b/content/backmatter.tex index 82b5ee166..a1384f945 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -189,26 +189,26 @@ \chapter{Interoperability with other Programming Models}\label{sec:interoperabil OpenSHMEM routines may be used in conjunction with the routines of other communication libraries or parallel languages in the same program. This section -describes the interoperability with other programming models including +describes the interoperability with other programming models, including clarification of undefined behaviors caused by mixed use of different models, advice to \openshmem library users and developers that may improve the portability -and performance of hybrid programs, and the definition of an OpenSHMEM extension +and performance of hybrid programs, and definition of an OpenSHMEM extension API that queries the interoperability features provided by an \openshmem library. \section{MPI Interoperability} -\openshmem and MPI are two commonly used parallel programming models for distributed -memory systems. The user can choose to utilize both models in the same program +\openshmem and MPI are two commonly used parallel programming models for +distributed-memory systems. The user can choose to utilize both models in the same program to efficiently and easily support various communication patterns. A vendor may implement the \openshmem and MPI libraries in different ways. For -instance, one may implement both \openshmem and MPI as standalone libraries -and each of them allocates and initializes fully isolated communication +instance, one may implement both \openshmem and MPI as standalone libraries, +each of which allocates and initializes fully isolated communication resources. Consequently, an \openshmem call does not interfere with any MPI communication in the same application. As the other common approach, however, -a vendor may also implement both \openshmem and MPI interfaces within the -same software system in order to share communication resource when possible. +a vendor may implement both \openshmem and MPI interfaces within the +same software system in order to share a communication resource when possible. In such a case, internal interference may occur. To improve interoperability and portability in \openshmem + MPI hybrid @@ -218,18 +218,18 @@ \section{MPI Interoperability} \subsection{Initialization} To ensure that a hybrid program can be portably performed with different vendor implementations, the \openshmem environment of the program must be initialized by -a call to \FUNC{shmem\_init} or \FUNC{shmem\_init\_thread}, and be finalized by +a call to \FUNC{shmem\_init} or \FUNC{shmem\_init\_thread} and be finalized by a call to \FUNC{shmem\_finalize}; the MPI environment of the program must be initialized -by a call to \FUNC{MPI\_Init} or \FUNC{MPI\_Init\_thread}, and be finalized by a +by a call to \FUNC{MPI\_Init} or \FUNC{MPI\_Init\_thread} and be finalized by a call to \FUNC{MPI\_Finalize}. 
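For example, a minimal hybrid program may be structured as in the following sketch; initializing MPI before \openshmem is shown here, but it is only one of the valid orders:

\begin{lstlisting}[language={C}, tabsize=2, basicstyle=\ttfamily\footnotesize]
#include <stdio.h>
#include <mpi.h>
#include <shmem.h>

int main(int argc, char *argv[])
{
  /* Initialize both environments; either library may be initialized first. */
  MPI_Init(&argc, &argv);
  shmem_init();

  printf("PE %d of %d: OpenSHMEM and MPI are both initialized\n",
         shmem_my_pe(), shmem_n_pes());

  /* ... hybrid OpenSHMEM + MPI communication ... */

  /* Finalize both environments; these calls may likewise be made in either order. */
  shmem_finalize();
  MPI_Finalize();
  return 0;
}
\end{lstlisting}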
\apiimpnotes{ Portable implementations of OpenSHMEM and MPI must ensure that the initialization calls can be made in an arbitrary order within a program; the same rule also -applies to the finalization calls. A software runtime that utilizes shared +applies to the finalization calls. A software runtime that utilizes a shared communication resource for \openshmem and MPI communication may maintain an internal reference counter in order to ensure that the shared resource is -initialized only once, and no shared resource is released until the last +initialized only once and thus no shared resource is released until the last finalization call is made. } @@ -237,9 +237,11 @@ \subsection{Initialization} \subsection{Dynamic Process Creation and MPMD Programming} \label{subsec:interoperability:mpmd} -MPI defines the dynamic process model that allows creation of processes after -an MPI application has started (e.g., by calling \FUNC{MPI\_Comm\_spawn}), -and provides the mechanism to establish communication +MPI defines a dynamic process model that allows creation of processes after +an MPI application has started (e.g., by calling \FUNC{MPI\_Comm\_spawn}) and +connection to independent processes (e.g., through \FUNC{MPI\_Comm\_accept} +and \FUNC{MPI\_Comm\_connect}) +and provides a mechanism to establish communication between the newly created processes and the existing MPI application (see MPI standard version 3.1, Chapter 10). Unlike MPI, \openshmem requires all PEs to collectively allocate and initialize @@ -272,12 +274,12 @@ \subsection{Thread Safety} Both \openshmem and MPI define the interaction with user threads in a program with routines that can be used for initializing and querying the thread environment. In a hybrid program, the user can request different thread levels -at the initialization calls of \openshmem and MPI environments, however, the +at the initialization calls of \openshmem and MPI environments; however, the returned support level provided by the \openshmem library might be different from that returned in an \openshmem-only program. For instance, the former -initialization call in a hybrid program may initialize resource with the user -requested thread level but the supported level cannot be updated by the latter -initialization call, if the underlying software runtime of \openshmem and MPI +initialization call in a hybrid program may initialize a resource with the +user-requested thread level, but the supported level cannot be updated by the latter +initialization call if the underlying software runtime of \openshmem and MPI share the same internal communication resource. The program should always check the \VAR{provided} thread level returned at the corresponding initialization call to portably ensure thread support in each @@ -290,18 +292,18 @@ \subsection{Mapping Process Identification Numbers} Similar to the PE identifier in \openshmem, MPI defines rank as the identification number of a process in a communicator. Both \openshmem PE and MPI rank are unique integers assigned from zero to one less than the total -number of processes. In a hybrid program, one may observe that the \openshmem +number of processes. In a hybrid program, the \openshmem PE and the MPI rank in \VAR{MPI\_COMM\_WORLD} of a process can be equal. This feature, however, may be provided by only some of the \openshmem and MPI implementations (e.g., if both environments share the same underlying process -manager), and is not portably guaranteed. 
A portable program should always -use the standard functions in each model, i.e., \FUNC{shmem\_my\_pe} in \openshmem +manager) and is not portably guaranteed. A portable program should always +use the standard functions in each model, namely, \FUNC{shmem\_my\_pe} in \openshmem and \FUNC{MPI\_Comm\_rank} in MPI, to query the process identification numbers in each communication environment and manage the mapping of identifiers in the program when necessary. -\subsection{RMA Memory Semantics, Completion, Ordering and Atomicity} +\subsection{RMA Memory Semantics, Completion, Ordering, and Atomicity} \label{subsec:interoperability:rma} Both \openshmem and MPI define similar RMA and atomic operations for remote memory @@ -341,7 +343,7 @@ \subsection{RMA Memory Semantics, Completion, Ordering and Atomicity} in MPI standard version 3.1, Section 11.5). A synchronization call in \openshmem, however, does not interfere with any outstanding operations issued in the MPI environment. For instance, -the \FUNC{shmem\_quiet} function only ensures completion of \openshmem RMA, +the \FUNC{shmem\_quiet} function ensures completion only of \openshmem RMA, AMO, and memory store operations. It does not force the completion of any MPI outstanding operations. To ensure the completion of RMA operations in MPI, the program should use an appropriate MPI synchronization routine in the @@ -357,9 +359,9 @@ \subsection{RMA Memory Semantics, Completion, Ordering and Atomicity} and \openshmem operations. \item Atomicity: \openshmem -guarantees only the atomicity of concurrent AMO operations that operate on +guarantees the atomicity only of concurrent AMO operations that operate on symmetric data with the same datatype. Access to the same symmetric object with -MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may result in +MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may result in an undefined result. \end{itemize} @@ -369,7 +371,7 @@ \subsection{RMA Memory Semantics, Completion, Ordering and Atomicity} and MPI, the memory or network synchronization internally issued for one programming model may also affect the status of operations in the other model. Although the user program must make necessary synchronization calls for both models -in order to ensure semantics correctness, a high performance implementation may +in order to ensure semantics correctness, a high-performance implementation may internally avoid the later synchronization made by the other model when no subsequent operation is issued between these two synchronization calls. } @@ -378,12 +380,12 @@ \subsection{Communication Progress} \label{subsec:interoperability:progress} \openshmem promises the progression of communication both with and without -\openshmem calls and requires the software progress mechanism in implementation +\openshmem calls and requires the software progress mechanism in the implementation (e.g., a progress thread) when the hardware does not provide asynchronous communication capabilities. In MPI, however, a weak progress semantics is applied. That is, -an MPI communication call is only guaranteed to complete in finite time. For +an MPI communication call is guaranteed only to complete in finite time. For instance, an MPI Put may be completed only when the remote process makes an MPI -call which internally triggers the progress of MPI, if the underlying hardware +call that internally triggers the progress of MPI, if the underlying hardware does not support asynchronous communication. 
A portable hybrid program should not assume that a call to the \openshmem library also makes progress for MPI, and it may have to explicitly manage the asynchronous communication in MPI in @@ -424,7 +426,7 @@ \section{Interoperability Query API} % \apidescription{ \FUNC{shmemx\_query\_interoperability} is an extended \openshmem routine that queries whether an interoperability property is supported by the \openshmem library. One of the -following property can be queried in an \openshmem program after finishing the +following properties can be queried in an \openshmem program after finishing the initialization call to \openshmem and that of the relevant programming models being used in the program. An OpenSHMEM library implementation may extend the available properties. From 1f8d66711134f4dc3863e1d0d1ff2decd4ba5397 Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 9 Sep 2019 11:42:56 -0500 Subject: [PATCH 06/17] Fix function format --- content/backmatter.tex | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/backmatter.tex b/content/backmatter.tex index a1384f945..7583ae41d 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -384,7 +384,7 @@ \subsection{Communication Progress} (e.g., a progress thread) when the hardware does not provide asynchronous communication capabilities. In MPI, however, a weak progress semantics is applied. That is, an MPI communication call is guaranteed only to complete in finite time. For -instance, an MPI Put may be completed only when the remote process makes an MPI +instance, an \FUNC{MPI\_Put} may be completed only when the remote process makes an MPI call that internally triggers the progress of MPI, if the underlying hardware does not support asynchronous communication. A portable hybrid program should not assume that a call to the \openshmem library also makes progress for MPI, From c0245b93d00b9ba9d3d6c60e396312f045a725b1 Mon Sep 17 00:00:00 2001 From: Min Si Date: Mon, 9 Sep 2019 11:43:43 -0500 Subject: [PATCH 07/17] Change query API to shmem_ and move text into separate file --- content/backmatter.tex | 50 +++++------------------- content/shmem_query_interoperability.tex | 39 ++++++++++++++++++ 2 files changed, 49 insertions(+), 40 deletions(-) create mode 100644 content/shmem_query_interoperability.tex diff --git a/content/backmatter.tex b/content/backmatter.tex index 7583ae41d..50dc02db8 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -192,7 +192,7 @@ \chapter{Interoperability with other Programming Models}\label{sec:interoperabil describes the interoperability with other programming models, including clarification of undefined behaviors caused by mixed use of different models, advice to \openshmem library users and developers that may improve the portability -and performance of hybrid programs, and definition of an OpenSHMEM extension +and performance of hybrid programs, and definition of an OpenSHMEM API that queries the interoperability features provided by an \openshmem library. @@ -399,49 +399,19 @@ \subsection{Communication Progress} \FUNC{shmem\_init} and \FUNC{MPI\_init} provided by the same system. } -To avoid unnecessary overhead and programming complexity in the user program, -the \openshmem implementation may provide an extended \openshmem routine that -allows the user program to query the progress support for the MPI environment. -We introduce the definition and semantics of this routine in -Section~\ref{subsec:interoperability:query}. 
+\section{Query Interoperability} -\section{Interoperability Query API} -\label{subsec:interoperability:query} - -Determines whether an interoperability feature is supported by the \openshmem -library implementation. - -\begin{apidefinition} - -\begin{Csynopsis} -int @\FuncDecl{shmemx\_query\_interoperability}@(int property); -\end{Csynopsis} - -\begin{apiarguments} - \apiargument{IN}{property}{The interoperability property queried by the user.} -\end{apiarguments} +A hybrid user program can query the interoperability feature of an \openshmem +implementation in order to avoid unnecessary overhead and programming complexity. +For instance, the user program can eliminate manual progress polling for MPI +communication if the underlying software runtime guarantees the progression of +communication also for MPI even without explicit function calls. -% compiling error ? -% \apidescription{ -\FUNC{shmemx\_query\_interoperability} is an extended \openshmem routine that queries -whether an interoperability property is supported by the \openshmem library. One of the -following properties can be queried in an \openshmem program after finishing the -initialization call to \openshmem and that of the relevant programming models -being used in the program. An OpenSHMEM library implementation may extend the -available properties. - -\begin{itemize} - \item \VAR{SHMEM\_PROGRESS\_MPI} Query whether the \openshmem - implementation makes progress for the MPI communication used in the user program. -\end{itemize} -% } +\subsection{\textbf{SHMEM\_QUERY\_INTEROPERABILITY}} +\label{subsec:interoperability:query} +\input{content/shmem_query_interoperability} -\apireturnvalues{ - The return value is \CONST{1} if \VAR{property} is supported by the \openshmem library; - otherwise, it is \CONST{0}. -} -\end{apidefinition} \color{black} \chapter{History of OpenSHMEM}\label{sec:openshmem_history} diff --git a/content/shmem_query_interoperability.tex b/content/shmem_query_interoperability.tex new file mode 100644 index 000000000..8af1e26ca --- /dev/null +++ b/content/shmem_query_interoperability.tex @@ -0,0 +1,39 @@ +\apisummary{ + Determines whether an interoperability feature is supported by the \openshmem + library implementation. +} +\begin{apidefinition} + +\begin{Csynopsis} +int @\FuncDecl{shmem\_query\_interoperability}@(int property); +\end{Csynopsis} + +\begin{apiarguments} + \apiargument{IN}{property}{The interoperability property queried by the user.} +\end{apiarguments} + +% compiling error ? +% \apidescription{ +\FUNC{shmem\_query\_interoperability} queries whether an interoperability property +is supported by the \openshmem library. One of the following properties can be +queried in an \openshmem program after finishing the +initialization call to \openshmem and that of the relevant programming models +being used in the program. An \openshmem library implementation may extend the +available properties. + +\begin{itemize} +\item \VAR{SHMEM\_PROGRESS\_MPI} Query whether the \openshmem +implementation makes progress for the MPI communication used in the user program. +\end{itemize} +% } + +\apireturnvalues{ + The return value is \CONST{1} if \VAR{property} is supported by the \openshmem library; + otherwise, it is \CONST{0}. +} +\end{apidefinition} + +\apiimpnotes{ +Implementations that do not support interoperability with other programming models +may simply return \CONST{0} for the relevant interoperability query. 
+} From b79d7e3f63cf6eb3930923cdb577a96ef0711e8b Mon Sep 17 00:00:00 2001 From: Min Si Date: Tue, 10 Sep 2019 06:37:30 -0500 Subject: [PATCH 08/17] Add example code for pe mapping --- content/backmatter.tex | 7 ++++++ example_code/hybrid_mpi_mapping_id.c | 36 ++++++++++++++++++++++++++++ 2 files changed, 43 insertions(+) create mode 100644 example_code/hybrid_mpi_mapping_id.c diff --git a/content/backmatter.tex b/content/backmatter.tex index 50dc02db8..cd11249ed 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -302,6 +302,13 @@ \subsection{Mapping Process Identification Numbers} in each communication environment and manage the mapping of identifiers in the program when necessary. +\subsubsection{Example} +The following example demonstrates how to manage the mapping of process +identifiers in a hybrid \openshmem and MPI program. + +\lstinputlisting[language={C}, tabsize=2, + basicstyle=\ttfamily\footnotesize] + {example_code/hybrid_mpi_mapping_id.c} \subsection{RMA Memory Semantics, Completion, Ordering, and Atomicity} \label{subsec:interoperability:rma} diff --git a/example_code/hybrid_mpi_mapping_id.c b/example_code/hybrid_mpi_mapping_id.c new file mode 100644 index 000000000..9720ce94f --- /dev/null +++ b/example_code/hybrid_mpi_mapping_id.c @@ -0,0 +1,36 @@ +#include +#include +#include +#include + +int main(int argc, char *argv[]) +{ + static long pSync[SHMEM_COLLECT_SYNC_SIZE]; + for (int i = 0; i < SHMEM_COLLECT_SYNC_SIZE; i++) + pSync[i] = SHMEM_SYNC_VALUE; + + MPI_Init(&argc, &argv); + shmem_init(); + + int mype = shmem_my_pe(); + int npes = shmem_n_pes(); + + static int myrank; + MPI_Comm_rank(MPI_COMM_WORLD, &myrank); + + int *mpi_ranks = shmem_calloc(npes, sizeof(int)); + + shmem_barrier_all(); + shmem_collect32(mpi_ranks, &myrank, 1, 0, 0, npes, pSync); + + if (mype == 0) + for (int i = 0; i < npes; i++) + printf("PE %d's MPI rank is %d\n", i, mpi_ranks[i]); + + shmem_free(mpi_ranks); + + shmem_finalize(); + MPI_Finalize(); + + return 0; +} From 51b1b786639339e435bb013e3fff201e4d57cd98 Mon Sep 17 00:00:00 2001 From: Min Si Date: Tue, 10 Sep 2019 06:47:44 -0500 Subject: [PATCH 09/17] Minor text adjustment --- content/backmatter.tex | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/content/backmatter.tex b/content/backmatter.tex index cd11249ed..616956502 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -212,7 +212,7 @@ \section{MPI Interoperability} In such a case, internal interference may occur. To improve interoperability and portability in \openshmem + MPI hybrid -programming, we clarify several aspects in the following subsections. +programming, we clarify the relevant semantics in the following subsections. \subsection{Initialization} @@ -282,8 +282,9 @@ \subsection{Thread Safety} initialization call if the underlying software runtime of \openshmem and MPI share the same internal communication resource. The program should always check the \VAR{provided} thread level returned -at the corresponding initialization call to portably ensure thread support in each -communication environment. +at the corresponding initialization call or query the level of thread support +after initialization to portably ensure thread support in each communication +environment. \subsection{Mapping Process Identification Numbers} @@ -303,8 +304,10 @@ \subsection{Mapping Process Identification Numbers} program when necessary. 
\subsubsection{Example} -The following example demonstrates how to manage the mapping of process -identifiers in a hybrid \openshmem and MPI program. +\label{subsubsec:interoperability:id:example} +The following example demonstrates how to manage the mapping between \openshmem +PE identifier and MPI ranks in \VAR{MPI\_COMM\_WORLD} in a hybrid \openshmem +and MPI program. \lstinputlisting[language={C}, tabsize=2, basicstyle=\ttfamily\footnotesize] From 499f7c13e8ed087c3c8590e3582a7b0310672f84 Mon Sep 17 00:00:00 2001 From: Min Si Date: Tue, 10 Sep 2019 10:51:01 -0500 Subject: [PATCH 10/17] Simplified version of dynamic process and rma sections --- content/backmatter.tex | 97 ++++++++---------------------------------- 1 file changed, 18 insertions(+), 79 deletions(-) diff --git a/content/backmatter.tex b/content/backmatter.tex index 616956502..4935bcec8 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -243,30 +243,13 @@ \subsection{Dynamic Process Creation and MPMD Programming} and \FUNC{MPI\_Comm\_connect}) and provides a mechanism to establish communication between the newly created processes and the existing MPI application (see -MPI standard version 3.1, Chapter 10). -Unlike MPI, \openshmem requires all PEs to collectively allocate and initialize +MPI standard version 3.1, Chapter 10). The dynamic process model can be used to +implement Multiple Program Multiple Data (MPMD) style program. +Unlike MPI, \openshmem follows the SPMD programming model. It starts +all processes at once and requires all PEs to collectively allocate and initialize resources used by the \openshmem library before any other \openshmem routine may -be called. Hence, attention must be paid when using \openshmem together with the -MPI dynamic process routines. Specifically, we clarify the following three scenarios: - -\begin{enumerate} -\item After MPI initialization and before any PEs start \openshmem initialization, -it is implementation defined whether processes created by a call to MPI dynamic -process routine are able to join the call to \FUNC{shmem\_init} or -\FUNC{shmem\_init\_thread} and establish the same \openshmem environment together -with other existing PEs. - -\item After \openshmem initialization, a process newly created by -the MPI dynamic process routine cannot join the existing \openshmem environment -that was initialized by other existing PEs. The \FUNC{shmem\_pe\_accessible} routine -may be used in this scenario to portably ensure that a remote PE is accessible -via \openshmem communication. - -\item After \openshmem initialization, it is implementation defined whether -processes newly created by MPI dynamic process routine can make a call to -\FUNC{shmem\_init} or \FUNC{shmem\_init\_thread} and establish a separate -\openshmem environment. -\end{enumerate} +be called. Hence, users should avoid using \openshmem and MPI dynamic process model +in the same program. \subsection{Thread Safety} @@ -313,68 +296,24 @@ \subsubsection{Example} basicstyle=\ttfamily\footnotesize] {example_code/hybrid_mpi_mapping_id.c} -\subsection{RMA Memory Semantics, Completion, Ordering, and Atomicity} +\subsection{RMA Programming Models} \label{subsec:interoperability:rma} Both \openshmem and MPI define similar RMA and atomic operations for remote memory -access, however, each model defines different semantics for memory synchronization, -operation completion, ordering, and atomicity. -We clarify the semantics differences and interoperability of these two models -as below. 
+access, however, each model defines different semantics and functions for memory +synchronization, operation completion, and ordering. To ensure semantics correctness +and portability, a hybrid program should always make appropriate \openshmem and MPI +synchronization calls for remote access in each environment respectively. -\begin{itemize} +\openshmem guarantees the atomicity only of concurrent \openshmem AMO operations +that operate on symmetric data with the same datatype. Access to the same symmetric +object with MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may +result in an undefined result. -\item Memory Semantics: MPI defines the concept of public and private copies -for each RMA window. Any remote RMA operation can access only the -public copy of that window, and memory load\slash store can access only the -private copy. MPI defines two memory models for memory -synchronization between the copies: RMA separate and RMA unified (see definition -in MPI standard version 3.1, Section 11.4), and requires additional RMA -synchronization call to ensure consistent view on memory in each memory model -(see requirement of RMA synchronization in MPI standard version 3.1, Section 11.7). -Unlike MPI, the memory model in \openshmem is implicit. -However, additional synchronization is still required to ensure consistent view -between remote memory access and memory load\slash store (e.g., \FUNC{shmem\_barrier}). - -To ensure portability, a hybrid program should always make appropriate \openshmem -and MPI synchronization calls for remote access in each environment respectively -in order to ensure any remote updates are visible to the target PE -and also become visible to other remote access operations. For instance, a program -can make a call to \FUNC{shmem\_barrier} on both local and target PEs after -a \FUNC{shmem\_put} operation in order to ensure the remote update is visible to -the target PE, and then make a call to \FUNC{MPI\_Win\_sync} on the target -PE before the data can be accessed by other PEs using MPI RMA operations. - -\item Completion: Unlike \openshmem RMA operations, all MPI RMA communication -operations including the atomic operations such as \FUNC{MPI\_Accumulate} are -nonblocking. Similar to \openshmem nonblocking RMA, the program should perform -additional MPI synchronization to ensure any local buffers involved in the outstanding -MPI RMA operations can be safely reused (see definition of MPI RMA synchronization -in MPI standard version 3.1, Section 11.5). -A synchronization call in \openshmem, however, does not interfere -with any outstanding operations issued in the MPI environment. For instance, -the \FUNC{shmem\_quiet} function ensures completion only of \openshmem RMA, -AMO, and memory store operations. It does not force the completion -of any MPI outstanding operations. To ensure the completion of RMA operations -in MPI, the program should use an appropriate MPI synchronization routine in the -MPI context (e.g., using \FUNC{MPI\_Win\_flush\_all} to ensure remote completion -of all outstanding operations in the passive-target mode). - -\item Ordering: Unlike \openshmem ordering semantics, MPI does not ensure the -ordering of {\PUT} and {\GET} operations, however, it guarantees ordering between -MPI atomic operations from one process to the same (or overlapping) memory -locations at another process via the same window. 
A call to \FUNC{shmem\_fence} -forces neither ordering of any MPI operations, nor ordering between outstanding -MPI operations -and \openshmem operations. - -\item Atomicity: \openshmem -guarantees the atomicity only of concurrent AMO operations that operate on -symmetric data with the same datatype. Access to the same symmetric object with -MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may result in an -undefined result. +Most RMA programs can be written using either \openshmem or MPI RMA. +It is recommended to choose only one of the RMA models in the same program, whenever +possible, for performance and code simplicity. -\end{itemize} \apiimpnotes{ In the implementations that share the same communication resources for \openshmem From 9a8fc48a841d91ee8a3ac1634182b3aa1432db71 Mon Sep 17 00:00:00 2001 From: Min Si Date: Thu, 12 Sep 2019 13:25:08 -0500 Subject: [PATCH 11/17] Do not mention interference in first paragraph --- content/backmatter.tex | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/content/backmatter.tex b/content/backmatter.tex index 4935bcec8..b6f4ed693 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -205,11 +205,10 @@ \section{MPI Interoperability} A vendor may implement the \openshmem and MPI libraries in different ways. For instance, one may implement both \openshmem and MPI as standalone libraries, each of which allocates and initializes fully isolated communication -resources. Consequently, an \openshmem call does not interfere with any MPI -communication in the same application. As the other common approach, however, +resources. +As the other common approach, however, a vendor may implement both \openshmem and MPI interfaces within the same software system in order to share a communication resource when possible. -In such a case, internal interference may occur. To improve interoperability and portability in \openshmem + MPI hybrid programming, we clarify the relevant semantics in the following subsections. From ce4a759b32e7bda3a456b42d90b30ad851332a46 Mon Sep 17 00:00:00 2001 From: Min Si Date: Thu, 12 Sep 2019 13:27:58 -0500 Subject: [PATCH 12/17] interop/mpmd: strong advice to not use dynamic process with shmem --- content/backmatter.tex | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/content/backmatter.tex b/content/backmatter.tex index b6f4ed693..a088fcf38 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -242,12 +242,13 @@ \subsection{Dynamic Process Creation and MPMD Programming} and \FUNC{MPI\_Comm\_connect}) and provides a mechanism to establish communication between the newly created processes and the existing MPI application (see -MPI standard version 3.1, Chapter 10). The dynamic process model can be used to -implement Multiple Program Multiple Data (MPMD) style program. -Unlike MPI, \openshmem follows the SPMD programming model. It starts -all processes at once and requires all PEs to collectively allocate and initialize -resources used by the \openshmem library before any other \openshmem routine may -be called. Hence, users should avoid using \openshmem and MPI dynamic process model +MPI standard version 3.1, Chapter 10). +Unlike MPI, \openshmem starts all processes at once and requires all PEs to +collectively allocate and initialize resources (e.g., symmetric heap) used by +the \openshmem library before any other \openshmem routine may +be called. 
Communicating with a dynamically created process in the \openshmem +environment may result in undefined behavior. +Hence, users should not use \openshmem and MPI dynamic process model in the same program. From af1743c673b4223a8d084b83af428985327b9182 Mon Sep 17 00:00:00 2001 From: Min Si Date: Tue, 24 Sep 2019 23:35:37 -0400 Subject: [PATCH 13/17] interop/rma: simply ask user to avoid using both RMA models --- content/backmatter.tex | 27 ++++++--------------------- 1 file changed, 6 insertions(+), 21 deletions(-) diff --git a/content/backmatter.tex b/content/backmatter.tex index a088fcf38..041abbcf3 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -300,31 +300,16 @@ \subsection{RMA Programming Models} \label{subsec:interoperability:rma} Both \openshmem and MPI define similar RMA and atomic operations for remote memory -access, however, each model defines different semantics and functions for memory -synchronization, operation completion, and ordering. To ensure semantics correctness -and portability, a hybrid program should always make appropriate \openshmem and MPI -synchronization calls for remote access in each environment respectively. - -\openshmem guarantees the atomicity only of concurrent \openshmem AMO operations +access, however, a portable program should not assume interoperability between these +two RMA models. +For instance, \openshmem guarantees the atomicity only of concurrent \openshmem AMO operations that operate on symmetric data with the same datatype. Access to the same symmetric object with MPI atomic operations, such as an \FUNC{MPI\_Fetch\_and\_op}, may -result in an undefined result. - -Most RMA programs can be written using either \openshmem or MPI RMA. -It is recommended to choose only one of the RMA models in the same program, whenever +result in an undefined result. Furthermore, +because most RMA programs can be written using either \openshmem or MPI RMA, +users should choose only one of the RMA models in the same program, whenever possible, for performance and code simplicity. - -\apiimpnotes{ -In the implementations that share the same communication resources for \openshmem -and MPI, the memory or network synchronization internally issued for one -programming model may also affect the status of operations in the other model. -Although the user program must make necessary synchronization calls for both models -in order to ensure semantics correctness, a high-performance implementation may -internally avoid the later synchronization made by the other model when no -subsequent operation is issued between these two synchronization calls. -} - \subsection{Communication Progress} \label{subsec:interoperability:progress} From 04dec987a414fc147dc1f5d766d5562555d61be4 Mon Sep 17 00:00:00 2001 From: Min Si Date: Wed, 25 Sep 2019 06:36:55 -0400 Subject: [PATCH 14/17] interop/progress: mention query api to connect paragraphs --- content/backmatter.tex | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/content/backmatter.tex b/content/backmatter.tex index 041abbcf3..2863354a8 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -321,8 +321,12 @@ \subsection{Communication Progress} instance, an \FUNC{MPI\_Put} may be completed only when the remote process makes an MPI call that internally triggers the progress of MPI, if the underlying hardware does not support asynchronous communication. 
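As an illustration, the following sketch first checks, with the \FUNC{shmem\_query\_interoperability} routine defined in this annex, whether the \openshmem library also progresses MPI, and otherwise drives MPI progress manually while waiting; the symmetric variable \VAR{flag} (assumed to be set to 1 by a remote PE), the helper routine, and the use of \FUNC{MPI\_Iprobe} as the progress-polling call are purely illustrative:

\begin{lstlisting}[language={C}, tabsize=2, basicstyle=\ttfamily\footnotesize]
#include <mpi.h>
#include <shmem.h>

long flag = 0;  /* symmetric; a remote PE sets it to 1, e.g., with shmem_long_atomic_set */

void wait_for_flag(void)
{
  if (shmem_query_interoperability(SHMEM_PROGRESS_MPI)) {
    /* The OpenSHMEM library also progresses MPI; a plain wait is sufficient. */
    shmem_long_wait_until(&flag, SHMEM_CMP_EQ, 1);
  } else {
    /* Keep MPI progressing explicitly while waiting on the OpenSHMEM flag. */
    int mpi_msg_flag;
    while (!shmem_long_test(&flag, SHMEM_CMP_EQ, 1))
      MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
                 &mpi_msg_flag, MPI_STATUS_IGNORE);
  }
}
\end{lstlisting}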
A portable hybrid program -should not assume that a call to the \openshmem library also makes progress for MPI, -and it may have to explicitly manage the asynchronous communication in MPI in +should not assume that a call to the \openshmem library also makes progress for MPI. +A call to \FUNC{shmem\_query\_interoperability} (see definition in \ref{subsec:interoperability:query}) +can be used to check whether the implementation provides such a functionality. +If it is provided, then the library ensures progression of +both \openshmem and MPI communication; otherwise, it may have to explicitly +manage the asynchronous communication in MPI in order to prevent any deadlock or performance degradation. \apiimpnotes{ From 23f1537ef3f354234cec325f7dec3c6574fc2cbc Mon Sep 17 00:00:00 2001 From: Min Si Date: Wed, 25 Sep 2019 09:19:35 -0400 Subject: [PATCH 15/17] interop/threads: add restriction for mixed thread levels --- content/backmatter.tex | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/content/backmatter.tex b/content/backmatter.tex index 2863354a8..50664ed8b 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -256,7 +256,7 @@ \subsection{Thread Safety} \label{subsec:interoperability:thread} Both \openshmem and MPI define the interaction with user threads in a program with routines that can be used for initializing and querying the thread -environment. In a hybrid program, the user can request different thread levels +environment. In a hybrid program, the user may request different thread levels at the initialization calls of \openshmem and MPI environments; however, the returned support level provided by the \openshmem library might be different from that returned in an \openshmem-only program. For instance, the former @@ -269,6 +269,28 @@ \subsection{Thread Safety} after initialization to portably ensure thread support in each communication environment. +Both \openshmem and MPI define similar thread levels, namely, \VAR{THREAD\_SINGLE}, +\VAR{THREAD\_FUNNELED}, \VAR{THREAD\_SERIALIZED}, and \VAR{THREAD\_MULTIPLE}. +When requesting threading support in a hybrid program, however, +users should follow additional rules as described below. + +\begin{itemize} + \item The \VAR{THREAD\_SINGLE} thread level requires a single-threaded program. + Hence, users should not request \VAR{THREAD\_SINGLE} at the initialization + call of either \openshmem or MPI but request a different thread level at the + initialization call of the other model in the same program. + + \item The \VAR{THREAD\_FUNNELED} thread level allows only the main thread to + make communication calls. A hybrid program using the \VAR{THREAD\_FUNNELED} + thread level in both \openshmem and MPI should ensure the same main thread + is used in both communication environments. + + \item The \VAR{THREAD\_SERIALIZED} thread level requires the program to ensure + communication calls are not made concurrently by multiple threads. A hybrid + program should ensure serialized calls to both \openshmem and MPI libraries, + if the program uses \VAR{THREAD\_SERIALIZED} in one communication environment + and \VAR{THREAD\_SERIALIZED} or \VAR{THREAD\_FUNNELED} in the other one. 
+\end{itemize} \subsection{Mapping Process Identification Numbers} \label{subsec:interoperability:id} From aeaa635d98e8cd75b6c74bd43f046c807afc94b2 Mon Sep 17 00:00:00 2001 From: Min Si Date: Wed, 25 Sep 2019 10:46:26 -0400 Subject: [PATCH 16/17] interop/id: use sync_all instead of barrier_all in example --- example_code/hybrid_mpi_mapping_id.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/example_code/hybrid_mpi_mapping_id.c b/example_code/hybrid_mpi_mapping_id.c index 9720ce94f..c72168d6e 100644 --- a/example_code/hybrid_mpi_mapping_id.c +++ b/example_code/hybrid_mpi_mapping_id.c @@ -20,7 +20,7 @@ int main(int argc, char *argv[]) int *mpi_ranks = shmem_calloc(npes, sizeof(int)); - shmem_barrier_all(); + shmem_sync_all(); shmem_collect32(mpi_ranks, &myrank, 1, 0, 0, npes, pSync); if (mype == 0) From a3ebd2754cf74a2bd3d26d0c22abf0ec821d029e Mon Sep 17 00:00:00 2001 From: Min Si Date: Wed, 25 Sep 2019 11:00:20 -0400 Subject: [PATCH 17/17] interop/progress: minor text adjustment --- content/backmatter.tex | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/content/backmatter.tex b/content/backmatter.tex index 50664ed8b..988a10308 100644 --- a/content/backmatter.tex +++ b/content/backmatter.tex @@ -342,13 +342,13 @@ \subsection{Communication Progress} an MPI communication call is guaranteed only to complete in finite time. For instance, an \FUNC{MPI\_Put} may be completed only when the remote process makes an MPI call that internally triggers the progress of MPI, if the underlying hardware -does not support asynchronous communication. A portable hybrid program -should not assume that a call to the \openshmem library also makes progress for MPI. -A call to \FUNC{shmem\_query\_interoperability} (see definition in \ref{subsec:interoperability:query}) -can be used to check whether the implementation provides such a functionality. -If it is provided, then the library ensures progression of -both \openshmem and MPI communication; otherwise, it may have to explicitly -manage the asynchronous communication in MPI in +does not support asynchronous communication. A hybrid program +should not assume that the \openshmem library also makes progress for MPI. +A call to \FUNC{shmem\_query\_interoperability} with the \VAR{SHMEM\_PROGRESS\_MPI} +property (see definition in \ref{subsec:interoperability:query}) +can be used to portably check whether the implementation provides asynchronous +progression also for MPI. If it is not provided, the user program may have to +explicitly manage the asynchronous communication in MPI in order to prevent any deadlock or performance degradation. \apiimpnotes{