Modern Fortran implementation of a template method pattern with two hardware backend specialisations (pure CPU and CPU/GPU backends).
Given an array of numbers a, the example computes f(a) + g(a),
with f(a) = sum(a + 1) and g(a) = max(a * 2).
Given an input array a, the algorithm is
- Compute f(a)
- Compute g(a)
- Compute f(a) + g(a)
This algorithm only depends on the interface of f and g: their
argument and what they return. Conversely, it does not depend on the
actual implementation of f and g.
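As a concrete illustration of the three steps, here is a minimal standalone sketch in standard Fortran. The names fg_mod and fg_example and the input values 1 to 16 are assumptions made for illustration, and max is written with the maxval intrinsic.

```fortran
module fg_mod
  implicit none
contains

  ! f(a) = sum(a + 1)
  pure function f(a) result(res)
    real, intent(in) :: a(:)
    real :: res
    res = sum(a + 1.0)
  end function f

  ! g(a) = max(a * 2)
  pure function g(a) result(res)
    real, intent(in) :: a(:)
    real :: res
    res = maxval(a * 2.0)
  end function g

end module fg_mod

program fg_example
  use fg_mod, only: f, g
  implicit none
  integer :: i
  real :: a(16)

  ! Hypothetical input: the values 1, 2, ..., 16.
  a = [(real(i), i = 1, 16)]

  ! The algorithm only uses the interface of f and g.
  print '(f0.3)', f(a) + g(a)
end program fg_example
```

For this input, f(a) = 152 and g(a) = 32, so the program prints 184.000.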
cd backends_example
FC=nvfortran cmake -S . -B build
The above builds three executables:
- main_cpu: CPU-only version.
- main_gpu: Version with f and g implemented as accelerated GPU kernels.
- main_hybrid: Execution of GPU kernels is enabled/disabled at runtime.
From the build directory:
$ make main_hybrid # Build once, run everywhere.
$ ./main_hybrid
Executing on CPU only
184.000
$ ./main_hybrid --gpu
Executing CUDA kernels
184.00
The algorithm itself is defined once as a type-bound procedure doit of the
abstract type basetype (base.f90). This type is abstract because,
although doit is defined, f and g are only declared as deferred
bindings. This makes instantiating an object of type basetype
impossible. The point is that we can now extend basetype with concrete
types providing a definition for both functions.
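The pattern can be sketched in standard Fortran as follows. This is only an illustration, not the contents of base.f90 or cpu/cpu.f90. The module names base_sketch and cpu_sketch, the abstract interface kernel, the procedure names f_cpu and g_cpu, and the use of a plain real array as argument are all assumptions.

```fortran
module base_sketch
  implicit none
  private
  public :: basetype

  type, abstract :: basetype
  contains
    procedure :: doit                  ! the algorithm, defined once
    procedure(kernel), deferred :: f   ! to be supplied by each backend
    procedure(kernel), deferred :: g   ! to be supplied by each backend
  end type basetype

  ! Assumed common interface for f and g.
  abstract interface
    function kernel(self, a) result(res)
      import :: basetype
      class(basetype), intent(in) :: self
      real, intent(in) :: a(:)
      real :: res
    end function kernel
  end interface

contains

  ! Template method: relies only on the interface of f and g.
  function doit(self, a) result(res)
    class(basetype), intent(in) :: self
    real, intent(in) :: a(:)
    real :: res
    res = self%f(a) + self%g(a)
  end function doit

end module base_sketch

module cpu_sketch
  use base_sketch, only: basetype
  implicit none
  private
  public :: cputype

  ! Concrete extension providing both deferred bindings.
  type, extends(basetype) :: cputype
  contains
    procedure :: f => f_cpu
    procedure :: g => g_cpu
  end type cputype

contains

  function f_cpu(self, a) result(res)
    class(cputype), intent(in) :: self
    real, intent(in) :: a(:)
    real :: res
    res = sum(a + 1.0)
  end function f_cpu

  function g_cpu(self, a) result(res)
    class(cputype), intent(in) :: self
    real, intent(in) :: a(:)
    real :: res
    res = maxval(a * 2.0)
  end function g_cpu

end module cpu_sketch

program template_demo
  use cpu_sketch, only: cputype
  implicit none
  type(cputype) :: backend
  integer :: i

  ! doit is inherited from basetype and dispatches to f_cpu and g_cpu.
  print '(f0.3)', backend%doit([(real(i), i = 1, 16)])
end program template_demo
```

Declaring a variable as type(basetype) would be rejected at compile time, whereas class(basetype) can be used to refer to any concrete extension.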
The basetype abstract type is extended by cputype (cpu/cpu.f90)
and gputype (gpu/gpu.f90). The former implements f and g
using standard Fortran to be executed on a CPU. The latter, gputype,
provides an implementation of f and g based on CUDA Fortran, using
kernel procedures to be executed on NVIDIA GPUs.
The input array is represented by the memblock
(memblock.f90) type (memory block). The cpublock
(cpu/cpublock.f90) type holds an allocatable real array, whilst the
gpublock (gpu/gpublock) holds an allocatable real device array.
The current (main_hybrid.f90) implementation uses a pointer to the right type depending on the execution target:
case('gpu')
  gpublk = gpublock(16); blk => gpublk
A preferable approach would be to rely on automatic (re)allocation of polymorphic entities:
class(memblock), allocatable :: blk
case('gpu')
  blk = gpublock(16) ! Automatic allocation of the dynamic type
Unfortunately, this is not supported by the NVIDIA Fortran compiler
(nvfortran 22.5).
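For reference, the pointer-based selection can be sketched in a self-contained form that does not require CUDA Fortran. The type altblock is a hypothetical CPU-resident stand-in for gpublock, and the module name block_sketch, the component name values, the constructor functions and the argument handling are assumptions rather than the code in main_hybrid.f90.

```fortran
module block_sketch
  implicit none

  type, abstract :: memblock
  end type memblock

  type, extends(memblock) :: cpublock
    real, allocatable :: values(:)
  end type cpublock

  ! Hypothetical stand-in for gpublock (no device array here).
  type, extends(memblock) :: altblock
    real, allocatable :: values(:)
  end type altblock

  ! User-defined constructors taking the block size.
  interface cpublock
    module procedure new_cpublock
  end interface cpublock

  interface altblock
    module procedure new_altblock
  end interface altblock

contains

  function new_cpublock(n) result(blk)
    integer, intent(in) :: n
    type(cpublock) :: blk
    allocate(blk%values(n))
    blk%values = 0.0
  end function new_cpublock

  function new_altblock(n) result(blk)
    integer, intent(in) :: n
    type(altblock) :: blk
    allocate(blk%values(n))
    blk%values = 0.0
  end function new_altblock

end module block_sketch

program select_demo
  use block_sketch
  implicit none
  type(cpublock), target :: cpublk
  type(altblock), target :: altblk
  class(memblock), pointer :: blk
  character(len=16) :: arg

  ! arg is blank when no command-line argument is given.
  call get_command_argument(1, arg)

  select case (trim(arg))
  case ('--gpu')
    altblk = altblock(16); blk => altblk
  case default
    cpublk = cpublock(16); blk => cpublk
  end select

  ! The dynamic type of blk now reflects the execution target.
  select type (blk)
  type is (cpublock)
    print '(a,i0)', 'selected cpublock, size ', size(blk%values)
  type is (altblock)
    print '(a,i0)', 'selected altblock, size ', size(blk%values)
  end select
end program select_demo
```

The cost of this approach is one target variable per backend, which is what the allocatable polymorphic variant would avoid.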
