LFRic Examples#
These examples illustrate the functionality of PSyclone for the LFRic domain.
Example 1: Basic Operation#
Basic operation of PSyclone with an invoke() containing two
kernels, one user-supplied, the other a
Built-in. Code is generated both with and
without distributed-memory support. Also demonstrates the use of the
-d flag to specify where to search for user-supplied kernel code
(see The psyclone command section for more details).
Example 2: Applying Transformations#
A more complex example showing the use of PSyclone transformations to change the generated PSy-layer code. Provides examples of kernel-inlining and loop-fusion transformations.
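To make the loop-fusion idea concrete, here is a minimal plain-Python sketch. It does not use the PSyclone API, and the function names and the two toy "kernels" are hypothetical. It only illustrates that two loops over the same iteration space can be merged into one when the second loop reads nothing but values the first loop has already produced for the same index.

```python
# Conceptual sketch only: plain Python, not PSyclone-generated code.
# The two "kernels" (scale by 2, then add 1) are hypothetical stand-ins.

def unfused(field_in):
    """Two separate loops over the same cells."""
    n = len(field_in)
    tmp = [0.0] * n
    out = [0.0] * n
    for cell in range(n):      # loop for the first kernel
        tmp[cell] = 2.0 * field_in[cell]
    for cell in range(n):      # loop for the second kernel
        out[cell] = tmp[cell] + 1.0
    return out


def fused(field_in):
    """The same work after fusing the two loops into one."""
    n = len(field_in)
    tmp = [0.0] * n
    out = [0.0] * n
    for cell in range(n):      # single fused loop
        tmp[cell] = 2.0 * field_in[cell]
        out[cell] = tmp[cell] + 1.0
    return out


data = [0.5, 1.5, 2.5]
assert fused(data) == unfused(data)
```

Fusion is only valid here because the second kernel reads tmp at the current cell alone; a kernel that read neighbouring cells would need the first loop to finish before it could start.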
Example 4: Multiple Built-ins, Named Invokes and Boundary Conditions#
Demonstrates the use of the special enforce_bc_kernel which
PSyclone recognises as a boundary-condition kernel.
Example 5: Stencils#
Example of kernels which require stencil information.
Example 6: Reductions#
Example of applying OpenMP to an InvokeSchedule containing kernels that perform reduction operations. Two scripts are provided, one of which demonstrates how to request that PSyclone generate code for a reproducible OpenMP reduction. (The default OpenMP reduction is not guaranteed to be reproducible from one run to the next on the same number of threads.)
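The reproducibility issue comes from floating-point addition not being associative: combining per-thread partial sums in a different order can change the rounded result. The sketch below is plain Python standing in for OpenMP threads, with hypothetical names throughout. It shows one common way to obtain a reproducible result (always adding the partial sums in thread order); it is an illustration of the idea, not necessarily the code PSyclone generates.

```python
# Conceptual sketch only: plain Python standing in for OpenMP threads.
# All names here are hypothetical and are not part of PSyclone.

def combine(partials, order):
    """Add the per-thread partial sums in the given order."""
    total = 0.0
    for thread in order:
        total += partials[thread]
    return total


# One partial sum per "thread"; values chosen to expose rounding effects.
partials = [1.0e16, 1.0, -1.0e16, 1.0]

# Combining in a different order (as may happen when threads finish at
# different times) can change the rounded result.
in_thread_order = combine(partials, [0, 1, 2, 3])
in_other_order = combine(partials, [0, 2, 1, 3])
assert in_thread_order != in_other_order

# Always combining in thread order gives the same answer on every run.
assert combine(partials, [0, 1, 2, 3]) == in_thread_order
```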
Example 7: Column-Matrix Assembly Operators#
Example of kernels requiring Column-Matrix Assembly operators.
Example 8: Redundant Computation#
Example of the use of the redundant-computation and move transformations to eliminate and re-order halo exchanges.
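The trade-off behind redundant computation can be sketched with a toy one-dimensional partition. This is plain Python with hypothetical names and a simulated halo exchange, not PSyclone output. A pointwise kernel is followed by a kernel that reads the neighbouring cell; if the first kernel also computes into the halo cell, the halo exchange between the two kernels is no longer needed.

```python
# Conceptual sketch only: a toy 1D partition, not PSyclone-generated code.
OWNED = 4   # cells 0..3 are owned by this (hypothetical) partition
HALO = 1    # cell 4 is a halo cell owned by a neighbouring partition


def run(x, redundant):
    """Return the result on owned cells and the number of halo exchanges."""
    a = [None] * (OWNED + HALO)          # None marks a value not yet valid
    exchanges = 0

    # First kernel: pointwise. Optionally also computed in the halo.
    upper = OWNED + HALO if redundant else OWNED
    for cell in range(upper):
        a[cell] = x[cell] ** 2

    # The second kernel reads a[cell + 1], so the halo of "a" must be valid.
    if a[OWNED] is None:
        a[OWNED] = x[OWNED] ** 2         # simulated halo exchange
        exchanges += 1

    b = [a[cell] + a[cell + 1] for cell in range(OWNED)]
    return b, exchanges


x = [1.0, 2.0, 3.0, 4.0, 5.0]
assert run(x, redundant=False) == ([5.0, 13.0, 25.0, 41.0], 1)
assert run(x, redundant=True) == ([5.0, 13.0, 25.0, 41.0], 0)
```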
Example 9: Writing to Discontinuous Fields#
Demonstrates the behaviour of PSyclone for kernels that read and write quantities on horizontally-discontinuous function spaces. In addition, this example demonstrates how to write a PSyclone transformation script that only colours loops over continuous spaces.
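Why colouring matters only for continuous spaces can be seen with a small stand-alone sketch. It is plain Python, uses hypothetical names, and implements a simple greedy colouring rather than PSyclone's own transformation. Cells that share a degree of freedom (DoF) must not be updated at the same time, so they receive different colours; on a discontinuous space no DoFs are shared and a single colour suffices.

```python
# Conceptual sketch only: greedy colouring on a toy 1D mesh.
# This is not the algorithm or API used by PSyclone.

def colour_cells(cell_dofs):
    """Give each cell the lowest colour not used by a cell sharing a DoF."""
    dof_colours = {}    # DoF -> colours already used by cells touching it
    colours = []
    for dofs in cell_dofs:
        used = set()
        for dof in dofs:
            used |= dof_colours.get(dof, set())
        colour = 0
        while colour in used:
            colour += 1
        colours.append(colour)
        for dof in dofs:
            dof_colours.setdefault(dof, set()).add(colour)
    return colours


# Continuous space: neighbouring cells share the DoF between them.
continuous = [(cell, cell + 1) for cell in range(4)]
# Discontinuous space: every cell has its own private DoF.
discontinuous = [(cell,) for cell in range(4)]

assert colour_cells(continuous) == [0, 1, 0, 1]
assert colour_cells(discontinuous) == [0, 0, 0, 0]
```

Cells of the same colour can then be processed in parallel without two threads writing to the same DoF.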
Example 10: Inter-grid Kernels#
Demonstrates the use of “inter-grid” kernels that prolong or restrict
fields (map between grids of different resolutions), as well as the
use of ANY_DISCONTINUOUS_SPACE function space metadata.
Example 11: Asynchronous Halo Exchanges#
Example of the use of transformations to introduce redundant computation, split synchronous halo exchanges into asynchronous exchanges (start and stop) and move the starts of those exchanges in order to overlap them with computation.
Example 12: Code Extraction#
Example of applying code extraction to Nodes in an Invoke Schedule:
> psyclone -nodm -s ./extract_nodes.py \
gw_mixed_schur_preconditioner_alg_mod.x90
or to a Kernel in an Invoke after applying transformations:
> psyclone -nodm -s ./extract_kernel_with_transformations.py \
gw_mixed_schur_preconditioner_alg_mod.x90
This example also contains a Python helper script find_kernel.py
which displays the names and Schedules of Invokes containing call(s)
to the specified Kernel:
> python find_kernel.py
Example 13: Kernel Transformation#
Demonstrates how an LFRic kernel can be transformed. The example transformation makes Kernel values constant where appropriate. For example, the number of levels is usually passed into a kernel as an argument, but the transformation allows a particular value to be specified, which it then sets as a parameter in the kernel. Hard-coding values in a kernel helps the compiler to do a better job when optimising the code.
Example 14: OpenACC#
Example of adding OpenACC directives in the LFRic API.
A single transformation script (acc_parallel.py) is provided
which demonstrates how to add OpenACC Kernels and Enter Data
directives to the PSy-layer. To support distributed memory being
switched on, it places an OpenACC Kernels directive around each
(parallelisable) loop, rather than having one for the whole invoke.
This approach avoids having halo exchanges within an OpenACC Parallel
region. The script also uses ACCRoutineTrans
to transform the one user-supplied kernel through
the addition of an !$acc routine directive. This ensures that the
compiler builds a version suitable for execution on the accelerator (GPU).
This script is used by the supplied Makefile. The invocation of PSyclone
within that Makefile also specifies the --profile invokes option so that
each invoke is enclosed within profiling calipers (by default the
‘template’ profiling library supplied with PSyclone is used at the link
stage). Compilation of the example using the NVIDIA compiler may be performed
by e.g.:
> F90=nvfortran F90FLAGS="-acc -Minfo=all" make compile
Launching the resulting binary with NV_ACC_NOTIFY set will show details
of the kernel launches and data transfers:
> NV_ACC_NOTIFY=3 ./example_openacc
...
Step 5 : chksm = 2.1098315506694516E-004
PreStart called for module 'main_psy' region 'invoke_2-setval_c-r2'
upload CUDA data file=PSyclone/examples/lfric/eg14/main_psy.f90 function=invoke_2 line=183 device=0 threadid=1 variable=.attach. bytes=144
upload CUDA data file=PSyclone/examples/lfric/eg14/main_psy.f90 function=invoke_2 line=183 device=0 threadid=1 variable=.attach. bytes=144
launch CUDA kernel file=PSyclone/examples/lfric/eg14/main_psy.f90 function=invoke_2 line=186 device=0 threadid=1 num_gangs=5 num_workers=1 vector_length=128 grid=5 block=128
PostEnd called for module 'main_psy' region 'invoke_2-setval_c-r2'
download CUDA data file=PSyclone/src/psyclone/tests/test_files/lfric/infrastructure//field/field_r64_mod.f90 function=log_minmax line=756 device=0 threadid=1 variable=self%data(:) bytes=4312
20230807214504.374+0100:INFO : Min/max minmax of field1 = 0.30084014E+00 0.17067212E+01
...
However, performance will be very poor as, with the limited optimisations and directives currently applied, the NVIDIA compiler refuses to run the user-supplied kernel in parallel.
Example 15: CPU Optimisation of Matvec#
Example of optimising the LFRic matvec kernel for CPUs. This is work in progress with the idea being that PSyclone transformations will be able to reproduce hand-optimised code.
There is one script which, when run:
> psyclone ./matvec_opt.py ../code/gw_mixed_schur_preconditioner_alg_mod.x90
will print out the modified matvec kernel code. At the moment no transformations are included (as they are work-in-progress) so the code that is output is the same as the original (but looks different as it has been translated to PSyIR and then output by the PSyIR Fortran back-end).
Example 16: Generating LFRic Code Using LFRic-specific PSyIR#
This example shows how LFRic-specific PSyIR can be used to create LFRic kernel code. There is one Python script provided which when run:
> python create.py
will print out generated LFRic kernel code. The script makes use of LFRic-specific data symbols to simplify code generation.
Example 17: Runnable Simplified Examples#
This directory contains three simplified LFRic examples that can be compiled and executed, provided a suitable Fortran compiler is available. The examples use a subset of the LFRic infrastructure library, which is included in PSyclone and has been slightly modified to make it easier to create stand-alone, non-MPI LFRic codes.
Example 17.1: A Simple Runnable Example#
The subdirectory full_example contains a very simple example code
that uses PSyclone to process two invokes. It uses unit-testing
code from various classes to create the required data structures, such
as the initial grid. The code can be compiled with make compile, and
the binary executed with either make run or ./example.
Example 17.2: A Simple Runnable Example With NetCDF#
The subdirectory full_example_netcdf contains code very similar
to the previous example, but uses NetCDF to read the initial grid
from the NetCDF file mesh_BiP128x16-400x100.nc.
Installation of NetCDF is described in
the hands-on practicals documentation.
The code can be compiled with make compile, and
the binary executed with either make run or ./example.
Example 17.3: Kernel Data Extraction#
The example in the subdirectory full_example_extract shows the
use of kernel extraction. The code can be compiled with
make compile, and the binary executed with either make run or
./extract.binary. By default, it uses a stand-alone extraction
library that writes a Fortran binary format
(see Extraction Libraries).
If you want to use the NetCDF version, set the environment variable
TYPE to be netcdf:
TYPE=netcdf make compile
This requires the installation of a NetCDF development environment
(see the hands-on practicals documentation
for installing NetCDF). The binary will be called extract.netcdf,
and the output files will have the .nc extension.
Similarly, you can use TYPE=ascii to use an ASCII output format.
Running the compiled binary will create two Fortran binary files (or two NetCDF files if the NetCDF library was used, or ASCII files if ASCII output was used). They contain the input and output parameters for the two invokes in this example:
cd full_example_extract
TYPE=netcdf make compile
./extract.netcdf
ncdump ./main-update.nc | less
Example 18: Special Accesses of Continuous Fields - Incrementing After Reading and Writing Before (Potentially) Reading#
Example containing one kernel with a GH_READINC access and one
with a GH_WRITE access, both for continuous fields. A kernel with
GH_READINC access first reads the field data and then increments
the field data. This contrasts with a GH_INC access which simply
increments the field data. As an increment is effectively a read
followed by a write, it may not be clear why we need to distinguish
between these cases. The reason for distinguishing is that the
GH_INC access is able to remove a halo exchange (or at least
reduce its depth by one) in certain circumstances, whereas a
GH_READINC is not able to take advantage of this optimisation.
A kernel with a GH_WRITE access for a continuous field must guarantee to
write the same value to a given shared DoF, independent of which cell
is being updated. As described
in the Developer Guide, this means that annexed DoFs are computed
correctly without the need to iterate into the L1 halo and thus can
remove the need for halo exchanges on those fields that are read.
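The difference between incrementing and writing a continuous field can be illustrated with a toy one-dimensional mesh. The sketch is plain Python with hypothetical names and kernels; it is not LFRic or PSyclone code. DoF 2 sits on the boundary between the last owned cell and the first halo cell. An increment needs the halo cell's contribution to get that DoF right, whereas a write that stores the same value regardless of the cell does not.

```python
# Conceptual sketch only: toy 1D mesh where cell c touches DoFs c and c + 1.
# Kernel and variable names are hypothetical.

def cell_dofs(cell):
    return (cell, cell + 1)


def inc_kernel(field, cell):
    """GH_INC-like: each cell adds its contribution to its DoFs."""
    for dof in cell_dofs(cell):
        field[dof] += 1.0


def write_kernel(field, cell):
    """GH_WRITE-like: the value depends only on the DoF, not on the cell."""
    for dof in cell_dofs(cell):
        field[dof] = 10.0 * dof


def apply(kernel, cells, ndofs=4):
    field = [0.0] * ndofs
    for cell in cells:
        kernel(field, cell)
    return field


owned = [0, 1]              # cells owned by this partition
owned_plus_halo = [0, 1, 2] # also iterate over the first halo cell
BOUNDARY_DOF = 2            # shared by owned cell 1 and halo cell 2

# Increment: the boundary DoF is only complete if the halo cell is visited.
assert apply(inc_kernel, owned)[BOUNDARY_DOF] == 1.0
assert apply(inc_kernel, owned_plus_halo)[BOUNDARY_DOF] == 2.0

# Write: the boundary DoF is already correct after the owned cells alone.
assert apply(write_kernel, owned)[BOUNDARY_DOF] == 20.0
assert apply(write_kernel, owned_plus_halo)[BOUNDARY_DOF] == 20.0
```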
Example 19: Mixed Precision#
This example shows the use of the LFRic mixed-precision support to call a kernel with scalars, fields and operators of different precision.
Example 20: Algorithm Generation#
Illustration of the use of the psyclone-kern tool to create an
algorithm layer for a kernel. A Makefile is provided that also
runs psyclone to create an executable program from the generated
algorithm layer and original kernel code. To see the generated
algorithm layer run:
cd eg20/
psyclone-kern -gen alg ../code/testkern_mod.F90