Tutorial and Examples
Tutorial
PSyclone provides a hands-on tutorial. The easiest way to follow it is to read the README files on GitHub. The tutorial is divided into two parts: the first introduces PSyclone and how to use it to transform generic Fortran code (this is the recommended starting point for everybody); the second covers the LFRic DSL (recommended only for people interested in PSyKAl DSLs, and LFRic in particular).
To do the proposed hands-on exercises you will need a Linux shell with Python installed, and to download the hands-on directory with:
git clone --recursive git@github.com:stfc/PSyclone.git
cd PSyclone
# If psyclone isn't already installed you can use 'pip' in this folder to
# install a version that matches the downloaded tutorials
pip install .
cd tutorial/practicals
Examples
Various examples of the use of PSyclone are provided under the examples directory in the Git repository. If you have installed PSyclone using pip then the examples may be found in share/psyclone/examples under the PSyclone installation location.
Running any of these examples requires that PSyclone be installed on
the host system; see the Getting Going section.
This section is intended to provide an overview of the various examples
so that a user can find one that is appropriate to them. For details of
what each example does and how to run each example please see the README.md files in the associated directories.
For the purposes of correctness checking, the whole suite of examples may be executed using Gnu make (this functionality is used by GitHub Actions alongside the test suite). The default target is transform, which just performs the PSyclone code transformation steps for each example. For those examples that support it, the compile target also requests that the generated code be compiled. The notebook target checks the various Jupyter notebooks using nbconvert.
Note
As outlined in the Run section, if working with the examples from a PSyclone installation, it is advisable to copy the whole examples directory to some convenient location before running them. If you have copied the examples directory but still wish to use make then you will also have to set the PSYCLONE_CONFIG environment variable to the full path to the PSyclone configuration file, e.g. PSYCLONE_CONFIG=/some/path/psyclone.cfg make.
Compilation
Some of the examples support compilation (and some even execution of a compiled binary). Please consult the README.md to check which ones can be compiled and executed.
As mentioned above, by default each example will execute the transform target, which performs the PSyclone code transformation steps. In order to compile the sources, use the compile target:
make compile
which will first perform the transformation steps before compiling any created Fortran source files. If the example also supports running a compiled and linked binary, use the target:
make run
This will first trigger compilation using the compile target, and then execute the program with any parameters that might be required (check the corresponding README.md document for details).
All Makefiles support the variables F90 and F90FLAGS to specify the compiler and compilation flags to use. By default, the Gnu Fortran compiler (gfortran) is used, and the compilation flags are set for debugging. If you want to change the compiler or flags, just define these as environment variables:
F90=ifort F90FLAGS="-g -check bounds" make compile
To clean all compiled files (and potential output files from a run), use:
make clean
This will clean up the examples directory. If you want to change compilers or compiler flags, you should run make allclean; see the section about Dependencies for details.
Supported Compilers
All examples have been tested with the following compilers. Please let the developers know if you have problems using a compiler that has been tested or if you are working with a different compiler so it can be recorded in this table.
Compiler | Version
---|---
Gnu Fortran | 9.3
Intel Fortran | 17, 21
NVIDIA Fortran | 23.5
Dependencies
Any required library that is included in PSyclone (typically the infrastructure libraries for the APIs, or PSyData wrapper libraries) will automatically be compiled with the same compiler and compilation flags as the examples.
Note
Once a dependent library is compiled, changing the compilation flags will not trigger a recompilation of this library. For example, if an example is first compiled with debug options, and later the same or a different example is compiled with optimisations, the dependent library will not automatically be recompiled!
All Makefiles support an allclean target, which will not only clean the current directory, but also all libraries that the current example depends on.
Important
Using make allclean is especially important if the compiler is changed. Typically, one compiler cannot read module information produced by a different compiler, and compilation will then fail.
NetCDF
Some examples require NetCDF for compilation. Installation of NetCDF is described in detail in the hands-on practicals documentation.
PSyIR Examples
Examples may all be found in the examples/psyir directory. Read the README.md file in this directory for full details.
Example 1: Constructing PSyIR and Generating Code
create.py is a Python script that demonstrates the use of the various create methods to build a PSyIR tree from scratch.
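To give a flavour of this approach, the following minimal sketch builds a tiny tree by hand and generates Fortran from it. It is an illustrative sketch only, assuming the PSyIR node create methods, SymbolTable and the FortranWriter backend; it is not the contents of create.py itself:
# Illustrative sketch only -- see examples/psyir/create.py for the real example.
from psyclone.psyir.backend.fortran import FortranWriter
from psyclone.psyir.nodes import (Assignment, BinaryOperation, Literal,
                                  Reference, Routine)
from psyclone.psyir.symbols import DataSymbol, REAL_TYPE, SymbolTable

# Declare a real scalar 'x' in a fresh symbol table.
table = SymbolTable()
x_sym = table.new_symbol("x", symbol_type=DataSymbol, datatype=REAL_TYPE)

# Build the statement x = x + 1.0 from individual PSyIR nodes.
rhs = BinaryOperation.create(BinaryOperation.Operator.ADD,
                             Reference(x_sym), Literal("1.0", REAL_TYPE))
assignment = Assignment.create(Reference(x_sym), rhs)

# Wrap the statement in a subroutine and generate Fortran from the tree.
routine = Routine.create("work", table, [assignment])
print(FortranWriter()(routine))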
Example 2: Creating PSyIR for Structure Types
create_structure_types.py demonstrates the representation of structure types (i.e. Fortran derived types or C structs) in the PSyIR.
GOcean Examples
Example 1: Loop transformations
Examples of applying various transformations (loop fusion, OpenMP, OpenMP Taskloop, OpenACC, OpenCL) to the semi-PSyKAl’d version of the Shallow benchmark. (“semi” because not all kernels are called from within invoke()s.) Also includes an example of generating a DAG from an InvokeSchedule.
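For readers unfamiliar with PSyclone transformation scripts, the hedged sketch below shows the general shape of such a script. It assumes the trans(psyir) entry point used by recent PSyclone releases and the OMPParallelLoopTrans transformation; it is not one of the scripts shipped with this example:
# Illustrative sketch of a transformation script, not the example's own script.
from psyclone.psyir.nodes import Loop
from psyclone.psyir.transformations import TransformationError
from psyclone.transformations import OMPParallelLoopTrans


def trans(psyir):
    '''Enclose every parallelisable loop in an OpenMP parallel do.'''
    omp_trans = OMPParallelLoopTrans()
    for loop in psyir.walk(Loop):
        try:
            omp_trans.apply(loop)
        except TransformationError:
            # Leave loops that cannot legally be parallelised unchanged.
            pass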
Example 2: OpenACC
This is a simple but complete example of using PSyclone to enable an
application to run on a GPU by adding OpenACC directives. A Makefile
is included which will use PSyclone to generate the PSy code and
transformed kernels and then compile the application. This compilation
requires that the dl_esm_inf library
be installed/available - it is provided as a Git submodule of the PSyclone
project (see Installation in the Developers’ Guide
for details on working with submodules).
The supplied Makefile
also provides a second, profile
target which
performs the same OpenACC transformations but then encloses the whole
of the resulting PSy layer in a profiling region. By linking this with
the PSyclone NVTX profiling wrapper (and the NVTX library itself), the
resulting application can be profiled using NVIDIA’s nvprof or
nvvp tools.
Example 3: OpenCL
Example of the use of PSyclone to generate an OpenCL driver version of
the PSy layer and OpenCL kernels. The Makefile
in this example provides
a target (make compile-ocl) to compile the generated OpenCL code. This
requires an OpenCL implementation installed in the system. Read the README
provided in the example folder for more details about how to compile and
execute the generated OpenCL code.
Example 4: Kernels containing use statements
Transforming kernels for use with either OpenACC or OpenCL requires
that we handle those that access data and/or routines via module use statements. This example shows the various forms for which
support is being implemented. Although there is support for converting
global-data accesses into kernel arguments, PSyclone does not yet support
nested use
of modules (i.e. data accessed via a module that in turn
imports that symbol from another module) and kernels that call other
kernels (Issue #342).
Example 5: PSyData
This directory contains all examples that use the PSyData API. At this stage there are four runnable examples:
Example 5.1: Kernel data extraction
This example shows the use of kernel data extraction in PSyclone.
It instruments each of the two invokes in the example program
with the PSyData-based kernel extraction code. Detailed compilation
instructions are in the README.md
file, including how to switch
from using the stand-alone extraction library to the NetCDF-based one
(see Extraction Libraries for details).
The Makefile
in this example will create the binary that extracts
the data at run time, as well as two driver programs that can read in
the extracted data, call the kernel, and compare the results. These
driver programs are independent of the dl_esm_inf infrastructure library.
These drivers can only read the corresponding file format, i.e. a NetCDF
driver program cannot read in extraction data that is based on Fortran IO
and vice versa.
Note
At this stage the driver program still needs the infrastructure library when compiling the kernels; see #1757.
Example 5.2: Profiling
This example shows how to use the profiling support in PSyclone.
It instruments two invoke statements and can link in with any
of the following profiling wrapper libraries: template,
simple_timer, dl_timer, TAU, and DrHook (see
Interface to Third Party Profiling Tools). The README.md
file contains detailed instructions on how to build the
different executables. By default (i.e. just using make
without additional parameters) it links in with the
template profiling library included in PSyclone. This library just
prints out the name of the module and region before and after each
invoke is executed. This example can actually be executed to
test the behaviour of the various profiling wrappers, and is
also useful if you want to develop your own wrapper libraries.
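If you want to see how profiling regions are typically added from a script, the following hedged sketch shows one possible approach. It assumes the generic ProfileTrans transformation and the trans(psyir) script entry point; the example itself may instrument its invokes differently (e.g. via the --profile command-line option):
# Illustrative sketch only; consult the example's own script and README.md.
from psyclone.psyir.nodes import Routine
from psyclone.psyir.transformations import ProfileTrans


def trans(psyir):
    '''Enclose the body of every routine in a profiling region.'''
    profile_trans = ProfileTrans()
    for routine in psyir.walk(Routine):
        if routine.children:
            # Apply to a copy of the child list so that the region
            # encloses all of the routine's existing statements.
            profile_trans.apply(routine.children[:])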
Example 5.3: Read-only-verification
This example shows the use of read-only-verification with PSyclone.
It instruments each of the two invokes in the example program
with the PSyData-based read-only-verification code.
It uses the dl_esm_inf-specific read-only-verification library
(lib/read_only/dl_esm_inf/).
Note
The update_field_mod
subroutine contains some very
buggy and non-standard code to change the value of some
read-only variables and fields, even though the variables
are all declared with intent(in). It uses the addresses of variables and then out-of-bound writes to a writeable array to actually overwrite the read-only variables. Array-bounds checking at runtime will be triggered by these out-of-bound writes.
The Makefile
in this example will link with the compiled
read-only-verification library. You can execute the created
binary and it will print two warnings about modified
read-only variables:
--------------------------------------
Double precision field b_fld has been modified in main : update
Original checksum: 4611686018427387904
New checksum: 4638355772470722560
--------------------------------------
--------------------------------------
Double precision variable z has been modified in main : update
Original value: 1.0000000000000000
New value: 123.00000000000000
--------------------------------------
Example 5.4: Value Range Check
This example shows the use of valid number verification with PSyclone.
It instruments each of the two invokes in the example program
with the PSyData-based Value-Range-Check code.
It uses the dl_esm_inf-specific value range check library
(lib/value_range_check/dl_esm_inf/).
Note
The update_field_mod
subroutine contains code
that will trigger a division by 0 to create NaNs. If
the compiler happens to add code that handles floating point
exceptions, this will take effect before the value testing
is done by the PSyData-based verification code.
The Makefile
in this example will link with the compiled
value_range_check library. You can then execute the binary
and enable the value range check by setting environment variables (see value range check for details).
PSYVERIFY__main__init__b_fld=2:3 ./value_range_check
...
PSyData: Variable b_fld has the value 0.0000000000000000 at index/indices 6 1 in module 'main', region 'init', which is not between '2.0000000000000000' and '3.0000000000000000'.
...
PSyData: Variable a_fld has the invalid value 'Inf' at index/indices 1 1 in module 'main', region 'update'.
As indicated in value range check, you can also check a variable in all kernels of a module, or in any instrumented code region (since the example has only one module, both settings below will create the same warnings):
PSYVERIFY__main__b_fld=2:3 ./value_range_check
PSYVERIFY__b_fld=2:3 ./value_range_check
...
PSyData: Variable b_fld has the value 0.0000000000000000 at index/indices 6 1 in module 'main', region 'init', which is not between '2.0000000000000000' and '3.0000000000000000'.
...
PSyData: Variable b_fld has the value 0.0000000000000000 at index/indices 6 1 in module 'main', region 'update', which is not between '2.0000000000000000' and '3.0000000000000000'.
Notice that now a warning is created for both kernels: init and update.
Support for checking arbitrary Fortran code is tracked as issue #2741.
Example 6: PSy-layer Code Creation using PSyIR
This example informs the development of PSy-layer code generation using the PSyIR language backends.
LFRic Examples
These examples illustrate the functionality of PSyclone for the LFRic domain.
Example 1: Basic Operation
Basic operation of PSyclone with an invoke() containing two kernels, one user-supplied, the other a Built-in. Code is generated both with and without distributed-memory support. Also demonstrates the use of the -d flag to specify where to search for user-supplied kernel code (see The psyclone command section for more details).
Example 2: Applying Transformations
A more complex example showing the use of PSyclone transformations to change the generated PSy-layer code. Provides examples of kernel-inlining and loop-fusion transformations.
Example 4: Multiple Built-ins, Named Invokes and Boundary Conditions
Demonstrates the use of the special enforce_bc_kernel which PSyclone recognises as a boundary-condition kernel.
Example 5: Stencils
Example of kernels which require stencil information.
Example 6: Reductions
Example of applying OpenMP to an InvokeSchedule containing kernels that perform reduction operations. Two scripts are provided, one of which demonstrates how to request that PSyclone generate code for a reproducible OpenMP reduction. (The default OpenMP reduction is not guaranteed to be reproducible from one run to the next on the same number of threads.)
Example 7: Column-Matrix Assembly Operators
Example of kernels requiring Column-Matrix Assembly operators.
Example 8: Redundant Computation
Example of the use of the redundant-computation and move transformations to eliminate and re-order halo exchanges.
Example 9: Writing to Discontinuous Fields
Demonstrates the behaviour of PSyclone for kernels that read and write quantities on horizontally-discontinuous function spaces. In addition, this example demonstrates how to write a PSyclone transformation script that only colours loops over continuous spaces.
Example 10: Inter-grid Kernels
Demonstrates the use of “inter-grid” kernels that prolong or restrict
fields (map between grids of different resolutions), as well as the
use of ANY_DISCONTINUOUS_SPACE
function space metadata.
Example 11: Asynchronous Halo Exchanges
Example of the use of transformations to introduce redundant computation, split synchronous halo exchanges into asynchronous exchanges (start and stop) and move the starts of those exchanges in order to overlap them with computation.
Example 12: Code Extraction
Example of applying code extraction to Nodes in an Invoke Schedule:
> psyclone -nodm -s ./extract_nodes.py \
gw_mixed_schur_preconditioner_alg_mod.x90
or to a Kernel in an Invoke after applying transformations:
> psyclone -nodm -s ./extract_kernel_with_transformations.py \
gw_mixed_schur_preconditioner_alg_mod.x90
For now it only inserts comments in appropriate locations while the full support for code extraction is being developed.
This example also contains a Python helper script find_kernel.py
which displays the names and Schedules of Invokes containing call(s)
to the specified Kernel:
> python find_kernel.py
Example 13: Kernel Transformation
Demonstrates how an LFRic kernel can be transformed. The example transformation makes Kernel values constant where appropriate. For example, the number of levels is usually passed into a kernel by argument but the transformation allows a particular value to be specified which the transformation then sets as a parameter in the kernel. Hard-coding values in a kernel helps the compiler to do a better job when optimising the code.
Example 14: OpenACC
Example of adding OpenACC directives in the LFRic API.
A single transformation script (acc_parallel.py) is provided
which demonstrates how to add OpenACC Kernels and Enter Data
directives to the PSy-layer. It supports distributed memory being
switched on by placing an OpenACC Kernels directive around each
(parallelisable) loop, rather than having one for the whole invoke.
This approach avoids having halo exchanges within an OpenACC Parallel
region. The script also uses ACCRoutineTrans
to transform the one user-supplied kernel through
the addition of an !$acc routine
directive. This ensures that the
compiler builds a version suitable for execution on the accelerator (GPU).
This script is used by the supplied Makefile. The invocation of PSyclone
within that Makefile also specifies the --profile invokes option so that each invoke is enclosed within profiling calipers (by default the
‘template’ profiling library supplied with PSyclone is used at the link
stage). Compilation of the example using the NVIDIA compiler may be performed
by e.g.:
> F90=nvfortran F90FLAGS="-acc -Minfo=all" make compile
Launching the resulting binary with NV_ACC_NOTIFY set will show details of the kernel launches and data transfers:
> NV_ACC_NOTIFY=3 ./example_openacc
...
Step 5 : chksm = 2.1098315506694516E-004
PreStart called for module 'main_psy' region 'invoke_2-setval_c-r2'
upload CUDA data file=PSyclone/examples/lfric/eg14/main_psy.f90 function=invoke_2 line=183 device=0 threadid=1 variable=.attach. bytes=144
upload CUDA data file=PSyclone/examples/lfric/eg14/main_psy.f90 function=invoke_2 line=183 device=0 threadid=1 variable=.attach. bytes=144
launch CUDA kernel file=PSyclone/examples/lfric/eg14/main_psy.f90 function=invoke_2 line=186 device=0 threadid=1 num_gangs=5 num_workers=1 vector_length=128 grid=5 block=128
PostEnd called for module 'main_psy' region 'invoke_2-setval_c-r2'
download CUDA data file=PSyclone/src/psyclone/tests/test_files/dynamo0p3/infrastructure//field/field_r64_mod.f90 function=log_minmax line=756 device=0 threadid=1 variable=self%data(:) bytes=4312
20230807214504.374+0100:INFO : Min/max minmax of field1 = 0.30084014E+00 0.17067212E+01
...
However, performance will be very poor as, with the limited optimisations and directives currently applied, the NVIDIA compiler refuses to run the user-supplied kernel in parallel.
Example 15: CPU Optimisation of Matvec
Example of optimising the LFRic matvec kernel for CPUs. This is work in progress with the idea being that PSyclone transformations will be able to reproduce hand-optimised code.
There is one script which, when run:
> psyclone ./matvec_opt.py ../code/gw_mixed_schur_preconditioner_alg_mod.x90
will print out the modified matvec kernel code. At the moment no transformations are included (as they are work-in-progress) so the code that is output is the same as the original (but looks different as it has been translated to PSyIR and then output by the PSyIR Fortran back-end).
Example 16: Generating LFRic Code Using LFRic-specific PSyIR
This example shows how LFRic-specific PSyIR can be used to create LFRic kernel code. There is one Python script provided which, when run:
> python create.py
will print out generated LFRic kernel code. The script makes use of LFRic-specific data symbols to simplify code generation.
Example 17: Runnable Simplified Examples
This directory contains three simplified LFRic examples that can be compiled and executed - of course, a suitable Fortran compiler is required. The examples use a subset of the LFRic infrastructure library, which is contained in PSyclone and which has been slightly modified to make it easier to create stand-alone, non-MPI LFRic codes.
Example 17.1: A Simple Runnable Example
The subdirectory full_example contains a very simple example code that uses PSyclone to process two invokes. It uses unit-testing code from various classes to create the required data structures, such as the initial grid. The code can be compiled with make compile, and the binary executed with either make run or ./example.
Example 17.2: A Simple Runnable Example With NetCDF
The subdirectory full_example_netcdf contains code very similar to the previous example, but uses NetCDF to read the initial grid from the NetCDF file mesh_BiP128x16-400x100.nc. Installation of NetCDF is described in the hands-on practicals documentation. The code can be compiled with make compile, and the binary executed with either make run or ./example.
Example 17.3: Kernel Data Extraction
The example in the subdirectory full_example_extract shows the use of kernel extraction. The code can be compiled with make compile, and the binary executed with either make run or ./extract.standalone. By default, it will use a stand-alone extraction library (see Extraction Libraries). If you want to use the NetCDF version, set the environment variable TYPE to netcdf:
TYPE=netcdf make compile
This requires the installation of a NetCDF development environment (see here for installing NetCDF). The binary will be called extract.netcdf, and the output files will have the .nc extension.
Running the compiled binary will create two Fortran binary files or two NetCDF files if the NetCDF library was used. They contain the input and output parameters for the two invokes in this example:
cd full_example_extraction
TYPE=netcdf make compile
./extract.netcdf
ncdump ./main-update.nc | less
Example 18: Special Accesses of Continuous Fields - Incrementing After Reading and Writing Before (Potentially) Reading
Example containing one kernel with a GH_READINC access and one with a GH_WRITE access, both for continuous fields. A kernel with GH_READINC access first reads the field data and then increments the field data. This contrasts with a GH_INC access, which simply increments the field data. As an increment is effectively a read followed by a write, it may not be clear why we need to distinguish between these cases. The reason for distinguishing is that the GH_INC access is able to remove a halo exchange (or at least reduce its depth by one) in certain circumstances, whereas a GH_READINC is not able to take advantage of this optimisation.

A kernel with a GH_WRITE access for a continuous field must guarantee to write the same value to a given shared DoF, independent of which cell is being updated. As described in the Developer Guide, this means that annexed DoFs are computed correctly without the need to iterate into the L1 halo and thus can remove the need for halo exchanges on those fields that are read.
Example 19: Mixed Precision
This example shows the use of the LFRic mixed-precision support to call a kernel with scalars, fields and operators of different precision.
Example 20: Algorithm Generation
Illustration of the use of the psyclone-kern tool to create an algorithm layer for a kernel. A Makefile is provided that also runs psyclone to create an executable program from the generated algorithm layer and original kernel code. To see the generated algorithm layer run:
cd eg20/
psyclone-kern -gen alg ../code/testkern_mod.F90
NEMO Examples
These examples may all be found in the examples/nemo directory.
Example 1: OpenMP parallelisation of tra_adv
Demonstrates the use of PSyclone to parallelise loops in a NEMO tracer-advection benchmark using OpenMP for CPUs and for GPUs.
Example 2: OpenMP parallelisation of traldf_iso
Demonstrates the use of PSyclone to parallelise loops in some NEMO tracer-diffusion code using OpenMP for CPUs and for GPUs.
Example 3: OpenACC parallelisation of tra_adv
Demonstrates the introduction of simple OpenACC parallelisation (using the data and kernels directives) for a NEMO tracer-advection benchmark.
Example 4: Transforming Fortran code to the SIR
Demonstrates that simple Fortran code can be transformed to the Stencil Intermediate Representation (SIR). The SIR is the front-end language to DAWN (https://github.com/MeteoSwiss-APN/dawn), a tool which generates optimised CUDA or GridTools code. Thus various simple Fortran examples, as well as the computational part of the tracer-advection benchmark, can be transformed to optimised CUDA and/or GridTools code by using PSyclone and then DAWN.
Example 5: Kernel Data Extraction
This example shows the use of kernel data extraction in PSyclone for
generic Fortran code. It instruments each kernel in the NEMO tracer-advection
benchmark with the PSyData-based kernel extraction code. Detailed
compilation instructions are in the README.md
file, including how
to switch from using the stand-alone extraction library to the NetCDF-based
one (see Extraction Libraries for details).
Scripts
This directory contains examples of two different scripts that aid the use of PSyclone with the full NEMO model. The first, process_nemo.py, is a simple wrapper script that allows a user to control which source files are transformed, which only have profiling instrumentation added and which are ignored altogether. The second, kernels_trans.py, is a PSyclone transformation script which adds the largest possible OpenACC Kernels regions to the code being processed; the general shape of such a script is sketched below.
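The following hedged sketch illustrates the general idea behind a kernels_trans.py-style script. It assumes the trans(psyir) entry point and the ACCKernelsTrans transformation; it is not the script shipped in the repository, which contains considerably more logic for excluding code that is unsuitable for the GPU:
# Illustrative sketch only; see examples/nemo/scripts/kernels_trans.py for the real script.
from psyclone.psyir.nodes import Loop
from psyclone.psyir.transformations import TransformationError
from psyclone.transformations import ACCKernelsTrans


def trans(psyir):
    '''Enclose each outermost loop nest in an OpenACC kernels region.'''
    kernels_trans = ACCKernelsTrans()
    # stop_type=Loop limits the walk to the outermost loop of each nest.
    for loop in psyir.walk(Loop, stop_type=Loop):
        try:
            kernels_trans.apply(loop)
        except TransformationError:
            # Skip loops containing constructs that cannot be enclosed.
            pass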
For more details see the examples/nemo/README.md file.
Note that these scripts are here to support the ongoing development of PSyclone to transform the NEMO source. They are not intended as ‘turn-key’ solutions but as a starting point.