PSy Kernel Extractor (PSyKE)

Introduction

PSyclone can mark regions of PSyclone-conformant code for extraction so that they can be run as a stand-alone application. This capability, called PSyKE (PSy Kernel Extractor), is useful for benchmarking parts of a model, such as LFRic, without needing the model's infrastructure.

Mechanism

The code marked for extraction can be (subject to Restrictions):

  • One or more Nodes in an Invoke (e.g. Loops containing Kernel or Built-In calls, a Directive enclosing one or more Loops) or

  • The entire Invoke (extraction applied to all Nodes).

Code extraction works by applying the ExtractTrans transformation to the selected Nodes. This transformation is sub-classed into the API-specific implementations LFRicExtractTrans and GOceanExtractTrans. Both insert an instance of the ExtractNode class into the Schedule of a specific Invoke, and all Nodes marked for extraction become children of the ExtractNode.
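The small, self-contained sketch below illustrates the insertion step. The Node and ExtractNode classes are hypothetical stand-ins, not PSyclone's real classes. The sketch also leaves out the intermediate Schedule that PSyclone places below an ExtractNode (visible in the Schedule listings later in this section). It shows only the core idea: the new node takes the position of the first selected Node, and the selected Nodes move underneath it in their original order.

```python
# Hypothetical, simplified stand-ins for PSyclone's tree classes.
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = []
        for child in children:
            self.add_child(child)

    def add_child(self, child, index=None):
        """Attach child at the end, or at position `index` if given."""
        child.parent = self
        if index is None:
            self.children.append(child)
        else:
            self.children.insert(index, child)


class ExtractNode(Node):
    def __init__(self):
        super().__init__("Extract")


def insert_extract_node(nodes):
    """Move consecutive sibling `nodes` below a new ExtractNode that
    takes the place of the first of them in their parent."""
    parent = nodes[0].parent
    position = parent.children.index(nodes[0])
    extract = ExtractNode()
    for node in nodes:
        parent.children.remove(node)
        extract.add_child(node)
    parent.add_child(extract, position)
    return extract


# Usage: wrap children 1 and 2 of a five-node schedule.
schedule = Node("Schedule", [Node(f"Loop{i}") for i in range(5)])
extract = insert_extract_node(schedule.children[1:3])
print([child.name for child in schedule.children])
# ['Loop0', 'Extract', 'Loop3', 'Loop4']
print([child.name for child in extract.children])
# ['Loop1', 'Loop2']
```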

The ExtractNode class uses PSyclone's dependency analysis to determine which variables are input parameters and which are output parameters. These lists of variables are then passed to the PSyDataNode, the base class of ExtractNode (details of the PSyDataNode can be found in psy_data). The PSyDataNode then creates the actual code, as in the following LFRic example:

! ExtractStart
!
CALL extract_psy_data%PreStart("testkern_mod", "testkern_code", 4, 2)
CALL extract_psy_data%PreDeclareVariable("a", a)
CALL extract_psy_data%PreDeclareVariable("f2", f2)
CALL extract_psy_data%PreDeclareVariable("m1", m1)
CALL extract_psy_data%PreDeclareVariable("m2", m2)
CALL extract_psy_data%PreDeclareVariable("map_w1", map_w1)
...
CALL extract_psy_data%PreDeclareVariable("undf_w3", undf_w3)
CALL extract_psy_data%PreDeclareVariable("f1_post", f1)
CALL extract_psy_data%PreDeclareVariable("cell_post", cell)
CALL extract_psy_data%PreEndDeclaration
CALL extract_psy_data%ProvideVariable("a", a)
CALL extract_psy_data%ProvideVariable("f2", f2)
CALL extract_psy_data%ProvideVariable("m1", m1)
CALL extract_psy_data%ProvideVariable("m2", m2)
CALL extract_psy_data%ProvideVariable("map_w1", map_w1)
...
CALL extract_psy_data%ProvideVariable("undf_w3", undf_w3)
CALL extract_psy_data%PreEnd
DO cell=1,f1_proxy%vspace%get_ncell()
  !
  CALL testkern_code(nlayers, a, f1_proxy%data, f2_proxy%data,  &
       m1_proxy%data, m2_proxy%data, ndf_w1, undf_w1,           &
       map_w1(:,cell), ndf_w2, undf_w2, map_w2(:,cell), ndf_w3, &
       undf_w3, map_w3(:,cell))
END DO
CALL extract_psy_data%PostStart
CALL extract_psy_data%ProvideVariable("cell_post", cell)
CALL extract_psy_data%ProvideVariable("f1_post", f1)
CALL extract_psy_data%PostEnd
!
! ExtractEnd
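The following sketch approximates both the input/output classification and the resulting call sequence. It uses a much simpler rule than PSyclone's actual dependency analysis: a variable is an input if it is read before being written inside the region, and an output if it is written anywhere in the region. The helpers classify and psydata_calls are hypothetical. The generated calls follow the general pattern of the example above, although in real output the order of the output variables may differ between the declaration calls and the post-region calls.

```python
def classify(accesses):
    """Simplified rule (an assumption, not PSyclone's analysis).

    accesses: (name, kind) pairs in execution order, kind is "read" or
    "write". Returns (inputs, outputs), each in first-seen order."""
    inputs, outputs, written = [], [], set()
    for name, kind in accesses:
        if kind == "read":
            # Only a read that happens before any write needs an input value
            if name not in written and name not in inputs:
                inputs.append(name)
        elif kind == "write":
            written.add(name)
            if name not in outputs:
                outputs.append(name)
        else:
            raise ValueError(f"unknown access kind: {kind}")
    return inputs, outputs


def psydata_calls(module, region, inputs, outputs, obj="extract_psy_data"):
    """Build the PSyData calls that surround an extracted region."""
    calls = [f'CALL {obj}%PreStart("{module}", "{region}", '
             f'{len(inputs)}, {len(outputs)})']
    calls += [f'CALL {obj}%PreDeclareVariable("{v}", {v})' for v in inputs]
    calls += [f'CALL {obj}%PreDeclareVariable("{v}_post", {v})'
              for v in outputs]
    calls.append(f"CALL {obj}%PreEndDeclaration")
    calls += [f'CALL {obj}%ProvideVariable("{v}", {v})' for v in inputs]
    calls.append(f"CALL {obj}%PreEnd")
    calls.append("! ... extracted code region ...")
    calls.append(f"CALL {obj}%PostStart")
    calls += [f'CALL {obj}%ProvideVariable("{v}_post", {v})'
              for v in outputs]
    calls.append(f"CALL {obj}%PostEnd")
    return calls


# Access pattern loosely modelled on the LFRic example: the DO loop
# writes "cell", the kernel reads "a", "f2" and "cell" and writes "f1".
accesses = [("cell", "write"), ("a", "read"), ("f2", "read"),
            ("cell", "read"), ("f1", "write")]
inputs, outputs = classify(accesses)
print(inputs, outputs)  # ['a', 'f2'] ['cell', 'f1']
for line in psydata_calls("testkern_mod", "testkern_code", inputs, outputs):
    print(line)
```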

The PSyData API relies on generic Fortran interfaces to provide field-type-specific implementations of ProvideVariable for the different argument types. As a consequence, a separate version of the external PSyData library used by PSyKE must be supplied for each PSyclone API.
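Python has no generic interfaces, but functools.singledispatch offers a rough analogue: one public name, with the implementation chosen by the type of the argument. The sketch below is illustrative only, and both provide_variable and the Field class are hypothetical. If no implementation is registered for a type, such as the field type of a different API, the call fails. This mirrors why each API needs its own library.

```python
from functools import singledispatch


@singledispatch
def provide_variable(value, name):
    """Fallback: no implementation exists for this argument type."""
    raise TypeError(f"no provide_variable for type {type(value).__name__} "
                    f"(variable '{name}')")


@provide_variable.register
def _(value: int, name):
    return f"{name}: integer scalar {value}"


@provide_variable.register
def _(value: float, name):
    return f"{name}: real scalar {value}"


@provide_variable.register
def _(value: list, name):
    return f"{name}: 1D array with {len(value)} elements"


class Field:
    """Hypothetical field type of some other API with no registered handler."""


print(provide_variable(3, "nlayers"))      # nlayers: integer scalar 3
print(provide_variable([1.0, 2.0], "f2"))  # f2: 1D array with 2 elements
```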

Restrictions

Code extraction can be applied to unoptimised or optimised code. There are restrictions that check for correctness of optimising transformations when extraction is applied, as well as restrictions that eliminate dependence on the specific model infrastructure.

General

This group of restrictions is enforced irrespective of whether optimisations are used or not.

  • Extraction can be applied to a single Node or a list of Nodes in a Schedule. For the latter, Nodes in the list must be consecutive children of the same parent Schedule.

  • Extraction cannot be applied to an ExtractNode or to a Node list that already contains one, since this would create an extract region nested inside another extract region.

  • A Kernel or a Built-In call cannot be extracted without its parent Loop.
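The general restrictions can be expressed as a few checks on the selected Nodes. This minimal sketch uses a hypothetical Node class, not PSyclone's API, in which Kernels sit directly below their Loop. The kind strings "Extract", "Kernel" and "BuiltIn" are assumptions made for the example.

```python
class Node:
    """Hypothetical minimal tree node; `kind` is e.g. "Loop" or "Kernel"."""

    def __init__(self, kind, children=()):
        self.kind = kind
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self

    def walk(self):
        """Yield this node and all of its descendants."""
        yield self
        for child in self.children:
            yield from child.walk()


def validate_general(nodes):
    """Raise ValueError if `nodes` violate the general restrictions."""
    if isinstance(nodes, Node):
        nodes = [nodes]
    if not nodes:
        raise ValueError("no Nodes selected for extraction")
    parent = nodes[0].parent
    if parent is None or any(node.parent is not parent for node in nodes):
        raise ValueError("Nodes must be children of the same parent")
    first = parent.children.index(nodes[0])
    for offset, node in enumerate(nodes):
        if parent.children.index(node) != first + offset:
            raise ValueError("Nodes must be consecutive children")
    for node in nodes:
        if any(n.kind == "Extract" for n in node.walk()):
            raise ValueError("extract regions cannot be nested")
        if node.kind in ("Kernel", "BuiltIn"):
            raise ValueError("a Kernel or Built-In call cannot be "
                             "extracted without its parent Loop")


# Usage
kernel = Node("Kernel")
loops = [Node("Loop", [kernel]), Node("Loop"), Node("Loop")]
schedule = Node("Schedule", loops)
validate_general(loops[0:2])  # consecutive siblings: accepted
try:
    validate_general([loops[0], loops[2]])
except ValueError as err:
    print(err)  # Nodes must be consecutive children
```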

Distributed memory

As noted in the Distributed Memory section, support for distributed memory in PSyclone is currently limited to the LFRic (Dynamo0.3) API. Since the implementation generates calls to the LFRic infrastructure (e.g. runtime checks of the status of field halos), code extraction is not allowed when distributed memory is enabled.

Shared memory and API-specific

The ExtractTrans transformation cannot be applied to:

  • A Loop without its parent Directive,

  • An orphaned Directive (e.g. OMPDoDirective, ACCLoopDirective) without its parent Directive (e.g. ACC or OMP Parallel Directive),

  • A Loop over cells in a colour without its parent Loop over colours in the LFRic API,

  • An inner Loop without its parent outer Loop in the GOcean1.0 API.
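The distributed-memory and parent-dependent restrictions above can be sketched in the same style. Again, the Node class, the kind strings and the loop_type values are assumptions for illustration: "colours"/"colour" stand for LFRic colouring and "outer"/"inner" for GOcean loops. The intermediate Schedule nodes of the real tree are ignored.

```python
class Node:
    """Hypothetical minimal node: `kind` names the node class and
    `loop_type` (loops only) is e.g. "colours", "colour", "outer", "inner"."""

    def __init__(self, kind, loop_type=None, children=()):
        self.kind = kind
        self.loop_type = loop_type
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


ORPHANED_DIRECTIVES = {"OMPDoDirective", "ACCLoopDirective"}
PARALLEL_DIRECTIVES = {"OMPParallelDirective", "ACCParallelDirective"}


def check_extractable(node, dm_enabled=False):
    """Raise ValueError if `node` cannot be extracted on its own."""
    if dm_enabled:
        raise ValueError("extraction is not allowed with distributed memory")
    parent = node.parent
    if parent is None:
        return
    if node.kind == "Loop" and parent.kind.endswith("Directive"):
        raise ValueError("Loop without its parent Directive")
    if (node.kind in ORPHANED_DIRECTIVES
            and parent.kind in PARALLEL_DIRECTIVES):
        raise ValueError("orphaned Directive without its parent Directive")
    if node.kind == "Loop" and parent.kind == "Loop":
        if (node.loop_type, parent.loop_type) == ("colour", "colours"):
            raise ValueError("Loop over cells in a colour without its "
                             "Loop over colours")
        if (node.loop_type, parent.loop_type) == ("inner", "outer"):
            raise ValueError("inner Loop without its parent outer Loop")


# Usage: an LFRic-style coloured loop nest inside a Schedule
colour_loop = Node("Loop", "colour")
colours_loop = Node("Loop", "colours", [colour_loop])
schedule = Node("Schedule", children=[colours_loop])
check_extractable(colours_loop)  # whole nest: accepted
try:
    check_extractable(colour_loop)
except ValueError as err:
    print(err)  # Loop over cells in a colour without its Loop over colours
```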

Use

Code extraction is currently enabled by applying a transformation script (see the Script section for more details).

For example, a transformation script that extracts the first Kernel call in the LFRic API test example 15.1.2_builtin_and_normal_kernel_invoke.f90 can be written as:

from psyclone.domain.lfric.transformations import LFRicExtractTrans


def trans(psy):
    '''Extract the first Kernel call (child 2) of invoke_0.'''
    # Get an instance of the LFRic-specific extract transformation
    etrans = LFRicExtractTrans()

    # Get Invoke and its Schedule
    invoke = psy.invokes.get("invoke_0")
    schedule = invoke.schedule

    # Apply extract transformation to the selected Node
    etrans.apply(schedule.children[2])
    print(schedule.view())
    return psy

and called as:

> psyclone -nodm -s ./extract_single_node.py \
    <path-to-example>/15.1.2_builtin_and_normal_kernel_invoke.f90

PSyclone modifies the Schedule of the selected invoke_0:

Schedule[invoke='invoke_0' dm=False]
    0: Loop[type='dofs',field_space='any_space_1',it_space='dofs',
            upper_bound='ndofs']
        Literal[value:'NOT_INITIALISED']
        Literal[value:'NOT_INITIALISED']
        Literal[value:'1']
        Schedule[]
            0: BuiltIn setval_c(f5,0.0)
    1: Loop[type='dofs',field_space='any_space_1',it_space='dofs',
            upper_bound='ndofs']
        ...
        Schedule[]
            0: BuiltIn setval_c(f2,0.0)
    2: Loop[type='',field_space='w2',it_space='cells', upper_bound='ncells']
        ...
        Schedule[]
            0: CodedKern testkern_code_w2_only(f3,f2) [module_inline=False]
    3: Loop[type='',field_space='wtheta',it_space='cells', upper_bound='ncells']
        ...
        Schedule[]
            0: CodedKern testkern_wtheta_code(f4,f5) [module_inline=False]
    4: Loop[type='',field_space='w1',it_space='cells', upper_bound='ncells']
        ...
        Schedule[]
            0: CodedKern testkern_code(scalar,f1,f2,f3,f4) [module_inline=False]

to insert the extract region. As shown below, all children of an ExtractNode will be part of the region:

Schedule[invoke='invoke_0' dm=False]
    0: Loop[type='dofs',field_space='any_space_1',it_space='dofs',
            upper_bound='ndofs']
        ...
        Schedule[]
            0: BuiltIn setval_c(f5,0.0)
    1: Loop[type='dofs',field_space='any_space_1',it_space='dofs',
            upper_bound='ndofs']
        ...
        Schedule[]
            0: BuiltIn setval_c(f2,0.0)
    2: Extract
        Schedule[]
            0: Loop[type='',field_space='w2',it_space='cells', upper_bound='ncells']
                ...
                Schedule[]
                    0: CodedKern testkern_code_w2_only(f3,f2) [module_inline=False]
    3: Loop[type='',field_space='wtheta',it_space='cells', upper_bound='ncells']
        ...
        Schedule[]
            0: CodedKern testkern_wtheta_code(f4,f5) [module_inline=False]
    4: Loop[type='',field_space='w1',it_space='cells', upper_bound='ncells']
        ...
        Schedule[]
            0: CodedKern testkern_code(scalar,f1,f2,f3,f4) [module_inline=False]

To extract multiple Nodes, ExtractTrans can be applied to the list of Nodes (subject to General restrictions above):

# Apply extract transformation to the selected Nodes
etrans.apply(schedule.children[1:3])

This modifies the above Schedule as:

...
    Extract
        Schedule[]
            0: Loop[type='dofs',field_space='any_space_1',it_space='dofs',
                    upper_bound='ndofs']
                ...
                Schedule[]
                    0: BuiltIn setval_c(f2,0.0)
            1: Loop[type='',field_space='w2',it_space='cells', upper_bound='ncells']
                ...
                Schedule[]
                    0: CodedKern testkern_code_w2_only(f3,f2) [module_inline=False]
...

As mentioned above, extraction can also be applied to optimised code. For example, the following transformation script first adds !$OMP PARALLEL DO directives and then extracts the optimised code in the LFRic API test example 15.1.2_builtin_and_normal_kernel_invoke.f90:

from psyclone.domain.lfric.transformations import LFRicExtractTrans
from psyclone.transformations import DynamoOMPParallelLoopTrans


def trans(psy):
    '''Parallelise children 1 and 2 of invoke_0 with OpenMP and then
    extract both of them into a single region.'''
    # Get instances of the transformations
    etrans = LFRicExtractTrans()
    otrans = DynamoOMPParallelLoopTrans()

    # Get Invoke and its Schedule
    invoke = psy.invokes.get("invoke_0")
    schedule = invoke.schedule

    # Add OMP PARALLEL DO directives (each Loop is replaced in the
    # Schedule by a Directive containing it, so the indices are unchanged)
    otrans.apply(schedule.children[1])
    otrans.apply(schedule.children[2])
    # Extract the two Directives, so each Loop keeps its parent Directive
    etrans.apply(schedule.children[1:3])
    print(schedule.view())
    return psy

The generated code is now:

! ExtractStart
CALL extract_psy_data%PreStart("unknown-module", "setval_c", 1, 3)
CALL extract_psy_data%PreDeclareVariable("f2", f2)
CALL extract_psy_data%PreDeclareVariable("cell_post", cell)
CALL extract_psy_data%PreDeclareVariable("df_post", df)
CALL extract_psy_data%PreDeclareVariable("f3_post", f3)
...
CALL extract_psy_data%PreEndDeclaration
CALL extract_psy_data%ProvideVariable("f2", f2)
...
CALL extract_psy_data%PreEnd
!
!$omp parallel do default(shared), private(df), schedule(static)
DO df=1,undf_aspc1_f2
  f2_proxy%data(df) = 0.0
END DO
!$omp end parallel do
!$omp parallel do default(shared), private(cell), schedule(static)
DO cell=1,f3_proxy%vspace%get_ncell()
  !
  CALL testkern_code_w2_only(nlayers, f3_proxy%data, f2_proxy%data, ndf_w2, undf_w2, map_w2(:,cell))
END DO
!$omp end parallel do
CALL extract_psy_data%PostStart
CALL extract_psy_data%ProvideVariable("cell_post", cell)
CALL extract_psy_data%ProvideVariable("df_post", df)
CALL extract_psy_data%ProvideVariable("f3_post", f3)
CALL extract_psy_data%PostEnd
!
! ExtractEnd

Note

Built-Ins are not yet fully supported, so f2 is incorrectly detected as an input parameter instead of an output parameter. This issue is tracked in #637.

The examples in the examples/lfric/eg12 directory demonstrate how to apply code extraction using PSyclone transformation scripts (see the Examples section for more information).

NetCDF Extraction Examples

PSyclone comes with example NetCDF-based extraction libraries in lib/extract/netcdf.

NetCDF Extraction for GOcean

The library in lib/extract/netcdf/dl_esm_inf implements the full PSyData API for use with the GOcean1.0 dl_esm_inf infrastructure library. Compiling this library requires NetCDF. When the code is run, the library creates a NetCDF file for the instrumented code region. The file contains all variables that are read before the code is executed and all variables that have been modified. Output variables have the suffix _post appended to their NetCDF names. For example, a variable xyz that is both read and written is stored under the name xyz with its input values, and under the name xyz_post with its output values. Array sizes are stored as NetCDF dimensions: the dimensions of the input array xyz are named xyzdim1, xyzdim2, and those of the output array are named xyz_postdim1, xyz_postdim2.
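The naming convention fits in a few lines. This sketch restates the rule described above and is not code from the library. netcdf_names is a hypothetical helper, and the treatment of scalars (no dimension names) is an assumption.

```python
def netcdf_names(name, ndims, is_output):
    """Return (NetCDF variable name, NetCDF dimension names) for a
    variable with `ndims` dimensions (0 for a scalar, an assumption)."""
    var_name = f"{name}_post" if is_output else name
    dim_names = [f"{var_name}dim{i}" for i in range(1, ndims + 1)]
    return var_name, dim_names


# A 2D array xyz that is read and written is stored twice:
print(netcdf_names("xyz", 2, is_output=False))
# ('xyz', ['xyzdim1', 'xyzdim2'])
print(netcdf_names("xyz", 2, is_output=True))
# ('xyz_post', ['xyz_postdim1', 'xyz_postdim2'])
```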

The output file contains the values of all variables used in the subroutine. GOceanExtractTrans can also automatically create a driver program that reads the NetCDF file, calls the instrumented region, and compares the results. To create this driver program, the create_driver option must be set to True:

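from psyclone.domain.gocean.transformations import GOceanExtractTrans

# 'schedule' is the Schedule of the Invoke whose Nodes are extracted
extract = GOceanExtractTrans()
extract.apply(schedule.children,
              {"create_driver": True,
               "region_name": ("main", "init")})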

This creates a Fortran file called driver-main-init.f90, which can then be compiled and executed. This stand-alone program reads the NetCDF file created during an execution of the actual program, calls the kernel with all required input parameters, and compares the output variables with the original output variables. It can be used to create stand-alone test cases that reproduce a bug, or to optimise the performance of a stand-alone kernel.
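Two pieces of logic mentioned above, naming the driver file and comparing outputs, are sketched below. driver_filename reproduces the naming seen in the example. compare_outputs is a hypothetical illustration that checks recomputed values against the stored _post values with a relative tolerance. The tolerance-based comparison is an assumption, not necessarily how the generated driver compares results.

```python
import math


def driver_filename(region_name):
    """("main", "init") -> "driver-main-init.f90", as in the example."""
    module_name, region = region_name
    return f"driver-{module_name}-{region}.f90"


def compare_outputs(stored, recomputed, rel_tol=1e-12):
    """Return the names of variables whose recomputed values differ from
    the stored ones (flat lists of floats; the tolerance is an assumption)."""
    mismatches = []
    for name, reference in stored.items():
        values = recomputed.get(name)
        if (values is None or len(values) != len(reference)
                or not all(math.isclose(ref, val, rel_tol=rel_tol)
                           for ref, val in zip(reference, values))):
            mismatches.append(name)
    return mismatches


print(driver_filename(("main", "init")))  # driver-main-init.f90
print(compare_outputs({"a_post": [1.0, 2.0]}, {"a_post": [1.0, 2.5]}))
# ['a_post']
```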

NetCDF Extraction for LFRic

The library in lib/extract/netcdf/lfric implements the full PSyData API for use with the LFRic infrastructure library. In order to compile this library, you must have NetCDF installed. When running the code, it will create a NetCDF file for the instrumented code region.

As with read-only verification, this library uses the pared-down LFRic infrastructure located in a clone of the PSyclone repository, <PSYCLONEHOME>/src/psyclone/tests/test_files/dynamo0p3/infrastructure. However, users need to change this location for their own setup (for instance, when working with an installed version of PSyclone). Please refer to the relevant README.md for instructions on how to build and link this library.

Note

Driver creation in LFRic is not yet fully supported, and is tracked in issue #1392.