Transformations

As discussed in the previous section, transformations can be applied to PSyclone’s internal representation (PSyIR) to modify it. Typically transformations will be used to optimise the Algorithm, PSy and/or Kernel layer(s) for a particular architecture, however transformations could be added for other reasons, such as to aid debugging or for performance monitoring.

Finding

Transformations can be imported directly, but the user needs to know what transformations are available. A helper class TransInfo is provided to show the available transformations

Note

The directory layout of PSyclone is currently being restructured. As a result of this some transformations are already in the new locations, while others have not been moved yet. Transformations in the new locations can at the moment not be found using the TransInfo approach, and need to be imported directly from the path indicated in the documentation.

class psyclone.psyGen.TransInfo(module=None, base_class=None)[source]

This class provides information about, and access, to the available transformations in this implementation of PSyclone. New transformations will be picked up automatically as long as they subclass the abstract Transformation class.

For example:

>>> from psyclone.psyGen import TransInfo
>>> t = TransInfo()
>>> print(t.list)
There is 1 transformation available:
  1: SwapTrans, A test transformation
>>> # accessing a transformation by index
>>> trans = t.get_trans_num(1)
>>> # accessing a transformation by name
>>> trans = t.get_trans_name("SwapTrans")
get_trans_name(name)[source]

return the transformation with this name (use list() first to see available transformations)

get_trans_num(number)[source]

return the transformation with this number (use list() first to see available transformations)

property list

return a string with a human readable list of the available transformations

property num_trans

return the number of transformations available

Standard Functionality

Each transformation must provide at least two functions for the user: one for validation, i.e. to verify that a certain transformation can be applied, and one to actually apply the transformation. They are described in detail in the overview of all transformations, but the following general guidelines apply.

Validation

Each transformation provides a function validate. This function can be called by the user, and it will raise an exception if the transformation can not be applied (and otherwise will return nothing). Validation will always be called when a transformation is applied. The parameters for validate can change from transformation to transformation, but each validate function accepts a parameter options. This parameter is either None, or a dictionary of string keys, that will provide additional parameters to the validation process. For example, some validation functions allow part of the validation process to be disabled in order to allow the HPC expert to apply a transformation that they know to be safe, even if the more general validation process might reject it. Those parameters are documented for each transformation, and will show up as a parameter, e.g.: options["node-type-check"]. As a simple example:

# The validation might reject the application, but in this
# specific case it is safe to apply the transformation,
# so disable the node type check:
my_transform.validate(node, {"node-type-check": False})

Application

Each transformation provides a function apply which will apply the transformation. It will first validate the transform by calling the validate function. Each apply function takes the same options parameter as the validate function described above. Besides potentially modifying the validation process, optional parameters for the transformation are also provided this way. A simple example:

kctrans = Dynamo0p3KernelConstTrans()
kctrans.apply(kernel, {"element_order": 0, "quadrature": True})

The same options dictionary will be used when calling validate.

Available transformations

Some transformations are generic as the schedule structure is independent of the API, however it often makes sense to specialise these for a particular API by adding API-specific errors checks. Some transformations are API-specific. Currently these different types of transformation are indicated by their names.

The generic transformations currently available are listed in alphabetical order below (a number of these have specialisations which can be found in the API-specific sections).

Note

PSyclone currently only supports OpenCL and KernelImportsToArguments transformations for the GOcean 1.0 API, the OpenACC Data transformation is limited to the NEMO and GOcean 1.0 APIs and the OpenACC Kernels transformation is limited to the NEMO and Dynamo0.3 APIs.

Note

The directory layout of PSyclone is currently being restructured. As a result of this some transformations are already in the new locations, while others have not been moved yet.


class psyclone.psyir.transformations.Abs2CodeTrans[source]

Provides a transformation from a PSyIR ABS Operator node to equivalent code in a PSyIR tree. Validity checks are also performed.

The transformation replaces

R = ABS(X)

with the following logic:

IF X < 0.0:
    R = X*-1.0
ELSE:
    R = X
apply(node, options=None)[source]

Apply the ABS intrinsic conversion transformation to the specified node. This node must be an ABS UnaryOperation. The ABS UnaryOperation is converted to equivalent inline code. This is implemented as a PSyIR transform from:

R = ... ABS(X) ...

to:

tmp_abs = X
if tmp_abs < 0.0:
    res_abs = tmp_abs*-1.0
else:
    res_abs = tmp_abs
R = ... res_abs ...

where X could be an arbitrarily complex PSyIR expression and ... could be arbitrary PSyIR code.

This transformation requires the operation node to be a descendent of an assignment and will raise an exception if this is not the case.

Parameters
  • node (psyclone.psyir.nodes.UnaryOperation) – an ABS UnaryOperation node.

  • options (dictionary of string:values or None) – a dictionary with options for transformations.

Warning

This transformation assumes that the ABS Operator acts on PSyIR Real scalar data and does not check that this is not the case. Once issue #658 is on master then this limitation can be fixed.







class psyclone.psyir.transformations.ArrayRange2LoopTrans(writer=None)[source]

Provides a transformation from a PSyIR Array Range to a PSyIR Loop. For example:

>>> from psyclone.parse.algorithm import parse
>>> from psyclone.psyGen import PSyFactory
>>> api = "nemo"
>>> filename = "tra_adv_compute.F90"
>>> ast, invoke_info = parse(filename, api=api)
>>> psy = PSyFactory(api).create(invoke_info)
>>> schedule = psy.invokes.invoke_list[0].schedule
>>>
>>> from psyclone.psyir.nodes import Assignment
>>> from psyclone.psyir.transformations import ArrayRange2LoopTrans,     >>>     TransformationError
>>>
>>> print(schedule.view())
>>> trans = ArrayRange2LoopTrans()
>>> for assignment in schedule.walk(Assignment):
>>>     while True:
>>>         try:
>>>             trans.apply(assignment)
>>>         except TransformationError:
>>>             break
>>> print(schedule.view())
apply(node, options=None)[source]

Apply the ArrayRange2Loop transformation to the specified node. The node must be an assignment. The rightmost range node in each array within the assignment is replaced with a loop index and the assignment is placed within a loop iterating over that index. The bounds of the loop are determined from the bounds of the array range on the left hand side of the assignment.

Parameters

node (psyclone.psyir.nodes.Assignment) – an Assignment node.


class psyclone.psyir.transformations.ChunkLoopTrans(writer=None)[source]

Apply a chunking transformation to a loop (in order to permit a chunked parallelisation). For example:

>>> from psyclone.psyir.frontend.fortran import FortranReader
>>> from psyclone.psyir.nodes import Loop
>>> from psyclone.psyir.transformations import ChunkLoopTrans
>>> psyir = FortranReader().psyir_from_source("""
... subroutine sub()
...     integer :: ji, tmp(100)
...     do ji=1, 100
...         tmp(ji) = 2 * ji
...     enddo
... end subroutine sub""")
>>> loop = psyir.walk(Loop)[0]
>>> ChunkLoopTrans().apply(loop)

will generate:

subroutine sub()
    integer :: ji
    integer, dimension(100) :: tmp
    integer :: ji_el_inner
    integer :: ji_out_var
    do ji_out_var = 1, 100, 32
        ji_el_inner = MIN(ji_out_var + (32 - 1), 100)
        do ji = ji_out_var, ji_el_inner, 1
            tmp(ji) = 2 * ji
        enddo
    enddo
end subroutine sub
apply(node, options=None)[source]

Converts the given Loop node into a nested loop where the outer loop is over chunks and the inner loop is over each individual element of the chunk.

Parameters
  • node (psyclone.psyir.nodes.Loop) – the loop to transform.

  • options (dict of str:values or None) – a dict with options for transformations.

  • options["chunksize"] (int) – The size to chunk over for this transformation. If not specified, the value 32 is used.



class psyclone.psyir.transformations.DotProduct2CodeTrans[source]

Provides a transformation from a PSyIR DOT_PRODUCT Operator node to equivalent code in a PSyIR tree. Validity checks are also performed.

If R is a scalar and A, and B have dimension N, The transformation replaces:

R = ... DOT_PRODUCT(A,B) ...

with the following code:

TMP = 0.0
do I=1,N
    TMP = TMP + A(i)*B(i)
R = ... TMP ...

For example:

>>> from psyclone.psyir.backend.fortran import FortranWriter
>>> from psyclone.psyir.frontend.fortran import FortranReader
>>> from psyclone.psyir.nodes import BinaryOperation
>>> from psyclone.psyir.transformations import DotProduct2CodeTrans
>>> code = ("subroutine dot_product_test(v1,v2)\n"
...         "real,intent(in) :: v1(10), v2(10)\n"
...         "real :: result\n"
...         "result = dot_product(v1,v2)\n"
...         "end subroutine\n")
>>> psyir = FortranReader().psyir_from_source(code)
>>> trans = DotProduct2CodeTrans()
>>> trans.apply(psyir.walk(BinaryOperation)[0])
>>> print(FortranWriter()(psyir))
subroutine dot_product_test(v1, v2)
  real, dimension(10), intent(in) :: v1
  real, dimension(10), intent(in) :: v2
  real :: result
  integer :: i
  real :: res_dot_product

  res_dot_product = 0.0
  do i = 1, 10, 1
    res_dot_product = res_dot_product + v1(i) * v2(i)
  enddo
  result = res_dot_product

end subroutine dot_product_test
apply(node, options=None)[source]

Apply the DOT_PRODUCT intrinsic conversion transformation to the specified node. This node must be a DOT_PRODUCT BinaryOperation. If the transformation is successful then an assignment which includes a DOT_PRODUCT BinaryOperation node is converted to equivalent inline code.

Parameters
  • node (psyclone.psyir.nodes.BinaryOperation) – a DOT_PRODUCT Binary-Operation node.

  • options (dict of str:str or None) – a dictionary with options for transformations.


class psyclone.psyir.transformations.extract_trans.ExtractTrans(node_class=<class 'psyclone.psyir.nodes.extract_node.ExtractNode'>)[source]

This transformation inserts an ExtractNode or a node derived from ExtractNode into the PSyIR of a schedule. At code creation time this node will use the PSyData API to create code that can write the input and output parameters to a file. The node might also create a stand-alone driver program that can read the created file and then execute the instrumented region. Examples are given in the derived classes DynamoExtractTrans and GOceanExtractTrans.

After applying the transformation the Nodes marked for extraction are children of the ExtractNode. Nodes to extract can be individual constructs within an Invoke (e.g. Loops containing a Kernel or BuiltIn call) or entire Invokes. This functionality does not support distributed memory.

Parameters

node_class (psyclone.psyir.nodes.ExtractNode or derived class) – The Node class of which an instance will be inserted into the tree (defaults to ExtractNode), but can be any derived class.

apply(nodes, options=None)

Apply this transformation to a subset of the nodes within a schedule - i.e. enclose the specified Nodes in the schedule within a single PSyData region.

Parameters
  • nodes (psyclone.psyir.nodes.Node or list of psyclone.psyir.nodes.Node) – can be a single node or a list of nodes.

  • options (dictionary of string:values or None) – a dictionary with options for transformations.

  • options["prefix"] (str) – a prefix to use for the PSyData module name (PREFIX_psy_data_mod) and the PSyDataType (PREFIX_PSYDATATYPE) - a “_” will be added automatically. It defaults to “”.

  • options["region_name"] ((str,str)) – an optional name to use for this PSyData area, provided as a 2-tuple containing a location name followed by a local name. The pair of strings should uniquely identify a region unless aggregate information is required (and is supported by the runtime library).


class psyclone.psyir.transformations.HoistLocalArraysTrans(writer=None)[source]

This transformation takes a Routine and promotes any local, ‘automatic’ arrays to Container scope:

>>> from psyclone.psyir.backend.fortran import FortranWriter
>>> from psyclone.psyir.frontend.fortran import FortranReader
>>> from psyclone.psyir.nodes import Assignment
>>> from psyclone.psyir.transformations import HoistLocalArraysTrans
>>> code = ("module test_mod\n"
...         "contains\n"
...         "  subroutine test_sub(n)\n"
...         "  integer :: i,j,n\n"
...         "  real :: a(n,n)\n"
...         "  real :: value = 1.0\n"
...         "  do i=1,n\n"
...         "    do j=1,n\n"
...         "      a(i,j) = value\n"
...         "    end do\n"
...         "  end do\n"
...         "  end subroutine test_sub\n"
...         "end module test_mod\n")
>>> psyir = FortranReader().psyir_from_source(code)
>>> hoist = HoistLocalArraysTrans()
>>> hoist.apply(psyir.walk(Routine)[0])
>>> print(FortranWriter()(psyir).lower())
module test_mod
  implicit none
  real, allocatable, dimension(:,:), private :: a
  public

  public :: test_sub

  contains
  subroutine test_sub(n)
    integer :: n
    integer :: i
    integer :: j
    real :: value = 1.0

    if (.not.allocated(a)) then
      allocate(a(1 : n, 1 : n))
    end if
    do i = 1, n, 1
      do j = 1, n, 1
        a(i,j) = value
      enddo
    enddo

  end subroutine test_sub

end module test_mod
apply(node, options=None)[source]

Applies the transformation to the supplied Routine node, moving any local arrays up to Container scope and adding a suitable allocation when they are first accessed. If there are no local arrays or the supplied Routine is a program then this method does nothing.

Parameters
  • node (subclass of psyclone.psyir.nodes.Routine) – target PSyIR node.

  • options (dict of str:values or None) – a dictionary with options for transformations.


class psyclone.psyir.transformations.HoistTrans(writer=None)[source]

This transformation takes an assignment and moves it outside of its parent loop if it is valid to do so. For example:

>>> from psyclone.psyir.backend.fortran import FortranWriter
>>> from psyclone.psyir.frontend.fortran import FortranReader
>>> from psyclone.psyir.nodes import Assignment
>>> from psyclone.psyir.transformations import HoistTrans
>>> code = ("program test\n"
...         "  integer :: i,j,n\n"
...         "  real :: a(n,n)\n"
...         "  real value\n"
...         "  do i=1,n\n"
...         "    value = 1.0\n"
...         "    do j=1,n\n"
...         "      a(i,j) = value\n"
...         "    end do\n"
...         "  end do\n"
...         "end program\n")
>>> psyir = FortranReader().psyir_from_source(code)
>>> hoist = HoistTrans()
>>> hoist.apply(psyir.walk(Assignment)[0])
>>> print(FortranWriter()(psyir))
program test
  integer :: i
  integer :: j
  integer :: n
  real, dimension(n,n) :: a
  real :: value

  value = 1.0
  do i = 1, n, 1
    do j = 1, n, 1
      a(i,j) = value
    enddo
  enddo

end program test
apply(node, options=None)[source]

Applies the hoist transformation to the supplied assignment node within a loop, moving the assignment outside of the loop if it is valid to do so. Issue #1445 will also look to extend this transformation to other types of node.

Parameters
  • node (subclass of psyclone.psyir.nodes.Assignment) – target PSyIR node.

  • options (dictionary of string:values or None) – a dictionary with options for transformations.



class psyclone.psyir.transformations.LoopFuseTrans(writer=None)[source]

Provides a generic loop-fuse transformation to two Nodes in the PSyIR of a Schedule after performing validity checks for the supplied Nodes. Examples are given in the descriptions of any children classes.

apply(node1, node2, options=None)[source]

Fuses two loops represented by psyclone.psyir.nodes.Node objects after performing validity checks.

Parameters
  • node1 (psyclone.psyir.nodes.Node) – the first Node that is being checked.

  • node2 (psyclone.psyir.nodes.Node) – the second Node that is being checked.

  • options (dictionary of string:values or None) – a dictionary with options for transformations.


class psyclone.psyir.transformations.LoopSwapTrans(writer=None)[source]

Provides a loop-swap transformation, e.g.:

DO j=1, m
    DO i=1, n

becomes:

DO i=1, n
    DO j=1, m

This transform is used as follows:

>>> from psyclone.parse.algorithm import parse
>>> from psyclone.psyGen import PSyFactory
>>> ast, invokeInfo = parse("shallow_alg.f90")
>>> psy = PSyFactory("gocean1.0").create(invokeInfo)
>>> schedule = psy.invokes.get('invoke_0').schedule
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())
>>>
>>> from psyclone.transformations import LoopSwapTrans
>>> swap = LoopSwapTrans()
>>> swap.apply(schedule.children[0])
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())
apply(node, options=None)[source]

The argument outer must be a loop which has exactly one inner loop. This transform then swaps the outer and inner loop.

Parameters
  • outer (psyclone.psyir.nodes.Loop) – the node representing the outer loop.

  • options (dictionary of string:values or None) – a dictionary with options for transformations.

Raises

TransformationError – if the supplied node does not allow a loop swap to be done.


class psyclone.psyir.transformations.LoopTiling2DTrans(writer=None)[source]

Apply a 2D loop tiling transformation to a loop. For example:

>>> from psyclone.psyir.frontend.fortran import FortranReader
>>> from psyclone.psyir.nodes import Loop
>>> from psyclone.psyir.transformations import LoopTiling2DTrans
>>> psyir = FortranReader().psyir_from_source("""
... subroutine sub()
...     integer :: ji, tmp(100)
...     do i=1, 100
...       do j=1, 100
...         tmp(i, j) = 2 * tmp(i, j)
...       enddo
...     enddo
... end subroutine sub""")
>>> loop = psyir.walk(Loop)[0]
>>> LoopTiling2DTrans().apply(loop)

will generate:

subroutine sub()
    integer :: ji
    integer, dimension(100) :: tmp
    integer :: ji_el_inner
    integer :: ji_out_var
    do i_out_var = 1, 100, 32
      i_el_inner = MIN(i_out_var + (32 - 1), 100)
      do j_out_var = 1, 100, 32
        do i = i_out_var, i_el_inner, 1
          j_el_inner = MIN(j_out_var + (32 - 1), 100)
          do j = j_out_var, j_el_inner, 1
            tmp(i, j) = 2 * tmp(i, j)
          enddo
        enddo
      enddo
    enddo
end subroutine sub
apply(node, options=None)[source]

Converts the given 2D Loop construct into a tiled version of the nested loops.

Parameters
  • node (psyclone.psyir.nodes.Loop) – the loop to transform.

  • options (dict of str:values or None) – a dict with options for transformations.

  • options["tilesize"] (int) – The size of the resulting tile, currently square tiles are always used. If not specified, the value 32 is used.


class psyclone.psyir.transformations.Matmul2CodeTrans[source]

Provides a transformation from a PSyIR MATMUL Operator node to equivalent code in a PSyIR tree. Validity checks are also performed.

For a matrix-vector multiplication, if the dimensions of R, A, and B are R(N), A(N,M), B(M), the transformation replaces:

R=MATMUL(A,B)

with the following code:

do i=1,N
    R(i) = 0.0
    do j=1,M
        R(i) = R(i) + A(i,j) * B(j)

For a matrix-matrix multiplication, if the dimensions of R, A, and B are R(P,M), A(P,N), B(N,M), the MATMUL is replaced with the following code:

do j=1,M
    do i=1,P
        R(i,j) = 0.0
        do ii=1,N
            R(i,j) = R(i,j) + A(i,ii) * B(ii,j)

Note that this transformation does not support the case where A is a rank-1 array.

apply(node, options=None)[source]

Apply the MATMUL intrinsic conversion transformation to the specified node. This node must be a MATMUL BinaryOperation. The first argument must currently have two dimensions while the second must have either one or two dimensions. Each argument is permitted to have additional dimensions (i.e. more than 2) but in each case it is only the first one or two which may be ranges. Further, the ranges must currently be for the full index space for that dimension (i.e. array subsections are not supported). If the transformation is successful then an assignment which includes a MATMUL BinaryOperation node is converted to equivalent inline code.

Parameters
  • node (psyclone.psyir.nodes.BinaryOperation) – a MATMUL Binary-Operation node.

  • options (Optional[Dict[str, str]]) – options for the transformation.

Note

This transformation is currently limited to translating the matrix vector form of MATMUL to equivalent PSyIR code.


class psyclone.psyir.transformations.Max2CodeTrans[source]

Provides a transformation from a PSyIR MAX Operator node to equivalent code in a PSyIR tree. Validity checks are also performed (by a parent class).

The transformation replaces

R = MAX(A, B, C ...)

with the following logic:

R = A
if B > R:
    R = B
if C > R:
    R = C
...
apply(node, options=None)

Apply this utility transformation to the specified node. This node must be a MIN or MAX BinaryOperation or NaryOperation. The operation is converted to equivalent inline code. This is implemented as a PSyIR transform from:

R = ... [MIN or MAX](A, B, C ...) ...

to:

res = A
tmp = B
IF tmp [< or >] res:
    res = tmp
tmp = C
IF tmp [< or >] res:
    res = tmp
...
R = ... res ...

where A, B, C … could be arbitrarily complex PSyIR expressions and the ... before and after [MIN or MAX](A, B, C ...) can be arbitrary PSyIR code.

This transformation requires the operation node to be a descendent of an assignment and will raise an exception if this is not the case.

Parameters
  • node (psyclone.psyir.nodes.BinaryOperation or psyclone.psyir.nodes.NaryOperation) – a MIN or MAX Binary- or Nary-Operation node.

  • options (dict of str:values or None) – a dictionary with options for transformations.

Warning

This transformation assumes that the MAX Operator acts on PSyIR Real scalar data and does not check that this is not the case. Once issue #658 is on master then this limitation can be fixed.


class psyclone.psyir.transformations.Min2CodeTrans[source]

Provides a transformation from a PSyIR MIN Operator node to equivalent code in a PSyIR tree. Validity checks are also performed (by a parent class).

The transformation replaces

R = MIN(A, B, C ...)

with the following logic:

R = A
if B < R:
    R = B
if C < R:
    R = C
...
apply(node, options=None)

Apply this utility transformation to the specified node. This node must be a MIN or MAX BinaryOperation or NaryOperation. The operation is converted to equivalent inline code. This is implemented as a PSyIR transform from:

R = ... [MIN or MAX](A, B, C ...) ...

to:

res = A
tmp = B
IF tmp [< or >] res:
    res = tmp
tmp = C
IF tmp [< or >] res:
    res = tmp
...
R = ... res ...

where A, B, C … could be arbitrarily complex PSyIR expressions and the ... before and after [MIN or MAX](A, B, C ...) can be arbitrary PSyIR code.

This transformation requires the operation node to be a descendent of an assignment and will raise an exception if this is not the case.

Parameters
  • node (psyclone.psyir.nodes.BinaryOperation or psyclone.psyir.nodes.NaryOperation) – a MIN or MAX Binary- or Nary-Operation node.

  • options (dict of str:values or None) – a dictionary with options for transformations.

Warning

This transformation assumes that the MIN Operator acts on PSyIR Real scalar data and does not check that this is not the case. Once issue #658 is on master then this limitation can be fixed.






Note

PSyclone does not support (distributed-memory) halo swaps or global sums within OpenMP master regions. Attempting to create a master region for a set of nodes that includes halo swaps or global sums will produce an error. In such cases it may be possible to re-order the nodes in the Schedule such that the halo swaps or global sums are performed outside the single region. The MoveTrans transformation may be used for this.



Note

PSyclone does not support (distributed-memory) halo swaps or global sums within OpenMP parallel regions. Attempting to create a parallel region for a set of nodes that includes halo swaps or global sums will produce an error. In such cases it may be possible to re-order the nodes in the Schedule such that the halo swaps or global sums are performed outside the parallel region. The MoveTrans transformation may be used for this.


Note

PSyclone does not support (distributed-memory) halo swaps or global sums within OpenMP single regions. Attempting to create a single region for a set of nodes that includes halo swaps or global sums will produce an error. In such cases it may be possible to re-order the nodes in the Schedule such that the halo swaps or global sums are performed outside the single region. The MoveTrans transformation may be used for this.




class psyclone.psyir.transformations.OMPTaskwaitTrans(writer=None)[source]

Adds zero or more OpenMP Taskwait directives to an OMP parallel region. This transformation will add directives to satisfy dependencies between Taskloop directives without an associated taskgroup (i.e. no nogroup clause). It also tries to minimise the number added to maximise available parallelism.

For example:

>>> from pysclone.parse.algorithm import parse
>>> from psyclone.psyGen import PSyFactory
>>> api = "gocean1.0"
>>> filename = "nemolite2d_alg.f90"
>>> ast, invokeInfo = parse(filename, api=api, invoke_name="invoke")
>>> psy = PSyFactory(api).create(invokeInfo)
>>>
>>> from psyclone.transformations import OMPParallelTrans, OMPSingleTrans
>>> from psyclone.transformations import OMPTaskloopTrans
>>> from psyclone.psyir.transformations import OMPTaskwaitTrans
>>> singletrans = OMPSingleTrans()
>>> paralleltrans = OMPParallelTrans()
>>> tasklooptrans = OMPTaskloopTrans()
>>> taskwaittrans = OMPTaskwaitTrans()
>>>
>>> schedule = psy.invokes.get('invoke_0').schedule
>>> print(schedule.view())
>>>
>>> # Apply the OpenMP Taskloop transformation to *every* loop
>>> # in the schedule.
>>> # This ignores loop dependencies. These are handled by the
>>> # taskwait transformation.
>>> for child in schedule.children:
>>>     tasklooptrans.apply(child, nogroup = true)
>>> # Enclose all of these loops within a single OpenMP
>>> # SINGLE region
>>> singletrans.apply(schedule.children)
>>> # Enclose all of these loops within a single OpenMP
>>> # PARALLEL region
>>> paralleltrans.apply(schedule.children)
>>> taskwaittrans.apply(schedule.children)
>>> print(schedule.view())
apply(node, options=None)[source]

Apply an OMPTaskwait Transformation to the supplied node (which must be an OMPParallelDirective). In the generated code this corresponds to adding zero or more OMPTaskwaitDirectives as appropriate:

!$OMP PARALLEL
  ...
  !$OMP TASKWAIT
  ...
  !$OMP TASKWAIT
  ...
!$OMP END PARALLEL
Parameters
  • node (psyclone.psyir.nodes.OMPParallelDirective) – the node to which to apply the transformation.

  • options (dict of string:values or None) – a dictionary with options for transformations and validation.

  • options["fail_on_no_taskloop"] (bool) – indicating whether this should throw an error if no OMPTaskloop nodes are found in this tree. This can be safely disabled as if there are no Taskloop nodes the result of this transformation is valid OpenMP code. Default is True


class psyclone.psyir.transformations.ProfileTrans[source]

Create a profile region around a list of statements. For example:

>>> from psyclone.parse.algorithm import parse
>>> from psyclone.parse.utils import ParseError
>>> from psyclone.psyGen import PSyFactory, GenerationError
>>> from psyclone.psyir.transformations import ProfileTrans
>>> api = "gocean1.0"
>>> filename = "nemolite2d_alg.f90"
>>> ast, invokeInfo = parse(filename, api=api, invoke_name="invoke")
>>> psy = PSyFactory(api).create(invokeInfo)
>>>
>>> p_trans = ProfileTrans()
>>>
>>> schedule = psy.invokes.get('invoke_0').schedule
>>> print(schedule.view())
>>>
>>> # Enclose all children within a single profile region
>>> p_trans.apply(schedule.children)
>>> print(schedule.view())

This implementation relies completely on the base class PSyDataTrans for the actual work, it only adjusts the name etc, and the list of valid nodes.

apply(nodes, options=None)

Apply this transformation to a subset of the nodes within a schedule - i.e. enclose the specified Nodes in the schedule within a single PSyData region.

Parameters
  • nodes (psyclone.psyir.nodes.Node or list of psyclone.psyir.nodes.Node) – can be a single node or a list of nodes.

  • options (dictionary of string:values or None) – a dictionary with options for transformations.

  • options["prefix"] (str) – a prefix to use for the PSyData module name (PREFIX_psy_data_mod) and the PSyDataType (PREFIX_PSYDATATYPE) - a “_” will be added automatically. It defaults to “”.

  • options["region_name"] ((str,str)) – an optional name to use for this PSyData area, provided as a 2-tuple containing a location name followed by a local name. The pair of strings should uniquely identify a region unless aggregate information is required (and is supported by the runtime library).


class psyclone.psyir.transformations.ReadOnlyVerifyTrans(node_class=<class 'psyclone.psyir.nodes.read_only_verify_node.ReadOnlyVerifyNode'>)[source]

This transformation inserts a ReadOnlyVerifyNode or a node derived from ReadOnlyVerifyNode into the PSyIR of a schedule. At code creation time this node will use the PSyData API to create code that will verify that read-only quantities are not modified.

After applying the transformation the Nodes marked for verification are children of the ReadOnlyVerifyNode. Nodes to verify can be individual constructs within an Invoke (e.g. Loops containing a Kernel or BuiltIn call) or entire Invokes.

Parameters

node_class (psyclone.psyir.nodes.ReadOnlyVerifyNode or derived class) – The class of Node which will be inserted into the tree (defaults to ReadOnlyVerifyNode), but can be any derived class.

apply(nodes, options=None)

Apply this transformation to a subset of the nodes within a schedule - i.e. enclose the specified Nodes in the schedule within a single PSyData region.

Parameters
  • nodes (psyclone.psyir.nodes.Node or list of psyclone.psyir.nodes.Node) – can be a single node or a list of nodes.

  • options (dictionary of string:values or None) – a dictionary with options for transformations.

  • options["prefix"] (str) – a prefix to use for the PSyData module name (PREFIX_psy_data_mod) and the PSyDataType (PREFIX_PSYDATATYPE) - a “_” will be added automatically. It defaults to “”.

  • options["region_name"] ((str,str)) – an optional name to use for this PSyData area, provided as a 2-tuple containing a location name followed by a local name. The pair of strings should uniquely identify a region unless aggregate information is required (and is supported by the runtime library).


class psyclone.psyir.transformations.Sign2CodeTrans[source]

Provides a transformation from a PSyIR SIGN Operator node to equivalent code in a PSyIR tree. Validity checks are also performed.

The transformation replaces

R = SIGN(A, B)

with the following logic:

R = ABS(A)
if B < 0.0:
    R = R*-1.0

i.e. the value of A with the sign of B

apply(node, options=None)[source]

Apply the SIGN intrinsic conversion transformation to the specified node. This node must be a SIGN BinaryOperation. The SIGN BinaryOperation is converted to equivalent inline code. This is implemented as a PSyIR transform from:

R = ... SIGN(A, B) ...

to:

tmp_abs = A
if tmp_abs < 0.0:
    res_abs = tmp_abs*-1.0
else:
    res_abs = tmp_abs
res_sign = res_abs
tmp_sign = B
if tmp_sign < 0.0:
    res_sign = res_sign*-1.0
R = ... res_sign ...

where A and B could be arbitrarily complex PSyIR expressions, ... could be arbitrary PSyIR code and where ABS has been replaced with inline code by the NemoAbsTrans transformation.

This transformation requires the operation node to be a descendent of an assignment and will raise an exception if this is not the case.

Parameters
  • node (psyclone.psyir.nodes.BinaryOperation) – a SIGN BinaryOperation node.

  • symbol_table (psyclone.psyir.symbols.SymbolTable) – the symbol table.

  • options (dictionary of string:values or None) – a dictionary with options for transformations.

Warning

This transformation assumes that the SIGN Operator acts on PSyIR Real scalar data and does not check whether or not this is the case. Once issue #658 is on master then this limitation can be fixed.

Algorithm-layer

The gocean1.0 API supports the transformation of the algorithm layer. In the future the LFRic (dynamo0.3) API will also support this. However, this is not relevant to the nemo API as it does not have the concept of an algorithm layer (just PSy and Kernel layers). The ability to transformation the algorithm layer is new and at this time no relevant transformations have been developed.

Kernels

PSyclone supports the transformation of Kernels as well as PSy-layer code. However, the transformation of kernels to produce new kernels brings with it additional considerations, especially regarding the naming of the resulting kernels. PSyclone supports two use cases:

  1. the HPC expert wishes to optimise the same kernel in different ways, depending on where/how it is called;

  2. the HPC expert wishes to transform the kernel just once and have the new version used throughout the Algorithm file.

The second case is really an optimisation of the first for the case where the same set of transformations is applied to every instance of a given kernel.

Since PSyclone is run separately for each Algorithm in a given application, ensuring that there are no name clashes for kernels in the application as a whole requires that some state is maintained between PSyclone invocations. This is achieved by requiring that the same kernel output directory is used for every invocation of PSyclone when building a given application. However, this is under the control of the user and therefore it is possible to use the same output directory for a subset of algorithms that require the same kernel transformation and then a different directory for another subset requiring a different transformation. Of course, such use would require care when building and linking the application since the differently-optimised kernels would have the same names.

By default, transformed kernels are written to the current working directory. Alternatively, the user may specify the location to which to write the modified code via the -okern command-line flag.

In order to support the two use cases given above, PSyclone supports two different kernel-renaming schemes: “multiple” and “single” (specified via the --kernel-renaming command-line flag). In the default, “multiple” scheme, PSyclone ensures that each transformed kernel is given a unique name (with reference to the contents of the kernel output directory). In the “single” scheme, it is assumed that any given kernel that is transformed is always transformed in the same way (or left unchanged) and thus just one transformed version of it is created. This assumption is checked by examining the Fortran code for any pre-existing transformed version of that kernel. If another transformed version of that kernel exists and does not match that created by the current transformation then PSyclone will raise an exception.

Rules

Kernel code that is to be transformed is subject to certain restrictions. These rules are intended to make kernel transformations as robust as possible, in particular by limiting the amount of code that must be parsed by PSyclone (via fparser). The rules are as follows:

  1. Any variable or procedure accessed by a kernel must either be explicitly declared or named in the only clause of a module use statement within the scope of the subroutine containing the kernel implementation. This means that:

    1. Kernel subroutines are forbidden from accessing data using COMMON blocks;

    2. Kernel subroutines are forbidden from calling procedures declared via the EXTERN statement;

    3. Kernel subroutines must not access data or procedures made available via their parent (containing) module.

  2. The full Fortran source of a kernel must be available to PSyclone. This includes the source of any modules from which it accesses either routines or data. (However, kernel routines are permitted to make use of Fortran intrinsic routines.)

For instance, consider the following Fortran module containing the bc_ssh_code kernel:

module boundary_conditions_mod
  real :: forbidden_var

contains

  subroutine bc_ssh_code(ji, jj, istep, ssha)
    use kind_params_mod, only: go_wp
    use model_mod, only: rdt
    integer,                     intent(in)    :: ji, jj, istep
    real(go_wp), dimension(:,:), intent(inout) :: ssha
    real(go_wp) :: rtime

    rtime = real(istep, go_wp) * rdt
    ...
  end subroutine bc_ssh_code

end module boundary_conditions_mod

Since the kernel subroutine accesses data (the rdt variable) from the model_mod module, the source of that module must be available to PSyclone if a transformation is applied to this kernel. Should rdt not actually be defined in model_mod (i.e. model_mod itself imports it from another module) then the source containing its definition must also be available to PSyclone. Note that the rules forbid the bc_ssh_code kernel from accessing the forbidden_var variable that is available to it from the enclosing module scope.

Note

these rules only apply to kernels that are the target of PSyclone kernel transformations.

Available Kernel Transformations

The transformations listed below have to be applied specifically to a PSyclone kernel. There are a number of transformations not listed here that can be applied to either or both the PSy-layer and Kernel-layer PSyIR.

Note

Some of these transformations modify the PSyIR tree of both the InvokeSchedule where the transformed CodedKernel is located and its associated KernelSchedule.



class psyclone.psyir.transformations.FoldConditionalReturnExpressionsTrans(writer=None)[source]

Provides a transformation that folds conditional expressions with only a return statement inside so that the Return statement is moved to the end of the Routine and therefore it can be safely removed. This simplifies the control flow of the kernel to facilitate other transformations like kernel fusions. For example, the following code:

subroutine test(i)
    if (i < 5) then
        return
    endif
    if (i > 10) then
        return
    endif
    ! CODE
end subroutine

will be transformed to:

subroutine test(i)
    if (.not.(i < 5)) then
        if (.not.(i > 10)) then
            ! CODE
        endif
    endif
end subroutine
apply(node, options=None)[source]

Apply this transformation to the supplied node.

Parameters
  • node (psyclone.psyir.nodes.Routine) – the node to transform.

  • options (dict of string:values or None) – a dictionary with options for transformations.

property name

Returns the name of this transformation as a string.

validate(node, options=None)[source]

Ensure that it is valid to apply this transformation to the supplied node.

Parameters
  • node (psyclone.psyir.nodes.Routine) – the node to validate.

  • options (dict of string:values or None) – a dictionary with options for transformations.

Raises

TransformationError – if the node is not a Routine.


Note

This transformation is only supported by the GOcean 1.0 API.

Applying

Transformations can be applied either interactively or through a script.

Interactive

To apply a transformation interactively we first parse and analyse the code. This allows us to generate a “vanilla” PSy layer. For example:

>>> from fparser.common.readfortran import FortranStringReader
>>> from psyclone.parse.algorithm import Parser
>>> from psyclone.psyGen import PSyFactory
>>> from fparser.two.parser import ParserFactory

>>> example_str = (
...     "program example\n"
...     "  use field_mod, only: field_type\n"
...     "  type(field_type) :: field\n"
...     "  call invoke(setval_c(field, 0.0))\n"
...     "end program example\n")

>>> parser = ParserFactory().create(std="f2008")
>>> reader = FortranStringReader(example_str)
>>> ast = parser(reader)
>>> invoke_info = Parser().invoke_info(ast)

# This example uses the LFRic (dynamo0.3) API
>>> api = "dynamo0.3"

# Create the PSy-layer object using the invokeInfo
>>> psy = PSyFactory(api, distributed_memory=False).create(invoke_info)

# Optionally generate the vanilla PSy layer fortran
>>> print(psy.gen)
  MODULE example_psy
    USE constants_mod, ONLY: r_def, i_def
    USE field_mod, ONLY: field_type, field_proxy_type
    IMPLICIT NONE
    CONTAINS
    SUBROUTINE invoke_0(field)
      TYPE(field_type), intent(in) :: field
      INTEGER df
      INTEGER(KIND=i_def) loop0_start, loop0_stop
      TYPE(field_proxy_type) field_proxy
      INTEGER(KIND=i_def) undf_aspc1_field
      !
      ! Initialise field and/or operator proxies
      !
      field_proxy = field%get_proxy()
      !
      ! Initialise number of DoFs for aspc1_field
      !
      undf_aspc1_field = field_proxy%vspace%get_undf()
      !
      ! Set-up all of the loop bounds
      !
      loop0_start = 1
      loop0_stop = undf_aspc1_field
      !
      ! Call our kernels
      !
      DO df=loop0_start,loop0_stop
        field_proxy%data(df) = 0.0
      END DO
      !
    END SUBROUTINE invoke_0
  END MODULE example_psy

We then extract the particular schedule we are interested in. For example:

# List the various invokes that the PSy layer contains
>>> print(psy.invokes.names)
dict_keys(['invoke_0'])

# Get the required invoke
>>> invoke = psy.invokes.get('invoke_0')

# Get the schedule associated with the required invoke
> schedule = invoke.schedule
> print(schedule.view())
InvokeSchedule[invoke='invoke_0', dm=True]
    0: Loop[type='dof', field_space='any_space_1', it_space='dof', upper_bound='ndofs']
        Literal[value:'NOT_INITIALISED', Scalar<INTEGER, UNDEFINED>]
        Literal[value:'NOT_INITIALISED', Scalar<INTEGER, UNDEFINED>]
        Literal[value:'1', Scalar<INTEGER, UNDEFINED>]
        Schedule[]
            0: BuiltIn setval_c(field,0.0)

Now we have the schedule we can create and apply a transformation to it to create a new schedule and then replace the original schedule with the new one. For example:

# Create an OpenMPParallelLoopTrans
> from psyclone.transformations import OMPParallelLoopTrans
> ol = OMPParallelLoopTrans()

# Apply it to the loop schedule of the selected invoke
> ol.apply(schedule.children[0])
> print(schedule.view())

# Generate the Fortran code for the new PSy layer
> print(psy.gen)

Script

PSyclone provides a Python script (psyclone) that can be used from the command line to generate PSy layer code and to modify algorithm layer code appropriately. By default this script will generate “vanilla” (unoptimised) PSy-layer and algorithm layer code. For example:

> psyclone algspec.f90
> psyclone -oalg alg.f90 -opsy psy.f90 -api dynamo0.3 algspec.f90

The psyclone script has an optional -s flag which allows the user to specify a script file to modify the PSy layer as required. Script files may be specified without a path. For example:

> psyclone -s opt.py algspec.f90

In this case the Python search path PYTHONPATH will be used to try to find the script file.

Alternatively, script files may be specified with a path. In this case the file is expected to be found in the specified location. For example …

> psyclone -s ./opt.py algspec.f90
> psyclone -s ../scripts/opt.py algspec.f90
> psyclone -s /home/me/PSyclone/scripts/opt.py algspec.f90

PSyclone also provides the same functionality via a function (which is what the psyclone script calls internally).

###.. autofunction:: psyclone.generator.generate ### :noindex:

A valid script file must contain a trans function which accepts a PSy object as an argument and returns a PSy object, i.e.:

>>> def trans(psy):
...     # ...
...     return psy

It is up to the script what it does with the PSy object. The example below does the same thing as the example in the Interactive section.

>>> def trans(psy):
...     from psyclone.transformations import OMPParallelLoopTrans
...     invoke = psy.invokes.get('invoke_0_v3_kernel_type')
...     schedule = invoke.schedule
...     ol = OMPParallelLoopTrans()
...     ol.apply(schedule.children[0])
...     return psy

In the gocean1.0 API (and in the future the lfric (dynamo0.3) API) an optional trans_alg function may also be supplied. This function accepts PSyIR (reprenting the algorithm layer) as an argument and returns PSyIR i.e.:

>>> def trans_alg(psyir):
...     # ...
...    return psyir

As with the trans() function it is up to the script what it does with the algorithm PSyIR. Note that the trans_alg() script is applied to the algorithm layer before the PSy-layer is generated so any changes applied to the algorithm layer will be reflected in the PSy object that is passed to the trans() function.

For example, if the trans_alg() function in the script merged two invoke calls into one then the Alg object passed to the trans() function of the script would only contain one schedule associated with the merged invoke.

Of course the script may apply as many transformations as is required for a particular algorithm and/or schedule and may apply transformations to all the schedules (i.e. invokes and/or kernels) contained within the PSy layer.

Examples of the use of transformation scripts can be found in many of the examples, such as examples/lfric/eg3 and examples/lfric/scripts. Please read the examples/lfric/README file first as it explains how to run the examples (and see also the examples/check_examples script).

An example of the use of a script making use of the trans_alg() function can be found in examples/gocean/eg7.

OpenMP

OpenMP is added to a code by using transformations. The OpenMP transformations currently supported allow the addition of:

  • an OpenMP Parallel directive

  • an OpenMP Target directive

  • an OpenMP Declare Target directive

  • an OpenMP Do/For/Loop directive

  • an OpenMP Single directive

  • an OpenMP Master directive

  • an OpenMP Taskloop directive

  • multiple OpenMP Taskwait directives; and

  • an OpenMP Parallel Do directive.

The generic versions of these transformations (i.e. ones that theoretically work for all APIs) were given in the Standard Functionality section. The API-specific versions of these transformations are described in the API-specific sections of this document.

Reductions

PSyclone supports parallel scalar reductions. If a scalar reduction is specified in the Kernel metadata (see the API-specific sections for details) then PSyclone ensures the appropriate reduction is performed.

In the case of distributed memory, PSyclone will add GlobalSum’s at the appropriate locations. As can be inferred by the name, only “summation” reductions are currently supported for distributed memory.

In the case of an OpenMP parallel loop the standard reduction support will be used by default. For example

!$omp parallel do, reduction(+:x)
!loop
!$omp end parallel do

OpenMP reductions do not guarantee to give bit reproducible results for different runs of the same problem even if the same problem is run using the same resources. The reason for this is that the order in which data is reduced is not mandated.

Therefore, an additional reprod option has been added to the OpenMP Do transformation. If the reprod option is set to “True” then the OpenMP reduction support is replaced with local per-thread reductions which are reduced serially after the loop has finished. This implementation guarantees to give bit-wise reproducible results for different runs of the same problem using the same resources, but will not bit-wise compare if the code is rerun with different numbers of OpenMP threads.

Restrictions

If two reductions are used within an OpenMP region and the same variable is used for both reductions then PSyclone will raise an exception. In this case the solution is to use a different variable for each reduction.

PSyclone does not support (distributed-memory) halo swaps or global sums within OpenMP parallel regions. Attempting to create a parallel region for a set of nodes that includes halo swaps or global sums will produce an error. In such cases it may be possible to re-order the nodes in the Schedule using the MoveTrans transformation.

OpenMP Tasking

PSyclone supports OpenMP Tasking, through the OMPTaskloopTrans and OMPTaskwaitTrans transformations. OMPTaskloopTrans transformations can be applied to loops, whilst the OMPTaskwaitTrans operator is applied to an OpenMP Parallel Region, and computes the dependencies caused by Taskloops, and adds OpenMP Taskwait statements to satisfy those dependencies. An example of using OpenMP tasking is available in PSyclone/examples/nemo/eg1/openmp_taskloop_trans.py.

OpenCL

OpenCL is added to a code by using the GOOpenCLTrans transformation (see the Standard Functionality Section above). Currently this transformation is only supported for the GOcean1.0 API and is applied to the whole InvokeSchedule of an Invoke. This transformation will add an OpenCL driver infrastructure to the PSy layer and generate an OpenCL kernel for each of the Invoke kernels. This means that all kernels in that Invoke will be executed on the OpenCL device. The PSy-layer OpenCL code generated by PSyclone is still Fortran and makes use of the FortCL library (https://github.com/stfc/FortCL) to access OpenCL functionality. It also relies upon the device acceleration support provided by the dl_esm_inf library (https://github.com/stfc/dl_esm_inf).

Note

The generated OpenCL kernels are written in a file called opencl_kernels_<index>.cl where the index keeps increasing if the file name already exist.

The GOOpenCLTrans transformation accepts an options argument with a map of optional parameters to tune the OpenCL host code in the PSy layer. These options will be attached to the transformed InvokeSchedule. The current available options are:

Option

Description

Default

end_barrier

Whether a synchronization barrier should be placed at the end of the Invoke.

True

enable_profiling

Enables the profiling of OpenCL Kernels.

False

out_of_order

Allows the OpenCL implementation to execute the enqueued kernels out-of-order.

False

Additionally, each individual kernel (inside the Invoke that is going to be transformed) also accepts a map of options which are provided by the set_opencl_options() method of the Kern object. This can affect both the driver layer and/or the OpenCL kernels. The current available options are:

Option

Description

Default

local_size

Number of work-items to group together in a work-group execution (kernel instances executed at the same time).

64

queue_number

The identifier of the OpenCL command_queue to which the kernel should be submitted. If the kernel has a dependency on another kernel submitted to a different command_queue a barrier will be added to guarantee the execution order.

1

Below is an example of a PSyclone script that uses a GOOpenCLTrans with multiple InvokeSchedule and kernel-specific optimization options.

 1from psyclone.psyir.transformations import \
 2    FoldConditionalReturnExpressionsTrans
 3from psyclone.domain.gocean.transformations import GOOpenCLTrans, \
 4    GOMoveIterationBoundariesInsideKernelTrans
 5
 6
 7def trans(psy):
 8    '''
 9    Transformation routine for use with PSyclone. Applies the OpenCL
10    transform to the first Invoke in the psy object.
11
12    :param psy: the PSy object which this script will transform.
13    :type psy: :py:class:`psyclone.psyGen.PSy`
14    :returns: the transformed PSy object.
15    :rtype: :py:class:`psyclone.psyGen.PSy`
16
17    '''
18    ocl_trans = GOOpenCLTrans()
19    fold_trans = FoldConditionalReturnExpressionsTrans()
20    move_boundaries_trans = GOMoveIterationBoundariesInsideKernelTrans()
21
22    # Get the Schedule associated with the first Invoke
23    invoke = psy.invokes.invoke_list[0]
24    sched = invoke.schedule
25
26    # Provide kernel-specific OpenCL optimization options
27    for idx, kern in enumerate(sched.kernels()):
28        # Move the PSy-layer loop boundaries inside the kernel as a kernel
29        # mask, this allows to iterate through the whole domain
30        move_boundaries_trans.apply(kern)
31        # Change the syntax to remove the return statements introduced by the
32        # previous transformation
33        fold_trans.apply(kern.get_kernel_schedule())
34        # Specify the OpenCL queue and workgroup size of the kernel
35        # In this case we dispatch each kernel in a different queue to check
36        # that the output code has the necessary barriers to guarantee the
37        # kernel execution order.
38        kern.set_opencl_options({"queue_number": idx+1, 'local_size': 4})
39
40    # Transform the Schedule
41    ocl_trans.apply(sched, options={"end_barrier": True})

OpenCL delays the decision of which and where kernels will execute until run-time, therefore it is important to use the environment variables provided by FortCL and DL_ESM_INF to inform how things should execute. Specifically:

  • FORTCL_KERNELS_FILE: Point to the file containing the kernels to execute, they can be compiled ahead-of-time or providing the source for JIT compilation. To link more than a single kernel, one must merge all the kernels generated by PSyclone in a single source file.

  • FORTCL_PLATFORM: If the system has more than 1 OpenCL platform. This environment variable may be used to select which platform on which to execute the kernels.

  • DL_ESM_ALIGNMENT: When using OpenCL <= 1.2 the local_size should be exactly divisible by the total size. If this is not the case some implementations fail silently. A way to solve this issue is to set the DL_ESM_ALIGNMENT variable to be equal to the local size.

Note

The OpenCL generation can be combined with distributed memory generation. In the case where there is more than one accelerator available on each node, the PSyclone configuration file parameter OCL_DEVICES_PER_NODE has to be set to the appropriate value and the number of MPI-ranks-per-node set by the mpirun command has to match this value accordingly.

For instance if there are 2 accelerators per nodes, psyclone.cfg should have OCL_DEVICES_PER_NODE=2 and the program must be executed with mpirun -n <total_ranks> -ppn 2 ./application (Note: -ppn is an Intel MPI specific parameter, use equivalent configuration parameters for other MPI implementations.)

For example, an execution of a PSyclone generated OpenCL code using all the mentioned run-time configuration options could look something like:

FORTCL_PLATFORM=3 FORTCL_KERNELS_FILE=allkernels.cl DL_ESM_ALIGNMENT=64 \
mpirun -n 2 ./application.exe

OpenACC

PSyclone supports the generation of code targetting GPUs through the addition of OpenACC directives. This is achieved by a user applying various OpenACC transformations to the PSyIR before the final Fortran code is generated. The steps to parallelisation are very similar to those in OpenMP with the added complexity of managing the movement of data to and from the GPU device. For the latter task PSyclone provides the ACCDataTrans and ACCEnterDataTrans transformations, as described in the Standard Functionality Section above. These two transformations add statically- and dynamically-scoped data regions, respectively. The former manages what data is on the remote device for a specific section of code while the latter allows run-time control of data movement. This second option is essential for minimising data movement as, without it, PSyclone-generated code would move data to and from the device upon every entry/exit of an Invoke. The first option is mainly provided as an aid to incremental porting and/or debugging of an OpenACC application as it provides explicit control over what data is present on a device for a given (part of an) Invoke routine.

The PGI compiler provides an alternative approach to controlling data movement through its ‘unified memory’ option (-ta=tesla:managed). When this is enabled the compiler itself takes on the task of ensuring that data is copied to/from the GPU when required. (Note that this approach can struggle with Fortran code containing derived types however.)

As well as ensuring the correct data is copied to and from the remote device, OpenACC directives must also be added to a code in order to tell the compiler how it should be parallelised. PSyclone provides the ACCKernelsTrans, ACCParallelTrans and ACCLoopTrans transformations for this purpose. The simplest of these is ACCKernelsTrans (currently only supported for the NEMO and Dynamo0.3 APIs) which encloses the code represented by a sub-tree of the PSyIR within an OpenACC kernels region. This essentially gives free-reign to the compiler to automatically parallelise any suitable loops within the specified region. An example of the use of ACCDataTrans and ACCKernelsTrans may be found in PSyclone/examples/nemo/eg3 and an example of ACCKernelsTrans may be found in PSyclone/examples/lfric/eg14.

However, as with any “automatic” approach, a more performant solution can almost always be obtained by providing the compiler with more explicit direction on how to parallelise the code. The ACCParallelTrans and ACCLoopTrans transformations allow the user to define thread-parallel regions and, within those, define which loops should be parallelised. For an example of their use please see PSyclone/examples/gocean/eg2 or PSyclone/examples/lfric/eg14.

In order for a given section of code to be executed on a GPU, any routines called from within that section must also have been compiled for the GPU. This then requires either that any such routines are in-lined or that the OpenACC routine directive be added to any such routines. This situation will occur routinely in those PSyclone APIs that use the PSyKAl separation of concerns since the user-supplied kernel routines are called from within PSyclone-generated loops in the PSy layer. PSyclone therefore provides the ACCRoutineTrans transformation which, given a Kernel node in the PSyIR, creates a new version of that kernel with the routine directive added. See either PSyclone/examples/gocean/eg2 or PSyclone/examples/lfric/eg14 for an example (although please note that this transformation is not yet fully working for kernels in the LFRic (Dynamo0.3) API - see #1724).

SIR

It is currently not possible for PSyclone to output SIR code without using a script. Two examples of such scripts are given in example 4 for the NEMO API, one of which includes transformations to remove PSyIR intrinsics, hoist code out of a loop, translate array-index notation into explicit loops and translate a single access to an array dimension to a one-trip loop (to make the code suitable for the SIR backend).