Transformations

As discussed in the previous section, transformations can be applied to PSyclone’s internal representation (PSyIR) to modify it. Typically transformations will be used to optimise the Algorithm, PSy and/or Kernel layer(s) for a particular architecture, however transformations could be added for other reasons, such as to aid debugging or for performance monitoring.

Finding

Transformations can be imported directly, but the user needs to know what transformations are available. A helper class TransInfo is provided to show the available transformations

Note

The directory layout of PSyclone is currently being restructured. As a result of this some transformations are already in the new locations, while others have not been moved yet. Transformations in the new locations can at the moment not be found using the TransInfo approach, and need to be imported directly from the path indicated in the documentation.

class psyclone.psyGen.TransInfo(module=None, base_class=None)[source]

This class provides information about, and access, to the available transformations in this implementation of PSyclone. New transformations will be picked up automatically as long as they subclass the abstract Transformation class.

For example:

>>> from psyclone.psyGen import TransInfo
>>> t = TransInfo()
>>> print(t.list)
There is 1 transformation available:
  1: SwapTrans, A test transformation
>>> # accessing a transformation by index
>>> trans = t.get_trans_num(1)
>>> # accessing a transformation by name
>>> trans = t.get_trans_name("SwapTrans")

get_trans_name(name)[source]: return the transformation with this name (use list() first to see available transformations)

get_trans_num(number)[source]: return the transformation with this number (use list() first to see available transformations)

property list: return a string with a human readable list of the available transformations

property num_trans: return the number of transformations available

Standard Functionality

Each transformation must provide at least two functions for the user: one for validation, i.e. to verify that a certain transformation can be applied, and one to actually apply the transformation. They are described in detail in the overview of all transformations, but the following general guidelines apply.

Validation

Each transformation provides a function validate. This function can be called by the user, and it will raise an exception if the transformation can not be applied (and otherwise will return nothing). Validation will always be called when a transformation is applied. The parameters for validate can change from transformation to transformation, but each validate function accepts a parameter options. This parameter is either None, or a dictionary of string keys, that will provide additional parameters to the validation process. For example, some validation functions allow part of the validation process to be disabled in order to allow the HPC expert to apply a transformation that they know to be safe, even if the more general validation process might reject it. Those parameters are documented for each transformation, and will show up as a parameter, e.g.: options["node-type-check"]. As a simple example:

# The validation might reject the application, but in this
# specific case it is safe to apply the transformation,
# so disable the node type check:
my_transform.validate(node, {"node-type-check": False})

Application

Each transformation provides a function apply which will apply the transformation. It will first validate the transform by calling the validate function. Each apply function takes the same options parameter as the validate function described above. Besides potentially modifying the validation process, optional parameters for the transformation are also provided this way. A simple example:

kctrans = Dynamo0p3KernelConstTrans()
kctrans.apply(kernel, {"element_order": 0, "quadrature": True})

The same options dictionary will be used when calling validate.

Available transformations

Some transformations are generic as the schedule structure is independent of the API, however it often makes sense to specialise these for a particular API by adding API-specific errors checks. Some transformations are API-specific. Currently these different types of transformation are indicated by their names.

The generic transformations currently available are listed in alphabetical order below (a number of these have specialisations which can be found in the API-specific sections).

Note

PSyclone currently only supports OpenCL and KernelImportsToArguments transformations for the GOcean 1.0 API, the OpenACC Data transformation is limited to the NEMO and GOcean 1.0 APIs and the OpenACC Kernels transformation is limited to the NEMO and LFRic (Dynamo0.3) APIs.

Note

The directory layout of PSyclone is currently being restructured. As a result of this some transformations are already in the new locations, while others have not been moved yet.

class psyclone.psyir.transformations.Abs2CodeTrans[source]

Provides a transformation from a PSyIR ABS Operator node to equivalent code in a PSyIR tree. Validity checks are also performed.

The transformation replaces

R = ABS(X)

with the following logic:

IF X < 0.0:
    R = X*-1.0
ELSE:
    R = X

apply(node, options=None)[source]

Apply the ABS intrinsic conversion transformation to the specified node. This node must be an ABS UnaryOperation. The ABS UnaryOperation is converted to equivalent inline code. This is implemented as a PSyIR transform from:

R = ... ABS(X) ...

to:

tmp_abs = X
if tmp_abs < 0.0:
    res_abs = tmp_abs*-1.0
else:
    res_abs = tmp_abs
R = ... res_abs ...

where X could be an arbitrarily complex PSyIR expression and ... could be arbitrary PSyIR code.

This transformation requires the operation node to be a descendent of an assignment and will raise an exception if this is not the case.

Parameters:

node (psyclone.psyir.nodes.UnaryOperation) – an ABS UnaryOperation node.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.

Warning

This transformation assumes that the ABS Intrinsic acts on PSyIR Real scalar data and does not check that this is not the case. Once issue #658 is on master then this limitation can be fixed.

class psyclone.transformations.ACCDataTrans[source]

Add an OpenACC data region around a list of nodes in the PSyIR. COPYIN, COPYOUT and COPY clauses are added as required.

For example:

>>> from psyclone.parse.algorithm import parse
>>> from psyclone.psyGen import PSyFactory
>>> api = "nemo"
>>> ast, invokeInfo = parse(NEMO_SOURCE_FILE, api=api)
>>> psy = PSyFactory(api).create(invokeInfo)
>>>
>>> from psyclone.transformations import ACCKernelsTrans, ACCDataTrans
>>> ktrans = ACCKernelsTrans()
>>> dtrans = ACCDataTrans()
>>>
>>> schedule = psy.invokes.get('tra_adv').schedule
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())
>>>
>>> # Add a kernels construct for execution on the device
>>> kernels = schedule.children[9]
>>> ktrans.apply(kernels)
>>>
>>> # Enclose the kernels in a data construct
>>> kernels = schedule.children[9]
>>> dtrans.apply(kernels)

apply(node, options=None)[source]

Put the supplied node or list of nodes within an OpenACC data region.

Parameters:

node ((list of) psyclone.psyir.nodes.Node) – the PSyIR node(s) to enclose in the data region.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.

class psyclone.transformations.ACCEnterDataTrans[source]

Adds an OpenACC “enter data” directive to a Schedule. For example:

>>> from psyclone.parse.algorithm import parse
>>> from psyclone.psyGen import PSyFactory
>>> api = "gocean1.0"
>>> ast, invokeInfo = parse(GOCEAN_SOURCE_FILE, api=api)
>>> psy = PSyFactory(api).create(invokeInfo)
>>>
>>> from psyclone.transformations import         ACCEnterDataTrans, ACCLoopTrans, ACCParallelTrans
>>> dtrans = ACCEnterDataTrans()
>>> ltrans = ACCLoopTrans()
>>> ptrans = ACCParallelTrans()
>>>
>>> schedule = psy.invokes.get('invoke_0').schedule
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())
>>>
>>> # Apply the OpenACC Loop transformation to *every* loop in the schedule
>>> for child in schedule.children[:]:
...     ltrans.apply(child)
>>>
>>> # Enclose all of these loops within a single OpenACC parallel region
>>> ptrans.apply(schedule)
>>>
>>> # Add an enter data directive
>>> dtrans.apply(schedule)
>>>
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())

apply(sched, options=None)[source]

Adds an OpenACC “enter data” directive to the invoke associated with the supplied Schedule. Any fields accessed by OpenACC kernels within this schedule will be added to this data region in order to ensure they remain on the target device.

Parameters:

sched (sub-class of psyclone.psyir.nodes.Schedule) – schedule to which to add an “enter data” directive.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.

class psyclone.transformations.ACCKernelsTrans[source]

Enclose a sub-set of nodes from a Schedule within an OpenACC kernels region (i.e. within “!$acc kernels” … “!$acc end kernels” directives).

For example:

>>> from psyclone.parse.algorithm import parse
>>> from psyclone.psyGen import PSyFactory
>>> api = "nemo"
>>> ast, invokeInfo = parse(NEMO_SOURCE_FILE, api=api)
>>> psy = PSyFactory(api).create(invokeInfo)
>>>
>>> from psyclone.transformations import ACCKernelsTrans
>>> ktrans = ACCKernelsTrans()
>>>
>>> schedule = psy.invokes.get('tra_adv').schedule
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())
>>> kernels = schedule.children[9]
>>> # Transform the kernel
>>> ktrans.apply(kernels)

apply(node, options=None)[source]

Enclose the supplied list of PSyIR nodes within an OpenACC Kernels region.

Parameters:

node ((a list of) psyclone.psyir.nodes.Node) – a node or list of nodes in the PSyIR to enclose.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.
options["default_present"] (bool) – whether or not the kernels region should have the ‘default present’ attribute (indicating that data is already on the accelerator). When using managed memory this option should be False.

class psyclone.transformations.ACCLoopTrans[source]

Adds an OpenACC loop directive to a loop. This directive must be within the scope of some OpenACC Parallel region (at code-generation time).

For example:

>>> from psyclone.parse.algorithm import parse
>>> from psyclone.parse.utils import ParseError
>>> from psyclone.psyGen import PSyFactory
>>> from psyclone.errors import GenerationError
>>> api = "gocean1.0"
>>> ast, invokeInfo = parse(GOCEAN_SOURCE_FILE, api=api)
>>> psy = PSyFactory(api).create(invokeInfo)
>>>
>>> from psyclone.psyGen import TransInfo
>>> t = TransInfo()
>>> ltrans = t.get_trans_name('ACCLoopTrans')
>>> rtrans = t.get_trans_name('ACCParallelTrans')
>>>
>>> schedule = psy.invokes.get('invoke_0').schedule
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())
>>>
>>> # Apply the OpenACC Loop transformation to *every* loop in the schedule
>>> for child in schedule.children[:]:
...     ltrans.apply(child)
>>>
>>> # Enclose all of these loops within a single OpenACC parallel region
>>> rtrans.apply(schedule)
>>>

apply(node, options=None)[source]

Apply the ACCLoop transformation to the specified node. This node must be a Loop since this transformation corresponds to inserting a directive immediately before a loop, e.g.:

!$ACC LOOP
do ...
   ...
end do

At code-generation time (when psyclone.psyir.nodes.ACCLoopDirective.gen_code() is called), this node must be within (i.e. a child of) a PARALLEL region.

Parameters:

node (psyclone.psyir.nodes.Loop) – the supplied node to which we will apply the Loop transformation.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.
options["collapse"] (int) – number of nested loops to collapse.
options["independent"] (bool) – whether to add the “independent” clause to the directive (not strictly necessary within PARALLEL regions).
options["sequential"] (bool) – whether to add the “seq” clause to the directive.
options["gang"] (bool) – whether to add the “gang” clause to the directive.
options["vector"] (bool) – whether to add the “vector” clause to the directive.

class psyclone.transformations.ACCParallelTrans(default_present=True)[source]

Create an OpenACC parallel region by inserting an ‘acc parallel’ directive.

>>> from psyclone.psyGen import TransInfo
>>> from psyclone.psyir.frontend.fortran import FortranReader
>>> from psyclone.psyir.backend.fortran import FortranWriter
>>> from psyclone.psyir.nodes import Loop
>>> psyir = FortranReader().psyir_from_source("""
... program do_loop
...     real, dimension(10) :: A
...     integer i
...     do i = 1, 10
...       A(i) = i
...     end do
... end program do_loop
... """)
>>> ptrans = TransInfo().get_trans_name('ACCParallelTrans')
>>>
>>> # Enclose the loop within a OpenACC PARALLEL region
>>> ptrans.apply(psyir.walk(Loop))
>>> print(FortranWriter()(psyir))
program do_loop
  real, dimension(10) :: a
  integer :: i

  !$acc parallel default(present)
  do i = 1, 10, 1
    a(i) = i
  enddo
  !$acc end parallel

end program do_loop

apply(target_nodes, options=None)[source]

Encapsulate given nodes with the ACCParallelDirective.

Parameters:

target_nodes (psyclone.psyir.nodes.Node | List[psyclone.psyir.nodes.Node]) – a single Node or a list of Nodes.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.
options["node-type-check"] (bool) – this flag controls if the type of the nodes enclosed in the region should be tested to avoid using unsupported nodes inside a region.
options["default_present"] (bool) – this flag controls if the inserted directive should include the default_present clause.

class psyclone.psyir.transformations.ArrayRange2LoopTrans[source]

Provides a transformation from a PSyIR Array Range to a PSyIR Loop. For example:

>>> from psyclone.parse.algorithm import parse
>>> from psyclone.psyGen import PSyFactory
>>> api = "nemo"
>>> filename = "tra_adv_compute.F90"
>>> ast, invoke_info = parse(filename, api=api)
>>> psy = PSyFactory(api).create(invoke_info)
>>> schedule = psy.invokes.invoke_list[0].schedule
>>>
>>> from psyclone.psyir.nodes import Assignment
>>> from psyclone.psyir.transformations import ArrayRange2LoopTrans,     >>>     TransformationError
>>>
>>> print(schedule.view())
>>> trans = ArrayRange2LoopTrans()
>>> for assignment in schedule.walk(Assignment):
>>>     while True:
>>>         try:
>>>             trans.apply(assignment)
>>>         except TransformationError:
>>>             break
>>> print(schedule.view())

apply(node, options=None)[source]

Apply the ArrayRange2Loop transformation to the specified node. The node must be an assignment. The rightmost range node in each array within the assignment is replaced with a loop index and the assignment is placed within a loop iterating over that index. The bounds of the loop are determined from the bounds of the array range on the left hand side of the assignment.

Parameters:

node (psyclone.psyir.nodes.Assignment) – an Assignment node.
options["allow_string"] (bool) – whether to allow the transformation on a character type array range. Defaults to False.

class psyclone.psyir.transformations.ChunkLoopTrans[source]

Apply a chunking transformation to a loop (in order to permit a chunked parallelisation). For example:

>>> from psyclone.psyir.frontend.fortran import FortranReader
>>> from psyclone.psyir.nodes import Loop
>>> from psyclone.psyir.transformations import ChunkLoopTrans
>>> psyir = FortranReader().psyir_from_source("""
... subroutine sub()
...     integer :: ji, tmp(100)
...     do ji=1, 100
...         tmp(ji) = 2 * ji
...     enddo
... end subroutine sub""")
>>> loop = psyir.walk(Loop)[0]
>>> ChunkLoopTrans().apply(loop)

will generate:

subroutine sub()
    integer :: ji
    integer, dimension(100) :: tmp
    integer :: ji_el_inner
    integer :: ji_out_var
    do ji_out_var = 1, 100, 32
        ji_el_inner = MIN(ji_out_var + (32 - 1), 100)
        do ji = ji_out_var, ji_el_inner, 1
            tmp(ji) = 2 * ji
        enddo
    enddo
end subroutine sub

apply(node, options=None)[source]

Converts the given Loop node into a nested loop where the outer loop is over chunks and the inner loop is over each individual element of the chunk.

Parameters:

node (psyclone.psyir.nodes.Loop) – the loop to transform.
options (Optional[Dict[str, Any]]) – a dict with options for transformations.
options["chunksize"] (int) – The size to chunk over for this transformation. If not specified, the value 32 is used.

class psyclone.transformations.ColourTrans[source]

Apply a colouring transformation to a loop (in order to permit a subsequent parallelisation over colours). For example:

>>> invoke = ...
>>> schedule = invoke.schedule
>>>
>>> ctrans = ColourTrans()
>>>
>>> # Colour all of the loops
>>> for child in schedule.children:
>>>     ctrans.apply(child)
>>>
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())

apply(node, options=None)[source]

Converts the Loop represented by node into a nested loop where the outer loop is over colours and the inner loop is over cells of that colour.

Parameters:

node (psyclone.psyir.nodes.Loop) – the loop to transform.
options (Optional[Dict[str, Any]]) – options for the transformation.

class psyclone.psyir.transformations.DotProduct2CodeTrans[source]

Provides a transformation from a PSyIR DOT_PRODUCT Operator node to equivalent code in a PSyIR tree. Validity checks are also performed.

If R is a scalar and A, and B have dimension N, The transformation replaces:

R = ... DOT_PRODUCT(A,B) ...

with the following code:

TMP = 0.0
do I=1,N
    TMP = TMP + A(i)*B(i)
R = ... TMP ...

For example:

>>> from psyclone.psyir.backend.fortran import FortranWriter
>>> from psyclone.psyir.frontend.fortran import FortranReader
>>> from psyclone.psyir.nodes import IntrinsicCall
>>> from psyclone.psyir.transformations import DotProduct2CodeTrans
>>> code = ("subroutine dot_product_test(v1,v2)\n"
...         "real,intent(in) :: v1(10), v2(10)\n"
...         "real :: result\n"
...         "result = dot_product(v1,v2)\n"
...         "end subroutine\n")
>>> psyir = FortranReader().psyir_from_source(code)
>>> trans = DotProduct2CodeTrans()
>>> trans.apply(psyir.walk(IntrinsicCall)[0])
>>> print(FortranWriter()(psyir))
subroutine dot_product_test(v1, v2)
  real, dimension(10), intent(in) :: v1
  real, dimension(10), intent(in) :: v2
  real :: result
  integer :: i
  real :: res_dot_product

  res_dot_product = 0.0
  do i = 1, 10, 1
    res_dot_product = res_dot_product + v1(i) * v2(i)
  enddo
  result = res_dot_product

end subroutine dot_product_test

apply(node, options=None)[source]

Apply the DOT_PRODUCT intrinsic conversion transformation to the specified node. This node must be a DOT_PRODUCT BinaryOperation. If the transformation is successful then an assignment which includes a DOT_PRODUCT BinaryOperation node is converted to equivalent inline code.

Parameters:

node (psyclone.psyir.nodes.BinaryOperation) – a DOT_PRODUCT Binary-Operation node.
options (dict of str:str or None) – a dictionary with options for transformations.

class psyclone.psyir.transformations.extract_trans.ExtractTrans(node_class=<class 'psyclone.psyir.nodes.extract_node.ExtractNode'>)[source]

This transformation inserts an ExtractNode or a node derived from ExtractNode into the PSyIR of a schedule. At code creation time this node will use the PSyData API to create code that can write the input and output parameters to a file. The node might also create a stand-alone driver program that can read the created file and then execute the instrumented region. Examples are given in the derived classes DynamoExtractTrans and GOceanExtractTrans.

After applying the transformation the Nodes marked for extraction are children of the ExtractNode. Nodes to extract can be individual constructs within an Invoke (e.g. Loops containing a Kernel or BuiltIn call) or entire Invokes. This functionality does not support distributed memory.

Parameters:: node_class (psyclone.psyir.nodes.ExtractNode or derived class) – The Node class of which an instance will be inserted into the tree (defaults to ExtractNode), but can be any derived class.

apply(nodes, options=None)

Apply this transformation to a subset of the nodes within a schedule - i.e. enclose the specified Nodes in the schedule within a single PSyData region.

Parameters:

nodes (psyclone.psyir.nodes.Node or list of psyclone.psyir.nodes.Node) – can be a single node or a list of nodes.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.
options["prefix"] (str) – a prefix to use for the PSyData module name (PREFIX_psy_data_mod) and the PSyDataType (PREFIX_PSYDATATYPE) - a “_” will be added automatically. It defaults to “”.
options["region_name"] ((str,str)) – an optional name to use for this PSyData area, provided as a 2-tuple containing a location name followed by a local name. The pair of strings should uniquely identify a region unless aggregate information is required (and is supported by the runtime library).

class psyclone.psyir.transformations.HoistLocalArraysTrans[source]

This transformation takes a Routine and promotes any local, ‘automatic’ arrays to Container scope:

>>> from psyclone.psyir.backend.fortran import FortranWriter
>>> from psyclone.psyir.frontend.fortran import FortranReader
>>> from psyclone.psyir.nodes import Assignment
>>> from psyclone.psyir.transformations import HoistLocalArraysTrans
>>> code = ("module test_mod\n"
...         "contains\n"
...         "  subroutine test_sub(n)\n"
...         "  integer :: i,j,n\n"
...         "  real :: a(n,n)\n"
...         "  real :: value = 1.0\n"
...         "  do i=1,n\n"
...         "    do j=1,n\n"
...         "      a(i,j) = value\n"
...         "    end do\n"
...         "  end do\n"
...         "  end subroutine test_sub\n"
...         "end module test_mod\n")
>>> psyir = FortranReader().psyir_from_source(code)
>>> hoist = HoistLocalArraysTrans()
>>> hoist.apply(psyir.walk(Routine)[0])
>>> print(FortranWriter()(psyir).lower())
module test_mod
  implicit none
  real, allocatable, dimension(:,:), private :: a
  public

  public :: test_sub

  contains
  subroutine test_sub(n)
    integer :: n
    integer :: i
    integer :: j
    real :: value = 1.0

    if (.not.allocated(a) .or. ubound(a, 1) /= n .or. ubound(a, 2) /= n) then
      if (allocated(a)) then
        deallocate(a)
      end if
      allocate(a(1 : n, 1 : n))
    end if
    do i = 1, n, 1
      do j = 1, n, 1
        a(i,j) = value
      enddo
    enddo

  end subroutine test_sub

end module test_mod

By default, the target routine will be rejected if it is found to contain an ACCRoutineDirective since this usually implies that the routine will be launched in parallel on the OpenACC device. This check can be disabled by setting ‘allow_accroutine’ to True in the options dictionary.

apply(node, options=None)[source]

Applies the transformation to the supplied Routine node, moving any local arrays up to Container scope and adding a suitable allocation when they are first accessed. If there are no local arrays or the supplied Routine is a program then this method does nothing.

Parameters:

node (psyclone.psyir.nodes.Routine) – target PSyIR node.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.
options["allow_accroutine"] (bool) – permit the target routine to contain an ACCRoutineDirective. These are forbidden by default because their presence usually indicates that the routine will be run in parallel on the OpenACC device.

class psyclone.psyir.transformations.HoistLoopBoundExprTrans[source]

This transformation moves complex bounds expressions out of the loop construct and places them in integer scalar assignments before the loop.

>>> from psyclone.psyir.backend.fortran import FortranWriter
>>> from psyclone.psyir.frontend.fortran import FortranReader
>>> from psyclone.psyir.nodes import Loop
>>> from psyclone.psyir.transformations import HoistTrans
>>> code = ("program test\n"
...         "  use mymod, only: mytype\n"
...         "  integer :: i,j,n\n"
...         "  real :: a(n)\n"
...         "  do i=mytype%start, UBOUND(a,1)\n"
...         "    a(i) = 1.0\n"
...         "  end do\n"
...         "end program\n")
>>> psyir = FortranReader().psyir_from_source(code)
>>> hoist = HoistLoopBoundExprTrans()
>>> hoist.apply(psyir.walk(Loop)[0])
>>> print(FortranWriter()(psyir))
program test
  use mymod, only : mytype
  integer :: i
  integer :: j
  integer :: n
  real, dimension(n) :: a
  integer :: loop_bound
  integer :: loop_bound_1

  loop_bound_1 = UBOUND(a, 1)
  loop_bound = mytype%start
  do i = loop_bound, loop_bound_1, 1
    a(i) = 1.0
  enddo

end program test

apply(node, options=None)[source]

Move complex bounds expressions out of the given loop construct and place them in integer scalar assignments before the loop.

Parameters:

node (psyclone.psyir.nodes.Loop) – target PSyIR loop.
options (Dict[str, Any]) – a dictionary with options for transformations.

class psyclone.psyir.transformations.HoistTrans[source]

This transformation takes an assignment and moves it outside of its parent loop if it is valid to do so. For example:

>>> from psyclone.psyir.backend.fortran import FortranWriter
>>> from psyclone.psyir.frontend.fortran import FortranReader
>>> from psyclone.psyir.nodes import Assignment
>>> from psyclone.psyir.transformations import HoistTrans
>>> code = ("program test\n"
...         "  integer :: i,j,n\n"
...         "  real :: a(n,n)\n"
...         "  real value\n"
...         "  do i=1,n\n"
...         "    value = 1.0\n"
...         "    do j=1,n\n"
...         "      a(i,j) = value\n"
...         "    end do\n"
...         "  end do\n"
...         "end program\n")
>>> psyir = FortranReader().psyir_from_source(code)
>>> hoist = HoistTrans()
>>> hoist.apply(psyir.walk(Assignment)[0])
>>> print(FortranWriter()(psyir))
program test
  integer :: i
  integer :: j
  integer :: n
  real, dimension(n,n) :: a
  real :: value

  value = 1.0
  do i = 1, n, 1
    do j = 1, n, 1
      a(i,j) = value
    enddo
  enddo

end program test

apply(node, options=None)[source]

Applies the hoist transformation to the supplied assignment node within a loop, moving the assignment outside of the loop if it is valid to do so. Issue #1445 will also look to extend this transformation to other types of node.

Parameters:

node (subclass of psyclone.psyir.nodes.Assignment) – target PSyIR node.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.

class psyclone.psyir.transformations.InlineTrans[source]

This transformation takes a Call (which may have a return value) and replaces it with the body of the target routine. It is used as follows:

>>> from psyclone.psyir.backend.fortran import FortranWriter
>>> from psyclone.psyir.frontend.fortran import FortranReader
>>> from psyclone.psyir.nodes import Call, Routine
>>> from psyclone.psyir.transformations import InlineTrans
>>> code = """
... module test_mod
... contains
...   subroutine run_it()
...     integer :: i
...     real :: a(10)
...     do i=1,10
...       a(i) = 1.0
...       call sub(a(i))
...     end do
...   end subroutine run_it
...   subroutine sub(x)
...     real, intent(inout) :: x
...     x = 2.0*x
...   end subroutine sub
... end module test_mod"""
>>> psyir = FortranReader().psyir_from_source(code)
>>> call = psyir.walk(Call)[0]
>>> inline_trans = InlineTrans()
>>> inline_trans.apply(call)
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(psyir.walk(Routine)[0].view())
>>> print(FortranWriter()(psyir.walk(Routine)[0]))
subroutine run_it()
  integer :: i
  real, dimension(10) :: a

  do i = 1, 10, 1
    a(i) = 1.0
    a(i) = 2.0 * a(i)
  enddo

end subroutine run_it

Warning

Routines/calls with any of the following characteristics are not supported and will result in a TransformationError:

the routine is not in the same file as the call;
the routine contains an early Return statement;
the routine contains a variable with UnknownInterface;
the routine contains a variable with StaticInterface;
the routine contains an UnsupportedType variable with ArgumentInterface;
the routine has a named argument;
the shape of any array arguments as declared inside the routine does not match the shape of the arrays being passed as arguments;
the routine accesses an un-resolved symbol;
the routine accesses a symbol declared in the Container to which it belongs.

Some of these restrictions will be lifted by #924.

apply(node, options=None)[source]

Takes the body of the routine that is the target of the supplied call and replaces the call with it.

Parameters:

node (psyclone.psyir.nodes.Routine) – target PSyIR node.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.

class psyclone.domain.common.transformations.KernelModuleInlineTrans[source]

Module-inlines (bring the subroutine to the same compiler-unit) the subroutine pointed by this Kernel. For example:

from psyclone.domain.common.transformations import \
        KernelModuleInlineTrans

inline_trans = KernelModuleInlineTrans()
inline_trans.apply(schedule.walk(CodedKern)[0])

print(schedule.parent.view())

Warning

Not all kernel subroutines can be module-inlined. This transformation will reject attempts to in-line kernels that access global data in the original module.

apply(node, options=None)[source]

Bring the kernel subroutine into this Container.

Parameters:

node (psyclone.psyGen.CodedKern) – the kernel to module-inline.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.

class psyclone.psyir.transformations.LoopFuseTrans[source]

Provides a generic loop-fuse transformation to two Nodes in the PSyIR of a Schedule after performing validity checks for the supplied Nodes. Examples are given in the descriptions of any children classes.

If loops have different named loop variables, when possible the loop variable of the second loop will be renamed to be the same as the first loop. This has the side effect that the second loop’s variable will no longer have its value modified, with the expectation that that value isn’t used anymore.

Note that the validation of this transformation still has several shortcomings, especially for domain API loops. Use at your own risk.

apply(node1, node2, options=None)[source]

Fuses two loops represented by psyclone.psyir.nodes.Node objects after performing validity checks.

If the two loops don’t have the same loop variable, the second loop’s variable (and references to it inside the loop) will be changed to be references to the first loop’s variable before merging. This has the side effect that the second loop’s variable will no longer have its value modified, with the expectation that that value isn’t used after.

Parameters:

node1 (psyclone.psyir.nodes.Node) – the first Node that is being checked.
node2 (psyclone.psyir.nodes.Node) – the second Node that is being checked.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.

class psyclone.psyir.transformations.LoopSwapTrans[source]

Provides a loop-swap transformation, e.g.:

DO j=1, m
    DO i=1, n

becomes:

DO i=1, n
    DO j=1, m

This transform is used as follows:

>>> from psyclone.parse.algorithm import parse
>>> from psyclone.psyGen import PSyFactory
>>> ast, invokeInfo = parse("shallow_alg.f90")
>>> psy = PSyFactory("gocean1.0").create(invokeInfo)
>>> schedule = psy.invokes.get('invoke_0').schedule
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())
>>>
>>> from psyclone.transformations import LoopSwapTrans
>>> swap = LoopSwapTrans()
>>> swap.apply(schedule.children[0])
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())

apply(node, options=None)[source]

The argument outer must be a loop which has exactly one inner loop. This transform then swaps the outer and inner loop.

Parameters:

outer (psyclone.psyir.nodes.Loop) – the node representing the outer loop.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.

Raises:

TransformationError – if the supplied node does not allow a loop swap to be done.

class psyclone.psyir.transformations.LoopTiling2DTrans[source]

Apply a 2D loop tiling transformation to a loop. For example:

>>> from psyclone.psyir.frontend.fortran import FortranReader
>>> from psyclone.psyir.nodes import Loop
>>> from psyclone.psyir.transformations import LoopTiling2DTrans
>>> psyir = FortranReader().psyir_from_source("""
... subroutine sub()
...     integer :: ji, tmp(100)
...     do i=1, 100
...       do j=1, 100
...         tmp(i, j) = 2 * tmp(i, j)
...       enddo
...     enddo
... end subroutine sub""")
>>> loop = psyir.walk(Loop)[0]
>>> LoopTiling2DTrans().apply(loop)

will generate:

subroutine sub()
    integer :: ji
    integer, dimension(100) :: tmp
    integer :: ji_el_inner
    integer :: ji_out_var
    do i_out_var = 1, 100, 32
      i_el_inner = MIN(i_out_var + (32 - 1), 100)
      do j_out_var = 1, 100, 32
        do i = i_out_var, i_el_inner, 1
          j_el_inner = MIN(j_out_var + (32 - 1), 100)
          do j = j_out_var, j_el_inner, 1
            tmp(i, j) = 2 * tmp(i, j)
          enddo
        enddo
      enddo
    enddo
end subroutine sub

apply(node, options=None)[source]

Converts the given 2D Loop construct into a tiled version of the nested loops.

Parameters:

node (psyclone.psyir.nodes.Loop) – the loop to transform.
options (Optional[Dict[str, Any]]) – a dict with options for transformations.
options["tilesize"] (int) – The size of the resulting tile, currently square tiles are always used. If not specified, the value 32 is used.

class psyclone.psyir.transformations.Matmul2CodeTrans[source]

Provides a transformation from a PSyIR MATMUL Operator node to equivalent code in a PSyIR tree. Validity checks are also performed.

For a matrix-vector multiplication, if the dimensions of R, A, and B are R(N), A(N,M), B(M), the transformation replaces:

R=MATMUL(A,B)

with the following code:

do i=1,N
    R(i) = 0.0
    do j=1,M
        R(i) = R(i) + A(i,j) * B(j)

For a matrix-matrix multiplication, if the dimensions of R, A, and B are R(P,M), A(P,N), B(N,M), the MATMUL is replaced with the following code:

do j=1,M
    do i=1,P
        R(i,j) = 0.0
        do ii=1,N
            R(i,j) = R(i,j) + A(i,ii) * B(ii,j)

Note that this transformation does not support the case where A is a rank-1 array.

apply(node, options=None)[source]

Apply the MATMUL intrinsic conversion transformation to the specified node. This node must be a MATMUL IntrinsicCall. The first argument must currently have two dimensions while the second must have either one or two dimensions. Each argument is permitted to have additional dimensions (i.e. more than 2) but in each case it is only the first one or two which may be ranges. Further, the ranges must currently be for the full index space for that dimension (i.e. array subsections are not supported). If the transformation is successful then an assignment which includes a MATMUL IntrinsicCall node is converted to equivalent inline code.

Parameters:

node (psyclone.psyir.nodes.IntrinsicCall) – a MATMUL IntrinsicCall node.
options (Optional[Dict[str, Any]]) – options for the transformation.

Note

This transformation is currently limited to translating the matrix vector form of MATMUL to equivalent PSyIR code.

class psyclone.psyir.transformations.Max2CodeTrans[source]

Provides a transformation from a PSyIR MAX Intrinsic node to equivalent code in a PSyIR tree. Validity checks are also performed (by a parent class).

The transformation replaces

R = MAX(A, B, C ...)

with the following logic:

R = A
if B > R:
    R = B
if C > R:
    R = C
...

apply(node, options=None)

Apply this utility transformation to the specified node. This node must be a MIN or MAX BinaryOperation or NaryOperation. The operation is converted to equivalent inline code. This is implemented as a PSyIR transform from:

R = ... [MIN or MAX](A, B, C ...) ...

to:

res = A
tmp = B
IF tmp [< or >] res:
    res = tmp
tmp = C
IF tmp [< or >] res:
    res = tmp
...
R = ... res ...

where A, B, C … could be arbitrarily complex PSyIR expressions and the ... before and after [MIN or MAX](A, B, C ...) can be arbitrary PSyIR code.

This transformation requires the operation node to be a descendent of an assignment and will raise an exception if this is not the case.

Parameters:

node (psyclone.psyir.nodes.BinaryOperation or psyclone.psyir.nodes.NaryOperation) – a MIN or MAX Binary- or Nary-Operation node.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.

Warning

This transformation assumes that the MAX Intrinsic acts on PSyIR Real scalar data and does not check that this is not the case. Once issue #658 is on master then this limitation can be fixed.

class psyclone.psyir.transformations.Maxval2LoopTrans[source]

Provides a transformation from a PSyIR MAXVAL IntrinsicCall node to an equivalent PSyIR loop structure that is suitable for running in parallel on CPUs and GPUs. Validity checks are also performed.

If MAXVAL contains a single positional argument which is an array, the maximum value of all of the elements in the array is returned in the the scalar R.

R = MAXVAL(ARRAY)

For example, if the array is two dimensional, the equivalent code for real data is:

R = -HUGE(R)
DO J=LBOUND(ARRAY,2),UBOUND(ARRAY,2)
  DO I=LBOUND(ARRAY,1),UBOUND(ARRAY,1)
    R = MAX(R, ARRAY(I,J))

If the mask argument is provided then the mask is used to determine whether the maxval is applied:

R = MAXVAL(ARRAY, mask=MOD(ARRAY, 2.0)==1)

If the array is two dimensional, the equivalent code for real data is:

R = -HUGE(R)
DO J=LBOUND(ARRAY,2),UBOUND(ARRAY,2)
  DO I=LBOUND(ARRAY,1),UBOUND(ARRAY,1)
    IF (MOD(ARRAY(I,J), 2.0)==1) THEN
      R = MAX(R, ARRAY(I,J))

The dimension argument is currently not supported and will result in a TransformationError exception being raised.

R = MAXVAL(ARRAY, dimension=2)

The array passed to MAXVAL may use any combination of array syntax, array notation, array sections and scalar bounds:

R = MAXVAL(ARRAY) ! array syntax
R = MAXVAL(ARRAY(:,:)) ! array notation
R = MAXVAL(ARRAY(1:10,lo:hi)) ! array sections
R = MAXVAL(ARRAY(1:10,:)) ! mix of array section and array notation
R = MAXVAL(ARRAY(1:10,2)) ! mix of array section and scalar bound

An example use of this transformation is given below:

>>> from psyclone.psyir.backend.fortran import FortranWriter
>>> from psyclone.psyir.frontend.fortran import FortranReader
>>> from psyclone.psyir.transformations import Maxval2LoopTrans
>>> code = ("subroutine maxval_test(array)\n"
...         "  real :: array(10,10)\n"
...         "  real :: result\n"
...         "  result = maxval(array)\n"
...         "end subroutine\n")
>>> psyir = FortranReader().psyir_from_source(code)
>>> sum_node = psyir.children[0].children[0].children[1]
>>> Maxval2LoopTrans().apply(sum_node)
>>> print(FortranWriter()(psyir))
subroutine maxval_test(array)
  real, dimension(10,10) :: array
  real :: result
  integer :: idx
  integer :: idx_1

  result = -HUGE(result)
  do idx = 1, 10, 1
    do idx_1 = 1, 10, 1
      result = MAX(result, array(idx_1,idx))
    enddo
  enddo

end subroutine maxval_test

apply(node, options=None)

Apply the array-reduction intrinsic conversion transformation to the specified node. This node must be one of these intrinsic operations which is converted to an equivalent loop structure.

Parameters:

node (psyclone.psyir.nodes.IntrinsicCall) – an array-reduction intrinsic.
options (Optional[Dict[str, Any]]) – options for the transformation.

class psyclone.psyir.transformations.Min2CodeTrans[source]

Provides a transformation from a PSyIR MIN Intrinsic node to equivalent code in a PSyIR tree. Validity checks are also performed (by a parent class).

The transformation replaces

R = MIN(A, B, C ...)

with the following logic:

R = A
if B < R:
    R = B
if C < R:
    R = C
...

apply(node, options=None)

Apply this utility transformation to the specified node. This node must be a MIN or MAX BinaryOperation or NaryOperation. The operation is converted to equivalent inline code. This is implemented as a PSyIR transform from:

R = ... [MIN or MAX](A, B, C ...) ...

to:

res = A
tmp = B
IF tmp [< or >] res:
    res = tmp
tmp = C
IF tmp [< or >] res:
    res = tmp
...
R = ... res ...

where A, B, C … could be arbitrarily complex PSyIR expressions and the ... before and after [MIN or MAX](A, B, C ...) can be arbitrary PSyIR code.

This transformation requires the operation node to be a descendent of an assignment and will raise an exception if this is not the case.

Parameters:

node (psyclone.psyir.nodes.BinaryOperation or psyclone.psyir.nodes.NaryOperation) – a MIN or MAX Binary- or Nary-Operation node.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.

Warning

This transformation assumes that the MIN Intrinsic acts on PSyIR Real scalar data and does not check that this is not the case. Once issue #658 is on master then this limitation can be fixed.

class psyclone.psyir.transformations.Minval2LoopTrans[source]

Provides a transformation from a PSyIR MINVAL IntrinsicCall node to an equivalent PSyIR loop structure that is suitable for running in parallel on CPUs and GPUs. Validity checks are also performed.

If MINVAL contains a single positional argument which is an array, the minimum value of all of the elements in the array is returned in the the scalar R.

R = MINVAL(ARRAY)

For example, if the array is two dimensional, the equivalent code for real data is:

R = HUGE(R)
DO J=LBOUND(ARRAY,2),UBOUND(ARRAY,2)
  DO I=LBOUND(ARRAY,1),UBOUND(ARRAY,1)
    R = MIN(R, ARRAY(I,J))

If the mask argument is provided then the mask is used to determine whether the minval is applied:

R = MINVAL(ARRAY, mask=MOD(ARRAY, 2.0)==1)

If the array is two dimensional, the equivalent code for real data is:

R = HUGE(R)
DO J=LBOUND(ARRAY,2),UBOUND(ARRAY,2)
  DO I=LBOUND(ARRAY,1),UBOUND(ARRAY,1)
    IF (MOD(ARRAY(I,J), 2.0)==1) THEN
      R = MIN(R, ARRAY(I,J))

The dimension argument is currently not supported and will result in a TransformationError exception being raised.

R = MINVAL(ARRAY, dimension=2)

The array passed to MINVAL may use any combination of array syntax, array notation, array sections and scalar bounds:

R = MINVAL(ARRAY) ! array syntax
R = MINVAL(ARRAY(:,:)) ! array notation
R = MINVAL(ARRAY(1:10,lo:hi)) ! array sections
R = MINVAL(ARRAY(1:10,:)) ! mix of array section and array notation
R = MINVAL(ARRAY(1:10,2)) ! mix of array section and scalar bound

For example:

>>> from psyclone.psyir.backend.fortran import FortranWriter
>>> from psyclone.psyir.frontend.fortran import FortranReader
>>> from psyclone.psyir.transformations import Minval2LoopTrans
>>> code = ("subroutine minval_test(array)\n"
...         "  real :: array(10,10)\n"
...         "  real :: result\n"
...         "  result = minval(array)\n"
...         "end subroutine\n")
>>> psyir = FortranReader().psyir_from_source(code)
>>> sum_node = psyir.children[0].children[0].children[1]
>>> Minval2LoopTrans().apply(sum_node)
>>> print(FortranWriter()(psyir))
subroutine minval_test(array)
  real, dimension(10,10) :: array
  real :: result
  integer :: idx
  integer :: idx_1

  result = HUGE(result)
  do idx = 1, 10, 1
    do idx_1 = 1, 10, 1
      result = MIN(result, array(idx_1,idx))
    enddo
  enddo

end subroutine minval_test

apply(node, options=None)

Apply the array-reduction intrinsic conversion transformation to the specified node. This node must be one of these intrinsic operations which is converted to an equivalent loop structure.

Parameters:

node (psyclone.psyir.nodes.IntrinsicCall) – an array-reduction intrinsic.
options (Optional[Dict[str, Any]]) – options for the transformation.

class psyclone.transformations.MoveTrans[source]

Provides a transformation to move a node in the tree. For example:

>>> from psyclone.parse.algorithm import parse
>>> from psyclone.psyGen import PSyFactory
>>> ast,invokeInfo=parse("dynamo.F90")
>>> psy=PSyFactory("dynamo0.3").create(invokeInfo)
>>> schedule=psy.invokes.get('invoke_v3_kernel_type').schedule
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())
>>>
>>> from psyclone.transformations import MoveTrans
>>> trans=MoveTrans()
>>> trans.apply(schedule.children[0], schedule.children[2],
...             options = {"position":"after")
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())

Nodes may only be moved to a new location with the same parent and must not break any dependencies otherwise an exception is raised.

apply(node, location, options=None)[source]

Move the node represented by node before location location (which is also a node) by default and after if the optional position argument is set to ‘after’.

Parameters:

node (psyclone.psyir.nodes.Node) – the node to be moved.
location (psyclone.psyir.nodes.Node) – node before or after which the given node should be moved.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.
options["position"] (str) – either ‘before’ or ‘after’.

Raises:

TransformationError – if the given node is not an instance of psyclone.psyir.nodes.Node
TransformationError – if the location is not valid.

class psyclone.domain.gocean.transformations.GOOpenCLTrans[source]

Switches on/off the generation of an OpenCL PSy layer for a given InvokeSchedule. Additionally, it will generate OpenCL kernels for each of the kernels referenced by the Invoke. For example:

>>> from psyclone.parse.algorithm import parse
>>> from psyclone.psyGen import PSyFactory
>>> API = "gocean1.0"
>>> FILENAME = "shallow_alg.f90" # examples/gocean/eg1
>>> ast, invoke_info = parse(FILENAME, api=API)
>>> psy = PSyFactory(API, distributed_memory=False).create(invoke_info)
>>> schedule = psy.invokes.get('invoke_0').schedule
>>> ocl_trans = GOOpenCLTrans()
>>> ocl_trans.apply(schedule)
>>> print(schedule.view())

apply(node, options=None)[source]

Apply the OpenCL transformation to the supplied GOInvokeSchedule. This causes PSyclone to generate an OpenCL version of the corresponding PSy-layer routine. The generated code makes use of the FortCL library (https://github.com/stfc/FortCL) in order to manage the OpenCL device directly from Fortran.

Parameters:

node (psyclone.psyGen.GOInvokeSchedule) – the InvokeSchedule to transform.
options (dict of str:value or None) – set of option to tune the OpenCL generation.
options["enable_profiling"] (bool) – whether or not to set up the OpenCL environment with the profiling option enabled.
options["out_of_order"] (bool) – whether or not to set up the OpenCL environment with the out_of_order option enabled.
options["end_barrier"] (bool) – whether or not to add an OpenCL barrier at the end of the transformed invoke.

class psyclone.transformations.OMPDeclareTargetTrans[source]

Adds an OpenMP declare target directive to the specified routine.

For example:

>>> from psyclone.psyir.frontend.fortran import FortranReader
>>> from psyclone.psyir.nodes import Loop
>>> from psyclone.transformations import OMPDeclareTargetTrans
>>>
>>> tree = FortranReader().psyir_from_source("""
...     subroutine my_subroutine(A)
...         integer, dimension(10, 10), intent(inout) :: A
...         integer :: i
...         integer :: j
...         do i = 1, 10
...             do j = 1, 10
...                 A(i, j) = 0
...             end do
...         end do
...     end subroutine
...     """
>>> omptargettrans = OMPDeclareTargetTrans()
>>> omptargettrans.apply(tree.walk(Routine)[0])

will generate:

subroutine my_subroutine(A)
    integer, dimension(10, 10), intent(inout) :: A
    integer :: i
    integer :: j
    !$omp declare target
    do i = 1, 10
        do j = 1, 10
            A(i, j) = 0
        end do
    end do
end subroutine

apply(node, options=None)[source]

Insert an OMPDeclareTargetDirective inside the provided routine.

Parameters:

node (psyclone.psyir.nodes.Routine) – the PSyIR routine to insert the directive into.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.

class psyclone.psyir.transformations.OMPLoopTrans(omp_directive='do', omp_schedule='auto')[source]

Adds an OpenMP directive to parallelise this loop. It can insert different directives such as “omp do/for”, “omp parallel do/for”, “omp teams distribute parallel do/for” or “omp loop” depending on the provided parameters. The OpenMP schedule to use can also be specified, but this will be ignored in case of the “omp loop” (as the ‘schedule’ clause is not valid for this specific directive). The configuration-defined ‘reprod’ parameter also specifies whether a manual reproducible reproduction is to be used. Note, reproducible in this case means obtaining the same results with the same number of OpenMP threads, not for different numbers of OpenMP threads.

Parameters:

omp_schedule (str) – the OpenMP schedule to use. Defaults to ‘auto’.
omp_directive (str) – choose which OpenMP loop directive to use. Defaults to “omp do”

For example:

>>> from psyclone.psyir.frontend.fortran import FortranReader
>>> from psyclone.psyir.backend.fortran import FortranWriter
>>> from psyclone.psyir.nodes import Loop
>>> from psyclone.transformations import OMPLoopTrans, OMPParallelTrans
>>>
>>> psyir = FortranReader().psyir_from_source("""
...     subroutine my_subroutine()
...         integer, dimension(10, 10) :: A
...         integer :: i
...         integer :: j
...         do i = 1, 10
...             do j = 1, 10
...                 A(i, j) = 0
...             end do
...         end do
...     end subroutine
...     """)
>>> loop = psyir.walk(Loop)[0]
>>> omplooptrans1 = OMPLoopTrans(omp_schedule="dynamic",
...                              omp_directive="paralleldo")
>>> omplooptrans1.apply(loop)
>>> print(FortranWriter()(psyir))
subroutine my_subroutine()
  integer, dimension(10,10) :: a
  integer :: i
  integer :: j

  !$omp parallel do default(shared), private(i,j), schedule(dynamic)
  do i = 1, 10, 1
    do j = 1, 10, 1
      a(i,j) = 0
    enddo
  enddo
  !$omp end parallel do

end subroutine my_subroutine

apply(node, options=None)[source]

Apply the OMPLoopTrans transformation to the specified PSyIR Loop.

Parameters:

node (psyclone.psyir.nodes.Node) – the supplied node to which we will apply the OMPLoopTrans transformation
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations and validation.
options["reprod"] (bool) – indicating whether reproducible reductions should be used. By default the value from the config file will be used.

property omp_directive

Returns:: the type of OMP directive that this transformation will insert.
Return type:: str

property omp_schedule

Returns:: the OpenMP schedule that will be specified by this transformation.
Return type:: str

class psyclone.transformations.OMPMasterTrans[source]

Create an OpenMP MASTER region by inserting directives. The most likely use case for this transformation is to wrap around task-based transformations. Note that adding this directive requires a parent OpenMP parallel region (which can be inserted by OMPParallelTrans), otherwise it will produce an error in generation-time.

For example:

>>> from psyclone.parse.algorithm import parse
>>> from psyclone.psyGen import PSyFactory
>>> api = "gocean1.0"
>>> ast, invokeInfo = parse(GOCEAN_SOURCE_FILE, api=api)
>>> psy = PSyFactory(api).create(invokeInfo)
>>>
>>> from psyclone.transformations import OMPParallelTrans, OMPMasterTrans
>>> mastertrans = OMPMasterTrans()
>>> paralleltrans = OMPParallelTrans()
>>>
>>> schedule = psy.invokes.get('invoke_0').schedule
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())
>>>
>>> # Enclose all of these loops within a single OpenMP
>>> # MASTER region
>>> mastertrans.apply(schedule.children)
>>> # Enclose all of these loops within a single OpenMP
>>> # PARALLEL region
>>> paralleltrans.apply(schedule.children)
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())

apply(target_nodes, options=None)

Apply this transformation to a subset of the nodes within a schedule - i.e. enclose the specified Loops in the schedule within a single parallel region.

Parameters:

target_nodes ((list of) psyclone.psyir.nodes.Node) – a single Node or a list of Nodes.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.
options["node-type-check"] (bool) – this flag controls if the type of the nodes enclosed in the region should be tested to avoid using unsupported nodes inside a region.

get_node_list(nodes)

This is a helper function for region based transformations. The parameter for any of those transformations is either a single node, a schedule, or a list of nodes. This function converts this into a list of nodes according to the parameter type. This function will always return a copy, to avoid issues e.g. if a child list of a node should be provided, and a transformation changes the order in this list (which would then also change the order of the nodes in the tree).

Parameters:

nodes (Union[psyclone.psyir.nodes.Node, psyclone.psyir.nodes.Schedule, List[psyclone.psyir.nodes.Node]) – can be a single node, a schedule or a list of nodes.
options (Optional[Dict[str,Any]]) – a dictionary with options for transformations.

Returns:

a list of nodes.

Return type:

List[psyclone.psyir.nodes.Node]

Raises:

TransformationError – if the supplied parameter is neither a single Node, nor a Schedule, nor a list of Nodes.

validate(node_list, options=None)

Check that the supplied list of Nodes are eligible to be put inside a parallel region.

Parameters:

node_list (list) – list of nodes to put into a parallel region
options – a dictionary with options for transformations. :type options: Optional[Dict[str, Any]]
options["node-type-check"] (bool) – this flag controls whether or not the type of the nodes enclosed in the region should be tested to avoid using unsupported nodes inside a region.

Raises:

TransformationError – if the supplied node is an InvokeSchedule rather than being within an InvokeSchedule.
TransformationError – if the supplied nodes are not all children of the same parent (siblings).

Note

PSyclone does not support (distributed-memory) halo swaps or global sums within OpenMP master regions. Attempting to create a master region for a set of nodes that includes halo swaps or global sums will produce an error. In such cases it may be possible to re-order the nodes in the Schedule such that the halo swaps or global sums are performed outside the single region. The MoveTrans transformation may be used for this.

class psyclone.transformations.OMPParallelLoopTrans(omp_directive='do', omp_schedule='auto')[source]

Adds an OpenMP PARALLEL DO directive to a loop.

For example:

>>> from psyclone.parse.algorithm import parse
>>> from psyclone.psyGen import PSyFactory
>>> ast, invokeInfo = parse("dynamo.F90")
>>> psy = PSyFactory("dynamo0.3").create(invokeInfo)
>>> schedule = psy.invokes.get('invoke_v3_kernel_type').schedule
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())
>>>
>>> from psyclone.transformations import OMPParallelLoopTrans
>>> trans = OMPParallelLoopTrans()
>>> trans.apply(schedule.children[0])
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())

apply(node, options=None)[source]

Apply an OMPParallelLoop Transformation to the supplied node (which must be a Loop). In the generated code this corresponds to wrapping the Loop with directives:

!$OMP PARALLEL DO ...
do ...
  ...
end do
!$OMP END PARALLEL DO

Parameters:

node (psyclone.f2pygen.DoGen) – the node (loop) to which to apply the transformation.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations and validation.

class psyclone.transformations.OMPParallelTrans[source]

Create an OpenMP PARALLEL region by inserting directives. For example:

>>> from psyclone.parse.algorithm import parse
>>> from psyclone.parse.utils import ParseError
>>> from psyclone.psyGen import PSyFactory
>>> from psyclone.errors import GenerationError
>>> api = "gocean1.0"
>>> ast, invokeInfo = parse(GOCEAN_SOURCE_FILE, api=api)
>>> psy = PSyFactory(api).create(invokeInfo)
>>>
>>> from psyclone.psyGen import TransInfo
>>> t = TransInfo()
>>> ltrans = t.get_trans_name('GOceanOMPLoopTrans')
>>> rtrans = t.get_trans_name('OMPParallelTrans')
>>>
>>> schedule = psy.invokes.get('invoke_0').schedule
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())
>>>
>>> # Apply the OpenMP Loop transformation to *every* loop
>>> # in the schedule
>>> for child in schedule.children:
>>>     ltrans.apply(child)
>>>
>>> # Enclose all of these loops within a single OpenMP
>>> # PARALLEL region
>>> rtrans.apply(schedule.children)
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())

apply(target_nodes, options=None)

Apply this transformation to a subset of the nodes within a schedule - i.e. enclose the specified Loops in the schedule within a single parallel region.

Parameters:

target_nodes ((list of) psyclone.psyir.nodes.Node) – a single Node or a list of Nodes.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.
options["node-type-check"] (bool) – this flag controls if the type of the nodes enclosed in the region should be tested to avoid using unsupported nodes inside a region.

get_node_list(nodes)

This is a helper function for region based transformations. The parameter for any of those transformations is either a single node, a schedule, or a list of nodes. This function converts this into a list of nodes according to the parameter type. This function will always return a copy, to avoid issues e.g. if a child list of a node should be provided, and a transformation changes the order in this list (which would then also change the order of the nodes in the tree).

Parameters:

nodes (Union[psyclone.psyir.nodes.Node, psyclone.psyir.nodes.Schedule, List[psyclone.psyir.nodes.Node]) – can be a single node, a schedule or a list of nodes.
options (Optional[Dict[str,Any]]) – a dictionary with options for transformations.

Returns:

a list of nodes.

Return type:

List[psyclone.psyir.nodes.Node]

Raises:

TransformationError – if the supplied parameter is neither a single Node, nor a Schedule, nor a list of Nodes.

validate(node_list, options=None)[source]

Perform OpenMP-specific validation checks.

Parameters:

node_list (list of psyclone.psyir.nodes.Node) – list of Nodes to put within parallel region.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.
options["node-type-check"] (bool) – this flag controls if the type of the nodes enclosed in the region should be tested to avoid using unsupported nodes inside a region.

Raises:

TransformationError – if the target Nodes are already within some OMP parallel region.

Note

PSyclone does not support (distributed-memory) halo swaps or global sums within OpenMP parallel regions. Attempting to create a parallel region for a set of nodes that includes halo swaps or global sums will produce an error. In such cases it may be possible to re-order the nodes in the Schedule such that the halo swaps or global sums are performed outside the parallel region. The MoveTrans transformation may be used for this.

class psyclone.transformations.OMPSingleTrans(nowait=False)[source]

Create an OpenMP SINGLE region by inserting directives. The most likely use case for this transformation is to wrap around task-based transformations. The parent region for this should usually also be a OMPParallelTrans.

Parameters:: nowait (bool) – whether to apply a nowait clause to this transformation. The default value is False

For example:

>>> from psyclone.parse.algorithm import parse
>>> from psyclone.psyGen import PSyFactory
>>> api = "gocean1.0"
>>> ast, invokeInfo = parse(GOCEAN_SOURCE_FILE, api=api)
>>> psy = PSyFactory(api).create(invokeInfo)
>>>
>>> from psyclone.transformations import OMPParallelTrans, OMPSingleTrans
>>> singletrans = OMPSingleTrans()
>>> paralleltrans = OMPParallelTrans()
>>>
>>> schedule = psy.invokes.get('invoke_0').schedule
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())
>>>
>>> # Enclose all of these loops within a single OpenMP
>>> # SINGLE region
>>> singletrans.apply(schedule.children)
>>> # Enclose all of these loops within a single OpenMP
>>> # PARALLEL region
>>> paralleltrans.apply(schedule.children)
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())

apply(node_list, options=None)[source]

Apply the OMPSingleTrans transformation to the specified node in a Schedule.

At code-generation time this node must be within (i.e. a child of) an OpenMP PARALLEL region. Code generation happens when OMPLoopDirective.gen_code() is called, or when the PSyIR tree is given to a backend.

If the keyword “nowait” is specified in the options, it will cause a nowait clause to be added if it is set to True, otherwise no clause will be added.

Parameters:

node_list ((a list of) psyclone.psyir.nodes.Node) – the supplied node or node list to which we will apply the OMPSingleTrans transformation
options (Optional[Dict[str, Any]]) – a list with options for transformations and validation.
options["nowait"] (bool) – indicating whether or not to use a nowait clause on this single region.

get_node_list(nodes)

This is a helper function for region based transformations. The parameter for any of those transformations is either a single node, a schedule, or a list of nodes. This function converts this into a list of nodes according to the parameter type. This function will always return a copy, to avoid issues e.g. if a child list of a node should be provided, and a transformation changes the order in this list (which would then also change the order of the nodes in the tree).

Parameters:

nodes (Union[psyclone.psyir.nodes.Node, psyclone.psyir.nodes.Schedule, List[psyclone.psyir.nodes.Node]) – can be a single node, a schedule or a list of nodes.
options (Optional[Dict[str,Any]]) – a dictionary with options for transformations.

Returns:

a list of nodes.

Return type:

List[psyclone.psyir.nodes.Node]

Raises:

TransformationError – if the supplied parameter is neither a single Node, nor a Schedule, nor a list of Nodes.

property omp_nowait

Returns:: whether or not this Single region uses a nowait clause to remove the end barrier.
Return type:: bool

validate(node_list, options=None)

Check that the supplied list of Nodes are eligible to be put inside a parallel region.

Parameters:

node_list (list) – list of nodes to put into a parallel region
options – a dictionary with options for transformations. :type options: Optional[Dict[str, Any]]
options["node-type-check"] (bool) – this flag controls whether or not the type of the nodes enclosed in the region should be tested to avoid using unsupported nodes inside a region.

Raises:

TransformationError – if the supplied node is an InvokeSchedule rather than being within an InvokeSchedule.
TransformationError – if the supplied nodes are not all children of the same parent (siblings).

Note

PSyclone does not support (distributed-memory) halo swaps or global sums within OpenMP single regions. Attempting to create a single region for a set of nodes that includes halo swaps or global sums will produce an error. In such cases it may be possible to re-order the nodes in the Schedule such that the halo swaps or global sums are performed outside the single region. The MoveTrans transformation may be used for this.

class psyclone.psyir.transformations.OMPTargetTrans[source]

Adds an OpenMP target directive to a region of code.

For example:

>>> from psyclone.psyir.frontend.fortran import FortranReader
>>> from psyclone.psyir.backend.fortran import FortranWriter
>>> from psyclone.psyir.nodes import Loop
>>> from psyclone.psyir.transformations import OMPTargetTrans
>>>
>>> tree = FortranReader().psyir_from_source("""
...     subroutine my_subroutine()
...         integer, dimension(10, 10) :: A
...         integer :: i
...         integer :: j
...         do i = 1, 10
...             do j = 1, 10
...                 A(i, j) = 0
...             end do
...         end do
...     end subroutine
...     """)
>>> OMPTargetTrans().apply(tree.walk(Loop)[0])
>>> print(FortranWriter()(tree))
subroutine my_subroutine()
  integer, dimension(10,10) :: a
  integer :: i
  integer :: j

  !$omp target
  do i = 1, 10, 1
    do j = 1, 10, 1
      a(i,j) = 0
    enddo
  enddo
  !$omp end target

end subroutine my_subroutine

apply(node, options=None)[source]

Insert an OMPTargetDirective before the provided node or list of nodes.

Parameters:

node (List[psyclone.psyir.nodes.Node]) – the PSyIR node or nodes to enclose in the OpenMP target region.
options (Optional[Dict[str,Any]]) – a dictionary with options for transformations.

class psyclone.transformations.OMPTaskloopTrans(grainsize=None, num_tasks=None, nogroup=False)[source]

Adds an OpenMP taskloop directive to a loop. Only one of grainsize or num_tasks must be specified.

TODO: #1364 Taskloops do not yet support reduction clauses.

Parameters:

grainsize (int or None) – the grainsize to use in for this transformation.
num_tasks (int or None) – the num_tasks to use for this transformation.
nogroup (bool) – whether or not to use a nogroup clause for this transformation. Default is False.

For example:

>>> from pysclone.parse.algorithm import parse
>>> from psyclone.psyGen import PSyFactory
>>> api = "gocean1.0"
>>> ast, invokeInfo = parse(GOCEAN_SOURCE_FILE, api=api)
>>> psy = PSyFactory(api).create(invokeInfo)
>>>
>>> from psyclone.transformations import OMPParallelTrans, OMPSingleTrans
>>> from psyclone.transformations import OMPTaskloopTrans
>>> from psyclone.psyir.transformations import OMPTaskwaitTrans
>>> singletrans = OMPSingleTrans()
>>> paralleltrans = OMPParallelTrans()
>>> tasklooptrans = OMPTaskloopTrans()
>>> taskwaittrans = OMPTaskwaitTrans()
>>>
>>> schedule = psy.invokes.get('invoke_0').schedule
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())
>>>
>>> # Apply the OpenMP Taskloop transformation to *every* loop
>>> # in the schedule.
>>> # This ignores loop dependencies. These can be handled
>>> # by the OMPTaskwaitTrans
>>> for child in schedule.children:
>>>     tasklooptrans.apply(child)
>>> # Enclose all of these loops within a single OpenMP
>>> # SINGLE region
>>> singletrans.apply(schedule.children)
>>> # Enclose all of these loops within a single OpenMP
>>> # PARALLEL region
>>> paralleltrans.apply(schedule.children)
>>> # Ensure loop dependencies are satisfied
>>> taskwaittrans.apply(schedule.children)
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())

apply(node, options=None)[source]

Apply the OMPTaskloopTrans transformation to the specified node in a Schedule. This node must be a Loop since this transformation corresponds to wrapping the generated code with directives like so:

!$OMP TASKLOOP
do ...
   ...
end do
!$OMP END TASKLOOP

At code-generation time (when OMPTaskloopDirective.gen_code() is called), this node must be within (i.e. a child of) an OpenMP SERIAL region.

If the keyword “nogroup” is specified in the options, it will cause a nogroup clause be generated if it is set to True. This will override the value supplied to the constructor, but will only apply to the apply call to which the value is supplied.

Parameters:

node (psyclone.psyir.nodes.Node) – the supplied node to which we will apply the OMPTaskloopTrans transformation
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations and validation.
options["nogroup"] (bool) – indicating whether a nogroup clause should be applied to this taskloop.

property omp_grainsize

Returns the grainsize that will be specified by this transformation. By default the grainsize clause is not applied, so grainsize is None.

Returns:: The grainsize specified by this transformation.
Return type:: int or None

property omp_num_tasks

Returns the num_tasks that will be specified by this transformation. By default the num_tasks clause is not applied so num_tasks is None.

Returns:: The grainsize specified by this transformation.
Return type:: int or None

class psyclone.psyir.transformations.OMPTaskTrans[source]

Apply an OpenMP Task Transformation to a Loop. The Loop must be within an OpenMP Serial region (Single or Master) at codegen time. Once lowering begins, no more modifications to the tree should occur as the task directives do not recompute dependencies after lowering. In the future it may be possible to do this through an _update_node implementation.

apply(node, options=None)[source]

Apply the OMPTaskTrans to the specified node in a Schedule.

Can only be applied to a Loop.

The specified node is wrapped by directives during code generation like so:

!$OMP TASK
...
!$OMP END TASK

At code-generation time, this node must be within (i.e. a child of) an OpenMP Serial region (OpenMP Single or OpenMP Master)

Any kernels or Calls will be inlined into the region before the task transformation is applied.

Parameters:

node (psyclone.psyir.nodes.Loop) – the supplied node to which we will apply the OMPTaskTrans transformation
options (dictionary of string:values or None) – a dictionary with options for transformations and validation.

class psyclone.psyir.transformations.OMPTaskwaitTrans[source]

Adds zero or more OpenMP Taskwait directives to an OMP parallel region. This transformation will add directives to satisfy dependencies between Taskloop directives without an associated taskgroup (i.e. no nogroup clause). It also tries to minimise the number added to maximise available parallelism.

For example:

>>> from pysclone.parse.algorithm import parse
>>> from psyclone.psyGen import PSyFactory
>>> api = "gocean1.0"
>>> filename = "nemolite2d_alg.f90"
>>> ast, invokeInfo = parse(filename, api=api, invoke_name="invoke")
>>> psy = PSyFactory(api).create(invokeInfo)
>>>
>>> from psyclone.transformations import OMPParallelTrans, OMPSingleTrans
>>> from psyclone.transformations import OMPTaskloopTrans
>>> from psyclone.psyir.transformations import OMPTaskwaitTrans
>>> singletrans = OMPSingleTrans()
>>> paralleltrans = OMPParallelTrans()
>>> tasklooptrans = OMPTaskloopTrans()
>>> taskwaittrans = OMPTaskwaitTrans()
>>>
>>> schedule = psy.invokes.get('invoke_0').schedule
>>> print(schedule.view())
>>>
>>> # Apply the OpenMP Taskloop transformation to *every* loop
>>> # in the schedule.
>>> # This ignores loop dependencies. These are handled by the
>>> # taskwait transformation.
>>> for child in schedule.children:
>>>     tasklooptrans.apply(child, nogroup = true)
>>> # Enclose all of these loops within a single OpenMP
>>> # SINGLE region
>>> singletrans.apply(schedule.children)
>>> # Enclose all of these loops within a single OpenMP
>>> # PARALLEL region
>>> paralleltrans.apply(schedule.children)
>>> taskwaittrans.apply(schedule.children)
>>> print(schedule.view())

apply(node, options=None)[source]

Apply an OMPTaskwait Transformation to the supplied node (which must be an OMPParallelDirective). In the generated code this corresponds to adding zero or more OMPTaskwaitDirectives as appropriate:

!$OMP PARALLEL
  ...
  !$OMP TASKWAIT
  ...
  !$OMP TASKWAIT
  ...
!$OMP END PARALLEL

Parameters:

node (psyclone.psyir.nodes.OMPParallelDirective) – the node to which to apply the transformation.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations and validation.
options["fail_on_no_taskloop"] (bool) – indicating whether this should throw an error if no OMPTaskloop nodes are found in this tree. This can be safely disabled as if there are no Taskloop nodes the result of this transformation is valid OpenMP code. Default is True

class psyclone.psyir.transformations.Product2LoopTrans[source]

Provides a transformation from a PSyIR PRODUCT IntrinsicCall node to an equivalent PSyIR loop structure that is suitable for running in parallel on CPUs and GPUs. Validity checks are also performed.

If PRODUCT contains a single positional argument which is an array, the maximum value of all of the elements in the array is returned in the the scalar R.

R = PRODUCT(ARRAY)

For example, if the array is two dimensional, the equivalent code for real data is:

R = 1.0
DO J=LBOUND(ARRAY,2),UBOUND(ARRAY,2)
  DO I=LBOUND(ARRAY,1),UBOUND(ARRAY,1)
    R = R * ARRAY(I,J)

If the mask argument is provided then the mask is used to determine whether the product is applied:

R = PRODUCT(ARRAY, mask=MOD(ARRAY, 2.0)==1)

If the array is two dimensional, the equivalent code for real data is:

R = 1.0
DO J=LBOUND(ARRAY,2),UBOUND(ARRAY,2)
  DO I=LBOUND(ARRAY,1),UBOUND(ARRAY,1)
    IF (MOD(ARRAY(I,J), 2.0)==1) THEN
      R = R * ARRAY(I,J)

The dimension argument is currently not supported and will result in a TransformationError exception being raised.

R = PRODUCT(ARRAY, dimension=2)

The array passed to PRODUCT may use any combination of array syntax, array notation, array sections and scalar bounds:

R = PRODUCT(ARRAY) ! array syntax
R = PRODUCT(ARRAY(:,:)) ! array notation
R = PRODUCT(ARRAY(1:10,lo:hi)) ! array sections
R = PRODUCT(ARRAY(1:10,:)) ! mix of array section and array notation
R = PRODUCT(ARRAY(1:10,2)) ! mix of array section and scalar bound

An example use of this transformation is given below:

>>> from psyclone.psyir.backend.fortran import FortranWriter
>>> from psyclone.psyir.frontend.fortran import FortranReader
>>> from psyclone.psyir.transformations import Product2LoopTrans
>>> code = ("subroutine product_test(array)\n"
...         "  real :: array(10,10)\n"
...         "  real :: result\n"
...         "  result = product(array)\n"
...         "end subroutine\n")
>>> psyir = FortranReader().psyir_from_source(code)
>>> product_node = psyir.children[0].children[0].children[1]
>>> Product2LoopTrans().apply(product_node)
>>> print(FortranWriter()(psyir))
subroutine product_test(array)
  real, dimension(10,10) :: array
  real :: result
  integer :: idx
  integer :: idx_1

  result = 1.0
  do idx = 1, 10, 1
    do idx_1 = 1, 10, 1
      result = result * array(idx_1,idx)
    enddo
  enddo

end subroutine product_test

apply(node, options=None)

Apply the array-reduction intrinsic conversion transformation to the specified node. This node must be one of these intrinsic operations which is converted to an equivalent loop structure.

Parameters:

node (psyclone.psyir.nodes.IntrinsicCall) – an array-reduction intrinsic.
options (Optional[Dict[str, Any]]) – options for the transformation.

class psyclone.psyir.transformations.ProfileTrans[source]

Create a profile region around a list of statements. For example:

>>> from psyclone.parse.algorithm import parse
>>> from psyclone.parse.utils import ParseError
>>> from psyclone.psyGen import PSyFactory, GenerationError
>>> from psyclone.psyir.transformations import ProfileTrans
>>> api = "gocean1.0"
>>> filename = "nemolite2d_alg.f90"
>>> ast, invokeInfo = parse(filename, api=api, invoke_name="invoke")
>>> psy = PSyFactory(api).create(invokeInfo)
>>>
>>> p_trans = ProfileTrans()
>>>
>>> schedule = psy.invokes.get('invoke_0').schedule
>>> print(schedule.view())
>>>
>>> # Enclose all children within a single profile region
>>> p_trans.apply(schedule.children)
>>> print(schedule.view())

This implementation relies completely on the base class PSyDataTrans for the actual work, it only adjusts the name etc, and the list of valid nodes.

apply(nodes, options=None)

Apply this transformation to a subset of the nodes within a schedule - i.e. enclose the specified Nodes in the schedule within a single PSyData region.

Parameters:

nodes (psyclone.psyir.nodes.Node or list of psyclone.psyir.nodes.Node) – can be a single node or a list of nodes.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.
options["prefix"] (str) – a prefix to use for the PSyData module name (PREFIX_psy_data_mod) and the PSyDataType (PREFIX_PSYDATATYPE) - a “_” will be added automatically. It defaults to “”.
options["region_name"] ((str,str)) – an optional name to use for this PSyData area, provided as a 2-tuple containing a location name followed by a local name. The pair of strings should uniquely identify a region unless aggregate information is required (and is supported by the runtime library).

class psyclone.psyir.transformations.ReadOnlyVerifyTrans(node_class=<class 'psyclone.psyir.nodes.read_only_verify_node.ReadOnlyVerifyNode'>)[source]

This transformation inserts a ReadOnlyVerifyNode or a node derived from ReadOnlyVerifyNode into the PSyIR of a schedule. At code creation time this node will use the PSyData API to create code that will verify that read-only quantities are not modified.

After applying the transformation the Nodes marked for verification are children of the ReadOnlyVerifyNode. Nodes to verify can be individual constructs within an Invoke (e.g. Loops containing a Kernel or BuiltIn call) or entire Invokes.

Parameters:: node_class (psyclone.psyir.nodes.ReadOnlyVerifyNode or derived class) – The class of Node which will be inserted into the tree (defaults to ReadOnlyVerifyNode), but can be any derived class.

apply(nodes, options=None)

Apply this transformation to a subset of the nodes within a schedule - i.e. enclose the specified Nodes in the schedule within a single PSyData region.

Parameters:

nodes (psyclone.psyir.nodes.Node or list of psyclone.psyir.nodes.Node) – can be a single node or a list of nodes.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.
options["prefix"] (str) – a prefix to use for the PSyData module name (PREFIX_psy_data_mod) and the PSyDataType (PREFIX_PSYDATATYPE) - a “_” will be added automatically. It defaults to “”.
options["region_name"] ((str,str)) – an optional name to use for this PSyData area, provided as a 2-tuple containing a location name followed by a local name. The pair of strings should uniquely identify a region unless aggregate information is required (and is supported by the runtime library).

class psyclone.psyir.transformations.Reference2ArrayRangeTrans[source]

Provides a transformation from PSyIR Array Notation (a reference to an Array) to a PSyIR Range. For example:

>>> from psyclone.psyir.backend.fortran import FortranWriter
>>> from psyclone.psyir.frontend.fortran import FortranReader
>>> from psyclone.psyir.nodes import Reference
>>> from psyclone.psyir.transformations import TransformationError
>>> CODE = ("program example\n"
...         "real :: a(:)\n"
...         "a = 0.0\n"
...         "end program\n")
>>> trans = Reference2ArrayRangeTrans()
>>> psyir = FortranReader().psyir_from_source(CODE)
>>> for reference in psyir.walk(Reference):
...    try:
...        trans.apply(reference)
...    except TransformationError:
...        pass
>>> print(FortranWriter()(psyir))
program example
  real, dimension(:) :: a

  a(:) = 0.0

end program example

This transformation does not currently support arrays within structures, see issue #1858.

apply(node, options=None)[source]

Apply the Reference2ArrayRangeTrans transformation to the specified node. The node must be a Reference to an array. The Reference is replaced by an ArrayReference with appropriate explicit range nodes (termed colon notation in Fortran).

Parameters:

node (psyclone.psyir.nodes.Reference) – a Reference node.
options (Optional[Dict[str, Any]]) – a dict with options for transformations.

class psyclone.psyir.transformations.ReplaceInductionVariablesTrans[source]

Move all supported induction variables out of the loop, and replace their usage inside the loop. For example:

>>> from psyclone.psyir.frontend.fortran import FortranReader
>>> from psyclone.psyir.nodes import Loop
>>> from psyclone.psyir.transformations import             ReplaceInductionVariablesTrans
>>> from psyclone.psyir.backend.fortran import FortranWriter
>>> psyir = FortranReader().psyir_from_source("""
... subroutine sub()
...     integer :: i, im, ic, tmp(100)
...     do i=1, 100
...         im = i - 1
...         ic = 2
...         tmp(i) = ic * im
...     enddo
... end subroutine sub""")
>>> loop = psyir.walk(Loop)[0]
>>> ReplaceInductionVariablesTrans().apply(loop)
>>> print(FortranWriter()(psyir))
subroutine sub()
  integer :: i
  integer :: im
  integer :: ic
  integer, dimension(100) :: tmp

  do i = 1, 100, 1
    tmp(i) = 2 * (i - 1)
  enddo
  ic = 2
  im = i - 1 - 1

end subroutine sub

The replaced induction variables assignments are added after the loop, so these variables will have the correct value if they are used elsewhere.

The following restrictions apply for the assignment to an induction variable:

the variable must be a scalar (i.e. no array access at all, not even a constant like a(3) or a%b(3)%c)
none of variables on the right-hand side can be written in the loop body (the loop variable is written in the Loop statement, not in the body, so it can be used).
Only intrinsic function calls are allowed on the RHS (since they are known to be elemental)
the assigned variable must not be read before the assignment.
the assigned variable cannot occur on the right-hand side (e.g. k = k + 3).
there must be only one assignment to this induction variable.

apply(node, options=None)[source]

Apply the ReplaceInductionVariablesTrans transformation to the specified node. The node must be a loop. In case of nested loops, the transformation might need to be applied several times, from the inner-most loop outwards.

Parameters:: node (psyclone.psyir.nodes.Loop) – a Loop node.

class psyclone.psyir.transformations.Sign2CodeTrans[source]

Provides a transformation from a PSyIR SIGN intrinsic node to equivalent code in a PSyIR tree. Validity checks are also performed.

The transformation replaces

R = SIGN(A, B)

with the following logic:

R = ABS(A)
if B < 0.0:
    R = R*-1.0

i.e. the value of A with the sign of B

apply(node, options=None)[source]

Apply the SIGN intrinsic conversion transformation to the specified node. This node must be a SIGN IntrinsicCall. The SIGN IntrinsicCall is converted to equivalent inline code. This is implemented as a PSyIR transform from:

R = ... SIGN(A, B) ...

to:

tmp_abs = A
if tmp_abs < 0.0:
    res_abs = tmp_abs*-1.0
else:
    res_abs = tmp_abs
res_sign = res_abs
tmp_sign = B
if tmp_sign < 0.0:
    res_sign = res_sign*-1.0
R = ... res_sign ...

where A and B could be arbitrarily complex PSyIR expressions, ... could be arbitrary PSyIR code and where ABS has been replaced with inline code by the NemoAbsTrans transformation.

This transformation requires the IntrinsicCall node to be a child of an assignment and will raise an exception if this is not the case.

Parameters:

node (psyclone.psyir.nodes.IntrinsicCall) – a SIGN IntrinsicCall node.
symbol_table (psyclone.psyir.symbols.SymbolTable) – the symbol table.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.

Warning

This transformation assumes that the SIGN Intrinsic acts on PSyIR Real scalar data and does not check whether or not this is the case. Once issue #658 is on master then this limitation can be fixed.

class psyclone.psyir.transformations.Sum2LoopTrans[source]

Provides a transformation from a PSyIR SUM IntrinsicCall node to an equivalent PSyIR loop structure that is suitable for running in parallel on CPUs and GPUs. Validity checks are also performed.

If SUM contains a single positional argument which is an array, all elements of that array are summed and the result returned in the scalar R.

R = SUM(ARRAY)

For example, if the array is two dimensional, the equivalent code for real data is:

R = 0.0
DO J=LBOUND(ARRAY,2),UBOUND(ARRAY,2)
  DO I=LBOUND(ARRAY,1),UBOUND(ARRAY,1)
    R = R + ARRAY(I,J)

If the mask argument is provided then the mask is used to determine whether the sum is applied:

R = SUM(ARRAY, mask=MOD(ARRAY, 2.0)==1)

If the array is two dimensional, the equivalent code for real data is:

R = 0.0
DO J=LBOUND(ARRAY,2),UBOUND(ARRAY,2)
  DO I=LBOUND(ARRAY,1),UBOUND(ARRAY,1)
    IF (MOD(ARRAY(I,J), 2.0)==1) THEN
      R = R + ARRAY(I,J)

The dimension argument is currently not supported and will result in a TransformationError exception being raised.

R = SUM(ARRAY, dimension=2)

The array passed to MAXVAL may use any combination of array syntax, array notation, array sections and scalar bounds:

R = SUM(ARRAY) ! array syntax
R = SUM(ARRAY(:,:)) ! array notation
R = SUM(ARRAY(1:10,lo:hi)) ! array sections
R = SUM(ARRAY(1:10,:)) ! mix of array section and array notation
R = SUM(ARRAY(1:10,2)) ! mix of array section and scalar bound

For example:

>>> from psyclone.psyir.backend.fortran import FortranWriter
>>> from psyclone.psyir.frontend.fortran import FortranReader
>>> from psyclone.psyir.transformations import Sum2LoopTrans
>>> code = ("subroutine sum_test(array,n,m)\n"
...         "  integer :: n, m\n"
...         "  real :: array(10,10)\n"
...         "  real :: result\n"
...         "  result = sum(array)\n"
...         "end subroutine\n")
>>> psyir = FortranReader().psyir_from_source(code)
>>> sum_node = psyir.children[0].children[0].children[1]
>>> Sum2LoopTrans().apply(sum_node)
>>> print(FortranWriter()(psyir))
subroutine sum_test(array, n, m)
  integer :: n
  integer :: m
  real, dimension(10,10) :: array
  real :: result
  integer :: idx
  integer :: idx_1

  result = 0.0
  do idx = 1, 10, 1
    do idx_1 = 1, 10, 1
      result = result + array(idx_1,idx)
    enddo
  enddo

end subroutine sum_test

apply(node, options=None)

Apply the array-reduction intrinsic conversion transformation to the specified node. This node must be one of these intrinsic operations which is converted to an equivalent loop structure.

Parameters:

node (psyclone.psyir.nodes.IntrinsicCall) – an array-reduction intrinsic.
options (Optional[Dict[str, Any]]) – options for the transformation.

Algorithm-layer

The gocean1.0 API supports the transformation of the algorithm layer. In the future the LFRic (dynamo0.3) API will also support this. However, this is not relevant to the nemo API as it does not have the concept of an algorithm layer (just PSy and Kernel layers). The ability to transformation the algorithm layer is new and at this time no relevant transformations have been developed.

Kernels

PSyclone supports the transformation of Kernels as well as PSy-layer code. However, the transformation of kernels to produce new kernels brings with it additional considerations, especially regarding the naming of the resulting kernels. PSyclone supports two use cases:

the HPC expert wishes to optimise the same kernel in different ways, depending on where/how it is called;

the HPC expert wishes to transform the kernel just once and have the new version used throughout the Algorithm file.

The second case is really an optimisation of the first for the case where the same set of transformations is applied to every instance of a given kernel.

Since PSyclone is run separately for each Algorithm in a given application, ensuring that there are no name clashes for kernels in the application as a whole requires that some state is maintained between PSyclone invocations. This is achieved by requiring that the same kernel output directory is used for every invocation of PSyclone when building a given application. However, this is under the control of the user and therefore it is possible to use the same output directory for a subset of algorithms that require the same kernel transformation and then a different directory for another subset requiring a different transformation. Of course, such use would require care when building and linking the application since the differently-optimised kernels would have the same names.

By default, transformed kernels are written to the current working directory. Alternatively, the user may specify the location to which to write the modified code via the -okern command-line flag.

In order to support the two use cases given above, PSyclone supports two different kernel-renaming schemes: “multiple” and “single” (specified via the --kernel-renaming command-line flag). In the default, “multiple” scheme, PSyclone ensures that each transformed kernel is given a unique name (with reference to the contents of the kernel output directory). In the “single” scheme, it is assumed that any given kernel that is transformed is always transformed in the same way (or left unchanged) and thus just one transformed version of it is created. This assumption is checked by examining the Fortran code for any pre-existing transformed version of that kernel. If another transformed version of that kernel exists and does not match that created by the current transformation then PSyclone will raise an exception.

Rules

Kernel code that is to be transformed is subject to certain restrictions. These rules are intended to make kernel transformations as robust as possible, in particular by limiting the amount of code that must be parsed by PSyclone (via fparser). The rules are as follows:

Any variable or procedure accessed by a kernel must either be explicitly declared or named in the only clause of a module use statement within the scope of the subroutine containing the kernel implementation. This means that:
1. Kernel subroutines are forbidden from accessing data using COMMON blocks;
2. Kernel subroutines are forbidden from calling procedures declared via the EXTERN statement;
3. Kernel subroutines must not access data or procedures made available via their parent (containing) module.
The full Fortran source of a kernel must be available to PSyclone. This includes the source of any modules from which it accesses either routines or data. (However, kernel routines are permitted to make use of Fortran intrinsic routines.)

For instance, consider the following Fortran module containing the bc_ssh_code kernel:

module boundary_conditions_mod
  real :: forbidden_var

contains

  subroutine bc_ssh_code(ji, jj, istep, ssha)
    use kind_params_mod, only: go_wp
    use model_mod, only: rdt
    integer,                     intent(in)    :: ji, jj, istep
    real(go_wp), dimension(:,:), intent(inout) :: ssha
    real(go_wp) :: rtime

    rtime = real(istep, go_wp) * rdt
    ...
  end subroutine bc_ssh_code

end module boundary_conditions_mod

Since the kernel subroutine accesses data (the rdt variable) from the model_mod module, the source of that module must be available to PSyclone if a transformation is applied to this kernel. Should rdt not actually be defined in model_mod (i.e. model_mod itself imports it from another module) then the source containing its definition must also be available to PSyclone. Note that the rules forbid the bc_ssh_code kernel from accessing the forbidden_var variable that is available to it from the enclosing module scope.

Note

these rules only apply to kernels that are the target of PSyclone kernel transformations.

Available Kernel Transformations

The transformations listed below have to be applied specifically to a PSyclone kernel. There are a number of transformations not listed here that can be applied to either or both the PSy-layer and Kernel-layer PSyIR.

Note

Some of these transformations modify the PSyIR tree of both the InvokeSchedule where the transformed CodedKernel is located and its associated KernelSchedule.

class psyclone.transformations.ACCRoutineTrans[source]

Transform a kernel or routine by adding a “!$acc routine” directive (causing it to be compiled for the OpenACC accelerator device). For example:

>>> from psyclone.parse.algorithm import parse
>>> from psyclone.psyGen import PSyFactory
>>> api = "gocean1.0"
>>> ast, invokeInfo = parse(GOCEAN_SOURCE_FILE, api=api)
>>> psy = PSyFactory(api).create(invokeInfo)
>>>
>>> from psyclone.transformations import ACCRoutineTrans
>>> rtrans = ACCRoutineTrans()
>>>
>>> schedule = psy.invokes.get('invoke_0').schedule
>>> # Uncomment the following line to see a text view of the schedule
>>> # print(schedule.view())
>>> kern = schedule.children[0].children[0].children[0]
>>> # Transform the kernel
>>> rtrans.apply(kern)

apply(node, options=None)[source]

Add the ‘!$acc routine’ OpenACC directive into the code of the supplied Kernel (in a PSyKAl API such as GOcean or LFRic) or directly in the supplied Routine.

Parameters:

node (psyclone.psyGen.Kern or psyclone.psyir.nodes.Routine) – the kernel call or routine implementation to transform.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.

property name

Returns:: the name of this transformation class.
Return type:: str

validate(node, options=None)[source]

Perform checks that the supplied kernel or routine can be transformed.

Parameters:

node (psyclone.psyGen.Kern | psyclone.psyir.nodes.Routine) – the kernel which is the target of the transformation.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.

Raises:

TransformationError – if the node is not a kernel or a routine.
TransformationError – if the target is a built-in kernel.
TransformationError – if it is a kernel but without an associated PSyIR.
TransformationError – if any of the symbols in the kernel are accessed via a module use statement.
TransformationError – if the kernel contains any calls to other routines.

class psyclone.psyir.transformations.FoldConditionalReturnExpressionsTrans[source]

Provides a transformation that folds conditional expressions with only a return statement inside so that the Return statement is moved to the end of the Routine and therefore it can be safely removed. This simplifies the control flow of the kernel to facilitate other transformations like kernel fusions. For example, the following code:

subroutine test(i)
    if (i < 5) then
        return
    endif
    if (i > 10) then
        return
    endif
    ! CODE
end subroutine

will be transformed to:

subroutine test(i)
    if (.not.(i < 5)) then
        if (.not.(i > 10)) then
            ! CODE
        endif
    endif
end subroutine

apply(node, options=None)[source]

Apply this transformation to the supplied node.

Parameters:

node (psyclone.psyir.nodes.Routine) – the node to transform.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.

property name: Returns the name of this transformation as a string.

validate(node, options=None)[source]

Ensure that it is valid to apply this transformation to the supplied node.

Parameters:

node (psyclone.psyir.nodes.Routine) – the node to validate.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.

Raises:

TransformationError – if the node is not a Routine.

class psyclone.transformations.KernelImportsToArguments[source]

Transformation that removes any accesses of imported data from the supplied kernel and places them in the caller. The values/references are then passed by argument into the kernel.

apply(node, options=None)[source]

Convert the imported variables used inside the kernel into arguments and modify the InvokeSchedule to pass the same imported variables to the kernel call.

Parameters:

node (psyclone.psyGen.CodedKern) – a kernel call.
options (Optional[Dict[str, Any]]) – a dictionary with options for transformations.

Note

This transformation is only supported by the GOcean 1.0 API.

Applying

Transformations can be applied either interactively or through a script.

Interactive

To apply a transformation interactively we first parse and analyse the code. This allows us to generate a “vanilla” PSy layer. For example:

>>> from fparser.common.readfortran import FortranStringReader
>>> from psyclone.parse.algorithm import Parser
>>> from psyclone.psyGen import PSyFactory
>>> from fparser.two.parser import ParserFactory

>>> example_str = (
...     "program example\n"
...     "  use field_mod, only: field_type\n"
...     "  type(field_type) :: field\n"
...     "  call invoke(setval_c(field, 0.0))\n"
...     "end program example\n")

>>> parser = ParserFactory().create(std="f2008")
>>> reader = FortranStringReader(example_str)
>>> ast = parser(reader)
>>> invoke_info = Parser().invoke_info(ast)

# This example uses the LFRic (dynamo0.3) API
>>> api = "dynamo0.3"

# Create the PSy-layer object using the invokeInfo
>>> psy = PSyFactory(api, distributed_memory=False).create(invoke_info)

# Optionally generate the vanilla PSy layer fortran
>>> print(psy.gen)
  MODULE example_psy
    USE constants_mod, ONLY: r_def, i_def
    USE field_mod, ONLY: field_type, field_proxy_type
    IMPLICIT NONE
    CONTAINS
    SUBROUTINE invoke_0(field)
      TYPE(field_type), intent(in) :: field
      INTEGER(KIND=i_def) df
      INTEGER(KIND=i_def) loop0_start, loop0_stop
      TYPE(field_proxy_type) field_proxy
      INTEGER(KIND=i_def) undf_aspc1_field
      !
      ! Initialise field and/or operator proxies
      !
      field_proxy = field%get_proxy()
      !
      ! Initialise number of DoFs for aspc1_field
      !
      undf_aspc1_field = field_proxy%vspace%get_undf()
      !
      ! Set-up all of the loop bounds
      !
      loop0_start = 1
      loop0_stop = undf_aspc1_field
      !
      ! Call our kernels
      !
      DO df=loop0_start,loop0_stop
        field_proxy%data(df) = 0.0
      END DO
      !
    END SUBROUTINE invoke_0
  END MODULE example_psy

We then extract the particular schedule we are interested in. For example:

# List the various invokes that the PSy layer contains
>>> print(psy.invokes.names)
dict_keys(['invoke_0'])

# Get the required invoke
>>> invoke = psy.invokes.get('invoke_0')

# Get the schedule associated with the required invoke
> schedule = invoke.schedule
> print(schedule.view())
InvokeSchedule[invoke='invoke_0', dm=True]
    0: Loop[type='dof', field_space='any_space_1', it_space='dof', upper_bound='ndofs']
        Literal[value:'NOT_INITIALISED', Scalar<INTEGER, UNDEFINED>]
        Literal[value:'NOT_INITIALISED', Scalar<INTEGER, UNDEFINED>]
        Literal[value:'1', Scalar<INTEGER, UNDEFINED>]
        Schedule[]
            0: BuiltIn setval_c(field,0.0)

Now we have the schedule we can create and apply a transformation to it to create a new schedule and then replace the original schedule with the new one. For example:

# Create an OpenMPParallelLoopTrans
> from psyclone.transformations import OMPParallelLoopTrans
> ol = OMPParallelLoopTrans()

# Apply it to the loop schedule of the selected invoke
> ol.apply(schedule.children[0])
> print(schedule.view())

# Generate the Fortran code for the new PSy layer
> print(psy.gen)

Script

PSyclone provides a Python script (psyclone) that can be used from the command line to generate PSy layer code and to modify algorithm layer code appropriately. By default this script will generate “vanilla” (unoptimised) PSy-layer and algorithm layer code. For example:

> psyclone algspec.f90
> psyclone -oalg alg.f90 -opsy psy.f90 -api dynamo0.3 algspec.f90

The psyclone script has an optional -s flag which allows the user to specify a script file to modify the PSy layer as required. Script files may be specified without a path. For example:

> psyclone -s opt.py algspec.f90

In this case, the current directory is prepended to the Python search path PYTHONPATH which will then be used to try to find the script file. Thus, the search begins in the current directory and continues over any pre-existing directories in the search path, failing if the file cannot be found.

Alternatively, script files may be specified with a path. In this case the file must exist in the specified location. This location is then added to the Python search path PYTHONPATH as before. For example:

> psyclone -s ./opt.py algspec.f90
> psyclone -s ../scripts/opt.py algspec.f90
> psyclone -s /home/me/PSyclone/scripts/opt.py algspec.f90

PSyclone also provides the same functionality via a function (which is what the psyclone script calls internally).

###.. autofunction:: psyclone.generator.generate ### :noindex:

A valid script file must contain a trans function which accepts a PSy object as an argument and returns a PSy object, i.e.:

>>> def trans(psy):
...     # ...
...     return psy

It is up to the script what it does with the PSy object. The example below does the same thing as the example in the Interactive section.

>>> def trans(psy):
...     from psyclone.transformations import OMPParallelLoopTrans
...     invoke = psy.invokes.get('invoke_0_v3_kernel_type')
...     schedule = invoke.schedule
...     ol = OMPParallelLoopTrans()
...     ol.apply(schedule.children[0])
...     return psy

In the gocean1.0 API (and in the future the lfric (dynamo0.3) API) an optional trans_alg function may also be supplied. This function accepts PSyIR (representing the algorithm layer) as an argument and returns PSyIR i.e.:

>>> def trans_alg(psyir):
...     # ...
...    return psyir

As with the trans() function it is up to the script what it does with the algorithm PSyIR. Note that the trans_alg() script is applied to the algorithm layer before the PSy-layer is generated so any changes applied to the algorithm layer will be reflected in the PSy object that is passed to the trans() function.

For example, if the trans_alg() function in the script merged two invoke calls into one then the Alg object passed to the trans() function of the script would only contain one schedule associated with the merged invoke.

Of course the script may apply as many transformations as is required for a particular algorithm and/or schedule and may apply transformations to all the schedules (i.e. invokes and/or kernels) contained within the PSy layer.

Examples of the use of transformation scripts can be found in many of the examples, such as examples/lfric/eg3 and examples/lfric/scripts. Please read the examples/lfric/README file first as it explains how to run the examples (and see also the examples/check_examples script).

An example of the use of a script making use of the trans_alg() function can be found in examples/gocean/eg7.

OpenMP

OpenMP is added to a code by using transformations. The OpenMP transformations currently supported allow the addition of:

an OpenMP Parallel directive
an OpenMP Target directive
an OpenMP Declare Target directive
an OpenMP Do/For/Loop directive
an OpenMP Single directive
an OpenMP Master directive
an OpenMP Taskloop directive
multiple OpenMP Taskwait directives; and
an OpenMP Parallel Do directive.

The generic versions of these transformations (i.e. ones that theoretically work for all APIs) were given in the Standard Functionality section. The API-specific versions of these transformations are described in the API-specific sections of this document.

Reductions

PSyclone supports parallel scalar reductions. If a scalar reduction is specified in the Kernel metadata (see the API-specific sections for details) then PSyclone ensures the appropriate reduction is performed.

In the case of distributed memory, PSyclone will add GlobalSum’s at the appropriate locations. As can be inferred by the name, only “summation” reductions are currently supported for distributed memory.

In the case of an OpenMP parallel loop the standard reduction support will be used by default. For example

!$omp parallel do, reduction(+:x)
!loop
!$omp end parallel do

OpenMP reductions do not guarantee to give bit reproducible results for different runs of the same problem even if the same problem is run using the same resources. The reason for this is that the order in which data is reduced is not mandated.

Therefore, an additional reprod option has been added to the OpenMP Do transformation. If the reprod option is set to “True” then the OpenMP reduction support is replaced with local per-thread reductions which are reduced serially after the loop has finished. This implementation guarantees to give bit-wise reproducible results for different runs of the same problem using the same resources, but will not bit-wise compare if the code is rerun with different numbers of OpenMP threads.

Restrictions

If two reductions are used within an OpenMP region and the same variable is used for both reductions then PSyclone will raise an exception. In this case the solution is to use a different variable for each reduction.

PSyclone does not support (distributed-memory) halo swaps or global sums within OpenMP parallel regions. Attempting to create a parallel region for a set of nodes that includes halo swaps or global sums will produce an error. In such cases it may be possible to re-order the nodes in the Schedule using the MoveTrans transformation.

OpenMP Tasking

PSyclone supports OpenMP Tasking, through the OMPTaskloopTrans and OMPTaskwaitTrans transformations. OMPTaskloopTrans transformations can be applied to loops, whilst the OMPTaskwaitTrans operator is applied to an OpenMP Parallel Region, and computes the dependencies caused by Taskloops, and adds OpenMP Taskwait statements to satisfy those dependencies. An example of using OpenMP tasking is available in PSyclone/examples/nemo/eg1/openmp_taskloop_trans.py.

OpenCL

OpenCL is added to a code by using the GOOpenCLTrans transformation (see the Standard Functionality Section above). Currently this transformation is only supported for the GOcean1.0 API and is applied to the whole InvokeSchedule of an Invoke. This transformation will add an OpenCL driver infrastructure to the PSy layer and generate an OpenCL kernel for each of the Invoke kernels. This means that all kernels in that Invoke will be executed on the OpenCL device. The PSy-layer OpenCL code generated by PSyclone is still Fortran and makes use of the FortCL library (https://github.com/stfc/FortCL) to access OpenCL functionality. It also relies upon the device acceleration support provided by the dl_esm_inf library (https://github.com/stfc/dl_esm_inf).

Note

The generated OpenCL kernels are written in a file called opencl_kernels_<index>.cl where the index keeps increasing if the file name already exist.

The GOOpenCLTrans transformation accepts an options argument with a map of optional parameters to tune the OpenCL host code in the PSy layer. These options will be attached to the transformed InvokeSchedule. The current available options are:

Option	Description	Default
end_barrier	Whether a synchronization barrier should be placed at the end of the Invoke.	True
enable_profiling	Enables the profiling of OpenCL Kernels.	False
out_of_order	Allows the OpenCL implementation to execute the enqueued kernels out-of-order.	False

Additionally, each individual kernel (inside the Invoke that is going to be transformed) also accepts a map of options which are provided by the set_opencl_options() method of the Kern object. This can affect both the driver layer and/or the OpenCL kernels. The current available options are:

Option	Description	Default
local_size	Number of work-items to group together in a work-group execution (kernel instances executed at the same time).	64
queue_number	The identifier of the OpenCL command_queue to which the kernel should be submitted. If the kernel has a dependency on another kernel submitted to a different command_queue a barrier will be added to guarantee the execution order.	1

Below is an example of a PSyclone script that uses a GOOpenCLTrans with multiple InvokeSchedule and kernel-specific optimization options.

from psyclone.psyir.transformations import \
    FoldConditionalReturnExpressionsTrans
from psyclone.domain.gocean.transformations import GOOpenCLTrans, \
    GOMoveIterationBoundariesInsideKernelTrans


def trans(psy):
    '''
    Transformation routine for use with PSyclone. Applies the OpenCL
    transform to the first Invoke in the psy object.

    :param psy: the PSy object which this script will transform.
    :type psy: :py:class:`psyclone.psyGen.PSy`
    :returns: the transformed PSy object.
    :rtype: :py:class:`psyclone.psyGen.PSy`

    '''
    ocl_trans = GOOpenCLTrans()
    fold_trans = FoldConditionalReturnExpressionsTrans()
    move_boundaries_trans = GOMoveIterationBoundariesInsideKernelTrans()

    # Get the Schedule associated with the first Invoke
    invoke = psy.invokes.invoke_list[0]
    sched = invoke.schedule

    # Provide kernel-specific OpenCL optimization options
    for idx, kern in enumerate(sched.kernels()):
        # Move the PSy-layer loop boundaries inside the kernel as a kernel
        # mask, this allows to iterate through the whole domain
        move_boundaries_trans.apply(kern)
        # Change the syntax to remove the return statements introduced by the
        # previous transformation
        fold_trans.apply(kern.get_kernel_schedule())
        # Specify the OpenCL queue and workgroup size of the kernel
        # In this case we dispatch each kernel in a different queue to check
        # that the output code has the necessary barriers to guarantee the
        # kernel execution order.
        kern.set_opencl_options({"queue_number": idx+1, 'local_size': 4})

    # Transform the Schedule
    ocl_trans.apply(sched, options={"end_barrier": True})

OpenCL delays the decision of which and where kernels will execute until run-time, therefore it is important to use the environment variables provided by FortCL and DL_ESM_INF to inform how things should execute. Specifically:

FORTCL_KERNELS_FILE: Point to the file containing the kernels to execute, they can be compiled ahead-of-time or providing the source for JIT compilation. To link more than a single kernel, one must merge all the kernels generated by PSyclone in a single source file.
FORTCL_PLATFORM: If the system has more than 1 OpenCL platform. This environment variable may be used to select which platform on which to execute the kernels.
DL_ESM_ALIGNMENT: When using OpenCL <= 1.2 the local_size should be exactly divisible by the total size. If this is not the case some implementations fail silently. A way to solve this issue is to set the DL_ESM_ALIGNMENT variable to be equal to the local size.

Note

The OpenCL generation can be combined with distributed memory generation. In the case where there is more than one accelerator available on each node, the PSyclone configuration file parameter OCL_DEVICES_PER_NODE has to be set to the appropriate value and the number of MPI-ranks-per-node set by the mpirun command has to match this value accordingly.

For instance if there are 2 accelerators per nodes, psyclone.cfg should have OCL_DEVICES_PER_NODE=2 and the program must be executed with mpirun -n <total_ranks> -ppn 2 ./application (Note: -ppn is an Intel MPI specific parameter, use equivalent configuration parameters for other MPI implementations.)

For example, an execution of a PSyclone generated OpenCL code using all the mentioned run-time configuration options could look something like:

FORTCL_PLATFORM=3 FORTCL_KERNELS_FILE=allkernels.cl DL_ESM_ALIGNMENT=64 \
mpirun -n 2 ./application.exe

OpenACC

PSyclone supports the generation of code targetting GPUs through the addition of OpenACC directives. This is achieved by a user applying various OpenACC transformations to the PSyIR before the final Fortran code is generated. The steps to parallelisation are very similar to those in OpenMP with the added complexity of managing the movement of data to and from the GPU device. For the latter task PSyclone provides the ACCDataTrans and ACCEnterDataTrans transformations, as described in the Standard Functionality Section above. These two transformations add statically- and dynamically-scoped data regions, respectively. The former manages what data is on the remote device for a specific section of code while the latter allows run-time control of data movement. This second option is essential for minimising data movement as, without it, PSyclone-generated code would move data to and from the device upon every entry/exit of an Invoke. The first option is mainly provided as an aid to incremental porting and/or debugging of an OpenACC application as it provides explicit control over what data is present on a device for a given (part of an) Invoke routine.

The PGI compiler provides an alternative approach to controlling data movement through its ‘unified memory’ option (-ta=tesla:managed). When this is enabled the compiler itself takes on the task of ensuring that data is copied to/from the GPU when required. (Note that this approach can struggle with Fortran code containing derived types however.)

As well as ensuring the correct data is copied to and from the remote device, OpenACC directives must also be added to a code in order to tell the compiler how it should be parallelised. PSyclone provides the ACCKernelsTrans, ACCParallelTrans and ACCLoopTrans transformations for this purpose. The simplest of these is ACCKernelsTrans (currently only supported for the NEMO and Dynamo0.3 APIs) which encloses the code represented by a sub-tree of the PSyIR within an OpenACC kernels region. This essentially gives free-reign to the compiler to automatically parallelise any suitable loops within the specified region. An example of the use of ACCDataTrans and ACCKernelsTrans may be found in PSyclone/examples/nemo/eg3 and an example of ACCKernelsTrans may be found in PSyclone/examples/lfric/eg14.

However, as with any “automatic” approach, a more performant solution can almost always be obtained by providing the compiler with more explicit direction on how to parallelise the code. The ACCParallelTrans and ACCLoopTrans transformations allow the user to define thread-parallel regions and, within those, define which loops should be parallelised. For an example of their use please see PSyclone/examples/gocean/eg2 or PSyclone/examples/lfric/eg14.

In order for a given section of code to be executed on a GPU, any routines called from within that section must also have been compiled for the GPU. This then requires either that any such routines are in-lined or that the OpenACC routine directive be added to any such routines. This situation will occur routinely in those PSyclone APIs that use the PSyKAl separation of concerns since the user-supplied kernel routines are called from within PSyclone-generated loops in the PSy layer. PSyclone therefore provides the ACCRoutineTrans transformation which, given a Kernel node in the PSyIR, creates a new version of that kernel with the routine directive added. See either PSyclone/examples/gocean/eg2 or PSyclone/examples/lfric/eg14 for an example.

SIR

It is currently not possible for PSyclone to output SIR code without using a script. Three examples of such scripts are given in example 4 for the NEMO API. The first sir_trans.py simply outputs SIR. This will raise an exception if used with the tracer advection example as the example contains array-index notation which is not supported by the SIR backend, but will generate code for the other examples. The second, sir_trans_loop.py includes transformations to hoist code out of a loop, translate array-index notation into explicit loops and translate a single access to an array dimension to a one-trip loop (to make the code suitable for the SIR backend). This works with the tracer-advection example. The third script sir_trans_all.py additionally replaces any intrinsics with equivalent code and can also be used with the tracer-advection example (and the intrinsic_example.f90 example).