PSyIR : The PSyclone Internal Representation

The PSyIR is at the heart of PSyclone, representing code (at both the PSy- and kernel-layer levels) in a language-agnostic form. A PSyIR may be constructed from scratch (in Python) or by processing existing source code using a frontend. Transformations act on the PSyIR and ultimately the generated code is produced by one of the PSyIR’s backends.

PSyIR Nodes

The PSyIR consists of classes whose instances can be connected together to form a tree which represents computation in a language-independent way. These classes all inherit from the Node base class and, as a result, PSyIR instances are often referred to collectively as ‘PSyIR nodes’.

At present, PSyIR classes can essentially be split into two types: PSy-layer classes and Kernel-layer classes. PSy-layer classes use a gen_code() or an update() method to create Fortran code, whereas Kernel-layer classes use the PSyIR backends to create code.

Note

This separation will be removed in the future: eventually all PSyIR classes will use backends, and the gen_code() and update() methods are expected to be removed. Furthermore, this separation will be superseded by a separation between language-level PSyIR and domain-specific PSyIR.

PSy-layer nodes

PSy-layer PSyIR classes are primarily used to create the PSy-layer. These tend to be relatively descriptive and do not specify how a particular PSyclone frontend would implement them. With the exception of Loop, these classes are currently not compatible with the PSyIR backends. The generic (non-API-specific) PSy-layer PSyIR nodes are: InvokeSchedule, Directive, GlobalSum, HaloExchange, Loop and Kern. The Directive class is subclassed into many directives associated with OpenMP and OpenACC. The Kern class is subclassed into CodedKern, InlinedKern and BuiltinKern.

Kernel-layer nodes

Kernel-layer PSyIR classes are currently used to describe existing code in a language-independent way. Consequently these nodes are more prescriptive and are independent of any particular PSyclone frontend. These nodes are designed to be used with the PSyIR backends. Two PSy-layer classes (Loop and Schedule) can also be used as Kernel-layer classes. Additionally, the Schedule class is further subclassed into Routine and then into the kernel-layer KernelSchedule. In addition to KernelSchedule, the Kernel-layer PSyIR nodes are: Loop, IfBlock, CodeBlock, Assignment, Range, Reference, Operation, Literal, Call, Return and Container.

The Reference class is further subclassed into ArrayReference, StructureReference and ArrayOfStructuresReference; the Operation class is further subclassed into UnaryOperation, BinaryOperation and NaryOperation; and the Container class is further subclassed into FileContainer (representing a file that may contain more than one Container and/or Routine).

Those nodes representing references to structures (derived types in Fortran) have a Member child node representing the member of the structure being accessed. The Member class is further subclassed into StructureMember (a member of a structure that is itself a structure), ArrayMember (a member of a structure that is an array of primitive types) and ArrayOfStructuresMember (a member of a structure that is itself an array of structures).

Node Descriptions

The Range node

class psyclone.psyir.nodes.Range(ast=None, children=None, parent=None, annotations=None)[source]

The Range node is used to capture a range of integers via start, stop and step expressions. For example, start=2, stop=6 and step=2 indicates the values 2, 4 and 6.
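The inclusive start/stop semantics in the example above can be checked with a few lines of plain Python. This is a toy model, not part of PSyclone: range_values is a hypothetical helper that assumes Fortran-style triplet semantics, where stop is included if the step lands on it and is never exceeded.

```python
from typing import List


def range_values(start: int, stop: int, step: int = 1) -> List[int]:
    """Toy model (not PSyclone API): list the integers a Range denotes.

    Assumes Fortran triplet semantics: start, start+step, ... up to and
    including stop, but never beyond it.
    """
    if step == 0:
        raise ValueError("step must be non-zero")
    # Python's range() excludes its end point, so move the end one past
    # 'stop' in the direction of travel to make 'stop' inclusive.
    end = stop + 1 if step > 0 else stop - 1
    return list(range(start, end, step))


print(range_values(2, 6, 2))   # [2, 4, 6]
print(range_values(1, 5))      # [1, 2, 3, 4, 5]
print(range_values(6, 2, -2))  # [6, 4, 2]
```

The default step of 1 mirrors the default used by Range.create() described below.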

At the moment the only valid use of Range in the PSyIR is to describe a set of accesses to an array dimension (so-called array notation in Fortran). Therefore, the parent of a Range node should only be an array-access node such as ArrayReference.

The Range node has three children: the first captures the start of the range, the second captures the end of the range and the third captures the step within the range.

The node for each of the children must evaluate to an integer. Potentially valid nodes are therefore Literal, Reference, Operation and CodeBlock.

A common use case is to specify all the elements of a given array dimension without knowing the extent of that dimension. In the PSyIR this is achieved by using the LBOUND and UBOUND binary operators:

>>> from psyclone.psyir.nodes import (ArrayReference, BinaryOperation,
...                                  Literal, Range, Reference)
>>> from psyclone.psyir.symbols import (ArrayType, DataSymbol,
...                                    INTEGER_TYPE, REAL_TYPE)
>>> one = Literal("1", INTEGER_TYPE)
>>> # Declare a 1D real array called 'a' with 10 elements
>>> symbol = DataSymbol("a", ArrayType(REAL_TYPE, [10]))
>>> # Return the lower bound of the first dimension of array 'a'
>>> lbound = BinaryOperation.create(
...     BinaryOperation.Operator.LBOUND,
...     Reference(symbol), one)
>>> # Return the upper bound of the first dimension of array 'a'.
>>> # A node can only have one parent, so use a copy of 'one' here.
>>> ubound = BinaryOperation.create(
...     BinaryOperation.Operator.UBOUND,
...     Reference(symbol), one.copy())
>>> # Step defaults to 1 so no need to include it when creating range
>>> my_range = Range.create(lbound, ubound)
>>> # Create an access to all elements in the 1st dimension of array 'a'
>>> array_access = ArrayReference.create(symbol, [my_range])

In Fortran, the access array_access above can be written as a(:). The Fortran frontend and backend are both aware of array notation: the frontend is able to convert array notation to PSyIR and the backend is able to convert the PSyIR back to array notation.

static create(start, stop, step=None)[source]

Create an internally-consistent Range object. If no step is provided then it defaults to an integer Literal with value 1.

Parameters
  • start (psyclone.psyir.nodes.Node) – the PSyIR for the start value.

  • stop (psyclone.psyir.nodes.Node) – the PSyIR for the stop value.

  • step (psyclone.psyir.nodes.Node or NoneType) – the PSyIR for the increment/step or None.

Returns

a fully-populated Range object.

Return type

psyclone.psyir.nodes.ranges.Range

property start

Checks that this Range is valid and then returns the PSyIR for the starting value of the range.

Returns

the starting value of this range.

Return type

psyclone.psyir.nodes.Node

property step

Checks that this Range is valid and then returns the step (increment) value/expression.

Returns

the increment used in this range.

Return type

psyclone.psyir.nodes.Node

property stop

Checks that this Range is valid and then returns the end value/expression.

Returns

the end value of this range.

Return type

psyclone.psyir.nodes.Node

Text Representation

When developing a transformation script it is often necessary to examine the structure of the PSyIR. All nodes in the PSyIR have the view method that writes a text-representation of that node and all of its descendants to stdout. If the termcolor package is installed (see Getting Going) then colour highlighting is used for this output. For instance, part of the Schedule constructed for the second NEMO example is rendered as:

[Figure: coloured text view of part of the Schedule for the second NEMO example, with the index of each statement shown]

Note that in this view only those nodes which are children of Schedules have their indices shown. This means that nodes representing, e.g., loop bounds or the conditional part of if statements are not indexed. For the example shown, the PSyIR node representing the if(l_hst) code would be reached via schedule.children[6].if_body.children[1] or, using the shorthand notation (see below), schedule[6].if_body[1], where schedule is the overall parent Schedule node (omitted from the above image).

Tree Navigation

Each PSyIR node provides several ways to navigate the AST:

The children and parent properties (available in all nodes) provide a homogeneous way to move up and down the tree hierarchy. They are recommended when applying general operations or analysis to the tree. However, if one intends to navigate the tree in a way that depends on the type of node, the children and parent properties should be avoided: the structure of the tree may change between PSyclone versions, so such hard-coded navigation would not be future-proof.

To solve this issue some Nodes also provide methods for semantic navigation:

  • Schedule:

    subscript operator for indexing the statements (children) inside the Schedule, e.g. sched[3] or sched[2:4].

  • Assignment:
    Assignment.lhs()
    Returns

    the child node representing the Left-Hand Side of the assignment.

    Return type

    psyclone.psyir.nodes.Node

    Raises

    InternalError – Node has fewer children than expected.

    Assignment.rhs()
    Returns

    the child node representing the Right-Hand Side of the assignment.

    Return type

    psyclone.psyir.nodes.Node

    Raises

    InternalError – Node has fewer children than expected.

  • IfBlock:
    IfBlock.condition()

    Return the PSyIR Node representing the conditional expression of this IfBlock.

    Returns

    IfBlock conditional expression.

    Return type

    psyclone.psyir.nodes.Node

    Raises

    InternalError – If the IfBlock node does not have the correct number of children.

    IfBlock.if_body()

    Return the Schedule executed when the IfBlock evaluates to True.

    Returns

    Schedule to be executed when IfBlock evaluates to True.

    Return type

    psyclone.psyir.nodes.Schedule

    Raises

    InternalError – If the IfBlock node does not have the correct number of children.

    IfBlock.else_body()

    If available return the Schedule executed when the IfBlock evaluates to False, otherwise return None.

    Returns

    Schedule to be executed when IfBlock evaluates to False, if it doesn’t exist returns None.

    Return type

    psyclone.psyir.nodes.Schedule or NoneType

  • Array nodes (e.g. ArrayReference, ArrayOfStructuresReference):
    ArrayReference.indices()

    Supports semantic-navigation by returning the list of nodes representing the index expressions for this array reference.

    Returns

    the PSyIR nodes representing the array-index expressions.

    Return type

    list of psyclone.psyir.nodes.Node

    Raises

    InternalError – if this node has no children or if they are not valid array-index expressions.

  • RegionDirective:
    RegionDirective.dir_body()
    Returns

    the Schedule associated with this directive.

    Return type

    psyclone.psyir.nodes.Schedule

    Raises

    InternalError – if this node does not have a Schedule as its first child.

    RegionDirective.clauses()
    Returns

    the Clauses associated with this directive.

    Return type

    List of psyclone.psyir.nodes.Clause

  • Nodes representing accesses of data within a structure (e.g. StructureReference, StructureMember):
    StructureReference.member()
    Returns

    the member of the structure that this reference is to.

    Return type

    psyclone.psyir.nodes.Member

    Raises

    InternalError – if the first child of this node is not an instance of Member.

These are the recommended methods to navigate the tree for analysis or operations that depend on the Node type.

Additionally, the walk method (available in all nodes) is able to recurse through the tree and return objects of a given type. This is useful when the objective is to move down the tree to a specific node or list of nodes without information about the exact location.

Node.walk(my_type, stop_type=None)[source]

Recurse through the PSyIR tree and return all objects that are an instance of ‘my_type’, which is either a single class or a tuple of classes. In the latter case, all nodes that are instances of any of the classes in the tuple are returned. Recursion into the tree is halted when an instance of ‘stop_type’ (either a single class or a tuple of classes) is found. This can be used to avoid analysing, e.g., inlined kernels, or as a performance optimisation to reduce the number of recursive calls.

Parameters
  • my_type (type | Tuple[type, ...]) – the class(es) for which the instances are collected.

  • stop_type (Optional[type | Tuple[type, ...]]) – class(es) at which recursion is halted (optional).

Returns

list with all nodes that are instances of my_type starting at and including this node.

Return type

List[psyclone.psyir.nodes.Node]
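To make the effect of stop_type concrete, here is a toy re-implementation on a minimal, hypothetical Node class (not PSyclone's). It follows the behaviour described above: the search includes the starting node, accepts a class or a tuple of classes, and does not descend below a stop_type node. The traversal order (pre-order, depth-first) is an assumption of this sketch.

```python
from typing import List, Optional, Tuple, Type, Union

# A class or a tuple of classes, as accepted by isinstance().
TypeSpec = Union[Type, Tuple[Type, ...]]


class Node:
    """Hypothetical minimal node, used only to illustrate walk()."""

    def __init__(self, *children: "Node") -> None:
        self.children = list(children)

    def walk(self, my_type: TypeSpec,
             stop_type: Optional[TypeSpec] = None) -> List["Node"]:
        # Pre-order traversal: this node is tested before its descendants.
        found = [self] if isinstance(self, my_type) else []
        # A node of stop_type is still collected (above) if it matches
        # my_type, but its descendants are not visited.
        if stop_type is not None and isinstance(self, stop_type):
            return found
        for child in self.children:
            found.extend(child.walk(my_type, stop_type))
        return found


class Loop(Node):
    """Toy loop node."""


class Assignment(Node):
    """Toy assignment node."""


class InlinedKernel(Node):
    """Toy stand-in for an inlined kernel."""


tree = Node(Loop(Assignment(), InlinedKernel(Assignment())), Assignment())
print(len(tree.walk(Assignment)))                           # 3
print(len(tree.walk(Assignment, stop_type=InlinedKernel)))  # 2
print([type(n).__name__ for n in tree.walk((Loop, InlinedKernel))])
# ['Loop', 'InlinedKernel']
```

The second call shows the stop_type use case mentioned above: the assignment inside the (toy) inlined kernel is skipped.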

Finally, all nodes also provide the ancestor method which may be used to recurse back up the tree from a given node in order to find a node of a particular type:

Node.ancestor(my_type, excluding=None, include_self=False, limit=None)[source]

Search back up the tree and check whether this node has an ancestor that is an instance of the supplied type. If it does then that ancestor is returned, otherwise None is returned. An individual (sub-)class, or a tuple of (sub-)classes, to ignore may be provided via the excluding argument. If include_self is True then the current node is included in the search. If limit is provided then the search ceases if/when the supplied node is encountered.

Parameters
  • my_type (type | Tuple[type, ...]) – class(es) to search for.

  • excluding (Optional[type | Tuple[type, ...]]) – (sub-)class(es) to ignore or None.

  • include_self (bool) – whether or not to include this node in the search.

  • limit (Optional[psyclone.psyir.nodes.Node]) – an optional node at which to stop the search.

Returns

First ancestor Node that is an instance of any of the requested classes or None if not found.

Return type

Optional[psyclone.psyir.nodes.Node]

Raises
  • TypeError – if excluding is provided but is not a type or tuple of types.

  • TypeError – if limit is provided but is not an instance of Node.
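The upward search can likewise be sketched on a toy class. Node and its subclasses below are hypothetical stand-ins, not PSyclone classes. For brevity the sketch omits the limit argument and the TypeError checks; it only shows how include_self and excluding change the result.

```python
from typing import Optional, Tuple, Type, Union

TypeSpec = Union[Type, Tuple[Type, ...]]


class Node:
    """Hypothetical minimal node, used only to illustrate ancestor()."""

    def __init__(self, parent: Optional["Node"] = None) -> None:
        self.parent = parent

    def ancestor(self, my_type: TypeSpec,
                 excluding: Optional[TypeSpec] = None,
                 include_self: bool = False) -> Optional["Node"]:
        # Start from this node or its parent, then follow parent links.
        node = self if include_self else self.parent
        while node is not None:
            is_excluded = excluding is not None and isinstance(node, excluding)
            if isinstance(node, my_type) and not is_excluded:
                return node
            node = node.parent
        return None


class Routine(Node):
    """Toy routine node."""


class Loop(Node):
    """Toy loop node."""


class ParallelLoop(Loop):
    """Toy loop subclass, used to demonstrate 'excluding'."""


class Assignment(Node):
    """Toy assignment node."""


routine = Routine()
outer = Loop(routine)
inner = ParallelLoop(outer)
assign = Assignment(inner)

print(assign.ancestor(Loop) is inner)                          # True
print(assign.ancestor(Loop, excluding=ParallelLoop) is outer)  # True
print(outer.ancestor(Loop))                                    # None
print(outer.ancestor(Loop, include_self=True) is outer)        # True
```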

DataTypes

The PSyIR supports the following datatypes: ScalarType, ArrayType, StructureType, DeferredType, UnknownType and NoType. These datatypes are used when creating instances of DataSymbol, RoutineSymbol and Literal (although note that NoType may only be used with a RoutineSymbol). DeferredType and UnknownType are both used when processing existing code. The former is used when a symbol is being imported from some other scope (e.g. via a USE statement in Fortran) that hasn’t yet been resolved and the latter is used when an unsupported form of declaration is encountered.

More information on each of these various datatypes is given in the following subsections.

Scalar DataType

A Scalar datatype consists of an intrinsic and a precision.

The intrinsic can be one of INTEGER, REAL, BOOLEAN and CHARACTER.

The precision can be UNDEFINED, SINGLE, DOUBLE, an integer value specifying the precision in bytes, or a DataSymbol (see Section Symbols and Symbol Tables) that contains precision information. Note that UNDEFINED, SINGLE and DOUBLE leave the precision to be set by the system, so it may differ between architectures. For example:

>>> from psyclone.psyir.symbols import DataSymbol, ScalarType
>>> char_type = ScalarType(ScalarType.Intrinsic.CHARACTER,
...                        ScalarType.Precision.UNDEFINED)
>>> int_type = ScalarType(ScalarType.Intrinsic.INTEGER,
...                       ScalarType.Precision.SINGLE)
>>> bool_type = ScalarType(ScalarType.Intrinsic.BOOLEAN, 4)
>>> symbol = DataSymbol("rdef", int_type, constant_value=4)
>>> scalar_type = ScalarType(ScalarType.Intrinsic.REAL, symbol)

For convenience PSyclone predefines a number of scalar datatypes:

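REAL_TYPE, INTEGER_TYPE, BOOLEAN_TYPE and CHARACTER_TYPE all have precision set to UNDEFINED;

REAL_SINGLE_TYPE, REAL_DOUBLE_TYPE, INTEGER_SINGLE_TYPE and INTEGER_DOUBLE_TYPE have precision set to SINGLE or DOUBLE, as their names indicate;

REAL4_TYPE, REAL8_TYPE, INTEGER4_TYPE and INTEGER8_TYPE have precision set to 4 or 8 bytes, as their names indicate.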

Array DataType

An Array datatype itself has another datatype (or DataTypeSymbol) specifying the type of its elements and a shape. The shape can have an arbitrary number of dimensions. Each dimension captures what is known about its extent. It is necessary to distinguish between four cases:

  • An array has a static extent known at compile time (shape entry: ArrayType.ArrayBounds containing integer Literal values).

  • An array has an extent defined by another symbol or (constant) PSyIR expression (shape entry: ArrayType.ArrayBounds containing Reference or Operation nodes).

  • An array has a definite extent which is not known at compile time but can be queried at runtime (shape entry: ArrayType.Extent.ATTRIBUTE).

  • It is not known whether an array has memory allocated to it in the current scoping unit (shape entry: ArrayType.Extent.DEFERRED).

Here ArrayType.ArrayBounds is a namedtuple with lower and upper members holding the lower and upper bounds of the extent of a given array dimension.

The distinction between the last two cases is that in the former the extents are known but are kept internally with the array (for example, an assumed-shape array in Fortran), whereas in the latter the array has not yet been allocated any memory (for example, the declaration of an allocatable array in Fortran), so the extents may not have been defined yet.

For example:

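>>> from psyclone.psyir.nodes import Reference
>>> from psyclone.psyir.symbols import (
...     ArrayType, BOOLEAN_TYPE, DataSymbol, INTEGER_TYPE, REAL4_TYPE,
...     REAL8_TYPE)
>>> array_type = ArrayType(REAL4_TYPE, [5, 10])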

>>> n_var = DataSymbol("n", INTEGER_TYPE)
>>> array_type = ArrayType(INTEGER_TYPE, [Reference(n_var),
...                                       Reference(n_var)])

>>> array_type = ArrayType(REAL8_TYPE, [ArrayType.Extent.ATTRIBUTE,
...                                     ArrayType.Extent.ATTRIBUTE])

>>> array_type = ArrayType(BOOLEAN_TYPE, [ArrayType.Extent.DEFERRED])

Structure Datatype

A Structure datatype consists of a dictionary of components where the name of each component is used as the corresponding key. Each component is stored as a named tuple with name, datatype and visibility members.

For example:

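from psyclone.psyir.symbols import (
    ArrayType, DataSymbol, DataTypeSymbol, INTEGER4_TYPE, INTEGER_TYPE,
    ScalarType, StructureType, Symbol)

# A kind parameter used as the precision (the name 'r_def' is illustrative)
REAL_KIND = DataSymbol("r_def", INTEGER_TYPE)

# Shorthand for a scalar type with REAL_KIND precision
SCALAR_TYPE = ScalarType(ScalarType.Intrinsic.REAL, REAL_KIND)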

# Structure-type definition
GRID_TYPE = StructureType.create([
    ("dx", SCALAR_TYPE, Symbol.Visibility.PUBLIC),
    ("dy", SCALAR_TYPE, Symbol.Visibility.PUBLIC)])

GRID_TYPE_SYMBOL = DataTypeSymbol("grid_type", GRID_TYPE)

# A structure-type containing other structure types
FIELD_TYPE_DEF = StructureType.create(
    [("data", ArrayType(SCALAR_TYPE, [10]), Symbol.Visibility.PUBLIC),
     ("grid", GRID_TYPE_SYMBOL, Symbol.Visibility.PUBLIC),
     ("sub_meshes", ArrayType(GRID_TYPE_SYMBOL, [3]),
      Symbol.Visibility.PUBLIC),
     ("flag", INTEGER4_TYPE, Symbol.Visibility.PUBLIC)])

Unknown DataType

If a PSyIR frontend encounters an unsupported declaration then the corresponding Symbol is given UnknownType. The text of the original declaration is stored in the type object and is available via the declaration property.

NoType

NoType represents the empty type, equivalent to void in C. It is currently only used to describe a RoutineSymbol that has no return type (such as a Fortran subroutine).

Symbols and Symbol Tables

Some PSyIR nodes have an associated Symbol Table (psyclone.psyir.symbols.SymbolTable) which keeps a record of the Symbols (psyclone.psyir.symbols.Symbol) specified and used within them.

Symbol Tables can be nested (i.e. a node with an attached symbol table can be an ancestor or descendant of another node with an attached symbol table). If the same symbol name is used in a hierarchy of symbol tables, then the symbol in the table attached to the closest ancestor node is the one in scope. By default, symbol tables are aware of other symbol tables and will return information about relevant symbols from all of them.
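The closest-scope rule can be pictured as a walk up a chain of tables. The sketch below is a toy model, not PSyclone's SymbolTable: ScopedTable and its lookup() are hypothetical, store plain strings instead of Symbol objects, and omit the further options the real class provides.

```python
from typing import Dict, Optional


class ScopedTable:
    """Hypothetical stand-in for a symbol table attached to a node.

    'parent' points at the table of the closest ancestor scope, if any.
    Symbols are plain strings here to keep the sketch short.
    """

    def __init__(self, parent: Optional["ScopedTable"] = None) -> None:
        self.parent = parent
        self.symbols: Dict[str, str] = {}

    def lookup(self, name: str) -> str:
        # Search this scope first, then each enclosing scope in turn, so the
        # closest declaration of 'name' wins.
        table: Optional[ScopedTable] = self
        while table is not None:
            if name in table.symbols:
                return table.symbols[name]
            table = table.parent
        raise KeyError(f"'{name}' not found in this or any enclosing scope")


container = ScopedTable()
container.symbols.update({"n": "n declared in container",
                          "x": "x declared in container"})
routine = ScopedTable(parent=container)
routine.symbols["x"] = "x declared in routine"

print(routine.lookup("x"))    # x declared in routine
print(routine.lookup("n"))    # n declared in container
print(container.lookup("x"))  # x declared in container
```

The routine-level "x" shadows the container-level one, while "n" is only found by moving up to the enclosing scope.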

The SymbolTable has the following interface:

class psyclone.psyir.symbols.SymbolTable(node=None, default_visibility=Visibility.PUBLIC)[source]

Encapsulates the symbol table and provides methods to add new symbols and look up existing symbols. Nested scopes are supported and, by default, the add and lookup methods take any ancestor symbol tables into consideration (ones attached to nodes that are ancestors of the node that this symbol table is attached to). If the default visibility is not specified then it defaults to Symbol.Visibility.PUBLIC.

Parameters
  • node (psyclone.psyir.nodes.Schedule, psyclone.psyir.nodes.Container or NoneType) – reference to the Schedule or Container to which this symbol table belongs.

  • default_visibility – optional default visibility value for this symbol table, if not provided it defaults to PUBLIC visibility.

Raises

TypeError – if node argument is not a Schedule or a Container.

Where each element is a Symbol with an immutable name:

class psyclone.psyir.symbols.Symbol(name, visibility=Visibility.PUBLIC, interface=None)[source]

Generic Symbol item for the Symbol Table and PSyIR References. It has an immutable name label because it must always match with the key in the SymbolTable. If the symbol is private then it is only visible to those nodes that are descendants of the Node to which its containing Symbol Table belongs.

Parameters
  • name (str) – name of the symbol.

  • visibility (psyclone.psyir.symbols.Symbol.Visibility) – the visibility of the symbol.

  • interface (psyclone.psyir.symbols.symbol.SymbolInterface) – optional object describing the interface to this symbol (i.e. whether it is passed as a routine argument or accessed in some other way). Defaults to psyclone.psyir.symbols.LocalInterface

Raises

TypeError – if the name is not a str.

There are several Symbol sub-classes to represent different labeled entities in the PSyIR. At the moment the available symbols are:

  • class psyclone.psyir.symbols.ContainerSymbol(name, wildcard_import=False, **kwargs)[source]

    Symbol that represents a reference to a Container. The reference is lazily evaluated: the Symbol is created without parsing and importing the referenced container, which can instead be imported when needed.

    Parameters
    • name (str) – name of the symbol.

    • wildcard_import (bool) – whether all public Symbols of the Container are imported into the current scope. Defaults to False.

    • kwargs (unwrapped dict.) – additional keyword arguments provided by psyclone.psyir.symbols.Symbol.

  • class psyclone.psyir.symbols.DataSymbol(name, datatype, constant_value=None, **kwargs)[source]

    Symbol identifying a data element. It contains information about: the datatype, the shape (in column-major order) and the interface to that symbol (i.e. Local, Global, Argument).

    Parameters
    • name (str) – name of the symbol.

    • datatype (psyclone.psyir.symbols.DataType) – data type of the symbol.

    • constant_value (NoneType, item of TYPE_MAP_TO_PYTHON or psyclone.psyir.nodes.Node) – sets a fixed known expression as a permanent value for this DataSymbol. If the value is None then this symbol does not have a fixed constant. Otherwise it can receive PSyIR expressions or Python intrinsic types available in the TYPE_MAP_TO_PYTHON map. By default it is None.

    • kwargs (unwrapped dict.) – additional keyword arguments provided by psyclone.psyir.symbols.TypedSymbol

  • class psyclone.psyir.symbols.RoutineSymbol(name, datatype=None, **kwargs)[source]

    Symbol identifying a callable routine.

    Parameters
    • name (str) – name of the symbol.

    • datatype (psyclone.psyir.symbols.DataType) – data type of the symbol. Defaults to NoType().

    • kwargs (unwrapped dict.) – additional keyword arguments provided by psyclone.psyir.symbols.TypedSymbol

See the reference guide for the full API documentation of the SymbolTable and the Symbol types.

Symbol Interfaces

Each symbol has a Symbol Interface with the information about how the variable data is provided into the local context. The currently available Interfaces are:

  • class psyclone.psyir.symbols.LocalInterface[source]

    The symbol just exists in the Local context

  • class psyclone.psyir.symbols.ImportInterface(container_symbol)[source]

    Describes the interface to a Symbol that is imported from an external PSyIR container.

    Parameters

    container_symbol (psyclone.psyir.symbols.ContainerSymbol) – symbol representing the external container from which the symbol is imported.

    Raises

    TypeError – if the container_symbol is not a ContainerSymbol.

  • class psyclone.psyir.symbols.ArgumentInterface(access=None)[source]

    Captures the interface to a Symbol that is accessed as a routine argument.

    Parameters

    access (psyclone.psyir.symbols.ArgumentInterface.Access) – specifies how the argument is used in the Schedule

  • class psyclone.psyir.symbols.UnresolvedInterface[source]

    We have a symbol but we don’t know where it is declared.

Creating PSyIR

Symbol names

PSyIR symbol names can be specified by a user. For example:

var_name = "my_name"
symbol_table = SymbolTable()
data = DataSymbol(var_name, REAL_TYPE)
symbol_table.add(data)
reference = Reference(data)

However, the SymbolTable add() method will raise an exception if a user tries to add a symbol with the same name as a symbol already existing in the symbol table.

Alternatively, the SymbolTable provides the new_symbol() method (see Section Symbols and Symbol Tables for more details), which generates a name distinct from all existing names in the symbol table. By default the generated name is the value of the PSYIR_ROOT_NAME variable specified in the DEFAULT section of the PSyclone config file, followed by an optional “_” and an integer. For example, the following code:

from psyclone.psyir.symbols import SymbolTable
symbol_table = SymbolTable()
for i in range(0, 3):
    var_name = symbol_table.new_symbol().name
    print(var_name)

gives the following output:

psyir_tmp
psyir_tmp_0
psyir_tmp_1

As the root name (psyir_tmp in the example above) is specified in PSyclone’s config file it can be set to whatever the user wants.

Note

The particular format used to create a unique name is the responsibility of the SymbolTable class and may change in the future.

A user might want to create a name that has some meaning in the context in which it is used e.g. idx for an index, i for an iterator, or temp for a temperature field. To support more readable names, the new_symbol() method allows the user to specify a root name as an argument to the method which then takes the place of the default root name. For example, the following code:

from psyclone.psyir.symbols import SymbolTable
symbol_table = SymbolTable()
for i in range(0, 3):
    var_name = symbol_table.new_symbol(root_name="something").name
    print(var_name)

gives the following output:

something
something_0
something_1
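The naming pattern in the two outputs above can be reproduced with a short helper. This is a toy model written only to match that output: new_name is hypothetical, it is not how SymbolTable is implemented, and, as the note above says, the real format may change.

```python
from typing import Set


def new_name(existing: Set[str], root_name: str = "psyir_tmp") -> str:
    """Toy model (not PSyclone's algorithm) of unique-name generation.

    Returns root_name if it is free, otherwise root_name followed by "_"
    and the smallest non-negative integer giving an unused name. The new
    name is recorded in 'existing'.
    """
    name = root_name
    index = 0
    while name in existing:
        name = f"{root_name}_{index}"
        index += 1
    existing.add(name)
    return name


taken: Set[str] = set()
print([new_name(taken) for _ in range(3)])
# ['psyir_tmp', 'psyir_tmp_0', 'psyir_tmp_1']
print([new_name(taken, root_name="something") for _ in range(3)])
# ['something', 'something_0', 'something_1']
```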

By default, new_symbol() creates generic symbols, but often the user will want to specify a Symbol subclass with some given parameters. The new_symbol() method accepts a symbol_type parameter to specify the subclass. Arguments for the constructor of that subclass may be supplied as keyword arguments. For example, the following code:

from psyclone.psyir.symbols import SymbolTable, DataSymbol, REAL_TYPE
symbol_table = SymbolTable()
symbol_table.new_symbol(root_name="something",
                        symbol_type=DataSymbol,
                        datatype=REAL_TYPE,
                        constant_value=3.0)

declares a symbol named “something” of REAL_TYPE datatype where the constant_value argument will be passed to the DataSymbol constructor.

An example of using the new_symbol() method can be found in the PSyclone examples/psyir directory.

Nodes

PSyIR nodes are connected together via the parent and children properties provided by the Node base class.

These nodes can be created in isolation and then connected together. For example:

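from psyclone.psyir.nodes import Assignment, Literal, Reference
from psyclone.psyir.symbols import DataSymbol, REAL_TYPE
symbol = DataSymbol("a", REAL_TYPE)
assignment = Assignment()
literal = Literal("0.0", REAL_TYPE)
reference = Reference(symbol)
assignment.children = [reference, literal]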

However, as the connections get more complicated, creating them correctly can become difficult to manage and error-prone. Furthermore, in some cases children must be collected together within a Schedule (e.g. for IfBlock and for Loop).

To reduce this complexity, each Kernel-layer node that contains other nodes has a static create() method which helps to construct the PSyIR using a bottom-up approach. Using this method, the above example becomes:

literal = Literal("0.0", REAL_TYPE)
reference = Reference(symbol)
assignment = Assignment.create(reference, literal)

Creating the PSyIR to represent a complicated access of a member of a structure is best performed using the create() method of the appropriate Reference subclass. For a relatively straightforward access such as (the Fortran) field1%region%nx, this would be:

from psyclone.psyir.nodes import StructureReference
fld_sym = symbol_table.lookup("field1")
ref = StructureReference.create(fld_sym, ["region", "nx"])

where symbol_table is assumed to be a pre-populated Symbol Table containing an entry for “field1”.

A more complicated access involving arrays of structures such as field1%sub_grids(idx, 1)%nx would be constructed as:

from psyclone.psyir.symbols import INTEGER_TYPE
from psyclone.psyir.nodes import StructureReference, Reference, Literal
idx_sym = symbol_table.lookup("idx")
fld_sym = symbol_table.lookup("field1")
ref = StructureReference.create(fld_sym,
    [("sub_grids", [Reference(idx_sym), Literal("1", INTEGER_TYPE)]),
     "nx"])

Note that the list of quantities passed to the create() method now contains a 2-tuple in order to describe the array access.

More examples of using this approach can be found in the PSyclone examples/psyir directory.

Comparing PSyIR nodes

The == (equality) operator for PSyIR nodes performs a specialised equality check to compare the value of each node. This is also useful when comparing entire subtrees since the equality operator automatically recurses through the children and compares each child with the appropriate equality semantics, e.g.

# Is the loop upper bound expression exactly the same?
if loop1.stop_expr == loop2.stop_expr:
    print("Same upper bound!")

The equality operator automatically handles expressions like my_array%my_field(:3), including the derived-type fields and the range components, but it cannot detect symbolically equivalent expressions, i.e. my_array%my_field(:3) != my_array%my_field(:2+1).

Annotations and code comments are ignored in the equality comparison since they don’t alter the semantic meaning of the code. So these two statements compare as equal:

a = a + 1
a = a + 1 !Increases a by 1

Sometimes there are cases where one really means to check for the specific instance of a node. In this case, Python provides the is operator, e.g.

# Is the self instance part of this routine?
is_here = any(node is self for node in routine.walk(Node))

Additionally, PSyIR nodes cannot be used as dictionary keys or set members. The simplest workaround is to use the node’s id as the key:

node_map = {}
node_map[id(mynode)] = "element"
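The difference between value equality, identity and hashability can be seen with a plain Python stand-in. ToyNode below is hypothetical: a dataclass whose generated __eq__ compares fields (including the children list) recursively. In Python, a class that defines __eq__ but not __hash__ has its __hash__ set to None; we assume this is also why PSyIR nodes cannot be used as dictionary keys.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass  # eq=True by default, which generates a field-by-field __eq__
class ToyNode:
    """Hypothetical node with value-based (structural) equality."""
    label: str
    children: List["ToyNode"] = field(default_factory=list)


lhs = ToyNode("a + 1", [ToyNode("a"), ToyNode("1")])
rhs = ToyNode("a + 1", [ToyNode("a"), ToyNode("1")])

print(lhs == rhs)  # True: same structure and values, compared recursively
print(lhs is rhs)  # False: two distinct instances

# Defining __eq__ without __hash__ makes instances unhashable.
try:
    _ = {lhs: "element"}
except TypeError as err:
    print("unhashable:", err)

# Keying on id() distinguishes instances even when they compare equal.
node_map = {id(lhs): "first", id(rhs): "second"}
print(len(node_map))  # 2
```

One caveat of id()-based keys: Python may reuse an id once an object has been garbage-collected, so the nodes should stay alive (e.g. remain referenced by the tree) for as long as the map is in use.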

Modifying the PSyIR

Once we have a complete PSyIR AST there are two ways to modify its contents and/or structure: by applying transformations (see next section Transformations), or by using the PSyIR API methods directly. This section describes some of the methods that the PSyIR classes provide to modify the PSyIR AST in a consistent way (i.e. without breaking its many internal references). Some complete examples of modifying the PSyIR can be found in the PSyclone examples/psyir/modify.py script.

The rest of this section introduces examples of the available direct PSyIR modification methods.

Renaming symbols

The symbol table provides the rename_symbol() method which, given a symbol and an unused name, renames the symbol. The renaming affects all the references to that symbol in the PSyIR AST. For example, the PSyIR representing the following Fortran code:

subroutine work(psyir_tmp)
    real, intent(inout) :: psyir_tmp
    psyir_tmp=0.0
end subroutine

could be modified by the following PSyIR statements:

symbol = symbol_table.lookup("psyir_tmp")
symbol_table.rename_symbol(symbol, "new_variable")

which would result in the following Fortran output code:

subroutine work(new_variable)
    real, intent(inout) :: new_variable
    new_variable=0.0
end subroutine
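One way to see why a single rename can update every use is that a Reference is created from the symbol object itself (as in Reference(symbol) above) rather than from a copy of its name. The toy model below uses hypothetical ToySymbol, ToyReference and rename_symbol, not PSyclone classes, to illustrate that idea. It also re-keys the table, since a table entry must always match its symbol's name. That PSyclone's internals work exactly this way is an assumption.

```python
from typing import Dict


class ToySymbol:
    """Hypothetical symbol. Unlike PSyclone's Symbol, its name is mutable."""

    def __init__(self, name: str) -> None:
        self.name = name


class ToyReference:
    """Hypothetical reference: it stores the symbol object, not its name."""

    def __init__(self, symbol: ToySymbol) -> None:
        self.symbol = symbol

    @property
    def name(self) -> str:
        return self.symbol.name


def rename_symbol(table: Dict[str, ToySymbol], symbol: ToySymbol,
                  new_name: str) -> None:
    """Rename 'symbol' in place, keeping the table keyed by name."""
    if new_name in table:
        raise KeyError(f"'{new_name}' is already in use")
    del table[symbol.name]
    symbol.name = new_name
    table[new_name] = symbol


tmp = ToySymbol("psyir_tmp")
table = {tmp.name: tmp}
refs = [ToyReference(tmp) for _ in range(3)]  # three uses of one symbol

rename_symbol(table, tmp, "new_variable")
print([ref.name for ref in refs])  # ['new_variable', 'new_variable', 'new_variable']
print(sorted(table))               # ['new_variable']
```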

Specialising symbols

The Symbol class provides the method specialise() that given a subclass of Symbol will change the Symbol instance to the specified subclass. If the subclass has any additional properties then these would need to be set explicitly.

from psyclone.psyir.symbols import RoutineSymbol, Symbol
symbol = Symbol("name")
symbol.specialise(RoutineSymbol)
# The same object is now a RoutineSymbol

This method is useful as it allows the class of a symbol to be changed without affecting any references to it.
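Python allows an object's class to be changed in place by assigning to its __class__ attribute, which is one way such a method can work without touching any references. The toy below (hypothetical ToySymbol and ToyRoutineSymbol, not PSyclone classes) uses that mechanism; whether specialise() does exactly this is an assumption. It also shows why extra properties must be set explicitly: the subclass constructor is not run.

```python
class ToySymbol:
    """Hypothetical symbol class, used only to illustrate specialising."""

    def __init__(self, name: str) -> None:
        self.name = name

    def specialise(self, subclass: type) -> None:
        if not issubclass(subclass, type(self)):
            raise TypeError(f"{subclass.__name__} is not a subclass of "
                            f"{type(self).__name__}")
        # Change the class of this very object; its identity, and therefore
        # every reference to it, is unchanged. Note that subclass.__init__
        # is NOT run, so any extra attributes must be set explicitly.
        self.__class__ = subclass


class ToyRoutineSymbol(ToySymbol):
    """Hypothetical subclass with an extra attribute."""

    def __init__(self, name: str, datatype: str = "NoType") -> None:
        super().__init__(name)
        self.datatype = datatype


symbol = ToySymbol("name")
users = [symbol, symbol]  # stand-ins for references to the symbol

symbol.specialise(ToyRoutineSymbol)
print(type(users[0]).__name__, users[0] is symbol)  # ToyRoutineSymbol True
print(hasattr(symbol, "datatype"))                  # False
symbol.datatype = "NoType"  # set the extra property by hand
```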

Replacing PSyIR nodes

In certain cases one might want to replace a node in a PSyIR tree with another node. All nodes provide the replace_with() method to replace the node and its descendants with another given node and its descendants.

node.replace_with(new_node)

Detaching PSyIR nodes

Sometimes we may wish to detach a PSyIR subtree from the root tree without deleting it altogether, since it may be re-inserted in another location. To achieve this, all nodes provide the detach() method:

tmp = node.detach()

Copying nodes

Copying a PSyIR node and its children is often useful in order to avoid repeating the creation of similar PSyIR subtrees. The result of the copy allows the modification of the original and the copied subtrees independently, without altering the other subtree. Note that this is not equivalent to the Python copy or deepcopy functionality provided in the copy library. This method performs a bespoke copy operation where some components of the tree, like children, are recursively copied, while others, like the top-level parent reference are not.

new_node = node.copy()
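The behaviour of replace_with(), detach() and copy() can be mimicked on a small stand-alone class. ToyNode below is hypothetical, not PSyclone's Node. It keeps parent and children links consistent and refuses to give a node two parents, a rule we assume for the toy. It also copies subtrees recursively while leaving the copy's top-level parent unset, as described above.

```python
from typing import List, Optional


class ToyNode:
    """Hypothetical tree node illustrating detach(), replace_with(), copy()."""

    def __init__(self, label: str,
                 children: Optional[List["ToyNode"]] = None) -> None:
        self.label = label
        self.parent: Optional["ToyNode"] = None
        self.children: List["ToyNode"] = []
        for child in children or []:
            self.add_child(child)

    def add_child(self, child: "ToyNode") -> None:
        # Toy rule: a node may only belong to one tree at a time.
        if child.parent is not None:
            raise ValueError("node already has a parent: detach or copy it")
        child.parent = self
        self.children.append(child)

    def _position(self) -> int:
        # Find this node by identity, not equality, among its siblings.
        return next(i for i, sibling in enumerate(self.parent.children)
                    if sibling is self)

    def detach(self) -> "ToyNode":
        if self.parent is not None:
            del self.parent.children[self._position()]
            self.parent = None
        return self

    def replace_with(self, new_node: "ToyNode") -> None:
        if self.parent is None or new_node.parent is not None:
            raise ValueError("need an attached node and a parentless replacement")
        self.parent.children[self._position()] = new_node
        new_node.parent = self.parent
        self.parent = None

    def copy(self) -> "ToyNode":
        # Children are copied recursively; the new top-level node has no parent.
        return ToyNode(self.label, [child.copy() for child in self.children])


root = ToyNode("root", [ToyNode("a"), ToyNode("b", [ToyNode("c")])])
b_node = root.children[1]

duplicate = b_node.copy()
print(duplicate.parent is None, duplicate.children[0] is b_node.children[0])  # True False

removed = b_node.detach()
print([n.label for n in root.children], removed.parent is None)  # ['a'] True

root.children[0].replace_with(ToyNode("z"))
print([n.label for n in root.children])  # ['z']
```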

Named arguments

The Call and three sub-classes of Operation node (UnaryOperation, BinaryOperation and NaryOperation) all support named arguments.

Named arguments can be set or modified via the create(), append_named_arg(), insert_named_arg() or replace_named_arg() methods.

If an argument is inserted directly (via the children list) then it is assumed that this is not a named argument. If the top node of an argument is replaced then it is assumed that this argument is no longer a named argument. If arguments are re-ordered then the names follow the re-ordering.

The names of named arguments can be accessed via the argument_names property. This list has an entry for each argument and either contains a name or None (if this is not a named argument).

The PSyIR does not constrain which arguments are specified as being named and what those names are. It is the developer’s responsibility to make sure that these names are consistent with any intrinsics that will be generated by the back-end. In the future, it is expected that the PSyIR will know about the number and type of arguments expected by Operation nodes, beyond simply being unary, binary or nary.

One restriction that Fortran has (but the PSyIR does not) is that all named arguments should be at the end of the argument list. If this is not the case then the Fortran backend writer will raise an exception.
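The bookkeeping described above, and the Fortran ordering rule, can be modelled by pairing each argument with an optional name. This is a toy sketch, not PSyclone's implementation: argument_names and check_fortran_order are hypothetical helpers, and the error message is invented for illustration. Because each name travels with its argument, re-ordering the list keeps the names attached, which is how a valid argument list can become invalid for Fortran.

```python
from typing import List, Optional, Tuple

# Each argument is modelled as (expression_text, name_or_None).
Argument = Tuple[str, Optional[str]]


def argument_names(args: List[Argument]) -> List[Optional[str]]:
    """Toy analogue of argument_names: one entry per argument."""
    return [name for _, name in args]


def check_fortran_order(args: List[Argument]) -> None:
    """Raise if a positional argument follows a named one."""
    seen_named = False
    for position, (_, name) in enumerate(args):
        if name is not None:
            seen_named = True
        elif seen_named:
            raise ValueError(f"positional argument {position} follows a "
                             f"named argument")


args: List[Argument] = [("array", None), ("1", "dim")]
print(argument_names(args))  # [None, 'dim']
check_fortran_order(args)    # accepted: the named argument is last

args.reverse()               # names move with their arguments
print(argument_names(args))  # ['dim', None]
try:
    check_fortran_order(args)
except ValueError as err:
    print("rejected:", err)
```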