PSyIR : The PSyclone Internal Representation
The PSyIR is at the heart of PSyclone, representing code (at both the PSy- and kernel-layer levels) in a language-agnostic form. A PSyIR may be constructed from scratch (in Python) or by processing existing source code using a frontend. Transformations act on the PSyIR and ultimately the generated code is produced by one of the PSyIR’s backends.
PSyIR Nodes
The PSyIR consists of classes whose instances can be connected
together to form a tree which represents computation in a
language-independent way. These classes all inherit from the Node
baseclass and, as a result, PSyIR instances are often referred to
collectively as ‘PSyIR nodes’.
At present, PSyIR classes can essentially be split into two
types: PSy-layer classes and Kernel-layer classes. PSy-layer classes
make use of a gen_code() method to create Fortran code whereas
Kernel-layer classes make use of PSyIR backends to create code.
Note
This separation will be removed in the future and eventually
all PSyIR classes will make use of backends, with the
expectation that gen_code() methods will be removed. Further,
this separation will be superseded by a separation between
language-level PSyIR and domain-specific PSyIR.
PSy-layer nodes
PSy-layer PSyIR classes are primarily used to create the
PSy-layer. These tend to be relatively descriptive and do not specify
how a particular PSyclone frontend would implement them. With the
exception of Loop, these classes are currently not compatible with
the PSyIR backends. The generic (non-api-specific) PSy-layer PSyIR
nodes are: InvokeSchedule, Directive, GlobalSum,
HaloExchange, Loop and Kern. The Directive class is
subclassed into many directives associated with OpenMP and
OpenACC. The Kern class is subclassed into CodedKern,
InlinedKern and BuiltinKern.
Kernel-layer nodes
Kernel-layer PSyIR classes are currently used to describe existing
code in a language-independent way. Consequently these nodes are more
prescriptive and are independent of a particular PSyclone
frontend. These nodes are designed to be used with PSyIR backends. Two
PSy-layer classes (Loop and Schedule) can also be used as
Kernel-layer classes. Additionally, the Schedule class is further
subclassed into a Routine and then a kernel-layer
KernelSchedule. In addition to KernelSchedule, Kernel-layer
PSyIR nodes are: Loop, WhileLoop, IfBlock, CodeBlock,
Assignment, Range, Reference, Operation, Literal, Call,
Return and Container. The Reference class is further
subclassed into ArrayReference, StructureReference and
ArrayOfStructuresReference, the Operation class is further
subclassed into UnaryOperation and BinaryOperation, and
the Container class is further subclassed
into FileContainer (representing a file that may contain more than
one Container and/or Routine). Those nodes representing
references to structures (derived types in Fortran) have a Member
child node representing the member of the structure being
accessed. The Member class is further subclassed into
StructureMember (representing a member of a structure that is
itself a structure), ArrayMember (a member of a structure that is
an array of primitive types) and ArrayOfStructuresMember (a member
of a structure that is itself an array of structures).
Node Descriptions
The Range node
- class psyclone.psyir.nodes.Range(ast=None, children=None, parent=None, annotations=None)[source]
The Range node is used to capture a range of integers via start, stop and step expressions. For example, start=2, stop=6 and step=2 indicates the values 2, 4 and 6.
At the moment the only valid use of Range in the PSyIR is to describe a set of accesses to an Array dimension (so-called array notation in Fortran). Therefore, the parent of a Range node should only be an Array node.
The Range node has three child nodes: the first child captures the start of the range, the second child captures the end of the range and the third captures the step within the range.
The nodes for each of the children must return an integer. Potentially valid nodes are therefore Literal, Reference, Operation and CodeBlock.
A common use case is to want to specify all the elements of a given array dimension without knowing the extent of that dimension. In the PSyIR this is achieved by using the LBOUND and UBOUND intrinsics:
>>> one = Literal("1", INTEGER_TYPE)
>>> # Declare a 1D real array called 'a' with 10 elements
>>> symbol = DataSymbol("a", ArrayType(REAL_TYPE, [10]))
>>> # Return the lower bound of the first dimension of array 'a'
>>> lbound = IntrinsicCall.create(
...     IntrinsicCall.Intrinsic.LBOUND, [Reference(symbol), one.copy()])
>>> # Return the upper bound of the first dimension of array 'a'
>>> ubound = IntrinsicCall.create(
...     IntrinsicCall.Intrinsic.UBOUND, [Reference(symbol), one.copy()])
>>> # Step defaults to 1 so no need to include it when creating range
>>> my_range = Range.create(lbound, ubound)
>>> # Create an access to all elements in the 1st dimension of array 'a'
>>> array_access = ArrayReference.create(symbol, [my_range])
In Fortran the above access array_access can be represented by a(:). The Fortran front-ends and back-ends are aware of array notation. Therefore the Fortran frontend is able to convert array notation to PSyIR and the Fortran backend is able to convert PSyIR back to array notation.
- static create(start, stop, step=None)
Create an internally-consistent Range object. If no step is provided then it defaults to an integer Literal with value 1.
- Parameters:
start (psyclone.psyir.nodes.Node) – the PSyIR for the start value.
stop (psyclone.psyir.nodes.Node) – the PSyIR for the stop value.
step (psyclone.psyir.nodes.Node or NoneType) – the PSyIR for the increment/step or None.
parent (psyclone.psyir.nodes.Node or NoneType) – the parent node of this Range in the PSyIR.
- Returns:
a fully-populated Range object.
- Return type:
psyclone.psyir.nodes.Range
- property start
Checks that this Range is valid and then returns the PSyIR for the starting value of the range.
- Returns:
the starting value of this range.
- Return type:
psyclone.psyir.nodes.Node
- property step
Checks that this Range is valid and then returns the step (increment) value/expression.
- Returns:
the increment used in this range.
- Return type:
psyclone.psyir.nodes.Node
- property stop
Checks that this Range is valid and then returns the end value/expression.
- Returns:
the end value of this range.
- Return type:
psyclone.psyir.nodes.Node
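The inclusive start/stop/step semantics described for Range above differ from Python's built-in range(), which excludes its end point. The following plain-Python sketch enumerates the integers such a range denotes. Note that range_values is a hypothetical helper written for illustration only; it is not part of PSyclone and operates on plain integers rather than PSyIR nodes.

```python
def range_values(start, stop, step=1):
    """Enumerate the integers denoted by an inclusive start/stop/step range.

    Hypothetical helper for illustration only (not a PSyclone API).
    Unlike Python's built-in range(), the stop value is included,
    mirroring Fortran array-notation semantics.
    """
    if step == 0:
        raise ValueError("step must be non-zero")
    # Move the end point one unit past 'stop' (in the direction of
    # travel) so that 'stop' itself is included.
    end = stop + 1 if step > 0 else stop - 1
    return list(range(start, end, step))


# The example from the text: start=2, stop=6, step=2.
print(range_values(2, 6, 2))
```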
Text Representation
When developing a transformation script it is often necessary to examine
the structure of the PSyIR. All nodes in the PSyIR have the view method
that writes a text representation of that node and all of its
descendants to stdout. If the termcolor package is installed
(see Getting Going) then colour highlighting is used for this
output. For instance, part of the Schedule constructed for the second NEMO
example is rendered as:
Note that in this view, only those nodes which are children of
Schedules have their indices shown. This means that nodes representing
e.g. loop bounds or the conditional part of if statements are not
indexed. For the example shown, the PSyIR node representing the
if(l_hst) code would be reached by
schedule.children[6].if_body.children[1] or, using the shorthand
notation (see below), schedule[6].if_body[1] where schedule is
the overall parent Schedule node (omitted from the above image).
One problem with the view method is that the output can become very
large for big ASTs and is not readable for users unfamiliar with the PSyIR.
An alternative is the debug_string method, which generates a
text representation with Fortran-like syntax. In this representation
high-level constructs have not yet been lowered to Fortran and are
instead embedded as < node > expressions.
Tree Interrogation
Each PSyIR node provides several ways to interrogate the AST:
Following the parent and children terminology, we define a node’s siblings as the children of its parent. Note that this definition implies that all nodes are their own siblings.
- property Node.siblings
- Returns:
list of sibling nodes, including self.
- Return type:
List[
psyclone.psyir.nodes.Node
]
We can check whether two nodes are siblings which immediately precede or follow one another using the following methods:
- Node.immediately_precedes(node_2)[source]
- Returns:
True if this node immediately precedes node_2, False otherwise
- Return type:
bool
- Node.immediately_follows(node_1)[source]
- Returns:
True if this node immediately follows node_1, False otherwise
- Return type:
bool
Finally, the get_sibling_lists method provides functionality to walk over the tree associated with a node and gather those which are immediate siblings.
- Node.get_sibling_lists(my_type, stop_type=None)[source]
Recurse through the PSyIR tree and return lists of Nodes that are instances of ‘my_type’ and are immediate siblings. Here ‘my_type’ is either a single class or a tuple of classes. In the latter case all nodes are returned that are instances of any classes in the tuple. The recursion into the tree is stopped if an instance of ‘stop_type’ (which is either a single class or a tuple of classes) is found.
- Parameters:
- Returns:
list of lists, each of which containing nodes that are instances of my_type and are immediate siblings, starting at and including this node.
- Return type:
List[List[
psyclone.psyir.nodes.Node
]]
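The sibling terminology above can be made concrete with a toy tree. The sketch below is not PSyclone code: ToyNode is a hypothetical stand-in for the Node class that models only the parent/children links, and its method names are borrowed from the descriptions above purely for illustration.

```python
class ToyNode:
    """Minimal stand-in for a tree node (hypothetical, not PSyclone's Node)."""

    def __init__(self, name, children=None):
        self.name = name
        self.parent = None
        self.children = []
        for child in children or []:
            child.parent = self
            self.children.append(child)

    @property
    def siblings(self):
        # Siblings are the children of the parent, so a node is always
        # one of its own siblings. A parentless node is its only sibling.
        if self.parent is None:
            return [self]
        return list(self.parent.children)

    def _position(self):
        # Locate this node among its siblings by identity, not equality.
        for idx, sibling in enumerate(self.siblings):
            if sibling is self:
                return idx
        raise RuntimeError("node is missing from its parent's children")

    def immediately_precedes(self, other):
        # True only if both nodes share a parent and 'other' is next.
        return (self.parent is not None
                and other.parent is self.parent
                and other._position() == self._position() + 1)

    def immediately_follows(self, other):
        return other.immediately_precedes(self)


a, b, c = ToyNode("a"), ToyNode("b"), ToyNode("c")
root = ToyNode("root", [a, b, c])
print([node.name for node in b.siblings])
print(a.immediately_precedes(b), a.immediately_precedes(c))
```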
DataTypes
The PSyIR supports the following datatypes: ScalarType,
ArrayType, StructureType, UnresolvedType, UnsupportedType
and NoType. These datatypes are used when creating instances of
DataSymbol, RoutineSymbol and Literal (although note that NoType may
only be used with a RoutineSymbol). UnresolvedType and UnsupportedType
are both used when processing existing code. The former is used
when a symbol is being imported from some other scope (e.g. via a USE
statement in Fortran) that hasn’t yet been resolved and the latter is
used when an unsupported form of declaration is encountered.
More information on each of these various datatypes is given in the following subsections.
Scalar DataType
A Scalar datatype consists of an intrinsic and a precision.
The intrinsic can be one of INTEGER, REAL, BOOLEAN and CHARACTER.
The precision can be UNDEFINED, SINGLE, DOUBLE, an integer
value specifying the precision in bytes, or a datasymbol (see Section
Symbols and Symbol Tables) that contains precision information. Note that
UNDEFINED, SINGLE and DOUBLE allow the precision to be set
by the system so may be different for different architectures. For
example:
>>> char_type = ScalarType(ScalarType.Intrinsic.CHARACTER,
... ScalarType.Precision.UNDEFINED)
>>> int_type = ScalarType(ScalarType.Intrinsic.INTEGER,
... ScalarType.Precision.SINGLE)
>>> bool_type = ScalarType(ScalarType.Intrinsic.BOOLEAN, 4)
>>> symbol = DataSymbol("rdef", int_type, initial_value=4)
>>> scalar_type = ScalarType(ScalarType.Intrinsic.REAL, symbol)
For convenience PSyclone predefines a number of scalar datatypes:
REAL_TYPE, INTEGER_TYPE, BOOLEAN_TYPE and CHARACTER_TYPE all have precision set to UNDEFINED;
REAL_SINGLE_TYPE, REAL_DOUBLE_TYPE, INTEGER_SINGLE_TYPE and INTEGER_DOUBLE_TYPE;
REAL4_TYPE, REAL8_TYPE, INTEGER4_TYPE and INTEGER8_TYPE.
Array DataType
An Array datatype itself has another datatype (or DataTypeSymbol)
specifying the type of its elements and a shape. The shape can have an
arbitrary number of dimensions. Each dimension captures what is known
about its extent. It is necessary to distinguish between four cases:
Description | Entry in shape list |
---|---|
An array has a static extent known at compile time. | ArrayType.ArrayBounds containing integer Literal values |
An array has an extent defined by another symbol or (constant) PSyIR expression. | ArrayType.ArrayBounds containing Reference or Operation nodes |
An array has a definite extent which is not known at compile time but can be queried at runtime. | ArrayType.Extent.ATTRIBUTE |
It is not known whether an array has memory allocated to it in the current scoping unit. | ArrayType.Extent.DEFERRED |
where ArrayType.ArrayBounds is a namedtuple with lower and
upper members holding the lower- and upper-bounds of the extent of a
given array dimension.
The distinction between the last two cases is that in the former the extents are known but are kept internally with the array (for example an assumed-shape array in Fortran), whereas in the latter the array has not yet been allocated any memory (for example the declaration of an allocatable array in Fortran) so the extents may not have been defined yet.
For example:
>>> array_type = ArrayType(REAL4_TYPE, [5, 10])
>>> n_var = DataSymbol("n", INTEGER_TYPE)
>>> array_type = ArrayType(INTEGER_TYPE, [Reference(n_var),
... Reference(n_var)])
>>> array_type = ArrayType(REAL8_TYPE, [ArrayType.Extent.ATTRIBUTE,
... ArrayType.Extent.ATTRIBUTE])
>>> array_type = ArrayType(BOOLEAN_TYPE, [ArrayType.Extent.DEFERRED])
Structure Datatype
A Structure datatype consists of a dictionary of components where the
name of each component is used as the corresponding key. Each component
is stored as a named tuple with name, datatype and visibility
members.
For example:
# Shorthand for a scalar type with REAL_KIND precision
SCALAR_TYPE = ScalarType(ScalarType.Intrinsic.REAL, REAL_KIND)
# Structure-type definition
GRID_TYPE = StructureType.create([
("dx", SCALAR_TYPE, Symbol.Visibility.PUBLIC),
("dy", SCALAR_TYPE, Symbol.Visibility.PUBLIC)])
GRID_TYPE_SYMBOL = DataTypeSymbol("grid_type", GRID_TYPE)
# A structure-type containing other structure types
FIELD_TYPE_DEF = StructureType.create(
[("data", ArrayType(SCALAR_TYPE, [10]), Symbol.Visibility.PUBLIC),
("grid", GRID_TYPE_SYMBOL, Symbol.Visibility.PUBLIC),
("sub_meshes", ArrayType(GRID_TYPE_SYMBOL, [3]),
Symbol.Visibility.PUBLIC),
("flag", INTEGER4_TYPE, Symbol.Visibility.PUBLIC)])
Unsupported DataType
If a PSyIR frontend encounters an unsupported declaration then the
corresponding Symbol is given UnsupportedType.
The text of the original declaration is stored in the type object and is
available via the declaration property.
NoType
NoType represents the empty type, equivalent to void in C. It
is currently only used to describe a RoutineSymbol that has no return
type (such as a Fortran subroutine).
Symbols and Symbol Tables
Some PSyIR nodes have an associated Symbol Table (psyclone.psyir.symbols.SymbolTable) which keeps a record of the Symbols (psyclone.psyir.symbols.Symbol) specified and used within them.
Symbol Tables can be nested (i.e. a node with an attached symbol table can be an ancestor or descendant of another node with an attached symbol table). If the same symbol name is used in a hierarchy of symbol tables then the symbol within the symbol table attached to the closest ancestor node is in scope. By default, symbol tables are aware of other symbol tables and will return information about relevant symbols from all symbol tables.
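The "closest ancestor wins" rule for nested symbol tables can be sketched with a toy scope chain. This is an illustrative model only: ToyScope and its methods are hypothetical and much simpler than the real SymbolTable (for instance, this add() only checks the local scope, which is a simplifying assumption).

```python
class ToyScope:
    """Toy nested scope (hypothetical, not PSyclone's SymbolTable)."""

    def __init__(self, parent=None):
        self.parent = parent      # enclosing scope, or None for the outermost
        self._symbols = {}        # name -> arbitrary symbol information

    def add(self, name, info):
        # Simplification: only clashes within this scope are rejected.
        if name in self._symbols:
            raise KeyError(f"'{name}' already exists in this scope")
        self._symbols[name] = info

    def lookup(self, name):
        # Walk outwards from this scope; the first match is the one
        # attached to the closest enclosing scope.
        scope = self
        while scope is not None:
            if name in scope._symbols:
                return scope._symbols[name]
            scope = scope.parent
        raise KeyError(f"'{name}' not found in any enclosing scope")


outer = ToyScope()
outer.add("x", "outer x")
outer.add("n", "outer n")
inner = ToyScope(parent=outer)
inner.add("x", "inner x")   # shadows the outer 'x'

print(inner.lookup("x"))    # found in the inner scope
print(inner.lookup("n"))    # found by searching the enclosing scope
```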
The SymbolTable has the following interface:
- class psyclone.psyir.symbols.SymbolTable(node=None, default_visibility=Visibility.PUBLIC)
Encapsulates the symbol table and provides methods to add new symbols and look up existing symbols. Nested scopes are supported and, by default, the add and lookup methods take any ancestor symbol tables into consideration (ones attached to nodes that are ancestors of the node that this symbol table is attached to). If the default visibility is not specified then it defaults to Symbol.Visibility.PUBLIC.
- Parameters:
node (psyclone.psyir.nodes.Schedule, psyclone.psyir.nodes.Container or NoneType) – reference to the Schedule or Container to which this symbol table belongs.
default_visibility – optional default visibility value for this symbol table; if not provided it defaults to PUBLIC visibility.
- Raises:
TypeError – if node argument is not a Schedule or a Container.
Where each element is a Symbol with an immutable name:
- class psyclone.psyir.symbols.Symbol(name, visibility=Visibility.PUBLIC, interface=None)[source]
Generic Symbol item for the Symbol Table and PSyIR References. It has an immutable name label because it must always match with the key in the SymbolTable. If the symbol is private then it is only visible to those nodes that are descendants of the Node to which its containing Symbol Table belongs.
- Parameters:
name (str) – name of the symbol.
visibility (psyclone.psyir.symbols.Symbol.Visibility) – the visibility of the symbol.
interface (Optional[psyclone.psyir.symbols.symbol.SymbolInterface]) – optional object describing the interface to this symbol (i.e. whether it is passed as a routine argument or accessed in some other way). Defaults to psyclone.psyir.symbols.AutomaticInterface.
- Raises:
TypeError – if the name is not a str.
There are several Symbol sub-classes to represent different
labeled entities in the PSyIR. At the moment the available symbols
are:
- class psyclone.psyir.symbols.ContainerSymbol(name, wildcard_import=False, **kwargs)[source]
Symbol that represents a reference to a Container. The reference is lazily evaluated: the Symbol is created without parsing and importing the referenced container, which can be imported later when needed.
- Parameters:
name (str) – name of the symbol.
wildcard_import (bool) – if all public Symbols of the Container are imported into the current scope. Defaults to False.
kwargs (unwrapped dict.) – additional keyword arguments provided by psyclone.psyir.symbols.Symbol.
- class psyclone.psyir.symbols.DataSymbol(name, datatype, is_constant=False, initial_value=None, **kwargs)[source]
Symbol identifying a data element. It contains information about: the datatype, the shape (in column-major order) and the interface to that symbol (i.e. Local, Global, Argument).
- Parameters:
name (str) – name of the symbol.
datatype (psyclone.psyir.symbols.DataType) – data type of the symbol.
is_constant (bool) – whether this DataSymbol is a compile-time constant (default is False). If True then an initial_value must also be provided.
initial_value (Optional[item of TYPE_MAP_TO_PYTHON | psyclone.psyir.nodes.Node]) – sets a fixed known expression as an initial value for this DataSymbol. If is_constant is True then this Symbol will always have this value. If the value is None then this symbol does not have an initial value (and cannot be a constant). Otherwise it can receive PSyIR expressions or Python intrinsic types available in the TYPE_MAP_TO_PYTHON map. By default it is None.
kwargs (unwrapped dict.) – additional keyword arguments provided by psyclone.psyir.symbols.TypedSymbol.
- class psyclone.psyir.symbols.DataTypeSymbol(name, datatype, visibility=Visibility.PUBLIC, interface=None)[source]
Symbol identifying a user-defined type (e.g. a derived type in Fortran).
- Parameters:
name (str) – the name of this symbol.
datatype (psyclone.psyir.symbols.DataType) – the type represented by this symbol.
visibility (psyclone.psyir.symbols.Symbol.Visibility) – the visibility of this symbol.
interface (psyclone.psyir.symbols.SymbolInterface) – the interface to this symbol.
- class psyclone.psyir.symbols.IntrinsicSymbol(name, datatype=None, **kwargs)[source]
Symbol identifying a callable intrinsic routine.
- Parameters:
name (str) – name of the symbol.
datatype (psyclone.psyir.symbols.DataType) – data type of the symbol. Defaults to NoType().
kwargs (unwrapped dict.) – additional keyword arguments provided by psyclone.psyir.symbols.TypedSymbol.
- class psyclone.psyir.symbols.RoutineSymbol(name, datatype=None, **kwargs)[source]
Symbol identifying a callable routine.
- Parameters:
name (str) – name of the symbol.
datatype (psyclone.psyir.symbols.DataType) – data type of the symbol. Defaults to NoType().
kwargs (unwrapped dict.) – additional keyword arguments provided by psyclone.psyir.symbols.TypedSymbol.
See the reference guide for the full API documentation of the SymbolTable and the Symbol types.
Symbol Interfaces
Each symbol has a Symbol Interface with the information about how the variable data is provided into the local context. The currently available Interfaces are:
- class psyclone.psyir.symbols.ImportInterface(container_symbol, orig_name=None)[source]
Describes the interface to a Symbol that is imported from an external PSyIR container. The symbol can be renamed on import and, if so, its original name in the Container is specified using the optional ‘orig_name’ argument.
- Parameters:
container_symbol (psyclone.psyir.symbols.ContainerSymbol) – symbol representing the external container from which the symbol is imported.
orig_name (Optional[str]) – the name of the symbol in the external container before it is renamed, or None (the default) if it is not renamed.
- Raises:
TypeError – if the orig_name argument is an unexpected type.
Creating PSyIR
Symbol names
PSyIR symbol names can be specified by a user. For example:
var_name = "my_name"
symbol_table = SymbolTable()
data = DataSymbol(var_name, REAL_TYPE)
symbol_table.add(data)
reference = Reference(data)
However, the SymbolTable add() method will raise an exception if a
user tries to add a symbol with the same name as a symbol already existing
in the symbol table.
Alternatively, the SymbolTable also provides the new_symbol() method
(see Section Symbols and Symbol Tables for more details) that creates a
symbol whose name is distinct from any existing names in the symbol table.
By default the generated name is the value of the PSYIR_ROOT_NAME
variable specified in the DEFAULT section of the PSyclone config file,
followed by an optional “_” and an integer. For example, the following code:
from psyclone.psyir.symbols import SymbolTable
symbol_table = SymbolTable()
for i in range(0, 3):
    var_name = symbol_table.new_symbol().name
    print(var_name)
gives the following output:
psyir_tmp
psyir_tmp_0
psyir_tmp_1
As the root name (psyir_tmp in the example above) is specified in
PSyclone’s config file it can be set to whatever the user wants.
Note
The particular format used to create a unique name is the responsibility of the SymbolTable class and may change in the future.
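The naming pattern shown above (a root name, then the root name followed by “_” and an increasing integer) can be reproduced with a few lines of plain Python. This is only a sketch of one scheme that yields the same output: new_name is a hypothetical function, the set of names stands in for the symbol table, and the real SymbolTable implementation may differ and, as noted, may change.

```python
def new_name(existing, root_name="psyir_tmp"):
    """Return a name based on root_name that is not yet in 'existing'.

    Hypothetical helper for illustration only; 'existing' is a set of
    names that is updated in place with the newly chosen name.
    """
    candidate = root_name
    suffix = 0
    # Try the bare root name first, then root_0, root_1, ...
    while candidate in existing:
        candidate = f"{root_name}_{suffix}"
        suffix += 1
    existing.add(candidate)
    return candidate


names = set()
generated = [new_name(names) for _ in range(3)]
print(generated)
```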
A user might want to create a name that has some meaning in the
context in which it is used e.g. idx for an index, i for an
iterator, or temp for a temperature field. To support more
readable names, the new_symbol() method allows the user to specify a
root name as an argument to the method which then takes the place of
the default root name. For example, the following code:
from psyclone.psyir.symbols import SymbolTable
symbol_table = SymbolTable()
for i in range(0, 3):
    # new_symbol() returns a Symbol, so take its name before printing
    var_name = symbol_table.new_symbol(root_name="something").name
    print(var_name)
gives the following output:
something
something_0
something_1
By default, new_symbol() creates generic symbols, but often the user
will want to specify a Symbol subclass with some given parameters. The
new_symbol() method accepts a symbol_type parameter to specify the
subclass. Arguments for the constructor of that subclass may be supplied
as keyword arguments. For example, the following code:
from psyclone.psyir.symbols import SymbolTable, DataSymbol, REAL_TYPE
symbol_table = SymbolTable()
symbol_table.new_symbol(root_name="something",
                        symbol_type=DataSymbol,
                        datatype=REAL_TYPE,
                        is_constant=True,
                        initial_value=3)
declares a symbol named “something” of REAL_TYPE datatype where the
is_constant and initial_value arguments will be passed to the
DataSymbol constructor.
An example of using the new_symbol() method can be found in the
PSyclone examples/psyir directory.
Nodes
PSyIR nodes are connected together via parent and child methods
provided by the Node baseclass.
These nodes can be created in isolation and then connected together. For example:
assignment = Assignment()
literal = Literal("0.0", REAL_TYPE)
reference = Reference(symbol)
assignment.children = [reference, literal]
However, as connections get more complicated, creating the correct
connections can become difficult to manage and error prone. Further,
in some cases children must be collected together within a
Schedule (e.g. for IfBlock, Loop and WhileLoop).
To simplify this complexity, each of the Kernel-layer nodes which
contains other nodes has a static create method which helps
construct the PSyIR using a bottom-up approach. Using this method, the
above example then becomes:
literal = Literal("0.0", REAL_TYPE)
reference = Reference(symbol)
assignment = Assignment.create(reference, literal)
Creating the PSyIR to represent a complicated access of a member of a
structure is best performed using the create() method of the
appropriate Reference subclass. For a relatively straightforward
access such as (the Fortran) field1%region%nx, this would be:
from psyclone.psyir.nodes import StructureReference
fld_sym = symbol_table.lookup("field1")
ref = StructureReference.create(fld_sym, ["region", "nx"])
where symbol_table is assumed to be a pre-populated Symbol Table
containing an entry for “field1”.
A more complicated access involving arrays of structures such as
field1%sub_grids(idx, 1)%nx would be constructed as:
from psyclone.psyir.symbols import INTEGER_TYPE
from psyclone.psyir.nodes import StructureReference, Reference, Literal
idx_sym = symbol_table.lookup("idx")
fld_sym = symbol_table.lookup("field1")
ref = StructureReference.create(fld_sym,
    [("sub_grids", [Reference(idx_sym), Literal("1", INTEGER_TYPE)]),
     "nx"])
Note that the list of quantities passed to the create() method now
contains a 2-tuple in order to describe the array access.
More examples of using this approach can be found in the PSyclone
examples/psyir directory.
Comparing PSyIR nodes
The == (equality) operator for PSyIR nodes performs a specialised equality check
to compare the value of each node. This is also useful when comparing entire
subtrees since the equality operator automatically recurses through the children
and compares each child with the appropriate equality semantics, e.g.
# Is the loop upper bound expression exactly the same?
if loop1.stop_expr == loop2.stop_expr:
    print("Same upper bound!")
The equality operator will handle expressions like my_array%my_field(:3) with the
derived type fields and the range components automatically, but it cannot handle
symbolically equivalent fields, i.e. my_array%my_field(:3) != my_array%my_field(:2+1).
Annotations and code comments are ignored in the equality comparison since they don’t alter the semantic meaning of the code. So these two statements compare as equal:
a = a + 1
a = a + 1 !Increases a by 1
Sometimes there are cases where one really means to check for the specific instance
of a node. In this case, Python provides the is operator, e.g.
# Is the self instance part of this routine?
is_here = any(node is self for node in routine.walk(Node))
Additionally, PSyIR nodes cannot be used as map keys or similar (for example as dictionary keys or set members). The easiest way to associate data with a specific node is to use its id as the key:
node_map = {}
node_map[id(mynode)] = "element"
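The difference between value equality, instance identity and id-based keys can be seen in a small stand-alone example. ToyLiteral below is a hypothetical class, not a PSyIR node. It relies on a standard Python rule: a class that defines __eq__ without also defining __hash__ becomes unhashable. Whether this is the precise reason PSyIR nodes cannot be used as keys is an assumption made for this sketch.

```python
class ToyLiteral:
    """Toy value-comparable node (hypothetical, not a PSyIR class)."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        # Value-based equality. Because __hash__ is not defined as well,
        # Python sets it to None and instances become unhashable.
        return isinstance(other, ToyLiteral) and self.value == other.value


one = ToyLiteral("1")
another = ToyLiteral("1")

print(one == another)   # compares values
print(one is another)   # compares instances

try:
    lookup = {one: "element"}
except TypeError as err:
    print("cannot use as key:", err)

# Keying on id() distinguishes the two instances even though they are equal.
node_map = {id(one): "first", id(another): "second"}
print(node_map[id(one)])
```

One caveat with this approach is that an id() value is only guaranteed to be unique while the object is alive, so such a map should not outlive the nodes it refers to.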
Modifying the PSyIR
Once we have a complete PSyIR AST there are two ways to modify its contents
and/or structure: by applying transformations (see next section
Transformations), or by direct PSyIR API methods. This section
describes some of the methods that the PSyIR classes provide to
modify the PSyIR AST in a consistent way (e.g. without breaking its many
internal references). Some complete examples of modifying the PSyIR can be found in the
PSyclone examples/psyir/modify.py script.
The rest of this section introduces examples of the available direct PSyIR modification methods.
Renaming symbols
The symbol table provides the method rename_symbol() that, given a symbol
and an unused name, will rename the symbol. The symbol renaming will affect
all the references in the PSyIR AST to that symbol. For example, the PSyIR
representing the following Fortran code:
subroutine work(psyir_tmp)
real, intent(inout) :: psyir_tmp
psyir_tmp=0.0
end subroutine
could be modified by the following PSyIR statements:
symbol = symbol_table.lookup("psyir_tmp")
symbol_table.rename_symbol(symbol, "new_variable")
which would result in the following Fortran output code:
subroutine work(new_variable)
real, intent(inout) :: new_variable
new_variable=0.0
end subroutine
Specialising symbols
The Symbol class provides the method specialise() that, given a
subclass of Symbol, will change the Symbol instance to the specified
subclass. If the subclass has any additional properties then these
would need to be set explicitly.
symbol = Symbol("name")
symbol.specialise(RoutineSymbol)
# Symbol is now a RoutineSymbol
This method is useful as it allows the class of a symbol to be changed without affecting any references to it.
Replacing PSyIR nodes
In certain cases one might want to replace a node in a PSyIR tree with another node. All nodes provide the replace_with() method to replace the node and its descendants with another given node and its descendants.
node.replace_with(new_node)
When the node being replaced is part of a named context (in Calls or Operations) the name of the argument is conserved by default. For example
call named_subroutine(name1=1)
call.children[0].replace_with(Literal('2', INTEGER_TYPE))
will become:
call named_subroutine(name1=2)
This behaviour can be changed with the keep_name_in_context parameter.
call.children[0].replace_with(
Literal('3', INTEGER_TYPE),
keep_name_in_context=False
)
will become:
call named_subroutine(3)
Detaching PSyIR nodes
Sometimes we may wish to detach a PSyIR subtree from the root tree without deleting it altogether, as it may be re-inserted in another location. To achieve this, all nodes provide the detach method:
tmp = node.detach()
Copying nodes
Copying a PSyIR node and its children is often useful in order to avoid
repeating the creation of similar PSyIR subtrees. The original and the
copied subtrees can then be modified independently, without one altering
the other. Note that this is not equivalent to the copy or deepcopy
functionality provided by the Python copy library. The copy() method
performs a bespoke copy operation where some components of the
tree, like children, are recursively copied, while others, like the top-level
parent reference, are not.
new_node = node.copy()
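The behaviour described for detach and copy can be illustrated with a toy tree. ToyTreeNode below is a hypothetical class, not PSyclone's Node, and models only what the text states: detaching removes a subtree from its parent and hands it back intact, while copying duplicates the children recursively and leaves the copy without a parent.

```python
class ToyTreeNode:
    """Toy tree node (hypothetical, not PSyclone's Node)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_child(self, child):
        child.parent = self
        self.children.append(child)

    def detach(self):
        # Unlink this node (with its subtree) from its parent and return it
        # so that it can be re-inserted elsewhere.
        if self.parent is not None:
            self.parent.children = [
                c for c in self.parent.children if c is not self]
            self.parent = None
        return self

    def copy(self):
        # Children are copied recursively; the new top-level node gets no
        # parent, so the copy is independent of the original tree.
        new_node = ToyTreeNode(self.name)
        for child in self.children:
            new_node.add_child(child.copy())
        return new_node


root = ToyTreeNode("root")
loop = ToyTreeNode("loop")
body = ToyTreeNode("body")
root.add_child(loop)
loop.add_child(body)

clone = loop.copy()
clone.children[0].name = "changed"   # does not affect the original 'body'

tmp = loop.detach()                  # 'loop' keeps its own children
print(body.name, clone.parent, [c.name for c in root.children])
```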
Named arguments
The Call node (and its sub-classes) supports named arguments.
Named arguments can be set or modified via the create(), append_named_arg(), insert_named_arg() or replace_named_arg() methods.
If an argument is inserted directly (via the children list) then it is assumed that this is not a named argument. If the top node of an argument is replaced by removing and inserting a new node then it is assumed that this argument is no longer a named argument. If it is replaced with the replace_with method, it has a keep_name_in_context argument to choose the desired behaviour (defaults to True). If arguments are re-ordered then the names follow the re-ordering.
The names of named arguments can be accessed via the argument_names property. This list has an entry for each argument and either contains a name or None (if this is not a named argument).
The PSyIR does not constrain which arguments are specified as being named and what those names are. It is the developer’s responsibility to make sure that these names are consistent with any intrinsics that will be generated by the back-end. In the future, it is expected that the PSyIR will know about the number and type of arguments expected by Operation nodes, beyond simply being unary, binary or nary.
One restriction that Fortran has (but the PSyIR does not) is that all named arguments should be at the end of the argument list. If this is not the case then the Fortran backend writer will raise an exception.
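The Fortran restriction can be expressed as a simple check over a list shaped like the argument_names property described above (one entry per argument, holding either a name or None). The function below is a hypothetical helper written for illustration; it is not part of PSyclone and does not reproduce the backend's actual validation.

```python
def named_args_are_trailing(argument_names):
    """Check that no positional argument follows a named one.

    Hypothetical helper for illustration only. 'argument_names' has one
    entry per argument: a string for a named argument, None otherwise.
    """
    seen_named = False
    for name in argument_names:
        if name is None:
            if seen_named:
                # A positional argument after a named one is not valid Fortran.
                return False
        else:
            seen_named = True
    return True


print(named_args_are_trailing([None, "name1", "name2"]))
print(named_args_are_trailing(["name1", None]))
```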