The derived type runtime information table

.. contents::
   :local:

Overview

Many operations on derived types must be implemented, or can be implemented, with calls to the runtime support library rather than directly with generated code. Some operations might be initially implemented in the runtime library and then reimplemented later in generated code for compelling performance gains in optimized compilations.

The runtime library uses derived type description tables to represent the relevant characteristics of derived types. This note summarizes the requirements for these descriptions.

The semantics phase of the F18 frontend constructs derived type descriptions from its scoped symbol table after name resolution and semantic constraint checking have succeeded. The lowering phase then transfers the tables to the static read-only data section of the generated program by translating them into initialized objects. During execution, references to the tables occur by passing their addresses as arguments to relevant runtime library APIs and as pointers in the addenda of descriptors.

Requirements

The following Fortran language features require, or may require, the use of derived type descriptions in the runtime library.

Components

The components of a derived type need to be described in component order (7.4.7), but when there is a parent component, its components can be described by reference to the description of the type of the parent component.

The ordered component descriptions are needed to implement

  • default initialization
  • ALLOCATE, with and without SOURCE=
  • intrinsic assignment of derived types with ALLOCATABLE and automatic components
  • intrinsic I/O of derived type instances
  • NAMELIST I/O of derived type instances
  • "same type" tests

The characteristics of data components include their names, types, offsets, bounds, cobounds, derived type descriptions when appropriate, default component initializers, and flags for ALLOCATABLE, POINTER, PRIVATE, and automatic components (implicit allocatables). Procedure pointer components require only their offsets and address(es).
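As a rough illustration of describing inherited components "by reference", the following C++ sketch walks a type's components in component order by visiting the parent type's description first. The `ComponentInfo` and `TypeInfo` structures and the `CollectComponentOrder` function are hypothetical names invented for this example, not the actual runtime layout, and most of the characteristics listed above (bounds, cobounds, initializers) are omitted.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified description of one data component.
struct ComponentInfo {
  std::string name;
  std::size_t offset;  // byte offset within an instance
  bool isAllocatable;
  bool isPointer;
};

// Hypothetical, simplified derived type description.
struct TypeInfo {
  std::string name;
  const TypeInfo *parent;          // description of the parent type, or nullptr
  std::vector<ComponentInfo> own;  // components declared by this type, in order
};

// Appends component names in component order: the parent type's components
// (found through its own description) come first, then this type's own.
void CollectComponentOrder(
    const TypeInfo &type, std::vector<std::string> &out) {
  if (type.parent) {
    CollectComponentOrder(*type.parent, out);
  }
  for (const ComponentInfo &component : type.own) {
    out.push_back(component.name);
  }
}
```

With descriptions for the `parent` and `extended` types of a two-level hierarchy, the walk yields the parent's component before the extension's, without duplicating the parent's entries in the extended type's table.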

Calls to type-bound procedures

Only extensible derived types -- those without SEQUENCE or BIND(C) -- are allowed to have type-bound procedures. Calls to these bindings will be resolved at compilation time when the binding is NON_OVERRIDABLE or when an object is not polymorphic. Calls to overridable bindings of polymorphic objects require the use of a runtime table of procedure addresses.

Each derived type (or instantiation of a parameterized derived type) will have a complete type-bound procedure table in which all of the bindings of its ancestor types appear first. (Specifically, the table offsets of any inherited bindings must be the same as they are in the table of the ancestral type.) These ancestral bindings reflect their overrides, if any.

The non-inherited bindings of a type then follow the inherited bindings, and they do so in alphabetical order of binding name. (This is an arbitrary choice -- we could also define them to appear in binding declaration order, I suppose -- but a consistent ordering should be used so that relocatables generated by distinct versions of the F18 compiler will have a better chance to interoperate.)
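The ordering rule can be sketched in C++ as follows. This is only an illustration under stated assumptions: `Binding` and `BuildBindingTable` are hypothetical names, procedure addresses are stood in for by strings, and binding names are assumed to be already normalized to one letter case so that a plain string comparison gives alphabetical order.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical table entry. A real table would hold a procedure address;
// a string stands in for it here.
struct Binding {
  std::string name;       // binding name, assumed normalized to lower case
  std::string procedure;  // stand-in for the address of the bound procedure
};

// Builds the table of an extended type from its parent's table and the
// bindings that the extended type declares itself.
std::vector<Binding> BuildBindingTable(
    const std::vector<Binding> &parentTable,
    const std::vector<Binding> &declared) {
  std::vector<Binding> table = parentTable;  // inherited slots keep offsets
  std::vector<Binding> added;
  for (const Binding &decl : declared) {
    auto slot = std::find_if(table.begin(), table.end(),
        [&decl](const Binding &b) { return b.name == decl.name; });
    if (slot != table.end()) {
      slot->procedure = decl.procedure;  // override replaces in place
    } else {
      added.push_back(decl);  // a binding that is new in this type
    }
  }
  // New bindings follow the inherited ones in alphabetical order.
  std::sort(added.begin(), added.end(),
      [](const Binding &a, const Binding &b) { return a.name < b.name; });
  table.insert(table.end(), added.begin(), added.end());
  return table;
}
```

Because an override only rewrites an existing slot, a call through a polymorphic object can use the same table index whether the dynamic type is the ancestor or any of its extensions.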

Type parameter values and "same type" testing

The values of the KIND and LEN parameters of a particular derived type instance can be obtained to implement type parameter inquiries without requiring derived type information tables. In the case of a KIND type parameter, it's a constant value known at compilation time, and in the case of a LEN type parameter, it's a member of the addendum to the object's descriptor.

The runtime library will have an API (TBD) to be called as part of the implementation of TYPE IS and CLASS IS guards of the SELECT TYPE construct. This language support predicate returns a true result when an object's type matches a particular type specification and KIND (but not LEN) type parameter values.

Note that this "is same type as" predicate is not the same as the one to be called to implement the SAME_TYPE_AS() intrinsic function, which is specified so as to ignore the values of KIND type parameters.

Subclause 7.5.2 defines what being the "same" derived type means in Fortran. In short, each definition of a derived type defines a distinct type, so type equality testing can usually compare addresses of derived type descriptions at runtime. The exceptions are SEQUENCE types and interoperable (BIND(C)) types. Independent definitions of each of these are considered to be the "same type" when these definitions match in terms of names, types, and attributes, both being either SEQUENCE or BIND(C), and containing no PRIVATE components. These "sequence" derived types cannot have type parameters, type-bound procedures, an absence of components, or components that are not themselves of a sequence type, so we can use a static hash code to implement their "same type" tests.
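A minimal C++ sketch of this two-tier test follows. The `TypeDesc` structure and the `IsSameType` function are hypothetical, and the sketch assumes that the compiler has precomputed a hash over everything that must match (component names, types, attributes, and whether the type is SEQUENCE or BIND(C)). It also treats equal hashes as proof of equality, which a real implementation would need to justify or back up with a structural comparison.

```cpp
#include <cstdint>

// Hypothetical fields of a derived type description.
struct TypeDesc {
  bool isSequenceOrBindC;
  // Assumed to be computed at compilation time; meaningful only when
  // isSequenceOrBindC is true.
  std::uint64_t structuralHash;
};

// Returns true when the two descriptions denote the same derived type.
bool IsSameType(const TypeDesc &a, const TypeDesc &b) {
  if (&a == &b) {
    return true;  // same definition, same description address
  }
  if (a.isSequenceOrBindC && b.isSequenceOrBindC) {
    // Independent but matching definitions are the same type.
    return a.structuralHash == b.structuralHash;
  }
  return false;  // distinct definitions of extensible types are distinct
}
```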

FINAL subroutines

When an instance of a derived type is deallocated or goes out of scope, one of its FINAL subroutines may be called. Subclause 7.5.6.3 defines when finalization occurs -- it doesn't happen in all situations.

The subroutines named in a derived type's FINAL statements are not bindings, so their arguments are not passed object dummy arguments and do not have to satisfy the constraints of a passed object. Specifically, they can be arrays, and cannot be polymorphic. If a FINAL subroutine's dummy argument is an array, it may be assumed-shape or assumed-rank, but it could also be an explicit-shape or assumed-size argument. This means that it may or may not be passed by means of a descriptor.

Note that a FINAL subroutine with a scalar argument does not define a finalizer for array objects unless the subroutine is elemental (and probably IMPURE). This seems to be a language pitfall, and F18 will emit a warning when an array of a finalizable derived type is declared with a rank for which the type has no FINAL subroutine while other ranks do have one.

So the necessary information in the derived type table for a FINAL subroutine comprises:

  • address(es) of the subroutine
  • rank of the argument, or whether it is assumed-rank
  • for rank 0, whether the subroutine is elemental
  • for rank > 0, whether the argument requires a descriptor
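The way a runtime might use these fields to choose a FINAL subroutine for an object of a given rank can be sketched in C++ as below. `FinalInfo` and `SelectFinal` are hypothetical names, and the sketch assumes that entries with non-matching KIND type parameters have already been filtered out. The subroutine address is omitted so that the example stays focused on the selection logic.

```cpp
#include <vector>

// Hypothetical table entry for one FINAL subroutine (address omitted).
struct FinalInfo {
  int rank;              // rank of the dummy argument, unless assumed-rank
  bool isAssumedRank;
  bool isElemental;      // meaningful for rank 0
  bool needsDescriptor;  // meaningful for rank > 0
};

// Returns the entry that finalizes an object of the given rank, or nullptr
// when no FINAL subroutine applies to that rank.
const FinalInfo *SelectFinal(
    const std::vector<FinalInfo> &finals, int objectRank) {
  const FinalInfo *assumedRank = nullptr;
  const FinalInfo *elemental = nullptr;
  for (const FinalInfo &entry : finals) {
    if (entry.isAssumedRank) {
      assumedRank = &entry;
    } else if (entry.rank == objectRank) {
      return &entry;  // an exact rank match takes precedence
    } else if (entry.rank == 0 && entry.isElemental) {
      elemental = &entry;
    }
  }
  // Fall back to an assumed-rank subroutine, then to an elemental one.
  return assumedRank ? assumedRank : elemental;
}
```

A type whose only FINAL subroutine is a non-elemental scalar one yields a null result for any array rank, which is the pitfall noted above.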

The descriptor flag is needed to handle a difficult case with FINAL subroutines that most other implementations of Fortran fail to get right: a FINAL subroutine whose argument is an explicit-shape or assumed-size array may have to be called upon the parent component of an array of an extended derived type.

  module m
    type :: parent
      integer :: n
     contains
      final :: subr
    end type
    type, extends(parent) :: extended
      integer :: m
    end type
   contains
    subroutine subr(a)
      type(parent) :: a(1)  ! explicit-shape dummy: passed without a descriptor
    end subroutine
  end module
  subroutine demo
    use m
    type(extended) :: arr(1)  ! finalized on return; subr applies to arr%parent
  end subroutine

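If the FINAL subroutine doesn't use a descriptor -- and it will not if there are no LEN type parameters -- the runtime will have to allocate and populate a temporary array holding copies of the parent components of the array's elements, so that the temporary can be passed by reference to the FINAL subroutine. The copy is needed because the parent components are not contiguous in memory: each is followed by the remaining components of its extended element.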
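The gathering step might look like the following C++ sketch. All names here (`FinalSubroutine`, `FinalizeParentParts`, and the parameters) are hypothetical, and the calling convention is reduced to a single base address. Whether the temporary must be copied back afterwards is not addressed by this note, so the sketch leaves that step out.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical signature of a FINAL subroutine that takes a contiguous
// array by reference, without a descriptor.
using FinalSubroutine = void (*)(void *contiguousArray);

// Gathers the parent component of every element of an array of an extended
// type into a contiguous temporary and passes it to the FINAL subroutine.
void FinalizeParentParts(const char *elements, std::size_t count,
    std::size_t elementBytes, std::size_t parentOffset,
    std::size_t parentBytes, FinalSubroutine finalSubr) {
  std::vector<char> temp(count * parentBytes);
  for (std::size_t i = 0; i < count; ++i) {
    // Element i of the extended array starts at i * elementBytes; its
    // parent component lies parentOffset bytes into the element.
    std::memcpy(temp.data() + i * parentBytes,
        elements + i * elementBytes + parentOffset, parentBytes);
  }
  finalSubr(temp.data());
}
```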

Defined assignment

A defined assignment subroutine for a derived type can be declared by means of a generic INTERFACE ASSIGNMENT(=) and by means of a generic type-bound procedure. Defined assignments with non-type-bound generic interfaces are resolved to specific subroutines at compilation time. Most cases of type-bound defined assignment are resolved to their bindings at compilation time as well (with possible runtime resolution of overridable bindings).

However, subclause 10.2.1.3 paragraph 13 specifies that intrinsic assignment of a derived type invokes defined assignment subroutines for those components whose derived types have type-bound generic assignments.

This seems to be the only case of defined assignment that may be of interest to the runtime library. If this is correct, then the requirements are somewhat constrained; we know that the rank of the target of the assignment must match the rank of the source, and that one of the dummy arguments of the bound subroutine is a passed object dummy argument and satisfies all of the constraints of one -- in particular, it's scalar and polymorphic.

So the derived type information for a defined assignment needs to comprise:

  • address(es) of the subroutine
  • whether the first, second, or both arguments are descriptors
  • whether the subroutine is elemental (necessarily also impure)
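To show how these three items could drive a component-by-component assignment, here is a hedged C++ sketch. `MiniDescriptor` is a deliberately tiny stand-in for a real descriptor, and `AssignInfo`, `AssignableComponent`, and `AssignComponents` are likewise hypothetical. The sketch covers scalar components only, so the elemental flag is recorded but not exercised.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a descriptor: just a base address.
struct MiniDescriptor {
  void *base;
};

// Hypothetical table entry for a type-bound defined assignment.
struct AssignInfo {
  void (*subroutine)(void *lhs, const void *rhs);
  bool lhsIsDescriptor;
  bool rhsIsDescriptor;
  bool isElemental;  // unused here: the sketch handles scalars only
};

// Hypothetical description of a scalar component to be assigned.
struct AssignableComponent {
  std::size_t offset;
  std::size_t bytes;
  const AssignInfo *definedAssignment;  // nullptr when there is none
};

// Assigns each component, calling its defined assignment when one exists
// and copying the bytes otherwise.
void AssignComponents(char *lhs, const char *rhs,
    const std::vector<AssignableComponent> &components) {
  for (const AssignableComponent &c : components) {
    if (const AssignInfo *info = c.definedAssignment) {
      MiniDescriptor lhsDesc{lhs + c.offset};
      // The callee only reads the right-hand side.
      MiniDescriptor rhsDesc{const_cast<char *>(rhs + c.offset)};
      void *lhsArg = lhs + c.offset;
      if (info->lhsIsDescriptor) {
        lhsArg = &lhsDesc;
      }
      const void *rhsArg = rhs + c.offset;
      if (info->rhsIsDescriptor) {
        rhsArg = &rhsDesc;
      }
      info->subroutine(lhsArg, rhsArg);
    } else {
      std::memcpy(lhs + c.offset, rhs + c.offset, c.bytes);
    }
  }
}
```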

User defined derived type I/O

Fortran programs can specify subroutines that implement formatted and unformatted READ and WRITE operations for derived types. These defined I/O subroutines may be specified with an explicit INTERFACE or with a type-bound generic. When specified with an INTERFACE, the first argument must not be polymorphic, but when specified with a type-bound generic, the first argument is a passed-object dummy argument and required to be so. In any case, the argument is scalar.

Nearly all invocations of user defined derived type I/O subroutines are resolved at compilation time to specific procedures or to overridable bindings. (The I/O library APIs for acquiring their arguments remain to be designed, however.) The case that is of interest to the runtime library is that of NAMELIST I/O, which is specified to invoke user defined derived type I/O subroutines if they have been defined.

The derived type information for a user defined derived type I/O subroutine comprises:

  • address(es) of the subroutine
  • whether it is for a read or a write
  • whether it is formatted or unformatted
  • whether the first argument is a descriptor (true if it is a binding of the derived type, or has a LEN type parameter)
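A lookup over such entries could be as simple as the C++ sketch below, in which NAMELIST input would ask for the formatted read entry of a component's type. The enumerations, the `DefinedIoInfo` structure, and `FindDefinedIo` are hypothetical names, and the subroutine address is held as an opaque pointer because the argument-passing APIs are still to be designed.

```cpp
#include <vector>

enum class IoDirection { Read, Write };
enum class IoForm { Formatted, Unformatted };

// Hypothetical table entry for one defined I/O subroutine.
struct DefinedIoInfo {
  IoDirection direction;
  IoForm form;
  bool firstArgIsDescriptor;
  const void *subroutine;  // opaque stand-in for the subroutine address
};

// Returns the entry for the requested kind of transfer, or nullptr when the
// type defines no such subroutine and default processing applies.
const DefinedIoInfo *FindDefinedIo(const std::vector<DefinedIoInfo> &table,
    IoDirection direction, IoForm form) {
  for (const DefinedIoInfo &entry : table) {
    if (entry.direction == direction && entry.form == form) {
      return &entry;
    }
  }
  return nullptr;
}
```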

Exporting derived type descriptions from module relocatables

Subclause 7.5.2 requires that two objects be considered as having the same derived type if they are declared "with reference to the same derived type definition". For derived types that are defined in modules and accessed by means of use association, we need to be able to describe the type in the read-only static data section of the module and access the description as a link-time external.

This is not always possible to achieve in the case of instantiations of parameterized derived types, however. Two identical instantiations in distinct compilation units of the same use associated parameterized derived type seem impractical to implement using the same address. (Perhaps some linkers would support unification of global objects with "mangled" names and identical contents, but this seems unportable.)

Derived type descriptions therefore will contain pointers to their "uninstantiated" original derived types. For derived types with no KIND type parameters, these pointers will be null; for uninstantiated derived types, these pointers will point at themselves.
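One way these pointers could support a "same type" test across compilation units is sketched below in C++. The `DerivedTypeDesc` structure, its `kindParameterValues` field, and both functions are assumptions made for this illustration. In particular, comparing recorded KIND values is a guess at how two separately emitted instantiations would be matched, not something this note specifies.

```cpp
#include <vector>

// Hypothetical description carrying the pointer described above.
struct DerivedTypeDesc {
  // nullptr: the type has no KIND type parameters.
  // Points at itself: this is the uninstantiated original definition.
  // Otherwise: the original definition that this was instantiated from.
  const DerivedTypeDesc *uninstantiated;
  std::vector<long long> kindParameterValues;  // assumed field
};

// Returns the description of the original definition of a type.
const DerivedTypeDesc &OriginalOf(const DerivedTypeDesc &desc) {
  return desc.uninstantiated ? *desc.uninstantiated : desc;
}

// Two descriptions match when they stem from the same definition and were
// instantiated with the same KIND type parameter values.
bool SameTypeAndKinds(const DerivedTypeDesc &a, const DerivedTypeDesc &b) {
  return &OriginalOf(a) == &OriginalOf(b) &&
      a.kindParameterValues == b.kindParameterValues;
}
```

Under this scheme, two instantiations emitted at different addresses still compare equal, while types without KIND type parameters continue to be distinguished by address alone.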