diff --git a/llvm/docs/OpaquePointers.rst b/llvm/docs/OpaquePointers.rst
new file mode 100644
index 000000000000..b3e5ffe7fe9a
--- /dev/null
+++ b/llvm/docs/OpaquePointers.rst
@@ -0,0 +1,130 @@
+===============
+Opaque Pointers
+===============
+
+The Opaque Pointer Type
+=======================
+
+Traditionally, LLVM IR pointer types have contained a pointee type. For example,
+``i32 *`` is a pointer that points to an ``i32`` somewhere in memory. However,
+due to a lack of pointee type semantics and various issues with having pointee
+types, there is a desire to remove pointee types from pointers.
+
+The opaque pointer type project aims to replace all pointer types containing
+pointee types in LLVM with an opaque pointer type. The new pointer type is
+tentatively represented textually as ``ptr``.
+
+Anything to do with pointer address spaces is unaffected.
+
+Issues with explicit pointee types
+==================================
+
+LLVM IR pointers can be cast back and forth between pointers with different
+pointee types. The pointee type does not necessarily represent the actual
+underlying type in memory. In other words, the pointee type carries no real
+semantics.
+
+Lots of operations do not actually care about the underlying type. These
+operations, typically intrinsics, usually end up taking an ``i8 *``. This causes
+lots of redundant no-op bitcasts in the IR to and from a pointer with a
+different pointee type. The extra bitcasts take up space and require extra work
+to look through in optimizations. More bitcasts also increase the chance of
+incorrect bitcasts, especially with regard to address spaces.
+
+Some instructions still need to know what type to treat the memory pointed to by
+the pointer as. For example, a load needs to know how many bytes to load from
+memory. In these cases, instructions themselves contain a type argument. For
+example, the load instruction from older versions of LLVM
+
+.. code-block:: llvm
+
+    load i64* %p
+
+becomes
+
+.. code-block:: llvm
+
+    load i64, ptr %p
+
+An analogous transition that happened earlier in LLVM is integer signedness.
+There is no distinction between signed and unsigned integer types; rather, the
+integer operations themselves specify how to treat the integer. Initially,
+LLVM IR distinguished between unsigned and signed integer types. The transition
+from manifesting signedness in types to instructions happened early on in LLVM's
+life to the betterment of LLVM IR.
+
+I Still Need Pointee Types!
+===========================
+
+The frontend should already know what type each operation operates on based on
+the input source code. However, some frontends like Clang may end up relying on
+LLVM pointer pointee types to keep track of pointee types. The frontend needs to
+keep track of frontend pointee types on its own.
+
+For optimizations around frontend types, pointee types are not useful due to
+their lack of semantics. Rather, since LLVM IR works on untyped memory, for a
+frontend to tell LLVM about frontend types for the purposes of alias analysis,
+extra metadata is added to the IR. For more information, see `TBAA
+`_.
+
+Some specific operations still need to know what type a pointer points to. For
+the most part, this is codegen and ABI specific. For example, `byval
+`_ arguments are pointers, but backends need
+to know the underlying type of the argument to properly lower it. In cases like
+these, the attributes contain a type argument. For example,
+
+.. code-block:: llvm
+
+    call void @f(ptr byval(i32) %p)
+
+signifies that ``%p`` as an argument should be lowered as an ``i32`` passed
+indirectly.
+
+If you have use cases that this sort of fix doesn't cover, please email
+llvm-dev.
+
+Transition Plan
+===============
+
+LLVM currently has many places that depend on pointee types. Each dependency on
+pointee types needs to be resolved in some way or another.
+
+Making everything use opaque pointers in one huge commit is infeasible. This
+needs to be done incrementally. The following steps need to be done, in no
+particular order:
+
+* Introduce the opaque pointer type
+
+* Various ABI attributes and instructions that need a type can be changed one at
+  a time
+
+  * This has already happened for many instructions like loads, stores, GEPs,
+    and various attributes like ``byval``
+
+* Fix up existing in-tree users of pointee types to not rely on LLVM pointer
+  pointee types
+
+* Allow bitcode auto-upgrade of legacy pointer type to the new opaque pointer
+  type (not to be turned on until ready)
+
+* Migrate frontends to not keep track of frontend pointee types via LLVM pointer
+  pointee types
+
+* Add option to internally treat all pointer types as opaque pointers and see
+  what breaks, starting with LLVM tests, then run Clang over large codebases
+
+* Replace legacy pointer types in LLVM tests with opaque pointer types
+
+Frontend Migration Steps
+========================
+
+If you have your own frontend, there are a couple of things to do after opaque
+pointer types fully work.
+
+* Don't rely on LLVM pointee types to keep track of frontend pointee types
+
+* Migrate away from LLVM IR instruction builders that rely on pointee types
+
+  * For example, ``IRBuilder::CreateGEP()`` has multiple overloads; make sure to
+    use one where the source element type is explicitly passed in, not inferred
+    from the pointer operand pointee type
diff --git a/llvm/docs/UserGuides.rst b/llvm/docs/UserGuides.rst
index 665d1bda1bbf..43f52c3ce758 100644
--- a/llvm/docs/UserGuides.rst
+++ b/llvm/docs/UserGuides.rst
@@ -44,6 +44,7 @@ intermediate LLVM representation.
    MergeFunctions
    MCJITDesignAndImplementation
    ORCv2
+   OpaquePointers
    JITLink
    NewPassManager
    NVPTXUsage