[Docs] Update opaque pointer docs (NFC)

Mention -opaque-pointers, write a bit more about migration pitfalls
and update the open issues.
Nikita Popov 2022-01-14 17:41:42 +01:00
parent a1da73961d
commit 3bbf7f5ed8
2 changed files with 102 additions and 54 deletions

@ -1199,7 +1199,7 @@ Currently, only the following parameter attributes are defined:
The ``elementtype`` argument attribute can be used to specify a pointer
element type in a way that is compatible with `opaque pointers
<OpaquePointers.html>`.
<OpaquePointers.html>`_.
The ``elementtype`` attribute by itself does not carry any specific
semantics. However, certain intrinsics may require this attribute to be

@ -6,7 +6,7 @@ The Opaque Pointer Type
=======================
Traditionally, LLVM IR pointer types have contained a pointee type. For example,
``i32 *`` is a pointer that points to an ``i32`` somewhere in memory. However,
``i32*`` is a pointer that points to an ``i32`` somewhere in memory. However,
due to a lack of pointee type semantics and various issues with having pointee
types, there is a desire to remove pointee types from pointers.
@ -29,7 +29,7 @@ actual underlying type in memory. In other words, the pointee type contains no
real semantics.
Lots of operations do not actually care about the underlying type. These
operations, typically intrinsics, usually end up taking an ``i8 *``. This causes
operations, typically intrinsics, usually end up taking an ``i8*``. This causes
lots of redundant no-op bitcasts in the IR to and from a pointer with a
different pointee type. The extra bitcasts take up space and require extra work
to look through in optimizations. And more bitcasts increase the chances of
@ -57,6 +57,35 @@ LLVM IR distinguished between unsigned and signed integer types. The transition
from manifesting signedness in types to instructions happened early on in LLVM's
life to the betterment of LLVM IR.
Opaque Pointers Mode
====================
During the transition phase, LLVM can be used in two modes: In typed pointer
mode (currently still the default) all pointer types have a pointee type and
opaque pointers cannot be used. In opaque pointers mode, all pointers are
opaque. The opaque pointer mode can be enabled using ``-opaque-pointers`` in
LLVM tools like ``opt``, or ``-mllvm -opaque-pointers`` in clang.
In opaque pointer mode, all typed pointers used in IR, bitcode, or created
using ``PointerType::get()`` and similar APIs are automatically converted into
opaque pointers. This simplifies migration and allows testing existing IR with
opaque pointers.
.. code-block:: llvm
define i8* @test(i8* %p) {
%p2 = getelementptr i8, i8* %p, i64 1
ret i8* %p2
}
; Is automatically converted into the following if -opaque-pointers
; is enabled:
define ptr @test(ptr %p) {
%p2 = getelementptr i8, ptr %p, i64 1
ret ptr %p2
}
I Still Need Pointee Types!
===========================
@ -87,73 +116,92 @@ indirectly.
If you have use cases that this sort of fix doesn't cover, please email
llvm-dev.
Transition Plan
===============
Migration Instructions
======================
LLVM currently has many places that depend on pointee types. Each dependency on
pointee types needs to be resolved in some way or another. This essentially
translates to figuring out how to remove all calls to
``PointerType::getElementType`` and ``Type::getPointerElementType()``.
In order to support opaque pointers, two types of changes tend to be necessary.
The first is the removal of all calls to ``PointerType::getElementType()`` and
``Type::getPointerElementType()``.
Making everything use opaque pointers in one huge commit is infeasible. This
needs to be done incrementally. The following steps need to be done, in no
particular order:
In the LLVM middle-end and backend, this is usually accomplished by inspecting
the type of relevant operations instead. For example, memory access related
analyses and optimizations should use the types encoded in the load and store
instructions instead of querying the pointer type.
* Introduce the opaque pointer type
Frontends need to be adjusted to track pointee types independently of LLVM,
insofar as they are necessary for lowering. For example, clang now tracks the
pointee type in the ``Address`` structure.
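As a rough illustration of tracking pointee types on the frontend side, the following self-contained C++ sketch pairs a pointer value with the element type the frontend knows about. ``TrackedAddress`` and ``emitLoad`` are hypothetical names invented for this example; they only model the idea and are not clang's ``Address`` class or any LLVM API.

```cpp
#include <cassert>
#include <string>

// Hypothetical frontend-side record (an assumption for illustration): the
// pointer itself is just `ptr` in the IR, so the frontend remembers the
// pointee type next to it instead of asking LLVM for it.
struct TrackedAddress {
  std::string Pointer;     // name of the IR value of type `ptr`, e.g. "%p"
  std::string ElementType; // pointee type known to the frontend, e.g. "i32"
};

// Builds the text of a load instruction using the tracked element type
// rather than a type derived from the pointer operand.
inline std::string emitLoad(const std::string &Result,
                            const TrackedAddress &Addr) {
  assert(!Addr.ElementType.empty() && "frontend must supply the pointee type");
  return Result + " = load " + Addr.ElementType + ", ptr " + Addr.Pointer;
}
```

Calling ``emitLoad("%v", TrackedAddress{"%p", "i32"})`` produces the text of a load whose result type comes entirely from the frontend's own bookkeeping.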
* Already done
While direct usage of pointer element types is immediately apparent in code,
there is a more subtle issue that opaque pointers need to contend with: A lot
of code assumes that pointer equality also implies that the used load/store
type is the same. Consider the following examples with typed and opaque pointers:
* Remove remaining in-tree users of pointee types
.. code-block:: llvm
* There are many miscellaneous uses that should be cleaned up individually
define i64 @test(i32* %p) {
store i32 0, i32* %p
%bc = bitcast i32* %p to i64*
%v = load i64, i64* %bc
ret i64 %v
}
* Some of the larger use cases are mentioned below
define i64 @test(ptr %p) {
store i32 0, ptr %p
%v = load i64, ptr %p
ret i64 %v
}
* Various ABI attributes and instructions that rely on pointee types need to be
modified to specify the type separately
Without opaque pointers, a check that the pointer operands of the load and
store are the same also ensures that the accessed type is the same. Using a
different type requires a bitcast, which will result in distinct pointer
operands.
* This has already happened for all instructions like loads, stores, GEPs,
and various attributes like ``byval``
With opaque pointers, the bitcast is not present, and this check is no longer
sufficient. In the above example, it could result in store to load forwarding
of an incorrect type. Code making such assumptions needs to be adjusted to
check the accessed type explicitly:
``LI->getType() == SI->getValueOperand()->getType()``.
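The pitfall can be illustrated with a small self-contained C++ model. This is only a sketch: ``Value``, ``Store``, ``Load``, ``samePointerOperand`` and ``canForwardStore`` are hypothetical stand-ins invented for this example, not LLVM classes or functions. Only the final type comparison mirrors the check quoted above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for IR entities; none of these are LLVM classes.
struct Value {
  std::string Name; // e.g. "%p"
};

struct Store {
  const Value *Ptr;      // pointer operand
  std::string ValueType; // type of the stored value, e.g. "i32"
};

struct Load {
  const Value *Ptr;       // pointer operand
  std::string ResultType; // type produced by the load, e.g. "i64"
};

// The check that was sufficient with typed pointers: a differently typed
// access had to go through a bitcast, which yields a distinct pointer operand.
inline bool samePointerOperand(const Store &SI, const Load &LI) {
  assert(SI.Ptr && LI.Ptr && "accesses must have a pointer operand");
  return SI.Ptr == LI.Ptr;
}

// The check needed with opaque pointers: additionally compare the accessed
// types, analogous to LI->getType() == SI->getValueOperand()->getType().
inline bool canForwardStore(const Store &SI, const Load &LI) {
  return samePointerOperand(SI, LI) && SI.ValueType == LI.ResultType;
}
```

For an ``i32`` store and an ``i64`` load through the same ``%p``, as in the opaque pointer example, ``samePointerOperand`` returns true while ``canForwardStore`` returns false, so the mismatched forwarding is rejected.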
* More cases may be found as work continues
Frontends using the C API through an FFI interface should be aware that a
number of C API functions are deprecated and will be removed as part of the
opaque pointer transition::
* Remove calls to and deprecate ``IRBuilder`` methods that rely on pointee types
LLVMBuildLoad -> LLVMBuildLoad2
LLVMBuildCall -> LLVMBuildCall2
LLVMBuildInvoke -> LLVMBuildInvoke2
LLVMBuildGEP -> LLVMBuildGEP2
LLVMBuildInBoundsGEP -> LLVMBuildInBoundsGEP2
LLVMBuildStructGEP -> LLVMBuildStructGEP2
LLVMConstGEP -> LLVMConstGEP2
LLVMConstInBoundsGEP -> LLVMConstInBoundsGEP2
LLVMAddAlias -> LLVMAddAlias2
* For example, some of the ``IRBuilder::CreateGEP()`` methods use the pointer
operand's pointee type to determine the GEP operand type
Additionally, it will no longer be possible to call ``LLVMGetElementType()``
on a pointer type.
* Some methods are already deprecated with ``LLVM_ATTRIBUTE_DEPRECATED``, such
as some overloads of ``IRBuilder::CreateLoad()``
Transition State
================
* Allow bitcode auto-upgrade of legacy pointer type to the new opaque pointer
type (not to be turned on until ready)
As of January 2022, large parts of LLVM support opaque pointers, but there are
still some major open problems:
* To support legacy bitcode, such as legacy stores/loads, we need to track
pointee types for all values since legacy instructions may infer the types
from a pointer operand's pointee type
* Bitcode already fully supports opaque pointers, and reading up-to-date
typed pointer bitcode in opaque pointers mode also works. However, we
currently do not support pointee type based auto-upgrade of old bitcode in
opaque pointer mode.
* Migrate frontends to not keep track of frontend pointee types via LLVM pointer
pointee types
* While clang has limited support for opaque pointers (sufficient to compile
CTMark on Linux), a major effort will be needed to systematically remove all
uses of ``getPointerElementType()`` and the deprecated ``Address()``
constructor.
* This is mostly Clang, see ``clang::CodeGen::Address::getElementType()``
* We do not yet have a testing strategy for how we can test both typed and
opaque pointers during the migration. Currently, individual tests for
opaque pointers are being added, but the bulk of tests still uses typed
pointers.
* Add option to internally treat all pointer types as opaque pointers and see what
breaks, starting with LLVM tests, then run Clang over large codebases
* Loop access analysis does not support opaque pointers yet, and is currently
the main source of assertion failures in optimized builds.
* We don't want to start mass-updating tests until we're fairly confident that opaque pointers won't cause major issues
* Replace legacy pointer types in LLVM tests with opaque pointer types
Frontend Migration Steps
========================
If you have your own frontend, there are a couple of things to do after opaque
pointer types fully work.
* Don't rely on LLVM pointee types to keep track of frontend pointee types
* Migrate away from LLVM IR instruction builders that rely on pointee types
* For example, ``IRBuilder::CreateGEP()`` has multiple overloads; make sure to
use one where the source element type is explicitly passed in, not inferred
from the pointer operand pointee type
* Miscellaneous uses of pointer element types remain everywhere.