forked from OSchip/llvm-project
276 lines
12 KiB
ReStructuredText
276 lines
12 KiB
ReStructuredText
===============
|
|
Opaque Pointers
|
|
===============
|
|
|
|
The Opaque Pointer Type
|
|
=======================
|
|
|
|
Traditionally, LLVM IR pointer types have contained a pointee type. For example,
|
|
``i32*`` is a pointer that points to an ``i32`` somewhere in memory. However,
|
|
due to a lack of pointee type semantics and various issues with having pointee
|
|
types, there is a desire to remove pointee types from pointers.
|
|
|
|
The opaque pointer type project aims to replace all pointer types containing
|
|
pointee types in LLVM with an opaque pointer type. The new pointer type is
|
|
represented textually as ``ptr``.
|
|
|
|
Some instructions still need to know what type to treat the memory pointed to by
|
|
the pointer as. For example, a load needs to know how many bytes to load from
|
|
memory and what type to treat the resulting value as. In these cases,
|
|
instructions themselves contain a type argument. For example the load
|
|
instruction from older versions of LLVM
|
|
|
|
.. code-block:: llvm
|
|
|
|
load i64* %p
|
|
|
|
becomes
|
|
|
|
.. code-block:: llvm
|
|
|
|
load i64, ptr %p
|
|
|
|
Address spaces are still used to distinguish between different kinds of pointers
|
|
where the distinction is relevant for lowering (e.g. data vs function pointers
|
|
have different sizes on some architectures). Opaque pointers are not changing
|
|
anything related to address spaces and lowering. For more information, see
|
|
`DataLayout <LangRef.html#langref-datalayout>`_. Opaque pointers in non-default
|
|
address space are spelled ``ptr addrspace(N)``.
|
|
|
|
This was proposed all the way back in
|
|
`2015 <https://lists.llvm.org/pipermail/llvm-dev/2015-February/081822.html>`_.
|
|
|
|
Issues with explicit pointee types
|
|
==================================
|
|
|
|
LLVM IR pointers can be cast back and forth between pointers with different
|
|
pointee types. The pointee type does not necessarily represent the actual
|
|
underlying type in memory. In other words, the pointee type carries no real
|
|
semantics.
|
|
|
|
Historically LLVM was some sort of type-safe subset of C. Having pointee types
|
|
provided an extra layer of checks to make sure that the Clang frontend matched
|
|
its frontend values/operations with the corresponding LLVM IR. However, as other
|
|
languages like C++ adopted LLVM, the community realized that pointee types were
|
|
more of a hindrance for LLVM development and that the extra type checking with
|
|
some frontends wasn't worth it.
|
|
|
|
LLVM's type system was `originally designed
|
|
<https://llvm.org/pubs/2003-05-01-GCCSummit2003.html>` to support high-level
|
|
optimization. However, years of LLVM implementation experience have demonstrated
|
|
that the pointee type system design does not effectively support
|
|
optimization. Memory optimization algorithms, such as SROA, GVN, and AA,
|
|
generally need to look through LLVM's struct types and reason about the
|
|
underlying memory offsets. The community realized that pointee types hinder LLVM
|
|
development, rather than helping it. Some of the initially proposed high-level
|
|
optimizations have evolved into `TBAA
|
|
<https://llvm.org/docs/LangRef.html#tbaa-metadata>` due to limitations with
|
|
representing higher-level language information directly via SSA values.
|
|
|
|
Pointee types provide some value to frontends because the IR verifier uses types
|
|
to detect straightforward type confusion bugs. However, frontends also have to
|
|
deal with the complexity of inserting bitcasts everywhere that they might be
|
|
required. The community consensus is that the costs of pointee types
|
|
outweight the benefits, and that they should be removed.
|
|
|
|
Many operations do not actually care about the underlying type. These
|
|
operations, typically intrinsics, usually end up taking an arbitrary pointer
|
|
type ``i8*`` and sometimes a size. This causes lots of redundant no-op bitcasts
|
|
in the IR to and from a pointer with a different pointee type.
|
|
|
|
No-op bitcasts take up memory/disk space and also take up compile time to look
|
|
through. However, perhaps the biggest issue is the code complexity required to
|
|
deal with bitcasts. When looking up through def-use chains for pointers it's
|
|
easy to forget to call `Value::stripPointerCasts()` to find the true underlying
|
|
pointer obfuscated by bitcasts. And when looking down through def-use chains
|
|
passes need to iterate through bitcasts to handle uses. Removing no-op pointer
|
|
bitcasts prevents a category of missed optimizations and makes writing LLVM
|
|
passes a little bit easier.
|
|
|
|
Fewer no-op pointer bitcasts also reduces the chances of incorrect bitcasts in
|
|
regards to address spaces. People maintaining backends that care a lot about
|
|
address spaces have complained that frontends like Clang often incorrectly
|
|
bitcast pointers, losing address space information.
|
|
|
|
An analogous transition that happened earlier in LLVM is integer signedness.
|
|
Currently there is no distinction between signed and unsigned integer types, but
|
|
rather each integer operation (e.g. add) contains flags to signal how to treat
|
|
the integer. Previously LLVM IR distinguished between unsigned and signed
|
|
integer types and ran into similar issues of no-op casts. The transition from
|
|
manifesting signedness in types to instructions happened early on in LLVM's
|
|
timeline to make LLVM easier to work with.
|
|
|
|
Opaque Pointers Mode
|
|
====================
|
|
|
|
During the transition phase, LLVM can be used in two modes: In typed pointer
|
|
mode all pointer types have a pointee type and opaque pointers cannot be used.
|
|
In opaque pointers mode (the default), all pointers are opaque. The opaque
|
|
pointer mode can be disabled using ``-opaque-pointers=0`` in
|
|
LLVM tools like ``opt``, or ``-Xclang -no-opaque-pointers`` in clang.
|
|
Additionally, opaque pointer mode is automatically disabled for IR and bitcode
|
|
files that explicitly mention ``i8*`` style typed pointers.
|
|
|
|
In opaque pointer mode, all typed pointers used in IR, bitcode, or created
|
|
using ``PointerType::get()`` and similar APIs are automatically converted into
|
|
opaque pointers. This simplifies migration and allows testing existing IR with
|
|
opaque pointers.
|
|
|
|
.. code-block:: llvm
|
|
|
|
define i8* @test(i8* %p) {
|
|
%p2 = getelementptr i8, i8* %p, i64 1
|
|
ret i8* %p2
|
|
}
|
|
|
|
; Is automatically converted into the following if -opaque-pointers
|
|
; is enabled:
|
|
|
|
define ptr @test(ptr %p) {
|
|
%p2 = getelementptr i8, ptr %p, i64 1
|
|
ret ptr %p2
|
|
}
|
|
|
|
Migration Instructions
|
|
======================
|
|
|
|
In order to support opaque pointers, two types of changes tend to be necessary.
|
|
The first is the removal of all calls to ``PointerType::getElementType()`` and
|
|
``Type::getPointerElementType()``.
|
|
|
|
In the LLVM middle-end and backend, this is usually accomplished by inspecting
|
|
the type of relevant operations instead. For example, memory access related
|
|
analyses and optimizations should use the types encoded in the load and store
|
|
instructions instead of querying the pointer type.
|
|
|
|
Here are some common ways to avoid pointer element type accesses:
|
|
|
|
* For loads, use ``getType()``.
|
|
* For stores, use ``getValueOperand()->getType()``.
|
|
* Use ``getLoadStoreType()`` to handle both of the above in one call.
|
|
* For getelementptr instructions, use ``getSourceElementType()``.
|
|
* For calls, use ``getFunctionType()``.
|
|
* For allocas, use ``getAllocatedType()``.
|
|
* For globals, use ``getValueType()``.
|
|
* For consistency assertions, use
|
|
``PointerType::isOpaqueOrPointeeTypeEquals()``.
|
|
* To create a pointer type in a different address space, use
|
|
``PointerType::getWithSamePointeeType()``.
|
|
* To check that two pointers have the same element type, use
|
|
``PointerType::hasSameElementTypeAs()``.
|
|
* While it is preferred to write code in a way that accepts both typed and
|
|
opaque pointers, ``Type::isOpaquePointerTy()`` and
|
|
``PointerType::isOpaque()`` can be used to handle opaque pointers specially.
|
|
``PointerType::getNonOpaquePointerElementType()`` can be used as a marker in
|
|
code-paths where opaque pointers have been explicitly excluded.
|
|
* To get the type of a byval argument, use ``getParamByValType()``. Similar
|
|
method exists for other ABI-affecting attributes that need to know the
|
|
element type, such as byref, sret, inalloca and preallocated.
|
|
* Some intrinsics require an ``elementtype`` attribute, which can be retrieved
|
|
using ``getParamElementType()``. This attribute is required in cases where
|
|
the intrinsic does not naturally encode a needed element type. This is also
|
|
used for inline assembly.
|
|
|
|
Note that some of the methods mentioned above only exist to support both typed
|
|
and opaque pointers at the same time, and will be dropped once the migration
|
|
has completed. For example, ``isOpaqueOrPointeeTypeEquals()`` becomes
|
|
meaningless once all pointers are opaque.
|
|
|
|
While direct usage of pointer element types is immediately apparent in code,
|
|
there is a more subtle issue that opaque pointers need to contend with: A lot
|
|
of code assumes that pointer equality also implies that the used load/store
|
|
type or GEP source element type is the same. Consider the following examples
|
|
with typed an opaque pointers:
|
|
|
|
.. code-block:: llvm
|
|
|
|
define i32 @test(i32* %p) {
|
|
store i32 0, i32* %p
|
|
%bc = bitcast i32* %p to i64*
|
|
%v = load i64, i64* %bc
|
|
ret i64 %v
|
|
}
|
|
|
|
define i32 @test(ptr %p) {
|
|
store i32 0, ptr %p
|
|
%v = load i64, ptr %p
|
|
ret i64 %v
|
|
}
|
|
|
|
Without opaque pointers, a check that the pointer operand of the load and
|
|
store are the same also ensures that the accessed type is the same. Using a
|
|
different type requires a bitcast, which will result in distinct pointer
|
|
operands.
|
|
|
|
With opaque pointers, the bitcast is not present, and this check is no longer
|
|
sufficient. In the above example, it could result in store to load forwarding
|
|
of an incorrect type. Code making such assumptions needs to be adjusted to
|
|
check the accessed type explicitly:
|
|
``LI->getType() == SI->getValueOperand()->getType()``.
|
|
|
|
Frontends
|
|
---------
|
|
|
|
Frontends need to be adjusted to track pointee types independently of LLVM,
|
|
insofar as they are necessary for lowering. For example, clang now tracks the
|
|
pointee type in the ``Address`` structure.
|
|
|
|
Frontends using the C API through an FFI interface should be aware that a
|
|
number of C API functions are deprecated and will be removed as part of the
|
|
opaque pointer transition::
|
|
|
|
LLVMBuildLoad -> LLVMBuildLoad2
|
|
LLVMBuildCall -> LLVMBuildCall2
|
|
LLVMBuildInvoke -> LLVMBuildInvoke2
|
|
LLVMBuildGEP -> LLVMBuildGEP2
|
|
LLVMBuildInBoundsGEP -> LLVMBuildInBoundsGEP2
|
|
LLVMBuildStructGEP -> LLVMBuildStructGEP2
|
|
LLVMBuildPtrDiff -> LLVMBuildPtrDiff2
|
|
LLVMConstGEP -> LLVMConstGEP2
|
|
LLVMConstInBoundsGEP -> LLVMConstInBoundsGEP2
|
|
LLVMAddAlias -> LLVMAddAlias2
|
|
|
|
Additionally, it will no longer be possible to call ``LLVMGetElementType()``
|
|
on a pointer type.
|
|
|
|
It is possible to control whether opaque pointers are used (if you want to
|
|
override the default) using ``LLVMContext::setOpaquePointers`` or
|
|
``LLVMContextSetOpaquePointers()``.
|
|
|
|
Transition State
|
|
================
|
|
|
|
As of April 2022 both LLVM and Clang have complete support for opaque pointers,
|
|
and opaque pointers are enabled by default in LLVM and Clang.
|
|
|
|
For users of the clang driver interface, it is possible to temporarily restore
|
|
the old default using the ``-DCLANG_ENABLE_OPAQUE_POINTERS=OFF`` cmake option,
|
|
or by passing ``-Xclang -no-opaque-pointers`` to a single clang invocation.
|
|
|
|
For users of the clang cc1 interface, ``-no-opaque-pointers`` can be passed.
|
|
Note that the ``CLANG_ENABLE_OPAQUE_POINTERS`` cmake option has no effect on
|
|
the cc1 interface.
|
|
|
|
Usage for LTO can be disabled by passing ``-Wl,-plugin-opt=no-opaque-pointers``
|
|
to the clang driver.
|
|
|
|
For users of LLVM as a library, opaque pointers can be disabled by calling
|
|
``setOpaquePointers(false)`` on the ``LLVMContext``.
|
|
|
|
For users of LLVM tools like opt, opaque pointers can be disabled by passing
|
|
``-opaque-pointers=0``.
|
|
|
|
The next steps for the opaque pointer migration are:
|
|
|
|
* Migrate Clang/LLVM tests to use opaque pointers.
|
|
* Remove support for typed pointers after the LLVM 15 branch has been created.
|
|
|
|
Version Support
|
|
===============
|
|
|
|
**LLVM 14:** Supports all necessary APIs for migrating to opaque pointers and deprecates/removes incompatible APIs. However, using opaque pointers in the optimization pipeline is **not** fully supported. This release can be used to make out-of-tree code compatible with opaque pointers, but opaque pointers should **not** be enabled in production.
|
|
|
|
**LLVM 15:** Opaque pointers are enabled by default. Typed pointers are still available, but only supported on a best-effort basis and may be untested.
|
|
|
|
**LLVM 16:** Only opaque pointers will be supported. Typed pointers will not be supported.
|