[Docs] Update GEP docs for opaque pointers

Update the GEP FAQ to use opaque pointers. This requires more than
a syntactic change in some place, because some of the concerns just
don't make sense anymore (trying to index past a ptr member in a
struct for example).

This also fixes uses of incorrect syntax to declare or reference
globals.

Differential Revision: https://reviews.llvm.org/D130353
This commit is contained in:
Nikita Popov 2022-07-22 14:36:27 +02:00
parent 7ac7ec8202
commit b66ca91fe6
1 changed files with 51 additions and 64 deletions

View File

@ -88,15 +88,15 @@ looks like:
.. code-block:: llvm
define void @munge(%struct.munger_struct* %P) {
define void @munge(ptr %P) {
entry:
%tmp = getelementptr %struct.munger_struct, %struct.munger_struct* %P, i32 1, i32 0
%tmp1 = load i32, i32* %tmp
%tmp2 = getelementptr %struct.munger_struct, %struct.munger_struct* %P, i32 2, i32 1
%tmp3 = load i32, i32* %tmp2
%tmp = getelementptr %struct.munger_struct, ptr %P, i32 1, i32 0
%tmp1 = load i32, ptr %tmp
%tmp2 = getelementptr %struct.munger_struct, ptr %P, i32 2, i32 1
%tmp3 = load i32, ptr %tmp2
%tmp4 = add i32 %tmp3, %tmp1
%tmp5 = getelementptr %struct.munger_struct, %struct.munger_struct* %P, i32 0, i32 0
store i32 %tmp4, i32* %tmp5
%tmp5 = getelementptr %struct.munger_struct, ptr %P, i32 0, i32 0
store i32 %tmp4, ptr %tmp5
ret void
}
@ -108,11 +108,11 @@ To make this clear, let's consider a more obtuse example:
.. code-block:: text
%MyVar = uninitialized global i32
@MyVar = external global i32
...
%idx1 = getelementptr i32, i32* %MyVar, i64 0
%idx2 = getelementptr i32, i32* %MyVar, i64 1
%idx3 = getelementptr i32, i32* %MyVar, i64 2
%idx1 = getelementptr i32, ptr @MyVar, i64 0
%idx2 = getelementptr i32, ptr @MyVar, i64 1
%idx3 = getelementptr i32, ptr @MyVar, i64 2
These GEP instructions are simply making address computations from the base
address of ``MyVar``. They compute, as follows (using C syntax):
@ -125,12 +125,12 @@ address of ``MyVar``. They compute, as follows (using C syntax):
Since the type ``i32`` is known to be four bytes long, the indices 0, 1 and 2
translate into memory offsets of 0, 4, and 8, respectively. No memory is
accessed to make these computations because the address of ``%MyVar`` is passed
accessed to make these computations because the address of ``@MyVar`` is passed
directly to the GEP instructions.
The obtuse part of this example is in the cases of ``%idx2`` and ``%idx3``. They
result in the computation of addresses that point to memory past the end of the
``%MyVar`` global, which is only one ``i32`` long, not three ``i32``\s long.
``@MyVar`` global, which is only one ``i32`` long, not three ``i32``\s long.
While this is legal in LLVM, it is inadvisable because any load or store with
the pointer that results from these GEP instructions would produce undefined
results.
@ -145,22 +145,21 @@ variable which is always a pointer type. For example, consider this:
.. code-block:: text
%MyStruct = uninitialized global { float*, i32 }
%MyStruct = external global { ptr, i32 }
...
%idx = getelementptr { float*, i32 }, { float*, i32 }* %MyStruct, i64 0, i32 1
%idx = getelementptr { ptr, i32 }, ptr %MyStruct, i64 0, i32 1
The GEP above yields an ``i32*`` by indexing the ``i32`` typed field of the
The GEP above yields a ``ptr`` by indexing the ``i32`` typed field of the
structure ``%MyStruct``. When people first look at it, they wonder why the ``i64
0`` index is needed. However, a closer inspection of how globals and GEPs work
reveals the need. Becoming aware of the following facts will dispel the
confusion:
#. The type of ``%MyStruct`` is *not* ``{ float*, i32 }`` but rather ``{ float*,
i32 }*``. That is, ``%MyStruct`` is a pointer to a structure containing a
pointer to a ``float`` and an ``i32``.
#. The type of ``%MyStruct`` is *not* ``{ ptr, i32 }`` but rather ``ptr``.
That is, ``%MyStruct`` is a pointer (to a structure), not a structure itself.
#. Point #1 is evidenced by noticing the type of the second operand of the GEP
instruction (``%MyStruct``) which is ``{ float*, i32 }*``.
instruction (``%MyStruct``) which is ``ptr``.
#. The first index, ``i64 0`` is required to step over the global variable
``%MyStruct``. Since the second argument to the GEP instruction must always
@ -181,39 +180,31 @@ only involved in the computation of addresses. For example, consider this:
.. code-block:: text
%MyVar = uninitialized global { [40 x i32 ]* }
@MyVar = external global { i32, ptr }
...
%idx = getelementptr { [40 x i32]* }, { [40 x i32]* }* %MyVar, i64 0, i32 0, i64 0, i64 17
%idx = getelementptr { i32, ptr }, ptr @MyVar, i64 0, i32 1
%arr = load ptr, ptr %idx
%idx = getelementptr [40 x i32], ptr %arr, i64 0, i64 17
In this example, we have a global variable, ``%MyVar`` that is a pointer to a
structure containing a pointer to an array of 40 ints. The GEP instruction seems
to be accessing the 18th integer of the structure's array of ints. However, this
is actually an illegal GEP instruction. It won't compile. The reason is that the
pointer in the structure *must* be dereferenced in order to index into the
array of 40 ints. Since the GEP instruction never accesses memory, it is
illegal.
In this example, we have a global variable, ``@MyVar``, which is a pointer to
a structure containing a pointer. Let's assume that this inner pointer points
to an array of type ``[40 x i32]``. The above IR will first compute the address
of the inner pointer, then load the pointer, and then compute the address of
the 18th array element.
In order to access the 18th integer in the array, you would need to do the
following:
This cannot be expressed in a single GEP instruction, because it requires
a memory dereference in between. However, the following example would work
fine:
.. code-block:: text
%idx = getelementptr { [40 x i32]* }, { [40 x i32]* }* %, i64 0, i32 0
%arr = load [40 x i32]*, [40 x i32]** %idx
%idx = getelementptr [40 x i32], [40 x i32]* %arr, i64 0, i64 17
In this case, we have to load the pointer in the structure with a load
instruction before we can index into the array. If the example was changed to:
.. code-block:: text
%MyVar = uninitialized global { [40 x i32 ] }
@MyVar = external global { i32, [40 x i32 ] }
...
%idx = getelementptr { [40 x i32] }, { [40 x i32] }*, i64 0, i32 0, i64 17
%idx = getelementptr { [40 x i32] }, ptr @MyVal, i64 0, i32 1, i64 17
then everything works fine. In this case, the structure does not contain a
pointer and the GEP instruction can index through the global variable, into the
first field of the structure and access the 18th ``i32`` in the array there.
In this case, the structure does not contain a pointer and the GEP instruction
can index through the global variable, into the second field of the structure
and access the 18th ``i32`` in the array there.
Why don't GEP x,0,0,1 and GEP x,1 alias?
----------------------------------------
@ -226,17 +217,15 @@ index. Consider this example:
.. code-block:: llvm
%MyVar = global { [10 x i32] }
%idx1 = getelementptr { [10 x i32] }, { [10 x i32] }* %MyVar, i64 0, i32 0, i64 1
%idx2 = getelementptr { [10 x i32] }, { [10 x i32] }* %MyVar, i64 1
@MyVar = external global { [10 x i32] }
%idx1 = getelementptr { [10 x i32] }, ptr @MyVar, i64 0, i32 0, i64 1
%idx2 = getelementptr { [10 x i32] }, ptr @MyVar, i64 1
In this example, ``idx1`` computes the address of the second integer in the
array that is in the structure in ``%MyVar``, that is ``MyVar+4``. The type of
``idx1`` is ``i32*``. However, ``idx2`` computes the address of *the next*
structure after ``%MyVar``. The type of ``idx2`` is ``{ [10 x i32] }*`` and its
value is equivalent to ``MyVar + 40`` because it indexes past the ten 4-byte
integers in ``MyVar``. Obviously, in such a situation, the pointers don't
alias.
array that is in the structure in ``@MyVar``, that is ``MyVar+4``. However,
``idx2`` computes the address of *the next* structure after ``@MyVar``, that is
``MyVar+40``, because it indexes past the ten 4-byte integers in ``MyVar``.
Obviously, in such a situation, the pointers don't alias.
Why do GEP x,1,0,0 and GEP x,1 alias?
-------------------------------------
@ -244,18 +233,16 @@ Why do GEP x,1,0,0 and GEP x,1 alias?
Quick Answer: They compute the same address location.
These two GEP instructions will compute the same address because indexing
through the 0th element does not change the address. However, it does change the
type. Consider this example:
through the 0th element does not change the address. Consider this example:
.. code-block:: llvm
%MyVar = global { [10 x i32] }
%idx1 = getelementptr { [10 x i32] }, { [10 x i32] }* %MyVar, i64 1, i32 0, i64 0
%idx2 = getelementptr { [10 x i32] }, { [10 x i32] }* %MyVar, i64 1
@MyVar = global { [10 x i32] }
%idx1 = getelementptr { [10 x i32] }, ptr @MyVar, i64 1, i32 0, i64 0
%idx2 = getelementptr { [10 x i32] }, ptr @MyVar, i64 1
In this example, the value of ``%idx1`` is ``%MyVar+40`` and its type is
``i32*``. The value of ``%idx2`` is also ``MyVar+40`` but its type is ``{ [10 x
i32] }*``.
In this example, the value of ``%idx1`` is ``MyVar+40``, and the value of
``%idx2`` is also ``MyVar+40``.
Can GEP index into vector elements?
-----------------------------------
@ -447,8 +434,8 @@ value") if the GEP overflows (i.e. wraps around the end of the address space).
As such, there are some ramifications of this for inbounds GEPs: scales implied
by array/vector/pointer indices are always known to be "nsw" since they are
signed values that are scaled by the element size. These values are also
allowed to be negative (e.g. "``gep i32 *%P, i32 -1``") but the pointer itself
is logically treated as an unsigned value. This means that GEPs have an
allowed to be negative (e.g. "``gep i32, ptr %P, i32 -1``") but the pointer
itself is logically treated as an unsigned value. This means that GEPs have an
asymmetric relation between the pointer base (which is treated as unsigned) and
the offset applied to it (which is treated as signed). The result of the
additions within the offset calculation cannot have signed overflow, but when