IR: Define byref parameter attribute
This allows tracking the in-memory type of a pointer argument to a
function for ABI purposes. This is essentially a stripped down version
of byval to remove some of the stack-copy implications in its
definition.
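As a minimal sketch of the syntax (the function and value names here
are invented for illustration):

    define void @func(i32* byref(i32) align 4 %arg) {
      ; %arg is known to point to an in-memory i32, but byref does
      ; not imply that the caller made a stack copy
      %v = load i32, i32* %arg
      ret void
    }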
This includes the base IR changes, and some tests for places where it
should be treated similarly to byval. Codegen support will be in a
future patch.
My original attempt at solving some of these problems was to repurpose
byval with a different address space from the stack. However, it is
technically permitted for the callee to introduce a write to the
argument, although nothing does this in reality. There is also talk of
removing and replacing the byval attribute, so a new attribute would
need to take its place anyway.
This is intended to avoid some optimization issues with the current
handling of aggregate arguments, as well as to fix inflexibility in how
frontends can specify the kernel ABI. The most honest representation
of the amdgpu_kernel convention is to expose all kernel arguments as
loads from constant memory. Today, these are raw, SSA Argument values
and codegen is responsible for turning these into loads.
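A hedged sketch of what that load-based form could look like
(llvm.amdgcn.kernarg.segment.ptr is the existing intrinsic for the
argument base pointer; the kernel and its argument layout are
invented):

    declare i8 addrspace(4)* @llvm.amdgcn.kernarg.segment.ptr()

    define amdgpu_kernel void @kern_loads() {
      ; all arguments live in one implicit, flat, constant buffer
      %args = call i8 addrspace(4)* @llvm.amdgcn.kernarg.segment.ptr()
      %x.ptr = bitcast i8 addrspace(4)* %args to i32 addrspace(4)*
      %x = load i32, i32 addrspace(4)* %x.ptr, align 4
      ret void
    }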
Background:
There currently isn't a satisfactory way to represent how arguments
for the amdgpu_kernel calling convention are passed. In reality,
arguments are passed in a single, flat, constant memory buffer
implicitly passed to the function. It is also illegal to call such a
function in the IR; it is only ever invoked by a driver of some
kind.
It does not make sense to have a stack-passed parameter in this
context as is implied by byval. It is never valid to write to the
kernel arguments, as this would corrupt the inputs seen by other
dispatches of the kernel. These arguments are also not in the same
address space as the stack, so a copy is needed to an alloca. From a
C-like source language, the kernel parameters are invisible.
Semantically, a copy is always required from the constant argument
memory to a mutable variable.
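byref lets this copy be expressed directly in the IR. A sketch using
AMDGPU's address space numbering (4 = constant, 5 = private; the
struct type is invented):

    %struct.a = type { i32, float }

    declare void @llvm.memcpy.p5i8.p4i8.i64(i8 addrspace(5)*, i8 addrspace(4)*, i64, i1)

    define amdgpu_kernel void @kern_copy(%struct.a addrspace(4)* byref(%struct.a) align 4 %in) {
      %copy = alloca %struct.a, align 4, addrspace(5)
      %dst = bitcast %struct.a addrspace(5)* %copy to i8 addrspace(5)*
      %src = bitcast %struct.a addrspace(4)* %in to i8 addrspace(4)*
      ; the semantically required copy from readonly argument memory
      ; to a mutable local variable
      call void @llvm.memcpy.p5i8.p4i8.i64(i8 addrspace(5)* %dst, i8 addrspace(4)* %src, i64 8, i1 false)
      ret void
    }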
The current clang calling convention lowering emits raw values,
including aggregates, into the function argument list, since using
byval would not make sense. This has some unfortunate consequences for
the optimizer. In the aggregate case, we end up with an aggregate
store to alloca, which both SROA and instcombine turn into a store of
each aggregate field. The optimizer never pieces this back together to
see that this is really just a copy from constant memory, so we end up
stuck with expensive stack usage.
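Roughly, the problematic pattern from today's lowering looks like
this (names invented):

    %struct.b = type { i32, i32 }

    define amdgpu_kernel void @kern_agg(%struct.b %arg) {
      %tmp = alloca %struct.b, align 4, addrspace(5)
      ; SROA/instcombine split this into per-field stores, and nothing
      ; later recognizes the result as a copy from constant memory
      store %struct.b %arg, %struct.b addrspace(5)* %tmp, align 4
      ret void
    }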
This also means the backend dictates the alignment of arguments, and
arbitrarily picks the LLVM IR ABI type alignment. By allowing an
explicit alignment, frontends can make better decisions. For example,
there's really no advantage to an alignment higher than 4, so a frontend
could choose to compact the argument layout. Similarly, there is a
high penalty to using an alignment lower than 4, so a frontend could
opt into more padding for small arguments.
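For example, a frontend could place an i64 argument at 4-byte
alignment instead of the 8 implied by its IR ABI type alignment
(hypothetical layout):

    define amdgpu_kernel void @kern_packed(i64 addrspace(4)* byref(i64) align 4 %x) {
      %v = load i64, i64 addrspace(4)* %x, align 4
      ret void
    }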
Another design consideration is when it is appropriate to expose the
fact that these arguments are all really passed in adjacent
memory. Currently we have a late IR optimization pass in codegen to
rewrite the kernel argument values into explicit loads to enable
vectorization. In most programs, unrelated argument loads can be
merged together. However, exposing this property directly from the
frontend has some disadvantages. We still need a way to track the
original argument sizes and alignments to report to the driver. I find
using a side-channel metadata mechanism to track this
unappealing. If the kernel arguments were exposed as a single buffer
to begin with, alias analysis would be unaware that the padding bits
between arguments are meaningless. Another family of problems is that
there are still some gaps in replacing all of the available parameter
attributes with metadata equivalents once lowered to loads.
The immediate plan is to start using this new attribute to handle all
aggregate arguments for kernels. Long term, it makes sense to migrate
all kernel arguments, including scalars, to be passed indirectly in
the same manner.
Additional context is in D79744.
; RUN: not llvm-as %s -o /dev/null 2>&1 | FileCheck %s
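; Verifier tests for the byref parameter attribute.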
; CHECK: Attribute 'byref' type does not match parameter!
; CHECK-NEXT: void (i32*)* @byref_mismatched_pointee_type0
define void @byref_mismatched_pointee_type0(i32* byref(i8)) {
  ret void
}

; CHECK: Attribute 'byref' type does not match parameter!
; CHECK-NEXT: void (i8*)* @byref_mismatched_pointee_type1
define void @byref_mismatched_pointee_type1(i8* byref(i32)) {
  ret void
}

%opaque.ty = type opaque

; CHECK: Attributes 'byval', 'byref', 'inalloca', and 'preallocated' do not support unsized types!
; CHECK-NEXT: void (%opaque.ty*)* @byref_unsized
define void @byref_unsized(%opaque.ty* byref(%opaque.ty)) {
  ret void
}

; CHECK: Attributes 'byval', 'inalloca', 'preallocated', 'inreg', 'nest', 'byref', and 'sret' are incompatible!
; CHECK-NEXT: void (i32*)* @byref_byval
define void @byref_byval(i32* byref(i32) byval(i32)) {
  ret void
}

; CHECK: Attributes 'byval', 'inalloca', 'preallocated', 'inreg', 'nest', 'byref', and 'sret' are incompatible!
; CHECK-NEXT: void (i32*)* @byref_inalloca
define void @byref_inalloca(i32* byref(i32) inalloca) {
  ret void
}

; CHECK: Attributes 'byval', 'inalloca', 'preallocated', 'inreg', 'nest', 'byref', and 'sret' are incompatible!
; CHECK-NEXT: void (i32*)* @byref_preallocated
define void @byref_preallocated(i32* byref(i32) preallocated(i32)) {
  ret void
}

; CHECK: Attributes 'byval', 'inalloca', 'preallocated', 'inreg', 'nest', 'byref', and 'sret' are incompatible!
; CHECK-NEXT: void (i32*)* @byref_sret
define void @byref_sret(i32* byref(i32) sret) {
  ret void
}

; CHECK: Attributes 'byval', 'inalloca', 'preallocated', 'inreg', 'nest', 'byref', and 'sret' are incompatible!
; CHECK-NEXT: void (i32*)* @byref_inreg
define void @byref_inreg(i32* byref(i32) inreg) {
  ret void
}

; CHECK: Attributes 'byval', 'inalloca', 'preallocated', 'inreg', 'nest', 'byref', and 'sret' are incompatible!
; CHECK-NEXT: void (i32*)* @byref_nest
define void @byref_nest(i32* byref(i32) nest) {
  ret void
}
; CHECK: Wrong types for attribute: inalloca nest noalias nocapture nonnull readnone readonly byref(i32) byval(i32) preallocated(i32) sret(i32) align 1 dereferenceable(1) dereferenceable_or_null(1)
; CHECK-NEXT: void (i32)* @byref_non_pointer
define void @byref_non_pointer(i32 byref(i32)) {
  ret void
}

define void @byref_callee([64 x i8]* byref([64 x i8])) {
  ret void
}

define void @no_byref_callee(i8*) {
  ret void
}
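; byref affects the call ABI, so the verifier rejects musttail calls
; where the caller's and callee's byref attributes do not match exactly.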
; CHECK: cannot guarantee tail call due to mismatched ABI impacting function attributes
; CHECK-NEXT: musttail call void @byref_callee([64 x i8]* byref([64 x i8]) %cast)
; CHECK-NEXT: i8* %ptr
define void @musttail_byref_caller(i8* %ptr) {
  %cast = bitcast i8* %ptr to [64 x i8]*
  musttail call void @byref_callee([64 x i8]* byref([64 x i8]) %cast)
  ret void
}

; CHECK: cannot guarantee tail call due to mismatched ABI impacting function attributes
; CHECK-NEXT: musttail call void @byref_callee([64 x i8]* %ptr)
; CHECK-NEXT: [64 x i8]* %ptr
define void @musttail_byref_callee([64 x i8]* byref([64 x i8]) %ptr) {
  musttail call void @byref_callee([64 x i8]* %ptr)
  ret void
}

define void @byref_callee_align32(i8* byref([64 x i8]) align 32) {
  ret void
}

; CHECK: cannot guarantee tail call due to mismatched ABI impacting function attributes
; CHECK-NEXT: musttail call void @byref_callee_align32(i8* byref([64 x i8]) align 32 %ptr)
; CHECK-NEXT: i8* %ptr
define void @musttail_byref_caller_mismatched_align(i8* byref([64 x i8]) align 16 %ptr) {
  musttail call void @byref_callee_align32(i8* byref([64 x i8]) align 32 %ptr)
  ret void
}