Summary:
The main challenge in lowering kernel arguments for AMDGPU is determing the
memory type of the argument. The generic calling convention code assumes
that only legal register types can be stored in memory, but this is not the
case for AMDGPU.
This consolidates all the logic AMDGPU uses for deducing memory types into a single
function. This will make it much easier to support different ABIs in the future.
Reviewers: arsenm
Subscribers: arsenm, wdng, nhaehnle, llvm-commits, yaxunl
Differential Revision: https://reviews.llvm.org/D24614
llvm-svn: 281781
Summary:
mesa3d will use the same kernel calling convention as amdhsa, but it will
handle everything else like the default 'unknown' OS type.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D22783
llvm-svn: 281779
We used to only support instructions with same-type operands.
Instead, use the per-register type information to map each
operand more accurately.
llvm-svn: 281734
When a phi node is finally lowered to a machine instruction it is
important that the lowered "load" instruction is placed before the
associated DEBUG_VALUE entry describing the value loaded.
Renamed the existing SkipPHIsAndLabels to SkipPHIsLabelsAndDebug to
more fully describe that it also skips debug entries. Then used the
"new" function SkipPHIsAndLabels when the debug information should not
be skipped when placing the lowered "load" instructions so that it is
placed before the debug entries.
Differential Revision: https://reviews.llvm.org/D23760
llvm-svn: 281727
(and the same for SREM)
This was causing buildbot failures earlier (time outs in the LNT suite).
However, we haven't been able to reproduce this and are suspecting this
was caused by another (reverted) patch.
llvm-svn: 281719
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:
ldr r0, .CPI0
bl printf
bx lr
.CPI0: &format_string
format_string: .asciz "hello, world!\n"
We can emit:
adr r0, .CPI0
bl printf
bx lr
.CPI0: .asciz "hello, world!\n"
This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).
This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.
It also contains fixes for emitting .text relocations which made the sanitizer
bots unhappy.
llvm-svn: 281715
Currently, the machine combiner can proceed matching when -ffast-math is on.
It should also match when only -ffp-contract=fast is specified as was the
case before when DAGCombiner was doing the job.
Patch by: Abderrazek Zaafrani <a.zaafrani@samsung.com>.
Differential Revision: https://reviews.llvm.org/D24366
llvm-svn: 281649
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:
ldr r0, .CPI0
bl printf
bx lr
.CPI0: &format_string
format_string: .asciz "hello, world!\n"
We can emit:
adr r0, .CPI0
bl printf
bx lr
.CPI0: .asciz "hello, world!\n"
This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).
This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.
llvm-svn: 281604
Summary:
It was previously not possible for tools to use solely the stackmap
information emitted to reconstruct the return addresses of callsites in
the map, which is necessary to use the information to walk a stack. This
patch adds per-function callsite counts when emitting the stackmap
section in order to resolve the problem. Note that this slightly alters
the stackmap format, so external tools parsing these maps will need to
be updated.
**Problem Details:**
Records only store their offset from the beginning of the function they
belong to. While these records and the functions are output in program
order, it is not possible to determine where the end of one function's
records are without the callsite count when processing the records to
compute return addresses.
Patch by Kavon Farvardin!
Reviewers: atrick, ributzka, sanjoy
Subscribers: nemanjai
Differential Revision: https://reviews.llvm.org/D23487
llvm-svn: 281532
Until AVX512DQ we only support i64/vXi64 sitofp conversion as scalars.
This patch sees if the sign bit extends far enough that we can truncate to a i32 type and then perform sitofp without loss of precision.
Differential Revision: https://reviews.llvm.org/D24345
llvm-svn: 281502
This addresses a TODO to handle operations besides and. This
also starts eliminating no-op operations with a constant that
can emerge later.
llvm-svn: 281488
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:
ldr r0, .CPI0
bl printf
bx lr
.CPI0: &format_string
format_string: .asciz "hello, world!\n"
We can emit:
adr r0, .CPI0
bl printf
bx lr
.CPI0: .asciz "hello, world!\n"
This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).
llvm-svn: 281484
This patch corresponds to review:
https://reviews.llvm.org/D24021
In the initial implementation of this instruction, I forgot to account for
variable indices. This patch fixes PR30189 and should probably be merged into
3.9.1 (I'll open a bug according to the new instructions).
llvm-svn: 281479
Summary: When expanding mul in type legalization make sure the type for shift amount can actually fit the value. This fixes PR30354 https://llvm.org/bugs/show_bug.cgi?id=30354.
Reviewers: hfinkel, majnemer, RKSimon
Subscribers: RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D24478
llvm-svn: 281403
This allows us to, in some cases, create a vector_shuffle out of a build_vector, when
the inputs to the build are extract_elements from two different vectors, at least one
of which is wider than the output. (E.g. a <8 x i16> being constructed out of
elements from a <16 x i16> and a <8 x i16>).
Differential Revision: https://reviews.llvm.org/D24491
llvm-svn: 281402
Cleanup/change the code that checks for possible tailcall conventions to
look the same as the one in the X86 target. This makes the distinction
between calling conventions that can guarnatee tailcalls and the ones
that may tailcall more obvious.
- Add Swift to the mayTailCall list
- PreserveMost seemed to be incorrectly part of the guarnteed tail call
list, move it to the mayTailCall list.
llvm-svn: 281376
To avoid assertion, we must ensure that the inner shift constant is within range before calling ConstantSDNode::getZExtValue(). We already know that the outer shift constant is in range.
Followup to D23007
llvm-svn: 281362