Add a calling convention called amdgpu_gfx for real function calls
within graphics shaders. For the moment, this uses the same calling
convention as other calls in amdgpu, with registers excluded for return
address, stack pointer and stack buffer descriptor.
Differential Revision: https://reviews.llvm.org/D88540
This is to help review the impact of https://reviews.llvm.org/D89952, which
allows targets to fine tune what SelectionDAG does when vector CTPOP is
not legal.
This patch makes the outliner emit CFI instructions in a few more
places:
* after LR is restored, but before the return in an outlined
function
* around save/restore of LR to/from a register at calls to outlined
functions
* around save/restore of LR to/from the stack at calls to outlined
functions
The latter two apply only when the function does NOT spill LR. If the
function spills LR, then outliner-generated saves/restores around
calls are not considered interesting for unwinding the frame.
Differential Revision: https://reviews.llvm.org/D89483
There were cases where a VCMP and a VPST were merged even if the VCMP
didn't have the same defs of its operands as the VPST. This is fixed by
adding RDA checks for the defs. This however gave rise to cases where
the new VPST created would precede the un-merged VCMP and so would fail
a predicate mask assertion since the VCMP wasn't predicated. This was
solved by converting the VCMP to a VPT instead of inserting the new
VPST.
Differential Revision: https://reviews.llvm.org/D90461
The legalization did not forward the listener, which prevented dynamic
legalization and rollbacks. This patch forwards the listener and extends
the associated pass to support all other std ops, so that partial
conversion works.
Previously, this lowering was failing, but due to the
initial bug, the op's modifications were not reverted, and thus the
pattern matching succeeded.
Differential Revision: https://reviews.llvm.org/D91079
Introduce struct FlattenInfo to group some of the bookkeeping. Besides
being a bit of a clean-up, this is a prep step for the next additions
(D90640). I could take things a bit further, but thought this was a good
first step that also keeps this change from getting too large.
Differential Revision: https://reviews.llvm.org/D90408
In C++ with -Werror=comment, multiline comments are not allowed.
clang-format could accidentally introduce multiline comments when reflowing.
This adapts clang-format to not introduce multiline comments by not allowing a
break after `\`. Note that this does not apply to comment lines that already are
multiline comments, such as comments in macros.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D90949
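To illustrate the clang-format rule above: a `//` comment line that ends in a
backslash is spliced with the following line, which is what -Werror=comment
rejects. The sketch below is not clang-format's implementation.
`breakCreatesMultilineComment` is a hypothetical helper that only models the
question "would a break here leave a trailing backslash?", and ignoring blanks
before the break is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not clang-format's API): returns true when breaking
// `comment` after its first `breakPos` characters would leave a backslash as
// the last visible character of the first line, so that the next line would
// be spliced into the comment.
bool breakCreatesMultilineComment(const std::string &comment,
                                  std::size_t breakPos) {
  assert(breakPos <= comment.size() && "break position must be inside the text");
  std::size_t end = breakPos;
  // Assumption: blanks directly before the break do not prevent the splice.
  while (end > 0 && (comment[end - 1] == ' ' || comment[end - 1] == '\t'))
    --end;
  return end > 0 && comment[end - 1] == '\\';
}
```

A reflow routine built on this idea would simply skip any candidate break for
which the helper returns true.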
The altera kernel name restriction check finds kernel files and include
directives whose filename is "kernel.cl", "Verilog.cl", or "VHDL.cl".
Such kernel file names cause the Altera Offline Compiler to generate
intermediate design files that have the same names as certain internal
files, which leads to a compilation error.
This follows the "Guidelines for Naming the Kernel" section in the "Intel
FPGA SDK for OpenCL Pro Edition: Programming Guide."
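The file name test described above can be sketched as follows. This is not the
check's actual code: `isRestrictedKernelFileName` is a hypothetical helper, and
exact, case-sensitive comparison of the final path component is an assumption
(the real check may be more lenient).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not the clang-tidy check itself. Assumption: names are
// compared exactly against the three names listed in the commit message.
bool isRestrictedKernelFileName(const std::string &path) {
  assert(!path.empty() && "expects a non-empty file name or include path");
  // Keep only the final path component so "src/kernel.cl" is also caught.
  const std::string::size_type slash = path.find_last_of("/\\");
  const std::string name =
      slash == std::string::npos ? path : path.substr(slash + 1);
  return name == "kernel.cl" || name == "Verilog.cl" || name == "VHDL.cl";
}
```

The same predicate would apply both to the kernel file being compiled and to
the file name in each include directive.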
This reverts the reversion from 43a38a6523.
Enumerating elements in these classes is necessary to enable custom
operand accessors for variadic operands.
Depends On D90919
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D90923
Operations in MLIR have a dictionary of attributes attached. Expose
those to Python bindings through a pseudo-container that can be indexed
either by attribute name, producing a PyAttribute, or by a contiguous
index for enumeration purposes, producing a PyNamedAttribute.
Depends On D90917
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D90919
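The two ways of indexing the attribute pseudo-container can be modelled
outside of MLIR. The sketch below is only an analogy, written in C++ rather
than as the Python bindings themselves: `AttributeView`, `find` and `at` are
hypothetical names, and attribute values are plain strings instead of real
attribute objects.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an operation's attribute dictionary. Values are
// plain strings here; the real bindings wrap MLIR attribute objects.
class AttributeView {
public:
  using NamedAttribute = std::pair<std::string, std::string>;

  explicit AttributeView(std::vector<NamedAttribute> attrs)
      : attrs_(std::move(attrs)) {}

  std::size_t size() const { return attrs_.size(); }

  // Lookup by name: yields only the value, or nullptr if the name is absent.
  const std::string *find(const std::string &name) const {
    for (const NamedAttribute &attr : attrs_)
      if (attr.first == name)
        return &attr.second;
    return nullptr;
  }

  // Lookup by contiguous index: yields the (name, value) pair, which is what
  // enumeration needs because the name is not known in advance.
  const NamedAttribute &at(std::size_t index) const {
    assert(index < attrs_.size() && "attribute index out of range");
    return attrs_[index];
  }

private:
  std::vector<NamedAttribute> attrs_;
};
```

Returning the bare value for name lookups and the named pair for index lookups
mirrors the PyAttribute / PyNamedAttribute split described above.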
This patch turns VPWidenCall into a VPValue and uses it
during VPlan construction and code generation instead of the plain IR
reference where possible.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D84681
When we fold a VCMP into a VPST instruction, any kill flags between the
old VCMP position and the new insertion point need to be removed in
order to keep the verifier happy.
Differential Revision: https://reviews.llvm.org/D90964
Add support for the Neoverse V1 CPU to the ARM and AArch64 backends.
This is based on patches from Mark Murray and Victor Campos.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D90765
Fold
VT = (and (sign_extend NarrowVT to VT) #bitmask)
into
VT = (zero_extend NarrowVT)
With this combine, the test replaces a sign-extended load plus an
unsigned extension with a zero-extended load to produce one of the
operands of the last multiplication.
BEFORE                 |  AFTER
f_i16_i32:             |  f_i16_i32:
    .fnstart           |      .fnstart
    ldrsh r0, [r0]     |      ldrh r1, [r1]
    ldrsh r1, [r1]     |      ldrsh r0, [r0]
    smulbb r0, r1, r0  |      smulbb r0, r0, r1
    uxth r1, r1        |      mul r0, r0, r1
    mul r0, r0, r1     |      bx lr
    bx lr              |
Reviewed By: resistor
Differential Revision: https://reviews.llvm.org/D90605
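The arithmetic behind the fold above can be checked on plain integers. The
sketch below assumes NarrowVT is i16, VT is i32, and the bitmask is exactly
0xFFFF (all bits of the narrow type and nothing else); the function names are
illustrative only and are not part of the backend.

```cpp
#include <cstdint>

// Before the fold: (and (sign_extend x) 0xFFFF), with i16 widened to i32.
constexpr std::uint32_t signExtendThenMask(std::int16_t x) {
  return static_cast<std::uint32_t>(static_cast<std::int32_t>(x)) & 0xFFFFu;
}

// After the fold: (zero_extend x).
constexpr std::uint32_t zeroExtend(std::int16_t x) {
  return static_cast<std::uint32_t>(static_cast<std::uint16_t>(x));
}
```

The mask clears exactly the upper bits that sign extension filled in, so both
functions agree for every 16-bit input, including negative ones.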
If an enum has different names for the same constant, make sure only the first one declared gets added into the switch. Failing to do so results in a compiler error, as two case labels can't represent the same value.
```
lang=c
enum Numbers{
One,
Un = One,
Two,
Deux = Two,
Three,
Trois = Three
};
// Old behaviour
switch (<Number>) {
case One:
case Un:
case Two:
case Deux:
case Three:
case Trois: break;
}
// New behaviour
switch (<Number>) {
case One:
case Two:
case Three: break;
}
```
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D90555
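The "first declared name wins" rule above can be sketched independently of
clangd. `Enumerator` and `caseLabelsFor` are hypothetical names for
illustration; the real code works on AST declarations rather than on
name/value pairs.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of an enumerator: its name and its constant value.
using Enumerator = std::pair<std::string, long long>;

// Returns the case labels to emit: for each distinct value, only the name
// that was declared first, in declaration order.
std::vector<std::string> caseLabelsFor(const std::vector<Enumerator> &enumerators) {
  std::set<long long> seen;
  std::vector<std::string> labels;
  for (const Enumerator &e : enumerators)
    if (seen.insert(e.second).second) // true only for a value not seen before
      labels.push_back(e.first);
  assert(labels.size() == seen.size() && "one label per distinct value");
  return labels;
}
```

For the `Numbers` enum shown above, this keeps One, Two and Three and drops
Un, Deux and Trois.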
Feeding vector values to `InstCombiner::OptimizeOverflowCheck` produces a scalar boolean flag if it proves the overflow check can be eliminated.
This causes `InstCombiner::CreateOverflowTuple` to crash as it correctly expects a vector of i1 values instead.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D89628
When LLDB Python bindings are used and stack backtraces are enabled
for logging, getMainExecutable() is called with argv0 being null.
This caused the fallback function getprogpath() (used on FreeBSD, NetBSD
and Linux) to segfault. Make it handle a null executable name gracefully.
Differential Revision: https://reviews.llvm.org/D91012
TestWatchpointMultipleThreads currently accounts for two scenarios:
setting the watchpoint before a new thread starts (presumably, verifying
that it will be propagated to the new thread) and setting it after
the thread starts (presumably, verifying that a new watchpoint is set
on all threads). However, the latter test currently assumes that
the thread will be reported to the debugger before the breakpoint is
hit. This is not the case on FreeBSD and NetBSD.
On NetBSD, new threads do not inherit debug registers from their parent
threads. Instead, LLDB copies them manually after the new thread is
reported. Since the thread is actually reported after the second
breakpoint location, both tests effectively check the same behavior
(i.e. watchpoint being set before the new thread is reported).
On FreeBSD, new threads inherit debug registers and we seem to hit
an interesting race condition. While the thread is reported after
the breakpoint is hit, the kernel seems to construct it and copy
the debug register before that happens. As a result, setting
the watchpoint at the second breakpoint location modifies the debug
registers of the first thread after they have been copied to the second
thread but before the debugger is aware of it. Therefore,
the watchpoint is not propagated to the second thread and the test
fails.
Extend the test to cover all three possible scenarios: setting the
watchpoint before the thread is launched, after it is launched but before
it is guaranteed to have started, and after it has actually started. Add
a second barrier to account for the last case. This should ensure that
the second assumption (i.e. that the watchpoint is set on all currently
known threads) is actually tested on FreeBSD and NetBSD.
Differential Revision: https://reviews.llvm.org/D91030