This patch adds omp.taskgroup operation according to OpenMP 5.0 2.17.6.
Also added tests for the same.
Reviewed By: kiranchandramohan, peixin
Differential Revision: https://reviews.llvm.org/D127250
To accomodate macOS universal configuration include the assembly files
and `blake3_neon.c` without a CMake check but instead guard their source
with architecture "#ifdef" checks.
Differential Revision: https://reviews.llvm.org/D128132
The failure that caused the previous revert has been fixed
by https://reviews.llvm.org/D126048
Original commit message:
RVV makes heavy use of subregisters due to LMUL>1 and segment
load/store tuples. Enabling subregister liveness tracking improves the quality
of the register allocation.
I've added a command line that can be used to turn it off if it causes compile
time or functional issues. I used the command line to keep the old behavior
for one interesting test case that was testing register allocation.
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D128016
There is no instruction to fold NZCV, so, just do not do it.
Without the fix the added test case crashes with an assert
"Mismatched register size in non subreg COPY"
Reviewed By: danilaml
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D127294
This patch implements a new way to generate the CTR loops. Now the
intrinsics inserted in hardware loop pass will be mapped to pseudo
instructions and these pseudo instructions will be expanded to CTR
loop or normal compare+branch loop in this post ISEL pass.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D122125
Use it in place of VSELECT_VL+VRGATHER*_VL.
This simplifies the isel patterns.
Overall, I think trying to match select+op to create masked instructions
in isel doesn't scale. We either need to do it in DAG combine, pre-isel
peepole, or post-isel peephole. I don't yet know which is the right
answer, but for this case it seemed best to be able to request the
masked form directly from lowering.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D128023
For below case, virtual register is defined twice in the self loop. We
don't need to spill %0 after the third instruction `%0 = def (tied %0)`,
because it is defined in the second instruction `%0 = def`.
1 bb.1
2 %0 = def
3 %0 = def (tied %0)
4 ...
5 jmp bb.1
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D125079
This fixes a useless filecheck and wrong comment for always-inline.ll. Testing
has been done using ninja check-llvm and llvm-lit always-inline.ll --show-all.
Reviewed By: modimo, hoy
Differential Revision: https://reviews.llvm.org/D127815
Microsoft does not seem to document the flag. Ignoring it for now is probably
better than getting an unknown flag error.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D128231
This patch implements VSCode DAP logpoints feature (also called tracepoint
in other VS debugger).
This will provide a convenient way for user to do printf style logging
debugging without pausing debuggee.
Differential Revision: https://reviews.llvm.org/D127702
The error used to look like this:
ld64.lld: error: undefined symbol: _foo
>>> referenced by /path/to/bar.o:(symbol _baz+0x4)
If DWARF line information is available, we now show where in the source
the references are coming from:
ld64.lld: error: unreferenced symbol: _foo
>>> referenced by: bar.cpp:42 (/path/to/bar.cpp:42)
>>> /path/to/bar.o:(symbol _baz+0x4)
Differential Revision: https://reviews.llvm.org/D128184
Add missing intrinsics and tests for them. An expanding macro
from _vel_pack_f32p to __builtin_ve_vl_pack_f32p and others is
already defined in clang/lib/Headers/velintrin.h.
Reviewed By: efocht
Differential Revision: https://reviews.llvm.org/D128120
The instructions that generate the source of dual source blend export
should run in strict-wqm. That is if any lane in a quad is active,
we need to enable all four lanes of that quad to make the shuffling
operation before exporting to dual source blend target work correctly.
Differential Revision: https://reviews.llvm.org/D127981
LDS_PARAM_LOAD and LDS_DIRECT_LOAD use EXEC per quad
(if any pixel is enabled in the quad, data is written
to all 4 pixels/threads in the quad).
Tag LDS_PARAM_LOAD and LDS_DIRECT_LOAD as using strict_wqm
to enforce this and avoid lane clobbering issues.
Note that only the instruction itself is tagged.
The implicit uses of these do not need to be set WQM.
The reduces unnecessary WQM calculation of M0.
Differential Revision: https://reviews.llvm.org/D127977
Detect LDS direct WAR/WAW hazards and compute values for
wait_vdst (va_vdst) parameter. Where appropriate this
raises wait_vdst from the default 0 to allow concurrent
issue of LDS direct with VALU execution.
Also detect LDS direct versus VMEM source VGPR hazards
and insert vm_vsrc=0 waits using s_waitcnt_depctr.
Differential Revision: https://reviews.llvm.org/D127963
If we would scalarize a fixed vector, we know we can't do so for a scalable one. However, there's no need to crash, we can instead simply return a invalid cost which will work its way through the computation (since invalid is sticky), and the client should bail out.
Sorry for the lack of test here. The particular codepath I saw this reached on was the result of another bug.
Make Offsets and OpcodeOperandTypes tables human-readable by printing the
instruction name before the operand list.
In effect, this makes debugging generated `getOperandType` possible.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D127931
This was a bug introduced in d764aa. A pointer type is not a primitive type, and thus we were ending up dividing by zero when computing VLMax.
Differential Revision: https://reviews.llvm.org/D128219
There are instances where using paired vector stores leads to significant
performance degradation due to issues with store forwarding.To avoid falling
into this trap with compiler - generated code, we will not emit these
instructions unless the user requests them explicitly(with a builtin or by
specifying the option).
Reviewed By : lei, amyk, saghir
Differential Revision: https://reviews.llvm.org/D127218
Make Offsets and OpcodeOperandTypes tables human-readable by printing the
instruction name before the operand list.
In effect, this makes debugging generated `getOperandType` possible.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D127931
Add trace load functionality to SBDebugger via the `LoadTraceFromFile` method.
Update intelpt test case class to have `testTraceLoad` method so we can take advantage of
the testApiAndSB decorator to test both the CLI and SB without duplicating code.
Differential Revision: https://reviews.llvm.org/D128107
An AArch64ISD::DUP is just a splat, where the known bits for each lane
are the same as the input. This teaches that to computeKnownBitsForTargetNode.
Problems arise for constants though, as a constant BUILD_VECTOR can be
lowered to an AArch64ISD::DUP, which SimplifyDemandedBits would then
turn back into a constant BUILD_VECTOR leading to an infinite cycle.
This has been prevented by adding a isTargetCanonicalConstantNode node
to prevent the conversion back into a BUILD_VECTOR.
Differential Revision: https://reviews.llvm.org/D128144