
159242 Commits

Author SHA1 Message Date
Alexey Bataev f1ee2738b3 [SLP]Fix a crash when insert subvector is out of range.
If OffsetBeg + InsertVecSz is greater than VecSz, the cost must be
estimated as a shuffle of 2 vectors, not as an insert of a subvector.
Otherwise, the inserted subvector is out of range and the compiler may crash.

Differential Revision: https://reviews.llvm.org/D128071
2022-06-21 07:16:35 -07:00
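
The range check described in f1ee2738b3 can be sketched as follows. This is only an illustration of the condition in the commit message, not the SLP vectorizer's code; the function name and the returned labels are hypothetical.

```python
def subvector_cost_kind(offset_beg: int, insert_vec_sz: int, vec_sz: int) -> str:
    """Choose which cost to estimate when placing a subvector in a wider vector.

    Hypothetical helper: the parameter names mirror the commit message,
    the returned strings are labels invented for this sketch.
    """
    if offset_beg + insert_vec_sz > vec_sz:
        # The subvector would run past the end of the destination vector,
        # so it cannot be costed as an insert of a subvector.
        return "two-source-shuffle"
    return "insert-subvector"


if __name__ == "__main__":
    print(subvector_cost_kind(2, 4, 8))  # fits inside the vector
    print(subvector_cost_kind(6, 4, 8))  # 6 + 4 > 8, out of range
```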
Simon Pilgrim ac4cb1775b [X86] fold (and (mul x, c1), c2) -> (mul x, (and c1, c2)) iff c2 is all/no bits mask
Noticed on D128216 - if we're zeroing out vector elements of a mul/mulh result then see if we can merge the and-mask into the mul by just multiplying by zero.

Ideally we'd make this generic (similar to the existing foldSelectWithIdentityConstant?), but these cases are appearing very late, after the constants have been lowered to constant-pool loads.
2022-06-21 15:10:43 +01:00
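
The fold in ac4cb1775b relies on each lane of c2 being either all ones or all zeros. A small sketch, assuming 32-bit lanes modelled as Python integers (the helper names are made up for this example), shows why that restriction matters:

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit vector lanes


def and_of_mul(x, c1, c2):
    """Lane-wise (and (mul x, c1), c2)."""
    return [((xi * ci) & MASK32) & mi for xi, ci, mi in zip(x, c1, c2)]


def mul_of_and(x, c1, c2):
    """Lane-wise (mul x, (and c1, c2))."""
    return [(xi * (ci & mi)) & MASK32 for xi, ci, mi in zip(x, c1, c2)]


if __name__ == "__main__":
    x = [3, 7, 0x80000001, 12345]
    c1 = [5, 9, 3, 0xFFFF]
    all_or_none = [MASK32, 0, MASK32, 0]
    # Equal when every mask lane is all ones or all zeros.
    print(and_of_mul(x, c1, all_or_none) == mul_of_and(x, c1, all_or_none))
    # Not equal in general for a partial mask.
    print(and_of_mul([3], [6], [3]), mul_of_and([3], [6], [3]))
```

With an all-ones lane the mask is a no-op on both sides, and with an all-zeros lane both sides become zero (the multiply is by zero). A partial mask breaks the equivalence, as the last line shows.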
Florian Hahn 4ea6891f95 [ConstraintElimination] Remove unneeded StackEntry::Condition (NFC).
The field was only used for debug printing. Print constraint from the
system instead.
2022-06-21 15:57:29 +02:00
Nico Weber 6a4056ab2a Revert "[JITLink][Orc] Add MemoryMapper interface with InProcess implementation"
This reverts commit 6ede652050.
Doesn't build on Windows, see https://reviews.llvm.org/D127491#3598773
2022-06-21 09:56:49 -04:00
Jay Foad 929a8ad2b6 [AMDGPU] Update SPI_SHADER_PGM_RSRC2_PS.EXTRA_LDS_SIZE for GFX11
The granularity of SPI_SHADER_PGM_RSRC2_PS.EXTRA_LDS_SIZE changed
in GFX11. It is now in units of 256 dwords instead of 128 dwords.

COMPUTE_PGM_RSRC2.LDS_SIZE is unaffected. It is still in units of
128 dwords.

Differential Revision: https://reviews.llvm.org/D128179
2022-06-21 14:48:12 +01:00
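
The granularity change in 929a8ad2b6 can be illustrated with a small sketch. The helper below is hypothetical, and rounding up to a whole granule is an assumption of this example rather than something the commit message states.

```python
def lds_size_field(dwords: int, granule_dwords: int) -> int:
    """Encode an LDS size in units of `granule_dwords` (hypothetical helper).

    Assumption: a partial granule is rounded up to a whole one.
    """
    return -(-dwords // granule_dwords)  # ceiling division


if __name__ == "__main__":
    extra_lds_dwords = 512
    # SPI_SHADER_PGM_RSRC2_PS.EXTRA_LDS_SIZE: 128-dword units before GFX11.
    print(lds_size_field(extra_lds_dwords, 128))
    # SPI_SHADER_PGM_RSRC2_PS.EXTRA_LDS_SIZE: 256-dword units on GFX11.
    print(lds_size_field(extra_lds_dwords, 256))
    # COMPUTE_PGM_RSRC2.LDS_SIZE stays in 128-dword units.
    print(lds_size_field(extra_lds_dwords, 128))
```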
Nikita Popov ed63fcb232 [GlobalsModRef] Remove check for allocator calls
As the FIXME already indicates, I don't see why this code would be
necessary. If there's a call to an allocator function, that should
get treated just like any other function call -- usually it will be
a declaration and handled conservatively based on memory attributes
only. There should be no need to explicitly force it to be modref.
No test failures either, so I think this is just dead code.

Differential Revision: https://reviews.llvm.org/D127273
2022-06-21 14:24:13 +02:00
Anubhab Ghosh 6ede652050 [JITLink][Orc] Add MemoryMapper interface with InProcess implementation
MemoryMapper class takes care of cross-process and in-process address space
reservation, mapping, transferring content and applying protections.

Implementations of this class can support different ways to do this such
as using shared memory, transferring memory contents over EPC or just
mapping memory in the same process (InProcessMemoryMapper).

Reviewed By: sgraenitz, lhames

Differential Revision: https://reviews.llvm.org/D127491
2022-06-21 13:44:17 +02:00
Simon Pilgrim 057db2002b [X86] combineAndnp - constant fold ANDNP(C,X) -> AND(~C,X)
If the LHS op has a single use then using the more general AND op is likely to allow commutation, load folding, generic folds etc.
2022-06-21 12:31:01 +01:00
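
For the constant fold in 057db2002b, recall that ANDNP computes (NOT lhs) AND rhs. A minimal sketch, assuming 32-bit values and using invented helper names, shows that inverting the constant once leaves a plain, commutable AND:

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit lanes


def andnp(lhs: int, rhs: int) -> int:
    """ANDNP semantics: (~lhs) & rhs."""
    return (~lhs & MASK32) & rhs


def fold_constant_andnp(c: int):
    """Fold ANDNP(C, X) into AND(~C, X) by inverting the constant up front."""
    not_c = ~c & MASK32
    return lambda x: x & not_c  # ordinary AND, operands may be commuted


if __name__ == "__main__":
    folded = fold_constant_andnp(0x0000FFFF)
    for x in (0, 1, 0x12345678, MASK32):
        assert folded(x) == andnp(0x0000FFFF, x)
    print(hex(folded(0x12345678)))
```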
David Green fb4d3d238f [AArch64] Remove unnecessary funnel shift sve costs.
D127680 added some unnecessary funnel shift costs for AArch64 to "match
the legacy behaviour". The default costs are closer to the correct
values and line up with the scalar/neon costs better. Remove the lines
again to clean up the code, they can be added back at a later date with
better values if needed.
2022-06-21 12:21:37 +01:00
Simon Pilgrim 843d43e62a [X86] computeKnownBitsForTargetNode - add X86ISD::VBROADCAST_LOAD handling
This requires us to override the isTargetCanonicalConstantNode callback introduced in D128144, so we can recognise the various cases where a VBROADCAST_LOAD constant is being reused at different vector widths to prevent infinite loops.
2022-06-21 11:48:01 +01:00
Florian Hahn 2a9313ee0b [ConstraintElimination] Move logic to check condition to helper (NFC). 2022-06-21 11:50:33 +02:00
David Green 3f81841474 [AArch64] Add Extract(DUP(C)) as a canonical constant.
As a followup to D128144, this adds extract(DUP(C)) as a canonical
constant to prevent it being transformed back into a BUILD_VECTOR,
leading to an infinite loop.
2022-06-21 09:51:22 +01:00
Carl Ritson 62abc8c200 [AMDGPU] Set GFX11 null export target based on export attributes
If the shader only has depth exports, use MRTZ; otherwise use MRT0.

Differential Revision: https://reviews.llvm.org/D128185
2022-06-21 09:40:31 +01:00
Markus Lavin 3815ae29b5 [machinesink] fix debug invariance issue
Do not include debug instructions when comparing block sizes with
thresholds.

Differential Revision: https://reviews.llvm.org/D127208
2022-06-21 08:13:09 +02:00
Kazu Hirata 7a47ee51a1 [llvm] Don't use Optional::getValue (NFC) 2022-06-20 22:45:45 -07:00
Chen Zheng 9cfbe7bbfe [PowerPC][ctrloop] handles calls in preheader before MTCTRloop 2022-06-21 01:22:39 -04:00
Argyrios Kyrtzidis bb095880f8 [Support/BLAKE3] Do a CMake check for the `-mavx512vl` flag before applying it 2022-06-20 22:04:14 -07:00
Argyrios Kyrtzidis 34362f96d2 [Support/BLAKE3] Enable the SIMD implementations for macOS universal builds
To accommodate the macOS universal configuration, include the assembly files
and `blake3_neon.c` without a CMake check, and instead guard their source
with architecture "#ifdef" checks.

Differential Revision: https://reviews.llvm.org/D128132
2022-06-20 21:18:44 -07:00
Craig Topper e01353f816 [RISCV] Add RISCVISD opcode for PseudoAddTPRel.
Use it along with RISCVISD::HI and ADD_LO to avoid emitting
MachineSDNodes during lowering.
2022-06-20 20:56:52 -07:00
Craig Topper 59cde2133d Recommit "[RISCV] Enable subregister liveness tracking for RVV."
The failure that caused the previous revert has been fixed
by https://reviews.llvm.org/D126048

Original commit message:

RVV makes heavy use of subregisters due to LMUL>1 and segment
load/store tuples. Enabling subregister liveness tracking improves the quality
of the register allocation.

I've added a command line that can be used to turn it off if it causes compile
time or functional issues. I used the command line to keep the old behavior
for one interesting test case that was testing register allocation.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D128016
2022-06-20 20:46:06 -07:00
Serguei Katkov 163c77b2e0 [AARCH64 folding] Do not fold any copy with NZCV
There is no instruction to fold NZCV, so, just do not do it.

Without the fix the added test case crashes with an assert
"Mismatched register size in non subreg COPY"

Reviewed By: danilaml
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D127294
2022-06-21 10:38:49 +07:00
Kazu Hirata d66cbc565a Don't use Optional::hasValue (NFC) 2022-06-20 20:26:05 -07:00
Kazu Hirata 0916d96d12 Don't use Optional::hasValue (NFC) 2022-06-20 20:17:57 -07:00
Kazu Hirata 064a08cd95 Don't use Optional::hasValue (NFC) 2022-06-20 20:05:16 -07:00
Chen Zheng a71fe49bb5 [PowerPC] add a new pass to expand ctr loop pseudos
This patch implements a new way to generate the CTR loops. Now the
intrinsics inserted in hardware loop pass will be mapped to pseudo
instructions and these pseudo instructions will be expanded to CTR
loop or normal compare+branch loop in this post ISEL pass.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D122125
2022-06-20 22:57:24 -04:00
Craig Topper 16d3a82de5 [RISCV] Add merge operand to RISCVISD::VRGATHER*_VL nodes.
Use it in place of VSELECT_VL+VRGATHER*_VL.

This simplifies the isel patterns.

Overall, I think trying to match select+op to create masked instructions
in isel doesn't scale. We either need to do it in DAG combine, pre-isel
peephole, or post-isel peephole. I don't yet know which is the right
answer, but for this case it seemed best to be able to request the
masked form directly from lowering.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D128023
2022-06-20 18:58:24 -07:00
chenglin.bi 6c951c5ee6 [SelectionDAG][DAGCombiner] Reuse exist node by reassociate
When (op N0, N2) already exists, reassociate (op (op N0, N1), N2) to (op (op N0, N2), N1) to reuse the existing (op N0, N2).

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122539
2022-06-21 09:45:19 +08:00
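
The effect of the reassociation in 6c951c5ee6 can be sketched with expression nodes modelled as tuples. This is an illustration only, not the DAGCombiner code: the helper names are hypothetical, and "add" stands in for any operation where this reordering is valid (it must be associative and commutative).

```python
def unique_op_nodes(roots):
    """Count distinct operation nodes reachable from the given roots."""
    seen = set()

    def visit(node):
        if isinstance(node, tuple) and node not in seen:
            seen.add(node)
            visit(node[1])
            visit(node[2])

    for root in roots:
        visit(root)
    return len(seen)


def evaluate(node, env):
    """Evaluate an "add" expression tree against leaf values in env."""
    if isinstance(node, str):
        return env[node]
    _, lhs, rhs = node
    return evaluate(lhs, env) + evaluate(rhs, env)


if __name__ == "__main__":
    existing = ("add", "N0", "N2")                    # already in the graph
    original = ("add", ("add", "N0", "N1"), "N2")     # (op (op N0, N1), N2)
    reassociated = ("add", existing, "N1")            # (op (op N0, N2), N1)

    env = {"N0": 4, "N1": 10, "N2": 7}
    assert evaluate(original, env) == evaluate(reassociated, env)
    print(unique_op_nodes([existing, original]))      # separate inner node
    print(unique_op_nodes([existing, reassociated]))  # inner node is shared
```

After reassociation the inner node is the one that already exists, so the graph needs one fewer operation.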
Luo, Yuanke 44e8a205f4 [fastregalloc] Enhance the heuristics for liveout in self loop.
In the case below, a virtual register is defined twice in the self loop. We
don't need to spill %0 after the third instruction `%0 = def (tied %0)`,
because it is defined in the second instruction `%0 = def`.

1 bb.1
2 %0 = def
3 %0 = def (tied %0)
4 ...
5 jmp bb.1

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D125079
2022-06-21 09:18:49 +08:00
Phoebe Wang edcc68e86f [X86] Make sure SF is updated when optimizing for `jg/jge/jl/jle`
This fixes issue #56103.

Reviewed By: mingmingl

Differential Revision: https://reviews.llvm.org/D128122
2022-06-21 09:09:27 +08:00
Ruiling Song 732eed40fd [AMDGPU] Mark GFX11 dual source blend export as strict-wqm
The instructions that generate the source of dual source blend export
should run in strict-wqm. That is if any lane in a quad is active,
we need to enable all four lanes of that quad to make the shuffling
operation before exporting to dual source blend target work correctly.

Differential Revision: https://reviews.llvm.org/D127981
2022-06-20 21:58:12 +01:00
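
The quad rule described in 732eed40fd ("if any lane in a quad is active, enable all four lanes") can be sketched as a mask computation. The function name is hypothetical and the default of 32 lanes is an assumption of this example.

```python
def whole_quad_mask(exec_mask: int, lanes: int = 32) -> int:
    """Enable all four lanes of every quad that has at least one active lane."""
    result = 0
    for quad_start in range(0, lanes, 4):
        if (exec_mask >> quad_start) & 0xF:
            result |= 0xF << quad_start
    return result


if __name__ == "__main__":
    # Lanes 2 and 8 are active: quads 0 and 2 become fully enabled.
    print(hex(whole_quad_mask(0x104)))
```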
Piotr Sobczak 29621c13ef [AMDGPU] Tag GFX11 LDS loads as using strict_wqm
LDS_PARAM_LOAD and LDS_DIRECT_LOAD use EXEC per quad
(if any pixel is enabled in the quad, data is written
to all 4 pixels/threads in the quad).

Tag LDS_PARAM_LOAD and LDS_DIRECT_LOAD as using strict_wqm
to enforce this and avoid lane clobbering issues.
Note that only the instruction itself is tagged.
The implicit uses of these do not need to be set WQM.
This reduces unnecessary WQM calculation of M0.

Differential Revision: https://reviews.llvm.org/D127977
2022-06-20 21:58:12 +01:00
Jay Foad 13107c2770 [AMDGPU] Add support for GFX11 LDSDIR hazards
Detect LDS direct WAR/WAW hazards and compute values for
wait_vdst (va_vdst) parameter.  Where appropriate this
raises wait_vdst from the default 0 to allow concurrent
issue of LDS direct with VALU execution.

Also detect LDS direct versus VMEM source VGPR hazards
and insert vm_vsrc=0 waits using s_waitcnt_depctr.

Differential Revision: https://reviews.llvm.org/D127963
2022-06-20 21:58:12 +01:00
Philip Reames 0aebd1d875 [RISCV] Fix crash when costing scalable gather/scatter of pointer
This was a bug introduced in d764aa. A pointer type is not a primitive type, and thus we were ending up dividing by zero when computing VLMax.

Differential Revision: https://reviews.llvm.org/D128219
2022-06-20 12:50:42 -07:00
Florian Hahn 6dd772d348 [ConstraintElimination] Move logic to get a constraint to helper (NFC). 2022-06-20 21:34:07 +02:00
Nemanja Ivanovic e09f6ff3c1 [PowerPC] Disable automatic generation of STXVP
There are instances where using paired vector stores leads to significant
performance degradation due to issues with store forwarding. To avoid falling
into this trap with compiler-generated code, we will not emit these
instructions unless the user requests them explicitly (with a builtin or by
specifying the option).

Reviewed By: lei, amyk, saghir

Differential Revision: https://reviews.llvm.org/D127218
2022-06-20 14:30:29 -05:00
Kazu Hirata ad7ce1e769 Don't use Optional::hasValue (NFC) 2022-06-20 11:49:10 -07:00
Kazu Hirata 5413bf1bac Don't use Optional::hasValue (NFC) 2022-06-20 11:33:56 -07:00
David Green c0ecbfa4fd [AArch64] Known bits for AArch64ISD::DUP
An AArch64ISD::DUP is just a splat, where the known bits for each lane
are the same as the input. This teaches that to computeKnownBitsForTargetNode.

Problems arise for constants though, as a constant BUILD_VECTOR can be
lowered to an AArch64ISD::DUP, which SimplifyDemandedBits would then
turn back into a constant BUILD_VECTOR leading to an infinite cycle.
This has been prevented by adding an isTargetCanonicalConstantNode callback
to prevent the conversion back into a BUILD_VECTOR.

Differential Revision: https://reviews.llvm.org/D128144
2022-06-20 19:11:57 +01:00
Simon Pilgrim 8254966062 [X86] LowerINSERT_VECTOR_ELT - always lower v32i8/v16i16 allones insertions on AVX1 as OR ops
v32i8/v16i16 blend shuffles on AVX1 will expand to OR(AND,ANDN) patterns which can be easily broken by other combines
2022-06-20 18:43:03 +01:00
Philip Reames db85345f2d [BasicTTI] Allow generic handling of scalable vector fshr/fshl
This change removes an explicit scalable vector bailout for fshl and fshr. This bailout was added in 60e4698b9a, when sinking an unconditional bailout for all intrinsics into selected cases. It's not clear if the bailout was originally unneeded, or if our cost model infrastructure has simply matured in the meantime. Either way, the generic code appears to handle scalable vectors without issue.

Note that the RISC-V cost model changes here aren't particularly interesting. They do probably better match the current lowering, but the main point is to have coverage of the BasicTTI path and simply show lack of crashing.

AArch64 costing was changed to preserve legacy behavior.  There will most likely be an upcoming change to use the generic costs there too, but I didn't want to make that change not being particularly familiar with the target.

Differential Revision: https://reviews.llvm.org/D127680
2022-06-20 10:38:51 -07:00
Kazu Hirata e0e687a615 [llvm] Don't use Optional::hasValue (NFC) 2022-06-20 10:38:12 -07:00
Arthur Eubanks 13ff7d6f39 Revert "[GlobalOpt] Perform store->dominated load forwarding for stored once globals"
This reverts commit 6f348b146b.

Am seeing internal test failures plus a linux kernel breakage reported due to this.
2022-06-20 10:26:47 -07:00
Arthur Eubanks 1cd2c72bef Revert "[GlobalOpt] Preserve CFG analyses"
This reverts commit cc65f3e167.

Causes crashes: https://github.com/llvm/llvm-project/issues/56131
2022-06-20 10:25:10 -07:00
Philip Reames 14847098f9 [RISCV] Delete unexercised VL=0 vsetvli compatibility logic
The code being removed is technically correct; if we end up with two VL=0 instructions next to each other, we can avoid a state transition if the second is a scalar move.  However, since both ops are also nops, we should simply delete them instead.  As such, this compatibility rule simply complicates the code for no purpose.
2022-06-20 10:15:31 -07:00
David Candler d3919a8cc5 [ConstantFolding] Respect denormal handling mode attributes when folding instructions
Depending on the environment, a floating point instruction should
treat denormal inputs as zero, and/or flush a denormal output to zero.
Denormals are not currently accounted for when an instruction gets
folded to a constant, which can lead to differences in output between
a folded and an unfolded instruction when running on the target. The
denormal handling mode can be set by the function level attribute
denormal-fp-math, which this patch uses to determine whether any
denormal inputs to or outputs from folding should be zero, and that
the sign is set appropriately.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D116952
2022-06-20 16:41:46 +01:00
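
The idea behind d3919a8cc5 can be sketched as follows: denormal inputs and outputs are flushed according to the mode before the folded constant is produced. This is not LLVM's implementation. It models values as Python doubles, the helper names are hypothetical, and the mode strings follow the values accepted by the denormal-fp-math attribute ("ieee", "preserve-sign", "positive-zero").

```python
import math
import sys

MIN_NORMAL = sys.float_info.min  # smallest positive normal double


def flush(value: float, mode: str) -> float:
    """Flush a denormal value to zero according to the given mode."""
    is_denormal = value != 0.0 and math.isfinite(value) and abs(value) < MIN_NORMAL
    if not is_denormal or mode == "ieee":
        return value
    if mode == "preserve-sign":
        return math.copysign(0.0, value)
    if mode == "positive-zero":
        return 0.0
    raise ValueError(f"unknown denormal mode: {mode}")


def fold_fmul(a: float, b: float, input_mode: str = "ieee", output_mode: str = "ieee") -> float:
    """Fold a multiply to a constant while respecting the denormal modes."""
    result = flush(a, input_mode) * flush(b, input_mode)
    return flush(result, output_mode)


if __name__ == "__main__":
    denormal = 5e-324
    print(fold_fmul(denormal, 1.0))                                # kept under "ieee"
    print(fold_fmul(-denormal, 1.0, input_mode="preserve-sign"))   # input flushed, sign kept
    print(fold_fmul(-MIN_NORMAL, 0.5, output_mode="positive-zero"))  # output flushed to +0.0
```

Without the flush steps, the folded constant could differ from what the unfolded instruction produces on a target that treats denormals as zero.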
Guillaume Chatelet d3cf49e984 [Alignment] Remove alignTo version taking a MaybeAlign 2022-06-20 15:15:53 +00:00
Guillaume Chatelet 589c8d6fb9 [NFC] Simplify alignment code in MemorySanitizer 2022-06-20 15:15:53 +00:00
Guillaume Chatelet 7296811910 [NFC] Simplify alignment code in CoroFrame 2022-06-20 15:15:52 +00:00
Guillaume Chatelet d154d0ac06 [NFC] Simplify code 2022-06-20 15:15:52 +00:00
Florian Hahn cebe7ae881 [ConstraintElimination] Move logic to add constraint to helper (NFC). 2022-06-20 17:08:35 +02:00