If OffsetBeg + InsertVecSz is greater than VecSz, we need to estimate
the cost as a shuffle of 2 vectors, not as an insert of a subvector.
Otherwise, the inserted subvector is out of range and the compiler may
crash.
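A minimal standalone sketch of the bound check; the enum and function
names here are made up for illustration, not the SLPVectorizer code:
  #include <cstdio>

  enum class CostModel { InsertSubvector, PermuteTwoSrc };

  // OffsetBeg: first destination element written; InsertVecSz: number of
  // inserted elements; VecSz: number of elements in the destination vector.
  static CostModel classifyInsert(unsigned OffsetBeg, unsigned InsertVecSz,
                                  unsigned VecSz) {
    // If the inserted range spills past the end of the destination vector,
    // an insert-subvector would be out of range; model it as a two-source
    // shuffle instead.
    if (OffsetBeg + InsertVecSz > VecSz)
      return CostModel::PermuteTwoSrc;
    return CostModel::InsertSubvector;
  }

  int main() {
    std::printf("%d\n", classifyInsert(6, 4, 8) == CostModel::PermuteTwoSrc);   // 1
    std::printf("%d\n", classifyInsert(0, 4, 8) == CostModel::InsertSubvector); // 1
  }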
Differential Revision: https://reviews.llvm.org/D128071
Noticed on D128216 - if we're zeroing out vector elements of a mul/mulh result, see if we can merge the and-mask into the mul by just multiplying by zero.
Ideally we'd make this generic (similar to the existing foldSelectWithIdentityConstant?), but these cases appear very late, after the constants have been lowered to constant-pool loads.
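A standalone per-lane sketch (not the X86 combine itself) of why the fold
is sound when every mask element is either 0 or all-ones:
(x * y) & m == (x & m) * y.
  #include <cassert>
  #include <cstdint>

  int main() {
    const uint16_t Lanes[][2] = {{1234, 5678}, {40000, 3}, {7, 65535}};
    const uint16_t Masks[] = {0, 0xFFFF}; // each lane's mask is 0 or all-ones
    for (auto &L : Lanes)
      for (uint16_t M : Masks) {
        uint16_t X = L[0], Y = L[1];
        // Masking the product is the same as masking one operand first.
        assert(uint16_t(uint16_t(X * Y) & M) == uint16_t(uint16_t(X & M) * Y));
      }
    return 0;
  }
The same argument covers mulh: the high half of a product with a zeroed
operand is also zero.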
The granularity of SPI_SHADER_PGM_RSRC2_PS.EXTRA_LDS_SIZE changed
in GFX11. It is now in units of 256 dwords instead of 128 dwords.
COMPUTE_PGM_RSRC2.LDS_SIZE is unaffected. It is still in units of
128 dwords.
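A hypothetical sketch of the unit change; the helper name and the
round-up are assumptions for illustration, not the backend's encoding
code:
  #include <cstdio>

  // One dword is 4 bytes. The field counts whole blocks of LDS; GFX11 uses
  // 256-dword blocks where earlier targets used 128-dword blocks.
  static unsigned encodeExtraLdsSize(unsigned LdsBytes, bool IsGFX11Plus) {
    unsigned BlockBytes = (IsGFX11Plus ? 256u : 128u) * 4u;
    return (LdsBytes + BlockBytes - 1) / BlockBytes; // round up (assumed)
  }

  int main() {
    // The same 4 KiB of LDS is half as many blocks on GFX11.
    std::printf("%u\n", encodeExtraLdsSize(4096, /*IsGFX11Plus=*/false)); // 8
    std::printf("%u\n", encodeExtraLdsSize(4096, /*IsGFX11Plus=*/true));  // 4
  }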
Differential Revision: https://reviews.llvm.org/D128179
As the FIXME already indicates, I don't see why this code would be
necessary. If there's a call to an allocator function, that should
get treated just like any other function call -- usually it will be
a declaration and handled conservatively based on memory attributes
only. There should be no need to explicitly force it to be modref.
No test failures either, so I think this is just dead code.
Differential Revision: https://reviews.llvm.org/D127273
The MemoryMapper class takes care of cross-process and in-process address
space reservation, mapping, transferring content, and applying protections.
Implementations of this class can support different ways of doing this,
such as using shared memory, transferring memory contents over EPC, or
just mapping memory in the same process (InProcessMemoryMapper).
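A minimal sketch of the shape such an interface might take; the method
names and signatures below are illustrative assumptions, not the actual
ORC API:
  #include <cstddef>
  #include <cstdio>
  #include <cstdlib>
  #include <cstring>

  // Illustrative interface only.
  struct MemoryMapperSketch {
    virtual ~MemoryMapperSketch() = default;
    virtual void *reserve(std::size_t NumBytes) = 0;  // reserve address space
    virtual void initialize(void *Addr, const void *Content,
                            std::size_t Size) = 0;    // copy content, protect
    virtual void release(void *Addr) = 0;             // undo the reservation
  };

  // In-process flavour: everything happens in the current address space
  // (cf. InProcessMemoryMapper). A shared-memory or EPC-based implementation
  // would forward these operations to the executor process instead.
  struct InProcessSketch : MemoryMapperSketch {
    void *reserve(std::size_t NumBytes) override { return std::malloc(NumBytes); }
    void initialize(void *Addr, const void *Content, std::size_t Size) override {
      std::memcpy(Addr, Content, Size); // real code would also set protections
    }
    void release(void *Addr) override { std::free(Addr); }
  };

  int main() {
    InProcessSketch M;
    void *Block = M.reserve(16);
    M.initialize(Block, "hello", 6);
    std::printf("%s\n", static_cast<const char *>(Block)); // hello
    M.release(Block);
  }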
Reviewed By: sgraenitz, lhames
Differential Revision: https://reviews.llvm.org/D127491
D127680 added some unnecessary funnel shift costs for AArch64 to "match
the legacy behaviour". The default costs are closer to the correct
values and line up better with the scalar/neon costs. Remove the lines
again to clean up the code; they can be added back at a later date with
better values if needed.
This requires us to override the isTargetCanonicalConstantNode callback introduced in D128144, so we can recognise the various cases where a VBROADCAST_LOAD constant is being reused at different vector widths to prevent infinite loops.
As a followup to D128144, this adds extract(DUP(C)) as a canonical
constant to prevent it being transformed back into a BUILD_VECTOR,
leading to an infinite loop.
To accommodate the macOS universal configuration, include the assembly
files and `blake3_neon.c` without a CMake check, and instead guard their
source with architecture "#ifdef" checks.
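A small fragment illustrating the pattern; the exact macros checked are
an assumption, not the blake3 sources:
  // The file can be listed for every architecture in the build because its
  // body only compiles on targets that actually have NEON.
  #if defined(__aarch64__) || defined(__ARM_NEON)
  /* NEON implementation would go here */
  #endif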
Differential Revision: https://reviews.llvm.org/D128132
The failure that caused the previous revert has been fixed
by https://reviews.llvm.org/D126048
Original commit message:
RVV makes heavy use of subregisters due to LMUL>1 and segment
load/store tuples. Enabling subregister liveness tracking improves the quality
of the register allocation.
I've added a command line option that can be used to turn it off if it
causes compile time or functional issues. I used that option to keep the
old behavior for one interesting test case that was testing register
allocation.
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D128016
There is no instruction to fold NZCV, so just do not do it.
Without the fix, the added test case crashes with the assert
"Mismatched register size in non subreg COPY".
Reviewed By: danilaml
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D127294
This patch implements a new way to generate CTR loops. The intrinsics
inserted by the hardware loop pass are now mapped to pseudo instructions,
and these pseudo instructions are expanded to either a CTR loop or a
normal compare+branch loop in this post-ISEL pass.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D122125
Use it in place of VSELECT_VL+VRGATHER*_VL.
This simplifies the isel patterns.
Overall, I think trying to match select+op to create masked instructions
in isel doesn't scale. We either need to do it in DAG combine, pre-isel
peephole, or post-isel peephole. I don't yet know which is the right
answer, but for this case it seemed best to be able to request the
masked form directly from lowering.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D128023
For the case below, the virtual register is defined twice in the self loop.
We don't need to spill %0 after the third instruction `%0 = def (tied %0)`,
because it is defined by the second instruction `%0 = def`.
1 bb.1
2 %0 = def
3 %0 = def (tied %0)
4 ...
5 jmp bb.1
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D125079
The instructions that generate the source of a dual source blend export
should run in strict WQM. That is, if any lane in a quad is active,
we need to enable all four lanes of that quad so that the shuffling
operation performed before exporting to the dual source blend target
works correctly.
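A standalone sketch of what strict WQM means for the EXEC mask (not the
backend's WQM pass): any live lane in a quad makes all four lanes of that
quad live.
  #include <cstdint>
  #include <cstdio>

  // Lanes are grouped in quads of four; a wave64 EXEC mask has 16 quads.
  static uint64_t wholeQuadMode(uint64_t Exec) {
    uint64_t Wqm = 0;
    for (int Quad = 0; Quad < 16; ++Quad)
      if ((Exec >> (Quad * 4)) & 0xF)  // any lane live in this quad?
        Wqm |= 0xFull << (Quad * 4);   // then enable all four of its lanes
    return Wqm;
  }

  int main() {
    std::printf("%llx\n", (unsigned long long)wholeQuadMode(0x1));    // f
    std::printf("%llx\n", (unsigned long long)wholeQuadMode(0x8010)); // f0f0
  }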
Differential Revision: https://reviews.llvm.org/D127981
LDS_PARAM_LOAD and LDS_DIRECT_LOAD use EXEC per quad
(if any pixel is enabled in the quad, data is written
to all 4 pixels/threads in the quad).
Tag LDS_PARAM_LOAD and LDS_DIRECT_LOAD as using strict_wqm
to enforce this and avoid lane clobbering issues.
Note that only the instruction itself is tagged.
The implicit uses of these do not need to be in WQM.
This reduces unnecessary WQM calculation of M0.
Differential Revision: https://reviews.llvm.org/D127977
Detect LDS direct WAR/WAW hazards and compute values for the
wait_vdst (va_vdst) parameter. Where appropriate, this
raises wait_vdst from the default 0 to allow concurrent
issue of LDS direct with VALU execution.
Also detect LDS direct versus VMEM source VGPR hazards
and insert vm_vsrc=0 waits using s_waitcnt_depctr.
Differential Revision: https://reviews.llvm.org/D127963
This was a bug introduced in d764aa. A pointer type is not a primitive type, so we ended up dividing by zero when computing VLMax.
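For context, a sketch of the arithmetic involved; the helper below is
hypothetical, not the backend code. VLMax = VLEN * LMUL / SEW, and a type
with no primitive size reports SEW == 0, hence the division by zero:
  #include <cstdio>

  // VLenBits: vector register width in bits; LMulNum8: LMUL expressed as a
  // numerator over 8 (so fractional LMUL fits); SewBits: element width in
  // bits. SewBits == 0 models a type with no primitive size.
  static unsigned computeVLMax(unsigned VLenBits, unsigned LMulNum8,
                               unsigned SewBits) {
    return VLenBits * LMulNum8 / 8 / SewBits; // SewBits == 0 divides by zero
  }

  int main() {
    std::printf("%u\n", computeVLMax(128, 8, 32));  // LMUL=1, SEW=32 -> 4
    std::printf("%u\n", computeVLMax(256, 16, 64)); // LMUL=2, SEW=64 -> 8
  }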
Differential Revision: https://reviews.llvm.org/D128219
There are instances where using paired vector stores leads to significant
performance degradation due to issues with store forwarding. To avoid
falling into this trap with compiler-generated code, we will not emit these
instructions unless the user requests them explicitly (with a builtin or by
specifying the option).
Reviewed By: lei, amyk, saghir
Differential Revision: https://reviews.llvm.org/D127218
An AArch64ISD::DUP is just a splat, where the known bits for each lane
are the same as the input. This teaches that to computeKnownBitsForTargetNode.
Problems arise for constants though, as a constant BUILD_VECTOR can be
lowered to an AArch64ISD::DUP, which SimplifyDemandedBits would then
turn back into a constant BUILD_VECTOR, leading to an infinite cycle.
This has been prevented by adding an isTargetCanonicalConstantNode
callback to prevent the conversion back into a BUILD_VECTOR.
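A tiny standalone sketch of the fact being taught here, not the
computeKnownBitsForTargetNode code itself: every lane of a splat has
exactly the known bits of the splatted scalar.
  #include <cstdint>
  #include <cstdio>

  // Simplified known-bits pair for a 16-bit lane.
  struct KnownBits16 {
    uint16_t Zero; // bits known to be 0
    uint16_t One;  // bits known to be 1
  };

  // DUP(Scalar) splats the scalar into every lane, so each lane's known
  // bits are simply the scalar's known bits.
  static KnownBits16 knownBitsOfDupLane(const KnownBits16 &Scalar) {
    return Scalar;
  }

  int main() {
    KnownBits16 S{0xFF00, 0x000F}; // high byte known zero, low nibble known one
    KnownBits16 L = knownBitsOfDupLane(S);
    std::printf("%04x %04x\n", L.Zero, L.One); // ff00 000f
  }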
Differential Revision: https://reviews.llvm.org/D128144
This change removes an explicit scalable vector bailout for fshl and fshr. This bailout was added in 60e4698b9a, when sinking an unconditional bailout for all intrinsics into selected cases. It's not clear if the bailout was unneeded originally, or if our cost model infrastructure has simply matured in the meantime. Either way, the generic code appears to handle scalable vectors without issue.
Note that the RISC-V cost model changes here aren't particularly interesting. They probably better match the current lowering, but the main point is to have coverage of the BasicTTI path and simply show the lack of crashing.
AArch64 costing was changed to preserve legacy behavior. There will most likely be an upcoming change to use the generic costs there too, but I didn't want to make that change myself, not being particularly familiar with the target.
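For reference, a standalone sketch of the funnel-shift semantics being costed here (cf. llvm.fshl); this is not the cost-model code:
  #include <cstdint>
  #include <cstdio>

  // fshl(a, b, c): concatenate a:b, shift left by c modulo the bit width,
  // and return the high word.
  static uint32_t fshl32(uint32_t A, uint32_t B, uint32_t C) {
    C &= 31;
    if (C == 0)
      return A; // avoid a shift by the full width, which C++ leaves undefined
    return (A << C) | (B >> (32 - C));
  }

  int main() {
    std::printf("%08x\n", fshl32(0x12345678, 0x9ABCDEF0, 8)); // 3456789a
  }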
Differential Revision: https://reviews.llvm.org/D127680
The code being removed is technically correct; if we end up with two VL=0 instructions next to each other, we can avoid a state transition if the second is a scalar move. However, since both ops are also nops, we should simply delete them instead. As such, this compatibility rule simply complicates the code for no purpose.
Depending on the environment, a floating point instruction should
treat denormal inputs as zero, and/or flush a denormal output to zero.
Denormals are not currently accounted for when an instruction gets
folded to a constant, which can lead to differences in output between
a folded and an unfolded instruction when running on the target. The
denormal handling mode can be set by the function-level attribute
denormal-fp-math, which this patch uses to determine whether any
denormal inputs to or outputs from folding should be treated as zero,
with the sign set appropriately.
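A hypothetical standalone sketch of the "treat as zero, keep the sign"
behaviour described above, not the constant folding code itself:
  #include <cmath>
  #include <cstdio>

  // Flush a denormal (subnormal) value to a zero of the same sign; other
  // values pass through unchanged.
  static double flushDenormalToZero(double V) {
    if (std::fpclassify(V) == FP_SUBNORMAL)
      return std::copysign(0.0, V);
    return V;
  }

  int main() {
    double NegDenormal = -5e-324; // smallest-magnitude negative subnormal
    std::printf("%g\n", flushDenormalToZero(NegDenormal)); // -0
    std::printf("%g\n", flushDenormalToZero(-1.5));        // -1.5
  }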
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D116952