This reverts the revert 02c5ba8679
Fix:
Pass was registered as DUMMY_FUNCTION_PASS, causing the newpm-pass
functions to be defined twice. Triggered in -DLLVM_ENABLE_MODULES=1
builds.
Original commit:
This patch implements expansion of llvm.vp.* intrinsics
(https://llvm.org/docs/LangRef.html#vector-predication-intrinsics).
VP expansion is required for targets that do not implement VP code
generation. Since expansion is controllable with TTI, targets can switch
on the VP intrinsics they do support in their backend offering a smooth
transition strategy for VP code generation (VE, RISC-V V, ARM SVE,
AVX512, ..).
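For illustration, a minimal IR sketch of what expansion can do for a
target without native VP support (the wrapper function name is made up;
the intrinsic and its signature are from the LangRef):

  declare <4 x i32> @llvm.vp.add.v4i32(<4 x i32>, <4 x i32>, <4 x i1>, i32)

  define <4 x i32> @vp_add(<4 x i32> %a, <4 x i32> %b, <4 x i1> %m, i32 %evl) {
    %r = call <4 x i32> @llvm.vp.add.v4i32(<4 x i32> %a, <4 x i32> %b,
                                           <4 x i1> %m, i32 %evl)
    ret <4 x i32> %r
  }

  ; If TTI reports no support for vp.add, the expansion can fall back to
  ; the plain, unpredicated operation, since masked-off lanes of vp.add do
  ; not carry a defined value:
  ;   %r = add <4 x i32> %a, %b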
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D78203
atomicrmw instructions are expanded by AtomicExpandPass before register allocation
into cmpxchg loops. Register allocation can insert spills between the exclusive loads
and stores, which invalidates the exclusive monitor and can lead to infinite loops.
To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them
after register allocation.
Floating point legalisation:
f16 ATOMIC_LOAD_FADD(*f16, f16) is legalised to
f32 ATOMIC_LOAD_FADD(*i16, f32) and then eventually
f32 ATOMIC_LOAD_FADD_16(*i16, f32)
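As a small IR-level illustration (the function name here is made up),
this is the kind of operation that goes down the new path:

  define half @atomic_fadd16(half* %p, half %v) {
    ; Expanded into a load/store-exclusive loop only after register
    ; allocation, so no spill can land between the exclusive load and store.
    %old = atomicrmw fadd half* %p, half %v seq_cst
    ret half %old
  }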
Differential Revision: https://reviews.llvm.org/D101164
Originally submitted as 3338290c18.
Reverted in c7df6b1223.
atomicrmw instructions are expanded by AtomicExpandPass before register allocation
into cmpxchg loops. Register allocation can insert spills between the exclusive loads
and stores, which invalidates the exclusive monitor and can lead to infinite loops.
To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them
after register allocation.
Floating point legalisation:
f16 ATOMIC_LOAD_FADD(*f16, f16) is legalised to
f32 ATOMIC_LOAD_FADD(*i16, f32) and then eventually
f32 ATOMIC_LOAD_FADD_16(*i16, f32)
Differential Revision: https://reviews.llvm.org/D101164
This patch implements expansion of llvm.vp.* intrinsics
(https://llvm.org/docs/LangRef.html#vector-predication-intrinsics).
VP expansion is required for targets that do not implement VP code
generation. Since expansion is controllable with TTI, targets can switch
on the VP intrinsics they do support in their backend offering a smooth
transition strategy for VP code generation (VE, RISC-V V, ARM SVE,
AVX512, ..).
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D78203
This expands the VMOVRRD(extract(..(build_vector(a, b, c, d)))) pattern
to also handle insert_vectors. Providing we can find the correct insert,
this helps further simplify patterns by removing the redundant VMOVRRD.
Differential Revision: https://reviews.llvm.org/D100245
The generic SoftFloatVectorExtract.ll test was failing when run on arm
machines, as it tries to create an f64 under soft float. Limit the
transform to when f64 is legal.
Also add a missing override, as reported in D100244.
This adds a combine for extract(x, n); extract(x, n+1) ->
VMOVRRD(extract x, n/2). This allows two vector lanes to be moved at the
same time in a single instruction, and thanks to the other VMOVRRD folds
we have added recently can help reduce the number of executed
instructions. Floating point types are very similar, but will include a
bitcast to an integer type.
This also adds a shouldRewriteCopySrc, to prevent copy propagation from
DPR to SPR, which can break as not all DPR regs can be extracted from
directly. Otherwise the machine verifier is unhappy.
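A minimal IR example of the input shape (names invented here); the two
adjacent lane extracts can now be done with a single VMOVRRD:

  define i32 @sum_adjacent_lanes(<4 x i32> %x) {
    %lo = extractelement <4 x i32> %x, i32 0   ; lane 2*n
    %hi = extractelement <4 x i32> %x, i32 1   ; lane 2*n+1
    %s  = add i32 %lo, %hi
    ret i32 %s
  }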
Differential Revision: https://reviews.llvm.org/D100244
When the ProcResGroup has BufferSize=0,
1. if there is a subunit in the list of write resources for the
scheduling class, do not attempt to schedule the ProcResGroup.
2. if there is not a subunit in the list of write resources for the
scheduling class, choose a subunit to use instead of the ProcResGroup.
3. having both the ProcResGroup and any of its subunits in the resources
implied by an InstRW is not supported.
Used to model parallel uses from a pool of resources.
Differential Revision: https://reviews.llvm.org/D98976
Used to model structural hazards on FP issue, where some
instructions take up two issue slots and others one, as well
as similar structural hazards on load issue, where some
instructions take up two load lanes and others one.
Differential Revision: https://reviews.llvm.org/D98977
The IR stack protector pass must insert stack checks before the call instead of
between it and the return.
Similarly, the SDAG one should recognize that ADJCALLFRAME instructions could be
part of the terminal sequence of a tail call. In this case, because such call
frames cannot be nested in LLVM, the stack protection code must skip over the
whole sequence (or risk clobbering argument registers).
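A sketch of the situation in IR (caller/callee names are hypothetical);
the stack check for @caller has to be emitted before the tail call rather
than between the call and the return:

  declare i32 @callee(i32)

  define i32 @caller(i32 %x) sspreq {
    %r = tail call i32 @callee(i32 %x)
    ret i32 %r
  }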
It breaks up the function pass manager in the codegen pipeline.
With empty parameters, it looks at the -mllvm flag -rewrite-map-file.
This is likely not in use.
Add a check that we only have one function pass manager in the codegen
pipeline.
This required reverting commit 9583a3f2625818b78c0cf6d473cdedb9f23ad82c:
"[AsmPrinter] Delete dead takeDeletedSymbsForFunction()".
This was not NFC as initially thought. By coalescing two function
pass managers, this exposed the reverted code as necessary.
addr-label.ll was crashing due to an emitted blockaddress's block being
removed but the label not emitted.
Some tests relied on the fact that we had a module pass somewhere in the
codegen pipeline.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D99707
This allows these optimisations to apply to e.g. `urem i16` directly
before `urem` is promoted to i32 on architectures where i16 operations
are not intrinsically legal (such as on AArch64). The legalization can
then happen more directly, and the generated code gets a chance to avoid
wasting time computing results in types wider than necessary.
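For example, a divide-by-constant remainder that can now be optimised
while still in i16 (hypothetical function):

  define i16 @rem_by_10(i16 %x) {
    ; Recognised before %x is promoted to i32, so the expansion can work
    ; in the narrow type.
    %r = urem i16 %x, 10
    ret i16 %r
  }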
Seems like mostly an improvement in terms of results at least as far as x86_64 and aarch64 are concerned, with a few regressions here and there. It also helps in preventing regressions in changes like {D87976}.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D88785
MVE does not have a single sext/zext or trunc instruction that takes the
bottom half of a vector and extends it to full width, like NEON has with
MOVL. Instead it is expected that this happens through top/bottom
instructions. So the MVE equivalent VMOVLT/B instructions take either
the even or odd elements of the input and extend them to the larger
type, producing a vector with half the number of elements each of double
the bitwidth. As there is no simple instruction for a normal extend, we
often have to expand sext/zext/trunc into a series of lane moves (or
stack loads/stores, which we do not do yet).
This pass takes vector code that starts at truncs, looks for
interconnected blobs of operations that end with sext/zext and
transforms them by adding shuffles so that the lanes are interleaved and
the MVE VMOVL/VMOVN instructions can be used. This is done pre-ISel so
that it can work across basic blocks.
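A minimal sketch of the kind of chain the pass looks for (function name
invented); once the lanes are interleaved with shuffles, the zexts and
the trunc can become VMOVLB/VMOVLT and VMOVNB/VMOVNT:

  define <8 x i16> @ext_op_trunc(<8 x i16> %a, <8 x i16> %b) {
    %ea = zext <8 x i16> %a to <8 x i32>
    %eb = zext <8 x i16> %b to <8 x i32>
    %m  = mul <8 x i32> %ea, %eb
    %t  = trunc <8 x i32> %m to <8 x i16>
    ret <8 x i16> %t
  }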
This initial version of the pass just handles a limited set of
instructions, not handling constants or splats or FP, which can all come
as extensions to this base.
Differential Revision: https://reviews.llvm.org/D95804
Reuse the existing KnownBits multiplication code to handle the 'extend + multiply + extract high bits' pattern for multiply-high ops.
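The pattern in question, written out as a hypothetical IR function for a
32-bit unsigned high-multiply:

  define i32 @umul_hi(i32 %a, i32 %b) {
    %wa = zext i32 %a to i64
    %wb = zext i32 %b to i64
    %m  = mul i64 %wa, %wb          ; extend + multiply
    %hi = lshr i64 %m, 32           ; extract the high 32 bits
    %r  = trunc i64 %hi to i32
    ret i32 %r
  }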
Noticed while looking at the codegen for D88785 / D98587 - the patch helps division-by-constant expansion code in particular, which suggests that we might have some further KnownBits div/rem cases we could handle - but this was far easier to implement.
Differential Revision: https://reviews.llvm.org/D98857
Given a sextload i16, we can usually generate "ldrsh [rn, rm]". If we
don't naturally have a rn, rm addressing mode, we can either generate
"ldrh [rn, #0]; sxth" or "mov rm, #0; ldrsh [rn, rm]".
We currently generate the first, always creating a sxth. They are both
the same number of instructions, but if we generate the second then the
mov #0 will likely be CSE'd or pulled out of a loop, etc.
This adjusts the ISel patterns to do that, creating a mov instead of a
sxth.
Differential Revision: https://reviews.llvm.org/D98693
This also briefly tests a larger set of architectures than the more
exhaustive functionality tests for AArch64 and x86.
As requested in D88785
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D98339
The result of ISD::USUBSAT will never be larger than the LHS. We
can use this to put a bound on the number of leading zeros.
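The IR-level equivalent of the reasoning, as a made-up example:

  declare i32 @llvm.usub.sat.i32(i32, i32)

  define i32 @bounded_sub(i32 %x, i32 %y) {
    %lhs = and i32 %x, 255                                ; at least 24 known leading zeros
    %r   = call i32 @llvm.usub.sat.i32(i32 %lhs, i32 %y)  ; result <= %lhs, so the same
    ret i32 %r                                            ; 24 leading zeros are known
  }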
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D98133
:: (store 1 + 4, addrspace 1)
->
:: (store 1 into undef + 4, addrspace 1)
An offset without a base isn't terribly useful but it's convenient to update
the offset without checking the value. For example, when breaking apart
stores into smaller units.
Differential Revision: https://reviews.llvm.org/D97812
This seems to be more of a Clang thing rather than a generic LLVM thing,
so this moves it out of the LLVM pipelines and instead adds it through
Clang extension hooks into the LLVM pipelines.
Move the post-inline EEInstrumentation out of the backend pipeline and
into a late pass, similar to other sanitizer passes. It doesn't fit
into the codegen pipeline.
Also fix up EntryExitInstrumentation not running at -O0 under the new
PM. PR49143
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D97608
Rather than converting 3 signbits to bools and comparing them,
we can do bitwise logic on the whole vector and convert the
resulting sign bit to a bool at the end.
This is still a different algorithm than what we do in LegalizeDAG
through expandSADDOSSUBO. That algorithm needs to know that the
RHS of SSUBO is > 0, but that's costly when the type is split.
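A rough IR sketch of the sign-bit form for signed-subtract overflow,
assuming the usual rule that overflow happens when the operands have
different signs and the result's sign differs from the LHS (not
necessarily the exact sequence emitted):

  define <4 x i1> @ssubo_overflow(<4 x i32> %lhs, <4 x i32> %rhs) {
    %res = sub <4 x i32> %lhs, %rhs
    %t0  = xor <4 x i32> %lhs, %rhs                  ; operand signs differ
    %t1  = xor <4 x i32> %lhs, %res                  ; result sign differs from LHS
    %t2  = and <4 x i32> %t0, %t1
    %ov  = icmp slt <4 x i32> %t2, zeroinitializer   ; sign bit -> bool at the end
    ret <4 x i1> %ov
  }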
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D97325
Currently the load/store optimizer will only fold in increments of the
same size as the load/store. This patch expands that to any legal
immediate for the post-inc instruction.
This is a recommit of 3b34b06fc5 with correctness fixes and extra
tests.
Differential Revision: https://reviews.llvm.org/D95885
This code creates 3 setccs that need to be expanded. It was
creating a sign bit test as setge X, 0 which is non-canonical.
Canonical would be setgt X, -1. This misses the special case in
IntegerExpandSetCCOperands for sign bit tests that assumes
canonical form. If we don't hit this special case we end up
with a multipart setcc instead of just checking the sign of
the high part.
To fix this I've reversed the polarity of all of the setccs to
setlt X, 0 which is canonical. The rest of the logic should
still work. This seems to produce better code on RISCV which
lacks a setgt instruction.
This probably still isn't the best code sequence we could use here.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D97181
Early versions of the ARMv7 reference manuals considered the sp register
as a deprecated register for the ldm/stm family of instructions. However,
later versions such as ARM DDI 0406C.d added a note to the Appendix:
D9.3 Use of the SP as a general-purpose register
Most ARM instructions, unlike Thumb instructions, provide exactly the
same access to the SP as to R0-R12. This means that it is possible to
use the SP as a general-purpose register. Earlier issues of this manual
deprecated the use of SP in an ARM instruction, in any way that is
deprecated, not permitted, or not possible in the corresponding
Thumb instruction. However, user feedback indicates a number of cases
where these instructions are useful. Therefore, ARM no longer deprecates
these instruction uses.
Also, Armv8 manuals no longer consider SP a deprecated register for ldm/
stm A32 instructions.
Furthermore, GNU as also does not print a deprecation warning when using
SP with those instructions.
Drop deprecation warning for pop/ldm/push/stm instructions.
Patch by: Stefan Agner.
Differential Revision: https://reviews.llvm.org/D82692
Currently the load/store optimizer will only fold in increments of the
same size as the load/store. This patch expands that to any legal
immediate for the post-inc instruction.
Differential Revision: https://reviews.llvm.org/D95885
A v8i32 compare will produce a v8i1 predicate, but during codegen the
v8i32 will be split into two v4i32, potentially requiring two v4i1
predicates to be merged into a single v8i1. Because this merging of two
v4i1's into a v8i1 is very expensive, we need to make the cost of the
compare equally high.
This patch adds the cost of that to ARMTTIImpl::getCmpSelInstrCost.
Because we don't know whether the user of the predicate can be split,
and the cost model is mostly per-instruction, we may be pessimistic but
that should only be for larger and legal types. This also adds min/max
detection to the costmodel where it can be detected, to keep those in
line with the cost of simple min/max instructions. Otherwise for the
most part, costs that were already expensive have become more expensive.
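The shape being costed, as a hypothetical example:

  define <8 x i1> @cmp_v8i32(<8 x i32> %a, <8 x i32> %b) {
    ; The v8i32 operands are split into two v4i32 halves during codegen,
    ; so the two v4i1 results may have to be merged back into one v8i1.
    %p = icmp slt <8 x i32> %a, %b
    ret <8 x i1> %p
  }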
Differential Revision: https://reviews.llvm.org/D96692
This adds a new flag -lsr-preferred-addressing-mode to override the target's
preferred addressing mode. It replaces the flag -lsr-backedge-indexing, which is
equivalent to preindexed addressing, one of the options that
-lsr-preferred-addressing-mode accepts.
Differential Revision: https://reviews.llvm.org/D96855
Currently, findIncDecAfter will only look at the next instruction for
post-inc candidates in the load/store optimizer. This extends that to a
search through the current BB, until an instruction that modifies or
uses the increment reg is found. This allows more post-inc load/stores
and ldm/stm's to be created, especially in cases where a schedule might
move instructions further apart.
We make sure not to look any further for an SP, as that might invalidate
stack slots that are still in use.
Differential Revision: https://reviews.llvm.org/D95881
As discussed on D96413, as long as the promoted bits of the args are zero we can use the basic ISD::USUBSAT pattern directly, without the shifting like we do for other ops.
I think something similar should be possible for ISD::UADDSAT as well, which I'll look at later.
Also, create an ISD::USUBSAT node directly - this will be expanded back by the legalizer later on if necessary.
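A sketch of why the unshifted form is safe when the promoted bits are
zero (hypothetical i8-to-i32 promotion, written at the IR level):

  declare i32 @llvm.usub.sat.i32(i32, i32)

  define i8 @promoted_usubsat(i8 %a, i8 %b) {
    %wa = zext i8 %a to i32       ; promoted bits are zero
    %wb = zext i8 %b to i32
    %ws = call i32 @llvm.usub.sat.i32(i32 %wa, i32 %wb)
    %r  = trunc i32 %ws to i8     ; same value as usub.sat on the original i8s
    ret i8 %r
  }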
Differential Revision: https://reviews.llvm.org/D96622
Begin transitioning the X86 vector code to recognise sub(umax(a,b), b) or sub(a, umin(a,b)) USUBSAT patterns to make it more generic and available to all targets.
This initial patch just moves the basic umin/umax patterns to DAG, removing some vector-only checks on the way - these are some of the patterns that the legalizer will try to expand back to so we can be reasonably relaxed about matching these pre-legalization.
We can handle the trunc(sub(..)) variants as well, which helps with patterns where we were promoting to a wider type to detect overflow/saturation.
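The two shapes, written as IR with made-up function names; both compute
an unsigned saturating subtract of %b from %a:

  declare <4 x i32> @llvm.umax.v4i32(<4 x i32>, <4 x i32>)
  declare <4 x i32> @llvm.umin.v4i32(<4 x i32>, <4 x i32>)

  define <4 x i32> @usubsat_via_umax(<4 x i32> %a, <4 x i32> %b) {
    %m = call <4 x i32> @llvm.umax.v4i32(<4 x i32> %a, <4 x i32> %b)
    %r = sub <4 x i32> %m, %b     ; sub(umax(a,b), b)
    ret <4 x i32> %r
  }

  define <4 x i32> @usubsat_via_umin(<4 x i32> %a, <4 x i32> %b) {
    %m = call <4 x i32> @llvm.umin.v4i32(<4 x i32> %a, <4 x i32> %b)
    %r = sub <4 x i32> %a, %m     ; sub(a, umin(a,b))
    ret <4 x i32> %r
  }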
The remaining x86 code requires some cleanup first - some of it isn't actually tested etc. I also need to resurrect D25987.
Differential Revision: https://reviews.llvm.org/D96413
This patch adds a pass to replace calls to vector intrinsics (i.e., LLVM
intrinsics operating on vector operands) with calls to a vector library.
Currently, calls to LLVM intrinsics are only replaced with calls to vector
libraries when scalar calls to intrinsics are vectorized by the Loop- or
SLP-Vectorizer.
With this pass, it is now possible to replace calls to LLVM intrinsics
already operating on vector operands, e.g., if such code was generated
by MLIR. For the replacement, information from the TargetLibraryInfo,
e.g., as specified via -vector-library, is used.
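A before/after sketch (the SVML callee name below is only an example;
the actual routine depends on the library selected with -vector-library):

  ; Before: a vector intrinsic call, e.g. produced directly by MLIR.
  declare <4 x float> @llvm.sin.v4f32(<4 x float>)

  define <4 x float> @vector_sin(<4 x float> %x) {
    %r = call <4 x float> @llvm.sin.v4f32(<4 x float> %x)
    ret <4 x float> %r
  }

  ; After the pass, with a suitable mapping in TargetLibraryInfo, the call
  ; becomes something like:
  ;   %r = call <4 x float> @__svml_sinf4(<4 x float> %x)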
This is a re-try of the original commit 2303e93e66 that was reverted
due to pass manager problems. Other minor changes have also been made.
Differential Revision: https://reviews.llvm.org/D95373
This patch adds a pass to replace calls to vector intrinsics
(i.e., LLVM intrinsics operating on vector operands) with
calls to a vector library.
Currently, calls to LLVM intrinsics are only replaced with
calls to vector libraries when scalar calls to intrinsics are
vectorized by the Loop- or SLP-Vectorizer.
With this pass, it is now possible to replace calls to LLVM
intrinsics already operating on vector operands, e.g., if
such code was generated by MLIR. For the replacement,
information from the TargetLibraryInfo, e.g., as specified
via -vector-library, is used.
Differential Revision: https://reviews.llvm.org/D95373
This new f16 shuffle under Neon would hit an assert in
GeneratePerfectShuffle as it would try to treat an f16 vector as an i8.
Add f16 handling, treating them like an i16.
Differential Revision: https://reviews.llvm.org/D95446
Under the softfp calling convention, we are often left with
VMOVRRD(extract(bitcast(build_vector(a, b, c, d)))) for the return value
of the function. These can be simplified to a,b or c,d directly,
depending on the value of the extract.
Big endian is a little different because the bitcast switches the lanes
around, meaning we end up with b,a or d,c.
Differential Revision: https://reviews.llvm.org/D94989
This adds a DAG combine for converting sext_inreg of VGetLaneu into
VGetLanes, providing the types match correctly.
Differential Revision: https://reviews.llvm.org/D95073