The model was committed in 4b8ade837e
but not yet enabled to allow for a few fix ups. This adds a few
of these fixes, and also an LLVM MCA test to check most instructions.
While I do have plans to look into some more tuning, it's time to
enable this as it is better than using the A53 schedule.
Differential Revision: https://reviews.llvm.org/D88017
For LP64 mode, this has no effect as pointers are already 64 bits.
For ILP32 mode (x32), this extension is specified by the ABI.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D91338
Pass through the demanded elts mask to the source operands.
The next step will be to add support for folding to add/sub if we only demand odd/even elements.
Continue the work started at D50989.
The code has long been dead since the triple was removed (D75494).
Reviewed By: nickdesaulniers, void
Differential Revision: https://reviews.llvm.org/D91836
Optimize prologue/epilogue instructions by eliminating FP if a given function
uses the GOT but does not call other functions. Previously, we had incorrect
implementations taken from other architectures. Also update regression tests.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D92313
Previously, these check routines accepted non-generatable instructions.
This patch cleans them up and adds asserts for those non-generatable
instructions.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D92254
This enables bswap/bitreverse to combine with other GREVI patterns or each other without needing to add more special cases to the DAG combine or new DAG combines.
I've also enabled the existing GREVI combine for GREVIW so that it can pick up the i32 bswap/bitreverse on RV64 after they've been type legalized to GREVIW.
Differential Revision: https://reviews.llvm.org/D92253
clang may produce `movl x@GOTPCREL+4(%rip), %eax` when loading the high
32 bits of the address of a global variable in -fpic/-fpie mode.
If assembled by GNU as, the fixup emits R_X86_64_GOTPCRELX with an addend != -4.
The instruction loads from the GOT entry with an offset and thus it is incorrect
to relax the instruction.
This patch does not emit a relaxable relocation for a GOT load with an offset
because R_X86_64_[REX_]GOTPCRELX do not make sense for instructions which cannot
be relaxed. The result is good enough for LLD to work. GNU ld relaxes
mov+GOTPCREL as well, but it suppresses the relaxation if addend != -4.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D92114
GORCI performs an OR between each stage. So we need to ensure only
one stage is active before doing this combine.
Initial attempts at finding a test case for this failed due to the order in
which things get combined. It's most likely that we'll form one stage of
GREVI, then combine to GORCI, before the two stages of GREVI can be formed
and combined with each other into a multi-stage GREVI.
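As a standalone illustration (based on the Bitmanip reference pseudocode, not
code from this patch), a GREV stage permutes bits while a GORC stage ORs the
permuted bits back in, which is why (or (grevi X, C), X) only matches gorci
when C has a single active stage:

  #include <cstdint>
  static uint32_t stage(uint32_t x, int sh, uint32_t mask) {
    return ((x & mask) << sh) | ((x >> sh) & mask);
  }
  uint32_t grev32(uint32_t x, unsigned c) {    // generalized reverse
    if (c & 1)  x = stage(x, 1,  0x55555555);
    if (c & 2)  x = stage(x, 2,  0x33333333);
    if (c & 4)  x = stage(x, 4,  0x0F0F0F0F);
    if (c & 8)  x = stage(x, 8,  0x00FF00FF);
    if (c & 16) x = stage(x, 16, 0x0000FFFF);
    return x;
  }
  uint32_t gorc32(uint32_t x, unsigned c) {    // generalized OR-combine
    if (c & 1)  x |= stage(x, 1,  0x55555555);
    if (c & 2)  x |= stage(x, 2,  0x33333333);
    if (c & 4)  x |= stage(x, 4,  0x0F0F0F0F);
    if (c & 8)  x |= stage(x, 8,  0x00FF00FF);
    if (c & 16) x |= stage(x, 16, 0x0000FFFF);
    return x;
  }

For c = 3 (two active stages) and x = 0x1, (grev32(x, 3) | x) is 0x9 but
gorc32(x, 3) is 0xF, so the fold is only valid for a single stage.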
Differential Revision: https://reviews.llvm.org/D92289
Optimize the FP elimination mechanism. This time, optimize functions which have
no calls but do have fixed stack objects; LLVM now eliminates FP on such
functions. Also, optimize the GOT/PLT register save/restore instructions if a
given function doesn't use them. In addition, remove the generation of `.cfi`
instructions, since those were taken from other architectures and have not been
inspected yet. Also update regression tests.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D92251
Change the way i64 is truncated to i32 in I64 registers. VE previously assumed
sign-extended (sext) values. Change it to zero-extended (zext) values to match
LLVM's behaviour.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D92226
This was modeled to have a cost of 1, but since we do not have a MUL.2d this is
scalarized into vector inserts/extracts and scalar muls.
Motivating precommitted test is test/Transforms/SLPVectorizer/AArch64/mul.ll,
which we don't want to SLP vectorize.
Test Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
unfortunately needed changing, but the reason is documented in
LoopVectorize.cpp:6855:
// The cost of executing VF copies of the scalar instruction. This opcode
// is unknown. Assume that it is the same as 'mul'.
which I will address next as a follow-up to this.
Differential Revision: https://reviews.llvm.org/D92208
The build bots caught two additional pre-existing problems exposed by the test change part of my change https://reviews.llvm.org/D91339, when expensive checks are enabled. https://reviews.llvm.org/D91924 fixes one of them, this fixes the other.
FixupSetCC will change code in the form of
%setcc = SETCCr ...
%ext1 = MOVZX32rr8 %setcc
to
%zero = MOV32r0
%setcc = SETCCr ...
%ext2 = INSERT_SUBREG %zero, %setcc, %subreg.sub_8bit
and replace uses of %ext1 with %ext2.
The register class for %ext2 did not take into account any constraints on %ext1, which may have been required by its uses. This change ensures that the original constraints are honoured: instead of creating a new %ext2 register, it reuses %ext1 and further constrains it as needed. This requires a slight reorganisation to account for the fact that the constraining can fail, in which case no changes should be made.
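A rough sketch of the reuse-and-constrain idea (illustrative only; the function
and variable names here are hypothetical, not the actual patch):

  #include "llvm/CodeGen/MachineRegisterInfo.h"
  #include "llvm/CodeGen/TargetRegisterInfo.h"
  using namespace llvm;
  // Try to keep using ZExtReg (%ext1 above) as the INSERT_SUBREG result instead
  // of creating a new register, preserving constraints imposed by its uses.
  static bool reuseAndConstrain(MachineRegisterInfo &MRI, Register ZExtReg,
                                const TargetRegisterClass *RequiredRC) {
    // constrainRegClass returns null when no common subclass exists; in that
    // case bail out before making any changes.
    if (!MRI.constrainRegClass(ZExtReg, RequiredRC))
      return false;
    // ...only now rewrite the MOVZX32rr8 into MOV32r0 + INSERT_SUBREG...
    return true;
  }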
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D91933
The build bots caught two additional pre-existing problems exposed by the test change part of my change https://reviews.llvm.org/D91339, when expensive checks are enabled. This fixes one of them.
X86 has CALL64r and CALL32r opcodes, where CALL64r takes a 64-bit register and CALL32r takes a 32-bit register. CALL64r can only be used in 64-bit mode; CALL32r can only be used in 32-bit mode. LLVM would assume that after picking the appropriate CALLr opcode, a pointer-sized register would be a valid operand, but in x32 mode, which is a 64-bit mode, pointers are 32 bits. In this mode, it is invalid to pass a pointer directly to CALL64r; it needs to be extended to 64 bits first.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D91924
Not sure why bswap was treated specially. This also applies to bitreverse
or generic grevi. We can improve this in future patches.
For now, I just wanted to get consistency and test coverage in place,
as I plan to make some other changes around bswap.
Optimize the emitSPAdjustment function to generate the smallest possible
instruction sequence to adjust SP.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D92174
We had a zexti32 after a sign_extend_inreg. The AND X, 0xffffffff
part of the zexti32 should never occur since SimplifyDemandedBits
from the sign_extend_inreg would have removed it.
We also had sexti32 as the root node of a pattern, but SelectionDAGISel
matches assertsext early before the tablegen based patterns are
evaluated.
Followup to D92112 now that I've learnt about HVX type splitting.
This is some necessary cleanup work for min/max ops to eventually help us move the add/sub sat patterns into DAGCombine - D91876.
Differential Revision: https://reviews.llvm.org/D92169
If usubsat() is legal, this is likely to result in smaller codegen expansion than the default cmp+select codegen expansion.
Allows us to move the x86-specific lowering to the generic expansion code.
Differential Revision: https://reviews.llvm.org/D92183
These patterns are using zexti32 which matches either assertzexti32
or (and X, 0xffffffff). But if we match (and X, 0xffffffff) it will
remove the AND and the inputs may no longer have the zero bits
needed to guarantee the result has enough zeros.
This commit changes the patterns to only match assertzexti32.
I'm not sure how to test the broken case since the DIVUW/REMUW nodes
are created during type legalization, but type legalization won't
create an (and X, 0xffffffff) directly on the inputs.
I've also changed the zexti32 on the root of the pattern to just
checking for AND. We were previously also matching assertzexti32,
but I doubt that pattern would ever occur.
For now, we will hardcode the result as 0.0 if the input is denormal or 0. That will
impact the precision. As the added fsqrt belongs to the cold path of the
cmp+branch, it won't impact the performance for normal inputs on PowerPC, but it
improves the precision if the input is denormal.
Reviewed By: Spatel
Differential Revision: https://reviews.llvm.org/D80974
Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).
This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.
The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.
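A minimal sketch of the distinction at a use site (illustrative, not code from
the patch):

  #include "llvm/Analysis/MemoryLocation.h"
  using namespace llvm;
  static void buildLocations(const Value *Ptr) {
    // Access starts at Ptr and extends an unknown number of bytes forward.
    MemoryLocation AfterOnly(Ptr, LocationSize::afterPointer());
    // Access may touch bytes both before and after Ptr, anywhere within the
    // underlying object.
    MemoryLocation Anywhere(Ptr, LocationSize::beforeOrAfterPointer());
    (void)AfterOnly;
    (void)Anywhere;
  }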
Differential Revision: https://reviews.llvm.org/D91649
This patch enables passing non-variadic vector-type parameters, as well as vector returns, on the caller and callee side on AIX; these are passed in vector registers only.
So far, support is enabled only for the AIX extended Altivec ABI calling convention.
Reviewed By: sfertile, DiggerLin
Differential Revision: https://reviews.llvm.org/D86476
This strips out a lot of the code that should no longer be needed from
the MVETailPredictionPass, leaving the important part: finding active lane
mask instructions and converting them to VCTP operations.
Differential Revision: https://reviews.llvm.org/D91866
It's more future-proof to use isGFX10Plus from the start, on the
assumption that future architectures will be based on current
architectures.
Also make use of the existing isGFX9Plus in a few places.
Differential Revision: https://reviews.llvm.org/D92092
Start with the assumption that FMA is faster than Fmul+FAdd. If that's not true
on some particular implementation, we can add a tuning parameter in the future.
I've updated the fmuladd test cases and added new test cases for fast-math-flag
based contraction.
Differential Revision: https://reviews.llvm.org/D91987
This is the logically correct thing to do. But it generates worse
code for i32 umin/umax on RV64 due to the type legalizer requesting
zext even though the arguments are sext. Maybe we can teach the type
legalizer to use sext for umin/umax on RISCV.
It's also producing possibly worse code on i64 on RV32 since we
still end up with selects that become branches. But this seems
like something we could improve in type legalization or DAG combine.
Hopefully this makes D92095 work for RISCV with Zbb.
This should cover the basic integer min/max handling; the HVX ops are still TODO.
This is some necessary cleanup work for min/max ops to eventually help us move the add/sub sat patterns into DAGCombine - D91876.
Differential Revision: https://reviews.llvm.org/D92112
Update costs now that D92095 and D92102 have tweaked the SSE2 implementation
The SSE42 BLENDVPD cost can actually be used on SSE41 as we don't attempt to generate PCMPGT anymore
Add scalar i16/i32/i64 costs as we can do this cheaply with CMOV
This adds custom opcodes for FSLW/FSRW so we can type legalize
fshl/fshr without needing to match a sign_extend_inreg.
I've used the operand order from fshl/fshr to make the isel
pattern similar to the non-W form. It was also hard to decide
another order since the register instruction has the shift amount
as the second operand, but the immediate instruction has it as
the third operand.
Differential Revision: https://reviews.llvm.org/D91479
Add .shader_functions to pal metadata, which contains the stack frame
size for all non-entry-point functions.
Differential Revision: https://reviews.llvm.org/D90036
If smax() is legal, this is likely to result in smaller codegen expansion for abs(x) than the xor(add,ashr) method.
This is also what PowerPC has been doing for its abs implementation, so it lets us get rid of a load of custom lowering code there (which was never updated when they added smax lowering).
Alive2: https://alive2.llvm.org/ce/z/xRk3cD
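As a standalone sketch of the two scalar expansions for a 32-bit abs(x)
(illustrative, not code from the patch):

  #include <algorithm>
  #include <cstdint>
  int32_t abs_via_smax(int32_t x) {
    // smax(x, sub(0, x)); negate via unsigned math to avoid signed overflow on INT32_MIN
    int32_t neg = (int32_t)(0u - (uint32_t)x);
    return std::max(x, neg);
  }
  int32_t abs_via_xor_add_ashr(int32_t x) {
    // xor(add(x, ashr(x, 31)), ashr(x, 31))
    int32_t sign = x >> 31;                                 // 0 or -1
    return (int32_t)((uint32_t)x + (uint32_t)sign) ^ sign;
  }

Both return the same (wrapping) result for every input, including INT32_MIN.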
Differential Revision: https://reviews.llvm.org/D92095
This patch adds a target-specific DAG combine for mscatter to promote indices
with element types i8 or i16 before legalisation, plus various tests with illegal types.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D90945