llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	c9eaed5149	[ARM] MVE VMOV.i64 In the original batch of MVE VMOVimm code generation VMOV.i64 was left out due to the way it was done downstream. It turns out that it's fairly simple though. This adds the codegen for it, similar to NEON. Bigendian is technically incorrect in this version, which John is fixing in a Neon patch.	2020-03-30 07:44:23 +01:00
Simon Pilgrim	9c8ec99c80	[X86][AVX] Combine 128/256-bit lane shuffles with zeroable upper subvectors to EXTRACT_SUBVECTOR (PR40720) As explained on PR40720, EXTRACTF128 is always as good/better than VPERM2F128/SHUF128, and we can use the implicit zeroing of the uppers.	2020-03-29 19:51:38 +01:00
Simon Pilgrim	8206c50cde	[X86] Add isAnyZero shuffle mask helper	2020-03-29 19:51:37 +01:00
Matt Arsenault	d15723ef06	AMDGPU/GlobalISel: Remove redundant virtual	2020-03-29 14:03:07 -04:00
Matt Arsenault	ab7a41069e	AMDGPU: Fix using wrong instruction for FP conversion This was was never actually hit, but FTRUNC was clearly not the intent here.	2020-03-29 14:03:07 -04:00
Matt Arsenault	97bbe7ad2a	AMDGPU: Fix typo	2020-03-29 14:03:06 -04:00
Simon Pilgrim	7734e4b3a3	[X86][AVX] Combine 128-bit lane shuffles with a zeroable upper half to EXTRACT_SUBVECTOR (PR40720) As explained on PR40720, EXTRACTF128 is always as good/better than VPERM2F128, and we can use the implicit zeroing of the upper half. I've added some extra tests to vector-shuffle-combining-avx2.ll to make sure we don't lose coverage.	2020-03-29 16:41:59 +01:00
Simon Pilgrim	da4c7db793	[X86] Rename matchShuffleAsByteRotate to matchShuffleAsElementRotate. NFC. This was an inner helper function for the real matchShuffleAsByteRotate function, but it is more generic and is used directly for VALIGN lowering which doesn't work at the byte level.	2020-03-29 16:41:58 +01:00
Simon Pilgrim	10439f9e32	[X86][AVX] Add X86ISD::VALIGN target shuffle decode support Allows us to combine VALIGN instructions with other shuffles - the combiner doesn't create VALIGN yet though.	2020-03-29 16:41:58 +01:00
Simon Pilgrim	a7115d51be	[X86] X86CallFrameOptimization - generalize slow push code path Replace the explicit isAtom() \|\| isSLM() test with the more general (and more specific) slowTwoMemOps() check to avoid the use of the PUSHrmm push from memory case. This is actually very tricky to test in anything but quite complex code, but the atomic-idempotent.ll tests seem to be the most straightforward to use. Differential Revision: https://reviews.llvm.org/D76239	2020-03-29 11:01:59 +01:00
Fangrui Song	fc93787d7e	[MC][PowerPC] Make .reloc support arbitrary relocation types Generalizes `ad7199f3e6` (R_PPC_NONE/R_PPC64_NONE).	2020-03-28 17:04:31 -07:00
Matt Arsenault	9564f46766	AMDGPU: Make use of default operands	2020-03-28 17:33:29 -04:00
Benjamin Kramer	2d24d74b85	[AMDGPU] Stabilize sort order Found by the expensive checks in llvm::sort.	2020-03-28 20:20:14 +01:00
Yonghong Song	ced0d1f42b	[BPF] support 128bit int explicitly in layout spec Currently, bpf does not specify 128bit alignment in its layout spec. So for a structure like struct ipv6_key_t { unsigned pid; unsigned __int128 saddr; unsigned short lport; }; clang will generate IR type %struct.ipv6_key_t = type { i32, [12 x i8], i128, i16, [14 x i8] } Additional padding is to ensure later IR->MIR can generate correct stack layout with target layout spec. But it is common practice for a tracing program to be first compiled with target flag (e.g., x86_64 or aarch64) through clang to generate IR and then go through llc to generate bpf byte code. Tracing program often refers to kernel internal data structures which needs to be compiled with non-bpf target. But such a compilation model may cause a problem on aarch64. The bcc issue https://github.com/iovisor/bcc/issues/2827 reported such a problem. For the above structure, since aarch64 has "i128:128" in its layout string, the generated IR will have %struct.ipv6_key_t = type { i32, i128, i16 } Since bpf does not have "i128:128" in its spec string, the selectionDAG assumes alignment 8 for i128 and computes the stack storage size for the above is 32 bytes, which leads incorrect code later. The x86_64 does not have this issue as it does not have "i128:128" in its layout spec as it does permits i128 to be alignmented at 8 bytes at stack. Its IR type looks like %struct.ipv6_key_t = type { i32, [12 x i8], i128, i16, [14 x i8] } The fix here is add i128 support in layout spec, the same as aarch64. The only downside is we may have less optimal stack allocation in certain cases since we require 16byte alignment for i128 instead of 8. But this is probably fine as i128 is not used widely and in most cases users should already have proper alignment. Differential Revision: https://reviews.llvm.org/D76587	2020-03-28 11:46:29 -07:00
Benjamin Kramer	4065e92195	Upgrade some instances of std::sort to llvm::sort. NFC.	2020-03-28 19:23:29 +01:00
Michael Liao	d2dd0fac48	Fix `-Wsign-compare` warning. NFC.	2020-03-28 10:20:27 -04:00
Kamlesh Kumar	aabc24acf0	[RISCV] Support llvm.thread.pointer Fixes https://bugs.llvm.org/show_bug.cgi?id=45303 (clang crashed on __builtin_thread_pointer) Reviewed By: lenary, MaskRay, luismarques Differential Revision: https://reviews.llvm.org/D76828	2020-03-27 17:30:12 -07:00
Francesco Petrogalli	c66d1f38f6	[llvm][Support] Add isZero method for TypeSize. [NFC] Summary: The method is used where TypeSize is implicitly cast to integer for being checked against 0. Reviewers: sdesmalen, efriedma Reviewed By: sdesmalen, efriedma Subscribers: efriedma, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76748	2020-03-27 21:03:44 +00:00
Matt Arsenault	a8cc9047de	CodeGen: Add -denormal-fp-math-f32 flag Make the set of FP related attributes and command flags closer.	2020-03-27 14:00:39 -07:00
Jay Foad	a6dfd827e5	[AMDGPU] Fix getEUsPerCU for gfx10 in CU mode Summary: "Per CU" is a bit simplistic for gfx10, but I couldn't think of a better name. Reviewers: arsenm, rampitec, nhaehnle, dstuttard, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76861	2020-03-27 20:36:49 +00:00
Fangrui Song	152d14da64	[MC][X86] Make .reloc support arbitrary relocation types Generalizes D62014 (R_386_NONE/R_X86_64_NONE). Unlike ARM (D76746) and AArch64 (D76754), we cannot delete FK_NONE from getFixupKindSize because FK_NONE is still used by R_386_TLS_DESC_CALL/R_X86_64_TLSDESC_CALL.	2020-03-27 13:33:15 -07:00
diggerlin	9c20f09985	[AIX] Address comment https://reviews.llvm.org/D76162#inline-701237 SUMMARY: Address clang format issue: "clang format this block, I don't think the spaces are aligned correctly." Subscribers: wuzish, nemanjai, hiraditya Differential Revision: https://reviews.llvm.org/D76162	2020-03-27 16:21:53 -04:00
Matt Arsenault	348735b723	AMDGPU: Stop setting attributes based on TargetOptions Having arbitrary passes looking at the TargetOptions is pretty messy. This was also disregarding if a function already had an explicit attribute setting on it. opt/llc now add the attributes to functions that don't specify the attribute. clang and lld do not call the function to do this, which they maybe should. This was also treating unsafe-fp-math as implying the others, and setting the other attributes based on it. This is not done anywhere else, and I'm not sure is correct based on the current description of the option bit. Effectively reverts `1d8cf2be89`	2020-03-27 13:13:43 -07:00
Matt Arsenault	0ab5b5b858	Fix denormal-fp-math flag and attribute interaction Make these behave the same way unsafe-fp-math and co. The command line flag should add the attribute to functions that do not already have it, and leave existing attributes. The attribute is the actual implementation, but the flag is useful in some testing situations. AMDGPU has a variety of tests with denormals enabled/disabled that would require a painful level of test duplication without a flag. This doesn't expose setting the separate input/output modes, or add a flag for the f32 version yet. Tests will be included in future patch.	2020-03-27 12:48:58 -07:00
Fangrui Song	34d77516b8	[MC][AArch64] Make .reloc support arbitrary relocation types Depends on D76746. Generalizes D61973. Differential Revision: https://reviews.llvm.org/D76754	2020-03-27 12:30:52 -07:00
Fangrui Song	c389526171	[MC][ARM] Make .reloc support arbitrary relocation types Generalizes D61992. In GNU as, the .reloc directive supports arbitrary relocation types. A MCFixupKind value `V` larger than or equal to FirstLiteralRelocationKind is used to represent the relocation type whose number is V-FirstLiteralRelocationKind. This is useful for linker tests. Without the feature the assembler cannot produce certain relocation records (e.g. R_ARM_ALU_PC_G0/R_ARM_LDR_PC_G0) This helps move forward D75349 and D76575. Differential Revision: https://reviews.llvm.org/D76746	2020-03-27 12:29:49 -07:00
Craig Topper	cdd1cd7120	[X86] Don't form masked instructions if the operation has an additional user. This will cause the operation to be repeated in both a mask and another masked or unmasked form. This can a wasted of execution resources. Differential Revision: https://reviews.llvm.org/D60940	2020-03-27 10:44:22 -07:00
Simon Pilgrim	950ea61653	[X86] Remove orphan LowerSTRICT_FSETCC declaration. NFCI. LowerSETCC handles strict cases as well, we don't have a separate function.	2020-03-27 17:03:19 +00:00
Guillaume Chatelet	74eac9031a	[Alignment][NFC] MachineMemOperand::getAlign/getBaseAlign Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dschuff, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, jrtc27, atanasyan, jfb, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76925	2020-03-27 15:49:13 +00:00
Sam Parker	d7084fa34a	[ARM][LowOverheadLoops] DoubleWidthResult instructions canGenerateZeros Given that some instructions generate wider result elements than their inputs, flag them as being able to generate non zeros in the false lanes. Differential Revision: https://reviews.llvm.org/D76766	2020-03-27 15:26:13 +00:00
Sam Parker	0e6aa08381	[ARM][MVE] Add DoubleWidthResult flag Add a flag for those instructions which read from the top/bottom halves of their inputs and produce a vector of results with double width elements. Differential Revision: https://reviews.llvm.org/D76762	2020-03-27 13:44:04 +00:00
Guillaume Chatelet	e2ef6127d9	[Alignment] Fix overaligning bug Summary: This was discovered while converting to Align type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76914	2020-03-27 12:57:50 +00:00
Simon Pilgrim	d6ddabd7ef	Revert rG6ff1ea3244c543ad24fc99c7f4979db2f2078593 "Fix "use of uninitialized variable" static analyzer warning. NFCI." @dblaikie noticed that this may interfere with msan analysis	2020-03-27 11:44:03 +00:00
Jonas Paulsson	35173dddd1	[SystemZ] Fix typos in comments.	2020-03-27 12:31:48 +01:00
David Green	8689f98e9b	[ARM] Fix MVE VCMPr f16 pattern This patterns seemed to be using the f32 instruction, not f16. Fix it to use the correct one. Differential Revision: https://reviews.llvm.org/D76841	2020-03-27 11:18:24 +00:00
Fangrui Song	6728a9ae19	[MCInstPrinter] Add parameter `Address` to printCustomAliasOperand. NFC Follow-up of D72172 and llvmorg-11-init-6896-gb3cc5dcef0f.	2020-03-27 00:38:20 -07:00
Fangrui Song	b3cc5dcef0	[MCInstPrinter] Add parameter `Address` to MCInstPrinter::printAliasInstr. NFC Follow-up of D72172.	2020-03-27 00:03:32 -07:00
Shengchen Kan	1fb4f99a21	[X86][MC] Fix the bug for prefix padding support Summary: There is a tiny logic error of D75300, making branch is not correctly aligned with option -x86-pad-max-prefix-size Reviewers: reames, MaskRay, craig.topper, LuoYuanke, jyknight Reviewed By: reames Subscribers: hiraditya, llvm-commits, annita.zhang Tags: #llvm Differential Revision: https://reviews.llvm.org/D76285	2020-03-27 14:16:09 +08:00
Dan Gohman	d865437d9c	[WebAssembly] Fix the order of destructors in the LowerGlobalDtors pass. Fix the LowerGlobalDtors pass to run destructors in the same order as the regular LLVM destructor lowering -- in reverse order. Adjacent destructors with the same associated object are grouped, but destructors are not reordered based on associated objects. Differential Revision: https://reviews.llvm.org/D70685	2020-03-26 16:19:02 -07:00
Stanislav Mekhanoshin	4c4b71843b	[AMDGPU] Propagate amdgpu-waves-per-eu to callees Differential Revision: https://reviews.llvm.org/D76868	2020-03-26 14:43:44 -07:00
Craig Topper	9f7d4150b9	[X86] Move combineLoopMAddPattern and combineLoopSADPattern to an IR pass before SelecitonDAG. These transforms rely on a vector reduction flag on the SDNode set by SelectionDAGBuilder. This flag exists because SelectionDAG can't see across basic blocks so SelectionDAGBuilder is looking across and saving the info. X86 is the only target that uses this flag currently. By removing the X86 code we can remove the flag and the SelectionDAGBuilder code. This pass adds a dedicated IR pass for X86 that looks across the blocks and transforms the IR into a form that the X86 SelectionDAG can finish. An advantage of this new approach is that we can enhance it to shrink the phi nodes and final reduction tree based on the zeroes that we need to concatenate to bring the partially reduced reduction back up to the original width. Differential Revision: https://reviews.llvm.org/D76649	2020-03-26 14:10:20 -07:00
Simon Pilgrim	ad36491ebb	[X86] Prefer PACKUS(AND(),AND()) to SHUFFLE(PSHUFB(),PSHUFB()) on all targets Extends rG9d1721ce3926 to support AVX2+ targets.	2020-03-26 20:46:24 +00:00
Jay Foad	0fe096c4e9	[AMDGPU] Rename overloaded getMaxWavesPerEU to getWavesPerEUForWorkGroup Summary: I think Max in the name was misleading. NFC. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76860	2020-03-26 20:21:04 +00:00
Jay Foad	bb9c4fd7ea	[AMDGPU] Remove getMaxWavesPerCU in favour of getWavesPerWorkGroup. Summary: These methods were identical. I chose to remove getMaxWavesPerCU because I think Max in the name was misleading. NFC. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76859	2020-03-26 20:21:04 +00:00
Derek Schuff	e110897e28	[WEbAssembly] Clear frame base vreg in explicit-locals when stack pointer is dead Having an alloca in a function causes the stack pointer to be generated in the prolog, but if it's unused other than for debug info, explicit-locals will drop it and not allocate a local. In this case we need to reset the FrameBaseVreg. Differential Revision: https://reviews.llvm.org/D76784	2020-03-26 13:07:32 -07:00
Simon Pilgrim	39a52a19ed	[X86] lowerV16I8Shuffle - create v8i16 mask for PACKUS(AND(),AND()) patterns. We can improve computeKnownBits results by avoiding excess bitcasts. For this pattern we were doing: (v16i8 PACKUS(v8i16 BITCAST(v16i8 AND(V1, MASK)), v8i16 BITCAST(v16i8 AND(V2, MASK)))) By performing the MASK/AND with a v8i16 type and bitcasting V1/V2 directly we can help computeKnownBits see that the mask is clearing the upper bits and allows shuffle combining to peek through later on. This will be necessary to extend rG9d1721ce3926 to AVX2+ targets in a future patch.	2020-03-26 19:59:57 +00:00
diggerlin	fdfe411e7c	[AIX] discard the label in the csect of function description and use qualname for linkage SUMMARY: SUMMARY for a source file "test.c" void foo() {}; llc will generate assembly code as (assembly patch) .globl foo .globl .foo .csect foo[DS] foo: .long .foo .long TOC[TC0] .long 0 and symbol table as (xcoff object file) [4] m 0x00000004 .data 1 unamex foo [5] a4 0x0000000c 0 0 SD DS 0 0 [6] m 0x00000004 .data 1 extern foo [7] a4 0x00000004 0 0 LD DS 0 0 After first patch, the assembly will be as .globl foo[DS] # -- Begin function foo .globl .foo .align 2 .csect foo[DS] .long .foo .long TOC[TC0] .long 0 and symbol table will as [6] m 0x00000004 .data 1 extern foo [7] a4 0x00000004 0 0 DS DS 0 0 Change the code for the assembly path and xcoff objectfile patch for llc. Reviewers: Jason Liu Subscribers: wuzish, nemanjai, hiraditya Differential Revision: https://reviews.llvm.org/D76162	2020-03-26 15:46:52 -04:00
Scott Linder	bd12ecb88f	[AMDGPU] Fix PC register mapping in wave32 mode Summary: The PC_32 DWARF register is for a 32-bit process address space which we don't implement in AMDGCN; another way of putting this is that the size of the PC register is not a function of the wavefront size. If we ever implement a 32-bit process address space we will need to add two more DwarfFlavours i.e. we will need to represent the product of (wave32, wave64) x (64-bit address space, 32-bit address space). Tags: #llvm Differential Revision: https://reviews.llvm.org/D76732	2020-03-26 14:43:25 -04:00
David Blaikie	9002db05a2	Roll otherwise unused subexpressions into an assertion	2020-03-26 11:32:33 -07:00
Guillaume Chatelet	b727aabcb8	[Alignment][NFC] Use llvmTargetFrameLowering::getStackAlign Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Reviewed By: courbet Subscribers: wuzish, arsenm, jyknight, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, fedor.sergeev, jrtc27, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76613	2020-03-26 18:15:53 +00:00
Jay Foad	0602c20b1b	[AMDGPU] Make use of divideCeil. NFC.	2020-03-26 16:11:35 +00:00
Jay Foad	596bed3fd3	[AMDGPU] Remove unused methods. NFC.	2020-03-26 16:11:35 +00:00
Justin Hibbits	459e8e9488	[PowerPC]: Don't allow r0 as a target for LD_GOT_TPREL_L/32 Summary: The linker is free to relax this (relocation R_PPC_GOT_TPREL16) against R_PPC_TLS, if it sees fit (initial exec to local exec). If r0 is used, this can generate execution-invalid code (converts to 'addi %rX, %r0, FOO, which translates in PPC-lingo to li %rX, FOO). Forbid this instead. This fixes static binaries using locales on FreeBSD/powerpc (tested on FreeBSD/powerpcspe). Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D76662	2020-03-26 10:59:28 -05:00
Simon Pilgrim	9d1721ce39	[X86][SSE] Prefer PACKUS(AND(),AND()) to SHUFFLE(PSHUFB(),PSHUFB()) on pre-AVX2 targets As discussed on PR31443, we should be trying to use PACKUS for binary truncation patterns to reduce the number of shuffles. The plan is to support AVX2+ targets once we've worked around PR45315 - we fail to peek through a VBROADCAST_LOAD mask to recognise zero upper bits in a PACKUS pattern. We should also be able to add support for v8i16 and possibly 256/512-bit vectors as well.	2020-03-26 15:47:43 +00:00
Fangrui Song	3eef47407b	[PPCInstPrinter] Change printBranchOperand(calltarget) to print the target address in hexadecimal form ``` // llvm-objdump -d output (before) 0: bl .-4 4: bl .+0 8: bl .+4 // llvm-objdump -d output (after) ; GNU objdump -d 0: bl 0xfffffffc / bl 0xfffffffffffffffc 4: bl 0x4 8: bl 0xc ``` Many Operand's are not annotated as OPERAND_PCREL. They are not affected (e.g. `b .+67108860`). I plan to fix them in future patches. Modified test/tools/llvm-objdump/ELF/PowerPC/branch-offset.s to test address space wraparound for powerpc32 and powerpc64. Reviewed By: sfertile, jhenderson Differential Revision: https://reviews.llvm.org/D76591	2020-03-26 08:32:29 -07:00
Fangrui Song	87de9a0786	[X86InstPrinter] Change printPCRelImm to print the target address in hexadecimal form ``` // llvm-objdump -d output (before) 400000: e8 0b 00 00 00 callq 11 400005: e8 0b 00 00 00 callq 11 // llvm-objdump -d output (after) 400000: e8 0b 00 00 00 callq 0x400010 400005: e8 0b 00 00 00 callq 0x400015 // GNU objdump -d. The lack of 0x is not ideal because the result cannot be re-assembled 400000: e8 0b 00 00 00 callq 400010 400005: e8 0b 00 00 00 callq 400015 ``` In llvm-objdump, we pass the address of the next MCInst. Ideally we should just thread the address of the current address, unfortunately we cannot call X86MCCodeEmitter::encodeInstruction (X86MCCodeEmitter requires MCInstrInfo and MCContext) to get the length of the MCInst. MCInstPrinter::printInst has other callers (e.g llvm-mc -filetype=asm, llvm-mca) which set Address to 0. They leave MCInstPrinter::PrintBranchImmAsAddress as false and this change is a no-op for them. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D76580	2020-03-26 08:28:59 -07:00
Fangrui Song	5fad05e80d	[MCInstPrinter] Pass `Address` parameter to MCOI::OPERAND_PCREL typed operands. NFC Follow-up of D72172 and D72180 This patch passes `uint64_t Address` to print methods of PC-relative operands so that subsequent target specific patches can change `*InstPrinter::print{Operand,PCRelImm,...}` to customize the output. Add MCInstPrinter::PrintBranchImmAsAddress which is set to true by llvm-objdump. ``` // Current llvm-objdump -d output aarch64: 20000: bl #0 ppc: 20000: bl .+4 x86: 20000: callq 0 // Ideal output aarch64: 20000: bl 0x20000 ppc: 20000: bl 0x20004 x86: 20000: callq 0x20005 // GNU objdump -d. The lack of 0x is not ideal because the result cannot be re-assembled aarch64: 20000: bl 20000 ppc: 20000: bl 0x20004 x86: 20000: callq 20005 ``` In `lib/Target/X86/X86GenAsmWriter1.inc` (generated by `llvm-tblgen -gen-asm-writer`): ``` case 12: // CALL64pcrel32, CALLpcrel16, CALLpcrel32, EH_SjLj_Setup, JCXZ, JECXZ, J... - printPCRelImm(MI, 0, O); + printPCRelImm(MI, Address, 0, O); return; ``` Some targets have 2 `printOperand` overloads, one without `Address` and one with `Address`. They should annotate derived `Operand` properly with `let OperandType = "OPERAND_PCREL"`. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D76574	2020-03-26 08:21:15 -07:00
Simon Pilgrim	e30d29ebc1	[X86][SSE] getFauxShuffleMask - peek through TRUNCATE/AEXT/ZEXT for INSERT_VECTOR_ELT(EXTRACT_VECTOR_ELT()) As long we extract from a source vector with smaller elements and we zero-extend the element in the final shuffle mask then we can safely peek through truncations and any/zero-extensions to find the source extraction.	2020-03-26 11:57:45 +00:00
Jonas Paulsson	8bf9e317e4	[SystemZ] Bugfix in tieOpsIfNeeded() This function did a check which was broken to see if an opcode requires op0 and op1 to be tied. By chance this is NFC. Review: Ulrich Weigand	2020-03-26 12:22:14 +01:00
Kang Zhang	4673699a47	[PowerPC] Remove the repeated definition for some InstAlias for mtspr/mfspr Summary: Below InstAlias have been redefined, this patch is to remove the repeated definition. mtdec/mfdec mtsdr1/mfsdr1 mtsrr0/mfsrr0 mtsrr1/mfsrr1 mtasr Reviewed By: nemanjai, steven.zhang Differential Revision: https://reviews.llvm.org/D75821	2020-03-26 09:58:30 +00:00
Cullen Rhodes	9086db707d	[AArch64][SVE] Implement structured store intrinsics Summary: This patch adds initial support for the following intrinsics: * llvm.aarch64.sve.st2 * llvm.aarch64.sve.st3 * llvm.aarch64.sve.st4 For storing two, three and four vectors worth of data. Basic codegen for reg+immediate forms are implemented. Reg+reg addressing modes will be addressed in a later patch. These intrinsics are intended for use in the Arm C Language Extension (ACLE). Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D75947	2020-03-26 09:34:51 +00:00
Ties Stuij	71ae267d1f	[PATCH] [ARM] ARMv8.6-a command-line + BFloat16 Asm Support Summary: This patch introduces command-line support for the Armv8.6-a architecture and assembly support for BFloat16. Details can be found https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a in addition to the GCC patch for the 8..6-a CLI: https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg02647.html In detail this patch - march options for armv8.6-a - BFloat16 assembly This is part of a patch series, starting with command-line and Bfloat16 assembly support. The subsequent patches will upstream intrinsics support for BFloat16, followed by Matrix Multiplication and the remaining Virtualization features of the armv8.6-a architecture. Based on work by: - labrinea - MarkMurrayARM - Luke Cheeseman - Javed Asbar - Mikhail Maltsev - Luke Geeson Reviewers: SjoerdMeijer, craig.topper, rjmccall, jfb, LukeGeeson Reviewed By: SjoerdMeijer Subscribers: stuij, kristof.beyls, hiraditya, dexonsmith, danielkiss, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D76062	2020-03-26 09:17:20 +00:00
David Green	37b9cc8f29	[ARM] Sink splats to vector float instructions Some MVE floating point instructions have gpr register variants that take the scalar gpr value and splat them to all lanes. In order to accept them in loops, the shuffle_vector and insert need to be sunk down into the loop, next to the instruction so that ISel can see the whole pattern. This does that sinking for FAdd, FSub, FMul and FCmp. The patterns for mul are slightly more constrained as there are no fms variants taking register arguments. Differential Revision: https://reviews.llvm.org/D76023	2020-03-26 09:02:18 +00:00
QingShan Zhang	1ef7bf4121	[PowerPC] Improve the way legalize mul for v8i16 and add pattern to match mul + add We can legalize the operation MUL for v8i16 with instruction (vmladduhm A, B, 0) if altivec enabled. Now, it is set as custom and expand it later, which is not the right way. And then, we can add the pattern to match the mul + add with (vmladduhm A, B, C) Reviewed By: Nemanjai Differential Revision: https://reviews.llvm.org/D76751	2020-03-26 04:46:49 +00:00
Stanislav Mekhanoshin	e06d707aa2	[AMDGPU] Fixed function traversal in attribute propagation AMDGPUPropagateAttributes pass was skipping some of the functions when cloning. Functions were added to root set and then skipped on the next interation because they are already in the root set, while were meant to be processed with different features. Differential Revision: https://reviews.llvm.org/D76815	2020-03-25 18:47:09 -07:00
Stanislav Mekhanoshin	6e00e3fcb0	[AMDGPU] Preserve original symbol during attribute propagation AMDGPUPropagateAttributes can swap names while cloning a function. Only do it if original symbol was not externally visible. Differential Revision: https://reviews.llvm.org/D76789	2020-03-25 15:26:30 -07:00
Simon Pilgrim	c6e5531f9b	[X86][AVX] Combine shuffles to TRUNCATE/VTRUNC patterns Add support for combining shuffles to AVX512 truncate instructions - another step toward fixing D56387/D66004. It also fixes SKX code on PR31443. We could probably extend this further to handle non-VLX truncation cases.	2020-03-25 17:41:51 +00:00
Mikhail Maltsev	bb4da94e5b	[ARM,CDE] Implement predicated Q-register CDE intrinsics Summary: This patch implements the following CDE intrinsics: T __arm_vcx1q_m(int coproc, T inactive, uint32_t imm, mve_pred_t p); T __arm_vcx2q_m(int coproc, T inactive, U n, uint32_t imm, mve_pred_t p); T __arm_vcx3q_m(int coproc, T inactive, U n, V m, uint32_t imm, mve_pred_t p); T __arm_vcx1qa_m(int coproc, T acc, uint32_t imm, mve_pred_t p); T __arm_vcx2qa_m(int coproc, T acc, U n, uint32_t imm, mve_pred_t p); T __arm_vcx3qa_m(int coproc, T acc, U n, V m, uint32_t imm, mve_pred_t p); The intrinsics are not part of the released ACLE spec, but internally at Arm we have reached consensus to add them to the next ACLE release. Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen Reviewed By: simon_tatham Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76610	2020-03-25 17:08:19 +00:00
Yvan Roux	bd069ad39c	[ARM] Move ConstantIsland and LowOverheadLoops Passes. Move ARM ConstantIsland and LowOverheadLopps passes later in the pipeline such that they will be run after the upcoming Machine Outlining pass. Differential Revision: https://reviews.llvm.org/D76065	2020-03-25 16:49:21 +01:00
cdevadas	ce984129ea	[AMDGPU] Add SIPreEmitPeephole pass. This pass can handle all the optimization opportunities found just before code emission. Presently it includes the handling of vcc branch optimization that was handled earlier in SIInsertSkips. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D76712	2020-03-25 15:35:35 +00:00
Jonas Paulsson	f09b891d4a	[SystemZ] Improve foldMemoryOperandImpl() A spilled load of an immediate can use MVHI/MVGHI instead. A compare of a spilled register against an immediate can use CHSI/CGHSI. A logical compare can use CLFHSI/CLGHSI. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D76055	2020-03-25 16:21:08 +01:00
Sean Fertile	3282d875d6	[PowerPC][AIX] ByVal formal arguments in a single register. Adds support for passing ByVal formal arguments as long as they fit in a single register. Differential Revision: https://reviews.llvm.org/D76401	2020-03-25 11:09:40 -04:00
Kerry McLaughlin	05606329e2	[AArch64][SVE] Add SVE intrinsics for masked loads & stores Summary: Implements the following intrinsics for contiguous loads & stores: - @llvm.aarch64.sve.ld1 - @llvm.aarch64.sve.st1 Reviewers: sdesmalen, andwar, efriedma, cameron.mcinally, dancgr, rengolin Reviewed By: cameron.mcinally Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76688	2020-03-25 11:48:40 +00:00
Sam Parker	e87250202d	[ARM][MVE] Add HorizontalReduction flag Add a target flag for instructions that reduce into one, or more, scalar reg(s), including variants of: - VADDV - VABAV - VMINV/VMAXV - VMLADAV Differential Revision: https://reviews.llvm.org/D76683	2020-03-25 11:12:03 +00:00
Kazushi (Jam) Marukawa	28a42dd1b9	[VE] Change name of enum to CondCode Summary: Change enum name for condition codes from CondCodes to CondCode. Reviewers: arsenm, simoll, k-ishizaka Reviewed By: arsenm Subscribers: wdng, hiraditya, llvm-commits Tags: #llvm, #ve Differential Revision: https://reviews.llvm.org/D76747	2020-03-25 09:20:05 +01:00
Amara Emerson	472d282046	[AArch64][GlobalISel] Don't localize TLS G_GLOBAL_VALUEs on Darwin. On Darwin these need to be selected into a function call for the TLS address lookup. As a result, they can't be moved below a physreg write, which happens in call sequences. In the long term, we should have some mechanism in the localizer to prevent localizing into target-specific atomic instruction sequences. rdar://60056248 Differential Revision: https://reviews.llvm.org/D76652	2020-03-24 13:35:50 -07:00
Reid Kleckner	597718aae0	Re-land "Avoid emitting unreachable SP adjustments after `throw`" This reverts commit `4e0fe038f4`. Re-lands `65b21282c7`. After landing `5ff5ddd0ad` to add int3 into trailing unreachable blocks, we can now remove these extra stack adjustments without confusing the Win64 unwinder. See https://llvm.org/45064#c4 or X86AvoidTrailingCall.cpp for a full explanation. Fixes PR45064.	2020-03-24 12:04:43 -07:00
Matt Arsenault	26ebc51a34	AMDGPU/GlobalISel: Fix smrd loads of v4i64	2020-03-24 13:44:41 -04:00
David Green	f8c79b94af	[ARM] Fold VMOVrh VLDR to LDRH This adds a simple fold to combine VMOVrh load to a integer load. Similar to what is already performed for BITCAST, but needs to account for the types being of different sizes, creating an zero extending load. Differential Revision: https://reviews.llvm.org/D76485	2020-03-24 15:51:03 +00:00
Simon Pilgrim	714402147d	[X86][SSE1] Add support for logic+movmsk patterns (PR42870) rL368506 handled the basic case, but we need to account for boolean logic patterns as well.	2020-03-24 14:28:40 +00:00
David Green	1232cfa385	[ARM] Don't split trunc stores that can be better handled as VMOVN We deliberately split stores of the form store(truncate(larger-than-legal-type)) into two stores, allowing each store to perform part of the truncate for free. There are times however where it makes more sense to use VMOVN to de-interlace the results back into a single vector, and store that in one go. This adds a check for that situation, not splitting the store if it looks like a VMOVN can be more useful. Differential Revision: https://reviews.llvm.org/D76511	2020-03-24 08:48:52 +00:00
Sam Parker	94cacebcca	[ARM][LowOverheadLoops] Add checks for narrowing Modify ValidateLiveOuts to track 'FalseLaneZeros' more precisely, including checks on specific operations that can generate non-zeros from zero values, e.g VMVN. We can then check that any instructions that retain some information in their output register (all narrowing instructions) that they only use and def registers that always have zeros in their falsely predicated bytes, whether or not tail predication happens. Most of the logic remains the same, just the names of the data structures and helpers have been renamed to reflect the change in logic. The key change, apart from the opcode checkers, is that the FalseZeros set now strictly contains only instructions which will always generate zeros, and not instructions that could also have their false bytes masked away later. Differential Revision: https://reviews.llvm.org/D76235	2020-03-24 08:41:48 +00:00
Sam Parker	6f86e6bf40	[ARM][MVE] Add target flag for narrowing insts Add a flag, 'RetainsPreviousHalfElement', for operations that operate on top/bottom halves of their input and only write to half of their destination, leaving the other half to retain its previous value. Differential Revision: https://reviews.llvm.org/D76608	2020-03-24 08:36:44 +00:00
Chen Zheng	9d07d91fb6	[PowerPC] fix a typo in commit `3f85134d71` Implement target hook isProfitableToHoist - typo fix.	2020-03-24 01:56:15 -04:00
Nemanja Ivanovic	bfa9ce1cb2	[PowerPC] Improve handling of some BUILD_VECTOR nodes An analysis of real world code turned up a number of patterns with BUILD_VECTOR of nodes resulting from operations on extracted vector elements for which we produce poor code. This addresses those cases. No attempt is made for completeness as that would entail a large amount of work for something that there is no evidence of in real code. Differential revision: https://reviews.llvm.org/D72660	2020-03-23 17:34:29 -05:00
Justin Hibbits	f0990e104b	[PowerPC]: e500 target can't use lwsync, use msync instead The e500 core has a silicon bug that triggers an illegal instruction program trap on any sync other than msync. Other cores will typically ignore illegal sync types, and the documentation even implies that the 'illegal' bits are ignored. Address this hardware deficiency by only using msync, like the PPC440. Differential Revision: https://reviews.llvm.org/D76614	2020-03-23 17:15:27 -05:00
Matt Arsenault	66073953a5	AMDGPU: Allow vectorization of round intrinsic There seems to be a small benefit to the legalized sequence for v2f16 round with packed instructions, so allow vectorizing it by reducing the cost. An unintended side effect is vectorization of f32 round also happens. The current FMA logic seems off to me, and isn't checking for packed instructions.	2020-03-23 17:00:41 -04:00
Matt Arsenault	2ad5fc1d91	AMDGPU/GlobalISel: Implement computeNumSignBitsForTargetInstr	2020-03-23 15:02:30 -04:00
Reid Kleckner	5ff5ddd0ad	[Win64] Insert int3 into trailing empty BBs Otherwise, the Win64 unwinder considers direct branches to such empty trailing BBs to be a branch out of the function. It treats such a branch as a tail call, which can only be part of an epilogue. If the unwinder misclassifies such a branch as part of the epilogue, it will fail to unwind the stack further. This can lead to bad stack traces, or failure to handle exceptions properly. This is described in https://llvm.org/PR45064#c4, and by the comment at the top of the X86AvoidTrailingCallPass.cpp file. It should be safe to insert int3 for such blocks. An empty trailing BB that reaches this pass is pretty much guaranteed to be unreachable. If a program executed such a block, it would fall off the end of the function. Most of the complexity in this patch comes from threading through the "EHFuncletEntry" boolean on the MIRParser and registering the pass so we can stop and start codegen around it. I used an MIR test because we should teach LLVM to optimize away these branches as a follow-up. Reviewed By: hans Differential Revision: https://reviews.llvm.org/D76531	2020-03-23 08:50:37 -07:00
Ram Nalamothu	24698e526f	Implement wave32 DWARF register mapping Implement the DWARF register mapping described in llvm/docs/AMDGPUUsage.rst. This enables generating appropriate DWARF register numbers for wave64 and wave32 modes.	2020-03-23 10:24:16 -04:00
Sanjay Patel	0eeee83d75	[VectorUtils] move x86's scaleShuffleMask to generic VectorUtils We have some long-standing missing shuffle optimizations that could use this transform via VectorCombine now: https://bugs.llvm.org/show_bug.cgi?id=35454 (and we still don't get that case in the backend either) This function is apparently templated because there's existing code in IR that treats mask values as unsigned and backend code that treats masks values as signed. The mask values are not endian-dependent (as shown by the existing bitcast transform from DAGCombiner). Differential Revision: https://reviews.llvm.org/D76508	2020-03-23 09:58:55 -04:00
Jonas Paulsson	9adc7fc3cd	[SystemZ] Perform instruction shortening for fused fp ops. Replace single-lane (W... form) vector "multiply and add" and "multiply and subtract" instructions with equivalent floating point instructions whenever possible in SystemZShortenInst. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D76370	2020-03-23 14:12:13 +01:00
Guillaume Chatelet	3ba550a05a	[Alignment][NFC] Use TFL::getStackAlign() Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: dylanmckay, sdardis, nemanjai, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76551	2020-03-23 13:48:29 +01:00
Guillaume Chatelet	ea64ee0edb	[Alignment][NFC] Deprecate ensureMaxAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76368	2020-03-23 11:31:33 +01:00
Craig Topper	e2cb121374	[X86] Remove maximum vector length limit from combineBasicSADPattern. createPSADBW uses SplitsOpsAndApply so should be able to handle any size. Restrict the extract result type to i32 or i64 since that's what we have coverage for today and probably matches what the isSimple() check gave us before. Differential Revision: https://reviews.llvm.org/D76560	2020-03-22 15:02:05 -07:00
Craig Topper	f4c67dfa92	[X86] More accurately model the cost of horizontal reductions. This patch attempts to more accurately model the reduction of power of 2 vectors of types we natively support. This takes into account the narrowing of vectors that occur as we go from 512 bits to 256 bits, to 128 bits. It also takes into account the use of wider elements in the shuffles for the first 2 steps of a reduction from 128 bits. And uses a v8i16 shift for the final step of vXi8 reduction. The default implementation uses the legalized type for the arithmetic for all levels. And uses the single source permute cost of the legalized type for all levels. This penalizes things like lack of v16i8 pshufb on pre-sse3 targets and the splitting and joining that needs to be done for integer types on AVX1. We never need v16i8 shuffle for a reduction and we only need split AVX1 ops when type the type wide and needs to be split. I think we're still over costing splits and joins for AVX1, but we're closer now. I've also removed all pairwise special casing because I don't think we ever want to generate that on X86. I've also adjusted the add handling to more accurately account for any type splitting that occurs before we reach a legal type. Differential Revision: https://reviews.llvm.org/D76478	2020-03-22 14:20:15 -07:00
Simon Atanasyan	2dc4eb08cd	[mips] Implement .cpadd directive This directive inserts code to add $gp to the argument's register when support for position independent code is enabled. For example, this code: .cpadd $4 expands to: addu $4, $4, $gp	2020-03-22 23:34:32 +03:00
Simon Atanasyan	9bbddfbeaa	[mips] Implement sne pseudo instruction The `sne Dst, Src1, Src2/Imm` pseudo instruction sets register `Dst` to 1 if register `Src1` is not equal to `Src2/Imm` and to 0 otherwise.	2020-03-22 23:34:31 +03:00
Simon Atanasyan	dca9e40c0c	[mips] Implement sle/sleu pseudo instructions The `sle/sleu Dst, Src1, Src2/Imm` pseudo instructions set register `Dst` to 1 if register `Src1` is less than or equal `Src2/Imm` and to 0 otherwise.	2020-03-22 23:34:31 +03:00
Simon Atanasyan	862f120fdb	[mips] Remove instructions related to "wired paired single" from the P5600 model.	2020-03-22 23:34:31 +03:00
Simon Atanasyan	ecc92fd018	[mips] Add HasMips3D to the list of features unsupported by P5600 model.	2020-03-22 23:34:31 +03:00
Simon Atanasyan	0f15ace018	[mips] Rename target feature Mips3D => HasMips3D. NFC	2020-03-22 23:34:31 +03:00
Craig Topper	b89ae50795	[X86] Remove maximum vector width restriction from combineLoopSADPattern. SplitsOpsAndApply will take care of any needed splitting correctly. All that we need to check is that the vector element count is a power of 2. Differential Revision: https://reviews.llvm.org/D76558	2020-03-22 11:09:14 -07:00
Matt Arsenault	830cfda19f	Utils: Mostly convert memcpy expansion to use Align The TTI hooks aren't converted. I also think the intrinsics should have mandatory alignment and never return MaybeAlign.	2020-03-22 11:21:44 -04:00
Fangrui Song	140d6245af	Delete TargetLoweringObjectFile::Ctx We can use the parent MCObjectFileInfo::Ctx which has the same value.	2020-03-21 22:36:29 -07:00
Fangrui Song	b445643632	[X86] Delete unneeded X86ELFTargetObjectFile::Initialize. NFC	2020-03-21 22:03:42 -07:00
Simon Pilgrim	25eb9056d7	[X86] getTargetShuffleAndZeroables - add insert_subvector(undef, sub, c) handling. We often widen xmm/ymm vectors to ymm/zmm by insertion into an undef base vector. By letting getTargetShuffleAndZeroables track the undef elts we can help avoid a lot of unnecessary cross-lane shuffles. Fixes PR44694	2020-03-21 19:11:42 +00:00
Simon Pilgrim	4ceade0428	[X86] Combine concat(shufps,shufps) -> shufps(concat,concat) Now that rG18c19441d105 has improved VPERM2X128 handling, we can perform this to improve x64->x32 truncation without poor cross-lane issues. Someday combineX86ShufflesRecursively will handle this, but we're still really bad at dealing with different vector widths.	2020-03-21 12:44:10 +00:00
Simon Pilgrim	f424d51c3e	Revert rGe6a7e3b5e3e7 "[X86][SSE] matchShuffleWithSHUFPD - add support for unary shuffles." This reverts commit `e6a7e3b5e3`. Avoids register pressure regression reported at PR45263	2020-03-21 12:14:19 +00:00
Simon Pilgrim	ff3aae6908	Fix Wdocumentation warning. NFCI.	2020-03-21 11:21:57 +00:00
Fangrui Song	85c30f3374	[X86] Reland D71360 Clean up UseInitArray initialization for X86ELFTargetObjectFile -fuse-init-array is now the CC1 default but TargetLoweringObjectFileELF::UseInitArray still defaults to false. The following two unknown OS target triples continue using .ctors/.dtors because InitializeELF is not called. clang -target i386 -c a.c clang -target x86_64 -c a.c This cleanup fixes this as a bonus. X86SpeculativeLoadHardeningPass::tracePredStateThroughCall can call MCContext::createTempSymbol before TargetLoweringObjectFileELF::Initialize(). We need to call TargetLoweringObjectFileELF::Initialize() ealier. test/CodeGen/X86/speculative-load-hardening-indirect.ll Differential Revision: https://reviews.llvm.org/D71360	2020-03-20 21:57:34 -07:00
Eric Christopher	fc7233d774	Temporarily Revert "[X86] Reland D71360 Clean up UseInitArray initialization for X86ELFTargetObjectFile" as it's causing msan failures. This reverts commit `7899fe9da8`.	2020-03-20 17:36:12 -07:00
Fangrui Song	df4cc35efd	[VE] Fix -Wunused-private-field after D72598 and -Wdeprecated-declarations after D76348	2020-03-20 15:06:58 -07:00
Fangrui Song	7899fe9da8	[X86] Reland D71360 Clean up UseInitArray initialization for X86ELFTargetObjectFile UseInitArray is now the CC1 default but TargetLoweringObjectFileELF::UseInitArray still defaults to false. The following two unknown OS target triples continue using .ctors/.dtors because InitializeELF is not called. clang -target i386 -c a.c clang -target x86_64 -c a.c This cleanup fixes this as a bonus. Differential Revision: https://reviews.llvm.org/D71360	2020-03-20 11:18:36 -07:00
Craig Topper	32fbea1548	[X86] Prevent (bitcast (broadcast_load)) combine from producing vXf16 broadcast instructions. The combine tries to put the broadcast in either the integer or fp domain to match the bitcast domain. But we can only do this if the broadcast size is 32 or larger.	2020-03-20 09:15:07 -07:00
Simon Tatham	1adfa4c991	[ARM,MVE] Add ACLE intrinsics for the vaddv/vaddlv family. Summary: I've implemented them as target-specific IR intrinsics rather than using `@llvm.experimental.vector.reduce.add`, on the grounds that the 'experimental' intrinsic doesn't currently have much code generation benefit, and my replacements encapsulate the sign- or zero-extension so that you don't expose the illegal MVE vector type (`<4 x i64>`) in IR. The machine instructions come in two versions: with and without an input accumulator. My new IR intrinsics, like the 'experimental' one, don't take an accumulator parameter: we represent that by just adding on the input value using an ordinary i32 or i64 add. So if you write the `vaddvaq` C-language intrinsic with an input accumulator of zero, it can be optimised to VADDV, and conversely, if you write something like `x += vaddvq(y)` then that can be combined into VADDVA. Most of this is achieved in isel lowering, by converting these IR intrinsics into the existing `ARMISD::VADDV` family of custom SDNode types. For the difficult case (64-bit accumulators), isel lowering already implements the optimization of folding an addition into a VADDLV to make a VADDLVA; so once we've made a VADDLV, our job is already done, except that I had to introduce a parallel set of ARMISD nodes for the //predicated// forms of VADDLV. For the simpler VADDV, we handle the predicated form by just leaving the IR intrinsic alone and matching it in an ordinary dag pattern. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76491	2020-03-20 15:42:33 +00:00
Simon Tatham	45a9945b9e	[ARM,MVE] Add ACLE intrinsics for the vminv/vmaxv family. Summary: I've implemented these as target-specific IR intrinsics, because they're not //quite// enough like @llvm.experimental.vector.reduce.min (which doesn't take the extra scalar parameter). Also this keeps the predicated and unpredicated versions looking similar, and the floating-point minnm/maxnm versions fold into the same schema. We had a couple of min/max reductions already implemented, from the initial pathfinding exercise in D67158. Those were done by having separate IR intrinsic names for the signed and unsigned integer versions; as part of this commit, I've changed them to use a flag parameter indicating signedness, which is how we ended up deciding that the rest of the MVE intrinsics family ought to work. So now hopefully the ewhole lot is consistent. In the new llc test, the output code from the `v8f16` test functions looks quite unpleasant, but most of it is PCS lowering (you can't pass a `half` directly in or out of a function). In other circumstances, where you do something else with your `half` in the same function, it doesn't look nearly as nasty. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76490	2020-03-20 15:42:33 +00:00
Matt Arsenault	a950e3beef	AMDGPU: Move towards deprecating alignbit intrinsic This is equivalent to llvm.fshr, so legalize the intrinsic to the generic node.	2020-03-20 11:03:04 -04:00
alex-t	6e34e71869	[AMDGPU] Enable divergence driven ISel for ADD/SUB i64 Summary: Currently we custom select add/sub with carry out to scalar form relying on later replacing them to vector form if necessary. This change enables custom selection code to take the divergence of adde/addc SDNodes into account and select the appropriate form in one step. Reviewers: arsenm, vpykhtin, rampitec Reviewed By: arsenm, vpykhtin Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa Differential Revision: https://reviews.llvm.org/D76371	2020-03-20 17:06:11 +03:00
Mikhail Maltsev	969034b860	[ARM,CDE] Implement CDE unpredicated Q-register intrinsics Summary: This patch implements the following intrinsics: uint8x16_t __arm_vcx1q_u8 (int coproc, uint32_t imm); T __arm_vcx1qa(int coproc, T acc, uint32_t imm); T __arm_vcx2q(int coproc, T n, uint32_t imm); uint8x16_t __arm_vcx2q_u8(int coproc, T n, uint32_t imm); T __arm_vcx2qa(int coproc, T acc, U n, uint32_t imm); T __arm_vcx3q(int coproc, T n, U m, uint32_t imm); uint8x16_t __arm_vcx3q_u8(int coproc, T n, U m, uint32_t imm); T __arm_vcx3qa(int coproc, T acc, U n, V m, uint32_t imm); Most of them are polymorphic. Furthermore, some intrinsics are polymorphic by 2 or 3 parameter types, such polymorphism is not supported by the existing MVE/CDE tablegen backends, also we don't really want to have a combinatorial explosion caused by 1000 different combinations of 3 vector types. Because of this some intrinsics are implemented as macros involving a cast of the polymorphic arguments to uint8x16_t. The IR intrinsics are even more restricted in terms of types: all MVE vectors are cast to v16i8. Reviewers: simon_tatham, MarkMurrayARM, dmgreen, ostannard Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76299	2020-03-20 14:01:56 +00:00
Mikhail Maltsev	d22e661712	[ARM,CDE] Implement CDE S and D-register intrinsics Summary: This patch implements the following ACLE intrinsics: uint32_t __arm_vcx1_u32(int coproc, uint32_t imm); uint32_t __arm_vcx1a_u32(int coproc, uint32_t acc, uint32_t imm); uint32_t __arm_vcx2_u32(int coproc, uint32_t n, uint32_t imm); uint32_t __arm_vcx2a_u32(int coproc, uint32_t acc, uint32_t n, uint32_t imm); uint32_t __arm_vcx3_u32(int coproc, uint32_t n, uint32_t m, uint32_t imm); uint32_t __arm_vcx3a_u32(int coproc, uint32_t acc, uint32_t n, uint32_t m, uint32_t imm); uint64_t __arm_vcx1d_u64(int coproc, uint32_t imm); uint64_t __arm_vcx1da_u64(int coproc, uint64_t acc, uint32_t imm); uint64_t __arm_vcx2d_u64(int coproc, uint64_t m, uint32_t imm); uint64_t __arm_vcx2da_u64(int coproc, uint64_t acc, uint64_t m, uint32_t imm); uint64_t __arm_vcx3d_u64(int coproc, uint64_t n, uint64_t m, uint32_t imm); uint64_t __arm_vcx3da_u64(int coproc, uint64_t acc, uint64_t n, uint64_t m, uint32_t imm); Since the semantics of CDE instructions is opaque to the compiler, the ACLE intrinsics require dedicated LLVM IR intrinsics. The 64-bit and 32-bit variants share the same IR intrinsic. Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76298	2020-03-20 14:01:53 +00:00
Mikhail Maltsev	7a85e3585e	[ARM,CDE] Implement GPR CDE intrinsics Summary: This change implements ACLE CDE intrinsics that translate to instructions working with general-purpose registers. The specification is available at https://static.docs.arm.com/101028/0010/ACLE_2019Q4_release-0010.pdf Each ACLE intrinsic gets a corresponding LLVM IR intrinsic (because they have distinct function prototypes). Dual-register operands are represented as pairs of i32 values. Because of this the instruction selection for these intrinsics cannot be represented as TableGen patterns and requires custom C++ code. Reviewers: simon_tatham, MarkMurrayARM, dmgreen, ostannard Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76296	2020-03-20 14:01:51 +00:00
Adrian Kuegel	baa6f6a782	Revert "[TableGen][GlobalISel] Account for HwMode in RegisterBank register sizes" This reverts commit `e9f22fd429`. When building with -DLLVM_USE_SANITIZER="Thread", check-llvm has 70 failing tests with this revision, and 29 without this revision.	2020-03-20 11:02:50 +01:00
David Green	b3499f572d	[ARM] Change VDUP type to i32 for MVE The MVE VDUP instruction take a GPR and splats into every lane of a vector register. Unlike NEON we do not have a VDUPLANE equivalent instruction, doing the same splat from a fp register. Previously a VDUP to a v4f32/v8f16 would be represented as a (v4f32 VDUP f32), which would mean the instruction pattern needs to add a COPY_TO_REGCLASS to the GPR. Instead this now converts that earlier during an ISel DAG combine, converting (VDUP x) to (VDUP (bitcast x)). This can allow instruction selection to tell that the input needs to be an i32, which in one of the testcases allows it to use ldr (or specifically ldm) over (vldr;vmov). Whilst being simple enough for floats, as the types sizes are the same, these is no BITCAST equivalent for getting a half into a i32. This uses a VMOVrh ARMISD node, which doesn't know the same tricks yet. Differential Revision: https://reviews.llvm.org/D76292	2020-03-20 09:48:45 +00:00
Roger Ferrer Ibanez	3c24aee7ee	[RISCV] Select +0.0 immediate using fmv.{w,d}.x / fcvt.d.w Floating point positive zero can be selected using fmv.w.x / fmv.d.x / fcvt.d.w and the zero source register. Differential Revision: https://reviews.llvm.org/D75729	2020-03-20 09:42:24 +00:00
Austin Kerbow	2cbb8c946a	[AMDGPU] Reuse register during frame index elimination If there were no free VGPRs we would need two emergency spill slots for register scavenging during PEI/frame index elimination. Reuse 'ResultReg' for scale calculation so that only one spill is needed. Differential Revision: https://reviews.llvm.org/D76387	2020-03-20 00:19:15 -07:00
cdevadas	728b878de6	[AMDGPU] Set the CostPerUse value for vgpr registers. Apart from the argument registers, set the CostPerUse value as per the ratio reg_index/allocation_granularity. It is a pre-commit for introducing the scratch registers in the ABI. This change should help in a balanced register allocation. Differential Revision: https://reviews.llvm.org/D76417	2020-03-20 11:49:35 +05:30
Wei Mi	a035726e5a	Revert "Generate Callee Saved Register (CSR) related cfi directives like .cfi_restore." This reverts commit `3c96d01d2e`. Got report that it caused test failures in libc++.	2020-03-19 22:45:27 -07:00
Yuta Saito	08670d435b	[WebAssembly] Support swiftself and swifterror for WebAssembly target Summary: Swift ABI is based on basic C ABI described here https://github.com/WebAssembly/tool-conventions/blob/master/BasicCABI.md Swift Calling Convention on WebAssembly is a little deffer from swiftcc on another architectures. On non WebAssembly arch, swiftcc accepts extra parameters that are attributed with swifterror or swiftself by caller. Even if callee doesn't have these parameters, the invocation succeed ignoring extra parameters. But WebAssembly strictly checks that callee and caller signatures are same. https://github.com/WebAssembly/design/blob/master/Semantics.md#calls So at WebAssembly level, all swiftcc functions end up extra arguments and all function definitions and invocations explicitly have additional parameters to fill swifterror and swiftself. This patch support signature difference for swiftself and swifterror cc is swiftcc. e.g. ``` declare swiftcc void @foo(i32, i32) @data = global i8* bitcast (void (i32, i32)* @foo to i8) define swiftcc void @bar() { %1 = load i8, i8** @data %2 = bitcast i8* %1 to void (i32, i32, i32)* call swiftcc void %2(i32 1, i32 2, i32 swiftself 3) ret void } ``` For swiftcc, emit additional swiftself and swifterror parameters if there aren't while lowering. These additional parameters are added for both callee and caller. They are necessary to match callee and caller signature for direct and indirect function call. Differential Revision: https://reviews.llvm.org/D76049	2020-03-19 17:39:52 -07:00
Thomas Lively	34db3c3a18	[WebAssembly] SIMD integer abs instructions Summary: These were merged to the SIMD proposal in https://github.com/WebAssembly/simd/pull/128. Depends on D76397 to avoid merge conflicts. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76399	2020-03-19 17:25:58 -07:00
Thomas Lively	a3f974f3c3	[WebAssembly] SIMD bitmask intrinsics and builtin functions Summary: These experimental new instructions are proposed in https://github.com/WebAssembly/simd/pull/201. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76397	2020-03-19 17:15:37 -07:00
Matt Arsenault	678da7b109	AMDGPU/GlobalISel: Remove leftover #if 0 The subtarget feature used to be missing from subtargets, but that was fixed.	2020-03-19 20:07:05 -04:00
Stefan Agner	f87563661d	[MC][ARM] add implicit immediate form for ldrsbt/ldrht/ldrsht Add pseudo instructions for ldrsbt/ldrht/ldrsht with implicit immediate and add fall back C++ code to transform the instruction to the equivalent LDRSBTi/LDRHTi/LDRSHTi form. This is similar to how it has been done in commit `fb3950ec63` This fixes: https://bugs.llvm.org/show_bug.cgi?id=45070	2020-03-19 22:36:42 +01:00
Scott Linder	0e9368cc8c	[AMDGPU] Move frame pointer from s34 to s33 Remove the gap left between the stack pointer (s32) and frame pointer (s34) now that the scratch wave offset is no longer a part of the calling convention ABI. Update llvm/docs/AMDGPUUsage.rst to reflect the change. Tags: #llvm Differential Revision: https://reviews.llvm.org/D75657	2020-03-19 15:35:16 -04:00
Scott Linder	60b1967c39	[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to removes the scratch wave offset register from the calling convention ABI. As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative. Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first. Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention. Tags: #llvm Differential Revision: https://reviews.llvm.org/D75138	2020-03-19 15:35:16 -04:00
Scott Linder	db099f994b	[AMDGPU][NFC] Refactor some uses of unsigned to Register Tags: #llvm Differential Revision: https://reviews.llvm.org/D76035	2020-03-19 15:35:16 -04:00
Scott Linder	30bb113beb	[AMDGPU][NFC] Refactor emitEntryFunctionPrologue Remove dead code and factor repeated conditions out into a single check. Rename and move code to make it more obvious what is running only for entry functions. Simplify function arguments to make it clearer what the relevant inputs are. Make flat scratch init accept an MBB iterator and move it to where it was logically being emitted within the prologue. These changes will make a future update to the calling convention simpler. Tags: #llvm Differential Revision: https://reviews.llvm.org/D75092	2020-03-19 15:35:16 -04:00
Cameron McInally	018dde4ce5	[AArch64][SVE] Add support for DestructiveBinaryImm DestructiveInstType Support prefixing destructive operations, with the MOVPRFX instruction, to build constructive operations. Differential Revision: https://reviews.llvm.org/D75064	2020-03-19 13:11:46 -05:00
Craig Topper	c13aa36bb7	[X86] Attempt to more accurately model the cost of a bool reduction of wide vector type. Previously we multiplied the cost for the table entries by the number of splits needed. But that implies that each split goes through a reduction to scalar independently. I think what really happens is that the we AND/OR the split pieces until we're down to a single value with a legal type and then do special reduction sequence on that. So to model that this patch takes the number of splits minus one multiplied by the cost of a AND/OR at the legal element count and adds that on top of the table lookup. Differential Revision: https://reviews.llvm.org/D76400	2020-03-19 09:31:05 -07:00
Djordje Todorovic	d9b9621009	Reland D73534: [DebugInfo] Enable the debug entry values feature by default The issue that was causing the build failures was fixed with the D76164.	2020-03-19 13:57:30 +01:00
Andrzej Warzynski	0ea4fb5bb7	[AArch64][SVE] Rename intrinsics for gather prefetch [NFC] Summary: In order to keep the names consistent with other SVE gather loads, the intrinsics for gather prefetch are renamed as follows: * @llvm.aarch64.sve.gather.prfb -> @llvm.aarch64.sve.prfb.gather Reviewed by: fpetrogalli Differential Revision: https://reviews.llvm.org/D76421	2020-03-19 12:53:36 +00:00
Chen Zheng	3f85134d71	[PowerPC] implement target hook isProfitableToHoist On Powerpc fma is faster than fadd + fmul for some types, (PPCTargetLowering::isFMAFasterThanFMulAndFAdd). we should implement target hook isProfitableToHoist to prevent simplifyCFGpass from breaking fma pattern by hoisting fmul to predecessor block. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D76207	2020-03-19 00:17:25 -04:00
Chen Zheng	aacf022cd5	[PowerPC] add IR level isFMAFasterThanFMulAndFAdd - NFC And also refactor legacy MIR level isFMAFasterThanFMulAndFAdd. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D76265	2020-03-18 23:24:40 -04:00
Eli Friedman	e24e95fe90	Remove CompositeType class. The existence of the class is more confusing than helpful, I think; the commonality is mostly just "GEP is legal", which can be queried using APIs on GetElementPtrInst. Differential Revision: https://reviews.llvm.org/D75660	2020-03-18 13:53:17 -07:00
lewis-revill	e9f22fd429	[TableGen][GlobalISel] Account for HwMode in RegisterBank register sizes This patch generates TableGen descriptions for the specified register banks which contain a list of register sizes corresponding to the available HwModes. The appropriate size is used during codegen according to the current HwMode. As this HwMode was not available on generation, it is set upon construction of the RegisterBankInfo class. Targets simply need to provide the HwMode argument to the <target>GenRegisterBankInfo constructor. The RISC-V RegisterBankInfo constructor has been updated accordingly (plus an unused argument removed). Differential Revision: https://reviews.llvm.org/D76007	2020-03-18 19:52:23 +00:00
Nemanja Ivanovic	e009fad342	[PowerPC] Remove UB from PPCInstrInfo when handling rotates fed by constants As pointed out in https://bugs.llvm.org/show_bug.cgi?id=45232 this code can end up shifting a 64-bit unsigned value left by 64 bits. Althought this works as expected on some platforms it is definitely UB. This patch removes the UB and adds the associated test case. Fixes: https://bugs.llvm.org/show_bug.cgi?id=45232	2020-03-18 13:40:39 -05:00
Simon Tatham	e13d153c1b	[ARM,MVE] Add intrinsics for the VQDMLAD family. Summary: This is another set of instructions too complicated to be sensibly expressed in IR by anything short of a target-specific intrinsic. Given input vectors a,b, the instruction generates intermediate values 2(a[0]b[0]+a[1]+b[1]), 2(a[2]b[2]+a[3]+b[3]), etc; takes the high half of each double-width values, and overwrites half the lanes in the output vector c, which you therefore have to provide the input value of. Optionally you can swap the elements of b so that the are things like a[0]b[1]+a[1]b[0]; optionally you can round to nearest when taking the high half; and optionally you can take the difference rather than sum of the two products. Finally, saturation is applied when converting back to a single-width vector lane. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: miyuki Subscribers: kristof.beyls, hiraditya, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76359	2020-03-18 17:11:22 +00:00
Matt Arsenault	4ea1baf6a0	AMDGPU: Initial, crude support for indirect calls This isn't really usable, and requires using the -amdgpu-fixed-function-abi flag to work. Assumes a uniform call target, and will hit a verifier error if the call target ends up in a VGPR. Also doesn't attempt to do anything sensible for the reported register/stack usage.	2020-03-18 12:03:48 -04:00
Matt Arsenault	ea4597eef1	Reapply "AMDGPU/GlobalISel: Fully handle 0 dmask case during legalize" This reverts commit `9bca8fc4cf`. Rearrange handling to avoid changing the instruction in the case where it's going to be erased and replaced with undef.	2020-03-18 12:01:22 -04:00
Piotr Sobczak	d1a7bfca74	[AMDGPU] Fix AMDGPUUnifyDivergentExitNodes Summary: For the case where "done" bits on existing exports are removed by unifyReturnBlockSet(), unify all return blocks - even the uniformly reached ones. We do not want to end up with a non-unified, uniformly reached block containing a normal export with the "done" bit cleared. That case is believed to be rare - possible with infinite loops in pixel shaders. This is a fix for D71192. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76364	2020-03-18 16:49:30 +01:00
Chris Bowler	c21866476e	[PowerPC][AIX] Implement by-val caller arguments in a single register. This is the first of a series of patches that adds caller support for by-value arguments. This patch add support for arguments that are passed in a single GPR. There are 3 limitation cases: -The by-value argument is larger than a single register. -There are no remaining GPRs even though the by-value argument would otherwise fit in a single GPR. -The by-value argument requires alignment greater than register width. Future patches will be required to add support for these cases as well as for the callee handling (in LowerFormalArguments_AIX) that corresponds to this work. Differential Revision: https://reviews.llvm.org/D75863	2020-03-18 10:57:28 -04:00
Oliver Stannard	73cea83a6f	[IPRA][ARM] Spill extra registers at -Oz When optimising for code size at the expense of performance, it is often worth saving and restoring some of r0-r3, if IPRA will be able to take advantage of them. This doesn't cost any extra code size if we already have a PUSH/POP pair, and increases the number of available registers across any calls to the function. We already have an optimisation which tries fold the subtract/add of the SP into the PUSH/POP by using extra registers, which somewhat conflicts with this. I've made the new optimisation less aggressive in cases where the existing one is likely to trigger, which gives better results than either of these optimisations by themselves. Differential revision: https://reviews.llvm.org/D69936	2020-03-18 13:51:16 +00:00
Guillaume Chatelet	d000655a8c	[Alignment][NFC] Deprecate getMaxAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jholewinski, arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76348	2020-03-18 14:48:45 +01:00
Oliver Stannard	6739805e24	[ARM] Track epilogue instructions with FrameDestroy flag (NFC) Rather than trying to work out which instructions are part of the epilogue by examining them, we can just mark them with the FrameDestroy flag, like we do in the AArch64 backend.	2020-03-18 13:32:59 +00:00
Francesco Petrogalli	9bdcd9bf44	[llvm][SVE] Addressing mode for FF/NF loads. Summary: This patch adds addressing mode computation for the following SVE instructions: * ldff1{s}<T1> { <Zt>.<T2> }, <Pg>/Z, [<Xn\|SP>{, <Xm>{, lsl #imm}}] * ldnf1{s}<T1> { <Zt>.<T2> }, <Pg>/Z, [<Xn\|SP>{, #<imm>, mul vl}] Reviewers: andwar, sdesmalen, rengolin, efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76209	2020-03-18 12:46:07 +00:00
Sander de Smalen	4788ca450f	[AArch64][SVE] Change pointer type of nontemporal load/store intrinsics Summary: This fixes a discrepancy between the non-temporal loads/store intrinsics and other SVE load intrinsics (such as nf/ff), so that Clang can use the same code to generate these intrinsics. Reviewers: andwar, kmclaughlin, rengolin, efriedma Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76237	2020-03-18 12:44:51 +00:00
Simon Tatham	928776de92	[ARM,MVE] Add intrinsics for the VQDMLAH family. Summary: These are complicated integer multiply+add instructions with extra saturation, taking the high half of a double-width product, and optional rounding. There's no sensible way to represent that in standard IR, so I've converted the clang builtins directly to target-specific intrinsics. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: miyuki Subscribers: kristof.beyls, hiraditya, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76123	2020-03-18 10:55:04 +00:00
Simon Tatham	28c5d97bee	[ARM,MVE] Add intrinsics and isel for MVE integer VMLA. Summary: These instructions compute multiply+add in integers, with one of the operands being a splat of a scalar. (VMLA and VMLAS differ in whether the splat operand is a multiplier or the addend.) I've represented these in IR using existing standard IR operations for the unpredicated forms. The predicated forms are done with target- specific intrinsics, as usual. When operating on n-bit vector lanes, only the bottom n bits of the i32 scalar operand are used. So we have to tell that to isel lowering, to allow it to remove a pointless sign- or zero-extension instruction on that input register. That's done in `PerformIntrinsicCombine`, but first I had to enable `PerformIntrinsicCombine` for MVE targets (previously all the intrinsics it handled were for NEON), and make it a method of `ARMTargetLowering` so that it can get at `SimplifyDemandedBits`. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76122	2020-03-18 10:55:04 +00:00
Guillaume Chatelet	c3df69faa0	[Alignment][NFC] Deprecate getTransientStackAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jholewinski, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76301	2020-03-18 09:02:48 +01:00
Pengfei Wang	974d649f8e	CET for Exception Handle Summary: Bug fix for https://bugs.llvm.org/show_bug.cgi?id=45182 Exception handle may indirectly jump to catch pad, So we should add ENDBR instruction before catch pad instructions. Reviewers: craig.topper, hjl.tools, LuoYuanke, annita.zhang, pengfei Reviewed By: LuoYuanke Subscribers: hiraditya, llvm-commits Patch By: Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D76190	2020-03-17 22:35:05 -07:00
Vitaly Buka	9bca8fc4cf	Revert "AMDGPU/GlobalISel: Fully handle 0 dmask case during legalize" The patch introduced use-after-poison. This reverts commit `d0fe13ecf9`.	2020-03-17 22:04:14 -07:00
Nico Weber	4e0fe038f4	Revert "Avoid emitting unreachable SP adjustments after `throw`" This reverts commit `65b21282c7`. Breaks sanitizer bots (https://reviews.llvm.org/D75712#1927668) and causes https://crbug.com/1062021 (which may or may not be a compiler bug, not clear yet).	2020-03-17 20:49:22 -04:00
Matt Arsenault	c9b454a1b7	AMDGPU/GlobalISel: Fix verifier errors on image atomics	2020-03-17 20:06:25 -04:00
Scott Linder	68f163df0e	[AMDGPU] Print DWARF register numbers in AMDGPUInstPrinter Summary: Explanation is in a comment in the diff, but essentially printing a physical register name here is ambiguous. Until we can implement printing a DWARF register name here just use the encoding directly. Tags: #llvm Differential Revision: https://reviews.llvm.org/D76253	2020-03-17 19:42:10 -04:00
Simon Pilgrim	68224c1952	[TargetLowering] Only demand a rotation's modulo amount bits ISD::ROTL/ROTR rotation values are guaranteed to act as a modulo amount, so for power-of-2 bitwidths we only need the lowest bits. Differential Revision: https://reviews.llvm.org/D76201	2020-03-17 21:23:46 +00:00
Scott Constable	080dd10f7d	Move RDF from Hexagon to Codegen RDF is designed to be target agnostic. Therefore it would be useful to have it available for other targets, such as X86. Based on a previous patch by Krzysztof Parzyszek Differential Revision: https://reviews.llvm.org/D75932	2020-03-17 12:43:14 -07:00
Sebastian Neubauer	6e29846b29	[AMDGPU] Fix whole wavefront mode We cannot move wwm over exec copies because the exec register needs an exact exec mask. Differential Revision: https://reviews.llvm.org/D76232	2020-03-17 17:23:23 +01:00
Simon Pilgrim	c9656a3b31	[DAGCombiner] matchRotateSub - handle shift amount truncation Under certain circumstances we'll end up in the position where the negated shift amount will get truncated to the type specified getScalarShiftAmountTy(), so we need to test for a truncated version of the shift amount as well. This allows us to remove half of the remaining patterns tested for by X86ISelLowering's combineOrShiftToFunnelShift.	2020-03-17 16:01:23 +00:00
Matt Arsenault	039c917b43	AMDGPU/GlobalISel: Fix asserting on gather4 intrinsics	2020-03-17 11:07:30 -04:00
alex-t	48a9cf9043	[AMDGPU] Enable SEXT divergence driven selection. Summary: This change enable the divergence driven selection for the SEXT DAG opcode. Reviewers: vpykhtin, rampitec Reviewed By: vpykhtin Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Differential Revision: https://reviews.llvm.org/D76230	2020-03-17 17:30:11 +03:00
Simon Atanasyan	73b1da1605	[MIPS] Implement MIPS3D vector instructions Patch by Michael Roe. Differential Revision: https://reviews.llvm.org/D76247	2020-03-17 17:17:51 +03:00
Matt Arsenault	d0fe13ecf9	AMDGPU/GlobalISel: Fully handle 0 dmask case during legalize For normal loads, fully eliminate the load. For the TFE case, adjust the dmask value in the instruction so the selector doesn't need to handle it. For the TFE special case, I guess it would be possible to replace the loaded data register with undef, but as-is this will start treating it as a well defined value.	2020-03-17 10:15:30 -04:00
Matt Arsenault	d9a012ed8a	AMDGPU/GlobalISel: Adjust image load register type based on dmask Trim elements that won't be written. The equivalent still needs to be done for writes. Also start widening 3 elements to 4 elements. Selection will get the count from the dmask.	2020-03-17 10:09:18 -04:00
Matt Arsenault	83ffbf2618	AMDGPU/GlobalISel: Legalize non-a16 non-NSA images	2020-03-17 10:02:09 -04:00
Matt Arsenault	2aba9b6cf8	AMDGPU/GlobalISel: Legalize a16 images Pack the address registers in the legalizer. Avoid introducing a huge family of new intermediate operations by filling dead operands with noreg.	2020-03-17 10:02:09 -04:00
Kazushi (Jam) Marukawa	6bbbead7be	[VE] Move VEInstPrinter.cpp and VEInstPrinter.h into MCTargetDesc Summary: Move them into MCTargetDesc to follow other architectures (`a263aa2`). Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D76270	2020-03-17 11:33:54 +01:00
QingShan Zhang	b83490bdb7	[PowerPC] Fix a typo of the condition of checking the fusion candidate	2020-03-17 10:04:18 +00:00
QingShan Zhang	0b126eec6d	[NFC][PowerPC] Simplify the logic in lower select_cc The logic in select_cc is messy and hard to follow. This is a NFC patch to simplify the logic. Differential Revision: https://reviews.llvm.org/D75834	2020-03-17 03:47:39 +00:00
Vitaly Buka	f20dcc31e3	Fix unused function warning	2020-03-16 19:45:36 -07:00
Shengchen Kan	39bcc76a92	[X86] Disable nop padding before instruction following hardcode Reviewers: reames, MaskRay, craig.topper, LuoYuanke, jyknight Reviewed By: LuoYuanke Subscribers: annita.zhang, llvm-commits, hiraditya Tags: #llvm Differential Revision: https://reviews.llvm.org/D76176	2020-03-17 09:45:12 +08:00
Craig Topper	85726bbcba	[X86] Disable fast-isel call lowering for functions with vXi1 arguments on avx512. This fails an assert because the type is marked in the calling convention td file as needing promotion, but the code doesn't know how to do it. It also much more complicated because we try to pass these in xmm/ymm/zmm registers. As of a few weeks ago we do this promotion from getRegisterTypeForCallingConv before the td file generated code gets involved.	2020-03-16 18:20:42 -07:00
Eric Christopher	8b3b04eb41	Make isValidImmForSVEVecImmAddrMode inline static rather than just static. Fixes -Werror builds.	2020-03-16 17:39:01 -07:00
Evgenii Stepanov	2a3723ef11	[memtag] Plug in stack safety analysis. Summary: Run StackSafetyAnalysis at the end of the IR pipeline and annotate proven safe allocas with !stack-safe metadata. Do not instrument such allocas in the AArch64StackTagging pass. Reviewers: pcc, vitalybuka, ostannard Reviewed By: vitalybuka Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, gilang, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73513	2020-03-16 16:35:25 -07:00
Sriraman Tallam	df082ac45a	Basic Block Sections support in LLVM. This is the second patch in a series of patches to enable basic block sections support. This patch adds support for: * Creating direct jumps at the end of basic blocks that have fall through instructions. * New pass, bbsections-prepare, that analyzes placement of basic blocks in sections. * Actual placing of a basic block in a unique section with special handling of exception handling blocks. * Supports placing a subset of basic blocks in a unique section. * Support for MIR serialization and deserialization with basic block sections. Parent patch : D68063 Differential Revision: https://reviews.llvm.org/D73674	2020-03-16 16:06:54 -07:00
Craig Topper	378b1e6080	[X86] Assign avx512bf16 instructions to the SSEPackedSingle ExeDomain.	2020-03-16 14:07:01 -07:00
Benjamin Kramer	05ff3323e0	[AArch64] Remove unused variable	2020-03-16 21:59:55 +01:00
Francesco Petrogalli	0f2b68d9c7	Implement IR intrinsics for gather prefetch. Summary: Intrinsics and relative codegen has been implemented for the following SVE instructions: 1. PRF<T> <prfop>, <Pg>, [<Xn\|SP>, <Zm>.S, <mod>] -> 32-bit scaled offset 2. PRF<T> <prfop>, <Pg>, [<Xn\|SP>, <Zm>.D, <mod>] -> 32-bit unpacked scaled offset 3. PRF<T> <prfop>, <Pg>, [<Xn\|SP>, <Zm>.D] -> 64-bit scaled offset 4. PRF<T> <prfop>, <Pg>, [<Zn>.S{, #<imm>}] -> 32-bit element 5. PRF<T> <prfop>, <Pg>, [<Zn>.D{, #<imm>}] -> 64-bit element The instructions are associated the following intrinsics, respectively: 1. void @llvm.aarch64.sve.gather.prf<T>.scaled.<mod>.nx4vi32( i8* %base, <vscale x 4 x i32> %offset, <vscale x 4 x i1> %Pg, i32 %prfop) 2. void @llvm.aarch64.sve.gather.prf<T>.scaled.<mod>.nx2vi32( i8* %base, <vscale x 2 x i32> %offset, <vscale x 2 x i1> %Pg, i32 %prfop) 3. void @llvm.aarch64.sve.gather.prf<T>.scaled.nx2vi64( i8* %base, <vscale x 2 x i64> %offset, <vscale x 2 x i1> %Pg, i32 %prfop) 4. void @llvm.aarch64.sve.gather.prf<T>.nx4vi32( <vscale x 4 x i32> %bases, i64 %imm, <vscale x 4 x i1> %Pg, i32 %prfop) 5. void @llvm.aarch64.sve.gather.prf<T>.nx2vi64( <vscale x 2 x i64> %bases, i64 %imm, <vscale x 2 x i1> %Pg, i32 %prfop) The intrinsics are the IR counterpart of the following SVE ACLE functions: * void svprf<T>(svbool_t pg, const void base, svprfop op) void svprf<T>_vnum(svbool_t pg, const void base, int64_t vnum, svprfop op) void svprf<T>_gather[_u32base](svbool_t pg, svuint32_t bases, svprfop op) * void svprf<T>_gather[_u64base](svbool_t pg, svuint64_t bases, svprfop op) * void svprf<T>_gather_[s32]offset(svbool_t pg, const void base, svint32_t offsets, svprfop op) void svprf<T>_gather_[u32]offset(svbool_t pg, const void base, svint32_t offsets, svprfop op) void svprf<T>_gather_[s64]offset(svbool_t pg, const void base, svint64_t offsets, svprfop op) void svprf<T>_gather_[u64]offset(svbool_t pg, const void base, svint64_t offsets, svprfop op) void svprf<T>_gather[_u32base]_offset(svbool_t pg, svuint32_t bases, int64_t offset, svprfop op) * void svprf<T>_gather[_u64base]_offset(svbool_t pg, svuint64_t bases,int64_t offset, svprfop op) Reviewers: andwar, sdesmalen, efriedma, rengolin Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75580	2020-03-16 18:52:35 +00:00
Simon Pilgrim	ebb181cf40	[X86] matchScalarReduction - add support for partial reductions Add optional support for opt-in partial reduction cases by providing an optional partial mask to indicate which elements have been extracted for the scalar reduction.	2020-03-16 18:01:02 +00:00
Matt Arsenault	80b627d69d	AMDGPU/GlobalISel: Fix handling of G_ANYEXT with s1 source We were letting G_ANYEXT with a vcc register bank through, which was incorrect and would select to an invalid copy. Fix this up like G_ZEXT and G_SEXT. Also drop old code to fixup the non-boolean case in RegBankSelect. We now have to perform that expansion during selection, so there's no benefit to doing it during RegBankSelect.	2020-03-16 12:59:54 -04:00
Matt Arsenault	c460dc6eeb	AMDGPU/GlobalISel: Fix some illegal scalar argument types Fixes integers that don't evenly divide to i32 pieces. We should probably extract some of the code in the legalizer to start handling argument breakdowns. I'm dissatisfied with the argument lowering's handling of vectors for example, and we should not be producing the weird G_EXTRACTs we do now.	2020-03-16 12:51:23 -04:00
Matt Arsenault	84386b2d8a	AMDGPU: Drop special case f64 fround lowering The result is better if ftrunc is emitted and separately legalized when unavailable.	2020-03-16 12:09:30 -04:00
Matt Arsenault	57d896e838	AMDGPU/GlobalISel: Make some large merges legal We allow up to 1024-bit registers, so we should support merges all the way to the maximum.	2020-03-16 10:49:10 -04:00
Simon Pilgrim	e43a085781	[X86] X86::isConstantSplat - enable partial undef bit handling by default. We currently only ever use this for lowering constant uniform values (shift/rotate by immediate) so we can safely enable it by default (it treats the undef bits as zero when extracting constants). This is necessary for an upcoming patch that will use SimplifyDemandedBits more aggressively on funnel shift amounts and causes regressions in vXi64 constant without it.	2020-03-16 12:56:24 +00:00
Simon Pilgrim	ac4609cb1d	[X86] LowerRotate - use X86::isConstantSplat to detect constant splat rotation amounts. Avoid code duplication and matches what we do for the similar LowerFunnelShift and LowerScalarImmediateShift methods.	2020-03-16 12:56:23 +00:00
Jonas Paulsson	132f25bcca	[SystemZ] Avoid scalarization of [SU]INT_TO_FP ISD-nodes. The type legalizer will scalarize vector conversions from integer to floating point if the source element size is less than that of the result. This is avoided now by inserting a zero/sign-extension of the source vector before type legalization. Review: Ulrich Weigand Differential revision: https://reviews.llvm.org/D75978	2020-03-16 13:07:42 +01:00
Shengchen Kan	b1a7a245ec	[NFC][MC] Rename alignBranches* to emitInstruction* alignBranches is X86 specific, change the name in a more general one since other target can do some state chang before and after emitting the instruction.	2020-03-16 17:13:14 +08:00
Simon Atanasyan	e0ab0e6a28	[MIPS] Implement PUL.PS and PUU.PS instructions Patch by Michael Roe. Differential Revision: https://reviews.llvm.org/D75812	2020-03-16 09:39:47 +03:00
Philip Reames	a79863f2f7	Support prefix padding for alignment purposes (Relaxable instructions only) Now that D75203 has landed and baked for a few days, extend the basic approach to prefix padding as well. The patch itself is fairly straight forward. For the moment, this patch adds the functional support and some basic testing there of, but defaults to not enabling prefix padding. I want to be able to phrase a separate patch which adds the target specific reasoning and test it cleanly. I haven't decided whether I want to common it with the nop logic or not. Differential Revision: https://reviews.llvm.org/D75300	2020-03-15 19:53:41 -07:00
Craig Topper	b2da1ddaef	[X86] Add a non-zero cost for truncating v32i16->v32i8 on avx512bw.	2020-03-15 17:18:46 -07:00
Benjamin Kramer	caef4a81c9	[AVR] Make helper functions static. NFC.	2020-03-15 16:50:15 +01:00

... 2 3 4 5 6 ...

56885 Commits