Commit Graph

60512 Commits

Author SHA1 Message Date
David Spickett 3d233d5d4d [AArch64] Add v8.5-a Memory Tagging STZGM instruction
This instruction writes a block of allocation tags
and stores zero to the associated data locations.

It differs from STGM by 1 bit and has the same
arguments.

The specification can be found here:
https://developer.arm.com/docs/ddi0596/c

Differential Revision: https://reviews.llvm.org/D60065

llvm-svn: 357397
2019-04-01 14:56:37 +00:00
David Spickett 9142b8ef1b [AArch64] Add v8.5-a Memory Tagging STGM/LDGM instructions
The STGV/LDGV instructions were replaced with
STGM/LDGM. The encodings remain the same but there
is no longer writeback so there are no unpredictable
encodings to check for.

The specification can be found here:
https://developer.arm.com/docs/ddi0596/c

Differential Revision: https://reviews.llvm.org/D60064

llvm-svn: 357395
2019-04-01 14:52:18 +00:00
Alex Bradbury da20f5ca74 [RISCV] Generate address sequences suitable for mcmodel=medium
This patch adds an implementation of a PC-relative addressing sequence to be
used when -mcmodel=medium is specified. With absolute addressing, a 'medium'
code model may cause addresses to be out of range. This is because while
'medium' implies a 2 GiB addressing range, this 2 GiB can be at any offset as
opposed to 'small', which implies the first 2 GiB only.

Note that LLVM/Clang currently specifies code models differently to GCC, where
small and medium imply the same functionality as GCC's medlow and medany
respectively.

Differential Revision: https://reviews.llvm.org/D54143
Patch by Lewis Revill.

llvm-svn: 357393
2019-04-01 14:42:56 +00:00
David Spickett efe376add6 [AArch64] Add v8.5-a Memory Tagging GMID_EL1 register
The latest version of the MTE spec added a system
register 'GMID_EL1'. It contains the block size used
by the LDGM and STGM instructions and is read only.

The specification can be found here:
https://developer.arm.com/docs/ddi0596/c

llvm-svn: 357392
2019-04-01 14:41:14 +00:00
Mikael Holmen 150a7ec2dc [InstCombine] Handle vector gep with scalar argument in evaluateInDifferentElementOrder
Summary:
This fixes PR41270.

The recursive function evaluateInDifferentElementOrder expects to be called
on a vector Value, so when we call it on a vector GEP's arguments, we must
first check that the argument is indeed a vector.
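
As a hedged illustration (a reduced sketch, not the exact PR41270 reproducer), a vector GEP may legally take a scalar base pointer with a vector index, which is the kind of scalar argument the recursion must now skip:

```
; The base pointer is scalar; only the index operand is a vector.
define <4 x i16*> @splat_gep(i16* %base, <4 x i64> %idx) {
  %g = getelementptr i16, i16* %base, <4 x i64> %idx
  ret <4 x i16*> %g
}
```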

Reviewers: reames, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60058

llvm-svn: 357389
2019-04-01 14:10:10 +00:00
Mikael Holmen 3e527cd823 Revert "[InstCombine] Handle vector gep with scalar argument in evaluateInDifferentElementOrder"
This reverts commit 75216a6dbcfe5fb55039ef06a07e419fa875f4a5.

I'll recommit with a better commit message with reference to the
phabricator review.

llvm-svn: 357387
2019-04-01 14:06:45 +00:00
Matt Arsenault 0276b94356 InstSimplify: Add baseline test for upcoming change
llvm-svn: 357386
2019-04-01 14:03:44 +00:00
Mikael Holmen d66a47f90a [InstCombine] Handle vector gep with scalar argument in evaluateInDifferentElementOrder
This fixes PR41270.

The recursive function evaluateInDifferentElementOrder expects to be called
on a vector Value, so when we call it on a vector GEP's arguments, we must
first check that the argument is indeed a vector.

llvm-svn: 357385
2019-04-01 13:48:56 +00:00
Clement Courbet 7e062c9b1f [X86] Make post-ra scheduling macrofusion-aware.
Subscribers: MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59688

llvm-svn: 357384
2019-04-01 13:48:50 +00:00
Sanjay Patel 97d1bc4454 [InstCombine] eliminate commuted select-shuffles + binop (PR41304)
If we have a commutable vector binop with inverted select-shuffles,
we don't care about the order of the operands in each vector lane:

LHS = shuffle V1, V2, <0, 5, 6, 3>
RHS = shuffle V2, V1, <0, 5, 6, 3>
LHS + RHS --> <V1[0]+V2[0], V2[1]+V1[1], V2[2]+V1[2], V1[3]+V2[3]> --> V1 + V2

PR41304:
https://bugs.llvm.org/show_bug.cgi?id=41304
...is currently titled as an SLP enhancement, but at least for the
given example, we can reduce that in instcombine because we are just
eliminating shuffles.

As noted in the TODO, this could be generalized, but I haven't thought
through those patterns completely, so this is limited to what appears
to be always safe.

Differential Revision: https://reviews.llvm.org/D60048

llvm-svn: 357382
2019-04-01 13:36:40 +00:00
Clement Courbet d9f6ee1c3c [X86MacroFusion][NFC] Add more tests.
In preparation for D59688.

llvm-svn: 357381
2019-04-01 13:18:34 +00:00
Krasimir Georgiev 7af32444b9 [X86] Fix a test from r357317
Summary:
The missing `<` causes the lld command to overwrite the test file, which fails in
environments marking the test files as readonly.

Reviewers: bkramer

Reviewed By: bkramer

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60060

llvm-svn: 357380
2019-04-01 11:42:54 +00:00
Simon Pilgrim e8c3136994 [X86][SSE] Add fcmp constant folding tests
Initial test coverage for D60006

llvm-svn: 357379
2019-04-01 10:54:04 +00:00
Luis Marques 3091884e25 [RISCV] Add seto pattern expansion
Adds a `seto` pattern expansion. Without it the lowerings of `fcmp one` and 
`fcmp ord` would be inefficient due to an unoptimized double negation.
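
For reference, a hedged sketch (hypothetical function, not from the patch) of a comparison that maps to `seto`, since `ord` is the negation of `uno`:

```
define i1 @is_ordered(double %a, double %b) {
  %c = fcmp ord double %a, %b
  ret i1 %c
}
```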

Differential Revision: https://reviews.llvm.org/D59699

llvm-svn: 357378
2019-04-01 09:54:14 +00:00
Alex Bradbury ca81a56f65 [RISCV] Don't evaluatePCRelLo if a relocation will be forced (e.g. due to linker relaxation)
A pcrel_lo will point to the associated pcrel_hi fixup, which in turn points to
the real target. RISCVMCExpr::evaluatePCRelLo will work around this
indirection in order to allow the fixup to be evaluated properly. However, if
relocations are forced (e.g. because linker relaxation is enabled) then this
evaluation is undesired and will result in a relocation with the wrong target.

This patch modifies evaluatePCRelLo so it will not try to evaluate if the
fixup will be forced as a relocation. A new helper method is added to
RISCVAsmBackend to query this.

Differential Revision: https://reviews.llvm.org/D59686

llvm-svn: 357374
2019-04-01 02:38:27 +00:00
Sanjay Patel 7ac1186b58 [InstCombine] add tests for inverted select-shuffles + binop (PR41304); NFC
llvm-svn: 357368
2019-03-31 15:45:47 +00:00
Sanjay Patel e1bc360fc6 [x86] allow movmsk with 2-element reductions
One motivation for making this change is that the lack of using movmsk is likely
a main source of perf difference between clang and gcc on the C-Ray benchmark as
shown here:
https://www.phoronix.com/scan.php?page=article&item=gcc-clang-2019&num=5
...but this change alone isn't enough to solve that problem.

The 'all-of' examples show what is likely the worst case trade-off: we end up with
an extra instruction (or 2 if we count the 'xor' register clearing). The 'any-of'
examples look clearly better using movmsk because we've traded 2 vector instructions
for 2 scalar instructions, and movmsk may have better timing than the generic 'movq'.
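
As a reference point, a minimal sketch (hypothetical IR, not taken from the commit) of a 2-element 'any-of' reduction of the kind this change lowers via movmsk:

```
define i1 @any_negative(<2 x i64> %x) {
  %c = icmp slt <2 x i64> %x, zeroinitializer
  %e0 = extractelement <2 x i1> %c, i32 0
  %e1 = extractelement <2 x i1> %c, i32 1
  %r = or i1 %e0, %e1
  ret i1 %r
}
```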

If we examine the llvm-mca output for these cases, it appears that even though the
'all-of' movmsk variant looks worse on paper, it would perform better on both
Haswell and Jaguar.

  $ llvm-mca -mcpu=haswell no_movmsk.s -timeline
  Iterations:        100
  Instructions:      400
  Total Cycles:      504
  Total uOps:        400

  Dispatch Width:    4
  uOps Per Cycle:    0.79
  IPC:               0.79
  Block RThroughput: 1.0

  $ llvm-mca -mcpu=haswell movmsk.s -timeline
  Iterations:        100
  Instructions:      600
  Total Cycles:      358
  Total uOps:        600

  Dispatch Width:    4
  uOps Per Cycle:    1.68
  IPC:               1.68
  Block RThroughput: 1.5

  $ llvm-mca -mcpu=btver2 no_movmsk.s -timeline
  Iterations:        100
  Instructions:      400
  Total Cycles:      407
  Total uOps:        400

  Dispatch Width:    2
  uOps Per Cycle:    0.98
  IPC:               0.98
  Block RThroughput: 2.0

  $ llvm-mca -mcpu=btver2 movmsk.s -timeline
  Iterations:        100
  Instructions:      600
  Total Cycles:      311
  Total uOps:        600

  Dispatch Width:    2
  uOps Per Cycle:    1.93
  IPC:               1.93
  Block RThroughput: 3.0

Finally, there may be CPUs where movmsk is horribly slow (old AMD small cores?), but if
that's true, then we're also almost certainly making the wrong transform already for
reductions with >2 elements, so that should be fixed independently.

Differential Revision: https://reviews.llvm.org/D59997

llvm-svn: 357367
2019-03-31 15:11:34 +00:00
Sanjay Patel b276dd195a [InstCombine] canonicalize select shuffles by commuting
In PR41304:
https://bugs.llvm.org/show_bug.cgi?id=41304
...we have a case where we want to fold a binop of select-shuffle (blended) values.

Rather than try to match commuted variants of the pattern, we can canonicalize the
shuffles and check for mask equality with commuted operands.

We don't produce arbitrary shuffle masks in instcombine, but select-shuffles are a
special case that the backend is required to handle because we already canonicalize
vector select to this shuffle form.

So there should be no codegen difference from this change. It's possible that this
improves CSE in IR though.

Differential Revision: https://reviews.llvm.org/D60016

llvm-svn: 357366
2019-03-31 15:01:30 +00:00
Luqman Aden 7c67dbdc65 [NFC][InstCombine] Add tests for combining icmp of no-wrap sub w/ constant.
llvm-svn: 357360
2019-03-31 08:58:50 +00:00
Simon Pilgrim ec56621a5c [SystemZ] Remove fcmp undef from reduced test
Pre-commit for D60006 (Add fcmp UNDEF handling to SelectionDAG::FoldSetCC)

Approved by @uweigand (Ulrich Weigand)

llvm-svn: 357355
2019-03-30 20:24:26 +00:00
Simon Pilgrim 513e6b9d58 [MIPS] Remove fcmp undef from reduced test
Pre-commit for D60006 (Add fcmp UNDEF handling to SelectionDAG::FoldSetCC)

Approved by @atanasyan (Simon Atanasyan)

llvm-svn: 357354
2019-03-30 20:16:16 +00:00
Craig Topper e4a0fc7d75 [X86] Teach isel for RMW binops to handle negate
Negate updates flags like a subtract. We should be able to use the flags from the RMW form of negate when we have (store (X86ISD::SUB 0, load A), A).
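
A minimal sketch (hypothetical) of IR whose DAG takes that shape; isel should now be able to select a single memory-operand negate and reuse its flags:

```
define void @neg_rmw(i64* %p) {
  %v = load i64, i64* %p
  %n = sub i64 0, %v      ; negate
  store i64 %n, i64* %p   ; read-modify-write back to the same address
  ret void
}
```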

Differential Revision: https://reviews.llvm.org/D60007

llvm-svn: 357353
2019-03-30 18:59:17 +00:00
Alex Bradbury 0b2803ee65 [RISCV] Add codegen support for ilp32f, ilp32d, lp64f, and lp64d ("hard float") ABIs
This patch adds support for the RISC-V hard float ABIs, building on top of
rL355771, which added basic target-abi parsing and MC layer support. It also
builds on some re-organisations and expansion of the upstream ABI and calling
convention tests which were recently committed directly upstream.

A number of aspects of the RISC-V hard float ABIs require frontend
support (e.g. flattening of structs and passing int+fp for fp+fp structs in a
pair of registers), and will be addressed in a Clang patch.

As can be seen from the tests, it would be worthwhile extending
RISCVMergeBaseOffsets to handle constant pool accesses as well as global accesses.

Differential Revision: https://reviews.llvm.org/D59357

llvm-svn: 357352
2019-03-30 17:59:30 +00:00
Simon Pilgrim 10c9032c02 [X86][SSE] detectAVGPattern - Match zext(or(x,y)) 'add like' patterns (PR41316)
Fixes PR41316 where the expanded PAVG intrinsic had had one of its ADDs turned into an OR due to its operands having no conflicting bits.

llvm-svn: 357351
2019-03-30 17:12:29 +00:00
Alex Bradbury b5498cbf64 [RISCV] Add RV64 CHECK lines to test/CodeGen/RISCV/vararg.ll and prepare for hard float tests
vararg.ll previously missed RV64 tests. This patch also prepares for using
vararg.ll to test handling of varargs for the ilp32f/ilp32d/lp64f/lp64d hard
float ABIs. In these ABIs, varargs are passed as in either the ilp32 or lp64
ABI. Due to some slight codegen differences, different check lines are needed
for when RV32D is enabled.

llvm-svn: 357350
2019-03-30 15:53:38 +00:00
Simon Pilgrim cfdf09ba7d [X86][SSE] Add PAVG test case from PR41316
llvm-svn: 357346
2019-03-30 13:53:11 +00:00
Heejin Ahn c4ac74fb49 [WebAssembly] Fix unwind destination mismatches in CFG stackify
Summary:
Linearing the control flow by placing `try`/`end_try` markers can create
mismatches in unwind destinations. This patch resolves these mismatches
by wrapping those instructions with an incorrect unwind destination with
a nested `try`/`catch`/`end_try` and branching to the right destination
within the new catch block.

Reviewers: dschuff

Subscribers: sunfish, sbc100, jgravelle-google, chrib, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D48345

llvm-svn: 357343
2019-03-30 11:04:48 +00:00
Heejin Ahn e9fd9073e4 [WebAssembly] Run ExplicitLocals pass after CFGStackify
Summary:
While this does not change any final output, this will greatly simplify
fixing unwind destination mismatches in CFGStackify (D48345), because we
have to create some new registers there.

Reviewers: dschuff

Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59652

llvm-svn: 357342
2019-03-30 09:29:57 +00:00
Alex Bradbury 9681b01c21 [RISCV] Add DAGCombine for (SplitF64 (ConstantFP x))
The SplitF64 node is used on RV32D to convert an f64 directly to a pair of i32
(necessary as bitcasting to i64 isn't legal). When performed on a ConstantFP,
this will result in an FP load from the constant pool followed by a store to
the stack and two integer loads from the stack (necessary as there is no way
to directly move between f64 FPRs and i32 GPRs on RV32D). It's always cheaper
to just materialise integers for the lo and hi parts of the FP constant, so do
that instead.
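
A hedged example (assuming an RV32D target where the f64 result travels as a pair of GPRs) of IR that exercises this path:

```
define double @const_pi() {
  ; Previously: constant-pool FP load + stack store + two i32 loads.
  ; Now: materialise the lo/hi i32 halves of the constant directly.
  ret double 0x400921FB54442D18
}
```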

llvm-svn: 357341
2019-03-30 09:15:47 +00:00
Alex Bradbury 98b8ecde64 [RISCV][NFC] Remove floating point operations from test/CodeGen/RISCV/vararg.ll
This minimises differences in output when compiling with hardware floating
point support, which will be done in a future patch (to demonstrate the same
vararg calling convention is used).

llvm-svn: 357339
2019-03-30 05:24:42 +00:00
Heejin Ahn 7e7aad1510 [WebAssembly] Optimize the number of routing blocks in FixIrreducibleCFG
Summary:
Currently we create a routing block to the dispatch block for every
predecessor of every entry. So the total number of routing blocks
created will be (# of preds) * (# of entries). But we don't need to do
this: we need at most 2 routing blocks per loop entry, one for when the
predecessor is inside the loop and one for when it is outside the loop. (We
can't merge these into one because that would create another loop cycle
between blocks inside and blocks outside.) This patch fixes this and
creates at most 2 routing blocks per entry.

This also renames variable `Split` to `Routing`, which I think is a bit
clearer.

Reviewers: kripken

Subscribers: sunfish, dschuff, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59462

llvm-svn: 357337
2019-03-30 01:31:11 +00:00
Thomas Lively 5f0c4c67bb [WebAssembly] Add mutable globals feature
Summary:
This feature is not actually used for anything in the WebAssembly
backend, but adding it allows users to get it into the target features
sections of their objects, which makes these objects
future-compatible.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jdoerfert, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D60013

llvm-svn: 357321
2019-03-29 22:00:18 +00:00
Sanjoy Das 32fd32bc6f [SCEV] Check the cache in get{S|U}MaxExpr before doing any work
Summary:
This lets us avoid e.g. checking if A >=s B in getSMaxExpr(A, B) if we've
already established that (A smax B) is the best we can do.

Fixes PR41225.

Reviewers: asbirlea

Subscribers: mcrosier, jlebar, bixia, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60010

llvm-svn: 357320
2019-03-29 22:00:12 +00:00
Alina Sbirlea f085cc5aa7 [MemorySSA] Limit clobber walks.
Summary: This patch limits all getClobberingMemoryAccess() walks to MaxCheckLimit.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59569

llvm-svn: 357319
2019-03-29 21:56:09 +00:00
Jessica Paquette d3ffd47df9 [GlobalISel][AArch64] Add isel support for G_INSERT_VECTOR_ELT on v2s32s
This adds support for v2s32 vector inserts, and updates the selection +
regbankselect tests for G_INSERT_VECTOR_ELT.

Differential Revision: https://reviews.llvm.org/D59910

llvm-svn: 357318
2019-03-29 21:39:36 +00:00
Amara Emerson d413f41de6 [X86] When using Win64 ABI, exit with error if SSE is disabled for varargs
We need XMM registers to handle varargs with the Win64 ABI. Before we would
silently generate bad code resulting in an assertion failure elsewhere in the
backend.

llvm-svn: 357317
2019-03-29 21:30:51 +00:00
Alina Sbirlea e589067e61 [MemorySSA] Don't optimize incomplete phis.
Summary:
MemoryPhis cannot be optimized out until they are complete.
Resolves PR41254.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59966

llvm-svn: 357315
2019-03-29 21:16:31 +00:00
Heejin Ahn 67f74aceab [WebAssembly] Handle END_LOOP in unreachable BB in CFGStackify
Summary:
This fixes crashes when a BB in which an END_LOOP is to be placed is
unreachable and does not have any predecessors. Fixes PR41307.

Reviewers: dschuff

Subscribers: yurydelendik, sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60004

llvm-svn: 357303
2019-03-29 19:36:51 +00:00
Matt Arsenault 055e4dce45 AMDGPU: Remove dx10-clamp from subtarget features
Since this can be set with s_setreg*, it should not be a subtarget
property. Set a default based on the calling convention, and introduce
a new amdgpu-dx10-clamp attribute to override this if desired.

Also introduce a new amdgpu-ieee attribute to match.
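
A hedged sketch of how the new attributes might appear on a function (the values shown are illustrative assumptions):

```
define float @helper(float %x) #0 {
  %r = fmul float %x, %x
  ret float %r
}

attributes #0 = { "amdgpu-dx10-clamp"="false" "amdgpu-ieee"="true" }
```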

The values need to match to allow inlining. I think it is OK for the
caller's dx10-clamp attribute to override the callee, but there
doesn't appear to be the infrastructure to do this currently without
defining the attribute in the generic Attributes.td.

Eventually the calling convention lowering will need to insert a mode
switch somewhere for these.

llvm-svn: 357302
2019-03-29 19:14:54 +00:00
Simon Pilgrim d395bc1cc2 [Hexagon] Remove fcmp undef from reduced tests
Pre-commit for D60006 (Add fcmp UNDEF handling to SelectionDAG::FoldSetCC)

Approved by @kparzysz (Krzysztof Parzyszek)

llvm-svn: 357301
2019-03-29 19:14:52 +00:00
Craig Topper 103fbbbfca [X86] Add test cases showing failure to use RMW form of negate when only flags are used. NFC
llvm-svn: 357300
2019-03-29 19:09:37 +00:00
Simon Pilgrim 759cbee744 [SystemZ] Regenerate double constant comparison test
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357295
2019-03-29 18:23:08 +00:00
Simon Pilgrim 05e2621342 [MIPS] Regenerate double constant comparison test
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357294
2019-03-29 18:22:18 +00:00
Simon Pilgrim a3fb3d5583 [ARM] Regenerate execute-only float comparison tests
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357293
2019-03-29 18:21:19 +00:00
Sanjay Patel 01c07b1a45 [InstCombine] autogenerate complete checks; NFC
llvm-svn: 357291
2019-03-29 17:51:39 +00:00
Sanjay Patel 2bff8b4272 [InstCombine] regenerate test checks; NFC
llvm-svn: 357288
2019-03-29 17:47:51 +00:00
Simon Pilgrim dee8a14389 [AArch64] Regenerate half precision tests
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357286
2019-03-29 17:46:06 +00:00
Nirav Dave fe59e14031 [DAGCombine] Prune unnused nodes.
Summary:
Nodes that have no uses are eventually pruned when they are selected
from the worklist. Record nodes newly added to the worklist or DAG and
perform pruning after every combine attempt.

Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight

Reviewed By: jyknight

Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58070

llvm-svn: 357283
2019-03-29 17:35:56 +00:00
Simon Pilgrim b4b98a528b [ARM] Regenerate vector comparison tests
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357281
2019-03-29 17:35:11 +00:00
Simon Pilgrim 4e00a93558 [X86] Fix some tests using fcmp with undef arguments
Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) 

llvm-svn: 357278
2019-03-29 17:20:27 +00:00
Jordan Rupprecht 871baa2551 [llvm-readobj] Add some generic notes (e.g. NT_VERSION)
Summary: Support reading notes that don't have a standard note name.

Reviewers: MaskRay

Reviewed By: MaskRay

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59969

llvm-svn: 357271
2019-03-29 16:48:19 +00:00
Jordan Rupprecht 342aaa14b1 [llvm-readelf] Allow prefix flags for -p and -x
Summary: This allows syntax like `llvm-readelf -p.data1 -x.data2`.

Reviewers: jhenderson

Reviewed By: jhenderson

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59965

llvm-svn: 357270
2019-03-29 16:43:13 +00:00
Simon Pilgrim 6a75c36ea9 [SLP] Add support for commutative icmp/fcmp predicates
For the cases where the icmp/fcmp predicate is commutative, use reorderInputsAccordingToOpcode to collect and commute the operands.

This requires a helper to recognise commutativity in both general Instruction and CmpInst types - CmpInst::isCommutative doesn't overload the Instruction::isCommutative method, for reasons I'm not clear on (maybe because it's based on the predicate, not the opcode?!?).

Differential Revision: https://reviews.llvm.org/D59992

llvm-svn: 357266
2019-03-29 15:28:25 +00:00
Simon Atanasyan f26f56d6d3 [mips] Fix lowering a signed immediate for *.d MSA instructions
The `lowerMSASplatImm` function zero-extends `i32` immediates while
building a constant. If the target type is `i64`, a negative immediate
loses its sign. As a result, for example, `__builtin_msa_ldi_d(-1)` was
lowered to a series of instructions loading the incorrect value 0xffffffff
into the `$w0` register instead of a single `ldi.d $w0, -1` instruction.

The fix zero-extends unsigned immediates and sign-extends signed
immediates.
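
A hedged IR-level sketch (the intrinsic name follows the usual MSA builtin mapping; the exact signature is assumed):

```
declare <2 x i64> @llvm.mips.ldi.d(i32)

define <2 x i64> @splat_minus_one() {
  ; Before the fix, the -1 was zero-extended while building the splat,
  ; yielding 0xffffffff per lane instead of -1.
  %r = call <2 x i64> @llvm.mips.ldi.d(i32 -1)
  ret <2 x i64> %r
}
```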

Differential Revision: http://reviews.llvm.org/D59884

llvm-svn: 357264
2019-03-29 15:15:22 +00:00
Dmitry Preobrazhensky d6827ce3a3 [AMDGPU][MC] Corrected conversion rules for inlinable constants to match rules for literals
See bug 40806: https://bugs.llvm.org/show_bug.cgi?id=40806

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D59786

llvm-svn: 357262
2019-03-29 14:50:20 +00:00
Sanjay Patel 12685d0f7c [DAGCombiner] simplify shuffle of shuffle
After investigating the examples from D59777 targeting an SSE4.1 machine,
it looks like a very different problem due to how we map illegal types (256-bit in these cases).

We're missing a shuffle simplification that maps elements of a vector back to a shuffled operand.
We have a more general version of this transform in DAGCombiner::visitVECTOR_SHUFFLE(), but that
generality means it is limited to patterns with a one-use constraint, and the examples here have
2 uses. We don't need any uses or legality limitations for a simplification (no new value is
created).

It looks like we miss this pattern in IR too.

In one of the zext examples here, we have shuffle masks like this:

Shuf0 = vector_shuffle<0,u,3,7,0,u,3,7>
Shuf = vector_shuffle<4,u,6,7,u,u,u,u>

...so that's moving the high half of the 1st vector into the low half. But the high half of the
1st vector is already identical to the low half.

Differential Revision: https://reviews.llvm.org/D59961

llvm-svn: 357258
2019-03-29 14:20:38 +00:00
Nirav Dave 9259de217e [DAGCombine] Improve Lifetime node chains.
Improve both start and end lifetime nodes chain dependencies.

Reviewers: courbet

Reviewed By: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59795

llvm-svn: 357256
2019-03-29 14:09:47 +00:00
Sanjay Patel 665a385035 [DAGCombiner] fold sext into decrement
This is a sibling to rL357178 that I noticed we'd hit if we chose
an alternate transform in D59818.

  %z = zext i8 %x to i32
  %dec = add i32 %z, -1
  %r = sext i32 %dec to i64
  =>
  %z2 = zext i8 %x to i64
  %r = add i64 %z2, -1

https://rise4fun.com/Alive/kPP

The x86 vector diffs show a slight regression, so there's a chance
that we should limit this and the previous transform to scalars.

But given that we allowed vectors before, I'm matching that behavior
here. We should change both transforms together if that's the right
thing to do.

llvm-svn: 357254
2019-03-29 13:49:08 +00:00
Hans Wennborg 800b12f90a Switch lowering: exploit unreachable fall-through when lowering case range cluster
In the example below, we would previously emit two range checks, one for cases
1--3 and one for 4--6. This patch makes us exploit the fact that the
fall-through is unreachable and only one range check is necessary.

  switch i32 %i, label %default [
    i32 1,  label %bb1
    i32 2,  label %bb1
    i32 3,  label %bb1
    i32 4,  label %bb2
    i32 5,  label %bb2
    i32 6,  label %bb2
  ]
  default: unreachable

llvm-svn: 357252
2019-03-29 13:40:05 +00:00
Sanjay Patel 881bcbe094 [x86] add tests for decrement+sext; NFC
llvm-svn: 357251
2019-03-29 13:34:48 +00:00
Dmitry Preobrazhensky 7f33574be3 [AMDGPU][MC] Corrected handling of tied src for atomic return MUBUF opcodes
See bug 40917: https://bugs.llvm.org/show_bug.cgi?id=40917

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D59878

llvm-svn: 357249
2019-03-29 12:16:04 +00:00
Andrea Di Biagio e074ac60b4 [MCA] Add an experimental MicroOpQueue stage.
This patch adds an experimental stage named MicroOpQueueStage.
MicroOpQueueStage can be used to simulate a hardware micro-op queue (basically,
a decoupling queue between 'decode' and 'dispatch').  Users can specify a queue
size, as well as an optional MaxIPC (which - in the absence of a "Decoders" stage
- can be used to simulate a different throughput from the decoders).

This stage is added to the default pipeline between the EntryStage and the
DispatchStage only if PipelineOption::MicroOpQueue is different than zero. By
default, llvm-mca sets PipelineOption::MicroOpQueue to the value of hidden flag
-micro-op-queue-size.

Throughput from the decoder can be simulated via another hidden flag named
-decoder-throughput.  That flag allows us to quickly experiment with different
frontend throughputs.  For targets that declare a loop buffer, flag
-decoder-throughput allows users to do multiple runs, each time simulating a
different throughput from the decoders.

This stage can/will be extended in future. For example, we could add a "buffer
full" event to notify bottlenecks caused by backpressure. flag
-decoder-throughput would probably go away if in future we delegate to another
stage (DecoderStage?) the simulation of a (potentially variable) throughput from
the decoders. For now, flag -decoder-throughput is "good enough" to run some
simple experiments.

Differential Revision: https://reviews.llvm.org/D59928

llvm-svn: 357248
2019-03-29 12:15:37 +00:00
Konstantin Zhuravlyov 2b766ed774 AMDGPU: Make sram-ecc off by default for Vega20
Differential Revision: https://reviews.llvm.org/D59718

llvm-svn: 357247
2019-03-29 12:04:18 +00:00
James Henderson 814ab373ac [llvm-readelf]Merge dynamic and static relocation printing to avoid code duplication
The majority of the printRelocation and printDynamicRelocation functions
were identical. This patch factors this all out into a new function.
There are a couple of minor differences to do with printing of symbols
without names, but I think these are harmless, and in some cases a small
improvement.

Reviewed by: grimar, rupprecht, Higuoxing

Differential Revision: https://reviews.llvm.org/D59823

llvm-svn: 357246
2019-03-29 11:47:19 +00:00
Simon Pilgrim aeaf7fcdde [X86] Add X86TargetLowering::isCommutativeBinOp override.
We currently just have test coverage for PMULUDQ - will add more in the future.

llvm-svn: 357244
2019-03-29 11:25:58 +00:00
Simon Pilgrim 62f0d1650a [SLP] Add support for swapping icmp/fcmp predicates to permit vectorization
We should be able to match elements with the swapped predicate as well - as long as we commute the source operands.

Differential Revision: https://reviews.llvm.org/D59956

llvm-svn: 357243
2019-03-29 10:41:00 +00:00
Kang Zhang 05f78b35ae [PowerPC] Add the support for __builtin_setrnd()
Summary:
PowerPC64/PowerPC64le supports the builtin function __builtin_setrnd to set the floating point rounding mode. This function uses the least significant two bits of the integer argument to select the rounding mode.
double __builtin_setrnd(int mode);
The effective values for mode are:
0 - round to nearest
1 - round to zero
2 - round to +infinity
3 - round to -infinity
Note that the mode argument is taken modulo 4, so if the int argument is greater than 3, only its least significant two bits are used. For example, __builtin_setrnd(102) is equal to __builtin_setrnd(2).
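
A hedged sketch of the corresponding IR (the intrinsic name is assumed from the patch description):

```
declare double @llvm.ppc.setrnd(i32)

define double @set_round_to_zero() {
  %r = call double @llvm.ppc.setrnd(i32 1) ; mode 1: round to zero
  ret double %r
}
```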

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D59405

llvm-svn: 357241
2019-03-29 08:45:24 +00:00
Matt Arsenault 5fddf09187 AMDGPU/GlobalISel: Insert waterfall loop for vector indexing
The register index can only really be an SGPR. Lie that a VGPR index
is legal, and then rewrite the instruction in a waterfall loop to
handle the index.

llvm-svn: 357235
2019-03-29 03:54:56 +00:00
Zi Xuan Wu 1445b77e8c [PowerPC] Strength reduction of multiply by a constant by shift and add/sub in place
A shift and add/sub sequence is faster than a multiply by a constant.
Because the latency of a multiply is not huge, we only consider the following
patterns worthwhile.

```
(mul x, 2^N + 1) => (add (shl x, N), x)
(mul x, -(2^N + 1)) => -(add (shl x, N), x)
(mul x, 2^N - 1) => (sub (shl x, N), x)
(mul x, -(2^N - 1)) => (sub x, (shl x, N))
```
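
For instance (a hedged sketch, not from the commit), a multiply by 17 = 2^4 + 1 fits the first pattern:

```
define i64 @mul17(i64 %x) {
  %m = mul i64 %x, 17   ; => (add (shl x, 4), x)
  ret i64 %m
}
```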

The multiply latency is subtarget-dependent, so we need to consider the
subtarget when deciding whether to perform the transformation.
The data type is also considered, since multiply latency differs between types.

Differential Revision: https://reviews.llvm.org/D58950

llvm-svn: 357233
2019-03-29 03:08:39 +00:00
Thomas Lively 3f34e1b883 [WebAssembly] Merge used feature sets, update atomics linkage policy
Summary:
It does not currently make sense to use WebAssembly features in some functions
but not others, so this CL adds an IR pass that takes the union of all used
feature sets and applies it to each function in the module. This allows us to
prevent atomics from being lowered away if some function has opted in to using
them. When atomics is not enabled anywhere, we detect whether there exist any
atomic operations or thread local storage that would be stripped and disallow
linking with objects that contain atomics if and only if atomics or tls are
stripped. When atomics is enabled, mark it as used but do not require it of
other objects in the link. These changes allow libraries that do not use atomics
to be built once and linked into both single-threaded and multithreaded
binaries.

Reviewers: aheejin, sbc100, dschuff

Subscribers: jgravelle-google, hiraditya, sunfish, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59625

llvm-svn: 357226
2019-03-29 00:14:01 +00:00
Jordan Rupprecht 1dc28b6d2b [llvm-readobj] Fix formatting of unknown note types
llvm-svn: 357221
2019-03-28 23:08:06 +00:00
Puyan Lotfi 6c82695753 [yaml2obj] Fixing opening empty yaml files.
Essentially echo "" | yaml2obj crashes. This patch attempts to trim whitespace
and determine if the yaml string in the file is empty or not. If the input is
empty then it will not properly print out an error message and return an error
code.

Differential Revision: https://reviews.llvm.org/D59964

A    test/tools/yaml2obj/empty.yaml
M    tools/yaml2obj/yaml2obj.cpp

llvm-svn: 357219
2019-03-28 22:55:08 +00:00
Florian Hahn 45682fd633 [LSR] Fix signed overflow in GenerateCrossUseConstantOffsets.
For the attached test case, unchecked addition of immediate starts and
ends overflows, as they can be arbitrary i64 constants.

Proof: https://rise4fun.com/Alive/Plqc

Reviewers: qcolombet, gilr, efriedma

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D59218

llvm-svn: 357217
2019-03-28 22:17:29 +00:00
Yonghong Song 360a4e2ca6 [BPF] add proper multi-dimensional array support
For multi-dimensional array like below
  int a[2][3];
the previous implementation generates BTF_KIND_ARRAY type
like below:
  . element_type: int
  . index_type: unsigned int
  . number of elements: 6

This is not the best way to represent arrays, especially
when converting BTF back to headers, where users will see
  int a[6];
instead.

This patch generates proper support for multi-dimensional arrays.
For "int a[2][3]", the two BTF_KIND_ARRAY types will be
generated:
  Type #n:
    . element_type: int
    . index_type: unsigned int
    . number of elements: 3
  Type #(n+1):
    . element_type: #n
    . index_type: unsigned int
    . number of elements: 2

The linux kernel already supports such a multi-dimensional
array representation properly.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D59943

llvm-svn: 357215
2019-03-28 21:59:49 +00:00
Eli Friedman 3dd72ea810 [MC] Fix floating-point literal lexing.
This patch has three related fixes to improve float literal lexing:

1. Make AsmLexer::LexDigit handle floats without a decimal point more
   consistently.
2. Make AsmLexer::LexFloatLiteral print an error for floats which are
   apparently missing an "e".
3. Make APFloat::convertFromString use binutils-compatible exponent
   parsing.

Together, this fixes some cases where a float would be incorrectly
rejected, fixes some cases where the compiler would crash, and improves
diagnostics in some cases.

Patch by Brandon Jones.

Differential Revision: https://reviews.llvm.org/D57321

llvm-svn: 357214
2019-03-28 21:12:28 +00:00
Eli Friedman 96f295e23b [InterleavedAccessPass] Don't increase the number of bytes loaded.
Even if the interleaving transform would otherwise be legal, we shouldn't
introduce an interleaved load that is wider than the original load: it might
have undefined behavior.
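
A hedged sketch of the hazard (hypothetical, loosely modelled on the PR41245 shape): the stride-4 shuffle below only needs elements of the original 12-element load, so rewriting it as a full 4-field interleaved load of 16 elements would read memory the original load never touched:

```
define <3 x i32> @field0(<12 x i32>* %p) {
  %wide = load <12 x i32>, <12 x i32>* %p
  %f0 = shufflevector <12 x i32> %wide, <12 x i32> undef, <3 x i32> <i32 0, i32 4, i32 8>
  ret <3 x i32> %f0
}
```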

It might be possible to perform some sort of mask-narrowing transform in
some cases (using a narrower interleaved load, then extending the
results using shufflevectors).  But I haven't tried to implement that,
at least for now.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41245 .

Differential Revision: https://reviews.llvm.org/D59954

llvm-svn: 357212
2019-03-28 20:44:50 +00:00
Simon Pilgrim ceb3de5d25 [SLP][X86] Add tests showing failure to commute icmp/fcmp by swapping predicate
By swapping icmp/fcmp predicates we can commute their operands to improve vectorization

llvm-svn: 357204
2019-03-28 19:13:38 +00:00
Simon Pilgrim 66b5e322fc [SLP][X86] Add tests showing failure to commute icmp/fcmp operands
Some predicates are fully commutative - we should be able to easily commute their operands to improve vectorization

llvm-svn: 357202
2019-03-28 19:03:53 +00:00
Craig Topper c25c9b4d16 [X86] Teach the isel optimization for (x << C1) op C2 to (x op (C2>>C1)) << C1 to consider cases where C2>>C1 can fit an unsigned 32-bit immediate
For 64-bit operations we should consider if the immediate can be made to fit
in an unsigned 32-bit immediate. For OR/XOR this allows us to load the immediate
with MOV32ri instead of movabsq. For AND this allows us to fold the immediate.
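
A hedged example (constants chosen purely for illustration): with C1 = 8 and C2 = 0x1122334400, C2 >> C1 = 0x11223344 fits in an unsigned 32-bit immediate:

```
define i64 @shl_then_and(i64 %x) {
  %s = shl i64 %x, 8
  ; => (shl (and %x, 0x11223344), 8), avoiding a movabsq for the mask
  %a = and i64 %s, 73588229120 ; 0x1122334400
  ret i64 %a
}
```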

Differential Revision: https://reviews.llvm.org/D59867

llvm-svn: 357196
2019-03-28 18:05:37 +00:00
Petar Avramovic 1af05df3de [MIPS GlobalISel] Select float constants
Select 32 and 64 bit float constants for MIPS32.

Differential Revision: https://reviews.llvm.org/D59933

llvm-svn: 357183
2019-03-28 16:58:12 +00:00
Sanjay Patel ffa8d3def7 [DAGCombiner] fold sext into negation
As noted in D59818:
  %z = zext i8 %x to i32
  %neg = sub i32 0, %z
  %r = sext i32 %neg to i64
  =>
  %z2 = zext i8 %x to i64
  %r = sub i64 0, %z2

https://rise4fun.com/Alive/KzSR

llvm-svn: 357178
2019-03-28 15:46:02 +00:00
Sanjay Patel e781528278 [x86] add vector test for sext of negate; NFC
llvm-svn: 357177
2019-03-28 15:30:09 +00:00
Sanjay Patel 5bbf6f0bd8 [x86] avoid cmov in movmsk reduction
This is probably the least important of our movmsk problems, but I'm starting
at the bottom to reduce distractions.

We were creating a select_cc which bypasses the select and bitmask codegen
optimizations that we have now. If we produce a compare+negate instead, we
allow things like neg/sbb carry bit hacks, and in all cases we avoid a cmov.
There's no partial register update danger in these sequences because we always
produce the zero-register xor ahead of the 'set' if needed.

There seems to be a missing fold for sext of a bool bit here:

negl %ecx
movslq %ecx, %rax

...but that's an independent transform.

Differential Revision: https://reviews.llvm.org/D59818

llvm-svn: 357172
2019-03-28 14:16:13 +00:00
Clement Courbet 699dc025a6 [X86MacroFusion] Handle branch fusion (AMD CPUs).
Summary:
This adds a BranchFusion feature to replace the use of MacroFusion
for AMD CPUs.

See D59688 for context.

Reviewers: andreadb, lebedev.ri

Subscribers: hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59872

llvm-svn: 357171
2019-03-28 14:12:46 +00:00
Matt Arsenault a353fd572a AMDGPU: Make exec mask optimizations more resistant to block splits
Also improve the check for SALU instructions to ignore
implicit_def and other fake instructions.

llvm-svn: 357170
2019-03-28 14:01:39 +00:00
Roman Lebedev c325be6cef [X86] AMD Piledriver (BdVer2): fine-tune some latencies
Based on llvm-exegesis measurements.

Now that llvm-exegesis is ~2 orders of magnitude faster, and a bit smarter,
it is now possible to continue cleanup of the scheduler model.

With this, there are no more latency inconsistencies for the
opcodes that produce stable measurements, and only a few inconsistencies
for unstable measurements (MMX_* opcodes, opcodes that llvm-exegesis
measures by chaining - CMP, TEST, BT, SETcc, CVT, MOV, etc.)

llvm-svn: 357169
2019-03-28 13:40:34 +00:00
Simon Pilgrim 38a0616c1d [DAGCombiner] Fold truncate(build_vector(x,y)) -> build_vector(truncate(x),truncate(y))
If scalar truncates are free, attempt to pre-truncate a build_vector's source operands.

Only attempt to do this before legalization as we often end up with truncations/extensions during build_vector lowering.
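
A rough IR analogue of the DAG-level fold (hedged; build_vector is a DAG node, approximated here with an insertelement chain):

```
define <2 x i32> @build_then_trunc(i64 %a, i64 %b) {
  %v0 = insertelement <2 x i64> undef, i64 %a, i32 0
  %v1 = insertelement <2 x i64> %v0, i64 %b, i32 1
  ; If scalar truncates are free, truncate %a and %b first and build
  ; the <2 x i32> vector directly.
  %t = trunc <2 x i64> %v1 to <2 x i32>
  ret <2 x i32> %t
}
```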

Differential Revision: https://reviews.llvm.org/D59654

llvm-svn: 357161
2019-03-28 11:34:21 +00:00
Diana Picus 13ef0c5309 [ARM GlobalISel] Run regbankselect test for Thumb. NFCI
This should just work, since ARM mode and Thumb2 mode are at the same
level of support now and should map the same to GPR and FPR.

llvm-svn: 357159
2019-03-28 10:57:29 +00:00
George Rimar 4111299584 [yaml2obj][obj2yaml] - Teach yaml2obj/obj2yaml tools about STB_GNU_UNIQUE symbols.
yaml2obj/obj2yaml does not support the symbols with STB_GNU_UNIQUE yet.
Currently, obj2yaml fails with llvm_unreachable when met such a symbol.

I faced it when investigated the https://bugs.llvm.org/show_bug.cgi?id=41196.

Differential revision: https://reviews.llvm.org/D59875

llvm-svn: 357158
2019-03-28 10:52:14 +00:00
Pierre Gousseau a833c2bd3e [asan] Add options -asan-detect-invalid-pointer-cmp and -asan-detect-invalid-pointer-sub options.
This is in preparation for a driver patch to add gcc 8's -fsanitize=pointer-compare and -fsanitize=pointer-subtract.
Disabled by default as this is still an experimental feature.

Reviewed By: morehouse, vitalybuka

Differential Revision: https://reviews.llvm.org/D59220

llvm-svn: 357157
2019-03-28 10:51:24 +00:00
Florian Hahn e21ed594d8 [VPlan] Determine Vector Width programmatically.
With this change, the VPlan native path is triggered with the directive:

   #pragma clang loop vectorize(enable)

There is no need to specify the vectorize_width(N) clause.

Patch by Francesco Petrogalli <francesco.petrogalli@arm.com>

Differential Revision: https://reviews.llvm.org/D57598

llvm-svn: 357156
2019-03-28 10:37:12 +00:00
Simon Pilgrim 22be913ac0 [X86][AVX] Add missing vXi16 broadcast fold patterns
Now that D59484 has landed it's easier to add these.

Added missing AVX512BW v32i16 equivalents while I was at it.

llvm-svn: 357155
2019-03-28 10:25:13 +00:00
Diana Picus 52495c472f [ARM GlobalISel] Fix G_STORE with s1
G_STORE for 1-bit values uses a STRBi12, which stores the whole byte.
Zero out the undefined bits before writing.

llvm-svn: 357154
2019-03-28 09:09:36 +00:00
Diana Picus 4d512df300 [ARM GlobalISel] Fix selection of G_SELECT
G_SELECT uses a 1-bit scalar for the condition, and is currently
implemented with a plain CMPri against 0. This means that values such as
0x1110 are interpreted as true, when instead the higher bits should be
treated as undefined and therefore ignored. Replace the CMPri with a
TSTri against 0x1, which performs an implicit AND, yielding the expected
result.

llvm-svn: 357153
2019-03-28 09:09:27 +00:00
Roman Lebedev c2423fe689 [llvm-exegesis] Introduce a 'naive' clustering algorithm (PR40880)
Summary:
This is an alternative to D59539.

Let's suppose we have measured 4 different opcodes, and got: `0.5`, `1.0`, `1.5`, `2.0`.
Let's suppose we are using `-analysis-clustering-epsilon=0.5`.
By default now we will start processing the `0.5` point, find that `1.0` is its neighbor, add them to a new cluster.
Then we will notice that `1.5` is a neighbor of `1.0` and add it to that same cluster.
Then we will notice that `2.0` is a neighbor of `1.5` and add it to that same cluster.
So all these points ended up in the same cluster.
This may or may not be a correct implementation of dbscan clustering algorithm.

But this is rather horribly broken when it comes to comparing the clusters with the LLVM sched data.
Let's suppose all those opcodes are currently in the same sched cluster.
If i specify `-analysis-inconsistency-epsilon=0.5`, then no matter
the LLVM values this cluster will **never** match the LLVM values,
and thus this cluster will **always** be displayed as inconsistent.

The solution is obviously to split off some of these opcodes into different sched cluster.
But how do i do that? Out of 4 opcodes displayed in the inconsistency report,
which ones are the "bad ones"? Which ones are the most different from the checked-in data?
I'd need to go in to the `.yaml` and look it up manually.

The trivial solution is, when creating clusters, not to use the full dbscan algorithm,
but instead "pick some unclustered point, pick all unclustered points that are its neighbors,
put them all into a new cluster, repeat". And as it happens, we can arrive
at that algorithm by not performing the "add neighbors of a neighbor to the cluster" step.

But that won't work well once we teach analyze mode to operate in non-1D mode
(i.e. on more than a single measurement type at a time), because the clustering would
depend on the order of the measurements.

Instead, let's just create a single cluster per opcode, and put all the points of that opcode into said cluster.
And simultaneously check that every point in that cluster is a neighbor of every other point in the cluster,
and if they are not, the cluster (==opcode) is unstable.

This is //yet another// step to bring me closer to being able to continue cleanup of bdver2 sched model..

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40880 | PR40880 ]].

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, jdoerfert, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59820

llvm-svn: 357152
2019-03-28 08:55:01 +00:00
Piotr Sobczak f896785cb7 [SelectionDAG] Add 2 tests for selection across basic blocks
Summary:
Add tests for selection across basic block boundary:
 * one test containing a buffer load, where part of the offset
   computation is placed in the predecessor of the load
 * similar test, but containing two buffer loads and shared
   computations

Please note that the behaviour being tested will be updated in
a subsequent commit.

This commit was extracted from https://reviews.llvm.org/D59535.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: jvesely, nhaehnle, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59690

llvm-svn: 357149
2019-03-28 07:06:26 +00:00
Eric Christopher 0a2d0c1f5f Add reproduction instructions to llvm-objdump's embedded source test.
llvm-svn: 357142
2019-03-28 01:56:16 +00:00
Chandler Carruth 923ff550b9 [NewPM] Fix a nasty bug with analysis invalidation in the new PM.
The issue here is that we actually allow CGSCC passes to mutate IR (and
therefore invalidate analyses) outside of the current SCC. At a minimum,
we need to support mutating parent and ancestor SCCs to support the
ArgumentPromotion pass which rewrites all calls to a function.

However, the analysis invalidation infrastructure is heavily based
around not needing to invalidate the same IR-unit at multiple levels.
With Loop passes for example, they don't invalidate other Loops. So we
need to customize how we handle CGSCC invalidation. Doing this without
gratuitously re-running analyses is even harder. I've avoided most of
these by using an out-of-band preserved set to accumulate the cross-SCC
invalidation, but it still isn't perfect in the case of re-visiting the
same SCC repeatedly as it comes off the worklist. It's unclear how
important this use case really is, but I wanted to call it out.

Another wrinkle is that in order for this to successfully propagate to
function analyses, we have to make sure we have a proxy from the SCC to
the Function level. That requires pre-creating the necessary proxy.

The motivating test case now works cleanly and is added for
ArgumentPromotion.

Thanks for the review from Philip and Wei!

Differential Revision: https://reviews.llvm.org/D59869

llvm-svn: 357137
2019-03-28 00:51:36 +00:00
Craig Topper 929932954d [X86] Add test cases from PR27202.
llvm-svn: 357132
2019-03-27 23:12:19 +00:00
Sanjay Patel 1df0bb6264 [x86] improve AVX lowering of vector zext
If we know the 2 halves of an oversized zext-in-reg are the same,
don't create those halves independently.
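
A hedged sketch (types chosen for illustration) of an oversized zext whose result is wider than a legal 128-bit register on AVX1 and is therefore produced as two halves:

```
define <8 x i32> @widen(<8 x i16> %x) {
  ; The two 128-bit halves of the <8 x i32> come from the same source,
  ; so they need not be created independently.
  %z = zext <8 x i16> %x to <8 x i32>
  ret <8 x i32> %z
}
```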

I tried several different approaches to fold this, but it's difficult
to get right during legalization. In the default path, we are creating
a generic shuffle that looks like an unpack high, but it can get
transformed into a different mask (a blend), so it's not
straightforward to match that. If we try to fold after it actually
becomes an X86ISD::UNPCKH node, we can't be sure what the operand node
is - it might be a generic shuffle, or it could be some x86-specific op.

From the test output, we should be doing something like this for SSE4.1
as well, but I'd rather leave that as a follow-up since it involves
changing lowering actions.

Differential Revision: https://reviews.llvm.org/D59777

llvm-svn: 357129
2019-03-27 22:42:11 +00:00