llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	9ffd8b5a6f	AMDGPU/GlobalISel: Remove unnecessary REQUIREs This has been a mandatory part of the build for a while. llvm-svn: 361956	2019-05-29 13:14:35 +00:00
Peter Collingbourne	31fda09b2d	Add IR support, ELF section and user documentation for partitioning feature. The partitioning feature was proposed here: http://lists.llvm.org/pipermail/llvm-dev/2019-February/130583.html This is mostly just documentation. The feature itself will be contributed in subsequent patches. Differential Revision: https://reviews.llvm.org/D60242 llvm-svn: 361923	2019-05-29 03:29:01 +00:00
Sanjay Patel	19f703e0d7	[AArch64] auto-generate complete test checks; NFC llvm-svn: 361908	2019-05-29 01:37:44 +00:00
Sanjay Patel	860736cc3c	[AArch64] auto-generate complete test checks; NFC llvm-svn: 361906	2019-05-29 01:35:10 +00:00
Quentin Colombet	a6f57ad2c9	[RegUsageInfoCollector] Don't mark as saved registers that don't have subregister lanes To determine the list of clobbered registers, the RegUsageInfoCollector pass uses the list of callee saved registers provided by the target and then augments it with the list of registers which have all their subregisters saved. It then takes the difference between all the registers and the saved registers to determine what is clobbered (it also checks that the register is defined within that function). The patch fixes a bug where a register has no subregister lanes: when checking whether any of its subregisters are not saved, we would find none and conclude that the register is saved as well. That's obviously wrong. The code was actually kind of checking for something like that with the CoveredBySubRegs bit. What this bit says is that a register is completely covered by its subregisters. We required that this bit was set, to check that a register was saved by its subregister lanes, since without this bit, we could fail to check some part of the register. However, this bit is used de facto on registers that don't have any subregisters (e.g., on ARM) and the code was not prepared for that. This patch fixes this by checking that a register has subregisters before declaring it saved when none of its lanes are modified. llvm-svn: 361901	2019-05-28 23:43:12 +00:00
Jessica Paquette	b73ea75b38	[AArch64][GlobalISel] Select FCMPSri/FCMPDri when comparing against 0.0 Add support for selecting FCMPSri and FCMPDri when comparing against 0.0, and factor out opcode selection for G_FCMP into its own function. Add a test to show that we don't do this with other immediates. Differential Revision: https://reviews.llvm.org/D62539 llvm-svn: 361888	2019-05-28 22:52:49 +00:00
Heejin Ahn	5514658591	[WebAssembly] Support for atomic fences Summary: This adds support for translation of LLVM IR fence instruction. We convert a singlethread fence to a pseudo compiler barrier which becomes 0 instructions in final binary, and a thread fence to an idempotent atomicrmw instruction to a memory address. Reviewers: dschuff, jfb, sunfish, tlively Subscribers: sbc100, jgravelle-google, llvm-commits Differential Revision: https://reviews.llvm.org/D50277 llvm-svn: 361884	2019-05-28 22:09:12 +00:00
Konstantin Zhuravlyov	fe23ed2c68	AMDGPU: Temporarily drop s_mul_hi_i/u32 patterns It introduces performance regressions in several applications. This has already been submitted downstream. llvm-svn: 361879	2019-05-28 21:18:34 +00:00
Adhemerval Zanella	34d8daae53	[AArch64] Handle ISD::LRINT and ISD::LLRINT This patch optimizes ISD::LRINT and ISD::LLRINT to frintx plus fcvtzs. It currently only handles the scalar version. Reviewed By: SjoerdMeijer, mstorsjo Differential Revision: https://reviews.llvm.org/D62018 llvm-svn: 361877	2019-05-28 21:04:29 +00:00
Adhemerval Zanella	6d7bf5e8df	[CodeGen] Add lrint/llrint builtins This patch adds ISD::LRINT and ISD::LLRINT along with new intrinsics. The changes are straightforward, as for other floating-point rounding functions, with just some adjustments required to handle the return value being an integer. The idea is to optimize lrint/llrint generation for AArch64 in a subsequent patch. The current semantics simply route the call to the libm symbol. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D62017 llvm-svn: 361875	2019-05-28 20:47:44 +00:00
Roman Lebedev	dfc34f0211	[DAGCombine] (x - C) - y -> (x - y) - C fold. Try 2 Summary: Again only vectors affected. Frustrating. Let me take a look into that.. https://rise4fun.com/Alive/AAq This is a recommit, originally committed in rL361856, but reverted to investigate test-suite compile-time hangs. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62294 llvm-svn: 361874	2019-05-28 20:40:10 +00:00
Roman Lebedev	d485c6bc9f	[DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 2 Summary: This prevents regressions in next patch, and somewhat recovers from the regression to AMDGPU test in D62223. It is indeed not great that we leave vector decrement, don't transform it into vector add all-ones.. https://rise4fun.com/Alive/ZRl This is a recommit, originally committed in rL361855, but reverted to investigate test-suite compile-time hangs. Reviewers: RKSimon, craig.topper, spatel, arsenm Reviewed By: RKSimon, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62263 llvm-svn: 361873	2019-05-28 20:40:03 +00:00
Roman Lebedev	96c9986199	[DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 2 Summary: Direct sibling of D62223 patch. While i don't have a direct motivational pattern for this, it would seem to make sense to handle both patterns (or none), for symmetry? The aarch64 changes look neutral; sparc and systemz look like improvement (one less instruction each); x86 changes - 32bit case improves, 64bit case shows that LEA no longer gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea` https://rise4fun.com/Alive/ffh This is a recommit, originally committed in rL361853, but reverted to investigate test-suite compile-time hangs. Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62252 llvm-svn: 361872	2019-05-28 20:39:55 +00:00
Roman Lebedev	2feb7e56e2	[DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 2 Summary: The main motivation is shown by all these `neg` instructions that are now created. In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test. AArch64 test changes all look good (`neg` created), or neutral. X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created). I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill is now hoisted into preheader (which should still be good?), 2 4-byte reloads become 1 8-byte reload, and are elsewhere, but i'm not sure how that affects that loop. I'm unable to interpret AMDGPU change, looks neutral-ish? This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 \| PR41952 ]]. https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later) This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs. Reviewers: craig.topper, RKSimon, spatel, arsenm Reviewed By: RKSimon Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62223 llvm-svn: 361871	2019-05-28 20:39:39 +00:00
Michael Liao	5fc1dfa784	[AMDGPU] Correct the handling of inlineasm output registers. Summary: - There's a regression due to the cross-block RC assignment. Use the proper way to derive the output register RC in inline asm. Reviewers: rampitec, alex-t Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, eraman, hiraditya, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D62537 llvm-svn: 361868	2019-05-28 19:37:09 +00:00
Roman Lebedev	272d70c366	Revert DAGCombine "hoist binop with const" folds Appear to introduce test-suite compile-time hang. http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/22825 This reverts r361852,r361853,r361854,r361855,r361856 llvm-svn: 361865	2019-05-28 19:04:21 +00:00
Roman Lebedev	caeec8501e	[NFC][MIPS] Autogenerate madd-msub.ll test Being affected by upcoming patch llvm-svn: 361860	2019-05-28 18:31:36 +00:00
Roman Lebedev	7669665432	[DAGCombine] (x - C) - y -> (x - y) - C fold Summary: Again only vectors affected. Frustrating. Let me take a look into that.. https://rise4fun.com/Alive/AAq Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62294 llvm-svn: 361856	2019-05-28 17:54:21 +00:00
Roman Lebedev	8c9b3e4e4a	[DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold Summary: This prevents regressions in next patch, and somewhat recovers from the regression to AMDGPU test in D62223. It is indeed not great that we leave vector decrement, don't transform it into vector add all-ones.. https://rise4fun.com/Alive/ZRl Reviewers: RKSimon, craig.topper, spatel, arsenm Reviewed By: RKSimon, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62263 llvm-svn: 361855	2019-05-28 17:54:13 +00:00
Roman Lebedev	6a24c9b9ab	[DAGCombiner][X86][AArch64] (x - C) + y -> (x + y) - C fold Summary: Only vector tests are being affected here, since subtraction by scalar constant is rewritten as addition by negated constant. No surprising test changes. https://rise4fun.com/Alive/pbT Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62257 llvm-svn: 361854	2019-05-28 17:54:04 +00:00
Roman Lebedev	1499f65ac1	[DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold Summary: Direct sibling of D62223 patch. While i don't have a direct motivational pattern for this, it would seem to make sense to handle both patterns (or none), for symmetry? The aarch64 changes look neutral; sparc and systemz look like improvement (one less instruction each); x86 changes - 32bit case improves, 64bit case shows that LEA no longer gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea` https://rise4fun.com/Alive/ffh Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62252 llvm-svn: 361853	2019-05-28 17:53:54 +00:00
Roman Lebedev	19f51ec04a	[DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold Summary: The main motivation is shown by all these `neg` instructions that are now created. In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test. AArch64 test changes all look good (`neg` created), or neutral. X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created). I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill is now hoisted into preheader (which should still be good?), 2 4-byte reloads become 1 8-byte reload, and are elsewhere, but i'm not sure how that affects that loop. I'm unable to interpret AMDGPU change, looks neutral-ish? This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 \| PR41952 ]]. https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later) Reviewers: craig.topper, RKSimon, spatel, arsenm Reviewed By: RKSimon Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62223 llvm-svn: 361852	2019-05-28 17:53:43 +00:00
Sanjay Patel	f7980e727f	Revert "[x86] split 256-bit store of concatenated vectors" This reverts commit `d5a8637072`. Most likely suspect for this bot failure: http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/9684 llvm-svn: 361850	2019-05-28 17:37:58 +00:00
Matt Arsenault	24e80b8d04	AMDGPU: Don't enable all lanes with non-CSR VGPR spills If the only VGPRs used for SGPR spilling were not CSRs, this was enabling all lanes and immediately restoring exec. This is the usual situation in leaf functions. llvm-svn: 361848	2019-05-28 16:46:02 +00:00
Michael Liao	7166843f1e	[AMDGPU] Fix the mis-handling of `vreg_1` copied from scalar register. Summary: - Don't treat the use of a scalar register as `vreg_1` as a VGPR usage. Otherwise, that promotes that scalar register into a vector one, which breaks the assumption that the scalar register holds the lane mask. - The issue is triggered in a complicated case, where the uses of that (lane mask) scalar register are legalized before its definition, e.g., due to a mismatch between block placement and topological order, or due to a loop. In that case, the legalization of PHI introduces the use of that scalar register as `vreg_1`. Reviewers: rampitec, nhaehnle, arsenm, alex-t Subscribers: kzhuravl, jvesely, wdng, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D62492 llvm-svn: 361847	2019-05-28 16:29:39 +00:00
Simon Tatham	760df47b77	[ARM] Replace fp-only-sp and d16 with fp64 and d32. Those two subtarget features were awkward because their semantics are reversed: each one indicates the _lack_ of support for something in the architecture, rather than the presence. As a consequence, you don't get the behavior you want if you combine two sets of feature bits. Each SubtargetFeature for an FP architecture version now comes in four versions, one for each combination of those options. So you can still say (for example) '+vfp2' in a feature string and it will mean what it's always meant, but there's a new string '+vfp2d16sp' meaning the version without those extra options. A lot of this change is just mechanically replacing positive checks for the old features with negative checks for the new ones. But one more interesting change is that I've rearranged getFPUFeatures() so that the main FPU feature is appended to the output list before rather than after the features derived from the Restriction field, so that -fp64 and -d32 can override defaults added by the main feature. Reviewers: dmgreen, samparker, SjoerdMeijer Subscribers: srhines, javed.absar, eraman, kristof.beyls, hiraditya, zzheng, Petar.Avramovic, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D60691 llvm-svn: 361845	2019-05-28 16:13:20 +00:00
David Greene	561fcc0d63	[X86-64] Fix 256-bit SET0 lowering for non-VLX targets If we don't have VLX then 256-bit SET0 should be lowered to VPXOR with ZMM registers. This restores functionality accidentally removed by r309926. Differential Revision: https://reviews.llvm.org/D62415 llvm-svn: 361843	2019-05-28 15:37:01 +00:00
Kevin P. Neal	71f8f745b4	Revert 361827. It broke the bots. llvm-svn: 361831	2019-05-28 14:37:45 +00:00
Kevin P. Neal	6d458fa866	Add constrained intrinsic tests for powerpc64 and powerpc64le. Submitted by: Drew Wock Reviewed by: Hal Finkel Approved by: Hal Finkel Differential Revision: https://reviews.llvm.org/D62388 llvm-svn: 361827	2019-05-28 14:17:48 +00:00
Sanjay Patel	d5a8637072	[x86] split 256-bit store of concatenated vectors This shows up as a side issue to the main problem for the AVX target example from PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3 But as we can see in the pile of existing test diffs, it's actually a widespread problem that affects any AVX or later target. Apart from a couple of oddballs, I think these are all improvements for the reasons stated in the code comment: we do not want to enable YMM unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit stores anyway. We could say that MergeConsecutiveStores() is going overboard on some of these examples, but that won't solve the problem completely. But that is the reason I'm proposing this as a lowering rather than a combine: we will infinite loop fighting the merge code if we try this earlier. Differential Revision: https://reviews.llvm.org/D62498 llvm-svn: 361822	2019-05-28 13:54:17 +00:00
Matt Arsenault	d3ed418ad3	MIR: Fix printer crashing on dead CSR frame indexes llvm-svn: 361819	2019-05-28 13:08:31 +00:00
Sjoerd Meijer	c0f43bee37	Follow up of r361810: test case fix attempt for Windows builder llvm-svn: 361817	2019-05-28 13:04:47 +00:00
Sanjay Patel	6bf4ca9d2e	[x86] fix 256-bit vector store splitting to honor 'volatile' Forking this out of the discussion in D62498 (and assuming that will be committed later, so adding the helper function here). The LangRef says: "the backend should never split or merge target-legal volatile load/store instructions." Differential Revision: https://reviews.llvm.org/D62506 llvm-svn: 361815	2019-05-28 12:58:07 +00:00
Benjamin Kramer	57e267a2e9	[X86] Custom lower CONCAT_VECTORS of v2i1 The generic legalizer cannot handle this. Add an assert instead of silently miscompiling vectors with elements smaller than 8 bits. llvm-svn: 361814	2019-05-28 12:52:57 +00:00
Hans Wennborg	d936e40575	Re-commit r357452 (take 2): "SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)" This was reverted in r360086 as it was suspected of causing mysterious test failures internally. However, it was never concluded that this patch was the root cause. > The code was previously checking that candidates for sinking had exactly > one use or were a store instruction (which can't have uses). This meant > we could sink call instructions only if they had a use. > > That limitation seemed a bit arbitrary, so this patch changes it to > "instruction has zero or one use" which seems more natural and removes > the need to special-case stores. > > Differential revision: https://reviews.llvm.org/D59936 llvm-svn: 361811	2019-05-28 12:19:38 +00:00
Sjoerd Meijer	4df2baadd2	[ARM] Use CHECK-NEXT in CodeGen/ARM/O3-pipeline.ll. NFC. Use CHECK-NEXT, like in other pipeline tests, so that we actually notice when the pipeline is changed. llvm-svn: 361810	2019-05-28 12:06:26 +00:00
Sanjay Patel	165663aeeb	[x86] add test to show volatile store splitting; NFC From the LangRef: "the backend should never split or merge target-legal volatile load/store instructions." See also: D62498 llvm-svn: 361785	2019-05-27 23:56:41 +00:00
Matt Arsenault	ca84c4be4b	RegAllocFast: Set MayLiveAcrossBlocks when allocating uses Setting mayLiveOut based only on use instructions after allocating the def block did not work if the use block was allocated before the def block, since the virtual register uses were already removed. Fixes bug 41973. llvm-svn: 361781	2019-05-27 20:37:31 +00:00
Michael Liao	9c70c574b4	[SelectionDAG] Enhance the simplification of `copyto` from `implicit-def`. Summary: - The current implementation simplifies the case where the source of `copyto` is `implicit-def`ed. However, it only works when that `implicit-def` is single-used since it detects that from `implicit-def` and cannot determine which destination vreg should be used if there are multiple uses. - This patch performs that detection when `copyto` is being emitted instead. If that `copyto`'s source is defined from `implicit-def`, it simplifies it. Hence, it works even when that `implicit-def` has multiple uses. - Apart from simplifying the internal IR, it won't improve the quality of code generation. However, it helps to detect `implicit-def` in a straightforward manner in some passes, such as `si-i1-copies`. A test case is added. Reviewers: sunfish, nhaehnle Subscribers: jvesely, hiraditya, asbirlea, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D62342 llvm-svn: 361777	2019-05-27 18:26:29 +00:00
Diana Picus	c675215f67	[ARM GlobalISel] Un-XFAIL some tests. NFC It turns out we support big endian now (probably since r332449, but I haven't bisected to confirm). llvm-svn: 361756	2019-05-27 10:32:34 +00:00
David L. Jones	0ff41b8a5a	Revert r361356: "[MIR] Add simple PRE pass to MachineCSE" This is problematic on buildbots, as discussed here: https://reviews.llvm.org/rL361356 It seems like the plan already was to revert, but that hasn't happened yet. llvm-svn: 361746	2019-05-27 06:00:00 +00:00
Yonghong Song	e698958ad8	[BPF] generate R_BPF_NONE relocation for BTF DataSec variables The variables in BTF DataSec type encode in-section offset. R_BPF_NONE should be generated instead of R_BPF_64_32. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D62460 llvm-svn: 361742	2019-05-26 21:26:06 +00:00
Alexander Timofeev	ba447bae74	[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 This commit was reverted because of a build failure. The reason was a malformed patch. The build failure is now fixed. llvm-svn: 361741	2019-05-26 20:33:26 +00:00
Simon Pilgrim	a044410f37	[X86][SSE] Add shuffle combining support for ISD::ANY_EXTEND_VECTOR_INREG Reuses what we already have in place for ISD::ZERO_EXTEND_VECTOR_INREG just with a different sentinel llvm-svn: 361734	2019-05-26 16:00:35 +00:00
David Green	0dbafe191e	[ARM] Select fp16 fma This adds a pattern for fma, similar to the float and double patterns. Differential Revision: https://reviews.llvm.org/D62330 llvm-svn: 361719	2019-05-26 11:34:30 +00:00
David Green	21542cd6f4	[ARM] Select a number of fp16 rounding functions This adds patterns for fp16 round and ceil etc. Same as the float and double patterns. Differential Revision: https://reviews.llvm.org/D62326 llvm-svn: 361718	2019-05-26 11:13:00 +00:00
David Green	c9f4b7d201	[ARM] Promote various fp16 math intrinsics Promote a number of fp16 math intrinsics to float, so that the relevant float math routines can be used. Copysign is expanded so as to be handled in-place. Differential Revision: https://reviews.llvm.org/D62325 llvm-svn: 361717	2019-05-26 10:59:21 +00:00
Simon Pilgrim	58a8541dcc	[X86][AVX] combineBitcastvxi1 - peek through bitops to determine size of original vector We were only testing for direct SETCC results - this allows us to peek through AND/OR/XOR combinations of the comparison results as well. There's a missing SEXT(PACKSS) fold that I need to investigate for v8i1 cases before I can enable it there as well. llvm-svn: 361716	2019-05-26 10:54:23 +00:00
David Green	2881325b17	[ARM] Select fp16 fabs This adds a pattern for the fabs intrinsic, the same as float and double. Differential Revision: https://reviews.llvm.org/D62324 llvm-svn: 361715	2019-05-26 10:51:58 +00:00
David Green	aeade651f3	[ARM] Select fp16 fsqrt This adds a pattern for the sqrt intrinsic, the same as float and double. Differential Revision: https://reviews.llvm.org/D62322 llvm-svn: 361714	2019-05-26 10:42:24 +00:00
David Green	caf8a11b65	[ARM] Promote fp16 frem Promote fp16 frem operations on ARM to floats so they call fmodf. Differential Revision: https://reviews.llvm.org/D62321 llvm-svn: 361713	2019-05-26 10:30:22 +00:00
David Green	1c1e2ca022	[ARM] Add some base fullfp16 tests. NFC llvm-svn: 361712	2019-05-26 10:06:40 +00:00
Simon Pilgrim	40fa52b174	[X86] lowerBuildVectorToBitOp - support build_vector(shift()) -> shift(build_vector(),C) Commonly occurs in sign-extension cases llvm-svn: 361706	2019-05-25 18:02:17 +00:00
Nikita Popov	d87eceda0e	[X86] Combine fminnum/fmaxnum with non-nan operand to fmin/fmax If we have a known non-nan operand, place it in the second operand of fmin/fmax that is returned if either operand is nan. Differential Revision: https://reviews.llvm.org/D62448 llvm-svn: 361704	2019-05-25 16:44:29 +00:00
Simon Pilgrim	34d5a74b03	[X86][SSE] vector-sext - cleanup prefix lists Add X32-SSE common prefix to merge some checks llvm-svn: 361702	2019-05-25 16:33:17 +00:00
Sanjay Patel	3f0905e46f	[SelectionDAG] define binops as a superset of commutative binops The test diffs show improved vector narrowing for integer min/max opcodes because those were all absent from the list. I'm not sure if we can expose functional diffs for all of the moved/added opcodes though. It seems like we are missing an AVX512 opportunity to use 256-bit ops in place of 512-bit ops on some tests/targets, but I think that can be a follow-up. Preliminary steps to make sure the callers are not misusing these queries: rL361268 rL361547 Differential Revision: https://reviews.llvm.org/D62191 llvm-svn: 361701	2019-05-25 15:28:55 +00:00
Nikita Popov	c9de92ee76	[X86] Add tests for min/maxnum with const operand; NFC llvm-svn: 361700	2019-05-25 15:06:54 +00:00
David Bolvansky	2149811854	[NFC] Make tests more robust for new optimizations llvm-svn: 361697	2019-05-25 14:10:20 +00:00
Sanjay Patel	91131b6500	[SelectionDAG] soften assertion when legalizing narrow vector FP ops The test based on PR42010: https://bugs.llvm.org/show_bug.cgi?id=42010 ...may show an inaccuracy for PPC's target defs, but we should not be so aggressive with an assert here. There's no telling what out-of-tree targets look like. llvm-svn: 361696	2019-05-25 13:48:07 +00:00
Craig Topper	46e5052b8e	[X86FixupLEAs] Turn optIncDec into a generic two address LEA optimizer. Support LEA64_32r properly. INC/DEC is really a special case of a more generic issue. We should also turn leas into add reg/reg or add reg/imm regardless of the slow lea flags. This also supports LEA64_32 which has 64 bit input registers and 32 bit output registers. So we need to convert the 64 bit inputs to their 32 bit equivalents to check if they are equal to base reg. One thing to note, the original code preserved the kill flags by adding operands to the new instruction instead of using addReg. But I think tied operands aren't supposed to have the kill flag set. I dropped the kill flags, but I could probably try to preserve it in the add reg/reg case if we think it's important. Not sure which operand it's supposed to go on for the LEA64_32r instruction due to the super reg implicit uses. Though I'm also not sure those are needed since they were probably just created by an INSERT_SUBREG from a 32-bit input. Differential Revision: https://reviews.llvm.org/D61472 llvm-svn: 361691	2019-05-25 06:17:47 +00:00
Peter Collingbourne	3b93737446	Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence." Broke sanitizer bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio llvm-svn: 361688	2019-05-25 01:52:38 +00:00
Jessica Paquette	97d668d70f	[GlobalISel][AArch64] Make FP constraint checks consider possible use/def banks In a few places in getInstrMapping, we check if use/def instructions for the instruction we're mapping have floating point constraints. We can improve this check and reduce the number of copies in GISel-compiled code if we make a couple observations: - For a def instruction, it only matters if the def instruction must always output a value stored on a FPR - For a use instruction, it only matters if the use instruction must always only take in values stored in FPRs This adds two new functions: - onlyUsesFP - onlyDefinesFP Then we can use those when we're checking the uses/defs instead. Without this patch, the load, unmerge, store, and select in the added test would have unnecessary copies. Differential Revision: https://reviews.llvm.org/D62426 llvm-svn: 361679	2019-05-24 23:08:45 +00:00
Jason Liu	8e1d921bb3	Implement call lowering without parameters on AIX Summary: This patch implements call lowering for calls without parameters on AIX as initial support. Reviewers: sfertile, hubert.reinterpretcast, aheejin, efriedma Differential Revision: https://reviews.llvm.org/D61948 llvm-svn: 361669	2019-05-24 20:54:35 +00:00
Jessica Paquette	56503865ed	[GlobalISel][AArch64] Improve register bank mappings for G_SELECT The fcsel and csel instructions differ in only the register banks they work on. So, they're entirely interchangeable otherwise. With this in mind, this does two things: - Teach AArch64RegisterBankInfo to consider the inputs to G_SELECT as well as the outputs. - Teach it to choose the best register bank mapping based off the constraints of the inputs and outputs. The "best" in this case means the one that requires the smallest number of copies to properly emit a fcsel/csel. For example, if the inputs are all already going to be on FPRs, we should emit a fcsel, even if the output is a GPR. This costs one copy to produce the result, but saves us from copying the inputs into GPRs. Also update the regbank-select.mir to check that we end up with the right select instruction. Differential Revision: https://reviews.llvm.org/D62267 llvm-svn: 361665	2019-05-24 19:35:25 +00:00
Matt Arsenault	3d59e388ca	AMDGPU: Activate all lanes when spilling CSR VGPR for SGPR spills If some lanes weren't active on entry to the function, this could clobber their VGPR values. llvm-svn: 361655	2019-05-24 18:18:51 +00:00
Alexander Timofeev	dffedea014	[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 llvm-svn: 361644	2019-05-24 15:32:18 +00:00
Stefan Pintilie	522307fa40	[PowerPC] Remove CRBits Copy Of Unset/set CBit For the situation, where we generate the following code: crxor 8, 8, 8 < Some instructions> .LBB0_1: < Some instructions> cror 1, 8, 8 cror (COPY of CRbit) depends on the result of the crxor instruction. CR8 is known to be zero as crxor is equivalent to CRUNSET. We can simply use crxor 1, 1, 1 instead to zero out CR1, which does not have any dependency on any previous instruction. This patch will optimize it to: < Some instructions> .LBB0_1: < Some instructions> crxor 1, 1, 1 Patch By: Victor Huang (NeHuang) Differential Revision: https://reviews.llvm.org/D62044 llvm-svn: 361632	2019-05-24 12:05:37 +00:00
Simon Pilgrim	95b8d9bbf8	[SelectionDAG] computeKnownBits - support constant pool values from target This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets. computeKnownBits then uses this function to improve codegen, notably vector code after legalization. A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit. This required a couple of fixes: * SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR). * Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets. Differential Revision: https://reviews.llvm.org/D61887 llvm-svn: 361620	2019-05-24 10:03:11 +00:00
Tim Northover	3b2157aeed	GlobalISel: support swifterror attribute on AArch64. swifterror marks an argument as a register pretending to be a pointer, so we need a guaranteed mem2reg-like analysis of its uses. Fortunately most of the infrastructure can be reused from the DAG world. llvm-svn: 361608	2019-05-24 08:40:13 +00:00
Simon Atanasyan	c1b482f2a5	[mips] Always check that `shift and add` optimization is efficient. The D45316 introduced the `shouldTransformMulToShiftsAddsSubs` function to check that breaking down constant multiplications into a series of shifts, adds, and subs is efficient. Unfortunately, this function does not check maximum number of steps on all paths of the algorithm. This patch fixes this bug. Fix for PR41929. Differential Revision: https://reviews.llvm.org/D62166 llvm-svn: 361606	2019-05-24 08:39:40 +00:00
QingShan Zhang	449bfdd1b0	[Power9] Add a specific heuristic to schedule the addi before the load When we are scheduling the load and addi, if all other heuristics didn't take effect, we will try to schedule the addi before the load, to hide the latency, and avoid the true dependency added by RA. This only takes effect for Power9. Differential Revision: https://reviews.llvm.org/D61930 llvm-svn: 361600	2019-05-24 05:30:09 +00:00
Craig Topper	af0add6c39	[X86] Add test case that was supposed to go with r360102. Found in my working area. Guess I forgot 'git add' before committing. llvm-svn: 361599	2019-05-24 04:46:56 +00:00
Reid Kleckner	b7a78c7dff	[AArch64] Preserve X8 for thunks ending in variadic musttail calls Summary: On Windows, X8 may be used to pass in the address of an aggregate that is returned indirectly. Therefore, it should be forwarded to variadic musttail calls and preserved in thunks. Fixes PR41997 Reviewers: mgrang, efriedma Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62344 llvm-svn: 361585	2019-05-24 01:27:20 +00:00
Serge Pavlov	ed595e8627	[AArch64] Add nvcast patterns for v2f32 -> v1f64 Summary: Constant stores of f32 values can create such NvCast nodes. Reviewers: t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62285 llvm-svn: 361584	2019-05-24 01:20:34 +00:00
Thomas Lively	55229f6b10	[WebAssembly] Expand more SIMD float ops Summary: These were previously causing ISel failures. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62354 llvm-svn: 361577	2019-05-24 00:15:04 +00:00
Roman Lebedev	f81ebfb045	UpdateTestChecks: ppc32 triple support Summary: Appears identical to powerpc64{,le}. Regenerate test that is being affected by upcoming patch. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: nemanjai, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62339 llvm-svn: 361543	2019-05-23 19:54:41 +00:00
Matt Arsenault	5c714cbdd8	AMDGPU: Correct maximum possible private allocation size We were assuming a much larger possible per-wave visible stack allocation than is possible: `faa3ae5138/src/core/runtime/amd_gpu_agent.cpp (L70)` Based on this, we can assume the high 15 bits of a frame index or sret are 0. The frame index value is the per-lane offset, so the maximum frame index value is MAX_WAVE_SCRATCH / wavesize. Remove the corresponding subtarget feature and option that made this configurable. llvm-svn: 361541	2019-05-23 19:38:14 +00:00
Robert Lougher	170dfeb2ff	Resubmit r360436 "[X86] Avoid SFB - Fix inconsistent codegen with/without debug info" Fixes https://bugs.llvm.org/show_bug.cgi?id=40969 The functions findPotentiallyBlockedCopies and buildCopy are currently not accounting for the presence of debug instructions. In the former this results in the optimization not being triggered, and in the latter results in inconsistent codegen. This patch enables the optimization to be performed in a debug build and ensures the codegen is consistent with non-debug builds. Patch by Chris Dawson. Differential Revision: https://reviews.llvm.org/D61680 llvm-svn: 361527	2019-05-23 18:15:12 +00:00
Thomas Lively	e18b5c6237	[WebAssembly] Implement ReplaceNodeResults to fix a SIMD crash Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61037 llvm-svn: 361526	2019-05-23 18:09:26 +00:00
Roman Lebedev	702a152e6a	[NFC][PPC] Autogenerate vec_add_sub_quadword.ll test Being affected by (sub %x, C) -> add %X, (sub 0, C) 'for vectors' patch. llvm-svn: 361525	2019-05-23 18:08:26 +00:00
Roman Lebedev	c8364ef567	[NFC][PPC] Autogenerate vec_add_sub_doubleword.ll test Being affected by (sub %x, C) -> add %X, (sub 0, C) 'for vectors' patch. llvm-svn: 361524	2019-05-23 18:08:21 +00:00
Roman Lebedev	a8a470c45b	[NFC][Mips] Autogenerate msa/i5-s.ll test Being affected by (sub %x, C) -> add %X, (sub 0, C) 'for vectors' patch. llvm-svn: 361523	2019-05-23 18:08:17 +00:00
Roman Lebedev	06688fe715	[NFC][Mips] Autogenerate msa/arithmetic.ll test Being affected by (sub %x, C) -> add %X, (sub 0, C) 'for vectors' patch. llvm-svn: 361522	2019-05-23 18:08:13 +00:00
Matt Arsenault	0f3ba44b57	AMDGPU/GlobalISel: Legality for integer min/max llvm-svn: 361519	2019-05-23 17:58:48 +00:00
Thomas Lively	eafe8ef6f2	[WebAssembly] Add multivalue and tail-call target features Summary: These features will both be implemented soon, so I thought I would save time by adding the boilerplate for both of them at the same time. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D62047 llvm-svn: 361516	2019-05-23 17:26:47 +00:00
Shoaib Meenai	87226a7202	[AsmPrinter] Treat a narrowing PtrToInt like Trunc When printing assembly for PtrToInt, AsmPrinter::lowerConstant incorrectly assumed that if PtrToInt was not converting to an int with exactly the same number of bits, it must be widening to a larger int. But this isn't necessarily true; PtrToInt can also shrink the size, which is useful when you want to produce a known 32-bit pointer on a 64-bit platform (on x86_64 ELF this yields an R_X86_64_32 relocation). The old behavior of falling through to the widening case for a narrowing PtrToInt yields bogus assembly code like this, which fails to assemble because the no-op bitwise AND it accidentally creates is not a valid relocation: ``` .long a&-1 ``` The fix is to treat a narrowing PtrToInt exactly the same as it already treats Trunc: just emit the expression and let the assembler deal with truncating it in the appropriate way. Patch by Mat Hostetter <mjh@fb.com>. Differential Revision: https://reviews.llvm.org/D61325 llvm-svn: 361508	2019-05-23 16:29:09 +00:00
Simon Pilgrim	46806749ac	[X86] Regenerate LZCNT tests on x86/x32/x64 targets llvm-svn: 361495	2019-05-23 13:30:10 +00:00
Alex Bradbury	5dabe03b41	[RISCV][NFC] Add nounwind attribute to functions missing it in test/CodeGen/RISCV r360897 was incomplete, must have applied an old/wip patch. This is in preparation for emitting CFI directives. llvm-svn: 361493	2019-05-23 12:43:13 +00:00
Simon Pilgrim	46165b2409	[AMDGPU] Regenerate vector sub tests llvm-svn: 361485	2019-05-23 11:27:28 +00:00
Roman Lebedev	32d976bac1	[NFC][X86] Fix check prefixes and autogenerate fold-pcmpeqd-2.ll test Being affected by (sub %x, c) -> (add %x, (sub 0, c)) patch in an uncertain way. llvm-svn: 361483	2019-05-23 10:55:13 +00:00
Sam Parker	617cdc5a6d	[ARM][CGP] Clear SafeWrap before each search The previous patch added a member set to store instructions that we could allow to wrap. But this wasn't cleared between searches meaning that they could get promoted, incorrectly, during the promotion of a separate valid chain. Differential Revision: https://reviews.llvm.org/D62254 llvm-svn: 361462	2019-05-23 07:46:39 +00:00
Thomas Lively	1a3cbe720c	[WebAssembly] Implement __builtin_return_address for emscripten Summary: In this patch, `ISD::RETURNADDR` is lowered on the emscripten target to the new Emscripten runtime function `emscripten_return_address`, which implements the functionality. Patch by Guanzhong Chen Reviewers: tlively, aheejin Reviewed By: tlively Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62210 llvm-svn: 361454	2019-05-23 01:24:01 +00:00
Fangrui Song	86c9ca48c3	[X86] Support -fno-plt __tls_get_addr calls In general dynamic/local dynamic TLS models, with -fno-plt, * x86: emit `calll ___tls_get_addr@GOT(%ebx)` instead of `calll ___tls_get_addr@PLT` Note, on x86, if we can get rid of %ebx as the PIC register, it may be better to use a register not preserved across function calls. x86_64: emit `callq *__tls_get_addr@GOTPCREL(%rip)` instead of `callq __tls_get_addr@PLT` Reorganize the code by separating 32-bit and 64-bit. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D62106 llvm-svn: 361453	2019-05-23 01:05:13 +00:00
Craig Topper	93f38e1f1a	[X86] Explicitly disable VEXTRACT instruction matching for an immediate of 0. Remove a bunch of isel patterns that become unnecessary. We effectively had a second set of isel patterns that tried to use a regular store instruction and an extract_subreg instruction. Or a masked move and an extract_subreg. These patterns were intended to override the matching of VEXTRACT instructions by taking advantage of the priority of the explicit immediate 0 for the index. This patch instead just disables the immediate 0 matching in the VEXTRACT patterns. This way each of the component pieces of the larger patterns will match by themselves. This found a bug of sorts where we didn't use 128-bit store for 512->128 extract on KNL. It's unclear what the right thing here should be. Using the vextract avoids constraining the register allocator to use xmm0-15. But it always results in a longer encoding if the register allocator ends up choosing xmm0-15 anyway. llvm-svn: 361431	2019-05-22 21:00:18 +00:00
Craig Topper	9816d55776	[X86][InstCombine] Remove InstCombine code that turns X86 round intrinsics into llvm.ceil/floor. Remove some isel patterns that existed because that was happening. We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into llvm.floor/ceil. The llvm.ceil/floor intrinsics are supposed to correspond to the libm functions. For the libm functions we need to disable the precision exception so the llvm.floor/ceil functions should always map to encodings 0x9 and 0xA. We had a mix of isel patterns where some used 0x9 and 0xA and others used 0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA. Since we have no way in isel of knowing where the llvm.ceil/floor came from, we can't map X86 specific intrinsics with encodings 1 or 2 to it. We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like to see a use case and optimization advantage first. I've left the backend test cases to show the blend we now emit without the extra isel patterns. But I've removed the InstCombine tests completely. llvm-svn: 361425	2019-05-22 20:04:55 +00:00
Alexey Lapshin	53726588f6	[DebugInfo][AArch64] Recognise target specific instruction as mov instr This fix is for the problem from https://bugs.llvm.org/show_bug.cgi?id=38714. Specifically, Simple Register Coalescing creates the following conversion: undef %0.sub_32:gpr64 = ORRWrs $wzr, %3:gpr32common, 0, debug-location !24; It copies a 32-bit value from gpr32 into gpr64. But Live DEBUG_VALUE analysis is not able to create a debug location record for that instruction. As a result, the debug info for the argc variable is incorrect. The fix is to write a custom isCopyInstrImpl() which would recognize the ORRWrs instr. llvm-svn: 361417	2019-05-22 18:48:58 +00:00
Roman Lebedev	5e1ce15c5d	[NFC][X86][AArch64] Add tests for missing (x - y) + -1 -> not(y) + x fold https://rise4fun.com/Alive/OaY llvm-svn: 361409	2019-05-22 16:58:26 +00:00
Matt Arsenault	ca64ef2043	MC: Allow getMaxInstLength to depend on the subtarget Keep it optional in cases this is ever needed in some global context. Currently it's only used for getting an upper bound inline asm code size. For AMDGPU, gfx10 increases the maximum instruction size to 20-bytes. This avoids penalizing older subtargets when estimating code size, and making some annoying branch relaxation test adjustments. llvm-svn: 361405	2019-05-22 16:28:41 +00:00
Kees Cook	c2187c20a4	[TargetLowering] Extend bool args to inline-asm according to getBooleanType Summary: This extends Krzysztof Parzyszek's X86-specific solution (https://reviews.llvm.org/D60208) to the generic code pointed out by James Y Knight. Reviewers: kparzysz, craig.topper, nickdesaulniers Subscribers: efriedma, sdardis, nemanjai, javed.absar, eraman, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, srhines, void, nickdesaulniers, jyknight Tags: #llvm Differential Revision: https://reviews.llvm.org/D60224 llvm-svn: 361404	2019-05-22 16:16:15 +00:00
Sanjay Patel	5a4f7cf2ff	[IR] allow fast-math-flags on select of FP values This is a minimal start to correcting a problem most directly discussed in PR38086: https://bugs.llvm.org/show_bug.cgi?id=38086 We have been hacking around a limitation for FP select patterns by using the fast-math-flags on the condition of the select rather than the select itself. This patch just allows FMF to appear with the 'select' opcode. No changes are needed to "FPMathOperator" because it already includes select-of-FP because that definition is based on the (return) value type. Once we have this ability, we can start correcting and adding IR transforms to use the FMF on a 'select' instruction. The instcombine and vectorizer test diffs only show that the IRBuilder change is behaving as expected by applying an FMF guard value to 'select'. For reference: rL241901 - allowed FMF with fcmp rL255555 - allowed FMF with FP calls Differential Revision: https://reviews.llvm.org/D61917 llvm-svn: 361401	2019-05-22 15:50:46 +00:00
Roman Lebedev	1f63d7fef9	[NFC][ARM] addsubcarry-promotion.ll: whoops - replace '.' with '-' in check-prefix Does not affect update_llc_test_checks, or the actual output, but is not accepted by the actual FileCheck. Sorry, i should have noticed this before committing, not the very next second after.. llvm-svn: 361398	2019-05-22 15:42:33 +00:00
Roman Lebedev	1b45bdf5ba	[NFC][ARM] Autogenerate addsubcarry-promotion.ll test Being affected by upcoming patch llvm-svn: 361397	2019-05-22 15:34:51 +00:00
Roman Lebedev	6a53135698	[NFC][X86] Autogenerate negative-offset.ll test Being affected by upcoming patch llvm-svn: 361396	2019-05-22 15:34:43 +00:00
Roman Lebedev	406421b332	[NFC][X86][AArch64] Rewrite sink-addsub-of-const.ll tests to have full permutation coverage Somehow missed some patterns initially.. While there, add comments. llvm-svn: 361390	2019-05-22 14:42:41 +00:00
Roman Lebedev	7c72ca012d	UpdateTestChecks: sparc march handling Summary: Another target that prefers to use `-march` in tests ``` llvm/test/CodeGen/SPARC$ grep -ri mtriple | wc -l 25 llvm/test/CodeGen/SPARC$ grep -ri march | wc -l 165 ``` This test is being affected by a further patch, so regenerate it to better visualize the changes Reviewers: RKSimon, dcederman, gberry Reviewed By: RKSimon Subscribers: jyknight, fedor.sergeev, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62242 llvm-svn: 361381	2019-05-22 13:04:34 +00:00
Roman Lebedev	4bf35671b5	[NFC][SystemZ] Autogenerate alloca-03.ll test to make test changes more visible The check lines are being affected by an upcoming patch, regenerate the checklines to visualize the changes better. llvm-svn: 361380	2019-05-22 13:04:24 +00:00
Anton Afanasyev	df00c6a54f	[MIR] Add simple PRE pass to MachineCSE This is the second part of the commit fixing PR38917 (hoisting partially redundant machine instruction). Most of PRE (partial redundancy elimination) and CSE work is done on LLVM IR, but some of redundancy arises during DAG legalization. Machine CSE is not enough to deal with it. This simple PRE implementation works a little bit intricately: it runs before CSE, looking for partial redundancy and transforming it to full redundancy, anticipating that the next CSE step will eliminate this created redundancy. If CSE doesn't eliminate this, then the created instruction will remain dead and be eliminated later by the Remove Dead Machine Instructions pass. The third part of the commit is supposed to refactor MachineCSE, to make it more clear and to merge MachinePRE with MachineCSE, so one need not rely on a further Remove Dead pass to clear instrs not eliminated by CSE. First step: https://reviews.llvm.org/D54839 Fixes llvm.org/PR38917 llvm-svn: 361356	2019-05-22 07:41:34 +00:00
Fangrui Song	1c61471ab1	[PPC64] Parse -elfv1 -elfv2 when specified on target triple Summary: For big-endian powerpc64, the default ABI is ELFv1. OpenPower ABI ELFv2 is supported when -mabi=elfv2 is specified. FreeBSD support for PowerPC64 ELFv2 ABI with LLVM is in progress[1]. This patch adds an alternative way to specify ELFv2 ABI on target triple [2]. The following results are expected: ELFv1 when using: -target powerpc64-unknown-freebsd12.0 -target powerpc64-unknown-freebsd12.0 -mabi=elfv1 -target powerpc64-unknown-freebsd12.0-elfv1 ELFv2 when using: -target powerpc64-unknown-freebsd12.0 -mabi=elfv2 -target powerpc64-unknown-freebsd12.0-elfv2 [1] https://wiki.freebsd.org/powerpc/llvm-elfv2 [2] https://clang.llvm.org/docs/CrossCompilation.html Patch by Alfredo Dal'Ava Júnior! Differential Revision: https://reviews.llvm.org/D61950 llvm-svn: 361355	2019-05-22 07:29:59 +00:00
Nikita Popov	15df05152d	[X86] Don't compare i128 through vector if construction not cheap (PR41971) Fix for https://bugs.llvm.org/show_bug.cgi?id=41971. Make the combineVectorSizedSetCCEquality() transform more conservative by checking that the bitcast to the vector type will be cheap/free for both operands. I'm considering it cheap if it's a constant, a load or already a vector. I've dropped the explicit check for f128 because it should fall out naturally (in the cases where it'd be detrimental). Differential Revision: https://reviews.llvm.org/D62220 llvm-svn: 361352	2019-05-22 06:47:06 +00:00
Chen Zheng	9970665f60	[PowerPC] [ISEL] select x-form instruction for unaligned offset Differential Revision: https://reviews.llvm.org/D62173 llvm-svn: 361346	2019-05-22 02:57:31 +00:00
Pengfei Wang	6a0d432e9e	[X86] [CET] Deal with return-twice function such as vfork, setjmp when CET-IBT enabled Return-twice functions will indirectly jump after the caller's position. So when CET-IBT is enabled, we should make sure there are endbr* instructions following these return-twice function calls, like GCC does. Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D61881 llvm-svn: 361342	2019-05-22 00:50:21 +00:00
Matt Arsenault	2cba91b8db	AMDGPU: Assume calls read exec llvm-svn: 361333	2019-05-21 23:23:16 +00:00
Matt Arsenault	eea81c20fe	AMDGPU: Add some tests for inlineasm behavior llvm-svn: 361332	2019-05-21 23:23:12 +00:00
Matt Arsenault	dd1ffa00a5	AMDGPU: Assume call pseudos are convergent There should probably be nonconvergent versions, but my guess is it doesn't matter in practice. llvm-svn: 361331	2019-05-21 23:23:10 +00:00
Matt Arsenault	60ba03e210	AMDGPU: Fix not marking new gfx10 SGPRs as CSRs llvm-svn: 361330	2019-05-21 23:23:05 +00:00
Dan Gohman	a49496fb2a	[WebAssembly] Add the signature for the new llround builtin function r360889 added new llround builtin functions. This patch adds their signatures for the WebAssembly backend. It also adds wasm32 support to utils/update_llc_test_checks.py, since that's the script other targets are using for their testcases for this feature. Differential Revision: https://reviews.llvm.org/D62207 llvm-svn: 361327	2019-05-21 23:06:34 +00:00
Roman Lebedev	675307b1f1	[NFC][AMDGPU] Autogenerate llvm.amdgcn.s.barrier.ll test llvm-svn: 361320	2019-05-21 21:49:14 +00:00
Roman Lebedev	21e8ec8d4f	[NFC][X86] Autogenerate ragreedy-hoist-spill.ll test llvm-svn: 361319	2019-05-21 21:49:10 +00:00
Roman Lebedev	079d8b425f	[NFC][Thumb2] Autogenerate thumb2-ldr_pre.ll test llvm-svn: 361318	2019-05-21 21:49:05 +00:00
Nikita Popov	d34d96770e	[X86] Add large integer comparison tests for PR41971; NFC In these cases we would prefer a direct comparison over going through a vector type. llvm-svn: 361315	2019-05-21 21:27:08 +00:00
Yi-Hong Lyu	00e85f7535	Move csr-save-restore-order.ll to the right place llvm-svn: 361306	2019-05-21 20:28:31 +00:00
Roman Lebedev	a7e88f8570	[NFC][X86][AArch64] Add tests for sinking of add/sub by constant through add/sub Looks we can transform all 8 variants of the pattern: https://rise4fun.com/Alive/auH This comes up as an issue on the path towards https://bugs.llvm.org/show_bug.cgi?id=41952 llvm-svn: 361303	2019-05-21 20:14:54 +00:00
Stanislav Mekhanoshin	44d17ca02e	Fix register coalescer failure to prune value Register coalescer fails for the test in the patch with the assertion in JoinVals::ConflictResolution `DefMI != nullptr'. It attempts to join live intervals for two adjacent instructions and erase the copy: %2:vreg_256 = COPY %1 %3:vreg_256 = COPY killed %1 The LI needs to be adjusted to kill subrange for the erased instruction and extend the subrange of the original def. That was done for the main interval only but not for the subrange. As a result subrange had a VNI pointing to the erased slot resulting in the above failure. Differential Revision: https://reviews.llvm.org/D62162 llvm-svn: 361293	2019-05-21 19:32:41 +00:00
Leonard Chan	0bada7ce6c	[Intrinsic] Signed Fixed Point Saturation Multiplication Intrinsic Add an intrinsic that takes 2 signed integers with the scale of them provided as the third argument and performs fixed point multiplication on them. The result is saturated and clamped between the largest and smallest representable values of the first 2 operands. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D55720 llvm-svn: 361289	2019-05-21 19:17:19 +00:00
Simon Pilgrim	4b82e50315	[X86][SSE] computeKnownBitsForTargetNode - add X86ISD::ANDNP support Fixes PACKSS-PSHUFB shuffle regressions mentioned on D61692 llvm-svn: 361270	2019-05-21 15:20:24 +00:00
Roman Lebedev	d8db224ecb	[NFC][X86][AArch64] Shift amount masking: tests that show that 'neg' doesn't last Meaning if we were to produce 'neg' in dagcombine, we will get an endless cycle; some inverse transform would need to be guarded somehow. Also, the 'and (sub 0, x), 31' variant is sticky, doesn't get optimized in any way. https://bugs.llvm.org/show_bug.cgi?id=41952 llvm-svn: 361254	2019-05-21 13:04:56 +00:00
Simon Pilgrim	bc03bee66b	[X86][SSE] Add shuffle tests for 'splat3' patterns. Test codegen from shuffles for { dst[0] = dst[1] = dst[2] = *src++; dst += 3 } 'splatting' memcpy patterns generated by loop-vectorizer. llvm-svn: 361243	2019-05-21 11:42:28 +00:00
Roman Lebedev	2aee73f591	[NFC][X86][AArch64] Add some more tests for shift amount masking The negation creation should be more eager: https://bugs.llvm.org/show_bug.cgi?id=41952 llvm-svn: 361241	2019-05-21 11:14:01 +00:00
Florian Hahn	4a8835c655	[AArch64] Skip mask checks for masks with an odd number of elements. Some checks in isShuffleMaskLegal expect an even number of elements, e.g. isTRN_v_undef_Mask or isUZP_v_undef_Mask, otherwise they access invalid elements and crash. This patch adds checks to the impacted functions. Fixes PR41951 Reviewers: t.p.northover, dmgreen, samparker Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D60690 llvm-svn: 361235	2019-05-21 10:05:26 +00:00
Sam Parker	3141bbd52d	[ARM][CGP] Skip nuw in PrepareConstants PrepareConstants step converts add/sub with 'negative' immediates to sub/add with a 'positive' imm to make promotion more simple. nuw already states that the add shouldn't cause an unsigned wrap, so it shouldn't need any tweaking. Plus, we also don't allow a sub with a 'negative' immediate to be safe wrap, so this functionality has been removed. The PrepareConstants step now just handles the add instructions that we've determined would be safe if they wrap around zero. Differential Revision: https://reviews.llvm.org/D62057 llvm-svn: 361227	2019-05-21 07:56:47 +00:00
Dylan McKay	e967308da4	Add TargetLoweringInfo hook for explicitly setting the ABI calling convention endianness Summary: The endianness used in the calling convention does not always match the endianness of the target on all architectures, namely AVR. When an argument is too large to be legalised by the architecture and is split for the ABI, a new hook TargetLoweringInfo::shouldSplitFunctionArgumentsAsLittleEndian is queried to find the endianness that function arguments must be laid out in. This approach was recommended by Eli Friedman. Originally reported in https://github.com/avr-rust/rust/issues/129. Patch by Carl Peto. Reviewers: bogner, t.p.northover, RKSimon, niravd, efriedma Reviewed By: efriedma Subscribers: JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62003 llvm-svn: 361222	2019-05-21 06:38:02 +00:00
QingShan Zhang	690fa1b51b	[NFC][PowerPC] Add a test to verify that the scheduler schedules the addi before the load. llvm-svn: 361221	2019-05-21 06:32:31 +00:00
Nikita Popov	e44691bf9f	Move thumbv7k test from AArch64 to ARM As pointed out by charukcs on rL361166, this test uses an ARM triple. llvm-svn: 361220	2019-05-21 06:24:36 +00:00
Chen Zheng	e64bcada5f	[PowerPC] test cases for selecting x-form instruction for unaligned offset - NFC llvm-svn: 361219	2019-05-21 05:06:09 +00:00
Matt Arsenault	6dd08e335f	AMDGPU: Force skip branches over calls Unfortunately the way SIInsertSkips works is backwards, and is required for correctness. r338235 added handling of some special cases where skipping is mandatory to avoid side effects if no lanes are active. It conservatively handled asm correctly, but the same logic needs to apply to calls. Usually the call sequence code is larger than the skip threshold, although the way the count is computed is really broken, so I'm not sure if anything was likely to really hit this. llvm-svn: 361202	2019-05-20 22:04:42 +00:00
Martin Storsjo	4ed18e5ef5	[AArch64] Handle lowering lround on windows, where long is 32 bit Differential Revision: https://reviews.llvm.org/D62108 llvm-svn: 361192	2019-05-20 19:53:28 +00:00
Craig Topper	e97e52757c	[X86] Add test case for r361177. That commit makes sure we flush PendingExports in SelectionDAGBuilder before we create INLINEASM_BR. Unfortunately, I haven't yet found a CodeGen failure without that change. This commit uses the debug output from SelectionDAG to at least ensure we build the DAG correctly. llvm-svn: 361179	2019-05-20 17:37:52 +00:00
Nikita Popov	9060b6df97	[SDAG] Vector op legalization for overflow ops Fixes issue reported by aemerson on D57348. Vector op legalization support is added for uaddo, usubo, saddo and ssubo (umulo and smulo were already supported). As usual, by extracting TargetLowering methods and calling them from vector op legalization. Vector op legalization doesn't really deal with multiple result nodes, so I'm explicitly performing a recursive legalization call on the result value that is not being legalized. There are some existing test changes because expansion happens earlier, so we don't get a DAG combiner run in between anymore. Differential Revision: https://reviews.llvm.org/D61692 llvm-svn: 361166	2019-05-20 16:09:22 +00:00
Matt Arsenault	7c8ec18964	RegAlloc: Fix verifier error with undef identity copies The code did not match the example in the comment, and was checking the undef flag on the copy dest instead of source. The existing tests were only hitting the > 2 operands case. llvm-svn: 361156	2019-05-20 14:09:36 +00:00
Sander de Smalen	f83cccf917	Match types of accumulator and result for llvm.experimental.vector.reduce.fadd/fmul The scalar start/accumulator value of the fadd- and fmul reduction should match the result type of the reduction, as well as the vector element-type of the input vector. Although this was not explicitly specified in the LangRef, it was taken for granted in code implementing the reductions. The patch also fixes the LangRef by adding this constraint. Reviewed By: aemerson, nikic Differential Revision: https://reviews.llvm.org/D60260 llvm-svn: 361133	2019-05-20 09:54:06 +00:00
Carl Ritson	34e95ce259	[AMDGPU] gfx1010 Avoid SMEM WAR hazard for some s_waitcnt values Summary: Avoid introducing hazard mitigation when lgkmcnt is reduced to 0. Clarify code comments to explain assumptions made for this hazard mitigation. Expand and correct test cases to cover variants of s_waitcnt. Reviewers: nhaehnle, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62058 llvm-svn: 361124	2019-05-20 07:20:12 +00:00
Roman Lebedev	1a5d623ded	[NFC][AArch64] Autogenerate fcopysign.ll test llvm-svn: 361106	2019-05-18 20:24:40 +00:00
Roman Lebedev	13ac317e4c	[NFC][AArch64] Autogenerate bitfield-insert.ll, selectcc-to-shiftand.ll tests Investigating bit-extract (ubfx) pattern with shifted mask. llvm-svn: 361105	2019-05-18 17:42:06 +00:00
Roman Lebedev	d1be3c446e	[NFC][AArch64] Add some ubfx tests with immediates Shows the regression in D62100 llvm-svn: 361102	2019-05-18 13:49:44 +00:00
Roman Lebedev	98092f37d0	UpdateTestChecks: fix AMDGPU handling Summary: Was looking into supporting `(srl (shl x, c1), c2)` with c1 != c2 in dagcombiner, this test changes, but makes `update_llc_test_checks.py` unhappy. Many AMDGPU tests specify `-march`, not `-mtriple`, which results in `update_llc_test_checks.py` defaulting to x86 asm function detection heuristics, which don't work here. I propose to fix this by adding an infrastructure to map from `-march` to `-mtriple`, in the UpdateTestChecks tooling. Reviewers: RKSimon, MaskRay, arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62099 llvm-svn: 361101	2019-05-18 13:00:03 +00:00
Roman Lebedev	822b9c971b	UpdateTestChecks: arm64-eabi handling Summary: Was looking into supporting `(srl (shl x, c1), c2)` with c1 != c2 in dagcombiner, this test changes, but makes `update_llc_test_checks.py` unhappy Reviewers: RKSimon Reviewed By: RKSimon Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62097 llvm-svn: 361100	2019-05-18 12:59:56 +00:00
Matt Arsenault	2f29220d6d	AMDGPU/GlobalISel: Implement s64->s64 [SU]ITOFP llvm-svn: 361082	2019-05-17 23:05:18 +00:00
Matt Arsenault	02b5ca8cd1	GlobalISel: Implement lower for S64->S32 [SU]ITOFP This is ported from the custom AMDGPU DAG implementation. I think this is a better default expansion than what the DAG currently uses, at least if the target has CTLZ. This implements the signed version in terms of the unsigned conversion, which is implemented with bit operations. SelectionDAG has several other implementations that should eventually be ported depending on what instructions are legal. llvm-svn: 361081	2019-05-17 23:05:13 +00:00
Matt Arsenault	f3cedf4823	GlobalISel: Define integer min/max instructions Doesn't attempt to emit them for anything yet, but some legalizations I want to port use them. llvm-svn: 361061	2019-05-17 18:36:31 +00:00
Simon Pilgrim	065431c82b	[X86][SSE] Fold movmsk(not(x)) -> not(movmsk) Helps to improve folding of comparisons with movmsk results. llvm-svn: 361056	2019-05-17 17:56:25 +00:00
Simon Pilgrim	2c2f8e74b9	[X86][SSE] Match all-of bool scalar reductions into a bitcast/movmsk + cmp. Same as what we do for vector reductions in combineHorizontalPredicateResult, use movmsk+cmp for scalar (and(extract(x,0),extract(x,1)) reduction patterns. llvm-svn: 361052	2019-05-17 17:25:55 +00:00
Roman Lebedev	64c756b991	[DAGCombiner] visitShiftByConstant(): drop bogus signbit check Summary: That check claims that the transform is illegal otherwise. That isn't true: 1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter https://rise4fun.com/Alive/K4A 2. For `ISD::AND`, there is no restriction on constants: https://rise4fun.com/Alive/Wy3 3. For `ISD::OR`, there is no restriction on constants: https://rise4fun.com/Alive/GOH 4. For `ISD::XOR`, there is no restriction on constants: https://rise4fun.com/Alive/ml6 So, why is it there then? This changes the testcase that was touched by @spatel in rL347478, but i'm not sure that test tests anything particular? Reviewers: RKSimon, spatel, craig.topper, jojo, rengolin Reviewed By: spatel Subscribers: javed.absar, llvm-commits, spatel Tags: #llvm Differential Revision: https://reviews.llvm.org/D61918 llvm-svn: 361044	2019-05-17 15:52:58 +00:00
Simon Pilgrim	62c7032c18	[X86][AVX] isNOT - add extract_subvector(xor X, -1) -> extract_subvector(X) fold. Prep work for the removal of the remaining x86 CTTZ vector lowering. llvm-svn: 361035	2019-05-17 14:04:56 +00:00
Matt Arsenault	a510b570c2	AMDGPU/GlobalISel: Legalize G_FCEIL llvm-svn: 361028	2019-05-17 12:20:05 +00:00
Matt Arsenault	6aebcd5499	AMDGPU/GlobalISel: Legalize G_INTRINSIC_TRUNC llvm-svn: 361027	2019-05-17 12:20:01 +00:00
Matt Arsenault	6aafc5e19d	AMDGPU/GlobalISel: Legalize G_FRINT llvm-svn: 361026	2019-05-17 12:19:57 +00:00
Matt Arsenault	1448f5689e	AMDGPU/GlobalISel: Legalize G_FCOPYSIGN llvm-svn: 361025	2019-05-17 12:19:52 +00:00
Matt Arsenault	568f193847	AMDGPU/GlobalISel: RegBankSelect for llvm.amdgcn.s.buffer.load llvm-svn: 361023	2019-05-17 12:02:34 +00:00
Matt Arsenault	a3b5a386fa	AMDGPU/GlobalISel: Use subreg index instead of extra unmerge This saves instructions and extra steps, but I'm not sure about introducing subregister indexes at this point. llvm-svn: 361022	2019-05-17 12:02:31 +00:00
Matt Arsenault	b3dc73634c	AMDGPU/GlobalISel: Use waterfall loop for buffer_load This adds support for more complex waterfall loops that need to handle operands > 32-bits, and multiple operands. llvm-svn: 361021	2019-05-17 12:02:27 +00:00
Clement Courbet	632dfdda16	Re-land r360859: "[MergeICmps] Simplify the code." With a fix for PR41917: The predecessor list was changing under our feet. - for (BasicBlock Pred : predecessors(EntryBlock_)) { + while (!pred_empty(EntryBlock_)) { + BasicBlock const Pred = *pred_begin(EntryBlock_); llvm-svn: 361009	2019-05-17 09:43:45 +00:00
Rhys Perry	c4bc61bad7	[AMDGPU] detect WaW hazards when moving/merging load/store instructions Summary: In order to combine memory operations efficiently, the load/store optimizer might move some instructions around. It's usually safe to move instructions down past the merged instruction because the pass checks if memory operations can be re-ordered. Though, the current logic doesn't handle Write-after-Write hazards. This fixes a reflection issue with Monster Hunter World and DXVK. v2: - rebased on top of master - clean up the test case - handle WaW hazards correctly Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=40130 Original patch by Samuel Pitoiset. Reviewers: tpr, arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: ronlieb, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D61313 llvm-svn: 361008	2019-05-17 09:32:23 +00:00
Jonas Paulsson	9427961c89	[SystemZ] Bugfix in SystemZTargetLowering::combineIntDIVREM() Make sure to not unroll a vector division/remainder (with a constant splat divisor) after type legalization, since the scalar type may then be illegal. Review: Ulrich Weigand https://reviews.llvm.org/D62036 llvm-svn: 360965	2019-05-17 00:50:35 +00:00
Nico Weber	d764e7c660	Revert r360859: "Reland r360771 "[MergeICmps] Simplify the code."" It caused PR41917. llvm-svn: 360963	2019-05-17 00:43:53 +00:00
Tim Renouf	e3cbdaf1b5	[CodeGen] Fixed de-optimization of legalize subvector extract The recent introduction of v3i32 etc as an MVT, and its use in AMDGPU 3-dword memory instructions, caused a de-optimization problem for code with such a load that then bitcasts via vector of i8, because v12i8 is not an MVT so it legalizes the bitcast by widening it. This commit adds the ability to widen a bitcast using extract_subvector on the result, so the value does not need to go via memory. Differential Revision: https://reviews.llvm.org/D60457 Change-Id: Ie4abb7760547e54a2445961992eafc78e80d4b64 llvm-svn: 360942	2019-05-16 21:49:06 +00:00
Craig Topper	f09b9d419f	[X86] Use 0x9 instead of 0x1 as the immediate in some masked floor pattern. Similarly change 0x2 to 0xA for ceil. This suppresses exceptions which is what we should be doing for ceil and floor. We already use the correct immediate in patterns without masking. llvm-svn: 360915	2019-05-16 16:53:50 +00:00
Matt Arsenault	99e6f4d11a	AMDGPU: Introduce TokenFactor for ABI register copies in call sequence The call was missing chain dependencies on the pre-call copies. I don't think this was causing any real issues however. llvm-svn: 360906	2019-05-16 15:10:27 +00:00
Matt Arsenault	df24c92c0f	AMDGPU: Assume xnack is enabled by default This is the conservatively correct default. It is always safe to assume xnack is enabled, but not the converse. Introduce a feature to blacklist targets where xnack can never be meaningfully enabled. I'm not sure the targets this is applied to is 100% correct. llvm-svn: 360903	2019-05-16 14:48:34 +00:00
Alex Bradbury	3966b02cc8	[RISCV][NFC] Add nounwind attribute to functions missing it in test/CodeGen/RISCV This is in preparation for emitting CFI directives. llvm-svn: 360897	2019-05-16 13:56:23 +00:00
Adhemerval Zanella	2d28db6b9f	[AArch64] Handle ISD::LROUND and ISD::LLROUND This patch optimizes ISD::LROUND and ISD::LLROUND to fcvtas instruction. It currently only handles the scalar version. llvm-svn: 360894	2019-05-16 13:30:18 +00:00
Adhemerval Zanella	73643b5041	[CodeGen] Add lround/llround builtins This patch adds the ISD::LROUND and ISD::LLROUND along with new intrinsics. The changes are straightforward as for other floating-point rounding functions, with just some adjustments required to handle the return value being an integer. The idea is to optimize lround/llround generation for AArch64 in a subsequent patch. The current semantics just route it to the libm symbol. llvm-svn: 360889	2019-05-16 13:15:27 +00:00
Matt Arsenault	828b685ebe	RegAllocFast: Improve hinting heuristic Trace through multiple COPYs when looking for a physreg source. Add hinting for vregs that will be copied into physregs (we only hinted for vregs getting copied to a physreg previously). Give a hinted register a bonus when deciding which value to spill. This is part of my rewrite regallocfast series. In fact this one doesn't even have an effect unless you also flip the allocation to happen from back to front of a basic block. Nonetheless it helps to split this up to ease review of D52010 Patch by Matthias Braun llvm-svn: 360887	2019-05-16 12:50:39 +00:00
Roman Lebedev	62650cf464	[NFC] Fixup FileCheck option name in tests added in rL360881 llvm-svn: 360884	2019-05-16 12:39:34 +00:00
Roman Lebedev	ec6608d547	[NFC][CodeGen] Add some more tests for pulling binops through shifts The ashr variant may see relaxation in https://reviews.llvm.org/D61918 llvm-svn: 360881	2019-05-16 12:26:53 +00:00
Matt Arsenault	a8f88c388f	AMDGPU/GlobalISel: Correct regbank for 1-bit and/or/xor Bool values should use the scc/vcc regbank since r350611. llvm-svn: 360877	2019-05-16 12:06:41 +00:00
Clement Courbet	c4fdd717ef	Reland r360771 "[MergeICmps] Simplify the code." This revision does not seem to be the culprit. llvm-svn: 360859	2019-05-16 06:18:02 +00:00
Matt Arsenault	55146d3139	GlobalISel: Add G_FCOPYSIGN llvm-svn: 360850	2019-05-16 04:08:39 +00:00
Craig Topper	e43bdf144c	[X86] Delay creating index register negations during address matching until after we know for sure the match will succeed If we're trying to match an LEA, it's possible the LEA match will be deemed unprofitable. In which case the negation we created in matchAddress would be left dangling in the SelectionDAG. This could artificially increase use counts for other nodes in the DAG. Though I don't have an example of that. But it just seems like bad form to have dangling nodes in isel. Differential Revision: https://reviews.llvm.org/D61047 llvm-svn: 360823	2019-05-15 21:59:53 +00:00
Reid Kleckner	4882490349	[codeview] Fix SDNode representation of annotation labels Before this change, they were erroneously constructed with the EH_LABEL SDNode opcode, which caused other passes to interact with them in incorrect ways. See the FIXME about fastisel that this addresses in the existing test case. Fixes PR41890 llvm-svn: 360818	2019-05-15 21:46:05 +00:00
Mandeep Singh Grang	814435fe87	[AArch64] only indicate CFI on Windows if we emitted CFI Summary: Otherwise, we emit directives for CFI without any actual CFI opcodes to go with them, which causes tools to malfunction. The technique is similar to what the x86 backend already does. Fixes https://bugs.llvm.org/show_bug.cgi?id=40876 Patch by: froydnj (Nathan Froyd) Reviewers: mstorsjo, eli.friedman, rnk, mgrang, ssijaric Reviewed By: rnk Subscribers: javed.absar, kristof.beyls, llvm-commits, dmajor Tags: #llvm Differential Revision: https://reviews.llvm.org/D61960 llvm-svn: 360816	2019-05-15 21:23:41 +00:00
Nicolai Haehnle	664ceeda68	RegAlloc: try to fail more gracefully when out of registers Summary: The emitError path allows the program to continue, unlike report_fatal_error. This is friendlier to use cases where LLVM is embedded in a larger program, because the caller may be able to deal with the error somewhat gracefully. Change the number of requested NOP bytes in the AArch64 and PowerPC test cases to avoid triggering an unrelated assertion. The compilation still fails, as verified by the test. Change-Id: Iafb9ca341002a597b82e59ddc7a1f13c78758e3d Reviewers: arsenm, MatzeB Subscribers: qcolombet, nemanjai, wdng, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61489 llvm-svn: 360786	2019-05-15 17:29:58 +00:00
Ryan Taylor	29257eb76c	[AMDGPU] Increases available SGPR for Calling Convention Summary: SGPR in CC can be either hw initialized or set by other chained shaders and so this increases the SGPR count available to CC to 105. Change-Id: I3dfadc750fe4a3e2bd07117a2899fd13f3e2fef3 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61261 llvm-svn: 360778	2019-05-15 14:43:55 +00:00
Clement Courbet	eaf4413d2d	Revert r360771 "[MergeICmps] Simplify the code." Breaks a bunch of buildbots. llvm-svn: 360776	2019-05-15 14:21:59 +00:00
Clement Courbet	157ae639fa	[MergeICmps] Simplify the code. Instead of patching the original blocks, we now generate new blocks and delete the old blocks. This results in simpler code with a less twisted control flow (see the change in `entry-block-shuffled.ll`). This will make https://reviews.llvm.org/D60318 simpler by making it more obvious where control flow is created and deleted. Reviewers: gchatelet Subscribers: hiraditya, llvm-commits, spatel Tags: #llvm Differential Revision: https://reviews.llvm.org/D61736 llvm-svn: 360771	2019-05-15 13:04:24 +00:00
David Green	0582b22f10	[ARM] Don't use the Machine Scheduler for cortex-m at minsize The new cortex-m schedule in rL360768 helps performance, but can increase the amount of high-registers used. This, on average, ends up increasing the codesize by a fair amount (because less instructions are converted from T2 to T1). On cortex-m at -Oz, where we are quite size-paranoid, it is better to use the existing DAG scheduler with the RegPressure scheduling preference (at least until the issues around T2 vs T1 instructions can be improved). I have also made sure that the Sched::RegPressure dag scheduler is always chosen for MinSize. The test shows one case where we increase the number of registers used. Differential Revision: https://reviews.llvm.org/D61882 llvm-svn: 360769	2019-05-15 12:58:02 +00:00
David Green	d2d0f46cd2	[ARM] Cortex-M4 schedule This patch adds a simple Cortex-M4 schedule, renaming the existing M3 schedule to M4 and filling in the latencies as-per the Cortex-M4 TRM: https://developer.arm.com/docs/ddi0439/latest Most of these are 1, with the important exception being loads taking 2 cycles. A few others are also higher, but I don't believe they make a large difference. I've repurposed the M3 schedule as the latencies are mostly the same between the two cores, with the M4 having more FP and DSP instructions. We also turn on MISched and UseAA for the cores that now use this. It also adds some schedule Write's to various instruction to make things simpler. Differential Revision: https://reviews.llvm.org/D54142 llvm-svn: 360768	2019-05-15 12:41:58 +00:00
Craig Topper	384d46c0d5	[X86] Use OR32mi8Locked instead of LOCK_OR32mi8 in emitLockedStackOp. They encode the same way, but OR32mi8Locked sets hasUnmodeledSideEffects, which should be stronger than the mayLoad/mayStore on LOCK_OR32mi8. I think this makes sense since we are using it as a fence. This also seems to hide the operation from the speculative load hardening pass so I've reverted r360511. llvm-svn: 360747	2019-05-15 04:15:46 +00:00
Fangrui Song	f4dfd63c74	[IR] Disallow llvm.global_ctors and llvm.global_dtors of the 2-field form in textual format The 3-field form was introduced by D3499 in 2014 and the legacy 2-field form was planned to be removed in LLVM 4.0 For the textual format, this patch migrates the existing 2-field form to use the 3-field form and deletes the compatibility code. test/Verifier/global-ctors-2.ll checks we have a friendly error message. For bitcode, lib/IR/AutoUpgrade UpgradeGlobalVariables will upgrade the 2-field form (add i8* null as the third field). Reviewed By: rnk, dexonsmith Differential Revision: https://reviews.llvm.org/D61547 llvm-svn: 360742	2019-05-15 02:35:32 +00:00
Philip Reames	445f942fc4	Use an offset from TOS for idempotent rmw locked op lowering This was the portion split off D58632 so that it could follow the redzone API cleanup. Note that I changed the offset preferred from -8 to -64. The difference should be very minor, but I thought it might help address one concern which had been previously raised. Differential Revision: https://reviews.llvm.org/D61862 llvm-svn: 360719	2019-05-14 22:32:42 +00:00
Roman Lebedev	7baf528aba	[NFC][CodeGen][X86][AArch64] Add and-const-mask + const-shift pattern tests Unlike instcombine, we currently don't turn and+shift into shift+and. We probably should, likely unconditionally. While i'm adding only all-ones (potentially shifted) mask, this obviously isn't limited to any particular mask pattern: https://rise4fun.com/Alive/kmX Related to https://bugs.llvm.org/show_bug.cgi?id=41874 llvm-svn: 360706	2019-05-14 20:17:04 +00:00
Stanislav Mekhanoshin	05791d90c9	[AMDGPU] Fixed handling of immediate i1 literals This bug was exposed by rL360395. Differential Revision: https://reviews.llvm.org/D61812 llvm-svn: 360689	2019-05-14 16:18:00 +00:00
Stanislav Mekhanoshin	7b20032628	[AMDGPU] gfx1010 Strengthen some SMEM WAR hazard unit tests. NFC. Tighten conditions on SMEM WAR hazard unit tests to ensure rejection of workaround insertion where a s_waitcnt is present in dependency chain. The current workaround code already conforms to these revised tests. llvm-svn: 360686	2019-05-14 16:04:03 +00:00
Simon Pilgrim	c2d9cfd925	[X86] Disable shouldFoldConstantShiftPairToMask for scalar shifts on AMD targets (PR40758) D61068 handled vector shifts, this patch does the same for scalars where there is a similar number of pipes for shifts as bit ops - this is true almost entirely for AMD targets where the scalar ALUs are well balanced. This combine avoids AND immediate mask which usually means we reduce encoding size. Some tests show use of (slow, scaled) LEA instead of SHL in some cases, but that's due to particular shift immediates - shift+mask generate these just as easily. Differential Revision: https://reviews.llvm.org/D61830 llvm-svn: 360684	2019-05-14 15:21:28 +00:00
Lei Huang	22561972af	[PowerPC] Custom lower known CR bit spills For known CRBit spills, CRSET/CRUNSET, it is more efficient to load and spill the known value instead of extracting the bit. eg. This sequence is currently used to spill a CRUNSET: crclr 4*cr5+lt mfocrf r3,4 rlwinm r3,r3,20,0,0 stw r3,132(r1) This patch custom lower it to: li r3,0 stw r3,132(r1) Differential Revision: https://reviews.llvm.org/D61754 llvm-svn: 360677	2019-05-14 14:27:06 +00:00
Diana Picus	a568222ddd	[IRTranslator] Don't hardcode GEP index type When breaking up loads and stores of aggregates, the IRTranslator uses LLT::scalar(64) for the index type of the G_GEP instructions that compute the addresses. This is unnecessarily large for 32-bit targets. Use the int ptr type provided by the DataLayout instead. Note that we're already doing the right thing when translating getelementptr instructions from the IR. This is just an oversight when generating new ones while translating loads/stores. Both x86 and AArch64 already have tests confirming that the old behaviour is preserved for 64-bit targets. Differential Revision: https://reviews.llvm.org/D61852 llvm-svn: 360656	2019-05-14 09:25:17 +00:00
Philip Reames	3098e44daa	[X86] Prefer locked stack op over mfence for seq_cst 64-bit stores on 32-bit targets This is a follow on to D58632, with the same logic. Given a memory operation which needs ordering, but doesn't need to modify any particular address, prefer to use a locked stack op over an mfence. Differential Revision: https://reviews.llvm.org/D61863 llvm-svn: 360649	2019-05-14 04:43:37 +00:00
Jinsong Ji	b7b3d866a4	[PowerPC][NFC] Fix typos in triples Found by bzEq (Kai Luo). llvm-svn: 360643	2019-05-14 03:11:24 +00:00
Craig Topper	cc761e6fae	[X86] Use X86 instead of X32 as a check prefix in atomic-idempotent.ll. NFC X32 can refer to a 64-bit ABI that uses 32-bit ints, longs, and pointers. I plan to add gnux32 command lines to this test so this prepares for that. Also remove some check lines that have a prefix that is not in any run lines. llvm-svn: 360642	2019-05-14 03:07:56 +00:00
Sanjay Patel	3a13d970aa	[SDAG, x86] allow targets to override test for binop opcodes This follows the pattern of the existing isCommutativeBinOp(). x86 shows improvements from vector narrowing for the min/max opcodes. llvm-svn: 360639	2019-05-14 00:39:40 +00:00
Nikita Popov	323dc634b9	[WebAssembly] Don't assume that zext/sext result is i32/i64 in fast isel (PR41841) Usually this will abort fast-isel at the instruction using the non-legal result, but if the only use is in a different basic block, we'll incorrectly assume that the zext/sext is to i32 (rather than i128 in this case). Differential Revision: https://reviews.llvm.org/D61823 llvm-svn: 360616	2019-05-13 19:40:18 +00:00
Stanislav Mekhanoshin	d9930d499a	[AMDGPU] gfx1010 tests. NFC. llvm-svn: 360615	2019-05-13 19:30:06 +00:00
Robert Lougher	91a9d4ef4b	Revert [X86] Avoid SFB - Fix inconsistent codegen with/without debug info Revert r360436 as it is causing clang-x64-windows-msvc buildbot to fail. llvm-svn: 360606	2019-05-13 17:36:46 +00:00
Nick Desaulniers	c33f754e74	[TargetLowering] Handle multi depth GEPs w/ inline asm constraints Summary: X86TargetLowering::LowerAsmOperandForConstraint had better support than TargetLowering::LowerAsmOperandForConstraint for arbitrary depth getelementpointers for "i", "n", and "s" extended inline assembly constraints. Hoist its support from the derived class into the base class. Link: https://github.com/ClangBuiltLinux/linux/issues/469 Reviewers: echristo, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, E5ten, kees, jyknight, nemanjai, javed.absar, eraman, hiraditya, jsji, llvm-commits, void, craig.topper, nathanchance, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D61560 llvm-svn: 360604	2019-05-13 17:27:44 +00:00
Simon Pilgrim	73aee29095	[X86][SSE] LowerBuildVectorv4x32 - don't insert MOVQ for undef elts Fixes the regression noted in D61782 where a VZEXT_MOVL was being inserted because we weren't discriminating between 'zeroable' and 'all undef' for the upper elts. Differential Revision: https://reviews.llvm.org/D61782 llvm-svn: 360596	2019-05-13 16:10:11 +00:00
Simon Pilgrim	cf5a8eb7cd	[X86][SSE] Relax use limits for lowerAddSubToHorizontalOp (PR32433) Now that we can use HADD/SUB for scalar additions from any pair of extracted elements (D61263), we can relax the one use limit as we will be able to merge multiple uses into using the same HADD/SUB op. This exposes a couple of missed opportunities in LowerBuildVectorv4x32 which will be committed separately. Differential Revision: https://reviews.llvm.org/D61782 llvm-svn: 360594	2019-05-13 16:02:45 +00:00
Simon Pilgrim	d3cedee3c6	[TargetLowering] Add SimplifyDemandedBits support for ZERO_EXTEND_VECTOR_INREG More work for PR39709. llvm-svn: 360592	2019-05-13 15:51:26 +00:00
Craig Topper	c6a6c10742	[X86] Add test case for mask register variant of PR41619 which should be fixed after r360552 llvm-svn: 360591	2019-05-13 15:45:20 +00:00
Sanjay Patel	05dafb1c97	[DAGCombiner] narrow vector binop with inserts/extract We catch most of these patterns (on x86 at least) by matching a concat vectors opcode early in combining, but the pattern may emerge later using insert subvector instead. The AVX1 diffs for add/sub overflow show another missed narrowing pattern. That one may be falling through the cracks because of combine ordering and multiple uses. llvm-svn: 360585	2019-05-13 14:31:14 +00:00
Sanjay Patel	83e61bc5e2	[x86] add test for insert/extract binop; NFC This pattern is visible in the c-ray benchmark with an AVX target. llvm-svn: 360582	2019-05-13 13:32:16 +00:00
Kevin P. Neal	5987749e33	Add constrained fptrunc and fpext intrinsics. The new fptrunc and fpext intrinsics are constrained versions of the regular fptrunc and fpext instructions. Reviewed by: Andrew Kaylor, Craig Topper, Cameron McInally, Conner Abbot Approved by: Craig Topper Differential Revision: https://reviews.llvm.org/D55897 llvm-svn: 360581	2019-05-13 13:23:30 +00:00
Ulrich Weigand	8e42f6ddc8	[SystemZ] Model floating-point control register This adds the FPC (floating-point control register) as a reserved physical register and models its use by SystemZ instructions. Note that only the current rounding modes and the IEEE exception masks are modeled. Changes of the FPC due to exceptions (in particular the IEEE exception flags and the DXC) are not modeled. At this point, this patch is mostly NFC, but it will prevent scheduling of floating-point instructions across SPFC/LFPC etc. llvm-svn: 360570	2019-05-13 09:47:26 +00:00
Sam Parker	a33e311a3b	[ARM][ParallelDSP] Relax alias checks When deciding the safety of generating smlad, we checked for any writes within the block that may alias with any of the loads that need to be widened. This is overly conservative because it only matters when there's a potential aliasing write to a location accessed by a pair of loads. Now we check for aliasing writes only once, during setup. If two loads are found to have an aliasing write between them, we don't add these loads to LoadPairs. This means that later during the transform, we can safely widen a pair without worrying about aliasing. However, to maintain correctness, we also need to change the way that wide loads are inserted because the order is now important. The MatchSMLAD method has also been changed, absorbing MatchReductions and AddMACCandidate to hopefully improve readability. Differential Revision: https://reviews.llvm.org/D6102 llvm-svn: 360567	2019-05-13 09:23:32 +00:00
Clement Courbet	9afc4764dd	[DAGCombiner] Fix invalid alias analysis. Summary: When we know for sure whether two addresses do or do not alias, we should immediately return from DAGCombiner::isAlias(). I think this comes from a bad copy/paste, Sorry for not catching that during the code review. Fixes PR41855. Reviewers: niravd, gchatelet, EricWF Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61846 llvm-svn: 360566	2019-05-13 09:07:37 +00:00
Clement Courbet	c4e37fd9b2	[DAGCombiner][NFC] Commit test to show fix in D61846. llvm-svn: 360561	2019-05-13 08:15:34 +00:00
Yonghong Song	98fe9c9869	[BPF] emit BTF sections only if debuginfo available Currently, without -g, BTF sections may still be emitted with data sections, e.g., for linux kernel bpf selftest test_tcp_check_syncookie_kern.c issue discovered by Martin as shown below. -bash-4.4$ bpftool btf dump file test_tcp_check_syncookie_kern.o [1] VAR 'results' type_id=0, linkage=global-alloc [2] VAR '_license' type_id=0, linkage=global-alloc [3] DATASEC 'license' size=0 vlen=1 type_id=2 offset=0 size=4 [4] DATASEC 'maps' size=0 vlen=1 type_id=1 offset=0 size=28 Let's disable BTF generation if no debuginfo, which is the original design. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61826 llvm-svn: 360556	2019-05-13 05:00:23 +00:00
Craig Topper	61e556d2bd	Recommit r358887 "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling" I've included a new fix in X86RegisterInfo to prevent PR41619 without reintroducing r359392. We might be able to improve that in the base class implementation of shouldRewriteCopySrc somehow. But this hopefully enables forward progress on SimplifyDemandedBits improvements for now. Original commit message: This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly. The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGComb but it caused a lot of noise on other targets - some improvements, some regressions. The X86 changes are all definite wins. llvm-svn: 360552	2019-05-13 04:03:35 +00:00
Simon Pilgrim	a7fc763082	[X86][AVX] Split VZEXT_MOVL ymm/zmm if the upper elements are not demanded. Removes unnecessary vzeroupper noted in D61806 llvm-svn: 360543	2019-05-12 15:16:29 +00:00
Sanjay Patel	a09e686821	[DAGCombiner] try to move bitcast after extract_subvector I noticed that we were failing to narrow an x86 ymm math op in a case similar to the 'madd' test diff. That is because a bitcast is sitting between the math and the extract subvector and thwarting our pattern matching for narrowing: t56: v8i32 = add t59, t58 t68: v4i64 = bitcast t56 t73: v2i64 = extract_subvector t68, Constant:i64<2> t96: v4i32 = bitcast t73 There are a few wins and neutral diffs in the other tests. Differential Revision: https://reviews.llvm.org/D61806 llvm-svn: 360541	2019-05-12 14:43:20 +00:00
Simon Pilgrim	fda6bffd3b	[X86][SSE] SimplifyDemandedBits - call PEXTRB/PEXTRW SimplifyDemandedVectorElts as well. See if we can simplify the demanded vector elts from the extraction before trying to simplify the demanded bits. This helps us with target shuffles and hops in particular. llvm-svn: 360535	2019-05-11 21:35:50 +00:00
Simon Pilgrim	605a840747	[DAG] Add SimplifyDemandedBits support for BITREVERSE Pulled out of D58017 while I continue to investigate the BSWAP regression on PPC llvm-svn: 360534	2019-05-11 20:56:05 +00:00
Simon Pilgrim	3fa632a112	[X86] Updated shift-mask test targets for D61830 llvm-svn: 360533	2019-05-11 20:28:20 +00:00
Simon Pilgrim	91e697c145	[X86] Add scalar shl+lshr -> shift+mask tests (PR40758) As discussed on D61068, many x86 targets can perform 2 immediate shifts quicker than a shift + mask llvm-svn: 360530	2019-05-11 19:16:46 +00:00
Simon Pilgrim	6f7c62d70f	[X86] Add avx512f tests for boolean reduction llvm-svn: 360529	2019-05-11 19:14:19 +00:00
Simon Pilgrim	e4c5b6d9bd	[X86][SSE] Add SimplifyDemandedVectorElts HADD/HSUB handling. Still missing PHADDW/PHSUBW tests because PEXTRW doesn't call SimplifyDemandedVectorElts llvm-svn: 360526	2019-05-11 16:07:12 +00:00
Craig Topper	c9d7484aa3	[X86] Add CMOV_FR32X/CMOV_FR64X pseudo instructions. Use them in fast isel to fix a machine verifier error after adding test cases. Fast isel picks the FR32X/FR64X register classes when lowering pseudo select, but it didn't have the right opcode to go with it. llvm-svn: 360524	2019-05-11 16:00:28 +00:00
Simon Pilgrim	4871a3057e	[X86][SSE] Tweaked HADD/HSUB SimplifyDemandedVectorElts Try to ensure we have LHS and RHS test coverage llvm-svn: 360519	2019-05-11 14:47:54 +00:00
Simon Pilgrim	1db0cc9e1b	[X86][SSE] Add integer HADD/HSUB SimplifyDemandedVectorElts tests llvm-svn: 360518	2019-05-11 14:08:34 +00:00
Simon Pilgrim	67ad4c2f27	[X86][SSE] Add HADD/HSUB SimplifyDemandedVectorElts tests Shows missed opportunities to simplify args. Will add integer HADD/HSUB tests in a future commit. llvm-svn: 360517	2019-05-11 12:46:38 +00:00
Craig Topper	31f7adb94f	[X86] Don't emit MOVNTDQA loads from fast-isel without SSE4.1. We were checking for SSE4.1 for FP types, but not integer 128-bit types. Fixes PR41837. llvm-svn: 360512	2019-05-11 04:19:33 +00:00
Craig Topper	bdef12df8d	[X86] Add a test case for idempotent atomic operations with speculative load hardening. Fix an additional issue found by the test. This test covers the fix from r360475 as well. llvm-svn: 360511	2019-05-11 04:00:27 +00:00
Mircea Trofin	ff3bed0e61	Skip over prefetches Summary: Skip over prefetches when assigning debug info to instructions with memory operands. This way, the debug info is stable after instrumenting a binary with prefetches, allowing for iterative profiling and instrumentation. Reviewers: davidxl Reviewed By: davidxl Subscribers: aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61789 llvm-svn: 360471	2019-05-10 21:27:55 +00:00
Nikita Popov	9f7537bd48	[SDAG] Recursively legalize both vector mulo results Split out from D61692 per RKSimon's suggestion. Vector op legalization will automatically recursively legalize the returned SDValue, but we need to take care of the other results ourselves. Otherwise it will end up getting legalized only during op legalization, by which point it might be too late (though I'm not aware of any specific cases right now). There are codegen differences because expansion occurs earlier now and we don't get a DAGCombiner run in between. Differential Revision: https://reviews.llvm.org/D61744 llvm-svn: 360470	2019-05-10 20:42:48 +00:00
Momchil Velikov	c396f09ce9	Adjust MachineScheduler to use ProcResource counts This fix allows the scheduler to take into account the number of instances of each ProcResource specified. Previously a declaration in a scheduler of ProcResource<1> would be treated identically to a declaration of ProcResource<2>. Now the hazard recognizer would report a hazard only after all of the resource instances are busy. Patch by Jackson Woodruff and Momchil Velikov. Differential Revision: https://reviews.llvm.org/D51160 llvm-svn: 360441	2019-05-10 16:54:32 +00:00
Robert Lougher	986b6b86bb	[X86] Avoid SFB - Fix inconsistent codegen with/without debug info Fixes https://bugs.llvm.org/show_bug.cgi?id=40969 The functions findPotentiallyBlockedCopies and buildCopy are currently not accounting for the presence of debug instructions. In the former this results in the optimization not being triggered, and in the latter results in inconsistent codegen. This patch enables the optimization to be performed in a debug build and ensures the codegen is consistent with non-debug builds. Patch by Chris Dawson. Differential Revision: https://reviews.llvm.org/D61680 llvm-svn: 360436	2019-05-10 15:55:06 +00:00
Simon Pilgrim	a0b1518a4a	[X86][SSE] Add getHopForBuildVector vector splitting If we only use the lower xmm of a ymm hop, then extract the xmm's (for free), perform the xmm hop and then insert back into a ymm (for free). Fixes some of the regressions noted in D61782 llvm-svn: 360435	2019-05-10 15:46:04 +00:00
Lei Huang	1ac6e9636c	[PowerPC] custom lower `v2f64 fpext v2f32` Reduces scalarization overhead via custom lowering of v2f64 fpext v2f32. eg. For the following IR %0 = load <2 x float>, <2 x float>* %Ptr, align 8 %1 = fpext <2 x float> %0 to <2 x double> ret <2 x double> %1 Pre custom lowering: ld r3, 0(r3) mtvsrd f0, r3 xxswapd vs34, vs0 xscvspdpn f0, vs0 xxsldwi vs1, vs34, vs34, 3 xscvspdpn f1, vs1 xxmrghd vs34, vs0, vs1 After custom lowering: lfd f0, 0(r3) xxmrghw vs0, vs0, vs0 xvcvspdp vs34, vs0 Differential Revision: https://reviews.llvm.org/D57857 llvm-svn: 360429	2019-05-10 14:04:06 +00:00
Tim Northover	6c1e3f9493	SelectionDAG: accommodate atomic floating stores. We were applying a pointer truncation to floating types, which crashed LLVM. That is Not A Good Thing(TM). llvm-svn: 360421	2019-05-10 11:23:04 +00:00
Sam Clegg	ea38ac5ba3	[WebAssembly] Don't assume that strongly defined symbols are DSO-local The current PIC model for WebAssembly is more like ELF in that it allows symbol interposition. This means that more functions end up being addressed via the GOT and fewer directly added to the wasm table. One effect is a reduction in the number of wasm table entries similar to the previous attempt in https://reviews.llvm.org/D61539 which was reverted. Differential Revision: https://reviews.llvm.org/D61772 llvm-svn: 360402	2019-05-10 01:52:08 +00:00
Mircea Trofin	5c31c05fbd	[llvm] X86DiscriminateMemOps: insert debug info when missing Reviewers: davidxl Reviewed By: davidxl Subscribers: aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61735 llvm-svn: 360396	2019-05-10 00:12:51 +00:00
Stanislav Mekhanoshin	64196850f0	[AMDGPU] Pattern for v_xor3_b32 This also allows three op patterns to use increased constant bus limit of GFX10. Differential Revision: https://reviews.llvm.org/D61763 llvm-svn: 360395	2019-05-10 00:09:01 +00:00
Philip Reames	bd588dfd59	[X86] Improve lowering of idempotent RMW operations The current lowering uses an mfence. mfences are substantially higher latency than the locked operations originally requested, but we do want to avoid contention on the original cache line. As such, use a locked instruction on a cache line assumed to be thread local. Differential Revision: https://reviews.llvm.org/D58632 llvm-svn: 360393	2019-05-09 23:23:42 +00:00
Simon Pilgrim	93bfa5af48	[X86][SSE] Fold add(shuffle(),shuffle()) to hadd on 'slow' targets (PR39920) As reported on PR39920, "slow horizontal ops" targets tend to internally expand to 2shuffle+add/sub - so if we can reduce 2shuffle+add/sub to a hadd/sub then we should do it - similar port usage but reduced instruction count. This works out in most cases, although the "PR22377" regression in vector-shuffle-combining.ll is annoying - going from 2shuffle+add+shuffle to hadd+2shuffle - I've opened PR41813 to cover this. Differential Revision: https://reviews.llvm.org/D61308 llvm-svn: 360360	2019-05-09 17:45:01 +00:00
Nemanja Ivanovic	80808ed0f6	[PowerPC][NFC] Add test for D60506 to show differences in code-gen Differential revision: https://reviews.llvm.org/D61723 llvm-svn: 360338	2019-05-09 12:26:39 +00:00
Sam Parker	d7b650cc72	[ARM][CGP] Guard against signext args and sitofp Add an Argument that has the SExtAttr attached, as well as SIToFP instructions, as values that generate sign bits. SIToFP doesn't strictly do this and could be treated as a sink to be sign-extended. Differential Revision: https://reviews.llvm.org/D61381 llvm-svn: 360331	2019-05-09 11:56:16 +00:00
Diana Picus	3531453371	[ARM GlobalISel] Map DBG_VALUE for types != s32 ...and make sure we fail elegantly for unsupported values. s64 goes into DPR, anything <= 32 into GPR. llvm-svn: 360321	2019-05-09 09:49:36 +00:00
Leonard Chan	95b7abdcc5	[SelectionDAG] Expand ADD/SUBCARRY This patch allows for expansion of ADDCARRY and SUBCARRY when the target does not support it. Differential Revision: https://reviews.llvm.org/D61411 llvm-svn: 360303	2019-05-09 01:17:48 +00:00
Stanislav Mekhanoshin	327626368c	[AMDGPU] gfx1010 tests. NFC. Added tests which now pass after code commits. llvm-svn: 360300	2019-05-08 23:31:32 +00:00
Sanjay Patel	902b3ecdad	[SelectionDAG] fold 'fneg undef' to undef This is extracted from the original draft of D61419 with some additional tests. We don't currently get this in IR (it's conservatively turned into a NaN), but presumably that'll get updated as we add real IR support for 'fneg' rather than 'fsub -0.0, x'. The x86-32 run shows the following, and I haven't looked further to see why, but that seems to be independent: Legalizing: t1: f32 = undef Trying to expand node Creating fp constant: t4: f32 = ConstantFP<0.000000e+00> Differential Revision: https://reviews.llvm.org/D61516 llvm-svn: 360296	2019-05-08 22:19:52 +00:00
Matt Arsenault	01434f9377	AMDGPU: Select VOP3 form of add The VOP3 form should always be the preferred selection, to be shrunk later. This should only be an optimization issue, but this partially works around a problem from clobbering VCC when SIFixSGPRCopies rewrites an SCC defining operation directly to VCC. 3 of the testcases are regressions from failing to fold the immediate in cases it should. These can be avoided by improving the VCC liveness handling in SIFoldOperands. Simply increasing the threshold to computeRegisterLiveness works, although this is common enough that VCC liveness should probably be tracked throughout the pass. The hack of leaving behind an implicit_def instruction to avoid breaking the iterator wastes instruction count, which inhibits finding the VCC def in long chains of adds. Doing this however exposes different, worse looking regressions from poor scheduling behavior. This could probably be avoided by forcing the shrink of the addc here, but the scheduler should probably be fixed. The r600 add test needs to be split out because it asserts on the arguments in the new test during the calling convention lowering. llvm-svn: 360293	2019-05-08 22:09:57 +00:00
Stanislav Mekhanoshin	1dbf721315	[AMDGPU] gfx1010 exp modifications Differential Revision: https://reviews.llvm.org/D61701 llvm-svn: 360287	2019-05-08 21:23:37 +00:00

29161 Commits